Re: hisax isdn card (Sedlbauer Speed Fax+) does not get an interrupt
esetting card <06>2007 May 30 14:21:40 cbs kern: Sedlbauer Speed Fax +: IRQ 11 count 10 <04>2007 May 30 14:21:40 cbs kern: Sedlbauer Speed Fax +: IRQ(11) getting no interrupts during init 1 <06>2007 May 30 14:21:40 cbs kern: Sedlbauer: resetting card <06>2007 May 30 14:21:40 cbs kern: Sedlbauer: resetting card <06>2007 May 30 14:21:40 cbs kern: Sedlbauer Speed Fax +: IRQ 11 count 10 <04>2007 May 30 14:21:40 cbs kern: Sedlbauer Speed Fax +: IRQ(11) getting no interrupts during init 2 <06>2007 May 30 14:21:40 cbs kern: Sedlbauer: resetting card <06>2007 May 30 14:21:40 cbs kern: Sedlbauer: resetting card <06>2007 May 30 14:21:40 cbs kern: Sedlbauer Speed Fax +: IRQ 11 count 10 <04>2007 May 30 14:21:40 cbs kern: Sedlbauer Speed Fax +: IRQ(11) getting no interrupts during init 3 <06>2007 May 30 14:21:40 cbs kern: Sedlbauer: resetting card <04>2007 May 30 14:21:40 cbs kern: HiSax: Card Sedlbauer Speed Fax + not installed ! == While the output seems to suggest a hardware problem, the same system loads the hisax driver perfectly on recent 2.4 kernels. We tried several kernel versions, up to 2.6.21.3 Any hints are appreciated :) Likely a driver problem - the device is using IRQ 11, but the driver never actually registered a handler for that interrupt (it's not in the list of handlers, only USB is). Maybe retrieving the interrupt before pci_enable_device? (I haven't looked at the code in question.) -- Robert Hancock Saskatoon, SK, Canada To email, remove "nospam" from [EMAIL PROTECTED] Home Page: http://www.roberthancock.com/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] x86: Document the hotplug code is incompatible with x86 irq handling
Eric W. Biederman wrote: I just realized that except for doing the code review and noticing that the current cpu hotplug code is fundamentally incompatible with x86 I haven't done anything about it. So here is my patch to document what is wrong. The current cpu hotplug code requires irqs to be migrated from a cpu outside of irq context. On x86 ioapics simply do not support this, making the code unfixable without major redesign of the generic cpu hotplug code. So this patch makes CPU_HOTPLUG on x86 depend on CONFIG_BROKEN and adds a WARN_ON so people that do enable it are not in doubt about which part of the code is broken, even if it does work for them. Signed-off-by: Eric W. Biederman <[EMAIL PROTECTED]> I don't think this is useful, though the code may be problematic, this patch will break suspend on all SMP machines with an existing config, which is a major regression.. -- Robert Hancock Saskatoon, SK, Canada To email, remove "nospam" from [EMAIL PROTECTED] Home Page: http://www.roberthancock.com/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: MCP55 NCQ problem?
: ata2.00: exception Emask 0x0 SAct 0x6 SErr 0x200 action 0x2 frozen May 29 12:49:23 localhost kernel: ata2: SError: {UnrecFIS } May 29 12:49:23 localhost kernel: ata2.00: cmd 61/40:08:3f:71:fa/00:00:07:00:00/40 tag 1 cdb 0x0 data 32768 out May 29 12:49:23 localhost kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) May 29 12:49:23 localhost kernel: ata2.00: status: {DRDY } May 29 12:49:23 localhost kernel: ata2.00: cmd 61/10:10:7f:e7:fa/00:00:07:00:00/40 tag 2 cdb 0x0 data 8192 out May 29 12:49:23 localhost kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) May 29 12:49:23 localhost kernel: ata2.00: status: {DRDY } May 29 12:49:23 localhost kernel: ata2: hard resetting port May 29 12:49:24 localhost kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300) May 29 12:49:24 localhost kernel: ata2.00: ata_hpa_resize: sectors = 625142448, hpa_sectors = 625142448 May 29 12:49:24 localhost kernel: ata2.00: ata_hpa_resize: sectors = 625142448, hpa_sectors = 625142448 May 29 12:49:24 localhost kernel: ata2.00: configured for UDMA/133 May 29 12:49:24 localhost kernel: ata2: EH pending after completion, repeating EH (cnt=4) May 29 12:49:24 localhost kernel: ata2: EH complete May 29 12:49:24 localhost kernel: sd 1:0:0:0: [sdb] 625142448 512-byte hardware sectors (320073 MB) May 29 12:49:24 localhost kernel: sd 1:0:0:0: [sdb] Write Protect is off May 29 12:49:24 localhost kernel: sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA May 29 12:50:24 localhost kernel: ata2.00: NCQ disabled due to excessive errors May 29 12:50:24 localhost kernel: ata2.00: exception Emask 0x0 SAct 0x6 SErr 0x0 action 0x2 frozen May 29 12:50:24 localhost kernel: ata2.00: cmd 61/10:08:c7:88:b8/00:00:0f:00:00/40 tag 1 cdb 0x0 data 8192 out May 29 12:50:24 localhost kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) May 29 12:50:24 localhost kernel: ata2.00: status: {DRDY } May 29 12:50:24 localhost kernel: ata2.00: cmd 61/10:10:9f:8a:b8/00:00:0f:00:00/40 tag 2 cdb 0x0 data 8192 out May 29 12:50:24 localhost kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) May 29 12:50:24 localhost kernel: ata2.00: status: {DRDY } May 29 12:50:24 localhost kernel: ata2: hard resetting port May 29 12:50:25 localhost kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300) May 29 12:50:25 localhost kernel: ata2.00: ata_hpa_resize: sectors = 625142448, hpa_sectors = 625142448 May 29 12:50:25 localhost kernel: ata2.00: ata_hpa_resize: sectors = 625142448, hpa_sectors = 625142448 May 29 12:50:25 localhost kernel: ata2.00: configured for UDMA/133 May 29 12:50:25 localhost kernel: ata2: EH pending after completion, repeating EH (cnt=4) May 29 12:50:25 localhost kernel: ata2: EH complete May 29 12:50:25 localhost kernel: sd 1:0:0:0: [sdb] 625142448 512-byte hardware sectors (320073 MB) May 29 12:50:25 localhost kernel: sd 1:0:0:0: [sdb] Write Protect is off May 29 12:50:25 localhost kernel: sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA After the system disabled NCQ there weren't any more more ata resets and the disks were working OK. Strange thing is, when I have run PostgreSQL pgbench with 25 clients on 2.6.22-rc2 + cfs-v13 + swncq (which clearly showed advanced transfer rate) I had no such problems. The PostgreSQL DB was also on ata2.00. Best regards, Zoltán Böszörményi - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ -- Robert Hancock Saskatoon, SK, Canada To email, remove "nospam" from [EMAIL PROTECTED] Home Page: http://www.roberthancock.com/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Case: 7454422: Re: Kernel 2.6.21.3 does not work with 8GB of RAM on Intel 965WH motherboards. (FULL DMESG)
Justin Piszcz wrote: On Thu, 31 May 2007, Parag Warudkar wrote: Robert Hancock wrote: I think that mem=8832M would work as well, to make the kernel use only the memory that is marked cacheable. (It looks like this parameter takes the highest memory address we want the kernel to use, not the highest memory amount.) Yep, and that would be much easier too. I am curious though as this seems to be somewhat common a problem, could we make the kernel analyze which memory is not cacheable (it already knows this via MTRR) and not use that portion for anything? Plus may be warn the user to contact their BIOS vendor to correct the problem? I think that would be possible - even if the kernel knows late that the memory was uncached we could migrate those pages in that region to someplace else? Parag That is an excellent question and I wonder the same thing. I also had this problem when I only used 4GB of ram and upgraded the (another motherboard, I have two) past version 1666P and I had no idea what was going on other than the BIOS did not work correctly. In this case however it worked with 4GB with bios version 1612P but not with 8GB. Is this the case of a buggy BIOS for the 965 chipset or do Intel boards have a lot of issues? We could conceivably generate a warning if the MTRRs don't map all of the physical memory as write-back. Actually, conceivably we could actually go and fix up the MTRRs if we found them to be wrong according to the E820 memory map. That would be more complicated, however. -- Robert Hancock Saskatoon, SK, Canada To email, remove "nospam" from [EMAIL PROTECTED] Home Page: http://www.roberthancock.com/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Case: 7454422: Re: Kernel 2.6.21.3 does not work with 8GB of RAM on Intel 965WH motherboards. (FULL DMESG)
Justin Piszcz wrote: On Thu, 31 May 2007, Parag Warudkar wrote: Robert Hancock wrote: I think that mem=8832M would work as well, to make the kernel use only the memory that is marked cacheable. (It looks like this parameter takes the highest memory address we want the kernel to use, not the highest memory amount.) Yep, and that would be much easier too. I am curious though as this seems to be somewhat common a problem, could we make the kernel analyze which memory is not cacheable (it already knows this via MTRR) and not use that portion for anything? Plus may be warn the user to contact their BIOS vendor to correct the problem? I think that would be possible - even if the kernel knows late that the memory was uncached we could migrate those pages in that region to someplace else? Parag That is an excellent question and I wonder the same thing. I also had this problem when I only used 4GB of ram and upgraded the (another motherboard, I have two) past version 1666P and I had no idea what was going on other than the BIOS did not work correctly. In this case however it worked with 4GB with bios version 1612P but not with 8GB. Is this the case of a buggy BIOS for the 965 chipset or do Intel boards have a lot of issues? We could conceivably generate a warning if the MTRRs don't map all of the physical memory as write-back. Actually, conceivably we could actually go and fix up the MTRRs if we found them to be wrong according to the E820 memory map. That would be more complicated, however. -- Robert Hancock Saskatoon, SK, Canada To email, remove nospam from [EMAIL PROTECTED] Home Page: http://www.roberthancock.com/ - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: MCP55 NCQ problem?
: ata2.00: exception Emask 0x0 SAct 0x6 SErr 0x200 action 0x2 frozen May 29 12:49:23 localhost kernel: ata2: SError: {UnrecFIS } May 29 12:49:23 localhost kernel: ata2.00: cmd 61/40:08:3f:71:fa/00:00:07:00:00/40 tag 1 cdb 0x0 data 32768 out May 29 12:49:23 localhost kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) May 29 12:49:23 localhost kernel: ata2.00: status: {DRDY } May 29 12:49:23 localhost kernel: ata2.00: cmd 61/10:10:7f:e7:fa/00:00:07:00:00/40 tag 2 cdb 0x0 data 8192 out May 29 12:49:23 localhost kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) May 29 12:49:23 localhost kernel: ata2.00: status: {DRDY } May 29 12:49:23 localhost kernel: ata2: hard resetting port May 29 12:49:24 localhost kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300) May 29 12:49:24 localhost kernel: ata2.00: ata_hpa_resize: sectors = 625142448, hpa_sectors = 625142448 May 29 12:49:24 localhost kernel: ata2.00: ata_hpa_resize: sectors = 625142448, hpa_sectors = 625142448 May 29 12:49:24 localhost kernel: ata2.00: configured for UDMA/133 May 29 12:49:24 localhost kernel: ata2: EH pending after completion, repeating EH (cnt=4) May 29 12:49:24 localhost kernel: ata2: EH complete May 29 12:49:24 localhost kernel: sd 1:0:0:0: [sdb] 625142448 512-byte hardware sectors (320073 MB) May 29 12:49:24 localhost kernel: sd 1:0:0:0: [sdb] Write Protect is off May 29 12:49:24 localhost kernel: sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA May 29 12:50:24 localhost kernel: ata2.00: NCQ disabled due to excessive errors May 29 12:50:24 localhost kernel: ata2.00: exception Emask 0x0 SAct 0x6 SErr 0x0 action 0x2 frozen May 29 12:50:24 localhost kernel: ata2.00: cmd 61/10:08:c7:88:b8/00:00:0f:00:00/40 tag 1 cdb 0x0 data 8192 out May 29 12:50:24 localhost kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) May 29 12:50:24 localhost kernel: ata2.00: status: {DRDY } May 29 12:50:24 localhost kernel: ata2.00: cmd 61/10:10:9f:8a:b8/00:00:0f:00:00/40 tag 2 cdb 0x0 data 8192 out May 29 12:50:24 localhost kernel: res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout) May 29 12:50:24 localhost kernel: ata2.00: status: {DRDY } May 29 12:50:24 localhost kernel: ata2: hard resetting port May 29 12:50:25 localhost kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 300) May 29 12:50:25 localhost kernel: ata2.00: ata_hpa_resize: sectors = 625142448, hpa_sectors = 625142448 May 29 12:50:25 localhost kernel: ata2.00: ata_hpa_resize: sectors = 625142448, hpa_sectors = 625142448 May 29 12:50:25 localhost kernel: ata2.00: configured for UDMA/133 May 29 12:50:25 localhost kernel: ata2: EH pending after completion, repeating EH (cnt=4) May 29 12:50:25 localhost kernel: ata2: EH complete May 29 12:50:25 localhost kernel: sd 1:0:0:0: [sdb] 625142448 512-byte hardware sectors (320073 MB) May 29 12:50:25 localhost kernel: sd 1:0:0:0: [sdb] Write Protect is off May 29 12:50:25 localhost kernel: sd 1:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA After the system disabled NCQ there weren't any more more ata resets and the disks were working OK. Strange thing is, when I have run PostgreSQL pgbench with 25 clients on 2.6.22-rc2 + cfs-v13 + swncq (which clearly showed advanced transfer rate) I had no such problems. The PostgreSQL DB was also on ata2.00. Best regards, Zoltán Böszörményi - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/ -- Robert Hancock Saskatoon, SK, Canada To email, remove nospam from [EMAIL PROTECTED] Home Page: http://www.roberthancock.com/ - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] x86: Document the hotplug code is incompatible with x86 irq handling
Eric W. Biederman wrote: I just realized that except for doing the code review and noticing that the current cpu hotplug code is fundamentally incompatible with x86 I haven't done anything about it. So here is my patch to document what is wrong. The current cpu hotplug code requires irqs to be migrated from a cpu outside of irq context. On x86 ioapics simply do not support this, making the code unfixable without major redesign of the generic cpu hotplug code. So this patch makes CPU_HOTPLUG on x86 depend on CONFIG_BROKEN and adds a WARN_ON so people that do enable it are not in doubt about which part of the code is broken, even if it does work for them. Signed-off-by: Eric W. Biederman [EMAIL PROTECTED] I don't think this is useful, though the code may be problematic, this patch will break suspend on all SMP machines with an existing config, which is a major regression.. -- Robert Hancock Saskatoon, SK, Canada To email, remove nospam from [EMAIL PROTECTED] Home Page: http://www.roberthancock.com/ - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: hisax isdn card (Sedlbauer Speed Fax+) does not get an interrupt
062007 May 30 14:21:40 cbs kern: Sedlbauer Speed Fax +: IRQ 11 count 10 042007 May 30 14:21:40 cbs kern: Sedlbauer Speed Fax +: IRQ(11) getting no interrupts during init 2 062007 May 30 14:21:40 cbs kern: Sedlbauer: resetting card 062007 May 30 14:21:40 cbs kern: Sedlbauer: resetting card 062007 May 30 14:21:40 cbs kern: Sedlbauer Speed Fax +: IRQ 11 count 10 042007 May 30 14:21:40 cbs kern: Sedlbauer Speed Fax +: IRQ(11) getting no interrupts during init 3 062007 May 30 14:21:40 cbs kern: Sedlbauer: resetting card 042007 May 30 14:21:40 cbs kern: HiSax: Card Sedlbauer Speed Fax + not installed ! == While the output seems to suggest a hardware problem, the same system loads the hisax driver perfectly on recent 2.4 kernels. We tried several kernel versions, up to 2.6.21.3 Any hints are appreciated :) Likely a driver problem - the device is using IRQ 11, but the driver never actually registered a handler for that interrupt (it's not in the list of handlers, only USB is). Maybe retrieving the interrupt before pci_enable_device? (I haven't looked at the code in question.) -- Robert Hancock Saskatoon, SK, Canada To email, remove nospam from [EMAIL PROTECTED] Home Page: http://www.roberthancock.com/ - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Compact Flash performance...
Jeff Garzik wrote: Mark Lord wrote: To maximize throughput, some kind of host-queuing would be needed, or just have the driver sit in a tight loop, starting the next I/O immediately when the previous one finishes. Linux isn't that quick (yet). I was talking on IRC with Tejun just recently. There are several controllers (and/or situations) like this, where some amount of host queueing would permit greater throughput, even when NCQ is not supported. sata_sx4 is the most dramatic example, where host queueing could potentially increase speed by a factor of 10 or more, since it is penalized by an awful two-irq-per-command (w/ a per-host bottleneck to boot) setup. Silicon Image has a command buffer. And overall, I designed -qc_prep() hook separate from -qc_issue() to enable the prepartion of multiple commands such that it only takes a simple go I/O to start a transaction, immediately after the previous one ends. Jeff Theoretically NVIDIA nForce4 ADMA could likely do this as well, as it seems to allow chaining up multiple commands to execute in succession (assuming they're not NCQ).. -- Robert Hancock Saskatoon, SK, Canada To email, remove nospam from [EMAIL PROTECTED] Home Page: http://www.roberthancock.com/ - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: SELECT() returns 1 But FIONREAD says (Input/output error)
Uncle George wrote: David Schwartz wrote: Nope. An errored connection is always ready for read/write -- there is nothing to wait for as far as the kernel is concerned. Your code keeps asking the kernel if something interesting has happened, the kernel keeps telling it yes, and it refuses to do anything about it. The select() returns because i pulled the USB cable from hub. Seems reasonable. The next select() found what? to be interesting in order to prematurely terminate the select-wait? As far as I can tell, nothing interesting has happened since the previous select(). In this case the select() is only looking at read()'s. It's because you haven't done anything to handle the error which is still persisting. Likely the only thing sane you can do in this case is close the fd and try to reopen it later. -- Robert Hancock Saskatoon, SK, Canada To email, remove nospam from [EMAIL PROTECTED] Home Page: http://www.roberthancock.com/ - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Case: 7454422: Re: Kernel 2.6.21.3 does not work with 8GB of RAM on Intel 965WH motherboards. (FULL DMESG)
Parag Warudkar wrote: Robert Hancock wrote: 0-3319MB 4096-8832MB leaving 64MB of memory at the top of RAM uncached. What do you want to bet that something important (kernel code?) is getting loaded there.. So essentially it's a BIOS problem, it's not setting up the MTRRs properly in order to map all of RAM as cacheable. As Andi says, complain to Intel. Could the BADRAM patch be useful for him? http://rick.vanrein.org/linux/badram/download.html has 2.6.21 version. It says it supports x86_64. May be using this patch he can exclude that RAM from being used/accessed? I think that mem=8832M would work as well, to make the kernel use only the memory that is marked cacheable. (It looks like this parameter takes the highest memory address we want the kernel to use, not the highest memory amount.) -- Robert Hancock Saskatoon, SK, Canada To email, remove "nospam" from [EMAIL PROTECTED] Home Page: http://www.roberthancock.com/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH -mm] 1/2: MMCONFIG: validate against ACPI motherboard resources
This path adds validation of the MMCONFIG table against the ACPI reserved motherboard resources. If the MMCONFIG table is found to be reserved in ACPI, we don't bother checking the E820 table. The PCI Express firmware spec apparently tells BIOS developers that reservation in ACPI is required and E820 reservation is optional, so checking against ACPI first makes sense. Many BIOSes don't reserve the MMCONFIG region in E820 even though it is perfectly functional, the existing check needlessly disables MMCONFIG in these cases. In order to do this, MMCONFIG setup has been split into two phases. If PCI configuration type 1 is not available then MMCONFIG is enabled early as before. Otherwise, it is enabled later after the ACPI interpreter is enabled, since we need to be able to execute control methods in order to check the ACPI reserved resources. Presently this is just triggered off the end of ACPI interpreter initialization. There are a few other behavioral changes here: -Validate all MMCONFIG configurations provided, not just the first one. -Validate the entire required length of each configuration according to the provided ending bus number is reserved, not just the minimum required allocation. -Validate that the area is reserved even if we read it from the chipset directly and not from the MCFG table. This catches the case where the BIOS didn't set the location properly in the chipset and has mapped it over other things it shouldn't have. Based on an original patch by Rajesh Shah from Intel. Signed-off-by: Robert Hancock <[EMAIL PROTECTED]> --- This should fix up some of the whitespace/formatting problems in the previous version. There were actually some bugs in the check_mcfg_resource function, there were some <= that should have been <. Also forgot the attribution for Rajesh Shah who wrote the original version of some of this code. diff -rup --exclude-from=linux-2.6.22-rc2-mm1/Documentation/dontdiff linux-2.6.22-rc2-mm1/arch/i386/pci/init.c linux-2.6.22-rc2-mm1edit/arch/i386/pci/init.c --- linux-2.6.22-rc2-mm1/arch/i386/pci/init.c 2007-05-23 21:20:43.0 -0600 +++ linux-2.6.22-rc2-mm1edit/arch/i386/pci/init.c 2007-05-23 21:31:50.0 -0600 @@ -12,7 +12,7 @@ static __init int pci_access_init(void) type = pci_direct_probe(); #endif #ifdef CONFIG_PCI_MMCONFIG - pci_mmcfg_init(type); + pci_mmcfg_early_init(type); #endif if (raw_pci_ops) return 0; diff -rup --exclude-from=linux-2.6.22-rc2-mm1/Documentation/dontdiff linux-2.6.22-rc2-mm1/arch/i386/pci/mmconfig-shared.c linux-2.6.22-rc2-mm1edit/arch/i386/pci/mmconfig-shared.c --- linux-2.6.22-rc2-mm1/arch/i386/pci/mmconfig-shared.c2007-05-23 21:21:04.0 -0600 +++ linux-2.6.22-rc2-mm1edit/arch/i386/pci/mmconfig-shared.c2007-05-30 18:40:31.0 -0600 @@ -206,9 +206,78 @@ static void __init pci_mmcfg_insert_reso pci_mmcfg_resources_inserted = 1; } -static void __init pci_mmcfg_reject_broken(int type) +static acpi_status __init check_mcfg_resource(struct acpi_resource *res, + void *data) +{ + struct resource *mcfg_res = data; + struct acpi_resource_address64 address; + acpi_status status; + + if (res->type == ACPI_RESOURCE_TYPE_FIXED_MEMORY32) { + struct acpi_resource_fixed_memory32 *fixmem32 = + >data.fixed_memory32; + if (!fixmem32) + return AE_OK; + if ((mcfg_res->start >= fixmem32->address) && + (mcfg_res->end < (fixmem32->address + + fixmem32->address_length))) { + mcfg_res->flags = 1; + return AE_CTRL_TERMINATE; + } + } + if ((res->type != ACPI_RESOURCE_TYPE_ADDRESS32) && + (res->type != ACPI_RESOURCE_TYPE_ADDRESS64)) + return AE_OK; + + status = acpi_resource_to_address64(res, ); + if (ACPI_FAILURE(status) || + (address.address_length <= 0) || + (address.resource_type != ACPI_MEMORY_RANGE)) + return AE_OK; + + if ((mcfg_res->start >= address.minimum) && + (mcfg_res->end < (address.minimum + address.address_length))) { + mcfg_res->flags = 1; + return AE_CTRL_TERMINATE; + } + return AE_OK; +} + +static acpi_status __init find_mboard_resource(acpi_handle handle, u32 lvl, + void *context, void **rv) +{ + struct resource *mcfg_res = context; + + acpi_walk_resources(handle, METHOD_NAME__CRS, + check_mcfg_resource, context); + + if (mcfg_res->flags) + return AE_CTRL_TERMINATE; + + return AE_OK; +} + +static int __init is_acpi_reserved(unsigned long start, unsigned long end) +{ +
Re: [PATCH -mm] 1/2: MMCONFIG: validate against ACPI motherboard resources
Mark Lord wrote: Linus Torvalds wrote: And once I looked closer, I just went "aiieee, it wasn't all the email client" ;) Not long ago, Tejun pointed out the "External Editor" extension for Thunderbird, which turns out to be the only really sane way to submit patches with that client. Download and install it, then add a button for it using View->Toolbars->Customize... and finally just click on it when in the Compose dialog. A very useful tip. Thanks again to Tejun for pointing it out. Yes, I've been using that one, as well as changing the word wrap length to 0 characters to switch that off. Apparently disabling format=flowed is needed as well, however :-) -- Robert Hancock Saskatoon, SK, Canada To email, remove "nospam" from [EMAIL PROTECTED] Home Page: http://www.roberthancock.com/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH -mm] 0/2: PCI MMCONFIG-related updates
Jesse Barnes wrote: On Tuesday, May 29, 2007 9:01:22 Robert Hancock wrote: These two patches implement some changes in behavior related to PCI MMCONFIG configuration space access. One changes the way in which we validate the MCFG table provided by the BIOS by checking it against ACPI motherboard resources instead of the E820 table. The BIOS is not required to reserve this area in the E820 table, so checking that results in MMCONFIG being unnecessarily disabled on some machines. Some Intel chipsets where MMCONFIG was being disabled previously (but won't be with the first patch) had problems, not due to the MCFG table being broken, but because the access was hosed by the way in which we do PCI BAR sizing. The second patch fixes this problem. This is requested for inclusion in the -mm tree for testing. Robert, should we also pull in the 915 and 965 chipset specific register poking code? It might be a good sanity check against ACPI (i.e. if ACPI and the actual register window disagree, we can assume the BIOS is broken and MCFG is not safe to use). If so, I'll update and repost them against your patchset. Probably not a bad idea.. -- Robert Hancock Saskatoon, SK, Canada To email, remove "nospam" from [EMAIL PROTECTED] Home Page: http://www.roberthancock.com/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Case: 7454422: Re: Kernel 2.6.21.3 does not work with 8GB of RAM on Intel 965WH motherboards. (FULL DMESG)
Justin Piszcz wrote: > That output looked nasty, attaching entries from syslog. > > Justin. Here's your E820 memory map, from dmesg: BIOS-e820: - 0008f000 (usable) BIOS-e820: 0008f000 - 000a (reserved) BIOS-e820: 000e - 0010 (reserved) BIOS-e820: 0010 - cf58f000 (usable) BIOS-e820: cf58f000 - cf59c000 (reserved) BIOS-e820: cf59c000 - cf653000 (usable) BIOS-e820: cf653000 - cf6a5000 (ACPI NVS) BIOS-e820: cf6a5000 - cf6a8000 (ACPI data) BIOS-e820: cf6a8000 - cf6ef000 (ACPI NVS) BIOS-e820: cf6ef000 - cf6f1000 (ACPI data) BIOS-e820: cf6f1000 - cf6f2000 (usable) BIOS-e820: cf6f2000 - cf6ff000 (ACPI data) BIOS-e820: cf6ff000 - cf70 (usable) BIOS-e820: cf70 - d000 (reserved) BIOS-e820: fff0 - 0001 (reserved) BIOS-e820: 0001 - 00022c00 (usable) so the usable memory ranges are: 0-572K 1MB-3317.55MB 3317.60MB-3317.75MB 3318.94MB-3318.945MB 3318.996MB-3319MB 4096MB-8896MB and the MTRRs (from /proc/mtrr, from private email): reg00: base=0x ( 0MB), size=2048MB: write-back, count=1 reg01: base=0x8000 (2048MB), size=1024MB: write-back, count=1 reg02: base=0xc000 (3072MB), size= 256MB: write-back, count=1 reg03: base=0xcf80 (3320MB), size= 8MB: uncachable, count=1 reg04: base=0xcf70 (3319MB), size= 1MB: uncachable, count=1 reg05: base=0x1 (4096MB), size=4096MB: write-back, count=1 reg06: base=0x2 (8192MB), size= 512MB: write-back, count=1 reg07: base=0x22000 (8704MB), size= 128MB: write-back, count=1 so the ranges mapped as cacheable are: 0-3319MB 4096-8832MB leaving 64MB of memory at the top of RAM uncached. What do you want to bet that something important (kernel code?) is getting loaded there.. So essentially it's a BIOS problem, it's not setting up the MTRRs properly in order to map all of RAM as cacheable. As Andi says, complain to Intel. -- Robert Hancock Saskatoon, SK, Canada To email, remove "nospam" from [EMAIL PROTECTED] Home Page: http://www.roberthancock.com/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH -mm] 1/2: MMCONFIG: validate against ACPI motherboard resources
Linus Torvalds wrote: > On Tue, 29 May 2007, Robert Hancock wrote: >> This path adds validation of the MMCONFIG table against the ACPI reserved >> motherboard resources. > > Please fix the formatting of your code. > > "for" and "if" are not functions, and they have a space before the > parenthesis. > > And pretty much every single conditional in this patch is spread out over > two or more lines and has at least three different indentations. There's > something wrong here. Code can't look this bad and still be fine. Some of > this looks like random whitespace noise: > > + if(is_acpi_reserved(cfg->address, > + cfg->address + size - 1)) > + printk(KERN_NOTICE "PCI: MCFG area at %Lx reserved " > + "in ACPI motherboard resources\n", > + cfg->address); > + else { > > That's just horrid. Please try to make the code _look_ nicer. I'll try and fix up the formatting and repost this patch. I suspect some of the issues are from the added code clashing with the way the existing code was formatted. -- Robert Hancock Saskatoon, SK, Canada To email, remove "nospam" from [EMAIL PROTECTED] Home Page: http://www.roberthancock.com/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Kernel 2.6.21.3 does not work with 8GB of RAM on Intel 965WH motherboards.
Justin Piszcz wrote: Kernel dmesg attached from 8GB bootup. It looks like part of the start of the output was truncated.. Robert, how come the option is not applicable in 64-bit mode? If I want to use all 8GB of memory I need to run a 32-bit kernel? Justin. Highmem and PAE (which are essentially what the 4GB/64GB memory options control) are not needed in 64-bit mode, since we can access the entire 64-bit address space directly. -- Robert Hancock Saskatoon, SK, Canada To email, remove "nospam" from [EMAIL PROTECTED] Home Page: http://www.roberthancock.com/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Case: 7454422: Re: Kernel 2.6.21.3 does not work with 8GB of RAM on Intel 965WH motherboards. (FULL DMESG)
Justin Piszcz wrote: That output looked nasty, attaching entries from syslog. Justin. Here's your E820 memory map, from dmesg: BIOS-e820: - 0008f000 (usable) BIOS-e820: 0008f000 - 000a (reserved) BIOS-e820: 000e - 0010 (reserved) BIOS-e820: 0010 - cf58f000 (usable) BIOS-e820: cf58f000 - cf59c000 (reserved) BIOS-e820: cf59c000 - cf653000 (usable) BIOS-e820: cf653000 - cf6a5000 (ACPI NVS) BIOS-e820: cf6a5000 - cf6a8000 (ACPI data) BIOS-e820: cf6a8000 - cf6ef000 (ACPI NVS) BIOS-e820: cf6ef000 - cf6f1000 (ACPI data) BIOS-e820: cf6f1000 - cf6f2000 (usable) BIOS-e820: cf6f2000 - cf6ff000 (ACPI data) BIOS-e820: cf6ff000 - cf70 (usable) BIOS-e820: cf70 - d000 (reserved) BIOS-e820: fff0 - 0001 (reserved) BIOS-e820: 0001 - 00022c00 (usable) so the usable memory ranges are: 0-572K 1MB-3317.55MB 3317.60MB-3317.75MB 3318.94MB-3318.945MB 3318.996MB-3319MB 4096MB-8896MB and the MTRRs (from /proc/mtrr, from private email): reg00: base=0x ( 0MB), size=2048MB: write-back, count=1 reg01: base=0x8000 (2048MB), size=1024MB: write-back, count=1 reg02: base=0xc000 (3072MB), size= 256MB: write-back, count=1 reg03: base=0xcf80 (3320MB), size= 8MB: uncachable, count=1 reg04: base=0xcf70 (3319MB), size= 1MB: uncachable, count=1 reg05: base=0x1 (4096MB), size=4096MB: write-back, count=1 reg06: base=0x2 (8192MB), size= 512MB: write-back, count=1 reg07: base=0x22000 (8704MB), size= 128MB: write-back, count=1 so the ranges mapped as cacheable are: 0-3319MB 4096-8832MB leaving 64MB of memory at the top of RAM uncached. What do you want to bet that something important (kernel code?) is getting loaded there.. So essentially it's a BIOS problem, it's not setting up the MTRRs properly in order to map all of RAM as cacheable. As Andi says, complain to Intel. -- Robert Hancock Saskatoon, SK, Canada To email, remove nospam from [EMAIL PROTECTED] Home Page: http://www.roberthancock.com/ - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH -mm] 0/2: PCI MMCONFIG-related updates
Jesse Barnes wrote: On Tuesday, May 29, 2007 9:01:22 Robert Hancock wrote: These two patches implement some changes in behavior related to PCI MMCONFIG configuration space access. One changes the way in which we validate the MCFG table provided by the BIOS by checking it against ACPI motherboard resources instead of the E820 table. The BIOS is not required to reserve this area in the E820 table, so checking that results in MMCONFIG being unnecessarily disabled on some machines. Some Intel chipsets where MMCONFIG was being disabled previously (but won't be with the first patch) had problems, not due to the MCFG table being broken, but because the access was hosed by the way in which we do PCI BAR sizing. The second patch fixes this problem. This is requested for inclusion in the -mm tree for testing. Robert, should we also pull in the 915 and 965 chipset specific register poking code? It might be a good sanity check against ACPI (i.e. if ACPI and the actual register window disagree, we can assume the BIOS is broken and MCFG is not safe to use). If so, I'll update and repost them against your patchset. Probably not a bad idea.. -- Robert Hancock Saskatoon, SK, Canada To email, remove nospam from [EMAIL PROTECTED] Home Page: http://www.roberthancock.com/ - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH -mm] 1/2: MMCONFIG: validate against ACPI motherboard resources
Mark Lord wrote: Linus Torvalds wrote: And once I looked closer, I just went aiieee, it wasn't all the email client ;) Not long ago, Tejun pointed out the External Editor extension for Thunderbird, which turns out to be the only really sane way to submit patches with that client. Download and install it, then add a button for it using View-Toolbars-Customize... and finally just click on it when in the Compose dialog. A very useful tip. Thanks again to Tejun for pointing it out. Yes, I've been using that one, as well as changing the word wrap length to 0 characters to switch that off. Apparently disabling format=flowed is needed as well, however :-) -- Robert Hancock Saskatoon, SK, Canada To email, remove nospam from [EMAIL PROTECTED] Home Page: http://www.roberthancock.com/ - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH -mm] 1/2: MMCONFIG: validate against ACPI motherboard resources
This path adds validation of the MMCONFIG table against the ACPI reserved motherboard resources. If the MMCONFIG table is found to be reserved in ACPI, we don't bother checking the E820 table. The PCI Express firmware spec apparently tells BIOS developers that reservation in ACPI is required and E820 reservation is optional, so checking against ACPI first makes sense. Many BIOSes don't reserve the MMCONFIG region in E820 even though it is perfectly functional, the existing check needlessly disables MMCONFIG in these cases. In order to do this, MMCONFIG setup has been split into two phases. If PCI configuration type 1 is not available then MMCONFIG is enabled early as before. Otherwise, it is enabled later after the ACPI interpreter is enabled, since we need to be able to execute control methods in order to check the ACPI reserved resources. Presently this is just triggered off the end of ACPI interpreter initialization. There are a few other behavioral changes here: -Validate all MMCONFIG configurations provided, not just the first one. -Validate the entire required length of each configuration according to the provided ending bus number is reserved, not just the minimum required allocation. -Validate that the area is reserved even if we read it from the chipset directly and not from the MCFG table. This catches the case where the BIOS didn't set the location properly in the chipset and has mapped it over other things it shouldn't have. Based on an original patch by Rajesh Shah from Intel. Signed-off-by: Robert Hancock [EMAIL PROTECTED] --- This should fix up some of the whitespace/formatting problems in the previous version. There were actually some bugs in the check_mcfg_resource function, there were some = that should have been . Also forgot the attribution for Rajesh Shah who wrote the original version of some of this code. diff -rup --exclude-from=linux-2.6.22-rc2-mm1/Documentation/dontdiff linux-2.6.22-rc2-mm1/arch/i386/pci/init.c linux-2.6.22-rc2-mm1edit/arch/i386/pci/init.c --- linux-2.6.22-rc2-mm1/arch/i386/pci/init.c 2007-05-23 21:20:43.0 -0600 +++ linux-2.6.22-rc2-mm1edit/arch/i386/pci/init.c 2007-05-23 21:31:50.0 -0600 @@ -12,7 +12,7 @@ static __init int pci_access_init(void) type = pci_direct_probe(); #endif #ifdef CONFIG_PCI_MMCONFIG - pci_mmcfg_init(type); + pci_mmcfg_early_init(type); #endif if (raw_pci_ops) return 0; diff -rup --exclude-from=linux-2.6.22-rc2-mm1/Documentation/dontdiff linux-2.6.22-rc2-mm1/arch/i386/pci/mmconfig-shared.c linux-2.6.22-rc2-mm1edit/arch/i386/pci/mmconfig-shared.c --- linux-2.6.22-rc2-mm1/arch/i386/pci/mmconfig-shared.c2007-05-23 21:21:04.0 -0600 +++ linux-2.6.22-rc2-mm1edit/arch/i386/pci/mmconfig-shared.c2007-05-30 18:40:31.0 -0600 @@ -206,9 +206,78 @@ static void __init pci_mmcfg_insert_reso pci_mmcfg_resources_inserted = 1; } -static void __init pci_mmcfg_reject_broken(int type) +static acpi_status __init check_mcfg_resource(struct acpi_resource *res, + void *data) +{ + struct resource *mcfg_res = data; + struct acpi_resource_address64 address; + acpi_status status; + + if (res-type == ACPI_RESOURCE_TYPE_FIXED_MEMORY32) { + struct acpi_resource_fixed_memory32 *fixmem32 = + res-data.fixed_memory32; + if (!fixmem32) + return AE_OK; + if ((mcfg_res-start = fixmem32-address) + (mcfg_res-end (fixmem32-address + + fixmem32-address_length))) { + mcfg_res-flags = 1; + return AE_CTRL_TERMINATE; + } + } + if ((res-type != ACPI_RESOURCE_TYPE_ADDRESS32) + (res-type != ACPI_RESOURCE_TYPE_ADDRESS64)) + return AE_OK; + + status = acpi_resource_to_address64(res, address); + if (ACPI_FAILURE(status) || + (address.address_length = 0) || + (address.resource_type != ACPI_MEMORY_RANGE)) + return AE_OK; + + if ((mcfg_res-start = address.minimum) + (mcfg_res-end (address.minimum + address.address_length))) { + mcfg_res-flags = 1; + return AE_CTRL_TERMINATE; + } + return AE_OK; +} + +static acpi_status __init find_mboard_resource(acpi_handle handle, u32 lvl, + void *context, void **rv) +{ + struct resource *mcfg_res = context; + + acpi_walk_resources(handle, METHOD_NAME__CRS, + check_mcfg_resource, context); + + if (mcfg_res-flags) + return AE_CTRL_TERMINATE; + + return AE_OK; +} + +static int __init is_acpi_reserved(unsigned long start, unsigned long end) +{ + struct resource mcfg_res; + + mcfg_res.start = start; + mcfg_res.end = end; + mcfg_res.flags
Re: Case: 7454422: Re: Kernel 2.6.21.3 does not work with 8GB of RAM on Intel 965WH motherboards. (FULL DMESG)
Parag Warudkar wrote: Robert Hancock wrote: 0-3319MB 4096-8832MB leaving 64MB of memory at the top of RAM uncached. What do you want to bet that something important (kernel code?) is getting loaded there.. So essentially it's a BIOS problem, it's not setting up the MTRRs properly in order to map all of RAM as cacheable. As Andi says, complain to Intel. Could the BADRAM patch be useful for him? http://rick.vanrein.org/linux/badram/download.html has 2.6.21 version. It says it supports x86_64. May be using this patch he can exclude that RAM from being used/accessed? I think that mem=8832M would work as well, to make the kernel use only the memory that is marked cacheable. (It looks like this parameter takes the highest memory address we want the kernel to use, not the highest memory amount.) -- Robert Hancock Saskatoon, SK, Canada To email, remove nospam from [EMAIL PROTECTED] Home Page: http://www.roberthancock.com/ - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Kernel 2.6.21.3 does not work with 8GB of RAM on Intel 965WH motherboards.
Justin Piszcz wrote: Kernel dmesg attached from 8GB bootup. It looks like part of the start of the output was truncated.. Robert, how come the option is not applicable in 64-bit mode? If I want to use all 8GB of memory I need to run a 32-bit kernel? Justin. Highmem and PAE (which are essentially what the 4GB/64GB memory options control) are not needed in 64-bit mode, since we can access the entire 64-bit address space directly. -- Robert Hancock Saskatoon, SK, Canada To email, remove nospam from [EMAIL PROTECTED] Home Page: http://www.roberthancock.com/ - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH -mm] 1/2: MMCONFIG: validate against ACPI motherboard resources
Linus Torvalds wrote: On Tue, 29 May 2007, Robert Hancock wrote: This path adds validation of the MMCONFIG table against the ACPI reserved motherboard resources. Please fix the formatting of your code. for and if are not functions, and they have a space before the parenthesis. And pretty much every single conditional in this patch is spread out over two or more lines and has at least three different indentations. There's something wrong here. Code can't look this bad and still be fine. Some of this looks like random whitespace noise: + if(is_acpi_reserved(cfg-address, + cfg-address + size - 1)) + printk(KERN_NOTICE PCI: MCFG area at %Lx reserved + in ACPI motherboard resources\n, + cfg-address); + else { That's just horrid. Please try to make the code _look_ nicer. I'll try and fix up the formatting and repost this patch. I suspect some of the issues are from the added code clashing with the way the existing code was formatted. -- Robert Hancock Saskatoon, SK, Canada To email, remove nospam from [EMAIL PROTECTED] Home Page: http://www.roberthancock.com/ - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Kernel 2.6.21.3 does not work with 8GB of RAM on Intel 965WH motherboards.
Justin Piszcz wrote: Short Description of Problem: Linux 2.6.21.3 does not run properly with 8GB of ram on the Intel 965WH motherboard. Long Description of Problem: When I use 8GB of memory on my x86_64 system, CPU-bound processes are VERY slow, up to 36x slower than usual. My temporary fix is force Linux to only use 4GB of memory, I am currently using mem=4096M. I ran memtest86 and the memory is fine, not a single error. I tried the following to mem= 1024, 2048 4096 and blank "" to let the kernel use all 8GB of memory. What is wrong with the kernel and how come it cannot use 8GB of memory without slowing down all CPU-related processes to a snail-like pace? There is something horribly wrong here. Specifications: Intel Motherboard: 965WH Linux Kernel: 2.6.21.3 Distribution: Debian Testing x86_64 GCC: gcc version 4.1.2 20061115 (prerelease) (Debian 4.1.1-21) Target: x86_64-linux-gnu Tests: 1. append line = 1024M top - 18:28:26 up 1 min, 4 users, load average: 0.42, 0.17, 0.06 Tasks: 157 total, 1 running, 156 sleeping, 0 stopped, 0 zombie Cpu(s): 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 1027016k total, 964288k used,62728k free, 1232k buffers Swap: 16787768k total,0k used, 16787768k free, 105168k cached ---> STATUS: No problems, box is fine, no lag, etc.. 2. append line = 2048M top - 18:34:23 up 2 min, 2 users, load average: 0.14, 0.14, 0.05 Tasks: 147 total, 1 running, 146 sleeping, 0 stopped, 0 zombie Cpu(s): 1.7%us, 1.2%sy, 0.4%ni, 95.2%id, 1.5%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 2059696k total, 956324k used, 1103372k free, 1232k buffers Swap: 16787768k total,0k used, 16787768k free, 102924k cached ---> STATUS: No problems, box is fine, no lag, etc.. 3. append line = 4096M top - 18:37:55 up 1 min, 1 user, load average: 0.52, 0.19, 0.07 Tasks: 143 total, 1 running, 142 sleeping, 0 stopped, 0 zombie Cpu(s): 2.9%us, 2.2%sy, 0.7%ni, 91.6%id, 2.6%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 3339536k total, 949792k used, 2389744k free, 1232k buffers Swap: 16787768k total,0k used, 16787768k free,99920k cached $ time ssh p34 uptime 19:00:16 up 1 min, 1 user, load average: 0.67, 0.18, 0.06 real0m0.159s user0m0.013s sys 0m0.003s ---> STATUS: No problems, box is fine, no lag, etc.. 4. append line = "" (use all 8GB) top - 18:52:50 up 9 min, 1 user, load average: 2.88, 2.43, 1.41 Tasks: 149 total, 3 running, 146 sleeping, 0 stopped, 0 zombie Cpu(s): 36.3%us, 2.2%sy, 10.3%ni, 50.8%id, 0.4%wa, 0.0%hi, 0.1%si, 0.0%st Mem: 8104460k total, 1064416k used, 7040044k free, 3296k buffers Swap: 16787768k total,0k used, 16787768k free, 201852k cached $ ssh p34 ssh: connect to host p34 port 22: Connection refused Machine takes 5-10 minutes to boot, it acts like a 286 computer, about 8 minutes later: $ time ssh p34 uptime # 5 SECONDS!! 36x slower when using 8GB of RAM 18:51:39 up 8 min, 1 user, load average: 2.74, 2.31, 1.30 real0m5.757s user0m0.015s sys 0m0.004s The machine is VERY slow and this is on a gigabit network, I/O does not seem to be affected but rather, CPU-bound processes. PID USER PR NI VIRT RES SHR S %CPU %MEMTIME+ COMMAND 2483 root 25 0 25324 5292 1072 R 96 0.1 4:37.12 mailgraph 3604 logcheck 30 10 3408 1120 544 R 91 0.0 0:03.55 grep These normally take seconds but when I use all 8GB of memory, it runs for a very long time. Conclusion: For now, I will be using mem=4096M until someone can help me understand what is happening here. Can anyone offer any insight? I found it interesting in make menuconfig on x86_64 there is no 4GB/64GB options in the kernel that I remember seeing in 32bit. That's because that option is not applicable in 64-bit mode. Can you send your full dmesg output from the 8GB bootup? -- Robert Hancock Saskatoon, SK, Canada To email, remove "nospam" from [EMAIL PROTECTED] Home Page: http://www.roberthancock.com/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH -mm] 2/2: PCI: disable decode of IO/memory during BAR sizing
Change PCI BAR sizing to disable the decode of memory or IO, as appropriate, while we are writing the all-ones value to the BAR to determine the size. If this is not done, the device may spuriously decode accesses to memory areas it should not. On some Intel PCI Express chipsets, this breaks MMCONFIG configuration space access, since the memory the graphics card ends up decoding during this period overlaps the MMCONFIG area, and thus it steals the accesses to the area to do any other configuration space access, including changing the BAR back to its previous value. However, don't do this disabling on host bridge devices, as it is reported that some of them do silly things like disable CPU to RAM access if this is done. Based on an original patch by Jesse Barnes. Signed-off-by: Robert Hancock <[EMAIL PROTECTED]> --- linux-2.6.22-rc2-mm1/drivers/pci/probe.c2007-05-23 21:21:05.0 -0600 +++ linux-2.6.22-rc2-mm1edit/drivers/pci/probe.c2007-05-29 21:31:47.0 -0600 @@ -180,6 +180,58 @@ static inline int is_64bit_memory(u32 ma return 0; } +#define BAR_IS_MEMORY(bar) (((bar) & PCI_BASE_ADDRESS_SPACE) ==\ + PCI_BASE_ADDRESS_SPACE_MEMORY) + +/** + * pci_bar_size - get raw PCI BAR size + * @dev: PCI device + * @reg: BAR to probe + * + * Use basic PCI probing: + * - save original BAR value + * - disable MEM or IO decode in PCI_COMMAND reg if appropriate + * - write all 1s to the BAR + * - read back value + * - reenble MEM or IO decode as necessary + * - write original value back + * + * Returns raw BAR size to caller. + */ +static u32 pci_bar_size(struct pci_dev *dev, unsigned int reg) +{ + u32 orig_reg, sz; + u16 orig_cmd; + + pci_read_config_dword(dev, reg, _reg); + pci_read_config_word(dev, PCI_COMMAND, _cmd); + + /* +* Disable memory or IO decode on the device while writing the test +* value to the BAR. This prevents possible spurious decoding +* of random addresses by the device. Don't do this for host bridges, +* however, since some of them do silly things like disable CPU to RAM +* access if this is done. +*/ + if ((dev->class >> 8) != PCI_CLASS_BRIDGE_HOST) { + if (BAR_IS_MEMORY(orig_reg)) + pci_write_config_word(dev, PCI_COMMAND, + orig_cmd & ~PCI_COMMAND_MEMORY); + else + pci_write_config_word(dev, PCI_COMMAND, + orig_cmd & ~PCI_COMMAND_IO); + } + + pci_write_config_dword(dev, reg, 0x); + pci_read_config_dword(dev, reg, ); + pci_write_config_dword(dev, reg, orig_reg); + + if ((dev->class >> 8) != PCI_CLASS_BRIDGE_HOST) + pci_write_config_word(dev, PCI_COMMAND, orig_cmd); + + return sz; +} + static void pci_read_bases(struct pci_dev *dev, unsigned int howmany, int rom) { unsigned int pos, reg, next; @@ -196,16 +248,13 @@ static void pci_read_bases(struct pci_de res->name = pci_name(dev); reg = PCI_BASE_ADDRESS_0 + (pos << 2); pci_read_config_dword(dev, reg, ); - pci_write_config_dword(dev, reg, ~0); - pci_read_config_dword(dev, reg, ); - pci_write_config_dword(dev, reg, l); + sz = pci_bar_size(dev, reg); if (!sz || sz == 0x) continue; if (l == 0x) l = 0; raw_sz = sz; - if ((l & PCI_BASE_ADDRESS_SPACE) == - PCI_BASE_ADDRESS_SPACE_MEMORY) { + if (BAR_IS_MEMORY(l)) { sz = pci_size(l, sz, (u32)PCI_BASE_ADDRESS_MEM_MASK); /* * For 64bit prefetchable memory sz could be 0, if the @@ -229,9 +278,7 @@ static void pci_read_bases(struct pci_de u32 szhi, lhi; pci_read_config_dword(dev, reg+4, ); - pci_write_config_dword(dev, reg+4, ~0); - pci_read_config_dword(dev, reg+4, ); - pci_write_config_dword(dev, reg+4, lhi); + szhi = pci_bar_size(dev, reg+4); sz64 = ((u64)szhi << 32) | raw_sz; l64 = ((u64)lhi << 32) | l; sz64 = pci_size64(l64, sz64, PCI_BASE_ADDRESS_MEM_MASK); - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH -mm] 0/2: PCI MMCONFIG-related updates
These two patches implement some changes in behavior related to PCI MMCONFIG configuration space access. One changes the way in which we validate the MCFG table provided by the BIOS by checking it against ACPI motherboard resources instead of the E820 table. The BIOS is not required to reserve this area in the E820 table, so checking that results in MMCONFIG being unnecessarily disabled on some machines. Some Intel chipsets where MMCONFIG was being disabled previously (but won't be with the first patch) had problems, not due to the MCFG table being broken, but because the access was hosed by the way in which we do PCI BAR sizing. The second patch fixes this problem. This is requested for inclusion in the -mm tree for testing. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH -mm] 1/2: MMCONFIG: validate against ACPI motherboard resources
This path adds validation of the MMCONFIG table against the ACPI reserved motherboard resources. If the MMCONFIG table is found to be reserved in ACPI, we don't bother checking the E820 table. The PCI Express firmware spec apparently tells BIOS developers that reservation in ACPI is required and E820 reservation is optional, so checking against ACPI first makes sense. Many BIOSes don't reserve the MMCONFIG region in E820 even though it is perfectly functional, the existing check needlessly disables MMCONFIG in these cases. In order to do this, MMCONFIG setup has been split into two phases. If PCI configuration type 1 is not available then MMCONFIG is enabled early as before. Otherwise, it is enabled later after the ACPI interpreter is enabled, since we need to be able to execute control methods in order to check the ACPI reserved resources. Presently this is just triggered off the end of ACPI interpreter initialization. There are a few other behavioral changes here: -Validate all MMCONFIG configurations provided, not just the first one. -Validate the entire required length of each configuration according to the provided ending bus number is reserved, not just the minimum required allocation. -Validate that the area is reserved even if we read it from the chipset directly and not from the MCFG table. This catches the case where the BIOS didn't set the location properly in the chipset and has mapped it over other things it shouldn't have. Signed-off-by: Robert Hancock <[EMAIL PROTECTED]> diff -up linux-2.6.22-rc2-mm1/arch/i386/pci/init.c linux-2.6.22-rc2-mm1edit/arch/i386/pci/init.c --- linux-2.6.22-rc2-mm1/arch/i386/pci/init.c 2007-05-23 21:20:43.0 -0600 +++ linux-2.6.22-rc2-mm1edit/arch/i386/pci/init.c 2007-05-23 21:31:50.0 -0600 @@ -12,7 +12,7 @@ static __init int pci_access_init(void) type = pci_direct_probe(); #endif #ifdef CONFIG_PCI_MMCONFIG - pci_mmcfg_init(type); + pci_mmcfg_early_init(type); #endif if (raw_pci_ops) return 0; diff -up linux-2.6.22-rc2-mm1/arch/i386/pci/mmconfig-shared.c linux-2.6.22-rc2-mm1edit/arch/i386/pci/mmconfig-shared.c --- linux-2.6.22-rc2-mm1/arch/i386/pci/mmconfig-shared.c2007-05-23 21:21:04.0 -0600 +++ linux-2.6.22-rc2-mm1edit/arch/i386/pci/mmconfig-shared.c2007-05-23 21:38:19.0 -0600 @@ -206,9 +206,77 @@ static void __init pci_mmcfg_insert_reso pci_mmcfg_resources_inserted = 1; } -static void __init pci_mmcfg_reject_broken(int type) +static acpi_status __init check_mcfg_resource(struct acpi_resource *res, + void *data) +{ + struct resource *mcfg_res = data; + struct acpi_resource_address64 address; + acpi_status status; + + if (res->type == ACPI_RESOURCE_TYPE_FIXED_MEMORY32) { + struct acpi_resource_fixed_memory32 *fixmem32 = + >data.fixed_memory32; + if (!fixmem32) + return AE_OK; + if ((mcfg_res->start >= fixmem32->address) && + (mcfg_res->end <= (fixmem32->address + + fixmem32->address_length))) { + mcfg_res->flags = 1; + return AE_CTRL_TERMINATE; + } + } + if ((res->type != ACPI_RESOURCE_TYPE_ADDRESS32) && + (res->type != ACPI_RESOURCE_TYPE_ADDRESS64)) + return AE_OK; + + status = acpi_resource_to_address64(res, ); + if (ACPI_FAILURE(status) || (address.address_length <= 0) || + (address.resource_type != ACPI_MEMORY_RANGE)) + return AE_OK; + + if ((mcfg_res->start >= address.minimum) && + (mcfg_res->end <= +(address.minimum +address.address_length))) { + mcfg_res->flags = 1; + return AE_CTRL_TERMINATE; + } + return AE_OK; +} + +static acpi_status __init find_mboard_resource(acpi_handle handle, u32 lvl, + void *context, void **rv) +{ + struct resource *mcfg_res = context; + + acpi_walk_resources(handle, METHOD_NAME__CRS, + check_mcfg_resource, context); + + if (mcfg_res->flags) + return AE_CTRL_TERMINATE; + + return AE_OK; +} + +static int __init is_acpi_reserved(unsigned long start, unsigned long end) +{ + struct resource mcfg_res; + + mcfg_res.start = start; + mcfg_res.end = end; + mcfg_res.flags = 0; + + acpi_get_devices("PNP0C01", find_mboard_resource, _res, NULL); + + if( !mcfg_res.flags ) + acpi_get_devices("PNP0C02", find_mboard_resource, _res, NULL); + + return mcfg_res.flags; +} + +static void __init pci_mmcfg_reject_broken(void) { typeof(pci_mmcfg_config
[PATCH -mm] 0/2: PCI MMCONFIG-related updates
These two patches implement some changes in behavior related to PCI MMCONFIG configuration space access. One changes the way in which we validate the MCFG table provided by the BIOS by checking it against ACPI motherboard resources instead of the E820 table. The BIOS is not required to reserve this area in the E820 table, so checking that results in MMCONFIG being unnecessarily disabled on some machines. Some Intel chipsets where MMCONFIG was being disabled previously (but won't be with the first patch) had problems, not due to the MCFG table being broken, but because the access was hosed by the way in which we do PCI BAR sizing. The second patch fixes this problem. This is requested for inclusion in the -mm tree for testing. - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH -mm] 1/2: MMCONFIG: validate against ACPI motherboard resources
This path adds validation of the MMCONFIG table against the ACPI reserved motherboard resources. If the MMCONFIG table is found to be reserved in ACPI, we don't bother checking the E820 table. The PCI Express firmware spec apparently tells BIOS developers that reservation in ACPI is required and E820 reservation is optional, so checking against ACPI first makes sense. Many BIOSes don't reserve the MMCONFIG region in E820 even though it is perfectly functional, the existing check needlessly disables MMCONFIG in these cases. In order to do this, MMCONFIG setup has been split into two phases. If PCI configuration type 1 is not available then MMCONFIG is enabled early as before. Otherwise, it is enabled later after the ACPI interpreter is enabled, since we need to be able to execute control methods in order to check the ACPI reserved resources. Presently this is just triggered off the end of ACPI interpreter initialization. There are a few other behavioral changes here: -Validate all MMCONFIG configurations provided, not just the first one. -Validate the entire required length of each configuration according to the provided ending bus number is reserved, not just the minimum required allocation. -Validate that the area is reserved even if we read it from the chipset directly and not from the MCFG table. This catches the case where the BIOS didn't set the location properly in the chipset and has mapped it over other things it shouldn't have. Signed-off-by: Robert Hancock [EMAIL PROTECTED] diff -up linux-2.6.22-rc2-mm1/arch/i386/pci/init.c linux-2.6.22-rc2-mm1edit/arch/i386/pci/init.c --- linux-2.6.22-rc2-mm1/arch/i386/pci/init.c 2007-05-23 21:20:43.0 -0600 +++ linux-2.6.22-rc2-mm1edit/arch/i386/pci/init.c 2007-05-23 21:31:50.0 -0600 @@ -12,7 +12,7 @@ static __init int pci_access_init(void) type = pci_direct_probe(); #endif #ifdef CONFIG_PCI_MMCONFIG - pci_mmcfg_init(type); + pci_mmcfg_early_init(type); #endif if (raw_pci_ops) return 0; diff -up linux-2.6.22-rc2-mm1/arch/i386/pci/mmconfig-shared.c linux-2.6.22-rc2-mm1edit/arch/i386/pci/mmconfig-shared.c --- linux-2.6.22-rc2-mm1/arch/i386/pci/mmconfig-shared.c2007-05-23 21:21:04.0 -0600 +++ linux-2.6.22-rc2-mm1edit/arch/i386/pci/mmconfig-shared.c2007-05-23 21:38:19.0 -0600 @@ -206,9 +206,77 @@ static void __init pci_mmcfg_insert_reso pci_mmcfg_resources_inserted = 1; } -static void __init pci_mmcfg_reject_broken(int type) +static acpi_status __init check_mcfg_resource(struct acpi_resource *res, + void *data) +{ + struct resource *mcfg_res = data; + struct acpi_resource_address64 address; + acpi_status status; + + if (res-type == ACPI_RESOURCE_TYPE_FIXED_MEMORY32) { + struct acpi_resource_fixed_memory32 *fixmem32 = + res-data.fixed_memory32; + if (!fixmem32) + return AE_OK; + if ((mcfg_res-start = fixmem32-address) + (mcfg_res-end = (fixmem32-address + + fixmem32-address_length))) { + mcfg_res-flags = 1; + return AE_CTRL_TERMINATE; + } + } + if ((res-type != ACPI_RESOURCE_TYPE_ADDRESS32) + (res-type != ACPI_RESOURCE_TYPE_ADDRESS64)) + return AE_OK; + + status = acpi_resource_to_address64(res, address); + if (ACPI_FAILURE(status) || (address.address_length = 0) || + (address.resource_type != ACPI_MEMORY_RANGE)) + return AE_OK; + + if ((mcfg_res-start = address.minimum) + (mcfg_res-end = +(address.minimum +address.address_length))) { + mcfg_res-flags = 1; + return AE_CTRL_TERMINATE; + } + return AE_OK; +} + +static acpi_status __init find_mboard_resource(acpi_handle handle, u32 lvl, + void *context, void **rv) +{ + struct resource *mcfg_res = context; + + acpi_walk_resources(handle, METHOD_NAME__CRS, + check_mcfg_resource, context); + + if (mcfg_res-flags) + return AE_CTRL_TERMINATE; + + return AE_OK; +} + +static int __init is_acpi_reserved(unsigned long start, unsigned long end) +{ + struct resource mcfg_res; + + mcfg_res.start = start; + mcfg_res.end = end; + mcfg_res.flags = 0; + + acpi_get_devices(PNP0C01, find_mboard_resource, mcfg_res, NULL); + + if( !mcfg_res.flags ) + acpi_get_devices(PNP0C02, find_mboard_resource, mcfg_res, NULL); + + return mcfg_res.flags; +} + +static void __init pci_mmcfg_reject_broken(void) { typeof(pci_mmcfg_config[0]) *cfg; + int i; if ((pci_mmcfg_config_num == 0) || (pci_mmcfg_config == NULL) || @@ -228,18
[PATCH -mm] 2/2: PCI: disable decode of IO/memory during BAR sizing
Change PCI BAR sizing to disable the decode of memory or IO, as appropriate, while we are writing the all-ones value to the BAR to determine the size. If this is not done, the device may spuriously decode accesses to memory areas it should not. On some Intel PCI Express chipsets, this breaks MMCONFIG configuration space access, since the memory the graphics card ends up decoding during this period overlaps the MMCONFIG area, and thus it steals the accesses to the area to do any other configuration space access, including changing the BAR back to its previous value. However, don't do this disabling on host bridge devices, as it is reported that some of them do silly things like disable CPU to RAM access if this is done. Based on an original patch by Jesse Barnes. Signed-off-by: Robert Hancock [EMAIL PROTECTED] --- linux-2.6.22-rc2-mm1/drivers/pci/probe.c2007-05-23 21:21:05.0 -0600 +++ linux-2.6.22-rc2-mm1edit/drivers/pci/probe.c2007-05-29 21:31:47.0 -0600 @@ -180,6 +180,58 @@ static inline int is_64bit_memory(u32 ma return 0; } +#define BAR_IS_MEMORY(bar) (((bar) PCI_BASE_ADDRESS_SPACE) ==\ + PCI_BASE_ADDRESS_SPACE_MEMORY) + +/** + * pci_bar_size - get raw PCI BAR size + * @dev: PCI device + * @reg: BAR to probe + * + * Use basic PCI probing: + * - save original BAR value + * - disable MEM or IO decode in PCI_COMMAND reg if appropriate + * - write all 1s to the BAR + * - read back value + * - reenble MEM or IO decode as necessary + * - write original value back + * + * Returns raw BAR size to caller. + */ +static u32 pci_bar_size(struct pci_dev *dev, unsigned int reg) +{ + u32 orig_reg, sz; + u16 orig_cmd; + + pci_read_config_dword(dev, reg, orig_reg); + pci_read_config_word(dev, PCI_COMMAND, orig_cmd); + + /* +* Disable memory or IO decode on the device while writing the test +* value to the BAR. This prevents possible spurious decoding +* of random addresses by the device. Don't do this for host bridges, +* however, since some of them do silly things like disable CPU to RAM +* access if this is done. +*/ + if ((dev-class 8) != PCI_CLASS_BRIDGE_HOST) { + if (BAR_IS_MEMORY(orig_reg)) + pci_write_config_word(dev, PCI_COMMAND, + orig_cmd ~PCI_COMMAND_MEMORY); + else + pci_write_config_word(dev, PCI_COMMAND, + orig_cmd ~PCI_COMMAND_IO); + } + + pci_write_config_dword(dev, reg, 0x); + pci_read_config_dword(dev, reg, sz); + pci_write_config_dword(dev, reg, orig_reg); + + if ((dev-class 8) != PCI_CLASS_BRIDGE_HOST) + pci_write_config_word(dev, PCI_COMMAND, orig_cmd); + + return sz; +} + static void pci_read_bases(struct pci_dev *dev, unsigned int howmany, int rom) { unsigned int pos, reg, next; @@ -196,16 +248,13 @@ static void pci_read_bases(struct pci_de res-name = pci_name(dev); reg = PCI_BASE_ADDRESS_0 + (pos 2); pci_read_config_dword(dev, reg, l); - pci_write_config_dword(dev, reg, ~0); - pci_read_config_dword(dev, reg, sz); - pci_write_config_dword(dev, reg, l); + sz = pci_bar_size(dev, reg); if (!sz || sz == 0x) continue; if (l == 0x) l = 0; raw_sz = sz; - if ((l PCI_BASE_ADDRESS_SPACE) == - PCI_BASE_ADDRESS_SPACE_MEMORY) { + if (BAR_IS_MEMORY(l)) { sz = pci_size(l, sz, (u32)PCI_BASE_ADDRESS_MEM_MASK); /* * For 64bit prefetchable memory sz could be 0, if the @@ -229,9 +278,7 @@ static void pci_read_bases(struct pci_de u32 szhi, lhi; pci_read_config_dword(dev, reg+4, lhi); - pci_write_config_dword(dev, reg+4, ~0); - pci_read_config_dword(dev, reg+4, szhi); - pci_write_config_dword(dev, reg+4, lhi); + szhi = pci_bar_size(dev, reg+4); sz64 = ((u64)szhi 32) | raw_sz; l64 = ((u64)lhi 32) | l; sz64 = pci_size64(l64, sz64, PCI_BASE_ADDRESS_MEM_MASK); - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Kernel 2.6.21.3 does not work with 8GB of RAM on Intel 965WH motherboards.
Justin Piszcz wrote: Short Description of Problem: Linux 2.6.21.3 does not run properly with 8GB of ram on the Intel 965WH motherboard. Long Description of Problem: When I use 8GB of memory on my x86_64 system, CPU-bound processes are VERY slow, up to 36x slower than usual. My temporary fix is force Linux to only use 4GB of memory, I am currently using mem=4096M. I ran memtest86 and the memory is fine, not a single error. I tried the following to mem= 1024, 2048 4096 and blank to let the kernel use all 8GB of memory. What is wrong with the kernel and how come it cannot use 8GB of memory without slowing down all CPU-related processes to a snail-like pace? There is something horribly wrong here. Specifications: Intel Motherboard: 965WH Linux Kernel: 2.6.21.3 Distribution: Debian Testing x86_64 GCC: gcc version 4.1.2 20061115 (prerelease) (Debian 4.1.1-21) Target: x86_64-linux-gnu Tests: 1. append line = 1024M top - 18:28:26 up 1 min, 4 users, load average: 0.42, 0.17, 0.06 Tasks: 157 total, 1 running, 156 sleeping, 0 stopped, 0 zombie Cpu(s): 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 1027016k total, 964288k used,62728k free, 1232k buffers Swap: 16787768k total,0k used, 16787768k free, 105168k cached --- STATUS: No problems, box is fine, no lag, etc.. 2. append line = 2048M top - 18:34:23 up 2 min, 2 users, load average: 0.14, 0.14, 0.05 Tasks: 147 total, 1 running, 146 sleeping, 0 stopped, 0 zombie Cpu(s): 1.7%us, 1.2%sy, 0.4%ni, 95.2%id, 1.5%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 2059696k total, 956324k used, 1103372k free, 1232k buffers Swap: 16787768k total,0k used, 16787768k free, 102924k cached --- STATUS: No problems, box is fine, no lag, etc.. 3. append line = 4096M top - 18:37:55 up 1 min, 1 user, load average: 0.52, 0.19, 0.07 Tasks: 143 total, 1 running, 142 sleeping, 0 stopped, 0 zombie Cpu(s): 2.9%us, 2.2%sy, 0.7%ni, 91.6%id, 2.6%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 3339536k total, 949792k used, 2389744k free, 1232k buffers Swap: 16787768k total,0k used, 16787768k free,99920k cached $ time ssh p34 uptime 19:00:16 up 1 min, 1 user, load average: 0.67, 0.18, 0.06 real0m0.159s user0m0.013s sys 0m0.003s --- STATUS: No problems, box is fine, no lag, etc.. 4. append line = (use all 8GB) top - 18:52:50 up 9 min, 1 user, load average: 2.88, 2.43, 1.41 Tasks: 149 total, 3 running, 146 sleeping, 0 stopped, 0 zombie Cpu(s): 36.3%us, 2.2%sy, 10.3%ni, 50.8%id, 0.4%wa, 0.0%hi, 0.1%si, 0.0%st Mem: 8104460k total, 1064416k used, 7040044k free, 3296k buffers Swap: 16787768k total,0k used, 16787768k free, 201852k cached $ ssh p34 ssh: connect to host p34 port 22: Connection refused Machine takes 5-10 minutes to boot, it acts like a 286 computer, about 8 minutes later: $ time ssh p34 uptime # 5 SECONDS!! 36x slower when using 8GB of RAM 18:51:39 up 8 min, 1 user, load average: 2.74, 2.31, 1.30 real0m5.757s user0m0.015s sys 0m0.004s The machine is VERY slow and this is on a gigabit network, I/O does not seem to be affected but rather, CPU-bound processes. PID USER PR NI VIRT RES SHR S %CPU %MEMTIME+ COMMAND 2483 root 25 0 25324 5292 1072 R 96 0.1 4:37.12 mailgraph 3604 logcheck 30 10 3408 1120 544 R 91 0.0 0:03.55 grep These normally take seconds but when I use all 8GB of memory, it runs for a very long time. Conclusion: For now, I will be using mem=4096M until someone can help me understand what is happening here. Can anyone offer any insight? I found it interesting in make menuconfig on x86_64 there is no 4GB/64GB options in the kernel that I remember seeing in 32bit. That's because that option is not applicable in 64-bit mode. Can you send your full dmesg output from the 8GB bootup? -- Robert Hancock Saskatoon, SK, Canada To email, remove nospam from [EMAIL PROTECTED] Home Page: http://www.roberthancock.com/ - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Network broken in kernel level.
Wang Penghui wrote: Hello, list, Recently, i have messed up with the follow problem, i have two server both with two ethernet cards. Here are them: [EMAIL PROTECTED] ~]# lspci | grep -i eth 05:00.0 Ethernet controller: Marvell Technology Group Ltd. Gigabit Ethernet Controller (rev 18) 07:04.0 Ethernet controller: Intel Corp. 82541GI/PI Gigabit Ethernet Controller (rev 05) And they are running MySQL server on both of them. The OS is RHEL 4 with the default kernel 2.6.9-5.ELsmp. These days there are lots of error message comming out in /var/log/message and dmesg. That kernel is very old, you should get the latest RHEL errata update kernel and see if that helps. There have been hundreds of bugfixes in RHEL kernels since that version. -- Robert Hancock Saskatoon, SK, Canada To email, remove "nospam" from [EMAIL PROTECTED] Home Page: http://www.roberthancock.com/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Network broken in kernel level.
Wang Penghui wrote: Hello, list, Recently, i have messed up with the follow problem, i have two server both with two ethernet cards. Here are them: [EMAIL PROTECTED] ~]# lspci | grep -i eth 05:00.0 Ethernet controller: Marvell Technology Group Ltd. Gigabit Ethernet Controller (rev 18) 07:04.0 Ethernet controller: Intel Corp. 82541GI/PI Gigabit Ethernet Controller (rev 05) And they are running MySQL server on both of them. The OS is RHEL 4 with the default kernel 2.6.9-5.ELsmp. These days there are lots of error message comming out in /var/log/message and dmesg. That kernel is very old, you should get the latest RHEL errata update kernel and see if that helps. There have been hundreds of bugfixes in RHEL kernels since that version. -- Robert Hancock Saskatoon, SK, Canada To email, remove nospam from [EMAIL PROTECTED] Home Page: http://www.roberthancock.com/ - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [2.6.21.1] resume doesn't run suspended kernel?
Bill Davidsen wrote: I was testing susp2disk in 2.6.21.1 under FC6, to support reliable computing environment (RCE) needs. The idea is that if power fails, after some short time on UPS the system does susp2disk with a time set, and boots back every so often to see if power is stable. No, I don't want susp2mem until I debug it, console come up in useless mode, console as kalidescope is not what I need. Anyway, I pulled the plug on the UPS, and the system shut down. But when it powered up, it booted the default kernel rather than the test kernel, decided that it couldn't resume, and then did a cold boot. I can bypass this by making the debug kernel the default, but WHY? Is the kernel not saved such that any kernel can be rolled back into memory and run? Actually, the answer is HELL NO, so I really ask if this is the intended mode of operation, that only the default boot kernel will restore. Fedora scripts for hibernation are supposed to tell GRUB to set the default kernel on the next boot to be the current one before suspending to disk, so that it comes up with the same version it was running and the resume can succeed. If the way you're triggering the suspend bypasses this mechanism, you'll see this problem. -- Robert Hancock Saskatoon, SK, Canada To email, remove "nospam" from [EMAIL PROTECTED] Home Page: http://www.roberthancock.com/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [2.6.21.1] resume doesn't run suspended kernel?
Bill Davidsen wrote: I was testing susp2disk in 2.6.21.1 under FC6, to support reliable computing environment (RCE) needs. The idea is that if power fails, after some short time on UPS the system does susp2disk with a time set, and boots back every so often to see if power is stable. No, I don't want susp2mem until I debug it, console come up in useless mode, console as kalidescope is not what I need. Anyway, I pulled the plug on the UPS, and the system shut down. But when it powered up, it booted the default kernel rather than the test kernel, decided that it couldn't resume, and then did a cold boot. I can bypass this by making the debug kernel the default, but WHY? Is the kernel not saved such that any kernel can be rolled back into memory and run? Actually, the answer is HELL NO, so I really ask if this is the intended mode of operation, that only the default boot kernel will restore. Fedora scripts for hibernation are supposed to tell GRUB to set the default kernel on the next boot to be the current one before suspending to disk, so that it comes up with the same version it was running and the resume can succeed. If the way you're triggering the suspend bypasses this mechanism, you'll see this problem. -- Robert Hancock Saskatoon, SK, Canada To email, remove nospam from [EMAIL PROTECTED] Home Page: http://www.roberthancock.com/ - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] [scsi] Remove __GFP_DMA
Alan Cox wrote: On Wed, 23 May 2007 15:17:08 -0400 "Salyzyn, Mark" <[EMAIL PROTECTED]> wrote: The 31 bit limit for some of these cards is a problem, we currently only do __GFP_DMA for bounce buffer sg elements allocated for user supplied references in ioctls. I figure we should be using pci_alloc_consistent calls for these allocations to more accurately acquire memory within the 31 bit limit if necessary, we could switch to these to remove the need for the __GFP_DMA flag in the aacraid driver? That didn't used to work right on the AMD boards when I tried it last as we ended up with a buffer that was mapped by the IOMMU for some reason and that was not below 2GB. The physical address you mean? If that is still happening then it needs to get fixed. The allocation should not succeed if it can't provide memory that's inside the DMA mask for the device.. -- Robert Hancock Saskatoon, SK, Canada To email, remove "nospam" from [EMAIL PROTECTED] Home Page: http://www.roberthancock.com/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH] PCI MMCONFIG: add validation against ACPI motherboard resources
Jesse Barnes wrote: On Wednesday, May 23, 2007 8:20:14 Linus Torvalds wrote: On Wed, 23 May 2007, Linus Torvalds wrote: Sure. I think mmconfig is perfectly sane if it falls back to conf1 accesses for legacy stuff.. .. but without a regression, it's obviously a post-2.6.22 thing, I guess I should make that clear, just because I think people send me patches after -rc1 way too eagerly just because they think it fixes a bug. Basically if it's not somethign that has _ever_ worked some way, it's not a bug, it's a feature ;) No, I know better than to send something after your merge window closes. I have no desire to be flamed even further on this topic. :) And come to think of it, adding the enable/disable bits might be good even with the patch to make legacy accesses go through type 1, since PCIe BAR probing is probably done the same way (I haven't looked) and so we might run into the same problems there. I think that disabling decode on non-host-bridge devices during the BAR sizing is something we should at least try, indeed. The issue I have with forcing legacy config space accesses to type1 is that it would make it much less obvious if the MMCONFIG access wasn't working properly. You'd likely be able to boot up but then wonder why something that does extended config space accesses didn't work or hung the box. As I mentioned before, either we trust the MMCONFIG or we don't, and if we decide that we don't on a particular box, we should really be shutting it off entirely. Hopefully with the ACPI reservation checking patch and the disable-decode-during-BAR-sizing patch we wouldn't need to add that restriction. But yes, post-2.6.22 for all of this :-) -- Robert Hancock Saskatoon, SK, Canada To email, remove "nospam" from [EMAIL PROTECTED] Home Page: http://www.roberthancock.com/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH] PCI MMCONFIG: add validation against ACPI motherboard resources
Jesse Barnes wrote: On Wednesday, May 23, 2007 4:04 pm David Miller wrote: From: Linus Torvalds <[EMAIL PROTECTED]> Date: Wed, 23 May 2007 15:16:23 -0700 (PDT) That crap should be seen for the crap it is! Dammit, how hard can it be to just admit that mmconfig isn't that great? I knew mmconfig was broken conceptually the first time I started seeing write posting "bug fixes" for it that would do a read back from PCI config space via mmconfig to post the write, which of course has potential side-effects on the device and is absolutely illegal if the write just performed put the device into a PM state or whatever. I've actually seen that specific form of posted write flushing cause crashes on some machines, so yes, it sucks. Unfortunately, I don't think we have any other way of getting at extended config space on x86, unless EFI provides methods or something, but I'm not sure that would be an improvement... That "fix" shouldn't be needed at all, the MMCONFIG memory range shouldn't be covered by PCI ordering rules, so there should be no such thing as write posting. I suspect that the author of such patch(es) was doing so out of some misguided sense that it was needed. (And if there is some chipset where it is actually needed, better just disable MMCONFIG on that one, as there's no way to use it sanely.) -- Robert Hancock Saskatoon, SK, Canada To email, remove "nospam" from [EMAIL PROTECTED] Home Page: http://www.roberthancock.com/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH] PCI MMCONFIG: add validation against ACPI motherboard resources
Linus Torvalds wrote: On Wed, 23 May 2007, Jesse Barnes wrote: Fixed it (finally). I don't think moving the 64 bit probing around would make a difference, since we'd restore its original value anyway before moving on to the 32 bit probe which is where I think the problem is. Well, the thing is, I'm pretty sure there is at least one northbridge that stops memory accesses from the CPU when you turn off the MEM bit on it. Oops, you just killed the machine. Which is retarded, since the command bits are only supposed to be for memory ranges that are part of the BARs, it's not supposed to completely kill the device function. Unless somehow the memory on that system is accessed through the PCI bus or something. Anyway, it's something we have to deal with. Looking at the 925X datasheet (which I happened to have around in my google search history because of the discussions of the sky2 DMA problems), it looks like at least that one just hardcodes the MEM bit to be 1, and thus writing to it is a total no-op. But I really think that clearing the MEM bit for at least the host bridge is conceptually quite wrong, even if it might turn out that all chipsets end up just saying (like Intel) "screw it, the user is insane, we're not going to actually do what he asks us to do". Do we really want to be that insane? Turn off memory accesses when probing the CPU host bridge? So at a _minimum_ I would say that that thing needs to be more careful about host bridges. Maybe it's not needed, who knows? I think we should likely avoid disabling the command bits on host bridges (maybe any bridge) due to this risk of disabling something that will break things. Ideally we can get around this without doing any disabling at all, as noted in my last email. Linus, since you were the one concerned about breaking working setups, what do you think? Should we use this approach, or specifically quirk out cases where mmconfig space might conflict with BAR probing? So see above. I think at a minimum, we should consider the host bridge special. I also suspect that we'd be simply better off if we didn't use mmconfig at all unless we _have_ to. Why use mmconfig for the standard BAR accesses? Is there really any reason? I can understand using it for extended config space, since then the old-fashioned approach won't work. But for normal accesses? What's the point, really? Why not? Either you trust that the MMCONFIG is working or you don't. If you trust it, you might as well use it for everything, and if you don't, you can't risk using it for anything. If there are problems that show up only with MMCONFIG, doing what you propose would simply cover them up until somebody actually tried accessing extended config space. mmconfig seems to be fundamentally designed to be impossible to bootstrap off, so there's no way you can have a machine that _only_ supports mmconfig. So why do people seem to think it's so wonderful? Please fill me in on this fundamental mystery. Sure you can bootstrap off it, you just need to have some way to know where to find it (either ACPI or some other system-specific mechanism). Quite frankly, if we just didn't use mmconfig, the whole issue would go away. Isn't _that_ the much better solution? I don't think that is going to be viable in the long run now that Windows Vista is out and MS is actually encouraging HW developers to allow using that config space.. -- Robert Hancock Saskatoon, SK, Canada To email, remove "nospam" from [EMAIL PROTECTED] Home Page: http://www.roberthancock.com/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH] PCI MMCONFIG: add validation against ACPI motherboard resources
Jesse Barnes wrote: On Tuesday, May 22, 2007 6:06 pm Robert Hancock wrote: There was a big discussion about this back in 2002, in which Linus wasn't overly enthused about disabling the decode during probing due to risk of causing problems with some devices: http://lkml.org/lkml/2002/12/19/145 In this particular case (64-bit BAR) we might be able to avoid the problem by changing the order in which we probe the two halves of the address, i.e. change the top half to 0x before messing with the bottom half and then change it back last. That way, we end up mapping it way to the top of 64-bit address space, which hopefully is less likely to conflict.. Fixed it (finally). I don't think moving the 64 bit probing around would make a difference, since we'd restore its original value anyway before moving on to the 32 bit probe which is where I think the problem is. You couldn't just reorder the code the way it is now, you'd have to rearrange the way we do things for 64-bit BARs: -write to high part of 64-bit address (we end up moving the BAR to 0xC000 for example) -If any bits stick, we know what the size is now (more than 4GB of decode), so just change it back, we're done -If not, we need to check the low part, so write to low part of 64-bit address (BAR moves to 0x) -Check which bits stick and calculate the address -Change the low part of the address back (BAR moves to 0xC00) -Change the high part of the address back (BAR moves to the original 0xC000 address) This means that at no point do we map the BAR anywhere near the top of 32-bit memory, so we should avoid this issue in this particular case. I don't think this strategy is too likely to break anything, surely less likely than disabling command bits. Jesse, you might want to try hacking up something like this and see what happens. -- Robert Hancock Saskatoon, SK, Canada To email, remove "nospam" from [EMAIL PROTECTED] Home Page: http://www.roberthancock.com/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH] PCI MMCONFIG: add validation against ACPI motherboard resources
Jesse Barnes wrote: On Tuesday, May 22, 2007 6:06 pm Robert Hancock wrote: There was a big discussion about this back in 2002, in which Linus wasn't overly enthused about disabling the decode during probing due to risk of causing problems with some devices: http://lkml.org/lkml/2002/12/19/145 In this particular case (64-bit BAR) we might be able to avoid the problem by changing the order in which we probe the two halves of the address, i.e. change the top half to 0x before messing with the bottom half and then change it back last. That way, we end up mapping it way to the top of 64-bit address space, which hopefully is less likely to conflict.. Fixed it (finally). I don't think moving the 64 bit probing around would make a difference, since we'd restore its original value anyway before moving on to the 32 bit probe which is where I think the problem is. You couldn't just reorder the code the way it is now, you'd have to rearrange the way we do things for 64-bit BARs: -write to high part of 64-bit address (we end up moving the BAR to 0xC000 for example) -If any bits stick, we know what the size is now (more than 4GB of decode), so just change it back, we're done -If not, we need to check the low part, so write to low part of 64-bit address (BAR moves to 0x) -Check which bits stick and calculate the address -Change the low part of the address back (BAR moves to 0xC00) -Change the high part of the address back (BAR moves to the original 0xC000 address) This means that at no point do we map the BAR anywhere near the top of 32-bit memory, so we should avoid this issue in this particular case. I don't think this strategy is too likely to break anything, surely less likely than disabling command bits. Jesse, you might want to try hacking up something like this and see what happens. -- Robert Hancock Saskatoon, SK, Canada To email, remove nospam from [EMAIL PROTECTED] Home Page: http://www.roberthancock.com/ - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH] PCI MMCONFIG: add validation against ACPI motherboard resources
Linus Torvalds wrote: On Wed, 23 May 2007, Jesse Barnes wrote: Fixed it (finally). I don't think moving the 64 bit probing around would make a difference, since we'd restore its original value anyway before moving on to the 32 bit probe which is where I think the problem is. Well, the thing is, I'm pretty sure there is at least one northbridge that stops memory accesses from the CPU when you turn off the MEM bit on it. Oops, you just killed the machine. Which is retarded, since the command bits are only supposed to be for memory ranges that are part of the BARs, it's not supposed to completely kill the device function. Unless somehow the memory on that system is accessed through the PCI bus or something. Anyway, it's something we have to deal with. Looking at the 925X datasheet (which I happened to have around in my google search history because of the discussions of the sky2 DMA problems), it looks like at least that one just hardcodes the MEM bit to be 1, and thus writing to it is a total no-op. But I really think that clearing the MEM bit for at least the host bridge is conceptually quite wrong, even if it might turn out that all chipsets end up just saying (like Intel) screw it, the user is insane, we're not going to actually do what he asks us to do. Do we really want to be that insane? Turn off memory accesses when probing the CPU host bridge? So at a _minimum_ I would say that that thing needs to be more careful about host bridges. Maybe it's not needed, who knows? I think we should likely avoid disabling the command bits on host bridges (maybe any bridge) due to this risk of disabling something that will break things. Ideally we can get around this without doing any disabling at all, as noted in my last email. Linus, since you were the one concerned about breaking working setups, what do you think? Should we use this approach, or specifically quirk out cases where mmconfig space might conflict with BAR probing? So see above. I think at a minimum, we should consider the host bridge special. I also suspect that we'd be simply better off if we didn't use mmconfig at all unless we _have_ to. Why use mmconfig for the standard BAR accesses? Is there really any reason? I can understand using it for extended config space, since then the old-fashioned approach won't work. But for normal accesses? What's the point, really? Why not? Either you trust that the MMCONFIG is working or you don't. If you trust it, you might as well use it for everything, and if you don't, you can't risk using it for anything. If there are problems that show up only with MMCONFIG, doing what you propose would simply cover them up until somebody actually tried accessing extended config space. mmconfig seems to be fundamentally designed to be impossible to bootstrap off, so there's no way you can have a machine that _only_ supports mmconfig. So why do people seem to think it's so wonderful? Please fill me in on this fundamental mystery. Sure you can bootstrap off it, you just need to have some way to know where to find it (either ACPI or some other system-specific mechanism). Quite frankly, if we just didn't use mmconfig, the whole issue would go away. Isn't _that_ the much better solution? I don't think that is going to be viable in the long run now that Windows Vista is out and MS is actually encouraging HW developers to allow using that config space.. -- Robert Hancock Saskatoon, SK, Canada To email, remove nospam from [EMAIL PROTECTED] Home Page: http://www.roberthancock.com/ - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH] PCI MMCONFIG: add validation against ACPI motherboard resources
Jesse Barnes wrote: On Wednesday, May 23, 2007 4:04 pm David Miller wrote: From: Linus Torvalds [EMAIL PROTECTED] Date: Wed, 23 May 2007 15:16:23 -0700 (PDT) That crap should be seen for the crap it is! Dammit, how hard can it be to just admit that mmconfig isn't that great? I knew mmconfig was broken conceptually the first time I started seeing write posting bug fixes for it that would do a read back from PCI config space via mmconfig to post the write, which of course has potential side-effects on the device and is absolutely illegal if the write just performed put the device into a PM state or whatever. I've actually seen that specific form of posted write flushing cause crashes on some machines, so yes, it sucks. Unfortunately, I don't think we have any other way of getting at extended config space on x86, unless EFI provides methods or something, but I'm not sure that would be an improvement... That fix shouldn't be needed at all, the MMCONFIG memory range shouldn't be covered by PCI ordering rules, so there should be no such thing as write posting. I suspect that the author of such patch(es) was doing so out of some misguided sense that it was needed. (And if there is some chipset where it is actually needed, better just disable MMCONFIG on that one, as there's no way to use it sanely.) -- Robert Hancock Saskatoon, SK, Canada To email, remove nospam from [EMAIL PROTECTED] Home Page: http://www.roberthancock.com/ - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH] PCI MMCONFIG: add validation against ACPI motherboard resources
Jesse Barnes wrote: On Wednesday, May 23, 2007 8:20:14 Linus Torvalds wrote: On Wed, 23 May 2007, Linus Torvalds wrote: Sure. I think mmconfig is perfectly sane if it falls back to conf1 accesses for legacy stuff.. .. but without a regression, it's obviously a post-2.6.22 thing, I guess I should make that clear, just because I think people send me patches after -rc1 way too eagerly just because they think it fixes a bug. Basically if it's not somethign that has _ever_ worked some way, it's not a bug, it's a feature ;) No, I know better than to send something after your merge window closes. I have no desire to be flamed even further on this topic. :) And come to think of it, adding the enable/disable bits might be good even with the patch to make legacy accesses go through type 1, since PCIe BAR probing is probably done the same way (I haven't looked) and so we might run into the same problems there. I think that disabling decode on non-host-bridge devices during the BAR sizing is something we should at least try, indeed. The issue I have with forcing legacy config space accesses to type1 is that it would make it much less obvious if the MMCONFIG access wasn't working properly. You'd likely be able to boot up but then wonder why something that does extended config space accesses didn't work or hung the box. As I mentioned before, either we trust the MMCONFIG or we don't, and if we decide that we don't on a particular box, we should really be shutting it off entirely. Hopefully with the ACPI reservation checking patch and the disable-decode-during-BAR-sizing patch we wouldn't need to add that restriction. But yes, post-2.6.22 for all of this :-) -- Robert Hancock Saskatoon, SK, Canada To email, remove nospam from [EMAIL PROTECTED] Home Page: http://www.roberthancock.com/ - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] [scsi] Remove __GFP_DMA
Alan Cox wrote: On Wed, 23 May 2007 15:17:08 -0400 Salyzyn, Mark [EMAIL PROTECTED] wrote: The 31 bit limit for some of these cards is a problem, we currently only do __GFP_DMA for bounce buffer sg elements allocated for user supplied references in ioctls. I figure we should be using pci_alloc_consistent calls for these allocations to more accurately acquire memory within the 31 bit limit if necessary, we could switch to these to remove the need for the __GFP_DMA flag in the aacraid driver? That didn't used to work right on the AMD boards when I tried it last as we ended up with a buffer that was mapped by the IOMMU for some reason and that was not below 2GB. The physical address you mean? If that is still happening then it needs to get fixed. The allocation should not succeed if it can't provide memory that's inside the DMA mask for the device.. -- Robert Hancock Saskatoon, SK, Canada To email, remove nospam from [EMAIL PROTECTED] Home Page: http://www.roberthancock.com/ - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH] PCI MMCONFIG: add validation against ACPI motherboard resources
Jesse Barnes wrote: On Tuesday, May 22, 2007, Robert Hancock wrote: Eww. I don't see where we disable the decode at all while we probe the BARs on the device. That seems like a bad thing, especially with the way we probe 64-bit BARs (do the low 32 bits first and then the high 32 bits). This means the base address effectively gets set to 0xfff0 momentarily, which might cause some issues. I'm a bit shocked that things work as well as they do without the disabling... I'd try adding some code inside pci_setup_device (drivers/pci/probe.c) to disable PCI_COMMAND_IO and PCI_COMMAND_MEMORY on the device when probing devices with the standard header type and then restoring the previous command bits afterwards, and see what effect that has. It'll be interesting if it does, since obviously it seems to work as it is with non-MMCONFIG access methods. Maybe the base address being set like that interferes with MMCONFIG access itself somehow? I tried that, and it seems to get past probing the graphics device at least, but it hangs a bit later. It could be that the enable/disable I added wasn't correct though, I didn't check to see which one I should disable in the command word, which may be a problem (just disabled them both every probe). I'll try again with more precise enable/disable semantics. There was a big discussion about this back in 2002, in which Linus wasn't overly enthused about disabling the decode during probing due to risk of causing problems with some devices: http://lkml.org/lkml/2002/12/19/145 In this particular case (64-bit BAR) we might be able to avoid the problem by changing the order in which we probe the two halves of the address, i.e. change the top half to 0x before messing with the bottom half and then change it back last. That way, we end up mapping it way to the top of 64-bit address space, which hopefully is less likely to conflict.. -- Robert Hancock Saskatoon, SK, Canada To email, remove "nospam" from [EMAIL PROTECTED] Home Page: http://www.roberthancock.com/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH] PCI MMCONFIG: add validation against ACPI motherboard resources
Jesse Barnes wrote: On Tuesday, May 22, 2007, Robert Hancock wrote: Eww. I don't see where we disable the decode at all while we probe the BARs on the device. That seems like a bad thing, especially with the way we probe 64-bit BARs (do the low 32 bits first and then the high 32 bits). This means the base address effectively gets set to 0xfff0 momentarily, which might cause some issues. I'm a bit shocked that things work as well as they do without the disabling... I'd try adding some code inside pci_setup_device (drivers/pci/probe.c) to disable PCI_COMMAND_IO and PCI_COMMAND_MEMORY on the device when probing devices with the standard header type and then restoring the previous command bits afterwards, and see what effect that has. It'll be interesting if it does, since obviously it seems to work as it is with non-MMCONFIG access methods. Maybe the base address being set like that interferes with MMCONFIG access itself somehow? I tried that, and it seems to get past probing the graphics device at least, but it hangs a bit later. It could be that the enable/disable I added wasn't correct though, I didn't check to see which one I should disable in the command word, which may be a problem (just disabled them both every probe). I'll try again with more precise enable/disable semantics. It'd be interesting to see at what access it ran into trouble next, at least if it's consistent. Could be that some device doesn't like having the decode disabled.. -- Robert Hancock Saskatoon, SK, Canada To email, remove "nospam" from [EMAIL PROTECTED] Home Page: http://www.roberthancock.com/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH] PCI MMCONFIG: add validation against ACPI motherboard resources
Jesse Barnes wrote: On Monday, May 21, 2007, Jesse Barnes wrote: Yeah, I've got that data... just a sec while I make sure it's reproducable... Aha, I hadn't decoded the devfn before, looks like it's dying on an access to the graphics device (bus 0, slot 2, device 0): ... pci_mmcfg_read: 0, 0, 0x10, 0x18, 4 = 0xc00c pci_mmcfg_read: 0, 0, 0x10, 0x18, 4 = ... Offset 0x18 into the graphics config space should be the graphics memory range address, and 0xc00c is the correct value. But for some reason it hangs on the second access. It hangs here everytime. That register is in the config space BAR region, so it should be ok to write 0x to it and read it back to size the register. However, it's after writing the 0x to it and trying to read it back that the machine hangs. I didn't see any accesses to the command register to disable decoding (at least not via the mmconfig methods), so maybe that's broken during MCFG based probing? Eww. I don't see where we disable the decode at all while we probe the BARs on the device. That seems like a bad thing, especially with the way we probe 64-bit BARs (do the low 32 bits first and then the high 32 bits). This means the base address effectively gets set to 0xfff0 momentarily, which might cause some issues. I'd try adding some code inside pci_setup_device (drivers/pci/probe.c) to disable PCI_COMMAND_IO and PCI_COMMAND_MEMORY on the device when probing devices with the standard header type and then restoring the previous command bits afterwards, and see what effect that has. It'll be interesting if it does, since obviously it seems to work as it is with non-MMCONFIG access methods. Maybe the base address being set like that interferes with MMCONFIG access itself somehow? -- Robert Hancock Saskatoon, SK, Canada To email, remove "nospam" from [EMAIL PROTECTED] Home Page: http://www.roberthancock.com/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Enabling power states for Core 2 Duo
Paa Paa wrote: For some reason I'm not able to enable processor power states (c1, c2 etc.) for my Core 2 Duo. This is what I get:: cat /proc/acpi/processor/CPU1/info processor id:0 acpi id: 1 bus mastering control: no power management:no throttling control: no limit interface: no cat /proc/acpi/processor/CPU1/power active state:C0 max_cstate: C8 bus master activity: maximum allowed latency: 2000 usec states: "dmesg | grep -i power" also gives nothing. I have ACPI enabled in BIOS and in kernel I have these set ("grep -i acpi .config | grep =y"): CONFIG_ACPI=y CONFIG_ACPI_PROCESSOR=y CONFIG_ACPI_THERMAL=y CONFIG_ACPI_EC=y CONFIG_ACPI_POWER=y CONFIG_ACPI_SYSTEM=y CONFIG_X86_ACPI_CPUFREQ=y CONFIG_PNPACPI=y CONFIG_SATA_ACPI=y I'm probably missing something crucial here. So how do I enable power states? I'm using 64-bit Gentoo. My mobo is Asus P5B Deluxe. Otherwise ACPI works fine. The BIOS has to expose this support in ACPI, if it doesn't (which is often the case on desktop boards) you won't get any C-state support (well, except for C1 which is just the normal halt state). -- Robert Hancock Saskatoon, SK, Canada To email, remove "nospam" from [EMAIL PROTECTED] Home Page: http://www.roberthancock.com/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH] PCI MMCONFIG: add validation against ACPI motherboard resources
Jesse Barnes wrote: On Monday, May 21, 2007, Jesse Barnes wrote: Yeah, I've got that data... just a sec while I make sure it's reproducable... Aha, I hadn't decoded the devfn before, looks like it's dying on an access to the graphics device (bus 0, slot 2, device 0): ... pci_mmcfg_read: 0, 0, 0x10, 0x18, 4 = 0xc00c pci_mmcfg_read: 0, 0, 0x10, 0x18, 4 = hang ... Offset 0x18 into the graphics config space should be the graphics memory range address, and 0xc00c is the correct value. But for some reason it hangs on the second access. It hangs here everytime. That register is in the config space BAR region, so it should be ok to write 0x to it and read it back to size the register. However, it's after writing the 0x to it and trying to read it back that the machine hangs. I didn't see any accesses to the command register to disable decoding (at least not via the mmconfig methods), so maybe that's broken during MCFG based probing? Eww. I don't see where we disable the decode at all while we probe the BARs on the device. That seems like a bad thing, especially with the way we probe 64-bit BARs (do the low 32 bits first and then the high 32 bits). This means the base address effectively gets set to 0xfff0 momentarily, which might cause some issues. I'd try adding some code inside pci_setup_device (drivers/pci/probe.c) to disable PCI_COMMAND_IO and PCI_COMMAND_MEMORY on the device when probing devices with the standard header type and then restoring the previous command bits afterwards, and see what effect that has. It'll be interesting if it does, since obviously it seems to work as it is with non-MMCONFIG access methods. Maybe the base address being set like that interferes with MMCONFIG access itself somehow? -- Robert Hancock Saskatoon, SK, Canada To email, remove nospam from [EMAIL PROTECTED] Home Page: http://www.roberthancock.com/ - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH] PCI MMCONFIG: add validation against ACPI motherboard resources
Jesse Barnes wrote: On Tuesday, May 22, 2007, Robert Hancock wrote: Eww. I don't see where we disable the decode at all while we probe the BARs on the device. That seems like a bad thing, especially with the way we probe 64-bit BARs (do the low 32 bits first and then the high 32 bits). This means the base address effectively gets set to 0xfff0 momentarily, which might cause some issues. I'm a bit shocked that things work as well as they do without the disabling... I'd try adding some code inside pci_setup_device (drivers/pci/probe.c) to disable PCI_COMMAND_IO and PCI_COMMAND_MEMORY on the device when probing devices with the standard header type and then restoring the previous command bits afterwards, and see what effect that has. It'll be interesting if it does, since obviously it seems to work as it is with non-MMCONFIG access methods. Maybe the base address being set like that interferes with MMCONFIG access itself somehow? I tried that, and it seems to get past probing the graphics device at least, but it hangs a bit later. It could be that the enable/disable I added wasn't correct though, I didn't check to see which one I should disable in the command word, which may be a problem (just disabled them both every probe). I'll try again with more precise enable/disable semantics. It'd be interesting to see at what access it ran into trouble next, at least if it's consistent. Could be that some device doesn't like having the decode disabled.. -- Robert Hancock Saskatoon, SK, Canada To email, remove nospam from [EMAIL PROTECTED] Home Page: http://www.roberthancock.com/ - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH] PCI MMCONFIG: add validation against ACPI motherboard resources
Jesse Barnes wrote: On Tuesday, May 22, 2007, Robert Hancock wrote: Eww. I don't see where we disable the decode at all while we probe the BARs on the device. That seems like a bad thing, especially with the way we probe 64-bit BARs (do the low 32 bits first and then the high 32 bits). This means the base address effectively gets set to 0xfff0 momentarily, which might cause some issues. I'm a bit shocked that things work as well as they do without the disabling... I'd try adding some code inside pci_setup_device (drivers/pci/probe.c) to disable PCI_COMMAND_IO and PCI_COMMAND_MEMORY on the device when probing devices with the standard header type and then restoring the previous command bits afterwards, and see what effect that has. It'll be interesting if it does, since obviously it seems to work as it is with non-MMCONFIG access methods. Maybe the base address being set like that interferes with MMCONFIG access itself somehow? I tried that, and it seems to get past probing the graphics device at least, but it hangs a bit later. It could be that the enable/disable I added wasn't correct though, I didn't check to see which one I should disable in the command word, which may be a problem (just disabled them both every probe). I'll try again with more precise enable/disable semantics. There was a big discussion about this back in 2002, in which Linus wasn't overly enthused about disabling the decode during probing due to risk of causing problems with some devices: http://lkml.org/lkml/2002/12/19/145 In this particular case (64-bit BAR) we might be able to avoid the problem by changing the order in which we probe the two halves of the address, i.e. change the top half to 0x before messing with the bottom half and then change it back last. That way, we end up mapping it way to the top of 64-bit address space, which hopefully is less likely to conflict.. -- Robert Hancock Saskatoon, SK, Canada To email, remove nospam from [EMAIL PROTECTED] Home Page: http://www.roberthancock.com/ - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Enabling power states for Core 2 Duo
Paa Paa wrote: For some reason I'm not able to enable processor power states (c1, c2 etc.) for my Core 2 Duo. This is what I get:: cat /proc/acpi/processor/CPU1/info processor id:0 acpi id: 1 bus mastering control: no power management:no throttling control: no limit interface: no cat /proc/acpi/processor/CPU1/power active state:C0 max_cstate: C8 bus master activity: maximum allowed latency: 2000 usec states: dmesg | grep -i power also gives nothing. I have ACPI enabled in BIOS and in kernel I have these set (grep -i acpi .config | grep =y): CONFIG_ACPI=y CONFIG_ACPI_PROCESSOR=y CONFIG_ACPI_THERMAL=y CONFIG_ACPI_EC=y CONFIG_ACPI_POWER=y CONFIG_ACPI_SYSTEM=y CONFIG_X86_ACPI_CPUFREQ=y CONFIG_PNPACPI=y CONFIG_SATA_ACPI=y I'm probably missing something crucial here. So how do I enable power states? I'm using 64-bit Gentoo. My mobo is Asus P5B Deluxe. Otherwise ACPI works fine. The BIOS has to expose this support in ACPI, if it doesn't (which is often the case on desktop boards) you won't get any C-state support (well, except for C1 which is just the normal halt state). -- Robert Hancock Saskatoon, SK, Canada To email, remove nospam from [EMAIL PROTECTED] Home Page: http://www.roberthancock.com/ - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH] PCI MMCONFIG: add validation against ACPI motherboard resources
Jesse Barnes wrote: What happens if you take out the chipset register detection, does the MCFG table give you the same result? Wonder if they're doing something funny with start/end bus values or something in their table. There's some code in my patch that prints out the important data from the MCFG table, can you tell me what that shows with the chipset detection taken out? I can't see how any MCFG based accesses could work on this box, but I don't know why. According to the boot log (with our code patched in but disabled after checking the ACPI reserved status), the space is fine: ... ACPI: (supports S0 S3 S4 S5) ACPI: Using IOAPIC for interrupt routing pciexbar lo: 0xf003 pciexbar hi: 0x Enabled MCFG space at 0xf000, size 134217728 PCI: Found Intel Corporation G965 Express Memory Controller Hub with MMCONFIG support. PCI: MCFG configuration 0: base f000 segment 0 buses 0 - 127 PCI: MCFG area at f000 reserved in ACPI motherboard resources PCI: Not using MMCONFIG. <-- due to the 'goto reject' after if (is_acpi_reserved) { ... } PM: Adding info for acpi:acpi_system:00 PM: Adding info for acpi:button_power:00 ... Same thing happens if I disable the chipset specific code and just use the ACPI stuff you added. If I leave it enabled, several config cycles work fine, but the box eventually hangs after probing 24 devices or so. I don't see anything else mapped into this space, and the MTRRs seem ok, so either there's something hidden in this memory range or there's another chipset register that needs poking to fully enable this space properly. Sysrq doesn't seem to work, and I don't see any events in my machine log, so figuring out exactly why it's hanging is a bit difficult. Any ideas on what to try next? I'll see if I can get some more details from our BIOS folks and do yet another pass over the documentation to see if there's something I'm missing. Can you find out which config access (bus, device, function, address) is the one that hangs the box? I assume that either the corresponding address in the MCFG table is problematic (i.e. has something else mapped over it), or maybe that device just doesn't like being probed with MCFG somehow. -- Robert Hancock Saskatoon, SK, Canada To email, remove "nospam" from [EMAIL PROTECTED] Home Page: http://www.roberthancock.com/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Add Seagate STT20000A to DMA blacklist.
Dave Jones wrote: On Mon, May 21, 2007 at 05:15:51PM +0100, Alan Cox wrote: > On Mon, 21 May 2007 10:50:42 -0400 > Dave Jones <[EMAIL PROTECTED]> wrote: > > > http://bugzilla.kernel.org/show_bug.cgi?id=1044 > > has been open for _four_ years with a patch available. > > Here's a rediffed version of the same. > > Please update libata as well when you udpate the blacklists. Sure, point me at the table(s) ? Dave ata_device_blacklist in drivers/ata/libata-core.c -- Robert Hancock Saskatoon, SK, Canada To email, remove "nospam" from [EMAIL PROTECTED] Home Page: http://www.roberthancock.com/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: something strange in libata-core.c for kernel 2.6.22-rc3
Alan Cox wrote: Yeah, that's consistent to what I've seen on my machine which is a variant of A8N. No matter what value I through at _STM, _GTM just echoed the result thus always leading to 80c configuration. I guess this means that what we have to do is trust that the BIOS set up a reasonable mode and base the cable detect on that (either by reading back the boot-up controller registers, or by calling GTM). I imagine this is what the Windows default IDE driver is doing (just using the boot-up mode and feeding it back using GTM/STM on suspend/resume cycles). Alan, what do you think? Interesting, sounds like it is still useful rather than just reading the registers as the GTM/STM seem to survive resume cycles which drive config may not (eg if the driver is loaded after a s2ram/resume. I don't think that case is handled in this BIOS anyway - if you call GTM after resume without previously calling STM, it's just going to read whatever random values are in the controller and give you timings based on that, which presumably will be junk. It looks like the main purpose for what it's doing with saving those registers in the _PTS method is to save and restore a couple of controller registers called ID20 (PCI config space offset 0x50, 16 bits) and ID22 (PCI config space offset 0x5C, 32 bits) which aren't otherwise used in the AML. According to pata_amd, for the AMD IDE interface the former is some reserved bits as well as the cable detect bits, while the latter is the cycle time and address setup time register. Presumably those aren't really the cable detect bits though, since the detection based on those bits in pata_amd doesn't really work.. If it just echoes back we should also be able to detect this by using knowingly invalid values. Well, this implementation doesn't purely echo back the same values, it echoes back values derived from what the controller was actually set to, so I imagine if you put in something ridiculous it would come back with the closest possible mode that it was set to (PIO mode 0, etc.) I suspect the implementation we would need to use (which doesn't depend on anything not given in the spec) would be: -On driver load, execute _GTM to get the timing mode the BIOS had set. Assume this represents the fastest modes the controller supports, and set cable detect based on whether it includes UDMA modes > 2. -If we decide to set a slower mode (speed down due to errors, etc.), set it using _STM and then read back the actual values that were set using _GTM (for possible use in suspend/resume). -On resume after suspend, re-set the last mode using _STM followed by executing _GTF and running those commands. This won't handle the case where the driver is loaded after the system was already suspended to RAM and resumed, however I don't know exactly how one could handle that in this situation.. -- Robert Hancock Saskatoon, SK, Canada To email, remove "nospam" from [EMAIL PROTECTED] Home Page: http://www.roberthancock.com/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: something strange in libata-core.c for kernel 2.6.22-rc3
Tejun Heo wrote: [EMAIL PROTECTED] wrote: Mybe I am wrong, but if you are detecting 40-wire cable to set them to DMA/33, why the check includes also 80-wire cables configuring them to DMA/33 too? With this patch my nvidia4 IDE controllers detects correctly and configure correctly DMA/100 for my HD and DMA/33 for my DVD (the first uses a 80-wire cable, the second a 40-wire cable). Am I wrong somewhere? That's the drive side verification of 80c cable check, so if the condition triggers we downgrade 80c or unknown to 40c. Cable detection on nvidia PATA is a disaster. You're supposed to do some ACPI dancing and drive side detection is completely bogus. Eeeek Alan, did you have a chance to test the ACPI cable detection? It just didn't work when I tried it. It always returned 80c on my machine. On a whim I started poking around in the disassembled ACPI DSDT code for my Asus A8N-SLI Deluxe board, which is one of these chipsets. The original thought was that the STM/GTM trick on these chipsets is supposed to allow us to determine what modes we should use based on what modes it sets up appropriately. Unfortunately, unless I'm missing something in the AML (which is possible) it doesn't seem like there is any validation being done on the settings passed in. The settings appear to essentially just get programmed into the controller when STM is called and read back on GTM. The only complication is some logic on the _PTS method (Prepare to Sleep) which stores the current settings into some variables, and in STM, if a flag was set by the _PTS method, the previous settings for all registers are stored back first before writing the requested values into the correct places. So in this case, obviously the approach used by pata_acpi, etc. won't work for cable detection. Whatever magic register on the chipset contains the cable detect value, the AML doesn't seem to be accessing it. The ACPI spec doesn't really give any guarantee that the "try STM on all possible modes" trick will work either, since there seems to be no mention of the AML being required to validate the mode and the STM function has no return value to indicate failure. I guess this means that what we have to do is trust that the BIOS set up a reasonable mode and base the cable detect on that (either by reading back the boot-up controller registers, or by calling GTM). I imagine this is what the Windows default IDE driver is doing (just using the boot-up mode and feeding it back using GTM/STM on suspend/resume cycles). -- Robert Hancock Saskatoon, SK, Canada To email, remove "nospam" from [EMAIL PROTECTED] Home Page: http://www.roberthancock.com/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: IDE/ATA: Intel i865-based mainboard, CDROM not detected
Jonathan Woithe wrote: I've just tried a quick test after enabling most of the PATA drivers under the libata section including the Jmicron driver (basically everything except those labelled "highly experimental"). As far as I can tell the CDROM/DVDROM is still not detected even with all these built into the kernel. Maybe I do need one of those "highly experimental" drivers. Can you post the entire lspci -v for this board? Also, it's unrelated to this problem, but you should check the BIOS settings for the SATA controller - you really want to get the controller into AHCI mode for best performance. I've often wondered how the BIOS descriptions correlate with the modes the controller ends up in. I've always gone for things like "enhanced" or "SATA" or "native" (the exact string of course being dependent on the BIOS writer's mood on the day). This seems to work out OK in practice. How can you tell from the Linux boot messages that the controller is in AHCI mode - is it as simple as looking for AHCI driver messages? In this case the scsi0 : ata_piix scsi1 : ata_piix indicate that things are suboptimal I assume. Right, you should see that showing up as ahci. -- Robert Hancock Saskatoon, SK, Canada To email, remove "nospam" from [EMAIL PROTECTED] Home Page: http://www.roberthancock.com/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: IDE/ATA: Intel i865-based mainboard, CDROM not detected
Jonathan Woithe wrote: A collegue of mine has an Intel mainboard with the i865 chipset onboard (DQ965). All kernels up to and including 2.6.22-rc2 do not detect the IDE CDROM/DVDROM when booting. The SATA hard drive is found without any problems. Relevant parts from lspci: 00:1f.2 0101: 8086:2820 (rev 02) 00:1f.2 IDE interface: Intel Corporation 82801H (ICH8 Family) 4 port SATA IDE Controller (rev 02) (prog-if 8f [Master SecP SecO PriP PriO]) Subsystem: Intel Corporation Unknown device 514d Flags: bus master, 66MHz, medium devsel, latency 0, IRQ 19 00:1f.5 0101: 8086:2825 (rev 02) 00:1f.5 IDE interface: Intel Corporation 82801H (ICH8 Family) 2 port SATA IDE Controller (rev 02) (prog-if 85 [Master SecO PriO]) Subsystem: Intel Corporation Unknown device 514d Flags: bus master, 66MHz, medium devsel, latency 0, IRQ 19 What's interesting here is that 00:1f.2 and 00:1f.5 are both identified as "n port SATA" controllers even though one of them (I suspect 00:1f.5) is a PATA controller. This may just be a typo in lspci's database though. Boot messages: ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx Probing IDE interface ide0... Probing IDE interface ide1... : ata_piix :00:1f.2: version 2.11 ata_piix :00:1f.2: MAP [ P0 P2 P1 P3 ] ACPI: PCI Interrupt :00:1f.2[A] -> GSI 19 (level, low) -> IRQ 19 PCI: Setting latency timer of device :00:1f.2 to 64 scsi0 : ata_piix scsi1 : ata_piix ata1: SATA max UDMA/133 cmd 0x00012138 ctl 0x00012156 bmdma 0x00012110 irq 0 ata2: SATA max UDMA/133 cmd 0x00012130 ctl 0x00012152 bmdma 0x00012118 irq 0 ata1.00: ata_hpa_resize 1: sectors = 488397168, hpa_sectors = 488397168 ata1.00: ATA-7: WDC WD2500AAJS-00RYA0, 12.01B01, max UDMA/133 ata1.00: 488397168 sectors, multi 16: LBA48 NCQ (depth 0/32) ata1.00: ata_hpa_resize 1: sectors = 488397168, hpa_sectors = 488397168 ata1.00: configured for UDMA/133 ATA: abnormal status 0x7F on port 0x00012137 scsi 0:0:0:0: Direct-Access ATA WDC WD2500AAJS-0 12.0 PQ: 0 ANSI: 5 sd 0:0:0:0: [sda] 488397168 512-byte hardware sectors (250059 MB) sd 0:0:0:0: [sda] Write Protect is off sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00 sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA sd 0:0:0:0: [sda] 488397168 512-byte hardware sectors (250059 MB) sd 0:0:0:0: [sda] Write Protect is off sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00 sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA sda: sda1 sda2 sd 0:0:0:0: [sda] Attached SCSI disk ata_piix :00:1f.5: MAP [ P0 P2 P1 P3 ] Here the HDD is clearly detected while the CDROM/DVDROM (attached to ide0) isn't. libata is compiled into the kernel as is the non-libata PATA driver. In the libata configuration, only SATA_AHCI, ATA_PIIX and ATA_GENERIC are defined. For the non-libata side of things most options are selected including BLK_DEV_IDE, BLK_DEV_IDECD, IDE_GENERIC, BLK_DEV_IDEPCI, BLK_DEV_GENERIC, BLK_DEV_IDEDMA_PCI and BLK_DEV_PIIX. Does anyone have any ideas as to why there is a problem detecting the PATA (IDE) CDROM/DVDROM in this machine? Further information/testing can be provided if requested. A lot of newer Intel boards have the IDE interface provided by an external JMicron, etc. chip so you may need to enable that driver for things to work. Also, it's unrelated to this problem, but you should check the BIOS settings for the SATA controller - you really want to get the controller into AHCI mode for best performance. -- Robert Hancock Saskatoon, SK, Canada To email, remove "nospam" from [EMAIL PROTECTED] Home Page: http://www.roberthancock.com/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: IDE/ATA: Intel i865-based mainboard, CDROM not detected
Jonathan Woithe wrote: A collegue of mine has an Intel mainboard with the i865 chipset onboard (DQ965). All kernels up to and including 2.6.22-rc2 do not detect the IDE CDROM/DVDROM when booting. The SATA hard drive is found without any problems. Relevant parts from lspci: 00:1f.2 0101: 8086:2820 (rev 02) 00:1f.2 IDE interface: Intel Corporation 82801H (ICH8 Family) 4 port SATA IDE Controller (rev 02) (prog-if 8f [Master SecP SecO PriP PriO]) Subsystem: Intel Corporation Unknown device 514d Flags: bus master, 66MHz, medium devsel, latency 0, IRQ 19 00:1f.5 0101: 8086:2825 (rev 02) 00:1f.5 IDE interface: Intel Corporation 82801H (ICH8 Family) 2 port SATA IDE Controller (rev 02) (prog-if 85 [Master SecO PriO]) Subsystem: Intel Corporation Unknown device 514d Flags: bus master, 66MHz, medium devsel, latency 0, IRQ 19 What's interesting here is that 00:1f.2 and 00:1f.5 are both identified as n port SATA controllers even though one of them (I suspect 00:1f.5) is a PATA controller. This may just be a typo in lspci's database though. Boot messages: ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx Probing IDE interface ide0... Probing IDE interface ide1... : ata_piix :00:1f.2: version 2.11 ata_piix :00:1f.2: MAP [ P0 P2 P1 P3 ] ACPI: PCI Interrupt :00:1f.2[A] - GSI 19 (level, low) - IRQ 19 PCI: Setting latency timer of device :00:1f.2 to 64 scsi0 : ata_piix scsi1 : ata_piix ata1: SATA max UDMA/133 cmd 0x00012138 ctl 0x00012156 bmdma 0x00012110 irq 0 ata2: SATA max UDMA/133 cmd 0x00012130 ctl 0x00012152 bmdma 0x00012118 irq 0 ata1.00: ata_hpa_resize 1: sectors = 488397168, hpa_sectors = 488397168 ata1.00: ATA-7: WDC WD2500AAJS-00RYA0, 12.01B01, max UDMA/133 ata1.00: 488397168 sectors, multi 16: LBA48 NCQ (depth 0/32) ata1.00: ata_hpa_resize 1: sectors = 488397168, hpa_sectors = 488397168 ata1.00: configured for UDMA/133 ATA: abnormal status 0x7F on port 0x00012137 scsi 0:0:0:0: Direct-Access ATA WDC WD2500AAJS-0 12.0 PQ: 0 ANSI: 5 sd 0:0:0:0: [sda] 488397168 512-byte hardware sectors (250059 MB) sd 0:0:0:0: [sda] Write Protect is off sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00 sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA sd 0:0:0:0: [sda] 488397168 512-byte hardware sectors (250059 MB) sd 0:0:0:0: [sda] Write Protect is off sd 0:0:0:0: [sda] Mode Sense: 00 3a 00 00 sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA sda: sda1 sda2 sd 0:0:0:0: [sda] Attached SCSI disk ata_piix :00:1f.5: MAP [ P0 P2 P1 P3 ] Here the HDD is clearly detected while the CDROM/DVDROM (attached to ide0) isn't. libata is compiled into the kernel as is the non-libata PATA driver. In the libata configuration, only SATA_AHCI, ATA_PIIX and ATA_GENERIC are defined. For the non-libata side of things most options are selected including BLK_DEV_IDE, BLK_DEV_IDECD, IDE_GENERIC, BLK_DEV_IDEPCI, BLK_DEV_GENERIC, BLK_DEV_IDEDMA_PCI and BLK_DEV_PIIX. Does anyone have any ideas as to why there is a problem detecting the PATA (IDE) CDROM/DVDROM in this machine? Further information/testing can be provided if requested. A lot of newer Intel boards have the IDE interface provided by an external JMicron, etc. chip so you may need to enable that driver for things to work. Also, it's unrelated to this problem, but you should check the BIOS settings for the SATA controller - you really want to get the controller into AHCI mode for best performance. -- Robert Hancock Saskatoon, SK, Canada To email, remove nospam from [EMAIL PROTECTED] Home Page: http://www.roberthancock.com/ - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: IDE/ATA: Intel i865-based mainboard, CDROM not detected
Jonathan Woithe wrote: I've just tried a quick test after enabling most of the PATA drivers under the libata section including the Jmicron driver (basically everything except those labelled highly experimental). As far as I can tell the CDROM/DVDROM is still not detected even with all these built into the kernel. Maybe I do need one of those highly experimental drivers. Can you post the entire lspci -v for this board? Also, it's unrelated to this problem, but you should check the BIOS settings for the SATA controller - you really want to get the controller into AHCI mode for best performance. I've often wondered how the BIOS descriptions correlate with the modes the controller ends up in. I've always gone for things like enhanced or SATA or native (the exact string of course being dependent on the BIOS writer's mood on the day). This seems to work out OK in practice. How can you tell from the Linux boot messages that the controller is in AHCI mode - is it as simple as looking for AHCI driver messages? In this case the scsi0 : ata_piix scsi1 : ata_piix indicate that things are suboptimal I assume. Right, you should see that showing up as ahci. -- Robert Hancock Saskatoon, SK, Canada To email, remove nospam from [EMAIL PROTECTED] Home Page: http://www.roberthancock.com/ - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: something strange in libata-core.c for kernel 2.6.22-rc3
Tejun Heo wrote: [EMAIL PROTECTED] wrote: Mybe I am wrong, but if you are detecting 40-wire cable to set them to DMA/33, why the check includes also 80-wire cables configuring them to DMA/33 too? With this patch my nvidia4 IDE controllers detects correctly and configure correctly DMA/100 for my HD and DMA/33 for my DVD (the first uses a 80-wire cable, the second a 40-wire cable). Am I wrong somewhere? That's the drive side verification of 80c cable check, so if the condition triggers we downgrade 80c or unknown to 40c. Cable detection on nvidia PATA is a disaster. You're supposed to do some ACPI dancing and drive side detection is completely bogus. Eeeek Alan, did you have a chance to test the ACPI cable detection? It just didn't work when I tried it. It always returned 80c on my machine. On a whim I started poking around in the disassembled ACPI DSDT code for my Asus A8N-SLI Deluxe board, which is one of these chipsets. The original thought was that the STM/GTM trick on these chipsets is supposed to allow us to determine what modes we should use based on what modes it sets up appropriately. Unfortunately, unless I'm missing something in the AML (which is possible) it doesn't seem like there is any validation being done on the settings passed in. The settings appear to essentially just get programmed into the controller when STM is called and read back on GTM. The only complication is some logic on the _PTS method (Prepare to Sleep) which stores the current settings into some variables, and in STM, if a flag was set by the _PTS method, the previous settings for all registers are stored back first before writing the requested values into the correct places. So in this case, obviously the approach used by pata_acpi, etc. won't work for cable detection. Whatever magic register on the chipset contains the cable detect value, the AML doesn't seem to be accessing it. The ACPI spec doesn't really give any guarantee that the try STM on all possible modes trick will work either, since there seems to be no mention of the AML being required to validate the mode and the STM function has no return value to indicate failure. I guess this means that what we have to do is trust that the BIOS set up a reasonable mode and base the cable detect on that (either by reading back the boot-up controller registers, or by calling GTM). I imagine this is what the Windows default IDE driver is doing (just using the boot-up mode and feeding it back using GTM/STM on suspend/resume cycles). -- Robert Hancock Saskatoon, SK, Canada To email, remove nospam from [EMAIL PROTECTED] Home Page: http://www.roberthancock.com/ - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: something strange in libata-core.c for kernel 2.6.22-rc3
Alan Cox wrote: Yeah, that's consistent to what I've seen on my machine which is a variant of A8N. No matter what value I through at _STM, _GTM just echoed the result thus always leading to 80c configuration. I guess this means that what we have to do is trust that the BIOS set up a reasonable mode and base the cable detect on that (either by reading back the boot-up controller registers, or by calling GTM). I imagine this is what the Windows default IDE driver is doing (just using the boot-up mode and feeding it back using GTM/STM on suspend/resume cycles). Alan, what do you think? Interesting, sounds like it is still useful rather than just reading the registers as the GTM/STM seem to survive resume cycles which drive config may not (eg if the driver is loaded after a s2ram/resume. I don't think that case is handled in this BIOS anyway - if you call GTM after resume without previously calling STM, it's just going to read whatever random values are in the controller and give you timings based on that, which presumably will be junk. It looks like the main purpose for what it's doing with saving those registers in the _PTS method is to save and restore a couple of controller registers called ID20 (PCI config space offset 0x50, 16 bits) and ID22 (PCI config space offset 0x5C, 32 bits) which aren't otherwise used in the AML. According to pata_amd, for the AMD IDE interface the former is some reserved bits as well as the cable detect bits, while the latter is the cycle time and address setup time register. Presumably those aren't really the cable detect bits though, since the detection based on those bits in pata_amd doesn't really work.. If it just echoes back we should also be able to detect this by using knowingly invalid values. Well, this implementation doesn't purely echo back the same values, it echoes back values derived from what the controller was actually set to, so I imagine if you put in something ridiculous it would come back with the closest possible mode that it was set to (PIO mode 0, etc.) I suspect the implementation we would need to use (which doesn't depend on anything not given in the spec) would be: -On driver load, execute _GTM to get the timing mode the BIOS had set. Assume this represents the fastest modes the controller supports, and set cable detect based on whether it includes UDMA modes 2. -If we decide to set a slower mode (speed down due to errors, etc.), set it using _STM and then read back the actual values that were set using _GTM (for possible use in suspend/resume). -On resume after suspend, re-set the last mode using _STM followed by executing _GTF and running those commands. This won't handle the case where the driver is loaded after the system was already suspended to RAM and resumed, however I don't know exactly how one could handle that in this situation.. -- Robert Hancock Saskatoon, SK, Canada To email, remove nospam from [EMAIL PROTECTED] Home Page: http://www.roberthancock.com/ - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: Add Seagate STT20000A to DMA blacklist.
Dave Jones wrote: On Mon, May 21, 2007 at 05:15:51PM +0100, Alan Cox wrote: On Mon, 21 May 2007 10:50:42 -0400 Dave Jones [EMAIL PROTECTED] wrote: http://bugzilla.kernel.org/show_bug.cgi?id=1044 has been open for _four_ years with a patch available. Here's a rediffed version of the same. Please update libata as well when you udpate the blacklists. Sure, point me at the table(s) ? Dave ata_device_blacklist in drivers/ata/libata-core.c -- Robert Hancock Saskatoon, SK, Canada To email, remove nospam from [EMAIL PROTECTED] Home Page: http://www.roberthancock.com/ - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [RFC PATCH] PCI MMCONFIG: add validation against ACPI motherboard resources
Jesse Barnes wrote: What happens if you take out the chipset register detection, does the MCFG table give you the same result? Wonder if they're doing something funny with start/end bus values or something in their table. There's some code in my patch that prints out the important data from the MCFG table, can you tell me what that shows with the chipset detection taken out? I can't see how any MCFG based accesses could work on this box, but I don't know why. According to the boot log (with our code patched in but disabled after checking the ACPI reserved status), the space is fine: ... ACPI: (supports S0 S3 S4 S5) ACPI: Using IOAPIC for interrupt routing pciexbar lo: 0xf003 pciexbar hi: 0x Enabled MCFG space at 0xf000, size 134217728 PCI: Found Intel Corporation G965 Express Memory Controller Hub with MMCONFIG support. PCI: MCFG configuration 0: base f000 segment 0 buses 0 - 127 PCI: MCFG area at f000 reserved in ACPI motherboard resources PCI: Not using MMCONFIG. -- due to the 'goto reject' after if (is_acpi_reserved) { ... } PM: Adding info for acpi:acpi_system:00 PM: Adding info for acpi:button_power:00 ... Same thing happens if I disable the chipset specific code and just use the ACPI stuff you added. If I leave it enabled, several config cycles work fine, but the box eventually hangs after probing 24 devices or so. I don't see anything else mapped into this space, and the MTRRs seem ok, so either there's something hidden in this memory range or there's another chipset register that needs poking to fully enable this space properly. Sysrq doesn't seem to work, and I don't see any events in my machine log, so figuring out exactly why it's hanging is a bit difficult. Any ideas on what to try next? I'll see if I can get some more details from our BIOS folks and do yet another pass over the documentation to see if there's something I'm missing. Can you find out which config access (bus, device, function, address) is the one that hangs the box? I assume that either the corresponding address in the MCFG table is problematic (i.e. has something else mapped over it), or maybe that device just doesn't like being probed with MCFG somehow. -- Robert Hancock Saskatoon, SK, Canada To email, remove nospam from [EMAIL PROTECTED] Home Page: http://www.roberthancock.com/ - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: something strange in libata-core.c for kernel 2.6.22-rc3
Tejun Heo wrote: [EMAIL PROTECTED] wrote: Mybe I am wrong, but if you are detecting 40-wire cable to set them to DMA/33, why the check includes also 80-wire cables configuring them to DMA/33 too? With this patch my nvidia4 IDE controllers detects correctly and configure correctly DMA/100 for my HD and DMA/33 for my DVD (the first uses a 80-wire cable, the second a 40-wire cable). Am I wrong somewhere? That's the drive side verification of 80c cable check, so if the condition triggers we downgrade 80c or unknown to 40c. Cable detection on nvidia PATA is a disaster. You're supposed to do some ACPI dancing and drive side detection is completely bogus. Eeeek Alan, did you have a chance to test the ACPI cable detection? It just didn't work when I tried it. It always returned 80c on my machine. Hopefully when we get that support in and working it will solve a lot of these issues (and others, like the laptops that have a short 40-wire cable that is good for high UDMA speeds which we presently have to hard-code detection for specific models). -- Robert Hancock Saskatoon, SK, Canada To email, remove "nospam" from [EMAIL PROTECTED] Home Page: http://www.roberthancock.com/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: sd_resume redundant? [was: [PATCH] libata: implement ata_wait_after_reset()]
Randy Dunlap wrote: On Sun, 20 May 2007 11:45:03 -0600 Robert Hancock wrote: Indan Zupancic wrote: Everything seems to work fine without sd_resume(), so why is it needed? Because not all disks spin up without being told to do so and like it or not spinning disks up on resume is the default behavior. As I wrote in the other reply, it would be worthwhile to make it configurable. Not even after they receive a read command? Ugh. ATA disks are supposed to spin up, yes. SCSI disks require a command to tell them to spin up if they're in the "stopped" state. Good info, but linux-ide was dropped. Is that due to lack of reply-to-all or is it a newsgroup thing or what? That would be a newsgroup thing. It seems that sometimes CCs get dropped when the posts are forwarded to fa.linux.kernel where I normally read them. -- Robert Hancock Saskatoon, SK, Canada To email, remove "nospam" from [EMAIL PROTECTED] Home Page: http://www.roberthancock.com/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: sd_resume redundant? [was: [PATCH] libata: implement ata_wait_after_reset()]
Indan Zupancic wrote: Everything seems to work fine without sd_resume(), so why is it needed? Because not all disks spin up without being told to do so and like it or not spinning disks up on resume is the default behavior. As I wrote in the other reply, it would be worthwhile to make it configurable. Not even after they receive a read command? Ugh. ATA disks are supposed to spin up, yes. SCSI disks require a command to tell them to spin up if they're in the "stopped" state. -- Robert Hancock Saskatoon, SK, Canada To email, remove "nospam" from [EMAIL PROTECTED] Home Page: http://www.roberthancock.com/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: something strange in libata-core.c for kernel 2.6.22-rc3
[EMAIL PROTECTED] wrote: Mybe I am wrong, but if you are detecting 40-wire cable to set them to DMA/33, why the check includes also 80-wire cables configuring them to DMA/33 too? With this patch my nvidia4 IDE controllers detects correctly and configure correctly DMA/100 for my HD and DMA/33 for my DVD (the first uses a 80-wire cable, the second a 40-wire cable). Am I wrong somewhere? --- libata-core.c.orig 2007-05-20 14:31:25.0 +0200 +++ libata-core.c 2007-05-20 14:34:01.0 +0200 @@ -3901,8 +3901,7 @@ /* UDMA/44 or higher would be available */ if((ap->cbl == ATA_CBL_PATA40) || (ata_drive_40wire(dev->id) && -(ap->cbl == ATA_CBL_PATA_UNK || - ap->cbl == ATA_CBL_PATA80))) { +(ap->cbl == ATA_CBL_PATA_UNK))) { ata_dev_printk(dev, KERN_WARNING, "limited to UDMA/33 due to 40-wire cable\n"); xfer_mask &= ~(0xF8 << ATA_SHIFT_UDMA); It only does that for ATA_CBL_PATA80 if ata_drive_40wire returns true, which means that the drive is detecting a 40-wire cable on its side. -- Robert Hancock Saskatoon, SK, Canada To email, remove "nospam" from [EMAIL PROTECTED] Home Page: http://www.roberthancock.com/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: something strange in libata-core.c for kernel 2.6.22-rc3
[EMAIL PROTECTED] wrote: Mybe I am wrong, but if you are detecting 40-wire cable to set them to DMA/33, why the check includes also 80-wire cables configuring them to DMA/33 too? With this patch my nvidia4 IDE controllers detects correctly and configure correctly DMA/100 for my HD and DMA/33 for my DVD (the first uses a 80-wire cable, the second a 40-wire cable). Am I wrong somewhere? --- libata-core.c.orig 2007-05-20 14:31:25.0 +0200 +++ libata-core.c 2007-05-20 14:34:01.0 +0200 @@ -3901,8 +3901,7 @@ /* UDMA/44 or higher would be available */ if((ap-cbl == ATA_CBL_PATA40) || (ata_drive_40wire(dev-id) -(ap-cbl == ATA_CBL_PATA_UNK || - ap-cbl == ATA_CBL_PATA80))) { +(ap-cbl == ATA_CBL_PATA_UNK))) { ata_dev_printk(dev, KERN_WARNING, limited to UDMA/33 due to 40-wire cable\n); xfer_mask = ~(0xF8 ATA_SHIFT_UDMA); It only does that for ATA_CBL_PATA80 if ata_drive_40wire returns true, which means that the drive is detecting a 40-wire cable on its side. -- Robert Hancock Saskatoon, SK, Canada To email, remove nospam from [EMAIL PROTECTED] Home Page: http://www.roberthancock.com/ - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: sd_resume redundant? [was: [PATCH] libata: implement ata_wait_after_reset()]
Indan Zupancic wrote: Everything seems to work fine without sd_resume(), so why is it needed? Because not all disks spin up without being told to do so and like it or not spinning disks up on resume is the default behavior. As I wrote in the other reply, it would be worthwhile to make it configurable. Not even after they receive a read command? Ugh. ATA disks are supposed to spin up, yes. SCSI disks require a command to tell them to spin up if they're in the stopped state. -- Robert Hancock Saskatoon, SK, Canada To email, remove nospam from [EMAIL PROTECTED] Home Page: http://www.roberthancock.com/ - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: sd_resume redundant? [was: [PATCH] libata: implement ata_wait_after_reset()]
Randy Dunlap wrote: On Sun, 20 May 2007 11:45:03 -0600 Robert Hancock wrote: Indan Zupancic wrote: Everything seems to work fine without sd_resume(), so why is it needed? Because not all disks spin up without being told to do so and like it or not spinning disks up on resume is the default behavior. As I wrote in the other reply, it would be worthwhile to make it configurable. Not even after they receive a read command? Ugh. ATA disks are supposed to spin up, yes. SCSI disks require a command to tell them to spin up if they're in the stopped state. Good info, but linux-ide was dropped. Is that due to lack of reply-to-all or is it a newsgroup thing or what? That would be a newsgroup thing. It seems that sometimes CCs get dropped when the posts are forwarded to fa.linux.kernel where I normally read them. -- Robert Hancock Saskatoon, SK, Canada To email, remove nospam from [EMAIL PROTECTED] Home Page: http://www.roberthancock.com/ - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: something strange in libata-core.c for kernel 2.6.22-rc3
Tejun Heo wrote: [EMAIL PROTECTED] wrote: Mybe I am wrong, but if you are detecting 40-wire cable to set them to DMA/33, why the check includes also 80-wire cables configuring them to DMA/33 too? With this patch my nvidia4 IDE controllers detects correctly and configure correctly DMA/100 for my HD and DMA/33 for my DVD (the first uses a 80-wire cable, the second a 40-wire cable). Am I wrong somewhere? That's the drive side verification of 80c cable check, so if the condition triggers we downgrade 80c or unknown to 40c. Cable detection on nvidia PATA is a disaster. You're supposed to do some ACPI dancing and drive side detection is completely bogus. Eeeek Alan, did you have a chance to test the ACPI cable detection? It just didn't work when I tried it. It always returned 80c on my machine. Hopefully when we get that support in and working it will solve a lot of these issues (and others, like the laptops that have a short 40-wire cable that is good for high UDMA speeds which we presently have to hard-code detection for specific models). -- Robert Hancock Saskatoon, SK, Canada To email, remove nospam from [EMAIL PROTECTED] Home Page: http://www.roberthancock.com/ - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] sata_sil: Greatly improve DMA support
Jeff Garzik wrote: Since Alan expressed a desire to see Large Block Transfer (LBT) support in pata_sil680, I though I would re-post my patch for adding LBT support to sata_sil. Silicon Image's Large Block Transfer (LBT) support is a vendor-specific DMA scatter/gather engine, which enables 64-bit DMA addresses (where supported by platform) and eliminates the annoying 64k DMA boundary found in legacy PCI IDE BMDMA engines. Looks like it doesn't allow 64-bit DMA addresses, it only gets rid of the 64K boundary limitation. -- Robert Hancock Saskatoon, SK, Canada To email, remove "nospam" from [EMAIL PROTECTED] Home Page: http://www.roberthancock.com/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] sata_sil: Greatly improve DMA support
Jeff Garzik wrote: Since Alan expressed a desire to see Large Block Transfer (LBT) support in pata_sil680, I though I would re-post my patch for adding LBT support to sata_sil. Silicon Image's Large Block Transfer (LBT) support is a vendor-specific DMA scatter/gather engine, which enables 64-bit DMA addresses (where supported by platform) and eliminates the annoying 64k DMA boundary found in legacy PCI IDE BMDMA engines. Looks like it doesn't allow 64-bit DMA addresses, it only gets rid of the 64K boundary limitation. -- Robert Hancock Saskatoon, SK, Canada To email, remove nospam from [EMAIL PROTECTED] Home Page: http://www.roberthancock.com/ - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] drivers/ata: Add the SW NCQ support to sata_nv for MCP51/MCP55/MCP61
Kuan Luo wrote: Thanks for your comment, see the explaination inline. We'll apply your advice in later patch. ... Please don't duplicate this code in the driver, this is part of libata core in libata-scsi.c. Add an export for these functions if you need to use them in the driver. [kuan]: These calls are declared in static type . I can't export them and don't want to modify libata code. If you really need these functions then they should be made non-static and exported. Duplicating the code is not a solution. I'm not so sure you actually need all that, though. I suspect you can likely handle the deferring of commands if you detect an FPDMA data phase inside the qc_issue function only (like you already do in some cases) instead of having to mess with deferring them at the SCSI layer. I'm still puzzling out how this stuff all works, but it looks like this code makes you stop sending new commands if: -the port is in the FPDMA Data Phase (DMA Setup FIS received but the transfer is not complete yet) - I assume the hardware doesn't handle this itself, which seems rather unique -we previously deferred a command inside of qc_issue because we were in the FPDMA data phase -we previously saw dhfis_flags not equal to qc_active, or we got a BACKOUT interrupt (whatever exactly that means), both of which set some value in the back_byte [kuan]: -If we got BACKOUT interrupt, it means that a command just sent by driver backed out.The driver should resend the command.So new commands should be defered. -If dhfis_flags != qc_active, it indicates that the last command doesn't generate a device to host register FIS . After sending some commands, I found that the last command sometimes has this problem but previous commands are normal.In this case, we need resend the last command. Both cases set back_byte. The case where the command didn't generate a D2H FIS should likely be investigated further, otherwise we don't necessarily know that this workaround will work in all cases? This code seems a bit odd. Isn't this tossing out a bunch of potential error status, etc? [kuan]: If there are commands in queue, the driver can send a new command only after receiving dhfis intr of previous command and before receiving any dmasetup fis intr. In this place, i do the last check before sending the command. But the D2H FIS can contain an error indication, correct? If that happens here it won't detect this. In this situation error handling should be triggered. + done_mask = pp->qc_active ^ sactive; + if (unlikely(done_mask & sactive)) { + ata_port_printk(ap, KERN_ERR, "illegal qc_active transition " + "(%08x->%08x)\n", ap->qc_active, sactive); + return -EINVAL; + } Shouldn't this trigger error handling if it happens instead of just printing an error? [kuan]: I think the error handling can be triggered by timeout. In fact, this case should seldom happen. There have been reports of some drives with bad NCQ implementations that return completion status for commands that were not issued. If we detect this case we should raise an HSM violation which will disable NCQ on this drive if it happens repeatedly. See the code in ahci.c in ahci_host_intr. This comment still applies: Additional/general comments: Think you need some code to handle suspend and resume (re-enable SATA MMIO space, etc.) -- Robert Hancock Saskatoon, SK, Canada To email, remove "nospam" from [EMAIL PROTECTED] Home Page: http://www.roberthancock.com/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] drivers/ata: Add the SW NCQ support to sata_nv for MCP51/MCP55/MCP61
Kuan Luo wrote: Thanks for your comment, see the explaination inline. We'll apply your advice in later patch. ... Please don't duplicate this code in the driver, this is part of libata core in libata-scsi.c. Add an export for these functions if you need to use them in the driver. [kuan]: These calls are declared in static type . I can't export them and don't want to modify libata code. If you really need these functions then they should be made non-static and exported. Duplicating the code is not a solution. I'm not so sure you actually need all that, though. I suspect you can likely handle the deferring of commands if you detect an FPDMA data phase inside the qc_issue function only (like you already do in some cases) instead of having to mess with deferring them at the SCSI layer. I'm still puzzling out how this stuff all works, but it looks like this code makes you stop sending new commands if: -the port is in the FPDMA Data Phase (DMA Setup FIS received but the transfer is not complete yet) - I assume the hardware doesn't handle this itself, which seems rather unique -we previously deferred a command inside of qc_issue because we were in the FPDMA data phase -we previously saw dhfis_flags not equal to qc_active, or we got a BACKOUT interrupt (whatever exactly that means), both of which set some value in the back_byte [kuan]: -If we got BACKOUT interrupt, it means that a command just sent by driver backed out.The driver should resend the command.So new commands should be defered. -If dhfis_flags != qc_active, it indicates that the last command doesn't generate a device to host register FIS . After sending some commands, I found that the last command sometimes has this problem but previous commands are normal.In this case, we need resend the last command. Both cases set back_byte. The case where the command didn't generate a D2H FIS should likely be investigated further, otherwise we don't necessarily know that this workaround will work in all cases? This code seems a bit odd. Isn't this tossing out a bunch of potential error status, etc? [kuan]: If there are commands in queue, the driver can send a new command only after receiving dhfis intr of previous command and before receiving any dmasetup fis intr. In this place, i do the last check before sending the command. But the D2H FIS can contain an error indication, correct? If that happens here it won't detect this. In this situation error handling should be triggered. + done_mask = pp-qc_active ^ sactive; + if (unlikely(done_mask sactive)) { + ata_port_printk(ap, KERN_ERR, illegal qc_active transition + (%08x-%08x)\n, ap-qc_active, sactive); + return -EINVAL; + } Shouldn't this trigger error handling if it happens instead of just printing an error? [kuan]: I think the error handling can be triggered by timeout. In fact, this case should seldom happen. There have been reports of some drives with bad NCQ implementations that return completion status for commands that were not issued. If we detect this case we should raise an HSM violation which will disable NCQ on this drive if it happens repeatedly. See the code in ahci.c in ahci_host_intr. This comment still applies: Additional/general comments: Think you need some code to handle suspend and resume (re-enable SATA MMIO space, etc.) -- Robert Hancock Saskatoon, SK, Canada To email, remove nospam from [EMAIL PROTECTED] Home Page: http://www.roberthancock.com/ - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] drivers/ata: Add the SW NCQ support to sata_nv for MCP51/MCP55/MCP61
Zoltan Boszormenyi wrote: Hi, thanks for publishing this. Add the Software NCQ support to sata_nv.c for MCP51/MCP55/MCP61 SATA controller. This patch base on sata_nv.c file from kernel 2.6.22-rc1 See attachment for the patch. Signed-off-by: Kuan Luo <[EMAIL PROTECTED]> Signed-off-by: Peer Chen <[EMAIL PROTECTED]> == See attached file. == However, I saw this in the patch: + /* determine if physical DMA addr spans 64K boundary. +* Note h/w doesn't support 64-bit, so we unconditionally +* truncate dma_addr_t to u32. +*/ + addr = (u32) sg_dma_address(sg); Does it mean that I can't upgrade my machine to 4 GB or more without losing NCQ or risking data corruption? Can the code be made IOMMU-aware? That shouldn't be a problem, libata default DMA mask is 32 bits (which isn't overridden with this controller) and so the block layer will bounce any data being read/written above that point with IOMMU or swiotlb. The comment is a bit unnecessarily scary. -- Robert Hancock Saskatoon, SK, Canada To email, remove "nospam" from [EMAIL PROTECTED] Home Page: http://www.roberthancock.com/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] drivers/ata: Add the SW NCQ support to sata_nv for MCP51/MCP55/MCP61
Peer Chen wrote: Add the Software NCQ support to sata_nv.c for MCP51/MCP55/MCP61 SATA controller. This patch base on sata_nv.c file from kernel 2.6.22-rc1 See attachment for the patch. Signed-off-by: Kuan Luo <[EMAIL PROTECTED]> Signed-off-by: Peer Chen <[EMAIL PROTECTED]> Good to finally see this come out. I've pasted the code below (indented) in order to make some comments: --- linux-2.6.22-rc1/drivers/ata/sata_nv.c.orig 2007-05-17 14:48:26.0 -0400 +++ linux-2.6.22-rc1/drivers/ata/sata_nv.c 2007-05-17 17:07:28.0 -0400 @@ -46,6 +46,8 @@ #include #include #include +#include +#include #include #define DRV_NAME "sata_nv" @@ -169,6 +171,36 @@ enum { NV_ADMA_PORT_REGISTER_MODE = (1 << 0), NV_ADMA_ATAPI_SETUP_COMPLETE= (1 << 1), + /* MCP55 reg offset */ + NV_CTL_MCP55= 0x400, + NV_INT_STATUS_MCP55 = 0x440, + NV_INT_ENABLE_MCP55 = 0x444, + NV_NCQ_REG_MCP55= 0x448, + NV_CH1_SACTIVE_MCP55= 0x0C, + + /* MCP55 */ + NV_INT_ALL_MCP55= 0x, + NV_INT_PORT_SHIFT_MCP55 = 16, /* each port occupies 16 bits */ + NV_INT_MASK_MCP55 = NV_INT_ALL_MCP55 & 0xfffd, + + /* NCQ ENABLE BITS*/ + NV_CTL_PRI_SWNCQ= 0x02, + NV_CTL_SEC_SWNCQ= 0x04, + + /* MCP55 status bits*/ + NV_INT_DEV_MCP55= 0x01, + NV_INT_PM_MCP55 = 0x02, + NV_INT_ADDED_MCP55 = 0x04, + NV_INT_REMOVED_MCP55= 0x08, + + NV_INT_BACKOUT_MCP55= 0x10, + NV_INT_SDBFIS_MCP55 = 0x20, + NV_INT_DHREGFIS_MCP55 = 0x40, + NV_INT_DMASETUP_MCP55 = 0x80, + + NV_INT_HOTPLUG_MCP55= (NV_INT_ADDED_MCP55 | + NV_INT_REMOVED_MCP55), + }; /* ADMA Physical Region Descriptor - one SG segment */ @@ -264,13 +296,118 @@ static void nv_adma_host_stop(struct ata static void nv_adma_post_internal_cmd(struct ata_queued_cmd *qc); static void nv_adma_tf_read(struct ata_port *ap, struct ata_taskfile *tf); +static void ncq_error_handler(struct ata_port *ap); +static void nv_mcp55_thaw(struct ata_port *ap); +static void nv_mcp55_freeze(struct ata_port *ap); +static void ncq_host_init(struct ata_host *host); +static void nv_bmdma_stop(struct ata_port *ap); +static int nv_std_qc_defer(struct ata_port *ap); +static int nv_port_start(struct ata_port *ap); +static void nv_port_stop(struct ata_port *ap); +static void ncq_clear(struct ata_port *ap); +static void nv_qc_prep(struct ata_queued_cmd *qc); +static void nv_fill_sg(struct ata_queued_cmd *qc); +static void ncq_sactive_start (struct ata_queued_cmd *qc); +static u32 ncq_sactive_value (struct ata_port *ap); +static unsigned int nv_qc_issue_prot(struct ata_queued_cmd *qc); +static u32 ncq_tag_value(struct ata_port *ap); +static int nv_ncqintr_sdbfis(struct ata_port *ap); +static int nv_ncqintr_dmasetupfis(struct ata_port *ap); +static void ncq_clear_singlefis(struct ata_port *ap, u32 val); +static u32 ncq_ownfisintr_value (struct ata_port *ap); +void ncq_hotplug(struct ata_port *ap, u32 fis); +static irqreturn_t nv_mcp55_interrupt(int irq, void *dev_instance); +static int ncq_interrupt(struct ata_port *ap, u32 fis); +static int nv_scsi_queuecmd(struct scsi_cmnd *cmd, + void (*done)(struct scsi_cmnd *)); These functions should use "mcp51" or "swncq" or something in the name instead of "ncq", since the latter implies it may be related to ADMA as well. + +#undef NCQ_DEBUG +#undef NCQ_VERBOSE_DEBUG +#ifdef NCQ_DEBUG +#define NPRINTK(fmt, args...) printk(KERN_ERR "%s: " fmt, __FUNCTION__, ## args) +#ifdef NCQ_VERBOSE_DEBUG +#define NVPRINTK(fmt, args...) printk(KERN_ERR "%s: " fmt, __FUNCTION__, ## args) +#else +#define NVPRINTK(fmt, args...) do { } while(0) +#endif /* NCQ_VERBOSE_DEBUG */ +#else +#define NPRINTK(fmt, args...) do { } while(0) +#define NVPRINTK(fmt, args...) do { } while(0) +#endif We don't need these private helper macros, just use the ones that libata defines. + +/*cmd_stop
Re: [PATCH] drivers/ata: Add the SW NCQ support to sata_nv for MCP51/MCP55/MCP61
Peer Chen wrote: Add the Software NCQ support to sata_nv.c for MCP51/MCP55/MCP61 SATA controller. This patch base on sata_nv.c file from kernel 2.6.22-rc1 See attachment for the patch. Signed-off-by: Kuan Luo [EMAIL PROTECTED] Signed-off-by: Peer Chen [EMAIL PROTECTED] Good to finally see this come out. I've pasted the code below (indented) in order to make some comments: --- linux-2.6.22-rc1/drivers/ata/sata_nv.c.orig 2007-05-17 14:48:26.0 -0400 +++ linux-2.6.22-rc1/drivers/ata/sata_nv.c 2007-05-17 17:07:28.0 -0400 @@ -46,6 +46,8 @@ #include linux/device.h #include scsi/scsi_host.h #include scsi/scsi_device.h +#include scsi/scsi.h +#include scsi/scsi_cmnd.h #include linux/libata.h #define DRV_NAME sata_nv @@ -169,6 +171,36 @@ enum { NV_ADMA_PORT_REGISTER_MODE = (1 0), NV_ADMA_ATAPI_SETUP_COMPLETE= (1 1), + /* MCP55 reg offset */ + NV_CTL_MCP55= 0x400, + NV_INT_STATUS_MCP55 = 0x440, + NV_INT_ENABLE_MCP55 = 0x444, + NV_NCQ_REG_MCP55= 0x448, + NV_CH1_SACTIVE_MCP55= 0x0C, + + /* MCP55 */ + NV_INT_ALL_MCP55= 0x, + NV_INT_PORT_SHIFT_MCP55 = 16, /* each port occupies 16 bits */ + NV_INT_MASK_MCP55 = NV_INT_ALL_MCP55 0xfffd, + + /* NCQ ENABLE BITS*/ + NV_CTL_PRI_SWNCQ= 0x02, + NV_CTL_SEC_SWNCQ= 0x04, + + /* MCP55 status bits*/ + NV_INT_DEV_MCP55= 0x01, + NV_INT_PM_MCP55 = 0x02, + NV_INT_ADDED_MCP55 = 0x04, + NV_INT_REMOVED_MCP55= 0x08, + + NV_INT_BACKOUT_MCP55= 0x10, + NV_INT_SDBFIS_MCP55 = 0x20, + NV_INT_DHREGFIS_MCP55 = 0x40, + NV_INT_DMASETUP_MCP55 = 0x80, + + NV_INT_HOTPLUG_MCP55= (NV_INT_ADDED_MCP55 | + NV_INT_REMOVED_MCP55), + }; /* ADMA Physical Region Descriptor - one SG segment */ @@ -264,13 +296,118 @@ static void nv_adma_host_stop(struct ata static void nv_adma_post_internal_cmd(struct ata_queued_cmd *qc); static void nv_adma_tf_read(struct ata_port *ap, struct ata_taskfile *tf); +static void ncq_error_handler(struct ata_port *ap); +static void nv_mcp55_thaw(struct ata_port *ap); +static void nv_mcp55_freeze(struct ata_port *ap); +static void ncq_host_init(struct ata_host *host); +static void nv_bmdma_stop(struct ata_port *ap); +static int nv_std_qc_defer(struct ata_port *ap); +static int nv_port_start(struct ata_port *ap); +static void nv_port_stop(struct ata_port *ap); +static void ncq_clear(struct ata_port *ap); +static void nv_qc_prep(struct ata_queued_cmd *qc); +static void nv_fill_sg(struct ata_queued_cmd *qc); +static void ncq_sactive_start (struct ata_queued_cmd *qc); +static u32 ncq_sactive_value (struct ata_port *ap); +static unsigned int nv_qc_issue_prot(struct ata_queued_cmd *qc); +static u32 ncq_tag_value(struct ata_port *ap); +static int nv_ncqintr_sdbfis(struct ata_port *ap); +static int nv_ncqintr_dmasetupfis(struct ata_port *ap); +static void ncq_clear_singlefis(struct ata_port *ap, u32 val); +static u32 ncq_ownfisintr_value (struct ata_port *ap); +void ncq_hotplug(struct ata_port *ap, u32 fis); +static irqreturn_t nv_mcp55_interrupt(int irq, void *dev_instance); +static int ncq_interrupt(struct ata_port *ap, u32 fis); +static int nv_scsi_queuecmd(struct scsi_cmnd *cmd, + void (*done)(struct scsi_cmnd *)); These functions should use mcp51 or swncq or something in the name instead of ncq, since the latter implies it may be related to ADMA as well. + +#undef NCQ_DEBUG +#undef NCQ_VERBOSE_DEBUG +#ifdef NCQ_DEBUG +#define NPRINTK(fmt, args...) printk(KERN_ERR %s: fmt, __FUNCTION__, ## args) +#ifdef NCQ_VERBOSE_DEBUG +#define NVPRINTK(fmt, args...) printk(KERN_ERR %s: fmt, __FUNCTION__, ## args) +#else +#define NVPRINTK(fmt, args...) do { } while(0) +#endif /* NCQ_VERBOSE_DEBUG */ +#else +#define NPRINTK(fmt, args...) do { } while(0) +#define NVPRINTK(fmt, args...) do { } while(0) +#endif We don't need these private helper macros, just use
Re: [PATCH] drivers/ata: Add the SW NCQ support to sata_nv for MCP51/MCP55/MCP61
Zoltan Boszormenyi wrote: Hi, thanks for publishing this. Add the Software NCQ support to sata_nv.c for MCP51/MCP55/MCP61 SATA controller. This patch base on sata_nv.c file from kernel 2.6.22-rc1 See attachment for the patch. Signed-off-by: Kuan Luo [EMAIL PROTECTED] Signed-off-by: Peer Chen [EMAIL PROTECTED] == See attached file. == However, I saw this in the patch: + /* determine if physical DMA addr spans 64K boundary. +* Note h/w doesn't support 64-bit, so we unconditionally +* truncate dma_addr_t to u32. +*/ + addr = (u32) sg_dma_address(sg); Does it mean that I can't upgrade my machine to 4 GB or more without losing NCQ or risking data corruption? Can the code be made IOMMU-aware? That shouldn't be a problem, libata default DMA mask is 32 bits (which isn't overridden with this controller) and so the block layer will bounce any data being read/written above that point with IOMMU or swiotlb. The comment is a bit unnecessarily scary. -- Robert Hancock Saskatoon, SK, Canada To email, remove nospam from [EMAIL PROTECTED] Home Page: http://www.roberthancock.com/ - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [announce] Intel announces the PowerTOP utility for Linux
Looks like the radeon driver has the same problem as the i915 driver mentioned on the known problems page - I get 60 wakeups/sec from it on my Compaq X1000 laptop (Radeon 9000 graphics) while in X, which essentially prevents entry into C3. -- Robert Hancock Saskatoon, SK, Canada To email, remove "nospam" from [EMAIL PROTECTED] Home Page: http://www.roberthancock.com/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH] libata: add human-readable error value decoding (v2)
This adds human-readable decoding of the ATA status and error registers (similar to what drivers/ide does) as well as the SATA Serror register to libata error handling output. This prevents the need to pore through standards documents to figure out the meaning of the bits in these registers when looking at error reports. Some bits that drivers/ide decoded are not decoded here, since the bits are either command-dependent or obsolete, and properly parsing them would add too much complexity. This version reduces the length of the SError parsed output strings relative to the previous version of this patch. Signed-off-by: Robert Hancock <[EMAIL PROTECTED]> --- linux-2.6.21.1/drivers/ata/libata-eh.c 2007-04-27 15:49:26.0 -0600 +++ linux-2.6.21.1edit/drivers/ata/libata-eh.c 2007-05-14 17:38:35.0 -0600 @@ -1523,6 +1523,27 @@ static void ata_eh_report(struct ata_por ata_port_printk(ap, KERN_ERR, "(%s)\n", desc); } + if (ehc->i.serror) + ata_port_printk(ap, KERN_ERR, + "SError: {%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s}\n", + ehc->i.serror & SERR_DATA_RECOVERED ? "RecovData " : "", + ehc->i.serror & SERR_COMM_RECOVERED ? "RecovComm " : "", + ehc->i.serror & SERR_DATA ? "UnrecovData " : "", + ehc->i.serror & SERR_PERSISTENT ? "Persist " : "", + ehc->i.serror & SERR_PROTOCOL ? "Proto " : "", + ehc->i.serror & SERR_INTERNAL ? "HostInt " : "", + ehc->i.serror & SERR_PHYRDY_CHG ? "PHYRdyChg " : "", + ehc->i.serror & SERR_PHY_INT_ERR ? "PHYInt " : "", + ehc->i.serror & SERR_COMM_WAKE ? "CommWake " : "", + ehc->i.serror & SERR_10B_8B_ERR ? "10B8B " : "", + ehc->i.serror & SERR_DISPARITY ? "Dispar " : "", + ehc->i.serror & SERR_CRC ? "BadCRC " : "", + ehc->i.serror & SERR_HANDSHAKE ? "Handshk " : "", + ehc->i.serror & SERR_LINK_SEQ_ERR ? "LinkSeq " : "", + ehc->i.serror & SERR_TRANS_ST_ERROR ? "TrStaTrns " : "", + ehc->i.serror & SERR_UNRECOG_FIS ? "UnrecFIS " : "", + ehc->i.serror & SERR_DEV_XCHG ? "DevExch " : "" ); + for (tag = 0; tag < ATA_MAX_QUEUE; tag++) { static const char *dma_str[] = { [DMA_BIDIRECTIONAL] = "bidi", @@ -1552,6 +1573,29 @@ static void ata_eh_report(struct ata_por res->hob_feature, res->hob_nsect, res->hob_lbal, res->hob_lbam, res->hob_lbah, res->device, qc->err_mask, ata_err_string(qc->err_mask)); + + if (res->command & (ATA_BUSY | ATA_DRDY | ATA_DF | ATA_DRQ | + ATA_ERR) ) { + if (res->command & ATA_BUSY) + ata_dev_printk(qc->dev, KERN_ERR, + "status: {Busy}\n" ); + else + ata_dev_printk(qc->dev, KERN_ERR, + "status: {%s%s%s%s}\n", + res->command & ATA_DRDY ? "DRDY " : "", + res->command & ATA_DF ? "DF " : "", + res->command & ATA_DRQ ? "DRQ " : "", + res->command & ATA_ERR ? "ERR " : "" ); + } + + if (cmd->command != ATA_CMD_PACKET && + (res->feature & (ATA_ICRC | ATA_UNC | ATA_IDNF | ATA_ABORTED))) + ata_dev_printk(qc->dev, KERN_ERR, + "error: {%s%s%s%s}\n", + res->feature & ATA_ICRC ? "ICRC " : "", + res->feature & ATA_UNC ? "UNC " : "", + res->feature & ATA_IDNF ? "IDNF " : "", + res->feature & ATA_ABORTED ? "ABRT " : "" ); } } --- linux-2.6.21.1/include/linux/ata.h 2007-04-27 15:49:26.0 -0600 +++ linux-2.6.21.1edit/include/linux/ata.h 2007-05
[PATCH] libata: add human-readable error value decoding (v2)
This adds human-readable decoding of the ATA status and error registers (similar to what drivers/ide does) as well as the SATA Serror register to libata error handling output. This prevents the need to pore through standards documents to figure out the meaning of the bits in these registers when looking at error reports. Some bits that drivers/ide decoded are not decoded here, since the bits are either command-dependent or obsolete, and properly parsing them would add too much complexity. This version reduces the length of the SError parsed output strings relative to the previous version of this patch. Signed-off-by: Robert Hancock [EMAIL PROTECTED] --- linux-2.6.21.1/drivers/ata/libata-eh.c 2007-04-27 15:49:26.0 -0600 +++ linux-2.6.21.1edit/drivers/ata/libata-eh.c 2007-05-14 17:38:35.0 -0600 @@ -1523,6 +1523,27 @@ static void ata_eh_report(struct ata_por ata_port_printk(ap, KERN_ERR, (%s)\n, desc); } + if (ehc-i.serror) + ata_port_printk(ap, KERN_ERR, + SError: {%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s}\n, + ehc-i.serror SERR_DATA_RECOVERED ? RecovData : , + ehc-i.serror SERR_COMM_RECOVERED ? RecovComm : , + ehc-i.serror SERR_DATA ? UnrecovData : , + ehc-i.serror SERR_PERSISTENT ? Persist : , + ehc-i.serror SERR_PROTOCOL ? Proto : , + ehc-i.serror SERR_INTERNAL ? HostInt : , + ehc-i.serror SERR_PHYRDY_CHG ? PHYRdyChg : , + ehc-i.serror SERR_PHY_INT_ERR ? PHYInt : , + ehc-i.serror SERR_COMM_WAKE ? CommWake : , + ehc-i.serror SERR_10B_8B_ERR ? 10B8B : , + ehc-i.serror SERR_DISPARITY ? Dispar : , + ehc-i.serror SERR_CRC ? BadCRC : , + ehc-i.serror SERR_HANDSHAKE ? Handshk : , + ehc-i.serror SERR_LINK_SEQ_ERR ? LinkSeq : , + ehc-i.serror SERR_TRANS_ST_ERROR ? TrStaTrns : , + ehc-i.serror SERR_UNRECOG_FIS ? UnrecFIS : , + ehc-i.serror SERR_DEV_XCHG ? DevExch : ); + for (tag = 0; tag ATA_MAX_QUEUE; tag++) { static const char *dma_str[] = { [DMA_BIDIRECTIONAL] = bidi, @@ -1552,6 +1573,29 @@ static void ata_eh_report(struct ata_por res-hob_feature, res-hob_nsect, res-hob_lbal, res-hob_lbam, res-hob_lbah, res-device, qc-err_mask, ata_err_string(qc-err_mask)); + + if (res-command (ATA_BUSY | ATA_DRDY | ATA_DF | ATA_DRQ | + ATA_ERR) ) { + if (res-command ATA_BUSY) + ata_dev_printk(qc-dev, KERN_ERR, + status: {Busy}\n ); + else + ata_dev_printk(qc-dev, KERN_ERR, + status: {%s%s%s%s}\n, + res-command ATA_DRDY ? DRDY : , + res-command ATA_DF ? DF : , + res-command ATA_DRQ ? DRQ : , + res-command ATA_ERR ? ERR : ); + } + + if (cmd-command != ATA_CMD_PACKET + (res-feature (ATA_ICRC | ATA_UNC | ATA_IDNF | ATA_ABORTED))) + ata_dev_printk(qc-dev, KERN_ERR, + error: {%s%s%s%s}\n, + res-feature ATA_ICRC ? ICRC : , + res-feature ATA_UNC ? UNC : , + res-feature ATA_IDNF ? IDNF : , + res-feature ATA_ABORTED ? ABRT : ); } } --- linux-2.6.21.1/include/linux/ata.h 2007-04-27 15:49:26.0 -0600 +++ linux-2.6.21.1edit/include/linux/ata.h 2007-05-09 19:25:54.0 -0600 @@ -223,6 +223,15 @@ enum { SERR_PROTOCOL = (1 10), /* protocol violation */ SERR_INTERNAL = (1 11), /* host internal error */ SERR_PHYRDY_CHG = (1 16), /* PHY RDY changed */ + SERR_PHY_INT_ERR= (1 17), /* PHY internal error */ + SERR_COMM_WAKE = (1 18), /* Comm wake */ + SERR_10B_8B_ERR = (1 19), /* 10b to 8b decode error */ + SERR_DISPARITY = (1 20), /* Disparity */ + SERR_CRC= (1 21), /* CRC error */ + SERR_HANDSHAKE = (1 22), /* Handshake error */ + SERR_LINK_SEQ_ERR = (1 23), /* Link sequence error */ + SERR_TRANS_ST_ERROR = (1 24), /* Transport state transition error */ + SERR_UNRECOG_FIS= (1 25), /* Unrecognized FIS */ SERR_DEV_XCHG = (1 26), /* device exchanged */ /* struct ata_taskfile flags */ - To unsubscribe from
Re: [announce] Intel announces the PowerTOP utility for Linux
Looks like the radeon driver has the same problem as the i915 driver mentioned on the known problems page - I get 60 wakeups/sec from it on my Compaq X1000 laptop (Radeon 9000 graphics) while in X, which essentially prevents entry into C3. -- Robert Hancock Saskatoon, SK, Canada To email, remove nospam from [EMAIL PROTECTED] Home Page: http://www.roberthancock.com/ - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Workaround for a PCI restoring bug
Lukas Hejtmanek wrote: Hello, as of 2.6.21-git16, the bugs related to restoring PCI are still present. The save pci function reads only -1 from the PCI config space and when restoring, it messes up totaly most PCI devices. The attached patch is workaround only until proper fix is found and included. Could it be included into the mainline for now? It's not really a fix, that value might be legitimately supposed to be in the config space. Sounds like some driver is disabling the device before saving the state or something.. -- Robert Hancock Saskatoon, SK, Canada To email, remove "nospam" from [EMAIL PROTECTED] Home Page: http://www.roberthancock.com/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [2.6.21.1] SATA freeze
Fred Moyer wrote: This appears to be a different problem. Something is issuing SMART-related commands (smartd or smartctl perhaps) which the drive seems to be reacting strangely to. It apparently completed the command but never raised DRQ to request any data being transferred even though we expected it to. Maybe SMART is disabled on the drive and that's causing it to just toss these commands? CCing linux-ide in case anyone knows what would cause this. Here's smartctl -a for this drive - same output for both sda and sdb. Smartd is currently running. Any advice appreciated. Previously on 2.6.15 I was seeing sdb remount as readonly under heavy i/o. I have not seen that issue yet with 2.6.21 (with Robert's patch from May 5th for sata_nv), but that occurrence of remounts read-only was infrequently, so that issue may be solved. app2 ~ # smartctl -a /dev/sda smartctl version 5.36 [x86_64-pc-linux-gnu] Copyright (C) 2002-6 Bruce Allen Home page is http://smartmontools.sourceforge.net/ Device: ATA ST3808110AS Version: n/a Serial number: 5LR8895K Device type: disk Local Time is: Sat May 12 12:05:58 2007 PDT Device does not support SMART Error Counter logging not supported [GLTSD (Global Logging Target Save Disable) set. Enable Save with '-S on'] Device does not support Self Test logging Sounds like SMART is likely disabled on that drive. You can try doing "smartctl -s on /dev/sda" and see if that will turn it on. -- Robert Hancock Saskatoon, SK, Canada To email, remove "nospam" from [EMAIL PROTECTED] Home Page: http://www.roberthancock.com/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [2.6.21.1] SATA freeze
Fred Moyer wrote: I just joined the list today so apologies if this email breaks any email client post threading. I have been seeing similar errors on two different systems. I applied Robert's sata_nv patch posted to the list on May 5th, and approved today by Jeff Garzik. I've taken several steps to insure that this isn't a faulty cable or drive issue. This is running on a hp dl145g2. Here is my lspci, dmesg, and relevant kernel config sections: (snip) ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen ata1.00: cmd b0/d2:f1:00:4f:c2/00:00:00:00:00/00 tag 0 cdb 0x0 data 123392 in res 50/00:f1:00:4f:c2/00:00:00:00:00/00 Emask 0x202 (HSM violation) ata1: soft resetting port ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300) ata1.00: configured for UDMA/100 ata1: EH complete This appears to be a different problem. Something is issuing SMART-related commands (smartd or smartctl perhaps) which the drive seems to be reacting strangely to. It apparently completed the command but never raised DRQ to request any data being transferred even though we expected it to. Maybe SMART is disabled on the drive and that's causing it to just toss these commands? CCing linux-ide in case anyone knows what would cause this. -- Robert Hancock Saskatoon, SK, Canada To email, remove "nospam" from [EMAIL PROTECTED] Home Page: http://www.roberthancock.com/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [2.6.21.1] SATA freeze
Gerhard Mack wrote: On Wed, 9 May 2007, Robert Hancock wrote: Gerhard Mack wrote: On Wed, 9 May 2007, Jeff Garzik wrote: Gerhard Mack wrote: May 9 14:51:35 mgerhard kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x180 action 0x2 frozen May 9 14:51:35 mgerhard kernel: ata1.00: cmd 35/00:00:80:6d:c8/00:04:09:00:00/e0 tag 0 cdb 0x0 data 524288 out May 9 14:51:35 mgerhard kernel: res 40/00:c8:68:65:c8/84:00:09:00:00/e0 Emask 0x4 (timeout) May 9 14:51:42 mgerhard kernel: ata1: port is slow to respond, please be patient (Status 0xd0) Anything I can do to figgure out what's causing this? You're showing various flags set in the SError register, which suggests you're having SATA communication problems with the drive. A bad SATA cable or power problems would be a strong possibility. It really would be nice if we decoded these things more usefully for the user (same with the regular ATA errors, like drivers/ide does), but in general SError showing up as non-zero is a bad thing: 0x40 = "Handshake error: When set to one, this bit indicates that one or more R_ERR handshake response was received in response to frame transmission. Such errors may be the result of a CRC error detected by the recipient, a disparity or 10b/8b decoding error, or other error condition leading to a negative handshake on a transmitted frame." 0x180 = "Link Sequence Error: When set to one, this bit indicates that one or more Link state machine error conditions was encountered since the last time this bit was cleared. The Link Layer state machine defines the conditions under which the link layer detects an erroneous transition." and "Transport state transition error: When set to one, this bit indicates that an error has occurred in the transition from one state to another within the Transport layer since the last time this bit was cleared." Just out of curiosity how often is that bit cleared? I believe that is cleared only on error handling or controller reset, so it just means that it happened sometime since boot or the last libata error recovery. -- Robert Hancock Saskatoon, SK, Canada To email, remove "nospam" from [EMAIL PROTECTED] Home Page: http://www.roberthancock.com/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] "volatile considered harmful" document
Bill Davidsen wrote: Jonathan Corbet wrote: +There are still a few rare situations where volatile makes sense in the +kernel: + + - The above-mentioned accessor functions might use volatile on +architectures where direct I/O memory access does work. Essentially, +each accessor call becomes a little critical section on its own and +ensures that the access happens as expected by the programmer. + + - Inline assembly code which changes memory, but which has no other +visible side effects, risks being deleted by GCC. Adding the volatile +keyword to asm statements will prevent this removal. + + - The jiffies variable is special in that it can have a different value +every time it is referenced, but it can be read without any special +locking. So jiffies can be volatile, but the addition of other +variables of this type is frowned upon. Jiffies is considered to be a +"stupid legacy" issue in this regard. It would seem that any variable which is (a) subject to change by other threads or hardware, and (b) the value of which is going to be used without writing the variable, would be a valid use for volatile. You don't need volatile in that case, rmb() can be used. -- Robert Hancock Saskatoon, SK, Canada To email, remove "nospam" from [EMAIL PROTECTED] Home Page: http://www.roberthancock.com/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] volatile considered harmful document
Bill Davidsen wrote: Jonathan Corbet wrote: +There are still a few rare situations where volatile makes sense in the +kernel: + + - The above-mentioned accessor functions might use volatile on +architectures where direct I/O memory access does work. Essentially, +each accessor call becomes a little critical section on its own and +ensures that the access happens as expected by the programmer. + + - Inline assembly code which changes memory, but which has no other +visible side effects, risks being deleted by GCC. Adding the volatile +keyword to asm statements will prevent this removal. + + - The jiffies variable is special in that it can have a different value +every time it is referenced, but it can be read without any special +locking. So jiffies can be volatile, but the addition of other +variables of this type is frowned upon. Jiffies is considered to be a +stupid legacy issue in this regard. It would seem that any variable which is (a) subject to change by other threads or hardware, and (b) the value of which is going to be used without writing the variable, would be a valid use for volatile. You don't need volatile in that case, rmb() can be used. -- Robert Hancock Saskatoon, SK, Canada To email, remove nospam from [EMAIL PROTECTED] Home Page: http://www.roberthancock.com/ - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [2.6.21.1] SATA freeze
Gerhard Mack wrote: On Wed, 9 May 2007, Robert Hancock wrote: Gerhard Mack wrote: On Wed, 9 May 2007, Jeff Garzik wrote: Gerhard Mack wrote: May 9 14:51:35 mgerhard kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x180 action 0x2 frozen May 9 14:51:35 mgerhard kernel: ata1.00: cmd 35/00:00:80:6d:c8/00:04:09:00:00/e0 tag 0 cdb 0x0 data 524288 out May 9 14:51:35 mgerhard kernel: res 40/00:c8:68:65:c8/84:00:09:00:00/e0 Emask 0x4 (timeout) May 9 14:51:42 mgerhard kernel: ata1: port is slow to respond, please be patient (Status 0xd0) Anything I can do to figgure out what's causing this? You're showing various flags set in the SError register, which suggests you're having SATA communication problems with the drive. A bad SATA cable or power problems would be a strong possibility. It really would be nice if we decoded these things more usefully for the user (same with the regular ATA errors, like drivers/ide does), but in general SError showing up as non-zero is a bad thing: 0x40 = Handshake error: When set to one, this bit indicates that one or more R_ERR handshake response was received in response to frame transmission. Such errors may be the result of a CRC error detected by the recipient, a disparity or 10b/8b decoding error, or other error condition leading to a negative handshake on a transmitted frame. 0x180 = Link Sequence Error: When set to one, this bit indicates that one or more Link state machine error conditions was encountered since the last time this bit was cleared. The Link Layer state machine defines the conditions under which the link layer detects an erroneous transition. and Transport state transition error: When set to one, this bit indicates that an error has occurred in the transition from one state to another within the Transport layer since the last time this bit was cleared. Just out of curiosity how often is that bit cleared? I believe that is cleared only on error handling or controller reset, so it just means that it happened sometime since boot or the last libata error recovery. -- Robert Hancock Saskatoon, SK, Canada To email, remove nospam from [EMAIL PROTECTED] Home Page: http://www.roberthancock.com/ - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [2.6.21.1] SATA freeze
Fred Moyer wrote: I just joined the list today so apologies if this email breaks any email client post threading. I have been seeing similar errors on two different systems. I applied Robert's sata_nv patch posted to the list on May 5th, and approved today by Jeff Garzik. I've taken several steps to insure that this isn't a faulty cable or drive issue. This is running on a hp dl145g2. Here is my lspci, dmesg, and relevant kernel config sections: (snip) ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen ata1.00: cmd b0/d2:f1:00:4f:c2/00:00:00:00:00/00 tag 0 cdb 0x0 data 123392 in res 50/00:f1:00:4f:c2/00:00:00:00:00/00 Emask 0x202 (HSM violation) ata1: soft resetting port ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 300) ata1.00: configured for UDMA/100 ata1: EH complete This appears to be a different problem. Something is issuing SMART-related commands (smartd or smartctl perhaps) which the drive seems to be reacting strangely to. It apparently completed the command but never raised DRQ to request any data being transferred even though we expected it to. Maybe SMART is disabled on the drive and that's causing it to just toss these commands? CCing linux-ide in case anyone knows what would cause this. -- Robert Hancock Saskatoon, SK, Canada To email, remove nospam from [EMAIL PROTECTED] Home Page: http://www.roberthancock.com/ - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [2.6.21.1] SATA freeze
Fred Moyer wrote: This appears to be a different problem. Something is issuing SMART-related commands (smartd or smartctl perhaps) which the drive seems to be reacting strangely to. It apparently completed the command but never raised DRQ to request any data being transferred even though we expected it to. Maybe SMART is disabled on the drive and that's causing it to just toss these commands? CCing linux-ide in case anyone knows what would cause this. Here's smartctl -a for this drive - same output for both sda and sdb. Smartd is currently running. Any advice appreciated. Previously on 2.6.15 I was seeing sdb remount as readonly under heavy i/o. I have not seen that issue yet with 2.6.21 (with Robert's patch from May 5th for sata_nv), but that occurrence of remounts read-only was infrequently, so that issue may be solved. app2 ~ # smartctl -a /dev/sda smartctl version 5.36 [x86_64-pc-linux-gnu] Copyright (C) 2002-6 Bruce Allen Home page is http://smartmontools.sourceforge.net/ Device: ATA ST3808110AS Version: n/a Serial number: 5LR8895K Device type: disk Local Time is: Sat May 12 12:05:58 2007 PDT Device does not support SMART Error Counter logging not supported [GLTSD (Global Logging Target Save Disable) set. Enable Save with '-S on'] Device does not support Self Test logging Sounds like SMART is likely disabled on that drive. You can try doing smartctl -s on /dev/sda and see if that will turn it on. -- Robert Hancock Saskatoon, SK, Canada To email, remove nospam from [EMAIL PROTECTED] Home Page: http://www.roberthancock.com/ - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] Workaround for a PCI restoring bug
Lukas Hejtmanek wrote: Hello, as of 2.6.21-git16, the bugs related to restoring PCI are still present. The save pci function reads only -1 from the PCI config space and when restoring, it messes up totaly most PCI devices. The attached patch is workaround only until proper fix is found and included. Could it be included into the mainline for now? It's not really a fix, that value might be legitimately supposed to be in the config space. Sounds like some driver is disabling the device before saving the state or something.. -- Robert Hancock Saskatoon, SK, Canada To email, remove nospam from [EMAIL PROTECTED] Home Page: http://www.roberthancock.com/ - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] libata: add human-readable error value decoding
Tejun Heo wrote: Chuck Ebbert wrote: Robert Hancock wrote: + ehc->i.serror & SERR_TRANS_ST_ERROR ? "TransStatTransErr " : "", + ehc->i.serror & SERR_UNRECOG_FIS ? "UnrecogFIS " : "", + ehc->i.serror & SERR_DEV_XCHG ? "DevExchanged " : "" ); I'm not really convinced whether this is necessary. The human readable form is also a bit cryptic and can get quite long. So, mild NACK from me. It certainly seems useful when debugging hotplug issues or random SATA problems which end up being caused by communication problems. Without this output, Joe User stands no chance of figuring out what's going on, and neither does Joe libata Developer unless they really care to dig through the spec and count bits to figure out what they mean. At least with this you can see that there was a CRC error, etc. and go from that.. Why not just document the error messages? And the scsi ones too, I can't seem to find what the sense codes mean. They are well documented elsewhere - the standard documents. For sense codes, t10.org. For SError bits, t13.org. You can get drafts free of charge. The ATA ones are more of a pain in that regard than SCSI though - SCSI has all distinct error codes for different errors, whereas ATA has bitmasks for everything.. -- Robert Hancock Saskatoon, SK, Canada To email, remove "nospam" from [EMAIL PROTECTED] Home Page: http://www.roberthancock.com/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] libata: add human-readable error value decoding
Tejun Heo wrote: Chuck Ebbert wrote: Robert Hancock wrote: + ehc-i.serror SERR_TRANS_ST_ERROR ? TransStatTransErr : , + ehc-i.serror SERR_UNRECOG_FIS ? UnrecogFIS : , + ehc-i.serror SERR_DEV_XCHG ? DevExchanged : ); I'm not really convinced whether this is necessary. The human readable form is also a bit cryptic and can get quite long. So, mild NACK from me. It certainly seems useful when debugging hotplug issues or random SATA problems which end up being caused by communication problems. Without this output, Joe User stands no chance of figuring out what's going on, and neither does Joe libata Developer unless they really care to dig through the spec and count bits to figure out what they mean. At least with this you can see that there was a CRC error, etc. and go from that.. Why not just document the error messages? And the scsi ones too, I can't seem to find what the sense codes mean. They are well documented elsewhere - the standard documents. For sense codes, t10.org. For SError bits, t13.org. You can get drafts free of charge. The ATA ones are more of a pain in that regard than SCSI though - SCSI has all distinct error codes for different errors, whereas ATA has bitmasks for everything.. -- Robert Hancock Saskatoon, SK, Canada To email, remove nospam from [EMAIL PROTECTED] Home Page: http://www.roberthancock.com/ - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] libata: add human-readable error value decoding
Jeff Garzik wrote: Mark Lord wrote: If we're compiling the messages into the kernel regardless, then it doesn't really make much sense to NOT show all of them on the error paths. Not true. Uncontrolled message spewage inevitably results in critical information scrolling off the screen, before a user can take a digital photo of the output... Or of users being confused by subsequent error fallout (i.e. multiple oopses reporting problem). Moderation and restraint still have roles to play... :) Jeff I don't think this is as big of a deal here as in other cases, like oops output. With libata errors, if they're at the console (which they'd have to be to see these messages), unless something has actually caused a panic the scrollback buffer should still be functional and they'd be able to see the entire output.. -- Robert Hancock Saskatoon, SK, Canada To email, remove "nospam" from [EMAIL PROTECTED] Home Page: http://www.roberthancock.com/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH] libata: add human-readable error value decoding
Tejun Heo wrote: +if (ehc->i.serror) +ata_port_printk(ap, KERN_ERR, + "SError: {%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s%s}\n", + ehc->i.serror & SERR_DATA_RECOVERED ? "RecovDataErr " : "", + ehc->i.serror & SERR_COMM_RECOVERED ? "RecovCommErr " : "", + ehc->i.serror & SERR_DATA ? "UnrecovDataErr " : "", + ehc->i.serror & SERR_PERSISTENT ? "PersistErr " : "", + ehc->i.serror & SERR_PROTOCOL ? "ProtocolErr " : "", + ehc->i.serror & SERR_INTERNAL ? "HostInternalErr " : "", + ehc->i.serror & SERR_PHYRDY_CHG ? "PHYRdyChg " : "", + ehc->i.serror & SERR_PHY_INT_ERR ? "PHYInternalErr " : "", + ehc->i.serror & SERR_COMM_WAKE ? "CommWake " : "", + ehc->i.serror & SERR_10B_8B_ERR ? "10B8BErr " : "", + ehc->i.serror & SERR_DISPARITY ? "Disparity " : "", + ehc->i.serror & SERR_CRC ? "CRCErr " : "", + ehc->i.serror & SERR_HANDSHAKE ? "HandshakeErr " : "", + ehc->i.serror & SERR_LINK_SEQ_ERR ? "LinkSeqErr " : "", + ehc->i.serror & SERR_TRANS_ST_ERROR ? "TransStatTransErr " : "", + ehc->i.serror & SERR_UNRECOG_FIS ? "UnrecogFIS " : "", + ehc->i.serror & SERR_DEV_XCHG ? "DevExchanged " : "" ); I'm not really convinced whether this is necessary. The human readable form is also a bit cryptic and can get quite long. So, mild NACK from me. It certainly seems useful when debugging hotplug issues or random SATA problems which end up being caused by communication problems. Without this output, Joe User stands no chance of figuring out what's going on, and neither does Joe libata Developer unless they really care to dig through the spec and count bits to figure out what they mean. At least with this you can see that there was a CRC error, etc. and go from that.. -- Robert Hancock Saskatoon, SK, Canada To email, remove "nospam" from [EMAIL PROTECTED] Home Page: http://www.roberthancock.com/ - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/