Re: [PATCH 4/5] atmel_serial: Split the interrupt handler
On Tue, 18 Dec 2007, Haavard Skinnemoen wrote:

> On Tue, 18 Dec 2007 18:06:14 +0100
> Haavard Skinnemoen <[EMAIL PROTECTED]> wrote:
>
> > From: Remy Bohmer <[EMAIL PROTECTED]>
>
> Heh. That's obviously wrong. Wonder what happened there?
>
> Looks like Chip's address got mangled too.

You can find me at <[EMAIL PROTECTED]> or <[EMAIL PROTECTED]> these days, although <[EMAIL PROTECTED]> still works for the time being.

Chip

--
Charles M. "Chip" Coldwell
Senior Software Engineer
Red Hat, Inc
978-392-2426
GPG ID: 852E052F
GPG FPR: 77E5 2B51 4907 F08A 7E92 DE80 AFA9 9A8F 852E 052F

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 5/5] atmel_serial: Add DMA support
On Tue, 18 Dec 2007, Haavard Skinnemoen wrote:

> From: Chip Coldwell <[EMAIL PROTECTED]>
>
> This patch is based on the DMA-patch by Chip Coldwell for the
> AT91/AT32 serial USARTS, with some tweaks to make it apply neatly on
> top of the other patches in this series.
>
> The RX code has been moved to a tasklet and reworked a bit. Instead of
> depending on the ENDRX and TIMEOUT bits in CSR, we simply grab as much
> data as we can from the DMA buffers. I think this closes a race where
> the ENDRX bit is set after we read CSR but before we read RPR,
> although I haven't confirmed this.
>
> This also fixes a DMA sync bug in the original patch.
>
> [EMAIL PROTECTED]: rebased onto irq-splitup patch]
> [EMAIL PROTECTED]: moved to tasklet, fixed dma bug, misc cleanups]
> Signed-off-by: Remy Bohmer <[EMAIL PROTECTED]>
> Signed-off-by: Haavard Skinnemoen <[EMAIL PROTECTED]>
> ---
>  drivers/serial/atmel_serial.c | 386 ++--
>  1 files changed, 366 insertions(+), 20 deletions(-)
>
> diff --git a/drivers/serial/atmel_serial.c b/drivers/serial/atmel_serial.c
> index 990d3ab..07c2734 100644
> --- a/drivers/serial/atmel_serial.c
> +++ b/drivers/serial/atmel_serial.c
> @@ -7,6 +7,8 @@
>   * Based on drivers/char/serial_sa1100.c, by Deep Blue Solutions Ltd.
>   * Based on drivers/char/serial.c, by Linus Torvalds, Theodore Ts'o.
>   *
> + * DMA support added by Chip Coldwell.

I will ACK/Sign-off on this soon; I just want to do some tests on real hardware first.

Chip

--
Charles M. Coldwell
"Turn on, log in, tune out"
Somerville, Massachusetts, New England
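The quoted patch description says the reworked RX path no longer keys off the ENDRX and TIMEOUT status bits and instead drains whatever the DMA engine has already written. A minimal sketch of that idea follows, assuming a single circular buffer; `struct rx_ring`, `rx_ring_drain()` and the `hw_head` argument are hypothetical names for illustration and are not taken from the driver, which may organise its buffers differently.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a circular RX DMA buffer (not the driver's types). */
struct rx_ring {
    unsigned char *buf;  /* memory the DMA engine writes into */
    size_t size;         /* buffer length in bytes */
    size_t tail;         /* first byte the CPU has not consumed yet */
};

/*
 * Copy every byte between ring->tail and hw_head into out, handling
 * wraparound. hw_head stands in for an offset derived from the DMA
 * pointer register and must be in the range [0, ring->size).
 * Returns the number of bytes copied.
 */
size_t rx_ring_drain(struct rx_ring *ring, size_t hw_head,
                     unsigned char *out, size_t out_len)
{
    size_t copied = 0;

    while (ring->tail != hw_head && copied < out_len) {
        /* Contiguous run: up to hw_head, or up to the end of the buffer. */
        size_t end = (hw_head > ring->tail) ? hw_head : ring->size;
        size_t n = end - ring->tail;

        if (n > out_len - copied)
            n = out_len - copied;
        memcpy(out + copied, ring->buf + ring->tail, n);
        copied += n;
        ring->tail = (ring->tail + n) % ring->size;
    }
    return copied;
}
```

Because the amount of data is derived from the pointer position alone, a status bit that flips between two register reads cannot cause bytes to be skipped in this model; whatever arrives late is picked up by the next call.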
Re: PATCH/RFC: [kdump] fix APIC shutdown sequence
On Wed, 8 Aug 2007, Vivek Goyal wrote:

> On Tue, Aug 07, 2007 at 07:41:30PM +0200, Martin Wilck wrote:
> >
> > Can you explain how, on the front side bus, the IO-APIC knows whether
> > a CPU has accepted the INT message? There is no response
> > to the INT message on the bus, except for the EOI which comes much later.
> > I'm not saying that you're wrong, I just really don't understand this
> > point.
> >
>
> I don't know what is exactly hardware protocol. I am just going by
> intel documentation.

I think it's important to distinguish between the LAPIC receiving an interrupt and the CPU receiving an interrupt. The former could happen without the latter if the CPU has set the TPR above the priority of the interrupt received by the LAPIC. In that case, the interrupt is kept pending in the LAPIC and recorded in the IRR if I understand the Intel documentation correctly.

So I think the scenario which leaves IRR set when the kdump kernel starts is possible.

Chip

--
Charles M. "Chip" Coldwell
Senior Software Engineer
Red Hat, Inc
978-392-2426
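The distinction drawn above between the LAPIC accepting an interrupt and the CPU taking it can be illustrated with a small software model. This is only a sketch: `struct lapic_model` and its helpers are hypothetical, and the model compares the vector's priority class (its upper four bits) against the TPR class alone, ignoring in-service interrupts.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical software model of the relevant local APIC state. */
struct lapic_model {
    uint32_t irr[8];  /* 256 pending bits, one per vector */
    uint8_t tpr;      /* task priority register; bits 7:4 are the class */
};

/* The LAPIC accepts a message: the vector is recorded in the IRR. */
void lapic_accept(struct lapic_model *l, uint8_t vector)
{
    l->irr[vector / 32] |= UINT32_C(1) << (vector % 32);
}

bool lapic_irr_pending(const struct lapic_model *l, uint8_t vector)
{
    return (l->irr[vector / 32] >> (vector % 32)) & 1;
}

/*
 * The CPU is only interrupted when the vector's priority class exceeds
 * the TPR class; otherwise the vector simply stays pending in the IRR.
 */
bool lapic_can_dispatch(const struct lapic_model *l, uint8_t vector)
{
    return lapic_irr_pending(l, vector) && (vector >> 4) > (l->tpr >> 4);
}
```

With `tpr` set to 0xF0, accepting vector 0x31 leaves its IRR bit set while `lapic_can_dispatch()` stays false, which is the state the message argues could survive into the kdump kernel.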
Re: PATCH/RFC: [kdump] fix APIC shutdown sequence
On Tue, 7 Aug 2007, Vivek Goyal wrote:

> On Mon, Aug 06, 2007 at 05:08:05PM +0200, Martin Wilck wrote:
>
> > 1. If, under SMP, the IO-APIC logical destination field is
> >    set by the IRQ balancing code to one of the "other"
> >    CPUs (i.e. not the crashing_cpu), and an IRQ arrives
> >    on the respective pin after that CPU has shut down
> >    its local APIC (but before the IO-APIC pin is masked)
> >    the IRQ message can't be delivered.
>
> Point 1 and Point 2 seems to be same.
>
> >
> > 2. The crashing CPU itself disables its local APIC
> >    before the IO-APIC, leaving a short time window
> >    where the IOAPIC can receive IRQs, but not
> >    deliver them.
> >
>
> I doubt that it would be the issue. Looking at intel IOAPIC (82093AA)
> documentation, it says that IRR bit of IOAPIC will be set only if
> destination CPU has accepted the interrupt.

I think you mean "destination Local APIC has accepted the interrupt" above. The Intel documentation cited above contains this text on page 12:

    For level triggered interrupts, this bit is set to 1 when local APIC(s) accept the level interrupt sent by the IOAPIC. The Remote IRR bit is set to 0 when an EOI message with a matching interrupt vector is received from a local APIC.

The following text is from the IA-64 documentation ...

    Any interrupt that is received by the processor is kept pending and recorded in the Interrupt Request Register (IRR). If the processor is not servicing an interrupt, then the contents of the TPR determines whether the processor will accept a pending interrupt depending on the priority of the interrupt compared to the current TPR. If the interrupt has a higher priority, then the processor is interrupted, otherwise the interrupt is kept pending.

So, I think if the CPU has interrupts disabled, but the Local APIC does not, the IRR could get set. I guess we need to be sure to turn off the Local APIC first before disabling interrupts in the CPU.

Chip

--
Charles M. "Chip" Coldwell
Senior Software Engineer
Red Hat, Inc
978-392-2426
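To make the Remote IRR discussion concrete, here is a sketch of how one could test that bit in a 64-bit IO-APIC redirection table entry. The bit positions (vector in bits 7:0, Remote IRR in bit 14, trigger mode in bit 15) follow the 82093AA layout as I understand it; the macro and function names are hypothetical and no register access is shown.

```c
#include <stdbool.h>
#include <stdint.h>

/* Field positions in a redirection table entry (assumed 82093AA layout). */
#define RTE_VECTOR_MASK   UINT64_C(0xff)
#define RTE_REMOTE_IRR    (UINT64_C(1) << 14)  /* set on accept, cleared by EOI */
#define RTE_TRIGGER_LEVEL (UINT64_C(1) << 15)  /* 1 = level, 0 = edge */

uint8_t rte_vector(uint64_t rte)
{
    return (uint8_t)(rte & RTE_VECTOR_MASK);
}

/*
 * True when a level-triggered entry was accepted by a local APIC and no
 * matching EOI has arrived yet. Remote IRR is only defined for
 * level-triggered entries, so edge entries always report false.
 */
bool rte_awaiting_eoi(uint64_t rte)
{
    return (rte & RTE_TRIGGER_LEVEL) && (rte & RTE_REMOTE_IRR);
}
```

An entry for which `rte_awaiting_eoi()` is still true when the kdump kernel starts is the kind of leftover state this thread is worried about.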
Re: data corruption with nvidia chipsets and IDE/SATA drives (k8 cpu errata needed?)
On Wed, 17 Jan 2007, Andi Kleen wrote:

> On Wednesday 17 January 2007 07:31, Chris Wedgwood wrote:
> > On Tue, Jan 16, 2007 at 08:52:32PM +0100, Christoph Anton Mitterer wrote:
> > > I agree,... it seems drastic, but this is the only really secure
> > > solution.
> >
> > I'd like to hear from Andi how he feels about this? It seems like a
> > somewhat drastic solution in some ways given a lot of hardware doesn't
> > seem to be affected (or maybe in those cases it's just really hard to
> > hit, I don't know).
>
> AMD is looking at the issue. Only Nvidia chipsets seem to be affected,
> although there were similar problems on VIA in the past too.
> Unless a good workaround comes around soon I'll probably default
> to iommu=soft on Nvidia.

We (Sun, AMD, Nvidia and Red Hat) have been testing a patch that seems to solve the problem. AMD and Nvidia analyzed an HDT trace that seemed to indicate that CPU updates of the GATT were still in cache when a subsequent table walk caused by a device load used a stale GATT PTE. That analysis inspired this patch, submitted to this list as an RFC.

It is not obvious (to me, at least) why this problem has only shown up on Nvidia SATA controllers. We are continuing to investigate.

diff --git a/arch/x86_64/kernel/pci-gart.c b/arch/x86_64/kernel/pci-gart.c
index 030eb37..1dd461a 100644
--- a/arch/x86_64/kernel/pci-gart.c
+++ b/arch/x86_64/kernel/pci-gart.c
@@ -69,6 +69,8 @@ static u32 gart_unmapped_entry;
 #define AGPEXTERN
 #endif
 
+#define GATT_CLFLUSH(i) asm volatile ("clflush (%0)" :: "r" (iommu_gatt_base + (i)))
+
 /* backdoor interface to AGP driver */
 AGPEXTERN int agp_memory_reserved;
 AGPEXTERN __u32 *agp_gatt_table;
@@ -221,6 +223,7 @@ static dma_addr_t dma_map_area(struct device *dev, dma_addr_t phys_mem,
     for (i = 0; i < npages; i++) {
         iommu_gatt_base[iommu_page + i] = GPTE_ENCODE(phys_mem);
         SET_LEAK(iommu_page + i);
+        GATT_CLFLUSH(iommu_page + i);
         phys_mem += PAGE_SIZE;
     }
     return iommu_bus_base + iommu_page*PAGE_SIZE + (phys_mem & ~PAGE_MASK);
@@ -348,6 +351,7 @@ static int __dma_map_cont(struct scatterlist *sg, int start, int stopat,
     while (pages--) {
         iommu_gatt_base[iommu_page] = GPTE_ENCODE(addr);
         SET_LEAK(iommu_page);
+        GATT_CLFLUSH(iommu_page);
         addr += PAGE_SIZE;
         iommu_page++;
     }

Chip

--
Charles M. "Chip" Coldwell
Senior Software Engineer
Red Hat, Inc
978-392-2426
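The pattern in the RFC patch, writing a table entry and then flushing its cache line so that a hardware table walk sees the new value, can be tried in user space with the sketch below. Everything here is an illustrative assumption rather than kernel code: `gpte_encode()` is a simplified stand-in for `GPTE_ENCODE()`, `gatt_map()` and `flush_line()` are hypothetical names, pages are assumed to be 4 KiB, and the flush compiles to a no-op on anything other than x86-64.

```c
#include <stddef.h>
#include <stdint.h>

#define GPTE_VALID    1u
#define GPTE_COHERENT 2u

/* Simplified stand-in for the kernel's GPTE_ENCODE(); assumes 4 KiB pages. */
uint32_t gpte_encode(uint64_t phys)
{
    return (uint32_t)(phys & 0xfffff000u) |   /* address bits 31:12 */
           (uint32_t)((phys >> 32) << 4) |    /* address bits 39:32 */
           GPTE_VALID | GPTE_COHERENT;
}

/* Write back and invalidate the cache line that holds *p (x86-64 only). */
static inline void flush_line(volatile void *p)
{
#if defined(__x86_64__)
    __asm__ volatile("clflush %0" : "+m"(*(volatile char *)p));
#else
    (void)p;  /* assumption: no equivalent needed or available here */
#endif
}

/* Map npages consecutive pages starting at phys into gatt[first...]. */
void gatt_map(volatile uint32_t *gatt, size_t first, size_t npages,
              uint64_t phys)
{
    for (size_t i = 0; i < npages; i++) {
        gatt[first + i] = gpte_encode(phys);
        flush_line(&gatt[first + i]);  /* push the new entry out of the cache */
        phys += 4096;
    }
}
```

Flushing per entry mirrors the placement of `GATT_CLFLUSH()` inside both loops of the patch; whether that is sufficient on the affected hardware is exactly what the thread goes on to question.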
Re: data corruption with nvidia chipsets and IDE/SATA drives (k8 cpu errata needed?)
On Thu, 18 Jan 2007, Andi Kleen wrote:

> The Northbridge guarantees coherency over the aperture, but only if
> the caching attributes match.

That's interesting. Makes sense, I suppose.

> You would need to change_page_attr() every kernel address that is mapped into the
> IOMMU to use an uncached aperture. AGP does this, but the frequency of mapping
> for the IOMMU is much higher and it would be prohibitively costly unfortunately.

But it still might be a reasonable thing to do to test the theory that the problem is cache coherency across the graphics aperture, even if it isn't a long-term solution for the problem.

> In the past we saw corruptions from such conflicts, so this is more than just
> theory. I suspect you traded a more easy to trigger corruption with a more
> subtle one.

Yup. That was the inspiration for the script.

Chip

--
Charles M. "Chip" Coldwell
Senior Software Engineer
Red Hat, Inc
978-392-2426
Re: data corruption with nvidia chipsets and IDE/SATA drives (k8 cpu errata needed?)
On Wed, 17 Jan 2007, Chip Coldwell wrote:

> On Wed, 17 Jan 2007, Andi Kleen wrote:
>
> > On Wednesday 17 January 2007 07:31, Chris Wedgwood wrote:
> > > On Tue, Jan 16, 2007 at 08:52:32PM +0100, Christoph Anton Mitterer wrote:
> > > > I agree,... it seems drastic, but this is the only really secure
> > > > solution.
> > >
> > > I'd like to hear from Andi how he feels about this? It seems like a
> > > somewhat drastic solution in some ways given a lot of hardware doesn't
> > > seem to be affected (or maybe in those cases it's just really hard to
> > > hit, I don't know).
> >
> > AMD is looking at the issue. Only Nvidia chipsets seem to be affected,
> > although there were similar problems on VIA in the past too.
> > Unless a good workaround comes around soon I'll probably default
> > to iommu=soft on Nvidia.
>
> We've just verified that configuring the graphics aperture to be
> write-combining instead of write-back using an MTRR also solves the
> problem. It appears to be a cache incoherency issue in the graphics
> aperture.

I take it back. Further testing has revealed that this does not solve the problem.

Chip

--
Charles M. "Chip" Coldwell
Senior Software Engineer
Red Hat, Inc
978-392-2426
Re: data corruption with nvidia chipsets and IDE/SATA drives (k8 cpu errata needed?)
On Wed, 17 Jan 2007, Andi Kleen wrote:

> On Wednesday 17 January 2007 07:31, Chris Wedgwood wrote:
> > On Tue, Jan 16, 2007 at 08:52:32PM +0100, Christoph Anton Mitterer wrote:
> > > I agree,... it seems drastic, but this is the only really secure
> > > solution.
> >
> > I'd like to hear from Andi how he feels about this? It seems like a
> > somewhat drastic solution in some ways given a lot of hardware doesn't
> > seem to be affected (or maybe in those cases it's just really hard to
> > hit, I don't know).
>
> AMD is looking at the issue. Only Nvidia chipsets seem to be affected,
> although there were similar problems on VIA in the past too.
> Unless a good workaround comes around soon I'll probably default
> to iommu=soft on Nvidia.
>
> -Andi

We've just verified that configuring the graphics aperture to be write-combining instead of write-back using an MTRR also solves the problem. It appears to be a cache incoherency issue in the graphics aperture.

This script does the trick:

[ -- cut here -- ]
#!/bin/bash

# Read the northbridge offset 0x90 to get the size of the aperture
size=0x`lspci -xxx -s 0:18.3 | awk '/^90:/ { print $2 }'`

# bit 0 indicates the aperture is enabled, bits 1 - 3 indicate the size
if [ $((size & 1)) -eq 0 ] ; then
    echo "GART disabled; exiting"
    exit 0
fi

# the aperture size in bytes is 32 MiB shifted left by the size field
shft=$(((size >> 1) & 7))
size=$((0x2000000 << shft))

# Read the northbridge offset 0x94 to get the base address of the aperture
base=0x`lspci -xxx -s 0:18.3 | awk '/^90:/ { print $6 }'`
base=$((base << 25))
basehex=`printf 0x%08x $base`

printf "IOMMU aperture found at base=0x%08x size=0x%08x (%d KiB)\n" $base $size $((size/1024))

if grep -q $basehex /proc/mtrr ; then
    echo "MTRR already configured for IOMMU aperture; exiting"
    exit 0
fi

echo "Configuring write-combining MTRR for IOMMU aperture"
printf "base=0x%08x size=0x%08x type=write-combining\n" $base $size >/proc/mtrr

exit 0
[ -- cut here-- ]

Chip

--
Charles M. "Chip" Coldwell
Senior Software Engineer
Red Hat, Inc
978-392-2426
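The register arithmetic in the script can also be written as a small C helper, which makes the field layout easier to check against sample values. This is a sketch under stated assumptions: the two arguments are the raw 32-bit values at northbridge config offsets 0x90 and 0x94, the size is taken to be 32 MiB shifted left by bits 3:1 of the control register, and the base is taken from bits 14:0 of the base register shifted left by 25. `struct aperture` and `decode_aperture()` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decoded view of the GART aperture registers (hypothetical helper type). */
struct aperture {
    bool enabled;   /* bit 0 of the control register */
    uint64_t base;  /* physical base address in bytes */
    uint64_t size;  /* aperture size in bytes */
};

/*
 * ctl_reg:  value at config offset 0x90 (enable bit and size order)
 * base_reg: value at config offset 0x94 (base address bits 39:25)
 */
struct aperture decode_aperture(uint32_t ctl_reg, uint32_t base_reg)
{
    struct aperture ap;
    unsigned order = (ctl_reg >> 1) & 7;

    ap.enabled = (ctl_reg & 1) != 0;
    ap.size = UINT64_C(0x2000000) << order;         /* assumed: 32 MiB << order */
    ap.base = (uint64_t)(base_reg & 0x7fff) << 25;  /* assumed: bits 14:0 */
    return ap;
}
```

For example, a control value of 0x03 with a base value of 0x20 decodes to an enabled 64 MiB aperture at 0x40000000.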