Re: [PATCH 4/5] atmel_serial: Split the interrupt handler

2007-12-18 Thread Chip Coldwell

On Tue, 18 Dec 2007, Haavard Skinnemoen wrote:

> On Tue, 18 Dec 2007 18:06:14 +0100
> Haavard Skinnemoen <[EMAIL PROTECTED]> wrote:
> 
> > From: Remy Bohmer <[EMAIL PROTECTED]>
> 
> Heh. That's obviously wrong. Wonder what happened there?
> 
> Looks like Chip's address got mangled too.

You can find me at <[EMAIL PROTECTED]> or <[EMAIL PROTECTED]> these
days, although <[EMAIL PROTECTED]> still works for the time
being.

Chip

-- 
Charles M. "Chip" Coldwell
Senior Software Engineer
Red Hat, Inc
978-392-2426

GPG ID:  852E052F
GPG FPR: 77E5 2B51 4907 F08A 7E92  DE80 AFA9 9A8F 852E 052F
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 5/5] atmel_serial: Add DMA support

2007-12-18 Thread Chip Coldwell

On Tue, 18 Dec 2007, Haavard Skinnemoen wrote:

> From: Chip Coldwell <[EMAIL PROTECTED]>
> 
> This patch is based on the DMA-patch by Chip Coldwell for the
> AT91/AT32 serial USARTS, with some tweaks to make it apply neatly on
> top of the other patches in this series.
> 
> The RX code has been moved to a tasklet and reworked a bit. Instead of
> depending on the ENDRX and TIMEOUT bits in CSR, we simply grab as much
> data as we can from the DMA buffers. I think this closes a race where
> the ENDRX bit is set after we read CSR but before we read RPR,
> although I haven't confirmed this.
> 
> This also fixes a DMA sync bug in the original patch.
> 
> [EMAIL PROTECTED]: rebased onto irq-splitup patch]
> [EMAIL PROTECTED]: moved to tasklet, fixed dma bug, misc cleanups]
> Signed-off-by: Remy Bohmer <[EMAIL PROTECTED]>
> Signed-off-by: Haavard Skinnemoen <[EMAIL PROTECTED]>
> ---
>  drivers/serial/atmel_serial.c |  386 ++--
>  1 files changed, 366 insertions(+), 20 deletions(-)
> 
> diff --git a/drivers/serial/atmel_serial.c b/drivers/serial/atmel_serial.c
> index 990d3ab..07c2734 100644
> --- a/drivers/serial/atmel_serial.c
> +++ b/drivers/serial/atmel_serial.c
> @@ -7,6 +7,8 @@
>   *  Based on drivers/char/serial_sa1100.c, by Deep Blue Solutions Ltd.
>   *  Based on drivers/char/serial.c, by Linus Torvalds, Theodore Ts'o.
>   *
> + *  DMA support added by Chip Coldwell.

I will ACK/Sign-off on this soon; I just want to do some tests on real
hardware first.

Chip

-- 
Charles M. Coldwell
"Turn on, log in, tune out"
Somerville, Massachusetts, New England


Re: PATCH/RFC: [kdump] fix APIC shutdown sequence

2007-08-08 Thread Chip Coldwell
On Wed, 8 Aug 2007, Vivek Goyal wrote:

> On Tue, Aug 07, 2007 at 07:41:30PM +0200, Martin Wilck wrote:
> > 
> > Can you explain how, on the front side bus, the IO-APIC knows whether
> > a CPU has accepted the INT message? There is no response
> > to the INT message on the bus, except for the EOI which comes much later.
> > I'm not saying that you're wrong, I just really don't understand this
> > point.
> > 
> 
> I don't know what is exactly hardware protocol. I am just going by 
> intel documentation. 

I think it's important to distinguish between the LAPIC receiving an
interrupt and the CPU receiving an interrupt.  The former could happen
without the latter if the CPU has set the TPR above the priority of
the interrupt received by the LAPIC.  In that case, the interrupt is
kept pending in the LAPIC and recorded in the IRR if I understand the
Intel documentation correctly.

So I think the scenario which leaves IRR set when the kdump kernel
starts is possible.

Chip

-- 
Charles M. "Chip" Coldwell
Senior Software Engineer
Red Hat, Inc
978-392-2426



Re: PATCH/RFC: [kdump] fix APIC shutdown sequence

2007-08-07 Thread Chip Coldwell
On Tue, 7 Aug 2007, Vivek Goyal wrote:

> On Mon, Aug 06, 2007 at 05:08:05PM +0200, Martin Wilck wrote:
> 
> > 1. If, under SMP, the IO-APIC logical destination field is
> >    set by the IRQ balancing code to one of the "other"
> >    CPUs (i.e. not the crashing_cpu), and an IRQ arrives
> >    on the respective pin after that CPU has shut down
> >    its local APIC (but before the IO-APIC pin is masked)
> >    the IRQ message can't be delivered.
> 
> Point 1 and Point 2 seems to be same.
> 
> > 
> > 2. The crashing CPU itself disables its local APIC
> >    before the IO-APIC, leaving a short time window
> >    where the IOAPIC can receive IRQs, but not
> >    deliver them.
> > 
> 
> I doubt that it would be the issue. Looking at intel IOAPIC (82093AA)
> documentation, it says that IRR bit of IOAPIC will be set only if
> destination CPU has accepted the interrupt.

I think you mean "destination Local APIC has accepted the interrupt"
above.  The Intel documentation cited above contains this text on page
12:

    For level triggered interrupts, this bit is set to 1 when local
    APIC(s) accept the level interrupt sent by the IOAPIC. The Remote
    IRR bit is set to 0 when an EOI message with a matching interrupt
    vector is received from a local APIC.

The following text is from the IA-64 documentation ...

    Any interrupt that is received by the processor is kept pending
    and recorded in the Interrupt Request Register (IRR). If the
    processor is not servicing an interrupt, then the contents of the
    TPR determines whether the processor will accept a pending
    interrupt depending on the priority of the interrupt compared to
    the current TPR. If the interrupt has a higher priority, then the
    processor is interrupted, otherwise the interrupt is kept pending.

So, I think if the CPU has interrupts disabled but the Local APIC is
still enabled, the IRR could get set.  I guess we need to be sure to
turn off the Local APIC before disabling interrupts in the CPU.

Chip

-- 
Charles M. "Chip" Coldwell
Senior Software Engineer
Red Hat, Inc
978-392-2426



Re: data corruption with nvidia chipsets and IDE/SATA drives (k8 cpu errata needed?)

2007-02-21 Thread Chip Coldwell
On Wed, 17 Jan 2007, Andi Kleen wrote:

> On Wednesday 17 January 2007 07:31, Chris Wedgwood wrote:
> > On Tue, Jan 16, 2007 at 08:52:32PM +0100, Christoph Anton Mitterer wrote:
> > > I agree,... it seems drastic, but this is the only really secure
> > > solution.
> >
> > I'd like to hear from Andi how he feels about this?  It seems like a
> > somewhat drastic solution in some ways given a lot of hardware doesn't
> > seem to be affected (or maybe in those cases it's just really hard to
> > hit, I don't know).
> 
> AMD is looking at the issue. Only Nvidia chipsets seem to be affected,
> although there were similar problems on VIA in the past too.
> Unless a good workaround comes around soon I'll probably default
> to iommu=soft on Nvidia.

We (Sun, AMD, Nvidia and Red Hat) have been testing a patch that seems
to solve the problem.  AMD and Nvidia analyzed an HDT trace that
seemed to indicate that CPU updates of the GATT were still in cache
when a subsequent table walk caused by a device load used a stale GATT
PTE.  That analysis inspired this patch, submitted to this list as an
RFC.  It is not obvious (to me, at least) why this problem has only
shown up on Nvidia SATA controllers.

We are continuing to investigate.

diff --git a/arch/x86_64/kernel/pci-gart.c b/arch/x86_64/kernel/pci-gart.c
index 030eb37..1dd461a 100644
--- a/arch/x86_64/kernel/pci-gart.c
+++ b/arch/x86_64/kernel/pci-gart.c
@@ -69,6 +69,8 @@ static u32 gart_unmapped_entry;
 #define AGPEXTERN
 #endif
 
+#define GATT_CLFLUSH(i) asm volatile ("clflush (%0)" :: "r" (iommu_gatt_base + (i)))
+
 /* backdoor interface to AGP driver */
 AGPEXTERN int agp_memory_reserved;
 AGPEXTERN __u32 *agp_gatt_table;
@@ -221,6 +223,7 @@ static dma_addr_t dma_map_area(struct device *dev, dma_addr_t phys_mem,
 	for (i = 0; i < npages; i++) {
 		iommu_gatt_base[iommu_page + i] = GPTE_ENCODE(phys_mem);
 		SET_LEAK(iommu_page + i);
+		GATT_CLFLUSH(iommu_page + i);
 		phys_mem += PAGE_SIZE;
 	}
 	return iommu_bus_base + iommu_page*PAGE_SIZE + (phys_mem & ~PAGE_MASK);
@@ -348,6 +351,7 @@ static int __dma_map_cont(struct scatterlist *sg, int start, int stopat,
 		while (pages--) { 
 			iommu_gatt_base[iommu_page] = GPTE_ENCODE(addr); 
 			SET_LEAK(iommu_page);
+			GATT_CLFLUSH(iommu_page);
 			addr += PAGE_SIZE;
 			iommu_page++;
 		}


Chip

-- 
Charles M. "Chip" Coldwell
Senior Software Engineer
Red Hat, Inc
978-392-2426



Re: data corruption with nvidia chipsets and IDE/SATA drives (k8 cpu errata needed?)

2007-01-18 Thread Chip Coldwell

On Thu, 18 Jan 2007, Andi Kleen wrote:


The Northbridge guarantees coherency over the aperture, but
only if the caching attributes match.


That's interesting.  Makes sense, I suppose.


You would need to change_page_attr() every kernel address that is mapped into
the IOMMU to use an uncached aperture. AGP does this, but the frequency of
mapping for the IOMMU is much higher and it would be prohibitively costly
unfortunately.


But it still might be a reasonable thing to do to test the theory that
the problem is cache coherency across the graphics aperture, even if
it isn't a long-term solution for the problem.


In the past we saw corruptions from such conflicts, so this is more
than just theory. I suspect you traded a more easy to trigger
corruption with a more subtle one.


Yup.  That was the inspiration for the script.

Chip

--
Charles M. "Chip" Coldwell
Senior Software Engineer
Red Hat, Inc
978-392-2426



Re: data corruption with nvidia chipsets and IDE/SATA drives (k8 cpu errata needed?)

2007-01-17 Thread Chip Coldwell

On Wed, 17 Jan 2007, Chip Coldwell wrote:


> On Wed, 17 Jan 2007, Andi Kleen wrote:
> 
> > On Wednesday 17 January 2007 07:31, Chris Wedgwood wrote:
> > > On Tue, Jan 16, 2007 at 08:52:32PM +0100, Christoph Anton Mitterer wrote:
> > > > I agree,... it seems drastic, but this is the only really secure
> > > > solution.
> > >
> > > I'd like to hear from Andi how he feels about this?  It seems like a
> > > somewhat drastic solution in some ways given a lot of hardware doesn't
> > > seem to be affected (or maybe in those cases it's just really hard to
> > > hit, I don't know).
> > 
> > AMD is looking at the issue. Only Nvidia chipsets seem to be affected,
> > although there were similar problems on VIA in the past too.
> > Unless a good workaround comes around soon I'll probably default
> > to iommu=soft on Nvidia.
> 
> We've just verified that configuring the graphics aperture to be
> write-combining instead of write-back using an MTRR also solves the
> problem.  It appears to be a cache incoherency issue in the graphics
> aperture.


I take it back.  Further testing has revealed that this does not solve
the problem.

Chip

--
Charles M. "Chip" Coldwell
Senior Software Engineer
Red Hat, Inc
978-392-2426



Re: data corruption with nvidia chipsets and IDE/SATA drives (k8 cpu errata needed?)

2007-01-17 Thread Chip Coldwell

On Wed, 17 Jan 2007, Andi Kleen wrote:


> On Wednesday 17 January 2007 07:31, Chris Wedgwood wrote:
> > On Tue, Jan 16, 2007 at 08:52:32PM +0100, Christoph Anton Mitterer wrote:
> > > I agree,... it seems drastic, but this is the only really secure
> > > solution.
> >
> > I'd like to hear from Andi how he feels about this?  It seems like a
> > somewhat drastic solution in some ways given a lot of hardware doesn't
> > seem to be affected (or maybe in those cases it's just really hard to
> > hit, I don't know).
> 
> AMD is looking at the issue. Only Nvidia chipsets seem to be affected,
> although there were similar problems on VIA in the past too.
> Unless a good workaround comes around soon I'll probably default
> to iommu=soft on Nvidia.
> 
> -Andi


We've just verified that configuring the graphics aperture to be
write-combining instead of write-back using an MTRR also solves the
problem.  It appears to be a cache incoherency issue in the graphics
aperture.

This script does the trick:

[ -- cut here -- ]
#!/bin/bash

# Read the northbridge offset 0x90 to get the size of the aperture
size=0x`lspci -xxx -s 0:18.3 | awk '/^90:/ { print $2 }'`

# bit 0 indicates the aperture is enabled, bits 1 - 3 indicate the size
if [ $((size & 1)) -eq 0 ] ; then
	echo "GART disabled; exiting"
	exit 0
fi

# bits 1 - 3 encode the size as 32 MiB << n
shft=$(((size >> 1) & 7))
size=$((0x2000000 << shft))

# Read the northbridge offsets 0x94-0x95 to get the base address of the
# aperture; bits 14:0 of that register hold address bits 39:25
base=0x`lspci -xxx -s 0:18.3 | awk '/^90:/ { print $7 $6 }'`
base=$(((base & 0x7fff) << 25))
basehex=`printf 0x%08x $base`

printf "IOMMU aperture found at base=0x%08x size=0x%08x (%d KiB)\n" $base $size $((size/1024))

if grep -q $basehex /proc/mtrr ; then
	echo "MTRR already configured for IOMMU aperture; exiting"
	exit 0
fi

echo "Configuring write-combining MTRR for IOMMU aperture"
printf "base=0x%08x size=0x%08x type=write-combining\n" $base $size >/proc/mtrr

exit 0
[ -- cut here -- ]

Chip

--
Charles M. "Chip" Coldwell
Senior Software Engineer
Red Hat, Inc
978-392-2426


