I'd like to mention what might be a new twist on this problem. We are
seeing the same kind of 4k-block data corruption on multiple Tyan
dual-Opteron boards (S3870) with a ServerWorks chipset, not Nvidia. I
wonder if it is really an Nvidia-specific issue. The Nvidia boards are a
lot more popular, s
Hi folks.
1) Are there any new developments in this issue? Does someone know if
AMD and Nvidia are still investigating?
2) Steve Langasek from Debian sent me a patch that disables the hw-iommu
per default on Nvidia boards.
I've attached it in the kernel bugzilla and asked for inclusion in the
kern
Christoph Anton Mitterer wrote:
Ok,.. that sounds reasonable,.. so the whole thing might (!) actually be
a hardware design error,... but we just don't use that hardware any
longer when accessing devices via sata_nv.
So this doesn't solve our problem with PATA drives or other devices
(although we
Robert Hancock wrote:
>> What is that GART thing exactly? Is this the hardware IOMMU? I've always
>> thought GART was something graphics card related,.. but if so,.. how
>> could this solve our problem (that seems to occur mainly on harddisks)?
>>
> The GART built into the Athlon 64/Opteron CP
Christoph Anton Mitterer wrote:
Sorry, as always I've forgotten some things... *g*
Robert Hancock wrote:
If this is related to some problem with using the GART IOMMU with memory
hole remapping enabled
What is that GART thing exactly? Is this the hardware IOMMU? I've always
thought GART was some
Sorry, as always I've forgotten some things... *g*
Robert Hancock wrote:
> If this is related to some problem with using the GART IOMMU with memory
> hole remapping enabled
What is that GART thing exactly? Is this the hardware IOMMU? I've always
thought GART was something graphics card related,..
Hi everybody.
Sorry again for my late reply...
Robert gave us the following interesting information some days ago:
Robert Hancock wrote:
> If this is related to some problem with using the GART IOMMU with memory
> hole remapping enabled, then 2.6.20-rc kernels may avoid this problem on
> nForc
Hi.
Some days ago I received the following message from "Sunny Days". I
think he did not send it to lkml, so I am forwarding it now:
Sunny Days wrote:
> hello,
>
> i have done some extensive testing on this.
>
> various opterons, always single socket
> various dimms 1 and 2gb modules
> and hitachi+seagate
Hi.
Just for your information: I've put the issue into the kernel.org bugzilla.
http://bugzilla.kernel.org/show_bug.cgi?id=7768
Chris.
Christoph Anton Mitterer wrote:
Hi.
Perhaps some of you have read my older two threads:
http://marc.theaimsgroup.com/?t=11631244001&r=1&w=2 and the even
older http://marc.theaimsgroup.com/?t=11629131451&r=1&w=2
The issue was basically the following:
I found a severe bug mainly by fortun
Hi everybody.
After my last mails on this issue (btw: anything new in the meantime? I
received no replies..) I wrote again to nvidia and AMD...
This time with some more success.
Below is the answer from Mr. Friedman to my mail. He says that he wasn't
able to reproduce the problem and asks for a te
John A Chaves wrote:
> I didn't need to run a specific test for this. The normal workload of the
> machine approximates a continuous selftest for almost the last year.
>
> Large files (4-12GB is typical) are being continuously packed and unpacked
> with gzip and bzip2. Statistical analysis of the
On Friday 22 December 2006 20:04, Christoph Anton Mitterer wrote:
> This brings me to:
> Chris Wedgwood wrote:
> > Does anyone have an amd64 with an nforce4 chipset and >4GB that does
> > NOT have this problem? If so it might be worth chasing the BIOS
> > vendors to see what errata they are dealing
Hi my friends
It has become a little quiet around this issue... any new ideas or results?
Karsten Weiss wrote:
> BTW: Did someone already open an official bug at
> http://bugzilla.kernel.org ?
Karsten, did you already file a bug?
I told the whole issue to the Debian people which are abou
[EMAIL PROTECTED] wrote:
>On Wed, Dec 13, 2006 at 09:11:29PM +0100, Christoph Anton Mitterer wrote:
>
>> - error in the Opteron (memory controller)
>> - error in the Nvidia chipsets
>> - error in the kernel
>
>My guess without further information would be that some, but not all
>BIOSes are doing so
On Sat, 2006-12-02 at 01:56 +0100, Christoph Anton Mitterer wrote:
> Hi.
>
> Perhaps some of you have read my older two threads:
> http://marc.theaimsgroup.com/?t=11631244001&r=1&w=2 and the even
> older http://marc.theaimsgroup.com/?t=11629131451&r=1&w=2
>
> The issue was basically the f
Muli Ben-Yehuda wrote:
>> 4)
>> And does someone know if the nforce/opteron iommu requires IBM Calgary
>> IOMMU support?
>>
> It doesn't, Calgary isn't found in machines with Opteron CPUs or NForce
> chipsets (AFAIK). However, compiling Calgary in should make no
> difference, as we detect in ru
On Thu, Dec 14, 2006 at 02:16:31PM +0100, Karsten Weiss wrote:
> On Thu, 14 Dec 2006, Muli Ben-Yehuda wrote:
>
> > The rest looks good. Please resend and I'll add my Acked-by.
>
> Thanks a lot for your comments and suggestions. Here's my 2nd try:
>
> ===
>
> From: Karsten Weiss <[EMAIL PROTECTE
On Thu, 14 Dec 2006, Muli Ben-Yehuda wrote:
> The rest looks good. Please resend and I'll add my Acked-by.
Thanks a lot for your comments and suggestions. Here's my 2nd try:
===
From: Karsten Weiss <[EMAIL PROTECTED]>
$ diffstat ~/iommu-patch_v2.patch
Documentation/kernel-parameters.txt |
On Thu, Dec 14, 2006 at 12:38:08PM +0100, Karsten Weiss wrote:
> On Thu, 14 Dec 2006, Muli Ben-Yehuda wrote:
>
> > On Wed, Dec 13, 2006 at 09:34:16PM +0100, Karsten Weiss wrote:
> >
> > > BTW: It would be really great if this area of the kernel would get some
> > > more and better documentation.
On Thu, 14 Dec 2006, Muli Ben-Yehuda wrote:
> On Wed, Dec 13, 2006 at 09:34:16PM +0100, Karsten Weiss wrote:
>
> > BTW: It would be really great if this area of the kernel would get some
> > more and better documentation. The information at
> > linux-2.6/Documentation/x86_64/boot_options.txt is
On Thu, Dec 14, 2006 at 02:52:35AM -0700, Erik Andersen wrote:
> On Thu Dec 14, 2006 at 11:23:11AM +0200, Muli Ben-Yehuda wrote:
> > > I just realized that booting with "iommu=soft" makes my pcHDTV
> > > HD5500 DVB cards not work. Time to go back to disabling the
> > > memhole and losing 1 GB. :-
On Thu Dec 14, 2006 at 11:23:11AM +0200, Muli Ben-Yehuda wrote:
> > I just realized that booting with "iommu=soft" makes my pcHDTV
> > HD5500 DVB cards not work. Time to go back to disabling the
> > memhole and losing 1 GB. :-(
>
> That points to a bug in the driver (likely) or swiotlb (unlikely
On Wed, Dec 13, 2006 at 09:11:29PM +0100, Christoph Anton Mitterer wrote:
> - error in the Opteron (memory controller)
> - error in the Nvidia chipsets
> - error in the kernel
My guess without further information would be that some, but not all
BIOSes are doing some work to avoid this.
Does anyo
On Thu, Dec 14, 2006 at 12:33:23AM +0100, Christoph Anton Mitterer wrote:
> 4)
> And does someone know if the nforce/opteron iommu requires IBM Calgary
> IOMMU support?
It doesn't, Calgary isn't found in machines with Opteron CPUs or NForce
chipsets (AFAIK). However, compiling Calgary in should ma
On Wed, Dec 13, 2006 at 01:29:25PM -0700, Erik Andersen wrote:
> On Mon Dec 11, 2006 at 10:24:02AM +0100, Karsten Weiss wrote:
> > We could not reproduce the data corruption anymore if we boot
> > the machines with the kernel parameter "iommu=soft" i.e. if we
> > use software bounce buffering inste
On Wed, Dec 13, 2006 at 09:34:16PM +0100, Karsten Weiss wrote:
> FWIW: As far as I understand the linux kernel code (I am no kernel
> developer so please correct me if I am wrong) the PCI dma mapping code is
> abstracted by struct dma_mapping_ops. I.e. there are currently four
> possible implem
Hi.
I've just looked for some kernel config options that might relate to our
issue:
1)
Old style AMD Opteron NUMA detection (CONFIG_K8_NUMA)
Enable K8 NUMA node topology detection. You should say Y here if you
have a multi processor AMD K8 system. This uses an old method to read
the NUMA config
On Wed, Dec 13, 2006 at 08:57:23PM +0100, Christoph Anton Mitterer wrote:
> Don't get me wrong,.. I don't use Windows (except for upgrading
> my Plextor firmware and EAC ;) )... but I ask because the more
> information we get (even if it's not Linux specific) the more steps we
> can take ;)
Lennart Sorensen wrote:
> I upgrade my plextor firmware using linux. pxupdate for most devices,
> and pxfw for new drives (like the PX760). Works perfectly for me. It
> is one of the reasons I buy plextors.
Yes I know about it,.. although never tested it,... anyway the main
reason for Windows i
On Wed, 13 Dec 2006, Chris Wedgwood wrote:
> > Any ideas why iommu=disabled in the bios does not solve the issue?
>
> The kernel will still use the IOMMU if it can, even if the BIOS
> doesn't set it up. Check your dmesg for IOMMU strings; there might be
> something printed to this effect.
FWIW: As far
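A rough sketch of the check suggested above, in Python. It assumes the kernel command line is readable at /proc/cmdline and that saved dmesg output is piped in on stdin. The helper names iommu_options and iommu_lines are made up for this illustration, and the filter only looks for the substring "iommu" because the exact kernel messages are not quoted in this thread.

```python
import os
import sys


def iommu_options(cmdline):
    """Collect the comma-separated values of every iommu= argument."""
    options = []
    for arg in cmdline.split():
        if arg.startswith("iommu="):
            options.extend(opt for opt in arg[len("iommu="):].split(",") if opt)
    return options


def iommu_lines(log_text):
    """Return the log lines that mention the IOMMU (case-insensitive)."""
    return [line for line in log_text.splitlines() if "iommu" in line.lower()]


if __name__ == "__main__":
    # /proc/cmdline holds the parameters the running kernel was booted with.
    if os.path.exists("/proc/cmdline"):
        with open("/proc/cmdline") as f:
            print("iommu options at boot:", iommu_options(f.read()))
    # Only read stdin when something is piped in, e.g.:
    #   dmesg | python3 check_iommu.py   (script name is hypothetical)
    if not sys.stdin.isatty():
        for line in iommu_lines(sys.stdin.read()):
            print(line)
```

An empty option list only means that no iommu= parameter was given at boot; it says nothing about which implementation the kernel picked by itself.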
Erik Andersen wrote:
> I just realized that booting with "iommu=soft" makes my pcHDTV
> HD5500 DVB cards not work. Time to go back to disabling the
> memhole and losing 1 GB. :-(
Crazy,...
I have a Hauppauge Nova-T 500 DualDVB-T card,... I'll check it later if
I have the same problem and will inf
On Mon Dec 11, 2006 at 10:24:02AM +0100, Karsten Weiss wrote:
> We could not reproduce the data corruption anymore if we boot
> the machines with the kernel parameter "iommu=soft" i.e. if we
> use software bounce buffering instead of the hw-iommu.
I just realized that booting with "iommu=soft" mak
On Wed, 13 Dec 2006, Erik Andersen wrote:
> On Mon Dec 11, 2006 at 10:24:02AM +0100, Karsten Weiss wrote:
> > Last week we did some more testing with the following result:
> >
> > We could not reproduce the data corruption anymore if we boot the machines
> > with the kernel parameter "iommu=soft
On Mon Dec 11, 2006 at 10:24:02AM +0100, Karsten Weiss wrote:
> Last week we did some more testing with the following result:
>
> We could not reproduce the data corruption anymore if we boot the machines
> with the kernel parameter "iommu=soft" i.e. if we use software bounce
> buffering instead
Chris Wedgwood wrote:
>> Did anyone run any tests under Windows? I cannot set iommu=soft
>> there, can I?
>>
> Windows never uses the hardware iommu, so it's always doing the
> equivalent of iommu=soft
>
That would mean that I'm not able to reproduce the issue under Windows,
right?
Does tha
Karsten Weiss wrote:
> "Memory hole mapping" was set to "hardware". With "disabled" we only
> see 3 of our 4 GB memory.
>
That sounds reasonable,... I only see 2.5 GB,.. as my memhole takes
1536 MB (don't ask me which PCI device needs that much address space ;) )
Karsten Weiss wrote:
> Of course, the big question "Why does the hardware iommu *not*
> work on those machines?" still remains.
>
I'm going to check AMD's errata docs in the next few days,.. perhaps I'll
find something related. But I'd ask you to do the same, as I don't
consider myself an expert in t
On Wed, 13 Dec 2006, Christoph Anton Mitterer wrote:
Christoph, I will carefully re-read your entire posting and the
included links on Monday and will also try the memory hole
setting.
And did you find out anything new?
As I already mentioned the kernel parameter "iommu=soft" fixes
the data c
On Wed, Dec 13, 2006 at 08:18:21PM +0100, Christoph Anton Mitterer wrote:
> booting with iommu=soft => works fine
> booting with iommu=noagp => DOESN'T solve the error
> booting with iommu=off => the system doesn't even boot and panics
> When I set IOMMU to disabled in the BIOS the error is not s
On Wed, Dec 13, 2006 at 08:20:59PM +0100, Christoph Anton Mitterer wrote:
> Did anyone run any tests under Windows? I cannot set iommu=soft
> there, can I?
Windows never uses the hardware iommu, so it's always doing the
equivalent of iommu=soft
Ah and I forgot,...
Did anyone run any tests under Windows? I cannot set iommu=soft there,
can I?
Chris.
Karsten Weiss wrote:
> Last week we did some more testing with the following result:
>
> We could not reproduce the data corruption anymore if we boot the machines
> with the kernel parameter "iommu=soft" i.e. if we use software bounce
> buffering instead of the hw-iommu. (As mentioned before, bo
Karsten Weiss wrote:
> Here's a diff of a corrupted and a good file written during our
> testcase:
>
> ("-" == corrupted file, "+" == good file)
> ...
> 009f2ff0 67 2a 4c c4 6d 9d 34 44 ad e6 3c 45 05 9a 4d c4 |g*L.m.4D..
> -009f3000 39 60 e6 44 20 ab 46 44 56 aa 46 44 c2 35 e6 44 |9.D .FD
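The first differing row in the diff above starts at offset 009f3000, which is a multiple of 0x1000 (4 KiB). To see whether damage in other files also falls on such boundaries, a block-wise comparison along the following lines might help. This is only a sketch: the function name differing_blocks is made up, the 4096-byte block size is an assumption taken from the "4k-block" description in this thread, and the demo data is generated locally rather than taken from any report here.

```python
import os
import tempfile

BLOCK_SIZE = 4096  # assumed granularity: the "4k-block" corruption described in this thread


def differing_blocks(good_path, suspect_path, block_size=BLOCK_SIZE):
    """Return the byte offsets of every block in which the two files differ."""
    offsets = []
    offset = 0
    with open(good_path, "rb") as good, open(suspect_path, "rb") as suspect:
        while True:
            a = good.read(block_size)
            b = suspect.read(block_size)
            if not a and not b:  # both files are exhausted
                break
            if a != b:
                offsets.append(offset)
            offset += block_size
    return offsets


if __name__ == "__main__":
    # Self-made demo data (not from the thread): damage one byte in the third block.
    workdir = tempfile.mkdtemp()
    good_path = os.path.join(workdir, "good.bin")
    bad_path = os.path.join(workdir, "bad.bin")
    data = os.urandom(4 * BLOCK_SIZE)
    damaged = bytearray(data)
    damaged[2 * BLOCK_SIZE] ^= 0xFF
    with open(good_path, "wb") as f:
        f.write(data)
    with open(bad_path, "wb") as f:
        f.write(bytes(damaged))
    for off in differing_blocks(good_path, bad_path):
        print("block at offset 0x%08x differs" % off)
```

If every reported offset is block-aligned and whole blocks differ, that would be consistent with page-sized transfers being misplaced, though by itself it proves nothing about the cause.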
On Mon, Dec 11, 2006 at 10:24:02AM +0100, Karsten Weiss wrote:
> We could not reproduce the data corruption anymore if we boot the
> machines with the kernel parameter "iommu=soft" i.e. if we use
> software bounce buffering instead of the hw-iommu. (As mentioned
> before, booting with mem=2g works
On Sat, 2 Dec 2006, Karsten Weiss wrote:
> On Sat, 2 Dec 2006, Christoph Anton Mitterer wrote:
>
> > I found a severe bug mainly by fortune because it occurs very rarely.
> > My test looks like the following: I have about 30GB of testing data on
>
> This sounds very familiar! One of the Linux co
Ville Herva wrote:
> I saw something very similar with Via KT133 years ago. Then the culprit was
> botched PCI implementation that sometimes corrupted PCI transfers when there
> was heavy PCI I/O going on. Usually that meant running two disk transfers at
> the same time. Doing heavy network I/O at
Chris Wedgwood wrote:
> Heh, I see this also with a Tyan S2866 (nforce4 chipset). I've been
> aware something is amiss for a while because if I transfer about 40GB
> of data from one machine to another there are checksum mismatches and
> some files have to be transferred again.
>
It seems that
Alan wrote:
> See the thread http://lkml.org/lkml/2006/8/16/305
>
Hi Alan.
Thanks for your reply. I already read this thread some weeks ago,
but from my limited knowledge I understood that this was an issue
related to a SCSI adapter or so. Or did I misunderstand this? And as
soon as
On Sat, 2 Dec 2006 12:00:36 +0100 (CET)
Karsten Weiss <[EMAIL PROTECTED]> wrote:
> Hello Christoph!
>
> On Sat, 2 Dec 2006, Christoph Anton Mitterer wrote:
>
> > I found a severe bug mainly by fortune because it occurs very rarely.
> > My test looks like the following: I have about 30GB of testi
Hello Christoph!
On Sat, 2 Dec 2006, Christoph Anton Mitterer wrote:
I found a severe bug mainly by fortune because it occurs very rarely.
My test looks like the following: I have about 30GB of testing data on
This sounds very familiar! One of the Linux compute clusters I
administer at work i
On Sat, Dec 02, 2006 at 01:56:06AM +0100, Christoph Anton Mitterer wrote:
> I found a severe bug mainly by fortune because it occurs very
> rarely. My test looks like the following: I have about 30GB of
> testing data on my harddisk,... I repeat verifying sha512 sums on
> these files and check if
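For anyone who wants to run a comparable test, the repeated verification described in the quote could be sketched roughly as follows. This is not the original test script, which is not shown in this thread. The helper names sha512_of and verify_repeatedly and the default of 10 passes are assumptions for illustration, and the sketch does nothing to bypass the page cache.

```python
import hashlib
import sys


def sha512_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large files need not fit in memory."""
    h = hashlib.sha512()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_repeatedly(paths, passes=10):
    """Re-hash every file `passes` times; return (pass, path) for each mismatch.

    The digests from an initial reading serve as the reference, so this only
    detects reads that disagree with that first reading.
    """
    reference = {p: sha512_of(p) for p in paths}
    mismatches = []
    for n in range(1, passes + 1):
        for p in paths:
            if sha512_of(p) != reference[p]:
                mismatches.append((n, p))
    return mismatches


if __name__ == "__main__":
    # Usage: python3 verify.py FILE [FILE ...]   (script name is hypothetical)
    for n, p in verify_repeatedly(sys.argv[1:]):
        print("pass %d: checksum mismatch in %s" % (n, p))
```

Because the reference comes from the first reading, a file that was already read incorrectly on that first pass would be flagged on later, correct reads instead; digests computed on a known-good machine would be a safer reference.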
Erik Andersen wrote:
> Doh! I have a Tyan S2895 in my system, and I've been pulling my
> hair out trying to track down the cause of a similar somewhat
> rare failure for the pre-computed sha1 of a block of data to
> actually match the calculated sha1. I'd been hunting in vain the
> past few days
On Sat Dec 02, 2006 at 01:56:06AM +0100, Christoph Anton Mitterer wrote:
> The issue was basically the following:
> I found a severe bug mainly by fortune because it occurs very rarely.
> My test looks like the following: I have about 30GB of testing data on
> my harddisk,... I repeat verifying sha
Hi.
Perhaps some of you have read my older two threads:
http://marc.theaimsgroup.com/?t=11631244001&r=1&w=2 and the even
older http://marc.theaimsgroup.com/?t=11629131451&r=1&w=2
The issue was basically the following:
I found a severe bug mainly by fortune because it occurs very rarely.
M