Re: atl1 driver corrupting memory?

2007-07-25 Thread Chris Snook

Chuck Ebbert wrote:
> I have a report of random errors when using the atl1 driver
> with kernel 2.6.22.1. Could that be a problem fixed by the
> recent changes to DMA setup in 2.6.23-rc?


I hope so.  As far as we can tell the driver and the NIC itself are doing the 
right thing, and the pci layer or chipset is screwing up the 64-bit DMA.  This 
only manifests when physical memory addresses cross the 4 GB boundary, and as 
far as I'm aware atl1 is only used on desktop boards, so we don't have a lot of 
testers.  If someone wants to buy me and Jay more RAM so we can test it 
ourselves, I guess we wouldn't object :)
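The 4 GB condition described above comes down to a simple address check. The sketch below is a hypothetical userspace illustration, not atl1 or kernel code; `addr_mask()` and `dma_range_fits_mask()` are names invented here. It tests whether every byte of a buffer at a given bus address falls under a device's DMA mask. With a 32-bit mask, any buffer reaching past 0xFFFFFFFF fails, and that kind of buffer only shows up when the machine has physical memory above 4 GB.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* All-ones mask covering the low `bits` address bits; addr_mask(32) has the
 * same value as the kernel's DMA_32BIT_MASK constant. */
static uint64_t addr_mask(unsigned int bits)
{
    assert(bits >= 1 && bits <= 64);
    return bits == 64 ? UINT64_MAX : ((UINT64_C(1) << bits) - 1);
}

/* True if every byte of [addr, addr + len) is addressable under `mask`.
 * A zero-length buffer trivially fits. */
static bool dma_range_fits_mask(uint64_t addr, uint64_t len, uint64_t mask)
{
    if (len == 0)
        return true;
    /* The last byte must not wrap past the top of the 64-bit space. */
    if (addr > UINT64_MAX - (len - 1))
        return false;
    return addr + (len - 1) <= mask;
}
```

A device limited to a 32-bit mask should never be handed a buffer for which this check fails; the kernel's DMA layer is responsible for keeping its buffers below the limit (for example by bouncing through low memory), which is why a 32-bit mask sidesteps whatever goes wrong above 4 GB.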


I favor disabling 64-bit DMA in atl1 until Atheros can track this down in the 
lab.  If we don't get confirmation that this bug is fixed by the DMA changes, I 
think we should revert to 32-bit DMA for 2.6.23.  Limiting ourselves to 32-bit 
DMA on desktop systems is a lot less bad than allowing arbitrary memory corruption.
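A common pattern in PCI drivers of this era was to request DMA_64BIT_MASK with pci_set_dma_mask() and fall back to DMA_32BIT_MASK if that failed; the proposal here amounts to skipping the 64-bit attempt for atl1. The userspace model below sketches that decision. `fake_set_dma_mask()` is a hypothetical stand-in for pci_set_dma_mask(), `platform_limit` is an invented parameter modelling what the platform claims to support, and `force_32bit` is a hypothetical override flag; this is not the patch that was proposed or committed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same values as the kernel's DMA_64BIT_MASK / DMA_32BIT_MASK constants. */
#define MASK_64BIT UINT64_MAX
#define MASK_32BIT UINT64_C(0x00000000ffffffff)

/* Hypothetical stand-in for pci_set_dma_mask(): succeeds only if the
 * platform claims it can reach every address the requested mask allows.
 * The real call returns a negative errno on failure. */
static int fake_set_dma_mask(uint64_t platform_limit, uint64_t requested,
                             uint64_t *active)
{
    assert(active != NULL);
    if (requested > platform_limit)
        return -1;
    *active = requested;
    return 0;
}

/* Try 64-bit DMA first unless force_32bit is set, then fall back to 32-bit.
 * Returns the mask now in effect, or 0 if even 32-bit DMA was refused. */
static uint64_t choose_dma_mask(uint64_t platform_limit, bool force_32bit)
{
    uint64_t active = 0;

    if (!force_32bit &&
        fake_set_dma_mask(platform_limit, MASK_64BIT, &active) == 0)
        return active;
    if (fake_set_dma_mask(platform_limit, MASK_32BIT, &active) == 0)
        return active;
    return 0;
}
```

On a platform that accepts the 64-bit mask, choose_dma_mask(MASK_64BIT, false) yields the 64-bit mask, while choose_dma_mask(MASK_64BIT, true) yields MASK_32BIT. The override matters because the suspected fault lies in hardware that claims 64-bit support, so the driver cannot count on the mask request failing; it has to decline 64-bit DMA itself.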


-- Chris
-
To unsubscribe from this list: send the line unsubscribe netdev in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: atl1 driver corrupting memory?

2007-07-25 Thread Chris Snook

Chuck Ebbert wrote:
> On 07/25/2007 05:22 PM, Chris Snook wrote:
>> Chuck Ebbert wrote:
>>> I have a report of random errors when using the atl1 driver
>>> with kernel 2.6.22.1. Could that be a problem fixed by the
>>> recent changes to DMA setup in 2.6.23-rc?
>>
>> I hope so.  As far as we can tell the driver and the NIC itself are
>> doing the right thing, and the pci layer or chipset is screwing up the
>> 64-bit DMA.  This only manifests when physical memory addresses cross
>> the 4 GB boundary, and as far as I'm aware atl1 is only used on desktop
>> boards, so we don't have a lot of testers.  If someone wants to buy me
>> and Jay more RAM so we can test it ourselves, I guess we wouldn't object :)
>
> Our reporter has 8GB of memory in an x86_64 machine.
>
> https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=249511
>
>> I favor disabling 64-bit DMA in atl1 until Atheros can track this down
>> in the lab.  If we don't get confirmation that this bug is fixed by the
>> DMA changes, I think we should revert to 32-bit DMA for 2.6.23.
>> Limiting ourselves to 32-bit DMA on desktop systems is a lot less bad
>> than allowing arbitrary memory corruption.
>
> This is what was committed.
>
> http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=3f516c00d416bd39aab6cfb348b68919e295fe23
> http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=ef76e3e2505db01f7d4b537854f4a177220c26c8


Oh, I thought you were referring to a problem reproduced *after* those changes, 
to be fixed by some generic DMA setup patch.  Has anyone reproduced the problem 
after those changes?


CCing atl1-devel to see if we can get some more testing...

-- Chris


Re: atl1 driver corrupting memory?

2007-07-25 Thread Chuck Ebbert
On 07/25/2007 05:22 PM, Chris Snook wrote:
> Chuck Ebbert wrote:
>> I have a report of random errors when using the atl1 driver
>> with kernel 2.6.22.1. Could that be a problem fixed by the
>> recent changes to DMA setup in 2.6.23-rc?
>>
> I hope so.  As far as we can tell the driver and the NIC itself are
> doing the right thing, and the pci layer or chipset is screwing up the
> 64-bit DMA.  This only manifests when physical memory addresses cross
> the 4 GB boundary, and as far as I'm aware atl1 is only used on desktop
> boards, so we don't have a lot of testers.  If someone wants to buy me
> and Jay more RAM so we can test it ourselves, I guess we wouldn't object :)

Our reporter has 8GB of memory in an x86_64 machine.

https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=249511


> I favor disabling 64-bit DMA in atl1 until Atheros can track this down
> in the lab.  If we don't get confirmation that this bug is fixed by the
> DMA changes, I think we should revert to 32-bit DMA for 2.6.23.
> Limiting ourselves to 32-bit DMA on desktop systems is a lot less bad
> than allowing arbitrary memory corruption.

This is what was committed.

http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=3f516c00d416bd39aab6cfb348b68919e295fe23
http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=ef76e3e2505db01f7d4b537854f4a177220c26c8



Re: atl1 driver corrupting memory?

2007-07-25 Thread Jay Cliburn
On Wed, 25 Jul 2007 17:31:02 -0400
Chuck Ebbert [EMAIL PROTECTED] wrote:

> This is what was committed.
>
> http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=3f516c00d416bd39aab6cfb348b68919e295fe23
> http://git.kernel.org/git/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=ef76e3e2505db01f7d4b537854f4a177220c26c8

I'm doubtful these patches will fix the highmem corruption problem
we've seen in the L1.  I actually extracted the changes in the
referenced commits from the vendor's current out-of-tree driver, and
unfortunately he was able to duplicate the problem in his lab using
that driver.

As a workaround, Chuck, your reporter can boot with mem=3900M until the
problem is resolved.
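The kernel's mem= parameter takes a size with an optional K, M or G suffix, and a bare number is read as bytes, so the workaround needs a unit. The sketch below is a simplified userspace imitation of that parsing; `parse_mem_size()` and `mem_limit_fits_32bit()` are hypothetical names, not kernel functions, and T/P/E suffixes are omitted. It also treats the mem= value as an exclusive upper bound on the physical addresses the kernel will use, which is an assumption about x86 behaviour. Under that assumption, 3900M is 4,089,446,400 bytes, just under the 4,294,967,296-byte (4 GB) line.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified imitation of memparse(): a number with an optional K, M or G
 * suffix (binary multiples). A bare number is taken as bytes. */
static uint64_t parse_mem_size(const char *s)
{
    char *end;
    uint64_t val;

    assert(s != NULL);
    val = strtoull(s, &end, 0);
    switch (*end) {
    case 'G': case 'g':
        val <<= 10;
        /* fall through */
    case 'M': case 'm':
        val <<= 10;
        /* fall through */
    case 'K': case 'k':
        val <<= 10;
        break;
    default:
        break;
    }
    return val;
}

/* True if capping memory at this size keeps every physical address below
 * 4 GB (the cap is assumed to be an exclusive upper bound). */
static int mem_limit_fits_32bit(const char *mem_arg)
{
    return parse_mem_size(mem_arg) <= (UINT64_C(1) << 32);
}
```

Without a suffix, parse_mem_size("3900") returns 3900, that is, 3900 bytes rather than megabytes.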

I'll go on record with Chris:  we should apply the patch at
http://lkml.org/lkml/2007/6/25/293 until we get to the bottom of it.
The patch is in Jeff's queue, but I think he suspects a driver bug and
so far hasn't chosen to apply the patch.

Jeff, can we ask you to please reconsider?

Jay


atl1 driver corrupting memory?

2007-07-25 Thread Chuck Ebbert
I have a report of random errors when using the atl1 driver
with kernel 2.6.22.1. Could that be a problem fixed by the
recent changes to DMA setup in 2.6.23-rc?

