Re: [Dri-devel] Status of AMD 760MP + Radeon lockups?
On Thu, 2002-06-13 at 00:09, Wayne Whitney wrote:
> On Mon, 10 Jun 2002, Linus Torvalds wrote:
> > If you have an AMD system and have seen problems with GART usage, and are willing to test out stuff, please give this a try. I'd love to hear actual user reports about whether this actually solves any problems.
>
> The DRI lockup on switching console back to X I am seeing still occurs with 2.4.19-pre10 plus the pageattr-B1-2.4.19-pre10 patch recently posted to LKML by Ben LaHaise.
>
> For the record, the other relevant details of my system are a Radeon VE QY video card, Tyan S2460 motherboard (AMD 760MP chipset) and RedHat XFree86-4.2.0-8.
>
> On a similar system, replacing the motherboard with an ASUS A7M266-D (AMD 760MPX chipset) eliminated the lockup. As both chipsets use the AMD 762 northbridge, I guess either the motherboards program the northbridge differently, or else there is some interaction with the two different southbridges (AMD 766 in 760MP, AMD 768 in 760MPX).

I experience the same problem on a P4 machine...

--
Earthling Michel Dänzer (MrCooper) / Debian GNU/Linux (powerpc) developer
XFree86 and DRI project member / CS student, Free Software enthusiast

Dri-devel mailing list [EMAIL PROTECTED] https://lists.sourceforge.net/lists/listinfo/dri-devel
Re: [Dri-devel] Status of AMD 760MP + Radeon lockups?
On Mon, 10 Jun 2002, Linus Torvalds wrote:
> If you have an AMD system and have seen problems with GART usage, and are willing to test out stuff, please give this a try. I'd love to hear actual user reports about whether this actually solves any problems.

The DRI lockup on switching console back to X I am seeing still occurs with 2.4.19-pre10 plus the pageattr-B1-2.4.19-pre10 patch recently posted to LKML by Ben LaHaise.

For the record, the other relevant details of my system are a Radeon VE QY video card, Tyan S2460 motherboard (AMD 760MP chipset) and RedHat XFree86-4.2.0-8.

On a similar system, replacing the motherboard with an ASUS A7M266-D (AMD 760MPX chipset) eliminated the lockup. As both chipsets use the AMD 762 northbridge, I guess either the motherboards program the northbridge differently, or else there is some interaction with the two different southbridges (AMD 766 in 760MP, AMD 768 in 760MPX).

Wayne
Re: [Dri-devel] Status of AMD 760MP + Radeon lockups?
AMD just sent out this email about a kernel bug/interaction with the AMD Athlons and AGP GART usage. I'll just quote the whole thing here, it would be interesting to hear whether the suggested patches seem to make any difference to any AMD/Radeon problems..

If you have an AMD system and have seen problems with GART usage, and are willing to test out stuff, please give this a try. I'd love to hear actual user reports about whether this actually solves any problems.

Also, I personally don't think the short-term fix is workable, and if somebody cares to port (and test) the change_page_attr() PAT solution to 2.5.x I'd be thrilled (except I do not think it makes any sense to try to salvage the 4MB page feature - just disable it).

Linus

From: [EMAIL PROTECTED]
Subject: Cache-attribute conflict bug in the kernel exposed on newer AMD Athlon processors
Date: Mon, 10 Jun 2002 20:05:42 -0500

I'm Rich Brunner and I work in AMD's Software Research & Development group. AMD has been working with Andrea Arcangeli, Andi Kleen, and Dave Jones from SuSE in researching what looks like a cache-attribute conflict bug in the Linux kernel that is being exposed by newer versions of AMD's Athlon processors (AthlonXP and AthlonMP). The kernel bug is often exposed in conjunction with use of the AGP Aperture on these platforms. The good news is that a short-term fix is easy to do and there are several long-term fixes that can do an even better job in addressing this.

AMD is still sort of new to the Linux Community; we hope that giving lots of details in this long note is the right way to let the community know about this. We wanted to be crystal clear on the cause and solutions before bothering you. The discussion is laid out below:

1) Architecture and Athlon Processor Background
2) The problem in the Linux Kernel
3) Short-term Linux Kernel Solution
4) Long-term Linux Kernel Solution

We thought we could discuss it privately among ourselves until you felt it was appropriate to post on LKML. There is nothing AMD confidential about this, but, we thought best to contact you first before posting it. We very much appreciate your input and invaluable insight into this problem.

Thanks!
-Rich ...
[[EMAIL PROTECTED]-- (360)-867-0654]
[Senior Member, Technical Staff, SW R&D @ AMD]

1. Architecture and Athlon Processor Background
===============================================

The x86 architecture allows a number of important performance optimizations for memory which is marked as write-back cacheable. One such important optimization allows the processor to speculatively read memory and cache it. Such cache lines can be allocated in the shared, exclusive, or modified states. [1] [4]

The architecture even allows a processor to speculate on some of the sub-operations that will be necessary for an instruction that will write memory. Although the processor can not speculatively commit to memory a speculative write nor make its results visible to software [2], it is allowed to speculatively read the cache line that could be modified into the cache and place it in the modified state without modifying it. [1] (This cache-line read is also referred to as Read-For-Ownership.) If the speculated write instruction is not taken, the line is allowed to remain unmodified, but still marked as modified in the cache. Normal cache eviction can then write the line back to memory at an appropriate time.

Correctness is ensured because the architecture requires cache-coherency for write-back (WB) cacheable data. [3] [4] Thus, if all processors see the data as write-back cacheable, there is never a possibility of data-corruption or stale data. This is the intent of the x86 architecture. [6]

This is an important requirement; the x86 architecture does not support the practice of having a single physical page mapped to two or more different linear addresses (virtual aliasing), each with different memory types because it may lead to undefined operations that can result in a system failure. [5]

By extension, not only are conflicting cache attributes not allowed for virtual aliasing, they are also not allowed for physical aliasing. Physical aliasing is possible through the AGP Aperture which provides a re-mapping table of Aperture physical page addresses to DRAM physical page addresses. We will see below how this causes the actual problem.

Footnotes
---------
[1] AMD x86-64 Architecture Programmer's Manual, Volume 2: System Programming, Revision 3.0, Section 7.3 Memory Types.
[2] Ibid., Section 7.1.2 Write Ordering
[3] Ibid., Section 7.2 Memory Coherency and Protocol
[4] IA-32 Intel Architecture Software Developer's Manual Volume 3: System Programming Guide, Order Number 245472-006, Section 10.3. METHODS OF CACHING AVAILABLE
[5] Ibid., Section 10.12.4. Programming the PAT
[6] Ibid., Section 10.11.8. MTRR Considerations in MP Systems

Speculation for WB data on Newer Athlon Processors
Re: [Dri-devel] Status of AMD 760MP + Radeon lockups?
On Tue, 4 Jun 2002, hy0 wrote:
> Thanks for digging into this problem. Here are a few more things to try according to your feedback.

I again tried all your suggestions at once, and the lockup still occurred. [If it had stopped, I would have done a binary search on the changes.]

I should make some general comments on my setup, in case I am doing something wrong. I have a stock RedHat 7.3 installation, except my kernel is 2.5.20-dj2. I built and installed XFree86 from CVS with

#define ProjectRoot /usr/X11R6-CVS
#define NothingOutsideProjectRoot YES
#define EtcX11Directory ProjectRoot/etc

in xc/config/cf/site.def. Then I changed the link /usr/X11R6/bin/X to point to /usr/X11R6-CVS/bin/XFree86. So I am running gdm from RedHat 7.3, but it runs the new XFree86.

I noticed something else new. Namely, my test now consists of starting gdm, switching to the 1st console, and switching back. With the patched CVS XFree86, when I do this, the screen first shows the prior screen with a box of garbage, then the background is redrawn (and the gdm window is blank), then the gdm window is redrawn. I am even able to type a few characters into gdm, and only then does X lock up. From the XFree86.0.log file (below), it looks as if the point of lockup is the point at which drmRadeonWaitForIdleCP() starts returning -1022 instead of 0.

With the stock RedHat 7.3 XFree86, X locks up just after showing the prior screen with the garbage box (and the garbage box is a different size and location). I didn't test the unpatched XFree86 CVS, although if you like, I could do that, and if it behaves differently from the patched XFree86 CVS, I could do a binary search on the difference until I find out what explains it.

Anyway, that's the current scoop. If you like, I'll be able to try more changes until Saturday; after that I'm going to be swapping out this Tyan S2460 motherboard.
Below is first the full diff between my current patched XFree86 and the version I grabbed from CVS, then the XFree86.0.log from the above test.

Cheers, Wayne

P.S. It's not so important, but for some reason with the XFree86 CVS, my gnome-terminals come up with the wrong background color (black instead of off-white) and the colors in xosview are all wrong.

diff -u radeon_accel.c.~1.25.~ radeon_accel.c
--- radeon_accel.c.~1.25.~  Wed Apr 24 09:20:39 2002
+++ radeon_accel.c  Tue Jun 4 11:48:23 2002
@@ -191,11 +191,13 @@
     int ret;
     int i = 0;
 
-    FLUSH_RING();
+    RADEONTRACE(("RADEONCPWaitForIdle: Skipping FLUSH_RING()\n"));
+    /* FLUSH_RING(); */
 
     for (;;) {
         do {
             ret = drmRadeonWaitForIdleCP(info->drmFD);
+            RADEONTRACE(("RADEONCPWaitForIdle: drmRadeonWaitForIdleCP returned %d\n", ret));
             if (ret && ret != -EBUSY) {
                 xf86DrvMsg(pScrn->scrnIndex, X_ERROR,
                            "%s: CP idle %d\n", __FUNCTION__, ret);
@@ -1572,6 +1574,9 @@
     /* Sync */
     a->Sync = RADEONCPWaitForIdle;
 
+    /* Disable 2D Acceleration */
+    return;
+
     /* Solid Filled Rectangle */
     a->PolyFillRectSolidFlags = 0;
     a->SetupForSolidFill = RADEONCPSetupForSolidFill;
diff -u radeon_dri.c.~1.16.~ radeon_dri.c
--- radeon_dri.c.~1.16.~  Wed Apr 24 09:20:40 2002
+++ radeon_dri.c  Tue Jun 4 11:18:16 2002
@@ -795,6 +795,14 @@
                mode, vendor, device,
                info->PciInfo->vendor,
                info->PciInfo->chipType);
+
+    mode = 0x1f000201;
+
+    xf86DrvMsg(pScreen->myNum, X_INFO,
+               "[agp] Mode 0x%08lx [AGP 0x%04x/0x%04x; Card 0x%04x/0x%04x]\n",
+               mode, vendor, device,
+               info->PciInfo->vendor,
+               info->PciInfo->chipType);
 
     if (drmAgpEnable(info->drmFD, mode) < 0) {
         xf86DrvMsg(pScreen->myNum, X_ERROR, "[agp] AGP not enabled\n");
diff -u radeon_driver.c.~1.56.~ radeon_driver.c
--- radeon_driver.c.~1.56.~  Tue May 14 13:02:34 2002
+++ radeon_driver.c  Tue Jun 4 11:19:58 2002
@@ -4188,7 +4188,7 @@
 }
 
 /* Define PLL registers for requested video mode */
-static void RADEONInitPLLRegisters(RADEONSavePtr save, RADEONPLLPtr pll,
+static void RADEONInitPLLRegisters(ScrnInfoPtr pScrn, RADEONSavePtr save, RADEONPLLPtr pll,
                                    double dot_clock)
 {
     unsigned long freq = dot_clock * 100;
@@ -4240,7 +4240,7 @@
 }
 
 /* Define PLL2 registers for requested video mode */
-static void RADEONInitPLL2Registers(RADEONSavePtr save, RADEONPLLPtr pll,
+static void RADEONInitPLL2Registers(ScrnInfoPtr pScrn, RADEONSavePtr save, RADEONPLLPtr pll,
                                     double dot_clock)
 {
     unsigned long freq = dot_clock * 100;
@@ -4360,14 +4360,14 @@
     if (!RADEONInitCrtc2Registers(pScrn, save, pScrn->currentMode,info))
         return FALSE;
-
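The retry logic visible in the radeon_accel.c hunk (poll, retry only while the result is -EBUSY, report anything else) can be sketched in isolation. This is a minimal sketch and not the driver's code: `wait_for_idle`, `idle_poll_fn`, `struct fake_cp` and `fake_poll` are hypothetical names introduced here, and the poll callback merely stands in for a call such as drmRadeonWaitForIdleCP().

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in for an idle query: returns 0 when idle,
 * -EBUSY while still busy, any other nonzero value on a hard error. */
typedef int (*idle_poll_fn)(void *ctx);

/* Poll until idle. Retries only on -EBUSY, at most max_retries extra times.
 * Returns 0 when idle, -EBUSY if the retries ran out, or the hard error.
 * *calls receives the number of polls actually made. */
static int wait_for_idle(idle_poll_fn poll, void *ctx, int max_retries, int *calls)
{
    int ret;
    int i = 0;

    *calls = 0;
    do {
        ret = poll(ctx);
        (*calls)++;
        if (ret && ret != -EBUSY) {
            /* Anything other than "busy" is reported, as in the hunk above. */
            fprintf(stderr, "wait_for_idle: poll returned %d\n", ret);
        }
    } while (ret == -EBUSY && i++ < max_retries);

    return ret;
}

/* Hypothetical test double: busy for a number of polls, then `final`. */
struct fake_cp {
    int busy_polls_left;
    int final;
};

static int fake_poll(void *ctx)
{
    struct fake_cp *cp = ctx;

    if (cp->busy_polls_left > 0) {
        cp->busy_polls_left--;
        return -EBUSY;
    }
    return cp->final;
}
```

With this shape, a value such as the -1022 reported in the message above is neither 0 nor -EBUSY, so the loop stops after a single poll and the value is logged rather than retried.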
Re: [Dri-devel] Status of AMD 760MP + Radeon lockups?
Thanks for digging into this problem. Here are a few more things to try according to your feedback.

> On Sun, 26 May 2002, hy0 wrote:
> > This one (VT switching lockup with DRI) has been haunting us for a while. It appears to be hardware (AGP chipset) related.
>
> Yes, and here is something a bit odd: in one of my boxes, replacing a Tyan S2460 motherboard (AMD 760MP chipset) with an ASUS A7M266-D motherboard (AMD 760MPX chipset) got rid of the problem. But the 760MP and 760MPX chipsets have the same northbridge, the AMD 762, and differ only in the southbridge (AMD 766 vs AMD 768). I just checked, the two boards have the same revision of the AMD 762. So shouldn't these motherboards be identical from the AGP point of view? Unless the BIOSes set up the northbridge differently on each machine.

What does the [agp] Mode... line say with ASUS A7M266-D motherboard?

> > Unfortunately I can't reproduce this problem on all my boxes. There are a few things you can try to narrow the problem down:
> >
> > 1. What is the agp mode used by drmAgpEnable call? This should already be in your log file -- search for '[agp] Mode' line.
>
> If I don't put any Option "AGPMode" line in my XF86Config, it reads [agp] Mode 0x0f000211 [AGP 0x1022/0x700c; Card 0x1002/0x5159]. With Option "AGPMode" "4", the first hex value is instead 0x0f000217.

Right before drmAgpEnable call in radeon_dri.c, try to add following line:

    mode = 0x1f000201;

Can it make any difference? (After making the change, you don't need to recompile the whole X server, just go to ...xfree86/drivers/ati directory, do a make install there, then restart X.)

> > 2. Try to verify if the lockup happens in RADEONCP_START call (from RADEONEnterVT in radeon_driver.c). If you can still remote login or do a hot reboot after the lockup, this can be easily verified by adding some log messages around that call.
>
> It happens after RADEONCP_START. Well, I decided to try all your suggestions at once (see below), so all I can say is that with sleep(1) before and after RADEONCP_START, the lockup happens after RADEONCP_START.
>
> > Also what does the dmesg say after the lockup?
>
> Nothing--the lockup appears to be only X (and hence the console). I don't have a machine handy to remotely login with, but if I did, I bet I could kill X and then if I could reinitialize the video card and console, I'd be back in business.
>
> > 3. Since you can see some drawings, the lockup seems to happen later (after the CP_START call). If that's the case, try to add some delay (sleep(1)) before and after RADEONCP_START in RADEONEnterVT. If it doesn't help, you can add a "return;" right after a->sync = ... in RADEONCPAccelInit of radeon_accel.c. This will disable all 2D acceleration routines, just to see
>
> OK, I decided to try everything you suggested at once, so as to only recompile X once. Below is first the patch I used (relative to the directory xc/programs/Xserver/hw/xfree86/drivers/ati), then the full XFree86.0.log. I turned on RADEON_DEBUG, and I had to fix a couple of things to get it to compile with RADEON_DEBUG turned on.
>
> I should note that without this patch, when switching back to X, it just shows the screen with the top just garbage, then is frozen (I'm guessing this is because the chipset is reconfigured for the graphics display, and it is just showing the contents of the framebuffer, which is what it was when I switched to the text VT, but the top part was scribbled over by the text VT). With the patch, there are clearly three different screens: first I would say the screen with the top scribbled, then the screen without the top scribbled, but it is still not quite right (maybe the border is funny?), then the screen with the top scribbled again. Anyway, it was still kind of fast, so I don't know if my impressions are accurate or that useful.

Add a trace after DRIUnlock in RADEONEnterVT, just in case it locks up there (unlikely though). Leave all acceleration routines disabled (return after a->Sync = RADEONCPWaitForIdle). In RADEONCPWaitForIdle of radeon_accel.c, comment out FLUSH_RING() and add a log message there. Also add a trace right after the drmATIWaitForIdleCP call, and check what this call returns. Hopefully this can further narrow down where the lockup occurs.

Thanks.
Hui

> Cheers, Wayne
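The three [agp] Mode values that appear in this exchange (0x0f000211, 0x0f000217 and the suggested 0x1f000201) are easier to compare once split into fields. The following is a minimal sketch that assumes the value passed to drmAgpEnable follows the usual AGP 2.0 status/command register layout (rate bits 2:0, fast writes bit 4, AGP enable bit 8, sideband addressing bit 9, request-queue field bits 31:24). That layout is an assumption about these log values, not something the thread states, and `struct agp_mode`, `agp_mode_decode` and `agp_mode_print` are hypothetical helper names.

```c
#include <stdint.h>
#include <stdio.h>

/* Decoded view of a 32-bit AGP mode word (assumed AGP 2.0 layout). */
struct agp_mode {
    unsigned rate_bits;    /* raw bits 2:0; bit 0 = 1x, bit 1 = 2x, bit 2 = 4x */
    unsigned fast_writes;  /* bit 4 */
    unsigned agp_enable;   /* bit 8 */
    unsigned sideband;     /* bit 9 */
    unsigned rq_field;     /* raw bits 31:24 (request queue field) */
};

static struct agp_mode agp_mode_decode(uint32_t mode)
{
    struct agp_mode m;

    m.rate_bits   = mode & 0x7u;
    m.fast_writes = (mode >> 4) & 1u;
    m.agp_enable  = (mode >> 8) & 1u;
    m.sideband    = (mode >> 9) & 1u;
    m.rq_field    = (mode >> 24) & 0xffu;
    return m;
}

/* Print one line per mode word so that several values can be compared. */
static void agp_mode_print(uint32_t mode)
{
    struct agp_mode m = agp_mode_decode(mode);

    printf("0x%08lx: rate=%s%s%s fw=%u enable=%u sba=%u rq=0x%02x\n",
           (unsigned long)mode,
           (m.rate_bits & 1u) ? "1x " : "",
           (m.rate_bits & 2u) ? "2x " : "",
           (m.rate_bits & 4u) ? "4x " : "",
           m.fast_writes, m.agp_enable, m.sideband, m.rq_field);
}
```

Under that assumed layout, 0x0f000211 has only the 1x rate bit set plus fast writes and sideband addressing, 0x0f000217 sets all three rate bits, and 0x1f000201 keeps sideband addressing and the 1x bit but clears the fast-writes bit and raises the request-queue field from 0x0f to 0x1f.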
Re: [Dri-devel] Status of AMD 760MP + Radeon lockups?
On Tue, 4 Jun 2002, hy0 wrote:
> What does the [agp] Mode... line say with ASUS A7M266-D motherboard?

On the ASUS A7M266-D motherboard (with Option "AGPMode" "4" in /etc/X11/XF86Config), it says

[agp] Mode 0x0f000217 [AGP 0x1022/0x700c; Card 0x1002/0x5159]

This is the same as on the Tyan S2460.

I'll try your other suggestions later this week. Actually, I won't have easy access to the Tyan S2460 that much longer, probably just this week.

Cheers, Wayne
Re: [Dri-devel] Status of AMD 760MP + Radeon lockups?
The VT switching lockup problem with DRI is different from the one (AMD 761) discussed lately. The XFree CVS or RH73 code has the fix for that one, see http://www.geocrawler.com/lists/3/SourceForge/2634/25/8680261/.

This one (VT switching lockup with DRI) has been haunting us for a while. It appears to be hardware (AGP chipset) related. Unfortunately I can't reproduce this problem on all my boxes. There are a few things you can try to narrow the problem down:

1. What is the agp mode used by drmAgpEnable call? This should already be in your log file -- search for '[agp] Mode' line.

2. Try to verify if the lockup happens in RADEONCP_START call (from RADEONEnterVT in radeon_driver.c). If you can still remote login or do a hot reboot after the lockup, this can be easily verified by adding some log messages around that call. Also what does the dmesg say after the lockup?

3. Since you can see some drawings, the lockup seems to happen later (after the CP_START call). If that's the case, try to add some delay (sleep(1)) before and after RADEONCP_START in RADEONEnterVT. If it doesn't help, you can add a "return;" right after a->sync = ... in RADEONCPAccelInit of radeon_accel.c. This will disable all 2D acceleration routines, just to see if it can make any difference.

Hui

----- Original Message -----
From: Wayne Whitney [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Saturday, May 25, 2002 10:55 AM
Subject: [Dri-devel] Status of AMD 760MP + Radeon lockups?

Hello,

I noticed a thread in April, 2002 about DRI lockups people were seeing when using a Radeon card with the AMD 760MP chipset. I didn't see a resolution, though, and as I am seeing the same thing now, I wanted to ask what the status is.

I'm using a Radeon VE QY with a Tyan S2460 motherboard, and whenever I enable DRI, switching from a text console back to X causes X to lock up. (But the kernel is OK, I can use Alt-Sysrq.) The screen shows the expected contents except for a rectangle of garbage near the top. I don't have a different video card to try, but if I disable DRI, or if I use an AMD 760MPX based motherboard (Asus A7M266-D), the problem goes away.

FWIW, the BIOS update pages on the Tyan S2460 and S2462 (the only AMD 760MP motherboards available) both show that earlier versions of the BIOS had a problem with Radeon cards reinitializing the display on warm boots. I don't know if this former 760MP + Radeon BIOS problem is related to the current 760MP + Radeon DRI problem.

I noticed the following Changelog entry in the xfree86.org CVS:

114. Fixes for DRI lockup problems with Radeon 7500/VE and the AMD 761 chipset (Hui Yu@ATI).

Of course, the AMD 760MP uses the AMD 762 northbridge, but I thought this might be related. So I compiled the latest xfree86 CVS and tried it. I'm running kernel 2.5.15-dj2, so I also grabbed kernel 2.5.18, which includes a DRI CVS merge, and used the drivers/char/drm code from it to compile the kernel DRM module. Unfortunately, this combination still shows the lockups.

Any other suggestions on what to try? Or is there further information I should provide? I noticed that in the April thread, the person reporting the problem eventually provided a trace from a static X server, but I didn't see a response after that. If it would be helpful to have another trace, I could try to capture one.

Thanks, Wayne