Re: [Dri-devel] Status of AMD 760MP + Radeon lockups?

2002-06-15 Thread Michel Dänzer

On Thu, 2002-06-13 at 00:09, Wayne Whitney wrote: 
> On Mon, 10 Jun 2002, Linus Torvalds wrote:
>
> > If you have an AMD system and have seen problems with GART usage, and
> > are willing to test out stuff, please give this a try. I'd love to hear
> > actual user reports about whether this actually solves any problems.
>
> The DRI lockup on switching console back to X I am seeing still occurs
> with 2.4.19-pre10 plus the pageattr-B1-2.4.19-pre10 patch recently posted
> to LKML by Ben LaHaise.  For the record, the other relevant details of my
> system are a Radeon VE QY video card, Tyan S2460 motherboard (AMD 760MP
> chipset) and RedHat XFree86-4.2.0-8.
>
> On a similar system, replacing the motherboard with an ASUS A7M266-D (AMD
> 760MPX chipset) eliminated the lockup.  As both chipsets use the AMD 762
> northbridge, I guess either the motherboards program the northbridge
> differently, or else there is some interaction with the two different
> southbridges (AMD 766 in 760MP, AMD 768 in 760MPX).

I experience the same problem on a P4 machine...


-- 
Earthling Michel Dänzer (MrCooper)/ Debian GNU/Linux (powerpc) developer
XFree86 and DRI project member   /  CS student, Free Software enthusiast

___

Don't miss the 2002 Sprint PCS Application Developer's Conference
August 25-28 in Las Vegas - 
http://devcon.sprintpcs.com/adp/index.cfm?source=osdntextlink

___
Dri-devel mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/dri-devel



Re: [Dri-devel] Status of AMD 760MP + Radeon lockups?

2002-06-12 Thread Wayne Whitney

On Mon, 10 Jun 2002, Linus Torvalds wrote:

> If you have an AMD system and have seen problems with GART usage, and
> are willing to test out stuff, please give this a try. I'd love to hear
> actual user reports about whether this actually solves any problems.

The DRI lockup on switching console back to X I am seeing still occurs
with 2.4.19-pre10 plus the pageattr-B1-2.4.19-pre10 patch recently posted
to LKML by Ben LaHaise.  For the record, the other relevant details of my
system are a Radeon VE QY video card, Tyan S2460 motherboard (AMD 760MP
chipset) and RedHat XFree86-4.2.0-8.

On a similar system, replacing the motherboard with an ASUS A7M266-D (AMD
760MPX chipset) eliminated the lockup.  As both chipsets use the AMD 762
northbridge, I guess either the motherboards program the northbridge
differently, or else there is some interaction with the two different
southbridges (AMD 766 in 760MP, AMD 768 in 760MPX).

Wayne






Re: [Dri-devel] Status of AMD 760MP + Radeon lockups?

2002-06-10 Thread Linus Torvalds


AMD just sent out this email about a kernel bug/interaction with the AMD
Athlons and AGP GART usage. I'll just quote the whole thing here, it would
be interesting to hear whether the suggested patches seem to make any
difference to any AMD/Radeon problems..

If you have an AMD system and have seen problems with GART usage, and are
willing to test out stuff, please give this a try. I'd love to hear actual
user reports about whether this actually solves any problems.

Also, I personally don't think the short-term fix is workable, and if
somebody cares to port (and test) the change_page_attr() PAT solution to
2.5.x I'd be thrilled (except I do not think it makes any sense to try to
salvage the 4MB page feature - just disable it).

Linus


From: [EMAIL PROTECTED]
Subject: Cache-attribute conflict bug in the kernel exposed on newer AMD Athlon 
processors
Date: Mon, 10 Jun 2002 20:05:42 -0500

I'm Rich Brunner and I work in AMD's Software Research &
Development group. AMD has been working with Andrea
Arcangeli, Andi Kleen, and Dave Jones from SuSE in
researching what looks like a cache-attribute conflict bug
in the Linux kernel that is being exposed by newer versions
of AMD's Athlon processors (AthlonXP and AthlonMP). The
kernel bug is often exposed in conjunction with use of the
AGP Aperture on these platforms.

The good news is that a short-term fix is easy to do and
there are several long-term fixes that can do an even better
job in addressing this.

AMD is still sort of new to the Linux Community; we hope that
giving lots of details in this long note is the right way to
let the community know about this.  We wanted to be crystal
clear on the cause and solutions before bothering you.  The
discussion is laid out below:

  1) Architecture & Athlon Processor Background

  2) The problem in the Linux Kernel

  3) Short-term Linux Kernel Solution

  4) Long-term Linux Kernel Solution


We thought we could discuss it privately among ourselves
until you felt it was appropriate to post on LKML. There is
nothing AMD confidential about this, but, we thought best to
contact you first before posting it.

We very much appreciate your input and invaluable insight
into this problem.

Thanks!

-Rich ...

[[EMAIL PROTECTED]-- (360)-867-0654]
[Senior Member, Technical Staff, SW R&D @ AMD]




1. Architecture and Athlon Processor Background
===============================================
The x86 architecture allows a number of important
performance optimizations for memory which is marked as
write-back cacheable.  One such important optimization
allows the processor to speculatively read memory and cache
it. Such cache lines can be allocated in the shared,
exclusive, or modified states. [1] [4]

The architecture even allows a processor to speculate on
some of the sub-operations that will be necessary for an
instruction that will write memory. Although the processor
can not speculatively commit to memory a speculative write
nor make its results visible to software [2], it is allowed
to speculatively read the cache line that could be modified
into the cache and place it in the modified state without
modifying it. [1] (This cache-line read is also referred to
as Read-For-Ownership.) If the speculated write instruction
is not taken, the line is allowed to remain unmodified,
but still marked as modified in the cache. Normal cache
eviction can then write the line back to memory at an
appropriate time.
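
The behaviour described above can be modelled with a toy cache line. This is only a sketch under simplifying assumptions: a single line, a single integer of "memory", and hypothetical names (`speculative_rfo`, `evict`) that do not correspond to any real processor interface.

```c
#include <assert.h>

/* The four line states named in the text, plus "invalid" for an empty line. */
enum line_state { INVALID, SHARED, EXCLUSIVE, MODIFIED };

struct cache_line {
    enum line_state state;
    int data;                 /* cached copy of the memory word */
};

/* Speculative read-for-ownership: fetch the line and mark it MODIFIED
 * even though no store has been committed yet. */
void speculative_rfo(struct cache_line *l, const int *mem)
{
    assert(l && mem);
    l->data = *mem;
    l->state = MODIFIED;
}

/* Eviction: a MODIFIED line is written back whether or not its data
 * ever changed.  Returns 1 if a write-back to memory happened. */
int evict(struct cache_line *l, int *mem)
{
    int wrote = 0;

    assert(l && mem);
    if (l->state == MODIFIED) {
        *mem = l->data;
        wrote = 1;
    }
    l->state = INVALID;
    return wrote;
}
```

In this model, calling `speculative_rfo()` followed by `evict()` writes the unchanged value back. If the memory word were updated in between through a path the cache does not observe, the write-back would restore the older value, which is why the note stresses coherency.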

Correctness is ensured because the architecture requires
cache-coherency for write-back (WB) cacheable data. [3] [4]
Thus, if all processors see the data as write-back
cacheable, there is never a possibility of data-corruption
or stale data. This is the intent of the x86
architecture. [6]

This is an important requirement; the x86 architecture does
not support the practice of having a single physical page
mapped to two or more different linear addresses (virtual
aliasing), each with different memory types because it may
lead to undefined operations that can result in a system
failure. [5] By extension, not only are conflicting cache
attributes not allowed for virtual aliasing, they are also
not allowed for physical aliasing.

Physical aliasing is possible through the AGP Aperture
which provides a re-mapping table of Aperture physical page
addresses to DRAM physical page addresses. We will see below
how this causes the actual problem.
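
As an illustration of physical aliasing through a remapping table, here is a minimal sketch. The 4 KB page size, the flat one-level table, and all names (`struct gart`, `gart_translate`, `gart_aliases`) are assumptions made for the example, not the layout of any real chipset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12u
#define PAGE_MASK  ((1u << PAGE_SHIFT) - 1u)

/* Toy remapping table: entry i holds the DRAM page frame number that
 * backs aperture page i. */
struct gart {
    uint32_t        base;     /* physical base address of the aperture */
    size_t          npages;   /* number of aperture pages */
    const uint32_t *frames;   /* DRAM page frame numbers */
};

/* Translate a physical address; addresses outside the aperture pass
 * through unchanged. */
uint32_t gart_translate(const struct gart *g, uint32_t phys)
{
    uint32_t page;

    assert(g != NULL);
    if (phys < g->base)
        return phys;
    page = (phys - g->base) >> PAGE_SHIFT;
    if (page >= g->npages)
        return phys;
    return (g->frames[page] << PAGE_SHIFT) | (phys & PAGE_MASK);
}

/* Two distinct physical addresses alias when they reach the same DRAM byte. */
int gart_aliases(const struct gart *g, uint32_t a, uint32_t b)
{
    return a != b && gart_translate(g, a) == gart_translate(g, b);
}
```

With a table whose second entry is frame 0xabc and an aperture based at 0xE0000000, the addresses 0xE0001010 and 0x00abc010 name the same DRAM byte, so each can carry its own cache attribute unless software keeps them consistent.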


Footnotes
---------
[1] AMD x86-64 Architecture Programmer's Manual, Volume 2:
System Programming, Revision 3.0, Section 7.3 Memory Types.
[2] Ibid., Section 7.1.2 Write Ordering
[3] Ibid., Section 7.2 Memory Coherency and Protocol
[4] IA-32 Intel Architecture Software Developer's Manual
Volume 3: System Programming Guide, Order Number
245472-006, Section 10.3. METHODS OF CACHING AVAILABLE
[5] Ibid., Section 10.12.4. Programming the PAT
[6] Ibid., Section 10.11.8. MTRR Considerations in MP Systems



Speculation for WB data on Newer Athlon Processors

Re: [Dri-devel] Status of AMD 760MP + Radeon lockups?

2002-06-04 Thread Wayne Whitney

On Tue, 4 Jun 2002, hy0 wrote:

> Thanks for digging into this problem.
> Here are a few more things to try according to your feedback.

I again tried all your suggestions at once, and the lockup still occurred.
[If it had stopped, I would have done a binary search on the changes.]

I should make some general comments on my set up, in case I am doing
something wrong.  I have a stock RedHat 7.3 installation, except my kernel
is 2.5.20-dj2.  I built and installed XFree86 from CVS with
  #define ProjectRoot /usr/X11R6-CVS
  #define NothingOutsideProjectRoot YES
  #define EtcX11Directory ProjectRoot/etc
in xc/config/cf/site.def.  Then I changed the link /usr/X11R6/bin/X to 
point to /usr/X11R6-CVS/bin/XFree86.  So I am running gdm from RedHat 7.3, 
but it runs the new XFree86.

I noticed something else new.  Namely, my test now consists of starting
gdm, switching to the 1st console, and switching back.  With the patched
CVS XFree86, when I do this, the screen first shows the prior screen with
a box of garbage, then the background is redrawn (and the gdm window is
blank), then the gdm window is redrawn.  I am even able to type in a few
characters to gdm, and only then does X lock up.  From the XFree86.0.log
file (below), it looks as if the point of lockup is the point at which
drmRadeonWaitForIdleCP() starts returning -1022 instead of 0.

With the stock RedHat 7.3 XFree86, X locks up just after showing the prior
screen with the garbage box (and the garbage box is a different size and
location).  I didn't test the unpatched XFree86 CVS, although if you like,
I could do that, and if it behaves differently from the patched XFree86
CVS, I could do a binary search on the difference until I find out what
explains the difference.

Anyway, that's the current scoop.  If you like, I'll be able to try more
changes until Saturday; after that I'm going to be swapping out this Tyan
S2460 motherboard.

Below is first the full diff between my current patched XFree86 and
the version I grabbed from CVS, then the XFree86.0.log from the above
test.

Cheers,
Wayne

P.S.  It's not so important, but for some reason with the XFree86 CVS, my
gnome-terminals come up with the wrong background color (black instead of
off-white) and the colors in xosview are all wrong.



diff -u radeon_accel.c.~1.25.~ radeon_accel.c
--- radeon_accel.c.~1.25.~  Wed Apr 24 09:20:39 2002
+++ radeon_accel.c  Tue Jun  4 11:48:23 2002
@@ -191,11 +191,13 @@
     int         ret;
     int         i    = 0;
 
-    FLUSH_RING();
+    RADEONTRACE(("RADEONCPWaitForIdle: Skipping FLUSH_RING()\n"));
+    /* FLUSH_RING(); */
 
     for (;;) {
         do {
             ret = drmRadeonWaitForIdleCP(info->drmFD);
+            RADEONTRACE(("RADEONCPWaitForIdle: drmRadeonWaitForIdleCP returned %d\n", ret));
             if (ret && ret != -EBUSY) {
                 xf86DrvMsg(pScrn->scrnIndex, X_ERROR,
                            "%s: CP idle %d\n", __FUNCTION__, ret);
@@ -1572,6 +1574,9 @@
                                 /* Sync */
     a->Sync                     = RADEONCPWaitForIdle;
 
+    /* Disable 2D Acceleration */
+    return;
+
                                 /* Solid Filled Rectangle */
     a->PolyFillRectSolidFlags   = 0;
     a->SetupForSolidFill        = RADEONCPSetupForSolidFill;
diff -u radeon_dri.c.~1.16.~ radeon_dri.c
--- radeon_dri.c.~1.16.~Wed Apr 24 09:20:40 2002
+++ radeon_dri.cTue Jun  4 11:18:16 2002
@@ -795,6 +795,14 @@
                mode, vendor, device,
                info->PciInfo->vendor,
                info->PciInfo->chipType);
+
+    mode = 0x1f000201;
+
+    xf86DrvMsg(pScreen->myNum, X_INFO,
+               "[agp] Mode 0x%08lx [AGP 0x%04x/0x%04x; Card 0x%04x/0x%04x]\n",
+               mode, vendor, device,
+               info->PciInfo->vendor,
+               info->PciInfo->chipType);
 
     if (drmAgpEnable(info->drmFD, mode) < 0) {
         xf86DrvMsg(pScreen->myNum, X_ERROR, "[agp] AGP not enabled\n");
diff -u radeon_driver.c.~1.56.~ radeon_driver.c
--- radeon_driver.c.~1.56.~ Tue May 14 13:02:34 2002
+++ radeon_driver.c Tue Jun  4 11:19:58 2002
@@ -4188,7 +4188,7 @@
 }
 
 /* Define PLL registers for requested video mode */
-static void RADEONInitPLLRegisters(RADEONSavePtr save, RADEONPLLPtr pll,
+static void RADEONInitPLLRegisters(ScrnInfoPtr pScrn, RADEONSavePtr save, RADEONPLLPtr pll,
                                    double dot_clock)
 {
     unsigned long  freq = dot_clock * 100;
@@ -4240,7 +4240,7 @@
 }
 
 /* Define PLL2 registers for requested video mode */
-static void RADEONInitPLL2Registers(RADEONSavePtr save, RADEONPLLPtr pll,
+static void RADEONInitPLL2Registers(ScrnInfoPtr pScrn, RADEONSavePtr save, RADEONPLLPtr pll,
                                     double dot_clock)
 {
     unsigned long  freq = dot_clock * 100;
@@ -4360,14 +4360,14 @@
         if (!RADEONInitCrtc2Registers(pScrn, save,
                                       pScrn->currentMode,info))
             return FALSE;
-   

Re: [Dri-devel] Status of AMD 760MP + Radeon lockups?

2002-06-03 Thread hy0

Thanks for digging into this problem.
Here are a few more things to try according to your feedback.

> On Sun, 26 May 2002, hy0 wrote:

> > This one (VT switching lockup with DRI) has been haunting us for a
> > while. It appears to be hardware (AGP chipset) related.

> Yes, and here is something a bit odd:  in one of my boxes, replacing a
> Tyan S2460 motherboard (AMD 760MP chipset) with an ASUS A7M266-D
> motherboard (AMD 760MPX chipset) got rid of the problem.  But the 760MP
> and 760MPX chipset have the same northbridge, the AMD 762, and differ only
> in the southbridge (AMD 766 vs AMD 768).  I just checked, the two boards
> have the same revision of the AMD 762.  So shouldn't these motherboards be
> identical from the AGP point of view?  Unless the BIOSes set up the
> northbridge differently on each machine.

What does the "[agp] Mode..." line say with the ASUS A7M266-D motherboard?

> > Unfortunately I can't reproduce this problem on all my boxes. There are
> > a few things you can try to narrow the problem down:
> >
> > 1. What is the agp mode used by drmAgpEnable call? This should already
> > be in your log file -- search for '[agp] Mode' line.

> If I don't put any Option "AGPMode" line in my XF86Config, it reads
> "[agp] Mode 0x0f000211 [AGP 0x1022/0x700c; Card 0x1002/0x5159]".  With
> Option "AGPMode" "4", the first hex value is instead 0x0f000217.

Right before the drmAgpEnable call in radeon_dri.c, try adding the following line:
    mode = 0x1f000201;
Does it make any difference?
(After making the change, you don't need to recompile the whole X server;
just go to the ...xfree86/drivers/ati directory, do a make install there, then
restart X.)
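
For reference, the mode words quoted in this exchange can be pulled apart with a small helper. This is a sketch that assumes the AGP 2.0 status/command register layout (rate in bits 2:0, fast writes in bit 4, sideband addressing in bit 9, request queue field in bits 31:24); the function names are made up for illustration and are not part of the driver or of libdrm.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Field layout assumed from the AGP 2.0 status/command registers. */
#define AGP_RATE_MASK 0x00000007u   /* bit 0 = 1x, bit 1 = 2x, bit 2 = 4x */
#define AGP_FW        0x00000010u   /* fast writes */
#define AGP_SBA       0x00000200u   /* sideband addressing */
#define AGP_RQ_SHIFT  24            /* request queue field, bits 31:24 */

/* Highest transfer rate advertised in the rate bits (0 if none is set). */
unsigned agp_max_rate(uint32_t mode)
{
    uint32_t rate = mode & AGP_RATE_MASK;

    if (rate & 4u) return 4;
    if (rate & 2u) return 2;
    if (rate & 1u) return 1;
    return 0;
}

int agp_fast_writes(uint32_t mode)   { return (mode & AGP_FW) != 0; }
int agp_sideband(uint32_t mode)      { return (mode & AGP_SBA) != 0; }
unsigned agp_rq_field(uint32_t mode) { return mode >> AGP_RQ_SHIFT; }

/* Print a one-line summary of a mode word (hypothetical helper). */
void agp_describe(FILE *out, uint32_t mode)
{
    assert(out != NULL);
    fprintf(out, "0x%08lx: max rate %ux, FW %d, SBA %d, RQ field 0x%02x\n",
            (unsigned long)mode, agp_max_rate(mode),
            agp_fast_writes(mode), agp_sideband(mode), agp_rq_field(mode));
}
```

Under that assumed layout, 0x0f000211 and 0x0f000217 differ only in the rate bits (1x versus 1x/2x/4x all set), while the suggested 0x1f000201 keeps 1x and sideband addressing but clears the fast-writes bit.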

> > 2. Try to verify if the lockup happens in RADEONCP_START call (from
> > RADEONEnterVT in radeon_driver.c). If you can still remote login or do a
> > hot reboot after the lockup, this can be easily verified by adding some
> > log messages around that call.

> It happens after RADEONCP_START.  Well, I decided to try all your
> suggestions at once (see below), so all I can say is that with sleep(1)
> before and after RADEONCP_START, the lockup happens after RADEONCP_START.

> > Also what does the dmesg say after the lockup?

> Nothing--the lockup appears to be only X (and hence the console).  I don't
> have a machine handy to remotely login with, but if I did, I bet I could
> kill X and then if I could reinitialize the video card and console, I'd be
> back in business.

> > 3. Since you can see some drawings, the lockup seems to happen later
> > (after the CP_START call). If that's the case, try to add some delay
> > (sleep(1)) before and after RADEONCP_START in RADEONEnterVT. If it
> > doesn't help, you can add a "return;" right after "a->Sync = ..." in
> > RADEONCPAccelInit of radeon_accel.c. This will disable all 2D
> > acceleration routines, just to see

> OK, I decided to try everything you suggested at once, so as to only
> recompile X once.  Below is first the patch I used (relative to the
> directory xc/programs/Xserver/hw/xfree86/drivers/ati), then the full
> XFree86.0.log.  I turned on RADEON_DEBUG, and I had to fix a couple things
> to get it to compile with RADEON_DEBUG turned on.

> I should note that without this patch, when switching back to X, it just
> shows the screen with the top just garbage, then is frozen (I'm guessing
> this is because the chipset is reconfigured for the graphics display, and
> it is just showing the contents of the framebuffer, which is what it was
> when I switched to the text VT, but the top part was scribbled over by the
> text VT).  With the patch, there's clearly three different screens: first
> I would say the screen with the top scribbled, then the screen without the
> top scribbled, but it is still not quite right (maybe the border is
> funny?), then the screen with the top scribbled again.  Anyway, it was
> still kind of fast, so I don't know if my impressions are accurate or that
> useful.

Add a trace after DRIUnlock in RADEONEnterVT, just in case it locks up there
(unlikely though).
Leave all acceleration routines disabled (return after "a->Sync =
RADEONCPWaitForIdle"). In RADEONCPWaitForIdle of radeon_accel.c, comment out
FLUSH_RING() and add a log message there. Also add a trace right after the
drmATIWaitForIdleCP call to check what this call returns.

Hopefully this can further narrow down where the lockup occurs. Thanks.

Hui

> Cheers,
> Wayne







Re: [Dri-devel] Status of AMD 760MP + Radeon lockups?

2002-06-03 Thread Wayne Whitney

On Tue, 4 Jun 2002, hy0 wrote:

> What does the "[agp] Mode..." line say with ASUS A7M266-D motherboard?

On the ASUS A7M266-D motherboard (with Option "AGPMode" "4" in
/etc/X11/XF86Config), it says "[agp] Mode 0x0f000217 [AGP 0x1022/0x700c;
Card 0x1002/0x5159]".  This is the same as on the Tyan S2460.

I'll try your other suggestions later this week.  Actually, I won't have
easy access to the Tyan S2460 that much longer, probably just this week.

Cheers,
Wayne






Re: [Dri-devel] Status of AMD 760MP + Radeon lockups?

2002-05-26 Thread hy0

The VT switching lockup problem with DRI is different from the one (AMD 761)
discussed lately. The XFree CVS or RH73 code has the fix for that one, see
http://www.geocrawler.com/lists/3/SourceForge/2634/25/8680261/.
This one (VT switching lockup with DRI) has been haunting us for a while. It
appears to be hardware (AGP chipset) related. Unfortunately I can't
reproduce this problem on all my boxes. There are a few things you can try
to narrow the problem down:
1. What is the agp mode used by drmAgpEnable call? This should already be in
your log file -- search for '[agp] Mode' line.
2. Try to verify if the lockup happens in RADEONCP_START call (from
RADEONEnterVT in radeon_driver.c). If you can still remote login or do a hot
reboot after the lockup, this can be easily verified by adding some log
messages around that call. Also what does the dmesg say after the lockup?
3. Since you can see some drawings, the lockup seems to happen later (after
the CP_START call). If that's the case, try to add some delay (sleep(1))
before and after RADEONCP_START in RADEONEnterVT. If it doesn't help, you
can add a "return;" right after "a->Sync = ..." in RADEONCPAccelInit of
radeon_accel.c. This will disable all 2D acceleration routines, just to see
if it can make any difference.
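
The "log messages around that call" idea in steps 2 and 3 might be sketched as below. This is a standalone illustration, not driver code: `trace_mark`, `traced_call` and `bump` are hypothetical names, and a real driver would use its own logging (for example xf86DrvMsg) rather than stdio.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Write one marker and flush it right away, so the last line in the log
 * is still there if the next step hangs. */
void trace_mark(FILE *log, const char *where, const char *step)
{
    assert(log && where && step);
    fprintf(log, "(**) %s: %s\n", where, step);
    fflush(log);
}

/* Bracket a suspect call with markers and optional delays (in seconds). */
void traced_call(FILE *log, const char *name,
                 void (*fn)(void *), void *arg, unsigned delay_s)
{
    assert(fn != NULL);
    trace_mark(log, name, "before");
    if (delay_s)
        sleep(delay_s);
    fn(arg);
    if (delay_s)
        sleep(delay_s);
    trace_mark(log, name, "after");
}

/* Hypothetical stand-in for the suspect call: counts its invocations. */
void bump(void *arg)
{
    ++*(int *)arg;
}
```

If the log ends with the "before" marker and no "after", the hang is inside the bracketed call; if both markers appear, it happens later.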

Hui


- Original Message -
From: Wayne Whitney [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Saturday, May 25, 2002 10:55 AM
Subject: [Dri-devel] Status of AMD 760MP + Radeon lockups?


> Hello,
>
> I noticed a thread in April, 2002 about DRI lockups people were seeing
> when using a Radeon card with the AMD 760MP chipset.  I didn't see a
> resolution, though, and as I am seeing the same thing now, I wanted to ask
> what the status is.
>
> I'm using a Radeon VE QY with a Tyan S2460 motherboard, and whenever I
> enable DRI, switching from a text console back to X causes X to lock up.
> (But the kernel is OK, I can use Alt-Sysrq.)  The screen shows the
> expected contents except for a rectangle of garbage near the top.  I don't
> have a different video card to try, but if I disable DRI, or if I use an
> AMD 760MPX based motherboard (Asus A7M266-D), the problem goes away.
> FWIW, the BIOS update pages on the Tyan S2460 and S2462 (the only AMD
> 760MP motherboards available) both show that earlier versions of the BIOS
> had a problem with Radeon cards reinitializing the display on warm boots.
> I don't know if this former 760MP + Radeon BIOS problem is related to the
> current 760MP + Radeon DRI problem.
>
> I noticed the following Changelog entry in the xfree86.org CVS:  "114.
> Fixes for DRI lockup problems with Radeon 7500/VE and the AMD 761 chipset
> (Hui Yu@ATI)."  Of course, the AMD 760MP uses the AMD 762 northbridge, but
> I thought this might be related.  So I compiled the latest xfree86 CVS and
> tried it.  I'm running kernel 2.5.15-dj2, so I also grabbed kernel 2.5.18,
> which includes a DRI CVS merge, and used the drivers/char/drm code from it
> to compile the kernel DRM module.  Unfortunately, this combination still
> shows the lockups.
>
> Any other suggestions on what to try?  Or is there further information I
> should provide?  I noticed that in the April thread, the person reporting
> the problem eventually provided a trace from a static X server, but I
> didn't see a response after that.  If it would be helpful to have another
> trace, I could try to capture one.
>
> Thanks,
> Wayne


