Bug#261251: xserver-xfree86: [ati/radeon+crusoe] Xclients are crashing at start
OK, I have made some progress with this bug, but now I am stuck and need help deciding what to try next.

Here are the facts. I have a Sony Vaio C1MZX laptop with a Transmeta Crusoe TM5800. The graphics card is an ATI Radeon Mobility M6 LY. From time to time the Xserver stops working properly and refuses new connections from any Xclient. The Xserver itself keeps running, but any Xclient trying to connect gets a message from the Xserver that makes it crash (see the gdb log attached to this mail: xlogo_bug.log). It seems to be the 9th reply from the Xserver that makes the Xclients crash (the normal negotiation between the Xserver and the Xclient is traced in xlogo_nobug.log).

I also noticed that stopping the Xserver and starting it again does not remove the bug: the Xserver still behaves the same. More surprisingly, if you copy the binary file of the buggy Xserver and run the copy, it works without the bug, but running the original binary file that started with the bug makes the bug appear again.

All Xservers (with or without optimization, with or without debug, in every combination) will crash at some point and behave as described above.

I tried to look inside the Xserver by attaching gdb to the Xserver process, but the Xserver starts to use a lot of CPU time and nothing happens. When interrupted with Ctrl-C, gdb gives a prompt again, but bt prints a nonsensical backtrace (when the bug is not present, attaching gdb to the process works normally).

Some hypotheses... I don't know if it is right, but it seems that there is a cache holding a wrong image of the Xserver which is not flushed. If we accept this hypothesis, I don't know why the bug always shows up the same way (usually these cache problems are random).

Another problem is that I don't know what to try next, how to dig deeper, or how to get a better understanding of what is going on.
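One cheap check related to the copy experiment above is to confirm that the copy and the original binary really hold the same bytes. The Python sketch below does this with SHA-256. The function names are made up for illustration and are not part of any tool mentioned in this report. Reading a file normally goes through the kernel's page cache, so matching digests would only rule out a difference in file content as the kernel sees it; it would say nothing about any cache inside the Crusoe itself.

```python
import hashlib
import sys


def sha256_of(path, chunk_size=65536):
    """Return the SHA-256 hex digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so a large binary is never loaded at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(original, copy):
    """True when both files hold exactly the same bytes."""
    return sha256_of(original) == sha256_of(copy)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python compare.py ORIGINAL COPY
    original_path, copy_path = sys.argv[1], sys.argv[2]
    verdict = "identical" if same_content(original_path, copy_path) else "different"
    print(original_path, "and", copy_path, "are", verdict)
```

If the two files turn out to be identical while only one of them misbehaves, the difference would have to lie somewhere other than the on-disk content.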
Can somebody confirm the behavior that I described or give me some hints on what to do next?

Some useful links to better understand the specificities of the Crusoe processors:

http://www.realworldtech.com/page.cfm?ArticleID=RWT01020400
http://www.realworldtech.com/page.cfm?ArticleID=RWT012704012616

Regards
-- 
Emmanuel Fleury

Computer Science Department, | Office: B1-201
Aalborg University,          | Phone: +45 96 35 72 23
Fredriks Bajersvej 7E,       | Fax: +45 98 15 98 89
9220 Aalborg East, Denmark   | Email: [EMAIL PROTECTED]

GNU gdb 6.1-debian
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type show copying to see the conditions.
There is absolutely no warranty for GDB. Type show warranty for details.
This GDB was configured as i386-linux...Using host libthread_db library /lib/tls/libthread_db.so.1.

(gdb) b XlibInt.c:_XReply
Make breakpoint pending on future shared library load? (y or [n])
Breakpoint 1 (XlibInt.c:_XReply) pending.
(gdb) r
Starting program: /home/fleury/devel/crusoe_bug/src/xfree86-4.3.0-dfsg/xc/programs/xlogo/xlogo
Breakpoint 2 at 0x4026f3bc: file XlibInt.c, line 1642.
Pending breakpoint XlibInt.c:_XReply resolved

Breakpoint 2, _XReply (dpy=0x8050488, rep=0xb590, extra=0, discard=0) at XlibInt.c:1642
in XlibInt.c
(gdb) c
Continuing.

Breakpoint 2, _XReply (dpy=0x8050488, rep=0xb5b0, extra=0, discard=0) at XlibInt.c:1642
1642 in XlibInt.c
(gdb)
Continuing.

Breakpoint 2, _XReply (dpy=0x8050488, rep=0xb420, extra=0, discard=1) at XlibInt.c:1642
1642 in XlibInt.c
(gdb)
Continuing.

Breakpoint 2, _XReply (dpy=0x8050488, rep=0xb600, extra=0, discard=1) at XlibInt.c:1642
1642 in XlibInt.c
(gdb)
Continuing.

Breakpoint 2, _XReply (dpy=0x8050488, rep=0xb600, extra=0, discard=1) at XlibInt.c:1642
1642 in XlibInt.c
(gdb)
Continuing.

Breakpoint 2, _XReply (dpy=0x8050488, rep=0xbfffe4d0, extra=0, discard=1) at XlibInt.c:1642
1642 in XlibInt.c
(gdb)
Continuing.

Breakpoint 2, _XReply (dpy=0x8050488, rep=0xbfffe4e0, extra=0, discard=0) at XlibInt.c:1642
1642 in XlibInt.c
(gdb)
Continuing.

Breakpoint 2, _XReply (dpy=0x8050488, rep=0xb3d0, extra=0, discard=1) at XlibInt.c:1642
1642 in XlibInt.c
(gdb)
Continuing.
X Error of failed request: BadLength (poly request too large or internal Xlib length error)
  Major opcode of failed request: 18 (X_ChangeProperty)
  Serial number of failed request: 15
  Current serial number in output stream: 18

Program exited with code 01.
(gdb) quit

GNU gdb 6.1-debian
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type show copying to see the conditions.
There is absolutely no warranty for GDB. Type show warranty for details.
This
Bug#261251: xserver-xfree86: [ati/radeon+crusoe] Xclients are crashing at start
Hi all,

I still haven't pinned down the bug precisely, but I might have found a workaround.

I downloaded the sources of the Xserver from:

http://ftp.debian.org/debian/pool/main/x/xfree86/xfree86_4.3.0.dfsg.1.orig.tar.gz

and compiled them without modification (I mean with only the minimum modifications needed to make them compile). This means that this version was compiled with optimization level 2 and no debug.

When I entered the specific mode where the problem occurs, I ran the Xserver that I had compiled myself and it ran fine (no bug). Just to be sure that I was still in the specific mode, I then ran the Xserver from the Debian package again, and it crashed as expected.

The point here is that I never had any problem with the Xservers that I compiled myself. I thought at first that this was because of the debug options, as I always had the debug options on for all of them. So, just to be sure, I compiled one with full optimization and no debug. I was hoping that it would either crash or run.

My guess now is that the problem comes from one of the patches applied... but I might be wrong.

I also noticed that when running the Xserver that I compiled myself, DRM was activated and the Xserver was much faster.

I attach to this mail the /var/log/XFree86.0.log from the Xserver that I compiled and the one from the plain Debian package.

If anybody wants more details or has suggestions for tracking this bug a bit further, I will be pleased to perform the tests or provide the information.

For now, I will keep running the Xserver that I compiled myself (it is faster and it resists the bug, so it would be stupid not to use it).
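To see quickly how the two attached logs differ (for instance, which modules one server loaded and the other did not), a small Python sketch along the following lines might help. It assumes only the "(II) LoadModule: name" line format visible in the attached log, with or without quotes around the module name. The function names are hypothetical.

```python
import re

# Matches lines such as:  (II) LoadModule: bitmap   or   (II) LoadModule: "bitmap"
LOAD_MODULE = re.compile(r'^\(II\) LoadModule:\s*"?([^"\s]+)"?')


def loaded_modules(log_text):
    """Return module names in the order they first appear in the log."""
    modules = []
    for line in log_text.splitlines():
        match = LOAD_MODULE.match(line.strip())
        if match and match.group(1) not in modules:
            modules.append(match.group(1))
    return modules


def module_difference(log_a, log_b):
    """Return (only_in_a, only_in_b) as two sorted lists of module names."""
    in_a = set(loaded_modules(log_a))
    in_b = set(loaded_modules(log_b))
    return sorted(in_a - in_b), sorted(in_b - in_a)
```

Calling module_difference() on the text of the two XFree86.0.log files would list the modules unique to each server. That is only a first hint about where the builds diverge, not a diagnosis.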
Regards
-- 
Emmanuel Fleury

Computer Science Department, | Office: B1-201
Aalborg University,          | Phone: +45 96 35 72 23
Fredriks Bajersvej 7E,       | Fax: +45 98 15 98 89
9220 Aalborg East, Denmark   | Email: [EMAIL PROTECTED]

XFree86 Version 4.3.0
Release Date: 27 February 2003
X Protocol Version 11, Revision 0, Release 6.6
Build Operating System: Linux 2.6.7 i686 [ELF]
Build Date: 24 July 2004
    Before reporting problems, check http://www.XFree86.Org/
    to make sure that you have the latest version.
Module Loader present
Markers: (--) probed, (**) from config file, (==) default setting,
         (++) from command line, (!!) notice, (II) informational,
         (WW) warning, (EE) error, (NI) not implemented, (??) unknown.
(==) Log file: /var/log/XFree86.0.log, Time: Sun Jul 25 17:14:19 2004
(==) Using config file: /etc/X11/XF86Config-4
(==) ServerLayout Default Layout
(**) |--Screen Screen 0 (0)
(**) |   |--Monitor LCD Display
(**) |   |--Device Radeon Mobility 0
(**) |--Input Device Keyboard
(**) Option AutoRepeat 500 30
(**) Option XkbRules xfree86
(**) XKB: rules: xfree86
(**) Option XkbModel pc101
(**) XKB: model: pc101
(**) Option XkbLayout us
(**) XKB: layout: us
(==) Keyboard: CustomKeycode disabled
(**) |--Input Device Mouse
(**) |--Input Device USB Mouse
(**) FontPath set to /usr/X11R6/lib/X11/fonts/local/,/usr/X11R6/lib/X11/fonts/misc/,/usr/X11R6/lib/X11/fonts/100dpi/:unscaled,/usr/X11R6/lib/X11/fonts/75dpi/:unscaled,/usr/X11R6/lib/X11/fonts/Type1/,/usr/X11R6/lib/X11/fonts/Speedo/,/usr/X11R6/lib/X11/fonts/100dpi/,/usr/X11R6/lib/X11/fonts/75dpi/
(**) RgbPath set to /usr/X11R6/lib/X11/rgb
(==) ModulePath set to /usr/X11R6/lib/modules
(--) using VT number 7

(WW) Open APM failed (/dev/apm_bios) (No such file or directory)
(II) Module ABI versions:
    XFree86 ANSI C Emulation: 0.2
    XFree86 Video Driver: 0.6
    XFree86 XInput driver : 0.4
    XFree86 Server Extension : 0.2
    XFree86 Font Renderer : 0.4
(II) Loader running on linux
(II) LoadModule: bitmap
(II) Loading /usr/X11R6/lib/modules/fonts/libbitmap.a
(II) Module bitmap: vendor=The XFree86 Project
    compiled for 4.3.0.1, module version = 1.0.0
    Module class: XFree86 Font Renderer
    ABI class: XFree86 Font Renderer, version 0.4
(II) Loading font Bitmap
(II) LoadModule: pcidata
(II) Loading /usr/X11R6/lib/modules/libpcidata.a
(II) Module pcidata: vendor=The XFree86 Project
    compiled for 4.3.0.1, module version = 1.0.0
    ABI class: XFree86 Video Driver, version 0.6
(II) PCI: Probing config type using method 1
(II) PCI: Config type is 1
(II) PCI: stages = 0x03, oldVal1 = 0x, mode1Res1 = 0x8000
(II) PCI: PCI scan (all values are in hex)
(II) PCI: 00:00:0: chip 1279,0395 card 104d,80ec rev 03 class 06,00,00 hdr 80
(II) PCI: 00:00:1: chip 1279,0396 card 104d,80ec rev 00 class 05,00,00 hdr 80
(II) PCI: 00:00:2: chip 1279,0397 card 104d,80ec rev 00 class 05,00,00 hdr 80
(II) PCI: 00:06:0: chip 10b9,5451 card 104d,80ec rev 02 class 04,01,00 hdr 00
(II) PCI: 00:07:0: chip 10b9,1533 card 104d,80ec rev 00 class 06,01,00 hdr 00
(II) PCI: 00:08:0: chip 10b9,5457 card 104d,80ec rev 00 class 07,03,00 hdr 00
(II) PCI: 00:09:0: chip 104c,8023 card 104d,80ec rev 00 class 0c,00,10 hdr 00
(II) PCI: 00:0a:0: chip 10cf,2011 card 104d,80ec rev 00 class 04,80,00 hdr 00
(II) PCI: 00:0b:0:
Bug#261251: xserver-xfree86: [ati/radeon+crusoe] Xclients are crashing at start
Hi,

I don't understand anything... I just hit the bug again, this time starting with the new Xserver that I compiled. What is really strange is that when I then ran the Xserver from Debian (which was crashing previously), it worked: the bug did not appear, while the Xserver that I compiled was still stuck.

It seems that once one Xserver starts crashing, you cannot run it anymore, but you can run others... It looks like black art to me. I have no explanation. I'm lost. :-(

If somebody has a rational explanation or some experiments to suggest, I'll be happy to hear them. Any clue?

Regards
-- 
Emmanuel Fleury

Computer Science Department, | Office: B1-201
Aalborg University,          | Phone: +45 96 35 72 23
Fredriks Bajersvej 7E,       | Fax: +45 98 15 98 89
9220 Aalborg East, Denmark   | Email: [EMAIL PROTECTED]
Bug#261251: xserver-xfree86: [ati/radeon+crusoe] Xclients are crashing at start
Package: xserver-xfree86
Version: 4.3.0.dfsg.1-6
Severity: important

Hi all,

I am hunting a very strange bug (probably hardware related). This bug has been reported by several people in bug #234556 as a bug in the xlibs package, but I think I have now managed to push it down to the xserver package (at least I believe so).

This bug seems to appear only on machines with a Transmeta Crusoe processor and/or an ATI Radeon Mobility graphics card (mine is a Radeon Mobility M6 LY). I should emphasize that we don't know which of the Crusoe or the Radeon is responsible for this problem (in fact, I don't even know whether it is hardware).

The problem is the following: from time to time at boot, the laptop enters a specific mode where the bug occurs. Once this mode has been reached, the bug is fully reproducible, but I don't know yet what the conditions for entering this mode are. So the only way to reach the bug is to reboot several times until you reach this mode.

The bug itself is that the Xserver does not accept any connection from any Xclient. Each time an Xclient tries to connect to the Xserver, we get the following message (example using xlogo):

[EMAIL PROTECTED] xlogo]$ ./xlogo
X Error of failed request: BadLength (poly request too large or internal Xlib length error)
  Major opcode of failed request: 18 (X_ChangeProperty)
  Serial number of failed request: 16
  Current serial number in output stream: 19

You can stop and start the Xserver again without trouble, but only the Xserver on its own (using xinit will fail because of the attempt to start the xconsole, but using only /usr/bin/X11/X or /usr/bin/X11/XFree86 will work).

I started debugging from one Xclient (xlogo) compiled with the debug option and no optimization (optimization was misleading gdb), with Xlib also compiled with debug and no optimization. It appeared that the Xclient attempts to connect to the Xserver and sends 9 messages to the Xserver. But when it gets the 9th answer from the Xserver, the size of the message (or the message itself, I don't know) seems to be unexpected by the Xclient and makes it terminate. So the bug is probably deeper, in the Xserver (the behavior of the Xclient seems to be as expected).

This is where things start to get strange...

I compiled an Xserver with the debug options in order to follow what was going on inside the Xserver. I reached the specific mode where the bug occurs, ran the Xserver with debug and... it worked. I mean the Xclient connected normally to the Xserver with debug. Of course, I checked that I was still in the specific mode by using a non-debug Xserver, and the problem was still there.

At first, I really thought I had done something wrong during the compilation, so I took the Xserver package with debug options, and the behavior was exactly the same (i.e. once in the specific mode, XFree86 was crashing all the Xclients and XFree86-debug was allowing the Xclients to connect).

Then I tried the following:

 - run XFree86 (no debug)
 - run gdb on xlogo and stop just before sending the 9th message
 - run gdb on XFree86 (no debug) and attach to the first XFree86

My hope was to be able to disassemble the functions and follow what was going on. The problem was that upon attaching, the XFree86 process suddenly took all the CPU time and nothing happened (interestingly, when not in the specific mode where the bug occurs, this operation is possible).

So now I'm stuck. I really don't know how to go deeper into XFree86.

My theory is that there is a problem with the driver for the Radeon Mobility (maybe combined with some specific feature of the motherboard for the Crusoe).

Does anybody have an idea?

Regards
-- 
Emmanuel

-- Package-specific info:
Contents of /var/lib/xfree86/X.roster:
xserver-xfree86
xserver-xfree86-dbg

/var/lib/xfree86/X.md5sum does not exist.

X server symlink status:
lrwxrwxrwx 1 root root 20 Aug 23 2003 /etc/X11/X -> /usr/bin/X11/XFree86
-rwxr-xr-x 1 root root 1745388 Jul 7 17:07 /usr/bin/X11/XFree86

Contents of /var/lib/xfree86/XF86Config-4.roster:
xserver-xfree86
xserver-xfree86-dbg

VGA-compatible devices on PCI bus:
:00:0c.0 VGA compatible controller: ATI Technologies Inc Radeon Mobility M6 LY

/etc/X11/XF86Config-4 does not match checksum in /var/lib/xfree86/XF86Config-4.md5sum.

XFree86 X server configuration file status:
-rw-r--r-- 1 root root 15888 Jul 16 17:42 /etc/X11/XF86Config-4

Contents of /etc/X11/XF86Config-4:
#
# XFree86 4.x configuration for Sony PCG-C1MZX
# Version: 1.0
#
# Author: Emmanuel Fleury [EMAIL PROTECTED]
#
# 28-aug-2003: First version based on the file of
#              Felix Groebert [EMAIL PROTECTED]
#
#
# Copyright (c) 1999 by The XFree86 Project, Inc.
#
# Permission is hereby granted, free of charge, to any person obtaining a
# copy of this software and associated documentation files (the Software),
# to deal in the Software without restriction, including without limitation
# the rights to
Processed: Re: Bug#261251: xserver-xfree86: [ati/radeon+crusoe] Xclients are crashing at start
Processing commands for [EMAIL PROTECTED]:

> priority 261251 normal
Bug#261251: xserver-xfree86: [ati/radeon+crusoe] Xclients are crashing at start
Severity set to `normal'.

> merge 261251 216933
Bug#216933: libx11-6: many clients get BadLength error from X_ChangeProperty request (Transmeta Crusoe smoking gun)
Bug#261251: xserver-xfree86: [ati/radeon+crusoe] Xclients are crashing at start
Mismatch - only Bugs in same state can be merged:
Values for `package' don't match:
 #216933 has `libx11-6';
 #261251 has `xserver-xfree86'

> thanks
Stopping processing here.

Please contact me if you need assistance.

Debian bug tracking system administrator
(administrator, Debian Bugs database)
Bug#216933: Bug#261251: xserver-xfree86: [ati/radeon+crusoe] Xclients are crashing at start
priority 261251 normal
merge 261251 216933
thanks

On Sat, 2004-07-24 at 17:17 +0200, Emmanuel Fleury wrote:
>
> I am hunting a very strange bug (probably hardware related). This bug
> has been reported by several people in bug #234556 as a bug in the
> xlibs package, but I think I have now managed to push it down to the
> xserver package (at least I believe so).

It's still the same bug though, why are you reporting it as a new one? Merging with the existing bugs about this problem...

> This bug seems to appear only on machines with a Transmeta Crusoe
> processor and/or an ATI Radeon Mobility graphics card (mine is a
> Radeon Mobility M6 LY). I should emphasize that we don't know which of
> the Crusoe or the Radeon is responsible for this problem (in fact, I
> don't even know whether it is hardware).

It's certainly not the graphics chip, as the code which generates the error isn't even near the driver, let alone the hardware, and people are seeing the problem with a variety of graphics chips, but only with Transmeta CPUs.

> I compiled an Xserver with the debug options in order to follow what
> was going on inside the Xserver. I reached the specific mode where the
> bug occurs, ran the Xserver with debug and... it worked. I mean the
> Xclient connected normally to the Xserver with debug. Of course, I
> checked that I was still in the specific mode by using a non-debug
> Xserver, and the problem was still there.
>
> At first, I really thought I had done something wrong during the
> compilation, so I took the Xserver package with debug options, and the
> behavior was exactly the same (i.e. once in the specific mode, XFree86
> was crashing all the Xclients and XFree86-debug was allowing the
> Xclients to connect).

Interesting. So it sounds like the optimized code either contains some instruction(s) that work slightly differently on Transmeta CPUs than on other x86 CPUs, and/or it triggers a bug in the code morphing.

Can others seeing the problem confirm that it doesn't happen with the XFree86-debug server from xserver-xfree86-dbg?

-- 
Earthling Michel Dänzer   | Debian (powerpc), X and DRI developer
Libre software enthusiast | http://svcs.affero.net/rm.php?r=daenzer
Bug#261251: xserver-xfree86: [ati/radeon+crusoe] Xclients are crashing at start
On Sat, 2004-07-24 at 20:04, Michel Dänzer wrote:
>
> It's still the same bug though, why are you reporting it as a new one?
> Merging with the existing bugs about this problem...

The first bug was reported as a bug in xlibs, and I thought that things had changed enough to relate it to the xserver package now. (Moreover, I lost the reference to the last mail I sent...) Sorry about that.

> It's certainly not the graphics chip, as the code which generates the
> error isn't even near the driver, let alone the hardware, and people
> are seeing the problem with a variety of graphics chips, but only with
> Transmeta CPUs.

OK. I might have been too quick to draw hypotheses on this. But I am now sure that we have to go deeper (i.e. into the Xserver). I don't know where it will stop.

> Interesting. So it sounds like the optimized code either contains some
> instruction(s) that work slightly differently on Transmeta CPUs than
> on other x86 CPUs, and/or it triggers a bug in the code morphing.

Hmm, does this mean that XFree86-debug does not contain any optimization? If so, I have to tell you that I actually tried compiling one Xserver with debug and no optimization (I removed all the -O2 from the Makefiles), and it worked. In fact, whenever debug is activated it seems to work.

If somebody knows what the differences are between a normal binary and a binary built with debug options, I would like to know. And if one of these differences can explain such behavior, then I really would like to know. :)

> Can others seeing the problem confirm that it doesn't happen with the
> XFree86-debug server from xserver-xfree86-dbg?

Just to make clearer where the breaking point is, here are some hints:

 - start the Xserver and the Xserver only
 - run gdb on xlogo
 - put a breakpoint on the function _XReply in the file XlibInt.c
 - after 9 times through this breakpoint you should encounter the bug.

The size of the 9th message sent by xlogo is 476 and the wrong reply from the Xserver is 32.
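For anyone who wants to look at the raw bytes of that reply: assuming the sizes above are in bytes, 32 is exactly the fixed size of a core X11 protocol error packet, so the "wrong reply" may simply be the BadLength error itself. The Python sketch below decodes such a packet following the core protocol encoding (type 0, error code, sequence number, bad value, minor opcode, major opcode, padding). The function names are hypothetical, and the byte order has to match the one the client chose at connection setup (little-endian is assumed by default here).

```python
import struct

# Core X11 error packet layout (always 32 bytes):
#   1 byte   type (0 means Error)
#   1 byte   error code
#   2 bytes  sequence number of the failed request
#   4 bytes  bad value / resource id (unused for BadLength)
#   2 bytes  minor opcode
#   1 byte   major opcode
#   21 bytes padding
ERROR_FORMAT = "BBHIHB21x"
BAD_LENGTH = 16          # core error code for BadLength
X_CHANGE_PROPERTY = 18   # major opcode reported in the error message


def decode_error(packet, little_endian=True):
    """Decode a 32-byte X11 error packet into a dict of its fields."""
    if len(packet) != 32:
        raise ValueError("an X11 error packet is exactly 32 bytes")
    prefix = "<" if little_endian else ">"
    kind, code, sequence, bad_value, minor, major = struct.unpack(
        prefix + ERROR_FORMAT, packet
    )
    if kind != 0:
        raise ValueError("first byte is not 0, so this is not an error packet")
    return {
        "code": code,
        "sequence": sequence,
        "bad_value": bad_value,
        "minor_opcode": minor,
        "major_opcode": major,
    }


def request_length_units(size_in_bytes):
    """X11 request lengths are counted in 4-byte units."""
    if size_in_bytes % 4:
        raise ValueError("X11 requests are padded to a multiple of 4 bytes")
    return size_in_bytes // 4
```

Under the same assumption, a 476-byte request would carry the value 119 in its length field (476 / 4). A server that rejects it with BadLength is complaining that this field does not match what it expects for the request it thinks it received.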

Regards
-- 
Emmanuel Fleury

Computer Science Department, | Office: B1-201
Aalborg University,          | Phone: +45 96 35 72 23
Fredriks Bajersvej 7E,       | Fax: +45 98 15 98 89
9220 Aalborg East, Denmark   | Email: [EMAIL PROTECTED]