Bug#261251: xserver-xfree86: [ati/radeon+crusoe] Xclients are crashing at start

2004-07-27 Thread Emmanuel Fleury
Ok,

I have made some progress with this bug, but now I am a bit stuck. 
I really need assistance for the following because I don't know what to
do next. 

Here are the facts:

I have a Vaio C1MZX laptop with a Transmeta Crusoe TM5800.
The graphics card is an ATI Radeon Mobility M6 LY.

From time to time the Xserver stops working properly and refuses new
connections from any Xclient. The Xserver itself stays up, but
any Xclient trying to connect gets a message from the Xserver that makes
it crash (see the gdb log attached to this mail: xlogo_bug.log). It
seems to be the 9th reply from the Xserver that makes the Xclients crash
(the normal negotiation between the Xserver and the Xclient is traced in
xlogo_nobug.log).

I also noticed that stopping the Xserver and starting it again does not
make the bug go away. The Xserver still behaves the same.

More surprisingly, if you make a copy of the buggy Xserver's binary file
and run the copy, it works without the bug, but when you run the
original binary file that showed the bug, the bug appears
again.
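
Before reading too much into the copy-versus-original observation, it may be worth ruling out that the two files differ on disk. The following is a minimal sketch, not part of the original report: it compares the content hash and the file identity (device and inode) of two files. The function names are made up for illustration. It does not test the cache hypothesis itself; it only shows whether the copy has identical bytes while being a distinct file.

```python
import hashlib
import os


def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large binaries do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_binaries(original, copy):
    """Report whether two paths hold the same bytes and whether they are the same file."""
    orig_stat = os.stat(original)
    copy_stat = os.stat(copy)
    return {
        "same_content": file_digest(original) == file_digest(copy),
        "same_inode": (orig_stat.st_dev, orig_stat.st_ino)
        == (copy_stat.st_dev, copy_stat.st_ino),
    }
```

Calling compare_binaries("/usr/bin/X11/XFree86", "/tmp/XFree86.copy") (the second path is hypothetical) would be expected to report identical content and different inodes for a plain cp. If so, whatever distinguishes the two runs is keyed on something other than the bytes of the file.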

All Xservers (with optimization or not, with debug or not, or all the
combinations) will crash at some point and behave as mentioned
previously.

I tried to look inside the Xserver by attaching gdb to the Xserver process,
but the Xserver starts to use a lot of CPU time and nothing happens. When
interrupted inside gdb with Ctrl-C, gdb gives a prompt again, but bt
prints a nonsensical backtrace (when the bug is not present, attaching gdb
to the process works normally).


Some hypotheses...

Well, I don't know if it is right, but it seems that there is a cache
which holds a wrong image of the Xserver and which is not flushed.

If we accept this hypothesis, I don't know why the bug always looks the same
(usually these cache problems are random).

Another problem is that I don't know what to try next, how to dig
deeper, or how to get a better understanding of what is going on.

Can somebody confirm the behavior that I described or give me some hints
on what to do next?

Some useful links to better understand the specifics of the Crusoe
processors:
http://www.realworldtech.com/page.cfm?ArticleID=RWT01020400
http://www.realworldtech.com/page.cfm?ArticleID=RWT012704012616

Regards
-- 
Emmanuel Fleury
 
Computer Science Department, |  Office: B1-201
Aalborg University,  |  Phone:  +45 96 35 72 23
Fredriks Bajersvej 7E,   |  Fax:+45 98 15 98 89
9220 Aalborg East, Denmark   |  Email:  [EMAIL PROTECTED]
GNU gdb 6.1-debian
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type show copying to see the conditions.
There is absolutely no warranty for GDB.  Type show warranty for details.
This GDB was configured as i386-linux...Using host libthread_db library /lib/tls/libthread_db.so.1.

(gdb) b XlibInt.c:_XReply
Make breakpoint pending on future shared library load? (y or [n]) 
Breakpoint 1 (XlibInt.c:_XReply) pending.
(gdb) r
Starting program: /home/fleury/devel/crusoe_bug/src/xfree86-4.3.0-dfsg/xc/programs/xlogo/xlogo 
Breakpoint 2 at 0x4026f3bc: file XlibInt.c, line 1642.
Pending breakpoint XlibInt.c:_XReply resolved

Breakpoint 2, _XReply (dpy=0x8050488, rep=0xb590, extra=0, discard=0)
at XlibInt.c:1642
1642	in XlibInt.c
(gdb) c
Continuing.

Breakpoint 2, _XReply (dpy=0x8050488, rep=0xb5b0, extra=0, discard=0)
at XlibInt.c:1642
1642	in XlibInt.c
(gdb) 
Continuing.

Breakpoint 2, _XReply (dpy=0x8050488, rep=0xb420, extra=0, discard=1)
at XlibInt.c:1642
1642	in XlibInt.c
(gdb) 
Continuing.

Breakpoint 2, _XReply (dpy=0x8050488, rep=0xb600, extra=0, discard=1)
at XlibInt.c:1642
1642	in XlibInt.c
(gdb) 
Continuing.

Breakpoint 2, _XReply (dpy=0x8050488, rep=0xb600, extra=0, discard=1)
at XlibInt.c:1642
1642	in XlibInt.c
(gdb) 
Continuing.

Breakpoint 2, _XReply (dpy=0x8050488, rep=0xbfffe4d0, extra=0, discard=1)
at XlibInt.c:1642
1642	in XlibInt.c
(gdb) 
Continuing.

Breakpoint 2, _XReply (dpy=0x8050488, rep=0xbfffe4e0, extra=0, discard=0)
at XlibInt.c:1642
1642	in XlibInt.c
(gdb) 
Continuing.

Breakpoint 2, _XReply (dpy=0x8050488, rep=0xb3d0, extra=0, discard=1)
at XlibInt.c:1642
1642	in XlibInt.c
(gdb) 
Continuing.

X Error of failed request:  BadLength (poly request too large or internal Xlib length error)
  Major opcode of failed request:  18 (X_ChangeProperty)
  Serial number of failed request:  15
  Current serial number in output stream:  18

Program exited with code 01.
(gdb) quit

Bug#261251: xserver-xfree86: [ati/radeon+crusoe] Xclients are crashing at start

2004-07-25 Thread Emmanuel Fleury
Hi all,

I still haven't pinned down the bug precisely, but I might have found a
way to work around it.

I downloaded the sources of the Xserver at:
http://ftp.debian.org/debian/pool/main/x/xfree86/xfree86_4.3.0.dfsg.1.orig.tar.gz

I compiled it with no modifications (I mean only the minimum modifications
needed to make it compile). This means that this version was compiled with
optimization level 2 and no debug.

When I entered the specific mode where the problem occurs, I
tried to run the Xserver I had compiled myself and it ran fine (no
bug). Just to be sure that I was still in the specific mode, I then tried
once more to run the Xserver from the Debian package and
it crashed as expected.

The point here is that I never had any problem with the Xservers that I
compiled myself. I thought at first that it was because of the
debug options, since I always had the debug options on for all of them.
So, just to be sure, I compiled one with full optimization and
no debug, to see whether it would crash or run.

My guess now is that it is coming from one of the patches applied... but
I might be wrong.

I also noticed that when running the Xserver that I compiled myself,
DRM was activated and the Xserver was much faster.

I attach to this mail the /var/log/XFree86.0.log from the Xserver that
I compiled and the one from the plain Debian package.

If anybody wants more details or has some suggestions to track this bug a
bit further, I would be pleased to perform the tests or provide the
information.

For now, I will be running the Xserver that I compiled myself (it is
faster and it resists the bug, so I would be stupid not to use it).

Regards
-- 
Emmanuel Fleury
 

XFree86 Version 4.3.0
Release Date: 27 February 2003
X Protocol Version 11, Revision 0, Release 6.6
Build Operating System: Linux 2.6.7 i686 [ELF] 
Build Date: 24 July 2004
	Before reporting problems, check http://www.XFree86.Org/
	to make sure that you have the latest version.
Module Loader present
Markers: (--) probed, (**) from config file, (==) default setting,
 (++) from command line, (!!) notice, (II) informational,
 (WW) warning, (EE) error, (NI) not implemented, (??) unknown.
(==) Log file: /var/log/XFree86.0.log, Time: Sun Jul 25 17:14:19 2004
(==) Using config file: /etc/X11/XF86Config-4
(==) ServerLayout Default Layout
(**) |--Screen Screen 0 (0)
(**) |   |--Monitor LCD Display
(**) |   |--Device Radeon Mobility 0
(**) |--Input Device Keyboard
(**) Option AutoRepeat 500 30
(**) Option XkbRules xfree86
(**) XKB: rules: xfree86
(**) Option XkbModel pc101
(**) XKB: model: pc101
(**) Option XkbLayout us
(**) XKB: layout: us
(==) Keyboard: CustomKeycode disabled
(**) |--Input Device Mouse
(**) |--Input Device USB Mouse
(**) FontPath set to /usr/X11R6/lib/X11/fonts/local/,/usr/X11R6/lib/X11/fonts/misc/,/usr/X11R6/lib/X11/fonts/100dpi/:unscaled,/usr/X11R6/lib/X11/fonts/75dpi/:unscaled,/usr/X11R6/lib/X11/fonts/Type1/,/usr/X11R6/lib/X11/fonts/Speedo/,/usr/X11R6/lib/X11/fonts/100dpi/,/usr/X11R6/lib/X11/fonts/75dpi/
(**) RgbPath set to /usr/X11R6/lib/X11/rgb
(==) ModulePath set to /usr/X11R6/lib/modules
(--) using VT number 7

(WW) Open APM failed (/dev/apm_bios) (No such file or directory)
(II) Module ABI versions:
	XFree86 ANSI C Emulation: 0.2
	XFree86 Video Driver: 0.6
	XFree86 XInput driver : 0.4
	XFree86 Server Extension : 0.2
	XFree86 Font Renderer : 0.4
(II) Loader running on linux
(II) LoadModule: bitmap
(II) Loading /usr/X11R6/lib/modules/fonts/libbitmap.a
(II) Module bitmap: vendor=The XFree86 Project
	compiled for 4.3.0.1, module version = 1.0.0
	Module class: XFree86 Font Renderer
	ABI class: XFree86 Font Renderer, version 0.4
(II) Loading font Bitmap
(II) LoadModule: pcidata
(II) Loading /usr/X11R6/lib/modules/libpcidata.a
(II) Module pcidata: vendor=The XFree86 Project
	compiled for 4.3.0.1, module version = 1.0.0
	ABI class: XFree86 Video Driver, version 0.6
(II) PCI: Probing config type using method 1
(II) PCI: Config type is 1
(II) PCI: stages = 0x03, oldVal1 = 0x, mode1Res1 = 0x8000
(II) PCI: PCI scan (all values are in hex)
(II) PCI: 00:00:0: chip 1279,0395 card 104d,80ec rev 03 class 06,00,00 hdr 80
(II) PCI: 00:00:1: chip 1279,0396 card 104d,80ec rev 00 class 05,00,00 hdr 80
(II) PCI: 00:00:2: chip 1279,0397 card 104d,80ec rev 00 class 05,00,00 hdr 80
(II) PCI: 00:06:0: chip 10b9,5451 card 104d,80ec rev 02 class 04,01,00 hdr 00
(II) PCI: 00:07:0: chip 10b9,1533 card 104d,80ec rev 00 class 06,01,00 hdr 00
(II) PCI: 00:08:0: chip 10b9,5457 card 104d,80ec rev 00 class 07,03,00 hdr 00
(II) PCI: 00:09:0: chip 104c,8023 card 104d,80ec rev 00 class 0c,00,10 hdr 00
(II) PCI: 00:0a:0: chip 10cf,2011 card 104d,80ec rev 00 class 04,80,00 hdr 00
(II) PCI: 00:0b:0: 

Bug#261251: xserver-xfree86: [ati/radeon+crusoe] Xclients are crashing at start

2004-07-25 Thread Emmanuel Fleury
Hi,

I don't understand anything anymore... I just encountered the bug again
when starting the new Xserver that I compiled. What is really strange is
that when I then ran the Xserver from Debian (which was crashing
previously), it worked (the bug did not appear, while the Xserver compiled
by me was still stuck).

It seems that once one Xserver binary starts crashing, you cannot run it
anymore, but you can run others... It looks like black art to me. I have no
explanation. I'm lost. :-(

If somebody has some rational explanation or some experiments to
perform, I'll be happy.

Any clue?

Regards
-- 
Emmanuel Fleury
 




Bug#261251: xserver-xfree86: [ati/radeon+crusoe] Xclients are crashing at start

2004-07-24 Thread Emmanuel Fleury
Package: xserver-xfree86
Version: 4.3.0.dfsg.1-6
Severity: important


Hi all,

I am hunting a very strange bug (probably hardware related). This bug
has been reported by several people in bug #234556 as a bug of the
xlibs package but I think I managed to push it down to the xserver
package now (at least I believe so).

This bug seems to appear only on machines which contain a
Transmeta Crusoe processor and/or an ATI Radeon Mobility graphics card
(mine is a Radeon Mobility M6 LY). I should emphasize the fact that we
don't know which of the Crusoe or the Radeon is responsible for
this problem (in fact, I don't even know if it is hardware).

The problem is the following: from time to time, at boot, the laptop
enters a specific mode where the bug occurs. Once this mode has
been reached, the bug is fully reproducible, but I don't know yet what
the conditions for entering this mode are. So the only way to reach the
bug is to reboot several times until you reach this mode.

The bug itself is that the Xserver does not accept any connection from
any Xclient. Each time an Xclient tries to connect to the Xserver, we
get the following message (example using xlogo):

[EMAIL PROTECTED] xlogo]$ ./xlogo
X Error of failed request:  BadLength (poly request too large or
internal Xlib length error)
  Major opcode of failed request:  18 (X_ChangeProperty)
  Serial number of failed request:  16
  Current serial number in output stream:  19
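
For context, a BadLength error on X_ChangeProperty means that the length field in the request header did not agree with the size the server computed from the request's own format and element count. The sketch below illustrates that consistency rule, based on the core X11 encoding of ChangeProperty (a 24-byte fixed part followed by the data, padded to a multiple of 4 bytes, with the length field counted in 4-byte units). The function names are invented for illustration and this is not the XFree86 code.

```python
FIXED_PART_BYTES = 24  # size of the ChangeProperty header before the data


def change_property_units(format_bits, n_elements):
    """Expected request length, in 4-byte units, for a ChangeProperty request."""
    if format_bits not in (8, 16, 32):
        raise ValueError("format must be 8, 16 or 32")
    data_bytes = n_elements * (format_bits // 8)
    padded_bytes = (data_bytes + 3) & ~3  # data is padded to a 4-byte boundary
    return (FIXED_PART_BYTES + padded_bytes) // 4


def length_is_consistent(header_units, format_bits, n_elements):
    """True if the header's length field matches the declared data size."""
    return header_units == change_property_units(format_bits, n_elements)
```

For instance, a request carrying 452 bytes of 8-bit data occupies 24 + 452 = 476 bytes, so its length field must read 119. If either side gets the length or the element count wrong, the check fails, which matches the kind of error shown above.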

You can stop and restart the Xserver without any problem, but only the
Xserver itself (using xinit will fail because of the attempt to start
xconsole, but using only /usr/bin/X11/X or /usr/bin/X11/XFree86 will
work).

I started to debug from one Xclient (xlogo) compiled with the debug option
and no optimization (optimization was misleading gdb), with Xlib also
compiled with debug and no optimization.

It appeared that the Xclient attempts to connect to the Xserver
and sends 9 messages to the Xserver. But when it gets the 9th answer
from the Xserver, the size of the message (or the message itself, I
don't know) seems to be unexpected by the Xclient and makes it
terminate.

So, the bug is probably deeper in the Xserver (the behavior of the
Xclient seems to be as expected). This is where things
start to get strange...

I compiled an Xserver with the debug options in order to follow what
was going on inside the Xserver. I reached the specific mode where the
bug was occurring and ran the Xserver with debug and... it worked. I
mean the Xclient connected normally to the Xserver with debug. Of
course, I checked that I was still in the specific mode by using a
non-debug Xserver and the problem was still there.

At first, I really thought I had done something wrong during compilation,
so I took the Xserver package with debug options and it showed the exact
same behavior (i.e. once in the specific mode, XFree86 was crashing
all the Xclients and XFree86-debug was allowing the Xclients to
connect).

Then I tried to do the following:
- run XFree86 (no debug)
- run gdb on xlogo and stop just before sending the 9th message
- run gdb on XFree86 (no debug) and attach to the first XFree86

My hope was to be able to disassemble the functions and follow what
was going on. The problem was that, once gdb attached, the XFree86 process
suddenly took all the CPU time and nothing happened (an
interesting point is that when not in this specific mode where the bug
happens, this operation is possible).

So, now I'm stuck.

I really don't know how to go deeper in XFree86. My theory is that
there is a problem with the driver of the Radeon Mobility (maybe
combined with some specific feature of the motherboard for the Crusoe).

Does anybody have an idea?

Regards
-- 
Emmanuel

-- Package-specific info:
Contents of /var/lib/xfree86/X.roster:
xserver-xfree86
xserver-xfree86-dbg

/var/lib/xfree86/X.md5sum does not exist.

X server symlink status:
lrwxrwxrwx  1 root root 20 Aug 23  2003 /etc/X11/X -> /usr/bin/X11/XFree86
-rwxr-xr-x  1 root root 1745388 Jul  7 17:07 /usr/bin/X11/XFree86

Contents of /var/lib/xfree86/XF86Config-4.roster:
xserver-xfree86
xserver-xfree86-dbg

VGA-compatible devices on PCI bus:
:00:0c.0 VGA compatible controller: ATI Technologies Inc Radeon
Mobility M6 LY

/etc/X11/XF86Config-4 does not match checksum in
/var/lib/xfree86/XF86Config-4.md5sum.

XFree86 X server configuration file status:
-rw-r--r--  1 root root 15888 Jul 16 17:42 /etc/X11/XF86Config-4

Contents of /etc/X11/XF86Config-4:
#
# XFree86 4.x configuration for Sony PCG-C1MZX
# Version: 1.0
#
# Author: Emmanuel Fleury [EMAIL PROTECTED]
#
# 28-aug-2003: First version based on the file of 
#  Felix Groebert [EMAIL PROTECTED]
#


#
# Copyright (c) 1999 by The XFree86 Project, Inc.
#
# Permission is hereby granted, free of charge, to any person obtaining a
# copy of this software and associated documentation files (the "Software"),
# to deal in the Software without restriction, including without limitation
# the rights to 

Processed: Re: Bug#261251: xserver-xfree86: [ati/radeon+crusoe] Xclients are crashing at start

2004-07-24 Thread Debian Bug Tracking System
Processing commands for [EMAIL PROTECTED]:

> priority 261251 normal
Bug#261251: xserver-xfree86: [ati/radeon+crusoe] Xclients are crashing at start
Severity set to `normal'.

> merge 261251 216933
Bug#216933: libx11-6: many clients get BadLength error from X_ChangeProperty 
request (Transmeta Crusoe smoking gun)
Bug#261251: xserver-xfree86: [ati/radeon+crusoe] Xclients are crashing at start
Mismatch - only Bugs in same state can be merged:
Values for `package' don't match:
 #216933 has `libx11-6';
 #261251 has `xserver-xfree86'

> thanks
Stopping processing here.

Please contact me if you need assistance.

Debian bug tracking system administrator
(administrator, Debian Bugs database)



Bug#216933: Bug#261251: xserver-xfree86: [ati/radeon+crusoe] Xclients are crashing at start

2004-07-24 Thread Michel Dänzer
priority 261251 normal
merge 261251 216933
thanks

On Sat, 2004-07-24 at 17:17 +0200, Emmanuel Fleury wrote:
 
> I am hunting a very strange bug (probably hardware related). This bug
> has been reported by several people in bug #234556 as a bug of the
> xlibs package but I think I managed to push it down to the xserver
> package now (at least I believe so).

It's still the same bug though, why are you reporting it as a new one?
Merging with the existing bugs about this problem...

> This bug seems to appear only on machines which contain a
> Transmeta Crusoe processor and/or an ATI Radeon Mobility graphics card
> (mine is a Radeon Mobility M6 LY). I should emphasize the fact that we
> don't know which of the Crusoe or the Radeon is responsible for
> this problem (in fact, I don't even know if it is hardware).

It's certainly not the graphics chip, as the code which generates the
error isn't even near the driver, let alone the hardware, and people are
seeing the problem with a variety of graphics chips, but only with
Transmeta CPUs.


> I compiled an Xserver with the debug options in order to follow what
> was going on inside the Xserver. I reached the specific mode where the
> bug was occurring and ran the Xserver with debug and... it worked. I
> mean the Xclient connected normally to the Xserver with debug. Of
> course, I checked that I was still in the specific mode by using a
> non-debug Xserver and the problem was still there.
>
> At first, I really thought I had done something wrong during compilation,
> so I took the Xserver package with debug options and it showed the exact
> same behavior (i.e. once in the specific mode, XFree86 was crashing
> all the Xclients and XFree86-debug was allowing the Xclients to
> connect).

Interesting. So it sounds like the optimized code either contains some
instruction(s) that work slightly differently on Transmeta CPUs than on
other x86 CPUs, and/or it triggers a bug in the code morphing.

Can others seeing the problem confirm that it doesn't happen with the
XFree86-debug server from xserver-xfree86-dbg?


-- 
Earthling Michel Dänzer  | Debian (powerpc), X and DRI developer
Libre software enthusiast|   http://svcs.affero.net/rm.php?r=daenzer





Bug#261251: xserver-xfree86: [ati/radeon+crusoe] Xclients are crashing at start

2004-07-24 Thread Emmanuel Fleury
On Sat, 2004-07-24 at 20:04, Michel Dänzer wrote:
 
> It's still the same bug though, why are you reporting it as a new one?
> Merging with the existing bugs about this problem...

The first bug was reported as a bug in xlibs, and I thought that
relating it to the xserver package now changed things a bit.

(Moreover, I lost the reference to the last mail I sent...)

Sorry for this.

> It's certainly not the graphics chip, as the code which generates the
> error isn't even near the driver, let alone the hardware, and people are
> seeing the problem with a variety of graphics chips, but only with
> Transmeta CPUs.

Ok. I might have been too quick to draw conclusions on this.

But I am now sure that we have to go deeper (i.e. in the Xserver).
I don't know where it will stop.

> Interesting. So it sounds like the optimized code either contains some
> instruction(s) that work slightly differently on Transmeta CPUs than on
> other x86 CPUs, and/or it triggers a bug in the code morphing.

Hmm, does this mean that XFree86-debug does not contain any
optimization?

If so, I have to tell you that I actually tried to compile one Xserver
with debug and no optimization (I removed all the -O2 from the
Makefiles). And it was working.

In fact, when debug is activated it seems to work.

If somebody knows what the differences are between a normal binary and a
binary with debug options, I would like to know. And if one of these
differences can explain such behavior, then I really would like to know.
:)

> Can others seeing the problem confirm that it doesn't happen with the
> XFree86-debug server from xserver-xfree86-dbg?

Just to make it clearer where the breaking point is, here are some
hints:

- start the Xserver and the Xserver only
- run gdb on xlogo
- put a breakpoint on the function _XReply in the file XlibInt.c
- after 9 passes through this breakpoint you should encounter the bug.
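
The steps above could be scripted so that others can repeat them in the same way. What follows is only a sketch under assumptions: it builds a gdb command file that sets a pending breakpoint on _XReply and uses an ignore count to stop at the 9th call. The helper name gdb_commands is made up. It also assumes a gdb where the breakpoint keeps number 1 once the shared library is loaded (in the gdb 6.1 session logged in this thread, the pending breakpoint was renumbered to 2 when it resolved, so the number would need adjusting there).

```python
def gdb_commands(function="_XReply", hit=9):
    """Build gdb commands that stop at the `hit`-th call of `function`."""
    if hit < 1:
        raise ValueError("hit must be >= 1")
    lines = [
        "set pagination off",
        # Xlib is a shared library, so the symbol is unknown before `run`.
        "set breakpoint pending on",
        "break {}".format(function),
    ]
    if hit > 1:
        # Skip the first hit-1 stops at breakpoint 1 (assumed numbering).
        lines.append("ignore 1 {}".format(hit - 1))
    lines += ["run", "backtrace", "info args"]
    return "\n".join(lines) + "\n"
```

The returned text can be written to a file, say xreply.gdb (a hypothetical name), and used with gdb -batch -x xreply.gdb ./xlogo while only the Xserver is running.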

The size of the 9th message sent by xlogo is 476 and the wrong reply
from the Xserver is 32.
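
One possible reading of the 32, assuming it is a size in bytes: core X11 error packets are always exactly 32 bytes long, so the "wrong reply" may simply be the BadLength error itself. The sketch below decodes such a packet according to the core protocol layout (first byte 0 for an error, then error code, 16-bit sequence number, bad value, minor opcode, major opcode). The function name and the returned dictionary are illustrative only, and little-endian byte order is assumed because the client runs on i386.

```python
import struct

X_ERROR = 0             # first byte of every core error packet
BAD_LENGTH = 16         # core protocol error code for BadLength
X_CHANGE_PROPERTY = 18  # major opcode of ChangeProperty


def decode_x_error(packet, little_endian=True):
    """Decode a 32-byte core X11 error packet, or return None if it is not one."""
    if len(packet) != 32 or packet[0] != X_ERROR:
        return None
    order = "<" if little_endian else ">"
    # type, code, sequence, bad value, minor opcode, major opcode, 21 pad bytes
    _, code, sequence, bad_value, minor, major = struct.unpack(
        order + "BBHIHB21x", packet
    )
    return {
        "code": code,
        "sequence": sequence,
        "bad_value": bad_value,
        "minor_opcode": minor,
        "major_opcode": major,
    }
```

If the bytes read in _XReply decode to code 16 and major opcode 18, the 9th reply is the same error that xlogo prints, and the interesting question becomes why the server judged the request length to be wrong.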

Regards
-- 
Emmanuel Fleury
 