On Mon, Feb 1, 2010 at 13:17, Émeric Maschino <emeric.masch...@gmail.com> wrote:
> 2010/1/31 Jerome Glisse <gli...@freedesktop.org>:
>>> <snip>
>>> Eventually, strace log is flooded with
>>> ioctl(4, 0xc0106451, 0x60000fffffd530f8) = 0
>>> roughly at the time the CPU load increases. This is consistent with
>>> what is recorded in syslog:
>>> Jan 29 21:16:03 longspeak kernel: [ 318.611783] [drm:drm_ioctl], pid=2426, cmd=0xc0106451, nr=0x51, dev 0xe200, auth=1
>>> Jan 29 21:16:03 longspeak kernel: [ 318.611789] [drm:radeon_cp_getparam], pid=2426
>>> repeated several tens of thousands of times, where 2426 is the glxgears PID.
>>> <snip>
>> You are hitting a GPU lockup, which translates into userspace
>> retrying the same ioctl over and over again, which completely
>> eats all the CPU.
>
> Thank you for clarifying. Does GPU lockup mean that this problem is
> specific to my current hardware configuration? If I try another
> graphics adapter (choices are scarce on ia64), is it possible that I
> don't experience a GPU lockup at all, or a different one?
>
>> There is no easy way to debug a GPU lockup, and no way at
>> all other than staring at a GPU command stream or making wild
>> guesses and testing things randomly.
>
> Just to clarify: I imagine that a GPU command stream is specific to a
> given GPU/driver. Does it mean that the commands sent to the GPU are
> not the same on different Linux platforms (e.g. ia64/r300 vs.
> x86/r300)?
>
> About GPU commands, is this something I can read in the various
> logfiles? Is there some kind of command generator to send a specific
> command or command stream to the GPU in order to help determine which
> one is the faulty one?
>
> I don't know if these are the commands sent to the GPU but, looking
> again at the strace glxgears output I've recorded, I'm getting:
> futex(0x60000fffffd53420, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, NULL, 200000000004d1e8) = -1 EAGAIN (Resource temporarily unavailable)
> and numerous
> read(3, 0x60000000000093e4, 4096) = -1 EAGAIN (Resource temporarily unavailable)
> Should the return value of read() be equal to the number of blocks (I
> imagine) passed as the third argument? In this case, before getting the
> EAGAIN error when trying to read blocks, I'm getting the following
> sequence that seems to shift something:
> writev(3, [{"b\0\5\0\f\0\0\0BIG-REQUESTS", 20}], 1) = 20
> poll([{fd=3, events=POLLIN}], 1, -1) = 1 ([{fd=3, revents=POLLIN}])
> read(3, "\1\0\1\0\0\0\0\0\1\216\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", 4096) = 32
> poll([{fd=3, events=POLLIN|POLLOUT}], 1, -1) = 1 ([{fd=3, revents=POLLOUT}])
> writev(3, [{"\216\0\1\0", 4}], 1) = 4
> poll([{fd=3, events=POLLIN}], 1, -1) = 1 ([{fd=3, revents=POLLIN}])
> read(3, "\1\0\2\0\0\0\0\0\377\377?\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", 4096) = 32
> read(3, 0x60000000000093e4, 4096) = -1 EAGAIN (Resource temporarily unavailable)
> poll([{fd=3, events=POLLIN|POLLOUT}], 1, -1) = 1 ([{fd=3, revents=POLLOUT}])
> From there, all subsequent pairs of read() calls fail.
> By contrast, in the (old) strace glxgears excerpt posted here
> (http://ubuntuforums.org/showthread.php?t=75007), the read calls seem
> to always succeed.
>
> Could this be a starting point or not at all?
>
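As a side note on the ioctl that floods the quoted strace log: the request number 0xc0106451 can be split into its fields by hand. The sketch below assumes the generic Linux ioctl layout (8-bit nr, 8-bit type, 14-bit size, 2-bit direction), which should apply to ia64 and x86 but not to every architecture. The function name `decode_ioctl` is made up for illustration, and `DRM_COMMAND_BASE = 0x40` is quoted from memory of the DRM headers, so treat it as an assumption.

```python
# Sketch: split a Linux ioctl request number into its fields.
# Assumption: generic layout (nr: bits 0-7, type: bits 8-15,
# size: bits 16-29, direction: bits 30-31). Some architectures differ.

DIRECTIONS = {0: "none", 1: "write", 2: "read", 3: "read|write"}

# Assumption: driver-private DRM ioctls start at this offset.
DRM_COMMAND_BASE = 0x40


def decode_ioctl(cmd):
    """Return the fields encoded in an ioctl request number (hypothetical helper)."""
    return {
        "nr": cmd & 0xFF,
        "type": chr((cmd >> 8) & 0xFF),
        "size": (cmd >> 16) & 0x3FFF,
        "dir": DIRECTIONS[(cmd >> 30) & 0x3],
    }


fields = decode_ioctl(0xC0106451)
print(fields)
# Driver-private command index, relative to the assumed DRM_COMMAND_BASE.
print(hex(fields["nr"] - DRM_COMMAND_BASE))
```

The decoded nr is 0x51, matching the nr=0x51 printed by drm_ioctl in the syslog excerpt, and the type byte is 'd', the DRM ioctl family. So these calls are parameter queries (radeon_cp_getparam in the syslog), not the GPU command stream itself.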
If an ia64 machine locks up, it will usually store an MCA record telling you
why it locked up and where in the code this happened. This is how I got ia64
DRI going a bunch of years ago. For what it's worth, most of the bugs were:
- PCI resources cast to 32 bits in the DRM
- some 32-bit addresses, but that got fixed as a side effect of us
  having x86_64 supported now
- large (32- or 64-bit) writes to I/O areas (they should all be 8-bit, the
  ia64 crashes otherwise), either from the kernel or from user space

To track those down, the MCA errors proved extremely useful. Usually
they carry a PCI address and all...

Stephane

--
_______________________________________________
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel
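To illustrate the first bug class listed above (values cast to 32 bits), here is a minimal sketch of what such a cast does to a 64-bit value. It is not taken from the DRM code. The helper names are hypothetical, and the sample value is the user-space pointer from the quoted strace output, used only because it is a convenient 64-bit number; the actual bugs involved PCI resource addresses.

```python
# Sketch: what a cast to a 32-bit unsigned type does to a 64-bit value.
# Helper names are hypothetical; this only mimics the arithmetic of the cast.

MASK32 = 0xFFFFFFFF


def truncate_to_u32(value):
    """Keep only the low 32 bits, as a C cast to a 32-bit unsigned type would."""
    return value & MASK32


def survives_u32_cast(value):
    """True if the value is unchanged by the 32-bit cast."""
    return truncate_to_u32(value) == value


# Illustrative 64-bit value (pointer argument from the quoted strace log).
addr = 0x60000FFFFFD530F8
print(hex(addr), "->", hex(truncate_to_u32(addr)))
print("survives cast:", survives_u32_cast(addr))
```

On a platform where everything lives below 4 GiB the cast is harmless, which is presumably why such bugs go unnoticed on 32-bit x86 and only show up on ia64.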