On Mon, Feb 1, 2010 at 13:17, Émeric Maschino <emeric.masch...@gmail.com> wrote:
> 2010/1/31 Jerome Glisse <gli...@freedesktop.org>:
>>> <snip>
>>> Eventually, the strace log is flooded with
>>> ioctl(4, 0xc0106451, 0x60000fffffd530f8) = 0
>>> roughly at the time the CPU load increases. This is consistent with
>>> what is recorded in syslog:
>>> Jan 29 21:16:03 longspeak kernel: [  318.611783] [drm:drm_ioctl],
>>> pid=2426, cmd=0xc0106451, nr=0x51, dev 0xe200, auth=1
>>> Jan 29 21:16:03 longspeak kernel: [  318.611789]
>>> [drm:radeon_cp_getparam], pid=2426
>>> repeated several tens of thousands of times, where 2426 is the
>>> glxgears PID.
>>> <snip>
>> You are hitting a GPU lockup, which shows up as userspace retrying
>> the same ioctl over and over again, which completely eats all the
>> CPU.
>
> Thank you for clarifying. Does GPU lockup mean that this problem is
> specific to my current hardware configuration? If I try another
> graphics adapter (choices are scarce on ia64), is it possible that I
> would experience no GPU lockup at all, or a different one?
>
>> There is no easy way to debug a GPU lockup, and no way at all
>> other than staring at the GPU command stream or making wild
>> guesses and testing things randomly.
>
> Just to clarify: I imagine that a GPU command stream is specific to a
> given GPU/driver. Does it mean that the commands sent to the GPU are
> not the same on different Linux platforms (e.g. ia64/r300 vs.
> x86/r300)?
>
> About GPU commands: are these something I can read in the various
> logfiles? Is there some kind of command generator to send a specific
> command or command stream to the GPU in order to help determine which
> one is the faulty one?
>
> I don't know if these are the commands sent to the GPU but, looking
> again at the strace glxgears output I've recorded, I'm getting:
> futex(0x60000fffffd53420,
> FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, NULL,
> 200000000004d1e8) = -1 EAGAIN (Resource temporarily unavailable)
> and numerous
> read(3, 0x60000000000093e4, 4096)       = -1 EAGAIN (Resource
> temporarily unavailable)
> Should the return value of read() be equal to the number of blocks (I
> imagine) passed as the third argument? In this case, before getting
> the EAGAIN error when trying to read blocks, I'm getting the following
> sequence that seems to shift something:
> writev(3, [{"b\0\5\0\f\0\0\0BIG-REQUESTS", 20}], 1) = 20
> poll([{fd=3, events=POLLIN}], 1, -1)    = 1 ([{fd=3, revents=POLLIN}])
> read(3, "\1\0\1\0\0\0\0\0\1\216\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0",
> 4096) = 32
> poll([{fd=3, events=POLLIN|POLLOUT}], 1, -1) = 1 ([{fd=3, revents=POLLOUT}])
> writev(3, [{"\216\0\1\0", 4}], 1)       = 4
> poll([{fd=3, events=POLLIN}], 1, -1)    = 1 ([{fd=3, revents=POLLIN}])
> read(3, "\1\0\2\0\0\0\0\0\377\377?\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0",
> 4096) = 32
> read(3, 0x60000000000093e4, 4096)       = -1 EAGAIN (Resource
> temporarily unavailable)
> poll([{fd=3, events=POLLIN|POLLOUT}], 1, -1) = 1 ([{fd=3, revents=POLLOUT}])
> From there, all subsequent pairs of read() calls fail.
> By contrast, in the (old) strace glxgears excerpt posted here
> (http://ubuntuforums.org/showthread.php?t=75007), the read calls seem
> to always succeed.
>
> Could this be a starting point or not at all?
>

If an ia64 machine locks up, it will usually log an MCA (Machine Check
Abort) record telling you why it locked up and where in the code this
happened.
This is how I got ia64 DRI going a bunch of years ago. For what it's
worth, most of the bugs were:
- PCI resources cast to 32 bits in the DRM
- some 32-bit addresses, but that got fixed as a side effect of us
having x86_64 supported now
- large (32- or 64-bit) writes to I/O areas (they should all be 8-bit,
the ia64 crashes otherwise), either from the kernel or from user space
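
To illustrate the first kind of bug in the list, here is a minimal
sketch of what happens when a 64-bit resource address goes through a
32-bit variable. The helper names and the example address in the usage
note are made up for illustration; this is not the actual DRM code.

```c
#include <stdint.h>

/* Hypothetical helper showing the bug class: a 64-bit bus address
 * squeezed through a 32-bit variable silently loses its upper half. */
static uint64_t store_in_32bit(uint64_t resource_start)
{
    uint32_t truncated = (uint32_t)resource_start; /* the bug */
    return truncated;
}

/* Returns 1 if the address is unchanged after the 32-bit round trip,
 * 0 if the cast dropped significant bits. */
static int survives_32bit_cast(uint64_t resource_start)
{
    return store_in_32bit(resource_start) == resource_start;
}
```

With an invented address such as 0x80000000fe000000, the round trip
yields 0xfe000000, so any later access goes to the wrong place. An
address that already fits in 32 bits hides the problem, which is
presumably why such code can work on one platform and fail on another.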

Really, to track those down, the MCA errors proved extremely useful.
Usually they carry a PCI address and all...
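
As an aside on the ioctl quoted at the top (cmd=0xc0106451, nr=0x51):
the number can be split into its fields by hand. The sketch below
assumes the generic Linux ioctl encoding from asm-generic/ioctl.h
(8 bits of number, 8 bits of type, 14 bits of size, 2 bits of
direction); the struct and function names are hypothetical.

```c
#include <stdint.h>

/* Fields of an ioctl command word, assuming the generic Linux layout:
 * bits 0-7 number, 8-15 type, 16-29 argument size, 30-31 direction. */
struct ioc_fields {
    unsigned dir;   /* 1 = write, 2 = read, 3 = read and write */
    unsigned size;  /* size of the argument structure in bytes */
    unsigned type;  /* subsystem "magic" character */
    unsigned nr;    /* command number within the subsystem */
};

/* Hypothetical helper: split a 32-bit ioctl command into its fields. */
static struct ioc_fields ioc_decode(uint32_t cmd)
{
    struct ioc_fields f;
    f.nr   = cmd & 0xff;
    f.type = (cmd >> 8) & 0xff;
    f.size = (cmd >> 16) & 0x3fff;
    f.dir  = (cmd >> 30) & 0x3;
    return f;
}
```

For 0xc0106451 this gives type 'd' (0x64), number 0x51, a 16-byte
argument, and a read/write direction. The number 0x51 matches the nr
value in the syslog line, and the radeon_cp_getparam message logged
right after it suggests this is the radeon GETPARAM ioctl being
retried.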

Stephane

--
_______________________________________________
Dri-devel mailing list
Dri-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dri-devel