Re: 3D OpenGL applications eat CPU resources

2010-02-16 Thread Émeric Maschino
2010/2/3 Stephane Marchesin stephane.marche...@gmail.com:
 Really if you have such lockups they may also happen on x86, did you
 try the card there?

Hello,

I had some free time, so I've tried my FireGL X1 adapter on x86
hardware: no problem there.

I don't know if it can provide valuable information, but I've also
tried an AGP Radeon 7500 graphics adapter in my ia64 system.

Without a xorg.conf file, the AGP rate was automatically set to 4x, w/ SBA
and w/o FW. XAA acceleration was enabled by default. I did not
experience any problem with tiny OpenGL applications like glxgears
(~380 fps on average). As a test, I ran quake2: textures on the walls
and the floor were quickly corrupted, as if a translucent rainbow-colored
texture was blended with the wall/floor texture. And within
seconds, screen refresh froze, as if the application had locked the
system hard. But it had not: top revealed no abusive CPU usage, the quake2
process could be killed and X restarted without a problem.

I've then tried EXA acceleration. I haven't been able to reproduce that
problem since, but I've experienced two GPU lockups while simply
moving a terminal window in the GNOME desktop environment
(reducing the AGP rate didn't help). Running glxgears gave ~390 fps on
average. Under quake2, the floor and wall textures were OK, but the
screen froze as with XAA acceleration.

As a last attempt, I've also tried an AGP Radeon 9600 Pro graphics
adapter. My ia64 system didn't POST at all.

Cheers,

Émeric



Re: 3D OpenGL applications eat CPU resources

2010-02-11 Thread Émeric Maschino
2010/2/4 Jerome Glisse gli...@freedesktop.org:
 IIRC the old radeon drm doesn't have anything to dump the GPU command stream.
 Look at http://www.x.org/docs/AMD/R5xx_Acceleration_v1.4.pdf to see
 what a radeon GPU command stream looks like (packet pm4 stuff)

An interesting read for the parts I can understand, but a lot of this
documentation goes way beyond my knowledge.

Looking at the logs I've recorded, I can see R300_CMD_PACKET0,
R300_CMD_WAIT, R300_CMD_PACKET3_RAW or R300_CMD_END3D traces. If I'm
not mistaken, they come from r300_cmdbuf.c, but I don't think they give
enough information to isolate which command triggers the GPU lockup,
right?

As an alternative approach, I was wondering whether simple OpenGL
applications, with a growing number of different OpenGL calls, would
help narrow down the offending r300 command, or not at all?
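
To make this concrete, below is the kind of skeleton I have in mind (just a
sketch, not yet tested on the ia64 box): it only clears the window each
frame, and one would then add depth testing, a textured quad, display lists,
etc. one step at a time until the lockup shows up.

/* glxmin.c - hypothetical minimal GLX test; build with:
 *   gcc glxmin.c -o glxmin -lX11 -lGL
 */
#include <stdio.h>
#include <X11/Xlib.h>
#include <GL/gl.h>
#include <GL/glx.h>

int main(void)
{
    Display *dpy = XOpenDisplay(NULL);
    if (!dpy) {
        fprintf(stderr, "cannot open display\n");
        return 1;
    }

    /* Double-buffered RGBA visual with a depth buffer. */
    int attribs[] = { GLX_RGBA, GLX_DOUBLEBUFFER, GLX_DEPTH_SIZE, 16, None };
    XVisualInfo *vi = glXChooseVisual(dpy, DefaultScreen(dpy), attribs);
    if (!vi) {
        fprintf(stderr, "no suitable GLX visual\n");
        return 1;
    }

    XSetWindowAttributes swa;
    swa.colormap = XCreateColormap(dpy, RootWindow(dpy, vi->screen),
                                   vi->visual, AllocNone);
    swa.event_mask = ExposureMask;
    Window win = XCreateWindow(dpy, RootWindow(dpy, vi->screen), 0, 0,
                               300, 300, 0, vi->depth, InputOutput,
                               vi->visual, CWColormap | CWEventMask, &swa);
    XMapWindow(dpy, win);
    XSync(dpy, False);

    GLXContext ctx = glXCreateContext(dpy, vi, NULL, True);
    glXMakeCurrent(dpy, win, ctx);

    /* Step 1: clear-only loop.  Later steps would enable one extra GL
     * feature at a time (glEnable(GL_DEPTH_TEST), a textured quad, ...). */
    int frame;
    for (frame = 0; frame < 1000; frame++) {
        glClearColor(0.2f, 0.3f, 0.4f, 1.0f);
        glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
        glXSwapBuffers(dpy, win);
    }

    glXMakeCurrent(dpy, None, NULL);
    glXDestroyContext(dpy, ctx);
    XDestroyWindow(dpy, win);
    XCloseDisplay(dpy);
    return 0;
}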

Émeric



Re: 3D OpenGL applications eat CPU resources

2010-02-08 Thread Émeric Maschino
2010/2/8 Alex Deucher alexdeuc...@gmail.com:
 Does AGP work at all on ia64?  I know on some alphas there were cache
 coherency issues or something like that that more or less prevented
 AGP from being usable at all.  It was mostly there to accommodate AGP
 form factor cards.

I would say that AGP works on ia64, or at least it used to ;-)

Indeed, ATI's proprietary fglrx driver ran nicely, but it was
limited to XFree86 4.1.x (there was a runtime check of the XFree86
version). This was during the kernel 2.4 era.

And the NVIDIA proprietary driver ran fine during the kernel
2.4/early 2.6 era (I remember using it with kernel 2.6.10).

At that time, the zx1 driver was already there. And apart from
API/ABI adjustments, I don't think it has been massively rewritten
since then. That's why I tend to think that the GPU lockup comes
from somewhere else.

Looking again at the lspci -vv output, I can read GART64- and
64bit- in this line:

Capabilities: [58] AGP version 2.0
        Status: RQ=80 Iso- ArqSz=0 Cal=0 SBA+ ITACoh- GART64- HTrans- 64bit- FW+ AGP3- Rate=x1,x2,x4

Are these capabilities related to 64-bit architectures or not at all?
If related, should we read GART64+ and 64bit+ on ia64 systems?

Émeric



Re: 3D OpenGL applications eat CPU resources

2010-02-07 Thread Stephane Marchesin
On Sat, Feb 6, 2010 at 11:47, Émeric Maschino emeric.masch...@gmail.com wrote:
 2010/2/4 Jerome Glisse gli...@freedesktop.org:
 IIRC the old radeon drm doesn't have anything to dump the GPU command stream.
 Look at http://www.x.org/docs/AMD/R5xx_Acceleration_v1.4.pdf to see
 what a radeon GPU command stream looks like (packet pm4 stuff). Note that
 dumping the GPU command stream can quickly eat gigs of data, and finding what
 is causing the lockup is then very cumbersome, especially as in your
 case it sounds like it's a timing issue. You might want to force your
 card into PCI mode to see if it's AGP related.

 Yep, setting Option "BusType" "PCI" in /etc/X11/xorg.conf prevents
 the GPU lockup.

 As a side note, strace glxinfo and strace glxgears still give me read()
 errors on /tmp/.X11-unix/X0, so they're probably not related to GPU
 lockup.

 Anyway, I don't know whether this is due to PCI mode or not, but
 OpenGL performance, although there are no more GPU lockups, is poor.
 And serious OpenGL applications, as simulated by the SPECviewperf test
 suite, have very irregular frame rates. If I'm not mistaken, the
 BusType option is specific to the radeon driver (or maybe other
 drivers too)? I mean, it's not an X.org-wide configuration option,
 is it? This would thus narrow my investigation path down to the AGP
 code of the radeon driver, right?


From what I recall, all the ia64 AGP chipsets (well the zx1 and the
460) have to be run:
- without sideband addressing
- without fast writes
- at 4x speed
otherwise they're unstable.

I think by default agpgart puts them at AGP 1x with fast writes...

Stephane



Re: 3D OpenGL applications eat CPU resources

2010-02-07 Thread Émeric Maschino
2010/2/7 Stephane Marchesin stephane.marche...@gmail.com:
 From what I recall, all the ia64 AGP chipsets (well the zx1 and the
 460) have to be run:
 - without sideband addressing
 - without fast writes
 - at 4x speed
 otherwise they're unstable.

 I think by default agpgart puts them at AGP 1x with fast writes...

Without /etc/X11/xorg.conf, AGP is configured as follows:
- 2x rate
- fast writes are disabled.

Adding an /etc/X11/xorg.conf in order to manually set the AGP rate to 4x
didn't help. Running glxgears triggers the GPU lockup slightly faster than
at 2x or 1x (the lockup appears in less than 1 sec. vs.
~2-3 sec. at the slower rates).

I've no idea about sideband addressing. Is there a way to check
whether it's enabled or not? And is there a way to disable it?

Just for completeness, Chapter 8.2.3 (AGP Registers) of HP zx1 ioa
External Reference Specification
(http://ftp.parisc-linux.org/docs/chips/zx1-ioa-mercury_ers.pdf) says
that zx1 chipset supports:
- AGP 1x, 2x and 4x data rate
- fast writes for PIO transactions
- sideband addressing.

Émeric



Re: 3D OpenGL applications eat CPU resources

2010-02-07 Thread Émeric Maschino
2010/2/7 Dave Airlie airl...@linux.ie:
 This would thus narrow my investigation path to the AGP code
 of the radeon driver, right?

 No, it narrows it down to the AGP hardware in your machine, along with
 the probable lack of info on it, and maybe some tweaks that we know
 nothing about.

By AGP hardware, do you mean the chipset or the graphics adapter?

I know nothing about driver development, but it seems to me that the
zx1 chipset is fairly well documented
(http://ftp.parisc-linux.org/docs/chips/zx1-ioa-mercury_ers.pdf). And
from the copyright in drivers/char/agp/hp-agp.c, it seems that the
zx1 driver was written by Bjorn Helgaas, who works at HP.

About the ATI FireGL X1 graphics adapter, since it's powered by an FGL
9700 GPU and people had to reverse-engineer this range of products,
maybe there are indeed some tweaks we know nothing about ;-)

 If it was as simple as a codepath in the radeon driver I think we'd have
 fixed it by now.

Would it be possible that something in the radeon driver codepath
behaves differently/incorrectly on ia64 systems? As an example
of generic code that nevertheless triggers bad behaviour on ia64
systems (only?), the patch "drm: Preserve SHMLBA bits in hash key for
_DRM_SHM mappings" prevents DRI from being enabled
(http://bugzilla.kernel.org/show_bug.cgi?id=15212).

Back to the radeon driver, would it help if I could put my hands on an
ATI Radeon 9700 graphics adapter? It was probably more widely used by
gamers, and thus tested by the Linux community, than the CAD-oriented
FireGL X1.

Émeric



Re: 3D OpenGL applications eat CPU resources

2010-02-07 Thread Émeric Maschino
2010/2/7 Émeric Maschino emeric.masch...@gmail.com:
 I've no idea about sideband addressing. Is there a way to check
 whether it's enabled or not? And is there a way to disable it?

lspci -vv gives:

80:00.0 VGA compatible controller: ATI Technologies Inc Radeon R300 NG [FireGL X1] (rev 80) (prog-if 00 [VGA controller])
Subsystem: ATI Technologies Inc Device 0152
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping+ SERR+ FastB2B- DisINTx-
Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 192 (2000ns min), Cache Line Size: 128 bytes
Interrupt: pin A routed to IRQ 61
Region 0: Memory at d000 (32-bit, prefetchable) [size=128M]
Region 1: I/O ports at 8000 [size=256]
Region 2: Memory at d803 (32-bit, non-prefetchable) [size=64K]
Expansion ROM at d800 [disabled] [size=128K]
Capabilities: [58] AGP version 2.0
        Status: RQ=80 Iso- ArqSz=0 Cal=0 SBA+ ITACoh- GART64- HTrans- 64bit- FW+ AGP3- Rate=x1,x2,x4
        Command: RQ=16 ArqSz=0 Cal=0 SBA+ AGP+ GART64- 64bit- FW- Rate=x4
Capabilities: [50] Power Management version 2
        Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
        Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-

80:00.1 Display controller: ATI Technologies Inc Radeon R300 [FireGL X1] (Secondary) (rev 80)
Subsystem: ATI Technologies Inc Device 0153
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping+ SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 192 (2000ns min), Cache Line Size: 128 bytes
Region 0: Memory at c800 (32-bit, prefetchable) [size=128M]
Region 1: Memory at d802 (32-bit, non-prefetchable) [size=64K]
Capabilities: [50] Power Management version 2
        Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
        Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-

I imagine that SBA+ stands for SideBand Addressing enabled, right? I
still don't know how to disable it.
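
In case it helps later, my rough idea (untested, and agpgart or the X driver
may well reprogram the register behind my back) would be to poke the AGP
command register directly: the AGP capability sits at offset 0x58 on this
card, the command register at +8, and the sideband addressing enable is
bit 9 (0x200):

# read the AGP command register of the card (0x58 + 8 = 0x60);
# setpci also understands the symbolic form CAP_AGP+8.l
setpci -s 80:00.0 0x60.l

# hypothetical: write the same value back with bit 9 (0x200) cleared to turn
# sideband addressing off -- the corresponding bit would presumably have to
# be cleared on the host bridge side too, and the driver may renegotiate it
setpci -s 80:00.0 0x60.l=<value_with_bit9_cleared>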

Émeric



Re: 3D OpenGL applications eat CPU resources

2010-02-07 Thread Alex Deucher
On Sun, Feb 7, 2010 at 12:18 PM, Émeric Maschino
emeric.masch...@gmail.com wrote:
 2010/2/7 Stephane Marchesin stephane.marche...@gmail.com:
 From what I recall, all the ia64 AGP chipsets (well the zx1 and the
 460) have to be run:
 - without sideband addressing
 - without fast writes
 - at 4x speed
 otherwise they're unstable.

 I think by default agpgart puts them at AGP 1x with fast writes...

 Without /etc/X11/xorg.conf, AGP is configured as follows:
 - 2x rate
 - fast writes are disabled.

 Adding an /etc/X11/xorg.conf in order to manually set the AGP rate to 4x
 didn't help. Running glxgears triggers the GPU lockup slightly faster than
 at 2x or 1x (the lockup appears in less than 1 sec. vs.
 ~2-3 sec. at the slower rates).

 I've no idea about sideband addressing. Is there a way to check
 whether it's enabled or not? And is there a way to disable it?

 Just for completeness, Chapter 8.2.3 (AGP Registers) of HP zx1 ioa
 External Reference Specification
 (http://ftp.parisc-linux.org/docs/chips/zx1-ioa-mercury_ers.pdf) says
 that zx1 chipset supports:
 - AGP 1x, 2x and 4x data rate
 - fast writes for PIO transactions
 - sideband addressing.

Does AGP work at all on ia64?  I know on some alphas there were cache
coherency issues or something like that that more or less prevented
AGP from being usable at all.  It was mostly there to accommodate AGP
form factor cards.

Alex



Re: 3D OpenGL applications eat CPU resources

2010-02-06 Thread Émeric Maschino
2010/2/4 Jerome Glisse gli...@freedesktop.org:
 IIRC the old radeon drm doesn't have anything to dump the GPU command stream.
 Look at http://www.x.org/docs/AMD/R5xx_Acceleration_v1.4.pdf to see
 what a radeon GPU command stream looks like (packet pm4 stuff). Note that
 dumping the GPU command stream can quickly eat gigs of data, and finding what
 is causing the lockup is then very cumbersome, especially as in your
 case it sounds like it's a timing issue. You might want to force your
 card into PCI mode to see if it's AGP related.

Yep, setting Option "BusType" "PCI" in /etc/X11/xorg.conf prevents
the GPU lockup.
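
For anyone wanting to reproduce, a minimal Device section along these
lines is what I mean (the identifier is chosen arbitrarily):

Section "Device"
    Identifier "ATI FireGL X1"
    Driver     "radeon"
    Option     "BusType" "PCI"
EndSection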

As a side note, strace glxinfo and strace glxgears still give me read()
errors on /tmp/.X11-unix/X0, so they're probably not related to GPU
lockup.

Anyway, I don't know whether this is due to PCI mode or not, but
OpenGL performance, although there are no more GPU lockups, is poor.
And serious OpenGL applications, as simulated by the SPECviewperf test
suite, have very irregular frame rates. If I'm not mistaken, the
BusType option is specific to the radeon driver (or maybe other
drivers too)? I mean, it's not an X.org-wide configuration option,
is it? This would thus narrow my investigation path down to the AGP
code of the radeon driver, right?

Émeric



Re: 3D OpenGL applications eat CPU resources

2010-02-06 Thread Alex Deucher
On Sat, Feb 6, 2010 at 2:47 PM, Émeric Maschino
emeric.masch...@gmail.com wrote:
 2010/2/4 Jerome Glisse gli...@freedesktop.org:
 IIRC the old radeon drm doesn't have anything to dump the GPU command stream.
 Look at http://www.x.org/docs/AMD/R5xx_Acceleration_v1.4.pdf to see
 what a radeon GPU command stream looks like (packet pm4 stuff). Note that
 dumping the GPU command stream can quickly eat gigs of data, and finding what
 is causing the lockup is then very cumbersome, especially as in your
 case it sounds like it's a timing issue. You might want to force your
 card into PCI mode to see if it's AGP related.

 Yep, setting Option "BusType" "PCI" in /etc/X11/xorg.conf prevents
 the GPU lockup.

 As a side note, strace glxinfo and strace glxgears still give me read()
 errors on /tmp/.X11-unix/X0, so they're probably not related to GPU
 lockup.

 Anyway, I don't know whether this is due to PCI mode or not, but
 OpenGL performance, although there are no more GPU lockups, is poor.
 And serious OpenGL applications, as simulated by the SPECviewperf test
 suite, have very irregular frame rates. If I'm not mistaken, the
 BusType option is specific to the radeon driver (or maybe other
 drivers too)? I mean, it's not an X.org-wide configuration option,
 is it? This would thus narrow my investigation path down to the AGP
 code of the radeon driver, right?

AGP is somewhat broken by design.  There are lots of subtle
incompatibilities and quirks between different AGP and GPU
combinations.  Your best bet is to play with the AGP options in your
BIOS, or try adjusting the AGPMode option:
Option "AGPMode" "x"
where x = 1 or 2 or 4 or 8
If you find a mode that works, we can add a quirk for your chipset/gpu
combination.
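
i.e., something along these lines in the Device section of xorg.conf (the
identifier is just an example):

Section "Device"
    Identifier "ATI FireGL X1"
    Driver     "radeon"
    Option     "AGPMode" "4"    # try 1, 2, 4 and 8 in turn
EndSection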

Alex



Re: 3D OpenGL applications eat CPU resources

2010-02-06 Thread Dave Airlie

 Anyway, I don't know whether this is due to PCI mode or not, but
 OpenGL performance, although there are no more GPU lockups, is poor.
 And serious OpenGL applications, as simulated by the SPECviewperf test
 suite, have very irregular frame rates. If I'm not mistaken, the
 BusType option is specific to the radeon driver (or maybe other
 drivers too)? I mean, it's not an X.org-wide configuration option,
 is it? This would thus narrow my investigation path down to the AGP
 code of the radeon driver, right?

No, it narrows it down to the AGP hardware in your machine, along with
the probable lack of info on it, and maybe some tweaks that we know
nothing about.

If it was as simple as a codepath in the radeon driver I think we'd have 
fixed it by now.

Dave.



Re: 3D OpenGL applications eat CPU resources

2010-02-04 Thread Émeric Maschino
2010/2/3 Stephane Marchesin stephane.marche...@gmail.com:
 No, you are right they don't trigger MCA. Hmm I didn't have any of
 those back then, my lockups came from the bus mostly...

Thank you for clarifying this point.

 Really if you have such lockups they may also happen on x86, did you
 try the card there?

Yes, I have no problem with this (AGP Pro 4x) graphics adapter (ATI
FireGL X1) in x86 hardware.

 At this point your best bet is probably to replay the crashing sequence
 until you can reduce it to the offending couple of commands.

OK. Are the commands you're talking about the arguments passed in the
various ioctl() calls logged when stracing the offending OpenGL
application? For example, strace glxgears gives lines like:
ioctl(4, 0xc0106451, 0x6fd52d30) = 0
ioctl(4, 0xc0186419, 0x6fd52d30) = 0
ioctl(4, 0x40106459, 0x6fd52d58) = 0
where 4 is the file descriptor of /dev/dri/card0. Are 0xc0106451,
0xc0186419 or 0x40106459 the commands passed to the GPU?
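
In case it matters, I tried to decode those numbers with the _IOC fields from
<asm-generic/ioctl.h> (quick sketch below, so take it with a grain of salt):
0xc0106451 splits into dir=read|write, size=16, type=0x64 ('d', the DRM ioctl
base) and nr=0x51, which is the nr that the drm.debug log I posted earlier
maps to radeon_cp_getparam. So these look like DRM ioctl request numbers
rather than commands actually sent to the GPU.

/* ioctl_decode.c - split DRM ioctl request numbers into their _IOC fields
 * (generic Linux encoding: dir in bits 31:30, size in 29:16, type in 15:8,
 * nr in 7:0).  Build with: gcc ioctl_decode.c -o ioctl_decode
 */
#include <stdio.h>

int main(void)
{
    unsigned long cmds[] = { 0xc0106451UL, 0xc0186419UL, 0x40106459UL };
    int i;

    for (i = 0; i < 3; i++) {
        unsigned long c = cmds[i];
        printf("0x%08lx: dir=%lu size=%lu type=0x%02lx ('%c') nr=0x%02lx\n",
               c,
               (c >> 30) & 0x3,     /* _IOC_DIR: 1=write, 2=read, 3=both  */
               (c >> 16) & 0x3fff,  /* _IOC_SIZE: size of the arg struct  */
               (c >> 8) & 0xff,     /* _IOC_TYPE: 0x64 = 'd' = DRM base   */
               (int)((c >> 8) & 0xff),
               c & 0xff);           /* _IOC_NR                            */
    }
    return 0;
}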

I don't know if it's related to the GPU lockup or not (I mean, being the
cause or a consequence), but I've also noticed in the strace glxgears
logs (or even those of a simple application like glxinfo) that most of the
read() calls to /tmp/.X11-unix/X0 fail, whereas the writev() calls seem to
succeed:
poll([{fd=3, events=POLLIN|POLLOUT}], 1, -1) = 1 ([{fd=3, revents=POLLOUT}])
writev(3, [{\222\0\3\0\4\0\0\0\0\0\0\0, 12}, {NULL, 0}, {, 0}], 3) = 12
poll([{fd=3, events=POLLIN}], 1, -1)= 1 ([{fd=3, revents=POLLIN}])
read(3, \1\0*\0\0\0\0\0\4\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0,
4096) = 32
read(3, 0x600093e4, 4096)   = -1 EAGAIN (Resource
temporarily unavailable)
where 3 is the file descriptor of /tmp/.X11-unix/X0

Émeric



Re: 3D OpenGL applications eat CPU resources

2010-02-04 Thread Jerome Glisse
On Thu, Feb 04, 2010 at 03:37:58PM +0100, Émeric Maschino wrote:
 2010/2/3 Stephane Marchesin stephane.marche...@gmail.com:
  No, you are right they don't trigger MCA. Hmm I didn't have any of
  those back then, my lockups came from the bus mostly...
 
 Thank you for clarifying this point.
 
  Really if you have such lockups they may also happen on x86, did you
  try the card there?
 
 Yes, I have no problem with this (AGP Pro 4x) graphics adapter (ATI
 FireGL X1) in x86 hardware.
 
  At this point your best bet is probably to replay the crashing sequence
  until you can reduce it to the offending couple of commands.
 
 OK. Are the commands you're talking about the arguments passed in the
 various ioctl() calls logged when stracing the offending OpenGL
 application? For example, strace glxgears gives lines like:
 ioctl(4, 0xc0106451, 0x6fd52d30) = 0
 ioctl(4, 0xc0186419, 0x6fd52d30) = 0
 ioctl(4, 0x40106459, 0x6fd52d58) = 0
 where 4 is the file descriptor of /dev/dri/card0. Are 0xc0106451,
 0xc0186419 or 0x40106459 the commands passed to the GPU?
 
 I don't know if it's related to the GPU lockup or not (I mean, being the
 cause or a consequence), but I've also noticed in the strace glxgears
 logs (or even those of a simple application like glxinfo) that most of the
 read() calls to /tmp/.X11-unix/X0 fail, whereas the writev() calls seem to
 succeed:
 poll([{fd=3, events=POLLIN|POLLOUT}], 1, -1) = 1 ([{fd=3, revents=POLLOUT}])
 writev(3, [{\222\0\3\0\4\0\0\0\0\0\0\0, 12}, {NULL, 0}, {, 0}], 3) = 12
 poll([{fd=3, events=POLLIN}], 1, -1)= 1 ([{fd=3, revents=POLLIN}])
 read(3, \1\0*\0\0\0\0\0\4\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0,
 4096) = 32
 read(3, 0x600093e4, 4096)   = -1 EAGAIN (Resource
 temporarily unavailable)
 where 3 is the file descriptor of /tmp/.X11-unix/X0
 
 Émeric
 

IIRC the old radeon drm doesn't have anything to dump the GPU command stream.
Look at http://www.x.org/docs/AMD/R5xx_Acceleration_v1.4.pdf to see
what a radeon GPU command stream looks like (packet pm4 stuff). Note that
dumping the GPU command stream can quickly eat gigs of data, and finding what
is causing the lockup is then very cumbersome, especially as in your
case it sounds like it's a timing issue. You might want to force your
card into PCI mode to see if it's AGP related.

Cheers,
Jerome



Re: 3D OpenGL applications eat CPU resources

2010-02-02 Thread Émeric Maschino
2010/2/1 Stephane Marchesin stephane.marche...@gmail.com:
 If an ia64 machine locks up, it will usually store an MCA telling you
 about why it locked/where in the code this happened.
 This is how I got ia64 DRI going a bunch of years ago. For what it's
 worth, most of the bugs were:
 - PCI resources cast to 32 bits in the DRM
 - some 32-bit addresses, but those got fixed as a side effect of us
 having x86_64 supported now
 - large (32- or 64-bit) writes to I/O areas (they should all be 8-bit
 writes; ia64 crashes otherwise), either from the kernel or from user space

 Really, to track those the MCA errors proved extremely useful. Usually
 they carry a PCI address and all...

Just to understand: in the present case, I've been told that I'm
experiencing GPU lockups. I can still log in remotely to the station and
kill the offending application. So I imagine that's different from an
ia64 lockup, isn't it? Will an MCA event be triggered at all, then?

Émeric



Re: 3D OpenGL applications eat CPU resources

2010-02-01 Thread Émeric Maschino
2010/1/31 Jerome Glisse gli...@freedesktop.org:
 snip
 Eventually, strace log is flooded with
 ioctl(4, 0xc0106451, 0x6fd530f8) = 0
 roughly at the time the CPU charge increases. This is consistent with
 what is recorded in syslog:
 Jan 29 21:16:03 longspeak kernel: [  318.611783] [drm:drm_ioctl],
 pid=2426, cmd=0xc0106451, nr=0x51, dev 0xe200, auth=1
 Jan 29 21:16:03 longspeak kernel: [  318.611789]
 [drm:radeon_cp_getparam], pid=2426
 repeated several tens of thousands times where 2426 is glxgears PID.
 snip
 You are hitting a GPU lockup, which translates into userspace
 trying the same ioctl over and over again, which completely
 eats all the CPU.

Thank you for clarifying. Does a GPU lockup mean that this problem is
specific to my current hardware configuration? If I try another
graphics adapter (choices are scarce on ia64), is it possible that I
won't experience a GPU lockup at all, or a different one?

 There is no easy way to debug a GPU lockup, and no way at
 all other than staring at a GPU command stream or making wild
 guesses and testing things randomly.

Just to clarify: I imagine that a GPU command stream is specific to a
given GPU/driver. Does it mean that the commands sent to the GPU are
not the same on different Linux platforms (e.g. ia64/r300 vs.
x86/r300)?

About GPU commands, are they something I can read in the various
logfiles? Is there some kind of command generator to send a specific
command or command stream to the GPU in order to help determine which
one is the faulty one?

I don't know if these are the commands sent to the GPU but, looking
again at the strace glxgears output I've recorded, I'm getting:
futex(0x6fd53420,
FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, NULL,
2004d1e8) = -1 EAGAIN (Resource temporarily unavailable)
and numerous
read(3, 0x600093e4, 4096)   = -1 EAGAIN (Resource
temporarily unavailable)
Should the return value of read() be equal to the number of bytes (I
imagine) passed as the third argument? In this case, before getting the
EAGAIN error when trying to read, I'm getting the following
sequence, which seems to shift something:
writev(3, [{b\0\5\0\f\0\0\0BIG-REQUESTS, 20}], 1) = 20
poll([{fd=3, events=POLLIN}], 1, -1)= 1 ([{fd=3, revents=POLLIN}])
read(3, \1\0\1\0\0\0\0\0\1\216\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0,
4096) = 32
poll([{fd=3, events=POLLIN|POLLOUT}], 1, -1) = 1 ([{fd=3, revents=POLLOUT}])
writev(3, [{\216\0\1\0, 4}], 1)   = 4
poll([{fd=3, events=POLLIN}], 1, -1)= 1 ([{fd=3, revents=POLLIN}])
read(3, \1\0\2\0\0\0\0\0\377\377?\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0,
4096) = 32
read(3, 0x600093e4, 4096)   = -1 EAGAIN (Resource
temporarily unavailable)
poll([{fd=3, events=POLLIN|POLLOUT}], 1, -1) = 1 ([{fd=3, revents=POLLOUT}])
From there, all subsequent pairs of read() calls fail.
By contrast, in the (old) strace glxgears excerpt posted here
(http://ubuntuforums.org/showthread.php?t=75007), the read() calls seem
to always succeed.

Could this be a starting point or not at all?

Émeric



Re: 3D OpenGL applications eat CPU resources

2010-02-01 Thread Stephane Marchesin
On Mon, Feb 1, 2010 at 13:17, Émeric Maschino emeric.masch...@gmail.com wrote:
 2010/1/31 Jerome Glisse gli...@freedesktop.org:
 snip
 Eventually, strace log is flooded with
 ioctl(4, 0xc0106451, 0x6fd530f8) = 0
 roughly at the time the CPU charge increases. This is consistent with
 what is recorded in syslog:
 Jan 29 21:16:03 longspeak kernel: [  318.611783] [drm:drm_ioctl],
 pid=2426, cmd=0xc0106451, nr=0x51, dev 0xe200, auth=1
 Jan 29 21:16:03 longspeak kernel: [  318.611789]
 [drm:radeon_cp_getparam], pid=2426
 repeated several tens of thousands times where 2426 is glxgears PID.
 snip
 You are hitting a GPU lockup, which translates into userspace
 trying the same ioctl over and over again, which completely
 eats all the CPU.

 Thank you for clarifying. Does a GPU lockup mean that this problem is
 specific to my current hardware configuration? If I try another
 graphics adapter (choices are scarce on ia64), is it possible that I
 won't experience a GPU lockup at all, or a different one?

 There is no easy way to debug a GPU lockup, and no way at
 all other than staring at a GPU command stream or making wild
 guesses and testing things randomly.

 Just to clarify: I imagine that a GPU command stream is specific to a
 given GPU/driver. Does it mean that the commands sent to the GPU are
 not the same on different Linux platforms (e.g. ia64/r300 vs.
 x86/r300)?

 About GPU commands, are they something I can read in the various
 logfiles? Is there some kind of command generator to send a specific
 command or command stream to the GPU in order to help determine which
 one is the faulty one?

 I don't know if these are the commands sent to the GPU but, looking
 again at the strace glxgears output I've recorded, I'm getting:
 futex(0x6fd53420,
 FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1, NULL,
 2004d1e8) = -1 EAGAIN (Resource temporarily unavailable)
 and numerous
 read(3, 0x600093e4, 4096)       = -1 EAGAIN (Resource
 temporarily unavailable)
 Should the return value of read() be equal to the number of bytes (I
 imagine) passed as the third argument? In this case, before getting the
 EAGAIN error when trying to read, I'm getting the following
 sequence, which seems to shift something:
 writev(3, [{b\0\5\0\f\0\0\0BIG-REQUESTS, 20}], 1) = 20
 poll([{fd=3, events=POLLIN}], 1, -1)    = 1 ([{fd=3, revents=POLLIN}])
 read(3, \1\0\1\0\0\0\0\0\1\216\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0,
 4096) = 32
 poll([{fd=3, events=POLLIN|POLLOUT}], 1, -1) = 1 ([{fd=3, revents=POLLOUT}])
 writev(3, [{\216\0\1\0, 4}], 1)       = 4
 poll([{fd=3, events=POLLIN}], 1, -1)    = 1 ([{fd=3, revents=POLLIN}])
 read(3, \1\0\2\0\0\0\0\0\377\377?\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0,
 4096) = 32
 read(3, 0x600093e4, 4096)       = -1 EAGAIN (Resource
 temporarily unavailable)
 poll([{fd=3, events=POLLIN|POLLOUT}], 1, -1) = 1 ([{fd=3, revents=POLLOUT}])
 From there, all subsequent pairs of read() calls fail.
 By contrast, in the (old) strace glxgears excerpt posted here
 (http://ubuntuforums.org/showthread.php?t=75007), the read() calls seem
 to always succeed.

 Could this be a starting point or not at all?


If an ia64 machine locks up, it will usually store an MCA telling you
about why it locked/where in the code this happened.
This is how I got ia64 DRI going a bunch of years ago. For what it's
worth, most of the bugs were:
- PCI resources cast to 32 bits in the DRM
- some 32-bit addresses, but those got fixed as a side effect of us
having x86_64 supported now
- large (32- or 64-bit) writes to I/O areas (they should all be 8-bit
writes; ia64 crashes otherwise), either from the kernel or from user space

Really, to track those the MCA errors proved extremely useful. Usually
they carry a PCI address and all...

Stephane



3D OpenGL applications eat CPU resources

2010-01-31 Thread Émeric Maschino
Hello,

I really don't know where to start, so feel free to redirect me to the
right mailing list if this one is not the correct one.

[Summary]
I'm trying to help revive 3D hardware acceleration on ia64
architecture. This is a very long story that started in 2006
(http://bugs.freedesktop.org/show_bug.cgi?id=7770).

Currently, DRI can't be activated at all because of a regression
introduced during the kernel 2.6.30 development cycle
(http://marc.info/?l=linux-ia64&m=126419878611433&w=2). I've bisected
the regression to commit f1a2a9b6189f9f5c27672d4d32fec9492c6486b2
("drm: Preserve SHMLBA bits in hash key for _DRM_SHM mappings"). Simply
reverting it from the current kernel source enables DRI again on ia64.
I've asked for help several times from the author (David S. Miller
da...@davemloft.net) through the linux-ia64 list and by contacting
him directly, but got no answer so far. So I really don't know
what to do with this patch. I bet that asking for its removal from the
kernel source is not an acceptable solution, is it?
[End of summary]

Anyway, with DRI enabled, I'm now trying to make it work again. My
ia64 workstation sports an ATI FireGL X1 AGP adapter. I'm using the
r300 open source driver. As soon as a 3D OpenGL application is
started (e.g. glxgears), it eats CPU resources within seconds.
Switching between XAA/EXA acceleration makes no difference. Reducing
the AGP speed from 2x (set by default when no xorg.conf file is present)
to 1x has little impact (the offending application takes 3 sec. rather
than 1-2 sec. to eat CPU resources). The system isn't locked, as it
can be remotely rebooted, but it is really unusable once a 3D OpenGL
application has started eating CPU. Killing the offending application
makes the X server eat CPU resources instead. This behaviour is consistent
with what I noticed one year ago with an older X.org X server
(http://bugs.freedesktop.org/show_bug.cgi?id=7770#c42), so I bet the
problem is still there with the current X.org implementation (I'm using
X.org X Server 1.7.4 on a Debian Squeeze Testing distribution).

I don't know what information is useful, so I simply straced glxgears
with drm.debug=1 passed to the kernel, with my current hardware
configuration. Eventually, the strace log gets flooded with
ioctl(4, 0xc0106451, 0x6fd530f8) = 0
roughly at the time the CPU load increases. This is consistent with
what is recorded in syslog:
Jan 29 21:16:03 longspeak kernel: [  318.611783] [drm:drm_ioctl],
pid=2426, cmd=0xc0106451, nr=0x51, dev 0xe200, auth=1
Jan 29 21:16:03 longspeak kernel: [  318.611789]
[drm:radeon_cp_getparam], pid=2426
repeated several tens of thousands of times, where 2426 is the glxgears PID.
Is this 0xc0106451 command valuable information?

I don't know if it's informative either, but enabling the side-bar in
GNOME Shell eats CPU resources too, and syslog is flooded with:
Jan 30 12:38:26 longspeak kernel: [  325.146380] [drm:radeon_do_cp_idle],
Jan 30 12:38:26 longspeak kernel: [  325.332672]
[drm:radeon_do_wait_for_idle], wait idle failed status : 0x84110140
0x9C000800
Jan 30 12:38:26 longspeak kernel: [  325.332676]
[drm:radeon_do_release], radeon_do_cp_idle -16
Does this failed status provide a useful starting point?

Thanks for reading and any advice/suggestion welcome.

Émeric



Re: 3D OpenGL applications eat CPU resources

2010-01-31 Thread Jerome Glisse
On Sun, Jan 31, 2010 at 02:28:39PM +0100, Émeric Maschino wrote:
 Hello,
 
 I really don't know where to start, so feel free to redirect me to the
 right mailing list if this one is not the correct one.
 
 [Summary]
 I'm trying to help revive 3D hardware acceleration on ia64
 architecture. This is a very long story that started in 2006
 (http://bugs.freedesktop.org/show_bug.cgi?id=7770).
 
 Currently, DRI can't be activated at all because of a regression
 introduced during the kernel 2.6.30 development cycle
 (http://marc.info/?l=linux-ia64&m=126419878611433&w=2). I've bisected
 the regression to commit f1a2a9b6189f9f5c27672d4d32fec9492c6486b2
 ("drm: Preserve SHMLBA bits in hash key for _DRM_SHM mappings"). Simply
 reverting it from the current kernel source enables DRI again on ia64.
 I've asked for help several times from the author (David S. Miller
 da...@davemloft.net) through the linux-ia64 list and by contacting
 him directly, but got no answer so far. So I really don't know
 what to do with this patch. I bet that asking for its removal from the
 kernel source is not an acceptable solution, is it?
 [End of summary]
 
 Anyway, with DRI enabled, I'm now trying to make it work again. My
 ia64 workstation sports an ATI FireGL X1 AGP adapter. I'm using the
 r300 open source driver. As soon as a 3D OpenGL application is
 started (e.g. glxgears), it eats CPU resources within seconds.
 Switching between XAA/EXA acceleration makes no difference. Reducing
 the AGP speed from 2x (set by default when no xorg.conf file is present)
 to 1x has little impact (the offending application takes 3 sec. rather
 than 1-2 sec. to eat CPU resources). The system isn't locked, as it
 can be remotely rebooted, but it is really unusable once a 3D OpenGL
 application has started eating CPU. Killing the offending application
 makes the X server eat CPU resources instead. This behaviour is consistent
 with what I noticed one year ago with an older X.org X server
 (http://bugs.freedesktop.org/show_bug.cgi?id=7770#c42), so I bet the
 problem is still there with the current X.org implementation (I'm using
 X.org X Server 1.7.4 on a Debian Squeeze Testing distribution).
 
 I don't know what information is useful, so I simply straced glxgears
 with drm.debug=1 passed to the kernel, with my current hardware
 configuration. Eventually, the strace log gets flooded with
 ioctl(4, 0xc0106451, 0x6fd530f8) = 0
 roughly at the time the CPU load increases. This is consistent with
 what is recorded in syslog:
 Jan 29 21:16:03 longspeak kernel: [  318.611783] [drm:drm_ioctl],
 pid=2426, cmd=0xc0106451, nr=0x51, dev 0xe200, auth=1
 Jan 29 21:16:03 longspeak kernel: [  318.611789]
 [drm:radeon_cp_getparam], pid=2426
 repeated several tens of thousands of times, where 2426 is the glxgears PID.
 Is this 0xc0106451 command valuable information?
 
 I don't know if it's informative either, but enabling the side-bar in
 GNOME Shell eats CPU resources too, and syslog is flooded with:
 Jan 30 12:38:26 longspeak kernel: [  325.146380] [drm:radeon_do_cp_idle],
 Jan 30 12:38:26 longspeak kernel: [  325.332672]
 [drm:radeon_do_wait_for_idle], wait idle failed status : 0x84110140
 0x9C000800
 Jan 30 12:38:26 longspeak kernel: [  325.332676]
 [drm:radeon_do_release], radeon_do_cp_idle -16
 Does this failed status provide a useful starting point?
 
 Thanks for reading and any advice/suggestion welcome.
 
 Émeric

You are hitting a GPU lockup, which translates into userspace
trying the same ioctl over and over again, which completely
eats all the CPU.

There is no easy way to debug a GPU lockup, and no way at
all other than staring at a GPU command stream or making wild
guesses and testing things randomly.

Cheers,
Jerome
