date:20120502

[Nouveau] [Bug 49351] glx-swap-pixmap piglit test breaks display

2012-05-02 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=49351

--- Comment #5 from Michel Dänzer mic...@daenzer.net 2012-05-02 01:09:26 PDT ---
Looks like bug 42913; the X driver needs to not try and flip pixmaps.

-- 
Configure bugmail: https://bugs.freedesktop.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the assignee for the bug.
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/nouveau

Re: [Nouveau] [PATCH v2 4/4] drm/nouveau: gpu lockup recovery

2012-05-02 Thread Ben Skeggs

On Sat, 2012-04-28 at 16:49 +0200, Marcin Slusarz wrote:
 On Thu, Apr 26, 2012 at 05:32:29PM +1000, Ben Skeggs wrote:
  On Wed, 2012-04-25 at 23:20 +0200, Marcin Slusarz wrote:
   Overall idea:
   Detect lockups by watching for timeouts (vm flush / fence), return -EIOs,
   handle them at ioctl level, reset the GPU and repeat last ioctl.
   
   GPU reset is done by doing a suspend / resume cycle with a few tweaks:
   - CPU-only bo eviction
   - ignoring vm flush / fence timeouts
   - shortening waits
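
[Archive note] The control flow described in the list above (a timeout becomes -EIO, the ioctl layer resets the GPU and repeats the last ioctl) can be illustrated roughly as follows. This is only a sketch, not code from the patch: `FakeGpu`, `run_ioctl`, `submit` and the `max_resets` limit are hypothetical names and assumptions made for illustration.

```python
import errno


class FakeGpu:
    """Hypothetical stand-in for the device; not a real nouveau interface."""

    def __init__(self, hangs_before_recovery=1):
        self.hangs_left = hangs_before_recovery
        self.resets = 0

    def wait_fence(self):
        # A fence / vm flush wait that times out is reported as -EIO.
        if self.hangs_left > 0:
            raise OSError(errno.EIO, "fence wait timed out")

    def reset(self):
        # Stands in for the suspend / resume cycle from the description.
        self.resets += 1
        self.hangs_left -= 1


def run_ioctl(gpu, ioctl, max_resets=3):
    """Run ioctl(gpu); on EIO reset the GPU and repeat the same ioctl."""
    resets = 0
    while True:
        try:
            return ioctl(gpu)
        except OSError as err:
            # Anything other than EIO, or too many resets, is passed up.
            if err.errno != errno.EIO or resets == max_resets:
                raise
            gpu.reset()
            resets += 1


def submit(gpu):
    """Hypothetical ioctl body that has to wait on a fence."""
    gpu.wait_fence()
    return "done"
```

Capping the number of resets is an assumption added here so that a GPU that never recovers still returns an error to the caller instead of looping forever.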
  Okay.  I've thought about this a bit for a couple of days and think I'll
  be able to coherently share my thoughts on this issue now :)
  
  Firstly, while I agree that we need to become more resilient to errors,
  I don't think that following in the radeon/intel footsteps with
  something (imo, hackish) like this is the right choice for us
  necessarily.
 
 This is not only the radeon/intel way. Windows, since Vista SP1, does the
 same - see http://msdn.microsoft.com/en-us/windows/hardware/gg487368.
 It's funny how similar it is to this patch (I haven't seen this page earlier).
Yes, I am aware of this feature in Windows.  And I'm not arguing that
something like it isn't necessary.

 
 If you fear people will stop reporting bugs - don't. GPU reset is painfully
 slow and can take up to 50 seconds (BO eviction is the most time consuming
 part), so people will be annoyed enough to report them.
 Currently, GPU lockups make users so angry, they frequently switch to blob
 without even thinking about reporting anything.
I'm not so concerned about the lost bug reports, I expect the same
people that are actually willing to report bugs now will continue to do
so :)

 
  The *vast* majority of lockups we have are as a result of us badly
  mishandling exceptions reported to us by the GPU.  There are a couple of
  exceptions, however, they're very rare..
 
  A very common example is where people gain DMA_PUSHERs for whatever
  reason, and things go haywire eventually.
 
 Nope, I had tens of lockups during testing, and only once I had DMA_PUSHER
 before detecting GPU lockup.
Out of curiosity, what were the lockup situations you were triggering
exactly?

 
  To handle a DMA_PUSHER
  sanely, generally you have to drop all pending commands for the channel
  (set GET=PUT, etc) and continue on.  However, this leaves us with fences
  and semaphores unsignalled etc, causing issues further up the stack with
  perfectly good channels hanging on attempting to sync with the crashed
  channel etc.
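
[Archive note] The problem described above can be sketched in a few lines: setting GET=PUT skips every queued command, including the fences other channels may be waiting on. The model below is an illustration only. `Channel`, `drop_pending` and `recover_from_dma_pusher` are hypothetical names, and force-signalling the orphaned fences is just one possible policy assumed here, not something proposed in the thread.

```python
class Channel:
    """Hypothetical model of a channel's push buffer (illustrative only)."""

    def __init__(self):
        self.get = 0            # next command the GPU will fetch
        self.put = 0            # end of the commands queued by the driver
        self.fences = {}        # ring position -> fence sequence number
        self.signalled = set()  # fence sequence numbers already signalled

    def emit_fence(self, seq):
        # Queue a fence command at the current PUT position.
        self.fences[self.put] = seq
        self.put += 1

    def drop_pending(self):
        """Skip everything queued (GET = PUT); return the orphaned fences."""
        orphaned = [seq for pos, seq in sorted(self.fences.items())
                    if self.get <= pos < self.put]
        self.get = self.put
        return orphaned


def recover_from_dma_pusher(chan):
    # Assumption: orphaned fences are force-signalled so that channels
    # syncing against the crashed one do not wait forever.
    for seq in chan.drop_pending():
        chan.signalled.add(seq)
```

Without the second step, the fences returned by `drop_pending` would never be signalled, which is the hang further up the stack that the paragraph above describes.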
  
  The next most common example I can think of is nv4x hardware, getting a
  LIMIT_COLOR/ZETA exception from PGRAPH, and then a hang.  The solution
  is simple, learn how to handle the exception, log it, and PGRAPH
  survives.
  
  I strongly believe that if we focused our efforts on dealing with what
  the GPU reports to us a lot better, we'll find we really don't need such
  lockup recovery.
 
 While I agree we need to improve on error handling to make lockup recovery
 not needed, the reality is we can't predict everything and the driver needs to
 cope with its own bugs.
Right, again, I don't disagree :)  I think we can improve a lot on the
big-hammer-suspend-the-gpu solution though, and instead reset only the
faulting engine.  It's (in theory) almost possible for us to do now, but
I have a couple of reworks to areas related to this pending (basically,
making the various driver subsystems more independent), which should be
ready soon.  This'll go a long way to making it very easy to reset a
single engine, and likely result in *far* faster recovery from hangs.

 
  I am, however, considering pulling the vm flush timeout error
  propagation and break-out-of-waits-on-signals that builds on it.  As we
  really do need to become better at having killable processes if things
  go wrong :)
 
 Good :)
 
 Marcin

Re: [Nouveau] [PATCH v2 4/4] drm/nouveau: gpu lockup recovery

2012-05-02 Thread Martin Peres


On 02/05/2012 13:28, Ben Skeggs wrote:

Right, again, I don't disagree :)  I think we can improve a lot on the
big-hammer-suspend-the-gpu solution though, and instead reset only the
faulting engine.  It's (in theory) almost possible for us to do now, but
I have a couple of reworks to areas related to this pending (basically,
making the various driver subsystems more independent), which should be
ready soon.  This'll go a long way to making it very easy to reset a
single engine, and likely result in *far* faster recovery from hangs.

Hey,

What about kicking a channel that put the card in a bad state? Wouldn't 
that be possible?


This way, we don't lose the context of other channels and only the
application that hung the card will be exited.


I wonder how pfifo handles commands sent to a non-existing channel, but 
I'm sure it shouldn't hang or anything.


Anyway, if it is not possible to kick only one channel, then what
about kicking all channels, rePOSTING the card and using KMS to output 
the lockup report (and send a notification of the report through udev 
and store the report in a sysfs file)?


Let's not try to be perfect, let us just be able to do better bug reports.

Martin

Re: [Nouveau] [PATCH v2 4/4] drm/nouveau: gpu lockup recovery

2012-05-02 Thread Ben Skeggs

On Wed, 2012-05-02 at 15:33 +0200, Martin Peres wrote:
 On 02/05/2012 13:28, Ben Skeggs wrote:
  Right, again, I don't disagree :)  I think we can improve a lot on the
  big-hammer-suspend-the-gpu solution though, and instead reset only the
  faulting engine.  It's (in theory) almost possible for us to do now, but
  I have a couple of reworks to areas related to this pending (basically,
  making the various driver subsystems more independent), which should be
  ready soon.  This'll go a long way to making it very easy to reset a
  single engine, and likely result in *far* faster recovery from hangs.
 Hey,
 
 What about kicking a channel that put the card in a bad state? Wouldn't 
 that be possible?
 
 This way, we don't lose the context of other channels and only the
 application that hung the card will be exited.
That's pretty much the idea.  The trouble comes in where PFIFO will hang
waiting for the stuck engine to report that it's done (eg. it will wait
for PGRAPH to go "I've finished unloading my context now" after it's
told PGRAPH to do so).

Hence why it's important to be able to (preferably) un-stick the stuck
engine (usually handling the appropriate interrupts properly will
achieve this), and failing that, reset it and lose the context for just
that channel.
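
[Archive note] The escalation described in the paragraph above (first try to un-stick the engine by handling its interrupts, and only then reset it and lose the context of the one faulting channel) could be sketched like this. `FakeEngine` and `recover_engine` are hypothetical names used for illustration and do not correspond to actual driver code.

```python
class FakeEngine:
    """Hypothetical engine model (e.g. PGRAPH); illustrative only."""

    def __init__(self, unstick_works):
        self.unstick_works = unstick_works
        self.resets = 0

    def handle_pending_interrupts(self):
        # Returns True if servicing the interrupts got the engine moving.
        return self.unstick_works

    def reset(self):
        self.resets += 1


def recover_engine(engine, faulting_channel, channels):
    """Return the channels that still have their context afterwards."""
    # Preferred path: handle the reported exception, nobody loses state.
    if engine.handle_pending_interrupts():
        return list(channels)
    # Fallback: reset only this engine; only the faulting channel is lost.
    engine.reset()
    return [c for c in channels if c != faulting_channel]
```

For example, `recover_engine(FakeEngine(False), "kwin", ["kwin", "xorg"])` leaves only `"xorg"`, while an engine that can be un-stuck keeps both channels and is never reset.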

The work I'm doing at the moment will, among other nice things, make
handling all of this a lot nicer.  And it should be nice and speedy in
comparison to the suspend/resume option, we won't have to evict all
buffers from vram without accel, which can take quite a while (not to
mention that it might not even be possible to get to the VRAM not mapped
into the FB BAR on earlier chipsets if accel dies).

 
 I wonder how pfifo handles commands sent to a non-existing channel, but 
 I'm sure it shouldn't hang or anything.
It can't happen anyway, if we destroyed the fifo context for a channel
we wouldn't be telling it to execute commands still :)

 
 Anyway, if it is not possible to kick only one channel, then what
 about kicking all channels, rePOSTING the card and using KMS to output 
 the lockup report (and send a notification of the report through udev 
 and store the report in a sysfs file)?
 
 Let's not try to be perfect, let us just be able to do better bug reports.
I'm still skeptical about how useful any kind of generic lockup report
can possibly be, beyond kernel logs..  However, as part of the work I'm
working on, there may be some additional information available via
debugfs..  I don't want to elaborate on this too much yet until I wrap
my head around what exactly I want to achieve, but I'll give you a
heads-up once I do :)

Ben.

 
 Martin

Re: [Nouveau] [PATCH v2 4/4] drm/nouveau: gpu lockup recovery

2012-05-02 Thread Martin Peres


On 02/05/2012 15:48, Ben Skeggs wrote:

On Wed, 2012-05-02 at 15:33 +0200, Martin Peres wrote:

On 02/05/2012 13:28, Ben Skeggs wrote:

Right, again, I don't disagree :)  I think we can improve a lot on the
big-hammer-suspend-the-gpu solution though, and instead reset only the
faulting engine.  It's (in theory) almost possible for us to do now, but
I have a couple of reworks to areas related to this pending (basically,
making the various driver subsystems more independent), which should be
ready soon.  This'll go a long way to making it very easy to reset a
single engine, and likely result in *far* faster recovery from hangs.

Hey,

What about kicking a channel that put the card in a bad state? Wouldn't
that be possible?

This way, we don't lose the context of other channels and only the
application that hung the card will be exited.

That's pretty much the idea.  The trouble comes in where PFIFO will hang
waiting for the stuck engine to report that it's done (eg. it will wait
for PGRAPH to go "I've finished unloading my context now" after it's
told PGRAPH to do so).

Hence why it's important to be able to (preferably) un-stick the stuck
engine (usually handling the appropriate interrupts properly will
achieve this), and failing that, reset it and lose the context for just
that channel.

The work I'm doing at the moment will, among other nice things, make
handling all of this a lot nicer.  And it should be nice and speedy in
comparison to the suspend/resume option, we won't have to evict all
buffers from vram without accel, which can take quite a while (not to
mention that it might not even be possible to get to the VRAM not mapped
into the FB BAR on earlier chipsets if accel dies).

I get it, that seems nice and good.



I wonder how pfifo handles commands sent to a non-existing channel, but
I'm sure it shouldn't hang or anything.

It can't happen anyway, if we destroyed the fifo context for a channel
we wouldn't be telling it to execute commands still :)
Right, but there may still be some commands left in the IB ring buffer, 
right?



Anyway, if it is not possible to kick only one channel, then what
about kicking all channels, rePOSTING the card and using KMS to output
the lockup report (and send a notification of the report through udev
and store the report in a sysfs file)?

Let's not try to be perfect, let us just be able to do better bug reports.

I'm still skeptical about how useful any kind of generic lockup report
can possibly be, beyond kernel logs..  However, as part of the work I'm
working on, there may be some additional information available via
debugfs..  I don't want to elaborate on this too much yet until I wrap
my head around what exactly I want to achieve, but I'll give you a
heads-up once I do :)
Well, a good report is important so that we can have an idea of what went
wrong. It would also allow us to differentiate bug reports.

Basically, I'm now convinced that the nvaX random lockup is not actually
one issue. Having such an enhanced bug report could allow us to verify
this theory.

PS: Speaking of nvaX lockups, I still get lockups (nva3/5) and I suspect
that the problem comes from the context-switching microcode. Not losing
the email I'm writing simply because kwin's channel crashed would be a
big win for me.

Martin

[Nouveau] [Bug 49397] Not able to set monitor resolution

2012-05-02 Thread bugzilla-daemon

https://bugs.freedesktop.org/show_bug.cgi?id=49397

Aaron Plattner aplatt...@nvidia.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|aplatt...@nvidia.com        |nouveau@lists.freedesktop.org
          Component|Driver/nVidia (open)        |Driver/nouveau

