[Nouveau] [Bug 49243] graphical corruption with GeForce 6150SE nForce 430

2012-04-28 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=49243

--- Comment #3 from Shawn Landden  2012-04-28 23:12:08 PDT ---
I booted with Fedora 17 Beta LiveCD (to test something completely different),
and it was pretty ugly with gnome-shell. For the most part I just got a big
blue screen with silhouettes of gnome-shell stuff.

-- 
Configure bugmail: https://bugs.freedesktop.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the assignee for the bug.
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/nouveau


[Nouveau] [Bug 43029] System won't boot using nouveau and Gainward Phantom adapters

2012-04-28 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=43029

Robert Riches  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rm.ric...@jacob21819.net

--- Comment #8 from Robert Riches  2012-04-28 22:01:44 PDT ---
I have similar symptoms with an Asus ENGTX560 DC/2DI/1GD5 card running Mageia 1
(kernel 2.6.38.8-server-10.mga).  According to modinfo, "srcversion:
7FFBFFA368D6517B0115747".  During boot with the normal kernel, nouveau said it
detected a NVc0 card, 0ce080a1 (if I wrote it down correctly).  Then, "fb:
conflicting fb hw usage nouveaufb vs VESA VGA - removing generic driver".
Then the machine locked up so hard that sysrq did nothing I could discern.
I had to use the hardware reset button.

Using the linux-nonfb GRUB option, which I understand points to a
non-framebuffer version of the non-updated kernel/initrd, I saw a nouveau stack
trace fly by on the screen, then the same hard lockup.

Using the failsafe option, which I understand is a different non-updated
kernel/initrd, booting gets farther before locking up, and sysrq is able to
reboot the machine.

Knoppix 6.4.4 produces a stack trace from drm or nouveau, and sysrq is able to
reboot the machine--blindly if I remember correctly.

Is there documentation of whether a later (Mageia 2, perhaps) kernel would work
with this card?



[Nouveau] [Bug 49243] graphical corruption with GeForce 6150SE nForce 430

2012-04-28 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=49243

Marcin Slusarz  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #60750|application/octet-stream    |text/plain
          mime type|                            |



[Nouveau] [Bug 49243] graphical corruption with GeForce 6150SE nForce 430

2012-04-28 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=49243

--- Comment #2 from Shawn Landden  2012-04-28 15:05:55 PDT ---
Created attachment 60750
  --> https://bugs.freedesktop.org/attachment.cgi?id=60750
dmesg | grep nouveau



Re: [Nouveau] [PATCH v2 4/4] drm/nouveau: gpu lockup recovery

2012-04-28 Thread Marcin Slusarz
On Wed, Apr 25, 2012 at 11:20:36PM +0200, Marcin Slusarz wrote:
> Overall idea:
> Detect lockups by watching for timeouts (vm flush / fence), return -EIOs,
> handle them at ioctl level, reset the GPU and repeat last ioctl.
> 
> GPU reset is done by doing suspend / resume cycle with few tweaks:
> - CPU-only bo eviction
> - ignoring vm flush / fence timeouts
> - shortening waits
> 
> Signed-off-by: Marcin Slusarz 
> ---
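
The control flow described in the quoted summary (a timeout becomes -EIO, the
ioctl layer resets the GPU and repeats the ioctl) can be sketched in user-space
C roughly as follows. This is only an illustration and not code from the patch:
struct fake_gpu, fake_ioctl_body(), fake_gpu_reset() and fake_ioctl() are
hypothetical names, the "reset" just clears a flag instead of doing a
suspend/resume cycle, and the retry limit is an assumption added here so the
loop cannot spin forever.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for GPU state; not a nouveau structure. */
struct fake_gpu {
	int locked_up;	/* 1 while the simulated GPU is hung */
	int resets;	/* number of resets performed so far */
};

/* Simulated ioctl body: a fence / vm flush wait that times out yields -EIO. */
static int fake_ioctl_body(struct fake_gpu *gpu)
{
	if (gpu->locked_up)
		return -EIO;
	return 0;
}

/* Simulated reset; stands in for the suspend / resume cycle. */
static void fake_gpu_reset(struct fake_gpu *gpu)
{
	gpu->locked_up = 0;
	gpu->resets++;
}

/*
 * ioctl-level wrapper: on -EIO, reset the GPU and repeat the ioctl.
 * max_retries is an assumption of this sketch, not part of the patch.
 */
static int fake_ioctl(struct fake_gpu *gpu, int max_retries)
{
	int ret;

	assert(gpu);
	ret = fake_ioctl_body(gpu);
	while (ret == -EIO && max_retries-- > 0) {
		fake_gpu_reset(gpu);
		ret = fake_ioctl_body(gpu);
	}
	return ret;
}
```

With a hung fake_gpu and max_retries of 1, fake_ioctl() performs one reset and
then returns 0; with max_retries of 0 it returns -EIO without resetting.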

Martin,

I'm wondering how the patch below (which builds on the one above) affects
reclocking stability. I can't test it on my card, because it has only
one performance level. Can you test it on yours?

---
From: Marcin Slusarz 
Subject: [PATCH] drm/nouveau: take ioctls_rwsem before reclocking

Signed-off-by: Marcin Slusarz 
---
 drivers/gpu/drm/nouveau/nouveau_pm.c|6 ++
 drivers/gpu/drm/nouveau/nouveau_reset.c |2 +-
 2 files changed, 7 insertions(+), 1 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_pm.c b/drivers/gpu/drm/nouveau/nouveau_pm.c
index 34d591b..4716f39 100644
--- a/drivers/gpu/drm/nouveau/nouveau_pm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_pm.c
@@ -383,9 +383,15 @@ nouveau_pm_set_perflvl(struct device *d, struct device_attribute *a,
   const char *buf, size_t count)
 {
struct drm_device *dev = pci_get_drvdata(to_pci_dev(d));
+   struct drm_nouveau_private *dev_priv = dev->dev_private;
int ret;
 
+   intr_rwsem_down_write(&dev_priv->ioctls_rwsem);
+
ret = nouveau_pm_profile_set(dev, buf);
+
+   intr_rwsem_up_write(&dev_priv->ioctls_rwsem);
+
if (ret)
return ret;
return strlen(buf);
diff --git a/drivers/gpu/drm/nouveau/nouveau_reset.c b/drivers/gpu/drm/nouveau/nouveau_reset.c
index e893096..7c25a3c 100644
--- a/drivers/gpu/drm/nouveau/nouveau_reset.c
+++ b/drivers/gpu/drm/nouveau/nouveau_reset.c
@@ -139,7 +139,7 @@ int nouveau_reset_device(struct drm_device *dev)
end = jiffies;
NV_INFO(dev, "GPU reset done, took %lu s\n", (end - start) / DRM_HZ);
while (intr_rwsem_down_read_interruptible(&dev_priv->ioctls_rwsem))
-   ; /* not possible, we are holding reset_lock */
+   ;
}
mutex_unlock(&dev_priv->reset_lock);
 
-- 
1.7.8.5



Re: [Nouveau] [PATCH v2 4/4] drm/nouveau: gpu lockup recovery

2012-04-28 Thread Marcin Slusarz
On Thu, Apr 26, 2012 at 05:32:29PM +1000, Ben Skeggs wrote:
> On Wed, 2012-04-25 at 23:20 +0200, Marcin Slusarz wrote:
> > Overall idea:
> > Detect lockups by watching for timeouts (vm flush / fence), return -EIOs,
> > handle them at ioctl level, reset the GPU and repeat last ioctl.
> > 
> > GPU reset is done by doing suspend / resume cycle with few tweaks:
> > - CPU-only bo eviction
> > - ignoring vm flush / fence timeouts
> > - shortening waits
> Okay.  I've thought about this a bit for a couple of days and think I'll
> be able to coherently share my thoughts on this issue now :)
> 
> Firstly, while I agree that we need to become more resilient to errors,
> I don't think that following in the radeon/intel footsteps with
> something (imo, hackish) like this is the right choice for us
> necessarily.

This is not only the radeon/intel way. Windows, since Vista SP1, does the
same - see http://msdn.microsoft.com/en-us/windows/hardware/gg487368.
It's funny how similar it is to this patch (I hadn't seen this page before).

If you fear people will stop reporting bugs - don't. GPU reset is painfully
slow and can take up to 50 seconds (BO eviction is the most time consuming
part), so people will be annoyed enough to report them.
Currently, GPU lockups make users so angry that they frequently switch to the
blob without even thinking about reporting anything.

> The *vast* majority of "lockups" we have are as a result of us badly
> mishandling exceptions reported to us by the GPU.  There are a couple of
> exceptions, however, they're very rare..

> A very common example is where people gain DMA_PUSHERs for whatever
> reason, and things go haywire eventually.

Nope, I had tens of lockups during testing, and only once did I get a
DMA_PUSHER error before the GPU lockup was detected.

> To handle a DMA_PUSHER
> sanely, generally you have to drop all pending commands for the channel
> (set GET=PUT, etc) and continue on.  However, this leaves us with fences
> and semaphores unsignalled etc, causing issues further up the stack with
> perfectly good channels hanging on attempting to sync with the crashed
> channel etc.
> 
> The next most common example I can think of is nv4x hardware, getting a
> LIMIT_COLOR/ZETA exception from PGRAPH, and then a hang.  The solution
> is simple, learn how to handle the exception, log it, and PGRAPH
> survives.
> 
> I strongly believe that if we focused our efforts on dealing with what
> the GPU reports to us a lot better, we'll find we really don't need such
> "lockup recovery".

While I agree we need to improve error handling so that "lockup recovery" is
not needed, the reality is that we can't predict everything and the driver
needs to cope with its own bugs.

> I am, however, considering pulling the vm flush timeout error
> propagation and break-out-of-waits-on-signals that builds on it.  As we
> really do need to become better at having killable processes if things
> go wrong :)

Good :)

Marcin