Re: [REGRESSION] nouveau: Memory corruption using nva3 engine for 0xaf
On Mon, Jul 09, 2012 at 03:13:25PM +0200, Henrik Rydberg wrote:
> On Thu, Jul 05, 2012 at 10:34:10AM +0200, Henrik Rydberg wrote:
> > On Thu, Jul 05, 2012 at 08:54:46AM +0200, Henrik Rydberg wrote:
> > > > Thanks for tracking down the source of this corruption. I don't have
> > > > any such hardware, so until someone can figure it out, I think we
> > > > should apply this patch.
> > >
> > > In that case, I would have to massage the patch a bit first; it
> > > creates a problem with suspend/resume. Might be something with
> > > nva3_pm.c, who knows. I am really stabbing in the dark here. :-)
> >
> > It seems the suspend/resume problem is unrelated (bad systemd update),
> > so I am fine with applying this as is. Obviously not the best
> > solution, and if I have time I will continue to look for problems in
> > the nva3 copy code, but for now,
> >
> > Signed-off-by: Henrik Rydberg
>
> I have not encountered the problem in a long while, and I do not have
> the patch applied. It is entirely possible that this was fixed by
> something else. Unless you have already applied the patch, I would
> suggest holding on to it to see if the problem reappears.
>
> Sorry for the churn.

... and there it was again, hours after giving up on it. Oh well.

What makes this bug particularly difficult is that as soon as the
patch is applied, the problem disappears and does not show itself
again - with or without the patch applied. Sounds very much like the
problem is a failure state that does not get reset by current
mainline, but somehow gets reset with the patch applied.

I also learnt that the problem is not in the nva3_copy code itself; I
reverted nva3_copy.c and nva3_pm.c back to v3.4, but the problem
persisted. A DMA problem elsewhere, in the drm code or in the pci
layer, seems more likely than this particular hardware having problems
with this particular copy engine.

As it stands, though, applying the patch is the only thing known to
work.

Thanks,
Henrik
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel
[REGRESSION] nouveau: Memory corruption using nva3 engine for 0xaf
On Thu, Jul 05, 2012 at 10:34:10AM +0200, Henrik Rydberg wrote:
> On Thu, Jul 05, 2012 at 08:54:46AM +0200, Henrik Rydberg wrote:
> > > Thanks for tracking down the source of this corruption. I don't have
> > > any such hardware, so until someone can figure it out, I think we
> > > should apply this patch.
> >
> > In that case, I would have to massage the patch a bit first; it
> > creates a problem with suspend/resume. Might be something with
> > nva3_pm.c, who knows. I am really stabbing in the dark here. :-)
>
> It seems the suspend/resume problem is unrelated (bad systemd update),
> so I am fine with applying this as is. Obviously not the best
> solution, and if I have time I will continue to look for problems in
> the nva3 copy code, but for now,
>
> Signed-off-by: Henrik Rydberg

I have not encountered the problem in a long while, and I do not have
the patch applied. It is entirely possible that this was fixed by
something else. Unless you have already applied the patch, I would
suggest holding on to it to see if the problem reappears.

Sorry for the churn.

Thanks,
Henrik
[REGRESSION] nouveau: Memory corruption using nva3 engine for 0xaf
On Thu, Jul 05, 2012 at 08:31:13AM +0200, Henrik Rydberg wrote:
> Hi Ben, Dave,

Hey Henrik,

>
> Since 3.5-rc0, I have been experiencing occasional screen corruption
> on my MacBookAir3,1, using a GeForce 320M (nv50, 0xaf). The X driver
> version is xf86-video-nouveau-1.0.1-1 (arch).
>
> I do not know what the root problem is, but I have been able to
> isolate the symptoms to the usage of nva3_copy.c. The patch below is
> the least intrusive way I could find which kills the symptoms.
>
> Hopefully this will shed some light on the true problem, such that a
> fix can be found for 3.5.

Thanks for tracking down the source of this corruption. I don't have
any such hardware, so until someone can figure it out, I think we
should apply this patch.

Cheers,
Ben.

>
> Thanks,
> Henrik
>
> The nva3 copy engine exhibits random memory corruption in at least one
> case, the GeForce 320M (nv50, 0xaf) in the MacBookAir3,1. This patch
> omits creating the engine for the specific chipset, falling back to
> M2MF, which kills the symptoms.
> ---

Signed-off-by: Ben Skeggs

>  drivers/gpu/drm/nouveau/nouveau_state.c | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/nouveau/nouveau_state.c b/drivers/gpu/drm/nouveau/nouveau_state.c
> index 19706f0..b466937 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_state.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_state.c
> @@ -731,7 +731,6 @@ nouveau_card_init(struct drm_device *dev)
>  		case 0xa3:
>  		case 0xa5:
>  		case 0xa8:
> -		case 0xaf:
>  			nva3_copy_create(dev);
>  			break;
>  		}
[REGRESSION] nouveau: Memory corruption using nva3 engine for 0xaf
On Thu, Jul 05, 2012 at 08:54:46AM +0200, Henrik Rydberg wrote:
> > Thanks for tracking down the source of this corruption. I don't have
> > any such hardware, so until someone can figure it out, I think we
> > should apply this patch.
>
> In that case, I would have to massage the patch a bit first; it
> creates a problem with suspend/resume. Might be something with
> nva3_pm.c, who knows. I am really stabbing in the dark here. :-)

It seems the suspend/resume problem is unrelated (bad systemd update),
so I am fine with applying this as is. Obviously not the best
solution, and if I have time I will continue to look for problems in
the nva3 copy code, but for now,

Signed-off-by: Henrik Rydberg

Thanks,
Henrik
[REGRESSION] nouveau: Memory corruption using nva3 engine for 0xaf
> Thanks for tracking down the source of this corruption. I don't have
> any such hardware, so until someone can figure it out, I think we
> should apply this patch.

In that case, I would have to massage the patch a bit first; it
creates a problem with suspend/resume. Might be something with
nva3_pm.c, who knows. I am really stabbing in the dark here. :-)

Thanks,
Henrik
[REGRESSION] nouveau: Memory corruption using nva3 engine for 0xaf
Hi Ben, Dave,

Since 3.5-rc0, I have been experiencing occasional screen corruption
on my MacBookAir3,1, using a GeForce 320M (nv50, 0xaf). The X driver
version is xf86-video-nouveau-1.0.1-1 (arch).

I do not know what the root problem is, but I have been able to
isolate the symptoms to the usage of nva3_copy.c. The patch below is
the least intrusive way I could find which kills the symptoms.

Hopefully this will shed some light on the true problem, such that a
fix can be found for 3.5.

Thanks,
Henrik

The nva3 copy engine exhibits random memory corruption in at least one
case, the GeForce 320M (nv50, 0xaf) in the MacBookAir3,1. This patch
omits creating the engine for the specific chipset, falling back to
M2MF, which kills the symptoms.
---
 drivers/gpu/drm/nouveau/nouveau_state.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_state.c b/drivers/gpu/drm/nouveau/nouveau_state.c
index 19706f0..b466937 100644
--- a/drivers/gpu/drm/nouveau/nouveau_state.c
+++ b/drivers/gpu/drm/nouveau/nouveau_state.c
@@ -731,7 +731,6 @@ nouveau_card_init(struct drm_device *dev)
 		case 0xa3:
 		case 0xa5:
 		case 0xa8:
-		case 0xaf:
 			nva3_copy_create(dev);
 			break;
 		}
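For readers who want to see the effect of the one-line deletion in isolation, here is a rough userspace sketch. It is not kernel code: only the chipset values (0xa3, 0xa5, 0xa8, 0xaf) come from the patch context, while enum copy_path, select_copy_path(), check_patch_effect() and the skip_0xaf flag are hypothetical names made up for illustration. Treating every chipset outside the quoted switch as M2MF is likewise an assumption, since the rest of nouveau_card_init() is not shown in the thread.

```c
/*
 * Illustrative model only, not kernel code. Chipset values are taken from
 * the patch context; every identifier below is hypothetical.
 */
#include <assert.h>
#include <stdbool.h>

enum copy_path {
	COPY_PATH_M2MF,      /* fallback: no nva3 copy engine is created */
	COPY_PATH_NVA3_COPY, /* nva3_copy_create() would have been called */
};

/*
 * skip_0xaf == false models the switch before the patch,
 * skip_0xaf == true models it after "case 0xaf:" is removed.
 */
static enum copy_path select_copy_path(unsigned int chipset, bool skip_0xaf)
{
	switch (chipset) {
	case 0xa3:
	case 0xa5:
	case 0xa8:
		return COPY_PATH_NVA3_COPY;
	case 0xaf:
		return skip_0xaf ? COPY_PATH_M2MF : COPY_PATH_NVA3_COPY;
	default:
		/* Assumption: chipsets not in the quoted switch are not modeled. */
		return COPY_PATH_M2MF;
	}
}

/* The patch changes the outcome for 0xaf only; its neighbours are untouched. */
static void check_patch_effect(void)
{
	assert(select_copy_path(0xaf, false) == COPY_PATH_NVA3_COPY);
	assert(select_copy_path(0xaf, true) == COPY_PATH_M2MF);
	assert(select_copy_path(0xa8, false) == select_copy_path(0xa8, true));
}
```

Calling check_patch_effect() from a small main() is enough to confirm that, in this model, only the 0xaf chipset changes path when the case label is dropped.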