[PATCH] nouveau: Skip unavailable ttm page entries

2021-03-13 Thread Tobias Klausmann
Starting with commit f295c8cfec833c2707ff1512da10d65386dde7af
("drm/nouveau: fix dma syncing warning with debugging on.")
the following oops occurs:

   BUG: kernel NULL pointer dereference, address: 0000000000000000
   #PF: supervisor read access in kernel mode
   #PF: error_code(0x0000) - not-present page
   PGD 0 P4D 0
   Oops: 0000 [#1] PREEMPT SMP PTI
   CPU: 6 PID: 1013 Comm: Xorg.bin Tainted: G E 5.11.0-desktop-rc0+ #2
   Hardware name: Acer Aspire VN7-593G/Pluto_KLS, BIOS V1.11 08/01/2018
   RIP: 0010:nouveau_bo_sync_for_device+0x40/0xb0 [nouveau]
   Call Trace:
    nouveau_bo_validate+0x5d/0x80 [nouveau]
    nouveau_gem_ioctl_pushbuf+0x662/0x1120 [nouveau]
    ? nouveau_gem_ioctl_new+0xf0/0xf0 [nouveau]
    drm_ioctl_kernel+0xa6/0xf0 [drm]
    drm_ioctl+0x1f4/0x3a0 [drm]
    ? nouveau_gem_ioctl_new+0xf0/0xf0 [nouveau]
    nouveau_drm_ioctl+0x50/0xa0 [nouveau]
    __x64_sys_ioctl+0x7e/0xb0
    do_syscall_64+0x33/0x80
    entry_SYSCALL_64_after_hwframe+0x44/0xae
   ---[ end trace ccfb1e7f4064374f ]---
   RIP: 0010:nouveau_bo_sync_for_device+0x40/0xb0 [nouveau]

The underlying problem was not introduced by that commit; the commit merely
uncovered it. The cited commit relies on valid pages, which, due to some bugs,
are not always present. For now, just warn and work around the issue by
ignoring the bad ttm objects.
Below is some debug info gathered while debugging this issue:

nouveau 0000:01:00.0: DRM: ttm_dma->num_pages: 2048
nouveau 0000:01:00.0: DRM: ttm_dma->pages is NULL
nouveau 0000:01:00.0: DRM: ttm_dma: e96058e7
nouveau 0000:01:00.0: DRM: ttm_dma->page_flags:
nouveau 0000:01:00.0: DRM: ttm_dma:   Populated: 1
nouveau 0000:01:00.0: DRM: ttm_dma:   No Retry: 0
nouveau 0000:01:00.0: DRM: ttm_dma:   SG: 256
nouveau 0000:01:00.0: DRM: ttm_dma:   Zero Alloc: 0
nouveau 0000:01:00.0: DRM: ttm_dma:   Swapped: 0

Signed-off-by: Tobias Klausmann 
---
 drivers/gpu/drm/nouveau/nouveau_bo.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c
index fabb314a0b2f..5902e21d5dfe 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo.c
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
@@ -551,6 +551,10 @@ nouveau_bo_sync_for_device(struct nouveau_bo *nvbo)
 
 	if (!ttm_dma)
 		return;
+	if (!ttm_dma->pages) {
+		NV_DEBUG(drm, "ttm_dma 0x%p: pages NULL\n", ttm_dma);
+		return;
+	}
 
 	/* Don't waste time looping if the object is coherent */
 	if (nvbo->force_coherent)
-- 
2.30.1
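A minimal user-space sketch of the guard this patch adds, assuming a simplified stand-in for struct ttm_tt. `struct fake_ttm_tt` and `fake_sync_for_device()` are hypothetical names, and the per-page DMA sync is replaced by a counter. It only illustrates the control flow: an object that is populated but has a NULL pages array (as in the debug output above: num_pages 2048, pages NULL) is skipped instead of dereferenced.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the two struct ttm_tt fields the guard needs. */
struct fake_ttm_tt {
	unsigned long num_pages;
	void **pages;		/* may be NULL even though the object is populated */
};

/*
 * Models the early returns of the patched nouveau_bo_sync_for_device().
 * Returns how many pages would be synced, or 0 when the object is skipped.
 */
static unsigned long fake_sync_for_device(const struct fake_ttm_tt *ttm_dma,
					  int force_coherent)
{
	unsigned long i, synced = 0;

	if (!ttm_dma)
		return 0;
	if (!ttm_dma->pages) {
		/* the patch logs via NV_DEBUG() here and bails out */
		fprintf(stderr, "ttm_dma %p: pages NULL\n", (const void *)ttm_dma);
		return 0;
	}
	/* Don't waste time looping if the object is coherent */
	if (force_coherent)
		return 0;

	for (i = 0; i < ttm_dma->num_pages; i++)
		if (ttm_dma->pages[i])	/* placeholder for the per-page DMA sync */
			synced++;
	return synced;
}
```

Passing an object with num_pages 2048 and pages NULL prints the "pages NULL" message and returns 0. Like the patch, this hides the symptom rather than fixing whatever leaves pages unset.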

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


[PATCH] nouveau: forward error generated while resuming objects tree

2019-03-29 Thread Tobias Klausmann
On a failed resume we may experience unrecoverable errors. Plumb the error code
through to actually let the driver fail. On a reverse-PRIME setup this helps the
DRM subsystem to at least recover the integrated GPU.

This can happen in particular when secboot times out, leaving the hardware in a
non-functional state.

Signed-off-by: Tobias Klausmann 
---
 drivers/gpu/drm/nouveau/nouveau_drm.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c b/drivers/gpu/drm/nouveau/nouveau_drm.c
index 5020265bfbd9..56a107f3a0e1 100644
--- a/drivers/gpu/drm/nouveau/nouveau_drm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_drm.c
@@ -802,10 +802,15 @@ nouveau_do_suspend(struct drm_device *dev, bool runtime)
 static int
 nouveau_do_resume(struct drm_device *dev, bool runtime)
 {
+	int ret = 0;
 	struct nouveau_drm *drm = nouveau_drm(dev);
 
 	NV_DEBUG(drm, "resuming object tree...\n");
-	nvif_client_resume(&drm->master.base);
+	ret = nvif_client_resume(&drm->master.base);
+	if (ret) {
+		NV_ERROR(drm, "Client resume failed with error: %d\n", ret);
+		return ret;
+	}
 
 	NV_DEBUG(drm, "resuming fence...\n");
 	if (drm->fence && nouveau_fence(drm)->resume)
@@ -925,6 +930,7 @@ nouveau_pmops_runtime_resume(struct device *dev)
 {
 	struct pci_dev *pdev = to_pci_dev(dev);
 	struct drm_device *drm_dev = pci_get_drvdata(pdev);
+	struct nouveau_drm *drm = nouveau_drm(drm_dev);
 	struct nvif_device *device = &nouveau_drm(drm_dev)->client.device;
 	int ret;
 
@@ -941,6 +947,10 @@ nouveau_pmops_runtime_resume(struct device *dev)
 	pci_set_master(pdev);
 
 	ret = nouveau_do_resume(drm_dev, true);
+	if (ret) {
+		NV_ERROR(drm, "resume failed with: %d\n", ret);
+		return ret;
+	}
 
 	/* do magic */
 	nvif_mask(&device->object, 0x088488, (1 << 25), (1 << 25));
-- 
2.21.0
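The core of this patch is that an error from the first step of the resume chain is now returned instead of dropped. A user-space sketch of that propagation pattern follows; the `fake_*` functions are hypothetical stand-ins for nvif_client_resume(), nouveau_do_resume() and nouveau_pmops_runtime_resume(), and -ETIMEDOUT is an arbitrary choice to mimic a secboot timeout, not necessarily the code nouveau would see.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in for nvif_client_resume(). */
static int fake_client_resume(int simulate_error)
{
	return simulate_error ? -ETIMEDOUT : 0;
}

/* Like the patched nouveau_do_resume(): stop at the first failure. */
static int fake_do_resume(int simulate_error)
{
	int ret = fake_client_resume(simulate_error);

	if (ret) {
		fprintf(stderr, "Client resume failed with error: %d\n", ret);
		return ret;
	}
	/* resuming the fence, display, etc. would follow here */
	return 0;
}

/* Like the patched nouveau_pmops_runtime_resume(): forward the error. */
static int fake_runtime_resume(int simulate_error)
{
	int ret = fake_do_resume(simulate_error);

	if (ret) {
		fprintf(stderr, "resume failed with: %d\n", ret);
		return ret;
	}
	/* the "do magic" register write only happens on success */
	return 0;
}
```

With the error forwarded, the caller of the runtime-resume hook sees the failure, which is what the commit message relies on to let the rest of a reverse-PRIME setup recover.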


Re: [Nouveau] [PATCH] drm/nouveau/therm/gp100: Do not report temperature when subdev is shadowed

2018-01-28 Thread Tobias Klausmann
Well, fixing the return of wrong values in this function is reasonable by
any means. Of course, not reading the mem in the first place would be
nice, but deciding this is, IMHO, not in the scope of a temp_get function;
it belongs somewhere in the code calling temp_get.


On 1/26/18 3:03 PM, Karol Herbst wrote:

Well, I just tried to say that you are not fixing the issue you think
you were fixing. In your case the GPU is powered off and you get garbage
values from any MMIO read, so parsing those values is just wrong, and
we need to prevent doing anything on the hardware whenever it is powered
off, directly in hwmon.

On Fri, Jan 26, 2018 at 2:40 PM, Tobias Klausmann
 wrote:

Not sure if I completely understand what you intend to say here. With this
we prevent hwmon from reporting utterly wrong temperature values by returning
an error (we could return -EBUSY or something instead, granted), yet if the
device is shadowed, getting a sane temperature value out of it seems unlikely
to me!

Greetings,

Tobias


On 1/26/18 12:40 PM, Karol Herbst wrote:

No, we can't do that. We actually have to prevent this in hwmon. The
issue here is that the register read returns 0xffffffff, and parsing that
is already the mistake in the first place.

On Thu, Jan 25, 2018 at 7:16 PM, Tobias Klausmann
 wrote:

This fixes wrong temperature output (e.g. 511°C) when the card is asleep.

Signed-off-by: Tobias Klausmann 
---
   drivers/gpu/drm/nouveau/nvkm/subdev/therm/gp100.c | 4 +++-
   1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/therm/gp100.c b/drivers/gpu/drm/nouveau/nvkm/subdev/therm/gp100.c
index 9f0dea3f61dc..45d0ec632b5a 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/therm/gp100.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/therm/gp100.c
@@ -32,8 +32,10 @@ gp100_temp_get(struct nvkm_therm *therm)
 	u32 inttemp = (tsensor & 0x0001fff8);
 
 	/* device SHADOWed */
-	if (tsensor & 0x40000000)
+	if (tsensor & 0x40000000) {
 		nvkm_trace(subdev, "reading temperature from SHADOWed sensor\n");
+		return -ENODEV;
+	}
 
 	/* device valid */
 	if (tsensor & 0x20000000)
--
2.16.1

_______________________________________________
Nouveau mailing list
nouv...@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau



Re: [Nouveau] [PATCH] drm/nouveau/therm/gp100: Do not report temperature when subdev is shadowed

2018-01-28 Thread Tobias Klausmann
Not sure if I completely understand what you intend to say here. With
this we prevent hwmon from reporting utterly wrong temperature values
by returning an error (we could return -EBUSY or something instead,
granted), yet if the device is shadowed, getting a sane temperature value
out of it seems unlikely to me!


Greetings,

Tobias

On 1/26/18 12:40 PM, Karol Herbst wrote:

No, we can't do that. We actually have to prevent this in hwmon. The
issue here is that the register read returns 0xffffffff, and parsing that
is already the mistake in the first place.

On Thu, Jan 25, 2018 at 7:16 PM, Tobias Klausmann
 wrote:

This fixes wrong temperature output (e.g. 511°C) when the card is asleep.

Signed-off-by: Tobias Klausmann 
---
  drivers/gpu/drm/nouveau/nvkm/subdev/therm/gp100.c | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/therm/gp100.c b/drivers/gpu/drm/nouveau/nvkm/subdev/therm/gp100.c
index 9f0dea3f61dc..45d0ec632b5a 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/therm/gp100.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/therm/gp100.c
@@ -32,8 +32,10 @@ gp100_temp_get(struct nvkm_therm *therm)
 	u32 inttemp = (tsensor & 0x0001fff8);
 
 	/* device SHADOWed */
-	if (tsensor & 0x40000000)
+	if (tsensor & 0x40000000) {
 		nvkm_trace(subdev, "reading temperature from SHADOWed sensor\n");
+		return -ENODEV;
+	}
 
 	/* device valid */
 	if (tsensor & 0x20000000)
--
2.16.1



[PATCH] drm/nouveau/therm/gp100: Do not report temperature when subdev is shadowed

2018-01-26 Thread Tobias Klausmann
This fixes wrong temperature output (e.g. 511°C) when the card is asleep.

Signed-off-by: Tobias Klausmann 
---
 drivers/gpu/drm/nouveau/nvkm/subdev/therm/gp100.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/therm/gp100.c b/drivers/gpu/drm/nouveau/nvkm/subdev/therm/gp100.c
index 9f0dea3f61dc..45d0ec632b5a 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/therm/gp100.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/therm/gp100.c
@@ -32,8 +32,10 @@ gp100_temp_get(struct nvkm_therm *therm)
 	u32 inttemp = (tsensor & 0x0001fff8);
 
 	/* device SHADOWed */
-	if (tsensor & 0x40000000)
+	if (tsensor & 0x40000000) {
 		nvkm_trace(subdev, "reading temperature from SHADOWed sensor\n");
+		return -ENODEV;
+	}
 
 	/* device valid */
 	if (tsensor & 0x20000000)
-- 
2.16.1
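To see where a value like 511°C comes from, the decoding can be modeled in user space. This sketch assumes the SHADOW and valid bits are 0x40000000 and 0x20000000 as in upstream gp100.c, keeps the 0x0001fff8 mask and the `>> 8` shift from gp100_temp_get(), and assumes an all-ones register read (0xffffffff), which is typically what MMIO returns from a powered-off device. The function names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define TSENSOR_SHADOWED 0x40000000u	/* assumed bit position (upstream gp100.c) */
#define TSENSOR_VALID    0x20000000u	/* assumed bit position (upstream gp100.c) */
#define TSENSOR_TEMP     0x0001fff8u	/* mask used by gp100_temp_get() */

/* Decode as before the patch: only the "valid" bit is honoured. */
static int decode_temp_unpatched(uint32_t tsensor)
{
	uint32_t inttemp = tsensor & TSENSOR_TEMP;

	if (tsensor & TSENSOR_VALID)
		return (int)(inttemp >> 8);	/* whole degrees Celsius */
	return -ENODEV;
}

/* Decode as after the patch: a SHADOWed sensor is reported as an error. */
static int decode_temp_patched(uint32_t tsensor)
{
	if (tsensor & TSENSOR_SHADOWED)
		return -ENODEV;
	return decode_temp_unpatched(tsensor);
}
```

An all-ones read has both the valid and the SHADOW bit set, so the unpatched decoder reports 0x1fff8 >> 8 = 511, matching the 511°C in the commit message, while the patched one returns -ENODEV.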



Re: nouveau. swiotlb: coherent allocation failed for device 0000:01:00.0 size=2097152

2017-12-19 Thread Tobias Klausmann


On 12/18/17 7:06 PM, Mike Galbraith wrote:

Greetings,

Kernel-bound workloads seem to trigger the below for whatever reason.
I only see this when beating up NFS. There was a kworker wakeup
latency issue, but with a band-aid applied to fix that up, I can still
trigger this.



Hi,

I have seen this one on my system as well, but I could not find an
easy way to trigger it for bisecting purposes. If you can trigger it
conveniently, a bisect would be nice!


Greetings,

Tobias




[ 1313.811031] nouveau 0000:01:00.0: swiotlb buffer is full (sz: 2097152 bytes)
[ 1313.811035] swiotlb: coherent allocation failed for device 0000:01:00.0 size=2097152
[ 1313.811038] CPU: 6 PID: 3026 Comm: Xorg Tainted: GE 4.15.0.g1291a0d5-master #355
[ 1313.811040] Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.20C 09/23/2013
[ 1313.811041] Call Trace:
[ 1313.811049]  dump_stack+0x7c/0xb6
[ 1313.811053]  swiotlb_alloc_coherent+0x13f/0x150
[ 1313.811060]  ttm_dma_pool_alloc_new_pages+0x106/0x3c0 [ttm]
[ 1313.811066]  ttm_dma_pool_get_pages+0x10a/0x1e0 [ttm]
[ 1313.811070]  ttm_dma_populate+0x21f/0x2f0 [ttm]
[ 1313.811075]  ttm_tt_bind+0x2f/0x60 [ttm]
[ 1313.811079]  ttm_bo_handle_move_mem+0x51f/0x580 [ttm]
[ 1313.811084]  ? ttm_bo_handle_move_mem+0x5/0x580 [ttm]
[ 1313.811088]  ttm_bo_validate+0x10c/0x120 [ttm]
[ 1313.811092]  ? ttm_bo_validate+0x5/0x120 [ttm]
[ 1313.811106]  ? drm_mode_setcrtc+0x20e/0x540 [drm]
[ 1313.811109]  ttm_bo_init_reserved+0x290/0x490 [ttm]
[ 1313.811114]  ttm_bo_init+0x52/0xb0 [ttm]
[ 1313.811141]  ? nv10_bo_put_tile_region+0x60/0x60 [nouveau]
[ 1313.811163]  nouveau_bo_new+0x465/0x5e0 [nouveau]
[ 1313.811184]  ? nv10_bo_put_tile_region+0x60/0x60 [nouveau]
[ 1313.811203]  nouveau_gem_new+0x66/0x110 [nouveau]
[ 1313.811223]  ? nouveau_gem_new+0x110/0x110 [nouveau]
[ 1313.811241]  nouveau_gem_ioctl_new+0x48/0xc0 [nouveau]
[ 1313.811249]  drm_ioctl_kernel+0x64/0xb0 [drm]
[ 1313.811257]  drm_ioctl+0x2a4/0x360 [drm]
[ 1313.811276]  ? nouveau_gem_new+0x110/0x110 [nouveau]
[ 1313.811285]  ? drm_ioctl+0x5/0x360 [drm]
[ 1313.811304]  nouveau_drm_ioctl+0x50/0xb0 [nouveau]
[ 1313.811308]  do_vfs_ioctl+0x90/0x690
[ 1313.811311]  ? do_vfs_ioctl+0x5/0x690
[ 1313.811313]  SyS_ioctl+0x3b/0x70
[ 1313.811316]  entry_SYSCALL_64_fastpath+0x1f/0x91
[ 1313.811320] RIP: 0033:0x7f3234746227
[ 1313.811321] RSP: 002b:00007ffc3ace0408 EFLAGS: 00003246 ORIG_RAX: 0000000000000010
[ 1313.811324] RAX: ffffffffffffffda RBX: 00000000025515d0 RCX: 00007f3234746227
[ 1313.811325] RDX: 00007ffc3ace0460 RSI: 00000000c0306480 RDI: 000000000000000b
[ 1313.811326] RBP: 0000000000824120 R08: 0000000002548f80 R09: 00000000025490d0
[ 1313.811328] R10: 0000000000000000 R11: 0000000000003246 R12: 000000000000093d
[ 1313.811329] R13: 0000000002aff74c R14: 0000000000824150 R15: 0000000000000000



Re: Regression in TTM driver w/Linus' master

2017-11-27 Thread Tobias Klausmann


On 11/24/17 4:35 PM, Christian König wrote:

Am 24.11.2017 um 16:17 schrieb Tobias Klausmann:


On 11/24/17 3:54 PM, Daniel Vetter wrote:

On Thu, Nov 23, 2017 at 03:24:38PM +0100, Tobias Klausmann wrote:

On 11/23/17 2:58 AM, Dave Airlie wrote:
On 23 November 2017 at 11:17, Laura Abbott  wrote:

Hi,

Fedora QA testing reported a panic when booting up VMs
using qemu VGA drivers
(https://paste.fedoraproject.org/paste/498yRWTCJv2LKIrmj4EliQ)

[   30.108507] ------------[ cut here ]------------
[   30.108920] kernel BUG at ./include/linux/gfp.h:408!
[   30.109356] invalid opcode: 0000 [#1] SMP
[   30.109700] Modules linked in: fuse nf_conntrack_netbios_ns
nf_conntrack_broadcast xt_CT ip6t_rpfilter ip6t_REJECT nf_reject_ipv6
xt_conntrack devlink ip_set nfnetlink ebtable_nat ebtable_broute bridge
ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle
ip6table_raw ip6table_security iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4
nf_nat_ipv4 nf_nat nf_conntrack libcrc32c iptable_mangle iptable_raw
iptable_security ebtable_filter ebtables ip6table_filter ip6_tables
snd_hda_codec_generic kvm_intel kvm snd_hda_intel snd_hda_codec irqbypass
ppdev snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm bochs_drm ttm
joydev drm_kms_helper virtio_balloon snd_timer snd parport_pc drm soundcore
parport i2c_piix4 nls_utf8 isofs squashfs zstd_decompress xxhash 8021q garp
mrp stp llc virtio_net
[   30.115605]  virtio_console virtio_scsi crct10dif_pclmul crc32_pclmul
crc32c_intel ghash_clmulni_intel serio_raw virtio_pci virtio_ring virtio
ata_generic pata_acpi qemu_fw_cfg sunrpc scsi_transport_iscsi loop
[   30.117425] CPU: 0 PID: 1347 Comm: gnome-shell Not tainted
4.15.0-0.rc0.git6.1.fc28.x86_64 #1
[   30.118141] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
1.10.2-2.fc27 04/01/2014
[   30.118866] task: 923a77e03380 task.stack: a78182228000
[   30.119366] RIP: 0010:__alloc_pages_nodemask+0x35e/0x430
[   30.119810] RSP: :a7818222bba8 EFLAGS: 00010202
[   30.120250] RAX: 0001 RBX: 014382c6 RCX:
0006
[   30.120840] RDX:  RSI: 0009 RDI:

[   30.121443] RBP: 923a760d6000 R08:  R09:
0006
[   30.122039] R10: 0040 R11: 0300 R12:
923a729273c0
[   30.122629] R13:  R14:  R15:
923a7483d400
[   30.123223] FS:  7fe48da7dac0() GS:923a7cc0()
knlGS:
[   30.123896] CS:  0010 DS:  ES:  CR0: 80050033
[   30.124373] CR2: 7fe457b73000 CR3: 78313000 CR4:
06f0
[   30.124968] Call Trace:
[   30.125186]  ttm_pool_populate+0x19b/0x400 [ttm]
[   30.125578]  ttm_bo_vm_fault+0x325/0x570 [ttm]
[   30.125964]  __do_fault+0x19/0x11e
[   30.126255]  __handle_mm_fault+0xcd3/0x1260
[   30.126609]  handle_mm_fault+0x14c/0x310
[   30.126947]  __do_page_fault+0x28c/0x530
[   30.127282]  do_page_fault+0x32/0x270
[   30.127593]  async_page_fault+0x22/0x30
[   30.127922] RIP: 0033:0x7fe48aae39a8
[   30.128225] RSP: 002b:7ffc21c4d928 EFLAGS: 00010206
[   30.128664] RAX: 7fe457b73000 RBX: 55cd4c1041a0 RCX:
7fe457b73040
[   30.129259] RDX: 0030 RSI:  RDI:
7fe457b73000
[   30.129855] RBP: 0300 R08: 000c R09:
0001
[   30.130457] R10: 0001 R11: 0246 R12:
55cd4c1041a0
[   30.131054] R13: 55cd4bdfe990 R14: 55cd4c104110 R15:
0400
[   30.131648] Code: 11 01 00 0f 84 a9 00 00 00 65 ff 0d 6d cc dd 44 e9 0f
ff ff ff 40 80 cd 80 e9 99 fe ff ff 48 89 c7 e8 e7 f6 01 00 e9 b7 fe ff ff
<0f> 0b 0f ff e9 40 fd ff ff 65 48 8b 04 25 80 d5 00 00 8b 40 4c
[   30.133245] RIP: __alloc_pages_nodemask+0x35e/0x430 RSP: a7818222bba8
[   30.133836] ---[ end trace d4f1deb60784f40a ]---

This is based off of Linus' master branch at
c8a0739b185d11d6e2ca7ad9f5835841d1cfc765
Configs are at
https://git.kernel.org/pub/scm/linux/kernel/git/jwboyer/fedora.git/commit/?h=rawhide&id=0be14662c54f49b4e640868b9d67df18d39edff0 




Looks like a TTM regression due to:

0284f1ead87463bc17cf5e81a24fc65c052486f3
drm/ttm: add transparent huge page support for cached allocations v2

If the driver requests dma32 pages, we can end up trying to alloc huge
dma32 pages which triggers the oops. The bochs driver always requests
dma32 here.

I'll send a rough patch once I boot it.

Dave.


Hi Dave,

FYI only: it looks like this is not the only regression in this cycle with
TTM; nouveau seems to suffer as well [1].

Adding ttm folks. Might be useful if we have an entry for ttm in
MAINTAINERS ...
-Daniel



A bit more investigation into the nouveau regression: this only shows
when Transparent Hugepages (CONFIG_TRANSPARENT_HUGEPAGE) are
enabled. Thanks Dave for pointing me to that!


Yeah, sorry for that. I missed to handle the DMA32 case with 
transpare

Re: Regression in TTM driver w/Linus' master

2017-11-27 Thread Tobias Klausmann


On 11/24/17 3:54 PM, Daniel Vetter wrote:

On Thu, Nov 23, 2017 at 03:24:38PM +0100, Tobias Klausmann wrote:

On 11/23/17 2:58 AM, Dave Airlie wrote:

On 23 November 2017 at 11:17, Laura Abbott  wrote:

Hi,

Fedora QA testing reported a panic when booting up VMs
using qemu VGA drivers
(https://paste.fedoraproject.org/paste/498yRWTCJv2LKIrmj4EliQ)

[   30.108507] ------------[ cut here ]------------
[   30.108920] kernel BUG at ./include/linux/gfp.h:408!
[   30.109356] invalid opcode: 0000 [#1] SMP
[   30.109700] Modules linked in: fuse nf_conntrack_netbios_ns
nf_conntrack_broadcast xt_CT ip6t_rpfilter ip6t_REJECT nf_reject_ipv6
xt_conntrack devlink ip_set nfnetlink ebtable_nat ebtable_broute bridge
ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle
ip6table_raw ip6table_security iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4
nf_nat_ipv4 nf_nat nf_conntrack libcrc32c iptable_mangle iptable_raw
iptable_security ebtable_filter ebtables ip6table_filter ip6_tables
snd_hda_codec_generic kvm_intel kvm snd_hda_intel snd_hda_codec irqbypass
ppdev snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm bochs_drm ttm
joydev drm_kms_helper virtio_balloon snd_timer snd parport_pc drm soundcore
parport i2c_piix4 nls_utf8 isofs squashfs zstd_decompress xxhash 8021q garp
mrp stp llc virtio_net
[   30.115605]  virtio_console virtio_scsi crct10dif_pclmul crc32_pclmul
crc32c_intel ghash_clmulni_intel serio_raw virtio_pci virtio_ring virtio
ata_generic pata_acpi qemu_fw_cfg sunrpc scsi_transport_iscsi loop
[   30.117425] CPU: 0 PID: 1347 Comm: gnome-shell Not tainted
4.15.0-0.rc0.git6.1.fc28.x86_64 #1
[   30.118141] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
1.10.2-2.fc27 04/01/2014
[   30.118866] task: 923a77e03380 task.stack: a78182228000
[   30.119366] RIP: 0010:__alloc_pages_nodemask+0x35e/0x430
[   30.119810] RSP: :a7818222bba8 EFLAGS: 00010202
[   30.120250] RAX: 0001 RBX: 014382c6 RCX:
0006
[   30.120840] RDX:  RSI: 0009 RDI:

[   30.121443] RBP: 923a760d6000 R08:  R09:
0006
[   30.122039] R10: 0040 R11: 0300 R12:
923a729273c0
[   30.122629] R13:  R14:  R15:
923a7483d400
[   30.123223] FS:  7fe48da7dac0() GS:923a7cc0()
knlGS:
[   30.123896] CS:  0010 DS:  ES:  CR0: 80050033
[   30.124373] CR2: 7fe457b73000 CR3: 78313000 CR4:
06f0
[   30.124968] Call Trace:
[   30.125186]  ttm_pool_populate+0x19b/0x400 [ttm]
[   30.125578]  ttm_bo_vm_fault+0x325/0x570 [ttm]
[   30.125964]  __do_fault+0x19/0x11e
[   30.126255]  __handle_mm_fault+0xcd3/0x1260
[   30.126609]  handle_mm_fault+0x14c/0x310
[   30.126947]  __do_page_fault+0x28c/0x530
[   30.127282]  do_page_fault+0x32/0x270
[   30.127593]  async_page_fault+0x22/0x30
[   30.127922] RIP: 0033:0x7fe48aae39a8
[   30.128225] RSP: 002b:7ffc21c4d928 EFLAGS: 00010206
[   30.128664] RAX: 7fe457b73000 RBX: 55cd4c1041a0 RCX:
7fe457b73040
[   30.129259] RDX: 0030 RSI:  RDI:
7fe457b73000
[   30.129855] RBP: 0300 R08: 000c R09:
0001
[   30.130457] R10: 0001 R11: 0246 R12:
55cd4c1041a0
[   30.131054] R13: 55cd4bdfe990 R14: 55cd4c104110 R15:
0400
[   30.131648] Code: 11 01 00 0f 84 a9 00 00 00 65 ff 0d 6d cc dd 44 e9 0f
ff ff ff 40 80 cd 80 e9 99 fe ff ff 48 89 c7 e8 e7 f6 01 00 e9 b7 fe ff ff
<0f> 0b 0f ff e9 40 fd ff ff 65 48 8b 04 25 80 d5 00 00 8b 40 4c
[   30.133245] RIP: __alloc_pages_nodemask+0x35e/0x430 RSP: a7818222bba8
[   30.133836] ---[ end trace d4f1deb60784f40a ]---

This is based off of Linus' master branch at
c8a0739b185d11d6e2ca7ad9f5835841d1cfc765
Configs are at
https://git.kernel.org/pub/scm/linux/kernel/git/jwboyer/fedora.git/commit/?h=rawhide&id=0be14662c54f49b4e640868b9d67df18d39edff0


Looks like a TTM regression due to:

0284f1ead87463bc17cf5e81a24fc65c052486f3
drm/ttm: add transparent huge page support for cached allocations v2

If the driver requests dma32 pages, we can end up trying to alloc huge
dma32 pages which triggers the oops. The bochs driver always requests
dma32 here.

I'll send a rough patch once I boot it.

Dave.


Hi Dave,

FYI only: it looks like this is not the only regression in this cycle with
TTM; nouveau seems to suffer as well [1].

Adding ttm folks. Might be useful if we have an entry for ttm in
MAINTAINERS ...
-Daniel



A bit more investigation into the nouveau regression: this only shows
when Transparent Hugepages (CONFIG_TRANSPARENT_HUGEPAGE) are enabled.
Thanks Dave for pointing me to that!
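Dave's diagnosis in this thread (huge page allocations being attempted for dma32 requests) and the observation that the nouveau regression only shows with CONFIG_TRANSPARENT_HUGEPAGE enabled both come down to a single allocation decision. The following is a hypothetical user-space model with made-up names (`pick_alloc()`, `REQ_DMA32`), not the TTM code or the eventual fix; it only makes explicit that the huge-page path should be gated on THP being available and on the request not being dma32 (memory below 4 GiB).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical request flag; not the real TTM_PAGE_FLAG_DMA32 value. */
#define REQ_DMA32 0x1u

enum alloc_kind { ALLOC_SMALL_PAGES, ALLOC_HUGE_PAGES };

/*
 * Decide how to populate a buffer of num_pages pages. Huge pages are only a
 * candidate when THP support is built in, the caller did not ask for dma32
 * memory, and the buffer is at least one huge page long.
 */
static enum alloc_kind pick_alloc(unsigned int req_flags, bool thp_enabled,
				  unsigned long num_pages,
				  unsigned long pages_per_huge_page)
{
	if (!thp_enabled)
		return ALLOC_SMALL_PAGES;
	if (req_flags & REQ_DMA32)
		return ALLOC_SMALL_PAGES;	/* never try huge dma32 allocations */
	if (num_pages < pages_per_huge_page)
		return ALLOC_SMALL_PAGES;
	return ALLOC_HUGE_PAGES;
}
```

Without the `REQ_DMA32` test, a dma32 request on a THP-enabled kernel would take the huge-page path, which is the combination described above as triggering the BUG.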



Greetings,

Tobias



Greetings,

Tobias


[1]:


[  404.918139] ------------[ cut here ]------------
[  404.918147] kernel BUG at mm/shmem.c:4334!
[  404.918152] invalid opcode: 0000 [#2] PREEM

Re: Regression in TTM driver w/Linus' master

2017-11-24 Thread Tobias Klausmann


On 11/23/17 2:58 AM, Dave Airlie wrote:

On 23 November 2017 at 11:17, Laura Abbott  wrote:

Hi,

Fedora QA testing reported a panic when booting up VMs
using qemu VGA drivers
(https://paste.fedoraproject.org/paste/498yRWTCJv2LKIrmj4EliQ)

[   30.108507] ------------[ cut here ]------------
[   30.108920] kernel BUG at ./include/linux/gfp.h:408!
[   30.109356] invalid opcode: 0000 [#1] SMP
[   30.109700] Modules linked in: fuse nf_conntrack_netbios_ns
nf_conntrack_broadcast xt_CT ip6t_rpfilter ip6t_REJECT nf_reject_ipv6
xt_conntrack devlink ip_set nfnetlink ebtable_nat ebtable_broute bridge
ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle
ip6table_raw ip6table_security iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4
nf_nat_ipv4 nf_nat nf_conntrack libcrc32c iptable_mangle iptable_raw
iptable_security ebtable_filter ebtables ip6table_filter ip6_tables
snd_hda_codec_generic kvm_intel kvm snd_hda_intel snd_hda_codec irqbypass
ppdev snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm bochs_drm ttm
joydev drm_kms_helper virtio_balloon snd_timer snd parport_pc drm soundcore
parport i2c_piix4 nls_utf8 isofs squashfs zstd_decompress xxhash 8021q garp
mrp stp llc virtio_net
[   30.115605]  virtio_console virtio_scsi crct10dif_pclmul crc32_pclmul
crc32c_intel ghash_clmulni_intel serio_raw virtio_pci virtio_ring virtio
ata_generic pata_acpi qemu_fw_cfg sunrpc scsi_transport_iscsi loop
[   30.117425] CPU: 0 PID: 1347 Comm: gnome-shell Not tainted
4.15.0-0.rc0.git6.1.fc28.x86_64 #1
[   30.118141] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
1.10.2-2.fc27 04/01/2014
[   30.118866] task: 923a77e03380 task.stack: a78182228000
[   30.119366] RIP: 0010:__alloc_pages_nodemask+0x35e/0x430
[   30.119810] RSP: :a7818222bba8 EFLAGS: 00010202
[   30.120250] RAX: 0001 RBX: 014382c6 RCX:
0006
[   30.120840] RDX:  RSI: 0009 RDI:

[   30.121443] RBP: 923a760d6000 R08:  R09:
0006
[   30.122039] R10: 0040 R11: 0300 R12:
923a729273c0
[   30.122629] R13:  R14:  R15:
923a7483d400
[   30.123223] FS:  7fe48da7dac0() GS:923a7cc0()
knlGS:
[   30.123896] CS:  0010 DS:  ES:  CR0: 80050033
[   30.124373] CR2: 7fe457b73000 CR3: 78313000 CR4:
06f0
[   30.124968] Call Trace:
[   30.125186]  ttm_pool_populate+0x19b/0x400 [ttm]
[   30.125578]  ttm_bo_vm_fault+0x325/0x570 [ttm]
[   30.125964]  __do_fault+0x19/0x11e
[   30.126255]  __handle_mm_fault+0xcd3/0x1260
[   30.126609]  handle_mm_fault+0x14c/0x310
[   30.126947]  __do_page_fault+0x28c/0x530
[   30.127282]  do_page_fault+0x32/0x270
[   30.127593]  async_page_fault+0x22/0x30
[   30.127922] RIP: 0033:0x7fe48aae39a8
[   30.128225] RSP: 002b:7ffc21c4d928 EFLAGS: 00010206
[   30.128664] RAX: 7fe457b73000 RBX: 55cd4c1041a0 RCX:
7fe457b73040
[   30.129259] RDX: 0030 RSI:  RDI:
7fe457b73000
[   30.129855] RBP: 0300 R08: 000c R09:
0001
[   30.130457] R10: 0001 R11: 0246 R12:
55cd4c1041a0
[   30.131054] R13: 55cd4bdfe990 R14: 55cd4c104110 R15:
0400
[   30.131648] Code: 11 01 00 0f 84 a9 00 00 00 65 ff 0d 6d cc dd 44 e9 0f
ff ff ff 40 80 cd 80 e9 99 fe ff ff 48 89 c7 e8 e7 f6 01 00 e9 b7 fe ff ff
<0f> 0b 0f ff e9 40 fd ff ff 65 48 8b 04 25 80 d5 00 00 8b 40 4c
[   30.133245] RIP: __alloc_pages_nodemask+0x35e/0x430 RSP: a7818222bba8
[   30.133836] ---[ end trace d4f1deb60784f40a ]---

This is based off of Linus' master branch at
c8a0739b185d11d6e2ca7ad9f5835841d1cfc765
Configs are at
https://git.kernel.org/pub/scm/linux/kernel/git/jwboyer/fedora.git/commit/?h=rawhide&id=0be14662c54f49b4e640868b9d67df18d39edff0


Looks like a TTM regression due to:

0284f1ead87463bc17cf5e81a24fc65c052486f3
drm/ttm: add transparent huge page support for cached allocations v2

If the driver requests dma32 pages, we can end up trying to alloc huge
dma32 pages which triggers the oops. The bochs driver always requests
dma32 here.

I'll send a rough patch once I boot it.

Dave.



Hi Dave,

FYI only: it looks like this is not the only regression in this cycle
with TTM; nouveau seems to suffer as well [1].


Greetings,

Tobias


[1]:


[  404.918139] ------------[ cut here ]------------
[  404.918147] kernel BUG at mm/shmem.c:4334!
[  404.918152] invalid opcode: 0000 [#2] PREEMPT SMP
[  404.918157] Modules linked in: rfcomm af_packet bnep uvcvideo 
videobuf2_vmalloc videobuf2_memops rtsx_usb_ms videobuf2_v4l2 memstick 
videodev videobuf2_core btusb btrtl btbcm arc4 msr snd_hda_codec_hdmi 
snd_hda_codec_realtek snd_hda_codec_generic joydev nls_iso8859_1 
nls_cp437 hid_multitouch vfat fat iTCO_wdt iTCO_vendor_support 
intel_rapl x86_pkg_temp_thermal intel_powerclamp ath10k_pci coretemp 
ath10k_core a

Re: [Nouveau] [PATCH] drm/nouveau/mpeg: print more debug info when rejecting dma objects

2017-08-06 Thread Tobias Klausmann
Hi,

LGTM!

Reviewed-by: Tobias Klausmann 


On 8/6/17 4:19 AM, Ilia Mirkin wrote:
> Signed-off-by: Ilia Mirkin 
> ---
>
> This was helpful when debugging our earlier mpeg woes. May as well have it 
> upstream.
>
>  drivers/gpu/drm/nouveau/nvkm/engine/mpeg/nv31.c | 7 ++++++-
>  drivers/gpu/drm/nouveau/nvkm/engine/mpeg/nv40.c | 7 ++++++-
>  2 files changed, 12 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/nouveau/nvkm/engine/mpeg/nv31.c b/drivers/gpu/drm/nouveau/nvkm/engine/mpeg/nv31.c
> index 8a8895246d26..99f33d88d940 100644
> --- a/drivers/gpu/drm/nouveau/nvkm/engine/mpeg/nv31.c
> +++ b/drivers/gpu/drm/nouveau/nvkm/engine/mpeg/nv31.c
> @@ -124,6 +124,8 @@ nv31_mpeg_tile(struct nvkm_engine *engine, int i, struct nvkm_fb_tile *tile)
>  static bool
>  nv31_mpeg_mthd_dma(struct nvkm_device *device, u32 mthd, u32 data)
>  {
> +	struct nv31_mpeg *mpeg = nv31_mpeg(device->mpeg);
> +	struct nvkm_subdev *subdev = &mpeg->engine.subdev;
>  	u32 inst = data << 4;
>  	u32 dma0 = nvkm_rd32(device, 0x700000 + inst);
>  	u32 dma1 = nvkm_rd32(device, 0x700004 + inst);
> @@ -132,8 +134,11 @@ nv31_mpeg_mthd_dma(struct nvkm_device *device, u32 mthd, u32 data)
>  	u32 size = dma1 + 1;
>  
>  	/* only allow linear DMA objects */
> -	if (!(dma0 & 0x2000))
> +	if (!(dma0 & 0x2000)) {
> +		nvkm_error(subdev, "inst %08x dma0 %08x dma1 %08x dma2 %08x\n",
> +			   inst, dma0, dma1, dma2);
>  		return false;
> +	}
>  
>  	if (mthd == 0x0190) {
>  		/* DMA_CMD */
> diff --git a/drivers/gpu/drm/nouveau/nvkm/engine/mpeg/nv40.c b/drivers/gpu/drm/nouveau/nvkm/engine/mpeg/nv40.c
> index 16de5bd94b14..b5ec7c504dc6 100644
> --- a/drivers/gpu/drm/nouveau/nvkm/engine/mpeg/nv40.c
> +++ b/drivers/gpu/drm/nouveau/nvkm/engine/mpeg/nv40.c
> @@ -31,6 +31,8 @@ bool
>  nv40_mpeg_mthd_dma(struct nvkm_device *device, u32 mthd, u32 data)
>  {
>  	struct nvkm_instmem *imem = device->imem;
> +	struct nv31_mpeg *mpeg = nv31_mpeg(device->mpeg);
> +	struct nvkm_subdev *subdev = &mpeg->engine.subdev;
>  	u32 inst = data << 4;
>  	u32 dma0 = nvkm_instmem_rd32(imem, inst + 0);
>  	u32 dma1 = nvkm_instmem_rd32(imem, inst + 4);
> @@ -39,8 +41,11 @@ nv40_mpeg_mthd_dma(struct nvkm_device *device, u32 mthd, u32 data)
>  	u32 size = dma1 + 1;
>  
>  	/* only allow linear DMA objects */
> -	if (!(dma0 & 0x2000))
> +	if (!(dma0 & 0x2000)) {
> +		nvkm_error(subdev, "inst %08x dma0 %08x dma1 %08x dma2 %08x\n",
> +			   inst, dma0, dma1, dma2);
>  		return false;
> +	}
>  
>  	if (mthd == 0x0190) {
>  		/* DMA_CMD */
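A hypothetical user-space model of the check this patch instruments: `accept_dma_object()` and `DMA0_LINEAR` are made-up names, the three object words are passed in directly instead of being read from instance memory, and everything after the linear-object test is omitted. It shows the behavioural change: the rejection path now reports the offending object before returning false.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* "Linear DMA object" bit tested in dma0 by the quoted nv31/nv40 code. */
#define DMA0_LINEAR 0x2000u

/*
 * data is the method argument; dma0..dma2 are the first three words of the
 * DMA object it refers to.
 */
static bool accept_dma_object(uint32_t data, uint32_t dma0, uint32_t dma1,
			      uint32_t dma2)
{
	uint32_t inst = data << 4;	/* same instance computation as the driver */

	if (!(dma0 & DMA0_LINEAR)) {
		/* what the patch adds: report the object before rejecting it */
		fprintf(stderr, "inst %08" PRIx32 " dma0 %08" PRIx32
			" dma1 %08" PRIx32 " dma2 %08" PRIx32 "\n",
			inst, dma0, dma1, dma2);
		return false;
	}
	return true;
}
```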


Re: [PATCH 17/29] drm/nouveau: switch to drm_*{get,put} helpers

2017-08-03 Thread Tobias Klausmann
Looks good to me!

Reviewed-by: Tobias Klausmann 
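The patch quoted below treats drm_*_reference()/drm_*_unreference() as compatibility aliases for drm_*_get()/drm_*_put(). A minimal sketch of that alias relationship with a hypothetical refcounted object (`struct obj` and the `obj_*` functions are made up, loosely in the spirit of struct kref): the old names merely forward to the new ones, so converting callers changes no behaviour.

```c
#include <assert.h>

/* Hypothetical refcounted object. */
struct obj {
	int refcount;
};

static void obj_get(struct obj *o)
{
	o->refcount++;
}

/* Returns 1 when the last reference was dropped (the object would be freed). */
static int obj_put(struct obj *o)
{
	return --o->refcount == 0;
}

/* Old names kept only as compatibility aliases forwarding to the new API. */
static inline void obj_reference(struct obj *o)
{
	obj_get(o);
}

static inline int obj_unreference(struct obj *o)
{
	return obj_put(o);
}
```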


On 8/3/17 1:58 PM, Cihangir Akturk wrote:
> drm_*_reference() and drm_*_unreference() functions are just
> compatibility aliases for drm_*_get() and drm_*_put() and should not be
> used by new code. So convert all users of compatibility functions to use
> the new APIs.
>
> Signed-off-by: Cihangir Akturk 
> ---
>  drivers/gpu/drm/nouveau/dispnv04/crtc.c   |  2 +-
>  drivers/gpu/drm/nouveau/nouveau_abi16.c   |  2 +-
>  drivers/gpu/drm/nouveau/nouveau_display.c |  8 ++++----
>  drivers/gpu/drm/nouveau/nouveau_fbcon.c   |  2 +-
>  drivers/gpu/drm/nouveau/nouveau_gem.c     | 14 +++++++-------
>  drivers/gpu/drm/nouveau/nv50_display.c    |  2 +-
>  6 files changed, 15 insertions(+), 15 deletions(-)
>
> diff --git a/drivers/gpu/drm/nouveau/dispnv04/crtc.c 
> b/drivers/gpu/drm/nouveau/dispnv04/crtc.c
> index 4b4b0b4..18b4be1 100644
> --- a/drivers/gpu/drm/nouveau/dispnv04/crtc.c
> +++ b/drivers/gpu/drm/nouveau/dispnv04/crtc.c
> @@ -1019,7 +1019,7 @@ nv04_crtc_cursor_set(struct drm_crtc *crtc, struct 
> drm_file *file_priv,
>   nv_crtc->cursor.set_offset(nv_crtc, nv_crtc->cursor.offset);
>   nv_crtc->cursor.show(nv_crtc, true);
>  out:
> - drm_gem_object_unreference_unlocked(gem);
> + drm_gem_object_put_unlocked(gem);
>   return ret;
>  }
>  
> diff --git a/drivers/gpu/drm/nouveau/nouveau_abi16.c 
> b/drivers/gpu/drm/nouveau/nouveau_abi16.c
> index f98f800..3e9db5a 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_abi16.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_abi16.c
> @@ -136,7 +136,7 @@ nouveau_abi16_chan_fini(struct nouveau_abi16 *abi16,
>   if (chan->ntfy) {
>   nouveau_bo_vma_del(chan->ntfy, &chan->ntfy_vma);
>   nouveau_bo_unpin(chan->ntfy);
> - drm_gem_object_unreference_unlocked(&chan->ntfy->gem);
> + drm_gem_object_put_unlocked(&chan->ntfy->gem);
>   }
>  
>   if (chan->heap.block_size)
> diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c 
> b/drivers/gpu/drm/nouveau/nouveau_display.c
> index 8d1df56..a68fe1a 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_display.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_display.c
> @@ -206,7 +206,7 @@ nouveau_user_framebuffer_destroy(struct drm_framebuffer 
> *drm_fb)
>   struct nouveau_framebuffer *fb = nouveau_framebuffer(drm_fb);
>  
>   if (fb->nvbo)
> - drm_gem_object_unreference_unlocked(&fb->nvbo->gem);
> + drm_gem_object_put_unlocked(&fb->nvbo->gem);
>  
>   drm_framebuffer_cleanup(drm_fb);
>   kfree(fb);
> @@ -267,7 +267,7 @@ nouveau_user_framebuffer_create(struct drm_device *dev,
>   if (ret == 0)
>   return &fb->base;
>  
> - drm_gem_object_unreference_unlocked(gem);
> + drm_gem_object_put_unlocked(gem);
>   return ERR_PTR(ret);
>  }
>  
> @@ -947,7 +947,7 @@ nouveau_display_dumb_create(struct drm_file *file_priv, struct drm_device *dev,
>   return ret;
>  
>   ret = drm_gem_handle_create(file_priv, &bo->gem, &args->handle);
> - drm_gem_object_unreference_unlocked(&bo->gem);
> + drm_gem_object_put_unlocked(&bo->gem);
>   return ret;
>  }
>  
> @@ -962,7 +962,7 @@ nouveau_display_dumb_map_offset(struct drm_file *file_priv,
>   if (gem) {
>   struct nouveau_bo *bo = nouveau_gem_object(gem);
>   *poffset = drm_vma_node_offset_addr(&bo->bo.vma_node);
> - drm_gem_object_unreference_unlocked(gem);
> + drm_gem_object_put_unlocked(gem);
>   return 0;
>   }
>  
> diff --git a/drivers/gpu/drm/nouveau/nouveau_fbcon.c b/drivers/gpu/drm/nouveau/nouveau_fbcon.c
> index 2665a07..6c9e1ec 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_fbcon.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_fbcon.c
> @@ -451,7 +451,7 @@ nouveau_fbcon_destroy(struct drm_device *dev, struct nouveau_fbdev *fbcon)
>   nouveau_bo_vma_del(nouveau_fb->nvbo, &nouveau_fb->vma);
>   nouveau_bo_unmap(nouveau_fb->nvbo);
>   nouveau_bo_unpin(nouveau_fb->nvbo);
> - drm_framebuffer_unreference(&nouveau_fb->base);
> + drm_framebuffer_put(&nouveau_fb->base);
>   }
>  
>   return 0;
> diff --git a/drivers/gpu/drm/nouveau/nouveau_gem.c b/drivers/gpu/drm/nouveau/nouveau_gem.c
> index 2170534..653425c 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_gem.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_gem.c
> @@ -281,7 +281,7 @@ nouveau_gem_ioctl_new(struct drm_device *dev, vo
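The hunks quoted above look like a mechanical rename: drm_gem_object_unreference_unlocked() becomes drm_gem_object_put_unlocked() and drm_framebuffer_unreference() becomes drm_framebuffer_put(), following the get/put naming of kref-style reference counting. As a rough userspace model of what a "put" means (struct obj and the obj_* helpers are hypothetical, not DRM code), a put drops one reference and frees the object only when the last reference goes away:

```c
#include <stdlib.h>

/*
 * Hypothetical userspace model of kref-style get/put; struct obj and the
 * obj_* helpers are illustrative only and are not part of the DRM API.
 */
struct obj {
	int refcount;
};

/* Counts how many objects have been freed, for the example. */
static int obj_release_count;

static struct obj *obj_create(void)
{
	struct obj *o = malloc(sizeof(*o));

	if (o)
		o->refcount = 1; /* the creator owns the first reference */
	return o;
}

/* Take an additional reference. */
static void obj_get(struct obj *o)
{
	o->refcount++;
}

/* Drop one reference; free the object once the last one is gone. */
static void obj_put(struct obj *o)
{
	if (o && --o->refcount == 0) {
		obj_release_count++;
		free(o);
	}
}
```

For example, after obj_create() and one obj_get(), the first obj_put() leaves the object alive with a count of 1, and only the second obj_put() frees it.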

[PATCH] drm: disable vblank only if it got previously enabled

2017-07-21 Thread Tobias Klausmann
Mimic the behavior of vblank_disable_fn(), another caller of
drm_vblank_disable_and_save().

This avoids an oops when trying to disable vblank on a display that is not connected:

[   12.768079] WARNING: CPU: 0 PID: 274 at drivers/gpu/drm/drm_vblank.c:609 
drm_calc_vbltimestamp_from_scanoutpos+0x296/0x320 [drm]
[   12.768080] Modules linked in: bnep snd_hda_codec_hdmi rtsx_usb_sdmmc 
uvcvideo rtsx_usb_ms mmc_core videobuf2_vmalloc memstick videobuf2_memops 
videobuf2_v4l2 videobuf2_core rtsx_usb videodev btusb btrtl arc4 
snd_hda_codec_realtek snd_hda_codec_generic joydev nls_iso8859_1 hid_multitouch 
nls_cp437 intel_rapl x86_pkg_temp_thermal intel_powerclamp vfat coretemp fat 
kvm_intel iTCO_wdt iTCO_vendor_support kvm irqbypass crct10dif_pclmul 
crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc aesni_intel ath10k_pci 
snd_hda_intel ath10k_core aes_x86_64 snd_hda_codec crypto_simd ath glue_helper 
cryptd snd_hda_core mac80211 snd_hwdep snd_pcm pcspkr r8169 cfg80211 mii 
snd_timer acer_wmi snd sparse_keymap wmi_bmof idma64 hci_uart virt_dma mei_me 
soundcore i2c_i801 mei btbcm shpchp intel_lpss_pci intel_pch_thermal
[   12.768130]  serdev btqca ucsi_acpi btintel typec_ucsi thermal typec 
bluetooth ecdh_generic battery ac pinctrl_sunrisepoint rfkill intel_lpss_acpi 
pinctrl_intel intel_lpss acpi_pad nouveau serio_raw i915 mxm_wmi ttm 
i2c_algo_bit drm_kms_helper xhci_pci syscopyarea sysfillrect sysimgblt xhci_hcd 
fb_sys_fops usbcore drm i2c_hid wmi video button sg efivarfs
[   12.768158] CPU: 0 PID: 274 Comm: kworker/0:2 Not tainted 
4.12.0-desktop-debug-drm+ #2
[   12.768160] Hardware name: Acer Aspire VN7-593G/Pluto_KLS, BIOS V1.04 
03/30/2017
[   12.768164] Workqueue: pm pm_runtime_work
[   12.768166] task: 889bf1627040 task.stack: 9541013e4000
[   12.768180] RIP: 0010:drm_calc_vbltimestamp_from_scanoutpos+0x296/0x320 [drm]
[   12.768181] RSP: 0018:9541013e7b30 EFLAGS: 00010086
[   12.768183] RAX: 001c RBX: 889b4cebd000 RCX: 0004
[   12.768184] RDX: 8004 RSI: 87a2d952 RDI: 
[   12.768186] RBP: 9541013e7b90 R08: 0001 R09: 039f
[   12.768187] R10: c05fe530 R11:  R12: 
[   12.768188] R13: 9541013e7ba4 R14: 889bf0426088 R15: 889bf0426000
[   12.768190] FS:  () GS:889bfec0() 
knlGS:
[   12.768191] CS:  0010 DS:  ES:  CR0: 80050033
[   12.768192] CR2: 00edb16580b8 CR3: 00020cc09000 CR4: 003406f0
[   12.768193] Call Trace:
[   12.768198]  ? enqueue_task_fair+0x64/0x600
[   12.768211]  ? drm_get_last_vbltimestamp+0x47/0x70 [drm]
[   12.768223]  ? drm_update_vblank_count+0x65/0x240 [drm]
[   12.768227]  ? pci_pm_runtime_resume+0xa0/0xa0
[   12.768238]  ? drm_vblank_disable_and_save+0x55/0xc0 [drm]
[   12.768250]  ? drm_crtc_vblank_off+0xa9/0x1e0 [drm]
[   12.768253]  ? pci_pm_runtime_resume+0xa0/0xa0
[   12.768299]  ? nouveau_display_fini+0x56/0xd0 [nouveau]
[   12.768339]  ? nouveau_display_suspend+0x51/0x110 [nouveau]
[   12.768378]  ? nouveau_do_suspend+0x76/0x1c0 [nouveau]
[   12.768413]  ? nouveau_pmops_runtime_suspend+0x54/0xb0 [nouveau]
[   12.768416]  ? pci_pm_runtime_suspend+0x5c/0x160
[   12.768419]  ? __rpm_callback+0xb6/0x1e0
[   12.768423]  ? kobject_uevent_env+0x111/0x5e0
[   12.768425]  ? pci_pm_runtime_resume+0xa0/0xa0
[   12.768427]  ? rpm_callback+0x1f/0x70
[   12.768429]  ? pci_pm_runtime_resume+0xa0/0xa0
[   12.768431]  ? rpm_suspend+0x11f/0x640
[   12.768441]  ? drm_fb_helper_hotplug_event+0x9a/0xe0 [drm_kms_helper]
[   12.768447]  ? output_poll_execute+0x17b/0x1a0 [drm_kms_helper]
[   12.768449]  ? pm_runtime_work+0x64/0xa0
[   12.768453]  ? process_one_work+0x1db/0x410
[   12.768456]  ? worker_thread+0x47/0x3d0
[   12.768459]  ? process_one_work+0x410/0x410
[   12.768461]  ? kthread+0x117/0x130
[   12.768463]  ? kthread_create_on_node+0x40/0x40
[   12.768466]  ? ret_from_fork+0x25/0x30
[   12.768468] Code: 80 3d 26 f3 01 00 00 0f 85 ad fd ff ff 48 8b 43 20 48 c7 
c7 31 a2 20 c0 c6 05 0e f3 01 00 01 48 8b b0 60 01 00 00 e8 75 2e ec c6 <0f> ff 
e9 88 fd ff ff 31 f6 44 88 55 b0 e8 38 fa ed c6 44 0f b6
[   12.768508] ---[ end trace d9bb853af3659bd5 ]---

Signed-off-by: Tobias Klausmann 
---
 drivers/gpu/drm/drm_vblank.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/drm_vblank.c b/drivers/gpu/drm/drm_vblank.c
index a233a6be934a..4a21756bf2bd 100644
--- a/drivers/gpu/drm/drm_vblank.c
+++ b/drivers/gpu/drm/drm_vblank.c
@@ -1140,8 +1140,11 @@ void drm_crtc_vblank_off(struct drm_crtc *crtc)
 
/* Avoid redundant vblank disables without previous
 * drm_crtc_vblank_on(). */
-   if (drm_core_check_feature(dev, DRIVER_ATOMIC) || !vblank->inmodeset)
+   if (drm_core_check_feature(dev, DRIVER_ATOMIC) || (!vblank->inmodeset &&
+   vblank->enabled)) {
+   DRM_DEBUG(&q
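The quoted hunk is cut off, but the visible part of the condition only adds vblank->enabled as an extra guard before drm_vblank_disable_and_save() is reached. A minimal sketch of the old and new conditions as plain boolean functions, assuming the parameters stand in for drm_core_check_feature(dev, DRIVER_ATOMIC), vblank->inmodeset and vblank->enabled (the function names are hypothetical, and the final patch may differ from the truncated hunk):

```c
#include <stdbool.h>

/*
 * Models only the visible part of the condition in drm_crtc_vblank_off().
 * atomic    ~ drm_core_check_feature(dev, DRIVER_ATOMIC)
 * inmodeset ~ vblank->inmodeset
 * enabled   ~ vblank->enabled
 * Illustrative only; this is not kernel code.
 */
static bool old_should_disable(bool atomic, bool inmodeset)
{
	return atomic || !inmodeset;
}

static bool new_should_disable(bool atomic, bool inmodeset, bool enabled)
{
	/* Non-atomic path now also requires vblank to have been enabled. */
	return atomic || (!inmodeset && enabled);
}
```

Because of the || short-circuit, the new enabled check only changes the outcome for drivers that do not set DRIVER_ATOMIC; atomic drivers still take the disable path as before.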

Re: [Nouveau] [PATCH] drm: disable vblank only if it got previously enabled

2017-07-21 Thread Tobias Klausmann
Mh ok,

So should I paper over it in nouveau_display_fini() until Ben comes up with
a better idea, then?


Greetings,

Tobias


On 7/20/17 10:13 AM, Daniel Vetter wrote:
> On Wed, Jul 19, 2017 at 04:10:50PM -0400, Ilia Mirkin wrote:
>> I believe the solution is to not call drm_crtc_vblank_off for atomic
>> modesetting in nouveau_display_fini. I think Ben's working on it.
> Yes, the goal of vblank_on/off was very much to not paper over driver bugs
> with clever tricks like these. If the driver can't keep track of its
> vblank, something has gone wrong, and the core should _not_ fix it up.
> Otherwise we're back to the old style vblank horror show.
>
> Thanks, Daniel
>
>> On Wed, Jul 19, 2017 at 1:25 PM, Tobias Klausmann
>>  wrote:
>>> mimic the behavior of vblank_disable_fn(), another caller of
>>> drm_vblank_disable_and_save().
>>>
>>> This avoids oopsing, while trying to disable vblank on a not connected 
>>> display:
>>> [...]

Re: [Nouveau] [regression drm/noveau] suspend to ram -> BOOM: exception RIP: drm_calc_vbltimestamp_from_scanoutpos+335

2017-07-15 Thread Tobias Klausmann
The conversion is a nice catch, but I'd like to have a bit more context;
see below!


With a better description:

Tobias Klausmann 


On 7/14/17 5:10 PM, Karol Herbst wrote:

Yeah, we shouldn't let the machine die. Are there more WARN_ON_ONCE
usages we could convert to WARN_ONCE?

Reviewed-By: Karol Herbst 

On Fri, Jul 14, 2017 at 5:05 PM, Tobias Klausmann
 wrote:

On 7/14/17 3:41 PM, Mike Galbraith wrote:

On Fri, 2017-07-14 at 15:36 +0200, Mike Galbraith wrote:

   All DRM did was to slip a
WARN_ON_ONCE() that nouveau triggers into a kernel module where such
things no longer warn, they blow the box out of the water.

BTW, turn that irksome WARN_ON_ONCE() in drivers/gpu/drm/drm_vblank.c
into a WARN_ONCE(), and all is peachy, you get the warning, box lives.

---
   drivers/gpu/drm/drm_vblank.c |3 ++-
   1 file changed, 2 insertions(+), 1 deletion(-)

--- a/drivers/gpu/drm/drm_vblank.c
+++ b/drivers/gpu/drm/drm_vblank.c
@@ -605,7 +605,8 @@ bool drm_calc_vbltimestamp_from_scanoutp
  */
 if (mode->crtc_clock == 0) {
 DRM_DEBUG("crtc %u: Noop due to uninitialized mode.\n", pipe);
-   WARN_ON_ONCE(drm_drv_uses_atomic_modeset(dev));
+   WARN_ONCE(drm_drv_uses_atomic_modeset(dev), "%s: report me.\n",


"report me" seems a bit odd; maybe just "uninitialized mode"?



+ dev->driver->name);
 return false;
 }



Hey,

Confirmed, this saves the box, but we still have to find the root
cause! Backtrace [1] with the above fix applied (and with the one that came
in with the latest drm-fixes merge).


[1] https://hastebin.com/uyoqifijed.http
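Both WARN_ON_ONCE() and WARN_ONCE() report at most once per call site and evaluate to their condition; WARN_ONCE() additionally takes a printf-style message, which the patch above uses to print the driver name. A rough userspace model of those once-per-call-site semantics follows. USER_WARN_ONCE() and calc_timestamp() are hypothetical stand-ins, not the kernel's implementation, and the macro relies on GNU statement expressions as supported by GCC and Clang:

```c
#include <stdbool.h>
#include <stdio.h>

/* Counts printed warnings so the example can be checked. */
static int warnings_printed;

/*
 * Hypothetical userspace stand-in for WARN_ONCE(cond, fmt, ...): prints the
 * message the first time this call site sees a true condition, and always
 * evaluates to the condition. Uses a GNU statement expression (GCC/Clang).
 */
#define USER_WARN_ONCE(cond, ...)                                 \
	({                                                        \
		static bool warned_;                              \
		bool ret_ = !!(cond);                             \
		if (ret_ && !warned_) {                           \
			warned_ = true;                           \
			warnings_printed++;                       \
			fprintf(stderr, "WARNING: " __VA_ARGS__); \
		}                                                 \
		ret_;                                             \
	})

/* Same shape as the uninitialized-mode check in the patch above. */
static bool calc_timestamp(int crtc_clock, bool uses_atomic, const char *drv)
{
	if (crtc_clock == 0) {
		USER_WARN_ONCE(uses_atomic, "%s: uninitialized mode\n", drv);
		return false;
	}
	return true;
}
```

Calling calc_timestamp(0, true, "nouveau") repeatedly prints the warning only on the first call, while every call still returns false.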

Thanks,

Tobias
Reviewed-By: Karol Herbst 
___
Nouveau mailing list
nouv...@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel


Re: [regression drm/noveau] suspend to ram -> BOOM: exception RIP: drm_calc_vbltimestamp_from_scanoutpos+335

2017-07-15 Thread Tobias Klausmann

On 7/14/17 3:41 PM, Mike Galbraith wrote:

On Fri, 2017-07-14 at 15:36 +0200, Mike Galbraith wrote:

  All DRM did was to slip a
WARN_ON_ONCE() that nouveau triggers into a kernel module where such
things no longer warn, they blow the box out of the water.

BTW, turn that irksome WARN_ON_ONCE() in drivers/gpu/drm/drm_vblank.c
into a WARN_ONCE(), and all is peachy, you get the warning, box lives.

---
  drivers/gpu/drm/drm_vblank.c |3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

--- a/drivers/gpu/drm/drm_vblank.c
+++ b/drivers/gpu/drm/drm_vblank.c
@@ -605,7 +605,8 @@ bool drm_calc_vbltimestamp_from_scanoutp
 */
if (mode->crtc_clock == 0) {
DRM_DEBUG("crtc %u: Noop due to uninitialized mode.\n", pipe);
-   WARN_ON_ONCE(drm_drv_uses_atomic_modeset(dev));
+   WARN_ONCE(drm_drv_uses_atomic_modeset(dev), "%s: report me.\n",
+ dev->driver->name);
 return false;
 }



Hey,

Confirmed, this saves the box, but we still have to find the root
cause! Backtrace [1] with the above fix applied (and with the one that came
in with the latest drm-fixes merge).



[1] https://hastebin.com/uyoqifijed.http

Thanks,

Tobias



Re: [regression drm/noveau] suspend to ram -> BOOM: exception RIP: drm_calc_vbltimestamp_from_scanoutpos+335

2017-07-12 Thread Tobias Klausmann


On 7/12/17 7:19 PM, Mike Galbraith wrote:

On Wed, 2017-07-12 at 07:37 -0400, Ilia Mirkin wrote:

On Wed, Jul 12, 2017 at 7:25 AM, Mike Galbraith  wrote:

On Wed, 2017-07-12 at 11:55 +0200, Mike Galbraith wrote:

On Tue, 2017-07-11 at 14:22 -0400, Ilia Mirkin wrote:

Some display stuff did change for 4.13 for GM20x+ boards. If it's not
too much trouble, a bisect would be pretty useful.

Bisection seemingly went fine, but the result is odd.

e98c58e55f68f8785aebfab1f8c9a03d8de0afe1 is the first bad commit

But it really really is bad.  Looking at gitk fork in the road leading
to it...

52d9d38c183b drm/sti:fix spelling mistake: "compoment" -> "component" - good
e4e818cc2d7c drm: make drm_panel.h self-contained - good
9cf8f5802f39 drm: add missing declaration to drm_blend.h  - good

Before the git highway splits, all is well.  The lane with commits
works fine at both ends, but e98c58e55f68 is busted.  Merge artifact?

Hmmm... that tree does not appear to have gotten a v4.12 backmerge at
any point. The last backmerge from Linus as far as I can tell was
v4.11-rc7. Could be an interaction with some out-of-tree change.

FWIW, checking out the fingered commit then..

git log --oneline 52d9d38c183b..e98c58e55f68|grep nouveau and reverting
the lot helped not at all.

Checking out 6b7781b42dc9 and reverting the fingered commit did.  Given
the nouveau bits reverted are mostly the vblank changes, CC to Daniel,
maybe he'll know why both GTX 980 and GeForce 8600 GT get all upset.

Either I'm damn lucky, both of my nvidia equipped boxen going boom 100%
repeatably, or there are a lot of folks out there who haven't yet tried
suspend with our latest/greatest kernel.  I suspect the latter.

-Mike



I should have had a look at my inbox first; it would have saved me a lot of
work bisecting. Still, I came to the same conclusion:


# first bad commit: [e98c58e55f68f8785aebfab1f8c9a03d8de0afe1] Merge tag 'drm-misc-next-2017-05-16' of git://anongit.freedesktop.org/git/drm-misc into drm-next



I suspect some vblank change, as vblank code shows up in every trace I have
seen while bisecting, but that is just a wild guess...
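The bisects in this thread narrow a good..bad range down to one commit by binary search. A simplified sketch of that search, assuming a linear history in which every commit after the first bad one is also bad; example_is_bad() is a hypothetical callback standing in for "build, suspend, check whether the box survives". Real git bisect walks the commit graph rather than an array, which is how a merge such as e98c58e55f68 can be reported as the first bad commit even when both lanes leading into it test good.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Binary search for the first bad commit over a linear history.
 * Requires n >= 2, commits[0] known good and commits[n - 1] known bad.
 */
static size_t first_bad(size_t n, bool (*is_bad)(size_t idx))
{
	size_t good = 0, bad = n - 1;

	while (bad - good > 1) {
		size_t mid = good + (bad - good) / 2;

		if (is_bad(mid))
			bad = mid;	/* regression is at mid or earlier */
		else
			good = mid;	/* regression is after mid */
	}
	return bad;
}

/* Counts how many commits were tested, for the example. */
static int bisect_steps;

/* Hypothetical history: commit 37 introduced the regression. */
static bool example_is_bad(size_t idx)
{
	bisect_steps++;
	return idx >= 37;
}
```

With 100 commits, first_bad(100, example_is_bad) returns 37 after six calls to example_is_bad(), roughly log2(100).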


Greetings,

Tobias



[Nouveau] [PATCH] ram/gf100-: error out if a ridiculous amount of vram is detected

2015-05-20 Thread Tobias Klausmann
Any idea how to solve the problem, other than just reporting it?

For now, though, this adds a helpful error message, so you may add my R-b.

On 20.05.2015 22:01, Ilia Mirkin wrote:
> Some newer chips have trouble coming up, and we get bad MMIO reads from
> them, like 0xbadf100. This ends up translating into crazy amounts of
> VRAM, which destroys all sorts of other logic down the line. Instead,
> fail device init.
>
> Signed-off-by: Ilia Mirkin 
> Cc: stable at kernel.org
> ---
>   drm/nouveau/nvkm/subdev/fb/ramgf100.c | 6 ++
>   1 file changed, 6 insertions(+)
>
> diff --git a/drm/nouveau/nvkm/subdev/fb/ramgf100.c b/drm/nouveau/nvkm/subdev/fb/ramgf100.c
> index de9f395..9d4d196 100644
> --- a/drm/nouveau/nvkm/subdev/fb/ramgf100.c
> +++ b/drm/nouveau/nvkm/subdev/fb/ramgf100.c
> @@ -545,6 +545,12 @@ gf100_ram_create_(struct nvkm_object *parent, struct nvkm_object *engine,
>   }
>   }
>   
> + /* if over 1TB of VRAM is reported, something went very wrong, bail */
> + if (ram->size > (1ULL << 40)) {
> + nv_error(pfb, "invalid vram size: %llx\n", ram->size);
> + return -EINVAL;
> + }
> +
>   /* if all controllers have the same amount attached, there's no holes */
>   if (uniform) {
>   offset = rsvd_head;
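The check in Ilia's patch is a plain upper bound: 1ULL << 40 bytes is 1 TiB, far above the VRAM of any real board, so a larger value is treated as a failed MMIO read rather than real memory. A userspace sketch of the same test follows; check_vram_size() is a hypothetical helper, not the nvkm code. To show how a bad read can blow past the limit, the usage below assumes, purely for illustration, that a readback of 0xbadf100 were interpreted as a size in MiB.

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

/* Upper bound used by the patch above: 1ULL << 40 bytes (1 TiB). */
#define VRAM_SANITY_LIMIT (1ULL << 40)

/*
 * Hypothetical helper mirroring the patch's check: reject any reported
 * size above 1 TiB as a bogus hardware read and fail with -EINVAL.
 */
static int check_vram_size(uint64_t size)
{
	if (size > VRAM_SANITY_LIMIT) {
		fprintf(stderr, "invalid vram size: %llx\n",
			(unsigned long long)size);
		return -EINVAL;
	}
	return 0;
}
```

check_vram_size(2ULL << 30) (2 GiB) returns 0, while check_vram_size(0xbadf100ULL << 20), about 187 TiB under the assumption above, returns -EINVAL.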



3.18-rc regression: drm/nouveau: use shared fences for readable objects

2014-11-27 Thread Tobias Klausmann


On 26.11.2014 21:29, Michael Marineau wrote:
> On Mon, Nov 24, 2014 at 11:43 PM, Maarten Lankhorst
>  wrote:
>> Hey,
>>
>> Op 22-11-14 om 21:16 schreef Michael Marineau:
>>> On Nov 22, 2014 11:45 AM, "Michael Marineau"  wrote:
 On Nov 22, 2014 8:56 AM, "Maarten Lankhorst" <
>>> maarten.lankhorst at canonical.com> wrote:
> Hey,
>
> Op 22-11-14 om 01:19 schreef Michael Marineau:
>> On Thu, Nov 20, 2014 at 12:53 AM, Maarten Lankhorst
>>  wrote:
>>> Op 20-11-14 om 05:06 schreef Michael Marineau:
 On Wed, Nov 19, 2014 at 12:10 AM, Maarten Lankhorst
  wrote:
> Hey,
>
> On 19-11-14 07:43, Michael Marineau wrote:
>> On 3.18-rc kernel's I have been intermittently experiencing GPU
>> lockups shortly after startup, accompanied with one or both of the
>> following errors:
>>
>> nouveau E[   PFIFO][:01:00.0] read fault at 0x000734a000 [PTE]
>> from PBDMA0/HOST_CPU on channel 0x007faa3000 [unknown]
>> nouveau E[ DRM] GPU lockup - switching to software fbcon
>>
>> I was able to trace the issue with bisect to commit
>> 809e9447b92ffe1346b2d6ec390e212d5307f61c "drm/nouveau: use shared
>> fences for readable objects". The lockups appear to have cleared
>>> up
>> since reverting that and a few related followup commits:
>>
>> 809e9447: "drm/nouveau: use shared fences for readable objects"
>> 055dffdf: "drm/nouveau: bump driver patchlevel to 1.2.1"
>> e3be4c23: "drm/nouveau: specify if interruptible wait is desired
>>> in
>> nouveau_fence_sync"
>> 15a996bb: "drm/nouveau: assign fence_chan->name correctly"
> Weird. I'm not sure yet what causes it.
>
>
>>> http://cgit.freedesktop.org/~mlankhorst/linux/commit/?h=fixed-fences-for-bisect&id=86be4f216bbb9ea3339843a5658d4c21162c7ee2
 Building a kernel from that commit gives me an entirely new
>>> behavior:
 X hangs for at least 10-20 seconds at a time with brief moments of
 responsiveness before hanging again while gitk on the kernel repo
 loads. Otherwise the system is responsive. The head of that
 fixed-fences-for-bisect branch (1c6aafb5) which is the "use shared
 fences for readable objects" commit I originally bisected to does
 feature the complete lockups I was seeing before.
>>> Ok for the sake of argument lets just assume they're separate bugs,
>>> and we should look at xorg
>>> hanging first.
>>>
>>> Is there anything in the dmesg when the hanging happens?
>>>
>>> And it's probably 15 seconds, if it's called through
>>> nouveau_fence_wait.
>>> Try changing else if (!ret) to else if (WARN_ON(!ret)) in that
>>> function, and see if you get some dmesg spam. :)
>> Adding the WARN_ON to 86be4f21 reports the following:
>>
>> [ 1188.676073] [ cut here ]
>> [ 1188.676161] WARNING: CPU: 1 PID: 474 at
>> drivers/gpu/drm/nouveau/nouveau_fence.c:359
>> nouveau_fence_wait.part.9+0x33/0x40 [nouveau]()
>> [ 1188.676166] Modules linked in: rndis_host cdc_ether usbnet mii bnep
>> ecb btusb bluetooth rfkill bridge stp llc hid_generic usb_storage
>> joydev mousedev hid_apple usbhid bcm5974 nls_iso8859_1 nls_cp437 vfat
>> fat nouveau snd_hda_codec_hdmi coretemp x86_pkg_temp_thermal
>> intel_powerclamp kvm_intel kvm iTCO_wdt crct10dif_pclmul
>> iTCO_vendor_support crc32c_intel evdev aesni_intel mac_hid aes_x86_64
>> lrw glue_helper ablk_helper applesmc snd_hda_codec_cirrus cryptd
>> input_polldev snd_hda_codec_generic mxm_wmi led_class wmi microcode
>> hwmon snd_hda_intel ttm snd_hda_controller lpc_ich i2c_i801 mfd_core
>> snd_hda_codec i2c_algo_bit snd_hwdep drm_kms_helper snd_pcm sbs drm
>> apple_gmux i2ccore snd_timer snd agpgart mei_me soundcore sbshc mei
>> video xhci_hcd usbcore usb_common apple_bl button battery ac efivars
>> autofs4
>> [ 1188.676300]  efivarfs
>> [ 1188.676308] CPU: 1 PID: 474 Comm: Xorg Tainted: GW
>> 3.17.0-rc2-nvtest+ #147
>> [ 1188.676313] Hardware name: Apple Inc.
>> MacBookPro11,3/Mac-2BD1B31983FE1663, BIOS
>> MBP112.88Z.0138.B11.1408291503 08/29/2014
>> [ 1188.676316]  0009 88045daebce8 814f0c09
>> 
>> [ 1188.676325]  88045daebd20 8104ea5d 88006a6c1468
>> fff0
>> [ 1188.676333]    88006a6c1000
>> 88045daebd30
>> [ 1188.676341] Call Trace:
>> [ 1188.676356]  [] dump_stack+0x4d/0x66
>> [ 1188.676369]  [] warn_slowpath_common+0x7d/0xa0
>> [ 1188.676377]  [] warn_slowpath_null+0x1a/0x20
>> [ 1188.676439]  []
>> nouveau_fence_wait.part.9+0x33/0x40 [nouveau]
>> [ 1188.676496]  [] nouveau_fence_wait+0x16/0x30
>>> [nouveau]
>> [ 1188.676552]  []
>> nouveau
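Maarten's debugging hint works because WARN_ON() evaluates to its condition: writing the timeout test as else if (WARN_ON(!ret)) takes the same branch as before but also logs a backtrace each time it fires. A userspace sketch of that pattern; USER_WARN_ON() and handle_wait_result() are hypothetical, and the mapping of a timed wait's result (negative for an error, 0 for a timeout, positive for success) to return codes is an assumption modelled on the usual timed-wait convention, not the exact nouveau_fence_wait() code.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Counts triggered warnings, for the example. */
static int warn_hits;

/* Hypothetical userspace stand-in for WARN_ON(): reports, then returns cond. */
static bool user_warn_on(bool cond, const char *file, int line)
{
	if (cond) {
		warn_hits++;
		fprintf(stderr, "WARNING at %s:%d\n", file, line);
	}
	return cond;
}
#define USER_WARN_ON(cond) user_warn_on(!!(cond), __FILE__, __LINE__)

/*
 * Error handling after a timed wait: ret < 0 is an error, ret == 0 a
 * timeout, ret > 0 success. Wrapping the timeout test in USER_WARN_ON()
 * keeps the same control flow but logs whenever the timeout branch runs.
 */
static int handle_wait_result(long ret)
{
	if (ret < 0)
		return (int)ret;
	else if (USER_WARN_ON(!ret))
		return -EBUSY;
	return 0;
}
```

A successful wait returns 0 silently, while a timeout still returns -EBUSY but now leaves a warning in the log.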

3.18-rc regression: drm/nouveau: use shared fences for readable objects

2014-11-20 Thread Tobias Klausmann
On 19.11.2014 09:10, Maarten Lankhorst wrote:
> ...
> On the EDITED patch from fixed-fences-for-bisect, can you do the following:
>
> In nouveau/nv84_fence.c function nv84_fence_context_new, remove
>
> fctx->base.sequence = nv84_fence_read(chan);
>
> and add back
>
> nouveau_bo_wr32(priv->bo, chan->chid * 16/4, 0x);
>
> ...

I added the above on top of your "fixed-fences-for-bisect" branch and
expected it to work, but it did not :/
Anyway, as this "initializes" the fence to a known state, maybe you
should consider pushing that.

Next I'm going to compile the kernel with trace events (let's see how) ...

Tobias


3.18-rc regression: drm/nouveau: use shared fences for readable objects

2014-11-19 Thread Tobias Klausmann
On 19.11.2014 09:10, Maarten Lankhorst wrote:
> Hey,
>
> On 19-11-14 07:43, Michael Marineau wrote:
>> On 3.18-rc kernel's I have been intermittently experiencing GPU
>> lockups shortly after startup, accompanied with one or both of the
>> following errors:
>>
>> nouveau E[   PFIFO][:01:00.0] read fault at 0x000734a000 [PTE]
>> from PBDMA0/HOST_CPU on channel 0x007faa3000 [unknown]
>> nouveau E[ DRM] GPU lockup - switching to software fbcon
>>
>> I was able to trace the issue with bisect to commit
>> 809e9447b92ffe1346b2d6ec390e212d5307f61c "drm/nouveau: use shared
>> fences for readable objects". The lockups appear to have cleared up
>> since reverting that and a few related followup commits:
>>
>> 809e9447: "drm/nouveau: use shared fences for readable objects"
>> 055dffdf: "drm/nouveau: bump driver patchlevel to 1.2.1"
>> e3be4c23: "drm/nouveau: specify if interruptible wait is desired in
>> nouveau_fence_sync"
>> 15a996bb: "drm/nouveau: assign fence_chan->name correctly"
> Weird. I'm not sure yet what causes it.
>
> http://cgit.freedesktop.org/~mlankhorst/linux/commit/?h=fixed-fences-for-bisect&id=86be4f216bbb9ea3339843a5658d4c21162c7ee2
>
> On the EDITED patch from fixed-fences-for-bisect, can you do the following:
>
> In nouveau/nv84_fence.c function nv84_fence_context_new, remove
>
> fctx->base.sequence = nv84_fence_read(chan);
>
> and add back
>
> nouveau_bo_wr32(priv->bo, chan->chid * 16/4, 0x);
>
> If that fails you should compile your kernel with trace events, to get some 
> debugging info from the fences. I'll post debugging info if this does not fix 
> it.
>
> ~Maarten

Hey,
as mentioned on IRC, the new fencing hangs my GPU for a while as well (nve7).
I bisected it back to 86be4f216bbb9ea3339843a5658d4c21162c7ee2, EDITED,
from the fixed-fences-for-bisect branch mentioned above.

The original bisect on Linus' branch brought me to:
29ba89b2371d466ca68973525816cf10debc2655
drm/nouveau: rework to new fence interface

Michael, if you are going to bisect the "fixed-fences-for-bisect" branch,
maybe take a closer look when you get near that commit, to see whether
or not it triggers the GPU hangs for you!

Tobias


I915 DRI_PRIME Bug

2013-08-18 Thread Tobias Klausmann

Hello there,
while testing my "Optimus" notebook I saw a stack trace in my logs;
maybe someone is interested!
I can easily reproduce this at any time. It happens when offloading a GL
app (here Unigine Heaven 3.0) to the NVIDIA card. To be more exact:
when Unigine starts, the window stays black. To get something useful I
have to minimize and maximize the window. The trace happens exactly when
the window is maximized.


Hope this helps anyway!

[ cut here ]
WARNING: CPU: 7 PID: 718 at drivers/gpu/drm/i915/i915_gem.c:3967 
i915_gem_free_object+0x124/0x150 [i915]()
Modules linked in: af_packet bnep fuse snd_hda_codec_hdmi 
snd_hda_codec_realtek snd_hda_intel snd_hda_codec ath3k btusb bluetooth 
uvcvideo snd_hwdep videobuf2_core videodev videobuf2_vmalloc 
videobuf2_memops snd_pcm snd_seq snd_timer arc4 ath9k snd_seq_device 
mac80211 ath9k_common ath9k_hw ath snd iTCO_wdt sdhci_pci sdhci mmc_core 
sr_mod sg tg3 ptp pps_core iTCO_vendor_support cfg80211 lpc_ich i2c_i801 
pcspkr joydev acer_wmi sparse_keymap rfkill cdrom soundcore 
snd_page_alloc mperf battery ac autofs4 i915 xhci_hcd processor 
scsi_dh_alua scsi_dh_hp_sw scsi_dh_emc scsi_dh_rdac scsi_dh nouveau ttm 
drm_kms_helper drm i2c_algo_bit mxm_wmi video thermal_sys wmi button

CPU: 7 PID: 718 Comm: Xorg Not tainted 3.11.0-rc5-desktop+ #27
Hardware name: Acer Aspire V3-571G/VA50_HC_CR, BIOS V1.13 10/09/2012
  0009 81568703 
 81047f81 88021cf1ab00 88024f1d 88025e421930
 88021c7bcd40 8802540f2da0 a0210a44 88021cf1ab00
Call Trace:
 [] ? dump_stack+0x50/0x80
 [] ? warn_slowpath_common+0x81/0xb0
 [] ? i915_gem_free_object+0x124/0x150 [i915]
 [] ? i915_gem_dmabuf_release+0x80/0x90 [i915]
 [] ? dma_buf_release+0x23/0x80
 [] ? __fput+0xcd/0x230
 [] ? task_work_run+0x97/0xd0
 [] ? do_notify_resume+0x79/0xa0
 [] ? int_signal+0x12/0x17
---[ end trace 99a0c147e69ddcd1 ]---

Thanks,
Tobias Klausmann


I915 DRI_PRIME Bug

2013-08-17 Thread Tobias Klausmann
Hello there,
while testing my "Optimus" notebook I saw a stack trace in my logs;
maybe someone is interested!
I can easily reproduce this at any time. It happens when offloading a GL
app (here Unigine Heaven 3.0) to the NVIDIA card. To be more exact:
when Unigine starts, the window stays black. To get something useful I
have to minimize and maximize the window. The trace happens exactly when
the window is maximized.

Hope this helps anyway!

[ cut here ]
WARNING: CPU: 7 PID: 718 at drivers/gpu/drm/i915/i915_gem.c:3967 
i915_gem_free_object+0x124/0x150 [i915]()
Modules linked in: af_packet bnep fuse snd_hda_codec_hdmi 
snd_hda_codec_realtek snd_hda_intel snd_hda_codec ath3k btusb bluetooth 
uvcvideo snd_hwdep videobuf2_core videodev videobuf2_vmalloc 
videobuf2_memops snd_pcm snd_seq snd_timer arc4 ath9k snd_seq_device 
mac80211 ath9k_common ath9k_hw ath snd iTCO_wdt sdhci_pci sdhci mmc_core 
sr_mod sg tg3 ptp pps_core iTCO_vendor_support cfg80211 lpc_ich i2c_i801 
pcspkr joydev acer_wmi sparse_keymap rfkill cdrom soundcore 
snd_page_alloc mperf battery ac autofs4 i915 xhci_hcd processor 
scsi_dh_alua scsi_dh_hp_sw scsi_dh_emc scsi_dh_rdac scsi_dh nouveau ttm 
drm_kms_helper drm i2c_algo_bit mxm_wmi video thermal_sys wmi button
CPU: 7 PID: 718 Comm: Xorg Not tainted 3.11.0-rc5-desktop+ #27
Hardware name: Acer Aspire V3-571G/VA50_HC_CR, BIOS V1.13 10/09/2012
   0009 81568703 
  81047f81 88021cf1ab00 88024f1d 88025e421930
  88021c7bcd40 8802540f2da0 a0210a44 88021cf1ab00
Call Trace:
  [] ? dump_stack+0x50/0x80
  [] ? warn_slowpath_common+0x81/0xb0
  [] ? i915_gem_free_object+0x124/0x150 [i915]
  [] ? i915_gem_dmabuf_release+0x80/0x90 [i915]
  [] ? dma_buf_release+0x23/0x80
  [] ? __fput+0xcd/0x230
  [] ? task_work_run+0x97/0xd0
  [] ? do_notify_resume+0x79/0xa0
  [] ? int_signal+0x12/0x17
---[ end trace 99a0c147e69ddcd1 ]---

Thanks,
Tobias Klausmann