[PATCH] nouveau: Skip unavailable ttm page entries
Starting with commit f295c8cfec833c2707ff1512da10d65386dde7af ("drm/nouveau: fix dma syncing warning with debugging on.") the following oops occurs:

BUG: kernel NULL pointer dereference, address:
#PF: supervisor read access in kernel mode
#PF: error_code(0x) - not-present page
PGD 0 P4D 0
Oops: [#1] PREEMPT SMP PTI
CPU: 6 PID: 1013 Comm: Xorg.bin Tainted: G E 5.11.0-desktop-rc0+ #2
Hardware name: Acer Aspire VN7-593G/Pluto_KLS, BIOS V1.11 08/01/2018
RIP: 0010:nouveau_bo_sync_for_device+0x40/0xb0 [nouveau]
Call Trace:
 nouveau_bo_validate+0x5d/0x80 [nouveau]
 nouveau_gem_ioctl_pushbuf+0x662/0x1120 [nouveau]
 ? nouveau_gem_ioctl_new+0xf0/0xf0 [nouveau]
 drm_ioctl_kernel+0xa6/0xf0 [drm]
 drm_ioctl+0x1f4/0x3a0 [drm]
 ? nouveau_gem_ioctl_new+0xf0/0xf0 [nouveau]
 nouveau_drm_ioctl+0x50/0xa0 [nouveau]
 __x64_sys_ioctl+0x7e/0xb0
 do_syscall_64+0x33/0x80
 entry_SYSCALL_64_after_hwframe+0x44/0xae
---[ end trace ccfb1e7f4064374f ]---
RIP: 0010:nouveau_bo_sync_for_device+0x40/0xb0 [nouveau]

The cited commit did not introduce the underlying problem; it only uncovered it. The cited commit relies on valid pages. This is not always the case, due to some bugs. For now, just warn and work around the issue by ignoring the bad ttm objects.

Below is some debug info gathered while debugging this issue:

nouveau :01:00.0: DRM: ttm_dma->num_pages: 2048
nouveau :01:00.0: DRM: ttm_dma->pages is NULL
nouveau :01:00.0: DRM: ttm_dma: e96058e7
nouveau :01:00.0: DRM: ttm_dma->page_flags:
nouveau :01:00.0: DRM: ttm_dma: Populated: 1
nouveau :01:00.0: DRM: ttm_dma: No Retry: 0
nouveau :01:00.0: DRM: ttm_dma: SG: 256
nouveau :01:00.0: DRM: ttm_dma: Zero Alloc: 0
nouveau :01:00.0: DRM: ttm_dma: Swapped: 0

Signed-off-by: Tobias Klausmann
---
 drivers/gpu/drm/nouveau/nouveau_bo.c | 4
 1 file changed, 4 insertions(+)

diff --git a/drivers/gpu/drm/nouveau/nouveau_bo.c b/drivers/gpu/drm/nouveau/nouveau_bo.c
index fabb314a0b2f..5902e21d5dfe 100644
--- a/drivers/gpu/drm/nouveau/nouveau_bo.c
+++ b/drivers/gpu/drm/nouveau/nouveau_bo.c
@@ -551,6 +551,10 @@ nouveau_bo_sync_for_device(struct nouveau_bo *nvbo)
 	if (!ttm_dma)
 		return;
+	if (!ttm_dma->pages) {
+		NV_DEBUG(drm, "ttm_dma 0x%p: pages NULL\n", ttm_dma);
+		return;
+	}

 	/* Don't waste time looping if the object is coherent */
 	if (nvbo->force_coherent)
-- 
2.30.1

___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel
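Editorial sketch (not part of the patch): the guard added above can be illustrated in plain userspace C. The names demo_tt, demo_sync_for_device and demo_sync_selftest are hypothetical stand-ins, not the kernel's TTM or nouveau API, and the DMA sync is only simulated by counting page entries. The 2048-page object with a NULL pages array mirrors the debug output quoted above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the TTM page container (not struct ttm_tt). */
struct demo_tt {
	unsigned long num_pages;
	void **pages;	/* may be NULL even though num_pages != 0 */
};

/*
 * Simulated sync: returns the number of page entries visited,
 * or 0 when the object is skipped.
 */
unsigned long demo_sync_for_device(const struct demo_tt *tt)
{
	unsigned long i, synced = 0;

	if (!tt)
		return 0;
	/* The guard from the patch: skip objects without a pages array. */
	if (!tt->pages)
		return 0;

	for (i = 0; i < tt->num_pages; i++) {
		if (tt->pages[i])
			synced++;
	}
	return synced;
}

/* Small self-check showing the behaviour with and without pages. */
void demo_sync_selftest(void)
{
	int a, b;
	void *backing[2] = { &a, &b };
	struct demo_tt ok = { 2, backing };
	struct demo_tt broken = { 2048, NULL };

	assert(demo_sync_for_device(&ok) == 2);
	assert(demo_sync_for_device(&broken) == 0);
	assert(demo_sync_for_device(NULL) == 0);
}
```

Without the second check, the loop would dereference tt->pages[0] through a NULL pointer, which is the kind of access the oops above reports.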
[PATCH] nouveau: forward error generated while resuming objects tree
On a failed resume we may experience unrecoverable errors. Plumb the error code through to actually let the driver fail. On a reverse-prime setup this helps the drm subsystem to at least recover the integrated gpu.

This can especially happen with secboot timing out, leaving the hardware in a non-functioning state.

Signed-off-by: Tobias Klausmann
---
 drivers/gpu/drm/nouveau/nouveau_drm.c | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_drm.c b/drivers/gpu/drm/nouveau/nouveau_drm.c
index 5020265bfbd9..56a107f3a0e1 100644
--- a/drivers/gpu/drm/nouveau/nouveau_drm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_drm.c
@@ -802,10 +802,15 @@ nouveau_do_suspend(struct drm_device *dev, bool runtime)
 static int
 nouveau_do_resume(struct drm_device *dev, bool runtime)
 {
+	int ret = 0;
 	struct nouveau_drm *drm = nouveau_drm(dev);

 	NV_DEBUG(drm, "resuming object tree...\n");
-	nvif_client_resume(&drm->master.base);
+	ret = nvif_client_resume(&drm->master.base);
+	if (ret) {
+		NV_ERROR(drm, "Client resume failed with error: %d\n", ret);
+		return ret;
+	}

 	NV_DEBUG(drm, "resuming fence...\n");
 	if (drm->fence && nouveau_fence(drm)->resume)
@@ -925,6 +930,7 @@ nouveau_pmops_runtime_resume(struct device *dev)
 {
 	struct pci_dev *pdev = to_pci_dev(dev);
 	struct drm_device *drm_dev = pci_get_drvdata(pdev);
+	struct nouveau_drm *drm = nouveau_drm(drm_dev);
 	struct nvif_device *device = &nouveau_drm(drm_dev)->client.device;
 	int ret;

@@ -941,6 +947,10 @@ nouveau_pmops_runtime_resume(struct device *dev)
 	pci_set_master(pdev);

 	ret = nouveau_do_resume(drm_dev, true);
+	if (ret) {
+		NV_ERROR(drm, "resume failed with: %d\n", ret);
+		return ret;
+	}

 	/* do magic */
 	nvif_mask(&device->object, 0x088488, (1 << 25), (1 << 25));
-- 
2.21.0
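Editorial sketch (not part of the patch): the error plumbing described above, reduced to a self-contained userspace example. All names (demo_dev, demo_client_resume, demo_do_resume, demo_runtime_resume) are hypothetical, and -ETIMEDOUT is only an assumed error code standing in for a secboot timeout. The point is that each layer returns the first failure instead of continuing with later resume steps.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical device state standing in for the driver's private data. */
struct demo_dev {
	int client_resume_result;	/* result of the simulated low-level resume */
	int fence_resumed;		/* set once the later resume step has run */
	int magic_applied;		/* set once the post-resume register write ran */
};

/* Simulated low-level resume; returns 0 or a negative errno. */
int demo_client_resume(struct demo_dev *dev)
{
	return dev->client_resume_result;
}

/* Inner resume: stop at the first failure and hand the code upwards. */
int demo_do_resume(struct demo_dev *dev)
{
	int ret = demo_client_resume(dev);

	if (ret)
		return ret;

	dev->fence_resumed = 1;
	return 0;
}

/* Outer resume: only touch the hardware if the inner resume succeeded. */
int demo_runtime_resume(struct demo_dev *dev)
{
	int ret = demo_do_resume(dev);

	if (ret)
		return ret;

	dev->magic_applied = 1;
	return 0;
}

/* Self-check covering the success path and the failure path. */
void demo_resume_selftest(void)
{
	struct demo_dev good = { 0, 0, 0 };
	struct demo_dev bad = { -ETIMEDOUT, 0, 0 };

	assert(demo_runtime_resume(&good) == 0);
	assert(good.fence_resumed && good.magic_applied);

	assert(demo_runtime_resume(&bad) == -ETIMEDOUT);
	assert(!bad.fence_resumed && !bad.magic_applied);
}
```

With the return value ignored, as before the patch, the failing device would still run the fence resume and the register write on hardware that never came back.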
Re: [Nouveau] [PATCH] drm/nouveau/therm/gp100: Do not report temperature when subdev is shadowed
Well, fixing the return of wrong values in this function is reasonable by any means. Of course, not reading the memory in the first place would be nice, but deciding that is IMHO not in the scope of a temp_get function; it belongs somewhere in the code calling temp_get.

On 1/26/18 3:03 PM, Karol Herbst wrote:
Well, I just tried to say that you are not fixing the issue you think you're fixing. In your case the GPU is powered off and you get garbage values from any mmio read, so parsing those values is just wrong, and we need to prevent doing anything on the hw whenever it is powered off, directly in hwmon.

On Fri, Jan 26, 2018 at 2:40 PM, Tobias Klausmann wrote:
Not sure if I completely understand what you intend to say here. With this we prevent hwmon from reporting utterly wrong temperature values by returning an error (we could return -EBUSY or something instead, granted), yet if the device is shadowed, getting a sane temp value out of it seems unlikely to me!

Greetings,
Tobias

On 1/26/18 12:40 PM, Karol Herbst wrote:
No, we can't do that. We actually have to prevent this from hwmon. The issue here is that the reg read returns 0x and parsing that is the first step in the first place.

On Thu, Jan 25, 2018 at 7:16 PM, Tobias Klausmann wrote:
This fixes wrong temperature outputs e.g. 511°C if the card is asleep.

Signed-off-by: Tobias Klausmann
---
 drivers/gpu/drm/nouveau/nvkm/subdev/therm/gp100.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/therm/gp100.c b/drivers/gpu/drm/nouveau/nvkm/subdev/therm/gp100.c
index 9f0dea3f61dc..45d0ec632b5a 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/therm/gp100.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/therm/gp100.c
@@ -32,8 +32,10 @@ gp100_temp_get(struct nvkm_therm *therm)
 	u32 inttemp = (tsensor & 0x0001fff8);

 	/* device SHADOWed */
-	if (tsensor & 0x4000)
+	if (tsensor & 0x4000) {
 		nvkm_trace(subdev, "reading temperature from SHADOWed sensor\n");
+		return -ENODEV;
+	}

 	/* device valid */
 	if (tsensor & 0x2000)
-- 
2.16.1

___
Nouveau mailing list
nouv...@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau
Re: [Nouveau] [PATCH] drm/nouveau/therm/gp100: Do not report temperature when subdev is shadowed
Not sure if I completely understand what you intend to say here. With this we prevent hwmon from reporting utterly wrong temperature values by returning an error (we could return -EBUSY or something instead, granted), yet if the device is shadowed, getting a sane temp value out of it seems unlikely to me!

Greetings,
Tobias

On 1/26/18 12:40 PM, Karol Herbst wrote:
No, we can't do that. We actually have to prevent this from hwmon. The issue here is that the reg read returns 0x and parsing that is the first step in the first place.

On Thu, Jan 25, 2018 at 7:16 PM, Tobias Klausmann wrote:
This fixes wrong temperature outputs e.g. 511°C if the card is asleep.

Signed-off-by: Tobias Klausmann
---
 drivers/gpu/drm/nouveau/nvkm/subdev/therm/gp100.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/therm/gp100.c b/drivers/gpu/drm/nouveau/nvkm/subdev/therm/gp100.c
index 9f0dea3f61dc..45d0ec632b5a 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/therm/gp100.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/therm/gp100.c
@@ -32,8 +32,10 @@ gp100_temp_get(struct nvkm_therm *therm)
 	u32 inttemp = (tsensor & 0x0001fff8);

 	/* device SHADOWed */
-	if (tsensor & 0x4000)
+	if (tsensor & 0x4000) {
 		nvkm_trace(subdev, "reading temperature from SHADOWed sensor\n");
+		return -ENODEV;
+	}

 	/* device valid */
 	if (tsensor & 0x2000)
-- 
2.16.1
[PATCH] drm/nouveau/therm/gp100: Do not report temperature when subdev is shadowed
This fixes wrong temperature outputs e.g. 511°C if the card is asleep.

Signed-off-by: Tobias Klausmann
---
 drivers/gpu/drm/nouveau/nvkm/subdev/therm/gp100.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/nouveau/nvkm/subdev/therm/gp100.c b/drivers/gpu/drm/nouveau/nvkm/subdev/therm/gp100.c
index 9f0dea3f61dc..45d0ec632b5a 100644
--- a/drivers/gpu/drm/nouveau/nvkm/subdev/therm/gp100.c
+++ b/drivers/gpu/drm/nouveau/nvkm/subdev/therm/gp100.c
@@ -32,8 +32,10 @@ gp100_temp_get(struct nvkm_therm *therm)
 	u32 inttemp = (tsensor & 0x0001fff8);

 	/* device SHADOWed */
-	if (tsensor & 0x4000)
+	if (tsensor & 0x4000) {
 		nvkm_trace(subdev, "reading temperature from SHADOWed sensor\n");
+		return -ENODEV;
+	}

 	/* device valid */
 	if (tsensor & 0x2000)
-- 
2.16.1
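Editorial sketch (not part of the patch): a userspace model of why a sleeping card can report 511°C and what the early return changes. Only the temperature mask 0x0001fff8 is taken from the diff above. The flag bit positions, the shift by 8 and all demo_* names are assumptions made for illustration, chosen so that the flags do not overlap the temperature field. It further assumes that a register read from a powered-off device returns all ones, which is common for PCI devices but is not stated in the thread.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Assumed bit layout, for illustration only. */
#define DEMO_TSENSOR_SHADOWED	(UINT32_C(1) << 30)
#define DEMO_TSENSOR_VALID	(UINT32_C(1) << 29)
#define DEMO_TSENSOR_TEMP_MASK	UINT32_C(0x0001fff8)	/* mask from the diff */

/* Behaviour before the patch: the shadowed flag is not acted on. */
int demo_temp_get_unpatched(uint32_t tsensor)
{
	uint32_t inttemp = tsensor & DEMO_TSENSOR_TEMP_MASK;

	if (tsensor & DEMO_TSENSOR_VALID)
		return (int)(inttemp >> 8);
	return -ENODEV;
}

/* Behaviour after the patch: a shadowed sensor yields an error. */
int demo_temp_get(uint32_t tsensor)
{
	uint32_t inttemp = tsensor & DEMO_TSENSOR_TEMP_MASK;

	if (tsensor & DEMO_TSENSOR_SHADOWED)
		return -ENODEV;
	if (tsensor & DEMO_TSENSOR_VALID)
		return (int)(inttemp >> 8);
	return -ENODEV;
}

/* Self-check: an all-ones read versus a plausible valid reading. */
void demo_temp_selftest(void)
{
	uint32_t all_ones = UINT32_C(0xffffffff);
	uint32_t valid_42 = DEMO_TSENSOR_VALID | (UINT32_C(42) << 8);

	/* 0x0001fff8 >> 8 == 0x1ff == 511 */
	assert(demo_temp_get_unpatched(all_ones) == 511);
	assert(demo_temp_get(all_ones) == -ENODEV);
	assert(demo_temp_get(valid_42) == 42);
}
```

Under these assumptions the all-ones value sets every flag at once, so the patched function rejects it. As the replies point out, that is a side effect: the read itself is already meaningless when the GPU is powered off, and the check arguably belongs in the hwmon caller.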
Re: nouveau. swiotlb: coherent allocation failed for device 0000:01:00.0 size=2097152
On 12/18/17 7:06 PM, Mike Galbraith wrote:
Greetings,

Kernel bound workloads seem to trigger the below for whatever reason. I only see this when beating up NFS. There was a kworker wakeup latency issue, but with a bandaid applied to fix that up, I can still trigger this.

Hi,

I have seen this one as well with my system, but I could not find an easy way to trigger it for bisecting purposes. If you can trigger it conveniently, a bisect would be nice!

Greetings,
Tobias

[ 1313.811031] nouveau :01:00.0: swiotlb buffer is full (sz: 2097152 bytes)
[ 1313.811035] swiotlb: coherent allocation failed for device :01:00.0 size=2097152
[ 1313.811038] CPU: 6 PID: 3026 Comm: Xorg Tainted: GE 4.15.0.g1291a0d5-master #355
[ 1313.811040] Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.20C 09/23/2013
[ 1313.811041] Call Trace:
[ 1313.811049] dump_stack+0x7c/0xb6
[ 1313.811053] swiotlb_alloc_coherent+0x13f/0x150
[ 1313.811060] ttm_dma_pool_alloc_new_pages+0x106/0x3c0 [ttm]
[ 1313.811066] ttm_dma_pool_get_pages+0x10a/0x1e0 [ttm]
[ 1313.811070] ttm_dma_populate+0x21f/0x2f0 [ttm]
[ 1313.811075] ttm_tt_bind+0x2f/0x60 [ttm]
[ 1313.811079] ttm_bo_handle_move_mem+0x51f/0x580 [ttm]
[ 1313.811084] ? ttm_bo_handle_move_mem+0x5/0x580 [ttm]
[ 1313.811088] ttm_bo_validate+0x10c/0x120 [ttm]
[ 1313.811092] ? ttm_bo_validate+0x5/0x120 [ttm]
[ 1313.811106] ? drm_mode_setcrtc+0x20e/0x540 [drm]
[ 1313.811109] ttm_bo_init_reserved+0x290/0x490 [ttm]
[ 1313.84] ttm_bo_init+0x52/0xb0 [ttm]
[ 1313.811141] ? nv10_bo_put_tile_region+0x60/0x60 [nouveau]
[ 1313.811163] nouveau_bo_new+0x465/0x5e0 [nouveau]
[ 1313.811184] ? nv10_bo_put_tile_region+0x60/0x60 [nouveau]
[ 1313.811203] nouveau_gem_new+0x66/0x110 [nouveau]
[ 1313.811223] ? nouveau_gem_new+0x110/0x110 [nouveau]
[ 1313.811241] nouveau_gem_ioctl_new+0x48/0xc0 [nouveau]
[ 1313.811249] drm_ioctl_kernel+0x64/0xb0 [drm]
[ 1313.811257] drm_ioctl+0x2a4/0x360 [drm]
[ 1313.811276] ? nouveau_gem_new+0x110/0x110 [nouveau]
[ 1313.811285] ? drm_ioctl+0x5/0x360 [drm]
[ 1313.811304] nouveau_drm_ioctl+0x50/0xb0 [nouveau]
[ 1313.811308] do_vfs_ioctl+0x90/0x690
[ 1313.811311] ? do_vfs_ioctl+0x5/0x690
[ 1313.811313] SyS_ioctl+0x3b/0x70
[ 1313.811316] entry_SYSCALL_64_fastpath+0x1f/0x91
[ 1313.811320] RIP: 0033:0x7f3234746227
[ 1313.811321] RSP: 002b:7ffc3ace0408 EFLAGS: 3246 ORIG_RAX: 0010
[ 1313.811324] RAX: ffda RBX: 025515d0 RCX: 7f3234746227
[ 1313.811325] RDX: 7ffc3ace0460 RSI: c0306480 RDI: 000b
[ 1313.811326] RBP: 00824120 R08: 02548f80 R09: 025490d0
[ 1313.811328] R10: R11: 3246 R12: 093d
[ 1313.811329] R13: 02aff74c R14: 00824150 R15:
Re: Regression in TTM driver w/Linus' master
On 11/24/17 4:35 PM, Christian König wrote: Am 24.11.2017 um 16:17 schrieb Tobias Klausmann: On 11/24/17 3:54 PM, Daniel Vetter wrote: On Thu, Nov 23, 2017 at 03:24:38PM +0100, Tobias Klausmann wrote: On 11/23/17 2:58 AM, Dave Airlie wrote: On 23 November 2017 at 11:17, Laura Abbott wrote: Hi, Fedora QA testing reported a panic when booting up VMs using qmeu vga drivers (https://paste.fedoraproject.org/paste/498yRWTCJv2LKIrmj4EliQ) [ 30.108507] [ cut here ] [ 30.108920] kernel BUG at ./include/linux/gfp.h:408! [ 30.109356] invalid opcode: [#1] SMP [ 30.109700] Modules linked in: fuse nf_conntrack_netbios_ns nf_conntrack_broadcast xt_CT ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack devlink ip_set nfnetlink ebtable_nat ebtable_broute bridge ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack libcrc32c iptable_mangle iptable_raw iptable_security ebtable_filter ebtables ip6table_filter ip6_tables snd_hda_codec_generic kvm_intel kvm snd_hda_intel snd_hda_codec irqbypass ppdev snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm bochs_drm ttm joydev drm_kms_helper virtio_balloon snd_timer snd parport_pc drm soundcore parport i2c_piix4 nls_utf8 isofs squashfs zstd_decompress xxhash 8021q garp mrp stp llc virtio_net [ 30.115605] virtio_console virtio_scsi crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel serio_raw virtio_pci virtio_ring virtio ata_generic pata_acpi qemu_fw_cfg sunrpc scsi_transport_iscsi loop [ 30.117425] CPU: 0 PID: 1347 Comm: gnome-shell Not tainted 4.15.0-0.rc0.git6.1.fc28.x86_64 #1 [ 30.118141] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-2.fc27 04/01/2014 [ 30.118866] task: 923a77e03380 task.stack: a78182228000 [ 30.119366] RIP: 0010:__alloc_pages_nodemask+0x35e/0x430 [ 30.119810] RSP: :a7818222bba8 EFLAGS: 00010202 [ 30.120250] RAX: 0001 RBX: 014382c6 RCX: 0006 [ 30.120840] RDX: 
RSI: 0009 RDI: [ 30.121443] RBP: 923a760d6000 R08: R09: 0006 [ 30.122039] R10: 0040 R11: 0300 R12: 923a729273c0 [ 30.122629] R13: R14: R15: 923a7483d400 [ 30.123223] FS: 7fe48da7dac0() GS:923a7cc0() knlGS: [ 30.123896] CS: 0010 DS: ES: CR0: 80050033 [ 30.124373] CR2: 7fe457b73000 CR3: 78313000 CR4: 06f0 [ 30.124968] Call Trace: [ 30.125186] ttm_pool_populate+0x19b/0x400 [ttm] [ 30.125578] ttm_bo_vm_fault+0x325/0x570 [ttm] [ 30.125964] __do_fault+0x19/0x11e [ 30.126255] __handle_mm_fault+0xcd3/0x1260 [ 30.126609] handle_mm_fault+0x14c/0x310 [ 30.126947] __do_page_fault+0x28c/0x530 [ 30.127282] do_page_fault+0x32/0x270 [ 30.127593] async_page_fault+0x22/0x30 [ 30.127922] RIP: 0033:0x7fe48aae39a8 [ 30.128225] RSP: 002b:7ffc21c4d928 EFLAGS: 00010206 [ 30.128664] RAX: 7fe457b73000 RBX: 55cd4c1041a0 RCX: 7fe457b73040 [ 30.129259] RDX: 0030 RSI: RDI: 7fe457b73000 [ 30.129855] RBP: 0300 R08: 000c R09: 0001 [ 30.130457] R10: 0001 R11: 0246 R12: 55cd4c1041a0 [ 30.131054] R13: 55cd4bdfe990 R14: 55cd4c104110 R15: 0400 [ 30.131648] Code: 11 01 00 0f 84 a9 00 00 00 65 ff 0d 6d cc dd 44 e9 0f ff ff ff 40 80 cd 80 e9 99 fe ff ff 48 89 c7 e8 e7 f6 01 00 e9 b7 fe ff ff <0f> 0b 0f ff e9 40 fd ff ff 65 48 8b 04 25 80 d5 00 00 8b 40 4c [ 30.133245] RIP: __alloc_pages_nodemask+0x35e/0x430 RSP: a7818222bba8 [ 30.133836] ---[ end trace d4f1deb60784f40a ]--- This is based off of Linus' master branch at c8a0739b185d11d6e2ca7ad9f5835841d1cfc765 Configs are at https://git.kernel.org/pub/scm/linux/kernel/git/jwboyer/fedora.git/commit/?h=rawhide&id=0be14662c54f49b4e640868b9d67df18d39edff0 Looks like a TTM regression due to: 0284f1ead87463bc17cf5e81a24fc65c052486f3 drm/ttm: add transparent huge page support for cached allocations v2 If the driver requests dma32 pages, we can end up trying to alloc huge dma32 pages which triggers the oops. The bochs driver always requests dma32 here. I'll send a rough patch once I boot it. Dave. 
Hi Dave, FYI only: it looks like this is not the only regression in this cycle with ttm; nouveau seems to suffer as well [1].

Adding ttm folks. Might be useful if we have an entry for ttm in MAINTAINERS ... -Daniel

A bit more investigation into the nouveau regression: this only shows when Transparent Hugepages (CONFIG_TRANSPARENT_HUGEPAGE) are enabled. Thanks Dave for pointing me to that!

Yeah, sorry for that. I missed handling the DMA32 case with transpare
Re: Regression in TTM driver w/Linus' master
On 11/24/17 3:54 PM, Daniel Vetter wrote: On Thu, Nov 23, 2017 at 03:24:38PM +0100, Tobias Klausmann wrote: On 11/23/17 2:58 AM, Dave Airlie wrote: On 23 November 2017 at 11:17, Laura Abbott wrote: Hi, Fedora QA testing reported a panic when booting up VMs using qmeu vga drivers (https://paste.fedoraproject.org/paste/498yRWTCJv2LKIrmj4EliQ) [ 30.108507] [ cut here ] [ 30.108920] kernel BUG at ./include/linux/gfp.h:408! [ 30.109356] invalid opcode: [#1] SMP [ 30.109700] Modules linked in: fuse nf_conntrack_netbios_ns nf_conntrack_broadcast xt_CT ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack devlink ip_set nfnetlink ebtable_nat ebtable_broute bridge ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack libcrc32c iptable_mangle iptable_raw iptable_security ebtable_filter ebtables ip6table_filter ip6_tables snd_hda_codec_generic kvm_intel kvm snd_hda_intel snd_hda_codec irqbypass ppdev snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm bochs_drm ttm joydev drm_kms_helper virtio_balloon snd_timer snd parport_pc drm soundcore parport i2c_piix4 nls_utf8 isofs squashfs zstd_decompress xxhash 8021q garp mrp stp llc virtio_net [ 30.115605] virtio_console virtio_scsi crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel serio_raw virtio_pci virtio_ring virtio ata_generic pata_acpi qemu_fw_cfg sunrpc scsi_transport_iscsi loop [ 30.117425] CPU: 0 PID: 1347 Comm: gnome-shell Not tainted 4.15.0-0.rc0.git6.1.fc28.x86_64 #1 [ 30.118141] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-2.fc27 04/01/2014 [ 30.118866] task: 923a77e03380 task.stack: a78182228000 [ 30.119366] RIP: 0010:__alloc_pages_nodemask+0x35e/0x430 [ 30.119810] RSP: :a7818222bba8 EFLAGS: 00010202 [ 30.120250] RAX: 0001 RBX: 014382c6 RCX: 0006 [ 30.120840] RDX: RSI: 0009 RDI: [ 30.121443] RBP: 923a760d6000 R08: R09: 0006 [ 30.122039] R10: 0040 R11: 0300 
R12: 923a729273c0 [ 30.122629] R13: R14: R15: 923a7483d400 [ 30.123223] FS: 7fe48da7dac0() GS:923a7cc0() knlGS: [ 30.123896] CS: 0010 DS: ES: CR0: 80050033 [ 30.124373] CR2: 7fe457b73000 CR3: 78313000 CR4: 06f0 [ 30.124968] Call Trace: [ 30.125186] ttm_pool_populate+0x19b/0x400 [ttm] [ 30.125578] ttm_bo_vm_fault+0x325/0x570 [ttm] [ 30.125964] __do_fault+0x19/0x11e [ 30.126255] __handle_mm_fault+0xcd3/0x1260 [ 30.126609] handle_mm_fault+0x14c/0x310 [ 30.126947] __do_page_fault+0x28c/0x530 [ 30.127282] do_page_fault+0x32/0x270 [ 30.127593] async_page_fault+0x22/0x30 [ 30.127922] RIP: 0033:0x7fe48aae39a8 [ 30.128225] RSP: 002b:7ffc21c4d928 EFLAGS: 00010206 [ 30.128664] RAX: 7fe457b73000 RBX: 55cd4c1041a0 RCX: 7fe457b73040 [ 30.129259] RDX: 0030 RSI: RDI: 7fe457b73000 [ 30.129855] RBP: 0300 R08: 000c R09: 0001 [ 30.130457] R10: 0001 R11: 0246 R12: 55cd4c1041a0 [ 30.131054] R13: 55cd4bdfe990 R14: 55cd4c104110 R15: 0400 [ 30.131648] Code: 11 01 00 0f 84 a9 00 00 00 65 ff 0d 6d cc dd 44 e9 0f ff ff ff 40 80 cd 80 e9 99 fe ff ff 48 89 c7 e8 e7 f6 01 00 e9 b7 fe ff ff <0f> 0b 0f ff e9 40 fd ff ff 65 48 8b 04 25 80 d5 00 00 8b 40 4c [ 30.133245] RIP: __alloc_pages_nodemask+0x35e/0x430 RSP: a7818222bba8 [ 30.133836] ---[ end trace d4f1deb60784f40a ]--- This is based off of Linus' master branch at c8a0739b185d11d6e2ca7ad9f5835841d1cfc765 Configs are at https://git.kernel.org/pub/scm/linux/kernel/git/jwboyer/fedora.git/commit/?h=rawhide&id=0be14662c54f49b4e640868b9d67df18d39edff0 Looks like a TTM regression due to: 0284f1ead87463bc17cf5e81a24fc65c052486f3 drm/ttm: add transparent huge page support for cached allocations v2 If the driver requests dma32 pages, we can end up trying to alloc huge dma32 pages which triggers the oops. The bochs driver always requests dma32 here. I'll send a rough patch once I boot it. Dave. Hi Dave, fyi only: It looks like this is not the only regression in this cycle with ttm, novueau seems to suffer as well [1]. Adding ttm folks. 
Might be useful if we have an entry for ttm in MAINTAINERS ... -Daniel

A bit more investigation into the nouveau regression: this only shows when Transparent Hugepages (CONFIG_TRANSPARENT_HUGEPAGE) are enabled. Thanks Dave for pointing me to that!

Greetings,
Tobias

Greetings,
Tobias

[1]:
[ 404.918139] [ cut here ]
[ 404.918147] kernel BUG at mm/shmem.c:4334!
[ 404.918152] invalid opcode: [#2] PREEM
Re: Regression in TTM driver w/Linus' master
On 11/23/17 2:58 AM, Dave Airlie wrote: On 23 November 2017 at 11:17, Laura Abbott wrote: Hi, Fedora QA testing reported a panic when booting up VMs using qmeu vga drivers (https://paste.fedoraproject.org/paste/498yRWTCJv2LKIrmj4EliQ) [ 30.108507] [ cut here ] [ 30.108920] kernel BUG at ./include/linux/gfp.h:408! [ 30.109356] invalid opcode: [#1] SMP [ 30.109700] Modules linked in: fuse nf_conntrack_netbios_ns nf_conntrack_broadcast xt_CT ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 xt_conntrack devlink ip_set nfnetlink ebtable_nat ebtable_broute bridge ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack libcrc32c iptable_mangle iptable_raw iptable_security ebtable_filter ebtables ip6table_filter ip6_tables snd_hda_codec_generic kvm_intel kvm snd_hda_intel snd_hda_codec irqbypass ppdev snd_hda_core snd_hwdep snd_seq snd_seq_device snd_pcm bochs_drm ttm joydev drm_kms_helper virtio_balloon snd_timer snd parport_pc drm soundcore parport i2c_piix4 nls_utf8 isofs squashfs zstd_decompress xxhash 8021q garp mrp stp llc virtio_net [ 30.115605] virtio_console virtio_scsi crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel serio_raw virtio_pci virtio_ring virtio ata_generic pata_acpi qemu_fw_cfg sunrpc scsi_transport_iscsi loop [ 30.117425] CPU: 0 PID: 1347 Comm: gnome-shell Not tainted 4.15.0-0.rc0.git6.1.fc28.x86_64 #1 [ 30.118141] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-2.fc27 04/01/2014 [ 30.118866] task: 923a77e03380 task.stack: a78182228000 [ 30.119366] RIP: 0010:__alloc_pages_nodemask+0x35e/0x430 [ 30.119810] RSP: :a7818222bba8 EFLAGS: 00010202 [ 30.120250] RAX: 0001 RBX: 014382c6 RCX: 0006 [ 30.120840] RDX: RSI: 0009 RDI: [ 30.121443] RBP: 923a760d6000 R08: R09: 0006 [ 30.122039] R10: 0040 R11: 0300 R12: 923a729273c0 [ 30.122629] R13: R14: R15: 923a7483d400 [ 30.123223] FS: 7fe48da7dac0() GS:923a7cc0() 
knlGS: [ 30.123896] CS: 0010 DS: ES: CR0: 80050033 [ 30.124373] CR2: 7fe457b73000 CR3: 78313000 CR4: 06f0 [ 30.124968] Call Trace: [ 30.125186] ttm_pool_populate+0x19b/0x400 [ttm] [ 30.125578] ttm_bo_vm_fault+0x325/0x570 [ttm] [ 30.125964] __do_fault+0x19/0x11e [ 30.126255] __handle_mm_fault+0xcd3/0x1260 [ 30.126609] handle_mm_fault+0x14c/0x310 [ 30.126947] __do_page_fault+0x28c/0x530 [ 30.127282] do_page_fault+0x32/0x270 [ 30.127593] async_page_fault+0x22/0x30 [ 30.127922] RIP: 0033:0x7fe48aae39a8 [ 30.128225] RSP: 002b:7ffc21c4d928 EFLAGS: 00010206 [ 30.128664] RAX: 7fe457b73000 RBX: 55cd4c1041a0 RCX: 7fe457b73040 [ 30.129259] RDX: 0030 RSI: RDI: 7fe457b73000 [ 30.129855] RBP: 0300 R08: 000c R09: 0001 [ 30.130457] R10: 0001 R11: 0246 R12: 55cd4c1041a0 [ 30.131054] R13: 55cd4bdfe990 R14: 55cd4c104110 R15: 0400 [ 30.131648] Code: 11 01 00 0f 84 a9 00 00 00 65 ff 0d 6d cc dd 44 e9 0f ff ff ff 40 80 cd 80 e9 99 fe ff ff 48 89 c7 e8 e7 f6 01 00 e9 b7 fe ff ff <0f> 0b 0f ff e9 40 fd ff ff 65 48 8b 04 25 80 d5 00 00 8b 40 4c [ 30.133245] RIP: __alloc_pages_nodemask+0x35e/0x430 RSP: a7818222bba8 [ 30.133836] ---[ end trace d4f1deb60784f40a ]--- This is based off of Linus' master branch at c8a0739b185d11d6e2ca7ad9f5835841d1cfc765 Configs are at https://git.kernel.org/pub/scm/linux/kernel/git/jwboyer/fedora.git/commit/?h=rawhide&id=0be14662c54f49b4e640868b9d67df18d39edff0 Looks like a TTM regression due to: 0284f1ead87463bc17cf5e81a24fc65c052486f3 drm/ttm: add transparent huge page support for cached allocations v2 If the driver requests dma32 pages, we can end up trying to alloc huge dma32 pages which triggers the oops. The bochs driver always requests dma32 here. I'll send a rough patch once I boot it. Dave. Hi Dave, fyi only: It looks like this is not the only regression in this cycle with ttm, novueau seems to suffer as well [1]. Greetings, Tobias [1]: [ 404.918139] [ cut here ] [ 404.918147] kernel BUG at mm/shmem.c:4334! 
[ 404.918152] invalid opcode: [#2] PREEMPT SMP [ 404.918157] Modules linked in: rfcomm af_packet bnep uvcvideo videobuf2_vmalloc videobuf2_memops rtsx_usb_ms videobuf2_v4l2 memstick videodev videobuf2_core btusb btrtl btbcm arc4 msr snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic joydev nls_iso8859_1 nls_cp437 hid_multitouch vfat fat iTCO_wdt iTCO_vendor_support intel_rapl x86_pkg_temp_thermal intel_powerclamp ath10k_pci coretemp ath10k_core a
Re: [Nouveau] [PATCH] drm/nouveau/mpeg: print more debug info when rejecting dma objects
Hi,

Lgtm!

Reviewed-by: Tobias Klausmann

On 8/6/17 4:19 AM, Ilia Mirkin wrote:
> Signed-off-by: Ilia Mirkin
> ---
>
> This was helpful when debugging our earlier mpeg woes. May as well have it
> upstream.
>
> drivers/gpu/drm/nouveau/nvkm/engine/mpeg/nv31.c | 7 ++-
> drivers/gpu/drm/nouveau/nvkm/engine/mpeg/nv40.c | 7 ++-
> 2 files changed, 12 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/nouveau/nvkm/engine/mpeg/nv31.c
> b/drivers/gpu/drm/nouveau/nvkm/engine/mpeg/nv31.c
> index 8a8895246d26..99f33d88d940 100644
> --- a/drivers/gpu/drm/nouveau/nvkm/engine/mpeg/nv31.c
> +++ b/drivers/gpu/drm/nouveau/nvkm/engine/mpeg/nv31.c
> @@ -124,6 +124,8 @@ nv31_mpeg_tile(struct nvkm_engine *engine, int i, struct
> nvkm_fb_tile *tile)
>  static bool
>  nv31_mpeg_mthd_dma(struct nvkm_device *device, u32 mthd, u32 data)
>  {
> +	struct nv31_mpeg *mpeg = nv31_mpeg(device->mpeg);
> +	struct nvkm_subdev *subdev = &mpeg->engine.subdev;
>  	u32 inst = data << 4;
>  	u32 dma0 = nvkm_rd32(device, 0x70 + inst);
>  	u32 dma1 = nvkm_rd32(device, 0x74 + inst);
> @@ -132,8 +134,11 @@ nv31_mpeg_mthd_dma(struct nvkm_device *device, u32 mthd,
> u32 data)
>  	u32 size = dma1 + 1;
>
>  	/* only allow linear DMA objects */
> -	if (!(dma0 & 0x2000))
> +	if (!(dma0 & 0x2000)) {
> +		nvkm_error(subdev, "inst %08x dma0 %08x dma1 %08x dma2 %08x\n",
> +			   inst, dma0, dma1, dma2);
>  		return false;
> +	}
>
>  	if (mthd == 0x0190) {
>  		/* DMA_CMD */
> diff --git a/drivers/gpu/drm/nouveau/nvkm/engine/mpeg/nv40.c
> b/drivers/gpu/drm/nouveau/nvkm/engine/mpeg/nv40.c
> index 16de5bd94b14..b5ec7c504dc6 100644
> --- a/drivers/gpu/drm/nouveau/nvkm/engine/mpeg/nv40.c
> +++ b/drivers/gpu/drm/nouveau/nvkm/engine/mpeg/nv40.c
> @@ -31,6 +31,8 @@ bool
>  nv40_mpeg_mthd_dma(struct nvkm_device *device, u32 mthd, u32 data)
>  {
>  	struct nvkm_instmem *imem = device->imem;
> +	struct nv31_mpeg *mpeg = nv31_mpeg(device->mpeg);
> +	struct nvkm_subdev *subdev = &mpeg->engine.subdev;
>  	u32 inst = data << 4;
>  	u32 dma0 = nvkm_instmem_rd32(imem, inst + 0);
>  	u32 dma1 = nvkm_instmem_rd32(imem, inst + 4);
> @@ -39,8 +41,11 @@ nv40_mpeg_mthd_dma(struct nvkm_device *device, u32 mthd,
> u32 data)
>  	u32 size = dma1 + 1;
>
>  	/* only allow linear DMA objects */
> -	if (!(dma0 & 0x2000))
> +	if (!(dma0 & 0x2000)) {
> +		nvkm_error(subdev, "inst %08x dma0 %08x dma1 %08x dma2 %08x\n",
> +			   inst, dma0, dma1, dma2);
>  		return false;
> +	}
>
>  	if (mthd == 0x0190) {
>  		/* DMA_CMD */
Re: [PATCH 17/29] drm/nouveau: switch to drm_*{get,put} helpers
Looks good to me!

Reviewed-by: Tobias Klausmann

On 8/3/17 1:58 PM, Cihangir Akturk wrote:
> drm_*_reference() and drm_*_unreference() functions are just
> compatibility aliases for drm_*_get() and drm_*_put() and should not be
> used by new code. So convert all users of compatibility functions to use
> the new APIs.
>
> Signed-off-by: Cihangir Akturk
> ---
> drivers/gpu/drm/nouveau/dispnv04/crtc.c | 2 +-
> drivers/gpu/drm/nouveau/nouveau_abi16.c | 2 +-
> drivers/gpu/drm/nouveau/nouveau_display.c | 8
> drivers/gpu/drm/nouveau/nouveau_fbcon.c | 2 +-
> drivers/gpu/drm/nouveau/nouveau_gem.c | 14 +++---
> drivers/gpu/drm/nouveau/nv50_display.c | 2 +-
> 6 files changed, 15 insertions(+), 15 deletions(-)
>
> diff --git a/drivers/gpu/drm/nouveau/dispnv04/crtc.c
> b/drivers/gpu/drm/nouveau/dispnv04/crtc.c
> index 4b4b0b4..18b4be1 100644
> --- a/drivers/gpu/drm/nouveau/dispnv04/crtc.c
> +++ b/drivers/gpu/drm/nouveau/dispnv04/crtc.c
> @@ -1019,7 +1019,7 @@ nv04_crtc_cursor_set(struct drm_crtc *crtc, struct
> drm_file *file_priv,
>  	nv_crtc->cursor.set_offset(nv_crtc, nv_crtc->cursor.offset);
>  	nv_crtc->cursor.show(nv_crtc, true);
>  out:
> -	drm_gem_object_unreference_unlocked(gem);
> +	drm_gem_object_put_unlocked(gem);
>  	return ret;
>  }
>
> diff --git a/drivers/gpu/drm/nouveau/nouveau_abi16.c
> b/drivers/gpu/drm/nouveau/nouveau_abi16.c
> index f98f800..3e9db5a 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_abi16.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_abi16.c
> @@ -136,7 +136,7 @@ nouveau_abi16_chan_fini(struct nouveau_abi16 *abi16,
>  	if (chan->ntfy) {
>  		nouveau_bo_vma_del(chan->ntfy, &chan->ntfy_vma);
>  		nouveau_bo_unpin(chan->ntfy);
> -		drm_gem_object_unreference_unlocked(&chan->ntfy->gem);
> +		drm_gem_object_put_unlocked(&chan->ntfy->gem);
>  	}
>
>  	if (chan->heap.block_size)
> diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c
> b/drivers/gpu/drm/nouveau/nouveau_display.c
> index 8d1df56..a68fe1a 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_display.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_display.c
> @@ -206,7 +206,7 @@ nouveau_user_framebuffer_destroy(struct drm_framebuffer
> *drm_fb)
>  	struct nouveau_framebuffer *fb = nouveau_framebuffer(drm_fb);
>
>  	if (fb->nvbo)
> -		drm_gem_object_unreference_unlocked(&fb->nvbo->gem);
> +		drm_gem_object_put_unlocked(&fb->nvbo->gem);
>
>  	drm_framebuffer_cleanup(drm_fb);
>  	kfree(fb);
> @@ -267,7 +267,7 @@ nouveau_user_framebuffer_create(struct drm_device *dev,
>  	if (ret == 0)
>  		return &fb->base;
>
> -	drm_gem_object_unreference_unlocked(gem);
> +	drm_gem_object_put_unlocked(gem);
>  	return ERR_PTR(ret);
>  }
>
> @@ -947,7 +947,7 @@ nouveau_display_dumb_create(struct drm_file *file_priv,
> struct drm_device *dev,
>  		return ret;
>
>  	ret = drm_gem_handle_create(file_priv, &bo->gem, &args->handle);
> -	drm_gem_object_unreference_unlocked(&bo->gem);
> +	drm_gem_object_put_unlocked(&bo->gem);
>  	return ret;
>  }
>
> @@ -962,7 +962,7 @@ nouveau_display_dumb_map_offset(struct drm_file
> *file_priv,
>  	if (gem) {
>  		struct nouveau_bo *bo = nouveau_gem_object(gem);
>  		*poffset = drm_vma_node_offset_addr(&bo->bo.vma_node);
> -		drm_gem_object_unreference_unlocked(gem);
> +		drm_gem_object_put_unlocked(gem);
>  		return 0;
>  	}
>
> diff --git a/drivers/gpu/drm/nouveau/nouveau_fbcon.c
> b/drivers/gpu/drm/nouveau/nouveau_fbcon.c
> index 2665a07..6c9e1ec 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_fbcon.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_fbcon.c
> @@ -451,7 +451,7 @@ nouveau_fbcon_destroy(struct drm_device *dev, struct
> nouveau_fbdev *fbcon)
>  		nouveau_bo_vma_del(nouveau_fb->nvbo, &nouveau_fb->vma);
>  		nouveau_bo_unmap(nouveau_fb->nvbo);
>  		nouveau_bo_unpin(nouveau_fb->nvbo);
> -		drm_framebuffer_unreference(&nouveau_fb->base);
> +		drm_framebuffer_put(&nouveau_fb->base);
>  	}
>
>  	return 0;
> diff --git a/drivers/gpu/drm/nouveau/nouveau_gem.c
> b/drivers/gpu/drm/nouveau/nouveau_gem.c
> index 2170534..653425c 100644
> --- a/drivers/gpu/drm/nouveau/nouveau_gem.c
> +++ b/drivers/gpu/drm/nouveau/nouveau_gem.c
> @@ -281,7 +281,7 @@ nouveau_gem_ioctl_new(struct drm_device *dev, vo
[PATCH] drm: disable vblank only if it got previously enabled
Mimic the behavior of vblank_disable_fn(), another caller of drm_vblank_disable_and_save().

This avoids an oops when trying to disable vblank on a display that is not connected:

[ 12.768079] WARNING: CPU: 0 PID: 274 at drivers/gpu/drm/drm_vblank.c:609 drm_calc_vbltimestamp_from_scanoutpos+0x296/0x320 [drm]
[ 12.768080] Modules linked in: bnep snd_hda_codec_hdmi rtsx_usb_sdmmc uvcvideo rtsx_usb_ms mmc_core videobuf2_vmalloc memstick videobuf2_memops videobuf2_v4l2 videobuf2_core rtsx_usb videodev btusb btrtl arc4 snd_hda_codec_realtek snd_hda_codec_generic joydev nls_iso8859_1 hid_multitouch nls_cp437 intel_rapl x86_pkg_temp_thermal intel_powerclamp vfat coretemp fat kvm_intel iTCO_wdt iTCO_vendor_support kvm irqbypass crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc aesni_intel ath10k_pci snd_hda_intel ath10k_core aes_x86_64 snd_hda_codec crypto_simd ath glue_helper cryptd snd_hda_core mac80211 snd_hwdep snd_pcm pcspkr r8169 cfg80211 mii snd_timer acer_wmi snd sparse_keymap wmi_bmof idma64 hci_uart virt_dma mei_me soundcore i2c_i801 mei btbcm shpchp intel_lpss_pci intel_pch_thermal
[ 12.768130] serdev btqca ucsi_acpi btintel typec_ucsi thermal typec bluetooth ecdh_generic battery ac pinctrl_sunrisepoint rfkill intel_lpss_acpi pinctrl_intel intel_lpss acpi_pad nouveau serio_raw i915 mxm_wmi ttm i2c_algo_bit drm_kms_helper xhci_pci syscopyarea sysfillrect sysimgblt xhci_hcd fb_sys_fops usbcore drm i2c_hid wmi video button sg efivarfs
[ 12.768158] CPU: 0 PID: 274 Comm: kworker/0:2 Not tainted 4.12.0-desktop-debug-drm+ #2
[ 12.768160] Hardware name: Acer Aspire VN7-593G/Pluto_KLS, BIOS V1.04 03/30/2017
[ 12.768164] Workqueue: pm pm_runtime_work
[ 12.768166] task: 889bf1627040 task.stack: 9541013e4000
[ 12.768180] RIP: 0010:drm_calc_vbltimestamp_from_scanoutpos+0x296/0x320 [drm]
[ 12.768181] RSP: 0018:9541013e7b30 EFLAGS: 00010086
[ 12.768183] RAX: 001c RBX: 889b4cebd000 RCX: 0004
[ 12.768184] RDX: 8004 RSI: 87a2d952 RDI:
[ 12.768186] RBP: 9541013e7b90 R08: 0001 R09: 039f
[ 12.768187] R10: c05fe530 R11: R12:
[ 12.768188] R13: 9541013e7ba4 R14: 889bf0426088 R15: 889bf0426000
[ 12.768190] FS: () GS:889bfec0() knlGS:
[ 12.768191] CS: 0010 DS: ES: CR0: 80050033
[ 12.768192] CR2: 00edb16580b8 CR3: 00020cc09000 CR4: 003406f0
[ 12.768193] Call Trace:
[ 12.768198] ? enqueue_task_fair+0x64/0x600
[ 12.768211] ? drm_get_last_vbltimestamp+0x47/0x70 [drm]
[ 12.768223] ? drm_update_vblank_count+0x65/0x240 [drm]
[ 12.768227] ? pci_pm_runtime_resume+0xa0/0xa0
[ 12.768238] ? drm_vblank_disable_and_save+0x55/0xc0 [drm]
[ 12.768250] ? drm_crtc_vblank_off+0xa9/0x1e0 [drm]
[ 12.768253] ? pci_pm_runtime_resume+0xa0/0xa0
[ 12.768299] ? nouveau_display_fini+0x56/0xd0 [nouveau]
[ 12.768339] ? nouveau_display_suspend+0x51/0x110 [nouveau]
[ 12.768378] ? nouveau_do_suspend+0x76/0x1c0 [nouveau]
[ 12.768413] ? nouveau_pmops_runtime_suspend+0x54/0xb0 [nouveau]
[ 12.768416] ? pci_pm_runtime_suspend+0x5c/0x160
[ 12.768419] ? __rpm_callback+0xb6/0x1e0
[ 12.768423] ? kobject_uevent_env+0x111/0x5e0
[ 12.768425] ? pci_pm_runtime_resume+0xa0/0xa0
[ 12.768427] ? rpm_callback+0x1f/0x70
[ 12.768429] ? pci_pm_runtime_resume+0xa0/0xa0
[ 12.768431] ? rpm_suspend+0x11f/0x640
[ 12.768441] ? drm_fb_helper_hotplug_event+0x9a/0xe0 [drm_kms_helper]
[ 12.768447] ? output_poll_execute+0x17b/0x1a0 [drm_kms_helper]
[ 12.768449] ? pm_runtime_work+0x64/0xa0
[ 12.768453] ? process_one_work+0x1db/0x410
[ 12.768456] ? worker_thread+0x47/0x3d0
[ 12.768459] ? process_one_work+0x410/0x410
[ 12.768461] ? kthread+0x117/0x130
[ 12.768463] ? kthread_create_on_node+0x40/0x40
[ 12.768466] ? ret_from_fork+0x25/0x30
[ 12.768468] Code: 80 3d 26 f3 01 00 00 0f 85 ad fd ff ff 48 8b 43 20 48 c7 c7 31 a2 20 c0 c6 05 0e f3 01 00 01 48 8b b0 60 01 00 00 e8 75 2e ec c6 <0f> ff e9 88 fd ff ff 31 f6 44 88 55 b0 e8 38 fa ed c6 44 0f b6
[ 12.768508] ---[ end trace d9bb853af3659bd5 ]---

Signed-off-by: Tobias Klausmann
---
 drivers/gpu/drm/drm_vblank.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/drm_vblank.c b/drivers/gpu/drm/drm_vblank.c
index a233a6be934a..4a21756bf2bd 100644
--- a/drivers/gpu/drm/drm_vblank.c
+++ b/drivers/gpu/drm/drm_vblank.c
@@ -1140,8 +1140,11 @@ void drm_crtc_vblank_off(struct drm_crtc *crtc)
 	/* Avoid redundant vblank disables without previous
 	 * drm_crtc_vblank_on(). */
-	if (drm_core_check_feature(dev, DRIVER_ATOMIC) || !vblank->inmodeset)
+	if (drm_core_check_feature(dev, DRIVER_ATOMIC) || (!vblank->inmodeset &&
+	    vblank->enabled)) {
+		DRM_DEBUG(&q
Re: [Nouveau] [PATCH] drm: disable vblank only if it got previously enabled
Mh ok, paper over in nouveau_display_fini until Ben comes up with a better idea then?! Greetings, Tobias On 7/20/17 10:13 AM, Daniel Vetter wrote: > On Wed, Jul 19, 2017 at 04:10:50PM -0400, Ilia Mirkin wrote: >> I believe the solution is to not call drm_crtc_vblank_off for atomic >> modesetting in nouveau_display_fini. I think Ben's working on it. > Yes, the goal of vblank_on/off was very much to not paper over driver bugs > with clever tricks like these. If the driver cant keep track of its > vblank, something has gone wrong, and the core should _not_ fix it up. > Otherwise we're back to the old style vblank horror show. > > Thanks, Daniel > >> On Wed, Jul 19, 2017 at 1:25 PM, Tobias Klausmann >> wrote: >>> mimic the behavior of vblank_disable_fn(), another caller of >>> drm_vblank_disable_and_save(). >>> >>> This avoids oopsing, while trying to disable vblank on a not connected >>> display: >>> >>> [ 12.768079] WARNING: CPU: 0 PID: 274 at drivers/gpu/drm/drm_vblank.c:609 >>> drm_calc_vbltimestamp_from_scanoutpos+0x296/0x320 [drm] >>> [ 12.768080] Modules linked in: bnep snd_hda_codec_hdmi rtsx_usb_sdmmc >>> uvcvideo rtsx_usb_ms mmc_core videobuf2_vmalloc memstick videobuf2_memops >>> videobuf2_v4l2 videobuf2_core rtsx_usb videodev btusb btrtl arc4 >>> snd_hda_codec_realtek snd_hda_codec_generic joydev nls_iso8859_1 >>> hid_multitouch nls_cp437 intel_rapl x86_pkg_temp_thermal intel_powerclamp >>> vfat coretemp fat kvm_intel iTCO_wdt iTCO_vendor_support kvm irqbypass >>> crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc >>> aesni_intel ath10k_pci snd_hda_intel ath10k_core aes_x86_64 snd_hda_codec >>> crypto_simd ath glue_helper cryptd snd_hda_core mac80211 snd_hwdep snd_pcm >>> pcspkr r8169 cfg80211 mii snd_timer acer_wmi snd sparse_keymap wmi_bmof >>> idma64 hci_uart virt_dma mei_me soundcore i2c_i801 mei btbcm shpchp >>> intel_lpss_pci intel_pch_thermal >>> [ 12.768130] serdev btqca ucsi_acpi btintel typec_ucsi thermal typec >>> bluetooth 
ecdh_generic battery ac pinctrl_sunrisepoint rfkill >>> intel_lpss_acpi pinctrl_intel intel_lpss acpi_pad nouveau serio_raw i915 >>> mxm_wmi ttm i2c_algo_bit drm_kms_helper xhci_pci syscopyarea sysfillrect >>> sysimgblt xhci_hcd fb_sys_fops usbcore drm i2c_hid wmi video button sg >>> efivarfs >>> [ 12.768158] CPU: 0 PID: 274 Comm: kworker/0:2 Not tainted >>> 4.12.0-desktop-debug-drm+ #2 >>> [ 12.768160] Hardware name: Acer Aspire VN7-593G/Pluto_KLS, BIOS V1.04 >>> 03/30/2017 >>> [ 12.768164] Workqueue: pm pm_runtime_work >>> [ 12.768166] task: 889bf1627040 task.stack: 9541013e4000 >>> [ 12.768180] RIP: 0010:drm_calc_vbltimestamp_from_scanoutpos+0x296/0x320 >>> [drm] >>> [ 12.768181] RSP: 0018:9541013e7b30 EFLAGS: 00010086 >>> [ 12.768183] RAX: 001c RBX: 889b4cebd000 RCX: >>> 0004 >>> [ 12.768184] RDX: 8004 RSI: 87a2d952 RDI: >>> >>> [ 12.768186] RBP: 9541013e7b90 R08: 0001 R09: >>> 039f >>> [ 12.768187] R10: c05fe530 R11: R12: >>> >>> [ 12.768188] R13: 9541013e7ba4 R14: 889bf0426088 R15: >>> 889bf0426000 >>> [ 12.768190] FS: () GS:889bfec0() >>> knlGS: >>> [ 12.768191] CS: 0010 DS: ES: CR0: 80050033 >>> [ 12.768192] CR2: 00edb16580b8 CR3: 00020cc09000 CR4: >>> 003406f0 >>> [ 12.768193] Call Trace: >>> [ 12.768198] ? enqueue_task_fair+0x64/0x600 >>> [ 12.768211] ? drm_get_last_vbltimestamp+0x47/0x70 [drm] >>> [ 12.768223] ? drm_update_vblank_count+0x65/0x240 [drm] >>> [ 12.768227] ? pci_pm_runtime_resume+0xa0/0xa0 >>> [ 12.768238] ? drm_vblank_disable_and_save+0x55/0xc0 [drm] >>> [ 12.768250] ? drm_crtc_vblank_off+0xa9/0x1e0 [drm] >>> [ 12.768253] ? pci_pm_runtime_resume+0xa0/0xa0 >>> [ 12.768299] ? nouveau_display_fini+0x56/0xd0 [nouveau] >>> [ 12.768339] ? nouveau_display_suspend+0x51/0x110 [nouveau] >>> [ 12.768378] ? nouveau_do_suspend+0x76/0x1c0 [nouveau] >>> [ 12.768413] ? nouveau_pmops_runtime_suspend+0x54/0xb0 [nouveau] >>> [ 12.768416] ? pci_pm_runtime_suspend+0x5c/0x160 >>> [ 12.768419] ? __rpm_callback+0xb6/0x1e0 >>> [ 12.768423] ? 
kobject_uevent_env+0x111/0x5
Re: [Nouveau] [regression drm/noveau] suspend to ram -> BOOM: exception RIP: drm_calc_vbltimestamp_from_scanoutpos+335
The conversion is a nice catch, but I'd like to have a bit more context, see below!

With a better description: Tobias Klausmann

On 7/14/17 5:10 PM, Karol Herbst wrote:

Yeah, we shouldn't let the machine die. Are there more WARN_ON_ONCE usage we could convert to WARN_ONCE?

Reviewed-By: Karol Herbst

On Fri, Jul 14, 2017 at 5:05 PM, Tobias Klausmann wrote:
On 7/14/17 3:41 PM, Mike Galbraith wrote:
On Fri, 2017-07-14 at 15:36 +0200, Mike Galbraith wrote:

All DRM did was to slip a WARN_ON_ONCE() that nouveau triggers into a kernel module where such things no longer warn, they blow the box out of the water.

BTW, turn that irksome WARN_ON_ONCE() in drivers/gpu/drm/drm_vblank.c into a WARN_ONCE(), and all is peachy, you get the warning, box lives.

---
 drivers/gpu/drm/drm_vblank.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- a/drivers/gpu/drm/drm_vblank.c
+++ b/drivers/gpu/drm/drm_vblank.c
@@ -605,7 +605,8 @@ bool drm_calc_vbltimestamp_from_scanoutp
 	 */
 	if (mode->crtc_clock == 0) {
 		DRM_DEBUG("crtc %u: Noop due to uninitialized mode.\n", pipe);
-		WARN_ON_ONCE(drm_drv_uses_atomic_modeset(dev));
+		WARN_ONCE(drm_drv_uses_atomic_modeset(dev), "%s: report me.\n",

"report me" seems a bit odd, maybe just uninitialized mode?

+			  dev->driver->name);
 		return false;
 	}

Hey, confirmed this helps saving the box, but we still have to find the root cause! Backtrace with the above fix applied (and the one which came in with the latest drm-fixes merge)! [1]

https://hastebin.com/uyoqifijed.http

Thanks,
Tobias

Reviewed-By: Karol Herbst

___
Nouveau mailing list
nouv...@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau
Re: [regression drm/noveau] suspend to ram -> BOOM: exception RIP: drm_calc_vbltimestamp_from_scanoutpos+335
On 7/14/17 3:41 PM, Mike Galbraith wrote:
On Fri, 2017-07-14 at 15:36 +0200, Mike Galbraith wrote:

All DRM did was to slip a WARN_ON_ONCE() that nouveau triggers into a kernel module where such things no longer warn, they blow the box out of the water.

BTW, turn that irksome WARN_ON_ONCE() in drivers/gpu/drm/drm_vblank.c into a WARN_ONCE(), and all is peachy, you get the warning, box lives.

---
 drivers/gpu/drm/drm_vblank.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- a/drivers/gpu/drm/drm_vblank.c
+++ b/drivers/gpu/drm/drm_vblank.c
@@ -605,7 +605,8 @@ bool drm_calc_vbltimestamp_from_scanoutp
 	 */
 	if (mode->crtc_clock == 0) {
 		DRM_DEBUG("crtc %u: Noop due to uninitialized mode.\n", pipe);
-		WARN_ON_ONCE(drm_drv_uses_atomic_modeset(dev));
+		WARN_ONCE(drm_drv_uses_atomic_modeset(dev), "%s: report me.\n",
+			  dev->driver->name);
 		return false;
 	}

Hey, I can confirm this helps save the box, but we still have to find the root cause! Here is a backtrace with the above fix applied (and the one which came in with the latest drm-fixes merge) [1].

[1] https://hastebin.com/uyoqifijed.http

Thanks,
Tobias
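The patch quoted in this message replaces a bare once-only warning with one that also prints a message naming the driver. The following is a rough userspace sketch of that pattern, not the kernel's WARN_ONCE(): the names `SKETCH_WARN_ONCE`, `warn_once`, `sketch_timestamp_possible` and `warnings_emitted` are hypothetical and exist only for illustration. It assumes nothing beyond what the patch shows: the check fires when the clock is zero, warns at most once, and returns false.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>

/* Count of warnings actually printed; only here so the sketch can be checked. */
int warnings_emitted;

/* Print the formatted message the first time cond is true for this flag. */
static bool warn_once(bool *warned, bool cond, const char *fmt, ...)
{
	va_list ap;

	assert(warned != NULL);
	if (!cond || *warned)
		return cond;

	*warned = true;
	warnings_emitted++;
	va_start(ap, fmt);
	vfprintf(stderr, fmt, ap);
	va_end(ap);
	return cond;
}

/* Hypothetical stand-in: one static flag per call site, like a "once" macro. */
#define SKETCH_WARN_ONCE(cond, ...)                             \
	do {                                                    \
		static bool warned_;                            \
		(void)warn_once(&warned_, (cond), __VA_ARGS__); \
	} while (0)

/* Mirrors the shape of the patched check: warn once, then report failure. */
bool sketch_timestamp_possible(int crtc_clock, bool atomic, const char *driver)
{
	if (crtc_clock == 0) {
		SKETCH_WARN_ONCE(atomic, "%s: report me.\n", driver);
		return false;
	}
	return true;
}
```

Calling `sketch_timestamp_possible(0, true, "nouveau")` repeatedly returns false every time but writes the message to stderr only on the first call.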
Re: [regression drm/noveau] suspend to ram -> BOOM: exception RIP: drm_calc_vbltimestamp_from_scanoutpos+335
On 7/12/17 7:19 PM, Mike Galbraith wrote:
On Wed, 2017-07-12 at 07:37 -0400, Ilia Mirkin wrote:
On Wed, Jul 12, 2017 at 7:25 AM, Mike Galbraith wrote:
On Wed, 2017-07-12 at 11:55 +0200, Mike Galbraith wrote:
On Tue, 2017-07-11 at 14:22 -0400, Ilia Mirkin wrote:

Some display stuff did change for 4.13 for GM20x+ boards. If it's not too much trouble, a bisect would be pretty useful.

Bisection seemingly went fine, but the result is odd.

e98c58e55f68f8785aebfab1f8c9a03d8de0afe1 is the first bad commit

But it really really is bad. Looking at gitk fork in the road leading to it...

52d9d38c183b drm/sti:fix spelling mistake: "compoment" -> "component" - good
e4e818cc2d7c drm: make drm_panel.h self-contained - good
9cf8f5802f39 drm: add missing declaration to drm_blend.h - good

Before the git highway splits, all is well. The lane with commits works fine at both ends, but e98c58e55f68 is busted. Merge artifact?

Hmmm... that tree does not appear to have gotten a v4.12 backmerge at any point. The last backmerge from Linus as far as I can tell was v4.11-rc7. Could be an interaction with some out-of-tree change.

FWIW, checking out the fingered commit then..

git log --oneline 52d9d38c183b..e98c58e55f68|grep nouveau

and reverting the lot helped not at all. Checking out 6b7781b42dc9 and reverting the fingered commit did.

Given the nouveau bits reverted are mostly the vblank changes, CC to Daniel, maybe he'll know why both GTX 980 and GeForce 8600 GT get all upset.

Either I'm damn lucky, both of my nvidia equipped boxen going boom 100% repeatably, or there are a lot of folks out there who haven't yet tried suspend with our latest/greatest kernel. I suspect the latter.

-Mike

I should have had a look at my inbox; it would have saved me a lot of work bisecting. Yet I come to the same conclusion:

# first bad commit: [e98c58e55f68f8785aebfab1f8c9a03d8de0afe1] Merge tag 'drm-misc-next-2017-05-16' of git://anongit.freedesktop.org/git/drm-misc into drm-next

I suspect it is some vblank change, as it shows up in every trace I have seen while bisecting, but that is just a wild guess...

Greetings,
Tobias
[Nouveau] [PATCH] ram/gf100-: error out if a ridiculous amount of vram is detected
Any idea on how to solve the problem, other than just reporting it? But for now this adds a helpful error message... you may add my R-b.

On 20.05.2015 22:01, Ilia Mirkin wrote:
> Some newer chips have trouble coming up, and we get bad MMIO reads from
> them, like 0xbadf100. This ends up translating into crazy amounts of
> VRAM, which destroys all sorts of other logic down the line. Instead,
> fail device init.
>
> Signed-off-by: Ilia Mirkin
> Cc: stable at kernel.org
> ---
>  drm/nouveau/nvkm/subdev/fb/ramgf100.c | 6 ++++++
>  1 file changed, 6 insertions(+)
>
> diff --git a/drm/nouveau/nvkm/subdev/fb/ramgf100.c
> b/drm/nouveau/nvkm/subdev/fb/ramgf100.c
> index de9f395..9d4d196 100644
> --- a/drm/nouveau/nvkm/subdev/fb/ramgf100.c
> +++ b/drm/nouveau/nvkm/subdev/fb/ramgf100.c
> @@ -545,6 +545,12 @@ gf100_ram_create_(struct nvkm_object *parent, struct
> nvkm_object *engine,
>  		}
>  	}
>
> +	/* if over 1TB of VRAM is reported, something went very wrong, bail */
> +	if (ram->size > (1ULL << 40)) {
> +		nv_error(pfb, "invalid vram size: %llx\n", ram->size);
> +		return -EINVAL;
> +	}
> +
>  	/* if all controllers have the same amount attached, there's no holes */
>  	if (uniform) {
>  		offset = rsvd_head;
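To see why a garbage register value trips the 1 TB bound in the quoted patch, here is a small standalone sketch. It is not nouveau code: `sketch_vram_bytes`, `sketch_vram_size_ok` and `SKETCH_VRAM_LIMIT` are hypothetical names, and the unit of the size register (MiB per partition) is an assumption made purely for the arithmetic. Only the `> (1ULL << 40)` comparison is taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* 1 TiB, the bound used in the quoted patch. */
#define SKETCH_VRAM_LIMIT (1ULL << 40)

static_assert(SKETCH_VRAM_LIMIT == 1099511627776ULL, "limit must be 1 TiB");

/*
 * ASSUMPTION: each memory partition reports its size in MiB, so the total
 * in bytes is parts * mib_per_part * 2^20. The real driver's register
 * layout is not shown in the mail and may differ.
 */
uint64_t sketch_vram_bytes(uint32_t parts, uint32_t mib_per_part)
{
	return ((uint64_t)parts * mib_per_part) << 20;
}

/* Mirrors the patch's test: anything above 1 TiB is treated as bogus. */
bool sketch_vram_size_ok(uint64_t size)
{
	return size <= SKETCH_VRAM_LIMIT;
}
```

Under that assumption, two partitions of 1024 MiB give 2 GiB and pass, while a single bad read of 0xbadf100 interpreted as MiB comes out at well over 100 TiB and is rejected.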
3.18-rc regression: drm/nouveau: use shared fences for readable objects
On 26.11.2014 21:29, Michael Marineau wrote: > On Mon, Nov 24, 2014 at 11:43 PM, Maarten Lankhorst > wrote: >> Hey, >> >> Op 22-11-14 om 21:16 schreef Michael Marineau: >>> On Nov 22, 2014 11:45 AM, "Michael Marineau" wrote: On Nov 22, 2014 8:56 AM, "Maarten Lankhorst" < >>> maarten.lankhorst at canonical.com> wrote: > Hey, > > Op 22-11-14 om 01:19 schreef Michael Marineau: >> On Thu, Nov 20, 2014 at 12:53 AM, Maarten Lankhorst >> wrote: >>> Op 20-11-14 om 05:06 schreef Michael Marineau: On Wed, Nov 19, 2014 at 12:10 AM, Maarten Lankhorst wrote: > Hey, > > On 19-11-14 07:43, Michael Marineau wrote: >> On 3.18-rc kernel's I have been intermittently experiencing GPU >> lockups shortly after startup, accompanied with one or both of the >> following errors: >> >> nouveau E[ PFIFO][:01:00.0] read fault at 0x000734a000 [PTE] >> from PBDMA0/HOST_CPU on channel 0x007faa3000 [unknown] >> nouveau E[ DRM] GPU lockup - switching to software fbcon >> >> I was able to trace the issue with bisect to commit >> 809e9447b92ffe1346b2d6ec390e212d5307f61c "drm/nouveau: use shared >> fences for readable objects". The lockups appear to have cleared >>> up >> since reverting that and a few related followup commits: >> >> 809e9447: "drm/nouveau: use shared fences for readable objects" >> 055dffdf: "drm/nouveau: bump driver patchlevel to 1.2.1" >> e3be4c23: "drm/nouveau: specify if interruptible wait is desired >>> in >> nouveau_fence_sync" >> 15a996bb: "drm/nouveau: assign fence_chan->name correctly" > Weird. I'm not sure yet what causes it. > > >>> http://cgit.freedesktop.org/~mlankhorst/linux/commit/?h=fixed-fences-for-bisect&id=86be4f216bbb9ea3339843a5658d4c21162c7ee2 Building a kernel from that commit gives me an entirely new >>> behavior: X hangs for at least 10-20 seconds at a time with brief moments of responsiveness before hanging again while gitk on the kernel repo loads. Otherwise the system is responsive. 
The head of that fixed-fences-for-bisect branch (1c6aafb5) which is the "use shared fences for readable objects" commit I originally bisected to does feature the complete lockups I was seeing before. >>> Ok for the sake of argument lets just assume they're separate bugs, >>> and we should look at xorg >>> hanging first. >>> >>> Is there anything in the dmesg when the hanging happens? >>> >>> And it's probably 15 seconds, if it's called through >>> nouveau_fence_wait. >>> Try changing else if (!ret) to else if (WARN_ON(!ret)) in that >>> function, and see if you get some dmesg spam. :) >> Adding the WARN_ON to 86be4f21 repots the following: >> >> [ 1188.676073] [ cut here ] >> [ 1188.676161] WARNING: CPU: 1 PID: 474 at >> drivers/gpu/drm/nouveau/nouveau_fence.c:359 >> nouveau_fence_wait.part.9+0x33/0x40 [nouveau]() >> [ 1188.676166] Modules linked in: rndis_host cdc_ether usbnet mii bnep >> ecb btusb bluetooth rfkill bridge stp llc hid_generic usb_storage >> joydev mousedev hid_apple usbhid bcm5974 nls_iso8859_1 nls_cp437 vfat >> fat nouveau snd_hda_codec_hdmi coretemp x86_pkg_temp_thermal >> intel_powerclamp kvm_intel kvm iTCO_wdt crct10dif_pclmul >> iTCO_vendor_support crc32c_intel evdev aesni_intel mac_hid aes_x86_64 >> lrw glue_helper ablk_helper applesmc snd_hda_codec_cirrus cryptd >> input_polldev snd_hda_codec_generic mxm_wmi led_class wmi microcode >> hwmon snd_hda_intel ttm snd_hda_controller lpc_ich i2c_i801 mfd_core >> snd_hda_codec i2c_algo_bit snd_hwdep drm_kms_helper snd_pcm sbs drm >> apple_gmux i2ccore snd_timer snd agpgart mei_me soundcore sbshc mei >> video xhci_hcd usbcore usb_common apple_bl button battery ac efivars >> autofs4 >> [ 1188.676300] efivarfs >> [ 1188.676308] CPU: 1 PID: 474 Comm: Xorg Tainted: GW >> 3.17.0-rc2-nvtest+ #147 >> [ 1188.676313] Hardware name: Apple Inc. 
>> MacBookPro11,3/Mac-2BD1B31983FE1663, BIOS >> MBP112.88Z.0138.B11.1408291503 08/29/2014 >> [ 1188.676316] 0009 88045daebce8 814f0c09 >> >> [ 1188.676325] 88045daebd20 8104ea5d 88006a6c1468 >> fff0 >> [ 1188.676333] 88006a6c1000 >> 88045daebd30 >> [ 1188.676341] Call Trace: >> [ 1188.676356] [] dump_stack+0x4d/0x66 >> [ 1188.676369] [] warn_slowpath_common+0x7d/0xa0 >> [ 1188.676377] [] warn_slowpath_null+0x1a/0x20 >> [ 1188.676439] [] >> nouveau_fence_wait.part.9+0x33/0x40 [nouveau] >> [ 1188.676496] [] nouveau_fence_wait+0x16/0x30 >>> [nouveau] >> [ 1188.676552] [] >> nouveau
3.18-rc regression: drm/nouveau: use shared fences for readable objects
On 19.11.2014 09:10, Maarten Lankhorst wrote:
> ...
> On the EDITED patch from fixed-fences-for-bisect, can you do the following:
>
> In nouveau/nv84_fence.c function nv84_fence_context_new, remove
>
> fctx->base.sequence = nv84_fence_read(chan);
>
> and add back
>
> nouveau_bo_wr32(priv->bo, chan->chid * 16/4, 0x);
>
> ...

I added the above on top of your "fixed-fences-for-bisect" branch and expected it to work, but it did not :/

Anyway, as this "initializes" the fence to a known state, maybe you should consider pushing it.

Going to compile the kernel with trace events (let's see how) ...

Tobias
3.18-rc regression: drm/nouveau: use shared fences for readable objects
On 19.11.2014 09:10, Maarten Lankhorst wrote:
> Hey,
>
> On 19-11-14 07:43, Michael Marineau wrote:
>> On 3.18-rc kernel's I have been intermittently experiencing GPU
>> lockups shortly after startup, accompanied with one or both of the
>> following errors:
>>
>> nouveau E[ PFIFO][:01:00.0] read fault at 0x000734a000 [PTE]
>> from PBDMA0/HOST_CPU on channel 0x007faa3000 [unknown]
>> nouveau E[ DRM] GPU lockup - switching to software fbcon
>>
>> I was able to trace the issue with bisect to commit
>> 809e9447b92ffe1346b2d6ec390e212d5307f61c "drm/nouveau: use shared
>> fences for readable objects". The lockups appear to have cleared up
>> since reverting that and a few related followup commits:
>>
>> 809e9447: "drm/nouveau: use shared fences for readable objects"
>> 055dffdf: "drm/nouveau: bump driver patchlevel to 1.2.1"
>> e3be4c23: "drm/nouveau: specify if interruptible wait is desired in
>> nouveau_fence_sync"
>> 15a996bb: "drm/nouveau: assign fence_chan->name correctly"
> Weird. I'm not sure yet what causes it.
>
> http://cgit.freedesktop.org/~mlankhorst/linux/commit/?h=fixed-fences-for-bisect&id=86be4f216bbb9ea3339843a5658d4c21162c7ee2
>
> On the EDITED patch from fixed-fences-for-bisect, can you do the following:
>
> In nouveau/nv84_fence.c function nv84_fence_context_new, remove
>
> fctx->base.sequence = nv84_fence_read(chan);
>
> and add back
>
> nouveau_bo_wr32(priv->bo, chan->chid * 16/4, 0x);
>
> If that fails you should compile your kernel with trace events, to get some
> debugging info from the fences. I'll post debugging info if this does not fix
> it.
>
> ~Maarten

Hey,

as mentioned on IRC, the new fencing hangs my GPU for a while as well (nve7). I bisected back to 86be4f216bbb9ea3339843a5658d4c21162c7ee2, the EDITED patch from the fixed-fences-for-bisect branch mentioned above. The original bisect on Linus' branch brought me to:

29ba89b2371d466ca68973525816cf10debc2655 drm/nouveau: rework to new fence interface

Michael, if you are going to bisect the "fixed-fences-for-bisect" branch, maybe take a closer look when you come anywhere near that commit and check whether or not it triggers the GPU hangs for you!

Tobias
I915 DRI_PRIME Bug
Hello there,

while testing my "Optimus" notebook I saw a stack trace in my logs; maybe someone is interested! I can easily reproduce this at any time. It happens when offloading a GL app, here Unigine Heaven 3.0, to the nvidia card. To be more exact: when starting Unigine the window stays black. To get something useful I have to minimize and maximize the window. The trace appears exactly when maximizing the window. Hope this helps anyway!

[ cut here ]
WARNING: CPU: 7 PID: 718 at drivers/gpu/drm/i915/i915_gem.c:3967 i915_gem_free_object+0x124/0x150 [i915]()
Modules linked in: af_packet bnep fuse snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_intel snd_hda_codec ath3k btusb bluetooth uvcvideo snd_hwdep videobuf2_core videodev videobuf2_vmalloc videobuf2_memops snd_pcm snd_seq snd_timer arc4 ath9k snd_seq_device mac80211 ath9k_common ath9k_hw ath snd iTCO_wdt sdhci_pci sdhci mmc_core sr_mod sg tg3 ptp pps_core iTCO_vendor_support cfg80211 lpc_ich i2c_i801 pcspkr joydev acer_wmi sparse_keymap rfkill cdrom soundcore snd_page_alloc mperf battery ac autofs4 i915 xhci_hcd processor scsi_dh_alua scsi_dh_hp_sw scsi_dh_emc scsi_dh_rdac scsi_dh nouveau ttm drm_kms_helper drm i2c_algo_bit mxm_wmi video thermal_sys wmi button
CPU: 7 PID: 718 Comm: Xorg Not tainted 3.11.0-rc5-desktop+ #27
Hardware name: Acer Aspire V3-571G/VA50_HC_CR, BIOS V1.13 10/09/2012
0009 81568703 81047f81 88021cf1ab00 88024f1d 88025e421930 88021c7bcd40 8802540f2da0 a0210a44 88021cf1ab00
Call Trace:
[] ? dump_stack+0x50/0x80
[] ? warn_slowpath_common+0x81/0xb0
[] ? i915_gem_free_object+0x124/0x150 [i915]
[] ? i915_gem_dmabuf_release+0x80/0x90 [i915]
[] ? dma_buf_release+0x23/0x80
[] ? __fput+0xcd/0x230
[] ? task_work_run+0x97/0xd0
[] ? do_notify_resume+0x79/0xa0
[] ? int_signal+0x12/0x17
---[ end trace 99a0c147e69ddcd1 ]---

Thanks,
Tobias Klausmann