Re: [PATCH] drm/i915: Improve debug print in vm_fault_ttm
On 9/22/2022 6:38 PM, Matthew Auld wrote: On 22/09/2022 13:09, Nirmoy Das wrote: Print the error code returned by __i915_ttm_migrate() for better debuggability. References: https://gitlab.freedesktop.org/drm/intel/-/issues/6889 Signed-off-by: Nirmoy Das --- drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c index e3fc38dd5db0..9619c0fe1025 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c @@ -1034,7 +1034,7 @@ static vm_fault_t vm_fault_ttm(struct vm_fault *vmf) } if (err) { - drm_dbg(dev, "Unable to make resource CPU accessible\n"); + drm_dbg(dev, "Unable to make resource CPU accessible(err = %pe)\n", err); Yeah, looks useful. I think for that bug the object is just too large for the mappable part of lmem, so this just gives -2big or similar on small-bar systems. I presume that the test needs to be updated to account for the cpu_size or so. Yeah, can't think of any other case. The test need to be updated, going to send out igt fixes for this. With the kernel test robot warning fixed: Acked-by: Matthew Auld Thanks, I will resend a updated one. I looked at the GEM_BUG_ON(rq->reserved_space > ring->space), and I think the issue is maybe with emit_pte() using the ring->space to manually figure out the number of dwords it can emit (instead of the usual ring_begin()), which I guess works, but if we are unlucky and get interrupted (like with a very well timed sigbus here), while waiting for more ring space and end up bailing early, we might have trampled over the reserved_space when submitting the request. I guess normally the next ring_begin() would take care of the reserved_space, like when constructing the actual copy packet. I am not so familiar with the code but sounds logical. Nirmoy dma_resv_unlock(bo->base.resv); ret = VM_FAULT_SIGBUS; goto out_rpm;
Re: [PATCH] drm/i915: Improve debug print in vm_fault_ttm
On 22/09/2022 13:09, Nirmoy Das wrote: Print the error code returned by __i915_ttm_migrate() for better debuggability. References: https://gitlab.freedesktop.org/drm/intel/-/issues/6889 Signed-off-by: Nirmoy Das --- drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c index e3fc38dd5db0..9619c0fe1025 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c @@ -1034,7 +1034,7 @@ static vm_fault_t vm_fault_ttm(struct vm_fault *vmf) } if (err) { - drm_dbg(dev, "Unable to make resource CPU accessible\n"); + drm_dbg(dev, "Unable to make resource CPU accessible(err = %pe)\n", err); Yeah, looks useful. I think for that bug the object is just too large for the mappable part of lmem, so this just gives -2big or similar on small-bar systems. I presume that the test needs to be updated to account for the cpu_size or so. With the kernel test robot warning fixed: Acked-by: Matthew Auld I looked at the GEM_BUG_ON(rq->reserved_space > ring->space), and I think the issue is maybe with emit_pte() using the ring->space to manually figure out the number of dwords it can emit (instead of the usual ring_begin()), which I guess works, but if we are unlucky and get interrupted (like with a very well timed sigbus here), while waiting for more ring space and end up bailing early, we might have trampled over the reserved_space when submitting the request. I guess normally the next ring_begin() would take care of the reserved_space, like when constructing the actual copy packet. dma_resv_unlock(bo->base.resv); ret = VM_FAULT_SIGBUS; goto out_rpm;
Re: [Intel-gfx] [PATCH] drm/i915: Improve debug print in vm_fault_ttm
Hi Nirmoy, Thank you for the patch! Yet something to improve: [auto build test ERROR on drm-tip/drm-tip] url: https://github.com/intel-lab-lkp/linux/commits/Nirmoy-Das/drm-i915-Improve-debug-print-in-vm_fault_ttm/20220922-201041 base: git://anongit.freedesktop.org/drm/drm-tip drm-tip config: x86_64-defconfig (https://download.01.org/0day-ci/archive/20220923/202209230001.og3h9emy-...@intel.com/config) compiler: gcc-11 (Debian 11.3.0-5) 11.3.0 reproduce (this is a W=1 build): # https://github.com/intel-lab-lkp/linux/commit/e1a426a9e14837ada7e883d20af7c9abdf59823c git remote add linux-review https://github.com/intel-lab-lkp/linux git fetch --no-tags linux-review Nirmoy-Das/drm-i915-Improve-debug-print-in-vm_fault_ttm/20220922-201041 git checkout e1a426a9e14837ada7e883d20af7c9abdf59823c # save the config file mkdir build_dir && cp config build_dir/.config make W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash If you fix the issue, kindly add following tag where applicable | Reported-by: kernel test robot All errors (new ones prefixed by >>): In file included from include/drm/drm_mm.h:51, from include/drm/ttm/ttm_bo_driver.h:33, from drivers/gpu/drm/i915/gem/i915_gem_ttm.c:8: drivers/gpu/drm/i915/gem/i915_gem_ttm.c: In function 'vm_fault_ttm': >> drivers/gpu/drm/i915/gem/i915_gem_ttm.c:1037:38: error: format '%p' expects >> argument of type 'void *', but argument 4 has type 'int' [-Werror=format=] 1037 | drm_dbg(dev, "Unable to make resource CPU accessible(err = %pe)\n", err); | ^ ~~~ | | | int include/drm/drm_print.h:461:63: note: in definition of macro 'drm_dbg' 461 | drm_dev_dbg((drm) ? (drm)->dev : NULL, DRM_UT_DRIVER, fmt, ##__VA_ARGS__) | ^~~ drivers/gpu/drm/i915/gem/i915_gem_ttm.c:1037:85: note: format string is defined here 1037 | drm_dbg(dev, "Unable to make resource CPU accessible(err = %pe)\n", err); | ~^ | | | void * | %d cc1: all warnings being treated as errors vim +1037 drivers/gpu/drm/i915/gem/i915_gem_ttm.c 986 987 static vm_fault_t vm_fault_ttm(struct vm_fault *vmf) 988 { 989 struct vm_area_struct *area = vmf->vma; 990 struct ttm_buffer_object *bo = area->vm_private_data; 991 struct drm_device *dev = bo->base.dev; 992 struct drm_i915_gem_object *obj; 993 intel_wakeref_t wakeref = 0; 994 vm_fault_t ret; 995 int idx; 996 997 obj = i915_ttm_to_gem(bo); 998 if (!obj) 999 return VM_FAULT_SIGBUS; 1000 1001 /* Sanity check that we allow writing into this object */ 1002 if (unlikely(i915_gem_object_is_readonly(obj) && 1003 area->vm_flags & VM_WRITE)) 1004 return VM_FAULT_SIGBUS; 1005 1006 ret = ttm_bo_vm_reserve(bo, vmf); 1007 if (ret) 1008 return ret; 1009 1010 if (obj->mm.madv != I915_MADV_WILLNEED) { 1011 dma_resv_unlock(bo->base.resv); 1012 return VM_FAULT_SIGBUS; 1013 } 1014 1015 if (i915_ttm_cpu_maps_iomem(bo->resource)) 1016 wakeref = intel_runtime_pm_get(_i915(obj->base.dev)->runtime_pm); 1017 1018 if (!i915_ttm_resource_mappable(bo->resource)) { 1019 int err = -ENODEV; 1020 int i; 1021 1022 for (i = 0; i < obj->mm.n_placements; i++) { 1023 struct intel_memory_region *mr = obj->mm.placements[i]; 1024 unsigned int flags; 1025 1026 if (!mr->io_size && mr->type != INTEL_MEMORY_SYSTEM) 1027 continue; 1028 1029 flags = obj->flags; 1030 flags &= ~I915_BO_ALLOC_GPU_ONLY; 1031 err = __i915_ttm_migrate(obj, mr, flags); 1032 if (!err) 1033 break; 1034 } 1035
Re: [Intel-gfx] [PATCH] drm/i915: Improve debug print in vm_fault_ttm
Hi Nirmoy, Thank you for the patch! Perhaps something to improve: [auto build test WARNING on drm-tip/drm-tip] url: https://github.com/intel-lab-lkp/linux/commits/Nirmoy-Das/drm-i915-Improve-debug-print-in-vm_fault_ttm/20220922-201041 base: git://anongit.freedesktop.org/drm/drm-tip drm-tip config: x86_64-randconfig-a011 (https://download.01.org/0day-ci/archive/20220922/20220931.mirsqiu7-...@intel.com/config) compiler: gcc-11 (Debian 11.3.0-5) 11.3.0 reproduce (this is a W=1 build): # https://github.com/intel-lab-lkp/linux/commit/e1a426a9e14837ada7e883d20af7c9abdf59823c git remote add linux-review https://github.com/intel-lab-lkp/linux git fetch --no-tags linux-review Nirmoy-Das/drm-i915-Improve-debug-print-in-vm_fault_ttm/20220922-201041 git checkout e1a426a9e14837ada7e883d20af7c9abdf59823c # save the config file mkdir build_dir && cp config build_dir/.config make W=1 O=build_dir ARCH=x86_64 SHELL=/bin/bash drivers/gpu/drm/i915/ If you fix the issue, kindly add following tag where applicable | Reported-by: kernel test robot All warnings (new ones prefixed by >>): In file included from include/drm/drm_mm.h:51, from include/drm/ttm/ttm_bo_driver.h:33, from drivers/gpu/drm/i915/gem/i915_gem_ttm.c:8: drivers/gpu/drm/i915/gem/i915_gem_ttm.c: In function 'vm_fault_ttm': >> drivers/gpu/drm/i915/gem/i915_gem_ttm.c:1037:38: warning: format '%p' >> expects argument of type 'void *', but argument 4 has type 'int' [-Wformat=] 1037 | drm_dbg(dev, "Unable to make resource CPU accessible(err = %pe)\n", err); | ^ ~~~ | | | int include/drm/drm_print.h:461:63: note: in definition of macro 'drm_dbg' 461 | drm_dev_dbg((drm) ? (drm)->dev : NULL, DRM_UT_DRIVER, fmt, ##__VA_ARGS__) | ^~~ drivers/gpu/drm/i915/gem/i915_gem_ttm.c:1037:85: note: format string is defined here 1037 | drm_dbg(dev, "Unable to make resource CPU accessible(err = %pe)\n", err); | ~^ | | | void * | %d vim +1037 drivers/gpu/drm/i915/gem/i915_gem_ttm.c 986 987 static vm_fault_t vm_fault_ttm(struct vm_fault *vmf) 988 { 989 struct vm_area_struct *area = vmf->vma; 990 struct ttm_buffer_object *bo = area->vm_private_data; 991 struct drm_device *dev = bo->base.dev; 992 struct drm_i915_gem_object *obj; 993 intel_wakeref_t wakeref = 0; 994 vm_fault_t ret; 995 int idx; 996 997 obj = i915_ttm_to_gem(bo); 998 if (!obj) 999 return VM_FAULT_SIGBUS; 1000 1001 /* Sanity check that we allow writing into this object */ 1002 if (unlikely(i915_gem_object_is_readonly(obj) && 1003 area->vm_flags & VM_WRITE)) 1004 return VM_FAULT_SIGBUS; 1005 1006 ret = ttm_bo_vm_reserve(bo, vmf); 1007 if (ret) 1008 return ret; 1009 1010 if (obj->mm.madv != I915_MADV_WILLNEED) { 1011 dma_resv_unlock(bo->base.resv); 1012 return VM_FAULT_SIGBUS; 1013 } 1014 1015 if (i915_ttm_cpu_maps_iomem(bo->resource)) 1016 wakeref = intel_runtime_pm_get(_i915(obj->base.dev)->runtime_pm); 1017 1018 if (!i915_ttm_resource_mappable(bo->resource)) { 1019 int err = -ENODEV; 1020 int i; 1021 1022 for (i = 0; i < obj->mm.n_placements; i++) { 1023 struct intel_memory_region *mr = obj->mm.placements[i]; 1024 unsigned int flags; 1025 1026 if (!mr->io_size && mr->type != INTEL_MEMORY_SYSTEM) 1027 continue; 1028 1029 flags = obj->flags; 1030 flags &= ~I915_BO_ALLOC_GPU_ONLY; 1031 err = __i915_ttm_migrate(obj, mr, flags); 1032 if (!err) 1033 break; 1034 } 1035 1036
[PATCH] drm/i915: Improve debug print in vm_fault_ttm
Print the error code returned by __i915_ttm_migrate() for better debuggability. References: https://gitlab.freedesktop.org/drm/intel/-/issues/6889 Signed-off-by: Nirmoy Das --- drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c index e3fc38dd5db0..9619c0fe1025 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c @@ -1034,7 +1034,7 @@ static vm_fault_t vm_fault_ttm(struct vm_fault *vmf) } if (err) { - drm_dbg(dev, "Unable to make resource CPU accessible\n"); + drm_dbg(dev, "Unable to make resource CPU accessible(err = %pe)\n", err); dma_resv_unlock(bo->base.resv); ret = VM_FAULT_SIGBUS; goto out_rpm; -- 2.37.3