On 07/09/2018 03:04 PM, Christian König wrote:
Am 09.07.2018 um 07:13 schrieb Zhang, Jerry (Junwei):
On 07/06/2018 03:27 AM, Andrey Grodzovsky wrote:
Extract and present the reposnsible process and thread when
VM_FAULT happens.

v2: Use getter and setter functions.

Signed-off-by: Andrey Grodzovsky <andrey.grodzov...@amd.com>
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c |  4 ++++
  drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c  | 10 +++++++---
  drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c  |  9 +++++++--
  3 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index 7a625f3..609c8f5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -187,6 +187,10 @@ static int amdgpu_cs_parser_init(struct amdgpu_cs_parser 
*p, void *data)
      if (p->uf_entry.robj)
          p->job->uf_addr = uf_offset;
      kfree(chunk_array);
+
+    /* Use this opportunity to fill in task info for the vm */
+    amdgpu_vm_set_task_info(vm);
+

Shall we set the task info when vm init?

No, vm_init() is called from a completely different process which is later on 
user of the VM.

Originally I thought UMD opened DRI node and create a VM by vm_init(), then 
every command submission
would be passed in the same VM initialized by vm_init().

So that's different process?





      return 0;

  free_all_kdata:
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
index 08753e7..75f3ffb 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
@@ -46,7 +46,6 @@

  #include "ivsrcid/ivsrcid_vislands30.h"

-
  static void gmc_v8_0_set_gmc_funcs(struct amdgpu_device *adev);
  static void gmc_v8_0_set_irq_funcs(struct amdgpu_device *adev);
  static int gmc_v8_0_wait_for_idle(void *handle);
@@ -1449,8 +1448,13 @@ static int gmc_v8_0_process_interrupt(struct 
amdgpu_device *adev,
          gmc_v8_0_set_fault_enable_default(adev, false);

      if (printk_ratelimit()) {
-        dev_err(adev->dev, "GPU fault detected: %d 0x%08x\n",
-            entry->src_id, entry->src_data[0]);
+        struct amdgpu_task_info task_info = { 0 };
+
+        amdgpu_vm_get_task_info(adev, entry->pasid, &task_info);

Shall we find vm and get the vm->task_info directly instead of filling local 
variable task_info?
(current style is also OK for me, just for more info)

No, we can't guarantee that the task_info pointer in the VM won't be freed 
after dropping the spinlock. So returning a copy here is the better approach.


And we may also check the return value for "NULL" case, otherwise it may cause 
access errorin dev_err(),
if failed to find vm (although, it's most unlikely to happen).

Since we use a copy of the task info we should never get a NULL pointer. The string 
should already be zero terminated with the "{ 0 }" initialization above.

Thanks to explain that more.
Got it, fine for me.

Jerry


Christian.


Jerry

+
+        dev_err(adev->dev, "GPU fault detected: %d 0x%08x for process %s pid %d 
thread %s pid %d\n",
+            entry->src_id, entry->src_data[0], task_info.process_name,
+            task_info.tgid, task_info.task_name, task_info.pid);
          dev_err(adev->dev, " VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x%08X\n",
              addr);
          dev_err(adev->dev, " VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x%08X\n",
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c 
b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
index 691a659..9df94b4 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
@@ -259,11 +259,16 @@ static int gmc_v9_0_process_interrupt(struct 
amdgpu_device *adev,
      }

      if (printk_ratelimit()) {
+        struct amdgpu_task_info task_info = { 0 };
+
+        amdgpu_vm_get_task_info(adev, entry->pasid, &task_info);
+
          dev_err(adev->dev,
-            "[%s] VMC page fault (src_id:%u ring:%u vmid:%u pasid:%u)\n",
+            "[%s] VMC page fault (src_id:%u ring:%u vmid:%u pasid:%u, for process 
%s pid %d thread %s pid %d\n)\n",
              entry->vmid_src ? "mmhub" : "gfxhub",
              entry->src_id, entry->ring_id, entry->vmid,
-            entry->pasid);
+            entry->pasid, task_info.process_name, task_info.tgid,
+            task_info.task_name, task_info.pid);
          dev_err(adev->dev, "  at page 0x%016llx from %d\n",
              addr, entry->client_id);
          if (!amdgpu_sriov_vf(adev))


_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Reply via email to