RE: Support for amdgpu VM update via CPU on large-bar systems

Kasiviswanathan, Harish Wed, 10 May 2017 12:24:54 -0700

Thanks Christian for the review. One clarification required. Please see my 
comments and question inline.


From: Christian König [mailto:deathsim...@vodafone.de] 
Sent: Wednesday, May 10, 2017 3:24 AM
To: Kasiviswanathan, Harish <harish.kasiviswanat...@amd.com>; 
amd-gfx@lists.freedesktop.org
Subject: Re: Support for amdgpu VM update via CPU on large-bar systems

Hi Harish,

for the next time please use git send-email to send patches to the mailing list.

Looks pretty good in general, but a few notes all over the place.

Patch #1:

+static bool amdgpu_vm_is_large_bar(struct amdgpu_device *adev)
+{
+    if (adev->mc.visible_vram_size > 0 &&
+            adev->mc.real_vram_size == adev->mc.visible_vram_size)
+        return true;
+    return false;
+}
The visible_vram_size > 0 check looks superfluous. The coding style looks off, 
the second line of the "if" is to far right.

And in general that should rather look like "return adev->mc.real_vram_size == 
adev->mc.visible_vram_size;".


+    /* CPU update is only supported for large BAR system */
+    vm->is_vm_update_mode_cpu = (vm_update_mode &&
+                     amdgpu_vm_is_large_bar(adev));
Mhm, it would be nice if we could enable this for testing even on systems with 
small BAR.
[HK] : I will add bit in the module parameter that can override large-bar 
requirement.

I would rather suggest to make the default value depend on if the BOX has a 
large or small BAR and let the user override the setting.

Additional to that we should limit this to 64bit systems.


+#define AMDGPU_VM_USE_CPU_UPDATE_FOR_GRAPHICS_MASK (1 << 0)
+#define AMDGPU_VM_USE_CPU_UPDATE_FOR_COMPUTE_MASK (1 << 1)
Not much of an issue, but the names are a bit long. Maybe use something like 
AMDGPU_VM_USE_CPU_FOR_GFX and AMDGPU_VM_USE_CPU_FOR_COMPUTE.


+    bool                    is_vm_update_mode_cpu : 1;
Please no bitmasks when we don't need them.

Patch #2:

+    u64 flags;
     unsigned shift = (adev->vm_manager.num_level - level) *
         adev->vm_manager.block_size;
Reverse tree order.


+    flags = (vm->is_vm_update_mode_cpu) ?
+            AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED :
+            AMDGPU_GEM_CREATE_NO_CPU_ACCESS |
+            AMDGPU_GEM_CREATE_SHADOW;
+
+
...

-                         AMDGPU_GEM_CREATE_NO_CPU_ACCESS |
-                         AMDGPU_GEM_CREATE_SHADOW |
+                         flags |
                          AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS |
                          AMDGPU_GEM_CREATE_VRAM_CLEARED,

I would rather write it like this:

flags = AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS | AMDGPU_GEM_CREATE_VRAM_CLEARED;
if (vm->is_vm_update_mode_cpu)
    flags |= AMDGPU_GEM_CREATE_CPU_ACCESS_REQUIRED;
else
    flags |= AMDGPU_GEM_CREATE_NO_CPU_ACCESS | AMDGPU_GEM_CREATE_SHADOW;


+    mb();
+    amdgpu_gart_flush_gpu_tlb(params->adev, 0);
You do this for the HDP flush, don't you?

[HK]: Yes


+            dev_warn(adev->dev, "Page table update using CPU failed. Fallback 
to SDMA\n");
NACK, if kmapping the BO fails we most likely are either out of memory or out 
of address space.

Falling back to the SDMA doesn't make the situation better, just return a 
proper error code here.


+            /* Wait for BO to be free. SDMA could be clearing it */
+            amdgpu_sync_create(&sync);
+            amdgpu_sync_resv(adev, &sync, parent->bo->tbo.resv,
+                     AMDGPU_FENCE_OWNER_VM);
+            amdgpu_sync_wait(&sync);
+            amdgpu_sync_free(&sync);
Superfluous and a bit incorrect, amdgpu_bo_kmap already calls 
reservation_object_wait_timeout_rcu() but only for the exclusive fence.

You probably ran into problems because of pipelined evictions, so that should 
be fixed in amdgpu_bo_kmap() instead.

[HK]: From Compute perspective, all validations are synchronous. PT/PD BOs are 
validated when restored after evictions. However, the clearing of PT/PD BOs 
during creation are still done by GPU which mandates the waiting. 
[HK]: I am not clear on how to modify amdgpu_bo_kmap(). The function currently 
waits for the exclusive fence and now optionally it has to wait for all the 
shared fences. So are you suggesting to modify kmap function to 
amdgpu_bo_kmap(struct amdgpu_bo *bo, void **ptr, void *fence_owner). If 
fence_onwer != NULL it would wait for all the shared fences. Please clarify.
  


-                    amdgpu_vm_do_set_ptes(&params,
-                                  last_shadow,
-                                  pt_addr, count,
-                                  incr,
-                                  AMDGPU_PTE_VALID);
-
-                amdgpu_vm_do_set_ptes(&params, last_pde,
-                              pt_addr, count, incr,
-                              AMDGPU_PTE_VALID);
+                    params.func(&params,
+                            last_shadow,
+                            pt_addr, count,
+                            incr,
+                            AMDGPU_PTE_VALID);
+
+                params.func(&params, last_pde,
+                        pt_addr, count, incr,
+                        AMDGPU_PTE_VALID);
Might be a good idea to separate that into a cleanup patch.

Patch #3: Looks good to me and is Reviewed-by: Christian König 
<christian.koe...@amd.com>.

Please move that one to the beginning of the series and/or commit it directly 
to amd-staging-4.9.

Patch #4:

+    /* The next three are used during VM update by CPU */
+    bool update_by_cpu;
We can distinguish that as well by looking at the function pointer, don't we?


+        dev_warn(adev->dev,
+             "CPU update of VM failed. Fallback to SDMA\n");
+
+        /* Reset params for SDMA fallback path */
+        params.update_by_cpu = false;
+        params.pages_addr = NULL;
+        params.kptr = NULL;
+        params.func = NULL;
Again completely drop the SDMA fallback path.

Regards,
Christian.

Am 09.05.2017 um 20:34 schrieb Kasiviswanathan, Harish:
Hi,

Please review the patch set that supports amdgpu VM update via CPU. This 
feature provides improved performance for compute (HSA) where mapping / 
unmapping is carried out (by Kernel) independent of command submissions (done 
directly by user space). This version doesn't support shadow copy of VM page 
tables for CPU based update.

Best Regards,
Harish




_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

RE: Support for amdgpu VM update via CPU on large-bar systems

Reply via email to