Patch #1:
Could be that we need to add another module parameter to control this,
but I think for now that should be sufficient.
Patch is Reviewed-by: Christian König <christian.koe...@amd.com>
Patch #2:
+ if (direct_submit) {
+ r = amdgpu_ib_schedule(ring, job->num_ibs, job->ibs,
+ NULL, NULL, fence);
+ job->fence = fence_get(*fence);
+ if (r)
+ DRM_ERROR("Error scheduling IBs (%d)\n", r);
+ amdgpu_job_free(job);
When there is an error you should return the code as well.
+ } else {
+ r = amdgpu_job_submit(job, ring, &adev->mman.entity,
+ AMDGPU_FENCE_OWNER_UNDEFINED, fence);
+ if (r)
+ goto error_free;
+ }
return 0;
Just changing this to to "return r;" should be sufficient.
With that fixed the patch is Reviewed-by: Christian König
<christian.koe...@amd.com> as well.
Patch #3:
You mentioned that this isn't used during command submission but rather
during GPU reset, correct?
In this case I would advise not to use a member in the BO structure to
note the backup direction and instead have one function for back and one
for restoring the content (or note that as a parameter to the function).
Otherwise we could run into trouble when the CS wants to backup
something the GPU reset wants to restore at the same time.
Patch #4: Was already reviewed by me.
Patch #5:
+ pt = params->shadow ? vm->page_tables[pt_idx].entry.robj->shadow :
+ vm->page_tables[pt_idx].entry.robj;
You need to double check here if shadows are actually allocated or not.
Otherwise we will crash on an APU.
+ /* double ndw, since need to update shadow pt bo as well */
+ ndw *= 2;
+
We don't need to double the IB size, but only the number of commands in
it (ncmds).
The difference is when we want to write scattered GART entries. In this
case I've added the PTEs to the end of the IB.
Patch #6:
+ spinlock_t shadow_list_lock;
We might want to use a mutex here instead. Usually I would agree that a
spin lock is better for a linked list, but during the restore process in
a GPU reset we probably want to sleep here.
Alternatively you could splice the list to a local version on the stack,
grab references to the BOs and then drop the lock during the restore
process.
@@ -541,6 +546,13 @@ void amdgpu_bo_unref(struct amdgpu_bo **bo)
if ((*bo) == NULL)
return;
+ /* shadow must be freed before bo itself */
+ if ((!(*bo)->shadow) && !list_empty(&(*bo)->shadow_list)) {
+ spin_lock(&(*bo)->adev->shadow_list_lock);
+ list_del_init(&(*bo)->shadow_list);
+ spin_unlock(&(*bo)->adev->shadow_list_lock);
+
+ }
tbo = &((*bo)->tbo);
ttm_bo_unref(&tbo);
if (tbo == NULL)
That would trigger even when we take a temporary reference. I suggest to
add a amdgpu_bo_unref_shadow() function instead, unreferencing both the
shadow and the normal BO.
This can then be used in the VM code to cleanup the shadow created.
Going to skip patch #7 and #8 for now cause our team call begins in a
moment.
Regards,
Christian.
Am 17.08.2016 um 08:05 schrieb Chunming Zhou:
Since we cannot ensure VRAM is consistent after a GPU reset, page
table shadowing is necessary. Shadowed page tables are, in a sense, a
method to recover the consistent state of the page tables before the
reset occurred.
We need to allocate GTT bo as the shadow of VRAM bo when creating page table,
and make them the same. After gpu reset, we will need to use SDMA to copy GTT bo
content to VRAM bo, then page table will be recoveried.
V2:
Shadow bo uses a shadow entity running on normal run queue, after gpu reset,
we need to wait for all shadow jobs finished first, then recovery page table
from shadow.
V3:
Addressed Christian comments for shadow bo part.
V4:
Switch back to update page table twice (one of two is for shadow)
V5:
make it be gerneic shadow bo support. Address Christian comments.
Chunming Zhou (8):
drm/amdgpu: add need backup function V2
drm/amdgpu: add direct submision option for copy_buffer
drm/amdgpu: sync bo and shadow V2
drm/amdgpu: update pd shadow while updating pd
drm/amdgpu: update pt shadow while updating pt V2
drm/amdgpu: link all shadow bo
drm/amdgpu: implement recovery vram bo from shadow
drm/amdgpu: recover vram bo from shadow after gpu reset
drivers/gpu/drm/amd/amdgpu/amdgpu.h | 9 ++-
drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c | 3 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 39 ++++++++++-
drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 94 ++++++++++++++++++++++++++-
drivers/gpu/drm/amd/amdgpu/amdgpu_object.h | 9 +++
drivers/gpu/drm/amd/amdgpu/amdgpu_test.c | 4 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 21 ++++--
drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 69 ++++++++++++++------
8 files changed, 215 insertions(+), 33 deletions(-)
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx