Re: [PATCH v3 1/4] drm/amdgpu: Fix KFD oversubscription by tracking queues correctly
Hi Alex,

This patch series went into amd-kfd-staging. I'd like to also push it
into amd-staging-4.11, as I'm just working to minimize any unnecessary
differences between the branches before the big KFD history rework. I
rebased it, resolved some conflicts, and removed the declaration of
get_mec_num from kfd_device_queue_manager.h.

Do you want me to push that rebased patch series?

Thanks,
  Felix

On 17-07-17 11:52 AM, Oded Gabbay wrote:
> On Fri, Jul 14, 2017 at 7:24 PM, Alex Deucher wrote:
>> On Thu, Jul 13, 2017 at 9:21 PM, Jay Cornwall wrote:
>>> The number of compute queues available to the KFD was erroneously
>>> calculated as 64. Only the first MEC can execute compute queues and
>>> it has 32 queue slots.
>>>
>>> This caused the oversubscription limit to be calculated incorrectly,
>>> leading to a missing chained runlist command at the end of an
>>> oversubscribed runlist.
>>>
>>> v2: Remove unused num_mec field to avoid duplicate logic
>>> v3: Separate num_mec removal into separate patches
>>>
>>> Change-Id: I9e7bba2cc1928b624e3eeb1edb06fdb602e5294f
>>> Signed-off-by: Jay Cornwall
>> Series is:
>> Reviewed-by: Alex Deucher
>>
> Hi Jay,
> Thanks for the patches, I applied them to amdkfd-fixes (after rebasing
> them over 4.13-rc1)
>
> Oded
>
>>> ---
>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
>>> index 7060daf..aa4006a 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
>>> @@ -140,7 +140,7 @@ void amdgpu_amdkfd_device_init(struct amdgpu_device *adev)
>>>                 /* According to linux/bitmap.h we shouldn't use
>>>                  * bitmap_clear if nbits is not compile time constant
>>>                  */
>>> -               last_valid_bit = adev->gfx.mec.num_mec
>>> +               last_valid_bit = 1 /* only first MEC can have compute queues */
>>>                                  * adev->gfx.mec.num_pipe_per_mec
>>>                                  * adev->gfx.mec.num_queue_per_pipe;
>>>                 for (i = last_valid_bit; i < KGD_MAX_QUEUES; ++i)
>>> --
>>> 2.7.4
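For illustration, the queue-slot arithmetic behind the fix as a minimal
sketch; num_mec = 2, 4 pipes per MEC and 8 queues per pipe are
assumptions consistent with the 64 vs. 32 figures in the commit message:

#include <stdio.h>

/* Sketch only: the oversubscription bound before and after the fix. */
int main(void)
{
    unsigned num_mec = 2, num_pipe_per_mec = 4, num_queue_per_pipe = 8;

    /* Old (wrong): every MEC counted -> 2 * 4 * 8 = 64 queue slots. */
    unsigned old_count = num_mec * num_pipe_per_mec * num_queue_per_pipe;

    /* New: only the first MEC can run compute -> 1 * 4 * 8 = 32 slots. */
    unsigned new_count = 1 * num_pipe_per_mec * num_queue_per_pipe;

    printf("old=%u new=%u\n", old_count, new_count); /* old=64 new=32 */
    return 0;
}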
[PATCH] drm/amdgpu: fix gfx fence allocate size
1) For SR-IOV, we need 8 dwords for the gfx fence due to CP behaviour.
2) Clean up wrong logic in wptr/rptr writeback alloc and free.

Change-Id: Ifbfed17a4621dae57244942ffac7de1743de0294
Signed-off-by: Monk Liu
Signed-off-by: Xiangliang Yu
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h        |  2 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 32 ++++++++++++++++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c   | 26 ++++++++++++-------
 3 files changed, 52 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index f6345b9..fe96236 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1191,7 +1191,9 @@ struct amdgpu_wb {
 int amdgpu_wb_get(struct amdgpu_device *adev, u32 *wb);
 void amdgpu_wb_free(struct amdgpu_device *adev, u32 wb);
 int amdgpu_wb_get_64bit(struct amdgpu_device *adev, u32 *wb);
+int amdgpu_wb_get_256Bit(struct amdgpu_device *adev, u32 *wb);
 void amdgpu_wb_free_64bit(struct amdgpu_device *adev, u32 wb);
+void amdgpu_wb_free_256bit(struct amdgpu_device *adev, u32 wb);

 void amdgpu_get_pcie_info(struct amdgpu_device *adev);

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 7e11190..6050804 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -603,6 +603,21 @@ int amdgpu_wb_get_64bit(struct amdgpu_device *adev, u32 *wb)
 	}
 }

+int amdgpu_wb_get_256Bit(struct amdgpu_device *adev, u32 *wb)
+{
+	int i = 0;
+	unsigned long offset = bitmap_find_next_zero_area_off(adev->wb.used,
+				adev->wb.num_wb, 0, 8, 63, 0);
+	if ((offset + 7) < adev->wb.num_wb) {
+		for (i = 0; i < 8; i++)
+			__set_bit(offset + i, adev->wb.used);
+		*wb = offset;
+		return 0;
+	} else {
+		return -EINVAL;
+	}
+}
+
 /**
  * amdgpu_wb_free - Free a wb entry
  *
@@ -634,6 +649,23 @@ void amdgpu_wb_free_64bit(struct amdgpu_device *adev, u32 wb)
 }

 /**
+ * amdgpu_wb_free_256bit - Free a wb entry
+ *
+ * @adev: amdgpu_device pointer
+ * @wb: wb index
+ *
+ * Free a wb slot allocated for use by the driver (all asics)
+ */
+void amdgpu_wb_free_256bit(struct amdgpu_device *adev, u32 wb)
+{
+	int i = 0;
+
+	if ((wb + 7) < adev->wb.num_wb)
+		for (i = 0; i < 8; i++)
+			__clear_bit(wb + i, adev->wb.used);
+}
+
+/**
  * amdgpu_vram_location - try to find VRAM location
  * @adev: amdgpu device structure holding all necessary informations
  * @mc: memory controller structure holding memory informations

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
index 75165e0..eea17ae 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
@@ -212,10 +212,19 @@ int amdgpu_ring_init(struct amdgpu_device *adev, struct amdgpu_ring *ring,
 	}

-	r = amdgpu_wb_get(adev, &ring->fence_offs);
-	if (r) {
-		dev_err(adev->dev, "(%d) ring fence_offs wb alloc failed\n", r);
-		return r;
+	if (amdgpu_sriov_vf(adev) && ring->funcs->type == AMDGPU_RING_TYPE_GFX) {
+		r = amdgpu_wb_get_256Bit(adev, &ring->fence_offs);
+		if (r) {
+			dev_err(adev->dev, "(%d) ring fence_offs wb alloc failed\n", r);
+			return r;
+		}
+
+	} else {
+		r = amdgpu_wb_get(adev, &ring->fence_offs);
+		if (r) {
+			dev_err(adev->dev, "(%d) ring fence_offs wb alloc failed\n", r);
+			return r;
+		}
 	}

 	r = amdgpu_wb_get(adev, &ring->cond_exe_offs);
@@ -278,17 +287,18 @@ void amdgpu_ring_fini(struct amdgpu_ring *ring)
 	ring->ready = false;

 	if (ring->funcs->support_64bit_ptrs) {
-		amdgpu_wb_free_64bit(ring->adev, ring->cond_exe_offs);
-		amdgpu_wb_free_64bit(ring->adev, ring->fence_offs);
 		amdgpu_wb_free_64bit(ring->adev, ring->rptr_offs);
 		amdgpu_wb_free_64bit(ring->adev, ring->wptr_offs);
 	} else {
-		amdgpu_wb_free(ring->adev, ring->cond_exe_offs);
-		amdgpu_wb_free(ring->adev, ring->fence_offs);
 		amdgpu_wb_free(ring->adev, ring->rptr_offs);
 		amdgpu_wb_free(ring->adev, ring->wptr_offs);
 	}

+	amdgpu_wb_free(ring->adev, ring->cond_exe_offs);
+	if (amdgpu_sriov_vf(ring->adev) && ring->funcs->type == AMDGPU_RING_TYPE_GFX)
+		amdgpu_wb_free_256bit(ring->adev, ring->fence_offs);
+	else
+		amdgpu_wb_free(ring->adev, ring->fence_offs);

 	amdgpu_bo_free_kernel(&ring->ring_obj, &ring->gpu_addr,
--
2.7.4
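To illustrate the allocation pattern the new helper relies on, a plain
user-space sketch (not the kernel implementation, which uses
bitmap_find_next_zero_area_off() and __set_bit() on a real bitmap):

#include <stdbool.h>
#include <stddef.h>

/* Claim 8 consecutive free writeback slots in a toy bitmap, mirroring
 * amdgpu_wb_get_256Bit(): 8 slots x 32 bits = 256 bits for the fence.
 * Returns the first slot index, or -1 if no 8-slot run is free. */
static int wb_get_8(bool *used, size_t num_wb)
{
    for (size_t base = 0; base + 8 <= num_wb; base++) {
        size_t i;

        for (i = 0; i < 8 && !used[base + i]; i++)
            ;
        if (i == 8) {               /* found a free run, claim it */
            for (i = 0; i < 8; i++)
                used[base + i] = true;
            return (int)base;
        }
    }
    return -1;                      /* no 8-slot run available */
}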
[PATCH] [rfc] radv: start moving semaphore support out of libdrm
From: Dave Airlie

This is a port of radv to the new low-level CS submission APIs for
libdrm that I submitted earlier. This moves a lot of the current
non-shared semaphore handling and chunk creation out of libdrm_amdgpu.

It provides a much simpler implementation without all the list
handling; I'm sure I can clean it up a lot further.

For now I've left the old code paths under the RADV_OLD_LIBDRM define
in this patch. I'd replace that with a version check, or just rip out
the whole lot, once we get a libdrm release with the new APIs in.
---
 src/amd/vulkan/winsys/amdgpu/radv_amdgpu_cs.c | 202 +++++++++++++++---
 1 file changed, 184 insertions(+), 18 deletions(-)

diff --git a/src/amd/vulkan/winsys/amdgpu/radv_amdgpu_cs.c b/src/amd/vulkan/winsys/amdgpu/radv_amdgpu_cs.c
index ffc7566..ce73b88 100644
--- a/src/amd/vulkan/winsys/amdgpu/radv_amdgpu_cs.c
+++ b/src/amd/vulkan/winsys/amdgpu/radv_amdgpu_cs.c
@@ -75,6 +75,10 @@ radv_amdgpu_cs(struct radeon_winsys_cs *base)
 	return (struct radv_amdgpu_cs*)base;
 }

+struct radv_amdgpu_sem_info {
+	int wait_sem_count;
+	struct radeon_winsys_sem **wait_sems;
+};
 static int ring_to_hw_ip(enum ring_type ring)
 {
 	switch (ring) {
@@ -89,6 +93,21 @@ static int ring_to_hw_ip(enum ring_type ring)
 	}
 }

+static void radv_amdgpu_wait_sems(struct radv_amdgpu_ctx *ctx,
+				  uint32_t ip_type,
+				  uint32_t ring,
+				  uint32_t sem_count,
+				  struct radeon_winsys_sem **_sem,
+				  struct radv_amdgpu_sem_info *sem_info);
+static int radv_amdgpu_signal_sems(struct radv_amdgpu_ctx *ctx,
+				   uint32_t ip_type,
+				   uint32_t ring,
+				   uint32_t sem_count,
+				   struct radeon_winsys_sem **_sem);
+static int radv_amdgpu_cs_submit(struct radv_amdgpu_ctx *ctx,
+				 struct amdgpu_cs_request *request,
+				 struct radv_amdgpu_sem_info *sem_info);
+
 static void radv_amdgpu_request_to_fence(struct radv_amdgpu_ctx *ctx,
 					 struct radv_amdgpu_fence *fence,
 					 struct amdgpu_cs_request *req)
@@ -647,6 +666,7 @@ static void radv_assign_last_submit(struct radv_amdgpu_ctx *ctx,

 static int radv_amdgpu_winsys_cs_submit_chained(struct radeon_winsys_ctx *_ctx,
 						int queue_idx,
+						struct radv_amdgpu_sem_info *sem_info,
 						struct radeon_winsys_cs **cs_array,
 						unsigned cs_count,
 						struct radeon_winsys_cs *initial_preamble_cs,
@@ -703,7 +723,7 @@ static int radv_amdgpu_winsys_cs_submit_chained(struct radeon_winsys_ctx *_ctx,
 		ibs[0] = ((struct radv_amdgpu_cs*)initial_preamble_cs)->ib;
 	}

-	r = amdgpu_cs_submit(ctx->ctx, 0, &request, 1);
+	r = radv_amdgpu_cs_submit(ctx, &request, sem_info);
 	if (r) {
 		if (r == -ENOMEM)
 			fprintf(stderr, "amdgpu: Not enough memory for command submission.\n");
@@ -724,6 +744,7 @@ static int radv_amdgpu_winsys_cs_submit_chained(struct radeon_winsys_ctx *_ctx,

 static int radv_amdgpu_winsys_cs_submit_fallback(struct radeon_winsys_ctx *_ctx,
 						 int queue_idx,
+						 struct radv_amdgpu_sem_info *sem_info,
 						 struct radeon_winsys_cs **cs_array,
 						 unsigned cs_count,
 						 struct radeon_winsys_cs *initial_preamble_cs,
@@ -775,7 +796,7 @@ static int radv_amdgpu_winsys_cs_submit_fallback(struct radeon_winsys_ctx *_ctx,
 		}
 	}

-	r = amdgpu_cs_submit(ctx->ctx, 0, &request, 1);
+	r = radv_amdgpu_cs_submit(ctx, &request, sem_info);
 	if (r) {
 		if (r == -ENOMEM)
 			fprintf(stderr, "amdgpu: Not enough memory for command submission.\n");
@@ -801,6 +822,7 @@ static int radv_amdgpu_winsys_cs_submit_fallback(struct radeon_winsys_ctx *_ctx,

 static int radv_amdgpu_winsys_cs_submit_sysmem(struct radeon_winsys_ctx *_ctx,
 					       int queue_idx,
+					       struct radv_amdgpu_sem_info *sem_info,
 					       struct radeon_winsys_cs **cs_array,
 					       unsigned cs_count,
 					       struct radeon_winsys_cs *initial_pr
[PATCH 2/2] drm/amdgpu: Implement ttm_bo_driver.access_memory callback v2
Allows gdb to access contents of user mode mapped VRAM BOs.

v2: return error for non-VRAM pools

Signed-off-by: Felix Kuehling
Reviewed-by: Michel Dänzer
Acked-by: Christian König
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 62 +++++++++++++++++++++++++
 1 file changed, 62 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index ff5614b..4d2a454 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -1115,6 +1115,67 @@ static bool amdgpu_ttm_bo_eviction_valuable(struct ttm_buffer_object *bo,
 	return ttm_bo_eviction_valuable(bo, place);
 }

+static int amdgpu_ttm_access_memory(struct ttm_buffer_object *bo,
+				    unsigned long offset,
+				    void *buf, int len, int write)
+{
+	struct amdgpu_bo *abo = container_of(bo, struct amdgpu_bo, tbo);
+	struct amdgpu_device *adev = amdgpu_ttm_adev(abo->tbo.bdev);
+	struct drm_mm_node *nodes = abo->tbo.mem.mm_node;
+	uint32_t value = 0;
+	int ret = 0;
+	uint64_t pos;
+	unsigned long flags;
+
+	if (bo->mem.mem_type != TTM_PL_VRAM)
+		return -EIO;
+
+	while (offset >= (nodes->size << PAGE_SHIFT)) {
+		offset -= nodes->size << PAGE_SHIFT;
+		++nodes;
+	}
+	pos = (nodes->start << PAGE_SHIFT) + offset;
+
+	while (len && pos < adev->mc.mc_vram_size) {
+		uint64_t aligned_pos = pos & ~(uint64_t)3;
+		uint32_t bytes = 4 - (pos & 3);
+		uint32_t shift = (pos & 3) * 8;
+		uint32_t mask = 0xffffffff << shift;
+
+		if (len < bytes) {
+			mask &= 0xffffffff >> (bytes - len) * 8;
+			bytes = len;
+		}
+
+		spin_lock_irqsave(&adev->mmio_idx_lock, flags);
+		WREG32(mmMM_INDEX, ((uint32_t)aligned_pos) | 0x80000000);
+		WREG32(mmMM_INDEX_HI, aligned_pos >> 31);
+		if (!write || mask != 0xffffffff)
+			value = RREG32(mmMM_DATA);
+		if (write) {
+			value &= ~mask;
+			value |= (*(uint32_t *)buf << shift) & mask;
+			WREG32(mmMM_DATA, value);
+		}
+		spin_unlock_irqrestore(&adev->mmio_idx_lock, flags);
+		if (!write) {
+			value = (value & mask) >> shift;
+			memcpy(buf, &value, bytes);
+		}
+
+		ret += bytes;
+		buf = (uint8_t *)buf + bytes;
+		pos += bytes;
+		len -= bytes;
+		if (pos >= (nodes->start + nodes->size) << PAGE_SHIFT) {
+			++nodes;
+			pos = (nodes->start << PAGE_SHIFT);
+		}
+	}
+
+	return ret;
+}
+
 static struct ttm_bo_driver amdgpu_bo_driver = {
 	.ttm_tt_create = &amdgpu_ttm_tt_create,
 	.ttm_tt_populate = &amdgpu_ttm_tt_populate,
@@ -1130,6 +1191,7 @@ static bool amdgpu_ttm_bo_eviction_valuable(struct ttm_buffer_object *bo,
 	.io_mem_reserve = &amdgpu_ttm_io_mem_reserve,
 	.io_mem_free = &amdgpu_ttm_io_mem_free,
 	.io_mem_pfn = amdgpu_ttm_io_mem_pfn,
+	.access_memory = &amdgpu_ttm_access_memory
 };

 int amdgpu_ttm_init(struct amdgpu_device *adev)
--
1.9.1
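For context, the MM_INDEX/MM_DATA access pattern the new callback uses,
as a standalone sketch; write_reg()/read_reg() are hypothetical
stand-ins for WREG32()/RREG32(), and the register offsets are
placeholders, but the bit-31 flag and the MM_INDEX_HI split follow the
patch above:

#include <stdint.h>

/* Hypothetical register accessors standing in for WREG32()/RREG32(). */
void write_reg(uint32_t reg, uint32_t val);
uint32_t read_reg(uint32_t reg);

enum { mmMM_INDEX = 0, mmMM_INDEX_HI = 1, mmMM_DATA = 2 }; /* placeholders */

/* Read one aligned dword of VRAM through the indirect MMIO aperture:
 * bit 31 of MM_INDEX selects the aperture, MM_INDEX_HI carries the
 * high bits of the byte address, MM_DATA returns the value. */
static uint32_t vram_read32(uint64_t byte_addr)
{
    write_reg(mmMM_INDEX, ((uint32_t)byte_addr) | 0x80000000);
    write_reg(mmMM_INDEX_HI, byte_addr >> 31);
    return read_reg(mmMM_DATA);
}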
[PATCH 1/2] drm/ttm: Implement vm_operations_struct.access v2
Allows gdb to access contents of user mode mapped BOs. System memory
is handled by TTM using kmap. Other memory pools require a new driver
callback in ttm_bo_driver.

v2:
* kmap only one page at a time
* swap in BO if needed
* make driver callback more generic to handle private memory pools
* document callback return value
* WARN_ON -> WARN_ON_ONCE

Signed-off-by: Felix Kuehling
---
 drivers/gpu/drm/ttm/ttm_bo_vm.c | 79 ++++++++++++++++++++++++++++++++-
 include/drm/ttm/ttm_bo_driver.h | 17 ++++++++
 2 files changed, 95 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c
index 9f53df9..945985e 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_vm.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c
@@ -294,10 +294,87 @@ static void ttm_bo_vm_close(struct vm_area_struct *vma)
 	vma->vm_private_data = NULL;
 }

+static int ttm_bo_vm_access_kmap(struct ttm_buffer_object *bo,
+				 unsigned long offset,
+				 void *buf, int len, int write)
+{
+	unsigned long page = offset >> PAGE_SHIFT;
+	unsigned long bytes_left = len;
+	int ret;
+
+	/* Copy a page at a time, that way no extra virtual address
+	 * mapping is needed
+	 */
+	offset -= page << PAGE_SHIFT;
+	do {
+		unsigned long bytes = min(bytes_left, PAGE_SIZE - offset);
+		struct ttm_bo_kmap_obj map;
+		void *ptr;
+		bool is_iomem;
+
+		ret = ttm_bo_kmap(bo, page, 1, &map);
+		if (ret)
+			return ret;
+
+		ptr = (uint8_t *)ttm_kmap_obj_virtual(&map, &is_iomem) + offset;
+		WARN_ON_ONCE(is_iomem);
+		if (write)
+			memcpy(ptr, buf, bytes);
+		else
+			memcpy(buf, ptr, bytes);
+		ttm_bo_kunmap(&map);
+
+		page++;
+		bytes_left -= bytes;
+		offset = 0;
+	} while (bytes_left);
+
+	return len;
+}
+
+static int ttm_bo_vm_access(struct vm_area_struct *vma, unsigned long addr,
+			    void *buf, int len, int write)
+{
+	unsigned long offset = (addr) - vma->vm_start;
+	struct ttm_buffer_object *bo = vma->vm_private_data;
+	int ret;
+
+	if (len < 1 || (offset + len) >> PAGE_SHIFT > bo->num_pages)
+		return -EIO;
+
+	ret = ttm_bo_reserve(bo, true, false, NULL);
+	if (ret)
+		return ret;
+
+	switch (bo->mem.mem_type) {
+	case TTM_PL_SYSTEM:
+		if (unlikely(bo->ttm->page_flags & TTM_PAGE_FLAG_SWAPPED)) {
+			ret = ttm_tt_swapin(bo->ttm);
+			if (unlikely(ret != 0))
+				return ret;
+		}
+		/* fall through */
+	case TTM_PL_TT:
+		ret = ttm_bo_vm_access_kmap(bo, offset, buf, len, write);
+		break;
+	default:
+		if (bo->bdev->driver->access_memory)
+			ret = bo->bdev->driver->access_memory(
+				bo, offset, buf, len, write);
+		else
+			ret = -EIO;
+	}
+
+	ttm_bo_unreserve(bo);
+
+	return ret;
+}
+
 static const struct vm_operations_struct ttm_bo_vm_ops = {
 	.fault = ttm_bo_vm_fault,
 	.open = ttm_bo_vm_open,
-	.close = ttm_bo_vm_close
+	.close = ttm_bo_vm_close,
+	.access = ttm_bo_vm_access
 };

 static struct ttm_buffer_object *ttm_bo_vm_lookup(struct ttm_bo_device *bdev,
diff --git a/include/drm/ttm/ttm_bo_driver.h b/include/drm/ttm/ttm_bo_driver.h
index 6bbd34d..04380ba 100644
--- a/include/drm/ttm/ttm_bo_driver.h
+++ b/include/drm/ttm/ttm_bo_driver.h
@@ -471,6 +471,23 @@ struct ttm_bo_driver {
 	 */
 	unsigned long (*io_mem_pfn)(struct ttm_buffer_object *bo,
 				    unsigned long page_offset);
+
+	/**
+	 * Read/write memory buffers for ptrace access
+	 *
+	 * @bo: the BO to access
+	 * @offset: the offset from the start of the BO
+	 * @buf: pointer to source/destination buffer
+	 * @len: number of bytes to copy
+	 * @write: whether to read (0) from or write (non-0) to BO
+	 *
+	 * If successful, this function should return the number of
+	 * bytes copied, -EIO otherwise. If the number of bytes
+	 * returned is < len, the function may be called again with
+	 * the remainder of the buffer to copy.
+	 */
+	int (*access_memory)(struct ttm_buffer_object *bo, unsigned long offset,
+			     void *buf, int len, int write);
 };

 /**
--
1.9.1
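The return-value contract documented above implies callers loop over
partial copies. A minimal sketch of such a caller; access_fn is a
hypothetical stand-in for the access_memory callback:

/* Drive an access callback that may copy fewer than len bytes per
 * call (returns bytes copied, or negative on error), as the comment
 * above allows. Returns 0 once everything is copied. */
static int access_all(int (*access_fn)(unsigned long off, void *buf,
                                       int len, int write),
                      unsigned long off, char *buf, int len, int write)
{
    while (len > 0) {
        int done = access_fn(off, buf, len, write);

        if (done <= 0)
            return done ? done : -1;    /* error or no progress */
        off += done;
        buf += done;
        len -= done;
    }
    return 0;
}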
[PATCH libdrm] drm/amdgpu: add new low overhead command submission API. (v2)
From: Dave Airlie

This just sends chunks to the kernel API for a single command stream.
This should provide a more future proof and extensible API for
command submission.

v2: use amdgpu_bo_list_handle, add two helper functions to access bo
and context internals.

Signed-off-by: Dave Airlie
---
 amdgpu/amdgpu.h    | 30 ++++++++++++++++++++++
 amdgpu/amdgpu_cs.c | 47 ++++++++++++++++++++++++++++++++++
 2 files changed, 77 insertions(+)

diff --git a/amdgpu/amdgpu.h b/amdgpu/amdgpu.h
index 183f974..238b1aa 100644
--- a/amdgpu/amdgpu.h
+++ b/amdgpu/amdgpu.h
@@ -1382,6 +1382,36 @@ int amdgpu_cs_import_syncobj(amdgpu_device_handle dev,
 			     int shared_fd,
 			     uint32_t *syncobj);

+/**
+ * Submit raw command submission to kernel
+ *
+ * \param   dev            - \c [in] device handle
+ * \param   context        - \c [in] context handle for context id
+ * \param   bo_list_handle - \c [in] request bo list handle (0 for none)
+ * \param   num_chunks     - \c [in] number of CS chunks to submit
+ * \param   chunks         - \c [in] array of CS chunks
+ * \param   seq_no         - \c [out] output sequence number for submission.
+ *
+ * \return  0 on success\n
+ *          <0 - Negative POSIX Error code
+ */
+struct drm_amdgpu_cs_chunk;
+struct drm_amdgpu_cs_chunk_dep;
+struct drm_amdgpu_cs_chunk_data;
+
+int amdgpu_cs_submit_raw(amdgpu_device_handle dev,
+			 amdgpu_context_handle context,
+			 amdgpu_bo_list_handle bo_list_handle,
+			 int num_chunks,
+			 struct drm_amdgpu_cs_chunk *chunks,
+			 uint64_t *seq_no);
+
+void amdgpu_cs_chunk_fence_to_dep(struct amdgpu_cs_fence *fence,
+				  struct drm_amdgpu_cs_chunk_dep *dep);
+void amdgpu_cs_chunk_fence_info_to_data(struct amdgpu_cs_fence_info *fence_info,
+					struct drm_amdgpu_cs_chunk_data *data);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/amdgpu/amdgpu_cs.c b/amdgpu/amdgpu_cs.c
index 722fd75..dfba875 100644
--- a/amdgpu/amdgpu_cs.c
+++ b/amdgpu/amdgpu_cs.c
@@ -634,3 +634,50 @@ int amdgpu_cs_import_syncobj(amdgpu_device_handle dev,

 	return drmSyncobjFDToHandle(dev->fd, shared_fd, handle);
 }
+
+int amdgpu_cs_submit_raw(amdgpu_device_handle dev,
+			 amdgpu_context_handle context,
+			 amdgpu_bo_list_handle bo_list_handle,
+			 int num_chunks,
+			 struct drm_amdgpu_cs_chunk *chunks,
+			 uint64_t *seq_no)
+{
+	union drm_amdgpu_cs cs = {0};
+	uint64_t *chunk_array;
+	int i, r;
+
+	if (num_chunks == 0)
+		return -EINVAL;
+
+	chunk_array = alloca(sizeof(uint64_t) * num_chunks);
+	for (i = 0; i < num_chunks; i++)
+		chunk_array[i] = (uint64_t)(uintptr_t)&chunks[i];
+	cs.in.chunks = (uint64_t)(uintptr_t)chunk_array;
+	cs.in.ctx_id = context->id;
+	cs.in.bo_list_handle = bo_list_handle ? bo_list_handle->handle : 0;
+	cs.in.num_chunks = num_chunks;
+	r = drmCommandWriteRead(dev->fd, DRM_AMDGPU_CS,
+				&cs, sizeof(cs));
+	if (r)
+		return r;
+
+	if (seq_no)
+		*seq_no = cs.out.handle;
+	return 0;
+}
+
+void amdgpu_cs_chunk_fence_info_to_data(struct amdgpu_cs_fence_info *fence_info,
+					struct drm_amdgpu_cs_chunk_data *data)
+{
+	data->fence_data.handle = fence_info->handle->handle;
+	data->fence_data.offset = fence_info->offset * sizeof(uint64_t);
+}
+
+void amdgpu_cs_chunk_fence_to_dep(struct amdgpu_cs_fence *fence,
+				  struct drm_amdgpu_cs_chunk_dep *dep)
+{
+	dep->ip_type = fence->ip_type;
+	dep->ip_instance = fence->ip_instance;
+	dep->ring = fence->ring;
+	dep->ctx_id = fence->context->id;
+	dep->handle = fence->fence;
+}
--
2.9.4
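A hedged usage sketch of the new entry point, submitting one IB chunk
on the GFX ring. The chunk structures are the existing amdgpu kernel
UAPI; the wrapper function and its parameters are made up for
illustration:

#include <stdint.h>
#include <amdgpu.h>
#include <amdgpu_drm.h>

/* Illustrative only: submit a single IB through amdgpu_cs_submit_raw().
 * dev, ctx, bo_list and the IB's GPU virtual address are assumed to
 * have been set up by the caller as before. */
static int submit_one_ib(amdgpu_device_handle dev,
                         amdgpu_context_handle ctx,
                         amdgpu_bo_list_handle bo_list,
                         uint64_t ib_va, uint32_t ib_size_dw,
                         uint64_t *seq_no)
{
    struct drm_amdgpu_cs_chunk_ib ib = {
        .va_start = ib_va,
        .ib_bytes = ib_size_dw * 4,
        .ip_type = AMDGPU_HW_IP_GFX,
    };
    struct drm_amdgpu_cs_chunk chunk = {
        .chunk_id = AMDGPU_CHUNK_ID_IB,
        .length_dw = sizeof(ib) / 4,
        .chunk_data = (uint64_t)(uintptr_t)&ib,
    };

    return amdgpu_cs_submit_raw(dev, ctx, bo_list, 1, &chunk, seq_no);
}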
RE: [PATCH] drm/amdgpu: read reg in each iterate of psp_wait_for loop
> -----Original Message-----
> From: Junwei Zhang [mailto:jerry.zh...@amd.com]
> Sent: Monday, July 17, 2017 10:54 PM
> To: amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander; Huang, Ray; gre...@linuxfoundation.org; Zhang,
> Jerry; sta...@vger.kernel.org
> Subject: [PATCH] drm/amdgpu: read reg in each iterate of psp_wait_for loop
>
> From: "Zhang, Jerry"
>
> v2: fixes the SOS loading failure for PSP v3.1
>
> Signed-off-by: Junwei Zhang
> Cc: sta...@vger.kernel.org
> Acked-by: Alex Deucher (v1)
> Acked-by: Huang Rui (v1)

Reviewed-by: Alex Deucher

> [snip]
Re: [PATCH libdrm] libdrm_amdgpu: add kernel semaphore support
On 18 July 2017 at 03:02, Christian König wrote:
> Am 17.07.2017 um 05:36 schrieb Dave Airlie:
>>>
>>> I can take a look at it, I just won't have time until next week most
>>> likely.
>>
>> I've taken a look, and it's seemingly more complicated than I'm
>> expecting I'd want to land in Mesa before 17.2 ships; I'd really
>> prefer to just push the new libdrm_amdgpu API from this patch. If I
>> have to port all the current radv code to the new API, I'll most
>> definitely get something wrong.
>>
>> Adding the new API so far looks like
>> https://cgit.freedesktop.org/~airlied/drm/log/?h=drm-amdgpu-cs-submit-raw
>> with
>> https://cgit.freedesktop.org/~airlied/drm/commit/?h=drm-amdgpu-cs-submit-raw&id=e7f85d0ca617fa41e72624780c9035df132e23c4
>> being the API, and whether it should take a uint32_t context id or
>> context handle left as an open question in the last patch in the
>> series.
>
> I would stick with the context handle; as far as I can see there isn't
> any value in using the uint32_t for this.
>
> We just want to be able to send arbitrary chunks down into the kernel
> without libdrm_amdgpu involvement and/or the associated overhead of
> the extra loop and the semaphore handling.
>
> So your "amdgpu/cs: add new raw cs submission interface just taking
> chunks" patch looks fine to me as far as I can tell.
>
> As far as I can see the "amdgpu: refactor semaphore handling" patch is
> actually incorrect. We must hold the mutex while sending the CS down
> to the kernel, or otherwise "context->last_seq" won't be accurate.
>
>> However to hook this into radv or radeonsi will take a bit of
>> rewriting of a lot of code that is probably a bit more fragile than
>> I'd like for this sort of surgery at this point.
>
> Again, I can move over the existing Mesa stuff if you like.
>
>> I'd actually suspect if we do want to proceed with this type of
>> interface, we might be better doing it all in common Mesa code, and
>> maybe bypassing libdrm_amdgpu altogether, which I suppose the API
>> I've written here is mostly already doing.
>
> I want to stick with the other interfaces for now. No need to make it
> more complicated than it already is.
>
> Only the CS stuff is the really performance-critical thing we have
> right now.

As I suspected, this plan is full of traps. With the raw CS API I
posted (using amdgpu_bo_list_handle instead), I ran into two places
where the abstraction cuts me:

  CC       winsys/amdgpu/radv_amdgpu_cs.lo
winsys/amdgpu/radv_amdgpu_cs.c: In function 'radv_amdgpu_cs_submit':
winsys/amdgpu/radv_amdgpu_cs.c:1173:63: error: dereferencing pointer to incomplete type 'struct amdgpu_bo'
  chunk_data[i].fence_data.handle = request->fence_info.handle->handle;
                                                               ^~
winsys/amdgpu/radv_amdgpu_cs.c:1193:31: error: dereferencing pointer to incomplete type 'struct amdgpu_context'
  dep->ctx_id = info->context->id;

In order to do the user fence chunk I need the actual bo handle, not
the amdgpu-wrapped one, and we don't have an accessor method for that.
In order to do the dependency chunks, I need a context id.

Now I suppose I can add chunk creation helpers to libdrm, but it does
seem like it breaks the future-proof interface if we can't access the
details of a bunch of objects we want to pass through to the kernel
API.

Dave.
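For reference, the v2 helpers in the raw-CS patch above remove exactly
these two dereferences. A minimal sketch of the replacement code; the
fence_info/wait_fence variables are assumed to come from the radv
winsys as before:

#include <amdgpu.h>
#include <amdgpu_drm.h>

/* With the v2 accessors, the winsys no longer touches struct amdgpu_bo
 * or struct amdgpu_context internals directly. */
static void fill_fence_chunks(struct amdgpu_cs_fence_info *fence_info,
                              struct amdgpu_cs_fence *wait_fence,
                              struct drm_amdgpu_cs_chunk_data *data,
                              struct drm_amdgpu_cs_chunk_dep *dep)
{
    /* was: data->fence_data.handle = fence_info->handle->handle; */
    amdgpu_cs_chunk_fence_info_to_data(fence_info, data);

    /* was: dep->ctx_id = wait_fence->context->id; */
    amdgpu_cs_chunk_fence_to_dep(wait_fence, dep);
}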
[PATCH] drm/amdgpu: read reg in each iterate of psp_wait_for loop
From: "Zhang, Jerry"

v2: fixes the SOS loading failure for PSP v3.1

Signed-off-by: Junwei Zhang
Cc: sta...@vger.kernel.org
Acked-by: Alex Deucher (v1)
Acked-by: Huang Rui (v1)
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 3 +--
 drivers/gpu/drm/amd/amdgpu/psp_v3_1.c   | 2 --
 2 files changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
index c919579..644941d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
@@ -98,9 +98,8 @@ int psp_wait_for(struct psp_context *psp, uint32_t reg_index,
 	int i;
 	struct amdgpu_device *adev = psp->adev;

-	val = RREG32(reg_index);
-
 	for (i = 0; i < adev->usec_timeout; i++) {
+		val = RREG32(reg_index);
 		if (check_changed) {
 			if (val != reg_val)
 				return 0;
diff --git a/drivers/gpu/drm/amd/amdgpu/psp_v3_1.c b/drivers/gpu/drm/amd/amdgpu/psp_v3_1.c
index 2718e86..23106e3 100644
--- a/drivers/gpu/drm/amd/amdgpu/psp_v3_1.c
+++ b/drivers/gpu/drm/amd/amdgpu/psp_v3_1.c
@@ -237,11 +237,9 @@ int psp_v3_1_bootloader_load_sos(struct psp_context *psp)

 	/* there might be handshake issue with hardware which needs delay */
 	mdelay(20);
-#if 0
 	ret = psp_wait_for(psp, SOC15_REG_OFFSET(MP0, 0, mmMP0_SMN_C2PMSG_81),
 			   RREG32_SOC15(MP0, 0, mmMP0_SMN_C2PMSG_81),
 			   0, true);
-#endif

 	return ret;
 }
--
1.9.1
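The bug class being fixed, reduced to a generic sketch; reg_read is a
stand-in for RREG32(). The point is that the register read must happen
inside the loop, otherwise a stale value is compared forever:

#include <stdint.h>

/* Poll a register until (value & mask) == want or the timeout expires.
 * Returns 0 on success, -1 on timeout. */
static int wait_for_value(uint32_t (*reg_read)(void),
                          uint32_t want, uint32_t mask, int timeout_us)
{
    for (int i = 0; i < timeout_us; i++) {
        uint32_t val = reg_read();      /* re-read every iteration */

        if ((val & mask) == want)
            return 0;
        /* udelay(1) in the kernel; nanosleep() in user space */
    }
    return -1;  /* timed out */
}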
Re: [PATCH] drm/amdgpu: read reg in each iterate of psp_wait_for loop
On 07/17/2017 07:45 PM, Huang Rui wrote:
> On Mon, Jul 17, 2017 at 06:57:41PM +0800, Greg KH wrote:
>> On Mon, Jul 17, 2017 at 04:56:26PM +0800, Zhang, Jerry (Junwei) wrote:
>>> + sta...@vger.kernel.org
>>
>> This is not the correct way to submit patches for inclusion in the
>> stable kernel tree. Please read:
>> https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html
>> for how to do this properly.
>>
>> Thanks,
>> Greg. :-)
>
> Thanks, Greg, for the reminder.
>
> BTW: please add the Cc: in your patch; it needs to be backported to
> the stable tree.
>
> Jerry, I might not have described it clearly. We need to follow the
> rule that Greg provided. Actually, I meant to add the Cc in your
> commit message like below, then send it out:

Thanks for explaining in detail. I will prepare it as one patch again.

Jerry

> 8<--------------------------------------------------------------
> Subject: [PATCH] drm/amdgpu: read reg in each iterate of psp_wait_for loop
>
> This fixes the SOS loading failure of psp v3.1.
>
> Signed-off-by: Junwei Zhang
> Cc: sta...@vger.kernel.org
> 8<--------------------------------------------------------------
>
> And you'd better squeeze the two patches into one (actually it's only
> one fix) to make backporting smoother.
>
> Thanks,
> Rui
Re: [PATCH libdrm 1/2] drm/amdgpu: add syncobj create/destroy/import/export apis
On 2017-07-18 08:48, Dave Airlie wrote:
> From: Dave Airlie
>
> These are just wrappers using the amdgpu device handle.
>
> Signed-off-by: Dave Airlie

Acked-by: Chunming Zhou

> ---
>  amdgpu/amdgpu.h    | 55 +++++++++++++++++++++++++++++++++++++++++-
>  amdgpu/amdgpu_cs.c | 38 ++++++++++++++++++++++++++++
>  2 files changed, 92 insertions(+), 1 deletion(-)
>
> [snip]
Re: [PATCH libdrm] libdrm_amdgpu: add kernel semaphore support
On 2017-07-18 01:35, Christian König wrote:
> Am 17.07.2017 um 19:22 schrieb Marek Olšák:
>> On Sun, Jul 16, 2017 at 11:36 PM, Dave Airlie wrote:
>>>> I can take a look at it, I just won't have time until next week
>>>> most likely.
>>>
>>> I've taken a look, and it's seemingly more complicated than I'm
>>> expecting I'd want to land in Mesa before 17.2 ships; I'd really
>>> prefer to just push the new libdrm_amdgpu API from this patch. If I
>>> have to port all the current radv code to the new API, I'll most
>>> definitely get something wrong.
>>>
>>> Adding the new API so far looks like
>>> https://cgit.freedesktop.org/~airlied/drm/log/?h=drm-amdgpu-cs-submit-raw
>>> with
>>> https://cgit.freedesktop.org/~airlied/drm/commit/?h=drm-amdgpu-cs-submit-raw&id=e7f85d0ca617fa41e72624780c9035df132e23c4
>>> being the API, and whether it should take a uint32_t context id or
>>> context handle left as an open question in the last patch in the
>>> series.
>>>
>>> However to hook this into radv or radeonsi will take a bit of
>>> rewriting of a lot of code that is probably a bit more fragile than
>>> I'd like for this sort of surgery at this point.
>>>
>>> I'd actually suspect if we do want to proceed with this type of
>>> interface, we might be better doing it all in common Mesa code, and
>>> maybe bypassing libdrm_amdgpu altogether, which I suppose the API
>>> I've written here is mostly already doing.
>>
>> Well, we plan to stop using the BO list ioctl. The interface has
>> bo_list_handle in it. Will we just set it to 0 when we add the chunk
>> for the inlined buffer list, i.e. what radeon has?
>
> Yeah, exactly that was my thinking as well.

Just one thought: could we remove and not use the bo list at all?
Instead, we could expose an API like amdgpu_bo_make_resident, with
proper privilege, to user mode. That way, we would obviously shorten
the CS ioctl.

David Zhou

> Christian.
Re: [PATCH 2/2] drm/amdgpu: enable huge page handling in the VM v4
Still holding on? I thought this patch was pushed in earlier with my RB.

Regards,
David Zhou

On 2017-07-18 05:02, Christian König wrote:
> From: Christian König
>
> The hardware can use huge pages to map 2MB of address space with only
> one PDE.
>
> v2: few cleanups and rebased
> v3: skip PT updates if we are using the PDE
> v4: rebased, added support for CPU based updates
>
> Signed-off-by: Christian König
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 119 +++++++++++++++++++----
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h |   4 ++
>  2 files changed, 103 insertions(+), 20 deletions(-)
>
> [snip]
[PATCH libdrm 2/2] drm/amdgpu: add new low overhead command submission API.
From: Dave Airlie

This just sends chunks to the kernel API for a single command stream.
This should provide a more future proof and extensible API for
command submission.

Signed-off-by: Dave Airlie
---
 amdgpu/amdgpu.h    | 21 +++++++++++++++++
 amdgpu/amdgpu_cs.c | 30 ++++++++++++++++++++++
 2 files changed, 51 insertions(+)

diff --git a/amdgpu/amdgpu.h b/amdgpu/amdgpu.h
index 183f974..b4a070d 100644
--- a/amdgpu/amdgpu.h
+++ b/amdgpu/amdgpu.h
@@ -1382,6 +1382,27 @@ int amdgpu_cs_import_syncobj(amdgpu_device_handle dev,
 			     int shared_fd,
 			     uint32_t *syncobj);

+/**
+ * Submit raw command submission to kernel
+ *
+ * \param   dev            - \c [in] device handle
+ * \param   context        - \c [in] context handle for context id
+ * \param   bo_list_handle - \c [in] request bo list handle (0 for none)
+ * \param   num_chunks     - \c [in] number of CS chunks to submit
+ * \param   chunks         - \c [in] array of CS chunks
+ * \param   seq_no         - \c [out] output sequence number for submission.
+ *
+ * \return  0 on success\n
+ *          <0 - Negative POSIX Error code
+ */
+struct drm_amdgpu_cs_chunk;
+int amdgpu_cs_submit_raw(amdgpu_device_handle dev,
+			 amdgpu_context_handle context,
+			 uint32_t bo_list_handle,
+			 int num_chunks,
+			 struct drm_amdgpu_cs_chunk *chunks,
+			 uint64_t *seq_no);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/amdgpu/amdgpu_cs.c b/amdgpu/amdgpu_cs.c
index 722fd75..3c32070 100644
--- a/amdgpu/amdgpu_cs.c
+++ b/amdgpu/amdgpu_cs.c
@@ -634,3 +634,33 @@ int amdgpu_cs_import_syncobj(amdgpu_device_handle dev,

 	return drmSyncobjFDToHandle(dev->fd, shared_fd, handle);
 }
+
+int amdgpu_cs_submit_raw(amdgpu_device_handle dev,
+			 amdgpu_context_handle context,
+			 uint32_t bo_list_handle,
+			 int num_chunks,
+			 struct drm_amdgpu_cs_chunk *chunks,
+			 uint64_t *seq_no)
+{
+	union drm_amdgpu_cs cs = {0};
+	uint64_t *chunk_array;
+	int i, r;
+
+	if (num_chunks == 0)
+		return -EINVAL;
+
+	chunk_array = alloca(sizeof(uint64_t) * num_chunks);
+	for (i = 0; i < num_chunks; i++)
+		chunk_array[i] = (uint64_t)(uintptr_t)&chunks[i];
+	cs.in.chunks = (uint64_t)(uintptr_t)chunk_array;
+	cs.in.ctx_id = context->id;
+	cs.in.bo_list_handle = bo_list_handle;
+	cs.in.num_chunks = num_chunks;
+	r = drmCommandWriteRead(dev->fd, DRM_AMDGPU_CS,
+				&cs, sizeof(cs));
+	if (r)
+		return r;
+
+	if (seq_no)
+		*seq_no = cs.out.handle;
+	return 0;
+}
--
2.9.4
[PATCH libdrm 1/2] drm/amdgpu: add syncobj create/destroy/import/export apis
From: Dave Airlie

These are just wrappers using the amdgpu device handle.

Signed-off-by: Dave Airlie
---
 amdgpu/amdgpu.h    | 55 +++++++++++++++++++++++++++++++++++++++++-
 amdgpu/amdgpu_cs.c | 38 ++++++++++++++++++++++++++++
 2 files changed, 92 insertions(+), 1 deletion(-)

diff --git a/amdgpu/amdgpu.h b/amdgpu/amdgpu.h
index 1901fa8..183f974 100644
--- a/amdgpu/amdgpu.h
+++ b/amdgpu/amdgpu.h
@@ -1328,8 +1328,61 @@ int amdgpu_cs_destroy_semaphore(amdgpu_semaphore_handle sem);
  */
 const char *amdgpu_get_marketing_name(amdgpu_device_handle dev);

+/**
+ * Create kernel sync object
+ *
+ * \param   dev     - \c [in]  device handle
+ * \param   syncobj - \c [out] sync object handle
+ *
+ * \return  0 on success\n
+ *          <0 - Negative POSIX Error code
+ */
+int amdgpu_cs_create_syncobj(amdgpu_device_handle dev,
+			     uint32_t *syncobj);
+
+/**
+ * Destroy kernel sync object
+ *
+ * \param   dev     - \c [in] device handle
+ * \param   syncobj - \c [in] sync object handle
+ *
+ * \return  0 on success\n
+ *          <0 - Negative POSIX Error code
+ */
+int amdgpu_cs_destroy_syncobj(amdgpu_device_handle dev,
+			      uint32_t syncobj);
+
+/**
+ * Export kernel sync object to shareable fd.
+ *
+ * \param   dev       - \c [in]  device handle
+ * \param   syncobj   - \c [in]  sync object handle
+ * \param   shared_fd - \c [out] shared file descriptor.
+ *
+ * \return  0 on success\n
+ *          <0 - Negative POSIX Error code
+ */
+int amdgpu_cs_export_syncobj(amdgpu_device_handle dev,
+			     uint32_t syncobj,
+			     int *shared_fd);
+
+/**
+ * Import kernel sync object from shareable fd.
+ *
+ * \param   dev       - \c [in]  device handle
+ * \param   shared_fd - \c [in]  shared file descriptor.
+ * \param   syncobj   - \c [out] sync object handle
+ *
+ * \return  0 on success\n
+ *          <0 - Negative POSIX Error code
+ */
+int amdgpu_cs_import_syncobj(amdgpu_device_handle dev,
+			     int shared_fd,
+			     uint32_t *syncobj);
+
 #ifdef __cplusplus
 }
 #endif
-
 #endif /* #ifdef _AMDGPU_H_ */
diff --git a/amdgpu/amdgpu_cs.c b/amdgpu/amdgpu_cs.c
index 868eb7b..722fd75 100644
--- a/amdgpu/amdgpu_cs.c
+++ b/amdgpu/amdgpu_cs.c
@@ -596,3 +596,41 @@ int amdgpu_cs_destroy_semaphore(amdgpu_semaphore_handle sem)
 {
 	return amdgpu_cs_unreference_sem(sem);
 }
+
+int amdgpu_cs_create_syncobj(amdgpu_device_handle dev,
+			     uint32_t *handle)
+{
+	if (NULL == dev)
+		return -EINVAL;
+
+	return drmSyncobjCreate(dev->fd, 0, handle);
+}
+
+int amdgpu_cs_destroy_syncobj(amdgpu_device_handle dev,
+			      uint32_t handle)
+{
+	if (NULL == dev)
+		return -EINVAL;
+
+	return drmSyncobjDestroy(dev->fd, handle);
+}
+
+int amdgpu_cs_export_syncobj(amdgpu_device_handle dev,
+			     uint32_t handle,
+			     int *shared_fd)
+{
+	if (NULL == dev)
+		return -EINVAL;
+
+	return drmSyncobjHandleToFD(dev->fd, handle, shared_fd);
+}
+
+int amdgpu_cs_import_syncobj(amdgpu_device_handle dev,
+			     int shared_fd,
+			     uint32_t *handle)
+{
+	if (NULL == dev)
+		return -EINVAL;
+
+	return drmSyncobjFDToHandle(dev->fd, shared_fd, handle);
+}
--
2.9.4
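A short usage sketch of the four wrappers together; error paths are
collapsed and dev is assumed to be an already-initialized device
handle:

#include <unistd.h>
#include <amdgpu.h>

/* Round-trip a sync object through a file descriptor, e.g. to share
 * it with another process. */
static int syncobj_roundtrip(amdgpu_device_handle dev)
{
    uint32_t syncobj, imported;
    int fd, r;

    r = amdgpu_cs_create_syncobj(dev, &syncobj);
    if (r)
        return r;
    r = amdgpu_cs_export_syncobj(dev, syncobj, &fd);
    if (r)
        return r;
    r = amdgpu_cs_import_syncobj(dev, fd, &imported);
    close(fd);
    if (r)
        return r;

    amdgpu_cs_destroy_syncobj(dev, imported);
    return amdgpu_cs_destroy_syncobj(dev, syncobj);
}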
Re: [PATCH 2/2] drm/amdgpu: enable huge page handling in the VM v4
Am 17.07.2017 um 23:30 schrieb Felix Kuehling:
> On 17-07-17 05:02 PM, Christian König wrote:
>> +	if (p->adev->asic_type < CHIP_VEGA10 ||
>> +	    nptes != AMDGPU_VM_PTE_COUNT(p->adev) ||
>> +	    p->func != amdgpu_vm_do_set_ptes ||
>> +	    !(flags & AMDGPU_PTE_VALID)) {
>
> Because of this condition, I think this still won't work correctly
> for cpu page table updates. p->func will be amdgpu_vm_cpu_set_ptes.
>
> Regards,
>   Felix

Good point. This is totally untested anyway, because of lack of
hardware access at the moment. Just wanted to point you to the bits
I've changed for testing it.

Christian.
Re: [PATCH 2/2] drm/amdgpu: enable huge page handling in the VM v4
On 17-07-17 05:02 PM, Christian König wrote:
> +	if (p->adev->asic_type < CHIP_VEGA10 ||
> +	    nptes != AMDGPU_VM_PTE_COUNT(p->adev) ||
> +	    p->func != amdgpu_vm_do_set_ptes ||
> +	    !(flags & AMDGPU_PTE_VALID)) {

Because of this condition, I think this still won't work correctly for
cpu page table updates. p->func will be amdgpu_vm_cpu_set_ptes.

Regards,
  Felix
Re: [PATCH 2/2] drm/amdgpu: enable huge page handling in the VM v4
The minimum BO size is 4K, so that can never happen: a 4K page table
holds 512 eight-byte entries, so amdgpu_bo_size() / 8 is never zero.

Christian.

Am 17.07.2017 um 23:21 schrieb StDenis, Tom:
> In amdgpu_vm_get_entry() if the bo size is less than 8 you'll get a
> divide by zero. Are there mechanisms to prevent this? Maybe add a
> BUG() there?
>
> Tom
>
> [snip]
Re: [PATCH 2/2] drm/amdgpu: enable huge page handling in the VM v4
In amdgpu_vm_get_entry() if the bo size is less than 8 you'll get a
divide by zero. Are there mechanisms to prevent this? Maybe add a
BUG() there?

Tom

From: amd-gfx on behalf of Christian König
Sent: Monday, July 17, 2017 17:02
To: amd-gfx@lists.freedesktop.org
Cc: Kuehling, Felix
Subject: [PATCH 2/2] drm/amdgpu: enable huge page handling in the VM v4

From: Christian König

The hardware can use huge pages to map 2MB of address space with only
one PDE.

v2: few cleanups and rebased
v3: skip PT updates if we are using the PDE
v4: rebased, added support for CPU based updates

Signed-off-by: Christian König
---
[snip]
[PATCH 2/2] drm/amdgpu: enable huge page handling in the VM v4
From: Christian König

The hardware can use huge pages to map 2MB of address space with only one PDE.

v2: few cleanups and rebased
v3: skip PT updates if we are using the PDE
v4: rebased, added support for CPU based updates

Signed-off-by: Christian König
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 119 +++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h |   4 ++
 2 files changed, 103 insertions(+), 20 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index a3dbebe..62d97f5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -351,6 +351,7 @@ static int amdgpu_vm_alloc_levels(struct amdgpu_device *adev,
 		entry->bo = pt;
 		entry->addr = 0;
+		entry->huge_page = false;
 	}

 	if (level < adev->vm_manager.num_level) {
@@ -1116,7 +1117,8 @@ static int amdgpu_vm_update_level(struct amdgpu_device *adev,
 		pt = amdgpu_bo_gpu_offset(bo);
 		pt = amdgpu_gart_get_vm_pde(adev, pt);
-		if (parent->entries[pt_idx].addr == pt)
+		if (parent->entries[pt_idx].addr == pt ||
+		    parent->entries[pt_idx].huge_page)
 			continue;

 		parent->entries[pt_idx].addr = pt;
@@ -1257,29 +1259,95 @@ int amdgpu_vm_update_directories(struct amdgpu_device *adev,
 }

 /**
- * amdgpu_vm_find_pt - find the page table for an address
+ * amdgpu_vm_find_entry - find the entry for an address
  *
  * @p: see amdgpu_pte_update_params definition
  * @addr: virtual address in question
+ * @entry: resulting entry or NULL
+ * @parent: parent entry
  *
- * Find the page table BO for a virtual address, return NULL when none found.
+ * Find the vm_pt entry and its parent for the given address.
  */
-static struct amdgpu_bo *amdgpu_vm_get_pt(struct amdgpu_pte_update_params *p,
-					  uint64_t addr)
+void amdgpu_vm_get_entry(struct amdgpu_pte_update_params *p, uint64_t addr,
+			 struct amdgpu_vm_pt **entry,
+			 struct amdgpu_vm_pt **parent)
 {
-	struct amdgpu_vm_pt *entry = &p->vm->root;
 	unsigned idx, level = p->adev->vm_manager.num_level;

-	while (entry->entries) {
+	*parent = NULL;
+	*entry = &p->vm->root;
+	while ((*entry)->entries) {
 		idx = addr >> (p->adev->vm_manager.block_size * level--);
-		idx %= amdgpu_bo_size(entry->bo) / 8;
-		entry = &entry->entries[idx];
+		idx %= amdgpu_bo_size((*entry)->bo) / 8;
+		*parent = *entry;
+		*entry = &(*entry)->entries[idx];
 	}

 	if (level)
-		return NULL;
+		*entry = NULL;
+}
+
+/**
+ * amdgpu_vm_handle_huge_pages - handle updating the PD with huge pages
+ *
+ * @p: see amdgpu_pte_update_params definition
+ * @entry: vm_pt entry to check
+ * @parent: parent entry
+ * @nptes: number of PTEs updated with this operation
+ * @dst: destination address where the PTEs should point to
+ * @flags: access flags for the PTEs
+ *
+ * Check if we can update the PD with a huge page.
+ */
+static int amdgpu_vm_handle_huge_pages(struct amdgpu_pte_update_params *p,
+				       struct amdgpu_vm_pt *entry,
+				       struct amdgpu_vm_pt *parent,
+				       unsigned nptes, uint64_t dst,
+				       uint64_t flags)
+{
+	bool use_cpu_update = (p->func == amdgpu_vm_cpu_set_ptes);
+	uint64_t pd_addr, pde;
+	int r;

-	return entry->bo;
+	/* In the case of a mixed PT the PDE must point to it */
+	if (p->adev->asic_type < CHIP_VEGA10 ||
+	    nptes != AMDGPU_VM_PTE_COUNT(p->adev) ||
+	    p->func != amdgpu_vm_do_set_ptes ||
+	    !(flags & AMDGPU_PTE_VALID)) {
+
+		dst = amdgpu_bo_gpu_offset(entry->bo);
+		dst = amdgpu_gart_get_vm_pde(p->adev, dst);
+		flags = AMDGPU_PTE_VALID;
+	} else {
+		flags |= AMDGPU_PDE_PTE;
+	}
+
+	if (entry->addr == dst &&
+	    entry->huge_page == !!(flags & AMDGPU_PDE_PTE))
+		return 0;
+
+	entry->addr = dst;
+	entry->huge_page = !!(flags & AMDGPU_PDE_PTE);
+
+	if (use_cpu_update) {
+		r = amdgpu_bo_kmap(parent->bo, (void *)&pd_addr);
+		if (r)
+			return r;
+
+		pde = pd_addr + (entry - parent->entries) * 8;
+		amdgpu_vm_cpu_set_ptes(p, pde, dst, 1, 0, flags);
+	} else {
+		if (parent->bo->shadow) {
+			pd_addr = amdgpu_bo_gpu_offset(parent->bo->shadow);
+			pde = pd_addr + (entry - parent->entries) * 8;
+			amdgpu_vm_do_set_ptes
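For readers keeping score, the 2MB figure follows directly from the page table geometry: one PDE fronts one page table of 2^block_size PTEs, so marking the PDE itself as a PTE (the AMDGPU_PDE_PTE flag above) maps that whole range with a single entry. A standalone sanity check, assuming the default Vega10 block size of 9:

#include <assert.h>
#include <stdint.h>

#define GPU_PAGE_SIZE 4096ULL	/* AMDGPU_GPU_PAGE_SIZE */
#define BLOCK_SIZE 9		/* default vm_manager.block_size on Vega10 */

int main(void)
{
	uint64_t ptes_per_pt = 1ULL << BLOCK_SIZE;		/* 512 PTEs per page table */
	uint64_t huge_page = ptes_per_pt * GPU_PAGE_SIZE;	/* bytes covered by one PDE */

	assert(huge_page == 2ULL * 1024 * 1024);		/* one PDE covers 2MB */
	return 0;
}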
[PATCH 1/2] drm/amdgpu: increase fragmentation size for Vega10 v2
From: Christian König

The fragment bits work differently for Vega10 compared to previous generations. Increase the fragment size to 2MB for now to better handle that.

v2: handle the hardware setup as well

Signed-off-by: Christian König
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c  | 5 +++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c   | 5 +++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h   | 4 +++-
 drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c | 4 +++-
 drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c  | 4 +++-
 5 files changed, 15 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index 7a8da32..fc77844 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -588,8 +588,9 @@ static int amdgpu_info_ioctl(struct drm_device *dev, void *data, struct drm_file
 		dev_info.virtual_address_offset = AMDGPU_VA_RESERVED_SIZE;
 		dev_info.virtual_address_max = (uint64_t)adev->vm_manager.max_pfn * AMDGPU_GPU_PAGE_SIZE;
 		dev_info.virtual_address_alignment = max((int)PAGE_SIZE, AMDGPU_GPU_PAGE_SIZE);
-		dev_info.pte_fragment_size = (1 << AMDGPU_LOG2_PAGES_PER_FRAG) *
-					     AMDGPU_GPU_PAGE_SIZE;
+		dev_info.pte_fragment_size =
+			(1 << AMDGPU_LOG2_PAGES_PER_FRAG(adev)) *
+			AMDGPU_GPU_PAGE_SIZE;
 		dev_info.gart_page_size = AMDGPU_GPU_PAGE_SIZE;
 		dev_info.cu_active_number = adev->gfx.cu_info.number;

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 55d1c7f..a3dbebe 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -1381,8 +1381,9 @@ static int amdgpu_vm_frag_ptes(struct amdgpu_pte_update_params *params,
 	 */

 	/* SI and newer are optimized for 64KB */
-	uint64_t frag_flags = AMDGPU_PTE_FRAG(AMDGPU_LOG2_PAGES_PER_FRAG);
-	uint64_t frag_align = 1 << AMDGPU_LOG2_PAGES_PER_FRAG;
+	unsigned pages_per_frag = AMDGPU_LOG2_PAGES_PER_FRAG(params->adev);
+	uint64_t frag_flags = AMDGPU_PTE_FRAG(pages_per_frag);
+	uint64_t frag_align = 1 << pages_per_frag;

 	uint64_t frag_start = ALIGN(start, frag_align);
 	uint64_t frag_end = end & ~(frag_align - 1);

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
index 3441ec5..c4f5d1f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
@@ -51,7 +51,9 @@ struct amdgpu_bo_list_entry;
 #define AMDGPU_VM_PTB_ALIGN_SIZE	32768

 /* LOG2 number of continuous pages for the fragment field */
-#define AMDGPU_LOG2_PAGES_PER_FRAG	4
+#define AMDGPU_LOG2_PAGES_PER_FRAG(adev) \
+	((adev)->asic_type < CHIP_VEGA10 ? 4 : \
+	 (adev)->vm_manager.block_size)

 #define AMDGPU_PTE_VALID	(1ULL << 0)
 #define AMDGPU_PTE_SYSTEM	(1ULL << 1)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c b/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c
index 008ad3d..408723e 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c
@@ -129,7 +129,7 @@ static void gfxhub_v1_0_init_cache_regs(struct amdgpu_device *adev)
 	/* Setup L2 cache */
 	tmp = RREG32_SOC15(GC, 0, mmVM_L2_CNTL);
 	tmp = REG_SET_FIELD(tmp, VM_L2_CNTL, ENABLE_L2_CACHE, 1);
-	tmp = REG_SET_FIELD(tmp, VM_L2_CNTL, ENABLE_L2_FRAGMENT_PROCESSING, 0);
+	tmp = REG_SET_FIELD(tmp, VM_L2_CNTL, ENABLE_L2_FRAGMENT_PROCESSING, 1);
 	/* XXX for emulation, Refer to closed source code.*/
 	tmp = REG_SET_FIELD(tmp, VM_L2_CNTL, L2_PDE0_CACHE_TAG_GENERATION_MODE, 0);
@@ -144,6 +144,8 @@ static void gfxhub_v1_0_init_cache_regs(struct amdgpu_device *adev)
 	WREG32_SOC15(GC, 0, mmVM_L2_CNTL2, tmp);

 	tmp = mmVM_L2_CNTL3_DEFAULT;
+	tmp = REG_SET_FIELD(tmp, VM_L2_CNTL3, BANK_SELECT, 12);
+	tmp = REG_SET_FIELD(tmp, VM_L2_CNTL3, L2_CACHE_BIGK_FRAGMENT_SIZE, 9);
 	WREG32_SOC15(GC, 0, mmVM_L2_CNTL3, tmp);

 	tmp = mmVM_L2_CNTL4_DEFAULT;

diff --git a/drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c b/drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c
index 96f1628..ad8def3 100644
--- a/drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c
@@ -143,7 +143,7 @@ static void mmhub_v1_0_init_cache_regs(struct amdgpu_device *adev)
 	/* Setup L2 cache */
 	tmp = RREG32_SOC15(MMHUB, 0, mmVM_L2_CNTL);
 	tmp = REG_SET_FIELD(tmp, VM_L2_CNTL, ENABLE_L2_CACHE, 1);
-	tmp = REG_SET_FIELD(tmp, VM_L2_CNTL, ENABLE_L2_FRAGMENT_PROCESSING, 0);
+	tmp = REG_SET_FIELD(tmp, VM_L2_CNTL, ENABLE_L2_FRAGMENT_PROCESSING, 1);
 	/* XXX for emulation, Refer to closed source code.*/
 	tmp = REG_SET_FIELD(tmp, VM_L2_CNTL, L2_PDE0_CACHE_TAG_GENERATION_MODE,
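The fragment field encodes a power-of-two run of pages, so the effective fragment size is AMDGPU_GPU_PAGE_SIZE shifted by the log2 value: the old constant 4 gives 64KB, while the Vega10 path uses the block size (9 by default), giving 2MB and matching the L2_CACHE_BIGK_FRAGMENT_SIZE value of 9 programmed above. A quick sketch of that arithmetic, treating the Vega10 value as the default configuration rather than a guarantee:

#include <stdio.h>
#include <stdint.h>

#define AMDGPU_GPU_PAGE_SIZE 4096ULL

/* Fragment size in bytes for a given log2 number of pages per fragment. */
static uint64_t frag_size(unsigned log2_pages_per_frag)
{
	return AMDGPU_GPU_PAGE_SIZE << log2_pages_per_frag;
}

int main(void)
{
	printf("pre-Vega10 fragment: %llu KB\n",
	       (unsigned long long)(frag_size(4) / 1024));	/* 64 KB */
	printf("Vega10 fragment:     %llu MB\n",
	       (unsigned long long)(frag_size(9) >> 20));	/* 2 MB */
	return 0;
}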
Re: [PATCH libdrm] libdrm_amdgpu: add kernel semaphore support
Am 17.07.2017 um 19:22 schrieb Marek Olšák:
> On Sun, Jul 16, 2017 at 11:36 PM, Dave Airlie wrote:
>>> I can take a look at it, I just won't have time until next week most likely.
>>
>> I've taken a look, and it's seemingly more complicated than I'm
>> expecting I'd want to land in Mesa before 17.2 ships, I'd really
>> prefer to just push the new libdrm_amdgpu api from this patch. If I
>> have to port all the current radv code to the new API, I'll most
>> definitely get something wrong.
>>
>> Adding the new API so far looks like
>> https://cgit.freedesktop.org/~airlied/drm/log/?h=drm-amdgpu-cs-submit-raw
>> https://cgit.freedesktop.org/~airlied/drm/commit/?h=drm-amdgpu-cs-submit-raw&id=e7f85d0ca617fa41e72624780c9035df132e23c4
>> being the API, and whether it should take a uint32_t context id or
>> context handle left as an open question in the last patch in the
>> series.
>>
>> However to hook this into radv or radeonsi will take a bit of
>> rewriting of a lot of code that is probably a bit more fragile than
>> I'd like for this sort of surgery at this point.
>>
>> I'd actually suspect if we do want to proceed with this type of
>> interface, we might be better doing it all in common mesa code, and
>> maybe bypassing libdrm_amdgpu altogether, which I suppose the API I've
>> written here is mostly already doing.
>
> Well, we plan to stop using the BO list ioctl. The interface has
> bo_list_handle in it. Will we just set it to 0 when adding the chunk
> for the inlined buffer list, i.e. what radeon has?

Yeah, exactly that was my thinking as well.

Christian.

> Marek
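For readers following along, the proposed entry point boils down to handing the kernel an array of raw chunks. A hedged sketch of what a caller might look like; the amdgpu_cs_submit_raw name and signature come from Dave's branch linked above and could still change before merging:

#include <stdint.h>
#include <amdgpu.h>
#include <amdgpu_drm.h>

/* Sketch only: submit one IB chunk through the proposed raw interface. */
static int submit_ib_raw(amdgpu_device_handle dev,
			 amdgpu_context_handle ctx,
			 struct drm_amdgpu_cs_chunk_ib *ib_info)
{
	struct drm_amdgpu_cs_chunk chunk;
	uint64_t seq_no;

	chunk.chunk_id = AMDGPU_CHUNK_ID_IB;
	chunk.length_dw = sizeof(*ib_info) / 4;
	chunk.chunk_data = (uint64_t)(uintptr_t)ib_info;

	/* Passing a null bo_list would stand in for an inlined buffer
	 * list once the BO list ioctl goes away, as Marek suggests. */
	return amdgpu_cs_submit_raw(dev, ctx, NULL, 1, &chunk, &seq_no);
}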
Re: [PATCH libdrm] libdrm_amdgpu: add kernel semaphore support
On Sun, Jul 16, 2017 at 11:36 PM, Dave Airlie wrote:
>>
>> I can take a look at it, I just won't have time until next week most likely.
>
> I've taken a look, and it's seemingly more complicated than I'm
> expecting I'd want to land in Mesa before 17.2 ships, I'd really
> prefer to just push the new libdrm_amdgpu api from this patch. If I
> have to port all the current radv code to the new API, I'll most
> definitely get something wrong.
>
> Adding the new API so far looks like
> https://cgit.freedesktop.org/~airlied/drm/log/?h=drm-amdgpu-cs-submit-raw
>
> https://cgit.freedesktop.org/~airlied/drm/commit/?h=drm-amdgpu-cs-submit-raw&id=e7f85d0ca617fa41e72624780c9035df132e23c4
> being the API, and whether it should take a uint32_t context id or
> context handle left as an open question in the last patch in the
> series.
>
> However to hook this into radv or radeonsi will take a bit of
> rewriting of a lot of code that is probably a bit more fragile than
> I'd like for this sort of surgery at this point.
>
> I'd actually suspect if we do want to proceed with this type of
> interface, we might be better doing it all in common mesa code, and
> maybe bypassing libdrm_amdgpu altogether, which I suppose the API I've
> written here is mostly already doing.

Well, we plan to stop using the BO list ioctl. The interface has bo_list_handle in it. Will we just set it to 0 when adding the chunk for the inlined buffer list, i.e. what radeon has?

Marek
Re: [PATCH 2/2] drm/amdgpu: Implement ttm_bo_driver.access_vram callback
Am 14.07.2017 um 21:44 schrieb Felix Kuehling:
> On 17-07-14 06:08 AM, Christian König wrote:
>> Am 13.07.2017 um 23:08 schrieb Felix Kuehling:
>> [SNIP]
>>> +		result += bytes;
>>> +		buf = (uint8_t *)buf + bytes;
>>> +		pos += bytes;
>>> +		len -= bytes;
>>> +		if (pos >= (nodes->start + nodes->size) << PAGE_SHIFT) {
>>> +			++nodes;
>>> +			pos = (nodes->start << PAGE_SHIFT);
> ...
> Here I handle crossing a node boundary. Yes, I actually added this case
> to my kfdtest unit test and made sure it works, along with all odd
> alignments that the code above handles.

Ah, I see. Sorry totally missed that chunk. In this case the patch is
Acked-by: Christian König

Regards,
Christian.

> Regards,
>   Felix
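The boundary case discussed here is the interesting part: a BO's VRAM allocation can span several drm_mm nodes, so the access loop has to clamp each chunk to the current node and then hop to the next. A simplified sketch of that pattern, reconstructed from the quoted hunk rather than copied from the actual patch:

	/* Sketch: walk a scattered VRAM allocation node by node; 'nodes'
	 * is the drm_mm_node array backing the BO's VRAM placement. */
	while (len) {
		uint64_t node_end = (nodes->start + nodes->size) << PAGE_SHIFT;
		/* clamp this chunk to the current node */
		uint64_t bytes = node_end - pos < len ? node_end - pos : len;

		/* ... access 'bytes' bytes of VRAM at offset 'pos' ... */

		result += bytes;
		buf = (uint8_t *)buf + bytes;
		pos += bytes;
		len -= bytes;

		if (pos >= node_end) {
			++nodes;	/* crossed into the next node */
			pos = nodes->start << PAGE_SHIFT;
		}
	}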
Re: [PATCH libdrm] libdrm_amdgpu: add kernel semaphore support
Am 17.07.2017 um 05:36 schrieb Dave Airlie:
>> I can take a look at it, I just won't have time until next week most likely.
>
> I've taken a look, and it's seemingly more complicated than I'm
> expecting I'd want to land in Mesa before 17.2 ships, I'd really
> prefer to just push the new libdrm_amdgpu api from this patch. If I
> have to port all the current radv code to the new API, I'll most
> definitely get something wrong.
>
> Adding the new API so far looks like
> https://cgit.freedesktop.org/~airlied/drm/log/?h=drm-amdgpu-cs-submit-raw
> https://cgit.freedesktop.org/~airlied/drm/commit/?h=drm-amdgpu-cs-submit-raw&id=e7f85d0ca617fa41e72624780c9035df132e23c4
> being the API, and whether it should take a uint32_t context id or
> context handle left as an open question in the last patch in the
> series.

I would stick with the context handle; as far as I can see there isn't any value in using the uint32_t for this.

We just want to be able to send arbitrary chunks down into the kernel without libdrm_amdgpu involvement and/or the associated overhead of the extra loop and the semaphore handling.

So your "amdgpu/cs: add new raw cs submission interface just taking chunks" patch looks fine to me as far as I can tell.

As far as I can see, the "amdgpu: refactor semaphore handling" patch is actually incorrect. We must hold the mutex while sending the CS down to the kernel, or otherwise "context->last_seq" won't be accurate.

> However to hook this into radv or radeonsi will take a bit of
> rewriting of a lot of code that is probably a bit more fragile than
> I'd like for this sort of surgery at this point.

Again, I can move over the existing Mesa stuff if you like.

> I'd actually suspect if we do want to proceed with this type of
> interface, we might be better doing it all in common mesa code, and
> maybe bypassing libdrm_amdgpu altogether, which I suppose the API I've
> written here is mostly already doing.

I want to stick with the other interfaces for now. No need to make it more complicated than it already is. Only the CS stuff is the most performance critical thing we have right now.

Christian.
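Christian's point about "context->last_seq" is easiest to see in code: if the submission ioctl and the sequence-number bookkeeping are not covered by the same lock, two threads can complete their ioctls in one order and record the sequence numbers in the other, leaving last_seq pointing at an older submission. An illustrative fragment of the submit path, with the field names taken from this discussion rather than the actual libdrm source:

	/* Both the ioctl and the last_seq update must sit under the same
	 * lock; otherwise a slower thread can overwrite last_seq with an
	 * older sequence number after a newer submission completed. */
	pthread_mutex_lock(&context->sequence_mutex);
	r = drmCommandWriteRead(dev->fd, DRM_AMDGPU_CS, &cs, sizeof(cs));
	if (!r)
		context->last_seq = cs.out.handle;
	pthread_mutex_unlock(&context->sequence_mutex);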
Re: [PATCH 2/4] drm/amdkfd: Remove unused references to shared_resources.num_mec
On Fri, Jul 14, 2017 at 4:21 AM, Jay Cornwall wrote:
> Dead code.
>
> Change-Id: Ic0bb1bcca87e96bc5e8fa9894727b0de152e8818
> Signed-off-by: Jay Cornwall
> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_device.c               | 4 ----
>  drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 7 -------
>  2 files changed, 11 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> index 1cf00d4..95f9396 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> @@ -494,10 +494,6 @@ bool kgd2kfd_device_init(struct kfd_dev *kfd,
> 	} else
> 		kfd->max_proc_per_quantum = hws_max_conc_proc;
>
> -	/* We only use the first MEC */
> -	if (kfd->shared_resources.num_mec > 1)
> -		kfd->shared_resources.num_mec = 1;
> -
> 	/* calculate max size of mqds needed for queues */
> 	size = max_num_of_queues_per_device *
> 			kfd->device_info->mqd_size_aligned;
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> index 7607989..306144f 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> @@ -82,13 +82,6 @@ static bool is_pipe_enabled(struct device_queue_manager *dqm, int mec, int pipe)
> 	return false;
> }
>
> -unsigned int get_mec_num(struct device_queue_manager *dqm)
> -{
> -	BUG_ON(!dqm || !dqm->dev);
> -
> -	return dqm->dev->shared_resources.num_mec;
> -}
> -

FYI, I also removed the declaration of get_mec_num() in the header file.

Oded

> unsigned int get_queues_num(struct device_queue_manager *dqm)
> {
> 	BUG_ON(!dqm || !dqm->dev);
> --
> 2.7.4
Re: [PATCH v3 1/4] drm/amdgpu: Fix KFD oversubscription by tracking queues correctly
On Fri, Jul 14, 2017 at 7:24 PM, Alex Deucher wrote:
> On Thu, Jul 13, 2017 at 9:21 PM, Jay Cornwall wrote:
>> The number of compute queues available to the KFD was erroneously
>> calculated as 64. Only the first MEC can execute compute queues and
>> it has 32 queue slots.
>>
>> This caused the oversubscription limit to be calculated incorrectly,
>> leading to a missing chained runlist command at the end of an
>> oversubscribed runlist.
>>
>> v2: Remove unused num_mec field to avoid duplicate logic
>> v3: Separate num_mec removal into separate patches
>>
>> Change-Id: I9e7bba2cc1928b624e3eeb1edb06fdb602e5294f
>> Signed-off-by: Jay Cornwall
>
> Series is:
> Reviewed-by: Alex Deucher

Hi Jay,

Thanks for the patches, I applied them to amdkfd-fixes (after rebasing them over 4.13-rc1).

Oded

>> ---
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
>> index 7060daf..aa4006a 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
>> @@ -140,7 +140,7 @@ void amdgpu_amdkfd_device_init(struct amdgpu_device *adev)
>> 		/* According to linux/bitmap.h we shouldn't use bitmap_clear if
>> 		 * nbits is not compile time constant
>> 		 */
>> -		last_valid_bit = adev->gfx.mec.num_mec
>> +		last_valid_bit = 1 /* only first MEC can have compute queues */
>> 			* adev->gfx.mec.num_pipe_per_mec
>> 			* adev->gfx.mec.num_queue_per_pipe;
>> 		for (i = last_valid_bit; i < KGD_MAX_QUEUES; ++i)
>> --
>> 2.7.4
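The arithmetic behind the fix: only the first MEC services compute queues, and with the usual GFX8/GFX9 topology of 4 pipes per MEC and 8 queues per pipe that yields 32 slots, half of what the old num_mec-based code counted. A toy check, with the pipe and queue counts as typical values (the driver reads them from adev->gfx.mec at runtime):

#include <assert.h>

int main(void)
{
	int num_pipe_per_mec = 4;	/* typical value of adev->gfx.mec.num_pipe_per_mec */
	int num_queue_per_pipe = 8;	/* typical value of adev->gfx.mec.num_queue_per_pipe */
	int usable_mec = 1;		/* only the first MEC runs compute queues */

	int last_valid_bit = usable_mec * num_pipe_per_mec * num_queue_per_pipe;
	assert(last_valid_bit == 32);	/* not 64: the old code counted both MECs */
	return 0;
}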
Re: [PATCH] drm/amdgpu: read reg in each iterate of psp_wait_for loop
On Mon, Jul 17, 2017 at 06:57:41PM +0800, Greg KH wrote:
> On Mon, Jul 17, 2017 at 04:56:26PM +0800, Zhang, Jerry (Junwei) wrote:
> > + sta...@vger.kernel.org
>
> This is not the correct way to submit patches for inclusion in the
> stable kernel tree. Please read:
> https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html
> for how to do this properly.

Thanks, Greg. :-)

>> BTW: please add Cc: in your patch, it needs to be backported to the stable tree.

Jerry, I might not have described it clearly. We need to follow the rule that Greg provided. Actually, I meant to add the Cc in your commit message like below, then send it out:

8<--
Subject: [PATCH] drm/amdgpu: read reg in each iterate of psp_wait_for loop

This fixes the SOS loading failure of psp v3.1.

Signed-off-by: Junwei Zhang
Cc: sta...@vger.kernel.org
8<--

And you'd better squash the two patches into one (actually it's only one fix) to make backporting smoother.

Thanks,
Rui
[PATCH 2/2] drm/amdgpu: enable sos status checking for vega10
Signed-off-by: Junwei Zhang
---
 drivers/gpu/drm/amd/amdgpu/psp_v3_1.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/psp_v3_1.c b/drivers/gpu/drm/amd/amdgpu/psp_v3_1.c
index 2718e86..23106e3 100644
--- a/drivers/gpu/drm/amd/amdgpu/psp_v3_1.c
+++ b/drivers/gpu/drm/amd/amdgpu/psp_v3_1.c
@@ -237,11 +237,9 @@ int psp_v3_1_bootloader_load_sos(struct psp_context *psp)
 	/* there might be handshake issue with hardware which needs delay */
 	mdelay(20);
-#if 0
 	ret = psp_wait_for(psp, SOC15_REG_OFFSET(MP0, 0, mmMP0_SMN_C2PMSG_81),
 			   RREG32_SOC15(MP0, 0, mmMP0_SMN_C2PMSG_81),
 			   0, true);
-#endif

 	return ret;
 }
--
1.9.1
[PATCH 1/2] drm/amdgpu: read reg in each iterate of psp_wait_for loop
From: "Zhang, Jerry" Signed-off-by: Junwei Zhang Acked-by: Alex Deucher Acked-by: Huang Rui --- drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c index c919579..644941d 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c @@ -98,9 +98,8 @@ int psp_wait_for(struct psp_context *psp, uint32_t reg_index, int i; struct amdgpu_device *adev = psp->adev; - val = RREG32(reg_index); - for (i = 0; i < adev->usec_timeout; i++) { + val = RREG32(reg_index); if (check_changed) { if (val != reg_val) return 0; -- 1.9.1 ___ amd-gfx mailing list amd-gfx@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/amd-gfx
Re: iMac 10,1 with Ubuntu 16.04: black screen after suspend
On Fri, Jun 02, 2017 at 06:47:07PM +0200, Florian Echtler wrote:
> Regarding the SMC, there's actually only one key that consistently seems
> to have a different value whether the display is on or off:
>
> --- blank	2017-05-05 08:40:53.694565045 +0200
> +++ non_blank	2017-05-05 08:40:53.702565066 +0200
> @@ -143,7 +143,7 @@
>    MSWR  [ui8 ]  0 (bytes 00)
>    MVBO  [hex_]  (bytes ff ff)
>    MVDC  [bin_]  (bytes 00)
> -  MVDS  [bin_]  (bytes 08)
> +  MVDS  [bin_]  (bytes 0a)
>    MVE1  [si8 ]  (bytes 0d)
>    MVE5  [si8 ]  (bytes 0b)
>    MVHR  [flag]  (bytes 01)
>
> However, even with my modified SmcDumpKeys.c which I can use to enable
> TDM, I cannot write to that key. Since other MV__ keys control the
> display too, it would make sense that this one is related to the display
> state, but it seems to be a read-only key :-/
>
> Running out of ideas again... any suggestions?

Sorry for the delay, Florian. Commit 564d8a2cf3ab by Mario Kleiner (+cc) landed in Linus' tree last week and is included in 4.13-rc1. It is supposed to fix black screen issues with the iMac10,1 that you're also using; in Mario's case they seem to occur upon boot rather than on suspend, but it still might be worth a try:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=564d8a2cf3abf16575af48bdc3e86e92ee8a617d

Thanks,
Lukas
Re: [PATCH] drm/amdgpu: read reg in each iterate of psp_wait_for loop
+ sta...@vger.kernel.org

On 07/17/2017 03:57 PM, Huang Rui wrote:
> On Mon, Jul 17, 2017 at 03:52:10PM +0800, Huang Rui wrote:
>> On Fri, Jul 14, 2017 at 06:20:17PM +0800, Junwei Zhang wrote:
>>> Signed-off-by: Junwei Zhang
>>> ---
>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 3 +--
>>>  1 file changed, 1 insertion(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
>>> index ba743d4..71ce3ee 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
>>> @@ -95,9 +95,8 @@ int psp_wait_for(struct psp_context *psp, uint32_t reg_index,
>>> 	int i;
>>> 	struct amdgpu_device *adev = psp->adev;
>>>
>>> -	val = RREG32(reg_index);
>>> -
>>> 	for (i = 0; i < adev->usec_timeout; i++) {
>>> +		val = RREG32(reg_index);
>>> 		if (check_changed) {
>>> 			if (val != reg_val)
>>> 				return 0;
>>
>> Nice catch. I remembered Ken also mentioned it before. This should fix
>> the issue I encountered before during bring-up. Can you open this
>> handshake in psp_v3_1_bootloader_load_sos and double check if this
>> handshake is workable with this fix. If yes, please add it back.

Yes, it could fix this. Later I will enable it.

Jerry

>> #if 0
>> 	ret = psp_wait_for(psp, SOC15_REG_OFFSET(MP0, 0, mmMP0_SMN_C2PMSG_81),
>> 			   RREG32_SOC15(MP0, 0, mmMP0_SMN_C2PMSG_81),
>> 			   0, true);
>> #endif
>>
>> Acked-by: Huang Rui
>
> BTW: please add Cc: in your patch, it needs to be backported to the stable tree.
>
> Thanks,
> Rui
Re: [PATCH] drm/amdgpu: read reg in each iterate of psp_wait_for loop
On Mon, Jul 17, 2017 at 03:52:10PM +0800, Huang Rui wrote:
> On Fri, Jul 14, 2017 at 06:20:17PM +0800, Junwei Zhang wrote:
> > Signed-off-by: Junwei Zhang
> > ---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 3 +--
> >  1 file changed, 1 insertion(+), 2 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> > index ba743d4..71ce3ee 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> > @@ -95,9 +95,8 @@ int psp_wait_for(struct psp_context *psp, uint32_t reg_index,
> > 	int i;
> > 	struct amdgpu_device *adev = psp->adev;
> >
> > -	val = RREG32(reg_index);
> > -
> > 	for (i = 0; i < adev->usec_timeout; i++) {
> > +		val = RREG32(reg_index);
> > 		if (check_changed) {
> > 			if (val != reg_val)
> > 				return 0;
>
> Nice catch. I remembered Ken also mentioned it before. This should fix the
> issue I encountered before during bring-up. Can you open this handshake in
> psp_v3_1_bootloader_load_sos and double check if this handshake is workable
> with this fix. If yes, please add it back.
>
> #if 0
> 	ret = psp_wait_for(psp, SOC15_REG_OFFSET(MP0, 0, mmMP0_SMN_C2PMSG_81),
> 			   RREG32_SOC15(MP0, 0, mmMP0_SMN_C2PMSG_81),
> 			   0, true);
> #endif
>
> Acked-by: Huang Rui

BTW: please add Cc: in your patch, it needs to be backported to the stable tree.

Thanks,
Rui
Re: [PATCH] drm/amdgpu: read reg in each iterate of psp_wait_for loop
On Fri, Jul 14, 2017 at 06:20:17PM +0800, Junwei Zhang wrote:
> Signed-off-by: Junwei Zhang
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> index ba743d4..71ce3ee 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> @@ -95,9 +95,8 @@ int psp_wait_for(struct psp_context *psp, uint32_t reg_index,
> 	int i;
> 	struct amdgpu_device *adev = psp->adev;
>
> -	val = RREG32(reg_index);
> -
> 	for (i = 0; i < adev->usec_timeout; i++) {
> +		val = RREG32(reg_index);
> 		if (check_changed) {
> 			if (val != reg_val)
> 				return 0;

Nice catch. I remembered Ken also mentioned it before. This should fix the issue I encountered before during bring-up. Can you open this handshake in psp_v3_1_bootloader_load_sos and double check if this handshake is workable with this fix. If yes, please add it back.

#if 0
	ret = psp_wait_for(psp, SOC15_REG_OFFSET(MP0, 0, mmMP0_SMN_C2PMSG_81),
			   RREG32_SOC15(MP0, 0, mmMP0_SMN_C2PMSG_81),
			   0, true);
#endif

Acked-by: Huang Rui