Fw: [PATCH] drm/amdgpu:fix gfx fence allocate size

2017-07-17 Thread Liu, Monk




From: Monk Liu 
Sent: Tuesday, July 18, 2017 1:56 PM
To: amd-gfx-boun...@lists.freedesktop.org
Cc: Liu, Monk; Yu, Xiangliang
Subject: [PATCH] drm/amdgpu:fix gfx fence allocate size

Re: [PATCH v3 1/4] drm/amdgpu: Fix KFD oversubscription by tracking queues correctly

2017-07-17 Thread Felix Kuehling
Hi Alex,

This patch series went into amd-kfd-staging. I'd like to also push it
into amd-staging-4.11 as I'm just working to minimize any unnecessary
differences between the branches before the big KFD history rework.

I rebased it, resolved some conflicts, and removed the declaration of
get_mec_num from kfd_device_queue_manager.h. Do you want me to push that
rebased patch series?

Thanks,
  Felix


On 17-07-17 11:52 AM, Oded Gabbay wrote:
> On Fri, Jul 14, 2017 at 7:24 PM, Alex Deucher  wrote:
>> On Thu, Jul 13, 2017 at 9:21 PM, Jay Cornwall  wrote:
>>> The number of compute queues available to the KFD was erroneously
>>> calculated as 64. Only the first MEC can execute compute queues and
>>> it has 32 queue slots.
>>>
>>> This caused the oversubscription limit to be calculated incorrectly,
>>> leading to a missing chained runlist command at the end of an
>>> oversubscribed runlist.
>>>
>>> v2: Remove unused num_mec field to avoid duplicate logic
>>> v3: Separate num_mec removal into separate patches
>>>
>>> Change-Id: I9e7bba2cc1928b624e3eeb1edb06fdb602e5294f
>>> Signed-off-by: Jay Cornwall 
>> Series is:
>> Reviewed-by: Alex Deucher 
>>
> Hi Jay,
> Thanks for the patches, I applied them to amdkfd-fixes (after rebasing
> them over 4.13-rc1)
>
> Oded
>
>>> ---
>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
>>> index 7060daf..aa4006a 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
>>> @@ -140,7 +140,7 @@ void amdgpu_amdkfd_device_init(struct amdgpu_device *adev)
>>>                /* According to linux/bitmap.h we shouldn't use bitmap_clear if
>>>                 * nbits is not compile time constant
>>>                 */
>>> -              last_valid_bit = adev->gfx.mec.num_mec
>>> +              last_valid_bit = 1 /* only first MEC can have compute queues */
>>>                        * adev->gfx.mec.num_pipe_per_mec
>>>                        * adev->gfx.mec.num_queue_per_pipe;
>>>                for (i = last_valid_bit; i < KGD_MAX_QUEUES; ++i)
>>> --
>>> 2.7.4
>>>
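The arithmetic behind the fix, as a stand-alone sketch (the 2-MEC/4-pipe/8-queue topology below is illustrative of typical gfx8/gfx9 parts, not something stated in the patch itself):

#include <assert.h>

int main(void)
{
        const int num_mec = 2;          /* illustrative topology values */
        const int pipes_per_mec = 4;
        const int queues_per_pipe = 8;

        /* before the fix: every MEC was counted -> 64 queue slots */
        assert(num_mec * pipes_per_mec * queues_per_pipe == 64);

        /* after the fix: only MEC0 can run compute queues -> 32 slots */
        assert(1 * pipes_per_mec * queues_per_pipe == 32);
        return 0;
}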


[PATCH] drm/amdgpu:fix gfx fence allocate size

2017-07-17 Thread Monk Liu
1, for sriov, we need 8dw for the gfx fence due to CP
behaviour
2, cleanup wrong logic in wptr/rptr wb alloc and free

Change-Id: Ifbfed17a4621dae57244942ffac7de1743de0294
Signed-off-by: Monk Liu 
Signed-off-by: Xiangliang Yu 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h|  2 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 32 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c   | 26 
 3 files changed, 52 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index f6345b9..fe96236 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1191,7 +1191,9 @@ struct amdgpu_wb {
 int amdgpu_wb_get(struct amdgpu_device *adev, u32 *wb);
 void amdgpu_wb_free(struct amdgpu_device *adev, u32 wb);
 int amdgpu_wb_get_64bit(struct amdgpu_device *adev, u32 *wb);
+int amdgpu_wb_get_256Bit(struct amdgpu_device *adev, u32 *wb);
 void amdgpu_wb_free_64bit(struct amdgpu_device *adev, u32 wb);
+void amdgpu_wb_free_256bit(struct amdgpu_device *adev, u32 wb);
 
 void amdgpu_get_pcie_info(struct amdgpu_device *adev);
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 7e11190..6050804 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -603,6 +603,21 @@ int amdgpu_wb_get_64bit(struct amdgpu_device *adev, u32 *wb)
}
 }
 
+int amdgpu_wb_get_256Bit(struct amdgpu_device *adev, u32 *wb)
+{
+   int i = 0;
+   unsigned long offset = bitmap_find_next_zero_area_off(adev->wb.used,
+   adev->wb.num_wb, 0, 8, 63, 0);
+   if ((offset + 7) < adev->wb.num_wb) {
+   for (i = 0; i < 8; i++)
+   __set_bit(offset + i, adev->wb.used);
+   *wb = offset;
+   return 0;
+   } else {
+   return -EINVAL;
+   }
+}
+
 /**
  * amdgpu_wb_free - Free a wb entry
  *
@@ -634,6 +649,23 @@ void amdgpu_wb_free_64bit(struct amdgpu_device *adev, u32 wb)
 }
 
 /**
+ * amdgpu_wb_free_256bit - Free a wb entry
+ *
+ * @adev: amdgpu_device pointer
+ * @wb: wb index
+ *
+ * Free a wb slot allocated for use by the driver (all asics)
+ */
+void amdgpu_wb_free_256bit(struct amdgpu_device *adev, u32 wb)
+{
+   int i = 0;
+
+   if ((wb + 7) < adev->wb.num_wb)
+   for (i = 0; i < 8; i++)
+   __clear_bit(wb + i, adev->wb.used);
+}
+
+/**
  * amdgpu_vram_location - try to find VRAM location
  * @adev: amdgpu device structure holding all necessary informations
  * @mc: memory controller structure holding memory informations
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
index 75165e0..eea17ae 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
@@ -212,10 +212,19 @@ int amdgpu_ring_init(struct amdgpu_device *adev, struct amdgpu_ring *ring,
 
}
 
-   r = amdgpu_wb_get(adev, &ring->fence_offs);
-   if (r) {
-   dev_err(adev->dev, "(%d) ring fence_offs wb alloc failed\n", r);
-   return r;
+   if (amdgpu_sriov_vf(adev) && ring->funcs->type == AMDGPU_RING_TYPE_GFX) {
+   r = amdgpu_wb_get_256Bit(adev, &ring->fence_offs);
+   if (r) {
+   dev_err(adev->dev, "(%d) ring fence_offs wb alloc failed\n", r);
+   return r;
+   }
+
+   } else {
+   r = amdgpu_wb_get(adev, &ring->fence_offs);
+   if (r) {
+   dev_err(adev->dev, "(%d) ring fence_offs wb alloc failed\n", r);
+   return r;
+   }
}
 
r = amdgpu_wb_get(adev, &ring->cond_exe_offs);
@@ -278,17 +287,18 @@ void amdgpu_ring_fini(struct amdgpu_ring *ring)
ring->ready = false;
 
if (ring->funcs->support_64bit_ptrs) {
-   amdgpu_wb_free_64bit(ring->adev, ring->cond_exe_offs);
-   amdgpu_wb_free_64bit(ring->adev, ring->fence_offs);
amdgpu_wb_free_64bit(ring->adev, ring->rptr_offs);
amdgpu_wb_free_64bit(ring->adev, ring->wptr_offs);
} else {
-   amdgpu_wb_free(ring->adev, ring->cond_exe_offs);
-   amdgpu_wb_free(ring->adev, ring->fence_offs);
amdgpu_wb_free(ring->adev, ring->rptr_offs);
amdgpu_wb_free(ring->adev, ring->wptr_offs);
}
 
+   amdgpu_wb_free(ring->adev, ring->cond_exe_offs);
+   if (amdgpu_sriov_vf(ring->adev) && ring->funcs->type == AMDGPU_RING_TYPE_GFX)
+   amdgpu_wb_free_256bit(ring->adev, ring->fence_offs);
+   else
+   amdgpu_wb_free(ring->adev, ring->fence_offs);
 
amdgpu_bo_free_kernel(&ring->ring_obj,
  &ring->gpu_addr,
-- 
2.7.4
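For reference, a minimal stand-alone sketch of the search amdgpu_wb_get_256Bit() delegates to bitmap_find_next_zero_area_off(): 8 consecutive free writeback slots starting on a 64-slot boundary (8 dwords of fence data, 256-byte aligned given 4-byte slots). The names and the bool-array representation are illustrative, not the kernel implementation:

#include <stdbool.h>
#include <stddef.h>

/* Find 'nr' consecutive clear entries in 'used', with the run starting
 * at an index where (index & align_mask) == 0; returns 'nbits' when no
 * such run exists, mirroring the kernel helper's failure convention. */
static size_t find_aligned_zero_run(const bool *used, size_t nbits,
                                    size_t nr, size_t align_mask)
{
        for (size_t start = 0; start + nr <= nbits;
             start = (start + align_mask + 1) & ~align_mask) {
                size_t i;

                for (i = 0; i < nr && !used[start + i]; i++)
                        ;
                if (i == nr)
                        return start;   /* aligned free run found */
        }
        return nbits;
}

amdgpu_wb_get_256Bit() then marks the eight slots as used and hands back the first index, just as the hunk above does with __set_bit().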


[PATCH] [rfc] radv: start moving semaphore support out of libdrm

2017-07-17 Thread Dave Airlie
From: Dave Airlie 

This is a port of radv to the new lowlevel cs submission APIs
for libdrm that I submitted earlier.

This moves a lot of the current non-shared semaphore handling
and chunk creation out of libdrm_amdgpu. It provides a much
simpler implementation without all the list handling; I'm
sure I can clean it up a lot further.

For now I've left the old code paths under the RADV_OLD_LIBDRM
define in this patch; I'd replace that with a version check, or just
rip out the whole lot, once we get a libdrm release with the new
APIs in.
---
 src/amd/vulkan/winsys/amdgpu/radv_amdgpu_cs.c | 202 +++---
 1 file changed, 184 insertions(+), 18 deletions(-)

diff --git a/src/amd/vulkan/winsys/amdgpu/radv_amdgpu_cs.c b/src/amd/vulkan/winsys/amdgpu/radv_amdgpu_cs.c
index ffc7566..ce73b88 100644
--- a/src/amd/vulkan/winsys/amdgpu/radv_amdgpu_cs.c
+++ b/src/amd/vulkan/winsys/amdgpu/radv_amdgpu_cs.c
@@ -75,6 +75,10 @@ radv_amdgpu_cs(struct radeon_winsys_cs *base)
return (struct radv_amdgpu_cs*)base;
 }
 
+struct radv_amdgpu_sem_info {
+   int wait_sem_count;
+   struct radeon_winsys_sem **wait_sems;
+};
 static int ring_to_hw_ip(enum ring_type ring)
 {
switch (ring) {
@@ -89,6 +93,21 @@ static int ring_to_hw_ip(enum ring_type ring)
}
 }
 
+static void radv_amdgpu_wait_sems(struct radv_amdgpu_ctx *ctx,
+ uint32_t ip_type,
+ uint32_t ring,
+ uint32_t sem_count,
+ struct radeon_winsys_sem **_sem,
+ struct radv_amdgpu_sem_info *sem_info);
+static int radv_amdgpu_signal_sems(struct radv_amdgpu_ctx *ctx,
+  uint32_t ip_type,
+  uint32_t ring,
+  uint32_t sem_count,
+  struct radeon_winsys_sem **_sem);
+static int radv_amdgpu_cs_submit(struct radv_amdgpu_ctx *ctx,
+struct amdgpu_cs_request *request,
+struct radv_amdgpu_sem_info *sem_info);
+
 static void radv_amdgpu_request_to_fence(struct radv_amdgpu_ctx *ctx,
 struct radv_amdgpu_fence *fence,
 struct amdgpu_cs_request *req)
@@ -647,6 +666,7 @@ static void radv_assign_last_submit(struct radv_amdgpu_ctx 
*ctx,
 
 static int radv_amdgpu_winsys_cs_submit_chained(struct radeon_winsys_ctx *_ctx,
int queue_idx,
+   struct radv_amdgpu_sem_info *sem_info,
struct radeon_winsys_cs **cs_array,
unsigned cs_count,
struct radeon_winsys_cs *initial_preamble_cs,
@@ -703,7 +723,7 @@ static int radv_amdgpu_winsys_cs_submit_chained(struct radeon_winsys_ctx *_ctx,
ibs[0] = ((struct radv_amdgpu_cs*)initial_preamble_cs)->ib;
}
 
-   r = amdgpu_cs_submit(ctx->ctx, 0, &request, 1);
+   r = radv_amdgpu_cs_submit(ctx, &request, sem_info);
if (r) {
if (r == -ENOMEM)
fprintf(stderr, "amdgpu: Not enough memory for command submission.\n");
@@ -724,6 +744,7 @@ static int radv_amdgpu_winsys_cs_submit_chained(struct radeon_winsys_ctx *_ctx,
 
 static int radv_amdgpu_winsys_cs_submit_fallback(struct radeon_winsys_ctx *_ctx,
 int queue_idx,
+struct radv_amdgpu_sem_info *sem_info,
 struct radeon_winsys_cs **cs_array,
 unsigned cs_count,
 struct radeon_winsys_cs *initial_preamble_cs,
@@ -775,7 +796,7 @@ static int radv_amdgpu_winsys_cs_submit_fallback(struct radeon_winsys_ctx *_ctx,
}
}
 
-   r = amdgpu_cs_submit(ctx->ctx, 0, &request, 1);
+   r = radv_amdgpu_cs_submit(ctx, &request, sem_info);
if (r) {
if (r == -ENOMEM)
fprintf(stderr, "amdgpu: Not enough memory for command submission.\n");
@@ -801,6 +822,7 @@ static int radv_amdgpu_winsys_cs_submit_fallback(struct radeon_winsys_ctx *_ctx,
 
 static int radv_amdgpu_winsys_cs_submit_sysmem(struct radeon_winsys_ctx *_ctx,
   int queue_idx,
+  struct radv_amdgpu_sem_info *sem_info,
   struct radeon_winsys_cs **cs_array,
   unsigned cs_count,
   struct radeon_winsys_cs *initial_pr

[PATCH 2/2] drm/amdgpu: Implement ttm_bo_driver.access_memory callback v2

2017-07-17 Thread Felix Kuehling
Allows gdb to access contents of user mode mapped VRAM BOs.

v2: return error for non-VRAM pools

Signed-off-by: Felix Kuehling 
Reviewed-by: Michel Dänzer 
Acked-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 62 +
 1 file changed, 62 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index ff5614b..4d2a454 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -1115,6 +1115,67 @@ static bool amdgpu_ttm_bo_eviction_valuable(struct ttm_buffer_object *bo,
return ttm_bo_eviction_valuable(bo, place);
 }
 
+static int amdgpu_ttm_access_memory(struct ttm_buffer_object *bo,
+   unsigned long offset,
+   void *buf, int len, int write)
+{
+   struct amdgpu_bo *abo = container_of(bo, struct amdgpu_bo, tbo);
+   struct amdgpu_device *adev = amdgpu_ttm_adev(abo->tbo.bdev);
+   struct drm_mm_node *nodes = abo->tbo.mem.mm_node;
+   uint32_t value = 0;
+   int ret = 0;
+   uint64_t pos;
+   unsigned long flags;
+
+   if (bo->mem.mem_type != TTM_PL_VRAM)
+   return -EIO;
+
+   while (offset >= (nodes->size << PAGE_SHIFT)) {
+   offset -= nodes->size << PAGE_SHIFT;
+   ++nodes;
+   }
+   pos = (nodes->start << PAGE_SHIFT) + offset;
+
+   while (len && pos < adev->mc.mc_vram_size) {
+   uint64_t aligned_pos = pos & ~(uint64_t)3;
+   uint32_t bytes = 4 - (pos & 3);
+   uint32_t shift = (pos & 3) * 8;
+   uint32_t mask = 0xffffffff << shift;
+
+   if (len < bytes) {
+   mask &= 0xffffffff >> (bytes - len) * 8;
+   bytes = len;
+   }
+
+   spin_lock_irqsave(&adev->mmio_idx_lock, flags);
+   WREG32(mmMM_INDEX, ((uint32_t)aligned_pos) | 0x80000000);
+   WREG32(mmMM_INDEX_HI, aligned_pos >> 31);
+   if (!write || mask != 0xffffffff)
+   value = RREG32(mmMM_DATA);
+   if (write) {
+   value &= ~mask;
+   value |= (*(uint32_t *)buf << shift) & mask;
+   WREG32(mmMM_DATA, value);
+   }
+   spin_unlock_irqrestore(&adev->mmio_idx_lock, flags);
+   if (!write) {
+   value = (value & mask) >> shift;
+   memcpy(buf, &value, bytes);
+   }
+
+   ret += bytes;
+   buf = (uint8_t *)buf + bytes;
+   pos += bytes;
+   len -= bytes;
+   if (pos >= (nodes->start + nodes->size) << PAGE_SHIFT) {
+   ++nodes;
+   pos = (nodes->start << PAGE_SHIFT);
+   }
+   }
+
+   return ret;
+}
+
 static struct ttm_bo_driver amdgpu_bo_driver = {
.ttm_tt_create = &amdgpu_ttm_tt_create,
.ttm_tt_populate = &amdgpu_ttm_tt_populate,
@@ -1130,6 +1191,7 @@ static bool amdgpu_ttm_bo_eviction_valuable(struct ttm_buffer_object *bo,
.io_mem_reserve = &amdgpu_ttm_io_mem_reserve,
.io_mem_free = &amdgpu_ttm_io_mem_free,
.io_mem_pfn = amdgpu_ttm_io_mem_pfn,
+   .access_memory = &amdgpu_ttm_access_memory
 };
 
 int amdgpu_ttm_init(struct amdgpu_device *adev)
-- 
1.9.1
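A stand-alone sketch of the sub-dword masking used in the loop above, with one worked value (illustrative only, not part of the patch):

#include <assert.h>
#include <stdint.h>

/* Build the 32-bit mask selecting 'len' bytes starting at byte offset
 * (pos & 3) inside the MM_DATA dword, as amdgpu_ttm_access_memory()
 * does for unaligned accesses. */
static uint32_t dword_mask(uint64_t pos, uint32_t len)
{
        uint32_t bytes = 4 - (pos & 3); /* bytes up to the dword end */
        uint32_t shift = (pos & 3) * 8;
        uint32_t mask = 0xffffffffu << shift;

        if (len < bytes)
                mask &= 0xffffffffu >> ((bytes - len) * 8);
        return mask;
}

int main(void)
{
        /* a 2-byte access at byte offset 1 touches bits 8..23 only */
        assert(dword_mask(1, 2) == 0x00ffff00u);
        return 0;
}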



[PATCH 1/2] drm/ttm: Implement vm_operations_struct.access v2

2017-07-17 Thread Felix Kuehling
Allows gdb to access contents of user mode mapped BOs. System memory
is handled by TTM using kmap. Other memory pools require a new driver
callback in ttm_bo_driver.

v2:
* kmap only one page at a time
* swap in BO if needed
* make driver callback more generic to handle private memory pools
* document callback return value
* WARN_ON -> WARN_ON_ONCE

Signed-off-by: Felix Kuehling 
---
 drivers/gpu/drm/ttm/ttm_bo_vm.c | 79 -
 include/drm/ttm/ttm_bo_driver.h | 17 +
 2 files changed, 95 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo_vm.c b/drivers/gpu/drm/ttm/ttm_bo_vm.c
index 9f53df9..945985e 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_vm.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_vm.c
@@ -294,10 +294,87 @@ static void ttm_bo_vm_close(struct vm_area_struct *vma)
vma->vm_private_data = NULL;
 }
 
+static int ttm_bo_vm_access_kmap(struct ttm_buffer_object *bo,
+unsigned long offset,
+void *buf, int len, int write)
+{
+   unsigned long page = offset >> PAGE_SHIFT;
+   unsigned long bytes_left = len;
+   int ret;
+
+   /* Copy a page at a time, that way no extra virtual address
+* mapping is needed
+*/
+   offset -= page << PAGE_SHIFT;
+   do {
+   unsigned long bytes = min(bytes_left, PAGE_SIZE - offset);
+   struct ttm_bo_kmap_obj map;
+   void *ptr;
+   bool is_iomem;
+
+   ret = ttm_bo_kmap(bo, page, 1, &map);
+   if (ret)
+   return ret;
+
+   ptr = (uint8_t *)ttm_kmap_obj_virtual(&map, &is_iomem) + offset;
+   WARN_ON_ONCE(is_iomem);
+   if (write)
+   memcpy(ptr, buf, bytes);
+   else
+   memcpy(buf, ptr, bytes);
+   ttm_bo_kunmap(&map);
+
+   page++;
+   bytes_left -= bytes;
+   offset = 0;
+   } while (bytes_left);
+
+   return len;
+}
+
+static int ttm_bo_vm_access(struct vm_area_struct *vma, unsigned long addr,
+   void *buf, int len, int write)
+{
+   unsigned long offset = (addr) - vma->vm_start;
+   struct ttm_buffer_object *bo = vma->vm_private_data;
+   int ret;
+
+   if (len < 1 || (offset + len) >> PAGE_SHIFT > bo->num_pages)
+   return -EIO;
+
+   ret = ttm_bo_reserve(bo, true, false, NULL);
+   if (ret)
+   return ret;
+
+   switch(bo->mem.mem_type) {
+   case TTM_PL_SYSTEM:
+   if (unlikely(bo->ttm->page_flags & TTM_PAGE_FLAG_SWAPPED)) {
+   ret = ttm_tt_swapin(bo->ttm);
+   if (unlikely(ret != 0))
+   return ret;
+   }
+   /* fall through */
+   case TTM_PL_TT:
+   ret = ttm_bo_vm_access_kmap(bo, offset, buf, len, write);
+   break;
+   default:
+   if (bo->bdev->driver->access_memory)
+   ret = bo->bdev->driver->access_memory(
+   bo, offset, buf, len, write);
+   else
+   ret = -EIO;
+   }
+
+   ttm_bo_unreserve(bo);
+
+   return ret;
+}
+
 static const struct vm_operations_struct ttm_bo_vm_ops = {
.fault = ttm_bo_vm_fault,
.open = ttm_bo_vm_open,
-   .close = ttm_bo_vm_close
+   .close = ttm_bo_vm_close,
+   .access = ttm_bo_vm_access
 };
 
 static struct ttm_buffer_object *ttm_bo_vm_lookup(struct ttm_bo_device *bdev,
diff --git a/include/drm/ttm/ttm_bo_driver.h b/include/drm/ttm/ttm_bo_driver.h
index 6bbd34d..04380ba 100644
--- a/include/drm/ttm/ttm_bo_driver.h
+++ b/include/drm/ttm/ttm_bo_driver.h
@@ -471,6 +471,23 @@ struct ttm_bo_driver {
 */
unsigned long (*io_mem_pfn)(struct ttm_buffer_object *bo,
unsigned long page_offset);
+
+   /**
+* Read/write memory buffers for ptrace access
+*
+* @bo: the BO to access
+* @offset: the offset from the start of the BO
+* @buf: pointer to source/destination buffer
+* @len: number of bytes to copy
+* @write: whether to read (0) from or write (non-0) to BO
+*
+* If successful, this function should return the number of
+* bytes copied, -EIO otherwise. If the number of bytes
+* returned is < len, the function may be called again with
+* the remainder of the buffer to copy.
+*/
+   int (*access_memory)(struct ttm_buffer_object *bo, unsigned long offset,
+void *buf, int len, int write);
 };
 
 /**
-- 
1.9.1
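On the consumer side this is the path a debugger exercises; a rough stand-alone sketch of reading a debuggee's mapped BO through /proc/<pid>/mem, which the kernel routes through vm_ops->access for these mappings (error handling trimmed, and the address is assumed to be a live mapping in the target):

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Read 'len' bytes at virtual address 'va' in process 'pid'. */
static ssize_t read_target_mapping(pid_t pid, uint64_t va,
                                   void *buf, size_t len)
{
        char path[64];
        ssize_t ret = -1;
        int fd;

        /* the reader must be attached before /proc/<pid>/mem works */
        if (ptrace(PTRACE_ATTACH, pid, NULL, NULL) != 0)
                return -1;
        waitpid(pid, NULL, 0);

        snprintf(path, sizeof(path), "/proc/%d/mem", (int)pid);
        fd = open(path, O_RDONLY);
        if (fd >= 0) {
                ret = pread(fd, buf, len, (off_t)va);
                close(fd);
        }

        ptrace(PTRACE_DETACH, pid, NULL, NULL);
        return ret;
}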



[PATCH libdrm] drm/amdgpu: add new low overhead command submission API. (v2)

2017-07-17 Thread Dave Airlie
From: Dave Airlie 

This just sends chunks to the kernel API for a single command
stream.

This should provide a more future proof and extensible API
for command submission.

v2: use amdgpu_bo_list_handle, add two helper functions to
access bo and context internals.

Signed-off-by: Dave Airlie 
---
 amdgpu/amdgpu.h| 30 ++
 amdgpu/amdgpu_cs.c | 47 +++
 2 files changed, 77 insertions(+)

diff --git a/amdgpu/amdgpu.h b/amdgpu/amdgpu.h
index 183f974..238b1aa 100644
--- a/amdgpu/amdgpu.h
+++ b/amdgpu/amdgpu.h
@@ -1382,6 +1382,36 @@ int amdgpu_cs_import_syncobj(amdgpu_device_handle dev,
 int shared_fd,
 uint32_t *syncobj);
 
+/**
+ *  Submit raw command submission to kernel
+ *
+ * \param   dev   - \c [in] device handle
+ * \param   context- \c [in] context handle for context id
+ * \param   bo_list_handle - \c [in] request bo list handle (0 for none)
+ * \param   num_chunks - \c [in] number of CS chunks to submit
+ * \param   chunks - \c [in] array of CS chunks
+ * \param   seq_no - \c [out] output sequence number for submission.
+ *
+ * \return   0 on success\n
+ *  <0 - Negative POSIX Error code
+ *
+ */
+struct drm_amdgpu_cs_chunk;
+struct drm_amdgpu_cs_chunk_dep;
+struct drm_amdgpu_cs_chunk_data;
+
+int amdgpu_cs_submit_raw(amdgpu_device_handle dev,
+amdgpu_context_handle context,
+amdgpu_bo_list_handle bo_list_handle,
+int num_chunks,
+struct drm_amdgpu_cs_chunk *chunks,
+uint64_t *seq_no);
+
+void amdgpu_cs_chunk_fence_to_dep(struct amdgpu_cs_fence *fence,
+ struct drm_amdgpu_cs_chunk_dep *dep);
+void amdgpu_cs_chunk_fence_info_to_data(struct amdgpu_cs_fence_info *fence_info,
+   struct drm_amdgpu_cs_chunk_data *data);
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/amdgpu/amdgpu_cs.c b/amdgpu/amdgpu_cs.c
index 722fd75..dfba875 100644
--- a/amdgpu/amdgpu_cs.c
+++ b/amdgpu/amdgpu_cs.c
@@ -634,3 +634,50 @@ int amdgpu_cs_import_syncobj(amdgpu_device_handle dev,
 
return drmSyncobjFDToHandle(dev->fd, shared_fd, handle);
 }
+
+int amdgpu_cs_submit_raw(amdgpu_device_handle dev,
+amdgpu_context_handle context,
+amdgpu_bo_list_handle bo_list_handle,
+int num_chunks,
+struct drm_amdgpu_cs_chunk *chunks,
+uint64_t *seq_no)
+{
+   union drm_amdgpu_cs cs = {0};
+   uint64_t *chunk_array;
+   int i, r;
+   if (num_chunks == 0)
+   return -EINVAL;
+
+   chunk_array = alloca(sizeof(uint64_t) * num_chunks);
+   for (i = 0; i < num_chunks; i++)
+   chunk_array[i] = (uint64_t)(uintptr_t)&chunks[i];
+   cs.in.chunks = (uint64_t)(uintptr_t)chunk_array;
+   cs.in.ctx_id = context->id;
+   cs.in.bo_list_handle = bo_list_handle ? bo_list_handle->handle : 0;
+   cs.in.num_chunks = num_chunks;
+   r = drmCommandWriteRead(dev->fd, DRM_AMDGPU_CS,
+   &cs, sizeof(cs));
+   if (r)
+   return r;
+
+   if (seq_no)
+   *seq_no = cs.out.handle;
+   return 0;
+}
+
+void amdgpu_cs_chunk_fence_info_to_data(struct amdgpu_cs_fence_info *fence_info,
+   struct drm_amdgpu_cs_chunk_data *data)
+{
+   data->fence_data.handle = fence_info->handle->handle;
+   data->fence_data.offset = fence_info->offset * sizeof(uint64_t);
+}
+
+void amdgpu_cs_chunk_fence_to_dep(struct amdgpu_cs_fence *fence,
+ struct drm_amdgpu_cs_chunk_dep *dep)
+{
+   dep->ip_type = fence->ip_type;
+   dep->ip_instance = fence->ip_instance;
+   dep->ring = fence->ring;
+   dep->ctx_id = fence->context->id;
+   dep->handle = fence->fence;
+}
-- 
2.9.4
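A usage sketch for the new entry point (not part of the patch): build one IB chunk with the kernel UAPI structs and submit it. The context, the BO list, and the GPU virtual address of an already-uploaded IB are assumed to exist:

#include <stdint.h>
#include <string.h>
#include <amdgpu.h>
#include <amdgpu_drm.h>

static int submit_one_ib(amdgpu_device_handle dev,
                         amdgpu_context_handle ctx,
                         amdgpu_bo_list_handle bo_list,
                         uint64_t ib_va, uint32_t ib_bytes,
                         uint64_t *seq_no)
{
        struct drm_amdgpu_cs_chunk_ib ib;
        struct drm_amdgpu_cs_chunk chunk;

        memset(&ib, 0, sizeof(ib));
        ib.va_start = ib_va;            /* GPU VA of the command buffer */
        ib.ib_bytes = ib_bytes;         /* IB size in bytes */
        ib.ip_type = AMDGPU_HW_IP_GFX;  /* submit to the gfx ring */

        chunk.chunk_id = AMDGPU_CHUNK_ID_IB;
        chunk.length_dw = sizeof(ib) / 4;
        chunk.chunk_data = (uint64_t)(uintptr_t)&ib;

        return amdgpu_cs_submit_raw(dev, ctx, bo_list, 1, &chunk, seq_no);
}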



RE: [PATCH] drm/amdgpu: read reg in each iterate of psp_wait_for loop

2017-07-17 Thread Deucher, Alexander
> -----Original Message-----
> From: Junwei Zhang [mailto:jerry.zh...@amd.com]
> Sent: Monday, July 17, 2017 10:54 PM
> To: amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander; Huang, Ray; gre...@linuxfoundation.org; Zhang,
> Jerry; sta...@vger.kernel.org
> Subject: [PATCH] drm/amdgpu: read reg in each iterate of psp_wait_for loop
> 
> From: "Zhang, Jerry" 
> 
> v2: fixes the SOS loading failure for PSP v3.1
> 
> Signed-off-by: Junwei Zhang 
> Cc: sta...@vger.kernel.org
> Acked-by: Alex Deucher  (v1)
> Acked-by: Huang Rui  (v1)

Reviewed-by: Alex Deucher 

> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 3 +--
>  drivers/gpu/drm/amd/amdgpu/psp_v3_1.c   | 2 --
>  2 files changed, 1 insertion(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> index c919579..644941d 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> @@ -98,9 +98,8 @@ int psp_wait_for(struct psp_context *psp, uint32_t reg_index,
>   int i;
>   struct amdgpu_device *adev = psp->adev;
> 
> - val = RREG32(reg_index);
> -
>   for (i = 0; i < adev->usec_timeout; i++) {
> + val = RREG32(reg_index);
>   if (check_changed) {
>   if (val != reg_val)
>   return 0;
> diff --git a/drivers/gpu/drm/amd/amdgpu/psp_v3_1.c b/drivers/gpu/drm/amd/amdgpu/psp_v3_1.c
> index 2718e86..23106e3 100644
> --- a/drivers/gpu/drm/amd/amdgpu/psp_v3_1.c
> +++ b/drivers/gpu/drm/amd/amdgpu/psp_v3_1.c
> @@ -237,11 +237,9 @@ int psp_v3_1_bootloader_load_sos(struct psp_context *psp)
> 
>   /* there might be handshake issue with hardware which needs delay */
>   mdelay(20);
> -#if 0
>   ret = psp_wait_for(psp, SOC15_REG_OFFSET(MP0, 0, mmMP0_SMN_C2PMSG_81),
>  RREG32_SOC15(MP0, 0, mmMP0_SMN_C2PMSG_81),
>  0, true);
> -#endif
> 
>   return ret;
>  }
> --
> 1.9.1



Re: [PATCH libdrm] libdrm_amdgpu: add kernel semaphore support

2017-07-17 Thread Dave Airlie
On 18 July 2017 at 03:02, Christian König  wrote:
> On 17.07.2017 at 05:36, Dave Airlie wrote:
>>>
>>> I can take a look at it, I just won't have time until next week most
>>> likely.
>>
>> I've taken a look, and it's seemingly more complicated than I'm
>> expecting I'd want to land in Mesa before 17.2 ships, I'd really
>> prefer to just push the new libdrm_amdgpu api from this patch. If I
>> have to port all the current radv code to the new API, I'll most
>> definitely get something wrong.
>>
>> Adding the new API so far looks like
>> https://cgit.freedesktop.org/~airlied/drm/log/?h=drm-amdgpu-cs-submit-raw
>>
>>
>> https://cgit.freedesktop.org/~airlied/drm/commit/?h=drm-amdgpu-cs-submit-raw&id=e7f85d0ca617fa41e72624780c9035df132e23c4
>> being the API, and whether it should take a uint32_t context id or
>> context handle left as an open question in the last patch in the
>> series.
>
>
> I would stick with the context handle, as far as I can see there isn't any
> value in using the uint32_t for this.
>
> We just want to be able to send arbitrary chunks down into the kernel
> without libdrm_amdgpu involvement and/or the associated overhead of the
> extra loop and the semaphore handling.
>
> So your "amdgpu/cs: add new raw cs submission interface just taking chunks"
> patch looks fine to me as far as I can tell.
>
> As far as I can see the "amdgpu: refactor semaphore handling" patch is
> actually incorrect. We must hold the mutex while sending the CS down to the
> kernel, or otherwise "context->last_seq" won't be accurate.
>
>> However to hook this into radv or radeonsi will take a bit of
>> rewriting of a lot of code that is probably a bit more fragile than
>> I'd like for this sort of surgery at this point.
>
>
> Again, I can move over the existing Mesa stuff if you like.
>
>> I'd actually suspect if we do want to proceed with this type of
>> interface, we might be better doing it all in common mesa code, and
>> maybe bypassing libdrm_amdgpu altogether, which I suppose the API I've
>> written here is mostly already doing.
>
>
> I want to stick with the other interfaces for now. No need to make it more
> complicated than it already is.
>
> Only the CS stuff is the most performance critical and thing we have right
> now.

As I suspected this plan is full of traps.

So with the raw cs api I posted (using amdgpu_bo_list_handle instead), I ran
into two places where the abstraction cuts me.

  CC   winsys/amdgpu/radv_amdgpu_cs.lo
winsys/amdgpu/radv_amdgpu_cs.c: In function ‘radv_amdgpu_cs_submit’:
winsys/amdgpu/radv_amdgpu_cs.c:1173:63: error: dereferencing pointer
to incomplete type ‘struct amdgpu_bo’
   chunk_data[i].fence_data.handle = request->fence_info.handle->handle;
   ^~
winsys/amdgpu/radv_amdgpu_cs.c:1193:31: error: dereferencing pointer
to incomplete type ‘struct amdgpu_context’
dep->ctx_id = info->context->id;

In order to do the user fence chunk I need the actual bo handle, not the
amdgpu wrapped one, and we don't have an accessor method for that.

In order to do the dependencies chunks, I need a context id.

Now I suppose I can add chunk creation helpers to libdrm, but it does
seem like it breaks the future proof interface if we can't access the
details of a bunch of objects we want to pass through to the kernel
API.

Dave.


[PATCH] drm/amdgpu: read reg in each iterate of psp_wait_for loop

2017-07-17 Thread Junwei Zhang
From: "Zhang, Jerry" 

v2: fixes the SOS loading failure for PSP v3.1

Signed-off-by: Junwei Zhang 
Cc: sta...@vger.kernel.org
Acked-by: Alex Deucher  (v1)
Acked-by: Huang Rui  (v1)
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 3 +--
 drivers/gpu/drm/amd/amdgpu/psp_v3_1.c   | 2 --
 2 files changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
index c919579..644941d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
@@ -98,9 +98,8 @@ int psp_wait_for(struct psp_context *psp, uint32_t reg_index,
int i;
struct amdgpu_device *adev = psp->adev;
 
-   val = RREG32(reg_index);
-
for (i = 0; i < adev->usec_timeout; i++) {
+   val = RREG32(reg_index);
if (check_changed) {
if (val != reg_val)
return 0;
diff --git a/drivers/gpu/drm/amd/amdgpu/psp_v3_1.c b/drivers/gpu/drm/amd/amdgpu/psp_v3_1.c
index 2718e86..23106e3 100644
--- a/drivers/gpu/drm/amd/amdgpu/psp_v3_1.c
+++ b/drivers/gpu/drm/amd/amdgpu/psp_v3_1.c
@@ -237,11 +237,9 @@ int psp_v3_1_bootloader_load_sos(struct psp_context *psp)
 
/* there might be handshake issue with hardware which needs delay */
mdelay(20);
-#if 0
ret = psp_wait_for(psp, SOC15_REG_OFFSET(MP0, 0, mmMP0_SMN_C2PMSG_81),
   RREG32_SOC15(MP0, 0, mmMP0_SMN_C2PMSG_81),
   0, true);
-#endif
 
return ret;
 }
-- 
1.9.1
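The bug pattern in miniature, as a generic sketch (read_reg() and udelay() are hypothetical stand-ins for the driver's accessors): hoisting the register read out of a polling loop compares a stale value forever, which is exactly what the hunk above fixes.

#include <stdint.h>

extern uint32_t read_reg(uint32_t reg);   /* hypothetical accessor */
extern void udelay(unsigned long usecs);  /* hypothetical delay */

static int poll_for_value(uint32_t reg, uint32_t expected, int timeout_us)
{
        int i;

        for (i = 0; i < timeout_us; i++) {
                /* re-sample the register on every iteration; reading it
                 * once before the loop would never see it change */
                if (read_reg(reg) == expected)
                        return 0;
                udelay(1);
        }
        return -1;      /* timed out */
}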



Re: [PATCH] drm/amdgpu: read reg in each iterate of psp_wait_for loop

2017-07-17 Thread Zhang, Jerry (Junwei)

On 07/17/2017 07:45 PM, Huang Rui wrote:

On Mon, Jul 17, 2017 at 06:57:41PM +0800, Greg KH wrote:

On Mon, Jul 17, 2017 at 04:56:26PM +0800, Zhang, Jerry (Junwei) wrote:
> + sta...@vger.kernel.org



This is not the correct way to submit patches for inclusion in the
stable kernel tree.  Please read:
https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html
for how to do this properly.




Thanks, Greg. :-)


Thanks Greg for the reminder about that.




BTW: please add Cc:  in your patch; it needs to be
backported to the stable tree.


Jerry, I might not have described it clearly. We need to follow the rule that Greg
provided. Actually, I meant to add the Cc in your commit message like below,
then send it out:


Thanks for explaining it in detail.
I will prepare them as one patch again.

Jerry



8<--

Subject: [PATCH] drm/amdgpu: read reg in each iterate of psp_wait_for loop

This fixes the SOS loading failure of psp v3.1.

Signed-off-by: Junwei Zhang 
Cc: sta...@vger.kernel.org

8<--

And you'd better squash the two patches into one (actually it's only one
fix) to make backporting smoother.

Thanks,
Rui



Re: [PATCH libdrm 1/2] drm/amdgpu: add syncobj create/destroy/import/export apis

2017-07-17 Thread zhoucm1



On 2017-07-18 08:48, Dave Airlie wrote:

From: Dave Airlie 

These are just wrappers using the amdgpu device handle.

Signed-off-by: Dave Airlie 

Acked-by: Chunming Zhou 

---
  amdgpu/amdgpu.h| 55 +-
  amdgpu/amdgpu_cs.c | 38 +
  2 files changed, 92 insertions(+), 1 deletion(-)

diff --git a/amdgpu/amdgpu.h b/amdgpu/amdgpu.h
index 1901fa8..183f974 100644
--- a/amdgpu/amdgpu.h
+++ b/amdgpu/amdgpu.h
@@ -1328,8 +1328,61 @@ int amdgpu_cs_destroy_semaphore(amdgpu_semaphore_handle 
sem);
  */
  const char *amdgpu_get_marketing_name(amdgpu_device_handle dev);
  
+/**

+ *  Create kernel sync object
+ *
+ * \param   dev  - \c [in]  device handle
+ * \param   syncobj   - \c [out] sync object handle
+ *
+ * \return   0 on success\n
+ *  <0 - Negative POSIX Error code
+ *
+*/
+int amdgpu_cs_create_syncobj(amdgpu_device_handle dev,
+uint32_t *syncobj);
+/**
+ *  Destroy kernel sync object
+ *
+ * \param   dev- \c [in] device handle
+ * \param   syncobj - \c [in] sync object handle
+ *
+ * \return   0 on success\n
+ *  <0 - Negative POSIX Error code
+ *
+*/
+int amdgpu_cs_destroy_syncobj(amdgpu_device_handle dev,
+ uint32_t syncobj);
+
+/**
+ *  Export kernel sync object to shareable fd.
+ *
+ * \param   dev   - \c [in] device handle
+ * \param   syncobj- \c [in] sync object handle
+ * \param   shared_fd  - \c [out] shared file descriptor.
+ *
+ * \return   0 on success\n
+ *  <0 - Negative POSIX Error code
+ *
+*/
+int amdgpu_cs_export_syncobj(amdgpu_device_handle dev,
+uint32_t syncobj,
+int *shared_fd);
+/**
+ *  Import kernel sync object from shareable fd.
+ *
+ * \param   dev   - \c [in] device handle
+ * \param   shared_fd  - \c [in] shared file descriptor.
+ * \param   syncobj- \c [out] sync object handle
+ *
+ * \return   0 on success\n
+ *  <0 - Negative POSIX Error code
+ *
+*/
+int amdgpu_cs_import_syncobj(amdgpu_device_handle dev,
+int shared_fd,
+uint32_t *syncobj);
+
  #ifdef __cplusplus
  }
  #endif
-
  #endif /* #ifdef _AMDGPU_H_ */
diff --git a/amdgpu/amdgpu_cs.c b/amdgpu/amdgpu_cs.c
index 868eb7b..722fd75 100644
--- a/amdgpu/amdgpu_cs.c
+++ b/amdgpu/amdgpu_cs.c
@@ -596,3 +596,41 @@ int amdgpu_cs_destroy_semaphore(amdgpu_semaphore_handle sem)
  {
return amdgpu_cs_unreference_sem(sem);
  }
+
+int amdgpu_cs_create_syncobj(amdgpu_device_handle dev,
+uint32_t *handle)
+{
+   if (NULL == dev)
+   return -EINVAL;
+
+   return drmSyncobjCreate(dev->fd, 0, handle);
+}
+
+int amdgpu_cs_destroy_syncobj(amdgpu_device_handle dev,
+ uint32_t handle)
+{
+   if (NULL == dev)
+   return -EINVAL;
+
+   return drmSyncobjDestroy(dev->fd, handle);
+}
+
+int amdgpu_cs_export_syncobj(amdgpu_device_handle dev,
+uint32_t handle,
+int *shared_fd)
+{
+   if (NULL == dev)
+   return -EINVAL;
+
+   return drmSyncobjHandleToFD(dev->fd, handle, shared_fd);
+}
+
+int amdgpu_cs_import_syncobj(amdgpu_device_handle dev,
+int shared_fd,
+uint32_t *handle)
+{
+   if (NULL == dev)
+   return -EINVAL;
+
+   return drmSyncobjFDToHandle(dev->fd, shared_fd, handle);
+}




Re: [PATCH libdrm] libdrm_amdgpu: add kernel semaphore support

2017-07-17 Thread zhoucm1



On 2017-07-18 01:35, Christian König wrote:

On 17.07.2017 at 19:22, Marek Olšák wrote:

On Sun, Jul 16, 2017 at 11:36 PM, Dave Airlie  wrote:
I can take a look at it, I just won't have time until next week 
most likely.

I've taken a look, and it's seemingly more complicated than I'm
expecting I'd want to land in Mesa before 17.2 ships, I'd really
prefer to just push the new libdrm_amdgpu api from this patch. If I
have to port all the current radv code to the new API, I'll most
definitely get something wrong.

Adding the new API so far looks like
https://cgit.freedesktop.org/~airlied/drm/log/?h=drm-amdgpu-cs-submit-raw 



https://cgit.freedesktop.org/~airlied/drm/commit/?h=drm-amdgpu-cs-submit-raw&id=e7f85d0ca617fa41e72624780c9035df132e23c4 


being the API, and whether it should take a uint32_t context id or
context handle left as an open question in the last patch in the
series.

However to hook this into radv or radeonsi will take a bit of
rewriting of a lot of code that is probably a bit more fragile than
I'd like for this sort of surgery at this point.

I'd actually suspect if we do want to proceed with this type of
interface, we might be better doing it all in common mesa code, and
maybe bypassing libdrm_amdgpu altogether, which I suppose the API I've
written here is mostly already doing.

Well, we plan to stop using the BO list ioctl. The interface has
bo_list_handle in it. Will we just set it to 0 when we add the chunk for
the inlined buffer list, i.e. what radeon has?


Yeah, exactly that was my thinking as well.
Just one thought: could we remove the BO list and not use it at all? Instead,
we could expose an API like amdgpu_bo_make_resident with proper privileges to
user mode. That way, we would obviously shorten the CS ioctl.


David Zhou


Christian.


Marek





Re: [PATCH 2/2] drm/amdgpu: enable huge page handling in the VM v4

2017-07-17 Thread zhoucm1

Still holding on? I thought this patch was pushed in earlier with my RB.

Regards,
David Zhou
On 2017-07-18 05:02, Christian König wrote:

From: Christian König 

The hardware can use huge pages to map 2MB of address space with only one PDE.

v2: few cleanups and rebased
v3: skip PT updates if we are using the PDE
v4: rebased, added support for CPU based updates

Signed-off-by: Christian König 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 119 +++--
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h |   4 ++
  2 files changed, 103 insertions(+), 20 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index a3dbebe..62d97f5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -351,6 +351,7 @@ static int amdgpu_vm_alloc_levels(struct amdgpu_device *adev,
 
                        entry->bo = pt;
                        entry->addr = 0;
+                       entry->huge_page = false;
                }
 
                if (level < adev->vm_manager.num_level) {
@@ -1116,7 +1117,8 @@ static int amdgpu_vm_update_level(struct amdgpu_device *adev,
 
                pt = amdgpu_bo_gpu_offset(bo);
                pt = amdgpu_gart_get_vm_pde(adev, pt);
-               if (parent->entries[pt_idx].addr == pt)
+               if (parent->entries[pt_idx].addr == pt ||
+                   parent->entries[pt_idx].huge_page)
                        continue;
 
                parent->entries[pt_idx].addr = pt;
@@ -1257,29 +1259,95 @@ int amdgpu_vm_update_directories(struct amdgpu_device *adev,
 }
 
 /**
- * amdgpu_vm_find_pt - find the page table for an address
+ * amdgpu_vm_find_entry - find the entry for an address
  *
  * @p: see amdgpu_pte_update_params definition
  * @addr: virtual address in question
+ * @entry: resulting entry or NULL
+ * @parent: parent entry
  *
- * Find the page table BO for a virtual address, return NULL when none found.
+ * Find the vm_pt entry and its parent for the given address.
  */
-static struct amdgpu_bo *amdgpu_vm_get_pt(struct amdgpu_pte_update_params *p,
-                                         uint64_t addr)
+void amdgpu_vm_get_entry(struct amdgpu_pte_update_params *p, uint64_t addr,
+                        struct amdgpu_vm_pt **entry,
+                        struct amdgpu_vm_pt **parent)
 {
-       struct amdgpu_vm_pt *entry = &p->vm->root;
        unsigned idx, level = p->adev->vm_manager.num_level;
 
-       while (entry->entries) {
+       *parent = NULL;
+       *entry = &p->vm->root;
+       while ((*entry)->entries) {
                idx = addr >> (p->adev->vm_manager.block_size * level--);
-               idx %= amdgpu_bo_size(entry->bo) / 8;
-               entry = &entry->entries[idx];
+               idx %= amdgpu_bo_size((*entry)->bo) / 8;
+               *parent = *entry;
+               *entry = &(*entry)->entries[idx];
        }
 
        if (level)
-               return NULL;
+               *entry = NULL;
+}
+
+/**
+ * amdgpu_vm_handle_huge_pages - handle updating the PD with huge pages
+ *
+ * @p: see amdgpu_pte_update_params definition
+ * @entry: vm_pt entry to check
+ * @parent: parent entry
+ * @nptes: number of PTEs updated with this operation
+ * @dst: destination address where the PTEs should point to
+ * @flags: access flags for the PTEs
+ *
+ * Check if we can update the PD with a huge page.
+ */
+static int amdgpu_vm_handle_huge_pages(struct amdgpu_pte_update_params *p,
+                                      struct amdgpu_vm_pt *entry,
+                                      struct amdgpu_vm_pt *parent,
+                                      unsigned nptes, uint64_t dst,
+                                      uint64_t flags)
+{
+       bool use_cpu_update = (p->func == amdgpu_vm_cpu_set_ptes);
+       uint64_t pd_addr, pde;
+       int r;
 
-       return entry->bo;
+       /* In the case of a mixed PT the PDE must point to it*/
+       if (p->adev->asic_type < CHIP_VEGA10 ||
+           nptes != AMDGPU_VM_PTE_COUNT(p->adev) ||
+           p->func != amdgpu_vm_do_set_ptes ||
+           !(flags & AMDGPU_PTE_VALID)) {
+
+               dst = amdgpu_bo_gpu_offset(entry->bo);
+               dst = amdgpu_gart_get_vm_pde(p->adev, dst);
+               flags = AMDGPU_PTE_VALID;
+       } else {
+               flags |= AMDGPU_PDE_PTE;
+       }
+
+       if (entry->addr == dst &&
+           entry->huge_page == !!(flags & AMDGPU_PDE_PTE))
+               return 0;
+
+       entry->addr = dst;
+       entry->huge_page = !!(flags & AMDGPU_PDE_PTE);
+
+       if (use_cpu_update) {
+               r = amdgpu_bo_kmap(parent->bo, (void *)&pd_addr);
+               if (r)
+                       return r;
+
+               pde = pd_addr + (entry - parent->entries) * 8;
+               amdgpu_vm_cpu_set_ptes(p, pde, dst, 1, 0, flags);
+       } else {
+               if (parent->bo->shadow) {
+                       pd_addr = amdgpu_bo_gpu_offset(parent->bo->shadow);

[PATCH libdrm 2/2] drm/amdgpu: add new low overhead command submission API.

2017-07-17 Thread Dave Airlie
From: Dave Airlie 

This just sends chunks to the kernel API for a single command
stream.

This should provide a more future proof and extensible API
for command submission.

Signed-off-by: Dave Airlie 
---
 amdgpu/amdgpu.h| 21 +
 amdgpu/amdgpu_cs.c | 30 ++
 2 files changed, 51 insertions(+)

diff --git a/amdgpu/amdgpu.h b/amdgpu/amdgpu.h
index 183f974..b4a070d 100644
--- a/amdgpu/amdgpu.h
+++ b/amdgpu/amdgpu.h
@@ -1382,6 +1382,27 @@ int amdgpu_cs_import_syncobj(amdgpu_device_handle dev,
 int shared_fd,
 uint32_t *syncobj);
 
+/**
+ *  Submit raw command submission to kernel
+ *
+ * \param   dev   - \c [in] device handle
+ * \param   context- \c [in] context handle for context id
+ * \param   bo_list_handle - \c [in] request bo list handle (0 for none)
+ * \param   num_chunks - \c [in] number of CS chunks to submit
+ * \param   chunks - \c [in] array of CS chunks
+ * \param   seq_no - \c [out] output sequence number for submission.
+ *
+ * \return   0 on success\n
+ *  <0 - Negative POSIX Error code
+ *
+*/
+struct drm_amdgpu_cs_chunk;
+int amdgpu_cs_submit_raw(amdgpu_device_handle dev,
+amdgpu_context_handle context,
+uint32_t bo_list_handle,
+int num_chunks,
+struct drm_amdgpu_cs_chunk *chunks,
+uint64_t *seq_no);
 #ifdef __cplusplus
 }
 #endif
diff --git a/amdgpu/amdgpu_cs.c b/amdgpu/amdgpu_cs.c
index 722fd75..3c32070 100644
--- a/amdgpu/amdgpu_cs.c
+++ b/amdgpu/amdgpu_cs.c
@@ -634,3 +634,33 @@ int amdgpu_cs_import_syncobj(amdgpu_device_handle dev,
 
return drmSyncobjFDToHandle(dev->fd, shared_fd, handle);
 }
+
+int amdgpu_cs_submit_raw(amdgpu_device_handle dev,
+amdgpu_context_handle context,
+uint32_t bo_list_handle,
+int num_chunks,
+struct drm_amdgpu_cs_chunk *chunks,
+uint64_t *seq_no)
+{
+   union drm_amdgpu_cs cs = {0};
+   uint64_t *chunk_array;
+   int i, r;
+   if (num_chunks == 0)
+   return -EINVAL;
+
+   chunk_array = alloca(sizeof(uint64_t) * num_chunks);
+   for (i = 0; i < num_chunks; i++)
+   chunk_array[i] = (uint64_t)(uintptr_t)&chunks[i];
+   cs.in.chunks = (uint64_t)(uintptr_t)chunk_array;
+   cs.in.ctx_id = context->id;
+   cs.in.bo_list_handle = bo_list_handle;
+   cs.in.num_chunks = num_chunks;
+   r = drmCommandWriteRead(dev->fd, DRM_AMDGPU_CS,
+   &cs, sizeof(cs));
+   if (r)
+   return r;
+
+   if (seq_no)
+   *seq_no = cs.out.handle;
+   return 0;
+}
-- 
2.9.4



[PATCH libdrm 1/2] drm/amdgpu: add syncobj create/destroy/import/export apis

2017-07-17 Thread Dave Airlie
From: Dave Airlie 

These are just wrappers using the amdgpu device handle.

Signed-off-by: Dave Airlie 
---
 amdgpu/amdgpu.h| 55 +-
 amdgpu/amdgpu_cs.c | 38 +
 2 files changed, 92 insertions(+), 1 deletion(-)

diff --git a/amdgpu/amdgpu.h b/amdgpu/amdgpu.h
index 1901fa8..183f974 100644
--- a/amdgpu/amdgpu.h
+++ b/amdgpu/amdgpu.h
@@ -1328,8 +1328,61 @@ int amdgpu_cs_destroy_semaphore(amdgpu_semaphore_handle sem);
 */
 const char *amdgpu_get_marketing_name(amdgpu_device_handle dev);
 
+/**
+ *  Create kernel sync object
+ *
+ * \param   dev  - \c [in]  device handle
+ * \param   syncobj   - \c [out] sync object handle
+ *
+ * \return   0 on success\n
+ *  <0 - Negative POSIX Error code
+ *
+*/
+int amdgpu_cs_create_syncobj(amdgpu_device_handle dev,
+uint32_t *syncobj);
+/**
+ *  Destroy kernel sync object
+ *
+ * \param   dev- \c [in] device handle
+ * \param   syncobj - \c [in] sync object handle
+ *
+ * \return   0 on success\n
+ *  <0 - Negative POSIX Error code
+ *
+*/
+int amdgpu_cs_destroy_syncobj(amdgpu_device_handle dev,
+ uint32_t syncobj);
+
+/**
+ *  Export kernel sync object to shareable fd.
+ *
+ * \param   dev   - \c [in] device handle
+ * \param   syncobj- \c [in] sync object handle
+ * \param   shared_fd  - \c [out] shared file descriptor.
+ *
+ * \return   0 on success\n
+ *  <0 - Negative POSIX Error code
+ *
+*/
+int amdgpu_cs_export_syncobj(amdgpu_device_handle dev,
+uint32_t syncobj,
+int *shared_fd);
+/**
+ *  Import kernel sync object from shareable fd.
+ *
+ * \param   dev   - \c [in] device handle
+ * \param   shared_fd  - \c [in] shared file descriptor.
+ * \param   syncobj- \c [out] sync object handle
+ *
+ * \return   0 on success\n
+ *  <0 - Negative POSIX Error code
+ *
+*/
+int amdgpu_cs_import_syncobj(amdgpu_device_handle dev,
+int shared_fd,
+uint32_t *syncobj);
+
 #ifdef __cplusplus
 }
 #endif
-
 #endif /* #ifdef _AMDGPU_H_ */
diff --git a/amdgpu/amdgpu_cs.c b/amdgpu/amdgpu_cs.c
index 868eb7b..722fd75 100644
--- a/amdgpu/amdgpu_cs.c
+++ b/amdgpu/amdgpu_cs.c
@@ -596,3 +596,41 @@ int amdgpu_cs_destroy_semaphore(amdgpu_semaphore_handle sem)
 {
return amdgpu_cs_unreference_sem(sem);
 }
+
+int amdgpu_cs_create_syncobj(amdgpu_device_handle dev,
+uint32_t *handle)
+{
+   if (NULL == dev)
+   return -EINVAL;
+
+   return drmSyncobjCreate(dev->fd, 0, handle);
+}
+
+int amdgpu_cs_destroy_syncobj(amdgpu_device_handle dev,
+ uint32_t handle)
+{
+   if (NULL == dev)
+   return -EINVAL;
+
+   return drmSyncobjDestroy(dev->fd, handle);
+}
+
+int amdgpu_cs_export_syncobj(amdgpu_device_handle dev,
+uint32_t handle,
+int *shared_fd)
+{
+   if (NULL == dev)
+   return -EINVAL;
+
+   return drmSyncobjHandleToFD(dev->fd, handle, shared_fd);
+}
+
+int amdgpu_cs_import_syncobj(amdgpu_device_handle dev,
+int shared_fd,
+uint32_t *handle)
+{
+   if (NULL == dev)
+   return -EINVAL;
+
+   return drmSyncobjFDToHandle(dev->fd, shared_fd, handle);
+}
-- 
2.9.4
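A usage sketch for the wrappers (not part of the patch): create a syncobj, export it as an fd that can cross a process boundary, and import it on another device handle. Error paths trimmed for brevity:

#include <unistd.h>
#include <stdint.h>
#include <amdgpu.h>

static int share_syncobj(amdgpu_device_handle producer,
                         amdgpu_device_handle consumer)
{
        uint32_t syncobj, imported;
        int fd, r;

        r = amdgpu_cs_create_syncobj(producer, &syncobj);
        if (r)
                return r;

        r = amdgpu_cs_export_syncobj(producer, syncobj, &fd);
        if (r == 0) {
                /* the fd could travel over a unix socket to another
                 * process before being imported */
                r = amdgpu_cs_import_syncobj(consumer, fd, &imported);
                close(fd);
        }

        /* the underlying object is refcounted; dropping our handle is safe */
        amdgpu_cs_destroy_syncobj(producer, syncobj);
        return r;
}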



Re: [PATCH 2/2] drm/amdgpu: enable huge page handling in the VM v4

2017-07-17 Thread Christian König

On 17.07.2017 at 23:30, Felix Kuehling wrote:

On 17-07-17 05:02 PM, Christian König wrote:

+   if (p->adev->asic_type < CHIP_VEGA10 ||
+   nptes != AMDGPU_VM_PTE_COUNT(p->adev) ||
+   p->func != amdgpu_vm_do_set_ptes ||
+   !(flags & AMDGPU_PTE_VALID)) {

Because of this condition, I think this still won't work correctly for
cpu page table updates. p->func will be amdgpu_vm_cpu_set_ptes.


Good point.

This is totally untested anyway, because of lack of hardware access at 
the moment.


Just wanted to point you to the bits I've changed for testing it.

Christian.



Regards,
   Felix


Re: [PATCH 2/2] drm/amdgpu: enable huge page handling in the VM v4

2017-07-17 Thread Felix Kuehling
On 17-07-17 05:02 PM, Christian König wrote:
> + if (p->adev->asic_type < CHIP_VEGA10 ||
> + nptes != AMDGPU_VM_PTE_COUNT(p->adev) ||
> + p->func != amdgpu_vm_do_set_ptes ||
> + !(flags & AMDGPU_PTE_VALID)) {

Because of this condition, I think this still won't work correctly for
cpu page table updates. p->func will be amdgpu_vm_cpu_set_ptes.

Regards,
  Felix


Re: [PATCH 2/2] drm/amdgpu: enable huge page handling in the VM v4

2017-07-17 Thread Christian König

The minimum BO size is 4K, so that can never happen.

Christian.
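A quick arithmetic check of that, and of the 2MB claim in the commit message, as a stand-alone sketch (the 8-byte entry size is visible in the patch; treating 512 as AMDGPU_VM_PTE_COUNT assumes the default 9-bit block size):

#include <assert.h>

int main(void)
{
        const unsigned long min_bo = 4096;      /* minimum BO size */
        const unsigned long entry = 8;          /* bytes per PDE/PTE */
        const unsigned long page = 4096;        /* native page size */

        /* amdgpu_bo_size(bo) / 8 is therefore at least 512, never 0 */
        assert(min_bo / entry == 512);

        /* and 512 contiguous 4K PTEs are exactly the 2MB that one
         * huge-page PDE covers */
        assert(512 * page == 2ul * 1024 * 1024);
        return 0;
}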

On 17.07.2017 at 23:21, StDenis, Tom wrote:

In amdgpu_vm_get_entry() if the bo size is less than 8 you'll get a divide by 
zero.  Are there mechanisms to prevent this?  Maybe add a BUG() there?

Tom

From: amd-gfx  on behalf of Christian König 

Sent: Monday, July 17, 2017 17:02
To: amd-gfx@lists.freedesktop.org
Cc: Kuehling, Felix
Subject: [PATCH 2/2] drm/amdgpu: enable huge page handling in the VM v4

From: Christian König 

The hardware can use huge pages to map 2MB of address space with only one PDE.

v2: few cleanups and rebased
v3: skip PT updates if we are using the PDE
v4: rebased, added support for CPU based updates

Signed-off-by: Christian König 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 119 +++--
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h |   4 ++
  2 files changed, 103 insertions(+), 20 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index a3dbebe..62d97f5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -351,6 +351,7 @@ static int amdgpu_vm_alloc_levels(struct amdgpu_device *adev,

 entry->bo = pt;
 entry->addr = 0;
+   entry->huge_page = false;
 }

 if (level < adev->vm_manager.num_level) {
@@ -1116,7 +1117,8 @@ static int amdgpu_vm_update_level(struct amdgpu_device *adev,

 pt = amdgpu_bo_gpu_offset(bo);
 pt = amdgpu_gart_get_vm_pde(adev, pt);
-   if (parent->entries[pt_idx].addr == pt)
+   if (parent->entries[pt_idx].addr == pt ||
+   parent->entries[pt_idx].huge_page)
 continue;

 parent->entries[pt_idx].addr = pt;
@@ -1257,29 +1259,95 @@ int amdgpu_vm_update_directories(struct amdgpu_device *adev,
  }

  /**
- * amdgpu_vm_find_pt - find the page table for an address
+ * amdgpu_vm_find_entry - find the entry for an address
   *
   * @p: see amdgpu_pte_update_params definition
   * @addr: virtual address in question
+ * @entry: resulting entry or NULL
+ * @parent: parent entry
   *
- * Find the page table BO for a virtual address, return NULL when none found.
+ * Find the vm_pt entry and its parent for the given address.
   */
-static struct amdgpu_bo *amdgpu_vm_get_pt(struct amdgpu_pte_update_params *p,
- uint64_t addr)
+void amdgpu_vm_get_entry(struct amdgpu_pte_update_params *p, uint64_t addr,
+struct amdgpu_vm_pt **entry,
+struct amdgpu_vm_pt **parent)
  {
-   struct amdgpu_vm_pt *entry = &p->vm->root;
 unsigned idx, level = p->adev->vm_manager.num_level;

-   while (entry->entries) {
+   *parent = NULL;
+   *entry = &p->vm->root;
+   while ((*entry)->entries) {
 idx = addr >> (p->adev->vm_manager.block_size * level--);
-   idx %= amdgpu_bo_size(entry->bo) / 8;
-   entry = &entry->entries[idx];
+   idx %= amdgpu_bo_size((*entry)->bo) / 8;
+   *parent = *entry;
+   *entry = &(*entry)->entries[idx];
 }

 if (level)
-   return NULL;
+   *entry = NULL;
+}
+
+/**
+ * amdgpu_vm_handle_huge_pages - handle updating the PD with huge pages
+ *
+ * @p: see amdgpu_pte_update_params definition
+ * @entry: vm_pt entry to check
+ * @parent: parent entry
+ * @nptes: number of PTEs updated with this operation
+ * @dst: destination address where the PTEs should point to
+ * @flags: access flags for the PTEs
+ *
+ * Check if we can update the PD with a huge page.
+ */
+static int amdgpu_vm_handle_huge_pages(struct amdgpu_pte_update_params *p,
+  struct amdgpu_vm_pt *entry,
+  struct amdgpu_vm_pt *parent,
+  unsigned nptes, uint64_t dst,
+  uint64_t flags)
+{
+   bool use_cpu_update = (p->func == amdgpu_vm_cpu_set_ptes);
+   uint64_t pd_addr, pde;
+   int r;

-   return entry->bo;
+   /* In the case of a mixed PT the PDE must point to it*/
+   if (p->adev->asic_type < CHIP_VEGA10 ||
+   nptes != AMDGPU_VM_PTE_COUNT(p->adev) ||
+   p->func != amdgpu_vm_do_set_ptes ||
+   !(flags & AMDGPU_PTE_VALID)) {
+
+   dst = amdgpu_bo_gpu_offset(entry->bo);
+   dst = amdgpu_gart_get_vm_pde(p->adev, dst);
+   flags = AMDGPU_PTE_VALID;
+   } else {
+   flags |= AMDGPU_PDE_PTE;
+   }
+
+   if (entry->addr == dst &&
+   entry->huge_page == !!(flags & AMDGPU_PDE_PTE))
+   return 0;
+
+   entry->addr = dst;
+   entry->huge_page = !!(flags & AMDGPU_PDE_PTE);
+
+   

Re: [PATCH 2/2] drm/amdgpu: enable huge page handling in the VM v4

2017-07-17 Thread StDenis, Tom
In amdgpu_vm_get_entry() if the bo size is less than 8 you'll get a divide by 
zero.  Are there mechanisms to prevent this?  Maybe add a BUG() there?

Tom

From: amd-gfx  on behalf of Christian 
König 
Sent: Monday, July 17, 2017 17:02
To: amd-gfx@lists.freedesktop.org
Cc: Kuehling, Felix
Subject: [PATCH 2/2] drm/amdgpu: enable huge page handling in the VM v4

From: Christian König 

The hardware can use huge pages to map 2MB of address space with only one PDE.

v2: few cleanups and rebased
v3: skip PT updates if we are using the PDE
v4: rebased, added support for CPU based updates

Signed-off-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 119 +++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h |   4 ++
 2 files changed, 103 insertions(+), 20 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index a3dbebe..62d97f5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -351,6 +351,7 @@ static int amdgpu_vm_alloc_levels(struct amdgpu_device *adev,

entry->bo = pt;
entry->addr = 0;
+   entry->huge_page = false;
}

if (level < adev->vm_manager.num_level) {
@@ -1116,7 +1117,8 @@ static int amdgpu_vm_update_level(struct amdgpu_device *adev,

pt = amdgpu_bo_gpu_offset(bo);
pt = amdgpu_gart_get_vm_pde(adev, pt);
-   if (parent->entries[pt_idx].addr == pt)
+   if (parent->entries[pt_idx].addr == pt ||
+   parent->entries[pt_idx].huge_page)
continue;

parent->entries[pt_idx].addr = pt;
@@ -1257,29 +1259,95 @@ int amdgpu_vm_update_directories(struct amdgpu_device 
*adev,
 }

 /**
- * amdgpu_vm_find_pt - find the page table for an address
+ * amdgpu_vm_get_entry - find the entry for an address
  *
  * @p: see amdgpu_pte_update_params definition
  * @addr: virtual address in question
+ * @entry: resulting entry or NULL
+ * @parent: parent entry
  *
- * Find the page table BO for a virtual address, return NULL when none found.
+ * Find the vm_pt entry and its parent for the given address.
  */
-static struct amdgpu_bo *amdgpu_vm_get_pt(struct amdgpu_pte_update_params *p,
- uint64_t addr)
+void amdgpu_vm_get_entry(struct amdgpu_pte_update_params *p, uint64_t addr,
+struct amdgpu_vm_pt **entry,
+struct amdgpu_vm_pt **parent)
 {
-   struct amdgpu_vm_pt *entry = &p->vm->root;
unsigned idx, level = p->adev->vm_manager.num_level;

-   while (entry->entries) {
+   *parent = NULL;
+   *entry = &p->vm->root;
+   while ((*entry)->entries) {
idx = addr >> (p->adev->vm_manager.block_size * level--);
-   idx %= amdgpu_bo_size(entry->bo) / 8;
-   entry = &entry->entries[idx];
+   idx %= amdgpu_bo_size((*entry)->bo) / 8;
+   *parent = *entry;
+   *entry = &(*entry)->entries[idx];
}

if (level)
-   return NULL;
+   *entry = NULL;
+}
+
+/**
+ * amdgpu_vm_handle_huge_pages - handle updating the PD with huge pages
+ *
+ * @p: see amdgpu_pte_update_params definition
+ * @entry: vm_pt entry to check
+ * @parent: parent entry
+ * @nptes: number of PTEs updated with this operation
+ * @dst: destination address where the PTEs should point to
+ * @flags: access flags for the PTEs
+ *
+ * Check if we can update the PD with a huge page.
+ */
+static int amdgpu_vm_handle_huge_pages(struct amdgpu_pte_update_params *p,
+  struct amdgpu_vm_pt *entry,
+  struct amdgpu_vm_pt *parent,
+  unsigned nptes, uint64_t dst,
+  uint64_t flags)
+{
+   bool use_cpu_update = (p->func == amdgpu_vm_cpu_set_ptes);
+   uint64_t pd_addr, pde;
+   int r;

-   return entry->bo;
+   /* In the case of a mixed PT the PDE must point to it */
+   if (p->adev->asic_type < CHIP_VEGA10 ||
+   nptes != AMDGPU_VM_PTE_COUNT(p->adev) ||
+   p->func != amdgpu_vm_do_set_ptes ||
+   !(flags & AMDGPU_PTE_VALID)) {
+
+   dst = amdgpu_bo_gpu_offset(entry->bo);
+   dst = amdgpu_gart_get_vm_pde(p->adev, dst);
+   flags = AMDGPU_PTE_VALID;
+   } else {
+   flags |= AMDGPU_PDE_PTE;
+   }
+
+   if (entry->addr == dst &&
+   entry->huge_page == !!(flags & AMDGPU_PDE_PTE))
+   return 0;
+
+   entry->addr = dst;
+   entry->huge_page = !!(flags & AMDGPU_PDE_PTE);
+
+   if (use_cpu_update) {
+   r = amdgpu_bo_kmap(parent->bo, (void *)&pd_addr);
+   if (r)
+

[PATCH 2/2] drm/amdgpu: enable huge page handling in the VM v4

2017-07-17 Thread Christian König
From: Christian König 

The hardware can use huge pages to map 2MB of address space with only one PDE.
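
As a quick sanity check on the 2MB figure, a sketch assuming the Vega10
default VM block size of 9 bits (512 PTEs per page table) and 4KB GPU
pages; both values are assumptions here, not taken from the patch:

/* Assumed defaults: 2^9 PTEs per table, 4KB per GPU page. */
unsigned nptes = 1 << 9;                      /* 512 entries     */
uint64_t mapped = (uint64_t)nptes * 4096;     /* 512 * 4KB = 2MB */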

v2: few cleanups and rebased
v3: skip PT updates if we are using the PDE
v4: rebased, added support for CPU based updates

Signed-off-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 119 +++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h |   4 ++
 2 files changed, 103 insertions(+), 20 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index a3dbebe..62d97f5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -351,6 +351,7 @@ static int amdgpu_vm_alloc_levels(struct amdgpu_device 
*adev,
 
entry->bo = pt;
entry->addr = 0;
+   entry->huge_page = false;
}
 
if (level < adev->vm_manager.num_level) {
@@ -1116,7 +1117,8 @@ static int amdgpu_vm_update_level(struct amdgpu_device 
*adev,
 
pt = amdgpu_bo_gpu_offset(bo);
pt = amdgpu_gart_get_vm_pde(adev, pt);
-   if (parent->entries[pt_idx].addr == pt)
+   if (parent->entries[pt_idx].addr == pt ||
+   parent->entries[pt_idx].huge_page)
continue;
 
parent->entries[pt_idx].addr = pt;
@@ -1257,29 +1259,95 @@ int amdgpu_vm_update_directories(struct amdgpu_device 
*adev,
 }
 
 /**
- * amdgpu_vm_find_pt - find the page table for an address
+ * amdgpu_vm_get_entry - find the entry for an address
  *
  * @p: see amdgpu_pte_update_params definition
  * @addr: virtual address in question
+ * @entry: resulting entry or NULL
+ * @parent: parent entry
  *
- * Find the page table BO for a virtual address, return NULL when none found.
+ * Find the vm_pt entry and its parent for the given address.
  */
-static struct amdgpu_bo *amdgpu_vm_get_pt(struct amdgpu_pte_update_params *p,
- uint64_t addr)
+void amdgpu_vm_get_entry(struct amdgpu_pte_update_params *p, uint64_t addr,
+struct amdgpu_vm_pt **entry,
+struct amdgpu_vm_pt **parent)
 {
-   struct amdgpu_vm_pt *entry = &p->vm->root;
unsigned idx, level = p->adev->vm_manager.num_level;
 
-   while (entry->entries) {
+   *parent = NULL;
+   *entry = &p->vm->root;
+   while ((*entry)->entries) {
idx = addr >> (p->adev->vm_manager.block_size * level--);
-   idx %= amdgpu_bo_size(entry->bo) / 8;
-   entry = &entry->entries[idx];
+   idx %= amdgpu_bo_size((*entry)->bo) / 8;
+   *parent = *entry;
+   *entry = &(*entry)->entries[idx];
}
 
if (level)
-   return NULL;
+   *entry = NULL;
+}
+
+/**
+ * amdgpu_vm_handle_huge_pages - handle updating the PD with huge pages
+ *
+ * @p: see amdgpu_pte_update_params definition
+ * @entry: vm_pt entry to check
+ * @parent: parent entry
+ * @nptes: number of PTEs updated with this operation
+ * @dst: destination address where the PTEs should point to
+ * @flags: access flags for the PTEs
+ *
+ * Check if we can update the PD with a huge page.
+ */
+static int amdgpu_vm_handle_huge_pages(struct amdgpu_pte_update_params *p,
+  struct amdgpu_vm_pt *entry,
+  struct amdgpu_vm_pt *parent,
+  unsigned nptes, uint64_t dst,
+  uint64_t flags)
+{
+   bool use_cpu_update = (p->func == amdgpu_vm_cpu_set_ptes);
+   uint64_t pd_addr, pde;
+   int r;
 
-   return entry->bo;
+   /* In the case of a mixed PT the PDE must point to it */
+   if (p->adev->asic_type < CHIP_VEGA10 ||
+   nptes != AMDGPU_VM_PTE_COUNT(p->adev) ||
+   p->func != amdgpu_vm_do_set_ptes ||
+   !(flags & AMDGPU_PTE_VALID)) {
+
+   dst = amdgpu_bo_gpu_offset(entry->bo);
+   dst = amdgpu_gart_get_vm_pde(p->adev, dst);
+   flags = AMDGPU_PTE_VALID;
+   } else {
+   flags |= AMDGPU_PDE_PTE;
+   }
+
+   if (entry->addr == dst &&
+   entry->huge_page == !!(flags & AMDGPU_PDE_PTE))
+   return 0;
+
+   entry->addr = dst;
+   entry->huge_page = !!(flags & AMDGPU_PDE_PTE);
+
+   if (use_cpu_update) {
+   r = amdgpu_bo_kmap(parent->bo, (void *)&pd_addr);
+   if (r)
+   return r;
+
+   pde = pd_addr + (entry - parent->entries) * 8;
+   amdgpu_vm_cpu_set_ptes(p, pde, dst, 1, 0, flags);
+   } else {
+   if (parent->bo->shadow) {
+   pd_addr = amdgpu_bo_gpu_offset(parent->bo->shadow);
+   pde = pd_addr + (entry - parent->entries) * 8;
+   amdgpu_vm_do_set_ptes

[PATCH 1/2] drm/amdgpu: increase fragmentation size for Vega10 v2

2017-07-17 Thread Christian König
From: Christian König 

The fragment bits work differently for Vega10 compared to previous generations.

Increase the fragment size to 2MB for now to better handle that.
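
A short sketch of the fragment computation this changes, assuming
AMDGPU_PTE_FRAG() simply encodes the log2 page count into the PTE
fragment field; the concrete values are illustrative assumptions:

/* Pre-Vega10 keeps the old value of 4; Vega10 uses the VM block size. */
unsigned log2_frag = (asic_type < CHIP_VEGA10) ? 4 : vm_block_size;
uint64_t frag_align = 1ULL << log2_frag;      /* alignment in GPU pages */
/* log2_frag = 4 -> 16 pages  = 64KB fragments (SI and newer)
 * log2_frag = 9 -> 512 pages = 2MB fragments  (Vega10 default)
 */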

v2: handle the hardware setup as well

Signed-off-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c  | 5 +++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c   | 5 +++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h   | 4 +++-
 drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c | 4 +++-
 drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c  | 4 +++-
 5 files changed, 15 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index 7a8da32..fc77844 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -588,8 +588,9 @@ static int amdgpu_info_ioctl(struct drm_device *dev, void 
*data, struct drm_file
dev_info.virtual_address_offset = AMDGPU_VA_RESERVED_SIZE;
dev_info.virtual_address_max = 
(uint64_t)adev->vm_manager.max_pfn * AMDGPU_GPU_PAGE_SIZE;
dev_info.virtual_address_alignment = max((int)PAGE_SIZE, 
AMDGPU_GPU_PAGE_SIZE);
-   dev_info.pte_fragment_size = (1 << AMDGPU_LOG2_PAGES_PER_FRAG) *
-AMDGPU_GPU_PAGE_SIZE;
+   dev_info.pte_fragment_size =
+   (1 << AMDGPU_LOG2_PAGES_PER_FRAG(adev)) *
+   AMDGPU_GPU_PAGE_SIZE;
dev_info.gart_page_size = AMDGPU_GPU_PAGE_SIZE;
 
dev_info.cu_active_number = adev->gfx.cu_info.number;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 55d1c7f..a3dbebe 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -1381,8 +1381,9 @@ static int amdgpu_vm_frag_ptes(struct 
amdgpu_pte_update_params*params,
 */
 
/* SI and newer are optimized for 64KB */
-   uint64_t frag_flags = AMDGPU_PTE_FRAG(AMDGPU_LOG2_PAGES_PER_FRAG);
-   uint64_t frag_align = 1 << AMDGPU_LOG2_PAGES_PER_FRAG;
+   unsigned pages_per_frag = AMDGPU_LOG2_PAGES_PER_FRAG(params->adev);
+   uint64_t frag_flags = AMDGPU_PTE_FRAG(pages_per_frag);
+   uint64_t frag_align = 1 << pages_per_frag;
 
uint64_t frag_start = ALIGN(start, frag_align);
uint64_t frag_end = end & ~(frag_align - 1);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
index 3441ec5..c4f5d1f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h
@@ -51,7 +51,9 @@ struct amdgpu_bo_list_entry;
 #define AMDGPU_VM_PTB_ALIGN_SIZE   32768
 
 /* LOG2 number of continuous pages for the fragment field */
-#define AMDGPU_LOG2_PAGES_PER_FRAG 4
+#define AMDGPU_LOG2_PAGES_PER_FRAG(adev) \
+   ((adev)->asic_type < CHIP_VEGA10 ? 4 : \
+(adev)->vm_manager.block_size)
 
 #define AMDGPU_PTE_VALID   (1ULL << 0)
 #define AMDGPU_PTE_SYSTEM  (1ULL << 1)
diff --git a/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c
index 008ad3d..408723e 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c
@@ -129,7 +129,7 @@ static void gfxhub_v1_0_init_cache_regs(struct 
amdgpu_device *adev)
/* Setup L2 cache */
tmp = RREG32_SOC15(GC, 0, mmVM_L2_CNTL);
tmp = REG_SET_FIELD(tmp, VM_L2_CNTL, ENABLE_L2_CACHE, 1);
-   tmp = REG_SET_FIELD(tmp, VM_L2_CNTL, ENABLE_L2_FRAGMENT_PROCESSING, 0);
+   tmp = REG_SET_FIELD(tmp, VM_L2_CNTL, ENABLE_L2_FRAGMENT_PROCESSING, 1);
/* XXX for emulation, Refer to closed source code.*/
tmp = REG_SET_FIELD(tmp, VM_L2_CNTL, L2_PDE0_CACHE_TAG_GENERATION_MODE,
0);
@@ -144,6 +144,8 @@ static void gfxhub_v1_0_init_cache_regs(struct 
amdgpu_device *adev)
WREG32_SOC15(GC, 0, mmVM_L2_CNTL2, tmp);
 
tmp = mmVM_L2_CNTL3_DEFAULT;
+   tmp = REG_SET_FIELD(tmp, VM_L2_CNTL3, BANK_SELECT, 12);
+   tmp = REG_SET_FIELD(tmp, VM_L2_CNTL3, L2_CACHE_BIGK_FRAGMENT_SIZE, 9);
WREG32_SOC15(GC, 0, mmVM_L2_CNTL3, tmp);
 
tmp = mmVM_L2_CNTL4_DEFAULT;
diff --git a/drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c 
b/drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c
index 96f1628..ad8def3 100644
--- a/drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c
@@ -143,7 +143,7 @@ static void mmhub_v1_0_init_cache_regs(struct amdgpu_device 
*adev)
/* Setup L2 cache */
tmp = RREG32_SOC15(MMHUB, 0, mmVM_L2_CNTL);
tmp = REG_SET_FIELD(tmp, VM_L2_CNTL, ENABLE_L2_CACHE, 1);
-   tmp = REG_SET_FIELD(tmp, VM_L2_CNTL, ENABLE_L2_FRAGMENT_PROCESSING, 0);
+   tmp = REG_SET_FIELD(tmp, VM_L2_CNTL, ENABLE_L2_FRAGMENT_PROCESSING, 1);
/* XXX for emulation, Refer to closed source code.*/
tmp = REG_SET_FIELD(tmp, VM_L2_CNTL, L2_PDE0_CACHE_TAG_GENERATION_MODE,
   

Re: [PATCH libdrm] libdrm_amdgpu: add kernel semaphore support

2017-07-17 Thread Christian König

Am 17.07.2017 um 19:22 schrieb Marek Olšák:

On Sun, Jul 16, 2017 at 11:36 PM, Dave Airlie  wrote:

I can take a look at it, I just won't have time until next week most likely.

I've taken a look, and it's seemingly more complicated than I expected,
and more than I'd want to land in Mesa before 17.2 ships; I'd really
prefer to just push the new libdrm_amdgpu API from this patch. If I
have to port all the current radv code to the new API, I'll most
definitely get something wrong.

Adding the new API so far looks like
https://cgit.freedesktop.org/~airlied/drm/log/?h=drm-amdgpu-cs-submit-raw

https://cgit.freedesktop.org/~airlied/drm/commit/?h=drm-amdgpu-cs-submit-raw&id=e7f85d0ca617fa41e72624780c9035df132e23c4
being the API, and whether it should take a uint32_t context id or
context handle left as an open question in the last patch in the
series.

However to hook this into radv or radeonsi will take a bit of
rewriting of a lot of code that is probably a bit more fragile than
I'd like for this sort of surgery at this point.

I'd actually suspect if we do want to proceed with this type of
interface, we might be better doing it all in common mesa code, and
maybe bypassing libdrm_amdgpu altogether, which I suppose the API I've
written here is mostly already doing.

Well, we plan to stop using the BO list ioctl. The interface has
bo_list_handle in it. Will we just set it to 0 when we add the chunk for
the inlined buffer list, i.e. what radeon has?


Yeah, exactly that was my thinking as well.

Christian.


Marek



___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH libdrm] libdrm_amdgpu: add kernel semaphore support

2017-07-17 Thread Marek Olšák
On Sun, Jul 16, 2017 at 11:36 PM, Dave Airlie  wrote:
>>
>> I can take a look at it, I just won't have time until next week most likely.
>
> I've taken a look, and it's seemingly more complicated than I expected,
> and more than I'd want to land in Mesa before 17.2 ships; I'd really
> prefer to just push the new libdrm_amdgpu API from this patch. If I
> have to port all the current radv code to the new API, I'll most
> definitely get something wrong.
>
> Adding the new API so far looks like
> https://cgit.freedesktop.org/~airlied/drm/log/?h=drm-amdgpu-cs-submit-raw
>
> https://cgit.freedesktop.org/~airlied/drm/commit/?h=drm-amdgpu-cs-submit-raw&id=e7f85d0ca617fa41e72624780c9035df132e23c4
> being the API, and whether it should take a uint32_t context id or
> context handle left as an open question in the last patch in the
> series.
>
> However to hook this into radv or radeonsi will take a bit of
> rewriting of a lot of code that is probably a bit more fragile than
> I'd like for this sort of surgery at this point.
>
> I'd actually suspect if we do want to proceed with this type of
> interface, we might be better doing it all in common mesa code, and
> maybe bypassing libdrm_amdgpu altogether, which I suppose the API I've
> written here is mostly already doing.

Well, we plan to stop using the BO list ioctl. The interface has
bo_list_handle in it. Will we just set it to 0 when we add the chunk for
the inlined buffer list, i.e. what radeon has?

Marek
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 2/2] drm/amdgpu: Implement ttm_bo_driver.access_vram callback

2017-07-17 Thread Christian König

Am 14.07.2017 um 21:44 schrieb Felix Kuehling:

On 17-07-14 06:08 AM, Christian König wrote:

Am 13.07.2017 um 23:08 schrieb Felix Kuehling:
[SNIP]

+		result += bytes;
+		buf = (uint8_t *)buf + bytes;
+		pos += bytes;
+		len -= bytes;
+		if (pos >= (nodes->start + nodes->size) << PAGE_SHIFT) {
+			++nodes;
+			pos = (nodes->start << PAGE_SHIFT);

... Here I handle crossing a node boundary. Yes, I actually added this
case to my kfdtest unit test and made sure it works, along with all odd
alignments that the code above handles.


Ah, I see. Sorry totally missed that chunk. In this case the patch is 
Acked-by: Christian König 


Regards,
Christian.



Regards,
   Felix


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH libdrm] libdrm_amdgpu: add kernel semaphore support

2017-07-17 Thread Christian König

Am 17.07.2017 um 05:36 schrieb Dave Airlie:

I can take a look at it, I just won't have time until next week most likely.

I've taken a look, and it's seemingly more complicated than I expected,
and more than I'd want to land in Mesa before 17.2 ships; I'd really
prefer to just push the new libdrm_amdgpu API from this patch. If I
have to port all the current radv code to the new API, I'll most
definitely get something wrong.

Adding the new API so far looks like
https://cgit.freedesktop.org/~airlied/drm/log/?h=drm-amdgpu-cs-submit-raw

https://cgit.freedesktop.org/~airlied/drm/commit/?h=drm-amdgpu-cs-submit-raw&id=e7f85d0ca617fa41e72624780c9035df132e23c4
being the API, and whether it should take a uint32_t context id or
context handle left as an open question in the last patch in the
series.


I would stick with the context handle; as far as I can see there isn't 
any value in using the uint32_t for this.


We just want to be able to send arbitrary chunks down into the kernel 
without libdrm_amdgpu involvement and/or the associated overhead of the 
extra loop and the semaphore handling.


So your "amdgpu/cs: add new raw cs submission interface just taking 
chunks" patch looks fine to me as far as I can tell.


As far as I can see the "amdgpu: refactor semaphore handling" patch is 
actually incorrect. We must hold the mutex while sending the CS down to 
the kernel, otherwise "context->last_seq" won't be accurate.



However to hook this into radv or radeonsi will take a bit of
rewriting of a lot of code that is probably a bit more fragile than
I'd like for this sort of surgery at this point.


Again, I can move over the existing Mesa stuff if you like.


I'd actually suspect if we do want to proceed with this type of
interface, we might be better doing it all in common mesa code, and
maybe bypassing libdrm_amdgpu altogether, which I suppose the API I've
written here is mostly already doing.


I want to stick with the other interfaces for now. No need to make it 
more complicated than it already is.


The CS stuff is the most performance-critical thing we have 
right now.


Christian.
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 2/4] drm/amdkfd: Remove unused references to shared_resources.num_mec

2017-07-17 Thread Oded Gabbay
On Fri, Jul 14, 2017 at 4:21 AM, Jay Cornwall  wrote:
> Dead code.
>
> Change-Id: Ic0bb1bcca87e96bc5e8fa9894727b0de152e8818
> Signed-off-by: Jay Cornwall 
> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_device.c   | 4 
>  drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 7 ---
>  2 files changed, 11 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> index 1cf00d4..95f9396 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
> @@ -494,10 +494,6 @@ bool kgd2kfd_device_init(struct kfd_dev *kfd,
> } else
> kfd->max_proc_per_quantum = hws_max_conc_proc;
>
> -   /* We only use the first MEC */
> -   if (kfd->shared_resources.num_mec > 1)
> -   kfd->shared_resources.num_mec = 1;
> -
> /* calculate max size of mqds needed for queues */
> size = max_num_of_queues_per_device *
> kfd->device_info->mqd_size_aligned;
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> index 7607989..306144f 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> @@ -82,13 +82,6 @@ static bool is_pipe_enabled(struct device_queue_manager 
> *dqm, int mec, int pipe)
> return false;
>  }
>
> -unsigned int get_mec_num(struct device_queue_manager *dqm)
> -{
> -   BUG_ON(!dqm || !dqm->dev);
> -
> -   return dqm->dev->shared_resources.num_mec;
> -}
> -

FYI, I also removed the declaration of get_mec_num() from the header file.
Oded

>  unsigned int get_queues_num(struct device_queue_manager *dqm)
>  {
> BUG_ON(!dqm || !dqm->dev);
> --
> 2.7.4
>
> ___
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH v3 1/4] drm/amdgpu: Fix KFD oversubscription by tracking queues correctly

2017-07-17 Thread Oded Gabbay
On Fri, Jul 14, 2017 at 7:24 PM, Alex Deucher  wrote:
> On Thu, Jul 13, 2017 at 9:21 PM, Jay Cornwall  wrote:
>> The number of compute queues available to the KFD was erroneously
>> calculated as 64. Only the first MEC can execute compute queues and
>> it has 32 queue slots.
>>
>> This caused the oversubscription limit to be calculated incorrectly,
>> leading to a missing chained runlist command at the end of an
>> oversubscribed runlist.
>>
>> v2: Remove unused num_mec field to avoid duplicate logic
>> v3: Separate num_mec removal into separate patches
>>
>> Change-Id: I9e7bba2cc1928b624e3eeb1edb06fdb602e5294f
>> Signed-off-by: Jay Cornwall 
>
> Series is:
> Reviewed-by: Alex Deucher 
>
Hi Jay,
Thanks for the patches. I applied them to amdkfd-fixes (after rebasing
them over 4.13-rc1).

Oded
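
For context, a sketch of the queue-count arithmetic the patch corrects;
the pipe and queue counts below are assumed gfx9-style values, not taken
from the patch itself:

unsigned num_pipe_per_mec = 4;           /* assumed */
unsigned num_queue_per_pipe = 8;         /* assumed */
/* Before: num_mec (2) * 4 * 8 = 64 valid queue bits, too many.    */
/* After:  first MEC only, 1 * 4 * 8 = 32 queue slots for the KFD. */
unsigned last_valid_bit = 1 * num_pipe_per_mec * num_queue_per_pipe;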

>> ---
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
>> index 7060daf..aa4006a 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
>> @@ -140,7 +140,7 @@ void amdgpu_amdkfd_device_init(struct amdgpu_device 
>> *adev)
>> /* According to linux/bitmap.h we shouldn't use bitmap_clear 
>> if
>>  * nbits is not compile time constant
>>  */
>> -   last_valid_bit = adev->gfx.mec.num_mec
>> +   last_valid_bit = 1 /* only first MEC can have compute queues 
>> */
>> * adev->gfx.mec.num_pipe_per_mec
>> * adev->gfx.mec.num_queue_per_pipe;
>> for (i = last_valid_bit; i < KGD_MAX_QUEUES; ++i)
>> --
>> 2.7.4
>>
>> ___
>> amd-gfx mailing list
>> amd-gfx@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
> ___
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu: read reg in each iterate of psp_wait_for loop

2017-07-17 Thread Huang Rui
On Mon, Jul 17, 2017 at 06:57:41PM +0800, Greg KH wrote:
> On Mon, Jul 17, 2017 at 04:56:26PM +0800, Zhang, Jerry (Junwei) wrote:
> > + sta...@vger.kernel.org
> 
> 
> 
> This is not the correct way to submit patches for inclusion in the
> stable kernel tree.  Please read:
> https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html
> for how to do this properly.
> 
> 

Thanks, Greg. :-)

>> BTW: please add Cc:  in your patch, it needs to be
>> backported to the stable tree.

Jerry, I might not have described it clearly. We need to follow the rule that
Greg provided. Actually, I meant to add the Cc in your commit message like
below, then send it out:

8<--

Subject: [PATCH] drm/amdgpu: read reg in each iterate of psp_wait_for loop

This fixes the SOS loading failure of psp v3.1.

Signed-off-by: Junwei Zhang 
Cc: sta...@vger.kernel.org

8<--

And you'd better squash the two patches into one (actually it's only one
fix) to make backporting smoother.

Thanks,
Rui
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 2/2] drm/amdgpu: enable sos status checking for vega10

2017-07-17 Thread Junwei Zhang
Signed-off-by: Junwei Zhang 
---
 drivers/gpu/drm/amd/amdgpu/psp_v3_1.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/psp_v3_1.c 
b/drivers/gpu/drm/amd/amdgpu/psp_v3_1.c
index 2718e86..23106e3 100644
--- a/drivers/gpu/drm/amd/amdgpu/psp_v3_1.c
+++ b/drivers/gpu/drm/amd/amdgpu/psp_v3_1.c
@@ -237,11 +237,9 @@ int psp_v3_1_bootloader_load_sos(struct psp_context *psp)
 
/* there might be handshake issue with hardware which needs delay */
mdelay(20);
-#if 0
ret = psp_wait_for(psp, SOC15_REG_OFFSET(MP0, 0, mmMP0_SMN_C2PMSG_81),
   RREG32_SOC15(MP0, 0, mmMP0_SMN_C2PMSG_81),
   0, true);
-#endif
 
return ret;
 }
-- 
1.9.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 1/2] drm/amdgpu: read reg in each iterate of psp_wait_for loop

2017-07-17 Thread Junwei Zhang
From: "Zhang, Jerry" 

Signed-off-by: Junwei Zhang 
Acked-by: Alex Deucher 
Acked-by: Huang Rui 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
index c919579..644941d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
@@ -98,9 +98,8 @@ int psp_wait_for(struct psp_context *psp, uint32_t reg_index,
int i;
struct amdgpu_device *adev = psp->adev;
 
-   val = RREG32(reg_index);
-
for (i = 0; i < adev->usec_timeout; i++) {
+   val = RREG32(reg_index);
if (check_changed) {
if (val != reg_val)
return 0;
-- 
1.9.1
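
A minimal sketch of the polling pattern the fix restores, with the
register re-read on every pass instead of once before the loop; the mask
parameter is inferred from the call sites quoted elsewhere in this
thread, and the udelay(1) step and -ETIME return are assumptions for
illustration:

for (i = 0; i < adev->usec_timeout; i++) {
	val = RREG32(reg_index);        /* fresh read each iteration */
	if (check_changed) {
		if (val != reg_val)     /* wait for the value to change */
			return 0;
	} else if ((val & mask) == reg_val) {
		return 0;               /* wait for an exact (masked) match */
	}
	udelay(1);                      /* assumed 1us poll interval */
}
return -ETIME;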

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: iMac 10,1 with Ubuntu 16.04: black screen after suspend

2017-07-17 Thread Lukas Wunner
On Fri, Jun 02, 2017 at 06:47:07PM +0200, Florian Echtler wrote:
> Regarding the SMC, there's actually only one key that consistently seems to
> have a different value depending on whether the display is on or off:
> 
> --- blank 2017-05-05 08:40:53.694565045 +0200
> +++ non_blank 2017-05-05 08:40:53.702565066 +0200
> @@ -143,7 +143,7 @@
>MSWR  [ui8 ]  0 (bytes 00)
>MVBO  [hex_]  (bytes ff ff)
>MVDC  [bin_]  (bytes 00)
> -  MVDS  [bin_]  (bytes 08)
> +  MVDS  [bin_]  (bytes 0a)
>MVE1  [si8 ]  (bytes 0d)
>MVE5  [si8 ]  (bytes 0b)
>MVHR  [flag]  (bytes 01)
> 
> However, even with my modified SmcDumpKeys.c, which I can use to enable TDM, I
> cannot write to that key. Since other MV__ keys control the display, too, it
> would make sense that this one is related to the display state, but it seems
> to be a read-only key :-/
> 
> Running out of ideas again... any suggestions?

Sorry for the delay, Florian.  Commit 564d8a2cf3ab by Mario Kleiner (+cc)
landed in Linus' tree last week and is included in 4.13-rc1.  It is
supposed to fix black screen issues with the iMac10,1 that you're also
using. In Mario's case they seem to occur upon boot rather than on
suspend, but it still might be worth a try:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=564d8a2cf3abf16575af48bdc3e86e92ee8a617d

Thanks,

Lukas
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu: read reg in each iterate of psp_wait_for loop

2017-07-17 Thread Zhang, Jerry (Junwei)

+ sta...@vger.kernel.org

On 07/17/2017 03:57 PM, Huang Rui wrote:

On Mon, Jul 17, 2017 at 03:52:10PM +0800, Huang Rui wrote:

On Fri, Jul 14, 2017 at 06:20:17PM +0800, Junwei Zhang wrote:

Signed-off-by: Junwei Zhang 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 3 +--
  1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
index ba743d4..71ce3ee 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
@@ -95,9 +95,8 @@ int psp_wait_for(struct psp_context *psp, uint32_t reg_index,
int i;
struct amdgpu_device *adev = psp->adev;

-   val = RREG32(reg_index);
-
for (i = 0; i < adev->usec_timeout; i++) {
+   val = RREG32(reg_index);
if (check_changed) {
if (val != reg_val)
return 0;


Nice catch. I remember Ken also mentioned it before. This should fix the
issue I encountered during bring-up. Can you re-enable this handshake in
psp_v3_1_bootloader_load_sos and double-check that it works
with this fix? If yes, please add it back.


Yes, it fixes this.
I will enable it later.

Jerry



#if 0
 ret = psp_wait_for(psp, SOC15_REG_OFFSET(MP0, 0, mmMP0_SMN_C2PMSG_81),
RREG32_SOC15(MP0, 0, mmMP0_SMN_C2PMSG_81),
0, true);
#endif

Acked-by: Huang Rui 


BTW: please add Cc:  in your patch, it needs to be
backported to the stable tree.

Thanks,
Rui
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu: read reg in each iterate of psp_wait_for loop

2017-07-17 Thread Huang Rui
On Mon, Jul 17, 2017 at 03:52:10PM +0800, Huang Rui wrote:
> On Fri, Jul 14, 2017 at 06:20:17PM +0800, Junwei Zhang wrote:
> > Signed-off-by: Junwei Zhang 
> > ---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 3 +--
> >  1 file changed, 1 insertion(+), 2 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c 
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> > index ba743d4..71ce3ee 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> > @@ -95,9 +95,8 @@ int psp_wait_for(struct psp_context *psp, uint32_t 
> > reg_index,
> > int i;
> > struct amdgpu_device *adev = psp->adev;
> >  
> > -   val = RREG32(reg_index);
> > -
> > for (i = 0; i < adev->usec_timeout; i++) {
> > +   val = RREG32(reg_index);
> > if (check_changed) {
> > if (val != reg_val)
> > return 0;
> 
> Nice catch. I remember Ken also mentioned it before. This should fix the
> issue I encountered during bring-up. Can you re-enable this handshake in
> psp_v3_1_bootloader_load_sos and double-check that it works
> with this fix? If yes, please add it back.
> 
> #if 0
> ret = psp_wait_for(psp, SOC15_REG_OFFSET(MP0, 0, mmMP0_SMN_C2PMSG_81),
>RREG32_SOC15(MP0, 0, mmMP0_SMN_C2PMSG_81),
>0, true);
> #endif
> 
> Acked-by: Huang Rui 

BTW: please add Cc:  in your patch, it needs to be
backported to the stable tree.

Thanks,
Rui
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu: read reg in each iterate of psp_wait_for loop

2017-07-17 Thread Huang Rui
On Fri, Jul 14, 2017 at 06:20:17PM +0800, Junwei Zhang wrote:
> Signed-off-by: Junwei Zhang 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> index ba743d4..71ce3ee 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> @@ -95,9 +95,8 @@ int psp_wait_for(struct psp_context *psp, uint32_t 
> reg_index,
>   int i;
>   struct amdgpu_device *adev = psp->adev;
>  
> - val = RREG32(reg_index);
> -
>   for (i = 0; i < adev->usec_timeout; i++) {
> + val = RREG32(reg_index);
>   if (check_changed) {
>   if (val != reg_val)
>   return 0;

Nice catch. I remember Ken also mentioned it before. This should fix the
issue I encountered during bring-up. Can you re-enable this handshake in
psp_v3_1_bootloader_load_sos and double-check that it works
with this fix? If yes, please add it back.

#if 0
ret = psp_wait_for(psp, SOC15_REG_OFFSET(MP0, 0, mmMP0_SMN_C2PMSG_81),
   RREG32_SOC15(MP0, 0, mmMP0_SMN_C2PMSG_81),
   0, true);
#endif

Acked-by: Huang Rui 
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx