On 2020-03-26 3:11, Pan, Xinhui wrote:

On 2020-03-26 14:51, Koenig, Christian <christian.koe...@amd.com> wrote:



On 2020-03-26 07:45, "Pan, Xinhui" <xinhui....@amd.com> wrote:


On 2020-03-26 14:36, Koenig, Christian <christian.koe...@amd.com> wrote:



On 2020-03-26 07:15, "Pan, Xinhui" <xinhui....@amd.com> wrote:


On 2020-03-26 13:38, Koenig, Christian <christian.koe...@amd.com> wrote:

Yeah, that has been on my TODO list for quite a while as well.

But we even need three IB pools. One very small for the IB tests, one for 
direct VM updates and one for the rest.

So please make the pool a parameter to ib_get() and not the hack you have below.
Yep, I will make the IB pool a parameter.
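
Roughly along these lines, as a sketch (the enum name and values below are only placeholders, not a final interface; the idea is just that callers name the pool explicitly instead of overloading the vm pointer):

/* Sketch only -- names and values are placeholders. */
enum amdgpu_ib_pool_type {
        AMDGPU_IB_POOL_NORMAL,  /* regular command submissions ("the rest") */
        AMDGPU_IB_POOL_VM,      /* direct VM updates */
        AMDGPU_IB_POOL_IB_TEST, /* very small pool for the IB tests */
};

int amdgpu_ib_get(struct amdgpu_device *adev, struct amdgpu_vm *vm,
                  unsigned size, enum amdgpu_ib_pool_type pool,
                  struct amdgpu_ib *ib);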

IB tests for gfx need many IBs, so PAGE_SIZE is still not enough for that pool.
But the default IB pool size is 2MB now, just one hugepage, and today we have
memory in terabytes, so there is no need to use a different size for the IB
test pool.

2MB is probably a bit much and we don't have huge page optimisation for kernel 
allocations at the moment anyway. Keep in mind that we have only limited space 
in the GART.
The GART is just 512MB.
Do you mean every entry in the GART table points to just one 4KB page, so we
need 512 GART table entries for one 2MB hugepage?

Yes, we need 512 * 4KB entries for a 2MB page in GART. The table for the system 
VM is flat because of hardware restrictions.

oh yes, 512 entries.
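
(For reference, the arithmetic: 2 MiB / 4 KiB = 2097152 / 4096 = 512 entries, i.e. one flat GART PTE per 4KB page of the 2MB huge page.)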

IIRC we tried 256MB for the GART initially and in general we try to keep that 
as small as possible because it eats up visible VRAM space.
That is true. But our roadmap says we are a Linux ML team.
I cannot imagine that customers use servers with a small PCI BAR. Well, they do
care about cost, but I would prefer that they use newer products. :)

Anyway, I will choose the smallest workable default value for now.

For Linux ML we want a small GART size. We used to waste lots of memory for a big GART that was unnecessary and took away valuable memory from ML applications. The GART is only used for kernel mode mappings, most of which tend to be short-lived. Statically allocating a big GART is not necessary.

The IB pools are statically allocated and long-lived. So if we need to increase the GART size to meet that demand, so be it. But we should not waste memory for IBs and associated GART space without a good reason.

Regards,
  Felix



Christian.


Maybe make this 4*PAGE_SIZE for the new IB pool for now and test whether that
works or not.
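
In the patch below that would make the new pool's init look roughly like this (sketch only, keeping the other arguments exactly as they are in the patch):

        r = amdgpu_sa_bo_manager_init(adev, &adev->ring_tmp_bo_direct,
                                      4 * PAGE_SIZE,
                                      AMDGPU_GPU_PAGE_SIZE,
                                      AMDGPU_GEM_DOMAIN_GTT);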

Christian.




Thanks,
Christian.

On 2020-03-26 03:02, "Pan, Xinhui" <xinhui....@amd.com> wrote:
Add another IB pool for direct submission.
Jobs that schedule IBs without depending on the GPU scheduler should use
this pool first.

Signed-off-by: xinhui pan <xinhui....@amd.com>
---
  drivers/gpu/drm/amd/amdgpu/amdgpu.h     |  1 +
  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c  |  2 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c  | 12 ++++++++++--
  drivers/gpu/drm/amd/amdgpu/amdgpu_job.c |  8 +++++++-
  drivers/gpu/drm/amd/amdgpu/amdgpu_job.h |  3 ++-
  5 files changed, 21 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 7dd74253e7b6..c01423ffb8ed 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -849,6 +849,7 @@ struct amdgpu_device {
          struct amdgpu_ring              *rings[AMDGPU_MAX_RINGS];
          bool                            ib_pool_ready;
          struct amdgpu_sa_manager        ring_tmp_bo;
+       struct amdgpu_sa_manager        ring_tmp_bo_direct;

        /* interrupts */
          struct amdgpu_irq               irq;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index 8304d0c87899..28be4efb3d5b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -920,7 +920,7 @@ static int amdgpu_cs_ib_fill(struct amdgpu_device *adev,
                  parser->entity = entity;

                ring = to_amdgpu_ring(entity->rq->sched);
-               r =  amdgpu_ib_get(adev, vm, ring->funcs->parse_cs ?
+               r =  amdgpu_ib_get(adev, (unsigned long )vm|0x1, ring->funcs->parse_cs ?
                                     chunk_ib->ib_bytes : 0, ib);
                  if (r) {
                          DRM_ERROR("Failed to get ib !\n");
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
index bece01f1cf09..f2e08c372d57 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
@@ -66,7 +66,7 @@ int amdgpu_ib_get(struct amdgpu_device *adev, struct amdgpu_vm *vm,
          int r;

        if (size) {
-               r = amdgpu_sa_bo_new(&adev->ring_tmp_bo,
+               r = amdgpu_sa_bo_new(vm ? &adev->ring_tmp_bo : &adev->ring_tmp_bo_direct,
                                        &ib->sa_bo, size, 256);
                  if (r) {
                        dev_err(adev->dev, "failed to get a new IB (%d)\n", r);
@@ -75,7 +75,7 @@ int amdgpu_ib_get(struct amdgpu_device *adev, struct amdgpu_vm *vm,
                ib->ptr = amdgpu_sa_bo_cpu_addr(ib->sa_bo);

-               if (!vm)
+               if (!((unsigned long)vm & ~0x1))
                          ib->gpu_addr = amdgpu_sa_bo_gpu_addr(ib->sa_bo);
          }
@@ -310,6 +310,13 @@ int amdgpu_ib_pool_init(struct amdgpu_device *adev)
                  return r;
          }
+       r = amdgpu_sa_bo_manager_init(adev, &adev->ring_tmp_bo_direct,
+                                     AMDGPU_IB_POOL_SIZE*64*1024,
+                                     AMDGPU_GPU_PAGE_SIZE,
+                                     AMDGPU_GEM_DOMAIN_GTT);
+       if (r) {
+               return r;
+       }
          adev->ib_pool_ready = true;

        return 0;
@@ -327,6 +334,7 @@ void amdgpu_ib_pool_fini(struct amdgpu_device *adev)
  {
          if (adev->ib_pool_ready) {
                  amdgpu_sa_bo_manager_fini(adev, &adev->ring_tmp_bo);
+               amdgpu_sa_bo_manager_fini(adev, &adev->ring_tmp_bo_direct);
                  adev->ib_pool_ready = false;
          }
  }
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
index 4981e443a884..6a63826c6760 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -88,6 +88,12 @@ int amdgpu_job_alloc(struct amdgpu_device *adev, unsigned num_ibs,
int amdgpu_job_alloc_with_ib(struct amdgpu_device *adev, unsigned size,
                               struct amdgpu_job **job)
+{
+       return amdgpu_job_alloc_with_ib_direct(adev, size, job, 0);
+}
+
+int amdgpu_job_alloc_with_ib_direct(struct amdgpu_device *adev, unsigned size,
+                            struct amdgpu_job **job, int direct)
  {
          int r;
@@ -95,7 +101,7 @@ int amdgpu_job_alloc_with_ib(struct amdgpu_device *adev, unsigned size,
          if (r)
                  return r;

-       r = amdgpu_ib_get(adev, NULL, size, &(*job)->ibs[0]);
+       r = amdgpu_ib_get(adev, direct ? NULL : 0x1, size, &(*job)->ibs[0]);
          if (r)
                  kfree(*job);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
index 2e2110dddb76..be9dd72b9912 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.h
@@ -67,7 +67,8 @@ int amdgpu_job_alloc(struct amdgpu_device *adev, unsigned num_ibs,
                       struct amdgpu_job **job, struct amdgpu_vm *vm);
  int amdgpu_job_alloc_with_ib(struct amdgpu_device *adev, unsigned size,
                               struct amdgpu_job **job);
-
+int amdgpu_job_alloc_with_ib_direct(struct amdgpu_device *adev, unsigned size,
+                            struct amdgpu_job **job, int direct);
  void amdgpu_job_free_resources(struct amdgpu_job *job);
  void amdgpu_job_free(struct amdgpu_job *job);
  int amdgpu_job_submit(struct amdgpu_job *job, struct drm_sched_entity *entity,
--
2.17.1
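
For reference, a hypothetical caller doing direct submission with the new helper would look roughly like this (illustration only; the function name and IB size below are made up, not part of the patch):

/* Hypothetical example: direct = 1 makes amdgpu_ib_get() receive a NULL vm,
 * so the IB is suballocated from ring_tmp_bo_direct instead of ring_tmp_bo.
 */
static int example_direct_job(struct amdgpu_device *adev,
                              struct amdgpu_job **job)
{
        return amdgpu_job_alloc_with_ib_direct(adev, 256, job, 1);
}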


_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
