RE: [PATCH 2/3] drm/amd/amdgpu: Define and implement a function that collects number of waves that are in flight.

2020-09-28 Thread Russell, Kent
[AMD Public Use]

A few minor typos, noted inline.

> -----Original Message-----
> From: amd-gfx On Behalf Of Ramesh Errabolu
> Sent: Friday, September 25, 2020 6:03 PM
> To: amd-gfx@lists.freedesktop.org
> Cc: Errabolu, Ramesh 
> Subject: [PATCH 2/3] drm/amd/amdgpu: Define and implement a function that collects number of waves that are in flight.
> 
> [Why]
> Allow user to know how many compute units (CU) are in use at any given
> moment.
> 
> [How]
> Read registers of SQ that give number of waves that are in flight
> of various queues. Use this information to determine number of CU's
> in use.
> 
> Signed-off-by: Ramesh Errabolu 
> ---
>  .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c | 176 +-
>  .../gpu/drm/amd/include/kgd_kfd_interface.h   |  12 ++
>  2 files changed, 187 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
> index e6aede725197..87d4c8855805 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
> @@ -38,7 +38,7 @@
>  #include "soc15d.h"
>  #include "mmhub_v1_0.h"
>  #include "gfxhub_v1_0.h"
> -
> +#include "gfx_v9_0.h"
> 
>  enum hqd_dequeue_request_type {
>   NO_ACTION = 0,
> @@ -706,6 +706,179 @@ void kgd_gfx_v9_set_vm_context_page_table_base(struct kgd_dev *kgd,
>   gfxhub_v1_0_setup_vm_pt_regs(adev, vmid, page_table_base);
>  }
> 
> +static void lock_spi_csq_mutexes(struct amdgpu_device *adev)
> +{
> + mutex_lock(&adev->srbm_mutex);
> + mutex_lock(&adev->grbm_idx_mutex);
> +
> +}
> +
> +static void unlock_spi_csq_mutexes(struct amdgpu_device *adev)
> +{
> + mutex_unlock(&adev->grbm_idx_mutex);
> + mutex_unlock(&adev->srbm_mutex);
> +}
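
For anyone reading along, the two helpers above take the mutexes in one fixed order and release them in the reverse order. Below is a rough user-space sketch of that pattern using pthreads. It is not driver code: `outer_lock`, `inner_lock`, `shared_reads` and the three functions are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-ins for the two driver mutexes. */
static pthread_mutex_t outer_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t inner_lock = PTHREAD_MUTEX_INITIALIZER;
static int shared_reads;

/* Always acquire in the same order: outer first, then inner. */
static void lock_both(void)
{
	int ret;

	ret = pthread_mutex_lock(&outer_lock);
	assert(ret == 0);
	ret = pthread_mutex_lock(&inner_lock);
	assert(ret == 0);
}

/* Release in the reverse order of acquisition. */
static void unlock_both(void)
{
	pthread_mutex_unlock(&inner_lock);
	pthread_mutex_unlock(&outer_lock);
}

/* Touch the shared state only while both locks are held. */
static int read_under_locks(void)
{
	int value;

	lock_both();
	value = ++shared_reads;
	unlock_both();
	return value;
}
```

Keeping a single acquisition order across all callers is what avoids a lock-order inversion between two threads that need both mutexes.
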
> +
> +/**
> + * @get_wave_count: Read device registers to get number of waves in flight for
> + * a particulare queue. The method also returns the VMID associated with the

particular

> + * queue.
> + *
> + * @adev: Handle of device whose registers are to be read
> + * @queue_idx: Index of queue in the queue-map bit-field
> + * @wave_cnt: Output parameter updated with number of waves in flight
> + * @vmid: Output parameter updated with VMID of queue whose wave count
> + * is being collected
> + */
> +static void get_wave_count(struct amdgpu_device *adev, int queue_idx,
> + int *wave_cnt, int *vmid)
> +{
> + int pipe_idx;
> + int queue_slot;
> + unsigned int reg_val;
> +
> + /*
> +  * Program GRBM with appropriate MEID, PIPEID, QUEUEID and VMID
> +  * parameters to read out waves in flight. Get VMID if there are
> +  * non-zero waves in flight.
> +  */
> + *vmid = 0xFF;
> + *wave_cnt = 0;
> + pipe_idx = queue_idx / adev->gfx.mec.num_queue_per_pipe;
> + queue_slot = queue_idx % adev->gfx.mec.num_queue_per_pipe;
> + soc15_grbm_select(adev, 1, pipe_idx, queue_slot, 0);
> + reg_val = RREG32(SOC15_REG_OFFSET(GC, 0, mmSPI_CSQ_WF_ACTIVE_COUNT_0) +
> +  queue_slot);
> + *wave_cnt = reg_val & SPI_CSQ_WF_ACTIVE_COUNT_0__COUNT_MASK;
> + if (*wave_cnt != 0)
> + *vmid = (RREG32_SOC15(GC, 0, mmCP_HQD_VMID) &
> +  CP_HQD_VMID__VMID_MASK) >> CP_HQD_VMID__VMID__SHIFT;
> +}
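
To make the index arithmetic above easier to follow, here is a standalone sketch of how a flat queue index splits into a pipe index and a queue slot, and how a field is pulled out of a register value with a mask and a shift. The constants and names (`QUEUES_PER_PIPE`, `FIELD_MASK`, `FIELD_SHIFT`, `split_queue_idx`, `extract_field`) are made up for illustration and are not the driver's values. It assumes 8 queues per pipe.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical values for illustration only; not taken from the driver. */
#define QUEUES_PER_PIPE 8u
#define FIELD_MASK  0x000000F0u
#define FIELD_SHIFT 4u

struct queue_pos {
	unsigned int pipe_idx;   /* which pipe the queue sits on */
	unsigned int queue_slot; /* position of the queue within that pipe */
};

/* Split a flat queue index into (pipe, slot), as the patch does. */
static struct queue_pos split_queue_idx(unsigned int queue_idx)
{
	struct queue_pos pos;

	pos.pipe_idx = queue_idx / QUEUES_PER_PIPE;
	pos.queue_slot = queue_idx % QUEUES_PER_PIPE;
	assert(pos.queue_slot < QUEUES_PER_PIPE);
	return pos;
}

/* Isolate a bit field: mask first, then shift down to bit 0. */
static uint32_t extract_field(uint32_t reg_val)
{
	return (reg_val & FIELD_MASK) >> FIELD_SHIFT;
}
```

With these assumed constants, queue index 19 lands on pipe 2, slot 3.
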
> +
> +/**
> + * @kgd_gfx_v9_get_cu_occupancy: Reads relevant registers associated with each
> + * shader engine and aggregates the number of waves that are in fight for the
in flight

> + * process whose pasid is provided as a parameter. The process could have ZERO
> + * or more queues running and submitting waves to compute units.
> + *
> + * @kgd: Handle of device from which to get number of waves in flight
> + * @pasid: Identifies the process for which this query call is invoked
> + * @wave_cnt: Output parameter updated with number of waves in flight that
> + * belong to process with given pasid
> + * @max_waves_per_cu: Output parameter updated with maximum number of waves
> + * possible per Compute Unit
> + *
> + * @note: It's possible that the device has too many queues (oversubscription)
> + * in which case a VMID could be remapped to a different PASID. This could lead
> + * to in accurate wave count. Following is a high-level sequence:
to an inaccurate

> + *Time T1: vmid = getVmid(); vmid is associated with Pasid P1
> + *Time T2: passId = getPasId(vmid); vmid is associated with Pasid P2
> + * In the sequence above wave count obtained from time T1 will be incorrectly
> + * lost or added to total wave count.
> + *
> + * The registers that provide the waves in flight are:
> + *
> + *  SPI_CSQ_WF_ACTIVE_STATUS - bit-map of queues per pipe. The bit is ON if a
> + *  queue is slotted, OFF if there is no queue. A process could have ZERO or
> + *  more queues slotted and submitting waves to be run on compute units. Even
> + *  when there is a queue it is possible there could be zero wave fronts, this
> + *  can happen when queue is waiting on top-of-pipe events - e.g. 

[PATCH 2/3] drm/amd/amdgpu: Define and implement a function that collects number of waves that are in flight.

2020-09-25 Thread Ramesh Errabolu
[Why]
Allow user to know how many compute units (CU) are in use at any given
moment.

[How]
Read registers of SQ that give number of waves that are in flight
of various queues. Use this information to determine number of CU's
in use.

Signed-off-by: Ramesh Errabolu 
---
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c | 176 +-
 .../gpu/drm/amd/include/kgd_kfd_interface.h   |  12 ++
 2 files changed, 187 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
index e6aede725197..87d4c8855805 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
@@ -38,7 +38,7 @@
 #include "soc15d.h"
 #include "mmhub_v1_0.h"
 #include "gfxhub_v1_0.h"
-
+#include "gfx_v9_0.h"
 
 enum hqd_dequeue_request_type {
NO_ACTION = 0,
@@ -706,6 +706,179 @@ void kgd_gfx_v9_set_vm_context_page_table_base(struct kgd_dev *kgd,
gfxhub_v1_0_setup_vm_pt_regs(adev, vmid, page_table_base);
 }
 
+static void lock_spi_csq_mutexes(struct amdgpu_device *adev)
+{
+   mutex_lock(&adev->srbm_mutex);
+   mutex_lock(&adev->grbm_idx_mutex);
+
+}
+
+static void unlock_spi_csq_mutexes(struct amdgpu_device *adev)
+{
+   mutex_unlock(&adev->grbm_idx_mutex);
+   mutex_unlock(&adev->srbm_mutex);
+}
+
+/**
+ * @get_wave_count: Read device registers to get number of waves in flight for
+ * a particulare queue. The method also returns the VMID associated with the
+ * queue.
+ *
+ * @adev: Handle of device whose registers are to be read
+ * @queue_idx: Index of queue in the queue-map bit-field
+ * @wave_cnt: Output parameter updated with number of waves in flight
+ * @vmid: Output parameter updated with VMID of queue whose wave count
+ * is being collected
+ */
+static void get_wave_count(struct amdgpu_device *adev, int queue_idx,
+   int *wave_cnt, int *vmid)
+{
+   int pipe_idx;
+   int queue_slot;
+   unsigned int reg_val;
+
+   /*
+* Program GRBM with appropriate MEID, PIPEID, QUEUEID and VMID
+* parameters to read out waves in flight. Get VMID if there are
+* non-zero waves in flight.
+*/
+   *vmid = 0xFF;
+   *wave_cnt = 0;
+   pipe_idx = queue_idx / adev->gfx.mec.num_queue_per_pipe;
+   queue_slot = queue_idx % adev->gfx.mec.num_queue_per_pipe;
+   soc15_grbm_select(adev, 1, pipe_idx, queue_slot, 0);
+   reg_val = RREG32(SOC15_REG_OFFSET(GC, 0, mmSPI_CSQ_WF_ACTIVE_COUNT_0) +
+queue_slot);
+   *wave_cnt = reg_val & SPI_CSQ_WF_ACTIVE_COUNT_0__COUNT_MASK;
+   if (*wave_cnt != 0)
+   *vmid = (RREG32_SOC15(GC, 0, mmCP_HQD_VMID) &
+CP_HQD_VMID__VMID_MASK) >> CP_HQD_VMID__VMID__SHIFT;
+}
+
+/**
+ * @kgd_gfx_v9_get_cu_occupancy: Reads relevant registers associated with each
+ * shader engine and aggregates the number of waves that are in fight for the
+ * process whose pasid is provided as a parameter. The process could have ZERO
+ * or more queues running and submitting waves to compute units.
+ *
+ * @kgd: Handle of device from which to get number of waves in flight
+ * @pasid: Identifies the process for which this query call is invoked
+ * @wave_cnt: Output parameter updated with number of waves in flight that
+ * belong to process with given pasid
+ * @max_waves_per_cu: Output parameter updated with maximum number of waves
+ * possible per Compute Unit
+ *
+ * @note: It's possible that the device has too many queues (oversubscription)
+ * in which case a VMID could be remapped to a different PASID. This could lead
+ * to in accurate wave count. Following is a high-level sequence:
+ *Time T1: vmid = getVmid(); vmid is associated with Pasid P1
+ *Time T2: passId = getPasId(vmid); vmid is associated with Pasid P2
+ * In the sequence above wave count obtained from time T1 will be incorrectly
+ * lost or added to total wave count.
+ *
+ * The registers that provide the waves in flight are:
+ *
+ *  SPI_CSQ_WF_ACTIVE_STATUS - bit-map of queues per pipe. The bit is ON if a
+ *  queue is slotted, OFF if there is no queue. A process could have ZERO or
+ *  more queues slotted and submitting waves to be run on compute units. Even
+ *  when there is a queue it is possible there could be zero wave fronts, this
+ *  can happen when queue is waiting on top-of-pipe events - e.g. waitRegMem
+ *  command
+ *
+ *  For each bit that is ON from above:
+ *
+ *Read (SPI_CSQ_WF_ACTIVE_COUNT_0 + queue_idx) register. It provides the
+ *number of waves that are in flight for the queue at specified index. The
+ *index ranges from 0 to 7.
+ *
+ *If non-zero waves are in fligth, read CP_HQD_VMID register to obtain VMID
+ *of the wave(s).
+ *
+ *Determine if VMID from above step maps to pasid provided as parameter. If
+ *it matches agrregate the wave count. That the VMID will not match pasid is
+ *a normal condition i.e. a 
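
The per-queue sequence described in the comment above (check the status bit, read the wave count, read the VMID, compare the mapped PASID, aggregate) can be sketched in plain C against simulated register state. This is only an illustration under stated assumptions: `struct fake_hw`, its fields, `MAX_QUEUES`, `MAX_VMIDS` and `count_waves_for_pasid` are hypothetical and do not exist in the driver, and real hardware access would go through the register macros used in the patch.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_QUEUES 8u  /* hypothetical: one pipe with 8 queue slots */
#define MAX_VMIDS  16u /* hypothetical size of the VMID table */

/* Simulated hardware state; every name here is hypothetical. */
struct fake_hw {
	uint32_t active_status;            /* bit i set: queue i is slotted */
	uint32_t wave_count[MAX_QUEUES];   /* waves in flight per queue */
	uint32_t queue_vmid[MAX_QUEUES];   /* VMID that owns each queue */
	uint32_t vmid_to_pasid[MAX_VMIDS]; /* current VMID to PASID mapping */
};

/* Sum the waves in flight over all slotted queues owned by @pasid. */
static uint32_t count_waves_for_pasid(const struct fake_hw *hw, uint32_t pasid)
{
	uint32_t total = 0;
	unsigned int q;

	for (q = 0; q < MAX_QUEUES; q++) {
		uint32_t vmid;

		/* Skip slots that have no queue mapped. */
		if (!(hw->active_status & (1u << q)))
			continue;

		/* A slotted queue may still have zero waves in flight. */
		if (hw->wave_count[q] == 0)
			continue;

		/* A VMID that maps to another PASID is normal; skip it. */
		vmid = hw->queue_vmid[q];
		assert(vmid < MAX_VMIDS);
		if (hw->vmid_to_pasid[vmid] == pasid)
			total += hw->wave_count[q];
	}
	return total;
}
```

The sketch reads the VMID and the mapping from one consistent snapshot, so it does not reproduce the T1/T2 remapping race that the @note describes for real hardware.
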

[PATCH 2/3] drm/amd/amdgpu: Define and implement a function that collects number of waves that are in flight.

2020-09-17 Thread Ramesh Errabolu
[Why]
Allow user to know how many compute units (CU) are in use at any given
moment.

[How]
Read registers of SQ that give number of waves that are in flight
of various queues. Use this information to determine number of CU's
in use.

Signed-off-by: Ramesh Errabolu 
---
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c | 206 ++
 .../gpu/drm/amd/include/kgd_kfd_interface.h   |  11 +
 2 files changed, 217 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
index e6aede725197..2f8c8140734e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
@@ -38,7 +38,9 @@
 #include "soc15d.h"
 #include "mmhub_v1_0.h"
 #include "gfxhub_v1_0.h"
+#include "gfx_v9_0.h"
 
+struct kfd_dev;
 
 enum hqd_dequeue_request_type {
NO_ACTION = 0,
@@ -706,6 +708,209 @@ void kgd_gfx_v9_set_vm_context_page_table_base(struct kgd_dev *kgd,
gfxhub_v1_0_setup_vm_pt_regs(adev, vmid, page_table_base);
 }
 
+static void lock_spi_csq_mutexes(struct amdgpu_device *adev)
+{
+   mutex_lock(&adev->srbm_mutex);
+   mutex_lock(&adev->grbm_idx_mutex);
+
+}
+
+static void unlock_spi_csq_mutexes(struct amdgpu_device *adev)
+{
+   mutex_unlock(&adev->grbm_idx_mutex);
+   mutex_unlock(&adev->srbm_mutex);
+}
+
+/**
+ * @get_wave_count: Read device registers to get number of waves in flight for
+ * a particulare queue. The method also returns the VMID associated with the
+ * queue.
+ *
+ * @adev: Handle of device whose registers are to be read
+ *
+ * @queue_idx: Index of queue in the queue-map bit-field
+ *
+ * @wave_cnt: Output parameter updated with number of waves in flight
+ *
+ * @vmid: Output parameter updated with VMID of queue whose wave count
+ * is being collected
+ */
+static void get_wave_count(struct amdgpu_device *adev, int queue_idx,
+  int *wave_cnt, int *vmid)
+{
+   int pipe_idx;
+   int queue_slot;
+   unsigned int reg_val;
+
+   /*
+* By policy queues at slots 0 and 1 are reserved for non-compute
+* queues i.e. those managed for graphic functions.
+*/
+   if ((queue_idx % adev->gfx.mec.num_queue_per_pipe) < 2)
+   return;
+
+   /*
+* Queue belongs to a compute workload. Determine the PIPE index
+* associated wit queue and program GRBM accordingly:
+* MEID = 1, PIPEID = pipe_idx, QUEUEID = queue_idx, VMID = 0
+*/
+   pipe_idx = queue_idx / adev->gfx.mec.num_queue_per_pipe;
+   queue_slot = queue_idx % adev->gfx.mec.num_queue_per_pipe;
+   soc15_grbm_select(adev, 1, pipe_idx, queue_slot, 0);
+
+   /*
+* Read from register number of waves in flight. If non-zero get the
+* VMID associated with queue
+*/
+   reg_val = RREG32(SOC15_REG_OFFSET(GC, 0, mmSPI_CSQ_WF_ACTIVE_COUNT_0) +
+queue_slot);
+   *wave_cnt = reg_val & SPI_CSQ_WF_ACTIVE_COUNT_0__COUNT_MASK;
+   if (*wave_cnt != 0)
+   *vmid = (RREG32_SOC15(GC, 0, mmCP_HQD_VMID) &
+CP_HQD_VMID__VMID_MASK) >> CP_HQD_VMID__VMID__SHIFT;
+}
+
+/**
+ * @kgd_gfx_v9_get_cu_occupancy: Reads relevant registers associated with each
+ * shader engine and aggregates the number of waves that are in fight for the
+ * process whose pasid is provided as a parameter. The process could have ZERO
+ * or more queues running and submitting waves to compute units.
+ *
+ * @note: It's possible that the device has too many queues (oversubscription)
+ * in which case a VMID could be remapped to a different PASID. This could lead
+ * to in accurate wave count. Following is a high-level sequence:
+ *Time T1: vmid = getVmid(); vmid is associated with Pasid P1
+ *Time T2: passId = getPasId(vmid); vmid is associated with Pasid P2
+ * In the sequence above wave count obtained from time T1 will be incorrectly
+ * lost or added to total wave count.
+ *
+ * @kgd: Handle of device from which to get number of waves in flight
+ *
+ * @pasid: Identifies the process for which this query call is invoked
+ *
+ * @wave_cnt: Output parameter updated with number of waves in flight that
+ * belong to process with given pasid
+ *
+ * The registers that provide the waves in flight are:
+ *
+ *  SPI_CSQ_WF_ACTIVE_STATUS - bit-map of queues per pipe. At any moment there
+ *  can be a max of 32 queues that could submit wave fronts to be run by compute
+ *  units. The bit is ON if a queue is slotted, OFF if there is no queue. The
+ *  process could have ZERO or more queues slotted and submitting waves to be
+ *  run compute units. Even when there is a queue it is possible there could
+ *  be zero wave fronts, this can happen when queue is waiting on top-of-pipe
+ *  events - e.g. waitRegMem command
+ *
+ *  For each bit that is ON from above:
+ *
+ *Read (SPI_CSQ_WF_ACTIVE_COUNT_0 + queue_idx) register. It provides the
+ *number of waves