Re: [PATCH] drm/amdgpu: Fix wrong unref of BO

2017-04-13 Thread zhoucm1



On 2017-04-14 05:34, Alex Xie wrote:

According to the comment for amdgpu_bo_reserve, amdgpu_bo_reserve
can return -ERESTARTSYS. When this function is interrupted
by a signal, the BO should not be unreferenced. Otherwise the BO
might be released while it is still kmapped and pinned, or the BO
might be unreferenced multiple times, etc.

r = amdgpu_bo_reserve(adev->vram_scratch.robj, false);
We have specified interruptible as false, so -ERESTARTSYS isn't possible here.


Thanks,
David Zhou


Change-Id: If76071a768950a0d3ad9d5da7fcae04881807621
Signed-off-by: Alex Xie 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 53996e3..1dcc2d1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -355,8 +355,8 @@ static void amdgpu_vram_scratch_fini(struct amdgpu_device *adev)
amdgpu_bo_kunmap(adev->vram_scratch.robj);
amdgpu_bo_unpin(adev->vram_scratch.robj);
amdgpu_bo_unreserve(adev->vram_scratch.robj);
+   amdgpu_bo_unref(&adev->vram_scratch.robj);
}
-   amdgpu_bo_unref(&adev->vram_scratch.robj);
  }
  
  /**
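
For reference, a minimal sketch of the teardown flow after this patch
(paraphrased from the patched function, not a verbatim copy of the file):

/* The unref now happens only on the reserve-success path, so an
 * interrupted reserve can no longer free a BO that is still kmapped
 * and pinned. */
static void amdgpu_vram_scratch_fini(struct amdgpu_device *adev)
{
	if (adev->vram_scratch.robj == NULL)
		return;

	if (amdgpu_bo_reserve(adev->vram_scratch.robj, false) == 0) {
		amdgpu_bo_kunmap(adev->vram_scratch.robj);
		amdgpu_bo_unpin(adev->vram_scratch.robj);
		amdgpu_bo_unreserve(adev->vram_scratch.robj);
		amdgpu_bo_unref(&adev->vram_scratch.robj);
	}
}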




Re: [PATCH] drm/amdgpu: fix incorrectly printed wptr address

2017-04-13 Thread Michel Dänzer
On 14/04/17 11:50 AM, Huang Rui wrote:
> Signed-off-by: Huang Rui 
> ---
>  drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
> index da4559b..4736196 100644
> --- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
> @@ -326,8 +326,8 @@ static void sdma_v4_0_ring_set_wptr(struct amdgpu_ring *ring)
>   "mmSDMA%i_GFX_RB_WPTR == 0x%08x "
>   "mmSDMA%i_GFX_RB_WPTR_HI == 0x%08x \n",
>   me,
> - me,
>   lower_32_bits(ring->wptr << 2),
> + me,
>   upper_32_bits(ring->wptr << 2));
>   WREG32(sdma_v4_0_get_reg_offset(me, mmSDMA0_GFX_RB_WPTR), lower_32_bits(ring->wptr << 2));
>   WREG32(sdma_v4_0_get_reg_offset(me, mmSDMA0_GFX_RB_WPTR_HI), upper_32_bits(ring->wptr << 2));
> 

Reviewed-by: Michel Dänzer 


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer


[PATCH] drm/amdgpu: fix incorrectly printed wptr address

2017-04-13 Thread Huang Rui
Signed-off-by: Huang Rui 
---
 drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
index da4559b..4736196 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
@@ -326,8 +326,8 @@ static void sdma_v4_0_ring_set_wptr(struct amdgpu_ring *ring)
"mmSDMA%i_GFX_RB_WPTR == 0x%08x "
"mmSDMA%i_GFX_RB_WPTR_HI == 0x%08x \n",
me,
-   me,
lower_32_bits(ring->wptr << 2),
+   me,
upper_32_bits(ring->wptr << 2));
WREG32(sdma_v4_0_get_reg_offset(me, mmSDMA0_GFX_RB_WPTR), lower_32_bits(ring->wptr << 2));
WREG32(sdma_v4_0_get_reg_offset(me, mmSDMA0_GFX_RB_WPTR_HI), upper_32_bits(ring->wptr << 2));
-- 
2.7.4



Re: AMDGPU without display output

2017-04-13 Thread Yu, Qiang

PS. If you want to use X11/OGL on this card without display output, just pass
the option "virtual_display=all" to the amdgpu kernel module. This fakes a
display output so that the X server can start with the amdgpu DDX successfully.
You can then use a remote desktop app such as VNC to view the X screen.

Regards,
Qiang

From: amd-gfx on behalf of Dennis Schridde
Sent: Thursday, April 13, 2017 11:20:22 PM
To: amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander
Subject: Re: AMDGPU without display output

Thanks, Alex!


[PATCH split 3/3] Add support for high priority scheduling in amdgpu v9

2017-04-13 Thread Andres Rodriguez
Third part of the split of the series:
Add support for high priority scheduling in amdgpu v8

This is the part of the series that is in murkier water than the rest.

I'm sending out this patch series mostly for completeness, and maybe for
discussion purposes as well. There are still two open issues with this series:

  1) Is the spinlock patch still okay? Should we pursue this differently?

I'd rather not use a mutex here. That would mean that to program srbm registers
from an interrupt we'd need to dispatch a worker thread. That could add extra
time during which the CU reservation is in place, which can impact performance.

So my preferred (biased) alternative is to still move to a spinlock.
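
To make the discussion concrete, here is a rough sketch of an SRBM critical
section that would be safe from interrupt context. This is illustrative only;
the series below uses a plain spin_lock(), and the irqsave variant shown here
is needed only if the lock really is taken from an IRQ handler:

/* Rough sketch, not from the series. */
static void srbm_banked_write(struct amdgpu_device *adev, u32 select,
			      u32 reg, u32 val)
{
	unsigned long flags;

	spin_lock_irqsave(&adev->srbm_lock, flags);
	WREG32(mmSRBM_GFX_CNTL, select); /* select me/pipe/queue/vmid bank */
	WREG32(reg, val);                /* program the banked register */
	WREG32(mmSRBM_GFX_CNTL, 0);      /* restore the default bank */
	spin_unlock_irqrestore(&adev->srbm_lock, flags);
}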

Another alternative I'm not sure of: Can we take advantage of the KIQ FIFO
semantics to perform srbm writes atomically?

Something like:
    ib_append(ib, PKT_WRITE_REG(SRBM_SELECT(...)))
    ib_append(ib, PKT_WRITE_REG(SOME_REG, VAL))
    ib_append(ib, PKT_WRITE_REG(SRBM_SELECT(0, 0, 0)))
    ib_submit(kiq_ring, ib)

Something that makes this immediately feel wrong though is the possibility of a
race condition between an srbm operation on the KIQ and one through MMIO.

  2) Alex suggested changing some MMIO writes to happen on the KIQ instead. I
still haven't addressed that.

I'm not sure of the full criteria for patches landing on -wip. But if these are
good enough to fix with some follow-up work, I wouldn't be opposed to that idea.



[PATCH 4/4] drm/amdgpu: implement ring set_priority for gfx_v8 compute v5

2017-04-13 Thread Andres Rodriguez
Programming CP_HQD_QUEUE_PRIORITY enables a queue to take priority over
other queues on the same pipe. Multiple queues on a pipe are timesliced
so this gives us full precedence over other queues.

Programming CP_HQD_PIPE_PRIORITY changes the SPI_ARB_PRIORITY of the
wave as follows:
0x2: CS_H
0x1: CS_M
0x0: CS_L

The SPI block will then dispatch work according to the policy set by
SPI_ARB_PRIORITY. In the current policy CS_H is higher priority than
gfx.

In order to prevent getting stuck in loops of CUs bouncing between GFX
and high priority compute and introducing further latency, we reserve
CUs 2+ for high priority compute on-demand.
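
For illustration, the core of the sequence boils down to something like the
following sketch. Register values and locking are simplified here; the real
implementation in gfx_v8_0.c also handles ref-counting and the CU reservation:

/* Simplified sketch of per-queue priority programming on gfx_v8. */
static void set_compute_queue_priority(struct amdgpu_ring *ring, bool high)
{
	struct amdgpu_device *adev = ring->adev;

	spin_lock(&adev->srbm_lock);
	vi_srbm_select(adev, ring->me, ring->pipe, ring->queue, 0);

	WREG32(mmCP_HQD_PIPE_PRIORITY, high ? 0x2 : 0x0);  /* CS_H vs CS_L */
	WREG32(mmCP_HQD_QUEUE_PRIORITY, high ? 0xf : 0x0); /* illustrative value */

	vi_srbm_select(adev, 0, 0, 0, 0);
	spin_unlock(&adev->srbm_lock);
}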

v2: fix srbm_select to ring->queue and use ring->funcs->type
v3: use AMD_SCHED_PRIORITY_* instead of AMDGPU_CTX_PRIORITY_*
v4: switch int to enum amd_sched_priority
v5: corresponding changes for srbm_lock

Acked-by: Christian König 
Signed-off-by: Andres Rodriguez 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h|  3 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  1 +
 drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c  | 95 ++
 3 files changed, 99 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 68350ca..4e81a8e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1057,40 +1057,43 @@ struct amdgpu_gfx {
const struct firmware   *pfp_fw; /* PFP firmware */
uint32_tpfp_fw_version;
const struct firmware   *ce_fw; /* CE firmware */
uint32_tce_fw_version;
const struct firmware   *rlc_fw; /* RLC firmware */
uint32_trlc_fw_version;
const struct firmware   *mec_fw; /* MEC firmware */
uint32_tmec_fw_version;
const struct firmware   *mec2_fw; /* MEC2 firmware */
uint32_tmec2_fw_version;
uint32_tme_feature_version;
uint32_tce_feature_version;
uint32_tpfp_feature_version;
uint32_trlc_feature_version;
uint32_tmec_feature_version;
uint32_tmec2_feature_version;
struct amdgpu_ring  gfx_ring[AMDGPU_MAX_GFX_RINGS];
unsignednum_gfx_rings;
struct amdgpu_ring  compute_ring[AMDGPU_MAX_COMPUTE_RINGS];
unsignednum_compute_rings;
+   spinlock_t  cu_reserve_lock;
+   uint32_tcu_reserve_pipe_mask;
+   uint32_t cu_reserve_queue_mask[AMDGPU_MAX_COMPUTE_RINGS];
struct amdgpu_irq_src   eop_irq;
struct amdgpu_irq_src   priv_reg_irq;
struct amdgpu_irq_src   priv_inst_irq;
/* gfx status */
uint32_tgfx_current_status;
/* ce ram size*/
unsignedce_ram_size;
struct amdgpu_cu_info   cu_info;
const struct amdgpu_gfx_funcs   *funcs;
 
/* reset mask */
uint32_tgrbm_soft_reset;
uint32_tsrbm_soft_reset;
boolin_reset;
/* s3/s4 mask */
boolin_suspend;
/* NGG */
struct amdgpu_ngg   ngg;
 };
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 674256a..971303d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -1858,40 +1858,41 @@ int amdgpu_device_init(struct amdgpu_device *adev,
mutex_init(&adev->firmware.mutex);
mutex_init(&adev->pm.mutex);
mutex_init(&adev->gfx.gpu_clock_mutex);
spin_lock_init(&adev->srbm_lock);
mutex_init(&adev->grbm_idx_mutex);
mutex_init(&adev->mn_lock);
hash_init(adev->mn_hash);
 
amdgpu_check_arguments(adev);
 
/* Registers mapping */
/* TODO: block userspace mapping of io register */
spin_lock_init(&adev->mmio_idx_lock);
spin_lock_init(&adev->smc_idx_lock);
spin_lock_init(&adev->pcie_idx_lock);
spin_lock_init(&adev->uvd_ctx_idx_lock);
spin_lock_init(&adev->didt_idx_lock);
spin_lock_init(&adev->gc_cac_idx_lock);
spin_lock_init(&adev->audio_endpt_idx_lock);
spin_lock_init(&adev->mm_stats.lock);
+   spin_lock_init(&adev->gfx.cu_reserve_lock);
 
INIT_LIST_HEAD(&adev->shadow_list);
mutex_init(&adev->shadow_list_lock);
 
INIT_LIST_HEAD(&adev->gtt_list);
spin_lock_init(&adev->gtt_list_lock);
 
INIT_LIST_HEAD(&adev->ring_lru_list);
  

[PATCH 1/4] drm/amdgpu: add parameter to allocate high priority contexts v7

2017-04-13 Thread Andres Rodriguez
Add a new context creation parameter to express a global context priority.

Contexts allocated with AMDGPU_CTX_PRIORITY_HIGH will receive higher
priority to schedule their work than AMDGPU_CTX_PRIORITY_NORMAL
(default) contexts.
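
As a usage illustration, allocating a high priority context from userspace
could look roughly like this. It assumes the repurposed __pad field is exposed
as "priority" in struct drm_amdgpu_ctx_in; error handling is trimmed:

/* Hypothetical libdrm snippet; needs CAP_SYS_ADMIN per the UAPI note. */
#include <stdint.h>
#include <xf86drm.h>
#include <amdgpu_drm.h>

static int alloc_high_priority_ctx(int fd, uint32_t *ctx_id)
{
	union drm_amdgpu_ctx args = {0};

	args.in.op = AMDGPU_CTX_OP_ALLOC_CTX;
	args.in.priority = AMDGPU_CTX_PRIORITY_HIGH; /* assumed field name */

	if (drmCommandWriteRead(fd, DRM_AMDGPU_CTX, &args, sizeof(args)))
		return -1;

	*ctx_id = args.out.alloc.ctx_id;
	return 0;
}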

v2: Instead of using flags, repurpose __pad
v3: Swap enum values of _NORMAL _HIGH for backwards compatibility
v4: Validate usermode priority and store it
v5: Move priority validation into amdgpu_ctx_ioctl(), headline reword
v6: add UAPI note regarding priorities requiring CAP_SYS_ADMIN
v7: remove ctx->priority

Reviewed-by: Emil Velikov 
Reviewed-by: Christian König 
Signed-off-by: Andres Rodriguez 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c   | 36 ---
 drivers/gpu/drm/amd/scheduler/gpu_scheduler.h |  1 +
 include/uapi/drm/amdgpu_drm.h |  8 +-
 3 files changed, 40 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
index 1969f27..df6fc9d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
@@ -8,67 +8,75 @@
  * and/or sell copies of the Software, and to permit persons to whom the
  * Software is furnished to do so, subject to the following conditions:
  *
  * The above copyright notice and this permission notice shall be included in
  * all copies or substantial portions of the Software.
  *
  * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
  * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
  * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
  * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
  * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
  * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
  * OTHER DEALINGS IN THE SOFTWARE.
  *
  * Authors: monk liu 
  */
 
 #include 
 #include "amdgpu.h"
 
-static int amdgpu_ctx_init(struct amdgpu_device *adev, struct amdgpu_ctx *ctx)
+static int amdgpu_ctx_init(struct amdgpu_device *adev,
+  enum amd_sched_priority priority,
+  struct amdgpu_ctx *ctx)
 {
unsigned i, j;
int r;
 
+   if (priority < 0 || priority >= AMD_SCHED_PRIORITY_MAX)
+   return -EINVAL;
+
+   if (priority == AMD_SCHED_PRIORITY_HIGH && !capable(CAP_SYS_ADMIN))
+   return -EACCES;
+
memset(ctx, 0, sizeof(*ctx));
ctx->adev = adev;
kref_init(&ctx->refcount);
spin_lock_init(&ctx->ring_lock);
ctx->fences = kcalloc(amdgpu_sched_jobs * AMDGPU_MAX_RINGS,
  sizeof(struct dma_fence*), GFP_KERNEL);
if (!ctx->fences)
return -ENOMEM;
 
for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
ctx->rings[i].sequence = 1;
ctx->rings[i].fences = &ctx->fences[amdgpu_sched_jobs * i];
}
 
ctx->reset_counter = atomic_read(&adev->gpu_reset_counter);
 
/* create context entity for each ring */
for (i = 0; i < adev->num_rings; i++) {
struct amdgpu_ring *ring = adev->rings[i];
struct amd_sched_rq *rq;
 
-   rq = &ring->sched.sched_rq[AMD_SCHED_PRIORITY_NORMAL];
+   rq = &ring->sched.sched_rq[priority];
r = amd_sched_entity_init(&ring->sched, &ctx->rings[i].entity,
  rq, amdgpu_sched_jobs);
if (r)
goto failed;
}
 
r = amdgpu_queue_mgr_init(adev, &ctx->queue_mgr);
if (r)
goto failed;
 
return 0;
 
 failed:
for (j = 0; j < i; j++)
amd_sched_entity_fini(&adev->rings[j]->sched,
  &ctx->rings[j].entity);
kfree(ctx->fences);
ctx->fences = NULL;
return r;
 }
@@ -79,59 +87,61 @@ static void amdgpu_ctx_fini(struct amdgpu_ctx *ctx)
unsigned i, j;
 
if (!adev)
return;
 
for (i = 0; i < AMDGPU_MAX_RINGS; ++i)
for (j = 0; j < amdgpu_sched_jobs; ++j)
dma_fence_put(ctx->rings[i].fences[j]);
kfree(ctx->fences);
ctx->fences = NULL;
 
for (i = 0; i < adev->num_rings; i++)
amd_sched_entity_fini(&adev->rings[i]->sched,
  &ctx->rings[i].entity);
 
amdgpu_queue_mgr_fini(adev, &ctx->queue_mgr);
 }
 
 static int amdgpu_ctx_alloc(struct amdgpu_device *adev,
struct amdgpu_fpriv *fpriv,
+   enum amd_sched_priority priority,
uint32_t *id)
 {
struct amdgpu_ctx_mgr *mgr = &fpriv->ctx_mgr;
struct amdgpu_ctx *ctx;
int r;
 
ctx = kmalloc(sizeof(*ctx), GFP_KERNEL);
if (!ctx)
return -ENOMEM;
 
mut

[PATCH 3/4] drm/amdgpu: convert srbm lock to a spinlock v2

2017-04-13 Thread Andres Rodriguez
Replace adev->srbm_mutex with a spinlock adev->srbm_lock

v2: rebased on 4.12 and included gfx9
Signed-off-by: Andres Rodriguez 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h   |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c |  4 +--
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c |  4 +--
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c|  2 +-
 drivers/gpu/drm/amd/amdgpu/cik_sdma.c |  4 +--
 drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c | 20 ++---
 drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c | 34 +++
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 24 
 drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c|  4 +--
 drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c|  4 +--
 10 files changed, 51 insertions(+), 51 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index a9b7a61..68350ca 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1465,41 +1465,41 @@ struct amdgpu_device {
enum amd_asic_type  asic_type;
uint32_tfamily;
uint32_trev_id;
uint32_texternal_rev_id;
unsigned long   flags;
int usec_timeout;
const struct amdgpu_asic_funcs  *asic_funcs;
boolshutdown;
boolneed_dma32;
boolaccel_working;
struct work_struct  reset_work;
struct notifier_block   acpi_nb;
struct amdgpu_i2c_chan  *i2c_bus[AMDGPU_MAX_I2C_BUS];
struct amdgpu_debugfs   debugfs[AMDGPU_DEBUGFS_MAX_COMPONENTS];
unsigneddebugfs_count;
 #if defined(CONFIG_DEBUG_FS)
struct dentry   *debugfs_regs[AMDGPU_DEBUGFS_MAX_COMPONENTS];
 #endif
struct amdgpu_atif  atif;
struct amdgpu_atcs  atcs;
-   struct mutexsrbm_mutex;
+   spinlock_t  srbm_lock;
/* GRBM index mutex. Protects concurrent access to GRBM index */
struct mutexgrbm_idx_mutex;
struct dev_pm_domainvga_pm_domain;
boolhave_disp_power_ref;
 
/* BIOS */
boolis_atom_fw;
uint8_t *bios;
uint32_tbios_size;
struct amdgpu_bo*stollen_vga_memory;
uint32_tbios_scratch_reg_offset;
uint32_tbios_scratch[AMDGPU_BIOS_NUM_SCRATCH];
 
/* Register/doorbell mmio */
resource_size_t rmmio_base;
resource_size_t rmmio_size;
void __iomem*rmmio;
/* protects concurrent MM_INDEX/DATA based register access */
spinlock_t mmio_idx_lock;
/* protects concurrent SMC based register access */
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c
index 5254562..a009990 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c
@@ -152,50 +152,50 @@ static const struct kfd2kgd_calls kfd2kgd = {
.write_vmid_invalidate_request = write_vmid_invalidate_request,
.get_fw_version = get_fw_version
 };
 
 struct kfd2kgd_calls *amdgpu_amdkfd_gfx_7_get_functions(void)
 {
return (struct kfd2kgd_calls *)&kfd2kgd;
 }
 
 static inline struct amdgpu_device *get_amdgpu_device(struct kgd_dev *kgd)
 {
return (struct amdgpu_device *)kgd;
 }
 
 static void lock_srbm(struct kgd_dev *kgd, uint32_t mec, uint32_t pipe,
uint32_t queue, uint32_t vmid)
 {
struct amdgpu_device *adev = get_amdgpu_device(kgd);
uint32_t value = PIPEID(pipe) | MEID(mec) | VMID(vmid) | QUEUEID(queue);
 
-   mutex_lock(&adev->srbm_mutex);
+   spin_lock(&adev->srbm_lock);
WREG32(mmSRBM_GFX_CNTL, value);
 }
 
 static void unlock_srbm(struct kgd_dev *kgd)
 {
struct amdgpu_device *adev = get_amdgpu_device(kgd);
 
WREG32(mmSRBM_GFX_CNTL, 0);
-   mutex_unlock(&adev->srbm_mutex);
+   spin_unlock(&adev->srbm_lock);
 }
 
 static void acquire_queue(struct kgd_dev *kgd, uint32_t pipe_id,
uint32_t queue_id)
 {
struct amdgpu_device *adev = get_amdgpu_device(kgd);
 
uint32_t mec = (++pipe_id / adev->gfx.mec.num_pipe_per_mec) + 1;
uint32_t pipe = (pipe_id % adev->gfx.mec.num_pipe_per_mec);
 
lock_srbm(kgd, mec, pipe, queue_id, 0);
 }
 
 static void release_queue(struct kgd_dev *kgd)
 {
unlock_srbm(kgd);
 }
 
 static void kgd_progr

[PATCH 2/4] drm/amdgpu: add framework for HW specific priority settings v6

2017-04-13 Thread Andres Rodriguez
Add an initial framework for changing the HW priorities of rings. The
framework allows requesting priority changes for the lifetime of an
amdgpu_job. After the job completes the priority will decay to the next
lowest priority for which a request is still valid.

A new ring function set_priority() can now be populated to take care of
the HW specific programming sequence for priority changes.
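
The decay rule can be summarised with a small sketch. Names are simplified
here; the real amdgpu_ring_priority_get/put are in the diff below:

/* Sketch: each level counts outstanding requests; on put(), fall back
 * to the highest level that still has one (0 being the default). */
static void ring_priority_put_sketch(struct amdgpu_ring *ring,
				     enum amd_sched_priority priority)
{
	int i;

	if (atomic_dec_return(&ring->num_jobs[priority]) > 0)
		return; /* other jobs still hold this level */

	for (i = priority; i >= 0; i--) {
		if (i == 0 || atomic_read(&ring->num_jobs[i])) {
			ring->funcs->set_priority(ring, i);
			break;
		}
	}
}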

v2: set priority before emitting IB, and take a ref on amdgpu_job
v3: use AMD_SCHED_PRIORITY_* instead of AMDGPU_CTX_PRIORITY_*
v4: plug amdgpu_ring_restore_priority_cb into amdgpu_job_free_cb
v5: use atomic for tracking job priorities instead of last_job
v6: rename amdgpu_ring_priority_[get/put]() and align parameters

Signed-off-by: Andres Rodriguez 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c   |  7 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c  | 78 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h  | 15 ++
 drivers/gpu/drm/amd/scheduler/gpu_scheduler.h |  7 +++
 4 files changed, 106 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
index 86a1242..ac90dfc 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -78,40 +78,41 @@ int amdgpu_job_alloc_with_ib(struct amdgpu_device *adev, unsigned size,
 
return r;
 }
 
 void amdgpu_job_free_resources(struct amdgpu_job *job)
 {
struct dma_fence *f;
unsigned i;
 
/* use sched fence if available */
f = job->base.s_fence ? &job->base.s_fence->finished : job->fence;
 
for (i = 0; i < job->num_ibs; ++i)
amdgpu_ib_free(job->adev, &job->ibs[i], f);
 }
 
 static void amdgpu_job_free_cb(struct amd_sched_job *s_job)
 {
struct amdgpu_job *job = container_of(s_job, struct amdgpu_job, base);
 
+   amdgpu_ring_priority_put(job->ring, amd_sched_get_job_priority(s_job));
dma_fence_put(job->fence);
amdgpu_sync_free(&job->sync);
kfree(job);
 }
 
 void amdgpu_job_free(struct amdgpu_job *job)
 {
amdgpu_job_free_resources(job);
 
dma_fence_put(job->fence);
amdgpu_sync_free(&job->sync);
kfree(job);
 }
 
 int amdgpu_job_submit(struct amdgpu_job *job, struct amdgpu_ring *ring,
  struct amd_sched_entity *entity, void *owner,
  struct dma_fence **f)
 {
int r;
job->ring = ring;
@@ -152,38 +153,44 @@ static struct dma_fence *amdgpu_job_dependency(struct amd_sched_job *sched_job)
fence = amdgpu_sync_get_fence(&job->sync);
}
 
return fence;
 }
 
 static struct dma_fence *amdgpu_job_run(struct amd_sched_job *sched_job)
 {
struct dma_fence *fence = NULL;
struct amdgpu_job *job;
int r;
 
if (!sched_job) {
DRM_ERROR("job is null\n");
return NULL;
}
job = to_amdgpu_job(sched_job);
 
BUG_ON(amdgpu_sync_peek_fence(&job->sync, NULL));
 
+   r = amdgpu_ring_priority_get(job->ring,
+amd_sched_get_job_priority(&job->base));
+   if (r)
+   DRM_ERROR("Failed to set job priority (%d)\n", r);
+
trace_amdgpu_sched_run_job(job);
r = amdgpu_ib_schedule(job->ring, job->num_ibs, job->ibs, job, &fence);
if (r)
DRM_ERROR("Error scheduling IBs (%d)\n", r);
 
/* if gpu reset, hw fence will be replaced here */
dma_fence_put(job->fence);
job->fence = dma_fence_get(fence);
+
amdgpu_job_free_resources(job);
return fence;
 }
 
 const struct amd_sched_backend_ops amdgpu_sched_ops = {
.dependency = amdgpu_job_dependency,
.run_job = amdgpu_job_run,
.timedout_job = amdgpu_job_timedout,
.free_job = amdgpu_job_free_cb
 };
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
index 7486277..09fa8f7 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
@@ -183,55 +183,126 @@ void amdgpu_ring_commit(struct amdgpu_ring *ring)
 
amdgpu_ring_lru_touch(ring->adev, ring);
 }
 
 /**
  * amdgpu_ring_undo - reset the wptr
  *
  * @ring: amdgpu_ring structure holding ring information
  *
  * Reset the driver's copy of the wptr (all asics).
  */
 void amdgpu_ring_undo(struct amdgpu_ring *ring)
 {
ring->wptr = ring->wptr_old;
 
if (ring->funcs->end_use)
ring->funcs->end_use(ring);
 }
 
 /**
+ * amdgpu_ring_priority_put - restore a ring's priority
+ *
+ * @ring: amdgpu_ring structure holding the information
+ * @priority: target priority
+ *
+ * Release a request for executing at @priority
+ */
+void amdgpu_ring_priority_put(struct amdgpu_ring *ring,
+ enum amd_sched_priority priority)
+{
+   int i;
+
+   if (!ring->funcs->set_priority)
+   return;

[PATCH 5/6] drm/amdgpu: guarantee bijective mapping of ring ids for LRU v3

2017-04-13 Thread Andres Rodriguez
Depending on usage patterns, the current LRU policy may create a
non-injective mapping between userspace ring ids and kernel rings.

This behaviour is undesirable, as apps that attempt to fill all HW blocks
would be unable to reach some of them.

This change forces the LRU policy to create bijective mappings only.
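
In sketch form, the mapper passes the HW rings this context already uses as a
blacklist, so the LRU pick can never alias two user ring ids onto the same HW
ring (simplified from amdgpu_lru_map() in the diff below):

/* Simplified from this patch: exclude already-mapped rings from the pick. */
int blacklist[AMDGPU_MAX_RINGS], num = 0, i;

for (i = 0; i < AMDGPU_MAX_RINGS; i++)
	if (mapper->queue_map[i])
		blacklist[num++] = mapper->queue_map[i]->idx;

r = amdgpu_ring_lru_get(adev, ring_type, blacklist, num, out_ring);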

v2: compress ring_blacklist
v3: simplify amdgpu_ring_is_blacklisted() logic

Signed-off-by: Andres Rodriguez 
Reviewed-by: Nicolai Hähnle 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_queue_mgr.c | 16 +++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c  | 33 +--
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h  |  4 ++--
 3 files changed, 42 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_queue_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_queue_mgr.c
index 054d750..5a7c691 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_queue_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_queue_mgr.c
@@ -98,44 +98,56 @@ static enum amdgpu_ring_type amdgpu_hw_ip_to_ring_type(int hw_ip)
return AMDGPU_RING_TYPE_GFX;
case AMDGPU_HW_IP_COMPUTE:
return AMDGPU_RING_TYPE_COMPUTE;
case AMDGPU_HW_IP_DMA:
return AMDGPU_RING_TYPE_SDMA;
case AMDGPU_HW_IP_UVD:
return AMDGPU_RING_TYPE_UVD;
case AMDGPU_HW_IP_VCE:
return AMDGPU_RING_TYPE_VCE;
default:
DRM_ERROR("Invalid HW IP specified %d\n", hw_ip);
return -1;
}
 }
 
 static int amdgpu_lru_map(struct amdgpu_device *adev,
  struct amdgpu_queue_mapper *mapper,
  int user_ring,
  struct amdgpu_ring **out_ring)
 {
-   int r;
+   int r, i, j;
int ring_type = amdgpu_hw_ip_to_ring_type(mapper->hw_ip);
+   int ring_blacklist[AMDGPU_MAX_RINGS];
+   struct amdgpu_ring *ring;
 
-   r = amdgpu_ring_lru_get(adev, ring_type, out_ring);
+   /* 0 is a valid ring index, so initialize to -1 */
+   memset(ring_blacklist, 0xff, sizeof(ring_blacklist));
+
+   for (i = 0, j = 0; i < AMDGPU_MAX_RINGS; i++) {
+   ring = mapper->queue_map[i];
+   if (ring)
+   ring_blacklist[j++] = ring->idx;
+   }
+
+   r = amdgpu_ring_lru_get(adev, ring_type, ring_blacklist,
+   j, out_ring);
if (r)
return r;
 
return amdgpu_update_cached_map(mapper, user_ring, *out_ring);
 }
 
 /**
  * amdgpu_queue_mgr_init - init an amdgpu_queue_mgr struct
  *
  * @adev: amdgpu_device pointer
  * @mgr: amdgpu_queue_mgr structure holding queue information
  *
 * Initialize the selected @mgr (all asics).
  *
  * Returns 0 on success, error on failure.
  */
 int amdgpu_queue_mgr_init(struct amdgpu_device *adev,
  struct amdgpu_queue_mgr *mgr)
 {
int i, r;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
index 2b452b0..7486277 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
@@ -333,66 +333,85 @@ void amdgpu_ring_fini(struct amdgpu_ring *ring)
amdgpu_wb_free(ring->adev, ring->wptr_offs);
}
 
 
amdgpu_bo_free_kernel(&ring->ring_obj,
  &ring->gpu_addr,
  (void **)&ring->ring);
 
amdgpu_debugfs_ring_fini(ring);
 
ring->adev->rings[ring->idx] = NULL;
 }
 
 static void amdgpu_ring_lru_touch_locked(struct amdgpu_device *adev,
 struct amdgpu_ring *ring)
 {
/* list_move_tail handles the case where ring isn't part of the list */
list_move_tail(&ring->lru_list, &adev->ring_lru_list);
 }
 
+static bool amdgpu_ring_is_blacklisted(struct amdgpu_ring *ring,
+  int *blacklist, int num_blacklist)
+{
+   int i;
+
+   for (i = 0; i < num_blacklist; i++) {
+   if (ring->idx == blacklist[i])
+   return true;
+   }
+
+   return false;
+}
+
 /**
  * amdgpu_ring_lru_get - get the least recently used ring for a HW IP block
  *
  * @adev: amdgpu_device pointer
  * @type: amdgpu_ring_type enum
+ * @blacklist: blacklisted ring ids array
+ * @num_blacklist: number of entries in @blacklist
  * @ring: output ring
  *
  * Retrieve the amdgpu_ring structure for the least recently used ring of
  * a specific IP block (all asics).
  * Returns 0 on success, error on failure.
  */
-int amdgpu_ring_lru_get(struct amdgpu_device *adev, int type,
-   struct amdgpu_ring **ring)
+int amdgpu_ring_lru_get(struct amdgpu_device *adev, int type, int *blacklist,
+   int num_blacklist, struct amdgpu_ring **ring)
 {
struct amdgpu_ring *entry;
 
/* List is sorted in LRU order, find first entry corresponding
 * to the desired

[PATCH split 2/3] LRU map compute/SDMA user ring ids to kernel ring ids

2017-04-13 Thread Andres Rodriguez
Second part of the split of the series:
Add support for high priority scheduling in amdgpu v8

These patches should be close to being good enough to land.

The first two patches are simple fixes I've ported from the ROCm branch. These
still need review.

I've fixed all of Christian's comments for patch 04:
drm/amdgpu: implement lru amdgpu_queue_mgr policy for compute v4





[PATCH 1/6] drm/amdgpu: condense mqd programming sequence

2017-04-13 Thread Andres Rodriguez
The MQD structure matches the reg layout. Take advantage of this to
simplify HQD programming.

Note that the ACTIVE field still needs to be programmed last.
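
The loop relies on the MQD struct mirroring the HQD register file one-for-one.
If one wanted to assert that invariant at build time, a hypothetical check
could look like:

/* Hypothetical build-time check of the MQD-mirrors-registers assumption:
 * a field's offset inside the MQD must equal the register's distance
 * from mmCP_MQD_BASE_ADDR, in dwords. */
BUILD_BUG_ON(offsetof(struct cik_mqd, cp_hqd_active) -
	     offsetof(struct cik_mqd, cp_mqd_base_addr_lo) !=
	     (mmCP_HQD_ACTIVE - mmCP_MQD_BASE_ADDR) * sizeof(u32));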

Suggested-by: Felix Kuehling 
Signed-off-by: Andres Rodriguez 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c | 44 +-
 drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c | 84 +--
 2 files changed, 23 insertions(+), 105 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
index c0844a5..85321d6 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
@@ -3118,81 +3118,59 @@ static void gfx_v7_0_mqd_init(struct amdgpu_device *adev,
mqd->cp_hqd_ib_rptr = RREG32(mmCP_HQD_IB_RPTR);
mqd->cp_hqd_persistent_state = RREG32(mmCP_HQD_PERSISTENT_STATE);
mqd->cp_hqd_sema_cmd = RREG32(mmCP_HQD_SEMA_CMD);
mqd->cp_hqd_msg_type = RREG32(mmCP_HQD_MSG_TYPE);
mqd->cp_hqd_atomic0_preop_lo = RREG32(mmCP_HQD_ATOMIC0_PREOP_LO);
mqd->cp_hqd_atomic0_preop_hi = RREG32(mmCP_HQD_ATOMIC0_PREOP_HI);
mqd->cp_hqd_atomic1_preop_lo = RREG32(mmCP_HQD_ATOMIC1_PREOP_LO);
mqd->cp_hqd_atomic1_preop_hi = RREG32(mmCP_HQD_ATOMIC1_PREOP_HI);
mqd->cp_hqd_pq_rptr = RREG32(mmCP_HQD_PQ_RPTR);
mqd->cp_hqd_quantum = RREG32(mmCP_HQD_QUANTUM);
mqd->cp_hqd_pipe_priority = RREG32(mmCP_HQD_PIPE_PRIORITY);
mqd->cp_hqd_queue_priority = RREG32(mmCP_HQD_QUEUE_PRIORITY);
mqd->cp_hqd_iq_rptr = RREG32(mmCP_HQD_IQ_RPTR);
 
/* activate the queue */
mqd->cp_hqd_active = 1;
 }
 
 int gfx_v7_0_mqd_commit(struct amdgpu_device *adev, struct cik_mqd *mqd)
 {
-   u32 tmp;
+   uint32_t tmp;
+   uint32_t mqd_reg;
+   uint32_t *mqd_data;
+
+   /* HQD registers extend from mmCP_MQD_BASE_ADDR to mmCP_MQD_CONTROL */
+   mqd_data = &mqd->cp_mqd_base_addr_lo;
 
/* disable wptr polling */
tmp = RREG32(mmCP_PQ_WPTR_POLL_CNTL);
tmp = REG_SET_FIELD(tmp, CP_PQ_WPTR_POLL_CNTL, EN, 0);
WREG32(mmCP_PQ_WPTR_POLL_CNTL, tmp);
 
-   /* program MQD field to HW */
-   WREG32(mmCP_MQD_BASE_ADDR, mqd->cp_mqd_base_addr_lo);
-   WREG32(mmCP_MQD_BASE_ADDR_HI, mqd->cp_mqd_base_addr_hi);
-   WREG32(mmCP_MQD_CONTROL, mqd->cp_mqd_control);
-   WREG32(mmCP_HQD_PQ_BASE, mqd->cp_hqd_pq_base_lo);
-   WREG32(mmCP_HQD_PQ_BASE_HI, mqd->cp_hqd_pq_base_hi);
-   WREG32(mmCP_HQD_PQ_CONTROL, mqd->cp_hqd_pq_control);
-   WREG32(mmCP_HQD_PQ_WPTR_POLL_ADDR, mqd->cp_hqd_pq_wptr_poll_addr_lo);
-   WREG32(mmCP_HQD_PQ_WPTR_POLL_ADDR_HI, mqd->cp_hqd_pq_wptr_poll_addr_hi);
-   WREG32(mmCP_HQD_PQ_RPTR_REPORT_ADDR, mqd->cp_hqd_pq_rptr_report_addr_lo);
-   WREG32(mmCP_HQD_PQ_RPTR_REPORT_ADDR_HI, mqd->cp_hqd_pq_rptr_report_addr_hi);
-   WREG32(mmCP_HQD_PQ_DOORBELL_CONTROL, mqd->cp_hqd_pq_doorbell_control);
-   WREG32(mmCP_HQD_PQ_WPTR, mqd->cp_hqd_pq_wptr);
-   WREG32(mmCP_HQD_VMID, mqd->cp_hqd_vmid);
-
-   WREG32(mmCP_HQD_IB_CONTROL, mqd->cp_hqd_ib_control);
-   WREG32(mmCP_HQD_IB_BASE_ADDR, mqd->cp_hqd_ib_base_addr_lo);
-   WREG32(mmCP_HQD_IB_BASE_ADDR_HI, mqd->cp_hqd_ib_base_addr_hi);
-   WREG32(mmCP_HQD_IB_RPTR, mqd->cp_hqd_ib_rptr);
-   WREG32(mmCP_HQD_PERSISTENT_STATE, mqd->cp_hqd_persistent_state);
-   WREG32(mmCP_HQD_SEMA_CMD, mqd->cp_hqd_sema_cmd);
-   WREG32(mmCP_HQD_MSG_TYPE, mqd->cp_hqd_msg_type);
-   WREG32(mmCP_HQD_ATOMIC0_PREOP_LO, mqd->cp_hqd_atomic0_preop_lo);
-   WREG32(mmCP_HQD_ATOMIC0_PREOP_HI, mqd->cp_hqd_atomic0_preop_hi);
-   WREG32(mmCP_HQD_ATOMIC1_PREOP_LO, mqd->cp_hqd_atomic1_preop_lo);
-   WREG32(mmCP_HQD_ATOMIC1_PREOP_HI, mqd->cp_hqd_atomic1_preop_hi);
-   WREG32(mmCP_HQD_PQ_RPTR, mqd->cp_hqd_pq_rptr);
-   WREG32(mmCP_HQD_QUANTUM, mqd->cp_hqd_quantum);
-   WREG32(mmCP_HQD_PIPE_PRIORITY, mqd->cp_hqd_pipe_priority);
-   WREG32(mmCP_HQD_QUEUE_PRIORITY, mqd->cp_hqd_queue_priority);
-   WREG32(mmCP_HQD_IQ_RPTR, mqd->cp_hqd_iq_rptr);
+   /* program all HQD registers */
+   for (mqd_reg = mmCP_HQD_VMID; mqd_reg <= mmCP_MQD_CONTROL; mqd_reg++)
+   WREG32(mqd_reg, mqd_data[mqd_reg - mmCP_MQD_BASE_ADDR]);
 
/* activate the HQD */
-   WREG32(mmCP_HQD_ACTIVE, mqd->cp_hqd_active);
+   for (mqd_reg = mmCP_MQD_BASE_ADDR; mqd_reg <= mmCP_HQD_ACTIVE; mqd_reg++)
+   WREG32(mqd_reg, mqd_data[mqd_reg - mmCP_MQD_BASE_ADDR]);
 
return 0;
 }
 
 static int gfx_v7_0_compute_queue_init(struct amdgpu_device *adev, int ring_id)
 {
int r;
u64 mqd_gpu_addr;
struct cik_mqd *mqd;
struct amdgpu_ring *ring = &adev->gfx.compute_ring[ring_id];
 
if (ring->mqd_obj == NULL) {
r = amdgpu_bo_create(adev,
sizeof(struct cik_mqd),
PAGE_SIZE, true,
AMDGPU_GEM_DOMAIN_GTT, 0, 

[PATCH 4/6] drm/amdgpu: implement lru amdgpu_queue_mgr policy for compute v4

2017-04-13 Thread Andres Rodriguez
Use an LRU policy to map usermode rings to HW compute queues.

Most compute clients use one queue, and usually the first queue
available. This results in poor pipe/queue work distribution when
multiple compute apps are running. In most cases pipe 0 queue 0 is
the only queue that gets used.

In order to better distribute work across multiple HW queues, we adopt
a policy to map the usermode ring ids to the LRU HW queue.

This fixes a large majority of multi-app compute workloads sharing the
same HW queue, even though 7 other queues are available.
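
In rough form the policy is just "pick the least recently used ring of the
right type, then mark it most recently used" (simplified from
amdgpu_ring_lru_get() in this patch):

/* Simplified: the list is kept in LRU order, so the first matching
 * entry is the least recently used ring of the requested type. */
struct amdgpu_ring *entry;

spin_lock(&adev->ring_lru_list_lock);
list_for_each_entry(entry, &adev->ring_lru_list, lru_list) {
	if (entry->funcs->type == type) {
		*ring = entry;
		/* moving to the tail marks the ring most recently used */
		list_move_tail(&entry->lru_list, &adev->ring_lru_list);
		break;
	}
}
spin_unlock(&adev->ring_lru_list_lock);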

v2: use ring->funcs->type instead of ring->hw_ip
v3: remove amdgpu_queue_mapper_funcs
v4: change ring_lru_list_lock to spinlock, grab only once in lru_get()

Signed-off-by: Andres Rodriguez 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h   |  3 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c|  3 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_queue_mgr.c | 38 +++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c  | 63 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h  |  4 ++
 5 files changed, 110 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 1d9053f..a9b7a61 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1617,40 +1617,43 @@ struct amdgpu_device {
int num_ip_blocks;
struct mutexmn_lock;
DECLARE_HASHTABLE(mn_hash, 7);
 
/* tracking pinned memory */
u64 vram_pin_size;
u64 invisible_pin_size;
u64 gart_pin_size;
 
/* amdkfd interface */
struct kfd_dev  *kfd;
 
struct amdgpu_virt  virt;
 
/* link all shadow bo */
struct list_headshadow_list;
struct mutexshadow_list_lock;
/* link all gtt */
spinlock_t  gtt_list_lock;
struct list_headgtt_list;
+   /* keep an lru list of rings by HW IP */
+   struct list_headring_lru_list;
+   spinlock_t  ring_lru_list_lock;
 
/* record hw reset is performed */
bool has_hw_reset;
 
 };
 
 static inline struct amdgpu_device *amdgpu_ttm_adev(struct ttm_bo_device *bdev)
 {
return container_of(bdev, struct amdgpu_device, mman.bdev);
 }
 
 bool amdgpu_device_is_px(struct drm_device *dev);
 int amdgpu_device_init(struct amdgpu_device *adev,
   struct drm_device *ddev,
   struct pci_dev *pdev,
   uint32_t flags);
 void amdgpu_device_fini(struct amdgpu_device *adev);
 int amdgpu_gpu_wait_for_idle(struct amdgpu_device *adev);
 
 uint32_t amdgpu_mm_rreg(struct amdgpu_device *adev, uint32_t reg,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 724b4c1..2acceef 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -1865,40 +1865,43 @@ int amdgpu_device_init(struct amdgpu_device *adev,
 
amdgpu_check_arguments(adev);
 
/* Registers mapping */
/* TODO: block userspace mapping of io register */
spin_lock_init(&adev->mmio_idx_lock);
spin_lock_init(&adev->smc_idx_lock);
spin_lock_init(&adev->pcie_idx_lock);
spin_lock_init(&adev->uvd_ctx_idx_lock);
spin_lock_init(&adev->didt_idx_lock);
spin_lock_init(&adev->gc_cac_idx_lock);
spin_lock_init(&adev->audio_endpt_idx_lock);
spin_lock_init(&adev->mm_stats.lock);
 
INIT_LIST_HEAD(&adev->shadow_list);
mutex_init(&adev->shadow_list_lock);
 
INIT_LIST_HEAD(&adev->gtt_list);
spin_lock_init(&adev->gtt_list_lock);
 
+   INIT_LIST_HEAD(&adev->ring_lru_list);
+   spin_lock_init(&adev->ring_lru_list_lock);
+
if (adev->asic_type >= CHIP_BONAIRE) {
adev->rmmio_base = pci_resource_start(adev->pdev, 5);
adev->rmmio_size = pci_resource_len(adev->pdev, 5);
} else {
adev->rmmio_base = pci_resource_start(adev->pdev, 2);
adev->rmmio_size = pci_resource_len(adev->pdev, 2);
}
 
adev->rmmio = ioremap(adev->rmmio_base, adev->rmmio_size);
if (adev->rmmio == NULL) {
return -ENOMEM;
}
DRM_INFO("register mmio base: 0x%08X\n", (uint32_t)adev->rmmio_base);
DRM_INFO("register mmio size: %u\n", (unsigned)adev->rmmio_size);
 
if (adev->asic_type >= CHIP_BONAIRE)
/* doorbell bar mapping */
amdgpu_doorbell_init(adev);
 
/* io port mapping */
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_queue_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_queue_mgr.c
index 3e9ac80..054d750 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_queue_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_queue_mgr.c
@@ -74,40 +74,74 @@ sta

[PATCH 3/6] drm/amdgpu: untie user ring ids from kernel ring ids v4

2017-04-13 Thread Andres Rodriguez
Add amdgpu_queue_mgr, a mechanism that allows disjointing usermode's
ring ids from the kernel's ring ids.

The queue manager maintains a per-file descriptor map of user ring ids
to amdgpu_ring pointers. Once a map is created it is permanent (this is
required to maintain FIFO execution guarantees for a context's ring).

Different queue map policies can be configured for each HW IP.
Currently all HW IPs use the identity mapper, i.e. kernel ring id is
equal to the user ring id.

The purpose of this mechanism is to distribute the load across multiple
queues more effectively for HW IPs that support multiple rings.
Userspace clients are unable to check whether a specific resource is in
use by a different client. Therefore, it is up to the kernel driver to
make the optimal choice.
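
The lookup itself is a straightforward cache-then-policy sequence, roughly as
follows (simplified; policy_pick_ring() is a stand-in for the per-IP mapper
policy, identity for now):

/* Simplified sketch of the map lookup: a cached mapping is permanent,
 * which preserves the FIFO guarantees of a context's ring. */
mutex_lock(&mapper->lock);
ring = mapper->queue_map[user_ring];
if (!ring) {
	ring = policy_pick_ring(mapper->hw_ip, user_ring); /* stand-in */
	mapper->queue_map[user_ring] = ring;               /* cached forever */
}
mutex_unlock(&mapper->lock);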

v2: remove amdgpu_queue_mapper_funcs
v3: made amdgpu_queue_mgr per context instead of per-fd
v4: add context_put on error paths

Reviewed-by: Christian König 
Signed-off-by: Andres Rodriguez 
---
 drivers/gpu/drm/amd/amdgpu/Makefile   |   3 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu.h   |  27 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c| 117 +++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c   |   6 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_queue_mgr.c | 230 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c  |  45 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h  |   2 +
 7 files changed, 335 insertions(+), 95 deletions(-)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_queue_mgr.c

diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile b/drivers/gpu/drm/amd/amdgpu/Makefile
index 660786a..dd48eb2 100644
--- a/drivers/gpu/drm/amd/amdgpu/Makefile
+++ b/drivers/gpu/drm/amd/amdgpu/Makefile
@@ -7,41 +7,42 @@ FULL_AMD_PATH=$(src)/..
 ccflags-y := -Iinclude/drm -I$(FULL_AMD_PATH)/include/asic_reg \
-I$(FULL_AMD_PATH)/include \
-I$(FULL_AMD_PATH)/amdgpu \
-I$(FULL_AMD_PATH)/scheduler \
-I$(FULL_AMD_PATH)/powerplay/inc \
-I$(FULL_AMD_PATH)/acp/include
 
 amdgpu-y := amdgpu_drv.o
 
 # add KMS driver
 amdgpu-y += amdgpu_device.o amdgpu_kms.o \
amdgpu_atombios.o atombios_crtc.o amdgpu_connectors.o \
atom.o amdgpu_fence.o amdgpu_ttm.o amdgpu_object.o amdgpu_gart.o \
amdgpu_encoders.o amdgpu_display.o amdgpu_i2c.o \
amdgpu_fb.o amdgpu_gem.o amdgpu_ring.o \
amdgpu_cs.o amdgpu_bios.o amdgpu_benchmark.o amdgpu_test.o \
amdgpu_pm.o atombios_dp.o amdgpu_afmt.o amdgpu_trace_points.o \
atombios_encoders.o amdgpu_sa.o atombios_i2c.o \
amdgpu_prime.o amdgpu_vm.o amdgpu_ib.o amdgpu_pll.o \
amdgpu_ucode.o amdgpu_bo_list.o amdgpu_ctx.o amdgpu_sync.o \
-   amdgpu_gtt_mgr.o amdgpu_vram_mgr.o amdgpu_virt.o amdgpu_atomfirmware.o
+   amdgpu_gtt_mgr.o amdgpu_vram_mgr.o amdgpu_virt.o amdgpu_atomfirmware.o \
+   amdgpu_queue_mgr.o
 
 # add asic specific block
 amdgpu-$(CONFIG_DRM_AMDGPU_CIK)+= cik.o cik_ih.o kv_smc.o kv_dpm.o \
ci_smc.o ci_dpm.o dce_v8_0.o gfx_v7_0.o cik_sdma.o uvd_v4_2.o vce_v2_0.o \
amdgpu_amdkfd_gfx_v7.o
 
amdgpu-$(CONFIG_DRM_AMDGPU_SI)+= si.o gmc_v6_0.o gfx_v6_0.o si_ih.o si_dma.o dce_v6_0.o si_dpm.o si_smc.o
 
 amdgpu-y += \
vi.o mxgpu_vi.o nbio_v6_1.o soc15.o mxgpu_ai.o
 
 # add GMC block
 amdgpu-y += \
gmc_v7_0.o \
gmc_v8_0.o \
gfxhub_v1_0.o mmhub_v1_0.o gmc_v9_0.o
 
 # add IH block
 amdgpu-y += \
amdgpu_irq.o \
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 0a58575..1d9053f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -756,52 +756,76 @@ struct amdgpu_ib {
uint32_tlength_dw;
uint64_tgpu_addr;
uint32_t*ptr;
uint32_tflags;
 };
 
 extern const struct amd_sched_backend_ops amdgpu_sched_ops;
 
 int amdgpu_job_alloc(struct amdgpu_device *adev, unsigned num_ibs,
 struct amdgpu_job **job, struct amdgpu_vm *vm);
 int amdgpu_job_alloc_with_ib(struct amdgpu_device *adev, unsigned size,
 struct amdgpu_job **job);
 
 void amdgpu_job_free_resources(struct amdgpu_job *job);
 void amdgpu_job_free(struct amdgpu_job *job);
 int amdgpu_job_submit(struct amdgpu_job *job, struct amdgpu_ring *ring,
  struct amd_sched_entity *entity, void *owner,
  struct dma_fence **f);
 
 /*
+ * Queue manager
+ */
+struct amdgpu_queue_mapper {
+   int hw_ip;
+   struct mutexlock;
+   /* protected by lock */
+   struct amdgpu_ring *queue_map[AMDGPU_MAX_RINGS];
+};
+
+struct amdgpu_queue_mgr {
+   struct amdgpu_queue_mapper mapper[AMDGPU_MAX_IP_NUM];
+};
+
+int amdgpu_queue_mgr_init(struct amdgpu_device *adev,
+ struct amdgpu_queue_mgr *mgr);
+int amdgpu_queue_mgr_fini(struct amdgpu_device *ad

[PATCH 2/6] drm/amdgpu: workaround tonga HW bug in HQD programming sequence

2017-04-13 Thread Andres Rodriguez
Tonga-based ASICs may experience hangs when an HQD's EOP parameters
are modified.

Work around this HW issue by avoiding writes to these registers for
Tonga ASICs.

Based on the following ROCm commit:
2a0fb8 - drm/amdgpu: Synchronize KFD HQD load protocol with CP scheduler

From the ROCm git repository:
https://github.com/RadeonOpenCompute/ROCK-Kernel-Driver.git

CC: Jay Cornwall 
Suggested-by: Felix Kuehling 
Signed-off-by: Andres Rodriguez 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c | 16 +++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
index 0f1b62d..b9e0ded 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
@@ -5033,41 +5033,55 @@ static int gfx_v8_0_mqd_init(struct amdgpu_ring *ring)
 
/* activate the queue */
mqd->cp_hqd_active = 1;
 
return 0;
 }
 
 int gfx_v8_0_mqd_commit(struct amdgpu_device *adev,
struct vi_mqd *mqd)
 {
uint32_t mqd_reg;
uint32_t *mqd_data;
 
/* HQD registers extend from mmCP_MQD_BASE_ADDR to mmCP_HQD_ERROR */
mqd_data = &mqd->cp_mqd_base_addr_lo;
 
/* disable wptr polling */
WREG32_FIELD(CP_PQ_WPTR_POLL_CNTL, EN, 0);
 
/* program all HQD registers */
-   for (mqd_reg = mmCP_HQD_VMID; mqd_reg <= mmCP_HQD_ERROR; mqd_reg++)
+   for (mqd_reg = mmCP_HQD_VMID; mqd_reg <= mmCP_HQD_EOP_CONTROL; mqd_reg++)
+   WREG32(mqd_reg, mqd_data[mqd_reg - mmCP_MQD_BASE_ADDR]);
+
+   /* Tonga errata: EOP RPTR/WPTR should be left unmodified.
+* This is safe since EOP RPTR==WPTR for any inactive HQD
+* on ASICs that do not support context-save.
+* EOP writes/reads can start anywhere in the ring.
+*/
+   if (adev->asic_type != CHIP_TONGA) {
+   WREG32(mmCP_HQD_EOP_RPTR, mqd->cp_hqd_eop_rptr);
+   WREG32(mmCP_HQD_EOP_WPTR, mqd->cp_hqd_eop_wptr);
+   WREG32(mmCP_HQD_EOP_WPTR_MEM, mqd->cp_hqd_eop_wptr_mem);
+   }
+
+   for (mqd_reg = mmCP_HQD_EOP_EVENTS; mqd_reg <= mmCP_HQD_ERROR; mqd_reg++)
WREG32(mqd_reg, mqd_data[mqd_reg - mmCP_MQD_BASE_ADDR]);
 
/* activate the HQD */
for (mqd_reg = mmCP_MQD_BASE_ADDR; mqd_reg <= mmCP_HQD_ACTIVE; mqd_reg++)
WREG32(mqd_reg, mqd_data[mqd_reg - mmCP_MQD_BASE_ADDR]);
 
return 0;
 }
 
 static int gfx_v8_0_kiq_init_queue(struct amdgpu_ring *ring)
 {
int r = 0;
struct amdgpu_device *adev = ring->adev;
struct vi_mqd *mqd = ring->mqd_ptr;
int mqd_idx = AMDGPU_MAX_COMPUTE_RINGS;
 
gfx_v8_0_kiq_setting(ring);
 
if (adev->gfx.in_reset) { /* for GPU_RESET case */
/* reset MQD to a clean status */
-- 
2.9.3



[PATCH 6/6] drm/amdgpu: use LRU mapping policy for SDMA engines

2017-04-13 Thread Andres Rodriguez
Spreading the load across multiple SDMA engines can increase memory
transfer performance.

Signed-off-by: Andres Rodriguez 
Reviewed-by: Nicolai Hähnle 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_queue_mgr.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_queue_mgr.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_queue_mgr.c
index 5a7c691..e8984df 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_queue_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_queue_mgr.c
@@ -241,38 +241,38 @@ int amdgpu_queue_mgr_map(struct amdgpu_device *adev,
return -EINVAL;
}
 
if (ring >= ip_num_rings) {
DRM_ERROR("Ring index:%d exceeds maximum:%d for ip:%d\n",
ring, ip_num_rings, hw_ip);
return -EINVAL;
}
 
mutex_lock(&mapper->lock);
 
*out_ring = amdgpu_get_cached_map(mapper, ring);
if (*out_ring) {
/* cache hit */
r = 0;
goto out_unlock;
}
 
switch (mapper->hw_ip) {
case AMDGPU_HW_IP_GFX:
-   case AMDGPU_HW_IP_DMA:
case AMDGPU_HW_IP_UVD:
case AMDGPU_HW_IP_VCE:
r = amdgpu_identity_map(adev, mapper, ring, out_ring);
break;
+   case AMDGPU_HW_IP_DMA:
case AMDGPU_HW_IP_COMPUTE:
r = amdgpu_lru_map(adev, mapper, ring, out_ring);
break;
default:
*out_ring = NULL;
r = -EINVAL;
DRM_ERROR("unknown HW IP type: %d\n", mapper->hw_ip);
}
 
 out_unlock:
mutex_unlock(&mapper->lock);
return r;
 }
-- 
2.9.3



Re: [PATCH split] Improve pipe split between amdgpu and amdkfd

2017-04-13 Thread Andres Rodriguez

Forgot to mention:
  * Re-ordered some patches as suggested by Felix
  * Included "drm: Fix get_property logic fumble" during testing; otherwise the system boots to a black screen.


Regards,
Andres

On 2017-04-13 05:35 PM, Andres Rodriguez wrote:

This is a split of patches that are ready to land from the series:
Add support for high priority scheduling in amdgpu v8

I've included Felix and Alex's feedback from the thread above. This includes:
 * Separate MEC_HPD_SIZE rename into a separate patch (patch 01)
 * Added a patch to fix the kgd_hqd_load bug Felix pointed out (patch 06)
 * Fixes for various off-by-one errors
 * Use gfx_v8_0_deactivate_hqd

The only comment I didn't address was changing the queue allocation policy for
gfx9 (similar to gfx7/8). See the inline reply in that thread for more details
on why this was skipped.





[PATCH 10/17] drm/amdgpu: allow split of queues with kfd at queue granularity v4

2017-04-13 Thread Andres Rodriguez
Previously the queue/pipe split with kfd operated with pipe
granularity. This patch allows amdgpu to take ownership of an arbitrary
set of queues.

It also consolidates the last few magic numbers in the compute
initialization process into mec_init.
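
For illustration, claiming a queue set is just setting bits in a
mec*pipe*queue bitmap. The policy below (take all of MEC0, leave the rest to
the KFD) is only an example; the real per-ASIC policies live in mec_init:

/* Illustrative policy only, not necessarily the one the patch adopts. */
static void compute_queue_acquire_sketch(struct amdgpu_device *adev)
{
	int i;
	int mec0_queues = adev->gfx.mec.num_pipe_per_mec *
			  adev->gfx.mec.num_queue_per_pipe;

	for (i = 0; i < AMDGPU_MAX_COMPUTE_QUEUES; ++i)
		if (i < mec0_queues)
			set_bit(i, adev->gfx.mec.queue_bitmap);
}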

v2: support for gfx9
v3: renamed AMDGPU_MAX_QUEUES to AMDGPU_MAX_COMPUTE_QUEUES
v4: fix off-by-one in num_mec checks in *_compute_queue_acquire

Reviewed-by: Edward O'Callaghan 
Reviewed-by: Felix Kuehling 
Acked-by: Christian König 
Signed-off-by: Andres Rodriguez 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h |  7 +++
 drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c   | 82 +---
 drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c   | 81 +++-
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c   | 84 +++--
 drivers/gpu/drm/amd/include/kgd_kfd_interface.h |  1 +
 5 files changed, 211 insertions(+), 44 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 6b294d2..61990be 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -29,40 +29,42 @@
 #define __AMDGPU_H__
 
 #include 
 #include 
 #include 
 #include 
 #include 
 #include 
 #include 
 
 #include 
 #include 
 #include 
 #include 
 #include 
 
 #include 
 #include 
 #include 
 
+#include 
+
 #include "amd_shared.h"
 #include "amdgpu_mode.h"
 #include "amdgpu_ih.h"
 #include "amdgpu_irq.h"
 #include "amdgpu_ucode.h"
 #include "amdgpu_ttm.h"
 #include "amdgpu_psp.h"
 #include "amdgpu_gds.h"
 #include "amdgpu_sync.h"
 #include "amdgpu_ring.h"
 #include "amdgpu_vm.h"
 #include "amd_powerplay.h"
 #include "amdgpu_dpm.h"
 #include "amdgpu_acp.h"
 #include "amdgpu_uvd.h"
 #include "amdgpu_vce.h"
 
 #include "gpu_scheduler.h"
 #include "amdgpu_virt.h"
 
@@ -875,49 +877,54 @@ struct amdgpu_rlc {
 
/* safe mode for updating CG/PG state */
bool in_safe_mode;
const struct amdgpu_rlc_funcs *funcs;
 
/* for firmware data */
u32 save_and_restore_offset;
u32 clear_state_descriptor_offset;
u32 avail_scratch_ram_locations;
u32 reg_restore_list_size;
u32 reg_list_format_start;
u32 reg_list_format_separate_start;
u32 starting_offsets_start;
u32 reg_list_format_size_bytes;
u32 reg_list_size_bytes;
 
u32 *register_list_format;
u32 *register_restore;
 };
 
+#define AMDGPU_MAX_COMPUTE_QUEUES KGD_MAX_QUEUES
+
 struct amdgpu_mec {
struct amdgpu_bo*hpd_eop_obj;
u64 hpd_eop_gpu_addr;
struct amdgpu_bo*mec_fw_obj;
u64 mec_fw_gpu_addr;
u32 num_mec;
u32 num_pipe_per_mec;
u32 num_queue_per_pipe;
void*mqd_backup[AMDGPU_MAX_COMPUTE_RINGS + 1];
+
+   /* These are the resources for which amdgpu takes ownership */
+   DECLARE_BITMAP(queue_bitmap, AMDGPU_MAX_COMPUTE_QUEUES);
 };
 
 struct amdgpu_kiq {
u64 eop_gpu_addr;
struct amdgpu_bo*eop_obj;
struct amdgpu_ring  ring;
struct amdgpu_irq_src   irq;
 };
 
 /*
  * GPU scratch registers structures, functions & helpers
  */
 struct amdgpu_scratch {
unsignednum_reg;
uint32_treg_base;
uint32_tfree_mask;
 };
 
 /*
  * GFX configurations
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
index 41bda98..8520b4b 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
@@ -32,41 +32,40 @@
 #include "amdgpu_ucode.h"
 #include "clearstate_ci.h"
 
 #include "dce/dce_8_0_d.h"
 #include "dce/dce_8_0_sh_mask.h"
 
 #include "bif/bif_4_1_d.h"
 #include "bif/bif_4_1_sh_mask.h"
 
 #include "gca/gfx_7_0_d.h"
 #include "gca/gfx_7_2_enum.h"
 #include "gca/gfx_7_2_sh_mask.h"
 
 #include "gmc/gmc_7_0_d.h"
 #include "gmc/gmc_7_0_sh_mask.h"
 
 #include "oss/oss_2_0_d.h"
 #include "oss/oss_2_0_sh_mask.h"
 
 #define GFX7_NUM_GFX_RINGS 1
-#define GFX7_NUM_COMPUTE_RINGS 8
 #define GFX7_MEC_HPD_SIZE  2048
 
 static void gfx_v7_0_set_ring_funcs(struct amdgpu_device *adev);
 static void gfx_v7_0_set_irq_funcs(struct amdgpu_device *adev);
 static void gfx_v7_0_set_gds_init(struct amdgpu_device *adev);
 
 MODULE_FIRMWARE("radeon/bonaire_pfp.bin");
 MODULE_FIRMWARE("radeon/bonaire_me.bin");
 MODULE_FIRMWARE("radeon/bonaire_ce.bin");
 MODULE_FIRMWARE("radeon/bonaire_rlc.bin");
 MODULE_FIRMWARE("radeon/bonaire_mec.bin");
 
 MODULE_FIRMWARE("radeon/hawaii_pfp.bin");
 MODULE_FIRMWARE("radeon/hawaii_me.bin");
 MODULE_FIRMWARE("radeon/hawaii_ce.bin");
 MODULE_FIRMWARE("radeon/hawaii_rlc.bin");
 MODULE_FIRMWARE("radeon/hawaii_mec.bin");
 
 MODULE_FIRMWARE("radeon/kaveri_pfp.bin");
 MODULE_FIRMWARE("radeon/kaveri_me.bin");
@@ -2806,67 +2805,98 @@ static void gfx_v7_0_cp_compute_fini(struct amdgpu_device *adev)
  

[PATCH 15/17] drm/amdgpu: remove hardcoded queue_mask in PACKET3_SET_RESOURCES

2017-04-13 Thread Andres Rodriguez
The assumption that we are only using the first pipe no longer holds.
Instead, calculate the queue_mask from the queue_bitmap.

Acked-by: Felix Kuehling 
Signed-off-by: Andres Rodriguez 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c | 20 ++--
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 23 +--
 2 files changed, 39 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
index 90e1dd3..ff77351 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
@@ -4697,60 +4697,76 @@ static int gfx_v8_0_cp_compute_load_microcode(struct amdgpu_device *adev)
 
 /* KIQ functions */
 static void gfx_v8_0_kiq_setting(struct amdgpu_ring *ring)
 {
uint32_t tmp;
struct amdgpu_device *adev = ring->adev;
 
/* tell RLC which is KIQ queue */
tmp = RREG32(mmRLC_CP_SCHEDULERS);
tmp &= 0xff00;
tmp |= (ring->me << 5) | (ring->pipe << 3) | (ring->queue);
WREG32(mmRLC_CP_SCHEDULERS, tmp);
tmp |= 0x80;
WREG32(mmRLC_CP_SCHEDULERS, tmp);
 }
 
 static int gfx_v8_0_kiq_kcq_enable(struct amdgpu_device *adev)
 {
struct amdgpu_ring *kiq_ring = &adev->gfx.kiq.ring;
uint32_t scratch, tmp = 0;
+   uint64_t queue_mask = 0;
int r, i;
 
+   for (i = 0; i < AMDGPU_MAX_COMPUTE_QUEUES; ++i) {
+   if (!test_bit(i, adev->gfx.mec.queue_bitmap))
+   continue;
+
+   /* This situation may be hit in the future if a new HW
+* generation exposes more than 64 queues. If so, the
+* definition of queue_mask needs updating */
+   if (WARN_ON(i > (sizeof(queue_mask)*8))) {
+   DRM_ERROR("Invalid KCQ enabled: %d\n", i);
+   break;
+   }
+
+   queue_mask |= (1ull << i);
+   }
+
r = amdgpu_gfx_scratch_get(adev, &scratch);
if (r) {
DRM_ERROR("Failed to get scratch reg (%d).\n", r);
return r;
}
WREG32(scratch, 0xCAFEDEAD);
 
r = amdgpu_ring_alloc(kiq_ring, (8 * adev->gfx.num_compute_rings) + 11);
if (r) {
DRM_ERROR("Failed to lock KIQ (%d).\n", r);
amdgpu_gfx_scratch_free(adev, scratch);
return r;
}
/* set resources */
amdgpu_ring_write(kiq_ring, PACKET3(PACKET3_SET_RESOURCES, 6));
amdgpu_ring_write(kiq_ring, 0); /* vmid_mask:0 queue_type:0 (KIQ) */
-   amdgpu_ring_write(kiq_ring, 0x00FF);/* queue mask lo */
-   amdgpu_ring_write(kiq_ring, 0); /* queue mask hi */
+   amdgpu_ring_write(kiq_ring, lower_32_bits(queue_mask)); /* queue mask lo */
+   amdgpu_ring_write(kiq_ring, upper_32_bits(queue_mask)); /* queue mask hi */
amdgpu_ring_write(kiq_ring, 0); /* gws mask lo */
amdgpu_ring_write(kiq_ring, 0); /* gws mask hi */
amdgpu_ring_write(kiq_ring, 0); /* oac mask */
amdgpu_ring_write(kiq_ring, 0); /* gds heap base:0, gds heap size:0 */
for (i = 0; i < adev->gfx.num_compute_rings; i++) {
struct amdgpu_ring *ring = &adev->gfx.compute_ring[i];
uint64_t mqd_addr = amdgpu_bo_gpu_offset(ring->mqd_obj);
uint64_t wptr_addr = adev->wb.gpu_addr + (ring->wptr_offs * 4);
 
/* map queues */
amdgpu_ring_write(kiq_ring, PACKET3(PACKET3_MAP_QUEUES, 5));
/* Q_sel:0, vmid:0, vidmem: 1, engine:0, num_Q:1*/
amdgpu_ring_write(kiq_ring,
  PACKET3_MAP_QUEUES_NUM_QUEUES(1));
amdgpu_ring_write(kiq_ring,
  
PACKET3_MAP_QUEUES_DOORBELL_OFFSET(ring->doorbell_index) |
  PACKET3_MAP_QUEUES_QUEUE(ring->queue) |
  PACKET3_MAP_QUEUES_PIPE(ring->pipe) |
  PACKET3_MAP_QUEUES_ME(ring->me == 1 ? 0 : 1)); /* doorbell */
amdgpu_ring_write(kiq_ring, lower_32_bits(mqd_addr));
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index 6208493..5a5ff47 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
@@ -1895,46 +1895,65 @@ static int gfx_v9_0_cp_compute_resume(struct amdgpu_device *adev)
return 0;
 }
 
 /* KIQ functions */
 static void gfx_v9_0_kiq_setting(struct amdgpu_ring *ring)
 {
uint32_t tmp;
struct amdgpu_device *adev = ring->adev;
 
/* tell RLC which is KIQ queue */
tmp = RREG32_SOC15(GC, 0, mmRLC_CP_SCHEDULERS);
tmp &= 0xff00;
tmp |= (ring->me << 5) | (ring->pipe << 3) | (ring->queue);
WREG32_SOC15(GC, 0, mmRLC_CP_SCHEDULERS, tmp);
tmp |= 0x80;
WREG32_SOC15(GC, 0, mmRLC_CP_SCHEDULERS, tmp);
 }
 
 static void gfx_v9_0_kiq_en

[PATCH 12/17] drm/amdkfd: allow split HQD on per-queue granularity v4

2017-04-13 Thread Andres Rodriguez
Update the KGD to KFD interface to allow sharing pipes with queue
granularity instead of pipe granularity.

This allows for more interesting pipe/queue splits.

v2: fix overflow check for res.queue_mask
v3: fix shift overflow when setting res.queue_mask
v4: fix comment in is_pipeline_enabled()

Reviewed-by: Edward O'Callaghan 
Reviewed-by: Felix Kuehling 
Acked-by: Christian König 
Signed-off-by: Andres Rodriguez 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c |  22 -
 drivers/gpu/drm/amd/amdkfd/kfd_device.c|   4 +
 .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.c  | 100 ++---
 .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.h  |  10 +--
 .../drm/amd/amdkfd/kfd_device_queue_manager_cik.c  |   2 +-
 drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c|   3 +-
 .../gpu/drm/amd/amdkfd/kfd_process_queue_manager.c |   2 +-
 drivers/gpu/drm/amd/include/kgd_kfd_interface.h|  17 ++--
 drivers/gpu/drm/radeon/radeon_kfd.c|  21 -
 9 files changed, 126 insertions(+), 55 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
index 3200ff9..8fc5aa3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
@@ -78,48 +78,64 @@ bool amdgpu_amdkfd_load_interface(struct amdgpu_device *adev)
return true;
 }
 
 void amdgpu_amdkfd_fini(void)
 {
if (kgd2kfd) {
kgd2kfd->exit();
symbol_put(kgd2kfd_init);
}
 }
 
 void amdgpu_amdkfd_device_probe(struct amdgpu_device *adev)
 {
if (kgd2kfd)
adev->kfd = kgd2kfd->probe((struct kgd_dev *)adev,
adev->pdev, kfd2kgd);
 }
 
 void amdgpu_amdkfd_device_init(struct amdgpu_device *adev)
 {
+   int i;
+   int last_valid_bit;
if (adev->kfd) {
struct kgd2kfd_shared_resources gpu_resources = {
.compute_vmid_bitmap = 0xFF00,
-
-   .first_compute_pipe = 1,
-   .compute_pipe_count = 4 - 1,
+   .num_mec = adev->gfx.mec.num_mec,
+   .num_pipe_per_mec = adev->gfx.mec.num_pipe_per_mec,
+   .num_queue_per_pipe = adev->gfx.mec.num_queue_per_pipe
};
 
+   /* this is going to have a few of the MSBs set that we need to
+* clear */
+   bitmap_complement(gpu_resources.queue_bitmap,
+ adev->gfx.mec.queue_bitmap,
+ KGD_MAX_QUEUES);
+
+   /* According to linux/bitmap.h we shouldn't use bitmap_clear if
+* nbits is not a compile-time constant */
+   last_valid_bit = adev->gfx.mec.num_mec
+   * adev->gfx.mec.num_pipe_per_mec
+   * adev->gfx.mec.num_queue_per_pipe;
+   for (i = last_valid_bit; i < KGD_MAX_QUEUES; ++i)
+   clear_bit(i, gpu_resources.queue_bitmap);
+
amdgpu_doorbell_get_kfd_info(adev,
&gpu_resources.doorbell_physical_address,
&gpu_resources.doorbell_aperture_size,
&gpu_resources.doorbell_start_offset);
 
kgd2kfd->device_init(adev->kfd, &gpu_resources);
}
 }
 
 void amdgpu_amdkfd_device_fini(struct amdgpu_device *adev)
 {
if (adev->kfd) {
kgd2kfd->device_exit(adev->kfd);
adev->kfd = NULL;
}
 }
 
 void amdgpu_amdkfd_interrupt(struct amdgpu_device *adev,
const void *ih_ring_entry)
 {
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
index 3f95f7c..88187bf 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
@@ -209,40 +209,44 @@ static int iommu_invalid_ppr_cb(struct pci_dev *pdev, int 
pasid,
pasid,
address,
flags);
 
dev = kfd_device_by_pci_dev(pdev);
BUG_ON(dev == NULL);
 
kfd_signal_iommu_event(dev, pasid, address,
flags & PPR_FAULT_WRITE, flags & PPR_FAULT_EXEC);
 
return AMD_IOMMU_INV_PRI_RSP_INVALID;
 }
 
 bool kgd2kfd_device_init(struct kfd_dev *kfd,
 const struct kgd2kfd_shared_resources *gpu_resources)
 {
unsigned int size;
 
kfd->shared_resources = *gpu_resources;
 
+   /* We only use the first MEC */
+   if (kfd->shared_resources.num_mec > 1)
+   kfd->shared_resources.num_mec = 1;
+
/* calculate max size of mqds needed for queues */
size = max_num_of_queues_per_device *
kfd->device_info->mqd_size_aligned;
 
/*
 * calculate max size of runlist packet.
 * There can be only 2 packets

[PATCH 05/17] drm/amdgpu: unify MQD programming sequence for kfd and amdgpu v2

2017-04-13 Thread Andres Rodriguez
Use the same gfx_*_mqd_commit function for kfd and amdgpu codepaths.

This removes the last duplicates of this programming sequence.

v2: fix cp_hqd_pq_wptr value

Reviewed-by: Edward O'Callaghan 
Acked-by: Christian König 
Reviewed-by: Felix Kuehling 
Signed-off-by: Andres Rodriguez 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c | 51 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c | 49 ++
 drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c | 38 -
 drivers/gpu/drm/amd/amdgpu/gfx_v7_0.h |  5 +++
 drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c | 49 +++---
 drivers/gpu/drm/amd/amdgpu/gfx_v8_0.h |  5 +++
 6 files changed, 97 insertions(+), 100 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c
index 1a0a5f7..038b7ea 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c
@@ -12,40 +12,41 @@
  * all copies or substantial portions of the Software.
  *
  * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
  * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
  * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
  * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
  * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
  * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
  * OTHER DEALINGS IN THE SOFTWARE.
  */
 
 #include 
 #include 
 #include 
 #include 
 #include "amdgpu.h"
 #include "amdgpu_amdkfd.h"
 #include "cikd.h"
 #include "cik_sdma.h"
 #include "amdgpu_ucode.h"
+#include "gfx_v7_0.h"
 #include "gca/gfx_7_2_d.h"
 #include "gca/gfx_7_2_enum.h"
 #include "gca/gfx_7_2_sh_mask.h"
 #include "oss/oss_2_0_d.h"
 #include "oss/oss_2_0_sh_mask.h"
 #include "gmc/gmc_7_1_d.h"
 #include "gmc/gmc_7_1_sh_mask.h"
 #include "cik_structs.h"
 
 #define CIK_PIPE_PER_MEC   (4)
 
 enum {
MAX_TRAPID = 8, /* 3 bits in the bitfield. */
MAX_WATCH_ADDRESSES = 4
 };
 
 enum {
ADDRESS_WATCH_REG_ADDR_HI = 0,
ADDRESS_WATCH_REG_ADDR_LO,
ADDRESS_WATCH_REG_CNTL,
@@ -292,89 +293,45 @@ static inline uint32_t get_sdma_base_addr(struct 
cik_sdma_rlc_registers *m)
 static inline struct cik_mqd *get_mqd(void *mqd)
 {
return (struct cik_mqd *)mqd;
 }
 
 static inline struct cik_sdma_rlc_registers *get_sdma_mqd(void *mqd)
 {
return (struct cik_sdma_rlc_registers *)mqd;
 }
 
 static int kgd_hqd_load(struct kgd_dev *kgd, void *mqd, uint32_t pipe_id,
uint32_t queue_id, uint32_t __user *wptr)
 {
struct amdgpu_device *adev = get_amdgpu_device(kgd);
uint32_t wptr_shadow, is_wptr_shadow_valid;
struct cik_mqd *m;
 
m = get_mqd(mqd);
 
is_wptr_shadow_valid = !get_user(wptr_shadow, wptr);
-
-   acquire_queue(kgd, pipe_id, queue_id);
-   WREG32(mmCP_MQD_BASE_ADDR, m->cp_mqd_base_addr_lo);
-   WREG32(mmCP_MQD_BASE_ADDR_HI, m->cp_mqd_base_addr_hi);
-   WREG32(mmCP_MQD_CONTROL, m->cp_mqd_control);
-
-   WREG32(mmCP_HQD_PQ_BASE, m->cp_hqd_pq_base_lo);
-   WREG32(mmCP_HQD_PQ_BASE_HI, m->cp_hqd_pq_base_hi);
-   WREG32(mmCP_HQD_PQ_CONTROL, m->cp_hqd_pq_control);
-
-   WREG32(mmCP_HQD_IB_CONTROL, m->cp_hqd_ib_control);
-   WREG32(mmCP_HQD_IB_BASE_ADDR, m->cp_hqd_ib_base_addr_lo);
-   WREG32(mmCP_HQD_IB_BASE_ADDR_HI, m->cp_hqd_ib_base_addr_hi);
-
-   WREG32(mmCP_HQD_IB_RPTR, m->cp_hqd_ib_rptr);
-
-   WREG32(mmCP_HQD_PERSISTENT_STATE, m->cp_hqd_persistent_state);
-   WREG32(mmCP_HQD_SEMA_CMD, m->cp_hqd_sema_cmd);
-   WREG32(mmCP_HQD_MSG_TYPE, m->cp_hqd_msg_type);
-
-   WREG32(mmCP_HQD_ATOMIC0_PREOP_LO, m->cp_hqd_atomic0_preop_lo);
-   WREG32(mmCP_HQD_ATOMIC0_PREOP_HI, m->cp_hqd_atomic0_preop_hi);
-   WREG32(mmCP_HQD_ATOMIC1_PREOP_LO, m->cp_hqd_atomic1_preop_lo);
-   WREG32(mmCP_HQD_ATOMIC1_PREOP_HI, m->cp_hqd_atomic1_preop_hi);
-
-   WREG32(mmCP_HQD_PQ_RPTR_REPORT_ADDR, m->cp_hqd_pq_rptr_report_addr_lo);
-   WREG32(mmCP_HQD_PQ_RPTR_REPORT_ADDR_HI,
-   m->cp_hqd_pq_rptr_report_addr_hi);
-
-   WREG32(mmCP_HQD_PQ_RPTR, m->cp_hqd_pq_rptr);
-
-   WREG32(mmCP_HQD_PQ_WPTR_POLL_ADDR, m->cp_hqd_pq_wptr_poll_addr_lo);
-   WREG32(mmCP_HQD_PQ_WPTR_POLL_ADDR_HI, m->cp_hqd_pq_wptr_poll_addr_hi);
-
-   WREG32(mmCP_HQD_PQ_DOORBELL_CONTROL, m->cp_hqd_pq_doorbell_control);
-
-   WREG32(mmCP_HQD_VMID, m->cp_hqd_vmid);
-
-   WREG32(mmCP_HQD_QUANTUM, m->cp_hqd_quantum);
-
-   WREG32(mmCP_HQD_PIPE_PRIORITY, m->cp_hqd_pipe_priority);
-   WREG32(mmCP_HQD_QUEUE_PRIORITY, m->cp_hqd_queue_priority);
-
-   WREG32(mmCP_HQD_IQ_RPTR, m->cp_hqd_iq_rptr);
-
if (is_wptr_shadow_valid)
-   WREG32(mmCP_HQD_PQ_WPTR, wptr_shadow);
+   

[PATCH 17/17] drm/amdgpu: new queue policy, take first 2 queues of each pipe v2

2017-04-13 Thread Andres Rodriguez
Instead of taking the first pipe and giving the rest to kfd, take the
first 2 queues of each pipe.

Effectively, amdgpu and amdkfd own the same number of queues. But
because the queues are spread over multiple pipes the hardware will be
able to better handle concurrent compute workloads.

amdgpu goes from 1 pipe to 4 pipes, i.e. from 1 compute thread to 4
amdkfd goes from 3 pipes to 4 pipes, i.e. from 3 compute threads to 4
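
A stand-alone sketch of the resulting split, assuming one usable MEC with
4 pipes and 8 queues per pipe (both counts are assumptions for the
example):

#include <stdio.h>

#define NUM_PIPE_PER_MEC   4
#define NUM_QUEUE_PER_PIPE 8

int main(void)
{
        int i;

        /* mark the queues amdgpu keeps; everything else goes to amdkfd */
        for (i = 0; i < NUM_PIPE_PER_MEC * NUM_QUEUE_PER_PIPE; i++) {
                int queue = i % NUM_QUEUE_PER_PIPE;
                int pipe  = i / NUM_QUEUE_PER_PIPE;

                if (queue < 2) /* first two queues of each pipe */
                        printf("amdgpu: pipe %d queue %d\n", pipe, queue);
        }
        return 0;
}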

v2: fix policy comment

Reviewed-by: Edward O'Callaghan 
Reviewed-by: Felix Kuehling 
Acked-by: Christian König 
Signed-off-by: Andres Rodriguez 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c | 4 ++--
 drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
index 684f053..c0844a5 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
@@ -2821,42 +2821,42 @@ static void gfx_v7_0_mec_fini(struct amdgpu_device 
*adev)
adev->gfx.mec.hpd_eop_obj = NULL;
}
 }
 
 static void gfx_v7_0_compute_queue_acquire(struct amdgpu_device *adev)
 {
int i, queue, pipe, mec;
 
/* policy for amdgpu compute queue ownership */
for (i = 0; i < AMDGPU_MAX_COMPUTE_QUEUES; ++i) {
queue = i % adev->gfx.mec.num_queue_per_pipe;
pipe = (i / adev->gfx.mec.num_queue_per_pipe)
% adev->gfx.mec.num_pipe_per_mec;
mec = (i / adev->gfx.mec.num_queue_per_pipe)
/ adev->gfx.mec.num_pipe_per_mec;
 
/* we've run out of HW */
if (mec >= adev->gfx.mec.num_mec)
break;
 
-   /* policy: amdgpu owns all queues in the first pipe */
-   if (mec == 0 && pipe == 0)
+   /* policy: amdgpu owns the first two queues of the first MEC */
+   if (mec == 0 && queue < 2)
set_bit(i, adev->gfx.mec.queue_bitmap);
}
 
/* update the number of active compute rings */
adev->gfx.num_compute_rings =
bitmap_weight(adev->gfx.mec.queue_bitmap, 
AMDGPU_MAX_COMPUTE_QUEUES);
 
/* If you hit this case and edited the policy, you probably just
 * need to increase AMDGPU_MAX_COMPUTE_RINGS */
if (WARN_ON(adev->gfx.num_compute_rings > AMDGPU_MAX_COMPUTE_RINGS))
adev->gfx.num_compute_rings = AMDGPU_MAX_COMPUTE_RINGS;
 }
 
 static int gfx_v7_0_mec_init(struct amdgpu_device *adev)
 {
int r;
u32 *hpd;
size_t mec_hpd_size;
 
bitmap_zero(adev->gfx.mec.queue_bitmap, AMDGPU_MAX_COMPUTE_QUEUES);
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
index 2178611..a5ba48b 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
@@ -1439,42 +1439,42 @@ static void gfx_v8_0_kiq_free_ring(struct amdgpu_ring 
*ring,
amdgpu_wb_free(ring->adev, ring->adev->virt.reg_val_offs);
amdgpu_ring_fini(ring);
 }
 
 static void gfx_v8_0_compute_queue_acquire(struct amdgpu_device *adev)
 {
int i, queue, pipe, mec;
 
/* policy for amdgpu compute queue ownership */
for (i = 0; i < AMDGPU_MAX_COMPUTE_QUEUES; ++i) {
queue = i % adev->gfx.mec.num_queue_per_pipe;
pipe = (i / adev->gfx.mec.num_queue_per_pipe)
% adev->gfx.mec.num_pipe_per_mec;
mec = (i / adev->gfx.mec.num_queue_per_pipe)
/ adev->gfx.mec.num_pipe_per_mec;
 
/* we've run out of HW */
if (mec >= adev->gfx.mec.num_mec)
break;
 
-   /* policy: amdgpu owns all queues in the first pipe */
-   if (mec == 0 && pipe == 0)
+   /* policy: amdgpu owns the first two queues of the first MEC */
+   if (mec == 0 && queue < 2)
set_bit(i, adev->gfx.mec.queue_bitmap);
}
 
/* update the number of active compute rings */
adev->gfx.num_compute_rings =
bitmap_weight(adev->gfx.mec.queue_bitmap, 
AMDGPU_MAX_COMPUTE_QUEUES);
 
/* If you hit this case and edited the policy, you probably just
 * need to increase AMDGPU_MAX_COMPUTE_RINGS */
if (WARN_ON(adev->gfx.num_compute_rings > AMDGPU_MAX_COMPUTE_RINGS))
adev->gfx.num_compute_rings = AMDGPU_MAX_COMPUTE_RINGS;
 }
 
 static int gfx_v8_0_mec_init(struct amdgpu_device *adev)
 {
int r;
u32 *hpd;
size_t mec_hpd_size;
 
bitmap_zero(adev->gfx.mec.queue_bitmap, AMDGPU_MAX_COMPUTE_QUEUES);
-- 
2.9.3



[PATCH 14/17] drm/amdgpu: allocate queues horizontally across pipes

2017-04-13 Thread Andres Rodriguez
Pipes provide better concurrency than queues; therefore, we want to make
sure that apps use queues from different pipes whenever possible.

Optimize for the trivial case where an app will consume rings in order;
therefore, we don't want adjacent rings to belong to the same pipe.
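
A stand-alone sketch of the pipe-major enumeration this implies, so that
consecutive ring ids land on different pipes (the pipe/queue counts are
assumptions for the example):

#include <stdio.h>

#define NUM_PIPE_PER_MEC   4
#define NUM_QUEUE_PER_PIPE 8

int main(void)
{
        int ring_id = 0, pipe, queue;

        /* walk queue-outer, pipe-inner: ring 0 -> pipe 0, ring 1 -> pipe 1... */
        for (queue = 0; queue < NUM_QUEUE_PER_PIPE; queue++)
                for (pipe = 0; pipe < NUM_PIPE_PER_MEC; pipe++)
                        printf("ring %2d -> pipe %d queue %d\n",
                               ring_id++, pipe, queue);
        return 0;
}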

Reviewed-by: Edward O'Callaghan 
Acked-by: Felix Kuehling 
Acked-by: Christian König 
Signed-off-by: Andres Rodriguez 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h   | 13 ++
 drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c | 83 +++--
 drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c | 86 +--
 3 files changed, 113 insertions(+), 69 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 61990be..0583396 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1762,40 +1762,53 @@ static inline void amdgpu_ring_write_multiple(struct 
amdgpu_ring *ring, void *sr
ring->count_dw -= count_dw;
}
 }
 
 static inline struct amdgpu_sdma_instance *
 amdgpu_get_sdma_instance(struct amdgpu_ring *ring)
 {
struct amdgpu_device *adev = ring->adev;
int i;
 
for (i = 0; i < adev->sdma.num_instances; i++)
if (&adev->sdma.instance[i].ring == ring)
break;
 
if (i < AMDGPU_MAX_SDMA_INSTANCES)
return &adev->sdma.instance[i];
else
return NULL;
 }
 
+static inline bool amdgpu_is_mec_queue_enabled(struct amdgpu_device *adev,
+   int mec, int pipe, int queue)
+{
+   int bit = 0;
+
+   bit += mec * adev->gfx.mec.num_pipe_per_mec
+   * adev->gfx.mec.num_queue_per_pipe;
+   bit += pipe * adev->gfx.mec.num_queue_per_pipe;
+   bit += queue;
+
+   return test_bit(bit, adev->gfx.mec.queue_bitmap);
+}
+
 /*
  * ASICs macro.
  */
 #define amdgpu_asic_set_vga_state(adev, state) 
(adev)->asic_funcs->set_vga_state((adev), (state))
 #define amdgpu_asic_reset(adev) (adev)->asic_funcs->reset((adev))
 #define amdgpu_asic_get_xclk(adev) (adev)->asic_funcs->get_xclk((adev))
 #define amdgpu_asic_set_uvd_clocks(adev, v, d) 
(adev)->asic_funcs->set_uvd_clocks((adev), (v), (d))
 #define amdgpu_asic_set_vce_clocks(adev, ev, ec) 
(adev)->asic_funcs->set_vce_clocks((adev), (ev), (ec))
 #define amdgpu_get_pcie_lanes(adev) (adev)->asic_funcs->get_pcie_lanes((adev))
 #define amdgpu_set_pcie_lanes(adev, l) 
(adev)->asic_funcs->set_pcie_lanes((adev), (l))
 #define amdgpu_asic_get_gpu_clock_counter(adev) 
(adev)->asic_funcs->get_gpu_clock_counter((adev))
 #define amdgpu_asic_read_disabled_bios(adev) 
(adev)->asic_funcs->read_disabled_bios((adev))
 #define amdgpu_asic_read_bios_from_rom(adev, b, l) 
(adev)->asic_funcs->read_bios_from_rom((adev), (b), (l))
 #define amdgpu_asic_read_register(adev, se, sh, offset, 
v)((adev)->asic_funcs->read_register((adev), (se), (sh), (offset), (v)))
 #define amdgpu_asic_get_config_memsize(adev) 
(adev)->asic_funcs->get_config_memsize((adev))
 #define amdgpu_gart_flush_gpu_tlb(adev, vmid) 
(adev)->gart.gart_funcs->flush_gpu_tlb((adev), (vmid))
 #define amdgpu_gart_set_pte_pde(adev, pt, idx, addr, flags) 
(adev)->gart.gart_funcs->set_pte_pde((adev), (pt), (idx), (addr), (flags))
 #define amdgpu_vm_copy_pte(adev, ib, pe, src, count) 
((adev)->vm_manager.vm_pte_funcs->copy_pte((ib), (pe), (src), (count)))
 #define amdgpu_vm_write_pte(adev, ib, pe, value, count, incr) 
((adev)->vm_manager.vm_pte_funcs->write_pte((ib), (pe), (value), (count), 
(incr)))
 #define amdgpu_vm_set_pte_pde(adev, ib, pe, addr, count, incr, flags) 
((adev)->vm_manager.vm_pte_funcs->set_pte_pde((ib), (pe), (addr), (count), 
(incr), (flags)))
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
index 8969c69..684f053 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
@@ -4733,45 +4733,76 @@ static void gfx_v7_0_gpu_early_init(struct 
amdgpu_device *adev)
adev->gfx.config.num_gpus = 1;
adev->gfx.config.multi_gpu_tile_size = 64;
 
/* fix up row size */
gb_addr_config &= ~GB_ADDR_CONFIG__ROW_SIZE_MASK;
switch (adev->gfx.config.mem_row_size_in_kb) {
case 1:
default:
gb_addr_config |= (0 << GB_ADDR_CONFIG__ROW_SIZE__SHIFT);
break;
case 2:
gb_addr_config |= (1 << GB_ADDR_CONFIG__ROW_SIZE__SHIFT);
break;
case 4:
gb_addr_config |= (2 << GB_ADDR_CONFIG__ROW_SIZE__SHIFT);
break;
}
adev->gfx.config.gb_addr_config = gb_addr_config;
 }
 
+static int gfx_v7_0_compute_ring_init(struct amdgpu_device *adev, int ring_id,
+   int mec, int pipe, int queue)
+{
+   int r;
+   unsigned irq_type;
+   struct amdgpu_ring *ring = &adev->gfx.compute_ring[ring_id];
+
+   

[PATCH 11/17] drm/amdgpu: teach amdgpu how to enable interrupts for any pipe v3

2017-04-13 Thread Andres Rodriguez
The current implementation is hardcoded to enable ME1/PIPE0 interrupts
only.

This patch allows amdgpu to enable interrupts for any pipe of ME1.
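
With that, the per-pipe enable reduces to selecting the pipe and flipping
one register field; a stand-alone model of the bit manipulation (the bit
position below is an assumed value for illustration, not taken from the
register spec):

#include <stdio.h>

/* assumed position of TIME_STAMP_INT_ENABLE, illustration only */
#define TIME_STAMP_INT_ENABLE_MASK (1u << 26)

static unsigned int set_timestamp_int(unsigned int cntl, int enable)
{
        return enable ? (cntl | TIME_STAMP_INT_ENABLE_MASK)
                      : (cntl & ~TIME_STAMP_INT_ENABLE_MASK);
}

int main(void)
{
        unsigned int cntl = 0;

        cntl = set_timestamp_int(cntl, 1);
        printf("enabled:  0x%08x\n", cntl);
        cntl = set_timestamp_int(cntl, 0);
        printf("disabled: 0x%08x\n", cntl);
        return 0;
}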

v2: added gfx9 support
v3: use soc15_grbm_select for gfx9

Acked-by: Felix Kuehling 
Signed-off-by: Andres Rodriguez 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c | 48 -
 drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c | 33 +++
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 50 +++
 3 files changed, 49 insertions(+), 82 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
index 8520b4b..8969c69 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
@@ -5047,76 +5047,62 @@ static void gfx_v7_0_set_gfx_eop_interrupt_state(struct 
amdgpu_device *adev,
switch (state) {
case AMDGPU_IRQ_STATE_DISABLE:
cp_int_cntl = RREG32(mmCP_INT_CNTL_RING0);
cp_int_cntl &= ~CP_INT_CNTL_RING0__TIME_STAMP_INT_ENABLE_MASK;
WREG32(mmCP_INT_CNTL_RING0, cp_int_cntl);
break;
case AMDGPU_IRQ_STATE_ENABLE:
cp_int_cntl = RREG32(mmCP_INT_CNTL_RING0);
cp_int_cntl |= CP_INT_CNTL_RING0__TIME_STAMP_INT_ENABLE_MASK;
WREG32(mmCP_INT_CNTL_RING0, cp_int_cntl);
break;
default:
break;
}
 }
 
 static void gfx_v7_0_set_compute_eop_interrupt_state(struct amdgpu_device 
*adev,
 int me, int pipe,
 enum 
amdgpu_interrupt_state state)
 {
-   u32 mec_int_cntl, mec_int_cntl_reg;
-
-   /*
-* amdgpu controls only pipe 0 of MEC1. That's why this function only
-* handles the setting of interrupts for this specific pipe. All other
-* pipes' interrupts are set by amdkfd.
+   /* Me 0 is for graphics and Me 2 is reserved for HW scheduling
+* So we should only really be configuring ME 1 i.e. MEC0
 */
-
-   if (me == 1) {
-   switch (pipe) {
-   case 0:
-   mec_int_cntl_reg = mmCP_ME1_PIPE0_INT_CNTL;
-   break;
-   default:
-   DRM_DEBUG("invalid pipe %d\n", pipe);
-   return;
-   }
-   } else {
-   DRM_DEBUG("invalid me %d\n", me);
+   if (me != 1) {
+   DRM_ERROR("Ignoring request to enable interrupts for invalid 
me:%d\n", me);
return;
}
 
-   switch (state) {
-   case AMDGPU_IRQ_STATE_DISABLE:
-   mec_int_cntl = RREG32(mec_int_cntl_reg);
-   mec_int_cntl &= ~CP_INT_CNTL_RING0__TIME_STAMP_INT_ENABLE_MASK;
-   WREG32(mec_int_cntl_reg, mec_int_cntl);
-   break;
-   case AMDGPU_IRQ_STATE_ENABLE:
-   mec_int_cntl = RREG32(mec_int_cntl_reg);
-   mec_int_cntl |= CP_INT_CNTL_RING0__TIME_STAMP_INT_ENABLE_MASK;
-   WREG32(mec_int_cntl_reg, mec_int_cntl);
-   break;
-   default:
-   break;
+   if (pipe >= adev->gfx.mec.num_pipe_per_mec) {
+   DRM_ERROR("Ignoring request to enable interrupts for invalid "
+   "me:%d pipe:%d\n", pipe, me);
+   return;
}
+
+   mutex_lock(&adev->srbm_mutex);
+   cik_srbm_select(adev, me, pipe, 0, 0);
+
+   WREG32_FIELD(CPC_INT_CNTL, TIME_STAMP_INT_ENABLE,
+   state == AMDGPU_IRQ_STATE_DISABLE ? 0 : 1);
+
+   cik_srbm_select(adev, 0, 0, 0, 0);
+   mutex_unlock(&adev->srbm_mutex);
 }
 
 static int gfx_v7_0_set_priv_reg_fault_state(struct amdgpu_device *adev,
 struct amdgpu_irq_src *src,
 unsigned type,
 enum amdgpu_interrupt_state state)
 {
u32 cp_int_cntl;
 
switch (state) {
case AMDGPU_IRQ_STATE_DISABLE:
cp_int_cntl = RREG32(mmCP_INT_CNTL_RING0);
cp_int_cntl &= ~CP_INT_CNTL_RING0__PRIV_REG_INT_ENABLE_MASK;
WREG32(mmCP_INT_CNTL_RING0, cp_int_cntl);
break;
case AMDGPU_IRQ_STATE_ENABLE:
cp_int_cntl = RREG32(mmCP_INT_CNTL_RING0);
cp_int_cntl |= CP_INT_CNTL_RING0__PRIV_REG_INT_ENABLE_MASK;
WREG32(mmCP_INT_CNTL_RING0, cp_int_cntl);
break;
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
index fc94e2b..8cc9874 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
@@ -6769,61 +6769,60 @@ static void gfx_v8_0_ring_emit_wreg(struct amdgpu_ring 
*ring, uint32_t reg,
  uint32_t val)
 {
amdgpu_ring_write(ring

[PATCH 13/17] drm/amdgpu: remove duplicate magic constants from amdgpu_amdkfd_gfx*.c

2017-04-13 Thread Andres Rodriguez
This information is already available in adev.

Reviewed-by: Edward O'Callaghan 
Reviewed-by: Felix Kuehling 
Acked-by: Christian König 
Signed-off-by: Andres Rodriguez 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c | 12 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c | 12 ++--
 2 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c
index 910f9d3..5254562 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c
@@ -22,42 +22,40 @@
 
 #include 
 #include 
 #include 
 #include 
 #include "amdgpu.h"
 #include "amdgpu_amdkfd.h"
 #include "cikd.h"
 #include "cik_sdma.h"
 #include "amdgpu_ucode.h"
 #include "gfx_v7_0.h"
 #include "gca/gfx_7_2_d.h"
 #include "gca/gfx_7_2_enum.h"
 #include "gca/gfx_7_2_sh_mask.h"
 #include "oss/oss_2_0_d.h"
 #include "oss/oss_2_0_sh_mask.h"
 #include "gmc/gmc_7_1_d.h"
 #include "gmc/gmc_7_1_sh_mask.h"
 #include "cik_structs.h"
 
-#define CIK_PIPE_PER_MEC   (4)
-
 enum {
MAX_TRAPID = 8, /* 3 bits in the bitfield. */
MAX_WATCH_ADDRESSES = 4
 };
 
 enum {
ADDRESS_WATCH_REG_ADDR_HI = 0,
ADDRESS_WATCH_REG_ADDR_LO,
ADDRESS_WATCH_REG_CNTL,
ADDRESS_WATCH_REG_MAX
 };
 
 /*  not defined in the CI/KV reg file  */
 enum {
ADDRESS_WATCH_REG_CNTL_ATC_BIT = 0x1000UL,
ADDRESS_WATCH_REG_CNTL_DEFAULT_MASK = 0x00FF,
ADDRESS_WATCH_REG_ADDLOW_MASK_EXTENSION = 0x0300,
/* extend the mask to 26 bits to match the low address field */
ADDRESS_WATCH_REG_ADDLOW_SHIFT = 6,
ADDRESS_WATCH_REG_ADDHIGH_MASK = 0x
@@ -169,42 +167,44 @@ static void lock_srbm(struct kgd_dev *kgd, uint32_t mec, 
uint32_t pipe,
uint32_t queue, uint32_t vmid)
 {
struct amdgpu_device *adev = get_amdgpu_device(kgd);
uint32_t value = PIPEID(pipe) | MEID(mec) | VMID(vmid) | QUEUEID(queue);
 
mutex_lock(&adev->srbm_mutex);
WREG32(mmSRBM_GFX_CNTL, value);
 }
 
 static void unlock_srbm(struct kgd_dev *kgd)
 {
struct amdgpu_device *adev = get_amdgpu_device(kgd);
 
WREG32(mmSRBM_GFX_CNTL, 0);
mutex_unlock(&adev->srbm_mutex);
 }
 
 static void acquire_queue(struct kgd_dev *kgd, uint32_t pipe_id,
uint32_t queue_id)
 {
-   uint32_t mec = (++pipe_id / CIK_PIPE_PER_MEC) + 1;
-   uint32_t pipe = (pipe_id % CIK_PIPE_PER_MEC);
+   struct amdgpu_device *adev = get_amdgpu_device(kgd);
+
+   uint32_t mec = (++pipe_id / adev->gfx.mec.num_pipe_per_mec) + 1;
+   uint32_t pipe = (pipe_id % adev->gfx.mec.num_pipe_per_mec);
 
lock_srbm(kgd, mec, pipe, queue_id, 0);
 }
 
 static void release_queue(struct kgd_dev *kgd)
 {
unlock_srbm(kgd);
 }
 
 static void kgd_program_sh_mem_settings(struct kgd_dev *kgd, uint32_t vmid,
uint32_t sh_mem_config,
uint32_t sh_mem_ape1_base,
uint32_t sh_mem_ape1_limit,
uint32_t sh_mem_bases)
 {
struct amdgpu_device *adev = get_amdgpu_device(kgd);
 
lock_srbm(kgd, 0, 0, 0, vmid);
 
WREG32(mmSH_MEM_CONFIG, sh_mem_config);
@@ -237,42 +237,42 @@ static int kgd_set_pasid_vmid_mapping(struct kgd_dev 
*kgd, unsigned int pasid,
 
/* Mapping vmid to pasid also for IH block */
WREG32(mmIH_VMID_0_LUT + vmid, pasid_mapping);
 
return 0;
 }
 
 static int kgd_init_pipeline(struct kgd_dev *kgd, uint32_t pipe_id,
uint32_t hpd_size, uint64_t hpd_gpu_addr)
 {
/* amdgpu owns the per-pipe state */
return 0;
 }
 
 static int kgd_init_interrupts(struct kgd_dev *kgd, uint32_t pipe_id)
 {
struct amdgpu_device *adev = get_amdgpu_device(kgd);
uint32_t mec;
uint32_t pipe;
 
-   mec = (pipe_id / CIK_PIPE_PER_MEC) + 1;
-   pipe = (pipe_id % CIK_PIPE_PER_MEC);
+   mec = (pipe_id / adev->gfx.mec.num_pipe_per_mec) + 1;
+   pipe = (pipe_id % adev->gfx.mec.num_pipe_per_mec);
 
lock_srbm(kgd, mec, pipe, 0, 0);
 
WREG32(mmCPC_INT_CNTL, CP_INT_CNTL_RING0__TIME_STAMP_INT_ENABLE_MASK |
CP_INT_CNTL_RING0__OPCODE_ERROR_INT_ENABLE_MASK);
 
unlock_srbm(kgd);
 
return 0;
 }
 
 static inline uint32_t get_sdma_base_addr(struct cik_sdma_rlc_registers *m)
 {
uint32_t retval;
 
retval = m->sdma_engine_id * SDMA1_REGISTER_OFFSET +
m->sdma_queue_id * KFD_CIK_SDMA_QUEUE_OFFSET;
 
pr_debug("kfd: sdma base address: 0x%x\n", retval);
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c
index 6ba94e9..133d066 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_g

[PATCH 16/17] drm/amdgpu: avoid KIQ clashing with compute or KFD queues v2

2017-04-13 Thread Andres Rodriguez
Instead of picking an arbitrary queue for KIQ, search for one according
to policy. The queue must be unused.

Also report the KIQ as an unavailable resource to KFD.

In testing I ran into KCQ initialization issues when using pipes 2/3 of
MEC2 for the KIQ. Therefore the policy disallows grabbing one of these.
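
A stand-alone sketch of such a search over the flat queue bitmap, skipping
used queues and the disallowed pipes (the topology constants and the used
mask are assumptions for the example):

#include <stdio.h>

#define NUM_MEC            2
#define NUM_PIPE_PER_MEC   4
#define NUM_QUEUE_PER_PIPE 8
#define NUM_QUEUES (NUM_MEC * NUM_PIPE_PER_MEC * NUM_QUEUE_PER_PIPE)

int main(void)
{
        unsigned long long used = 0x3ULL; /* pretend queues 0 and 1 are taken */
        int bit;

        for (bit = 0; bit < NUM_QUEUES; bit++) {
                int queue = bit % NUM_QUEUE_PER_PIPE;
                int pipe  = (bit / NUM_QUEUE_PER_PIPE) % NUM_PIPE_PER_MEC;
                int mec   = (bit / NUM_QUEUE_PER_PIPE) / NUM_PIPE_PER_MEC;

                if (used & (1ULL << bit))
                        continue;       /* queue already owned */
                if (mec == 1 && pipe > 1)
                        continue;       /* pipes 2/3 of MEC2 are disallowed */

                printf("KIQ candidate: mec %d pipe %d queue %d (bit %d)\n",
                       mec, pipe, queue, bit);
                break;
        }
        return 0;
}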

v2: fix (ring.me + 1) to (ring.me - 1) in amdgpu_amdkfd_device_init

Reviewed-by: Felix Kuehling 
Signed-off-by: Andres Rodriguez 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h| 23 +---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c |  8 ++
 drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c  | 43 --
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c  | 42 -
 4 files changed, 98 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 0583396..0a58575 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1762,51 +1762,68 @@ static inline void amdgpu_ring_write_multiple(struct 
amdgpu_ring *ring, void *sr
ring->count_dw -= count_dw;
}
 }
 
 static inline struct amdgpu_sdma_instance *
 amdgpu_get_sdma_instance(struct amdgpu_ring *ring)
 {
struct amdgpu_device *adev = ring->adev;
int i;
 
for (i = 0; i < adev->sdma.num_instances; i++)
if (&adev->sdma.instance[i].ring == ring)
break;
 
if (i < AMDGPU_MAX_SDMA_INSTANCES)
return &adev->sdma.instance[i];
else
return NULL;
 }
 
-static inline bool amdgpu_is_mec_queue_enabled(struct amdgpu_device *adev,
-   int mec, int pipe, int queue)
+static inline int amdgpu_queue_to_bit(struct amdgpu_device *adev,
+ int mec, int pipe, int queue)
 {
int bit = 0;
 
bit += mec * adev->gfx.mec.num_pipe_per_mec
* adev->gfx.mec.num_queue_per_pipe;
bit += pipe * adev->gfx.mec.num_queue_per_pipe;
bit += queue;
 
-   return test_bit(bit, adev->gfx.mec.queue_bitmap);
+   return bit;
+}
+
+static inline void amdgpu_bit_to_queue(struct amdgpu_device *adev, int bit,
+  int *mec, int *pipe, int *queue)
+{
+   *queue = bit % adev->gfx.mec.num_queue_per_pipe;
+   *pipe = (bit / adev->gfx.mec.num_queue_per_pipe)
+   % adev->gfx.mec.num_pipe_per_mec;
+   *mec = (bit / adev->gfx.mec.num_queue_per_pipe)
+  / adev->gfx.mec.num_pipe_per_mec;
+
+}
+static inline bool amdgpu_is_mec_queue_enabled(struct amdgpu_device *adev,
+  int mec, int pipe, int queue)
+{
+   return test_bit(amdgpu_queue_to_bit(adev, mec, pipe, queue),
+   adev->gfx.mec.queue_bitmap);
 }
 
 /*
  * ASICs macro.
  */
 #define amdgpu_asic_set_vga_state(adev, state) 
(adev)->asic_funcs->set_vga_state((adev), (state))
 #define amdgpu_asic_reset(adev) (adev)->asic_funcs->reset((adev))
 #define amdgpu_asic_get_xclk(adev) (adev)->asic_funcs->get_xclk((adev))
 #define amdgpu_asic_set_uvd_clocks(adev, v, d) 
(adev)->asic_funcs->set_uvd_clocks((adev), (v), (d))
 #define amdgpu_asic_set_vce_clocks(adev, ev, ec) 
(adev)->asic_funcs->set_vce_clocks((adev), (ev), (ec))
 #define amdgpu_get_pcie_lanes(adev) (adev)->asic_funcs->get_pcie_lanes((adev))
 #define amdgpu_set_pcie_lanes(adev, l) 
(adev)->asic_funcs->set_pcie_lanes((adev), (l))
 #define amdgpu_asic_get_gpu_clock_counter(adev) 
(adev)->asic_funcs->get_gpu_clock_counter((adev))
 #define amdgpu_asic_read_disabled_bios(adev) 
(adev)->asic_funcs->read_disabled_bios((adev))
 #define amdgpu_asic_read_bios_from_rom(adev, b, l) 
(adev)->asic_funcs->read_bios_from_rom((adev), (b), (l))
 #define amdgpu_asic_read_register(adev, se, sh, offset, 
v)((adev)->asic_funcs->read_register((adev), (se), (sh), (offset), (v)))
 #define amdgpu_asic_get_config_memsize(adev) 
(adev)->asic_funcs->get_config_memsize((adev))
 #define amdgpu_gart_flush_gpu_tlb(adev, vmid) 
(adev)->gart.gart_funcs->flush_gpu_tlb((adev), (vmid))
 #define amdgpu_gart_set_pte_pde(adev, pt, idx, addr, flags) 
(adev)->gart.gart_funcs->set_pte_pde((adev), (pt), (idx), (addr), (flags))
 #define amdgpu_vm_copy_pte(adev, ib, pe, src, count) 
((adev)->vm_manager.vm_pte_funcs->copy_pte((ib), (pe), (src), (count)))
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
index 8fc5aa3..339e8cd 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
@@ -94,40 +94,48 @@ void amdgpu_amdkfd_device_probe(struct amdgpu_device *adev)
 }
 
 void amdgpu_amdkfd_device_init(struct amdgpu_device *adev)
 {
int i;
int last_valid_bit;
if (adev->kfd) {
struct kgd2kfd_shared_resources gpu_resources = {
.compute_vmid_bitmap = 0xFF00,
  

[PATCH 06/17] drm/amdgpu: fix kgd_hqd_load failing to update shadow_wptr

2017-04-13 Thread Andres Rodriguez
The return value from copy_from_user is 0 in the success case.
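
For reference, a small user-space model of the corrected check:
copy_from_user() returns the number of bytes it could not copy, so 0 means
complete success (mock_copy_from_user below is a stand-in, not a real
kernel helper):

#include <stdio.h>
#include <string.h>

/* stand-in for copy_from_user(): returns bytes NOT copied, 0 on success */
static unsigned long mock_copy_from_user(void *dst, const void *src,
                                         unsigned long n)
{
        memcpy(dst, src, n);
        return 0;
}

int main(void)
{
        unsigned int user_wptr = 42, shadow_wptr = 0;

        /* the fixed condition: act only when nothing was left uncopied */
        if (mock_copy_from_user(&shadow_wptr, &user_wptr,
                                sizeof(shadow_wptr)) == 0)
                printf("wptr updated to %u\n", shadow_wptr);
        return 0;
}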

Signed-off-by: Andres Rodriguez 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c
index f9ad534..8af2975 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c
@@ -235,41 +235,41 @@ static inline uint32_t get_sdma_base_addr(struct 
cik_sdma_rlc_registers *m)
 static inline struct vi_mqd *get_mqd(void *mqd)
 {
return (struct vi_mqd *)mqd;
 }
 
 static inline struct cik_sdma_rlc_registers *get_sdma_mqd(void *mqd)
 {
return (struct cik_sdma_rlc_registers *)mqd;
 }
 
 static int kgd_hqd_load(struct kgd_dev *kgd, void *mqd, uint32_t pipe_id,
uint32_t queue_id, uint32_t __user *wptr)
 {
struct vi_mqd *m;
uint32_t shadow_wptr, valid_wptr;
struct amdgpu_device *adev = get_amdgpu_device(kgd);
 
m = get_mqd(mqd);
 
valid_wptr = copy_from_user(&shadow_wptr, wptr, sizeof(shadow_wptr));
-   if (valid_wptr > 0)
+   if (valid_wptr == 0)
m->cp_hqd_pq_wptr = shadow_wptr;
 
acquire_queue(kgd, pipe_id, queue_id);
gfx_v8_0_mqd_commit(adev, mqd);
release_queue(kgd);
 
return 0;
 }
 
 static int kgd_hqd_sdma_load(struct kgd_dev *kgd, void *mqd)
 {
return 0;
 }
 
 static bool kgd_hqd_is_occupied(struct kgd_dev *kgd, uint64_t queue_address,
uint32_t pipe_id, uint32_t queue_id)
 {
struct amdgpu_device *adev = get_amdgpu_device(kgd);
uint32_t act;
bool retval = false;
-- 
2.9.3



[PATCH 07/17] drm/amdgpu: rename rdev to adev

2017-04-13 Thread Andres Rodriguez
Rename straggler instances of r(adeon)dev to a(mdgpu)dev

Reviewed-by: Edward O'Callaghan 
Reviewed-by: Felix Kuehling 
Acked-by: Christian König 
Signed-off-by: Andres Rodriguez 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 70 +++---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h | 14 +++---
 drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c  |  2 +-
 drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c  |  2 +-
 4 files changed, 44 insertions(+), 44 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
index dba8a5b..3200ff9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
@@ -43,204 +43,204 @@ int amdgpu_amdkfd_init(void)
return -ENOENT;
 
ret = kgd2kfd_init_p(KFD_INTERFACE_VERSION, &kgd2kfd);
if (ret) {
symbol_put(kgd2kfd_init);
kgd2kfd = NULL;
}
 
 #elif defined(CONFIG_HSA_AMD)
ret = kgd2kfd_init(KFD_INTERFACE_VERSION, &kgd2kfd);
if (ret)
kgd2kfd = NULL;
 
 #else
ret = -ENOENT;
 #endif
 
return ret;
 }
 
-bool amdgpu_amdkfd_load_interface(struct amdgpu_device *rdev)
+bool amdgpu_amdkfd_load_interface(struct amdgpu_device *adev)
 {
-   switch (rdev->asic_type) {
+   switch (adev->asic_type) {
 #ifdef CONFIG_DRM_AMDGPU_CIK
case CHIP_KAVERI:
kfd2kgd = amdgpu_amdkfd_gfx_7_get_functions();
break;
 #endif
case CHIP_CARRIZO:
kfd2kgd = amdgpu_amdkfd_gfx_8_0_get_functions();
break;
default:
return false;
}
 
return true;
 }
 
 void amdgpu_amdkfd_fini(void)
 {
if (kgd2kfd) {
kgd2kfd->exit();
symbol_put(kgd2kfd_init);
}
 }
 
-void amdgpu_amdkfd_device_probe(struct amdgpu_device *rdev)
+void amdgpu_amdkfd_device_probe(struct amdgpu_device *adev)
 {
if (kgd2kfd)
-   rdev->kfd = kgd2kfd->probe((struct kgd_dev *)rdev,
-   rdev->pdev, kfd2kgd);
+   adev->kfd = kgd2kfd->probe((struct kgd_dev *)adev,
+   adev->pdev, kfd2kgd);
 }
 
-void amdgpu_amdkfd_device_init(struct amdgpu_device *rdev)
+void amdgpu_amdkfd_device_init(struct amdgpu_device *adev)
 {
-   if (rdev->kfd) {
+   if (adev->kfd) {
struct kgd2kfd_shared_resources gpu_resources = {
.compute_vmid_bitmap = 0xFF00,
 
.first_compute_pipe = 1,
.compute_pipe_count = 4 - 1,
};
 
-   amdgpu_doorbell_get_kfd_info(rdev,
+   amdgpu_doorbell_get_kfd_info(adev,
&gpu_resources.doorbell_physical_address,
&gpu_resources.doorbell_aperture_size,
&gpu_resources.doorbell_start_offset);
 
-   kgd2kfd->device_init(rdev->kfd, &gpu_resources);
+   kgd2kfd->device_init(adev->kfd, &gpu_resources);
}
 }
 
-void amdgpu_amdkfd_device_fini(struct amdgpu_device *rdev)
+void amdgpu_amdkfd_device_fini(struct amdgpu_device *adev)
 {
-   if (rdev->kfd) {
-   kgd2kfd->device_exit(rdev->kfd);
-   rdev->kfd = NULL;
+   if (adev->kfd) {
+   kgd2kfd->device_exit(adev->kfd);
+   adev->kfd = NULL;
}
 }
 
-void amdgpu_amdkfd_interrupt(struct amdgpu_device *rdev,
+void amdgpu_amdkfd_interrupt(struct amdgpu_device *adev,
const void *ih_ring_entry)
 {
-   if (rdev->kfd)
-   kgd2kfd->interrupt(rdev->kfd, ih_ring_entry);
+   if (adev->kfd)
+   kgd2kfd->interrupt(adev->kfd, ih_ring_entry);
 }
 
-void amdgpu_amdkfd_suspend(struct amdgpu_device *rdev)
+void amdgpu_amdkfd_suspend(struct amdgpu_device *adev)
 {
-   if (rdev->kfd)
-   kgd2kfd->suspend(rdev->kfd);
+   if (adev->kfd)
+   kgd2kfd->suspend(adev->kfd);
 }
 
-int amdgpu_amdkfd_resume(struct amdgpu_device *rdev)
+int amdgpu_amdkfd_resume(struct amdgpu_device *adev)
 {
int r = 0;
 
-   if (rdev->kfd)
-   r = kgd2kfd->resume(rdev->kfd);
+   if (adev->kfd)
+   r = kgd2kfd->resume(adev->kfd);
 
return r;
 }
 
 int alloc_gtt_mem(struct kgd_dev *kgd, size_t size,
void **mem_obj, uint64_t *gpu_addr,
void **cpu_ptr)
 {
-   struct amdgpu_device *rdev = (struct amdgpu_device *)kgd;
+   struct amdgpu_device *adev = (struct amdgpu_device *)kgd;
struct kgd_mem **mem = (struct kgd_mem **) mem_obj;
int r;
 
BUG_ON(kgd == NULL);
BUG_ON(gpu_addr == NULL);
BUG_ON(cpu_ptr == NULL);
 
*mem = kmalloc(sizeof(struct kgd_mem), GFP_KERNEL);
if ((*mem) == NULL)
return -ENOMEM;
 
-   r =

[PATCH 08/17] drm/radeon: take ownership of pipe initialization

2017-04-13 Thread Andres Rodriguez
Take ownership of pipe initialization away from KFD.

Note that hpd_eop_gpu_addr was already large enough to accommodate all
pipes.
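
The per-pipe carving of that one buffer is plain offset arithmetic; a
stand-alone sketch, assuming the CIK MEC_HPD_SIZE of 2048 bytes:

#include <stdio.h>

#define MEC_HPD_SIZE 2048 /* assumed, matches the CIK value */
#define NUM_PIPE     4

int main(void)
{
        unsigned long long base = 0x100000ULL; /* pretend GPU address */
        int i;

        /* each pipe gets its own EOP slot inside the single allocation */
        for (i = 0; i < NUM_PIPE; i++)
                printf("pipe %d: eop_gpu_addr = 0x%llx\n",
                       i, base + (unsigned long long)i * MEC_HPD_SIZE * 2);
        return 0;
}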

Reviewed-by: Edward O'Callaghan 
Reviewed-by: Felix Kuehling 
Acked-by: Christian König 
Signed-off-by: Andres Rodriguez 
---
 drivers/gpu/drm/radeon/cik.c| 27 ++-
 drivers/gpu/drm/radeon/radeon_kfd.c | 13 +
 2 files changed, 15 insertions(+), 25 deletions(-)

diff --git a/drivers/gpu/drm/radeon/cik.c b/drivers/gpu/drm/radeon/cik.c
index 53710dd..3d084c2 100644
--- a/drivers/gpu/drm/radeon/cik.c
+++ b/drivers/gpu/drm/radeon/cik.c
@@ -4563,57 +4563,58 @@ static int cik_cp_compute_resume(struct radeon_device 
*rdev)
bool use_doorbell = true;
u64 hqd_gpu_addr;
u64 mqd_gpu_addr;
u64 eop_gpu_addr;
u64 wb_gpu_addr;
u32 *buf;
struct bonaire_mqd *mqd;
 
r = cik_cp_compute_start(rdev);
if (r)
return r;
 
/* fix up chicken bits */
tmp = RREG32(CP_CPF_DEBUG);
tmp |= (1 << 23);
WREG32(CP_CPF_DEBUG, tmp);
 
/* init the pipes */
mutex_lock(&rdev->srbm_mutex);
 
-   eop_gpu_addr = rdev->mec.hpd_eop_gpu_addr;
+   for (i = 0; i < rdev->mec.num_pipe; ++i) {
+   cik_srbm_select(rdev, 0, i, 0, 0);
 
-   cik_srbm_select(rdev, 0, 0, 0, 0);
-
-   /* write the EOP addr */
-   WREG32(CP_HPD_EOP_BASE_ADDR, eop_gpu_addr >> 8);
-   WREG32(CP_HPD_EOP_BASE_ADDR_HI, upper_32_bits(eop_gpu_addr) >> 8);
+   eop_gpu_addr = rdev->mec.hpd_eop_gpu_addr + (i * MEC_HPD_SIZE * 2);
+   /* write the EOP addr */
+   WREG32(CP_HPD_EOP_BASE_ADDR, eop_gpu_addr >> 8);
+   WREG32(CP_HPD_EOP_BASE_ADDR_HI, upper_32_bits(eop_gpu_addr) >> 
8);
 
-   /* set the VMID assigned */
-   WREG32(CP_HPD_EOP_VMID, 0);
+   /* set the VMID assigned */
+   WREG32(CP_HPD_EOP_VMID, 0);
 
-   /* set the EOP size, register value is 2^(EOP_SIZE+1) dwords */
-   tmp = RREG32(CP_HPD_EOP_CONTROL);
-   tmp &= ~EOP_SIZE_MASK;
-   tmp |= order_base_2(MEC_HPD_SIZE / 8);
-   WREG32(CP_HPD_EOP_CONTROL, tmp);
+   /* set the EOP size, register value is 2^(EOP_SIZE+1) dwords */
+   tmp = RREG32(CP_HPD_EOP_CONTROL);
+   tmp &= ~EOP_SIZE_MASK;
+   tmp |= order_base_2(MEC_HPD_SIZE / 8);
+   WREG32(CP_HPD_EOP_CONTROL, tmp);
 
+   }
mutex_unlock(&rdev->srbm_mutex);
 
/* init the queues.  Just two for now. */
for (i = 0; i < 2; i++) {
if (i == 0)
idx = CAYMAN_RING_TYPE_CP1_INDEX;
else
idx = CAYMAN_RING_TYPE_CP2_INDEX;
 
if (rdev->ring[idx].mqd_obj == NULL) {
r = radeon_bo_create(rdev,
 sizeof(struct bonaire_mqd),
 PAGE_SIZE, true,
 RADEON_GEM_DOMAIN_GTT, 0, NULL,
 NULL, &rdev->ring[idx].mqd_obj);
if (r) {
dev_warn(rdev->dev, "(%d) create MQD bo 
failed\n", r);
return r;
}
}
diff --git a/drivers/gpu/drm/radeon/radeon_kfd.c 
b/drivers/gpu/drm/radeon/radeon_kfd.c
index 87a9ebb..a06e3b1 100644
--- a/drivers/gpu/drm/radeon/radeon_kfd.c
+++ b/drivers/gpu/drm/radeon/radeon_kfd.c
@@ -406,52 +406,41 @@ static int kgd_set_pasid_vmid_mapping(struct kgd_dev 
*kgd, unsigned int pasid,
ATC_VMID_PASID_MAPPING_VALID_MASK;
 
write_register(kgd, ATC_VMID0_PASID_MAPPING + vmid*sizeof(uint32_t),
pasid_mapping);
 
while (!(read_register(kgd, ATC_VMID_PASID_MAPPING_UPDATE_STATUS) &
(1U << vmid)))
cpu_relax();
write_register(kgd, ATC_VMID_PASID_MAPPING_UPDATE_STATUS, 1U << vmid);
 
/* Mapping vmid to pasid also for IH block */
write_register(kgd, IH_VMID_0_LUT + vmid * sizeof(uint32_t),
pasid_mapping);
 
return 0;
 }
 
 static int kgd_init_pipeline(struct kgd_dev *kgd, uint32_t pipe_id,
uint32_t hpd_size, uint64_t hpd_gpu_addr)
 {
-   uint32_t mec = (pipe_id / CIK_PIPE_PER_MEC) + 1;
-   uint32_t pipe = (pipe_id % CIK_PIPE_PER_MEC);
-
-   lock_srbm(kgd, mec, pipe, 0, 0);
-   write_register(kgd, CP_HPD_EOP_BASE_ADDR,
-   lower_32_bits(hpd_gpu_addr >> 8));
-   write_register(kgd, CP_HPD_EOP_BASE_ADDR_HI,
-   upper_32_bits(hpd_gpu_addr >> 8));
-   write_register(kgd, CP_HPD_EOP_VMID, 0);
-   write_register(kgd, CP_HPD_EOP_CONTROL, hpd_size);
-   unlock_srbm(kgd);

[PATCH 09/17] drm/amdgpu: take ownership of per-pipe configuration v2

2017-04-13 Thread Andres Rodriguez
Make amdgpu the owner of all per-pipe state of the HQDs.

This change will allow us to split the queues between kfd and amdgpu
with queue granularity instead of pipe granularity.

This patch fixes kfd allocating an HPD EOP region for its 3 pipes which
went unused.

v2: support for gfx9

Reviewed-by: Edward O'Callaghan 
Reviewed-by: Felix Kuehling 
Acked-by: Christian König 
Signed-off-by: Andres Rodriguez 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h|  4 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c  | 13 +--
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c  |  1 +
 drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c  | 28 ++
 drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c  | 33 +++-
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c  | 24 
 .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.c  | 45 --
 7 files changed, 65 insertions(+), 83 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 3abd2dc..6b294d2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -880,43 +880,43 @@ struct amdgpu_rlc {
/* for firmware data */
u32 save_and_restore_offset;
u32 clear_state_descriptor_offset;
u32 avail_scratch_ram_locations;
u32 reg_restore_list_size;
u32 reg_list_format_start;
u32 reg_list_format_separate_start;
u32 starting_offsets_start;
u32 reg_list_format_size_bytes;
u32 reg_list_size_bytes;
 
u32 *register_list_format;
u32 *register_restore;
 };
 
 struct amdgpu_mec {
struct amdgpu_bo*hpd_eop_obj;
u64 hpd_eop_gpu_addr;
struct amdgpu_bo*mec_fw_obj;
u64 mec_fw_gpu_addr;
-   u32 num_pipe;
u32 num_mec;
-   u32 num_queue;
+   u32 num_pipe_per_mec;
+   u32 num_queue_per_pipe;
void*mqd_backup[AMDGPU_MAX_COMPUTE_RINGS + 1];
 };
 
 struct amdgpu_kiq {
u64 eop_gpu_addr;
struct amdgpu_bo*eop_obj;
struct amdgpu_ring  ring;
struct amdgpu_irq_src   irq;
 };
 
 /*
  * GPU scratch registers structures, functions & helpers
  */
 struct amdgpu_scratch {
unsignednum_reg;
uint32_treg_base;
uint32_tfree_mask;
 };
 
 /*
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c
index 038b7ea..910f9d3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c
@@ -227,52 +227,41 @@ static int kgd_set_pasid_vmid_mapping(struct kgd_dev 
*kgd, unsigned int pasid,
 * SW cleared it. So the protocol is to always wait & clear.
 */
uint32_t pasid_mapping = (pasid == 0) ? 0 : (uint32_t)pasid |
ATC_VMID0_PASID_MAPPING__VALID_MASK;
 
WREG32(mmATC_VMID0_PASID_MAPPING + vmid, pasid_mapping);
 
while (!(RREG32(mmATC_VMID_PASID_MAPPING_UPDATE_STATUS) & (1U << vmid)))
cpu_relax();
WREG32(mmATC_VMID_PASID_MAPPING_UPDATE_STATUS, 1U << vmid);
 
/* Mapping vmid to pasid also for IH block */
WREG32(mmIH_VMID_0_LUT + vmid, pasid_mapping);
 
return 0;
 }
 
 static int kgd_init_pipeline(struct kgd_dev *kgd, uint32_t pipe_id,
uint32_t hpd_size, uint64_t hpd_gpu_addr)
 {
-   struct amdgpu_device *adev = get_amdgpu_device(kgd);
-
-   uint32_t mec = (++pipe_id / CIK_PIPE_PER_MEC) + 1;
-   uint32_t pipe = (pipe_id % CIK_PIPE_PER_MEC);
-
-   lock_srbm(kgd, mec, pipe, 0, 0);
-   WREG32(mmCP_HPD_EOP_BASE_ADDR, lower_32_bits(hpd_gpu_addr >> 8));
-   WREG32(mmCP_HPD_EOP_BASE_ADDR_HI, upper_32_bits(hpd_gpu_addr >> 8));
-   WREG32(mmCP_HPD_EOP_VMID, 0);
-   WREG32(mmCP_HPD_EOP_CONTROL, hpd_size);
-   unlock_srbm(kgd);
-
+   /* amdgpu owns the per-pipe state */
return 0;
 }
 
 static int kgd_init_interrupts(struct kgd_dev *kgd, uint32_t pipe_id)
 {
struct amdgpu_device *adev = get_amdgpu_device(kgd);
uint32_t mec;
uint32_t pipe;
 
mec = (pipe_id / CIK_PIPE_PER_MEC) + 1;
pipe = (pipe_id % CIK_PIPE_PER_MEC);
 
lock_srbm(kgd, mec, pipe, 0, 0);
 
WREG32(mmCPC_INT_CNTL, CP_INT_CNTL_RING0__TIME_STAMP_INT_ENABLE_MASK |
CP_INT_CNTL_RING0__OPCODE_ERROR_INT_ENABLE_MASK);
 
unlock_srbm(kgd);
 
return 0;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c
index 8af2975..6ba94e9 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c
@@ -189,40 +189,41 @@ static int kgd_set_pasid_vmid_mapping(struct kgd_dev 
*kgd, unsi

[PATCH 02/17] drm/amdgpu: refactor MQD/HQD initialization v3

2017-04-13 Thread Andres Rodriguez
The MQD programming sequence currently exists in 3 different places.
Refactor it to absorb all the duplicates.

The success path remains mostly identical except for a slightly
different order in the non-kiq case. This shouldn't matter if the HQD
is disabled.

The error handling paths have been updated to deal with the new code
structure.
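
Schematically, both the kiq and non-kiq paths now run the same ordered
sequence; a stand-alone sketch of that call order (the stubs below are
placeholders, not the real helpers):

#include <stdio.h>

/* stubs standing in for the shared helpers; the ordering is the point */
static void mqd_init(void)       { puts("program the MQD image"); }
static int  mqd_deactivate(void) { puts("drain/disable the HQD"); return 0; }
static void mqd_commit(void)     { puts("write the MQD into the HQD regs"); }

int main(void)
{
        mqd_init();
        if (mqd_deactivate())
                return 1; /* bail out before touching the HQD */
        mqd_commit();
        return 0;
}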

v2: the non-kiq path for gfxv8 was dropped in the rebase
v3: split MEC_HPD_SIZE rename, dropped doorbell changes

Reviewed-by: Edward O'Callaghan 
Acked-by: Christian König 
Acked-by: Felix Kuehling 
Signed-off-by: Andres Rodriguez 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c | 439 ++
 drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c |  78 +++---
 2 files changed, 271 insertions(+), 246 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
index 3b98162..4e6a60c 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
@@ -2927,281 +2927,316 @@ struct bonaire_mqd
u32 perf_counter_enable;
u32 pgm[2];
u32 tba[2];
u32 tma[2];
u32 pgm_rsrc[2];
u32 vmid;
u32 resource_limits;
u32 static_thread_mgmt01[2];
u32 tmp_ring_size;
u32 static_thread_mgmt23[2];
u32 restart[3];
u32 thread_trace_enable;
u32 reserved1;
u32 user_data[16];
u32 vgtcs_invoke_count[2];
struct hqd_registers queue_state;
u32 dequeue_cntr;
u32 interrupt_queue[64];
 };
 
-/**
- * gfx_v7_0_cp_compute_resume - setup the compute queue registers
- *
- * @adev: amdgpu_device pointer
- *
- * Program the compute queues and test them to make sure they
- * are working.
- * Returns 0 for success, error for failure.
- */
-static int gfx_v7_0_cp_compute_resume(struct amdgpu_device *adev)
+static void gfx_v7_0_compute_pipe_init(struct amdgpu_device *adev, int me, int 
pipe)
 {
-   int r, i, j;
-   u32 tmp;
-   bool use_doorbell = true;
-   u64 hqd_gpu_addr;
-   u64 mqd_gpu_addr;
u64 eop_gpu_addr;
-   u64 wb_gpu_addr;
-   u32 *buf;
-   struct bonaire_mqd *mqd;
-   struct amdgpu_ring *ring;
-
-   /* fix up chicken bits */
-   tmp = RREG32(mmCP_CPF_DEBUG);
-   tmp |= (1 << 23);
-   WREG32(mmCP_CPF_DEBUG, tmp);
+   u32 tmp;
+   size_t eop_offset = me * pipe * GFX7_MEC_HPD_SIZE * 2;
 
-   /* init the pipes */
mutex_lock(&adev->srbm_mutex);
-   for (i = 0; i < (adev->gfx.mec.num_pipe * adev->gfx.mec.num_mec); i++) {
-   int me = (i < 4) ? 1 : 2;
-   int pipe = (i < 4) ? i : (i - 4);
+   eop_gpu_addr = adev->gfx.mec.hpd_eop_gpu_addr + eop_offset;
 
-   eop_gpu_addr = adev->gfx.mec.hpd_eop_gpu_addr + (i * 
GFX7_MEC_HPD_SIZE * 2);
+   cik_srbm_select(adev, me, pipe, 0, 0);
 
-   cik_srbm_select(adev, me, pipe, 0, 0);
+   /* write the EOP addr */
+   WREG32(mmCP_HPD_EOP_BASE_ADDR, eop_gpu_addr >> 8);
+   WREG32(mmCP_HPD_EOP_BASE_ADDR_HI, upper_32_bits(eop_gpu_addr) >> 8);
 
-   /* write the EOP addr */
-   WREG32(mmCP_HPD_EOP_BASE_ADDR, eop_gpu_addr >> 8);
-   WREG32(mmCP_HPD_EOP_BASE_ADDR_HI, upper_32_bits(eop_gpu_addr) 
>> 8);
+   /* set the VMID assigned */
+   WREG32(mmCP_HPD_EOP_VMID, 0);
 
-   /* set the VMID assigned */
-   WREG32(mmCP_HPD_EOP_VMID, 0);
+   /* set the EOP size, register value is 2^(EOP_SIZE+1) dwords */
+   tmp = RREG32(mmCP_HPD_EOP_CONTROL);
+   tmp &= ~CP_HPD_EOP_CONTROL__EOP_SIZE_MASK;
+   tmp |= order_base_2(GFX7_MEC_HPD_SIZE / 8);
+   WREG32(mmCP_HPD_EOP_CONTROL, tmp);
 
-   /* set the EOP size, register value is 2^(EOP_SIZE+1) dwords */
-   tmp = RREG32(mmCP_HPD_EOP_CONTROL);
-   tmp &= ~CP_HPD_EOP_CONTROL__EOP_SIZE_MASK;
-   tmp |= order_base_2(GFX7_MEC_HPD_SIZE / 8);
-   WREG32(mmCP_HPD_EOP_CONTROL, tmp);
-   }
cik_srbm_select(adev, 0, 0, 0, 0);
mutex_unlock(&adev->srbm_mutex);
+}
 
-   /* init the queues.  Just two for now. */
-   for (i = 0; i < adev->gfx.num_compute_rings; i++) {
-   ring = &adev->gfx.compute_ring[i];
+static int gfx_v7_0_mqd_deactivate(struct amdgpu_device *adev)
+{
+   int i;
 
-   if (ring->mqd_obj == NULL) {
-   r = amdgpu_bo_create(adev,
-sizeof(struct bonaire_mqd),
-PAGE_SIZE, true,
-AMDGPU_GEM_DOMAIN_GTT, 0, NULL, 
NULL,
-&ring->mqd_obj);
-   if (r) {
-   dev_warn(adev->dev, "(%d) create MQD bo 
failed\n", r);
-   return r;
-   }
+   /* disable the queue if it's active */
+   if (RREG32(mm

[PATCH 03/17] drm/amdgpu: detect timeout error when deactivating hqd

2017-04-13 Thread Andres Rodriguez
Handle HQD deactivation timeouts instead of ignoring them.
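
The pattern behind the fix is a bounded poll that now propagates
-ETIMEDOUT instead of silently falling through; a stand-alone sketch (the
timeout value is an assumption):

#include <errno.h>
#include <stdio.h>

/* poll until the active bit clears, mirroring the deactivate loop */
static int wait_for_inactive(volatile unsigned int *active_reg,
                             int usec_timeout)
{
        int i;

        for (i = 0; i < usec_timeout; i++) {
                if (!(*active_reg & 1))
                        return 0;
                /* the kernel code udelay(1)s between reads */
        }
        return -ETIMEDOUT;
}

int main(void)
{
        unsigned int reg = 0; /* already inactive */

        printf("result = %d\n", wait_for_inactive(&reg, 100000));
        return 0;
}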

Reviewed-by: Edward O'Callaghan 
Acked-by: Christian König 
Acked-by: Felix Kuehling 
Signed-off-by: Andres Rodriguez 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c | 20 +---
 1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
index b670302..cd1af26 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
@@ -4947,75 +4947,89 @@ static int gfx_v8_0_mqd_commit(struct amdgpu_ring *ring)
 
/* enable the doorbell if requested */
WREG32(mmCP_HQD_PQ_DOORBELL_CONTROL, mqd->cp_hqd_pq_doorbell_control);
 
/* reset read and write pointers, similar to CP_RB0_WPTR/_RPTR */
WREG32(mmCP_HQD_PQ_WPTR, mqd->cp_hqd_pq_wptr);
 
/* set the vmid for the queue */
WREG32(mmCP_HQD_VMID, mqd->cp_hqd_vmid);
 
WREG32(mmCP_HQD_PERSISTENT_STATE, mqd->cp_hqd_persistent_state);
 
/* activate the queue */
WREG32(mmCP_HQD_ACTIVE, mqd->cp_hqd_active);
 
return 0;
 }
 
 static int gfx_v8_0_kiq_init_queue(struct amdgpu_ring *ring)
 {
+   int r = 0;
struct amdgpu_device *adev = ring->adev;
struct vi_mqd *mqd = ring->mqd_ptr;
int mqd_idx = AMDGPU_MAX_COMPUTE_RINGS;
 
gfx_v8_0_kiq_setting(ring);
 
if (adev->gfx.in_reset) { /* for GPU_RESET case */
/* reset MQD to a clean status */
if (adev->gfx.mec.mqd_backup[mqd_idx])
memcpy(mqd, adev->gfx.mec.mqd_backup[mqd_idx], 
sizeof(*mqd));
 
/* reset ring buffer */
ring->wptr = 0;
amdgpu_ring_clear_ring(ring);
 
mutex_lock(&adev->srbm_mutex);
vi_srbm_select(adev, ring->me, ring->pipe, ring->queue, 0);
-   gfx_v8_0_deactivate_hqd(adev, 1);
+   r = gfx_v8_0_deactivate_hqd(adev, 1);
+   if (r) {
+   dev_err(adev->dev, "failed to deactivate ring %s\n", 
ring->name);
+   goto out_unlock;
+   }
gfx_v8_0_mqd_commit(ring);
vi_srbm_select(adev, 0, 0, 0, 0);
mutex_unlock(&adev->srbm_mutex);
} else {
mutex_lock(&adev->srbm_mutex);
vi_srbm_select(adev, ring->me, ring->pipe, ring->queue, 0);
gfx_v8_0_mqd_init(ring);
-   gfx_v8_0_deactivate_hqd(adev, 1);
+   r = gfx_v8_0_deactivate_hqd(adev, 1);
+   if (r) {
+   dev_err(adev->dev, "failed to deactivate ring %s\n", 
ring->name);
+   goto out_unlock;
+   }
gfx_v8_0_mqd_commit(ring);
vi_srbm_select(adev, 0, 0, 0, 0);
mutex_unlock(&adev->srbm_mutex);
 
if (adev->gfx.mec.mqd_backup[mqd_idx])
memcpy(adev->gfx.mec.mqd_backup[mqd_idx], mqd, 
sizeof(*mqd));
}
 
-   return 0;
+   return r;
+
+out_unlock:
+   vi_srbm_select(adev, 0, 0, 0, 0);
+   mutex_unlock(&adev->srbm_mutex);
+   return r;
 }
 
 static int gfx_v8_0_kcq_init_queue(struct amdgpu_ring *ring)
 {
struct amdgpu_device *adev = ring->adev;
struct vi_mqd *mqd = ring->mqd_ptr;
int mqd_idx = ring - &adev->gfx.compute_ring[0];
 
if (!adev->gfx.in_reset && !adev->gfx.in_suspend) {
mutex_lock(&adev->srbm_mutex);
vi_srbm_select(adev, ring->me, ring->pipe, ring->queue, 0);
gfx_v8_0_mqd_init(ring);
vi_srbm_select(adev, 0, 0, 0, 0);
mutex_unlock(&adev->srbm_mutex);
 
if (adev->gfx.mec.mqd_backup[mqd_idx])
memcpy(adev->gfx.mec.mqd_backup[mqd_idx], mqd, 
sizeof(*mqd));
} else if (adev->gfx.in_reset) { /* for GPU_RESET case */
/* reset MQD to a clean status */
if (adev->gfx.mec.mqd_backup[mqd_idx])
-- 
2.9.3



[PATCH 04/17] drm/amdgpu: remove duplicate definition of cik_mqd

2017-04-13 Thread Andres Rodriguez
The gfxv7 code contains a slightly different version of cik_mqd called
bonaire_mqd. This can introduce subtle bugs if fixes are not applied in
both places.

Reviewed-by: Edward O'Callaghan 
Acked-by: Christian König 
Acked-by: Felix Kuehling 
Signed-off-by: Andres Rodriguez 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c | 135 ++
 1 file changed, 54 insertions(+), 81 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
index 4e6a60c..c408af5 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
@@ -10,40 +10,41 @@
  *
  * The above copyright notice and this permission notice shall be included in
  * all copies or substantial portions of the Software.
  *
  * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
  * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
  * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
  * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
  * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
  * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
  * OTHER DEALINGS IN THE SOFTWARE.
  *
  */
 #include 
 #include "drmP.h"
 #include "amdgpu.h"
 #include "amdgpu_ih.h"
 #include "amdgpu_gfx.h"
 #include "cikd.h"
 #include "cik.h"
+#include "cik_structs.h"
 #include "atom.h"
 #include "amdgpu_ucode.h"
 #include "clearstate_ci.h"
 
 #include "dce/dce_8_0_d.h"
 #include "dce/dce_8_0_sh_mask.h"
 
 #include "bif/bif_4_1_d.h"
 #include "bif/bif_4_1_sh_mask.h"
 
 #include "gca/gfx_7_0_d.h"
 #include "gca/gfx_7_2_enum.h"
 #include "gca/gfx_7_2_sh_mask.h"
 
 #include "gmc/gmc_7_0_d.h"
 #include "gmc/gmc_7_0_sh_mask.h"
 
 #include "oss/oss_2_0_d.h"
 #include "oss/oss_2_0_sh_mask.h"
 
@@ -2899,68 +2900,40 @@ struct hqd_registers
u32 cp_hqd_pq_control;
u32 cp_hqd_ib_base_addr;
u32 cp_hqd_ib_base_addr_hi;
u32 cp_hqd_ib_rptr;
u32 cp_hqd_ib_control;
u32 cp_hqd_iq_timer;
u32 cp_hqd_iq_rptr;
u32 cp_hqd_dequeue_request;
u32 cp_hqd_dma_offload;
u32 cp_hqd_sema_cmd;
u32 cp_hqd_msg_type;
u32 cp_hqd_atomic0_preop_lo;
u32 cp_hqd_atomic0_preop_hi;
u32 cp_hqd_atomic1_preop_lo;
u32 cp_hqd_atomic1_preop_hi;
u32 cp_hqd_hq_scheduler0;
u32 cp_hqd_hq_scheduler1;
u32 cp_mqd_control;
 };
 
-struct bonaire_mqd
-{
-   u32 header;
-   u32 dispatch_initiator;
-   u32 dimensions[3];
-   u32 start_idx[3];
-   u32 num_threads[3];
-   u32 pipeline_stat_enable;
-   u32 perf_counter_enable;
-   u32 pgm[2];
-   u32 tba[2];
-   u32 tma[2];
-   u32 pgm_rsrc[2];
-   u32 vmid;
-   u32 resource_limits;
-   u32 static_thread_mgmt01[2];
-   u32 tmp_ring_size;
-   u32 static_thread_mgmt23[2];
-   u32 restart[3];
-   u32 thread_trace_enable;
-   u32 reserved1;
-   u32 user_data[16];
-   u32 vgtcs_invoke_count[2];
-   struct hqd_registers queue_state;
-   u32 dequeue_cntr;
-   u32 interrupt_queue[64];
-};
-
 static void gfx_v7_0_compute_pipe_init(struct amdgpu_device *adev, int me, int 
pipe)
 {
u64 eop_gpu_addr;
u32 tmp;
size_t eop_offset = me * pipe * GFX7_MEC_HPD_SIZE * 2;
 
mutex_lock(&adev->srbm_mutex);
eop_gpu_addr = adev->gfx.mec.hpd_eop_gpu_addr + eop_offset;
 
cik_srbm_select(adev, me, pipe, 0, 0);
 
/* write the EOP addr */
WREG32(mmCP_HPD_EOP_BASE_ADDR, eop_gpu_addr >> 8);
WREG32(mmCP_HPD_EOP_BASE_ADDR_HI, upper_32_bits(eop_gpu_addr) >> 8);
 
/* set the VMID assigned */
WREG32(mmCP_HPD_EOP_VMID, 0);
 
/* set the EOP size, register value is 2^(EOP_SIZE+1) dwords */
tmp = RREG32(mmCP_HPD_EOP_CONTROL);
@@ -2980,182 +2953,182 @@ static int gfx_v7_0_mqd_deactivate(struct 
amdgpu_device *adev)
if (RREG32(mmCP_HQD_ACTIVE) & 1) {
WREG32(mmCP_HQD_DEQUEUE_REQUEST, 1);
for (i = 0; i < adev->usec_timeout; i++) {
if (!(RREG32(mmCP_HQD_ACTIVE) & 1))
break;
udelay(1);
}
 
if (i == adev->usec_timeout)
return -ETIMEDOUT;
 
WREG32(mmCP_HQD_DEQUEUE_REQUEST, 0);
WREG32(mmCP_HQD_PQ_RPTR, 0);
WREG32(mmCP_HQD_PQ_WPTR, 0);
}
 
return 0;
 }
 
 static void gfx_v7_0_mqd_init(struct amdgpu_device *adev,
-struct bonaire_mqd *mqd,
+struct cik_mqd *mqd,
 uint64_t mqd_gpu_addr,
 struct amdgpu_ring *ring)
 {
u64 hqd_gpu_addr;
u64 wb_gpu_addr;
 
/* init the mqd struct */
-   memset(mqd, 0, sizeof(struct b

[PATCH split] Improve pipe split between amdgpu and amdkfd

2017-04-13 Thread Andres Rodriguez
This is a split of patches that are ready to land from the series:
Add support for high priority scheduling in amdgpu v8

I've included Felix and Alex's feedback from the thread above. This includes:
 * Separate MEC_HPD_SIZE rename into a separate patch (patch 01)
 * Added a patch to fix the kgd_hqd_load bug Felix pointed out (patch 06)
 * Fixes for various off-by-one errors
 * Use gfx_v8_0_deactivate_hqd

The only comment I didn't address was changing the queue allocation policy for
gfx9 (similar to gfx7/8). See the inline reply in that thread for more details
on why this was skipped.






[PATCH 01/17] drm/amdgpu: clarify MEC_HPD_SIZE is specific to a gfx generation

2017-04-13 Thread Andres Rodriguez
Rename MEC_HPD_SIZE to GFXN_MEC_HPD_SIZE to clarify it is specific to a
gfx generation.

Signed-off-by: Andres Rodriguez 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c | 11 +--
 drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c | 15 +++
 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 17 -
 3 files changed, 20 insertions(+), 23 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
index c930bb8..3b98162 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
@@ -32,40 +32,41 @@
 #include "clearstate_ci.h"
 
 #include "dce/dce_8_0_d.h"
 #include "dce/dce_8_0_sh_mask.h"
 
 #include "bif/bif_4_1_d.h"
 #include "bif/bif_4_1_sh_mask.h"
 
 #include "gca/gfx_7_0_d.h"
 #include "gca/gfx_7_2_enum.h"
 #include "gca/gfx_7_2_sh_mask.h"
 
 #include "gmc/gmc_7_0_d.h"
 #include "gmc/gmc_7_0_sh_mask.h"
 
 #include "oss/oss_2_0_d.h"
 #include "oss/oss_2_0_sh_mask.h"
 
 #define GFX7_NUM_GFX_RINGS 1
 #define GFX7_NUM_COMPUTE_RINGS 8
+#define GFX7_MEC_HPD_SIZE  2048
 
 static void gfx_v7_0_set_ring_funcs(struct amdgpu_device *adev);
 static void gfx_v7_0_set_irq_funcs(struct amdgpu_device *adev);
 static void gfx_v7_0_set_gds_init(struct amdgpu_device *adev);
 
 MODULE_FIRMWARE("radeon/bonaire_pfp.bin");
 MODULE_FIRMWARE("radeon/bonaire_me.bin");
 MODULE_FIRMWARE("radeon/bonaire_ce.bin");
 MODULE_FIRMWARE("radeon/bonaire_rlc.bin");
 MODULE_FIRMWARE("radeon/bonaire_mec.bin");
 
 MODULE_FIRMWARE("radeon/hawaii_pfp.bin");
 MODULE_FIRMWARE("radeon/hawaii_me.bin");
 MODULE_FIRMWARE("radeon/hawaii_ce.bin");
 MODULE_FIRMWARE("radeon/hawaii_rlc.bin");
 MODULE_FIRMWARE("radeon/hawaii_mec.bin");
 
 MODULE_FIRMWARE("radeon/kaveri_pfp.bin");
 MODULE_FIRMWARE("radeon/kaveri_me.bin");
 MODULE_FIRMWARE("radeon/kaveri_ce.bin");
@@ -2804,90 +2805,88 @@ static void gfx_v7_0_cp_compute_fini(struct 
amdgpu_device *adev)
}
}
 }
 
 static void gfx_v7_0_mec_fini(struct amdgpu_device *adev)
 {
int r;
 
if (adev->gfx.mec.hpd_eop_obj) {
r = amdgpu_bo_reserve(adev->gfx.mec.hpd_eop_obj, false);
if (unlikely(r != 0))
dev_warn(adev->dev, "(%d) reserve HPD EOP bo failed\n", 
r);
amdgpu_bo_unpin(adev->gfx.mec.hpd_eop_obj);
amdgpu_bo_unreserve(adev->gfx.mec.hpd_eop_obj);
 
amdgpu_bo_unref(&adev->gfx.mec.hpd_eop_obj);
adev->gfx.mec.hpd_eop_obj = NULL;
}
 }
 
-#define MEC_HPD_SIZE 2048
-
 static int gfx_v7_0_mec_init(struct amdgpu_device *adev)
 {
int r;
u32 *hpd;
 
/*
 * KV:2 MEC, 4 Pipes/MEC, 8 Queues/Pipe - 64 Queues total
 * CI/KB: 1 MEC, 4 Pipes/MEC, 8 Queues/Pipe - 32 Queues total
 * Nonetheless, we assign only 1 pipe because all other pipes will
 * be handled by KFD
 */
adev->gfx.mec.num_mec = 1;
adev->gfx.mec.num_pipe = 1;
adev->gfx.mec.num_queue = adev->gfx.mec.num_mec * 
adev->gfx.mec.num_pipe * 8;
 
if (adev->gfx.mec.hpd_eop_obj == NULL) {
r = amdgpu_bo_create(adev,
-adev->gfx.mec.num_mec 
*adev->gfx.mec.num_pipe * MEC_HPD_SIZE * 2,
+adev->gfx.mec.num_mec * 
adev->gfx.mec.num_pipe * GFX7_MEC_HPD_SIZE * 2,
 PAGE_SIZE, true,
 AMDGPU_GEM_DOMAIN_GTT, 0, NULL, NULL,
 &adev->gfx.mec.hpd_eop_obj);
if (r) {
dev_warn(adev->dev, "(%d) create HDP EOP bo failed\n", 
r);
return r;
}
}
 
r = amdgpu_bo_reserve(adev->gfx.mec.hpd_eop_obj, false);
if (unlikely(r != 0)) {
gfx_v7_0_mec_fini(adev);
return r;
}
r = amdgpu_bo_pin(adev->gfx.mec.hpd_eop_obj, AMDGPU_GEM_DOMAIN_GTT,
  &adev->gfx.mec.hpd_eop_gpu_addr);
if (r) {
dev_warn(adev->dev, "(%d) pin HDP EOP bo failed\n", r);
gfx_v7_0_mec_fini(adev);
return r;
}
r = amdgpu_bo_kmap(adev->gfx.mec.hpd_eop_obj, (void **)&hpd);
if (r) {
dev_warn(adev->dev, "(%d) map HDP EOP bo failed\n", r);
gfx_v7_0_mec_fini(adev);
return r;
}
 
/* clear memory.  Not sure if this is required or not */
-   memset(hpd, 0, adev->gfx.mec.num_mec *adev->gfx.mec.num_pipe * 
MEC_HPD_SIZE * 2);
+   memset(hpd, 0, adev->gfx.mec.num_mec * adev->gfx.mec.num_pipe * 
GFX7_MEC_HPD_SIZE * 2);
 
amdgpu_bo_kunmap(adev->gfx.mec.hpd_eop_obj);
amdgpu_bo_unreserve(adev->gfx.mec.hpd_eop_obj);
 
return 0;
 }
 
 struct hqd_registers
 {
u32 cp_mqd_base_addr;
u32 cp_mqd_base_addr_hi;
u32 cp_hqd_active;
u32 cp_hqd_vmid;
 

Re: amdgpu 0000:84:00.0: gpu post error! \\ Fatal error during GPU init

2017-04-13 Thread Dennis Schridde
Hi!

On Donnerstag, 13. April 2017 17:30:45 CEST Deucher, Alexander wrote:
> > [   17.692746] amdgpu 0000:84:00.0: enabling device (0000 -> 0003)
> > [   17.692940] [drm] initializing kernel modesetting (TONGA 0x1002:0x6929
> > 0x1002:0x0334 0x00).
> > [   17.692963] [drm] register mmio base: 0xD010
> > [   17.692964] [drm] register mmio size: 262144
> > [   17.692970] [drm] doorbell mmio base: 0xF000
> > [   17.692971] [drm] doorbell mmio size: 2097152
> > [   17.692980] [drm] probing gen 2 caps for device 10b5:8747 = 8796103/10e
> > [   17.692981] [drm] probing mlw for device 10b5:8747 = 8796103
> > [   17.692992] [drm] VCE enabled in physical mode
> > [   18.648132] ATOM BIOS: C76301
> > [   18.651758] [drm] GPU posting now...
> > [   23.661513] [drm:amdgpu_connector_add [amdgpu]] *ERROR* atombios
> > stuck in
> > loop for more than 5secs aborting
> > [   23.673155] [drm:amdgpu_connector_add [amdgpu]] *ERROR* atombios
> > stuck
> > executing F250 (len 334, WS 4, PS 0) @ 0xF365
> > [   23.685453] [drm:amdgpu_connector_add [amdgpu]] *ERROR* atombios
> > stuck
> > executing DB34 (len 324, WS 4, PS 0) @ 0xDC2C
> > [   23.697816] [drm:amdgpu_connector_add [amdgpu]] *ERROR* atombios
> > stuck
> > executing BCDE (len 254, WS 0, PS 4) @ 0xBDB4
> > [   23.710137] [drm:amdgpu_connector_add [amdgpu]] *ERROR* atombios
> > stuck
> > executing B832 (len 143, WS 0, PS 8) @ 0xB8A9
> > [   23.722451] amdgpu 0000:84:00.0: gpu post error!
> > [   23.727950] amdgpu 0000:84:00.0: Fatal error during GPU init
> 
> Posting the GPU is failing.  This is the initial basic asic setup that is
> required before anything else can happen.  There seem to be timeouts
> waiting for some register states.  Is there anything special about your
> setup?  Can you try a vanilla kernel?

I don't think there is anything special. At least not that I am aware of. Dell 
R730xd with one AMD FirePro S7150X2 and 2 Mellanox ConnectX-4 Dual Port cards. 
Apart from the modifications shown in the commit log, I made no changes to the 
CoreOS Container Linux 1381 development version. The kernel is now unpatched, 
stock 4.10.9. Please find the logs of the unpatched / vanilla kernel attached.

--Dennis

[Attachment: serial-over-LAN console capture of the boot, omitted (unreadable terminal escape sequences)]

Re: [PATCH 3/3] drm/amdgpu: CIK support is no longer experimental

2017-04-13 Thread Marek Olšák
On Thu, Apr 13, 2017 at 6:41 PM, Nicolai Hähnle  wrote:
> On 11.04.2017 00:06, Felix Kuehling wrote:
>>
>> On 17-04-08 04:50 AM, Nicolai Hähnle wrote:
>>>
>>> On 07.04.2017 22:15, Felix Kuehling wrote:

 Change the wording of the CONFIG_DRM_AMDGPU_CIK option to indicate
 that it's no longer experimental.

 Signed-off-by: Felix Kuehling 
 ---
  drivers/gpu/drm/amd/amdgpu/Kconfig | 9 +
  1 file changed, 5 insertions(+), 4 deletions(-)

 diff --git a/drivers/gpu/drm/amd/amdgpu/Kconfig
 b/drivers/gpu/drm/amd/amdgpu/Kconfig
 index f3b6df8..029e3fe 100644
 --- a/drivers/gpu/drm/amd/amdgpu/Kconfig
 +++ b/drivers/gpu/drm/amd/amdgpu/Kconfig
 @@ -9,11 +9,12 @@ config DRM_AMDGPU_CIK
  bool "Enable amdgpu support for CIK parts"
  depends on DRM_AMDGPU
  help
 -  Choose this option if you want to enable experimental support
 -  for CIK asics.
 +  Choose this option if you want to enable support for CIK asics.

 -  CIK is already supported in radeon.  CIK support in amdgpu
 -  is for experimentation and testing.
 +  If you choose No here, CIK ASICs will be supported by the
 +  radeon driver, as in previous kernel versions. Depending on
 +  your choice you will need different user mode (Mesa, X.org)
 +  drivers to support accelerated graphics on CIK.
>>>
>>>
>>> The last part is a bit misleading: while you do need different DDXes,
>>> the same Mesa driver (radeonsi) will work with both the radeon and the
>>> amdgpu kernel module for CIK. FWIW, the same is true for SI, although
>>> older versions of Mesa might stumble when run on the amdgpu kernel
>>> module.
>>
>>
>> I see. Do you know the minimum Mesa version required for SI and CIK
>> support on amdgpu respectively?
>
>
> For SI, it's Mesa 17.0.
>
> For CIK, I kind of suspect the support has "always" been there, since the
> amdgpu kernel module was originally brought up on CIK, but maybe Marek knows
> more.

Yes, CIK Mesa support should work with all amdgpu versions.

Marek
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


RE: [PATCH] drm/dp-helper: DP_TEST_MISC1 should be DP_TEST_MISC0

2017-04-13 Thread Deucher, Alexander
> -Original Message-
> From: amd-gfx [mailto:amd-gfx-boun...@lists.freedesktop.org] On Behalf
> Of Harry Wentland
> Sent: Thursday, April 13, 2017 10:34 AM
> To: amd-gfx@lists.freedesktop.org
> Cc: Wentland, Harry
> Subject: [PATCH] drm/dp-helper: DP_TEST_MISC1 should be
> DP_TEST_MISC0
> 
> Bring this in line with the spec and with what has been committed in the upstream drm tree.
> 
> Signed-off-by: Harry Wentland 

I think you forgot to commit the relevant change on the DC side as this breaks 
the DC compile.

Alex

> ---
> 
> This brings this definition in amd-staging-4.9 in line with upstream.
> 
>  include/drm/drm_dp_helper.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/include/drm/drm_dp_helper.h b/include/drm/drm_dp_helper.h
> index 4b14a7674be1..d6a5015976d9 100644
> --- a/include/drm/drm_dp_helper.h
> +++ b/include/drm/drm_dp_helper.h
> @@ -419,7 +419,7 @@
> 
>  #define DP_TEST_PATTERN  0x221
> 
> -#define DP_TEST_MISC1   0x232
> +#define DP_TEST_MISC0   0x232
> 
>  #define DP_TEST_CRC_R_CR 0x240
>  #define DP_TEST_CRC_G_Y  0x242
> --
> 2.11.0
> 
> ___
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 3/3] drm/amdgpu: CIK support is no longer experimental

2017-04-13 Thread Nicolai Hähnle

On 11.04.2017 00:06, Felix Kuehling wrote:

On 17-04-08 04:50 AM, Nicolai Hähnle wrote:

On 07.04.2017 22:15, Felix Kuehling wrote:

Change the wording of the CONFIG_DRM_AMDGPU_CIK option to indicate
that it's no longer experimental.

Signed-off-by: Felix Kuehling 
---
 drivers/gpu/drm/amd/amdgpu/Kconfig | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/Kconfig
b/drivers/gpu/drm/amd/amdgpu/Kconfig
index f3b6df8..029e3fe 100644
--- a/drivers/gpu/drm/amd/amdgpu/Kconfig
+++ b/drivers/gpu/drm/amd/amdgpu/Kconfig
@@ -9,11 +9,12 @@ config DRM_AMDGPU_CIK
 bool "Enable amdgpu support for CIK parts"
 depends on DRM_AMDGPU
 help
-  Choose this option if you want to enable experimental support
-  for CIK asics.
+  Choose this option if you want to enable support for CIK asics.

-  CIK is already supported in radeon.  CIK support in amdgpu
-  is for experimentation and testing.
+  If you choose No here, CIK ASICs will be supported by the
+  radeon driver, as in previous kernel versions. Depending on
+  your choice you will need different user mode (Mesa, X.org)
+  drivers to support accelerated graphics on CIK.


The last part is a bit misleading: while you do need different DDXes,
the same Mesa driver (radeonsi) will work with both the radeon and the
amdgpu kernel module for CIK. FWIW, the same is true for SI, although
older versions of Mesa might stumble when run on the amdgpu kernel
module.


I see. Do you know the minimum Mesa version required for SI and CIK
support on amdgpu respectively?


For SI, it's Mesa 17.0.

For CIK, I kind of suspect the support has "always" been there, since 
the amdgpu kernel module was originally brought up on CIK, but maybe 
Marek knows more.


Cheers,
Nicolai




Thanks,
  Felix



Cheers,
Nicolai




 config DRM_AMDGPU_USERPTR
 bool "Always enable userptr write support"









--
Lerne, wie die Welt wirklich ist,
Aber vergiss niemals, wie sie sein sollte.
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu: Add kernel parameter to manage memory error handling.

2017-04-13 Thread Tom St Denis

On 13/04/17 11:38 AM, Panariti, David wrote:

+ Vilas


-Original Message-
From: Deucher, Alexander
Sent: Wednesday, April 12, 2017 9:29 PM
To: 'Michel Dänzer' ; Panariti, David

Cc: amd-gfx@lists.freedesktop.org
Subject: RE: [PATCH] drm/amdgpu: Add kernel parameter to manage
memory error handling.


-Original Message-
From: Michel Dänzer [mailto:mic...@daenzer.net]
Sent: Wednesday, April 12, 2017 9:17 PM
To: Panariti, David
Cc: Deucher, Alexander; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH] drm/amdgpu: Add kernel parameter to manage

memory

error handling.

On 13/04/17 02:38 AM, Panariti, David wrote:

From: Michel Dänzer [mailto:mic...@daenzer.net]


@@ -212,6 +213,9 @@ module_param_named(cg_mask,

amdgpu_cg_mask, uint,

0444);  MODULE_PARM_DESC(pg_mask, "Powergating flags mask (0 =

disable

power gating)");  module_param_named(pg_mask,

amdgpu_pg_mask,

uint,

0444);

+MODULE_PARM_DESC(ecc_mask, "ECC/EDC flags mask (0 = disable
+ECC/EDC)");


"0 = disable ECC/EDC" implies that they're enabled by default? Was
that already the case before this patch?


[davep] Yes it was, and there was actually a problem in some cases
where the CZ would hang which is why I added the param. I was
wondering if it would be better to default to them being off, but I
wasn't sure how important maintaining original behavior is
considered. Actually, there are some bugs in the workaround function
as it is, so it really should default to off.


I agree. There have been some bug reports about Carrizo hangs, I
wonder if any of those might be related to this.


Only the embedded SKUs support EDC.  If they are embedded parts, it could
be related.


[davep]  Sorry for the length, but I wanted all of the details out there for 
the most informed decision.

Another thing is that they can go from not hanging to hanging for no 
discernable reason.
The KIQ changes, however, have seemed to have fixed it.  For one chip and a few 
tens of reboots.

There is also the issue of improperly initialized *gpr registers.
  From the doc:
"Due to a hardware condition whereby some shader instructions utilize uninitialized 
SGPRs and/or VGPRs, the S/VGPR memories must be initialized prior to EDC operation, as 
not doing so will cause erroneous counts to show up in the EDC counters."
I seem to recall Vilas saying it is the poison that isn't reset properly.  But 
I'm not sure about the actual register contents.
Vilas?

I suggest, at a minimum,  checking for cz *and* the EDC fuse.  If so, 
explicitly disable EDC, run the shaders to zero the *gprs, leave EDC disabled, 
and merge in the existing new code to zero all of the counters.
All of the code exists, in one place or another.

The parameter to enable/disable probably won't be needed until EDC is fully 
implemented.
However, EDC can be enabled in a way that simply allows it to count errors.  
This has never caused a hang for me.
The counts are useful for reliability research.  This was one of the goals of 
the original EDC task.
A umr script could be written (by interested parties) to read the counters and 
in fact enable the EDC counters.
I think this should be done if anyone is interested in the numbers.
Vilas?  Any R&R work left in this area?  Do you think customers would be 
interested in doing this on their own?


For the special keeners we could add EDC counters to umr's --top and 
then in theory it'll be included in the log output.


If you can send me info on how to enable/read the counters I can take a 
look at it.


Cheers,
Tom
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


RE: [PATCH] drm/amdgpu: Add kernel parameter to manage memory error handling.

2017-04-13 Thread Panariti, David
+ Vilas

> -Original Message-
> From: Deucher, Alexander
> Sent: Wednesday, April 12, 2017 9:29 PM
> To: 'Michel Dänzer' ; Panariti, David
> 
> Cc: amd-gfx@lists.freedesktop.org
> Subject: RE: [PATCH] drm/amdgpu: Add kernel parameter to manage
> memory error handling.
> 
> > -Original Message-
> > From: Michel Dänzer [mailto:mic...@daenzer.net]
> > Sent: Wednesday, April 12, 2017 9:17 PM
> > To: Panariti, David
> > Cc: Deucher, Alexander; amd-gfx@lists.freedesktop.org
> > Subject: Re: [PATCH] drm/amdgpu: Add kernel parameter to manage
> memory
> > error handling.
> >
> > On 13/04/17 02:38 AM, Panariti, David wrote:
> > >> From: Michel Dänzer [mailto:mic...@daenzer.net]
> > >>
> > >>> @@ -212,6 +213,9 @@ module_param_named(cg_mask,
> > >> amdgpu_cg_mask, uint,
> > >>> 0444);  MODULE_PARM_DESC(pg_mask, "Powergating flags mask (0 =
> > >> disable
> > >>> power gating)");  module_param_named(pg_mask,
> amdgpu_pg_mask,
> > >> uint,
> > >>> 0444);
> > >>>
> > >>> +MODULE_PARM_DESC(ecc_mask, "ECC/EDC flags mask (0 = disable
> > >>> +ECC/EDC)");
> > >>
> > >> "0 = disable ECC/EDC" implies that they're enabled by default? Was
> > >> that already the case before this patch?
> > >
> > > [davep] Yes it was, and there was actually a problem in some cases
> > > where the CZ would hang which is why I added the param. I was
> > > wondering if it would be better to default to them being off, but I
> > > wasn't sure how important maintaining original behavior is
> > > considered. Actually, there are some bugs in the workaround function
> > > as it is, so it really should default to off.
> >
> > I agree. There have been some bug reports about Carrizo hangs, I
> > wonder if any of those might be related to this.
> 
> Only the embedded SKUs support EDC.  If they are embedded parts, it could
> be related.

[davep]  Sorry for the length, but I wanted all of the details out there for 
the most informed decision.

Another thing is that they can go from not hanging to hanging for no 
discernable reason.  
The KIQ changes, however, have seemed to have fixed it.  For one chip and a few 
tens of reboots.

There is also the issue of improperly initialized *gpr registers.
 From the doc:
"Due to a hardware condition whereby some shader instructions utilize 
uninitialized SGPRs and/or VGPRs, the S/VGPR memories must be initialized prior 
to EDC operation, as not doing so will cause erroneous counts to show up in 
the EDC counters."
I seem to recall Vilas saying it is the poison that isn't reset properly.  But 
I'm not sure about the actual register contents.
Vilas?

I suggest, at a minimum,  checking for cz *and* the EDC fuse.  If so, 
explicitly disable EDC, run the shaders to zero the *gprs, leave EDC disabled, 
and merge in the existing new code to zero all of the counters.
All of the code exists, in one place or another.

The parameter to enable/disable probably won't be needed until EDC is fully 
implemented.  
However, EDC can be enabled in a way that simply allows it to count errors.  
This has never caused a hang for me.
The counts are useful for reliability research.  This was one of the goals of 
the original EDC task.
A umr script could be written (by interested parties) to read the counters and 
in fact enable the EDC counters.
I think this should be done if anyone is interested in the numbers.  
Vilas?  Any R&R work left in this area?  Do you think customers would be 
interested in doing this on their own?

davep
> 
> Alex

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


RE: amdgpu 0000:84:00.0: gpu post error! \\ Fatal error during GPU init

2017-04-13 Thread Deucher, Alexander
> -Original Message-
> From: amd-gfx [mailto:amd-gfx-boun...@lists.freedesktop.org] On Behalf
> Of Dennis Schridde
> Sent: Thursday, April 13, 2017 10:18 AM
> To: amd-gfx@lists.freedesktop.org
> Subject: amdgpu 0000:84:00.0: gpu post error! \\ Fatal error during GPU init
> 
> Hello!
> 
> I am trying to use an AMD FirePro S7150X2 with the AMDGPU driver of a
> Linux
> 4.10.9 kernel (CoreOS Container Linux) and linux-firmware
> e39f0e3e6897ad865b3704f61218ae83f98a85da, but I run into the following
> error
> after the amdgpu module is being loaded:
> 
> [   17.692746] amdgpu 0000:84:00.0: enabling device (0000 -> 0003)
> [   17.692940] [drm] initializing kernel modesetting (TONGA 0x1002:0x6929
> 0x1002:0x0334 0x00).
> [   17.692963] [drm] register mmio base: 0xD010
> [   17.692964] [drm] register mmio size: 262144
> [   17.692970] [drm] doorbell mmio base: 0xF000
> [   17.692971] [drm] doorbell mmio size: 2097152
> [   17.692980] [drm] probing gen 2 caps for device 10b5:8747 = 8796103/10e
> [   17.692981] [drm] probing mlw for device 10b5:8747 = 8796103
> [   17.692992] [drm] VCE enabled in physical mode
> [   18.648132] ATOM BIOS: C76301
> [   18.651758] [drm] GPU posting now...
> [   23.661513] [drm:amdgpu_connector_add [amdgpu]] *ERROR* atombios
> stuck in
> loop for more than 5secs aborting
> [   23.673155] [drm:amdgpu_connector_add [amdgpu]] *ERROR* atombios
> stuck
> executing F250 (len 334, WS 4, PS 0) @ 0xF365
> [   23.685453] [drm:amdgpu_connector_add [amdgpu]] *ERROR* atombios
> stuck
> executing DB34 (len 324, WS 4, PS 0) @ 0xDC2C
> [   23.697816] [drm:amdgpu_connector_add [amdgpu]] *ERROR* atombios
> stuck
> executing BCDE (len 254, WS 0, PS 4) @ 0xBDB4
> [   23.710137] [drm:amdgpu_connector_add [amdgpu]] *ERROR* atombios
> stuck
> executing B832 (len 143, WS 0, PS 8) @ 0xB8A9
> [   23.722451] amdgpu 0000:84:00.0: gpu post error!
> [   23.727950] amdgpu 0000:84:00.0: Fatal error during GPU init

Posting the GPU is failing.  This is the initial basic asic setup that is 
required before anything else can happen.  There seem to be timeouts waiting 
for some register states.  Is there anything special about your setup?  Can you 
try a vanilla kernel?

Alex

> [   23.734594] [drm] amdgpu: finishing device.
> [   23.739592] [ cut here ]
> ...
> [   24.096608] ---[ end trace 88c8cb35b32e3b88 ]---
> [   24.102086] BUG: unable to handle kernel NULL pointer dereference at
> 0018
> [   24.111438] IP: __ww_mutex_lock+0x24/0xa0
> [   24.116222] PGD 0
> [   24.116223]
> [   24.120737] Oops: 0002 [#1] SMP
> ...
> 
> Please find a full log attached.
> 
> My kernel configuration is available at:
>  https://github.com/urzds/coreos-overlay/blob/hpc_support/sys-
> kernel/coreos-modules/files/{commonconfig-4.10,amd64_defconfig-4.10}
> Please refer to the commit log of the "hpc_support" branch for my
> changes
> compared to the CoreOS CL stock config.
> 
> I would be very glad if you could help me in debugging the issue and getting
> the GPU running.
> 
> Thanks,
> Dennis
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: AMDGPU without display output

2017-04-13 Thread Dennis Schridde
Thanks, Alex!

signature.asc
Description: This is a digitally signed message part.
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


RE: AMDGPU without display output

2017-04-13 Thread Deucher, Alexander
> -Original Message-
> From: amd-gfx [mailto:amd-gfx-boun...@lists.freedesktop.org] On Behalf
> Of Dennis Schridde
> Sent: Thursday, April 13, 2017 10:32 AM
> To: amd-gfx@lists.freedesktop.org
> Subject: AMDGPU without display output
> 
> Hello again!
> 
> I am trying to use an AMD FirePro S7150X2 with the AMDGPU driver of a Linux
> 4.10.9 kernel (CoreOS Container Linux) and linux-firmware
> e39f0e3e6897ad865b3704f61218ae83f98a85da.
> 
> Since the card has no display output and I want to run remote applications
> only, I would like to prevent any interference with mode setting and the
> kernel console. Thus I set "nomodeset" on the kernel command line to
> prevent
> the kernel from trying to initialise anything but the rendering functions of
> the card. However, this leads to the following error message:
> 
> [drm:init_module [amdgpu]] *ERROR* VGACON disables amdgpu kernel
> modesetting.
> 
> The result is that the AMDGPU module can not be loaded.

nomodeset prevents the driver from loading.  It's a way to disable the KMS 
altogether.

> 
> Is this generally the right approach to use this driver for rendering without
> display output, or can I safely leave KMS enabled and it will not interfere
> with my application's X servers and the OpenGL applications running on
> them?
> 

The driver only exposes display connectors when they exist.

> Assuming I have to disable KMS, how would I get past this error, i.e. to
> initialise the card's rendering functions, but skipping initialisation of the
> output part of the driver?
> 

You don't have to disable KMS.  There is no way to.  The driver will expose the hw 
that exists on the card.  If there are no display connectors, none will be 
exposed.  It's up to the user to configure X to use or not use specific cards.

> Generally asking: How do people use this card for GPGPU compute (i.e.
> headless) tasks? Is there some documentation what I need to pay attention
> to?

It should just work as long as the driver is loaded.
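
For what it's worth, a headless client typically just opens the DRM
render node directly; here is a minimal hedged sketch using the libdrm
amdgpu API (the /dev/dri/renderD128 path is an assumption; real code
should enumerate the render nodes):

	#include <fcntl.h>
	#include <stdint.h>
	#include <stdio.h>
	#include <unistd.h>
	#include "amdgpu.h"

	int main(void)
	{
		uint32_t major, minor;
		amdgpu_device_handle dev;
		/* node index is an assumption; enumerate nodes in real code */
		int fd = open("/dev/dri/renderD128", O_RDWR);

		if (fd < 0) {
			perror("open");
			return 1;
		}
		if (amdgpu_device_initialize(fd, &major, &minor, &dev) == 0) {
			printf("amdgpu initialized, interface %u.%u\n",
			       major, minor);
			amdgpu_device_deinitialize(dev);
		}
		close(fd);
		return 0;
	}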

Alex

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


AMDGPU without display output

2017-04-13 Thread Dennis Schridde
Hello again!

I am trying to use an AMD FirePro S7150X2 with the AMDGPU driver of a Linux 
4.10.9 kernel (CoreOS Container Linux) and linux-firmware 
e39f0e3e6897ad865b3704f61218ae83f98a85da.

Since the card has no display output and I want to run remote applications 
only, I would like to prevent any interference with mode setting and the 
kernel console. Thus I set "nomodeset" on the kernel command line to prevent 
the kernel from trying to initialise anything but the rendering functions of 
the card. However, this leads to the following error message:

[drm:init_module [amdgpu]] *ERROR* VGACON disables amdgpu kernel modesetting.

The result is that the AMDGPU module can not be loaded.

Is this generally the right approach to use this driver for rendering without 
display output, or can I safely leave KMS enabled and it will not interfere 
with my application's X servers and the OpenGL applications running on them?

Assuming I have to disable KMS, how would I get past this error, i.e. to 
initialise the card's rendering functions, but skipping initialisation of the 
output part of the driver?

Generally asking: How do people use this card for GPGPU compute (i.e. 
headless) tasks? Is there some documentation what I need to pay attention to?

Thanks,
Dennis

signature.asc
Description: This is a digitally signed message part.
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


amdgpu 0000:84:00.0: gpu post error! \\ Fatal error during GPU init

2017-04-13 Thread Dennis Schridde
Hello!

I am trying to use an AMD FirePro S7150X2 with the AMDGPU driver of a Linux 
4.10.9 kernel (CoreOS Container Linux) and linux-firmware 
e39f0e3e6897ad865b3704f61218ae83f98a85da, but I run into the following error 
after the amdgpu module is being loaded:

[   17.692746] amdgpu 0000:84:00.0: enabling device (0000 -> 0003)
[   17.692940] [drm] initializing kernel modesetting (TONGA 0x1002:0x6929 
0x1002:0x0334 0x00).
[   17.692963] [drm] register mmio base: 0xD010
[   17.692964] [drm] register mmio size: 262144
[   17.692970] [drm] doorbell mmio base: 0xF000
[   17.692971] [drm] doorbell mmio size: 2097152
[   17.692980] [drm] probing gen 2 caps for device 10b5:8747 = 8796103/10e
[   17.692981] [drm] probing mlw for device 10b5:8747 = 8796103
[   17.692992] [drm] VCE enabled in physical mode
[   18.648132] ATOM BIOS: C76301
[   18.651758] [drm] GPU posting now...
[   23.661513] [drm:amdgpu_connector_add [amdgpu]] *ERROR* atombios stuck in 
loop for more than 5secs aborting
[   23.673155] [drm:amdgpu_connector_add [amdgpu]] *ERROR* atombios stuck 
executing F250 (len 334, WS 4, PS 0) @ 0xF365
[   23.685453] [drm:amdgpu_connector_add [amdgpu]] *ERROR* atombios stuck 
executing DB34 (len 324, WS 4, PS 0) @ 0xDC2C
[   23.697816] [drm:amdgpu_connector_add [amdgpu]] *ERROR* atombios stuck 
executing BCDE (len 254, WS 0, PS 4) @ 0xBDB4
[   23.710137] [drm:amdgpu_connector_add [amdgpu]] *ERROR* atombios stuck 
executing B832 (len 143, WS 0, PS 8) @ 0xB8A9
[   23.722451] amdgpu 0000:84:00.0: gpu post error!
[   23.727950] amdgpu 0000:84:00.0: Fatal error during GPU init
[   23.734594] [drm] amdgpu: finishing device.
[   23.739592] [ cut here ]
...
[   24.096608] ---[ end trace 88c8cb35b32e3b88 ]---
[   24.102086] BUG: unable to handle kernel NULL pointer dereference at 
0018
[   24.111438] IP: __ww_mutex_lock+0x24/0xa0
[   24.116222] PGD 0 
[   24.116223] 
[   24.120737] Oops: 0002 [#1] SMP
...

Please find a full log attached.

My kernel configuration is available at:
 
https://github.com/urzds/coreos-overlay/blob/hpc_support/sys-kernel/coreos-modules/files/{commonconfig-4.10,amd64_defconfig-4.10}
Please refer to the commit log of the "hpc_support" branch for my changes 
compared to the CoreOS CL stock config.

I would be very glad if you could help me in debugging the issue and getting 
the GPU running.

Thanks,
Dennis

[Attachment: serial-over-LAN console capture of the boot, omitted (unreadable terminal escape sequences)]

[PATCH libdrm 0/2] amdgpu: add amdgpu_cs_wait_fences

2017-04-13 Thread Nicolai Hähnle
Hi all,

These changes expose a function to call the WAIT_FENCES ioctl for
waiting on multiple fences at the same time. This is useful for
Vulkan.

They are mostly changes that have been in the amdgpu-pro libdrm
for a long time. I've taken the liberty to clean them up a bit
and add some missing bits.
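
As a quick illustration of the new entry point, a hedged usage sketch
(the context handle and sequence numbers are assumed to come from
earlier amdgpu_cs_submit() calls, and AMDGPU_TIMEOUT_INFINITE is the
usual libdrm timeout constant):

	#include <stdbool.h>
	#include <stdint.h>
	#include <stdio.h>
	#include "amdgpu.h"

	/* Wait until at least one of two previously submitted GFX jobs
	 * completes; ctx and seq[] are assumed to exist already. */
	static int wait_any_of_two(amdgpu_context_handle ctx,
				   const uint64_t seq[2])
	{
		struct amdgpu_cs_fence fences[2];
		uint32_t status = 0, first = 0;
		int i, r;

		for (i = 0; i < 2; i++) {
			fences[i].context = ctx;
			fences[i].ip_type = AMDGPU_HW_IP_GFX;
			fences[i].ip_instance = 0;
			fences[i].ring = 0;
			fences[i].fence = seq[i];
		}

		/* wait_all = false: return as soon as any fence signals */
		r = amdgpu_cs_wait_fences(fences, 2, false,
					  AMDGPU_TIMEOUT_INFINITE,
					  &status, &first);
		if (r == 0 && status == 1)
			printf("fence %u signaled first\n", first);
		return r;
	}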

Please review!
Thanks,
Nicolai

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH libdrm 1/2] amdgpu: add the interface of waiting multiple fences

2017-04-13 Thread Nicolai Hähnle
From: Nicolai Hähnle 

Signed-off-by: Junwei Zhang 
[v2: allow returning the first signaled fence index]
Signed-off-by: monk.liu 
[v3:
 - cleanup *status setting
 - fix amdgpu symbols check]
Signed-off-by: Nicolai Hähnle 
Reviewed-by: Christian König  (v1)
Reviewed-by: Jammy Zhou  (v1)
---
 amdgpu/amdgpu-symbol-check |  1 +
 amdgpu/amdgpu.h| 23 ++
 amdgpu/amdgpu_cs.c | 74 ++
 3 files changed, 98 insertions(+)

diff --git a/amdgpu/amdgpu-symbol-check b/amdgpu/amdgpu-symbol-check
index 4d1ae65..81ef9b4 100755
--- a/amdgpu/amdgpu-symbol-check
+++ b/amdgpu/amdgpu-symbol-check
@@ -26,20 +26,21 @@ amdgpu_bo_va_op_raw
 amdgpu_bo_wait_for_idle
 amdgpu_create_bo_from_user_mem
 amdgpu_cs_create_semaphore
 amdgpu_cs_ctx_create
 amdgpu_cs_ctx_free
 amdgpu_cs_destroy_semaphore
 amdgpu_cs_query_fence_status
 amdgpu_cs_query_reset_state
 amdgpu_cs_signal_semaphore
 amdgpu_cs_submit
+amdgpu_cs_wait_fences
 amdgpu_cs_wait_semaphore
 amdgpu_device_deinitialize
 amdgpu_device_initialize
 amdgpu_get_marketing_name
 amdgpu_query_buffer_size_alignment
 amdgpu_query_crtc_from_id
 amdgpu_query_firmware_version
 amdgpu_query_gds_info
 amdgpu_query_gpu_info
 amdgpu_query_heap_info
diff --git a/amdgpu/amdgpu.h b/amdgpu/amdgpu.h
index 55884b2..fdea905 100644
--- a/amdgpu/amdgpu.h
+++ b/amdgpu/amdgpu.h
@@ -900,20 +900,43 @@ int amdgpu_cs_submit(amdgpu_context_handle context,
  *  returned in the case if submission was completed or timeout error
  *  code.
  *
  * \sa amdgpu_cs_submit()
 */
 int amdgpu_cs_query_fence_status(struct amdgpu_cs_fence *fence,
 uint64_t timeout_ns,
 uint64_t flags,
 uint32_t *expired);
 
+/**
+ *  Wait for multiple fences
+ *
+ * \param   fences  - \c [in] The fence array to wait
+ * \param   fence_count - \c [in] The fence count
+ * \param   wait_all- \c [in] If true, wait all fences to be signaled,
+ *otherwise, wait at least one fence
+ * \param   timeout_ns  - \c [in] The timeout to wait, in nanoseconds
+ * \param   status  - \c [out] '1' for signaled, '0' for timeout
+ * \param   first   - \c [out] the index of the first signaled fence from 
@fences
+ *
+ * \return  0 on success
+ *  <0 - Negative POSIX Error code
+ *
+ * \noteCurrently it supports only one amdgpu_device. All fences come from
+ *  the same amdgpu_device with the same fd.
+*/
+int amdgpu_cs_wait_fences(struct amdgpu_cs_fence *fences,
+ uint32_t fence_count,
+ bool wait_all,
+ uint64_t timeout_ns,
+ uint32_t *status, uint32_t *first);
+
 /*
  * Query / Info API
  *
 */
 
 /**
  * Query allocation size alignments
  *
  * UMD should query information about GPU VM MC size alignments requirements
  * to be able correctly choose required allocation size and implement
diff --git a/amdgpu/amdgpu_cs.c b/amdgpu/amdgpu_cs.c
index fb5b3a8..707e6d1 100644
--- a/amdgpu/amdgpu_cs.c
+++ b/amdgpu/amdgpu_cs.c
@@ -436,20 +436,94 @@ int amdgpu_cs_query_fence_status(struct amdgpu_cs_fence 
*fence,
r = amdgpu_ioctl_wait_cs(fence->context, fence->ip_type,
fence->ip_instance, fence->ring,
fence->fence, timeout_ns, flags, &busy);
 
if (!r && !busy)
*expired = true;
 
return r;
 }
 
+static int amdgpu_ioctl_wait_fences(struct amdgpu_cs_fence *fences,
+   uint32_t fence_count,
+   bool wait_all,
+   uint64_t timeout_ns,
+   uint32_t *status,
+   uint32_t *first)
+{
+   struct drm_amdgpu_fence *drm_fences;
+   amdgpu_device_handle dev = fences[0].context->dev;
+   union drm_amdgpu_wait_fences args;
+   int r;
+   uint32_t i;
+
+   drm_fences = alloca(sizeof(struct drm_amdgpu_fence) * fence_count);
+   for (i = 0; i < fence_count; i++) {
+   drm_fences[i].ctx_id = fences[i].context->id;
+   drm_fences[i].ip_type = fences[i].ip_type;
+   drm_fences[i].ip_instance = fences[i].ip_instance;
+   drm_fences[i].ring = fences[i].ring;
+   drm_fences[i].seq_no = fences[i].fence;
+   }
+
+   memset(&args, 0, sizeof(args));
+   args.in.fences = (uint64_t)(uintptr_t)drm_fences;
+   args.in.fence_count = fence_count;
+   args.in.wait_all = wait_all;
+   args.in.timeout_ns = amdgpu_cs_calculate_timeout(timeout_ns);
+
+   r = drmIoctl(dev->fd, DRM_IOCTL_AMDGPU_WAIT_FENCES, &args);
+   if (r)
+   return -errno;
+
+   *status = args.out.status;
+
+   if (first)
+   *first = args.out.first_signaled;
+
+   return 0;
+}
+
+int amdgpu_cs

[PATCH libdrm 2/2] amdgpu: add a test for amdgpu_cs_wait_fences

2017-04-13 Thread Nicolai Hähnle
From: Nicolai Hähnle 

Signed-off-by: monk.liu 
[v2: actually hook up the test case]
Signed-off-by: Nicolai Hähnle 
---
 tests/amdgpu/basic_tests.c | 100 +
 1 file changed, 100 insertions(+)

diff --git a/tests/amdgpu/basic_tests.c b/tests/amdgpu/basic_tests.c
index 4dce67e..8d5844b 100644
--- a/tests/amdgpu/basic_tests.c
+++ b/tests/amdgpu/basic_tests.c
@@ -38,34 +38,36 @@
 #include "amdgpu_drm.h"
 
 static  amdgpu_device_handle device_handle;
 static  uint32_t  major_version;
 static  uint32_t  minor_version;
 
 static void amdgpu_query_info_test(void);
 static void amdgpu_memory_alloc(void);
 static void amdgpu_command_submission_gfx(void);
 static void amdgpu_command_submission_compute(void);
+static void amdgpu_command_submission_multi_fence(void);
 static void amdgpu_command_submission_sdma(void);
 static void amdgpu_userptr_test(void);
 static void amdgpu_semaphore_test(void);
 
 static void amdgpu_command_submission_write_linear_helper(unsigned ip_type);
 static void amdgpu_command_submission_const_fill_helper(unsigned ip_type);
 static void amdgpu_command_submission_copy_linear_helper(unsigned ip_type);
 
 CU_TestInfo basic_tests[] = {
{ "Query Info Test",  amdgpu_query_info_test },
{ "Memory alloc Test",  amdgpu_memory_alloc },
{ "Userptr Test",  amdgpu_userptr_test },
{ "Command submission Test (GFX)",  amdgpu_command_submission_gfx },
{ "Command submission Test (Compute)", 
amdgpu_command_submission_compute },
+   { "Command submission Test (Multi-Fence)", 
amdgpu_command_submission_multi_fence },
{ "Command submission Test (SDMA)", amdgpu_command_submission_sdma },
{ "SW semaphore Test",  amdgpu_semaphore_test },
CU_TEST_INFO_NULL,
 };
 #define BUFFER_SIZE (8 * 1024)
 #define SDMA_PKT_HEADER_op_offset 0
 #define SDMA_PKT_HEADER_op_mask   0x00FF
 #define SDMA_PKT_HEADER_op_shift  0
 #define SDMA_PKT_HEADER_OP(x) (((x) & SDMA_PKT_HEADER_op_mask) << 
SDMA_PKT_HEADER_op_shift)
 #define SDMA_OPCODE_CONSTANT_FILL  11
@@ -1142,20 +1144,118 @@ static void 
amdgpu_command_submission_sdma_copy_linear(void)
amdgpu_command_submission_copy_linear_helper(AMDGPU_HW_IP_DMA);
 }
 
 static void amdgpu_command_submission_sdma(void)
 {
amdgpu_command_submission_sdma_write_linear();
amdgpu_command_submission_sdma_const_fill();
amdgpu_command_submission_sdma_copy_linear();
 }
 
+static void amdgpu_command_submission_multi_fence_wait_all(bool wait_all)
+{
+   amdgpu_context_handle context_handle;
+   amdgpu_bo_handle ib_result_handle, ib_result_ce_handle;
+   void *ib_result_cpu, *ib_result_ce_cpu;
+   uint64_t ib_result_mc_address, ib_result_ce_mc_address;
+   struct amdgpu_cs_request ibs_request[2] = {0};
+   struct amdgpu_cs_ib_info ib_info[2];
+   struct amdgpu_cs_fence fence_status[2] = {0};
+   uint32_t *ptr;
+   uint32_t expired;
+   amdgpu_bo_list_handle bo_list;
+   amdgpu_va_handle va_handle, va_handle_ce;
+   int r;
+   int i, ib_cs_num = 2;
+
+   r = amdgpu_cs_ctx_create(device_handle, &context_handle);
+   CU_ASSERT_EQUAL(r, 0);
+
+   r = amdgpu_bo_alloc_and_map(device_handle, 4096, 4096,
+   AMDGPU_GEM_DOMAIN_GTT, 0,
+   &ib_result_handle, &ib_result_cpu,
+   &ib_result_mc_address, &va_handle);
+   CU_ASSERT_EQUAL(r, 0);
+
+   r = amdgpu_bo_alloc_and_map(device_handle, 4096, 4096,
+   AMDGPU_GEM_DOMAIN_GTT, 0,
+   &ib_result_ce_handle, &ib_result_ce_cpu,
+   &ib_result_ce_mc_address, &va_handle_ce);
+   CU_ASSERT_EQUAL(r, 0);
+
+   r = amdgpu_get_bo_list(device_handle, ib_result_handle,
+  ib_result_ce_handle, &bo_list);
+   CU_ASSERT_EQUAL(r, 0);
+
+   memset(ib_info, 0, 2 * sizeof(struct amdgpu_cs_ib_info));
+
+   /* IT_SET_CE_DE_COUNTERS */
+   ptr = ib_result_ce_cpu;
+   ptr[0] = 0xc0008900;
+   ptr[1] = 0;
+   ptr[2] = 0xc0008400;
+   ptr[3] = 1;
+   ib_info[0].ib_mc_address = ib_result_ce_mc_address;
+   ib_info[0].size = 4;
+   ib_info[0].flags = AMDGPU_IB_FLAG_CE;
+
+   /* IT_WAIT_ON_CE_COUNTER */
+   ptr = ib_result_cpu;
+   ptr[0] = 0xc0008600;
+   ptr[1] = 0x0001;
+   ib_info[1].ib_mc_address = ib_result_mc_address;
+   ib_info[1].size = 2;
+
+   for (i = 0; i < ib_cs_num; i++) {
+   ibs_request[i].ip_type = AMDGPU_HW_IP_GFX;
+   ibs_request[i].number_of_ibs = 2;
+   ibs_request[i].ibs = ib_info;
+   ibs_request[i].resources = bo_list;
+   ibs_request[i].fence_info.handle = NULL;
+   }
+
+   r = amdgpu_cs_submit(context_handle, 0,ibs_request, ib_cs_num);
+
+   CU_ASSERT_EQUAL(r, 0);
+
+   for (i = 0; i 

Re: [PATCH] drm/dp-helper: DP_TEST_MISC1 should be DP_TEST_MISC0

2017-04-13 Thread Alex Deucher
On Thu, Apr 13, 2017 at 10:34 AM, Harry Wentland  wrote:
> Bring this in line with the spec and with what has been committed in the upstream drm tree.
>
> Signed-off-by: Harry Wentland 

Acked-by: Alex Deucher 

> ---
>
> This brings this definition in amd-staging-4.9 in line with upstream.
>
>  include/drm/drm_dp_helper.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/include/drm/drm_dp_helper.h b/include/drm/drm_dp_helper.h
> index 4b14a7674be1..d6a5015976d9 100644
> --- a/include/drm/drm_dp_helper.h
> +++ b/include/drm/drm_dp_helper.h
> @@ -419,7 +419,7 @@
>
>  #define DP_TEST_PATTERN0x221
>
> -#define DP_TEST_MISC1   0x232
> +#define DP_TEST_MISC0   0x232
>
>  #define DP_TEST_CRC_R_CR   0x240
>  #define DP_TEST_CRC_G_Y0x242
> --
> 2.11.0
>
> ___
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH] drm/dp-helper: DP_TEST_MISC1 should be DP_TEST_MISC0

2017-04-13 Thread Harry Wentland
Bring this in line with the spec and with what has been committed in the upstream drm tree.

Signed-off-by: Harry Wentland 
---

This brings this definition in amd-staging-4.9 in line with upstream.

 include/drm/drm_dp_helper.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/drm/drm_dp_helper.h b/include/drm/drm_dp_helper.h
index 4b14a7674be1..d6a5015976d9 100644
--- a/include/drm/drm_dp_helper.h
+++ b/include/drm/drm_dp_helper.h
@@ -419,7 +419,7 @@
 
 #define DP_TEST_PATTERN0x221
 
-#define DP_TEST_MISC1   0x232
+#define DP_TEST_MISC0   0x232
 
 #define DP_TEST_CRC_R_CR   0x240
 #define DP_TEST_CRC_G_Y0x242
-- 
2.11.0

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


RE: [PATCH] drm/amdgpu: fix dead lock if any ip block resume failed in s3

2017-04-13 Thread Deucher, Alexander
> -Original Message-
> From: Huang Rui [mailto:ray.hu...@amd.com]
> Sent: Thursday, April 13, 2017 4:12 AM
> To: amd-gfx@lists.freedesktop.org; Deucher, Alexander
> Cc: Koenig, Christian; Wang, Ken; Huang, Ray
> Subject: [PATCH] drm/amdgpu: fix dead lock if any ip block resume failed in
> s3
> 
> Driver must free the console lock whether driver resume succeeds or
> not.  Otherwise, fb_console will wait for the lock forever and the
> system will get stuck.
> 
> [  244.405541] INFO: task kworker/0:0:4 blocked for more than 120 seconds.
> [  244.405543]   Tainted: G   OE   4.9.0-custom #1
> [  244.405544] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> this message.
> [  244.405541] INFO: task kworker/0:0:4 blocked for more than 120 seconds.
> [  244.405543]   Tainted: G   OE   4.9.0-custom #1
> [  244.405544] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> this message.
> [  244.405550] kworker/0:0 D0 4  2 0x0008
> [  244.405559] Workqueue: events console_callback
> [  244.405564]  88045a2cfc00  880462b75940
> 81c0e500
> [  244.405568]  880476419280 c900018f7c90 817dcf62
> 003c
> [  244.405572]  0001 0002 880462b75940
> 880462b75940
> [  244.405573] Call Trace:
> [  244.405580]  [] ? __schedule+0x222/0x6a0
> [  244.405584]  [] schedule+0x36/0x80
> [  244.405588]  [] schedule_timeout+0x1fc/0x390
> [  244.405592]  [] __down_common+0xa5/0xf8
> [  244.405598]  [] ? put_prev_entity+0x48/0x710
> [  244.405601]  [] __down+0x1d/0x1f
> [  244.405606]  [] down+0x41/0x50
> [  244.405611]  [] console_lock+0x1a/0x40
> [  244.405614]  [] console_callback+0x13/0x160
> [  244.405617]  [] ? __schedule+0x22a/0x6a0
> [  244.405623]  [] process_one_work+0x153/0x3f0
> [  244.405628]  [] worker_thread+0x12b/0x4b0
> [  244.405633]  [] ? rescuer_thread+0x350/0x350
> [  244.405637]  [] kthread+0xd3/0xf0
> [  244.405641]  [] ? kthread_park+0x60/0x60
> [  244.405645]  [] ? kthread_park+0x60/0x60
> [  244.405649]  [] ret_from_fork+0x25/0x30
> 
> Signed-off-by: Huang Rui 

Reviewed-by: Alex Deucher 

> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 28 ---
> -
>  1 file changed, 12 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index bd3a0d5..abb4dcc 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -2280,7 +2280,7 @@ int amdgpu_device_resume(struct drm_device
> *dev, bool resume, bool fbcon)
>   struct drm_connector *connector;
>   struct amdgpu_device *adev = dev->dev_private;
>   struct drm_crtc *crtc;
> - int r;
> + int r = 0;
> 
>   if (dev->switch_power_state == DRM_SWITCH_POWER_OFF)
>   return 0;
> @@ -2292,11 +2292,8 @@ int amdgpu_device_resume(struct drm_device
> *dev, bool resume, bool fbcon)
>   pci_set_power_state(dev->pdev, PCI_D0);
>   pci_restore_state(dev->pdev);
>   r = pci_enable_device(dev->pdev);
> - if (r) {
> - if (fbcon)
> - console_unlock();
> - return r;
> - }
> + if (r)
> + goto unlock;
>   }
>   if (adev->is_atom_fw)
>   amdgpu_atomfirmware_scratch_regs_restore(adev);
> @@ -2313,7 +2310,7 @@ int amdgpu_device_resume(struct drm_device
> *dev, bool resume, bool fbcon)
>   r = amdgpu_resume(adev);
>   if (r) {
>   DRM_ERROR("amdgpu_resume failed (%d).\n", r);
> - return r;
> + goto unlock;
>   }
>   amdgpu_fence_driver_resume(adev);
> 
> @@ -2324,11 +2321,8 @@ int amdgpu_device_resume(struct drm_device
> *dev, bool resume, bool fbcon)
>   }
> 
>   r = amdgpu_late_init(adev);
> - if (r) {
> - if (fbcon)
> - console_unlock();
> - return r;
> - }
> + if (r)
> + goto unlock;
> 
>   /* pin cursors */
>   list_for_each_entry(crtc, &dev->mode_config.crtc_list, head) {
> @@ -2349,7 +2343,7 @@ int amdgpu_device_resume(struct drm_device
> *dev, bool resume, bool fbcon)
>   }
>   r = amdgpu_amdkfd_resume(adev);
>   if (r)
> - return r;
> + goto unlock;
> 
>   /* blat the mode back in */
>   if (fbcon) {
> @@ -2396,12 +2390,14 @@ int amdgpu_device_resume(struct drm_device
> *dev, bool resume, bool fbcon)
>   dev->dev->power.disable_depth--;
>  #endif
> 
> - if (fbcon) {
> + if (fbcon)
>   amdgpu_fbdev_set_suspend(adev, 0);
> +
> +unlock:
> + if (fbcon)
>   console_unlock();
> - }
> 
> - return 0;
> + return r;
>  }
> 
>  static bool amdgpu_check_soft_reset(struct amdgpu_device *adev)
> --
> 2.7.4

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH] drm/amdgpu: fix dead lock if any ip block resume failed in s3

2017-04-13 Thread Michel Dänzer
On 13/04/17 05:12 PM, Huang Rui wrote:
> Driver must free the console lock whether driver resume succeeds or
> not.  Otherwise, fb_console will wait for the lock forever and the
> system will get stuck.
> 
> [  244.405541] INFO: task kworker/0:0:4 blocked for more than 120 seconds.
> [  244.405543]   Tainted: G   OE   4.9.0-custom #1
> [  244.405544] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
> this message.
> [  244.405541] INFO: task kworker/0:0:4 blocked for more than 120 seconds.
> [  244.405543]   Tainted: G   OE   4.9.0-custom #1
> [  244.405544] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
> this message.
> [  244.405550] kworker/0:0 D0 4  2 0x0008
> [  244.405559] Workqueue: events console_callback
> [  244.405564]  88045a2cfc00  880462b75940 
> 81c0e500
> [  244.405568]  880476419280 c900018f7c90 817dcf62 
> 003c
> [  244.405572]  0001 0002 880462b75940 
> 880462b75940
> [  244.405573] Call Trace:
> [  244.405580]  [] ? __schedule+0x222/0x6a0
> [  244.405584]  [] schedule+0x36/0x80
> [  244.405588]  [] schedule_timeout+0x1fc/0x390
> [  244.405592]  [] __down_common+0xa5/0xf8
> [  244.405598]  [] ? put_prev_entity+0x48/0x710
> [  244.405601]  [] __down+0x1d/0x1f
> [  244.405606]  [] down+0x41/0x50
> [  244.405611]  [] console_lock+0x1a/0x40
> [  244.405614]  [] console_callback+0x13/0x160
> [  244.405617]  [] ? __schedule+0x22a/0x6a0
> [  244.405623]  [] process_one_work+0x153/0x3f0
> [  244.405628]  [] worker_thread+0x12b/0x4b0
> [  244.405633]  [] ? rescuer_thread+0x350/0x350
> [  244.405637]  [] kthread+0xd3/0xf0
> [  244.405641]  [] ? kthread_park+0x60/0x60
> [  244.405645]  [] ? kthread_park+0x60/0x60
> [  244.405649]  [] ret_from_fork+0x25/0x30
> 
> Signed-off-by: Huang Rui 

Reviewed-by: Michel Dänzer 


-- 
Earthling Michel Dänzer   |   http://www.amd.com
Libre software enthusiast | Mesa and X developer
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 2/3] drm/amdgpu: add gtt print like vram when dump mm table

2017-04-13 Thread Chunming Zhou
Change-Id: If0474e24e14d237d2d55731871c5ceb11e5a3601
Signed-off-by: Chunming Zhou 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c | 6 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 4 
 2 files changed, 10 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
index 8a950a5..4bc1dd6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
@@ -138,6 +138,12 @@ int amdgpu_gtt_mgr_alloc(struct ttm_mem_type_manager *man,
return r;
 }
 
+void amdgpu_gtt_mgr_print(struct seq_file *m, struct ttm_mem_type_manager *man)
+{
+   struct amdgpu_gtt_mgr *mgr = man->priv;
+   seq_printf(m, "man size:%llu pages, gtt available:%llu pages\n",
+  man->size,   mgr->available);
+}
 /**
  * amdgpu_gtt_mgr_new - allocate a new node
  *
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index c3112b6..688056e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -1540,6 +1540,8 @@ int amdgpu_fill_buffer(struct amdgpu_bo *bo,
 
 #if defined(CONFIG_DEBUG_FS)
 
+extern void amdgpu_gtt_mgr_print(struct seq_file *m, struct 
ttm_mem_type_manager
+*man);
 static int amdgpu_mm_dump_table(struct seq_file *m, void *data)
 {
struct drm_info_node *node = (struct drm_info_node *)m->private;
@@ -1558,6 +1560,8 @@ static int amdgpu_mm_dump_table(struct seq_file *m, void 
*data)
   adev->mman.bdev.man[ttm_pl].size,
   (u64)atomic64_read(&adev->vram_usage) >> 20,
   (u64)atomic64_read(&adev->vram_vis_usage) >> 20);
+   if (ttm_pl == TTM_PL_TT)
+   amdgpu_gtt_mgr_print(m, &adev->mman.bdev.man[ttm_pl]);
return ret;
 }
 
-- 
1.9.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 3/3] drm/amdgpu: move gtt usage statistic to gtt mgr

2017-04-13 Thread Chunming Zhou
Change-Id: Ifea42c8ae2206143d7e22b35eea537ba9e928fe8
Signed-off-by: Chunming Zhou 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c | 13 ++---
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c  |  6 --
 2 files changed, 10 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
index 4bc1dd6..4b282ec 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
@@ -97,6 +97,7 @@ int amdgpu_gtt_mgr_alloc(struct ttm_mem_type_manager *man,
 {
struct amdgpu_gtt_mgr *mgr = man->priv;
struct drm_mm_node *node = mem->mm_node;
+   struct amdgpu_device *adev = amdgpu_ttm_adev(man->bdev);
enum drm_mm_search_flags sflags = DRM_MM_SEARCH_BEST;
enum drm_mm_allocator_flags aflags = DRM_MM_CREATE_DEFAULT;
unsigned long fpfn, lpfn;
@@ -124,8 +125,10 @@ int amdgpu_gtt_mgr_alloc(struct ttm_mem_type_manager *man,
r = drm_mm_insert_node_in_range_generic(&mgr->mm, node, mem->num_pages,
mem->page_alignment, 0,
fpfn, lpfn, sflags, aflags);
-   if (!r)
+   if (!r) {
mgr->available -= mem->num_pages;
+   atomic64_add(mem->size, &adev->gtt_usage);
+   }
spin_unlock(&mgr->lock);
 
if (!r) {
@@ -140,9 +143,11 @@ int amdgpu_gtt_mgr_alloc(struct ttm_mem_type_manager *man,
 
 void amdgpu_gtt_mgr_print(struct seq_file *m, struct ttm_mem_type_manager *man)
 {
+   struct amdgpu_device *adev = amdgpu_ttm_adev(man->bdev);
struct amdgpu_gtt_mgr *mgr = man->priv;
-   seq_printf(m, "man size:%llu pages, gtt available:%llu pages\n",
-  man->size,   mgr->available);
+   seq_printf(m, "man size:%llu pages, gtt available:%llu pages, 
usage:%lluMB\n",
+  man->size,   mgr->available,
+  (u64)atomic64_read(&adev->gtt_usage) >> 20);
 }
 /**
  * amdgpu_gtt_mgr_new - allocate a new node
@@ -213,6 +218,7 @@ static void amdgpu_gtt_mgr_del(struct ttm_mem_type_manager 
*man,
 {
struct amdgpu_gtt_mgr *mgr = man->priv;
struct drm_mm_node *node = mem->mm_node;
+   struct amdgpu_device *adev = amdgpu_ttm_adev(man->bdev);
 
if (!node)
return;
@@ -221,6 +227,7 @@ static void amdgpu_gtt_mgr_del(struct ttm_mem_type_manager 
*man,
if (node->start != AMDGPU_BO_INVALID_OFFSET) {
drm_mm_remove_node(node);
mgr->available += mem->num_pages;
+   atomic64_sub(mem->size, &adev->gtt_usage);
}
spin_unlock(&mgr->lock);
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index 3cde1c9..2249eb6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -61,9 +61,6 @@ static void amdgpu_update_memory_usage(struct amdgpu_device 
*adev,
 
if (new_mem) {
switch (new_mem->mem_type) {
-   case TTM_PL_TT:
-   atomic64_add(new_mem->size, &adev->gtt_usage);
-   break;
case TTM_PL_VRAM:
atomic64_add(new_mem->size, &adev->vram_usage);
vis_size = amdgpu_get_vis_part_size(adev, new_mem);
@@ -80,9 +77,6 @@ static void amdgpu_update_memory_usage(struct amdgpu_device 
*adev,
 
if (old_mem) {
switch (old_mem->mem_type) {
-   case TTM_PL_TT:
-   atomic64_sub(old_mem->size, &adev->gtt_usage);
-   break;
case TTM_PL_VRAM:
atomic64_sub(old_mem->size, &adev->vram_usage);
vis_size = amdgpu_get_vis_part_size(adev, old_mem);
-- 
1.9.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH 1/3] drm/amdgpu: fix gtt mgr available statistics

2017-04-13 Thread Chunming Zhou
gtt_mgr_alloc is called from many places in the driver itself, while
gtt_mgr_new is called through get_node in TTM.

Change-Id: Ia5a18a3b531a01ad7d47f40e08f778e7b94c048a
Signed-off-by: Chunming Zhou 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c | 11 +--
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
index 69ab2ee..8a950a5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c
@@ -124,6 +124,8 @@ int amdgpu_gtt_mgr_alloc(struct ttm_mem_type_manager *man,
r = drm_mm_insert_node_in_range_generic(&mgr->mm, node, mem->num_pages,
mem->page_alignment, 0,
fpfn, lpfn, sflags, aflags);
+   if (!r)
+   mgr->available -= mem->num_pages;
spin_unlock(&mgr->lock);
 
if (!r) {
@@ -160,7 +162,6 @@ static int amdgpu_gtt_mgr_new(struct ttm_mem_type_manager 
*man,
spin_unlock(&mgr->lock);
return 0;
}
-   mgr->available -= mem->num_pages;
spin_unlock(&mgr->lock);
 
node = kzalloc(sizeof(*node), GFP_KERNEL);
@@ -187,9 +188,6 @@ static int amdgpu_gtt_mgr_new(struct ttm_mem_type_manager 
*man,
 
return 0;
 err_out:
-   spin_lock(&mgr->lock);
-   mgr->available += mem->num_pages;
-   spin_unlock(&mgr->lock);
 
return r;
 }
@@ -214,9 +212,10 @@ static void amdgpu_gtt_mgr_del(struct ttm_mem_type_manager 
*man,
return;
 
spin_lock(&mgr->lock);
-   if (node->start != AMDGPU_BO_INVALID_OFFSET)
+   if (node->start != AMDGPU_BO_INVALID_OFFSET) {
drm_mm_remove_node(node);
-   mgr->available += mem->num_pages;
+   mgr->available += mem->num_pages;
+   }
spin_unlock(&mgr->lock);
 
kfree(node);
-- 
1.9.1
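
The fix restores a simple invariant: mgr->available changes only where an
address range is actually inserted into or removed from the drm_mm, because
gtt_mgr_new may also hand back a deferred node at AMDGPU_BO_INVALID_OFFSET
that holds no range yet. A toy userspace model of that invariant, with
hypothetical names and the manager's locking omitted:

#include <stdbool.h>
#include <stdint.h>

#define TOY_INVALID_OFFSET UINT64_MAX

struct toy_mgr  { uint64_t available; };        /* in pages */
struct toy_node { uint64_t start, num_pages; };

/* alloc path: the only place a range is really inserted */
static bool toy_alloc(struct toy_mgr *mgr, struct toy_node *node)
{
        if (mgr->available < node->num_pages)
                return false;                   /* nothing inserted, nothing charged */
        node->start = 0;                        /* pretend drm_mm insertion succeeded */
        mgr->available -= node->num_pages;      /* charge exactly once, on success */
        return true;
}

/* new path: may defer placement instead of inserting */
static void toy_new_deferred(struct toy_node *node)
{
        node->start = TOY_INVALID_OFFSET;       /* holds no range, so no charge */
}

/* del path: refund only nodes that actually held a range */
static void toy_del(struct toy_mgr *mgr, struct toy_node *node)
{
        if (node->start != TOY_INVALID_OFFSET)
                mgr->available += node->num_pages;
}

int main(void)
{
        struct toy_mgr mgr = { .available = 1024 };
        struct toy_node placed = { .num_pages = 256 };
        struct toy_node deferred = { .num_pages = 512 };

        toy_alloc(&mgr, &placed);       /* available: 768 */
        toy_new_deferred(&deferred);    /* available unchanged */
        toy_del(&mgr, &deferred);       /* still 768: no range to refund */
        toy_del(&mgr, &placed);         /* back to 1024 */
        return 0;
}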

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


[PATCH] drm/amdgpu: fix deadlock if any IP block resume fails in S3

2017-04-13 Thread Huang Rui
The driver must release the console lock whether resume succeeds or
fails.  Otherwise, fb_console waits on the lock forever and the system
hangs.

[  244.405541] INFO: task kworker/0:0:4 blocked for more than 120 seconds.
[  244.405543]   Tainted: G   OE   4.9.0-custom #1
[  244.405544] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  244.405550] kworker/0:0     D    0     4      2 0x00000008
[  244.405559] Workqueue: events console_callback
[  244.405564]  88045a2cfc00  880462b75940 
81c0e500
[  244.405568]  880476419280 c900018f7c90 817dcf62 
003c
[  244.405572]  0001 0002 880462b75940 
880462b75940
[  244.405573] Call Trace:
[  244.405580]  [] ? __schedule+0x222/0x6a0
[  244.405584]  [] schedule+0x36/0x80
[  244.405588]  [] schedule_timeout+0x1fc/0x390
[  244.405592]  [] __down_common+0xa5/0xf8
[  244.405598]  [] ? put_prev_entity+0x48/0x710
[  244.405601]  [] __down+0x1d/0x1f
[  244.405606]  [] down+0x41/0x50
[  244.405611]  [] console_lock+0x1a/0x40
[  244.405614]  [] console_callback+0x13/0x160
[  244.405617]  [] ? __schedule+0x22a/0x6a0
[  244.405623]  [] process_one_work+0x153/0x3f0
[  244.405628]  [] worker_thread+0x12b/0x4b0
[  244.405633]  [] ? rescuer_thread+0x350/0x350
[  244.405637]  [] kthread+0xd3/0xf0
[  244.405641]  [] ? kthread_park+0x60/0x60
[  244.405645]  [] ? kthread_park+0x60/0x60
[  244.405649]  [] ret_from_fork+0x25/0x30

Signed-off-by: Huang Rui 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 28 
 1 file changed, 12 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index bd3a0d5..abb4dcc 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2280,7 +2280,7 @@ int amdgpu_device_resume(struct drm_device *dev, bool resume, bool fbcon)
struct drm_connector *connector;
struct amdgpu_device *adev = dev->dev_private;
struct drm_crtc *crtc;
-   int r;
+   int r = 0;
 
if (dev->switch_power_state == DRM_SWITCH_POWER_OFF)
return 0;
@@ -2292,11 +2292,8 @@ int amdgpu_device_resume(struct drm_device *dev, bool resume, bool fbcon)
pci_set_power_state(dev->pdev, PCI_D0);
pci_restore_state(dev->pdev);
r = pci_enable_device(dev->pdev);
-   if (r) {
-   if (fbcon)
-   console_unlock();
-   return r;
-   }
+   if (r)
+   goto unlock;
}
if (adev->is_atom_fw)
amdgpu_atomfirmware_scratch_regs_restore(adev);
@@ -2313,7 +2310,7 @@ int amdgpu_device_resume(struct drm_device *dev, bool resume, bool fbcon)
r = amdgpu_resume(adev);
if (r) {
DRM_ERROR("amdgpu_resume failed (%d).\n", r);
-   return r;
+   goto unlock;
}
amdgpu_fence_driver_resume(adev);
 
@@ -2324,11 +2321,8 @@ int amdgpu_device_resume(struct drm_device *dev, bool resume, bool fbcon)
}
 
r = amdgpu_late_init(adev);
-   if (r) {
-   if (fbcon)
-   console_unlock();
-   return r;
-   }
+   if (r)
+   goto unlock;
 
/* pin cursors */
list_for_each_entry(crtc, &dev->mode_config.crtc_list, head) {
@@ -2349,7 +2343,7 @@ int amdgpu_device_resume(struct drm_device *dev, bool resume, bool fbcon)
}
r = amdgpu_amdkfd_resume(adev);
if (r)
-   return r;
+   goto unlock;
 
/* blat the mode back in */
if (fbcon) {
@@ -2396,12 +2390,14 @@ int amdgpu_device_resume(struct drm_device *dev, bool resume, bool fbcon)
dev->dev->power.disable_depth--;
 #endif
 
-   if (fbcon) {
+   if (fbcon)
amdgpu_fbdev_set_suspend(adev, 0);
+
+unlock:
+   if (fbcon)
console_unlock();
-   }
 
-   return 0;
+   return r;
 }
 
 static bool amdgpu_check_soft_reset(struct amdgpu_device *adev)
-- 
2.7.4
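
The shape of the fix is the classic single-exit pattern: every failure after
the console lock is (conditionally) taken jumps to one label that releases
it, instead of returning early and leaking the lock. A hypothetical userspace
sketch of the same shape, with pthread standing in for
console_lock()/console_unlock():

#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t console_lock = PTHREAD_MUTEX_INITIALIZER;

static int step_ok(void)   { return 0; }        /* stand-in for a resume step */
static int step_fail(void) { return -5; }       /* pretend a later step returns -EIO */

static int toy_resume(int fbcon)
{
        int r = 0;

        if (fbcon)
                pthread_mutex_lock(&console_lock);

        r = step_ok();
        if (r)
                goto unlock;

        r = step_fail();        /* fails, but still reaches the unlock */
        if (r)
                goto unlock;

unlock:
        if (fbcon)
                pthread_mutex_unlock(&console_lock);
        return r;
}

int main(void)
{
        printf("toy_resume returned %d\n", toy_resume(1));      /* -5, lock released */
        return 0;
}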

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [RfC PATCH] drm: fourcc byteorder: brings header file comments in line with reality.

2017-04-13 Thread Pekka Paalanen
On Tue, 11 Apr 2017 13:23:53 +0200
Gerd Hoffmann  wrote:

>   Hi,
> 
> > > Just let know what you need tested, I should be able to turn it around
> > > within a couple of days.  
> > 
> > That's part of my problem. I don't really know what should be tested.
> > What do people do with their BE machines that we should avoid breaking?  
> 
> For the virtual machine use case the bar is pretty low, it's mostly
> about a graphical server console.  Anaconda installer.  Gnome desktop
> with browser and standard xorg (xterm) + gtk apps.  No heavy OpenGL
> stuff.  No hardware acceleration, so if opengl is used then it'll be
> llvmpipe.
> 
> Right now Xorg is important.  Not sure whenever wayland ever will be,
> possibly the ppc64 -> ppc64le switch goes faster than the xorg ->
> wayland switch.

Hi,

IMHO you can ignore Wayland for now. I just wanted to point out that we
have similar problems there, and whatever you do with the DRM format
codes will affect Wayland too.

Once you get things hashed out on an X.org-based stack, we can look at
what it means for Wayland software.

After all, BE users are scarce and allegedly favour old software to
avoid breakage; Wayland is new, and Wayland compositors are still rare,
so the intersection of people who use both BE and Wayland and rely on
that combination to work is... minuscule? insignificant?

I don't mean to belittle people who use Wayland on BE, but judging by
that one bug report, EGL is broken and probably has been for a while;
it's unclear whether anything has ever worked.


Thanks,
pq


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx