[PATCH] drm/amd/display: remove gcc warning Wunused-but-set-variable

2019-10-18 Thread Chen Wandun
From: Chenwandun 

drivers/gpu/drm/amd/display/dc/dce/dce_aux.c: In function 
dce_aux_configure_timeout:
drivers/gpu/drm/amd/display/dc/dce/dce_aux.c: warning: variable timeout set but 
not used [-Wunused-but-set-variable]

Signed-off-by: Chenwandun 
---
 drivers/gpu/drm/amd/display/dc/dce/dce_aux.c | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dce/dce_aux.c 
b/drivers/gpu/drm/amd/display/dc/dce/dce_aux.c
index 976bd49..739f8e2 100644
--- a/drivers/gpu/drm/amd/display/dc/dce/dce_aux.c
+++ b/drivers/gpu/drm/amd/display/dc/dce/dce_aux.c
@@ -432,7 +432,6 @@ static bool dce_aux_configure_timeout(struct ddc_service 
*ddc,
 {
uint32_t multiplier = 0;
uint32_t length = 0;
-   uint32_t timeout = 0;
struct ddc *ddc_pin = ddc->ddc_pin;
	struct dce_aux *aux_engine = ddc->ctx->dc->res_pool->engines[ddc_pin->pin_data->en];
struct aux_engine_dce110 *aux110 = FROM_AUX_ENGINE(aux_engine);
@@ -446,25 +445,21 @@ static bool dce_aux_configure_timeout(struct ddc_service 
*ddc,
length = timeout_in_us/TIME_OUT_MULTIPLIER_8;
if (timeout_in_us % TIME_OUT_MULTIPLIER_8 != 0)
length++;
-   timeout = length * TIME_OUT_MULTIPLIER_8;
} else if (timeout_in_us <= 2 * TIME_OUT_INCREMENT) {
multiplier = 1;
length = timeout_in_us/TIME_OUT_MULTIPLIER_16;
if (timeout_in_us % TIME_OUT_MULTIPLIER_16 != 0)
length++;
-   timeout = length * TIME_OUT_MULTIPLIER_16;
} else if (timeout_in_us <= 4 * TIME_OUT_INCREMENT) {
multiplier = 2;
length = timeout_in_us/TIME_OUT_MULTIPLIER_32;
if (timeout_in_us % TIME_OUT_MULTIPLIER_32 != 0)
length++;
-   timeout = length * TIME_OUT_MULTIPLIER_32;
} else if (timeout_in_us > 4 * TIME_OUT_INCREMENT) {
multiplier = 3;
length = timeout_in_us/TIME_OUT_MULTIPLIER_64;
if (timeout_in_us % TIME_OUT_MULTIPLIER_64 != 0)
length++;
-   timeout = length * TIME_OUT_MULTIPLIER_64;
}
 
length = (length < MAX_TIMEOUT_LENGTH) ? length : MAX_TIMEOUT_LENGTH;
-- 
2.7.4



RE: [PATCH v2] drm/amdkfd: kfd open return failed if device is locked

2019-10-18 Thread Zeng, Oak
In the current implementation, even when dqm is stopped, users can still create
(and start) new queues. This is not correct. We should forbid creating/starting
new queues while dqm is stopped - stopping means halting the currently
executing queues and no longer accepting new queue-creation requests.

Regards,
Oak

-Original Message-
From: amd-gfx  On Behalf Of Kuehling, 
Felix
Sent: Friday, October 18, 2019 3:08 PM
To: amd-gfx@lists.freedesktop.org; Yang, Philip 
Subject: Re: [PATCH v2] drm/amdkfd: kfd open return failed if device is locked

On 2019-10-18 1:36 p.m., Yang, Philip wrote:
> If the device is locked for suspend and resume, kfd open should return
> -EAGAIN without creating a process; otherwise, if suspend/resume gets
> stuck somewhere, the application's exit path that releases the process
> will hang waiting for resume to finish. This is the backtrace:
>
> v2: fix processes that were created before suspend/resume got stuck
>
> [Thu Oct 17 16:43:37 2019] INFO: task rocminfo:3024 blocked for more 
> than 120 seconds.
> [Thu Oct 17 16:43:37 2019]   Not tainted
> 5.0.0-rc1-kfd-compute-rocm-dkms-no-npi-1131 #1 [Thu Oct 17 16:43:37 
> 2019] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
> message.
> [Thu Oct 17 16:43:37 2019] rocminfoD0  3024   2947
> 0x8000
> [Thu Oct 17 16:43:37 2019] Call Trace:
> [Thu Oct 17 16:43:37 2019]  ? __schedule+0x3d9/0x8a0 [Thu Oct 17 
> 16:43:37 2019]  schedule+0x32/0x70 [Thu Oct 17 16:43:37 2019]  
> schedule_preempt_disabled+0xa/0x10
> [Thu Oct 17 16:43:37 2019]  __mutex_lock.isra.9+0x1e3/0x4e0 [Thu Oct 
> 17 16:43:37 2019]  ? __call_srcu+0x264/0x3b0 [Thu Oct 17 16:43:37 
> 2019]  ? process_termination_cpsch+0x24/0x2f0
> [amdgpu]
> [Thu Oct 17 16:43:37 2019]  process_termination_cpsch+0x24/0x2f0
> [amdgpu]
> [Thu Oct 17 16:43:37 2019]
> kfd_process_dequeue_from_all_devices+0x42/0x60 [amdgpu] [Thu Oct 17 
> 16:43:37 2019]  kfd_process_notifier_release+0x1be/0x220
> [amdgpu]
> [Thu Oct 17 16:43:37 2019]  __mmu_notifier_release+0x3e/0xc0 [Thu Oct 
> 17 16:43:37 2019]  exit_mmap+0x160/0x1a0 [Thu Oct 17 16:43:37 2019]  ? 
> __handle_mm_fault+0xba3/0x1200 [Thu Oct 17 16:43:37 2019]  ? 
> exit_robust_list+0x5a/0x110 [Thu Oct 17 16:43:37 2019]  
> mmput+0x4a/0x120 [Thu Oct 17 16:43:37 2019]  do_exit+0x284/0xb20 [Thu 
> Oct 17 16:43:37 2019]  ? handle_mm_fault+0xfa/0x200 [Thu Oct 17 
> 16:43:37 2019]  do_group_exit+0x3a/0xa0 [Thu Oct 17 16:43:37 2019]  
> __x64_sys_exit_group+0x14/0x20 [Thu Oct 17 16:43:37 2019]  
> do_syscall_64+0x4f/0x100 [Thu Oct 17 16:43:37 2019]  
> entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> Signed-off-by: Philip Yang 
> ---
>   drivers/gpu/drm/amd/amdkfd/kfd_chardev.c   | 6 +++---
>   drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c | 6 ++++++
>   2 files changed, 9 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> index d9e36dbf13d5..40d75c39f08e 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> @@ -120,13 +120,13 @@ static int kfd_open(struct inode *inode, struct file 
> *filep)
>   return -EPERM;
>   }
>   
> + if (kfd_is_locked())
> + return -EAGAIN;
> +
>   process = kfd_create_process(filep);
>   if (IS_ERR(process))
>   return PTR_ERR(process);
>   
> - if (kfd_is_locked())
> - return -EAGAIN;
> -
>   dev_dbg(kfd_device, "process %d opened, compat mode (32 bit) - %d\n",
>   process->pasid, process->is_32bit_user_mode);
>   
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
> index 8509814a6ff0..3784013b92a0 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
> @@ -128,6 +128,12 @@ void kfd_process_dequeue_from_all_devices(struct 
> kfd_process *p)
>   {
>   struct kfd_process_device *pdd;
>   
> + /* If suspend/resume got stuck, dqm_lock is held,
> +  * skip process_termination_cpsch to avoid deadlock
> +  */
> + if (kfd_is_locked())
> + return;
> +

Holding the DQM lock during reset has caused other problems (lock dependency 
issues and deadlocks) and I was thinking about getting rid of that completely. 
The intention of holding the DQM lock during reset was to prevent the device 
queue manager from accessing the CP hardware while a reset was in progress. 
However, I think there are smarter ways to achieve that. We already get a 
pre-reset callback (kgd2kfd_pre_reset) that executes the kgd2kfd_suspend, which 
suspends processes and stops DQM through kfd->dqm->ops.stop(kfd->dqm). This 
should take care of most of the problem. If there are any places in DQM that 
try to access the devices, they should add conditions to not access HW while 
DQM is stopped. Then we could avoid holding a lock indefinitely 

amdgpu dumping during boot in picasso

2019-10-18 Thread Ken Moffat
I recently upgraded the mobo in an old machine to now use a Ryzen 5
3400G (Picasso APU). With 5.3 kernels it seems to be running well, but
an unrelated issue caused me to look at dmesg: it dumps during boot,
and then there may be related dumps later, possibly from resuming from
suspend or hibernation. The earliest usable kernel is 5.0 (Picasso is
too new for original 4.19 which doesn't load amdgpu) and it turns out
that all kernels up to and including linus's tree of a few hours ago
do this. From linus's tree:

[2.445827] [drm] DM_PPLIB: values for F clock
[2.445828] [drm] DM_PPLIB:   0 in kHz, 3099 in mV
[2.445829] [drm] DM_PPLIB:   0 in kHz, 3099 in mV
[2.445829] [drm] DM_PPLIB:   0 in kHz, 3099 in mV
[2.445830] [drm] DM_PPLIB:   150 in kHz, 4399 in mV
[2.445839] [ cut here ]
[2.445914] WARNING: CPU: 5 PID: 287 at
drivers/gpu/drm/amd/amdgpu/../display/dc/calcs/dcn_calcs.c:1464
dcn_bw_update_from_pplib+0xa1/0x2e0 [amdgpu]
[2.445914] Modules linked in: amdgpu(+) k10temp mfd_core gpu_sched ttm
[2.445920] CPU: 5 PID: 287 Comm: udevd Tainted: GT
5.4.0-rc3-00124-g7571438a4868 #28
[2.445921] Hardware name: Gigabyte Technology Co., Ltd. A320M-S2H
V2/A320M-S2H V2-CF, BIOS F31 04/15/2019
[2.446006] RIP: 0010:dcn_bw_update_from_pplib+0xa1/0x2e0 [amdgpu]
[2.446008] Code: 24 10 85 c9 74 24 8d 71 ff 48 8d 44 24 14 48 8d
54 f4 1c eb 0d 48 83 c0 08 48 39 d0 0f 84 3e 01 00 00 44 8b 00 45 85
c0 75 eb <0f> 0b e8 c8 4c b6 c9 48 89 ef 4c 89 e2 be 04 00 00 00 e8 28
8d fe
[2.446009] RSP: 0018:b60480533638 EFLAGS: 00010246
[2.446010] RAX: b6048053364c RBX: a17c0a04 RCX: 0004
[2.446011] RDX: b6048053366c RSI: 0003 RDI: 8b008d08
[2.446012] RBP: a17c0bb03500 R08:  R09: 8b7963b4
[2.446013] R10: 0353 R11: 0002e208 R12: b604805336d8
[2.446013] R13: 0001 R14: 000a R15: a17c0bb03500
[2.446015] FS:  7fb95ab5c780() GS:a17c10f4()
knlGS:
[2.446015] CS:  0010 DS:  ES:  CR0: 80050033
[2.446016] CR2: 7fff52ec2348 CR3: 00040d33 CR4: 003406e0
[2.446017] Call Trace:
[2.446022]  ? preempt_count_add+0x44/0xa0
[2.446108]  dcn10_create_resource_pool+0x832/0xb50 [amdgpu]
[2.446177]  ? get_smu_clock_info_v3_1+0x48/0x70 [amdgpu]
[2.446241]  dc_create_resource_pool+0xd5/0x140 [amdgpu]
[2.446308]  ? dal_gpio_service_create+0x84/0x100 [amdgpu]
[2.446371]  dc_create+0x255/0x730 [amdgpu]
[2.446374]  ? lock_timer_base+0x5c/0x80
[2.446376]  ? apic_timer_interrupt+0xa/0x20
[2.446378]  ? kmem_cache_alloc_trace+0x3a/0x1e0
[2.446443]  amdgpu_dm_init+0x161/0x210 [amdgpu]
[2.446509]  ?
phm_wait_for_register_unequal.part.0+0x4b/0x80 [amdgpu]
[2.446574]  dm_hw_init+0x9/0x20 [amdgpu]
[2.446638]  amdgpu_device_init.cold+0x117a/0x1325 [amdgpu]
[2.446692]  amdgpu_driver_load_kms+0x55/0x110 [amdgpu]
[2.446695]  drm_dev_register+0x13c/0x180
[2.446748]  amdgpu_pci_probe+0xd4/0x130 [amdgpu]
[2.446749]  ? __pm_runtime_resume+0x54/0x70
[2.446751]  pci_device_probe+0xc6/0x130
[2.446753]  really_probe+0xfc/0x2d0
[2.446754]  driver_probe_device+0x59/0xd0
[2.446756]  device_driver_attach+0x68/0x70
[2.446757]  __driver_attach+0x54/0xc0
[2.446758]  ? device_driver_attach+0x70/0x70
[2.446758]  bus_for_each_dev+0x87/0xd0
[2.446760]  bus_add_driver+0x18b/0x1e0
[2.446761]  driver_register+0x67/0xb0
[2.446762]  ? 0xc072d000
[2.446763]  do_one_initcall+0x41/0x21f
[2.446765]  ? kmem_cache_alloc_trace+0x3a/0x1e0
[2.446767]  do_init_module+0x59/0x210
[2.446769]  load_module+0x20f5/0x2420
[2.446770]  ? frob_text.isra.0+0x20/0x20
[2.446772]  __do_sys_finit_module+0xfd/0x120
[2.446774]  do_syscall_64+0x43/0x110
[2.446775]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[2.446777] RIP: 0033:0x7fb95acbede9
[2.446778] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00
00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24
08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 77 a0 0c 00 f7 d8 64 89
01 48
[2.446779] RSP: 002b:7fff52ec1568 EFLAGS: 0246 ORIG_RAX:
0139
[2.446780] RAX: ffda RBX: 00e78450 RCX: 7fb95acbede9
[2.446781] RDX:  RSI: 7fb95ada084d RDI: 000d
[2.446781] RBP: 0002 R08:  R09: 7fff52ec1ad0
[2.446782] R10: 000d R11: 0246 R12: 7fb95ada084d
[2.446782] R13:  R14: 00e69c40 R15: 00e78450
[2.446783] ---[ end trace ba451112660fe31d ]---

Full dmesg and config available if required.

Any ideas, please ?

ĸen


Re: [PATCH v4] drm/amd/display: Add MST atomic routines

2019-10-18 Thread Lyude Paul
On Thu, 2019-10-17 at 12:52 -0400, mikita.lip...@amd.com wrote:
> From: Mikita Lipski 
> 
> - Adding encoder atomic check to find vcpi slots for a connector
> - Using DRM helper functions to calculate PBN
> - Adding connector atomic check to release vcpi slots if connector
> loses CRTC
> - Calculate  PBN and VCPI slots only once during atomic
> check and store them on crtc_state to eliminate
> redundant calculation
> - Call drm_dp_mst_atomic_check to verify validity of MST topology
> during state atomic check
> 
> v2:
> - squashed previous 3 separate patches
> - removed DSC PBN calculation,
> - added PBN and VCPI slots properties to amdgpu connector
> 
> v3:
> - moved vcpi_slots and pbn properties to dm_crtc_state and dc_stream_state
> - updates stream's vcpi_slots and pbn on commit
> - separated patch from the DSC MST series
> 
> v4:
> - set vcpi_slots and pbn properties to dm_connector_state
> - copy porperties from connector state on to crtc state
> 
> Cc: Jerry Zuo 
> Cc: Harry Wentland 
> Cc: Nicholas Kazlauskas 
> Cc: Lyude Paul 
> Signed-off-by: Mikita Lipski 
> ---
>  .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 72 +--
>  .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h |  6 ++
>  .../amd/display/amdgpu_dm/amdgpu_dm_helpers.c | 42 +--
>  .../display/amdgpu_dm/amdgpu_dm_mst_types.c   | 32 +
>  drivers/gpu/drm/amd/display/dc/dc_stream.h|  3 +
>  5 files changed, 112 insertions(+), 43 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> index 10cce584719f..1f1146a4e85e 100644
> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> @@ -3811,6 +3811,11 @@ create_stream_for_sink(struct amdgpu_dm_connector
> *aconnector,
>  
>   update_stream_signal(stream, sink);
>  
> + if(dm_state){

nit: if (dm_state) {

But that's it! The rest of this looks good to me. With that nitpick addressed:

Reviewed-by: Lyude Paul 
> + stream->vcpi_slots = dm_state->vcpi_slots;
> + stream->pbn = dm_state->pbn;
> + }
> +
>   if (stream->signal == SIGNAL_TYPE_HDMI_TYPE_A)
> mod_build_hf_vsif_infopacket(stream, &stream->vsp_infopacket,
> false, false);
>  
> @@ -3889,6 +3894,8 @@ dm_crtc_duplicate_state(struct drm_crtc *crtc)
>   state->crc_src = cur->crc_src;
>   state->cm_has_degamma = cur->cm_has_degamma;
>   state->cm_is_degamma_srgb = cur->cm_is_degamma_srgb;
> + state->vcpi_slots = cur->vcpi_slots;
> + state->pbn = cur->pbn;
>  
>   /* TODO Duplicate dc_stream after objects are stream object is
> flattened */
>  
> @@ -4157,7 +4164,8 @@ void amdgpu_dm_connector_funcs_reset(struct
> drm_connector *connector)
>   state->underscan_hborder = 0;
>   state->underscan_vborder = 0;
>   state->base.max_requested_bpc = 8;
> -
> + state->vcpi_slots = 0;
> + state->pbn = 0;
>   if (connector->connector_type == DRM_MODE_CONNECTOR_eDP)
>   state->abm_level = amdgpu_dm_abm_level;
>  
> @@ -4186,7 +4194,8 @@ amdgpu_dm_connector_atomic_duplicate_state(struct
> drm_connector *connector)
>   new_state->underscan_enable = state->underscan_enable;
>   new_state->underscan_hborder = state->underscan_hborder;
>   new_state->underscan_vborder = state->underscan_vborder;
> -
> + new_state->vcpi_slots = state->vcpi_slots;
> + new_state->pbn = state->pbn;
> return &new_state->base;
>  }
>  
> @@ -4587,6 +4596,37 @@ static int dm_encoder_helper_atomic_check(struct
> drm_encoder *encoder,
> struct drm_crtc_state *crtc_state,
> struct drm_connector_state
> *conn_state)
>  {
> + struct drm_atomic_state *state = crtc_state->state;
> + struct drm_connector *connector = conn_state->connector;
> + struct amdgpu_dm_connector *aconnector =
> to_amdgpu_dm_connector(connector);
> + struct dm_connector_state *dm_new_connector_state =
> to_dm_connector_state(conn_state);
> + const struct drm_display_mode *adjusted_mode = &crtc_state->adjusted_mode;
> + struct drm_dp_mst_topology_mgr *mst_mgr;
> + struct drm_dp_mst_port *mst_port;
> + int clock, bpp = 0;
> +
> + if (!aconnector->port || !aconnector->dc_sink)
> + return 0;
> +
> + mst_port = aconnector->port;
> + mst_mgr = &aconnector->mst_port->mst_mgr;
> +
> + if (!crtc_state->connectors_changed && !crtc_state->mode_changed)
> + return 0;
> +
> + if(!state->duplicated) {
> + bpp = (uint8_t)connector->display_info.bpc * 3;
> + clock = adjusted_mode->clock;
> + dm_new_connector_state->pbn = drm_dp_calc_pbn_mode(clock,
> bpp);
> + }
> + dm_new_connector_state->vcpi_slots =
> drm_dp_atomic_find_vcpi_slots(state,
> +

[PATCH AUTOSEL 4.19 088/100] drm/amdgpu: fix memory leak

2019-10-18 Thread Sasha Levin
From: Nirmoy Das 

[ Upstream commit 083164dbdb17c5ea4ad92c1782b59c9d75567790 ]

cleanup error handling code and make sure temporary info array
with the handles are freed by amdgpu_bo_list_put() on
idr_replace()'s failure.

Signed-off-by: Nirmoy Das 
Reviewed-by: Christian König 
Signed-off-by: Alex Deucher 
Signed-off-by: Sasha Levin 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
index b80243d3972e4..ce7f18c5ccb26 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
@@ -264,7 +264,7 @@ int amdgpu_bo_list_ioctl(struct drm_device *dev, void *data,
 
	r = amdgpu_bo_create_list_entry_array(&args->in, &info);
if (r)
-   goto error_free;
+   return r;
 
switch (args->in.operation) {
case AMDGPU_BO_LIST_OP_CREATE:
@@ -277,8 +277,7 @@ int amdgpu_bo_list_ioctl(struct drm_device *dev, void *data,
	r = idr_alloc(&fpriv->bo_list_handles, list, 1, 0, GFP_KERNEL);
	mutex_unlock(&fpriv->bo_list_lock);
if (r < 0) {
-   amdgpu_bo_list_put(list);
-   return r;
+   goto error_put_list;
}
 
handle = r;
@@ -300,9 +299,8 @@ int amdgpu_bo_list_ioctl(struct drm_device *dev, void *data,
	mutex_unlock(&fpriv->bo_list_lock);
 
if (IS_ERR(old)) {
-   amdgpu_bo_list_put(list);
r = PTR_ERR(old);
-   goto error_free;
+   goto error_put_list;
}
 
amdgpu_bo_list_put(old);
@@ -319,8 +317,10 @@ int amdgpu_bo_list_ioctl(struct drm_device *dev, void 
*data,
 
return 0;
 
+error_put_list:
+   amdgpu_bo_list_put(list);
+
 error_free:
-   if (info)
-   kvfree(info);
+   kvfree(info);
return r;
 }
-- 
2.20.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

[PATCH AUTOSEL 4.19 048/100] drm/amd/display: fix odm combine pipe reset

2019-10-18 Thread Sasha Levin
From: Dmytro Laktyushkin 

[ Upstream commit f25f06b67ba237b76092a6fc522b1a94e84bfa85 ]

We fail to reset the second odm combine pipe. This change fixes
odm pointer management.

Signed-off-by: Dmytro Laktyushkin 
Reviewed-by: Tony Cheng 
Acked-by: Bhawanpreet Lakha 
Signed-off-by: Alex Deucher 
Signed-off-by: Sasha Levin 
---
 drivers/gpu/drm/amd/display/dc/core/dc_resource.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_resource.c 
b/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
index d440b28ee43fb..6896d69b8c240 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
@@ -1399,9 +1399,9 @@ bool dc_remove_plane_from_context(
 * For head pipe detach surfaces from pipe for tail
 * pipe just zero it out
 */
-   if (!pipe_ctx->top_pipe ||
-   (!pipe_ctx->top_pipe->top_pipe &&
+   if (!pipe_ctx->top_pipe || (!pipe_ctx->top_pipe->top_pipe &&
	pipe_ctx->top_pipe->stream_res.opp != pipe_ctx->stream_res.opp)) {
+   pipe_ctx->top_pipe = NULL;
pipe_ctx->plane_state = NULL;
pipe_ctx->bottom_pipe = NULL;
} else {
@@ -1803,8 +1803,6 @@ enum dc_status dc_remove_stream_from_ctx(
dc->res_pool->funcs->remove_stream_from_ctx(dc, 
new_ctx, stream);
 
memset(del_pipe, 0, sizeof(*del_pipe));
-
-   break;
}
}
 
-- 
2.20.1


[PATCH AUTOSEL 5.3 68/89] drm/amdgpu: fix memory leak

2019-10-18 Thread Sasha Levin
From: Nirmoy Das 

[ Upstream commit 083164dbdb17c5ea4ad92c1782b59c9d75567790 ]

cleanup error handling code and make sure temporary info array
with the handles are freed by amdgpu_bo_list_put() on
idr_replace()'s failure.

Signed-off-by: Nirmoy Das 
Reviewed-by: Christian König 
Signed-off-by: Alex Deucher 
Signed-off-by: Sasha Levin 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
index 7bcf86c619995..61e38e43ad1d5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c
@@ -270,7 +270,7 @@ int amdgpu_bo_list_ioctl(struct drm_device *dev, void *data,
 
	r = amdgpu_bo_create_list_entry_array(&args->in, &info);
if (r)
-   goto error_free;
+   return r;
 
switch (args->in.operation) {
case AMDGPU_BO_LIST_OP_CREATE:
@@ -283,8 +283,7 @@ int amdgpu_bo_list_ioctl(struct drm_device *dev, void *data,
	r = idr_alloc(&fpriv->bo_list_handles, list, 1, 0, GFP_KERNEL);
	mutex_unlock(&fpriv->bo_list_lock);
if (r < 0) {
-   amdgpu_bo_list_put(list);
-   return r;
+   goto error_put_list;
}
 
handle = r;
@@ -306,9 +305,8 @@ int amdgpu_bo_list_ioctl(struct drm_device *dev, void *data,
	mutex_unlock(&fpriv->bo_list_lock);
 
if (IS_ERR(old)) {
-   amdgpu_bo_list_put(list);
r = PTR_ERR(old);
-   goto error_free;
+   goto error_put_list;
}
 
amdgpu_bo_list_put(old);
@@ -325,8 +323,10 @@ int amdgpu_bo_list_ioctl(struct drm_device *dev, void 
*data,
 
return 0;
 
+error_put_list:
+   amdgpu_bo_list_put(list);
+
 error_free:
-   if (info)
-   kvfree(info);
+   kvfree(info);
return r;
 }
-- 
2.20.1


Re: Stack out of bounds in KFD on Arcturus

2019-10-18 Thread Kuehling, Felix
On 2019-10-17 6:38 p.m., Grodzovsky, Andrey wrote:
> Not that I aware of, is there a special Kconfig flag to determine stack
> size ?

I remember there used to be a Kconfig option to force a 4KB kernel 
stack. I don't see it in the current kernel any more.

I don't have time to work on this myself. I'll create a ticket and see 
if I can find someone to investigate.

Thanks,
   Felix


>
> Andrey
>
> On 10/17/19 5:29 PM, Kuehling, Felix wrote:
>> I don't see why this problem would be specific to Arcturus. I don't see
>> any excessive allocations on the stack either. Also the code involved
>> here hasn't changed recently.
>>
>> Are you using some weird kernel config with a smaller stack? Is it
>> specific to a compiler version or some optimization flags? I've
>> sometimes seen function inlining cause excessive stack usage.
>>
>> Regards,
>>  Felix
>>
>> On 2019-10-17 4:09 p.m., Grodzovsky, Andrey wrote:
>>> He Felix - I see this on boot when working with Arcturus.
>>>
>>> Andrey
>>>
>>>
>>> [  103.602092] kfd kfd: Allocated 3969056 bytes on gart
>>> [  103.610769]
>>> ==
>>> [  103.611469] BUG: KASAN: stack-out-of-bounds in
>>> kfd_create_vcrat_image_gpu+0x5db/0xb80 [amdgpu]
>>> [  103.611646] Read of size 4 at addr 8883cb19ee38 by task modprobe/1122
>>>
>>> [  103.611836] CPU: 3 PID: 1122 Comm: modprobe Tainted: G
>>> O  5.3.0-rc3+ #45
>>> [  103.611847] Hardware name: System manufacturer System Product
>>> Name/Z170-PRO, BIOS 1902 06/27/2016
>>> [  103.611856] Call Trace:
>>> [  103.611879]  dump_stack+0x71/0xab
>>> [  103.611907]  print_address_description+0x1da/0x3c0
>>> [  103.612453]  ? kfd_create_vcrat_image_gpu+0x5db/0xb80 [amdgpu]
>>> [  103.612479]  __kasan_report+0x13f/0x1a0
>>> [  103.613022]  ? kfd_create_vcrat_image_gpu+0x5db/0xb80 [amdgpu]
>>> [  103.613580]  ? kfd_create_vcrat_image_gpu+0x5db/0xb80 [amdgpu]
>>> [  103.613604]  kasan_report+0xe/0x20
>>> [  103.614149]  kfd_create_vcrat_image_gpu+0x5db/0xb80 [amdgpu]
>>> [  103.614762]  ? kfd_fill_gpu_memory_affinity+0x110/0x110 [amdgpu]
>>> [  103.614796]  ? __alloc_pages_nodemask+0x2c9/0x560
>>> [  103.614824]  ? __alloc_pages_slowpath+0x1390/0x1390
>>> [  103.614898]  ? kmalloc_order+0x63/0x70
>>> [  103.615469]  kfd_create_crat_image_virtual+0x70c/0x770 [amdgpu]
>>> [  103.616054]  ? kfd_create_crat_image_acpi+0x1c0/0x1c0 [amdgpu]
>>> [  103.616095]  ? up_write+0x4b/0x70
>>> [  103.616649]  kfd_topology_add_device+0x98d/0xb10 [amdgpu]
>>> [  103.617207]  ? kfd_topology_shutdown+0x60/0x60 [amdgpu]
>>> [  103.617743]  ? start_cpsch+0x2ff/0x3a0 [amdgpu]
>>> [  103.61]  ? mutex_lock_io_nested+0xac0/0xac0
>>> [  103.617807]  ? __mutex_unlock_slowpath+0xda/0x420
>>> [  103.617848]  ? __mutex_unlock_slowpath+0xda/0x420
>>> [  103.617877]  ? wait_for_completion+0x200/0x200
>>> [  103.618461]  ? start_cpsch+0x38b/0x3a0 [amdgpu]
>>> [  103.619011]  ? create_queue_cpsch+0x670/0x670 [amdgpu]
>>> [  103.619573]  ? kfd_iommu_device_init+0x92/0x1e0 [amdgpu]
>>> [  103.620112]  ? kfd_iommu_resume+0x2c/0x2c0 [amdgpu]
>>> [  103.620655]  ? kfd_iommu_check_device+0xf0/0xf0 [amdgpu]
>>> [  103.621228]  kgd2kfd_device_init+0x474/0x870 [amdgpu]
>>> [  103.621781]  amdgpu_amdkfd_device_init+0x291/0x390 [amdgpu]
>>> [  103.622329]  ? amdgpu_amdkfd_device_probe+0x90/0x90 [amdgpu]
>>> [  103.622344]  ? kmsg_dump_rewind_nolock+0x59/0x59
>>> [  103.622895]  ? amdgpu_ras_eeprom_test+0x71/0x90 [amdgpu]
>>> [  103.623424]  amdgpu_device_init+0x1bbe/0x2f00 [amdgpu]
>>> [  103.623819]  ? amdgpu_device_has_dc_support+0x30/0x30 [amdgpu]
>>> [  103.623842]  ? __isolate_free_page+0x290/0x290
>>> [  103.623852]  ? fs_reclaim_acquire.part.97+0x5/0x30
>>> [  103.623891]  ? __alloc_pages_nodemask+0x2c9/0x560
>>> [  103.623912]  ? __alloc_pages_slowpath+0x1390/0x1390
>>> [  103.623945]  ? kasan_unpoison_shadow+0x31/0x40
>>> [  103.623970]  ? kmalloc_order+0x63/0x70
>>> [  103.624337]  amdgpu_driver_load_kms+0xd9/0x430 [amdgpu]
>>> [  103.624690]  ? amdgpu_register_gpu_instance+0xe0/0xe0 [amdgpu]
>>> [  103.624756]  ? drm_dev_register+0x19c/0x310 [drm]
>>> [  103.624768]  ? __kasan_slab_free+0x133/0x160
>>> [  103.624849]  drm_dev_register+0x1f5/0x310 [drm]
>>> [  103.625212]  amdgpu_pci_probe+0x109/0x1f0 [amdgpu]
>>> [  103.625565]  ? amdgpu_pmops_runtime_idle+0xe0/0xe0 [amdgpu]
>>> [  103.625580]  local_pci_probe+0x74/0xd0
>>> [  103.625603]  pci_device_probe+0x1fa/0x310
>>> [  103.625620]  ? pci_device_remove+0x1c0/0x1c0
>>> [  103.625640]  ? sysfs_do_create_link_sd.isra.2+0x74/0xe0
>>> [  103.625673]  really_probe+0x367/0x5d0
>>> [  103.625700]  driver_probe_device+0x177/0x1b0
>>> [  103.625721]  device_driver_attach+0x8a/0x90
>>> [  103.625737]  ? device_driver_attach+0x90/0x90
>>> [  103.625746]  __driver_attach+0xeb/0x190
>>> [  103.625765]  ? device_driver_attach+0x90/0x90
>>> [  103.625773]  bus_for_each_dev+0xe4/0x160
>>> [  103.625789]  ? subsys_dev_iter_exit+0x10/0x10
>>> [  103.625829]  

Re: [PATCH] drm/amdgpu: revert calling smu msg in df callbacks

2019-10-18 Thread Kuehling, Felix
On 2019-10-18 4:29 p.m., Kim, Jonathan wrote:
> reverting the following changes:
> commit 7dd2eb31fcd5 ("drm/amdgpu: fix compiler warnings for df perfmons")
> commit 54275cd1649f ("drm/amdgpu: disable c-states on xgmi perfmons")
>
> perf events use spin-locks.  Embedded smu messages have potentially long
> response times and can deadlock the system.
>
> Change-Id: Ic36c35a62dec116d0a2f5b69c22af4d414458679
> Signed-off-by: Jonathan Kim 

Reviewed-by: Felix Kuehling 

See one more comment inline below ...


> ---
>   drivers/gpu/drm/amd/amdgpu/df_v3_6.c | 38 ++------------------------------------
>   1 file changed, 2 insertions(+), 36 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/df_v3_6.c 
> b/drivers/gpu/drm/amd/amdgpu/df_v3_6.c
> index e1cf7e9c616a..16fbd2bc8ad1 100644
> --- a/drivers/gpu/drm/amd/amdgpu/df_v3_6.c
> +++ b/drivers/gpu/drm/amd/amdgpu/df_v3_6.c
> @@ -93,21 +93,6 @@ const struct attribute_group *df_v3_6_attr_groups[] = {
>   NULL
>   };
>   
> -static int df_v3_6_set_df_cstate(struct amdgpu_device *adev, int allow)
> -{
> - int r = 0;
> -
> - if (is_support_sw_smu(adev)) {
> - r = smu_set_df_cstate(&adev->smu, allow);
> - } else if (adev->powerplay.pp_funcs
> - && adev->powerplay.pp_funcs->set_df_cstate) {
> - r = adev->powerplay.pp_funcs->set_df_cstate(
> - adev->powerplay.pp_handle, allow);
> - }
> -
> - return r;
> -}
> -
>   static uint64_t df_v3_6_get_fica(struct amdgpu_device *adev,
>uint32_t ficaa_val)
>   {
> @@ -117,9 +102,6 @@ static uint64_t df_v3_6_get_fica(struct amdgpu_device 
> *adev,
>   address = adev->nbio.funcs->get_pcie_index_offset(adev);
>   data = adev->nbio.funcs->get_pcie_data_offset(adev);
>   
> - if (df_v3_6_set_df_cstate(adev, DF_CSTATE_DISALLOW))
> - return 0x;
> -
> spin_lock_irqsave(&adev->pcie_idx_lock, flags);
>   WREG32(address, smnDF_PIE_AON_FabricIndirectConfigAccessAddress3);
>   WREG32(data, ficaa_val);
> @@ -132,8 +114,6 @@ static uint64_t df_v3_6_get_fica(struct amdgpu_device 
> *adev,
>   
> spin_unlock_irqrestore(&adev->pcie_idx_lock, flags);
>   
> - df_v3_6_set_df_cstate(adev, DF_CSTATE_ALLOW);
> -
>   return (((ficadh_val & 0x) << 32) | ficadl_val);
>   }
>   
> @@ -145,9 +125,6 @@ static void df_v3_6_set_fica(struct amdgpu_device *adev, 
> uint32_t ficaa_val,
>   address = adev->nbio.funcs->get_pcie_index_offset(adev);
>   data = adev->nbio.funcs->get_pcie_data_offset(adev);
>   
> - if (df_v3_6_set_df_cstate(adev, DF_CSTATE_DISALLOW))
> - return;
> -
> spin_lock_irqsave(&adev->pcie_idx_lock, flags);
>   WREG32(address, smnDF_PIE_AON_FabricIndirectConfigAccessAddress3);
>   WREG32(data, ficaa_val);
> @@ -157,9 +134,8 @@ static void df_v3_6_set_fica(struct amdgpu_device *adev, 
> uint32_t ficaa_val,
>   
>   WREG32(address, smnDF_PIE_AON_FabricIndirectConfigAccessDataHi3);
>   WREG32(data, ficadh_val);
> - spin_unlock_irqrestore(&adev->pcie_idx_lock, flags);
>   
> - df_v3_6_set_df_cstate(adev, DF_CSTATE_ALLOW);
> + spin_unlock_irqrestore(&adev->pcie_idx_lock, flags);
>   }
>   
>   /*
> @@ -177,17 +153,12 @@ static void df_v3_6_perfmon_rreg(struct amdgpu_device 
> *adev,
>   address = adev->nbio.funcs->get_pcie_index_offset(adev);
>   data = adev->nbio.funcs->get_pcie_data_offset(adev);
>   
> - if (df_v3_6_set_df_cstate(adev, DF_CSTATE_DISALLOW))
> - return;
> -
> spin_lock_irqsave(&adev->pcie_idx_lock, flags);
>   WREG32(address, lo_addr);
>   *lo_val = RREG32(data);
>   WREG32(address, hi_addr);
>   *hi_val = RREG32(data);
> spin_unlock_irqrestore(&adev->pcie_idx_lock, flags);
> -
> - df_v3_6_set_df_cstate(adev, DF_CSTATE_ALLOW);
>   }
>   
>   /*
> @@ -204,17 +175,12 @@ static void df_v3_6_perfmon_wreg(struct amdgpu_device 
> *adev, uint32_t lo_addr,
>   address = adev->nbio.funcs->get_pcie_index_offset(adev);
>   data = adev->nbio.funcs->get_pcie_data_offset(adev);
>   
> - if (df_v3_6_set_df_cstate(adev, DF_CSTATE_DISALLOW))
> - return;
> -
> spin_lock_irqsave(&adev->pcie_idx_lock, flags);
>   WREG32(address, lo_addr);
>   WREG32(data, lo_val);
>   WREG32(address, hi_addr);
>   WREG32(data, hi_val);
> spin_unlock_irqrestore(&adev->pcie_idx_lock, flags);
> -
> - df_v3_6_set_df_cstate(adev, DF_CSTATE_ALLOW);
>   }
>   
>   /* get the number of df counters available */
> @@ -546,7 +512,7 @@ static void df_v3_6_pmc_get_count(struct amdgpu_device 
> *adev,
> uint64_t config,
> uint64_t *count)
>   {
> - uint32_t lo_base_addr, hi_base_addr, lo_val = 0, hi_val = 0;
> + uint32_t lo_base_addr, hi_base_addr, lo_val, hi_val;

This part looks like it was unrelated to the DF Cstate changes. If this 
addressed a real problem, maybe it can be reintroduced with 

[PATCH 2/4] drm/amd/powerplay: Add EEPROM I2C read/write support to Arcturus.

2019-10-18 Thread Andrey Grodzovsky
The communication is done through SMU table and hence the code
is in powerplay.

Signed-off-by: Andrey Grodzovsky 
---
 drivers/gpu/drm/amd/powerplay/arcturus_ppt.c | 229 +++
 1 file changed, 229 insertions(+)

diff --git a/drivers/gpu/drm/amd/powerplay/arcturus_ppt.c 
b/drivers/gpu/drm/amd/powerplay/arcturus_ppt.c
index 90d871a..53d08de5 100644
--- a/drivers/gpu/drm/amd/powerplay/arcturus_ppt.c
+++ b/drivers/gpu/drm/amd/powerplay/arcturus_ppt.c
@@ -36,6 +36,11 @@
 #include "smu_v11_0_pptable.h"
 #include "arcturus_ppsmc.h"
 #include "nbio/nbio_7_4_sh_mask.h"
+#include 
+#include 
+#include "amdgpu_ras.h"
+
#define to_amdgpu_device(x) (container_of(x, struct amdgpu_ras, eeprom_control.eeprom_accessor))->adev
 
 #define CTF_OFFSET_EDGE5
 #define CTF_OFFSET_HOTSPOT 5
@@ -171,6 +176,7 @@ static struct smu_11_0_cmn2aisc_mapping 
arcturus_table_map[SMU_TABLE_COUNT] = {
TAB_MAP(SMU_METRICS),
TAB_MAP(DRIVER_SMU_CONFIG),
TAB_MAP(OVERDRIVE),
+   TAB_MAP(I2C_COMMANDS),
 };
 
 static struct smu_11_0_cmn2aisc_mapping 
arcturus_pwr_src_map[SMU_POWER_SOURCE_COUNT] = {
@@ -293,6 +299,9 @@ static int arcturus_tables_init(struct smu_context *smu, 
struct smu_table *table
SMU_TABLE_INIT(tables, SMU_TABLE_SMU_METRICS, sizeof(SmuMetrics_t),
   PAGE_SIZE, AMDGPU_GEM_DOMAIN_VRAM);
 
+   SMU_TABLE_INIT(tables, SMU_TABLE_I2C_COMMANDS, sizeof(SwI2cRequest_t),
+  PAGE_SIZE, AMDGPU_GEM_DOMAIN_VRAM);
+
smu_table->metrics_table = kzalloc(sizeof(SmuMetrics_t), GFP_KERNEL);
if (!smu_table->metrics_table)
return -ENOMEM;
@@ -1927,6 +1936,224 @@ static int arcturus_dpm_set_uvd_enable(struct 
smu_context *smu, bool enable)
return ret;
 }
 
+
+static void arcturus_fill_eeprom_i2c_req(SwI2cRequest_t  *req, bool write,
+ uint8_t address, uint32_t numbytes,
+ uint8_t *data)
+{
+   int i;
+
+   BUG_ON(numbytes > MAX_SW_I2C_COMMANDS);
+
+   req->I2CcontrollerPort = 0;
+   req->I2CSpeed = 2;
+   req->SlaveAddress = address;
+   req->NumCmds = numbytes;
+
+   for (i = 0; i < numbytes; i++) {
+   SwI2cCmd_t *cmd = &req->SwI2cCmds[i];
+
+   /* First 2 bytes are always write for lower 2b EEPROM address */
+   if (i < 2)
+   cmd->Cmd = 1;
+   else
+   cmd->Cmd = write;
+
+
+   /* Add RESTART for read after address filled */
+   cmd->CmdConfig |= (i == 2 && !write) ? CMDCONFIG_RESTART_MASK : 0;
+
+   /* Add STOP in the end */
+   cmd->CmdConfig |= (i == (numbytes - 1)) ? CMDCONFIG_STOP_MASK : 0;
+
+   /* Fill with data regardless if read or write to simplify code */
+   cmd->RegisterAddr = data[i];
+   }
+}
+
+static int arcturus_i2c_eeprom_read_data(struct i2c_adapter *control,
+  uint8_t address,
+  uint8_t *data,
+  uint32_t numbytes)
+{
+   uint32_t i, ret = 0;
+   SwI2cRequest_t req;
+   struct amdgpu_device *adev = to_amdgpu_device(control);
+   struct smu_table_context *smu_table = &adev->smu.smu_table;
+   struct smu_table *table = &smu_table->tables[SMU_TABLE_I2C_COMMANDS];
+
+   memset(&req, 0, sizeof(req));
+   arcturus_fill_eeprom_i2c_req(&req, false, address, numbytes, data);
+
+   mutex_lock(&adev->smu.mutex);
+   /* Now read data starting with that address */
+   ret = smu_update_table(&adev->smu, SMU_TABLE_I2C_COMMANDS, 0, &req,
+   true);
+   mutex_unlock(&adev->smu.mutex);
+
+   if (!ret) {
+   SwI2cRequest_t *res = (SwI2cRequest_t *)table->cpu_addr;
+
+   /* Assume SMU fills res.SwI2cCmds[i].Data with read bytes */
+   for (i = 0; i < numbytes; i++)
+   data[i] = res->SwI2cCmds[i].Data;
+
+   pr_debug("arcturus_i2c_eeprom_read_data, address = %x, bytes = %d, data :",
+ (uint16_t)address, numbytes);
+
+   print_hex_dump(KERN_DEBUG, "data: ", DUMP_PREFIX_NONE,
+  8, 1, data, numbytes, false);
+   } else
+   pr_err("arcturus_i2c_eeprom_read_data - error occurred :%x", ret);
+
+   return ret;
+}
+
+static int arcturus_i2c_eeprom_write_data(struct i2c_adapter *control,
+   uint8_t address,
+   uint8_t *data,
+   uint32_t numbytes)
+{
+   uint32_t ret;
+   SwI2cRequest_t req;
+   struct amdgpu_device *adev = to_amdgpu_device(control);
+
+   memset(&req, 0, sizeof(req));
+   arcturus_fill_eeprom_i2c_req(&req, true, address, 

[PATCH 3/4] drm/amdgpu: Use ARCTURUS in RAS EEPROM.

2019-10-18 Thread Andrey Grodzovsky
Signed-off-by: Andrey Grodzovsky 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
index 20af0a1..7de16c0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
@@ -216,6 +216,10 @@ int amdgpu_ras_eeprom_init(struct 
amdgpu_ras_eeprom_control *control)
ret = smu_v11_0_i2c_eeprom_control_init(&control->eeprom_accessor);
break;
 
+   case CHIP_ARCTURUS:
+   ret = smu_i2c_eeprom_init(&adev->smu, &control->eeprom_accessor);
+   break;
+
default:
return 0;
}
@@ -260,6 +264,9 @@ void amdgpu_ras_eeprom_fini(struct 
amdgpu_ras_eeprom_control *control)
case CHIP_VEGA20:
smu_v11_0_i2c_eeprom_control_fini(&control->eeprom_accessor);
break;
+   case CHIP_ARCTURUS:
+   smu_i2c_eeprom_fini(&adev->smu, &control->eeprom_accessor);
+   break;
 
default:
return;
@@ -364,7 +371,7 @@ int amdgpu_ras_eeprom_process_recods(struct 
amdgpu_ras_eeprom_control *control,
struct eeprom_table_record *record;
struct amdgpu_device *adev = to_amdgpu_device(control);
 
-   if (adev->asic_type != CHIP_VEGA20)
+   if (adev->asic_type != CHIP_VEGA20 && adev->asic_type != CHIP_ARCTURUS)
return 0;
 
buffs = kcalloc(num, EEPROM_ADDRESS_SIZE + EEPROM_TABLE_RECORD_SIZE,
-- 
2.7.4

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

[PATCH 1/4] drm/amd/powerplay: Add interface for I2C transactions to SMU.

2019-10-18 Thread Andrey Grodzovsky
Signed-off-by: Andrey Grodzovsky 
---
 drivers/gpu/drm/amd/powerplay/inc/amdgpu_smu.h | 9 +
 1 file changed, 9 insertions(+)

diff --git a/drivers/gpu/drm/amd/powerplay/inc/amdgpu_smu.h 
b/drivers/gpu/drm/amd/powerplay/inc/amdgpu_smu.h
index bf13bf3..24244eb 100644
--- a/drivers/gpu/drm/amd/powerplay/inc/amdgpu_smu.h
+++ b/drivers/gpu/drm/amd/powerplay/inc/amdgpu_smu.h
@@ -394,6 +394,8 @@ struct smu_context
 
 };
 
+struct i2c_adapter;
+
 struct pptable_funcs {
int (*alloc_dpm_context)(struct smu_context *smu);
int (*store_powerplay_table)(struct smu_context *smu);
@@ -470,6 +472,8 @@ struct pptable_funcs {
   uint32_t dpm_level, uint32_t *freq);
int (*set_df_cstate)(struct smu_context *smu, enum pp_df_cstate state);
int (*update_pcie_parameters)(struct smu_context *smu, uint32_t 
pcie_gen_cap, uint32_t pcie_width_cap);
+   int (*i2c_eeprom_init)(struct i2c_adapter *control);
+   void (*i2c_eeprom_fini)(struct i2c_adapter *control);
int (*get_dpm_clock_table)(struct smu_context *smu, struct dpm_clocks 
*clock_table);
 };
 
@@ -782,6 +786,11 @@ struct smu_funcs
 #define smu_override_pcie_parameters(smu) \
((smu)->funcs->override_pcie_parameters ? 
(smu)->funcs->override_pcie_parameters((smu)) : 0)
 
+#define smu_i2c_eeprom_init(smu, control) \
+   ((smu)->ppt_funcs->i2c_eeprom_init ? 
(smu)->ppt_funcs->i2c_eeprom_init((control)) : -EINVAL)
+#define smu_i2c_eeprom_fini(smu, control) \
+   ((smu)->ppt_funcs->i2c_eeprom_fini ? 
(smu)->ppt_funcs->i2c_eeprom_fini((control)) : -EINVAL)
+
 #define smu_update_pcie_parameters(smu, pcie_gen_cap, pcie_width_cap) \
((smu)->ppt_funcs->update_pcie_parameters ? 
(smu)->ppt_funcs->update_pcie_parameters((smu), (pcie_gen_cap), 
(pcie_width_cap)) : 0)
 
-- 
2.7.4


[PATCH 4/4] drm/amdgpu: Move amdgpu_ras_recovery_init to after SMU ready.

2019-10-18 Thread Andrey Grodzovsky
For Arcturus the I2C traffic is done through SMU tables, so we must
postpone RAS recovery init until after they are ready, which is in
amdgpu_device_ip_hw_init_phase2.

Signed-off-by: Andrey Grodzovsky 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 13 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c| 11 ---
 2 files changed, 13 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 17cfdaf..c40e9a5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -1850,6 +1850,19 @@ static int amdgpu_device_ip_init(struct amdgpu_device 
*adev)
if (r)
goto init_failed;
 
+   /*
+* retired pages will be loaded from eeprom and reserved here,
+* it should be called after amdgpu_device_ip_hw_init_phase2 since
+* for some ASICs the RAS EEPROM code relies on SMU fully functioning
+* for I2C communication, which is only true at this point.
+* recovery_init may fail, but it can free all resources allocated by
+* itself and its failure should not stop amdgpu init process.
+*
+* Note: theoretically, this should be called before all vram allocations
+* to protect retired pages from abuse
+*/
+   amdgpu_ras_recovery_init(adev);
+
if (adev->gmc.xgmi.num_physical_nodes > 1)
amdgpu_xgmi_add_device(adev);
amdgpu_amdkfd_device_init(adev);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 2e85a51..1045c3f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -1721,17 +1721,6 @@ int amdgpu_ttm_init(struct amdgpu_device *adev)
 #endif
 
/*
-* retired pages will be loaded from eeprom and reserved here,
-* it should be called after ttm init since new bo may be created,
-* recovery_init may fail, but it can free all resources allocated by
-* itself and its failure should not stop amdgpu init process.
-*
-* Note: theoretically, this should be called before all vram 
allocations
-* to protect retired page from abusing
-*/
-   amdgpu_ras_recovery_init(adev);
-
-   /*
 *The reserved vram for firmware must be pinned to the specified
 *place on the VRAM, so reserve it early.
 */
-- 
2.7.4


[PATCH 0/4] Add RAS EEPROM table support for Arcturus.

2019-10-18 Thread Andrey Grodzovsky
This patch set adds support for the Arcturus EEPROM to store RAS
errors which arise during run time, so that on the next driver load those
errors can be retrieved and acted upon
(e.g. reserving bad memory pages to disallow their usage by the GPU).

The I2C communication is done through an SMU table, which is what patch 2
adds, while patch 4 relocates RAS recovery init to a much later point than
before, since the SMU must be fully operational for this to work on Arcturus.


Andrey Grodzovsky (4):
  drm/amd/powerplay: Add interface for I2C transactions to SMU.
  drm/amd/powerplay: Add EEPROM I2C read/write support to Arcturus.
  drm/amdgpu: Use ARCTURUS in RAS EEPROM.
  drm/amdgpu: Move amdgpu_ras_recovery_init to after SMU ready.

 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  13 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c |   9 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c|  11 --
 drivers/gpu/drm/amd/powerplay/arcturus_ppt.c   | 229 +
 drivers/gpu/drm/amd/powerplay/inc/amdgpu_smu.h |   9 +
 5 files changed, 259 insertions(+), 12 deletions(-)

-- 
2.7.4


Re: [PATCH hmm 00/15] Consolidate the mmu notifier interval_tree and locking

2019-10-18 Thread Jason Gunthorpe
On Thu, Oct 17, 2019 at 04:47:20PM +, Koenig, Christian wrote:

> > get_user_pages/hmm_range_fault() and invalidate_range_start() both are
> > called while holding mm->map_sem, so they are always serialized.
> 
> Not even remotely.
> 
> For calling get_user_pages()/hmm_range_fault() you only need to hold the 
> mmap_sem in read mode.

Right
 
> And IIRC invalidate_range_start() is sometimes called without holding 
> the mmap_sem at all.

Yep
 
> So again how are they serialized?

The 'driver lock' thing does it, read the hmm documentation; of all the
drivers, the hmm approach is basically the only one that was correct.

So long as the 'driver lock' is held the range cannot become
invalidated as the 'driver lock' prevents progress of invalidation.

Holding the driver lock and using the seq based mmu_range_read_retry()
tells if the previous unlocked get_user_pages() is still valid or
needs to be discarded.

So it doesn't matter if get_user_pages() races or not, the result is not
to be used until the driver lock is held and mmu_range_read_retry()
called, which provides the coherence.

It is the usual seqlock pattern.

Jason

[PATCH] drm/amdgpu: revert calling smu msg in df callbacks

2019-10-18 Thread Kim, Jonathan
reverting the following changes:
commit 7dd2eb31fcd5 ("drm/amdgpu: fix compiler warnings for df perfmons")
commit 54275cd1649f ("drm/amdgpu: disable c-states on xgmi perfmons")

perf events use spin-locks.  embedded smu messages have potentially long
response times and can deadlock the system.

Change-Id: Ic36c35a62dec116d0a2f5b69c22af4d414458679
Signed-off-by: Jonathan Kim 
---
 drivers/gpu/drm/amd/amdgpu/df_v3_6.c | 38 ++--
 1 file changed, 2 insertions(+), 36 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/df_v3_6.c 
b/drivers/gpu/drm/amd/amdgpu/df_v3_6.c
index e1cf7e9c616a..16fbd2bc8ad1 100644
--- a/drivers/gpu/drm/amd/amdgpu/df_v3_6.c
+++ b/drivers/gpu/drm/amd/amdgpu/df_v3_6.c
@@ -93,21 +93,6 @@ const struct attribute_group *df_v3_6_attr_groups[] = {
NULL
 };
 
-static int df_v3_6_set_df_cstate(struct amdgpu_device *adev, int allow)
-{
-   int r = 0;
-
-   if (is_support_sw_smu(adev)) {
-   r = smu_set_df_cstate(&adev->smu, allow);
-   } else if (adev->powerplay.pp_funcs
-   && adev->powerplay.pp_funcs->set_df_cstate) {
-   r = adev->powerplay.pp_funcs->set_df_cstate(
-   adev->powerplay.pp_handle, allow);
-   }
-
-   return r;
-}
-
 static uint64_t df_v3_6_get_fica(struct amdgpu_device *adev,
 uint32_t ficaa_val)
 {
@@ -117,9 +102,6 @@ static uint64_t df_v3_6_get_fica(struct amdgpu_device *adev,
address = adev->nbio.funcs->get_pcie_index_offset(adev);
data = adev->nbio.funcs->get_pcie_data_offset(adev);
 
-   if (df_v3_6_set_df_cstate(adev, DF_CSTATE_DISALLOW))
-   return 0x;
-
spin_lock_irqsave(&adev->pcie_idx_lock, flags);
WREG32(address, smnDF_PIE_AON_FabricIndirectConfigAccessAddress3);
WREG32(data, ficaa_val);
@@ -132,8 +114,6 @@ static uint64_t df_v3_6_get_fica(struct amdgpu_device *adev,
 
spin_unlock_irqrestore(&adev->pcie_idx_lock, flags);
 
-   df_v3_6_set_df_cstate(adev, DF_CSTATE_ALLOW);
-
return (((ficadh_val & 0x) << 32) | ficadl_val);
 }
 
@@ -145,9 +125,6 @@ static void df_v3_6_set_fica(struct amdgpu_device *adev, 
uint32_t ficaa_val,
address = adev->nbio.funcs->get_pcie_index_offset(adev);
data = adev->nbio.funcs->get_pcie_data_offset(adev);
 
-   if (df_v3_6_set_df_cstate(adev, DF_CSTATE_DISALLOW))
-   return;
-
spin_lock_irqsave(&adev->pcie_idx_lock, flags);
WREG32(address, smnDF_PIE_AON_FabricIndirectConfigAccessAddress3);
WREG32(data, ficaa_val);
@@ -157,9 +134,8 @@ static void df_v3_6_set_fica(struct amdgpu_device *adev, 
uint32_t ficaa_val,
 
WREG32(address, smnDF_PIE_AON_FabricIndirectConfigAccessDataHi3);
WREG32(data, ficadh_val);
-   spin_unlock_irqrestore(&adev->pcie_idx_lock, flags);
 
-   df_v3_6_set_df_cstate(adev, DF_CSTATE_ALLOW);
+   spin_unlock_irqrestore(&adev->pcie_idx_lock, flags);
 }
 
 /*
@@ -177,17 +153,12 @@ static void df_v3_6_perfmon_rreg(struct amdgpu_device 
*adev,
address = adev->nbio.funcs->get_pcie_index_offset(adev);
data = adev->nbio.funcs->get_pcie_data_offset(adev);
 
-   if (df_v3_6_set_df_cstate(adev, DF_CSTATE_DISALLOW))
-   return;
-
spin_lock_irqsave(&adev->pcie_idx_lock, flags);
WREG32(address, lo_addr);
*lo_val = RREG32(data);
WREG32(address, hi_addr);
*hi_val = RREG32(data);
spin_unlock_irqrestore(&adev->pcie_idx_lock, flags);
-
-   df_v3_6_set_df_cstate(adev, DF_CSTATE_ALLOW);
 }
 
 /*
@@ -204,17 +175,12 @@ static void df_v3_6_perfmon_wreg(struct amdgpu_device 
*adev, uint32_t lo_addr,
address = adev->nbio.funcs->get_pcie_index_offset(adev);
data = adev->nbio.funcs->get_pcie_data_offset(adev);
 
-   if (df_v3_6_set_df_cstate(adev, DF_CSTATE_DISALLOW))
-   return;
-
spin_lock_irqsave(&adev->pcie_idx_lock, flags);
WREG32(address, lo_addr);
WREG32(data, lo_val);
WREG32(address, hi_addr);
WREG32(data, hi_val);
spin_unlock_irqrestore(&adev->pcie_idx_lock, flags);
-
-   df_v3_6_set_df_cstate(adev, DF_CSTATE_ALLOW);
 }
 
 /* get the number of df counters available */
@@ -546,7 +512,7 @@ static void df_v3_6_pmc_get_count(struct amdgpu_device 
*adev,
  uint64_t config,
  uint64_t *count)
 {
-   uint32_t lo_base_addr, hi_base_addr, lo_val = 0, hi_val = 0;
+   uint32_t lo_base_addr, hi_base_addr, lo_val, hi_val;
*count = 0;
 
switch (adev->asic_type) {
-- 
2.17.1


Re: [PATCH][next] drm/amdgpu/psp: fix spelling mistake "initliaze" -> "initialize"

2019-10-18 Thread Alex Deucher
On Fri, Oct 18, 2019 at 4:15 AM Colin King  wrote:
>
> From: Colin Ian King 
>
> There is a spelling mistake in a DRM_ERROR error message. Fix it.
>
> Signed-off-by: Colin Ian King 

Applied.  thanks!

Alex

> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> index b996b5bc5804..fd7a73f4fa70 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> @@ -90,7 +90,7 @@ static int psp_sw_init(void *handle)
>
> ret = psp_mem_training_init(psp);
> if (ret) {
> -   DRM_ERROR("Failed to initliaze memory training!\n");
> +   DRM_ERROR("Failed to initialize memory training!\n");
> return ret;
> }
> ret = psp_mem_training(psp, PSP_MEM_TRAIN_COLD_BOOT);
> --
> 2.20.1
>
> ___
> dri-devel mailing list
> dri-de...@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH 2/2] Revert "drm/amdgpu: disable c-states on xgmi perfmons"

2019-10-18 Thread Kuehling, Felix
You can squash the two reverts into a single commit so you avoid 
reintroducing a broken intermediate state. Mention both reverted commits 
in the squashed commit description. Checkpatch.pl prefers a different 
format for quoting reverted commits. Run checkpatch.pl on your commit to 
see a proper example.

Regards,
   Felix


On 2019-10-18 1:59 p.m., Kim, Jonathan wrote:
> This reverts commit 54275cd1649f4034c6450b6c5a8358fcd4f7dda6.
>
> incomplete solution to df c-state race condition.  smu msg in perf events
> causes deadlock.
>
> Change-Id: Ia85179df2bd167657e42a2d828c4a7c475c392ff
> Signed-off-by: Jonathan Kim 
> ---
>   drivers/gpu/drm/amd/amdgpu/df_v3_6.c | 36 +---
>   1 file changed, 1 insertion(+), 35 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/df_v3_6.c 
> b/drivers/gpu/drm/amd/amdgpu/df_v3_6.c
> index f403c62c944e..16fbd2bc8ad1 100644
> --- a/drivers/gpu/drm/amd/amdgpu/df_v3_6.c
> +++ b/drivers/gpu/drm/amd/amdgpu/df_v3_6.c
> @@ -93,21 +93,6 @@ const struct attribute_group *df_v3_6_attr_groups[] = {
>   NULL
>   };
>   
> -static df_v3_6_set_df_cstate(struct amdgpu_device *adev, int allow)
> -{
> - int r = 0;
> -
> - if (is_support_sw_smu(adev)) {
> - r = smu_set_df_cstate(&adev->smu, allow);
> - } else if (adev->powerplay.pp_funcs
> - && adev->powerplay.pp_funcs->set_df_cstate) {
> - r = adev->powerplay.pp_funcs->set_df_cstate(
> - adev->powerplay.pp_handle, allow);
> - }
> -
> - return r;
> -}
> -
>   static uint64_t df_v3_6_get_fica(struct amdgpu_device *adev,
>uint32_t ficaa_val)
>   {
> @@ -117,9 +102,6 @@ static uint64_t df_v3_6_get_fica(struct amdgpu_device 
> *adev,
>   address = adev->nbio.funcs->get_pcie_index_offset(adev);
>   data = adev->nbio.funcs->get_pcie_data_offset(adev);
>   
> - if (df_v3_6_set_df_cstate(adev, DF_CSTATE_DISALLOW))
> - return 0x;
> -
>   spin_lock_irqsave(&adev->pcie_idx_lock, flags);
>   WREG32(address, smnDF_PIE_AON_FabricIndirectConfigAccessAddress3);
>   WREG32(data, ficaa_val);
> @@ -132,8 +114,6 @@ static uint64_t df_v3_6_get_fica(struct amdgpu_device 
> *adev,
>   
>   spin_unlock_irqrestore(&adev->pcie_idx_lock, flags);
>   
> - df_v3_6_set_df_cstate(adev, DF_CSTATE_ALLOW);
> -
>   return (((ficadh_val & 0x) << 32) | ficadl_val);
>   }
>   
> @@ -145,9 +125,6 @@ static void df_v3_6_set_fica(struct amdgpu_device *adev, 
> uint32_t ficaa_val,
>   address = adev->nbio.funcs->get_pcie_index_offset(adev);
>   data = adev->nbio.funcs->get_pcie_data_offset(adev);
>   
> - if (df_v3_6_set_df_cstate(adev, DF_CSTATE_DISALLOW))
> - return;
> -
>   spin_lock_irqsave(&adev->pcie_idx_lock, flags);
>   WREG32(address, smnDF_PIE_AON_FabricIndirectConfigAccessAddress3);
>   WREG32(data, ficaa_val);
> @@ -157,9 +134,8 @@ static void df_v3_6_set_fica(struct amdgpu_device *adev, 
> uint32_t ficaa_val,
>   
>   WREG32(address, smnDF_PIE_AON_FabricIndirectConfigAccessDataHi3);
>   WREG32(data, ficadh_val);
> - spin_unlock_irqrestore(&adev->pcie_idx_lock, flags);
>   
> - df_v3_6_set_df_cstate(adev, DF_CSTATE_ALLOW);
> + spin_unlock_irqrestore(&adev->pcie_idx_lock, flags);
>   }
>   
>   /*
> @@ -177,17 +153,12 @@ static void df_v3_6_perfmon_rreg(struct amdgpu_device 
> *adev,
>   address = adev->nbio.funcs->get_pcie_index_offset(adev);
>   data = adev->nbio.funcs->get_pcie_data_offset(adev);
>   
> - if (df_v3_6_set_df_cstate(adev, DF_CSTATE_DISALLOW))
> - return;
> -
>   spin_lock_irqsave(&adev->pcie_idx_lock, flags);
>   WREG32(address, lo_addr);
>   *lo_val = RREG32(data);
>   WREG32(address, hi_addr);
>   *hi_val = RREG32(data);
>   spin_unlock_irqrestore(&adev->pcie_idx_lock, flags);
> -
> - df_v3_6_set_df_cstate(adev, DF_CSTATE_ALLOW);
>   }
>   
>   /*
> @@ -204,17 +175,12 @@ static void df_v3_6_perfmon_wreg(struct amdgpu_device 
> *adev, uint32_t lo_addr,
>   address = adev->nbio.funcs->get_pcie_index_offset(adev);
>   data = adev->nbio.funcs->get_pcie_data_offset(adev);
>   
> - if (df_v3_6_set_df_cstate(adev, DF_CSTATE_DISALLOW))
> - return;
> -
>   spin_lock_irqsave(&adev->pcie_idx_lock, flags);
>   WREG32(address, lo_addr);
>   WREG32(data, lo_val);
>   WREG32(address, hi_addr);
>   WREG32(data, hi_val);
>   spin_unlock_irqrestore(&adev->pcie_idx_lock, flags);
> -
> - df_v3_6_set_df_cstate(adev, DF_CSTATE_ALLOW);
>   }
>   
>   /* get the number of df counters available */

Re: [PATCH v2] drm/amdkfd: kfd open return failed if device is locked

2019-10-18 Thread Kuehling, Felix
On 2019-10-18 1:36 p.m., Yang, Philip wrote:
> If device is locked for suspend and resume, kfd open should return
> failed -EAGAIN without creating process, otherwise the application exit
> to release the process will hang to wait for resume is done if the suspend
> and resume is stuck somewhere. This is backtrace:
>
> v2: fix processes that were created before suspend/resume got stuck
>
> [Thu Oct 17 16:43:37 2019] INFO: task rocminfo:3024 blocked for more
> than 120 seconds.
> [Thu Oct 17 16:43:37 2019]   Not tainted
> 5.0.0-rc1-kfd-compute-rocm-dkms-no-npi-1131 #1
> [Thu Oct 17 16:43:37 2019] "echo 0 >
> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [Thu Oct 17 16:43:37 2019] rocminfoD0  3024   2947
> 0x8000
> [Thu Oct 17 16:43:37 2019] Call Trace:
> [Thu Oct 17 16:43:37 2019]  ? __schedule+0x3d9/0x8a0
> [Thu Oct 17 16:43:37 2019]  schedule+0x32/0x70
> [Thu Oct 17 16:43:37 2019]  schedule_preempt_disabled+0xa/0x10
> [Thu Oct 17 16:43:37 2019]  __mutex_lock.isra.9+0x1e3/0x4e0
> [Thu Oct 17 16:43:37 2019]  ? __call_srcu+0x264/0x3b0
> [Thu Oct 17 16:43:37 2019]  ? process_termination_cpsch+0x24/0x2f0
> [amdgpu]
> [Thu Oct 17 16:43:37 2019]  process_termination_cpsch+0x24/0x2f0
> [amdgpu]
> [Thu Oct 17 16:43:37 2019]
> kfd_process_dequeue_from_all_devices+0x42/0x60 [amdgpu]
> [Thu Oct 17 16:43:37 2019]  kfd_process_notifier_release+0x1be/0x220
> [amdgpu]
> [Thu Oct 17 16:43:37 2019]  __mmu_notifier_release+0x3e/0xc0
> [Thu Oct 17 16:43:37 2019]  exit_mmap+0x160/0x1a0
> [Thu Oct 17 16:43:37 2019]  ? __handle_mm_fault+0xba3/0x1200
> [Thu Oct 17 16:43:37 2019]  ? exit_robust_list+0x5a/0x110
> [Thu Oct 17 16:43:37 2019]  mmput+0x4a/0x120
> [Thu Oct 17 16:43:37 2019]  do_exit+0x284/0xb20
> [Thu Oct 17 16:43:37 2019]  ? handle_mm_fault+0xfa/0x200
> [Thu Oct 17 16:43:37 2019]  do_group_exit+0x3a/0xa0
> [Thu Oct 17 16:43:37 2019]  __x64_sys_exit_group+0x14/0x20
> [Thu Oct 17 16:43:37 2019]  do_syscall_64+0x4f/0x100
> [Thu Oct 17 16:43:37 2019]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> Signed-off-by: Philip Yang 
> ---
>   drivers/gpu/drm/amd/amdkfd/kfd_chardev.c   | 6 +++---
>   drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c | 6 ++
>   2 files changed, 9 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> index d9e36dbf13d5..40d75c39f08e 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> @@ -120,13 +120,13 @@ static int kfd_open(struct inode *inode, struct file 
> *filep)
>   return -EPERM;
>   }
>   
> + if (kfd_is_locked())
> + return -EAGAIN;
> +
>   process = kfd_create_process(filep);
>   if (IS_ERR(process))
>   return PTR_ERR(process);
>   
> - if (kfd_is_locked())
> - return -EAGAIN;
> -
>   dev_dbg(kfd_device, "process %d opened, compat mode (32 bit) - %d\n",
>   process->pasid, process->is_32bit_user_mode);
>   
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
> index 8509814a6ff0..3784013b92a0 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
> @@ -128,6 +128,12 @@ void kfd_process_dequeue_from_all_devices(struct 
> kfd_process *p)
>   {
>   struct kfd_process_device *pdd;
>   
> + /* If suspend/resume got stuck, dqm_lock is held,
> +  * skip process_termination_cpsch to avoid deadlock
> +  */
> + if (kfd_is_locked())
> + return;
> +

Holding the DQM lock during reset has caused other problems (lock 
dependency issues and deadlocks) and I was thinking about getting rid of 
that completely. The intention of holding the DQM lock during reset was 
to prevent the device queue manager from accessing the CP hardware while 
a reset was in progress. However, I think there are smarter ways to 
achieve that. We already get a pre-reset callback (kgd2kfd_pre_reset) 
that executes the kgd2kfd_suspend, which suspends processes and stops 
DQM through kfd->dqm->ops.stop(kfd->dqm). This should take care of most 
of the problem. If there are any places in DQM that try to access the 
devices, they should add conditions to not access HW while DQM is 
stopped. Then we could avoid holding a lock indefinitely while a reset 
is in progress.

The DQM lock is particularly problematic in terms of lock dependencies 
because it can be taken in MMU notifiers. We want to avoid taking any 
other locks while holding the DQM lock. Holding the DQM lock for a long 
time during reset is counterproductive to that objective.

Regards,
   Felix


>   list_for_each_entry(pdd, >per_device_data, per_device_list)
>   kfd_process_dequeue_from_device(pdd);
>   }

[PATCH 2/2] Revert "drm/amdgpu: disable c-states on xgmi perfmons"

2019-10-18 Thread Kim, Jonathan
This reverts commit 54275cd1649f4034c6450b6c5a8358fcd4f7dda6.

incomplete solution to the df c-state race condition.  smu msgs in perf
events cause deadlock.

Change-Id: Ia85179df2bd167657e42a2d828c4a7c475c392ff
Signed-off-by: Jonathan Kim 
---
 drivers/gpu/drm/amd/amdgpu/df_v3_6.c | 36 +---
 1 file changed, 1 insertion(+), 35 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/df_v3_6.c 
b/drivers/gpu/drm/amd/amdgpu/df_v3_6.c
index f403c62c944e..16fbd2bc8ad1 100644
--- a/drivers/gpu/drm/amd/amdgpu/df_v3_6.c
+++ b/drivers/gpu/drm/amd/amdgpu/df_v3_6.c
@@ -93,21 +93,6 @@ const struct attribute_group *df_v3_6_attr_groups[] = {
NULL
 };
 
-static df_v3_6_set_df_cstate(struct amdgpu_device *adev, int allow)
-{
-   int r = 0;
-
-   if (is_support_sw_smu(adev)) {
-   r = smu_set_df_cstate(&adev->smu, allow);
-   } else if (adev->powerplay.pp_funcs
-   && adev->powerplay.pp_funcs->set_df_cstate) {
-   r = adev->powerplay.pp_funcs->set_df_cstate(
-   adev->powerplay.pp_handle, allow);
-   }
-
-   return r;
-}
-
 static uint64_t df_v3_6_get_fica(struct amdgpu_device *adev,
 uint32_t ficaa_val)
 {
@@ -117,9 +102,6 @@ static uint64_t df_v3_6_get_fica(struct amdgpu_device *adev,
address = adev->nbio.funcs->get_pcie_index_offset(adev);
data = adev->nbio.funcs->get_pcie_data_offset(adev);
 
-   if (df_v3_6_set_df_cstate(adev, DF_CSTATE_DISALLOW))
-   return 0x;
-
spin_lock_irqsave(&adev->pcie_idx_lock, flags);
WREG32(address, smnDF_PIE_AON_FabricIndirectConfigAccessAddress3);
WREG32(data, ficaa_val);
@@ -132,8 +114,6 @@ static uint64_t df_v3_6_get_fica(struct amdgpu_device *adev,
 
spin_unlock_irqrestore(&adev->pcie_idx_lock, flags);
 
-   df_v3_6_set_df_cstate(adev, DF_CSTATE_ALLOW);
-
return (((ficadh_val & 0x) << 32) | ficadl_val);
 }
 
@@ -145,9 +125,6 @@ static void df_v3_6_set_fica(struct amdgpu_device *adev, 
uint32_t ficaa_val,
address = adev->nbio.funcs->get_pcie_index_offset(adev);
data = adev->nbio.funcs->get_pcie_data_offset(adev);
 
-   if (df_v3_6_set_df_cstate(adev, DF_CSTATE_DISALLOW))
-   return;
-
spin_lock_irqsave(&adev->pcie_idx_lock, flags);
WREG32(address, smnDF_PIE_AON_FabricIndirectConfigAccessAddress3);
WREG32(data, ficaa_val);
@@ -157,9 +134,8 @@ static void df_v3_6_set_fica(struct amdgpu_device *adev, 
uint32_t ficaa_val,
 
WREG32(address, smnDF_PIE_AON_FabricIndirectConfigAccessDataHi3);
WREG32(data, ficadh_val);
-   spin_unlock_irqrestore(&adev->pcie_idx_lock, flags);
 
-   df_v3_6_set_df_cstate(adev, DF_CSTATE_ALLOW);
+   spin_unlock_irqrestore(&adev->pcie_idx_lock, flags);
 }
 
 /*
@@ -177,17 +153,12 @@ static void df_v3_6_perfmon_rreg(struct amdgpu_device 
*adev,
address = adev->nbio.funcs->get_pcie_index_offset(adev);
data = adev->nbio.funcs->get_pcie_data_offset(adev);
 
-   if (df_v3_6_set_df_cstate(adev, DF_CSTATE_DISALLOW))
-   return;
-
spin_lock_irqsave(&adev->pcie_idx_lock, flags);
WREG32(address, lo_addr);
*lo_val = RREG32(data);
WREG32(address, hi_addr);
*hi_val = RREG32(data);
spin_unlock_irqrestore(&adev->pcie_idx_lock, flags);
-
-   df_v3_6_set_df_cstate(adev, DF_CSTATE_ALLOW);
 }
 
 /*
@@ -204,17 +175,12 @@ static void df_v3_6_perfmon_wreg(struct amdgpu_device 
*adev, uint32_t lo_addr,
address = adev->nbio.funcs->get_pcie_index_offset(adev);
data = adev->nbio.funcs->get_pcie_data_offset(adev);
 
-   if (df_v3_6_set_df_cstate(adev, DF_CSTATE_DISALLOW))
-   return;
-
spin_lock_irqsave(&adev->pcie_idx_lock, flags);
WREG32(address, lo_addr);
WREG32(data, lo_val);
WREG32(address, hi_addr);
WREG32(data, hi_val);
spin_unlock_irqrestore(&adev->pcie_idx_lock, flags);
-
-   df_v3_6_set_df_cstate(adev, DF_CSTATE_ALLOW);
 }
 
 /* get the number of df counters available */
-- 
2.17.1


[PATCH] Revert "drm/amdgpu: fix compiler warnings for df perfmons"

2019-10-18 Thread Kim, Jonathan
This reverts commit 7dd2eb31fcd564574e8efea6bf23cf504f9e2fd7.

revert the compiler warning fix, which was part of the incomplete df
c-state race condition handling solution, i.e. smu msgs cannot be sent
within perf events

Change-Id: Ia09dd24ef91ef75a79a223f72f0cb6a86cd08667
Signed-off-by: Jonathan Kim 
---
 drivers/gpu/drm/amd/amdgpu/df_v3_6.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/df_v3_6.c 
b/drivers/gpu/drm/amd/amdgpu/df_v3_6.c
index e1cf7e9c616a..f403c62c944e 100644
--- a/drivers/gpu/drm/amd/amdgpu/df_v3_6.c
+++ b/drivers/gpu/drm/amd/amdgpu/df_v3_6.c
@@ -93,7 +93,7 @@ const struct attribute_group *df_v3_6_attr_groups[] = {
NULL
 };
 
-static int df_v3_6_set_df_cstate(struct amdgpu_device *adev, int allow)
+static df_v3_6_set_df_cstate(struct amdgpu_device *adev, int allow)
 {
int r = 0;
 
@@ -546,7 +546,7 @@ static void df_v3_6_pmc_get_count(struct amdgpu_device 
*adev,
  uint64_t config,
  uint64_t *count)
 {
-   uint32_t lo_base_addr, hi_base_addr, lo_val = 0, hi_val = 0;
+   uint32_t lo_base_addr, hi_base_addr, lo_val, hi_val;
*count = 0;
 
switch (adev->asic_type) {
-- 
2.17.1


RE: [PATCH 1/2] Revert "drm/amdgpu: fix compiler warnings for df perfmons"

2019-10-18 Thread Russell, Kent
I think the first thing that Alex will say is "please include a commit 
description". Why did you revert it? 

 Kent

-Original Message-
From: amd-gfx  On Behalf Of Kim, Jonathan
Sent: Friday, October 18, 2019 1:31 PM
To: amd-gfx@lists.freedesktop.org
Cc: Kuehling, Felix ; Kim, Jonathan 

Subject: [PATCH 1/2] Revert "drm/amdgpu: fix compiler warnings for df perfmons"

This reverts commit 7dd2eb31fcd564574e8efea6bf23cf504f9e2fd7.

Signed-off-by: Jonathan Kim 
---
 drivers/gpu/drm/amd/amdgpu/df_v3_6.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/df_v3_6.c 
b/drivers/gpu/drm/amd/amdgpu/df_v3_6.c
index e1cf7e9c616a..f403c62c944e 100644
--- a/drivers/gpu/drm/amd/amdgpu/df_v3_6.c
+++ b/drivers/gpu/drm/amd/amdgpu/df_v3_6.c
@@ -93,7 +93,7 @@ const struct attribute_group *df_v3_6_attr_groups[] = {
NULL
 };
 
-static int df_v3_6_set_df_cstate(struct amdgpu_device *adev, int allow)
+static df_v3_6_set_df_cstate(struct amdgpu_device *adev, int allow)
 {
int r = 0;
 
@@ -546,7 +546,7 @@ static void df_v3_6_pmc_get_count(struct amdgpu_device 
*adev,
  uint64_t config,
  uint64_t *count)
 {
-   uint32_t lo_base_addr, hi_base_addr, lo_val = 0, hi_val = 0;
+   uint32_t lo_base_addr, hi_base_addr, lo_val, hi_val;
*count = 0;
 
switch (adev->asic_type) {
-- 
2.17.1


[PATCH v2] drm/amdkfd: kfd open return failed if device is locked

2019-10-18 Thread Yang, Philip
If the device is locked for suspend and resume, kfd open should fail
with -EAGAIN without creating a process; otherwise, if suspend and
resume gets stuck somewhere, the application's exit path will hang
releasing the process, waiting for resume to finish. This is the
backtrace:

v2: fix processes that were created before suspend/resume got stuck

[Thu Oct 17 16:43:37 2019] INFO: task rocminfo:3024 blocked for more
than 120 seconds.
[Thu Oct 17 16:43:37 2019]   Not tainted
5.0.0-rc1-kfd-compute-rocm-dkms-no-npi-1131 #1
[Thu Oct 17 16:43:37 2019] "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Thu Oct 17 16:43:37 2019] rocminfoD0  3024   2947
0x8000
[Thu Oct 17 16:43:37 2019] Call Trace:
[Thu Oct 17 16:43:37 2019]  ? __schedule+0x3d9/0x8a0
[Thu Oct 17 16:43:37 2019]  schedule+0x32/0x70
[Thu Oct 17 16:43:37 2019]  schedule_preempt_disabled+0xa/0x10
[Thu Oct 17 16:43:37 2019]  __mutex_lock.isra.9+0x1e3/0x4e0
[Thu Oct 17 16:43:37 2019]  ? __call_srcu+0x264/0x3b0
[Thu Oct 17 16:43:37 2019]  ? process_termination_cpsch+0x24/0x2f0
[amdgpu]
[Thu Oct 17 16:43:37 2019]  process_termination_cpsch+0x24/0x2f0
[amdgpu]
[Thu Oct 17 16:43:37 2019]
kfd_process_dequeue_from_all_devices+0x42/0x60 [amdgpu]
[Thu Oct 17 16:43:37 2019]  kfd_process_notifier_release+0x1be/0x220
[amdgpu]
[Thu Oct 17 16:43:37 2019]  __mmu_notifier_release+0x3e/0xc0
[Thu Oct 17 16:43:37 2019]  exit_mmap+0x160/0x1a0
[Thu Oct 17 16:43:37 2019]  ? __handle_mm_fault+0xba3/0x1200
[Thu Oct 17 16:43:37 2019]  ? exit_robust_list+0x5a/0x110
[Thu Oct 17 16:43:37 2019]  mmput+0x4a/0x120
[Thu Oct 17 16:43:37 2019]  do_exit+0x284/0xb20
[Thu Oct 17 16:43:37 2019]  ? handle_mm_fault+0xfa/0x200
[Thu Oct 17 16:43:37 2019]  do_group_exit+0x3a/0xa0
[Thu Oct 17 16:43:37 2019]  __x64_sys_exit_group+0x14/0x20
[Thu Oct 17 16:43:37 2019]  do_syscall_64+0x4f/0x100
[Thu Oct 17 16:43:37 2019]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

Signed-off-by: Philip Yang 
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c   | 6 +++---
 drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c | 6 ++
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index d9e36dbf13d5..40d75c39f08e 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -120,13 +120,13 @@ static int kfd_open(struct inode *inode, struct file 
*filep)
return -EPERM;
}
 
+   if (kfd_is_locked())
+   return -EAGAIN;
+
process = kfd_create_process(filep);
if (IS_ERR(process))
return PTR_ERR(process);
 
-   if (kfd_is_locked())
-   return -EAGAIN;
-
dev_dbg(kfd_device, "process %d opened, compat mode (32 bit) - %d\n",
process->pasid, process->is_32bit_user_mode);
 
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
index 8509814a6ff0..3784013b92a0 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c
@@ -128,6 +128,12 @@ void kfd_process_dequeue_from_all_devices(struct 
kfd_process *p)
 {
struct kfd_process_device *pdd;
 
+   /* If suspend/resume got stuck, dqm_lock is held,
+* skip process_termination_cpsch to avoid deadlock
+*/
+   if (kfd_is_locked())
+   return;
+
	list_for_each_entry(pdd, &p->per_device_data, per_device_list)
kfd_process_dequeue_from_device(pdd);
 }
-- 
2.17.1

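Outside the diff context, the ordering change in the kfd_open() hunk above can be sketched as a small userspace model (`device_locked`, `live_processes` and `model_kfd_open` are illustrative names, not the real KFD API):

```c
#include <errno.h>
#include <stdbool.h>

/* Toy userspace model of the v2 kfd_open() ordering fix. */
static bool device_locked;   /* stands in for kfd_is_locked() */
static int  live_processes;  /* processes a later exit would tear down */

static int model_kfd_open(void)
{
	/* v2 ordering: check the lock *before* creating the process, so a
	 * stuck suspend/resume cannot leave behind a process whose release
	 * path would hang in process_termination_cpsch on exit. */
	if (device_locked)
		return -EAGAIN;

	live_processes++;   /* kfd_create_process() stand-in */
	return 0;
}
```

The point of the reordering is the invariant the assertions below capture: while the device is locked, open fails and no process object exists that a later exit would have to tear down.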

Re: [PATCH] drm/amdkfd: kfd open return failed if device is locked

2019-10-18 Thread Yang, Philip


On 2019-10-18 11:40 a.m., Kuehling, Felix wrote:
> On 2019-10-18 10:27 a.m., Yang, Philip wrote:
>> If device is locked for suspend and resume, kfd open should return
>> failed -EAGAIN without creating process, otherwise the application exit
>> to release the process will hang to wait for resume is done if the suspend
>> and resume is stuck somewhere. This is backtrace:
> 
> This doesn't fix processes that were created before suspend/resume got
> stuck. They would still get stuck with the same backtrace. So this is
> just a band-aid. The real underlying problem, that is not getting
> addressed, is suspend/resume getting stuck.
> 
> Am I missing something?
> 
This addresses the issue of an application getting stuck on exit after 
suspend/resume got stuck. The real underlying suspend/resume issue should 
be addressed separately.

I will submit v2 patch to fix processes that were created before 
suspend/resume got stuck.

Philip

> Regards,
>     Felix
> 
> 
>>
>> [Thu Oct 17 16:43:37 2019] INFO: task rocminfo:3024 blocked for more
>> than 120 seconds.
>> [Thu Oct 17 16:43:37 2019]   Not tainted
>> 5.0.0-rc1-kfd-compute-rocm-dkms-no-npi-1131 #1
>> [Thu Oct 17 16:43:37 2019] "echo 0 >
>> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>> [Thu Oct 17 16:43:37 2019] rocminfoD0  3024   2947
>> 0x8000
>> [Thu Oct 17 16:43:37 2019] Call Trace:
>> [Thu Oct 17 16:43:37 2019]  ? __schedule+0x3d9/0x8a0
>> [Thu Oct 17 16:43:37 2019]  schedule+0x32/0x70
>> [Thu Oct 17 16:43:37 2019]  schedule_preempt_disabled+0xa/0x10
>> [Thu Oct 17 16:43:37 2019]  __mutex_lock.isra.9+0x1e3/0x4e0
>> [Thu Oct 17 16:43:37 2019]  ? __call_srcu+0x264/0x3b0
>> [Thu Oct 17 16:43:37 2019]  ? process_termination_cpsch+0x24/0x2f0
>> [amdgpu]
>> [Thu Oct 17 16:43:37 2019]  process_termination_cpsch+0x24/0x2f0
>> [amdgpu]
>> [Thu Oct 17 16:43:37 2019]
>> kfd_process_dequeue_from_all_devices+0x42/0x60 [amdgpu]
>> [Thu Oct 17 16:43:37 2019]  kfd_process_notifier_release+0x1be/0x220
>> [amdgpu]
>> [Thu Oct 17 16:43:37 2019]  __mmu_notifier_release+0x3e/0xc0
>> [Thu Oct 17 16:43:37 2019]  exit_mmap+0x160/0x1a0
>> [Thu Oct 17 16:43:37 2019]  ? __handle_mm_fault+0xba3/0x1200
>> [Thu Oct 17 16:43:37 2019]  ? exit_robust_list+0x5a/0x110
>> [Thu Oct 17 16:43:37 2019]  mmput+0x4a/0x120
>> [Thu Oct 17 16:43:37 2019]  do_exit+0x284/0xb20
>> [Thu Oct 17 16:43:37 2019]  ? handle_mm_fault+0xfa/0x200
>> [Thu Oct 17 16:43:37 2019]  do_group_exit+0x3a/0xa0
>> [Thu Oct 17 16:43:37 2019]  __x64_sys_exit_group+0x14/0x20
>> [Thu Oct 17 16:43:37 2019]  do_syscall_64+0x4f/0x100
>> [Thu Oct 17 16:43:37 2019]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>
>> Signed-off-by: Philip Yang 
>> ---
>>drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 6 +++---
>>1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
>> b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
>> index d9e36dbf13d5..40d75c39f08e 100644
>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
>> @@ -120,13 +120,13 @@ static int kfd_open(struct inode *inode, struct file 
>> *filep)
>>  return -EPERM;
>>  }
>>
>> +if (kfd_is_locked())
>> +return -EAGAIN;
>> +
>>  process = kfd_create_process(filep);
>>  if (IS_ERR(process))
>>  return PTR_ERR(process);
>>
>> -if (kfd_is_locked())
>> -return -EAGAIN;
>> -
>>  dev_dbg(kfd_device, "process %d opened, compat mode (32 bit) - %d\n",
>>  process->pasid, process->is_32bit_user_mode);
>>

[PATCH 1/2] Revert "drm/amdgpu: fix compiler warnings for df perfmons"

2019-10-18 Thread Kim, Jonathan
This reverts commit 7dd2eb31fcd564574e8efea6bf23cf504f9e2fd7.

Signed-off-by: Jonathan Kim 
---
 drivers/gpu/drm/amd/amdgpu/df_v3_6.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/df_v3_6.c 
b/drivers/gpu/drm/amd/amdgpu/df_v3_6.c
index e1cf7e9c616a..f403c62c944e 100644
--- a/drivers/gpu/drm/amd/amdgpu/df_v3_6.c
+++ b/drivers/gpu/drm/amd/amdgpu/df_v3_6.c
@@ -93,7 +93,7 @@ const struct attribute_group *df_v3_6_attr_groups[] = {
NULL
 };
 
-static int df_v3_6_set_df_cstate(struct amdgpu_device *adev, int allow)
+static df_v3_6_set_df_cstate(struct amdgpu_device *adev, int allow)
 {
int r = 0;
 
@@ -546,7 +546,7 @@ static void df_v3_6_pmc_get_count(struct amdgpu_device 
*adev,
  uint64_t config,
  uint64_t *count)
 {
-   uint32_t lo_base_addr, hi_base_addr, lo_val = 0, hi_val = 0;
+   uint32_t lo_base_addr, hi_base_addr, lo_val, hi_val;
*count = 0;
 
switch (adev->asic_type) {
-- 
2.17.1


[PATCH 2/2] Revert "drm/amdgpu: disable c-states on xgmi perfmons"

2019-10-18 Thread Kim, Jonathan
This reverts commit 54275cd1649f4034c6450b6c5a8358fcd4f7dda6.

Signed-off-by: Jonathan Kim 
---
 drivers/gpu/drm/amd/amdgpu/df_v3_6.c | 36 +---
 1 file changed, 1 insertion(+), 35 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/df_v3_6.c 
b/drivers/gpu/drm/amd/amdgpu/df_v3_6.c
index f403c62c944e..16fbd2bc8ad1 100644
--- a/drivers/gpu/drm/amd/amdgpu/df_v3_6.c
+++ b/drivers/gpu/drm/amd/amdgpu/df_v3_6.c
@@ -93,21 +93,6 @@ const struct attribute_group *df_v3_6_attr_groups[] = {
NULL
 };
 
-static df_v3_6_set_df_cstate(struct amdgpu_device *adev, int allow)
-{
-   int r = 0;
-
-   if (is_support_sw_smu(adev)) {
-   r = smu_set_df_cstate(&adev->smu, allow);
-   } else if (adev->powerplay.pp_funcs
-   && adev->powerplay.pp_funcs->set_df_cstate) {
-   r = adev->powerplay.pp_funcs->set_df_cstate(
-   adev->powerplay.pp_handle, allow);
-   }
-
-   return r;
-}
-
 static uint64_t df_v3_6_get_fica(struct amdgpu_device *adev,
 uint32_t ficaa_val)
 {
@@ -117,9 +102,6 @@ static uint64_t df_v3_6_get_fica(struct amdgpu_device *adev,
address = adev->nbio.funcs->get_pcie_index_offset(adev);
data = adev->nbio.funcs->get_pcie_data_offset(adev);
 
-   if (df_v3_6_set_df_cstate(adev, DF_CSTATE_DISALLOW))
-   return 0x;
-
	spin_lock_irqsave(&adev->pcie_idx_lock, flags);
WREG32(address, smnDF_PIE_AON_FabricIndirectConfigAccessAddress3);
WREG32(data, ficaa_val);
@@ -132,8 +114,6 @@ static uint64_t df_v3_6_get_fica(struct amdgpu_device *adev,
 
	spin_unlock_irqrestore(&adev->pcie_idx_lock, flags);
 
-   df_v3_6_set_df_cstate(adev, DF_CSTATE_ALLOW);
-
return (((ficadh_val & 0x) << 32) | ficadl_val);
 }
 
@@ -145,9 +125,6 @@ static void df_v3_6_set_fica(struct amdgpu_device *adev, 
uint32_t ficaa_val,
address = adev->nbio.funcs->get_pcie_index_offset(adev);
data = adev->nbio.funcs->get_pcie_data_offset(adev);
 
-   if (df_v3_6_set_df_cstate(adev, DF_CSTATE_DISALLOW))
-   return;
-
	spin_lock_irqsave(&adev->pcie_idx_lock, flags);
WREG32(address, smnDF_PIE_AON_FabricIndirectConfigAccessAddress3);
WREG32(data, ficaa_val);
@@ -157,9 +134,8 @@ static void df_v3_6_set_fica(struct amdgpu_device *adev, 
uint32_t ficaa_val,
 
WREG32(address, smnDF_PIE_AON_FabricIndirectConfigAccessDataHi3);
WREG32(data, ficadh_val);
-   spin_unlock_irqrestore(&adev->pcie_idx_lock, flags);
 
-   df_v3_6_set_df_cstate(adev, DF_CSTATE_ALLOW);
+   spin_unlock_irqrestore(&adev->pcie_idx_lock, flags);
 }
 
 /*
@@ -177,17 +153,12 @@ static void df_v3_6_perfmon_rreg(struct amdgpu_device 
*adev,
address = adev->nbio.funcs->get_pcie_index_offset(adev);
data = adev->nbio.funcs->get_pcie_data_offset(adev);
 
-   if (df_v3_6_set_df_cstate(adev, DF_CSTATE_DISALLOW))
-   return;
-
	spin_lock_irqsave(&adev->pcie_idx_lock, flags);
WREG32(address, lo_addr);
*lo_val = RREG32(data);
WREG32(address, hi_addr);
*hi_val = RREG32(data);
	spin_unlock_irqrestore(&adev->pcie_idx_lock, flags);
-
-   df_v3_6_set_df_cstate(adev, DF_CSTATE_ALLOW);
 }
 
 /*
@@ -204,17 +175,12 @@ static void df_v3_6_perfmon_wreg(struct amdgpu_device 
*adev, uint32_t lo_addr,
address = adev->nbio.funcs->get_pcie_index_offset(adev);
data = adev->nbio.funcs->get_pcie_data_offset(adev);
 
-   if (df_v3_6_set_df_cstate(adev, DF_CSTATE_DISALLOW))
-   return;
-
	spin_lock_irqsave(&adev->pcie_idx_lock, flags);
WREG32(address, lo_addr);
WREG32(data, lo_val);
WREG32(address, hi_addr);
WREG32(data, hi_val);
	spin_unlock_irqrestore(&adev->pcie_idx_lock, flags);
-
-   df_v3_6_set_df_cstate(adev, DF_CSTATE_ALLOW);
 }
 
 /* get the number of df counters available */
-- 
2.17.1

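For context, what this revert removes is a bracketing pattern: every indirect register access was wrapped in a DISALLOW/ALLOW pair of DF c-state requests, bailing out early if the disallow failed. A minimal userspace model of that bracketing (all names here are illustrative stand-ins for the df_v3_6 code, not the real driver API):

```c
/* Illustrative model of the disallow/allow bracketing the revert removes. */
static int cstate_disallowed;  /* depth of outstanding DISALLOW requests */

enum { DF_CSTATE_ALLOW, DF_CSTATE_DISALLOW };

static int set_df_cstate(int state)
{
	if (state == DF_CSTATE_DISALLOW)
		cstate_disallowed++;
	else
		cstate_disallowed--;
	return 0;  /* a failed SMU message would return nonzero */
}

static unsigned int read_indirect_reg(void)
{
	unsigned int val;

	/* Bail out early if DF c-states cannot be disallowed. */
	if (set_df_cstate(DF_CSTATE_DISALLOW))
		return 0xFFFFFFFFu;

	val = 42;  /* stands in for the WREG32/RREG32 sequence */

	set_df_cstate(DF_CSTATE_ALLOW);  /* always rebalance on the way out */
	return val;
}
```

The key property is that every successful DISALLOW is rebalanced by exactly one ALLOW on every exit path, so c-states are never left disabled.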

Re: [PATCH 3/3] drm/amd/powerplay: clear the swSMU code layer

2019-10-18 Thread Alex Deucher
Nice cleanup.  As a next step, it would be nice to converge on a
single set of ppt functions so we could clean up all the call sites to
have one path regardless of powerplay or swSMU as the backend
implementation.
Acked-by: Alex Deucher 

On Fri, Oct 18, 2019 at 10:57 AM Quan, Evan  wrote:
>
> With this cleanup, the APIs from amdgpu_smu.c will map to
> ASIC-specific ones directly. Those that can be shared across
> all SMU V11/V12 ASICs will be put in smu_v11_0.c and
> smu_v12_0.c respectively.
>
> Change-Id: I9b98eb5ace5df19896de4b05c37255a38d1079ce
> Signed-off-by: Evan Quan 
> ---
>  drivers/gpu/drm/amd/amdgpu/soc15.c|   4 +-
>  .../amd/display/amdgpu_dm/amdgpu_dm_pp_smu.c  |  48 ++---
>  drivers/gpu/drm/amd/powerplay/amdgpu_smu.c| 119 +--
>  drivers/gpu/drm/amd/powerplay/arcturus_ppt.c  |  53 -
>  .../gpu/drm/amd/powerplay/inc/amdgpu_smu.h|   9 +-
>  drivers/gpu/drm/amd/powerplay/inc/smu_v11_0.h | 127 +++-
>  drivers/gpu/drm/amd/powerplay/inc/smu_v12_0.h |  41 +++-
>  drivers/gpu/drm/amd/powerplay/navi10_ppt.c|  56 +-
>  drivers/gpu/drm/amd/powerplay/renoir_ppt.c|  15 ++
>  drivers/gpu/drm/amd/powerplay/smu_cmn.h   |  84 
>  drivers/gpu/drm/amd/powerplay/smu_v11_0.c | 189 +-
>  drivers/gpu/drm/amd/powerplay/smu_v12_0.c |  70 ++-
>  drivers/gpu/drm/amd/powerplay/vega20_ppt.c|  57 +-
>  13 files changed, 542 insertions(+), 330 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c 
> b/drivers/gpu/drm/amd/amdgpu/soc15.c
> index fcae935bdc1b..a2c46e09e3e7 100644
> --- a/drivers/gpu/drm/amd/amdgpu/soc15.c
> +++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
> @@ -511,7 +511,7 @@ static int soc15_asic_baco_reset(struct amdgpu_device 
> *adev)
> if (pp_funcs->set_asic_baco_state(pp_handle, 0))
> return -EIO;
> } else {
> -   if (!smu->funcs)
> +   if (!smu->ppt_funcs)
> return -ENOENT;
>
> if (smu_baco_reset(smu))
> @@ -568,7 +568,7 @@ soc15_asic_reset_method(struct amdgpu_device *adev)
> }
> break;
> case CHIP_ARCTURUS:
> -   if (smu->funcs && smu_baco_is_support(smu))
> +   if (smu->ppt_funcs && smu_baco_is_support(smu))
> baco_reset = true;
> else
> baco_reset = false;
> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_pp_smu.c 
> b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_pp_smu.c
> index ee9915d61cf1..5df9e6de7c75 100644
> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_pp_smu.c
> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_pp_smu.c
> @@ -346,7 +346,7 @@ bool dm_pp_get_clock_levels_by_type(
> /* Error in pplib. Provide default values. */
> return true;
> }
> -   } else if (adev->smu.funcs && adev->smu.funcs->get_clock_by_type) {
> +   } else if (adev->smu.ppt_funcs && 
> adev->smu.ppt_funcs->get_clock_by_type) {
> if (smu_get_clock_by_type(&adev->smu,
>   dc_to_pp_clock_type(clk_type),
>   &pp_clks)) {
> @@ -366,7 +366,7 @@ bool dm_pp_get_clock_levels_by_type(
> validation_clks.memory_max_clock = 8;
> validation_clks.level = 0;
> }
> -   } else if (adev->smu.funcs && adev->smu.funcs->get_max_high_clocks) {
> +   } else if (adev->smu.ppt_funcs && 
> adev->smu.ppt_funcs->get_max_high_clocks) {
> if (smu_get_max_high_clocks(&adev->smu, &validation_clks)) {
> DRM_INFO("DM_PPLIB: Warning: using default validation 
> clocks!\n");
> validation_clks.engine_max_clock = 72000;
> @@ -507,8 +507,8 @@ bool dm_pp_apply_clock_for_voltage_request(
> ret = adev->powerplay.pp_funcs->display_clock_voltage_request(
> adev->powerplay.pp_handle,
> &pp_clock_request);
> -   else if (adev->smu.funcs &&
> -adev->smu.funcs->display_clock_voltage_request)
> +   else if (adev->smu.ppt_funcs &&
> +adev->smu.ppt_funcs->display_clock_voltage_request)
> ret = smu_display_clock_voltage_request(&adev->smu,
> &pp_clock_request);
> if (ret)
> @@ -528,7 +528,7 @@ bool dm_pp_get_static_clocks(
> ret = adev->powerplay.pp_funcs->get_current_clocks(
> adev->powerplay.pp_handle,
> _clk_info);
> -   else if (adev->smu.funcs)
> +   else if (adev->smu.ppt_funcs)
> ret = smu_get_current_clocks(&adev->smu, &pp_clk_info);
> if (ret)
> return false;
> @@ -590,8 +590,8 @@ void pp_rv_set_wm_ranges(struct pp_smu *pp,
> if (pp_funcs && 

Re: [PATCH 2/3] drm/amd/powerplay: split out those internal used swSMU APIs

2019-10-18 Thread Alex Deucher
On Fri, Oct 18, 2019 at 10:57 AM Quan, Evan  wrote:
>
> Those swSMU APIs used internally are moved to smu_cmn.h while
> others are kept in amdgpu_smu.h.
>

Maybe call this smu_internal.h so it's clear these are internal SMU interfaces.

Alex

> Change-Id: Ib726ef7f65dee46e47a07680b71e6e043e459f42
> Signed-off-by: Evan Quan 
> ---
>  drivers/gpu/drm/amd/powerplay/amdgpu_smu.c|   1 +
>  drivers/gpu/drm/amd/powerplay/arcturus_ppt.c  |   1 +
>  .../gpu/drm/amd/powerplay/inc/amdgpu_smu.h| 164 +-
>  drivers/gpu/drm/amd/powerplay/navi10_ppt.c|   1 +
>  drivers/gpu/drm/amd/powerplay/renoir_ppt.c|   1 +
>  drivers/gpu/drm/amd/powerplay/smu_cmn.h   | 206 ++
>  drivers/gpu/drm/amd/powerplay/smu_v11_0.c |   1 +
>  drivers/gpu/drm/amd/powerplay/smu_v12_0.c |   1 +
>  drivers/gpu/drm/amd/powerplay/vega20_ppt.c|   1 +
>  9 files changed, 214 insertions(+), 163 deletions(-)
>  create mode 100644 drivers/gpu/drm/amd/powerplay/smu_cmn.h
>
> diff --git a/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c 
> b/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c
> index 0841d8c79e5b..184b6d034d51 100644
> --- a/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c
> +++ b/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c
> @@ -25,6 +25,7 @@
>  #include 
>  #include "amdgpu.h"
>  #include "amdgpu_smu.h"
> +#include "smu_cmn.h"
>  #include "soc15_common.h"
>  #include "smu_v11_0.h"
>  #include "smu_v12_0.h"
> diff --git a/drivers/gpu/drm/amd/powerplay/arcturus_ppt.c 
> b/drivers/gpu/drm/amd/powerplay/arcturus_ppt.c
> index 141e48cd1c5d..19825576233f 100644
> --- a/drivers/gpu/drm/amd/powerplay/arcturus_ppt.c
> +++ b/drivers/gpu/drm/amd/powerplay/arcturus_ppt.c
> @@ -25,6 +25,7 @@
>  #include 
>  #include "amdgpu.h"
>  #include "amdgpu_smu.h"
> +#include "smu_cmn.h"
>  #include "atomfirmware.h"
>  #include "amdgpu_atomfirmware.h"
>  #include "smu_v11_0.h"
> diff --git a/drivers/gpu/drm/amd/powerplay/inc/amdgpu_smu.h 
> b/drivers/gpu/drm/amd/powerplay/inc/amdgpu_smu.h
> index 3e3464fa2ff5..d01e40184fe0 100644
> --- a/drivers/gpu/drm/amd/powerplay/inc/amdgpu_smu.h
> +++ b/drivers/gpu/drm/amd/powerplay/inc/amdgpu_smu.h
> @@ -555,92 +555,13 @@ struct smu_funcs
> int (*override_pcie_parameters)(struct smu_context *smu);
>  };
>
> -#define smu_init_microcode(smu) \
> -   ((smu)->funcs->init_microcode ? (smu)->funcs->init_microcode((smu)) : 
> 0)
> -#define smu_init_smc_tables(smu) \
> -   ((smu)->funcs->init_smc_tables ? (smu)->funcs->init_smc_tables((smu)) 
> : 0)
> -#define smu_fini_smc_tables(smu) \
> -   ((smu)->funcs->fini_smc_tables ? (smu)->funcs->fini_smc_tables((smu)) 
> : 0)
> -#define smu_init_power(smu) \
> -   ((smu)->funcs->init_power ? (smu)->funcs->init_power((smu)) : 0)
> -#define smu_fini_power(smu) \
> -   ((smu)->funcs->fini_power ? (smu)->funcs->fini_power((smu)) : 0)
>  int smu_load_microcode(struct smu_context *smu);
>
>  int smu_check_fw_status(struct smu_context *smu);
>
> -#define smu_setup_pptable(smu) \
> -   ((smu)->funcs->setup_pptable ? (smu)->funcs->setup_pptable((smu)) : 0)
> -#define smu_powergate_sdma(smu, gate) \
> -   ((smu)->funcs->powergate_sdma ? (smu)->funcs->powergate_sdma((smu), 
> (gate)) : 0)
> -#define smu_powergate_vcn(smu, gate) \
> -   ((smu)->funcs->powergate_vcn ? (smu)->funcs->powergate_vcn((smu), 
> (gate)) : 0)
>  int smu_set_gfx_cgpg(struct smu_context *smu, bool enabled);
> -#define smu_get_vbios_bootup_values(smu) \
> -   ((smu)->funcs->get_vbios_bootup_values ? 
> (smu)->funcs->get_vbios_bootup_values((smu)) : 0)
> -#define smu_get_clk_info_from_vbios(smu) \
> -   ((smu)->funcs->get_clk_info_from_vbios ? 
> (smu)->funcs->get_clk_info_from_vbios((smu)) : 0)
> -#define smu_check_pptable(smu) \
> -   ((smu)->funcs->check_pptable ? (smu)->funcs->check_pptable((smu)) : 0)
> -#define smu_parse_pptable(smu) \
> -   ((smu)->funcs->parse_pptable ? (smu)->funcs->parse_pptable((smu)) : 0)
> -#define smu_populate_smc_tables(smu) \
> -   ((smu)->funcs->populate_smc_tables ? 
> (smu)->funcs->populate_smc_tables((smu)) : 0)
> -#define smu_check_fw_version(smu) \
> -   ((smu)->funcs->check_fw_version ? 
> (smu)->funcs->check_fw_version((smu)) : 0)
> -#define smu_write_pptable(smu) \
> -   ((smu)->funcs->write_pptable ? (smu)->funcs->write_pptable((smu)) : 0)
> -#define smu_set_min_dcef_deep_sleep(smu) \
> -   ((smu)->funcs->set_min_dcef_deep_sleep ? 
> (smu)->funcs->set_min_dcef_deep_sleep((smu)) : 0)
> -#define smu_set_tool_table_location(smu) \
> -   ((smu)->funcs->set_tool_table_location ? 
> (smu)->funcs->set_tool_table_location((smu)) : 0)
> -#define smu_notify_memory_pool_location(smu) \
> -   ((smu)->funcs->notify_memory_pool_location ? 
> (smu)->funcs->notify_memory_pool_location((smu)) : 0)
> -#define smu_gfx_off_control(smu, enable) \
> -   ((smu)->funcs->gfx_off_control ? (smu)->funcs->gfx_off_control((smu), 
> (enable)) : 0)
> -
> -#define smu_write_watermarks_table(smu) \
> 

Re: [PATCH] drm/amdkfd: kfd open return failed if device is locked

2019-10-18 Thread Kuehling, Felix
On 2019-10-18 10:27 a.m., Yang, Philip wrote:
> If device is locked for suspend and resume, kfd open should return
> failed -EAGAIN without creating process, otherwise the application exit
> to release the process will hang to wait for resume is done if the suspend
> and resume is stuck somewhere. This is backtrace:

This doesn't fix processes that were created before suspend/resume got 
stuck. They would still get stuck with the same backtrace. So this is 
just a band-aid. The real underlying problem, that is not getting 
addressed, is suspend/resume getting stuck.

Am I missing something?

Regards,
   Felix


>
> [Thu Oct 17 16:43:37 2019] INFO: task rocminfo:3024 blocked for more
> than 120 seconds.
> [Thu Oct 17 16:43:37 2019]   Not tainted
> 5.0.0-rc1-kfd-compute-rocm-dkms-no-npi-1131 #1
> [Thu Oct 17 16:43:37 2019] "echo 0 >
> /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [Thu Oct 17 16:43:37 2019] rocminfoD0  3024   2947
> 0x8000
> [Thu Oct 17 16:43:37 2019] Call Trace:
> [Thu Oct 17 16:43:37 2019]  ? __schedule+0x3d9/0x8a0
> [Thu Oct 17 16:43:37 2019]  schedule+0x32/0x70
> [Thu Oct 17 16:43:37 2019]  schedule_preempt_disabled+0xa/0x10
> [Thu Oct 17 16:43:37 2019]  __mutex_lock.isra.9+0x1e3/0x4e0
> [Thu Oct 17 16:43:37 2019]  ? __call_srcu+0x264/0x3b0
> [Thu Oct 17 16:43:37 2019]  ? process_termination_cpsch+0x24/0x2f0
> [amdgpu]
> [Thu Oct 17 16:43:37 2019]  process_termination_cpsch+0x24/0x2f0
> [amdgpu]
> [Thu Oct 17 16:43:37 2019]
> kfd_process_dequeue_from_all_devices+0x42/0x60 [amdgpu]
> [Thu Oct 17 16:43:37 2019]  kfd_process_notifier_release+0x1be/0x220
> [amdgpu]
> [Thu Oct 17 16:43:37 2019]  __mmu_notifier_release+0x3e/0xc0
> [Thu Oct 17 16:43:37 2019]  exit_mmap+0x160/0x1a0
> [Thu Oct 17 16:43:37 2019]  ? __handle_mm_fault+0xba3/0x1200
> [Thu Oct 17 16:43:37 2019]  ? exit_robust_list+0x5a/0x110
> [Thu Oct 17 16:43:37 2019]  mmput+0x4a/0x120
> [Thu Oct 17 16:43:37 2019]  do_exit+0x284/0xb20
> [Thu Oct 17 16:43:37 2019]  ? handle_mm_fault+0xfa/0x200
> [Thu Oct 17 16:43:37 2019]  do_group_exit+0x3a/0xa0
> [Thu Oct 17 16:43:37 2019]  __x64_sys_exit_group+0x14/0x20
> [Thu Oct 17 16:43:37 2019]  do_syscall_64+0x4f/0x100
> [Thu Oct 17 16:43:37 2019]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> Signed-off-by: Philip Yang 
> ---
>   drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 6 +++---
>   1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> index d9e36dbf13d5..40d75c39f08e 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
> @@ -120,13 +120,13 @@ static int kfd_open(struct inode *inode, struct file 
> *filep)
>   return -EPERM;
>   }
>   
> + if (kfd_is_locked())
> + return -EAGAIN;
> +
>   process = kfd_create_process(filep);
>   if (IS_ERR(process))
>   return PTR_ERR(process);
>   
> - if (kfd_is_locked())
> - return -EAGAIN;
> -
>   dev_dbg(kfd_device, "process %d opened, compat mode (32 bit) - %d\n",
>   process->pasid, process->is_32bit_user_mode);
>   

Re: [RFC] drm: Add AMD GFX9+ format modifiers.

2019-10-18 Thread Bas Nieuwenhuizen
On Thu, Oct 17, 2019 at 9:50 PM Marek Olšák  wrote:
>
> On Wed, Oct 16, 2019 at 9:48 AM Bas Nieuwenhuizen  
> wrote:
>>
>> This adds initial format modifiers for AMD GFX9 and newer GPUs.
>>
>> This is particularly useful to determine if we can use DCC, and whether
>> we need an extra display compatible DCC metadata plane.
>>
>> Design decisions:
>>   - Always expose a single plane
>>This way everything works correctly with images with multiple planes.
>>
>>   - Do not add an extra memory region in DCC for putting a bit on whether
>> we are in compressed state.
>>A decompress on import is cheap enough if already decompressed, and
>>I do think in most cases we can avoid it in advance during modifier
>>negotiation. The remainder is probably not common enough to worry
>>about.
>>
>>   - Explicitly define the sizes as part of the modifier description instead
>> of using whatever the current version of radeonsi does.
>>This way we can avoid dedicated buffers and we can make sure we keep
>>compatibility across mesa versions. I'd like to put some tests on
>>this on ac_surface.c so we can learn early in the process if things
>>need to be changed. Furthermore, the lack of configurable strides on
>>GFX10 means things already go wrong if we do not agree, making a
>>custom stride somewhat less useful.
>
>
> The custom stride will be back for 2D images (not for 3D/Array), so Navi10-14 
> will be the only hw not supporting the custom stride for 2D. It might not be 
> worth adding the width and height into the modifier just because of 
> Navi10-14, though I don't feel strongly about it.

Right, I'll clarify the text.

I meant standardizing how we get the surface_size/dcc_size/total_size
(+ alignment of DCC metadata if bigger than surface alignment), so we
get to agree about offsets.

I believe we should not put in width/height in the modifier as (1) we
are allowed to assume every party in negotiation puts in the same
width (even though minigbm violates that currently ...) (2) this would
not be workable with most enumeration APIs.

>
> This patch doesn't add the sizes into the description anyway.
>
> The rest looks good.
>
> Marek
>
>>
>>
>>   - No usage of BO metadata at all for modifier usecases.
>>To avoid the requirement of dedicated dma bufs per image. For
>>non-modifier based interop we still use the BO metadata, since we
>>need to keep compatibility with old mesa and this is used for
>>depth/msaa/3d/CL etc. API interop.
>>
>>   - A single FD for all planes.
>>Easier in Vulkan / bindless and radeonsi is already transitioning.
>>
>>   - Make a single modifier for DCN1
>>   It defines things uniquely given bpp, which we can assume, so adding
>>   more modifier values do not add clarity.
>>
>>   - Not exposing the 4K and 256B tiling modes.
>>   These are largely only better for something like a cursor or very long
>>   and/or tall images. Are they worth the added complexity to save memory?
>>   For context, at 32bpp, tiles are 128x128 pixels.
>>
>>   - For multiplane images, every plane uses the same tiling.
>>   On GFX9/GFX10 we can, so no need to make it complicated.
>>
>>   - We use family_id + external_rev to distinguish between incompatible GPUs.
>>   PCI ID is not enough, as RAVEN and RAVEN2 have the same PCI device id,
>>   but different tiling. We might be able to find bigger equivalence
>>   groups for _X, but especially for DCC I would be uncomfortable making 
>> it
>>   shared between GPUs.
>>
>>   - For DCN1 DCC, radeonsi currently uses another texelbuffer with indices
>> to reorder. This is not shared.
>>   Specific to current implementation and does not need to be shared. To
>>   pave the way to shader-based solution, lets keep this internal to each
>>   driver. This should reduce the modifier churn if any of the driver
>>   implementations change. (Especially as you'd want to support the old
>>   implementation for a while to stay compatible with old kernels not
>>   supporting a new modifier yet).
>>
>>   - No support for rotated swizzling.
>>   Can be added easily later and nothing in the stack would generate it
>>   currently.
>>
>>   - Add extra enum values in the definitions.
>>   This way we can easily switch on modifier without having to pass around
>>   the current GPU everywhere, assuming the modifier has been validated.
>> ---
>>
>>  Since my previous attempt for modifiers got bogged down on details for
>>  the GFX6-GFX8 modifiers in previous discussions, this only attempts to
>>  define modifiers for GFX9+, which is significantly simpler.
>>
>>  For a final version I'd like to wait until I have written most of the
>>  userspace + kernelspace so we can actually test it. However, I'd
>>  appreciate any early feedback people are willing to give.
>>
>>  Initial Mesa amd/common 
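For readers less familiar with DRM format modifiers: a modifier is a single 64-bit value with the vendor code in the top byte and vendor-defined fields (tiling mode, DCC state, GPU family/revision as discussed above) packed into the low 56 bits. A hedged sketch of such packing — the field layout below is invented for illustration and is not the layout this RFC proposes:

```c
#include <stdint.h>

/* Hypothetical modifier field layout -- for illustration only. */
#define MOD_VENDOR_SHIFT   56   /* vendor code, as in fourcc_mod_code() */
#define MOD_TILE_SHIFT      0   /* swizzle/tiling mode, 8 bits */
#define MOD_DCC_SHIFT       8   /* DCC-enabled bit */
#define MOD_FAMILY_SHIFT    9   /* family_id + external_rev field */

static uint64_t make_modifier(uint8_t vendor, uint8_t tile,
			      int dcc, uint16_t family)
{
	return ((uint64_t)vendor << MOD_VENDOR_SHIFT) |
	       ((uint64_t)tile << MOD_TILE_SHIFT) |
	       ((uint64_t)(dcc != 0) << MOD_DCC_SHIFT) |
	       ((uint64_t)family << MOD_FAMILY_SHIFT);
}

static uint8_t modifier_vendor(uint64_t mod)  { return mod >> MOD_VENDOR_SHIFT; }
static uint8_t modifier_tile(uint64_t mod)    { return mod & 0xff; }
static int     modifier_has_dcc(uint64_t mod) { return (mod >> MOD_DCC_SHIFT) & 1; }
```

Because everything is in the value itself, two drivers can negotiate "can I scan this out / sample this" by comparing opaque 64-bit numbers, which is exactly why per-GPU fields like family_id + external_rev have to be folded in rather than read from BO metadata.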

Re: [PATCH] drm/amd/powerplay: add lock protection for swSMU APIs

2019-10-18 Thread Grodzovsky, Andrey

On 10/18/19 1:00 AM, Quan, Evan wrote:
>
> -Original Message-
> From: Grodzovsky, Andrey 
> Sent: Thursday, October 17, 2019 10:22 PM
> To: Quan, Evan ; amd-gfx@lists.freedesktop.org
> Subject: Re: [PATCH] drm/amd/powerplay: add lock protection for swSMU APIs
>
>
> On 10/16/19 11:55 PM, Quan, Evan wrote:
>> This is a quick and low risk fix. Those APIs which
>> are exposed to other IPs or to support sysfs/hwmon
>> interfaces or DAL will have lock protection. Meanwhile
>> no lock protection is enforced for swSMU internal used
>> APIs. Future optimization is needed.
>
> Does it mean that there is still a risk of collision on SMU access between
> an external API function and an internal one?
>
> [Quan, Evan] It should not. Neither the SMU nor other IPs should access those 
> internal APIs directly after SMU IP setup completes (after late_init).
> The access should always go through the external APIs. In fact I ran a 
> compute stress test overnight with 10 terminals accessing the amdgpu_pm_info 
> sysfs at the same time and did not see any problem. So the implementation 
> should be safe.
> The "optimization" mentioned here is about code style and readability.


I see.

Acked-by: Andrey Grodzovsky 

Andrey
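The scheme Evan describes — external entry points take the mutex, internal helpers assume it is already held — reduces to a standard two-layer locking pattern. A userspace sketch with pthreads (the `smu_set_dpm_level*` names are invented for illustration, not the actual swSMU API):

```c
#include <pthread.h>

/* External API locks; internal helpers assume the lock is already held. */
struct smu_context {
	pthread_mutex_t mutex;
	int dpm_level;
};

/* Internal helper: callers must already hold smu->mutex. */
static int smu_set_dpm_level_locked(struct smu_context *smu, int level)
{
	smu->dpm_level = level;
	return 0;
}

/* External entry point: the only path other IPs / sysfs should use. */
static int smu_set_dpm_level(struct smu_context *smu, int level)
{
	int ret;

	pthread_mutex_lock(&smu->mutex);
	ret = smu_set_dpm_level_locked(smu, level);
	pthread_mutex_unlock(&smu->mutex);

	return ret;
}
```

The correctness of this split rests on the convention Evan states: after late init, nothing outside the SMU layer may call the `_locked` helpers directly, otherwise an internal call could race an external one on the same state.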


>
> Andrey
>
>
>> Change-Id: I8392652c9da1574a85acd9b171f04380f3630852
>> Signed-off-by: Evan Quan 
>> ---
>>drivers/gpu/drm/amd/amdgpu/amdgpu_dpm.c   |   6 +-
>>drivers/gpu/drm/amd/amdgpu/amdgpu_dpm.h   |   6 -
>>drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c|  23 +-
>>.../amd/display/amdgpu_dm/amdgpu_dm_pp_smu.c  |   4 +-
>>drivers/gpu/drm/amd/powerplay/amdgpu_smu.c| 684 --
>>.../gpu/drm/amd/powerplay/inc/amdgpu_smu.h| 163 +++--
>>drivers/gpu/drm/amd/powerplay/navi10_ppt.c|  15 +-
>>drivers/gpu/drm/amd/powerplay/renoir_ppt.c|  12 +-
>>drivers/gpu/drm/amd/powerplay/smu_v11_0.c |   7 +-
>>drivers/gpu/drm/amd/powerplay/vega20_ppt.c|   6 +-
>>10 files changed, 773 insertions(+), 153 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_dpm.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_dpm.c
>> index 263265245e19..28d32725285b 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_dpm.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_dpm.c
>> @@ -912,7 +912,8 @@ int amdgpu_dpm_get_sclk(struct amdgpu_device *adev, bool 
>> low)
>>  if (is_support_sw_smu(adev)) {
>>  ret = smu_get_dpm_freq_range(&adev->smu, SMU_GFXCLK,
>>   low ? &clk_freq : NULL,
>> - !low ? &clk_freq : NULL);
>> + !low ? &clk_freq : NULL,
>> + true);
>>  if (ret)
>>  return 0;
>>  return clk_freq * 100;
>> @@ -930,7 +931,8 @@ int amdgpu_dpm_get_mclk(struct amdgpu_device *adev, bool 
>> low)
>>  if (is_support_sw_smu(adev)) {
>>  ret = smu_get_dpm_freq_range(&adev->smu, SMU_UCLK,
>>   low ? &clk_freq : NULL,
>> - !low ? &clk_freq : NULL);
>> + !low ? &clk_freq : NULL,
>> + true);
>>  if (ret)
>>  return 0;
>>  return clk_freq * 100;
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_dpm.h 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_dpm.h
>> index 1c5c0fd76dbf..2cfb677272af 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_dpm.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_dpm.h
>> @@ -298,12 +298,6 @@ enum amdgpu_pcie_gen {
>>#define amdgpu_dpm_get_current_power_state(adev) \
>>  
>> ((adev)->powerplay.pp_funcs->get_current_power_state((adev)->powerplay.pp_handle))
>>
>> -#define amdgpu_smu_get_current_power_state(adev) \
>> -((adev)->smu.ppt_funcs->get_current_power_state(&((adev)->smu)))
>> -
>> -#define amdgpu_smu_set_power_state(adev) \
>> -((adev)->smu.ppt_funcs->set_power_state(&((adev)->smu)))
>> -
>>#define amdgpu_dpm_get_pp_num_states(adev, data) \
>>  
>> ((adev)->powerplay.pp_funcs->get_pp_num_states((adev)->powerplay.pp_handle, 
>> data))
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
>> index c50d5f1e75e5..36f36b35000d 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
>> @@ -211,7 +211,7 @@ static ssize_t amdgpu_get_dpm_state(struct device *dev,
>>
>>  if (is_support_sw_smu(adev)) {
>>  if (adev->smu.ppt_funcs->get_current_power_state)
>> -pm = amdgpu_smu_get_current_power_state(adev);
>> +pm = smu_get_current_power_state(&adev->smu);
>>  else
>>  pm = adev->pm.dpm.user_state;
>>  } else if (adev->powerplay.pp_funcs->get_current_power_state) {
>> @@ -957,7 +957,7 @@ static ssize_t 

[PATCH 1/3] drm/amd/powerplay: add lock protection for swSMU APIs V2

2019-10-18 Thread Quan, Evan
This is a quick and low-risk fix. Those APIs which
are exposed to other IPs or to support sysfs/hwmon
interfaces or DAL will have lock protection. Meanwhile
no lock protection is enforced for internally used
swSMU APIs. Future optimization is needed.

V2: strip the lock protection for all swSMU internal APIs
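The locking split described in this commit message can be sketched as follows. This is an illustrative model only, not the driver's actual code: the structure, function names, and the int-based "lock" are simplified stand-ins for `smu_context`, its mutex, and the real external/internal API pairs.

```c
#include <assert.h>

/*
 * Illustrative model of the swSMU locking split (not the real driver
 * code): external entry points take the context lock, internal
 * helpers assume the caller already holds it.
 */
struct smu_context {
	int lock_held;		/* stands in for smu->mutex */
	unsigned int gfx_clk_mhz;
};

/* Internal helper: no locking; caller must hold the context lock. */
static int smu_get_gfx_clk_internal(struct smu_context *smu, unsigned int *out)
{
	assert(smu->lock_held);	/* document/enforce the precondition */
	*out = smu->gfx_clk_mhz;
	return 0;
}

/* External API: the only place the lock is taken and released. */
static int smu_get_gfx_clk(struct smu_context *smu, unsigned int *out)
{
	int ret;

	smu->lock_held = 1;	/* mutex_lock(&smu->mutex) in the driver */
	ret = smu_get_gfx_clk_internal(smu, out);
	smu->lock_held = 0;	/* mutex_unlock(&smu->mutex) */

	return ret;
}
```

This is why internal APIs can stay lock-free: they are only ever reached through an external entry point that already serialized access.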

Change-Id: I8392652c9da1574a85acd9b171f04380f3630852
Signed-off-by: Evan Quan 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_dpm.c   |   6 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_dpm.h   |   6 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c|  23 +-
 .../amd/display/amdgpu_dm/amdgpu_dm_pp_smu.c  |   4 +-
 drivers/gpu/drm/amd/powerplay/amdgpu_smu.c| 696 --
 drivers/gpu/drm/amd/powerplay/arcturus_ppt.c  |   3 -
 .../gpu/drm/amd/powerplay/inc/amdgpu_smu.h| 163 ++--
 drivers/gpu/drm/amd/powerplay/navi10_ppt.c|  15 +-
 drivers/gpu/drm/amd/powerplay/renoir_ppt.c|  14 +-
 drivers/gpu/drm/amd/powerplay/smu_v11_0.c |  22 +-
 drivers/gpu/drm/amd/powerplay/smu_v12_0.c |   3 -
 drivers/gpu/drm/amd/powerplay/vega20_ppt.c|  20 +-
 12 files changed, 777 insertions(+), 198 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_dpm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_dpm.c
index 263265245e19..28d32725285b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_dpm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_dpm.c
@@ -912,7 +912,8 @@ int amdgpu_dpm_get_sclk(struct amdgpu_device *adev, bool 
low)
if (is_support_sw_smu(adev)) {
ret = smu_get_dpm_freq_range(&adev->smu, SMU_GFXCLK,
 low ? &clk_freq : NULL,
-!low ? &clk_freq : NULL);
+!low ? &clk_freq : NULL,
+true);
if (ret)
return 0;
return clk_freq * 100;
@@ -930,7 +931,8 @@ int amdgpu_dpm_get_mclk(struct amdgpu_device *adev, bool 
low)
if (is_support_sw_smu(adev)) {
ret = smu_get_dpm_freq_range(&adev->smu, SMU_UCLK,
 low ? &clk_freq : NULL,
-!low ? &clk_freq : NULL);
+!low ? &clk_freq : NULL,
+true);
if (ret)
return 0;
return clk_freq * 100;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_dpm.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_dpm.h
index 1c5c0fd76dbf..2cfb677272af 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_dpm.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_dpm.h
@@ -298,12 +298,6 @@ enum amdgpu_pcie_gen {
 #define amdgpu_dpm_get_current_power_state(adev) \

((adev)->powerplay.pp_funcs->get_current_power_state((adev)->powerplay.pp_handle))
 
-#define amdgpu_smu_get_current_power_state(adev) \
-   ((adev)->smu.ppt_funcs->get_current_power_state(&((adev)->smu)))
-
-#define amdgpu_smu_set_power_state(adev) \
-   ((adev)->smu.ppt_funcs->set_power_state(&((adev)->smu)))
-
 #define amdgpu_dpm_get_pp_num_states(adev, data) \

((adev)->powerplay.pp_funcs->get_pp_num_states((adev)->powerplay.pp_handle, 
data))
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
index c50d5f1e75e5..36f36b35000d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_pm.c
@@ -211,7 +211,7 @@ static ssize_t amdgpu_get_dpm_state(struct device *dev,
 
if (is_support_sw_smu(adev)) {
if (adev->smu.ppt_funcs->get_current_power_state)
-   pm = amdgpu_smu_get_current_power_state(adev);
+   pm = smu_get_current_power_state(&adev->smu);
else
pm = adev->pm.dpm.user_state;
} else if (adev->powerplay.pp_funcs->get_current_power_state) {
@@ -957,7 +957,7 @@ static ssize_t amdgpu_set_pp_dpm_sclk(struct device *dev,
return ret;
 
if (is_support_sw_smu(adev))
-   ret = smu_force_clk_levels(&adev->smu, SMU_SCLK, mask);
+   ret = smu_force_clk_levels(&adev->smu, SMU_SCLK, mask, true);
else if (adev->powerplay.pp_funcs->force_clock_level)
ret = amdgpu_dpm_force_clock_level(adev, PP_SCLK, mask);
 
@@ -1004,7 +1004,7 @@ static ssize_t amdgpu_set_pp_dpm_mclk(struct device *dev,
return ret;
 
if (is_support_sw_smu(adev))
-   ret = smu_force_clk_levels(&adev->smu, SMU_MCLK, mask);
+   ret = smu_force_clk_levels(&adev->smu, SMU_MCLK, mask, true);
else if (adev->powerplay.pp_funcs->force_clock_level)
ret = amdgpu_dpm_force_clock_level(adev, PP_MCLK, mask);
 
@@ -1044,7 +1044,7 @@ static ssize_t amdgpu_set_pp_dpm_socclk(struct device 
*dev,
return ret;
 
if (is_support_sw_smu(adev))
-   ret = smu_force_clk_levels(&adev->smu, 

[PATCH 3/3] drm/amd/powerplay: clear the swSMU code layer

2019-10-18 Thread Quan, Evan
With this cleanup, the APIs from amdgpu_smu.c will map to
ASIC specific ones directly. Those which can be shared across
all SMU V11/V12 ASICs will be put in smu_v11_0.c and
smu_v12_0.c respectively.
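The "map to ASIC specific ones directly" scheme above is the usual function-pointer-table dispatch. The sketch below is an illustrative model, not the driver's real definitions: `pptable_funcs`, `baco_is_support`, and the navi10 stand-ins are simplified for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Illustrative sketch of the dispatch scheme: common swSMU code calls
 * through a per-ASIC function-pointer table (ppt_funcs), falling back
 * to a safe default when a hook is not implemented.  All names here
 * are simplified stand-ins, not the driver's real structures.
 */
struct smu_context;

struct pptable_funcs {
	bool (*baco_is_support)(struct smu_context *smu);
};

struct smu_context {
	const struct pptable_funcs *ppt_funcs;
};

/* Common wrapper: guards against ASICs that do not implement the hook. */
static bool smu_baco_is_support(struct smu_context *smu)
{
	if (smu->ppt_funcs && smu->ppt_funcs->baco_is_support)
		return smu->ppt_funcs->baco_is_support(smu);
	return false;
}

/* Hypothetical per-ASIC implementation. */
static bool navi10_baco_is_support(struct smu_context *smu)
{
	(void)smu;
	return true;
}

static const struct pptable_funcs navi10_ppt_funcs = {
	.baco_is_support = navi10_baco_is_support,
};
```

The NULL checks in the wrapper are what allows callers like soc15.c to test only `smu->ppt_funcs` before dispatching.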

Change-Id: I9b98eb5ace5df19896de4b05c37255a38d1079ce
Signed-off-by: Evan Quan 
---
 drivers/gpu/drm/amd/amdgpu/soc15.c|   4 +-
 .../amd/display/amdgpu_dm/amdgpu_dm_pp_smu.c  |  48 ++---
 drivers/gpu/drm/amd/powerplay/amdgpu_smu.c| 119 +--
 drivers/gpu/drm/amd/powerplay/arcturus_ppt.c  |  53 -
 .../gpu/drm/amd/powerplay/inc/amdgpu_smu.h|   9 +-
 drivers/gpu/drm/amd/powerplay/inc/smu_v11_0.h | 127 +++-
 drivers/gpu/drm/amd/powerplay/inc/smu_v12_0.h |  41 +++-
 drivers/gpu/drm/amd/powerplay/navi10_ppt.c|  56 +-
 drivers/gpu/drm/amd/powerplay/renoir_ppt.c|  15 ++
 drivers/gpu/drm/amd/powerplay/smu_cmn.h   |  84 
 drivers/gpu/drm/amd/powerplay/smu_v11_0.c | 189 +-
 drivers/gpu/drm/amd/powerplay/smu_v12_0.c |  70 ++-
 drivers/gpu/drm/amd/powerplay/vega20_ppt.c|  57 +-
 13 files changed, 542 insertions(+), 330 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/soc15.c 
b/drivers/gpu/drm/amd/amdgpu/soc15.c
index fcae935bdc1b..a2c46e09e3e7 100644
--- a/drivers/gpu/drm/amd/amdgpu/soc15.c
+++ b/drivers/gpu/drm/amd/amdgpu/soc15.c
@@ -511,7 +511,7 @@ static int soc15_asic_baco_reset(struct amdgpu_device *adev)
if (pp_funcs->set_asic_baco_state(pp_handle, 0))
return -EIO;
} else {
-   if (!smu->funcs)
+   if (!smu->ppt_funcs)
return -ENOENT;
 
if (smu_baco_reset(smu))
@@ -568,7 +568,7 @@ soc15_asic_reset_method(struct amdgpu_device *adev)
}
break;
case CHIP_ARCTURUS:
-   if (smu->funcs && smu_baco_is_support(smu))
+   if (smu->ppt_funcs && smu_baco_is_support(smu))
baco_reset = true;
else
baco_reset = false;
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_pp_smu.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_pp_smu.c
index ee9915d61cf1..5df9e6de7c75 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_pp_smu.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_pp_smu.c
@@ -346,7 +346,7 @@ bool dm_pp_get_clock_levels_by_type(
/* Error in pplib. Provide default values. */
return true;
}
-   } else if (adev->smu.funcs && adev->smu.funcs->get_clock_by_type) {
+   } else if (adev->smu.ppt_funcs && 
adev->smu.ppt_funcs->get_clock_by_type) {
if (smu_get_clock_by_type(&adev->smu,
  dc_to_pp_clock_type(clk_type),
  &pp_clks)) {
@@ -366,7 +366,7 @@ bool dm_pp_get_clock_levels_by_type(
validation_clks.memory_max_clock = 8;
validation_clks.level = 0;
}
-   } else if (adev->smu.funcs && adev->smu.funcs->get_max_high_clocks) {
+   } else if (adev->smu.ppt_funcs && 
adev->smu.ppt_funcs->get_max_high_clocks) {
if (smu_get_max_high_clocks(&adev->smu, &validation_clks)) {
DRM_INFO("DM_PPLIB: Warning: using default validation 
clocks!\n");
validation_clks.engine_max_clock = 72000;
@@ -507,8 +507,8 @@ bool dm_pp_apply_clock_for_voltage_request(
ret = adev->powerplay.pp_funcs->display_clock_voltage_request(
adev->powerplay.pp_handle,
&pp_clock_request);
-   else if (adev->smu.funcs &&
-adev->smu.funcs->display_clock_voltage_request)
+   else if (adev->smu.ppt_funcs &&
+adev->smu.ppt_funcs->display_clock_voltage_request)
ret = smu_display_clock_voltage_request(&adev->smu,
&pp_clock_request);
if (ret)
@@ -528,7 +528,7 @@ bool dm_pp_get_static_clocks(
ret = adev->powerplay.pp_funcs->get_current_clocks(
adev->powerplay.pp_handle,
&pp_clk_info);
-   else if (adev->smu.funcs)
+   else if (adev->smu.ppt_funcs)
ret = smu_get_current_clocks(&adev->smu, &pp_clk_info);
if (ret)
return false;
@@ -590,8 +590,8 @@ void pp_rv_set_wm_ranges(struct pp_smu *pp,
if (pp_funcs && pp_funcs->set_watermarks_for_clocks_ranges)
pp_funcs->set_watermarks_for_clocks_ranges(pp_handle,
   
&wm_with_clock_ranges);
-   else if (adev->smu.funcs &&
-adev->smu.funcs->set_watermarks_for_clock_ranges)
+   else if (adev->smu.ppt_funcs &&
+adev->smu.ppt_funcs->set_watermarks_for_clock_ranges)
smu_set_watermarks_for_clock_ranges(&adev->smu,

[PATCH 2/3] drm/amd/powerplay: split out those internal used swSMU APIs

2019-10-18 Thread Quan, Evan
Those swSMU APIs used internally are moved to smu_cmn.h while
others are kept in amdgpu_smu.h.

Change-Id: Ib726ef7f65dee46e47a07680b71e6e043e459f42
Signed-off-by: Evan Quan 
---
 drivers/gpu/drm/amd/powerplay/amdgpu_smu.c|   1 +
 drivers/gpu/drm/amd/powerplay/arcturus_ppt.c  |   1 +
 .../gpu/drm/amd/powerplay/inc/amdgpu_smu.h| 164 +-
 drivers/gpu/drm/amd/powerplay/navi10_ppt.c|   1 +
 drivers/gpu/drm/amd/powerplay/renoir_ppt.c|   1 +
 drivers/gpu/drm/amd/powerplay/smu_cmn.h   | 206 ++
 drivers/gpu/drm/amd/powerplay/smu_v11_0.c |   1 +
 drivers/gpu/drm/amd/powerplay/smu_v12_0.c |   1 +
 drivers/gpu/drm/amd/powerplay/vega20_ppt.c|   1 +
 9 files changed, 214 insertions(+), 163 deletions(-)
 create mode 100644 drivers/gpu/drm/amd/powerplay/smu_cmn.h

diff --git a/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c 
b/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c
index 0841d8c79e5b..184b6d034d51 100644
--- a/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c
+++ b/drivers/gpu/drm/amd/powerplay/amdgpu_smu.c
@@ -25,6 +25,7 @@
 #include 
 #include "amdgpu.h"
 #include "amdgpu_smu.h"
+#include "smu_cmn.h"
 #include "soc15_common.h"
 #include "smu_v11_0.h"
 #include "smu_v12_0.h"
diff --git a/drivers/gpu/drm/amd/powerplay/arcturus_ppt.c 
b/drivers/gpu/drm/amd/powerplay/arcturus_ppt.c
index 141e48cd1c5d..19825576233f 100644
--- a/drivers/gpu/drm/amd/powerplay/arcturus_ppt.c
+++ b/drivers/gpu/drm/amd/powerplay/arcturus_ppt.c
@@ -25,6 +25,7 @@
 #include 
 #include "amdgpu.h"
 #include "amdgpu_smu.h"
+#include "smu_cmn.h"
 #include "atomfirmware.h"
 #include "amdgpu_atomfirmware.h"
 #include "smu_v11_0.h"
diff --git a/drivers/gpu/drm/amd/powerplay/inc/amdgpu_smu.h 
b/drivers/gpu/drm/amd/powerplay/inc/amdgpu_smu.h
index 3e3464fa2ff5..d01e40184fe0 100644
--- a/drivers/gpu/drm/amd/powerplay/inc/amdgpu_smu.h
+++ b/drivers/gpu/drm/amd/powerplay/inc/amdgpu_smu.h
@@ -555,92 +555,13 @@ struct smu_funcs
int (*override_pcie_parameters)(struct smu_context *smu);
 };
 
-#define smu_init_microcode(smu) \
-   ((smu)->funcs->init_microcode ? (smu)->funcs->init_microcode((smu)) : 0)
-#define smu_init_smc_tables(smu) \
-   ((smu)->funcs->init_smc_tables ? (smu)->funcs->init_smc_tables((smu)) : 
0)
-#define smu_fini_smc_tables(smu) \
-   ((smu)->funcs->fini_smc_tables ? (smu)->funcs->fini_smc_tables((smu)) : 
0)
-#define smu_init_power(smu) \
-   ((smu)->funcs->init_power ? (smu)->funcs->init_power((smu)) : 0)
-#define smu_fini_power(smu) \
-   ((smu)->funcs->fini_power ? (smu)->funcs->fini_power((smu)) : 0)
 int smu_load_microcode(struct smu_context *smu);
 
 int smu_check_fw_status(struct smu_context *smu);
 
-#define smu_setup_pptable(smu) \
-   ((smu)->funcs->setup_pptable ? (smu)->funcs->setup_pptable((smu)) : 0)
-#define smu_powergate_sdma(smu, gate) \
-   ((smu)->funcs->powergate_sdma ? (smu)->funcs->powergate_sdma((smu), 
(gate)) : 0)
-#define smu_powergate_vcn(smu, gate) \
-   ((smu)->funcs->powergate_vcn ? (smu)->funcs->powergate_vcn((smu), 
(gate)) : 0)
 int smu_set_gfx_cgpg(struct smu_context *smu, bool enabled);
-#define smu_get_vbios_bootup_values(smu) \
-   ((smu)->funcs->get_vbios_bootup_values ? 
(smu)->funcs->get_vbios_bootup_values((smu)) : 0)
-#define smu_get_clk_info_from_vbios(smu) \
-   ((smu)->funcs->get_clk_info_from_vbios ? 
(smu)->funcs->get_clk_info_from_vbios((smu)) : 0)
-#define smu_check_pptable(smu) \
-   ((smu)->funcs->check_pptable ? (smu)->funcs->check_pptable((smu)) : 0)
-#define smu_parse_pptable(smu) \
-   ((smu)->funcs->parse_pptable ? (smu)->funcs->parse_pptable((smu)) : 0)
-#define smu_populate_smc_tables(smu) \
-   ((smu)->funcs->populate_smc_tables ? 
(smu)->funcs->populate_smc_tables((smu)) : 0)
-#define smu_check_fw_version(smu) \
-   ((smu)->funcs->check_fw_version ? (smu)->funcs->check_fw_version((smu)) 
: 0)
-#define smu_write_pptable(smu) \
-   ((smu)->funcs->write_pptable ? (smu)->funcs->write_pptable((smu)) : 0)
-#define smu_set_min_dcef_deep_sleep(smu) \
-   ((smu)->funcs->set_min_dcef_deep_sleep ? 
(smu)->funcs->set_min_dcef_deep_sleep((smu)) : 0)
-#define smu_set_tool_table_location(smu) \
-   ((smu)->funcs->set_tool_table_location ? 
(smu)->funcs->set_tool_table_location((smu)) : 0)
-#define smu_notify_memory_pool_location(smu) \
-   ((smu)->funcs->notify_memory_pool_location ? 
(smu)->funcs->notify_memory_pool_location((smu)) : 0)
-#define smu_gfx_off_control(smu, enable) \
-   ((smu)->funcs->gfx_off_control ? (smu)->funcs->gfx_off_control((smu), 
(enable)) : 0)
-
-#define smu_write_watermarks_table(smu) \
-   ((smu)->funcs->write_watermarks_table ? 
(smu)->funcs->write_watermarks_table((smu)) : 0)
-#define smu_set_last_dcef_min_deep_sleep_clk(smu) \
-   ((smu)->funcs->set_last_dcef_min_deep_sleep_clk ? 
(smu)->funcs->set_last_dcef_min_deep_sleep_clk((smu)) : 0)
-#define smu_system_features_control(smu, en) \
-   

Re: Spontaneous reboots when using RX 560

2019-10-18 Thread Sylvain Munaut
Hi Alex,


> Does disabling the IOMMU help?  E.g., append iommu=off or iommu=pt on
> the kernel command line in grub.

Good suggestion, I should have tried that earlier, unfortunately it
doesn't change anything :/

I tried both independently and also combining with pci=noats and cg/pg
mask=0. Same behavior.
The actual messages in dmesg vary slightly but convey the same idea. Here are
the two most distinctive ones:

[  122.525452] gmc_v8_0_process_interrupt: 14 callbacks suppressed
[  122.525456] amdgpu :06:00.0: GPU fault detected: 146 0x0140440c
for process gnome-shell pid 2069 thread gnome-shel:cs0 pid 2084
[  122.525459] amdgpu :06:00.0:
VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x0028
[  122.525460] amdgpu :06:00.0:
VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0E04400C
[  122.525462] amdgpu :06:00.0: VM fault (0x0c, vmid 7, pasid
32770) at page 40, read from 'TC1' (0x54433100) (68)
[  127.745969] [drm:amdgpu_dm_commit_planes.constprop.0 [amdgpu]]
*ERROR* Waiting for fences timed out or interrupted!
[  129.675374] clocksource: timekeeping watchdog on CPU0: Marking
clocksource 'tsc' as unstable because the skew is too large:
[  129.675377] clocksource:   'hpet' wd_now:
6ef9849c wd_last: 6dd444eb mask: 
[  129.675377] clocksource:   'tsc' cs_now:
10c4a3c5c88 cs_last: 10b779feadc mask: 
[  129.675378] tsc: Marking TSC unstable due to clocksource watchdog
[  130.480703] igb :07:00.0 enp7s0: PCIe link lost

The above was with "iommu=off pci=noats amdgpu.cg_mask=0 amdgpu.pg_mask=0"

I also saw this stack trace with iommu=pt :

[   89.211541] [drm:amdgpu_dm_commit_planes.constprop.0 [amdgpu]]
*ERROR* Waiting for fences timed out or interrupted!
[   89.463287] invalid opcode:  [#1] SMP NOPTI
[   89.463292] CPU: 1 PID: 1647 Comm: InputThread Tainted: P
OE 5.3.0-18-generic #19-Ubuntu
[   89.463294] Hardware name: To Be Filled By O.E.M. To Be Filled By
O.E.M./X570 Pro4, BIOS P1.70 09/10/2019
[   89.463383] RIP: 0010:amdgpu_dm_atomic_check+0x63c/0x6c0 [amdgpu]
[   89.463385] Code: 8d 78 f0 49 39 c5 0f 85 0a fb ff ff e9 5c fb ff
ff 41 89 c4 e9 ec fe ff ff 41 89 c4 e9 14 f0 9e 00 44 85 ff 0f 85 b8
fd ff ce  b0 1c 10 02 24 84 00 ff ff ff 00 8b 90 c0 48 89 b0 e8 7d
eb 19
[   89.463387] RSP: 0018:a4d001adf9c0 EFLAGS: 00010246
[   89.463389] RAX:  RBX:  RCX: 
[   89.463391] RDX: 09f6 RSI: 8e45de670140 RDI: 00030140
[   89.463392] RBP: a4d001adfa20 R08: 8e45b78adc00 R09: 
[   89.463393] R10: 8e45d827 R11: 8e45da49d000 R12: 
[   89.463394] R13: 8e45d827 R14: 8e45cb4f2480 R15: 
[   89.463396] FS:  7fbd627fc700() GS:8e45de64()
knlGS:
[   89.463397] CS:  0010 DS:  ES:  CR0: 80050033
[   89.463399] CR2: 55f405646450 CR3: 000811268000 CR4: 00340ee0
[   89.463400] Call Trace:
[   89.463420]  drm_atomic_check_only+0x2d6/0x3d0 [drm]
[   89.463433]  drm_atomic_commit+0x18/0x50 [drm]
[   89.463443]  drm_atomic_helper_update_plane+0xea/0x100 [drm_kms_helper]
[   89.463457]  __setplane_atomic+0xcb/0x110 [drm]
[   89.463470]  drm_mode_cursor_universal+0x140/0x260 [drm]
[   89.463484]  drm_mode_cursor_common+0xcc/0x220 [drm]
[   89.463496]  ? drm_mode_setplane+0x2b0/0x2b0 [drm]
[   89.463507]  drm_mode_cursor_ioctl+0x4a/0x60 [drm]
[   89.463519]  drm_ioctl_kernel+0xae/0xf0 [drm]
[   89.463531]  drm_ioctl+0x234/0x3d0 [drm]
[   89.463542]  ? drm_mode_setplane+0x2b0/0x2b0 [drm]
[   89.463548]  ? _copy_to_user+0x2c/0x30
[   89.463551]  ? input_event_to_user+0x42/0xa0
[   89.463604]  amdgpu_drm_ioctl+0x4e/0x80 [amdgpu]
[   89.463608]  do_vfs_ioctl+0x407/0x670
[   89.463611]  ? __vfs_read+0x1b/0x40
[   89.463613]  ? vfs_read+0xab/0x160
[   89.463616]  ksys_ioctl+0x67/0x90
[   89.463619]  __x64_sys_ioctl+0x1a/0x20
[   89.463622]  do_syscall_64+0x5a/0x130
[   89.463625]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   89.463627] RIP: 0033:0x7fbe0550667b
[   89.463629] Code: 0f 1e fa 48 8b 05 15 28 0d 00 64 c7 00 26 00 00
00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00
00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d e5 27 0d 00 f7 d8 64 89
01 48
[   89.463631] RSP: 002b:7fbd627fa2d8 EFLAGS: 3246 ORIG_RAX:
0010
[   89.463632] RAX: ffda RBX: 7fbd627fa310 RCX: 7fbe0550667b
[   89.463634] RDX: 7fbd627fa310 RSI: c01c64a3 RDI: 000d
[   89.463635] RBP: c01c64a3 R08: 002a R09: 0001
[   89.463636] R10:  R11: 3246 R12: 55d1b53bd790
[   89.463637] R13: 000d R14: 002e R15: 057a
[   89.463641] Modules linked in: edac_mce_amd kvm_amd binfmt_misc
nls_iso8859_1 kvm irqbypass nvidia_uvm(OE) snd_hda_codec_generic
ledtrig_audio crct10dif_pclmul snd_hda_codec_hdmi crc32_pclmul
nvidia_drm(POE) amdgpu 

[PATCH] drm/amdkfd: kfd open return failed if device is locked

2019-10-18 Thread Yang, Philip
If the device is locked for suspend and resume, kfd_open should return
-EAGAIN without creating a process; otherwise, when the application exits,
releasing the process will hang waiting for resume to finish if the suspend
and resume is stuck somewhere. This is the backtrace:

[Thu Oct 17 16:43:37 2019] INFO: task rocminfo:3024 blocked for more
than 120 seconds.
[Thu Oct 17 16:43:37 2019]   Not tainted
5.0.0-rc1-kfd-compute-rocm-dkms-no-npi-1131 #1
[Thu Oct 17 16:43:37 2019] "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
[Thu Oct 17 16:43:37 2019] rocminfoD0  3024   2947
0x8000
[Thu Oct 17 16:43:37 2019] Call Trace:
[Thu Oct 17 16:43:37 2019]  ? __schedule+0x3d9/0x8a0
[Thu Oct 17 16:43:37 2019]  schedule+0x32/0x70
[Thu Oct 17 16:43:37 2019]  schedule_preempt_disabled+0xa/0x10
[Thu Oct 17 16:43:37 2019]  __mutex_lock.isra.9+0x1e3/0x4e0
[Thu Oct 17 16:43:37 2019]  ? __call_srcu+0x264/0x3b0
[Thu Oct 17 16:43:37 2019]  ? process_termination_cpsch+0x24/0x2f0
[amdgpu]
[Thu Oct 17 16:43:37 2019]  process_termination_cpsch+0x24/0x2f0
[amdgpu]
[Thu Oct 17 16:43:37 2019]
kfd_process_dequeue_from_all_devices+0x42/0x60 [amdgpu]
[Thu Oct 17 16:43:37 2019]  kfd_process_notifier_release+0x1be/0x220
[amdgpu]
[Thu Oct 17 16:43:37 2019]  __mmu_notifier_release+0x3e/0xc0
[Thu Oct 17 16:43:37 2019]  exit_mmap+0x160/0x1a0
[Thu Oct 17 16:43:37 2019]  ? __handle_mm_fault+0xba3/0x1200
[Thu Oct 17 16:43:37 2019]  ? exit_robust_list+0x5a/0x110
[Thu Oct 17 16:43:37 2019]  mmput+0x4a/0x120
[Thu Oct 17 16:43:37 2019]  do_exit+0x284/0xb20
[Thu Oct 17 16:43:37 2019]  ? handle_mm_fault+0xfa/0x200
[Thu Oct 17 16:43:37 2019]  do_group_exit+0x3a/0xa0
[Thu Oct 17 16:43:37 2019]  __x64_sys_exit_group+0x14/0x20
[Thu Oct 17 16:43:37 2019]  do_syscall_64+0x4f/0x100
[Thu Oct 17 16:43:37 2019]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
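The ordering fix in the patch below follows the general "check failure conditions before allocating state" pattern: if the `kfd_is_locked()` test comes first, a failed open leaves no process behind whose teardown could block on suspend/resume. The sketch below is a simplified model with hypothetical names, not the real kfd code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Simplified model of the ordering fix (illustrative names only):
 * check the "device locked" condition before creating per-process
 * state, so a failed open never leaves anything behind to clean up.
 */
static bool device_locked;
static int processes_alive;

static int kfd_open_model(void)
{
	/* Bail out first: nothing allocated yet, nothing to clean up. */
	if (device_locked)
		return -EAGAIN;

	/* Only now create the process state. */
	processes_alive++;
	return 0;
}
```

With the original ordering (create first, then check), the early `-EAGAIN` return would skip releasing the freshly created process, which is exactly the hang the backtrace above shows.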

Signed-off-by: Philip Yang 
---
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index d9e36dbf13d5..40d75c39f08e 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -120,13 +120,13 @@ static int kfd_open(struct inode *inode, struct file 
*filep)
return -EPERM;
}
 
+   if (kfd_is_locked())
+   return -EAGAIN;
+
process = kfd_create_process(filep);
if (IS_ERR(process))
return PTR_ERR(process);
 
-   if (kfd_is_locked())
-   return -EAGAIN;
-
dev_dbg(kfd_device, "process %d opened, compat mode (32 bit) - %d\n",
process->pasid, process->is_32bit_user_mode);
 
-- 
2.17.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: Spontaneous reboots when using RX 560

2019-10-18 Thread Alex Deucher
Does disabling the IOMMU help?  E.g., append iommu=off or iommu=pt on
the kernel command line in grub.

Alex

On Fri, Oct 18, 2019 at 8:06 AM Sylvain Munaut <246...@gmail.com> wrote:
>
> Hi Christian,
>
>
> > I would also test if disabling power features helps as well, try to add
> > amdgpu.pg_mask=0 and amdgpu.cg_mask=0 to the kernel command line for
> > example.
>
> Thanks for the suggestion.
> Just tried this, no luck. Also tried 'runpm=0' (but apparently that's
> for laptop only so ...)
>
> Even with cg_mask=0, I still see this in amdgpu_pm_info; not sure if
> that's expected or if somehow the option was ignored?
>
> 
> Clock Gating Flags Mask: 0x16b00
> Graphics Medium Grain Clock Gating: Off
> Graphics Medium Grain memory Light Sleep: Off
> Graphics Coarse Grain Clock Gating: Off
> Graphics Coarse Grain memory Light Sleep: Off
> Graphics Coarse Grain Tree Shader Clock Gating: Off
> Graphics Coarse Grain Tree Shader Light Sleep: Off
> Graphics Command Processor Light Sleep: Off
> Graphics Run List Controller Light Sleep: Off
> Graphics 3D Coarse Grain Clock Gating: Off
> Graphics 3D Coarse Grain memory Light Sleep: Off
> Memory Controller Light Sleep: On
> Memory Controller Medium Grain Clock Gating: On
> System Direct Memory Access Light Sleep: Off
> System Direct Memory Access Medium Grain Clock Gating: On
> Bus Interface Medium Grain Clock Gating: Off
> Bus Interface Light Sleep: Off
> Unified Video Decoder Medium Grain Clock Gating: On
> Video Compression Engine Medium Grain Clock Gating: On
> Host Data Path Light Sleep: Off
> Host Data Path Medium Grain Clock Gating: On
> Digital Right Management Medium Grain Clock Gating: Off
> Digital Right Management Light Sleep: Off
> Rom Medium Grain Clock Gating: Off
> Data Fabric Medium Grain Clock Gating: Off
> Address Translation Hub Medium Grain Clock Gating: Off
> Address Translation Hub Light Sleep: Off
>
> GFX Clocks and Power:
> 300 MHz (MCLK)
> 214 MHz (SCLK)
> 387 MHz (PSTATE_SCLK)
> 625 MHz (PSTATE_MCLK)
> 775 mV (VDDGFX)
> 7.254 W (average GPU)
>
> GPU Temperature: 34 C
> GPU Load: 0 %
> MEM Load: 6 %
>
> UVD: Disabled
>
> VCE: Disabled
> 
>
> I'm not really sure what to try next. I unfortunately don't have
> access to any other card or any other motherboard I could use to test
> :/
> (Or anything fancy like pcie bus analyzer or stuff like that).
>
> My understanding of the first error message that shows up is that the
> card itself tries to access a memory zone it's not allowed to,
> right?
> [  144.311704] amdgpu :06:00.0: AMD-Vi: Event logged
> [IO_PAGE_FAULT domain=0x address=0xa076010100 flags=0x0010]
>
> Cheers,
>
> Sylvain

Re: Spontaneous reboots when using RX 560

2019-10-18 Thread Sylvain Munaut
Hi Christian,


> I would also test if disabling power features helps as well, try to add
> amdgpu.pg_mask=0 and amdgpu.cg_mask=0 to the kernel command line for
> example.

Thanks for the suggestion.
Just tried this, no luck. Also tried 'runpm=0' (but apparently that's
for laptop only so ...)

Even with cg_mask=0, I still see this in amdgpu_pm_info; not sure if
that's expected or if somehow the option was ignored?


Clock Gating Flags Mask: 0x16b00
Graphics Medium Grain Clock Gating: Off
Graphics Medium Grain memory Light Sleep: Off
Graphics Coarse Grain Clock Gating: Off
Graphics Coarse Grain memory Light Sleep: Off
Graphics Coarse Grain Tree Shader Clock Gating: Off
Graphics Coarse Grain Tree Shader Light Sleep: Off
Graphics Command Processor Light Sleep: Off
Graphics Run List Controller Light Sleep: Off
Graphics 3D Coarse Grain Clock Gating: Off
Graphics 3D Coarse Grain memory Light Sleep: Off
Memory Controller Light Sleep: On
Memory Controller Medium Grain Clock Gating: On
System Direct Memory Access Light Sleep: Off
System Direct Memory Access Medium Grain Clock Gating: On
Bus Interface Medium Grain Clock Gating: Off
Bus Interface Light Sleep: Off
Unified Video Decoder Medium Grain Clock Gating: On
Video Compression Engine Medium Grain Clock Gating: On
Host Data Path Light Sleep: Off
Host Data Path Medium Grain Clock Gating: On
Digital Right Management Medium Grain Clock Gating: Off
Digital Right Management Light Sleep: Off
Rom Medium Grain Clock Gating: Off
Data Fabric Medium Grain Clock Gating: Off
Address Translation Hub Medium Grain Clock Gating: Off
Address Translation Hub Light Sleep: Off

GFX Clocks and Power:
300 MHz (MCLK)
214 MHz (SCLK)
387 MHz (PSTATE_SCLK)
625 MHz (PSTATE_MCLK)
775 mV (VDDGFX)
7.254 W (average GPU)

GPU Temperature: 34 C
GPU Load: 0 %
MEM Load: 6 %

UVD: Disabled

VCE: Disabled


I'm not really sure what to try next. I unfortunately don't have
access to any other card or any other motherboard I could use to test
:/
(Or anything fancy like pcie bus analyzer or stuff like that).

My understanding of the first error message that shows up is that the
card itself tries to access a memory zone it's not allowed to,
right?
[  144.311704] amdgpu :06:00.0: AMD-Vi: Event logged
[IO_PAGE_FAULT domain=0x address=0xa076010100 flags=0x0010]

Cheers,

Sylvain

[PATCH] drm/ttm: use the parent resv for ghost objects v2

2019-10-18 Thread Christian König
This way the TTM is destroyed with the correct dma_resv object
locked and we can even pipeline imported BO evictions.

v2: Limit this to only cases when the parent object uses a separate
reservation object as well. This fixes another OOM problem.

Signed-off-by: Christian König 
---
 drivers/gpu/drm/ttm/ttm_bo_util.c | 16 +---
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo_util.c 
b/drivers/gpu/drm/ttm/ttm_bo_util.c
index e030c27f53cf..45e440f80b7b 100644
--- a/drivers/gpu/drm/ttm/ttm_bo_util.c
+++ b/drivers/gpu/drm/ttm/ttm_bo_util.c
@@ -512,7 +512,9 @@ static int ttm_buffer_object_transfer(struct 
ttm_buffer_object *bo,
kref_init(>base.kref);
fbo->base.destroy = _transfered_destroy;
fbo->base.acc_size = 0;
-   fbo->base.base.resv = &fbo->base.base._resv;
+   if (bo->base.resv == &bo->base._resv)
+   fbo->base.base.resv = &fbo->base.base._resv;
+
dma_resv_init(fbo->base.base.resv);
ret = dma_resv_trylock(fbo->base.base.resv);
WARN_ON(!ret);
@@ -711,7 +713,7 @@ int ttm_bo_move_accel_cleanup(struct ttm_buffer_object *bo,
if (ret)
return ret;
 
-   dma_resv_add_excl_fence(ghost_obj->base.resv, fence);
+   dma_resv_add_excl_fence(&ghost_obj->base._resv, fence);
 
/**
 * If we're not moving to fixed memory, the TTM object
@@ -724,7 +726,7 @@ int ttm_bo_move_accel_cleanup(struct ttm_buffer_object *bo,
else
bo->ttm = NULL;
 
-   ttm_bo_unreserve(ghost_obj);
+   dma_resv_unlock(&ghost_obj->base._resv);
ttm_bo_put(ghost_obj);
}
 
@@ -767,7 +769,7 @@ int ttm_bo_pipeline_move(struct ttm_buffer_object *bo,
if (ret)
return ret;
 
-   dma_resv_add_excl_fence(ghost_obj->base.resv, fence);
+   dma_resv_add_excl_fence(&ghost_obj->base._resv, fence);
 
/**
 * If we're not moving to fixed memory, the TTM object
@@ -780,7 +782,7 @@ int ttm_bo_pipeline_move(struct ttm_buffer_object *bo,
else
bo->ttm = NULL;
 
-   ttm_bo_unreserve(ghost_obj);
+   dma_resv_unlock(&ghost_obj->base._resv);
ttm_bo_put(ghost_obj);
 
} else if (from->flags & TTM_MEMTYPE_FLAG_FIXED) {
@@ -836,7 +838,7 @@ int ttm_bo_pipeline_gutting(struct ttm_buffer_object *bo)
if (ret)
return ret;
 
-   ret = dma_resv_copy_fences(ghost->base.resv, bo->base.resv);
+   ret = dma_resv_copy_fences(&ghost->base._resv, bo->base.resv);
/* Last resort, wait for the BO to be idle when we are OOM */
if (ret)
ttm_bo_wait(bo, false, false);
@@ -845,7 +847,7 @@ int ttm_bo_pipeline_gutting(struct ttm_buffer_object *bo)
bo->mem.mem_type = TTM_PL_SYSTEM;
bo->ttm = NULL;
 
-   ttm_bo_unreserve(ghost);
+   dma_resv_unlock(&ghost->base._resv);
ttm_bo_put(ghost);
 
return 0;
-- 
2.17.1


Re: radeon backtrace on fedora 31

2019-10-18 Thread Christian König

Looks like just another race condition during suspend/resume to me.

Is that reproducible?

Christian.

Am 18.10.19 um 06:24 schrieb Dave Airlie:

5.3.4-300.fc31.x86_64

seems to be new.

https://retrace.fedoraproject.org/faf/reports/2726149/


Dave.



Re: [PATCH 2/2] drm/amdgpu/psp11: fix typo in comment

2019-10-18 Thread Xu, Feifei
Series is reviewed by Feifei Xu 


> On Oct 18, 2019, at 18:59, Yuan, Xiaojie  wrote:
> 
> Signed-off-by: Xiaojie Yuan 
> ---
> drivers/gpu/drm/amd/amdgpu/psp_v11_0.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/psp_v11_0.c b/drivers/gpu/drm/amd/amdgpu/psp_v11_0.c
> index dfe85a1d79a5..4eb5bacb55f7 100644
> --- a/drivers/gpu/drm/amd/amdgpu/psp_v11_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/psp_v11_0.c
> @@ -232,7 +232,7 @@ static int psp_v11_0_bootloader_load_kdb(struct psp_context *psp)
>/* Copy PSP KDB binary to memory */
>memcpy(psp->fw_pri_buf, psp->kdb_start_addr, psp->kdb_bin_size);
> 
> -/* Provide the sys driver to bootloader */
> +/* Provide the PSP KDB to bootloader */
>WREG32_SOC15(MP0, 0, mmMP0_SMN_C2PMSG_36,
>   (uint32_t)(psp->fw_pri_mc_addr >> 20));
>psp_gfxdrv_command_reg = PSP_BL__LOAD_KEY_DATABASE;
> -- 
> 2.20.1
> 

[PATCH 2/2] drm/amdgpu/psp11: fix typo in comment

2019-10-18 Thread Yuan, Xiaojie
Signed-off-by: Xiaojie Yuan 
---
 drivers/gpu/drm/amd/amdgpu/psp_v11_0.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/psp_v11_0.c b/drivers/gpu/drm/amd/amdgpu/psp_v11_0.c
index dfe85a1d79a5..4eb5bacb55f7 100644
--- a/drivers/gpu/drm/amd/amdgpu/psp_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/psp_v11_0.c
@@ -232,7 +232,7 @@ static int psp_v11_0_bootloader_load_kdb(struct psp_context *psp)
/* Copy PSP KDB binary to memory */
memcpy(psp->fw_pri_buf, psp->kdb_start_addr, psp->kdb_bin_size);
 
-   /* Provide the sys driver to bootloader */
+   /* Provide the PSP KDB to bootloader */
WREG32_SOC15(MP0, 0, mmMP0_SMN_C2PMSG_36,
   (uint32_t)(psp->fw_pri_mc_addr >> 20));
psp_gfxdrv_command_reg = PSP_BL__LOAD_KEY_DATABASE;
-- 
2.20.1


[PATCH 1/2] drm/amdgpu/psp11: wait for sOS ready for ring creation

2019-10-18 Thread Yuan, Xiaojie
Signed-off-by: Xiaojie Yuan 
---
 drivers/gpu/drm/amd/amdgpu/psp_v11_0.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/psp_v11_0.c b/drivers/gpu/drm/amd/amdgpu/psp_v11_0.c
index e8e70b74ea5b..dfe85a1d79a5 100644
--- a/drivers/gpu/drm/amd/amdgpu/psp_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/psp_v11_0.c
@@ -459,6 +459,14 @@ static int psp_v11_0_ring_create(struct psp_context *psp,
   0x8000, 0x8000, false);
 
} else {
+   /* Wait for sOS ready for ring creation */
+   ret = psp_wait_for(psp, SOC15_REG_OFFSET(MP0, 0, mmMP0_SMN_C2PMSG_64),
+  0x8000, 0x8000, false);
+   if (ret) {
+   DRM_ERROR("Failed to wait for sOS ready for ring creation\n");
+   return ret;
+   }
+
/* Write low address of the ring to C2PMSG_69 */
psp_ring_reg = lower_32_bits(ring->ring_mem_mc_addr);
WREG32_SOC15(MP0, 0, mmMP0_SMN_C2PMSG_69, psp_ring_reg);
-- 
2.20.1


Re: Spontaneous reboots when using RX 560

2019-10-18 Thread Koenig, Christian
Am 17.10.19 um 22:12 schrieb Sylvain Munaut:
> So a bit more testing.
>
> I was using a bit of "unusual" config I guess, having 2 GPUs and some
> other pcie cards (10G, ..).
> So I simplified and went to the most standard thing I could think of,
> _just_ the RX 560 card plugged into the main PCIe 16x slot directly
> connected to the CPU.
>
> And exact same results, no change in behavior.
>
> So on one hand I'm happy that the other cards and having the AMD GPU
> in the second slot isn't the issue (because I really need that config
> that way), but on the other, I'm no closer to finding the issue :/

At least you tested quite a bunch of things which I would have suggested 
as well.

I would also test if disabling power features helps as well, try to add 
amdgpu.pg_mask=0 and amdgpu.cg_mask=0 to the kernel command line for 
example.
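As a concrete sketch of that suggestion (the GRUB file path and regeneration command are assumptions for a typical Fedora install; pg_mask/cg_mask are bitmasks that disable powergating and clockgating features when zeroed):

```shell
# Build the amended kernel command line as a string; the base options
# here are placeholders for whatever the system already uses.
cmdline="rhgb quiet"
cmdline="$cmdline amdgpu.pg_mask=0 amdgpu.cg_mask=0"
echo "$cmdline"
# After adding the two options to GRUB_CMDLINE_LINUX in /etc/default/grub,
# regenerate the config, e.g.:
#   sudo grub2-mkconfig -o /boot/grub2/grub.cfg
```

Rebooting with both masks at zero rules the power features in or out as the cause.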

Regards,
Christian.

>
> Cheers,
>
>   Sylvain Munaut


[PATCH][next] drm/amdgpu/psp: fix spelling mistake "initliaze" -> "initialize"

2019-10-18 Thread Colin King
From: Colin Ian King 

There is a spelling mistake in a DRM_ERROR error message. Fix it.

Signed-off-by: Colin Ian King 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
index b996b5bc5804..fd7a73f4fa70 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
@@ -90,7 +90,7 @@ static int psp_sw_init(void *handle)
 
ret = psp_mem_training_init(psp);
if (ret) {
-   DRM_ERROR("Failed to initliaze memory training!\n");
+   DRM_ERROR("Failed to initialize memory training!\n");
return ret;
}
ret = psp_mem_training(psp, PSP_MEM_TRAIN_COLD_BOOT);
-- 
2.20.1