Re: [PATCH] drm/amdkfd: fix cp hang in eviction

2019-07-11 Thread Kuehling, Felix
On 2019-07-10 11:20 a.m., Huang, JinHuiEric wrote:
> The CP hang occurs in the OpenCL conformance test, only on a
> Supermicro platform with 40 cores, where the test creates 40
> threads. The root cause is a race condition on flags that are not
> protected by the DQM lock.
>
> The fix is to move the updates of is_evicted and is_active (the
> latter set by init_mqd()) into the protected area.
>
> Signed-off-by: Eric Huang 

Sorry, I missed this one. I only saw the one earlier that you recalled.

Reviewed-by: Felix Kuehling 


___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH] drm/amdkfd: fix cp hang in eviction

2019-07-11 Thread Huang, JinHuiEric
ping.

On 2019-07-10 11:20 a.m., Huang, JinHuiEric wrote:
> The CP hang occurs in the OpenCL conformance test, only on a
> Supermicro platform with 40 cores, where the test creates 40
> threads. The root cause is a race condition on flags that are not
> protected by the DQM lock.
>
> The fix is to move the updates of is_evicted and is_active (the
> latter set by init_mqd()) into the protected area.
>
> Signed-off-by: Eric Huang 

Recall: [PATCH] drm/amdkfd: fix cp hang in eviction

2019-07-10 Thread Huang, JinHuiEric
Huang, JinHuiEric would like to recall the message, "[PATCH] drm/amdkfd: fix cp hang in eviction".

[PATCH] drm/amdkfd: fix cp hang in eviction

2019-07-10 Thread Huang, JinHuiEric
The CP hang occurs in the OpenCL conformance test, only on a
Supermicro platform with 40 cores, where the test creates 40
threads. The root cause is a race condition on flags that are not
protected by the DQM lock.

The fix is to move the updates of is_evicted and is_active (the
latter set by init_mqd()) into the protected area.

Signed-off-by: Eric Huang 
---
 drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 16 +---
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index 9ffdda5..f23e17b 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -1157,12 +1157,7 @@ static int create_queue_cpsch(struct device_queue_manager *dqm, struct queue *q,
 
 	mqd_mgr = dqm->mqd_mgrs[get_mqd_type_from_queue_type(
 			q->properties.type)];
-	/*
-	 * Eviction state logic: mark all queues as evicted, even ones
-	 * not currently active. Restoring inactive queues later only
-	 * updates the is_evicted flag but is a no-op otherwise.
-	 */
-	q->properties.is_evicted = !!qpd->evicted;
+
 	if (q->properties.type == KFD_QUEUE_TYPE_SDMA ||
 		q->properties.type == KFD_QUEUE_TYPE_SDMA_XGMI)
 		dqm->asic_ops.init_sdma_vm(dqm, q, qpd);
@@ -1173,9 +1168,16 @@ static int create_queue_cpsch(struct device_queue_manager *dqm, struct queue *q,
 		retval = -ENOMEM;
 		goto out_deallocate_doorbell;
 	}
+
+	dqm_lock(dqm);
+	/*
+	 * Eviction state logic: mark all queues as evicted, even ones
+	 * not currently active. Restoring inactive queues later only
+	 * updates the is_evicted flag but is a no-op otherwise.
+	 */
+	q->properties.is_evicted = !!qpd->evicted;
 	mqd_mgr->init_mqd(mqd_mgr, &q->mqd, q->mqd_mem_obj,
 				&q->gart_mqd_addr, &q->properties);
-	dqm_lock(dqm);
 
 	list_add(&q->list, &qpd->queues_list);
 	qpd->queue_count++;
-- 
2.7.4


[PATCH] drm/amdkfd: fix cp hang in eviction

2019-07-10 Thread Huang, JinHuiEric
The CP hang occurs in the OpenCL conformance test, only on a
Supermicro platform with 40 cores, where the test creates 40
threads. The root cause is a race condition on flags that are not
protected by the DQM lock.

The fix is to move the updates of is_evicted and is_active (the
latter set by init_mqd()) into the protected area.

Signed-off-by: Eric Huang 
---
 drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 17 ++---
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index 9ffdda5..535c981 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -1157,12 +1157,7 @@ static int create_queue_cpsch(struct device_queue_manager *dqm, struct queue *q,
 
 	mqd_mgr = dqm->mqd_mgrs[get_mqd_type_from_queue_type(
 			q->properties.type)];
-	/*
-	 * Eviction state logic: mark all queues as evicted, even ones
-	 * not currently active. Restoring inactive queues later only
-	 * updates the is_evicted flag but is a no-op otherwise.
-	 */
-	q->properties.is_evicted = !!qpd->evicted;
+
 	if (q->properties.type == KFD_QUEUE_TYPE_SDMA ||
 		q->properties.type == KFD_QUEUE_TYPE_SDMA_XGMI)
 		dqm->asic_ops.init_sdma_vm(dqm, q, qpd);
@@ -1173,9 +1168,17 @@ static int create_queue_cpsch(struct device_queue_manager *dqm, struct queue *q,
 		retval = -ENOMEM;
 		goto out_deallocate_doorbell;
 	}
+
+	dqm_lock(dqm);
+	/*
+	 * Eviction state logic: mark all queues as evicted, even ones
+	 * not currently active. Restoring inactive queues later only
+	 * updates the is_evicted flag but is a no-op otherwise.
+	 */
+	q->properties.is_evicted = !!qpd->evicted;
+	q->properties.is_suspended = false;
 	mqd_mgr->init_mqd(mqd_mgr, &q->mqd, q->mqd_mem_obj,
 				&q->gart_mqd_addr, &q->properties);
-	dqm_lock(dqm);
 
 	list_add(&q->list, &qpd->queues_list);
 	qpd->queue_count++;
-- 
2.7.4
