Re: [PATCH] drm/scheduler: don't update last scheduled fence in TDR

2018-05-06 Thread Ding, Pixel
3. kill this process: it will call drm_sched_entity_cleanup, which will 
register drm_sched_entity_kill_jobs_cb to jobB->finished, but jobB isn't 
submitted at all.

I think that point is your misunderstanding. Killing the process won’t stop jobB 
from being submitted.

— 
Sincerely Yours,
Pixel


On 2018/4/26, 12:23 PM, "amd-gfx on behalf of zhoucm1" 
<amd-gfx-boun...@lists.freedesktop.org on behalf of zhou...@amd.com> wrote:

NAK on it.

First of all, without this patch, does it cause any issue?

second,

entity->last_scheduled represents the last submitted job.

This change will break that meaning and doesn't work, e.g.

1. the mirror list has jobA and jobB; assuming they belong to the same 
entity, entity->last_scheduled is jobB->finished.

2. when you do recovery, jobA is re-submitted first; if we don't update 
last_scheduled, then entity->last_scheduled is still jobB->finished.

3. kill this process: it will call drm_sched_entity_cleanup, which will 
register drm_sched_entity_kill_jobs_cb to jobB->finished, but jobB isn't 
submitted at all.


So the change isn't necessary at all.


Regards,

David Zhou
    

On 2018年04月26日 11:47, Ding, Pixel wrote:
> Hi Monk,
>
> Please review it. Thanks.
>
> —
> Sincerely Yours,
> Pixel
>
>
> On 2018/4/25, 4:39 PM, "Pixel Ding" <pixel.d...@amd.com> wrote:
>
>  The current sequence in scheduler thread is:
>  1. update last sched fence
>  2. job begin (adding to mirror list)
>  3. job finish (remove from mirror list)
>  4. back to 1
>  
>  Since we update last sched prior to joining the mirror list, the jobs
>  in the mirror list have already passed the last sched fence. TDR just runs
>  the jobs in the mirror list, so we should not update the last sched
>  fences in TDR.
>  
>  Signed-off-by: Pixel Ding <pixel.d...@amd.com>
>  ---
>   drivers/gpu/drm/scheduler/gpu_scheduler.c | 3 ---
>   1 file changed, 3 deletions(-)
>  
>  diff --git a/drivers/gpu/drm/scheduler/gpu_scheduler.c 
b/drivers/gpu/drm/scheduler/gpu_scheduler.c
>  index 088ff2b..1f1dd70 100644
>  --- a/drivers/gpu/drm/scheduler/gpu_scheduler.c
>  +++ b/drivers/gpu/drm/scheduler/gpu_scheduler.c
>  @@ -575,9 +575,6 @@ void drm_sched_job_recovery(struct drm_gpu_scheduler *sched)
>   fence = sched->ops->run_job(s_job);
>   atomic_inc(&sched->hw_rq_count);
>   
>  -dma_fence_put(s_job->entity->last_scheduled);
>  -s_job->entity->last_scheduled = dma_fence_get(&s_fence->finished);
>  -
>   if (fence) {
>   s_fence->parent = dma_fence_get(fence);
>   r = dma_fence_add_callback(fence, &s_fence->cb,
>  --
>  2.7.4
>  
>  
>
> ___
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx



Re: [PATCH] drm/scheduler: don't update last scheduled fence in TDR

2018-04-25 Thread Ding, Pixel
Hi Monk,

Please review it. Thanks.

— 
Sincerely Yours,
Pixel


On 2018/4/25, 4:39 PM, "Pixel Ding"  wrote:

The current sequence in scheduler thread is:
1. update last sched fence
2. job begin (adding to mirror list)
3. job finish (remove from mirror list)
4. back to 1

Since we update last sched prior to joining the mirror list, the jobs
in the mirror list have already passed the last sched fence. TDR just runs
the jobs in the mirror list, so we should not update the last sched
fences in TDR.

Signed-off-by: Pixel Ding 
---
 drivers/gpu/drm/scheduler/gpu_scheduler.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/gpu_scheduler.c 
b/drivers/gpu/drm/scheduler/gpu_scheduler.c
index 088ff2b..1f1dd70 100644
--- a/drivers/gpu/drm/scheduler/gpu_scheduler.c
+++ b/drivers/gpu/drm/scheduler/gpu_scheduler.c
@@ -575,9 +575,6 @@ void drm_sched_job_recovery(struct drm_gpu_scheduler *sched)
fence = sched->ops->run_job(s_job);
atomic_inc(&sched->hw_rq_count);
 
-   dma_fence_put(s_job->entity->last_scheduled);
-   s_job->entity->last_scheduled = dma_fence_get(&s_fence->finished);
-
if (fence) {
s_fence->parent = dma_fence_get(fence);
r = dma_fence_add_callback(fence, &s_fence->cb,
-- 
2.7.4





Re: [PATCH 2/4] drm/amdgpu: refactoring mailbox to fix TDR handshake bugs

2018-03-07 Thread Ding, Pixel
Hi Monk,

See comments inline.
— 
Sincerely Yours,
Pixel


On 07/03/2018, 4:58 PM, "Liu, Monk" <monk@amd.com> wrote:

HI Pixel

The patch had better not be split, otherwise the stress TDR test still fails 
since there are a couple of issues and some of them are mixed together. 

For your concerns:
> Any VF ACK could lead to a missing msg if the timing is bad, while the hypervisor 
sends msgs quite frequently and the guest sends acks frequently (such as the case of 
FLR_NOTIFY followed by READY_TO_ACCESS_GPU immediately). Is it correct? If so, 
maybe we need further protection in GIM.

We had a lot of fixes on the GIM side along with this patch to make the stress 
TDR test finally pass; you can check the history. Anyway, the GIM side is not 
related to this patch.

[Pixel] As talked offline, I understand your point here.

> it’s hard to understand the last 2 lines of code here that set the 
RCV_MSG_ACK bit: 0x10 to the +1 byte offset. Can we define some macros to make it 
clear?
Can you do that? Looks to me like the MACROs in amdgpu only work for DW-aligned 
register names, so I'm afraid we cannot do that gracefully.  

[Pixel] Some readable macros would help us easily understand which register field 
is accessed.

> Can we also set the valid bit for the IDH_FLR_NOTIFICATION_CMPL event in GIM? By 
now the guest uses peek_msg to catch this event, so I think it’s OK to set valid for 
it. Then the code here would be clear without the CMPL condition. We can also use 
peek_ack to ack CMPL if necessary. Any problem?

I don't understand your point. 
[Pixel] Generally, CMPL is no longer handled by rcv_msg but by peek_msg; we 
can remove the logic here.



-Original Message-----
From: Ding, Pixel 
Sent: 2018年3月7日 16:31
To: Liu, Monk <monk@amd.com>; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH 2/4] drm/amdgpu: refactoring mailbox to fix TDR 
handshake bugs

Hi Monk,

It’s a long patch which could be split into 6 pieces at most… anyway I have 
some questions inline to get a better understanding.

— 
Sincerely Yours,
Pixel


On 06/03/2018, 11:29 AM, "amd-gfx on behalf of Monk Liu" 
<amd-gfx-boun...@lists.freedesktop.org on behalf of monk@amd.com> wrote:

this patch actually refactors the mailbox implementation, and
all the changes below are needed together to fix the mailbox
handshake issues exposed by heavy TDR testing.

1) refactor all mailbox functions to use byte access for mb_control.
The reason is to avoid touching unrelated bits when writing the trn/rcv
part of mailbox_control; this way some incorrect INTRs sent to the
hypervisor side can be avoided, and it fixes a couple of handshake bugs.

2) the trans_msg function is re-implemented: put an invalidate step
before transmitting a message to make sure the ACK bit is in
a clear state, otherwise there is a chance that ACK is already asserted
before transmitting the message, leading to fake ACK polling.
(The hypervisor side has some tricks to work around the ACK bit being
corrupted by VF FLR, which has a side effect that may make the guest
side ACK bit asserted wrongly.) Also clear the TRANS_MSG words after
the message is transferred.

3) mailbox_flr_work is also re-worked: it takes the mutex lock
first when invoked, to block gpu recovery from participating too early
while the hypervisor side is doing VF FLR. (The hypervisor sends
FLR_NOTIFY to the guest before doing VF FLR and sends FLR_COMPLETE
after the VF FLR is done; the FLR_NOTIFY triggers an interrupt to the
guest which leads to mailbox_flr_work being invoked.)

This avoids the issue of the mailbox trans msg being cleared by its own VF 
FLR.

4) the mailbox_rcv_irq IRQ routine should only peek the msg and schedule
mailbox_flr_work, instead of ACKing to the hypervisor itself, because the
FLR_NOTIFY msg sent from the hypervisor side doesn't need the VF's ACK
(the VF's ACK would lead the hypervisor to clear its trans_valid/msg,
and this would cause a handshake bug if trans_valid/msg is cleared not
by a correct VF ACK but by a wrong VF ACK like this "FLR_NOTIFY" one)

Any VF ACK could lead to a missing msg if the timing is bad, while the hypervisor 
sends msgs quite frequently and the guest sends acks frequently (such as the case of 
FLR_NOTIFY followed by READY_TO_ACCESS_GPU immediately). Is it correct? If so, 
maybe we need further protection in GIM.

This fixed the handshake bug where sometimes the GUEST could never receive
the "READY_TO_ACCESS_GPU" msg from the hypervisor.

5) separate the polling time limits accordingly:
POLL ACK costs no more than 500ms
POLL MSG costs no more than 12000ms
POLL FLR finish costs no more than 500ms

6)

Re: [PATCH 2/4] drm/amdgpu: refactoring mailbox to fix TDR handshake bugs

2018-03-07 Thread Ding, Pixel
Hi Monk,

It’s a long patch which could be split into 6 pieces at most… anyway I have some 
questions inline to get a better understanding.

— 
Sincerely Yours,
Pixel


On 06/03/2018, 11:29 AM, "amd-gfx on behalf of Monk Liu" 
 wrote:

this patch actually refactors the mailbox implementation, and
all the changes below are needed together to fix the mailbox
handshake issues exposed by heavy TDR testing.

1) refactor all mailbox functions to use byte access for mb_control.
The reason is to avoid touching unrelated bits when writing the trn/rcv
part of mailbox_control; this way some incorrect INTRs sent to the
hypervisor side can be avoided, and it fixes a couple of handshake bugs.

2) the trans_msg function is re-implemented: put an invalidate step
before transmitting a message to make sure the ACK bit is in
a clear state, otherwise there is a chance that ACK is already asserted
before transmitting the message, leading to fake ACK polling.
(The hypervisor side has some tricks to work around the ACK bit being
corrupted by VF FLR, which has a side effect that may make the guest
side ACK bit asserted wrongly.) Also clear the TRANS_MSG words after
the message is transferred.

3) mailbox_flr_work is also re-worked: it takes the mutex lock
first when invoked, to block gpu recovery from participating too early
while the hypervisor side is doing VF FLR. (The hypervisor sends
FLR_NOTIFY to the guest before doing VF FLR and sends FLR_COMPLETE
after the VF FLR is done; the FLR_NOTIFY triggers an interrupt to the
guest which leads to mailbox_flr_work being invoked.)

This avoids the issue of the mailbox trans msg being cleared by its own VF FLR.

4) the mailbox_rcv_irq IRQ routine should only peek the msg and schedule
mailbox_flr_work, instead of ACKing to the hypervisor itself, because the
FLR_NOTIFY msg sent from the hypervisor side doesn't need the VF's ACK
(the VF's ACK would lead the hypervisor to clear its trans_valid/msg,
and this would cause a handshake bug if trans_valid/msg is cleared not
by a correct VF ACK but by a wrong VF ACK like this "FLR_NOTIFY" one)

Any VF ACK could lead to a missing msg if the timing is bad, while the hypervisor 
sends msgs quite frequently and the guest sends acks frequently (such as the case of 
FLR_NOTIFY followed by READY_TO_ACCESS_GPU immediately). Is it correct? If so, 
maybe we need further protection in GIM.

This fixed the handshake bug where sometimes the GUEST could never receive
the "READY_TO_ACCESS_GPU" msg from the hypervisor.

5) separate the polling time limits accordingly:
POLL ACK costs no more than 500ms
POLL MSG costs no more than 12000ms
POLL FLR finish costs no more than 500ms

6) we still need to set adev into in_gpu_reset mode after we receive
FLR_NOTIFY from the host side; this prevents an innocent app from wrongly
succeeding in opening the amdgpu dri device.

FLR_NOTIFY is received due to an IDLE hang detected on the hypervisor side,
which indicates the GPU is already dead in this VF.

Change-Id: I17df8b4490a5b53a1cc2bd6c8f9bc3ee14c23f1a
Signed-off-by: Monk Liu 
---
 drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c | 200 
++
 drivers/gpu/drm/amd/amdgpu/mxgpu_ai.h |   4 +-
 2 files changed, 111 insertions(+), 93 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c 
b/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
index 271452d..8d09380 100644
--- a/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
+++ b/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
@@ -33,56 +33,42 @@
 
 static void xgpu_ai_mailbox_send_ack(struct amdgpu_device *adev)
 {
-   u32 reg;
-   int timeout = AI_MAILBOX_TIMEDOUT;
-   u32 mask = REG_FIELD_MASK(BIF_BX_PF0_MAILBOX_CONTROL, RCV_MSG_VALID);
-
-   reg = RREG32_NO_KIQ(SOC15_REG_OFFSET(NBIO, 0,
-mmBIF_BX_PF0_MAILBOX_CONTROL));
-   reg = REG_SET_FIELD(reg, BIF_BX_PF0_MAILBOX_CONTROL, RCV_MSG_ACK, 1);
-   WREG32_NO_KIQ(SOC15_REG_OFFSET(NBIO, 0,
-  mmBIF_BX_PF0_MAILBOX_CONTROL), reg);
-
-   /*Wait for RCV_MSG_VALID to be 0*/
-   reg = RREG32_NO_KIQ(SOC15_REG_OFFSET(NBIO, 0,
-mmBIF_BX_PF0_MAILBOX_CONTROL));
-   while (reg & mask) {
-   if (timeout <= 0) {
-   pr_err("RCV_MSG_VALID is not cleared\n");
-   break;
-   }
-   mdelay(1);
-   timeout -=1;
-
-   reg = RREG32_NO_KIQ(SOC15_REG_OFFSET(NBIO, 0,
-mmBIF_BX_PF0_MAILBOX_CONTROL));
-   }
  + const u32 offset = SOC15_REG_OFFSET(NBIO, 0, mmBIF_BX_PF0_MAILBOX_CONTROL) * 4 + 1;
  + WREG8(offset, 2);

it’s not clear to understand the last 2 lines of code here to set RCV_MSG_ACK 
bit: 0x10 to +1 

Re: [PATCH 1/2] drm/amdgpu: imlement mmio byte access helpers

2018-03-06 Thread Ding, Pixel
Actually, for mailbox registers, once a byte field is touched, even if it's not 
changed, the mailbox reacts, so we need byte-width access to those sorts 
of regs.

Please correct the typo in commit title. With that change,
Reviewed-by: Pixel Ding 


— 
Sincerely Yours,
Pixel


On 06/03/2018, 7:05 PM, "amd-gfx on behalf of Monk Liu" 
 wrote:

mailbox registers can be accessed on a byte boundary according
to the BIF team, so this patch prepares register byte access,
which will be used by the following patches

Change-Id: I1e84f1c6e8e75dc42eb5be09c492fa5e7eb7502a
Signed-off-by: Monk Liu 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h|  6 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 26 ++
 2 files changed, 32 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 292c7e7..72385bb 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1635,6 +1635,9 @@ uint32_t amdgpu_mm_rreg(struct amdgpu_device *adev, 
uint32_t reg,
uint32_t acc_flags);
 void amdgpu_mm_wreg(struct amdgpu_device *adev, uint32_t reg, uint32_t v,
uint32_t acc_flags);
+void amdgpu_mm_wreg8(struct amdgpu_device *adev, uint32_t offset, uint8_t value);
+uint8_t amdgpu_mm_rreg8(struct amdgpu_device *adev, uint32_t offset);
+
 u32 amdgpu_io_rreg(struct amdgpu_device *adev, u32 reg);
 void amdgpu_io_wreg(struct amdgpu_device *adev, u32 reg, u32 v);
 
@@ -1658,6 +1661,9 @@ int emu_soc_asic_init(struct amdgpu_device *adev);
 #define RREG32_NO_KIQ(reg) amdgpu_mm_rreg(adev, (reg), AMDGPU_REGS_NO_KIQ)
 #define WREG32_NO_KIQ(reg, v) amdgpu_mm_wreg(adev, (reg), (v), 
AMDGPU_REGS_NO_KIQ)
 
+#define RREG8(reg) amdgpu_mm_rreg8(adev, (reg))
+#define WREG8(reg, v) amdgpu_mm_wreg8(adev, (reg), (v))
+
 #define RREG32(reg) amdgpu_mm_rreg(adev, (reg), 0)
 #define RREG32_IDX(reg) amdgpu_mm_rreg(adev, (reg), AMDGPU_REGS_IDX)
 #define DREG32(reg) printk(KERN_INFO "REGISTER: " #reg " : 0x%08X\n", 
amdgpu_mm_rreg(adev, (reg), 0))
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 65584f6..c8e1940 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -121,6 +121,32 @@ uint32_t amdgpu_mm_rreg(struct amdgpu_device *adev, 
uint32_t reg,
return ret;
 }
 
+/*
+ * MMIO register read with bytes helper functions
+ * @offset:bytes offset from MMIO start
+ *
+*/
+
+uint8_t amdgpu_mm_rreg8(struct amdgpu_device *adev, uint32_t offset) {
+   if (offset < adev->rmmio_size)
+   return (readb(adev->rmmio + offset));
+   BUG();
+}
+
+/*
+ * MMIO register write with bytes helper functions
+ * @offset:bytes offset from MMIO start
+ * @value: the value want to be written to the register
+ *
+*/
+void amdgpu_mm_wreg8(struct amdgpu_device *adev, uint32_t offset, uint8_t value) {
+   if (offset < adev->rmmio_size)
+   writeb(value, adev->rmmio + offset);
+   else
+   BUG();
+}
+
+
 void amdgpu_mm_wreg(struct amdgpu_device *adev, uint32_t reg, uint32_t v,
uint32_t acc_flags)
 {
-- 
2.7.4





Re: [PATCH] drm/amdgpu: try again kiq access if not in IRQ(v2)

2018-03-01 Thread Ding, Pixel
Reviewed-by: Pixel Ding 

— 
Sincerely Yours,
Pixel


On 01/03/2018, 4:21 PM, "amd-gfx on behalf of Liu, Monk" 
 wrote:

Hi Christian

KIQ is used by the kernel, and we submit commands on it without the gpu 
scheduler. In an sriov env, register access must go through KIQ in 
non-exclusive mode because we don’t want one VF to access MMIO during the 
period that this VF is not occupying the GPU resource.

But since we now use busy polling to wait for the KIQ fence, it adds too much 
latency to let KIQ wait until the KIQ job finishes. And since KIQ is also part 
of CP, it may serve other VFs, which causes a situation where a KIQ job in a 
certain VF may need at least 1~2 seconds, which is not good for a busy wait.

So my patch changes things to let this busy wait bail out for a moment when 
not in IRQ context. Another concern is that if KIQ is in the current VF, but 
the current VF is already dead, then no KIQ job will ever signal, so it still 
needs this bail-out-and-retry scheme to make sure the CPU is not stuck in 
that busy polling state.

Thanks
/Monk

-Original Message-
From: Christian König [mailto:ckoenig.leichtzumer...@gmail.com] 
Sent: 2018年3月1日 16:13
To: Liu, Monk ; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH] drm/amdgpu: try again kiq access if not in IRQ(v2)

 From a coding style perspective a define for the 5ms timeout would be nice 
to have and it might be better to use a do { } while instead of the goto, but 
those are really minor issues.

But from the technical perspective I can't fully judge whether that is a good 
idea or not, because I don't know deeply how the KIQ works.

Christian.

Am 01.03.2018 um 06:52 schrieb Liu, Monk:
> Anyone review this ?
>
> -Original Message-
> From: amd-gfx [mailto:amd-gfx-boun...@lists.freedesktop.org] On Behalf 
> Of Monk Liu
> Sent: 2018年2月28日 15:28
> To: amd-gfx@lists.freedesktop.org
> Cc: Liu, Monk 
> Subject: [PATCH] drm/amdgpu: try again kiq access if not in IRQ(v2)
>
> sometimes the GPU is switched to other VFs and won't switch back soon, so 
> the kiq reg access will not signal within a short period; instead of 
> busy waiting a long time (MAX_KIQ_REG_WAIT) and returning TMO, we can 
> instead sleep 5ms and try again later (non-irq context)
>
> And since the waiting in kiq_r/wreg is a busy wait, MAX_KIQ_REG_WAIT 
shouldn't be set to a long time; setting it to 10ms is more appropriate.
>
> if the gpu is already in reset state, don't retry the KIQ reg access, otherwise 
it would always hang because KIQ is usually already dead.
>
> v2:
> replace schedule() with msleep() for the wait
>
> Change-Id: I8fc807ce85a8d30d2b50153f3f3a6eda344ef994
> Signed-off-by: Monk Liu 
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 15 +--
>   1 file changed, 13 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
> index b832651..1672f5b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
> @@ -22,7 +22,7 @@
>*/
>   
>   #include "amdgpu.h"
> -#define MAX_KIQ_REG_WAIT 1 /* in usecs */
> +#define MAX_KIQ_REG_WAIT 10000 /* in usecs, 10ms */
>   
>   uint64_t amdgpu_csa_vaddr(struct amdgpu_device *adev)  {
> @@ -152,9 +152,14 @@ uint32_t amdgpu_virt_kiq_rreg(struct amdgpu_device *adev, uint32_t reg)
>   amdgpu_ring_commit(ring);
>   spin_unlock_irqrestore(&kiq->ring_lock, flags);
>   
> +retry_read:
>   r = amdgpu_fence_wait_polling(ring, seq, MAX_KIQ_REG_WAIT);
>   if (r < 1) {
>   DRM_ERROR("wait for kiq fence error: %ld\n", r);
> + if (!in_interrupt() && !adev->in_gpu_reset) {
> + msleep(5);
> + goto retry_read;
> + }
>   return ~0;
>   }
>   val = adev->wb.wb[adev->virt.reg_val_offs];
> @@ -179,9 +184,15 @@ void amdgpu_virt_kiq_wreg(struct amdgpu_device *adev, uint32_t reg, uint32_t v)
>   amdgpu_ring_commit(ring);
>   spin_unlock_irqrestore(&kiq->ring_lock, flags);
>   
> +retry_write:
>   r = amdgpu_fence_wait_polling(ring, seq, MAX_KIQ_REG_WAIT);
> - if (r < 1)
> + if (r < 1) {
>   DRM_ERROR("wait for kiq fence error: %ld\n", r);
> + if (!in_interrupt() && !adev->in_gpu_reset) {
> + msleep(5);
> + goto retry_write;
> + }
> + }
>   }
>   
>   /**
> --
> 2.7.4
>
> 

Re: [PATCH] drm/amdgpu: always use polling mem to update wptr

2017-12-11 Thread Ding, Pixel
Hi Monk,

Please review.

— 
Sincerely Yours,
Pixel


On 12/12/2017, 2:05 PM, "Pixel Ding"  wrote:

Both doorbell and polling mem work on Tonga VF. The SDMA issue
happens because the SDMA engine accepts doorbell writes even when it's
inactive, which introduces a conflict when the world switch routine updates
wptr through polling memory. Use polling mem in the driver too.

Signed-off-by: Pixel Ding 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h |  1 +
 drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c   | 20 +++-
 2 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
index 641e3fd..be09406 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
@@ -186,6 +186,7 @@ struct amdgpu_ring {
uint64_teop_gpu_addr;
u32 doorbell_index;
booluse_doorbell;
+   booluse_polling_mem;
unsignedwptr_offs;
unsignedfence_offs;
uint64_tcurrent_ctx;
diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c 
b/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c
index c8c93f9..b09562c 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c
@@ -380,10 +380,10 @@ static void sdma_v3_0_ring_set_wptr(struct amdgpu_ring *ring)
 
if (ring->use_doorbell) {
u32 *wb = (u32 *)&adev->wb.wb[ring->wptr_offs];
-
/* XXX check if swapping is necessary on BE */
WRITE_ONCE(*wb, (lower_32_bits(ring->wptr) << 2));
-   WDOORBELL32(ring->doorbell_index, lower_32_bits(ring->wptr) << 2);
+   if (!ring->use_polling_mem)
+   WDOORBELL32(ring->doorbell_index, lower_32_bits(ring->wptr) << 2);
} else {
int me = (ring == &ring->adev->sdma.instance[0].ring) ? 0 : 1;
 
@@ -718,10 +718,18 @@ static int sdma_v3_0_gfx_resume(struct amdgpu_device *adev)
WREG32(mmSDMA0_GFX_RB_WPTR_POLL_ADDR_HI + sdma_offsets[i],
   upper_32_bits(wptr_gpu_addr));
wptr_poll_cntl = RREG32(mmSDMA0_GFX_RB_WPTR_POLL_CNTL + sdma_offsets[i]);
-   if (amdgpu_sriov_vf(adev))
-   wptr_poll_cntl = REG_SET_FIELD(wptr_poll_cntl, SDMA0_GFX_RB_WPTR_POLL_CNTL, F32_POLL_ENABLE, 1);
+   if (ring->use_polling_mem) {
+   wptr_poll_cntl = REG_SET_FIELD(wptr_poll_cntl,
+  SDMA0_GFX_RB_WPTR_POLL_CNTL,
+  F32_POLL_ENABLE, 1);
+   wptr_poll_cntl = REG_SET_FIELD(wptr_poll_cntl,
+  SDMA0_GFX_RB_WPTR_POLL_CNTL,
+  ENABLE, 1);
+   }
else
-   wptr_poll_cntl = REG_SET_FIELD(wptr_poll_cntl, SDMA0_GFX_RB_WPTR_POLL_CNTL, F32_POLL_ENABLE, 0);
+   wptr_poll_cntl = REG_SET_FIELD(wptr_poll_cntl,
+  SDMA0_GFX_RB_WPTR_POLL_CNTL,
+  F32_POLL_ENABLE, 0);
WREG32(mmSDMA0_GFX_RB_WPTR_POLL_CNTL + sdma_offsets[i], wptr_poll_cntl);
 
/* enable DMA RB */
@@ -1204,6 +1212,8 @@ static int sdma_v3_0_sw_init(void *handle)
ring = &adev->sdma.instance[i].ring;
ring->ring_obj = NULL;
ring->use_doorbell = true;
+   if (amdgpu_sriov_vf(adev))
+   ring->use_polling_mem = true;
ring->doorbell_index = (i == 0) ?
AMDGPU_DOORBELL_sDMA_ENGINE0 : AMDGPU_DOORBELL_sDMA_ENGINE1;
 
-- 
2.9.5





Re: [PATCH] drm/amdgpu:cancel timer of virtual DCE(v2)

2017-11-16 Thread Ding, Pixel
I got it: the IH sw_fini is done later than DCE's, while your other patch cleans 
up the timer in DCE sw_fini.
Maybe you can call amdgpu_irq_put() here; anyway, your patch is already in :)
— 
Sincerely Yours,
Pixel







On 16/11/2017, 5:28 PM, "Ding, Pixel" <pixel.d...@amd.com> wrote:

>Hi Monk,
>
>I’m not sure what issue you are fixing. The timer should be cancelled 
>when the IRQ is put to disable vblank. However, there is a known rmmod issue 
>fixed with “02d319e drm/amdgpu/virtual_dce: Remove the rmmod error message”.
>
>Can you take a look at that patch, or tell why or in which case the timer is 
>not cancelled during IRQ put?
>
>— 
>Sincerely Yours,
>Pixel
>
>
>
>
>
>
>
>On 16/11/2017, 2:06 PM, "amd-gfx on behalf of Monk Liu" 
><amd-gfx-boun...@lists.freedesktop.org on behalf of monk@amd.com> wrote:
>
>>the virtual DCE Timer structure is already released
>>after its sw_fini(), so we need to cancel
>>its Timer in hw_fini(), otherwise the Timer canceling
>>is missed.
>>
>>v2:
>>use for loop and num_crtc to replace original code
>>
>>Change-Id: I03d6ca7aa07591d287da379ef4fe008f06edaff6
>>Signed-off-by: Monk Liu <monk@amd.com>
>>Reviewed-by: Alex Deucher <alexander.deuc...@amd.com>
>>---
>> drivers/gpu/drm/amd/amdgpu/dce_virtual.c | 10 ++
>> 1 file changed, 10 insertions(+)
>>
>>diff --git a/drivers/gpu/drm/amd/amdgpu/dce_virtual.c 
>>b/drivers/gpu/drm/amd/amdgpu/dce_virtual.c
>>index cd4895b4..943efc3 100644
>>--- a/drivers/gpu/drm/amd/amdgpu/dce_virtual.c
>>+++ b/drivers/gpu/drm/amd/amdgpu/dce_virtual.c
>>@@ -44,6 +44,9 @@ static void dce_virtual_set_display_funcs(struct 
>>amdgpu_device *adev);
>> static void dce_virtual_set_irq_funcs(struct amdgpu_device *adev);
>> static int dce_virtual_connector_encoder_init(struct amdgpu_device *adev,
>>int index);
>>+static void dce_virtual_set_crtc_vblank_interrupt_state(struct amdgpu_device *adev,
>>+ int crtc,
>>+ enum amdgpu_interrupt_state state);
>> 
>> /**
>>  * dce_virtual_vblank_wait - vblank wait asic callback.
>>@@ -550,6 +553,13 @@ static int dce_virtual_hw_init(void *handle)
>> 
>> static int dce_virtual_hw_fini(void *handle)
>> {
>>+ struct amdgpu_device *adev = (struct amdgpu_device *)handle;
>>+ int i = 0;
>>+
>>+ for (i = 0; i < adev->mode_info.num_crtc; i++)
>>+ if (adev->mode_info.crtcs[i])
>>+ dce_virtual_set_crtc_vblank_interrupt_state(adev, i, AMDGPU_IRQ_STATE_DISABLE);
>>+
>>  return 0;
>> }
>> 
>>-- 
>>2.7.4
>>


Re: [PATCH] drm/amdgpu:cancel timer of virtual DCE(v2)

2017-11-16 Thread Ding, Pixel
Hi Monk,

I’m not sure what issue you are fixing. The timer should be cancelled when 
the IRQ is put to disable vblank. However, there is a known rmmod issue fixed 
with “02d319e drm/amdgpu/virtual_dce: Remove the rmmod error message”.

Can you take a look at that patch, or tell why or in which case the timer is not 
cancelled during IRQ put?

— 
Sincerely Yours,
Pixel







On 16/11/2017, 2:06 PM, "amd-gfx on behalf of Monk Liu" 
 wrote:

>the virtual DCE Timer structure is already released
>after its sw_fini(), so we need to cancel
>its Timer in hw_fini(), otherwise the Timer canceling
>is missed.
>
>v2:
>use for loop and num_crtc to replace original code
>
>Change-Id: I03d6ca7aa07591d287da379ef4fe008f06edaff6
>Signed-off-by: Monk Liu 
>Reviewed-by: Alex Deucher 
>---
> drivers/gpu/drm/amd/amdgpu/dce_virtual.c | 10 ++
> 1 file changed, 10 insertions(+)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/dce_virtual.c 
>b/drivers/gpu/drm/amd/amdgpu/dce_virtual.c
>index cd4895b4..943efc3 100644
>--- a/drivers/gpu/drm/amd/amdgpu/dce_virtual.c
>+++ b/drivers/gpu/drm/amd/amdgpu/dce_virtual.c
>@@ -44,6 +44,9 @@ static void dce_virtual_set_display_funcs(struct 
>amdgpu_device *adev);
> static void dce_virtual_set_irq_funcs(struct amdgpu_device *adev);
> static int dce_virtual_connector_encoder_init(struct amdgpu_device *adev,
> int index);
>+static void dce_virtual_set_crtc_vblank_interrupt_state(struct amdgpu_device *adev,
>+  int crtc,
>+  enum amdgpu_interrupt_state state);
> 
> /**
>  * dce_virtual_vblank_wait - vblank wait asic callback.
>@@ -550,6 +553,13 @@ static int dce_virtual_hw_init(void *handle)
> 
> static int dce_virtual_hw_fini(void *handle)
> {
>+  struct amdgpu_device *adev = (struct amdgpu_device *)handle;
>+  int i = 0;
>+
>+  for (i = 0; i < adev->mode_info.num_crtc; i++)
>+  if (adev->mode_info.crtcs[i])
>+  dce_virtual_set_crtc_vblank_interrupt_state(adev, i, AMDGPU_IRQ_STATE_DISABLE);
>+
>   return 0;
> }
> 
>-- 
>2.7.4
>


Re: [PATCH] drm/amdgpu: revise retry init to fully cleanup driver

2017-11-08 Thread Ding, Pixel
When the exclusive mode timeout happens, the VF is not active anymore. Exclusive 
requests will be ignored by the host. Unloading kms or device fini also requests 
exclusive mode, and it will get a timeout again since no response is received.

This only happens for the exclusive mode timeout, so I didn’t put them in a general 
SRIOV fini function.
— 
Sincerely Yours,
Pixel








On 08/11/2017, 5:42 PM, "Christian König"  
wrote:

>Am 08.11.2017 um 04:29 schrieb Pixel Ding:
>> Retry at drm_dev_register instead of amdgpu_device_init.
>>
>> Signed-off-by: Pixel Ding 
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  2 ++
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c| 11 +--
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c| 15 ++-
>>   3 files changed, 13 insertions(+), 15 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> index bf2b008..4ef2b1b 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> @@ -2390,6 +2390,8 @@ int amdgpu_device_init(struct amdgpu_device *adev,
>>  amdgpu_virt_mmio_blocked(adev) &&
>>  !amdgpu_virt_wait_reset(adev)) {
>>  dev_err(adev->dev, "VF exclusive mode timeout\n");
>> +adev->virt.caps &= ~AMDGPU_SRIOV_CAPS_RUNTIME;
>> +adev->virt.ops = NULL;
>
>Why is that necessary? Maybe put this into some SRIOV specific fini 
>function?
>
>Apart from that patch looks good to me and is Acked-by: Christian König 
>.
>
>Regards,
>Christian.
>
>>  r = -EAGAIN;
>>  goto failed;
>>  }
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>> index 6b11a75..eaccd4b 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>> @@ -565,12 +565,13 @@ static int amdgpu_kick_out_firmware_fb(struct pci_dev *pdev)
>>  return 0;
>>   }
>>   
>> +
>>   static int amdgpu_pci_probe(struct pci_dev *pdev,
>>  const struct pci_device_id *ent)
>>   {
>>  struct drm_device *dev;
>>  unsigned long flags = ent->driver_data;
>> -int ret;
>> +int ret, retry = 0;
>>   
>>  if ((flags & AMD_EXP_HW_SUPPORT) && !amdgpu_exp_hw_support) {
>>  DRM_INFO("This hardware requires experimental hardware support.\n"
>> @@ -603,8 +604,14 @@ static int amdgpu_pci_probe(struct pci_dev *pdev,
>>   
>>  pci_set_drvdata(pdev, dev);
>>   
>> +retry_init:
>>  ret = drm_dev_register(dev, ent->driver_data);
>> -if (ret)
>> +if (ret == -EAGAIN && ++retry <= 3) {
>> +DRM_INFO("retry init %d\n", retry);
>> +/* Don't request EX mode too frequently which is attacking */
>> +msleep(5000);
>> +goto retry_init;
>> +} else if (ret)
>>  goto err_pci;
>>   
>>  return 0;
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>> index 1d56b5b..65360cd 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>> @@ -84,7 +84,7 @@ void amdgpu_driver_unload_kms(struct drm_device *dev)
>>   int amdgpu_driver_load_kms(struct drm_device *dev, unsigned long flags)
>>   {
>>  struct amdgpu_device *adev;
>> -int r, acpi_status, retry = 0;
>> +int r, acpi_status;
>>   
>>   #ifdef CONFIG_DRM_AMDGPU_SI
>>  if (!amdgpu_si_support) {
>> @@ -120,7 +120,6 @@ int amdgpu_driver_load_kms(struct drm_device *dev, 
>> unsigned long flags)
>>  }
>>  }
>>   #endif
>> -retry_init:
>>   
>>  adev = kzalloc(sizeof(struct amdgpu_device), GFP_KERNEL);
>>  if (adev == NULL) {
>> @@ -143,17 +142,7 @@ int amdgpu_driver_load_kms(struct drm_device *dev, 
>> unsigned long flags)
>>   * VRAM allocation
>>   */
>>  r = amdgpu_device_init(adev, dev, dev->pdev, flags);
>> -if (r == -EAGAIN && ++retry <= 3) {
>> -adev->virt.caps &= ~AMDGPU_SRIOV_CAPS_RUNTIME;
>> -adev->virt.ops = NULL;
>> -amdgpu_device_fini(adev);
>> -kfree(adev);
>> -dev->dev_private = NULL;
>> -/* Don't request EX mode too frequently which is attacking */
>> -msleep(5000);
>> -dev_err(&dev->pdev->dev, "retry init %d\n", retry);
>> -goto retry_init;
>> -} else if (r) {
>> +if (r) {
>>  dev_err(&dev->pdev->dev, "Fatal error during GPU init\n");
>>  goto out;
>>  }
>
>
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu: revise retry init to fully cleanup driver

2017-11-07 Thread Ding, Pixel
Hi Christian,

Please help review it. 

Hi Gary, 

with this change, debugfs will be cleaned up by DRM, so we don't need to handle it 
anymore.
— 
Sincerely Yours,
Pixel








On 08/11/2017, 11:29 AM, "amd-gfx on behalf of Pixel Ding" 
 wrote:

>Retry at drm_dev_register instead of amdgpu_device_init.
>
>Signed-off-by: Pixel Ding 
>---
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  2 ++
> drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c| 11 +--
> drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c| 15 ++-
> 3 files changed, 13 insertions(+), 15 deletions(-)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
>b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>index bf2b008..4ef2b1b 100644
>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>@@ -2390,6 +2390,8 @@ int amdgpu_device_init(struct amdgpu_device *adev,
>   amdgpu_virt_mmio_blocked(adev) &&
>   !amdgpu_virt_wait_reset(adev)) {
>   dev_err(adev->dev, "VF exclusive mode timeout\n");
>+  adev->virt.caps &= ~AMDGPU_SRIOV_CAPS_RUNTIME;
>+  adev->virt.ops = NULL;
>   r = -EAGAIN;
>   goto failed;
>   }
>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c 
>b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>index 6b11a75..eaccd4b 100644
>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>@@ -565,12 +565,13 @@ static int amdgpu_kick_out_firmware_fb(struct pci_dev 
>*pdev)
>   return 0;
> }
> 
>+
> static int amdgpu_pci_probe(struct pci_dev *pdev,
>   const struct pci_device_id *ent)
> {
>   struct drm_device *dev;
>   unsigned long flags = ent->driver_data;
>-  int ret;
>+  int ret, retry = 0;
> 
>   if ((flags & AMD_EXP_HW_SUPPORT) && !amdgpu_exp_hw_support) {
>   DRM_INFO("This hardware requires experimental hardware 
> support.\n"
>@@ -603,8 +604,14 @@ static int amdgpu_pci_probe(struct pci_dev *pdev,
> 
>   pci_set_drvdata(pdev, dev);
> 
>+retry_init:
>   ret = drm_dev_register(dev, ent->driver_data);
>-  if (ret)
>+  if (ret == -EAGAIN && ++retry <= 3) {
>+  DRM_INFO("retry init %d\n", retry);
>+  /* Don't request EX mode too frequently which is attacking */
>+  msleep(5000);
>+  goto retry_init;
>+  } else if (ret)
>   goto err_pci;
> 
>   return 0;
>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c 
>b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>index 1d56b5b..65360cd 100644
>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>@@ -84,7 +84,7 @@ void amdgpu_driver_unload_kms(struct drm_device *dev)
> int amdgpu_driver_load_kms(struct drm_device *dev, unsigned long flags)
> {
>   struct amdgpu_device *adev;
>-  int r, acpi_status, retry = 0;
>+  int r, acpi_status;
> 
> #ifdef CONFIG_DRM_AMDGPU_SI
>   if (!amdgpu_si_support) {
>@@ -120,7 +120,6 @@ int amdgpu_driver_load_kms(struct drm_device *dev, 
>unsigned long flags)
>   }
>   }
> #endif
>-retry_init:
> 
>   adev = kzalloc(sizeof(struct amdgpu_device), GFP_KERNEL);
>   if (adev == NULL) {
>@@ -143,17 +142,7 @@ int amdgpu_driver_load_kms(struct drm_device *dev, 
>unsigned long flags)
>* VRAM allocation
>*/
>   r = amdgpu_device_init(adev, dev, dev->pdev, flags);
>-  if (r == -EAGAIN && ++retry <= 3) {
>-  adev->virt.caps &= ~AMDGPU_SRIOV_CAPS_RUNTIME;
>-  adev->virt.ops = NULL;
>-  amdgpu_device_fini(adev);
>-  kfree(adev);
>-  dev->dev_private = NULL;
>-  /* Don't request EX mode too frequently which is attacking */
>-  msleep(5000);
>-  dev_err(&dev->pdev->dev, "retry init %d\n", retry);
>-  goto retry_init;
>-  } else if (r) {
>+  if (r) {
>   dev_err(&dev->pdev->dev, "Fatal error during GPU init\n");
>   goto out;
>   }
>-- 
>2.9.5
>


Re: [PATCH] drm/amdgpu:remove debugfs file in amdgpu_device_finish

2017-11-07 Thread Ding, Pixel
Hi Christian,

The retry_init path only handles failures caused by an exclusive mode timeout. It 
checks the MMIO to see whether there was an exclusive mode timeout and retries 
init if there was; otherwise it just returns an error.

For the exclusive timeout case, the host layer issues an FLR on this VF, so the 
driver needn't clean up the hardware state; amdgpu_device_fini here is only used 
to clean up the software state.

It's tested and proven to work correctly. Although the debugfs files may be only 
the tip of the iceberg, they are the only issue we found in this version of 
retry_init.

— 
Sincerely Yours,
Pixel







On 07/11/2017, 5:56 PM, "Koenig, Christian"  wrote:

>Hi Gary,
>
>well that patch is nonsense to begin with.
>
>amdgpu_device_init() does quite a bunch of other initialization which is 
>not cleaned up by amdgpu_device_fini(), so the debugfs files are only 
>the tip of the iceberg here.
>
>Please revert 2316518efc459928ad1d3d2d3511ea5fbda19475 and then we can 
>try again from scratch.
>
>What we need to do is return -EAGAIN from amdgpu_driver_load_kms. Then 
>in amdgpu_pci_probe() we can catch that error and call 
>drm_dev_register() multiple times if necessary.
>
>This way we can also optionally pci_disable_device() / 
>pci_enable_device() between tries if appropriate.
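The bounded-retry pattern suggested here — catch the error and call the register function again a limited number of times — can be sketched roughly as below. `register_with_retry`, `fake_register`, and the simulated error code are illustrative stand-ins, not the real DRM API:

```c
#include <assert.h>

#define EAGAIN_SIM (-11)	/* stand-in for -EAGAIN */

/* Bounded retry around a register callback: on the simulated -EAGAIN the
 * call is repeated, up to max_retries extra attempts. In the real driver
 * a delay (e.g. msleep(5000)) would sit between attempts so exclusive
 * mode is not requested too frequently. */
static int register_with_retry(int (*register_fn)(void *), void *dev,
			       int max_retries)
{
	int ret, retry = 0;

	do {
		ret = register_fn(dev);
	} while (ret == EAGAIN_SIM && ++retry <= max_retries);

	return ret;
}

/* Fake register function for demonstration: fails twice with the
 * simulated -EAGAIN, then succeeds, counting calls via its argument. */
static int fake_register(void *dev)
{
	int *calls = dev;

	return ++(*calls) < 3 ? EAGAIN_SIM : 0;
}
```

With a retry budget of 3, the fake function above succeeds on the third call; with a budget of 1, the error is propagated to the caller after two attempts.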
>
>Regards,
>Christian.
>
>Am 07.11.2017 um 09:02 schrieb Sun, Gary:
>> Hi Christian,
>>
>> The feature is for GPU virtualization and has been checked in, you can refer 
>> to the following patch or commit 75b126427778218b36cfb68637e4f8d0e584b8ef.
>>
>>  From 2316518efc459928ad1d3d2d3511ea5fbda19475 Mon Sep 17 00:00:00 2001
>> From: pding 
>> Date: Mon, 23 Oct 2017 17:22:09 +0800
>> Subject: [PATCH 001/121] drm/amdgpu: retry init if it fails due to exclusive 
>> mode timeout (v3)
>>
>> The exclusive mode has real-time limitation in reality, such like being
>> done in 300ms. It's easy observed if running many VF/VMs in single host
>> with heavy CPU workload.
>>
>> If we find the init fails due to exclusive mode timeout, try it again.
>>
>> v2:
>>   - rewrite the condition for readable value.
>>
>> v3:
>>   - fix typo, add comments for sleep
>>
>> Acked-by: Alex Deucher 
>> Signed-off-by: pding 
>> Signed-off-by: Alex Deucher 
>> Signed-off-by: Gary Sun 
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |   10 ++
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c|   15 +--
>>   2 files changed, 23 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> index 125f77d..385b10e 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> @@ -2303,6 +2303,15 @@ int amdgpu_device_init(struct amdgpu_device *adev,
>>   
>>  r = amdgpu_init(adev);
>>  if (r) {
>> +/* failed in exclusive mode due to timeout */
>> +if (amdgpu_sriov_vf(adev) &&
>> +!amdgpu_sriov_runtime(adev) &&
>> +amdgpu_virt_mmio_blocked(adev) &&
>> +!amdgpu_virt_wait_reset(adev)) {
>> +dev_err(adev->dev, "VF exclusive mode timeout\n");
>> +r = -EAGAIN;
>> +goto failed;
>> +}
>>  dev_err(adev->dev, "amdgpu_init failed\n");
>>  amdgpu_vf_error_put(adev, AMDGIM_ERROR_VF_AMDGPU_INIT_FAIL, 0, 
>> 0);
>>  amdgpu_fini(adev);
>> @@ -2390,6 +2399,7 @@ int amdgpu_device_init(struct amdgpu_device *adev,
>>  amdgpu_vf_error_trans_all(adev);
>>  if (runtime)
>>  vga_switcheroo_fini_domain_pm_ops(adev->dev);
>> +
>>  return r;
>>   }
>>   
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>> index 720139e..f313eee 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>> @@ -86,7 +86,7 @@ void amdgpu_driver_unload_kms(struct drm_device *dev)
>>   int amdgpu_driver_load_kms(struct drm_device *dev, unsigned long flags)
>>   {
>>  struct amdgpu_device *adev;
>> -int r, acpi_status;
>> +int r, acpi_status, retry = 0;
>>   
>>   #ifdef CONFIG_DRM_AMDGPU_SI
>>  if (!amdgpu_si_support) {
>> @@ -122,6 +122,7 @@ int amdgpu_driver_load_kms(struct drm_device *dev, 
>> unsigned long flags)
>>  }
>>  }
>>   #endif
>> +retry_init:
>>   
>>  adev = kzalloc(sizeof(struct amdgpu_device), GFP_KERNEL);
>>  if (adev == NULL) {
>> @@ -144,7 +145,17 @@ int amdgpu_driver_load_kms(struct drm_device *dev, 
>> unsigned long flags)
>>   * VRAM allocation
>>   */
>>  r = amdgpu_device_init(adev, dev, dev->pdev, flags);
>> -if (r) {
>> +if (r == -EAGAIN && ++retry <= 3) {
>> +adev->virt.caps &= ~AMDGPU_SRIOV_CAPS_RUNTIME;
>> +adev->virt.ops = NULL;
>> +

Re: [PATCH] drm/amdgpu: resize VRAM BAR for CPU access v5

2017-11-06 Thread Ding, Pixel
It doesn't work without that write. Meanwhile, this functionality doesn't 
change BAR0 and BAR2 when operating on a VF; as you said, it's not necessary 
for a VF.

So I think we can skip the whole stuff. Please help review the patch coming 
later.
— 
Sincerely Yours,
Pixel







On 06/11/2017, 5:52 PM, "Christian König" <ckoenig.leichtzumer...@gmail.com> 
wrote:

>> What benefits will the larger VRAM bar bring in,
>Well, the obvious one: the CPU can access all of VRAM. There are some 
>games/workloads which rely quite a bit on this, and it largely simplifies 
>memory management when you don't need to shuffle buffers around for CPU access.
>
>> or are there new features relying on larger VRAM bar under development?
>Not that I know of.
>
>> I think it’s better to try something which can keep this change for SRIOV 
>> and don’t introduce too much latency.
>Agree on that, but reprogramming the PCI BAR is something which takes 
>quite a bunch of time.
>
>That's why I asked whether it is sufficient to comment out this one 
>pci_write_config_word() and whether it works then.
>
>If yes, then we can just skip that write, but I have doubts that this 
>will be sufficient.
>
>On the other hand in an SRIOV environment the host can resize the BAR 
>before starting the client, so that whole stuff shouldn't be necessary 
>in the first place.
>
>Regards,
>Christian.
>
>Am 06.11.2017 um 10:34 schrieb Ding, Pixel:
>> What benefits will the larger VRAM bar bring in, or are there new features 
>> relying on larger VRAM bar under development? SRIOV wants to leverage this 
>> sort of optimization too.
>>
>> I think it’s better to try something which can keep this change for SRIOV 
>> and don’t introduce too much latency.
>>
>> —
>> Sincerely Yours,
>> Pixel
>>
>>
>>
>>
>>
>>
>>
>> On 06/11/2017, 5:16 PM, "Christian König" <ckoenig.leichtzumer...@gmail.com> 
>> wrote:
>>
>>> Am 06.11.2017 um 08:21 schrieb Ding, Pixel:
>>>> Hi Christian,
>>>>
>>>> The latest driver fails to load on SRIOV VF with xen hypervisor driver.
>>>>
>>>> +int amdgpu_device_resize_fb_bar(struct amdgpu_device *adev)
>>>> +{
>>>> + u64 space_needed = roundup_pow_of_two(adev->mc.real_vram_size);
>>>> + u32 rbar_size = order_base_2(((space_needed >> 20) | 1)) - 1;
>>>> + u16 cmd;
>>>> + int r;
>>>> +
>>>> + /* Disable memory decoding while we change the BAR addresses and 
>>>> size */
>>>> + pci_read_config_word(adev->pdev, PCI_COMMAND, &cmd);
>>>> + pci_write_config_word(adev->pdev, PCI_COMMAND,
>>>>
>>>> The function above introduces 900ms latency during init in exclusive 
>>>> accessing.
>>>>
>>>>
>>>> + cmd & ~PCI_COMMAND_MEMORY);
>>>> +
>>>>
>>>>
>>>> Can we bypass disablement of response in memory space here? Any concern or 
>>>> suggestion?
>>> Mhm, that was added to be on the extra safe side. In theory we can skip it.
>>>
>>> Does PCI BAR resize work on SRIOV if you skip this or is that just a NOP
>>> anyway?
>>>
>>> Might be a good idea to just immediately return here when SRIOV is active.
>>>
>>> Regards,
>>> Christian.
>>>
>>>>
>>>> —
>>>> Sincerely Yours,
>>>> Pixel
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On 02/11/2017, 9:13 PM, "amd-gfx on behalf of Alex Deucher" 
>>>> <amd-gfx-boun...@lists.freedesktop.org on behalf of alexdeuc...@gmail.com> 
>>>> wrote:
>>>>
>>>>> On Thu, Nov 2, 2017 at 9:02 AM, Christian König <deathsim...@vodafone.de> 
>>>>> wrote:
>>>>>> From: Christian König <christian.koe...@amd.com>
>>>>>>
>>>>>> Try to resize BAR0 to let CPU access all of VRAM.
>>>>>>
>>>>>> v2: rebased, style cleanups, disable mem decode before resize,
>>>>>>   handle gmc_v9 as well, round size up to power of two.
>>>>>> v3: handle gmc_v6 as well, release and reassign all BARs in the driver.
>>>>>> v4: rename new function to amdgpu_device_resize_fb_bar,
>>>>>>   reenable mem decoding only if all resources are assigned.
>>>>>> v5: reorder resource release, retu

Re: [PATCH] drm/amdgpu: resize VRAM BAR for CPU access v5

2017-11-06 Thread Ding, Pixel
What benefits will the larger VRAM BAR bring, or are there new features 
relying on a larger VRAM BAR under development? SRIOV wants to leverage this sort 
of optimization too.

I think it's better to try something that keeps this change for SRIOV and 
doesn't introduce too much latency.

— 
Sincerely Yours,
Pixel







On 06/11/2017, 5:16 PM, "Christian König" <ckoenig.leichtzumer...@gmail.com> 
wrote:

>Am 06.11.2017 um 08:21 schrieb Ding, Pixel:
>> Hi Christian,
>>
>> The latest driver fails to load on SRIOV VF with xen hypervisor driver.
>>
>> +int amdgpu_device_resize_fb_bar(struct amdgpu_device *adev)
>> +{
>> + u64 space_needed = roundup_pow_of_two(adev->mc.real_vram_size);
>> + u32 rbar_size = order_base_2(((space_needed >> 20) | 1)) - 1;
>> + u16 cmd;
>> + int r;
>> +
>> + /* Disable memory decoding while we change the BAR addresses and size 
>> */
>> + pci_read_config_word(adev->pdev, PCI_COMMAND, &cmd);
>> + pci_write_config_word(adev->pdev, PCI_COMMAND,
>>
>> The function above introduces 900ms latency during init in exclusive 
>> accessing.
>>
>>
>> + cmd & ~PCI_COMMAND_MEMORY);
>> +
>>
>>
>> Can we bypass disablement of response in memory space here? Any concern or 
>> suggestion?
>
>Mhm, that was added to be on the extra safe side. In theory we can skip it.
>
>Does PCI BAR resize work on SRIOV if you skip this or is that just a NOP 
>anyway?
>
>Might be a good idea to just immediately return here when SRIOV is active.
>
>Regards,
>Christian.
>
>>
>>
>> —
>> Sincerely Yours,
>> Pixel
>>
>>
>>
>>
>>
>>
>>
>> On 02/11/2017, 9:13 PM, "amd-gfx on behalf of Alex Deucher" 
>> <amd-gfx-boun...@lists.freedesktop.org on behalf of alexdeuc...@gmail.com> 
>> wrote:
>>
>>> On Thu, Nov 2, 2017 at 9:02 AM, Christian König <deathsim...@vodafone.de> 
>>> wrote:
>>>> From: Christian König <christian.koe...@amd.com>
>>>>
>>>> Try to resize BAR0 to let CPU access all of VRAM.
>>>>
>>>> v2: rebased, style cleanups, disable mem decode before resize,
>>>>  handle gmc_v9 as well, round size up to power of two.
>>>> v3: handle gmc_v6 as well, release and reassign all BARs in the driver.
>>>> v4: rename new function to amdgpu_device_resize_fb_bar,
>>>>  reenable mem decoding only if all resources are assigned.
>>>> v5: reorder resource release, return -ENODEV instead of BUG_ON().
>>>>
>>>> Signed-off-by: Christian König <christian.koe...@amd.com>
>>> Reviewed-by: Alex Deucher <alexander.deuc...@amd.com>
>>>
>>>> ---
>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu.h|  1 +
>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 48 
>>>> ++
>>>>   drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c  | 12 ++--
>>>>   drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c  | 13 ++--
>>>>   drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c  | 13 ++--
>>>>   drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c  | 14 ++---
>>>>   6 files changed, 88 insertions(+), 13 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>>> index 2730a75..4f919d3 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>>> @@ -1854,6 +1854,7 @@ void amdgpu_ttm_placement_from_domain(struct 
>>>> amdgpu_bo *abo, u32 domain);
>>>>   bool amdgpu_ttm_bo_is_amdgpu_bo(struct ttm_buffer_object *bo);
>>>>   void amdgpu_vram_location(struct amdgpu_device *adev, struct amdgpu_mc 
>>>> *mc, u64 base);
>>>>   void amdgpu_gart_location(struct amdgpu_device *adev, struct amdgpu_mc 
>>>> *mc);
>>>> +int amdgpu_device_resize_fb_bar(struct amdgpu_device *adev);
>>>>   void amdgpu_ttm_set_active_vram_size(struct amdgpu_device *adev, u64 
>>>> size);
>>>>   int amdgpu_ttm_init(struct amdgpu_device *adev);
>>>>   void amdgpu_ttm_fini(struct amdgpu_device *adev);
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>> index 8b33adf..cb3a0ac 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_d

Re: [PATCH] drm/amdgpu: resize VRAM BAR for CPU access v5

2017-11-05 Thread Ding, Pixel
Hi Christian,

The latest driver fails to load on SRIOV VF with xen hypervisor driver.

+int amdgpu_device_resize_fb_bar(struct amdgpu_device *adev)
+{
+ u64 space_needed = roundup_pow_of_two(adev->mc.real_vram_size);
+ u32 rbar_size = order_base_2(((space_needed >> 20) | 1)) - 1;
+ u16 cmd;
+ int r;
+
+ /* Disable memory decoding while we change the BAR addresses and size */
+ pci_read_config_word(adev->pdev, PCI_COMMAND, &cmd);
+ pci_write_config_word(adev->pdev, PCI_COMMAND,

The function above introduces ~900ms of latency during init while in exclusive mode.


+ cmd & ~PCI_COMMAND_MEMORY);
+


Can we bypass disabling the memory space response here? Any concerns or 
suggestions?


— 
Sincerely Yours,
Pixel







On 02/11/2017, 9:13 PM, "amd-gfx on behalf of Alex Deucher" 
 
wrote:

>On Thu, Nov 2, 2017 at 9:02 AM, Christian König  
>wrote:
>> From: Christian König 
>>
>> Try to resize BAR0 to let CPU access all of VRAM.
>>
>> v2: rebased, style cleanups, disable mem decode before resize,
>> handle gmc_v9 as well, round size up to power of two.
>> v3: handle gmc_v6 as well, release and reassign all BARs in the driver.
>> v4: rename new function to amdgpu_device_resize_fb_bar,
>> reenable mem decoding only if all resources are assigned.
>> v5: reorder resource release, return -ENODEV instead of BUG_ON().
>>
>> Signed-off-by: Christian König 
>
>Reviewed-by: Alex Deucher 
>
>> ---
>>  drivers/gpu/drm/amd/amdgpu/amdgpu.h|  1 +
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 48 
>> ++
>>  drivers/gpu/drm/amd/amdgpu/gmc_v6_0.c  | 12 ++--
>>  drivers/gpu/drm/amd/amdgpu/gmc_v7_0.c  | 13 ++--
>>  drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c  | 13 ++--
>>  drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c  | 14 ++---
>>  6 files changed, 88 insertions(+), 13 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> index 2730a75..4f919d3 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> @@ -1854,6 +1854,7 @@ void amdgpu_ttm_placement_from_domain(struct amdgpu_bo 
>> *abo, u32 domain);
>>  bool amdgpu_ttm_bo_is_amdgpu_bo(struct ttm_buffer_object *bo);
>>  void amdgpu_vram_location(struct amdgpu_device *adev, struct amdgpu_mc *mc, 
>> u64 base);
>>  void amdgpu_gart_location(struct amdgpu_device *adev, struct amdgpu_mc *mc);
>> +int amdgpu_device_resize_fb_bar(struct amdgpu_device *adev);
>>  void amdgpu_ttm_set_active_vram_size(struct amdgpu_device *adev, u64 size);
>>  int amdgpu_ttm_init(struct amdgpu_device *adev);
>>  void amdgpu_ttm_fini(struct amdgpu_device *adev);
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> index 8b33adf..cb3a0ac 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> @@ -410,6 +410,9 @@ static int amdgpu_doorbell_init(struct amdgpu_device 
>> *adev)
>> return 0;
>> }
>>
>> +   if (pci_resource_flags(adev->pdev, 2) & IORESOURCE_UNSET)
>> +   return -EINVAL;
>> +
>> /* doorbell bar mapping */
>> adev->doorbell.base = pci_resource_start(adev->pdev, 2);
>> adev->doorbell.size = pci_resource_len(adev->pdev, 2);
>> @@ -749,6 +752,51 @@ int amdgpu_fw_reserve_vram_init(struct amdgpu_device 
>> *adev)
>> return r;
>>  }
>>
>> +/**
>> + * amdgpu_device_resize_fb_bar - try to resize FB BAR
>> + *
>> + * @adev: amdgpu_device pointer
>> + *
>> + * Try to resize FB BAR to make all VRAM CPU accessible. We try very hard 
>> not
>> + * to fail, but if any of the BARs is not accessible after the size we abort
>> + * driver loading by returning -ENODEV.
>> + */
>> +int amdgpu_device_resize_fb_bar(struct amdgpu_device *adev)
>> +{
>> +   u64 space_needed = roundup_pow_of_two(adev->mc.real_vram_size);
>> +   u32 rbar_size = order_base_2(((space_needed >> 20) | 1)) - 1;
>> +   u16 cmd;
>> +   int r;
>> +
>> +   /* Disable memory decoding while we change the BAR addresses and 
>> size */
>> +   pci_read_config_word(adev->pdev, PCI_COMMAND, &cmd);
>> +   pci_write_config_word(adev->pdev, PCI_COMMAND,
>> + cmd & ~PCI_COMMAND_MEMORY);
>> +
>> +   /* Free the VRAM and doorbell BAR, we most likely need to move both. 
>> */
>> +   amdgpu_doorbell_fini(adev);
>> +   if (adev->asic_type >= CHIP_BONAIRE)
>> +   pci_release_resource(adev->pdev, 2);
>> +
>> +   pci_release_resource(adev->pdev, 0);
>> +
>> +   r = pci_resize_resource(adev->pdev, 0, rbar_size);
>> +   if (r == -ENOSPC)
>> +   DRM_INFO("Not enough PCI address space for a large BAR.");
>> +   else if (r 

Re: [PATCH 2/3] drm/amdgpu: release exclusive mode after hw_init if no kfd

2017-11-02 Thread Ding, Pixel
Hi Felix,


KFD will set up the HQDs. These registers must be accessed in exclusive mode; 
otherwise the driver uses KIQ to access them, which causes world switch failure.

— 
Sincerely Yours,
Pixel







On 02/11/2017, 9:49 PM, "Kuehling, Felix"  wrote:

>Hi Pixel,
>
>I'm curious, which part of the KFD initialization requires exclusive
>access? KFD doesn't access MMIO directly, only through callbacks to
>amdgpu. Which of those callbacks are only used during initialization,
>and require exclusive access? Maybe that's something that can be fixed.
>
>Regards,
>  Felix
>
>
>On 2017-11-01 11:16 PM, Pixel Ding wrote:
>> From: pding 
>>
>> KFD device init requires exclusive mode. Driver can release
>> exclusive mode after hw_init if KFD is not enabled.
>>
>> Signed-off-by: pding 
>> ---
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 +++
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c| 5 +++--
>>  2 files changed, 6 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> index 809e656..dc1d1af 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> @@ -1716,6 +1716,9 @@ static int amdgpu_init(struct amdgpu_device *adev)
>>  adev->ip_blocks[i].status.hw = true;
>>  }
>>  
>> +if (amdgpu_sriov_vf(adev) && !adev->kfd)
>> +amdgpu_virt_release_full_gpu(adev, true);
>> +
>>  return 0;
>>  }
>>  
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>> index acdb010..589b41f 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>> @@ -130,6 +130,7 @@ int amdgpu_driver_load_kms(struct drm_device *dev, 
>> unsigned long flags)
>>  
>>  dev->dev_private = (void *)adev;
>>  
>> +amdgpu_amdkfd_device_probe(adev);
>>  /* amdgpu_device_init should report only fatal error
>>   * like memory allocation failure or iomapping failure,
>>   * or memory manager initialization failure, it must
>> @@ -141,6 +142,7 @@ int amdgpu_driver_load_kms(struct drm_device *dev, 
>> unsigned long flags)
>>  adev->virt.caps &= ~AMDGPU_SRIOV_CAPS_RUNTIME;
>>  adev->virt.ops = NULL;
>>  amdgpu_device_fini(adev);
>> +kfree(adev->kfd);
>>  kfree(adev);
>>  dev->dev_private = NULL;
>>  /* Don't request EX mode too frequently which is attacking */
>> @@ -162,7 +164,6 @@ int amdgpu_driver_load_kms(struct drm_device *dev, 
>> unsigned long flags)
>>  "Error during ACPI methods call\n");
>>  }
>>  
>> -amdgpu_amdkfd_device_probe(adev);
>>  amdgpu_amdkfd_device_init(adev);
>>  
>>  if (amdgpu_device_is_px(dev)) {
>> @@ -174,7 +175,7 @@ int amdgpu_driver_load_kms(struct drm_device *dev, 
>> unsigned long flags)
>>  pm_runtime_put_autosuspend(dev->dev);
>>  }
>>  
>> -if (amdgpu_sriov_vf(adev))
>> +if (amdgpu_sriov_vf(adev) && adev->kfd)
>>  amdgpu_virt_release_full_gpu(adev, true);
>>  
>>  out:
>


Re: release exclusive mode after hw init if no kfd

2017-11-01 Thread Ding, Pixel
Hi Oded,

Please ignore them for now. I need to consider more cases, such as the probe 
being included in the retry init logic.
— 
Sincerely Yours,
Pixel








On 01/11/2017, 1:53 PM, "amd-gfx on behalf of Pixel Ding" 
 wrote:

>Hi Oded,
>
>Please review.
>
>[PATCH 1/2] drm/amdkfd: initialise kgd field inside kfd device_init
>As you suggested, move kgd assignment to device_init
>
>[PATCH 2/2] drm/amdgpu: release exclusive mode after hw_init if no
>We still need this change because pdev is passed in.


Re: [PATCH] drm/amdgpu: release exclusive mode after hw_init if no kfd

2017-10-31 Thread Ding, Pixel
I'm not sure about the case you talked about. Assume that it could happen and 
that the KFD probe and init are invoked when the module is loaded manually.

For a bare-metal device, it's always correct.
For an SRIOV virtual function, it doesn't behave correctly with or without this 
patch: KFD initialization also needs to access the VF in exclusive mode, while the 
exclusive mode request/release messages are sent in amdgpu_device_init.

—
Sincerely Yours,
Pixel







On 31/10/2017, 11:06 PM, "Tom Stellard"  wrote:

>On 10/30/2017 12:57 AM, Pixel Ding wrote:
>> From: pding 
>> 
>> Move kfd probe prior to device init. Release exclusive mode
>> after hw_init if kfd is not enabled.
>> 
>
>What happens if only the amdgpu module is loaded at startup, and then the
>user manually loads the amdkfd module at some point later on.  Will the driver
>still behave correctly in this case with this patch?
>
>-Tom
>
>
>> Signed-off-by: pding 
>> ---
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 +++
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c| 5 +++--
>>  2 files changed, 6 insertions(+), 2 deletions(-)
>> 
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> index 400dfaa..e46ec51 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> @@ -1716,6 +1716,9 @@ static int amdgpu_init(struct amdgpu_device *adev)
>>  adev->ip_blocks[i].status.hw = true;
>>  }
>>  
>> +if (amdgpu_sriov_vf(adev) && !adev->kfd)
>> +amdgpu_virt_release_full_gpu(adev, true);
>> +
>>  return 0;
>>  }
>>  
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>> index 3e9760d..e91907c 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>> @@ -138,6 +138,8 @@ int amdgpu_driver_load_kms(struct drm_device *dev, 
>> unsigned long flags)
>>  !pci_is_thunderbolt_attached(dev->pdev))
>>  flags |= AMD_IS_PX;
>>  
>> +amdgpu_amdkfd_device_probe(adev);
>> +
>>  /* amdgpu_device_init should report only fatal error
>>   * like memory allocation failure or iomapping failure,
>>   * or memory manager initialization failure, it must
>> @@ -170,7 +172,6 @@ int amdgpu_driver_load_kms(struct drm_device *dev, 
>> unsigned long flags)
>>  "Error during ACPI methods call\n");
>>  }
>>  
>> -amdgpu_amdkfd_device_probe(adev);
>>  amdgpu_amdkfd_device_init(adev);
>>  
>>  if (amdgpu_device_is_px(dev)) {
>> @@ -182,7 +183,7 @@ int amdgpu_driver_load_kms(struct drm_device *dev, 
>> unsigned long flags)
>>  pm_runtime_put_autosuspend(dev->dev);
>>  }
>>  
>> -if (amdgpu_sriov_vf(adev))
>> +if (amdgpu_sriov_vf(adev) && adev->kfd)
>>  amdgpu_virt_release_full_gpu(adev, true);
>>  
>>  out:
>> 
>


Re: [PATCH] drm/amdgpu: release exclusive mode after hw_init if no kfd (v2)

2017-10-31 Thread Ding, Pixel
Hi Oded,

Would you please help reviewing the V2 patch?

— 
Sincerely Yours,
Pixel








On 31/10/2017, 9:47 AM, "Pixel Ding"  wrote:

>From: pding 
>
>Move kfd probe prior to device init. Release exclusive mode
>after hw_init if kfd is not enabled.
>
>v2:
> - pass pdev param
>
>Signed-off-by: pding 
>---
> drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 5 +++--
> drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h | 3 ++-
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 +++
> drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c| 5 +++--
> 4 files changed, 11 insertions(+), 5 deletions(-)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c 
>b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
>index c70cda0..f0f5d0e 100644
>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
>@@ -68,7 +68,8 @@ void amdgpu_amdkfd_fini(void)
>   }
> }
> 
>-void amdgpu_amdkfd_device_probe(struct amdgpu_device *adev)
>+void amdgpu_amdkfd_device_probe(struct amdgpu_device *adev,
>+  struct pci_dev *pdev)
> {
>   const struct kfd2kgd_calls *kfd2kgd;
> 
>@@ -90,7 +91,7 @@ void amdgpu_amdkfd_device_probe(struct amdgpu_device *adev)
>   }
> 
>   adev->kfd = kgd2kfd->probe((struct kgd_dev *)adev,
>- adev->pdev, kfd2kgd);
>+ pdev, kfd2kgd);
> }
> 
> void amdgpu_amdkfd_device_init(struct amdgpu_device *adev)
>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h 
>b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
>index 8d689ab..707c892 100644
>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
>@@ -44,7 +44,8 @@ void amdgpu_amdkfd_suspend(struct amdgpu_device *adev);
> int amdgpu_amdkfd_resume(struct amdgpu_device *adev);
> void amdgpu_amdkfd_interrupt(struct amdgpu_device *adev,
>   const void *ih_ring_entry);
>-void amdgpu_amdkfd_device_probe(struct amdgpu_device *adev);
>+void amdgpu_amdkfd_device_probe(struct amdgpu_device *adev,
>+  struct pci_dev *pdev);
> void amdgpu_amdkfd_device_init(struct amdgpu_device *adev);
> void amdgpu_amdkfd_device_fini(struct amdgpu_device *adev);
> 
>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
>b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>index 2ff2c54..daa2098 100644
>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>@@ -1716,6 +1716,9 @@ static int amdgpu_init(struct amdgpu_device *adev)
>   adev->ip_blocks[i].status.hw = true;
>   }
> 
>+  if (amdgpu_sriov_vf(adev) && !adev->kfd)
>+  amdgpu_virt_release_full_gpu(adev, true);
>+
>   return 0;
> }
> 
>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c 
>b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>index 3e9760d..f872052 100644
>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>@@ -138,6 +138,8 @@ int amdgpu_driver_load_kms(struct drm_device *dev, 
>unsigned long flags)
>   !pci_is_thunderbolt_attached(dev->pdev))
>   flags |= AMD_IS_PX;
> 
>+  amdgpu_amdkfd_device_probe(adev, dev->pdev);
>+
>   /* amdgpu_device_init should report only fatal error
>* like memory allocation failure or iomapping failure,
>* or memory manager initialization failure, it must
>@@ -170,7 +172,6 @@ int amdgpu_driver_load_kms(struct drm_device *dev, 
>unsigned long flags)
>   "Error during ACPI methods call\n");
>   }
> 
>-  amdgpu_amdkfd_device_probe(adev);
>   amdgpu_amdkfd_device_init(adev);
> 
>   if (amdgpu_device_is_px(dev)) {
>@@ -182,7 +183,7 @@ int amdgpu_driver_load_kms(struct drm_device *dev, 
>unsigned long flags)
>   pm_runtime_put_autosuspend(dev->dev);
>   }
> 
>-  if (amdgpu_sriov_vf(adev))
>+  if (amdgpu_sriov_vf(adev) && adev->kfd)
>   amdgpu_virt_release_full_gpu(adev, true);
> 
> out:
>-- 
>2.9.5
>
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu: release exclusive mode after hw_init if no kfd

2017-10-30 Thread Ding, Pixel
Another option is adding a function with the same logic as the top half of probe to 
determine whether KFD is enabled or not, e.g. amdgpu_amdkfd_enabled().

I think it’s okay to pass pdev separately. The KFD probe implementation also 
takes a pdev parameter and doesn’t retrieve it from adev; only the middle layer 
does that.

Which one do you prefer?

— 
Sincerely Yours,
Pixel







On 30/10/2017, 4:52 PM, "Oded Gabbay" <oded.gab...@gmail.com> wrote:

>Except for pdev, kfd doesn't access any other fields in adev, so
>passing pdev as a different parameter seems fine.
>The problem with that approach is we need to remember for the future
>not to access other adev fields, and that seems risky and not correct.
>
>
>
>
>On Mon, Oct 30, 2017 at 10:30 AM, Ding, Pixel <pixel.d...@amd.com> wrote:
>> Got your point. I don’t know why the test passed; I will revise it 
>> later.
>>
>> I definitely want to know if KFD is enabled before device init. Any 
>> suggestion? Or do you mind if the pdev is passed to probe in another way?
>>
>> —
>> Sincerely Yours,
>> Pixel
>>
>>
>>
>>
>>
>>
>>
>>
>> On 30/10/2017, 4:20 PM, "Oded Gabbay" <oded.gab...@gmail.com> wrote:
>>
>>>On Mon, Oct 30, 2017 at 9:57 AM, Pixel Ding <pixel.d...@amd.com> wrote:
>>>> From: pding <pixel.d...@amd.com>
>>>>
>>>> Move kfd probe prior to device init. Release exclusive mode
>>>> after hw_init if kfd is not enabled.
>>>>
>>>> Signed-off-by: pding <pixel.d...@amd.com>
>>>> ---
>>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 +++
>>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c| 5 +++--
>>>>  2 files changed, 6 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>> index 400dfaa..e46ec51 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>> @@ -1716,6 +1716,9 @@ static int amdgpu_init(struct amdgpu_device *adev)
>>>> adev->ip_blocks[i].status.hw = true;
>>>> }
>>>>
>>>> +   if (amdgpu_sriov_vf(adev) && !adev->kfd)
>>>> +   amdgpu_virt_release_full_gpu(adev, true);
>>>> +
>>>> return 0;
>>>>  }
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c 
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>>>> index 3e9760d..e91907c 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>>>> @@ -138,6 +138,8 @@ int amdgpu_driver_load_kms(struct drm_device *dev, 
>>>> unsigned long flags)
>>>> !pci_is_thunderbolt_attached(dev->pdev))
>>>> flags |= AMD_IS_PX;
>>>>
>>>> +   amdgpu_amdkfd_device_probe(adev);
>>>> +
>>>> /* amdgpu_device_init should report only fatal error
>>>>  * like memory allocation failure or iomapping failure,
>>>>  * or memory manager initialization failure, it must
>>>> @@ -170,7 +172,6 @@ int amdgpu_driver_load_kms(struct drm_device *dev, 
>>>> unsigned long flags)
>>>> "Error during ACPI methods call\n");
>>>> }
>>>>
>>>> -   amdgpu_amdkfd_device_probe(adev);
>>>> amdgpu_amdkfd_device_init(adev);
>>>>
>>>> if (amdgpu_device_is_px(dev)) {
>>>> @@ -182,7 +183,7 @@ int amdgpu_driver_load_kms(struct drm_device *dev, 
>>>> unsigned long flags)
>>>> pm_runtime_put_autosuspend(dev->dev);
>>>> }
>>>>
>>>> -   if (amdgpu_sriov_vf(adev))
>>>> +   if (amdgpu_sriov_vf(adev) && adev->kfd)
>>>> amdgpu_virt_release_full_gpu(adev, true);
>>>>
>>>>  out:
>>>> --
>>>> 2.9.5
>>>>
>>>> ___
>>>> amd-gfx mailing list
>>>> amd-gfx@lists.freedesktop.org
>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>>>
>>>The amdkfd probe function uses the pdev field inside adev. That field
>>>is initialized in device init, so you can't call amdkfd probe before
>>>that function.
>>>Oded


Re: [PATCH] drm/amdgpu: release exclusive mode after hw_init if no kfd

2017-10-30 Thread Ding, Pixel
Got your point. I don’t know why the test passed; I will revise it later.

I definitely want to know if KFD is enabled before device init. Any 
suggestion? Or do you mind if the pdev is passed to probe in another way?

— 
Sincerely Yours,
Pixel








On 30/10/2017, 4:20 PM, "Oded Gabbay"  wrote:

>On Mon, Oct 30, 2017 at 9:57 AM, Pixel Ding  wrote:
>> From: pding 
>>
>> Move kfd probe prior to device init. Release exclusive mode
>> after hw_init if kfd is not enabled.
>>
>> Signed-off-by: pding 
>> ---
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 +++
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c| 5 +++--
>>  2 files changed, 6 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> index 400dfaa..e46ec51 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> @@ -1716,6 +1716,9 @@ static int amdgpu_init(struct amdgpu_device *adev)
>> adev->ip_blocks[i].status.hw = true;
>> }
>>
>> +   if (amdgpu_sriov_vf(adev) && !adev->kfd)
>> +   amdgpu_virt_release_full_gpu(adev, true);
>> +
>> return 0;
>>  }
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>> index 3e9760d..e91907c 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>> @@ -138,6 +138,8 @@ int amdgpu_driver_load_kms(struct drm_device *dev, 
>> unsigned long flags)
>> !pci_is_thunderbolt_attached(dev->pdev))
>> flags |= AMD_IS_PX;
>>
>> +   amdgpu_amdkfd_device_probe(adev);
>> +
>> /* amdgpu_device_init should report only fatal error
>>  * like memory allocation failure or iomapping failure,
>>  * or memory manager initialization failure, it must
>> @@ -170,7 +172,6 @@ int amdgpu_driver_load_kms(struct drm_device *dev, 
>> unsigned long flags)
>> "Error during ACPI methods call\n");
>> }
>>
>> -   amdgpu_amdkfd_device_probe(adev);
>> amdgpu_amdkfd_device_init(adev);
>>
>> if (amdgpu_device_is_px(dev)) {
>> @@ -182,7 +183,7 @@ int amdgpu_driver_load_kms(struct drm_device *dev, 
>> unsigned long flags)
>> pm_runtime_put_autosuspend(dev->dev);
>> }
>>
>> -   if (amdgpu_sriov_vf(adev))
>> +   if (amdgpu_sriov_vf(adev) && adev->kfd)
>> amdgpu_virt_release_full_gpu(adev, true);
>>
>>  out:
>> --
>> 2.9.5
>>
>> ___
>> amd-gfx mailing list
>> amd-gfx@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>
>The amdkfd probe function uses the pdev field inside adev. That field
>is initialized in device init, so you can't call amdkfd probe before
>that function.
>Oded


Re: [PATCH 6/7] drm/amdgpu/virt: implement wait_reset callbacks for vi/ai

2017-10-25 Thread Ding, Pixel
Thanks:) I only got the ack for part 1…
— 
Sincerely Yours,
Pixel







On 26/10/2017, 10:29 AM, "Deucher, Alexander" <alexander.deuc...@amd.com> wrote:

>> -----Original Message-----
>> From: Ding, Pixel
>> Sent: Wednesday, October 25, 2017 10:20 PM
>> To: amd-gfx@lists.freedesktop.org; Deucher, Alexander
>> Cc: Sun, Gary
>> Subject: Re: [PATCH 6/7] drm/amdgpu/virt: implement wait_reset callbacks
>> for vi/ai
>> 
>> Hi Alex,
>> 
>> Any concerns about this patch? It’s split as suggested.
>
>I replied with my ack.  Did the email not go through?
>
>Alex
>
>> —
>> Sincerely Yours,
>> Pixel
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> On 25/10/2017, 10:17 AM, "Pixel Ding" <pixel.d...@amd.com> wrote:
>> 
>> >From: pding <pixel.d...@amd.com>
>> >
>> >Hi Alex,
>> >
>> >Split the wait_reset patch to 2. Part 2.
>> >
>> >please review.
>> >
>> >---
>> >
>> >Signed-off-by: pding <pixel.d...@amd.com>
>> >---
>> > drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c | 1 +
>> > drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c | 6 ++
>> > 2 files changed, 7 insertions(+)
>> >
>> >diff --git a/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
>> b/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
>> >index b4906d2..f91aab3 100644
>> >--- a/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
>> >+++ b/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
>> >@@ -353,5 +353,6 @@ const struct amdgpu_virt_ops xgpu_ai_virt_ops = {
>> >.req_full_gpu   = xgpu_ai_request_full_gpu_access,
>> >.rel_full_gpu   = xgpu_ai_release_full_gpu_access,
>> >.reset_gpu = xgpu_ai_request_reset,
>> >+   .wait_reset = NULL,
>> >.trans_msg = xgpu_ai_mailbox_trans_msg,
>> > };
>> >diff --git a/drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c
>> b/drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c
>> >index c25a831..27b03c7 100644
>> >--- a/drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c
>> >+++ b/drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c
>> >@@ -458,6 +458,11 @@ static int xgpu_vi_request_reset(struct
>> amdgpu_device *adev)
>> >return xgpu_vi_send_access_requests(adev,
>> IDH_REQ_GPU_RESET_ACCESS);
>> > }
>> >
>> >+static int xgpu_vi_wait_reset_cmpl(struct amdgpu_device *adev)
>> >+{
>> >+   return xgpu_vi_poll_msg(adev, IDH_FLR_NOTIFICATION_CMPL);
>> >+}
>> >+
>> > static int xgpu_vi_request_full_gpu_access(struct amdgpu_device *adev,
>> >   bool init)
>> > {
>> >@@ -613,5 +618,6 @@ const struct amdgpu_virt_ops xgpu_vi_virt_ops = {
>> >.req_full_gpu   = xgpu_vi_request_full_gpu_access,
>> >.rel_full_gpu   = xgpu_vi_release_full_gpu_access,
>> >.reset_gpu  = xgpu_vi_request_reset,
>> >+   .wait_reset = xgpu_vi_wait_reset_cmpl,
>> >.trans_msg  = NULL, /* Does not need to trans VF errors
>> to host. */
>> > };
>> >--
>> >2.9.5
>> >


Re: [PATCH 6/7] drm/amdgpu/virt: implement wait_reset callbacks for vi/ai

2017-10-25 Thread Ding, Pixel
Hi Alex,

Any concerns about this patch? It’s split as suggested.
— 
Sincerely Yours,
Pixel







On 25/10/2017, 10:17 AM, "Pixel Ding"  wrote:

>From: pding 
>
>Hi Alex,
>
>Split the wait_reset patch to 2. Part 2.
>
>please review.
>
>---
>
>Signed-off-by: pding 
>---
> drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c | 1 +
> drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c | 6 ++
> 2 files changed, 7 insertions(+)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c 
>b/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
>index b4906d2..f91aab3 100644
>--- a/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
>+++ b/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
>@@ -353,5 +353,6 @@ const struct amdgpu_virt_ops xgpu_ai_virt_ops = {
>   .req_full_gpu   = xgpu_ai_request_full_gpu_access,
>   .rel_full_gpu   = xgpu_ai_release_full_gpu_access,
>   .reset_gpu = xgpu_ai_request_reset,
>+  .wait_reset = NULL,
>   .trans_msg = xgpu_ai_mailbox_trans_msg,
> };
>diff --git a/drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c 
>b/drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c
>index c25a831..27b03c7 100644
>--- a/drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c
>+++ b/drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c
>@@ -458,6 +458,11 @@ static int xgpu_vi_request_reset(struct amdgpu_device 
>*adev)
>   return xgpu_vi_send_access_requests(adev, IDH_REQ_GPU_RESET_ACCESS);
> }
> 
>+static int xgpu_vi_wait_reset_cmpl(struct amdgpu_device *adev)
>+{
>+  return xgpu_vi_poll_msg(adev, IDH_FLR_NOTIFICATION_CMPL);
>+}
>+
> static int xgpu_vi_request_full_gpu_access(struct amdgpu_device *adev,
>  bool init)
> {
>@@ -613,5 +618,6 @@ const struct amdgpu_virt_ops xgpu_vi_virt_ops = {
>   .req_full_gpu   = xgpu_vi_request_full_gpu_access,
>   .rel_full_gpu   = xgpu_vi_release_full_gpu_access,
>   .reset_gpu  = xgpu_vi_request_reset,
>+  .wait_reset = xgpu_vi_wait_reset_cmpl,
>   .trans_msg  = NULL, /* Does not need to trans VF errors to 
> host. */
> };
>-- 
>2.9.5
>


Re: [PATCH 7/7] drm/amdgpu: retry init if it fails due to exclusive mode timeout (v2)

2017-10-25 Thread Ding, Pixel
The host rejects frequent exclusive mode requests, since an attacking VF could otherwise 
cause problems. We sleep for a while here so the host will accept the next request.

— 
Sincerely Yours,
Pixel







On 26/10/2017, 6:44 AM, "Alex Deucher"  wrote:

>On Tue, Oct 24, 2017 at 10:19 PM, Pixel Ding  wrote:
>> From: pding 
>>
>> The exclusive mode has a real-time limitation in reality, such as having
>> to finish within 300ms. It's easily observed when running many VFs/VMs in a
>> single host with a heavy CPU workload.
>>
>> If we find the init fails due to exclusive mode timeout, try it again.
>>
>> v2:
>>  - rewrite the condition for readable value.
>>
>> Signed-off-by: pding 
>> ---
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 10 ++
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c| 14 --
>>  2 files changed, 22 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> index 2fdd73b..14fe2bc 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> @@ -2301,6 +2301,15 @@ int amdgpu_device_init(struct amdgpu_device *adev,
>>
>> r = amdgpu_init(adev);
>> if (r) {
>> +   /* failed in exclusive mode due to timeout */
>> +   if (amdgpu_sriov_vf(adev) &&
>> +   !amdgpu_sriov_runtime(adev) &&
>> +   amdgpu_virt_mmio_blocked(adev) &&
>> +   !amdgpu_virt_wait_reset(adev)) {
>> +   dev_err(adev->dev, "VF exclusive mode timeout\n");
>> +   r = -EAGAIN;
>> +   goto failed;
>> +   }
>> dev_err(adev->dev, "amdgpu_init failed\n");
>> amdgpu_vf_error_put(adev, AMDGIM_ERROR_VF_AMDGPU_INIT_FAIL, 
>> 0, 0);
>> amdgpu_fini(adev);
>> @@ -2388,6 +2397,7 @@ int amdgpu_device_init(struct amdgpu_device *adev,
>> amdgpu_vf_error_trans_all(adev);
>> if (runtime)
>> vga_switcheroo_fini_domain_pm_ops(adev->dev);
>> +
>> return r;
>>  }
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>> index 4a9f749..f4ee407 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>> @@ -86,7 +86,7 @@ void amdgpu_driver_unload_kms(struct drm_device *dev)
>>  int amdgpu_driver_load_kms(struct drm_device *dev, unsigned long flags)
>>  {
>> struct amdgpu_device *adev;
>> -   int r, acpi_status;
>> +   int r, acpi_status, retry = 0;
>>
>>  #ifdef CONFIG_DRM_AMDGPU_SI
>> if (!amdgpu_si_support) {
>> @@ -122,6 +122,7 @@ int amdgpu_driver_load_kms(struct drm_device *dev, 
>> unsigned long flags)
>> }
>> }
>>  #endif
>> +retry_init:
>>
>> adev = kzalloc(sizeof(struct amdgpu_device), GFP_KERNEL);
>> if (adev == NULL) {
>> @@ -144,7 +145,16 @@ int amdgpu_driver_load_kms(struct drm_device *dev, 
>> unsigned long flags)
>>  * VRAM allocation
>>  */
>> r = amdgpu_device_init(adev, dev, dev->pdev, flags);
>> -   if (r) {
>> +   if (++retry >= 3 && r == -EAGAIN) {
>
>Might be better to check for -EAGAIN first.
>
>> +   adev->virt.caps &= ~AMDGPU_SRIOV_CAPS_RUNTIME;
>> +   adev->virt.ops = NULL;
>> +   amdgpu_device_fini(adev);
>> +   kfree(adev);
>> +   dev->dev_private = NULL;
>> +   msleep(5000);
>
>
>Why do we need the sleep here?
>
>> +   dev_err(&dev->pdev->dev, "retry init %d\n", retry);
>> +   goto retry_init;
>> +   } else if (r) {
>> dev_err(&dev->pdev->dev, "Fatal error during GPU init\n");
>> goto out;
>> }
>> --
>> 2.9.5
>>
>> ___
>> amd-gfx mailing list
>> amd-gfx@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 7/7] drm/amdgpu: retry init if it fails due to exclusive mode timeout

2017-10-24 Thread Ding, Pixel
Thanks Andres, revised in v2.
— 
Sincerely Yours,
Pixel








On 23/10/2017, 11:44 PM, "amd-gfx on behalf of Andres Rodriguez" 
 wrote:

>
>
>On 2017-10-23 06:03 AM, Pixel Ding wrote:
>> From: pding 
>> 
>> The exclusive mode has a real-time limitation in reality, such as having
>> to finish within 300ms. It's easily observed when running many VFs/VMs in a
>> single host with a heavy CPU workload.
>> 
>> If we find the init fails due to exclusive mode timeout, try it again.
>> 
>> Signed-off-by: pding 
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 10 ++
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c| 15 +--
>>   2 files changed, 23 insertions(+), 2 deletions(-)
>> 
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> index 3458d46..1935f5a 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> @@ -2306,6 +2306,15 @@ int amdgpu_device_init(struct amdgpu_device *adev,
>>   
>>  r = amdgpu_init(adev);
>>  if (r) {
>> +/* failed in exclusive mode due to timeout */
>> +if (amdgpu_sriov_vf(adev) &&
>> +!amdgpu_sriov_runtime(adev) &&
>> +amdgpu_virt_mmio_blocked(adev) &&
>> +!amdgpu_virt_wait_reset(adev)) {
>> +dev_err(adev->dev, "VF exclusive mode timeout\n");
>> +r = -EAGAIN;
>> +goto failed;
>> +}
>>  dev_err(adev->dev, "amdgpu_init failed\n");
>>  amdgpu_vf_error_put(adev, AMDGIM_ERROR_VF_AMDGPU_INIT_FAIL, 0, 
>> 0);
>>  amdgpu_fini(adev);
>> @@ -2393,6 +2402,7 @@ int amdgpu_device_init(struct amdgpu_device *adev,
>>  amdgpu_vf_error_trans_all(adev);
>>  if (runtime)
>>  vga_switcheroo_fini_domain_pm_ops(adev->dev);
>> +
>>  return r;
>>   }
>>   
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>> index f2eb7ac..fdc240a 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>> @@ -86,7 +86,7 @@ void amdgpu_driver_unload_kms(struct drm_device *dev)
>>   int amdgpu_driver_load_kms(struct drm_device *dev, unsigned long flags)
>>   {
>>  struct amdgpu_device *adev;
>> -int r, acpi_status;
>> +int r, acpi_status, retry = 0;
>>   
>>   #ifdef CONFIG_DRM_AMDGPU_SI
>>  if (!amdgpu_si_support) {
>> @@ -122,6 +122,7 @@ int amdgpu_driver_load_kms(struct drm_device *dev, 
>> unsigned long flags)
>>  }
>>  }
>>   #endif
>> +retry_init:
>>   
>>  adev = kzalloc(sizeof(struct amdgpu_device), GFP_KERNEL);
>>  if (adev == NULL) {
>> @@ -144,7 +145,17 @@ int amdgpu_driver_load_kms(struct drm_device *dev, 
>> unsigned long flags)
>>   * VRAM allocation
>>   */
>>  r = amdgpu_device_init(adev, dev, dev->pdev, flags);
>> -if (r) {
>> +if (++retry != 3 && r == -EAGAIN) {
>
>Minor nitpick here. Might want to rewrite the condition so that it 
>evaluates to false for most values of retry (currently it evaluates to 
>false only for one value of retry).
>
>E.g. if (++retry >= 3 ...)
>
>Or
>
>int retry = 3;
>...
>if (--retry >= 0 ...)
>
>> +adev->virt.caps &= ~AMDGPU_SRIOV_CAPS_RUNTIME;
>> +adev->virt.ops = NULL;
>> +amdgpu_device_fini(adev);
>> +kfree(adev);
>> +dev->dev_private = NULL;
>> +msleep(5000);
>> +dev_err(&dev->pdev->dev, "retry init %d\n", retry);
>> +amdgpu_init_log = 0;
>> +goto retry_init;
>> +} else if (r) {
>>  dev_err(&dev->pdev->dev, "Fatal error during GPU init\n");
>>  goto out;
>>  }
>> 
>___
>amd-gfx mailing list
>amd-gfx@lists.freedesktop.org
>https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 1/7] drm/amdgpu: release VF exclusive accessing after hw_init

2017-10-24 Thread Ding, Pixel
Hi all,

I notice that this change will break KFD init on VF. KFD also loads HQDs, which 
needs to be done in exclusive mode; otherwise KIQ register access causes 
LOAD_VF to fail at that time. However, if we release exclusive mode after KFD init, 
the retry-init logic becomes quite complicated to handle all failures, and the time 
spent in exclusive mode is too long. Any idea? 

For now I would drop this change upstream.


— 
Sincerely Yours,
Pixel








On 24/10/2017, 12:23 AM, "Deucher, Alexander" <alexander.deuc...@amd.com> wrote:

>> -----Original Message-----
>> From: amd-gfx [mailto:amd-gfx-boun...@lists.freedesktop.org] On Behalf
>> Of Pixel Ding
>> Sent: Monday, October 23, 2017 6:03 AM
>> To: amd-gfx@lists.freedesktop.org
>> Cc: Sun, Gary; Ding, Pixel; Li, Bingley
>> Subject: [PATCH 1/7] drm/amdgpu: release VF exclusive accessing after
>> hw_init
>> 
>> The subsequent operations don't need exclusive accessing hardware.
>> 
>> Signed-off-by: Pixel Ding <pixel.d...@amd.com>
>
>Acked-by: Alex Deucher <alexander.deuc...@amd.com>
>
>> ---
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 5 +
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c| 3 ---
>>  2 files changed, 5 insertions(+), 3 deletions(-)
>> 
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> index 99acf29..286ba3c 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> @@ -1716,6 +1716,11 @@ static int amdgpu_init(struct amdgpu_device
>> *adev)
>>  adev->ip_blocks[i].status.hw = true;
>>  }
>> 
>> +if (amdgpu_sriov_vf(adev)) {
>> +DRM_INFO("rel_init\n");
>> +amdgpu_virt_release_full_gpu(adev, true);
>> +}
>> +
>>  return 0;
>>  }
>> 
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>> index 4a9f749..f2eb7ac 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
>> @@ -171,9 +171,6 @@ int amdgpu_driver_load_kms(struct drm_device
>> *dev, unsigned long flags)
>>  pm_runtime_put_autosuspend(dev->dev);
>>  }
>> 
>> -if (amdgpu_sriov_vf(adev))
>> -amdgpu_virt_release_full_gpu(adev, true);
>> -
>>  out:
>>  if (r) {
>>  /* balance pm_runtime_get_sync in
>> amdgpu_driver_unload_kms */
>> --
>> 2.9.5
>> 
>> ___
>> amd-gfx mailing list
>> amd-gfx@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 5/7] drm/amdgpu: don't disable MSI for GPU virtual function

2017-10-24 Thread Ding, Pixel
I tried, but it still fails. Meanwhile I also think it’s not good to disable the device 
here, since we only redo amdgpu_device_init for the exclusive mode timeout.

— 
Sincerely Yours,
Pixel








On 24/10/2017, 11:03 AM, "Liu, Monk" <monk@amd.com> wrote:

>Can you try calling pci_disable_device() once you find that init failed, and make sure 
>it is called before re-init?
>
>
>-----Original Message-----
>From: Ding, Pixel 
>Sent: October 24, 2017 9:31
>To: Liu, Monk <monk@amd.com>; amd-gfx@lists.freedesktop.org; Xiao, Jack 
><jack.x...@amd.com>
>Cc: Sun, Gary <gary@amd.com>; Li, Bingley <bingley...@amd.com>
>Subject: Re: [PATCH 5/7] drm/amdgpu: don't disable MSI for GPU virtual function
>
>Tested with 5248e3d9; however, the issue is still reproduced in the reinit case.
>
>+Jack,
>To bypass MSI enable/disable for reinit, any comment?
>— 
>Sincerely Yours,
>Pixel
>
>
>
>
>
>
>
>On 23/10/2017, 6:57 PM, "Liu, Monk" <monk@amd.com> wrote:
>
>>Please check commit "5248e3d9"; your issue should already be fixed by that 
>>patch. Please verify.
>>
>>-----Original Message-----
>>From: amd-gfx [mailto:amd-gfx-boun...@lists.freedesktop.org] On Behalf Of 
>>Pixel Ding
>>Sent: October 23, 2017 18:04
>>To: amd-gfx@lists.freedesktop.org
>>Cc: Sun, Gary <gary@amd.com>; Ding, Pixel <pixel.d...@amd.com>; Li, 
>>Bingley <bingley...@amd.com>
>>Subject: [PATCH 5/7] drm/amdgpu: don't disable MSI for GPU virtual function
>>
>>From: pding <pixel.d...@amd.com>
>>
>>After calling pci_disable_msi() and pci_enable_msi(), VF can't receive 
>>interrupt anymore. This may introduce problems in module reloading or 
>>retrying init.
>>
>>Signed-off-by: pding <pixel.d...@amd.com>
>>---
>> drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c | 5 ++---
>> 1 file changed, 2 insertions(+), 3 deletions(-)
>>
>>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c 
>>b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
>>index c2d8255..a3314b5 100644
>>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
>>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
>>@@ -229,8 +229,7 @@ int amdgpu_irq_init(struct amdgpu_device *adev)
>>  adev->irq.msi_enabled = false;
>> 
>>  if (amdgpu_msi_ok(adev)) {
>>- int ret = pci_enable_msi(adev->pdev);
>>- if (!ret) {
>>+ if (adev->pdev->msi_enabled || !pci_enable_msi(adev->pdev)) {
>>  adev->irq.msi_enabled = true;
>>  INIT_DEV_INFO(adev->dev, "amdgpu: using MSI.\n");
>>  }
>>@@ -280,7 +279,7 @@ void amdgpu_irq_fini(struct amdgpu_device *adev)
>>  if (adev->irq.installed) {
>>  drm_irq_uninstall(adev->ddev);
>>  adev->irq.installed = false;
>>- if (adev->irq.msi_enabled)
>>+ if (adev->irq.msi_enabled && !amdgpu_sriov_vf(adev))
>>  pci_disable_msi(adev->pdev);
>>  flush_work(&adev->hotplug_work);
>>  cancel_work_sync(&adev->reset_work);
>>--
>>2.9.5
>>
>>___
>>amd-gfx mailing list
>>amd-gfx@lists.freedesktop.org
>>https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 5/7] drm/amdgpu: don't disable MSI for GPU virtual function

2017-10-23 Thread Ding, Pixel
Sorry for the misunderstanding. I will test with a similar fix later, instead of 
using 5248e3d9 directly.

—
Sincerely Yours,
Pixel







On 24/10/2017, 9:31 AM, "Ding, Pixel" <pixel.d...@amd.com> wrote:

>Tested with 5248e3d9; however, the issue is still reproduced in the reinit case.
>
>+Jack,
>To bypass MSI enable/disable for reinit, any comment?
>— 
>Sincerely Yours,
>Pixel
>
>
>
>
>
>
>
>On 23/10/2017, 6:57 PM, "Liu, Monk" <monk@amd.com> wrote:
>
>>Please check commit "5248e3d9"; your issue should already be fixed by that 
>>patch. Please verify.
>>
>>-----Original Message-----
>>From: amd-gfx [mailto:amd-gfx-boun...@lists.freedesktop.org] On Behalf Of 
>>Pixel Ding
>>Sent: October 23, 2017 18:04
>>To: amd-gfx@lists.freedesktop.org
>>Cc: Sun, Gary <gary@amd.com>; Ding, Pixel <pixel.d...@amd.com>; Li, 
>>Bingley <bingley...@amd.com>
>>Subject: [PATCH 5/7] drm/amdgpu: don't disable MSI for GPU virtual function
>>
>>From: pding <pixel.d...@amd.com>
>>
>>After calling pci_disable_msi() and pci_enable_msi(), VF can't receive 
>>interrupt anymore. This may introduce problems in module reloading or 
>>retrying init.
>>
>>Signed-off-by: pding <pixel.d...@amd.com>
>>---
>> drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c | 5 ++---
>> 1 file changed, 2 insertions(+), 3 deletions(-)
>>
>>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c 
>>b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
>>index c2d8255..a3314b5 100644
>>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
>>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
>>@@ -229,8 +229,7 @@ int amdgpu_irq_init(struct amdgpu_device *adev)
>>  adev->irq.msi_enabled = false;
>> 
>>  if (amdgpu_msi_ok(adev)) {
>>- int ret = pci_enable_msi(adev->pdev);
>>- if (!ret) {
>>+ if (adev->pdev->msi_enabled || !pci_enable_msi(adev->pdev)) {
>>  adev->irq.msi_enabled = true;
>>  INIT_DEV_INFO(adev->dev, "amdgpu: using MSI.\n");
>>  }
>>@@ -280,7 +279,7 @@ void amdgpu_irq_fini(struct amdgpu_device *adev)
>>  if (adev->irq.installed) {
>>  drm_irq_uninstall(adev->ddev);
>>  adev->irq.installed = false;
>>- if (adev->irq.msi_enabled)
>>+ if (adev->irq.msi_enabled && !amdgpu_sriov_vf(adev))
>>  pci_disable_msi(adev->pdev);
>>  flush_work(&adev->hotplug_work);
>>  cancel_work_sync(&adev->reset_work);
>>--
>>2.9.5
>>
>>___
>>amd-gfx mailing list
>>amd-gfx@lists.freedesktop.org
>>https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 6/7] drm/amdgpu/virt: add wait_reset virt ops

2017-10-23 Thread Ding, Pixel
This is for retry init.

If the driver fails before late_init, the IRQ handler is not registered yet, so we 
need a way to know whether the FLR is done at this point. I think we can also 
leverage this interface to handle some special cases in the future. Any concerns 
about it?
— 
Sincerely Yours,
Pixel








On 23/10/2017, 7:01 PM, "Liu, Monk" <monk@amd.com> wrote:

>I don't see why this is a necessary patch; the driver already has an implementation to 
>check whether VF FLR is completed or not, see "xgpu_ai/vi_mailbox_flr_work()".
>
>The driver won't do a GPU reset until this function receives the NOTIFICATION_CMPL 
>message.
>
>Do you have any particular reason to add this wait_reset? If so, please send 
>out the patch that uses this interface. 
>
>BR Monk
>
>-----Original Message-----
>From: amd-gfx [mailto:amd-gfx-boun...@lists.freedesktop.org] On Behalf Of 
>Pixel Ding
>Sent: October 23, 2017 18:04
>To: amd-gfx@lists.freedesktop.org
>Cc: Sun, Gary <gary@amd.com>; Ding, Pixel <pixel.d...@amd.com>; Li, 
>Bingley <bingley...@amd.com>
>Subject: [PATCH 6/7] drm/amdgpu/virt: add wait_reset virt ops
>
>From: pding <pixel.d...@amd.com>
>
>Driver can use this interface to check if there's a function level reset done 
>in hypervisor.
>
>Signed-off-by: pding <pixel.d...@amd.com>
>---
> drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 16   
> drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h |  2 ++
> drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c|  1 +
> drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c|  6 ++
> 4 files changed, 25 insertions(+)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c 
>b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
>index 33dac7e..6a4a901 100644
>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
>@@ -231,6 +231,22 @@ int amdgpu_virt_reset_gpu(struct amdgpu_device *adev)  }
> 
> /**
>+ * amdgpu_virt_wait_reset() - wait for reset gpu completed
>+ * @amdgpu:   amdgpu device.
>+ * Wait for GPU reset completed.
>+ * Return: Zero if reset success, otherwise will return error.
>+ */
>+int amdgpu_virt_wait_reset(struct amdgpu_device *adev) {
>+  struct amdgpu_virt *virt = &adev->virt;
>+
>+  if (!virt->ops || !virt->ops->wait_reset)
>+  return -EINVAL;
>+
>+  return virt->ops->wait_reset(adev);
>+}
>+
>+/**
>  * amdgpu_virt_alloc_mm_table() - alloc memory for mm table
>  * @amdgpu:   amdgpu device.
>  * MM table is used by UVD and VCE for its initialization diff --git 
> a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
>index 81efb9d..d149aca 100644
>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
>@@ -55,6 +55,7 @@ struct amdgpu_virt_ops {
>   int (*req_full_gpu)(struct amdgpu_device *adev, bool init);
>   int (*rel_full_gpu)(struct amdgpu_device *adev, bool init);
>   int (*reset_gpu)(struct amdgpu_device *adev);
>+  int (*wait_reset)(struct amdgpu_device *adev);
>   void (*trans_msg)(struct amdgpu_device *adev, u32 req, u32 data1, u32 
> data2, u32 data3);  };
> 
>@@ -286,6 +287,7 @@ void amdgpu_virt_kiq_wreg(struct amdgpu_device *adev, 
>uint32_t reg, uint32_t v);  int amdgpu_virt_request_full_gpu(struct 
>amdgpu_device *adev, bool init);  int amdgpu_virt_release_full_gpu(struct 
>amdgpu_device *adev, bool init);  int amdgpu_virt_reset_gpu(struct 
>amdgpu_device *adev);
>+int amdgpu_virt_wait_reset(struct amdgpu_device *adev);
> int amdgpu_sriov_gpu_reset(struct amdgpu_device *adev, struct amdgpu_job 
> *job);  int amdgpu_virt_alloc_mm_table(struct amdgpu_device *adev);  void 
> amdgpu_virt_free_mm_table(struct amdgpu_device *adev); diff --git 
> a/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c 
> b/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
>index b4906d2..f91aab3 100644
>--- a/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
>+++ b/drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c
>@@ -353,5 +353,6 @@ const struct amdgpu_virt_ops xgpu_ai_virt_ops = {
>   .req_full_gpu   = xgpu_ai_request_full_gpu_access,
>   .rel_full_gpu   = xgpu_ai_release_full_gpu_access,
>   .reset_gpu = xgpu_ai_request_reset,
>+  .wait_reset = NULL,
>   .trans_msg = xgpu_ai_mailbox_trans_msg,  }; diff --git 
> a/drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c 
> b/drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c
>index c25a831..27b03c7 100644
>--- a/drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c
>+++ b/drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c
>@@ -458,6 +458,11 @@ static int xgpu_vi_request_reset(struct amdgpu_device 
>*adev)
>   return xgpu_vi_send_access_requests(adev, IDH_REQ_GPU_RESET_ACCESS);  }
> 
>+static int xgpu_vi_wait_reset_cmpl(struct amdg

Re: [PATCH 5/7] drm/amdgpu: don't disable MSI for GPU virtual function

2017-10-23 Thread Ding, Pixel
Tested with 5248e3d9; however, the issue is still reproduced in the reinit case.

+Jack,
Any comment on bypassing the MSI enable/disable for reinit?
— 
Sincerely Yours,
Pixel







On 23/10/2017, 6:57 PM, "Liu, Monk" <monk@amd.com> wrote:

>Please check commit "5248e3d9", your issue should already be fixed by that 
>patch, please verify
>
>-Original Message-
>From: amd-gfx [mailto:amd-gfx-boun...@lists.freedesktop.org] On Behalf Of 
>Pixel Ding
>Sent: October 23, 2017 18:04
>To: amd-gfx@lists.freedesktop.org
>Cc: Sun, Gary <gary@amd.com>; Ding, Pixel <pixel.d...@amd.com>; Li, 
>Bingley <bingley...@amd.com>
>Subject: [PATCH 5/7] drm/amdgpu: don't disable MSI for GPU virtual function
>
>From: pding <pixel.d...@amd.com>
>
>After calling pci_disable_msi() and pci_enable_msi(), VF can't receive 
>interrupt anymore. This may introduce problems in module reloading or retrying 
>init.
>
>Signed-off-by: pding <pixel.d...@amd.com>
>---
> drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c | 5 ++---
> 1 file changed, 2 insertions(+), 3 deletions(-)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c 
>b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
>index c2d8255..a3314b5 100644
>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c
>@@ -229,8 +229,7 @@ int amdgpu_irq_init(struct amdgpu_device *adev)
>   adev->irq.msi_enabled = false;
> 
>   if (amdgpu_msi_ok(adev)) {
>-  int ret = pci_enable_msi(adev->pdev);
>-  if (!ret) {
>+  if (adev->pdev->msi_enabled || !pci_enable_msi(adev->pdev)) {
>   adev->irq.msi_enabled = true;
>   dev_info(adev->dev, "amdgpu: using MSI.\n");
>   }
>@@ -280,7 +279,7 @@ void amdgpu_irq_fini(struct amdgpu_device *adev)
>   if (adev->irq.installed) {
>   drm_irq_uninstall(adev->ddev);
>   adev->irq.installed = false;
>-  if (adev->irq.msi_enabled)
>+  if (adev->irq.msi_enabled && !amdgpu_sriov_vf(adev))
>   pci_disable_msi(adev->pdev);
>   flush_work(&adev->hotplug_work);
>   cancel_work_sync(&adev->reset_work);
>--
>2.9.5
>
>___
>amd-gfx mailing list
>amd-gfx@lists.freedesktop.org
>https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 1/3] drm/amdgpu: always consider virualised device for checking post

2017-10-18 Thread Ding, Pixel
Hi Alex,

This change only affects virtualization and doesn’t touch any bare-metal code 
path, and the virtualization tests passed.

As we discussed on the v2 patch, this merges the two BIOS post checking 
functions. Can I have your review for v1? Or would you prefer to keep the 
amdgpu_vpost_needed() function name?

— 
Sincerely Yours,
Pixel







On 18/10/2017, 10:08 PM, "Deucher, Alexander" <alexander.deuc...@amd.com> wrote:

>> -Original Message-
>> From: amd-gfx [mailto:amd-gfx-boun...@lists.freedesktop.org] On Behalf
>> Of Christian König
>> Sent: Wednesday, October 18, 2017 3:23 AM
>> To: Ding, Pixel; Liu, Monk; amd-gfx@lists.freedesktop.org
>> Cc: Sun, Gary; Li, Bingley
>> Subject: Re: [PATCH 1/3] drm/amdgpu: always consider virualised device for
>> checking post
>> 
>> I've already did, but honestly have no idea what you do here.
>> 
>> ASIC post is not something I've every worked on.
>> 
>> It looks odd that you add/remove some static keyword here, but I can't
>> judge the technical correctness.
>> 
>> Monk, Alex what do you think of this?
>
>I think we should fix the issue in amdgpu_bios.c or sort out/merge 
>amdgpu_need_post and amdgpu_vpost_needed.  This change is really unclear.
>
>Alex
>
>> 
>> Sorry,
>> Christian.
>> 
>> On 18.10.2017 at 09:19, Ding, Pixel wrote:
>> > Hi Christian,
>> >
>> > Would you please take a look at this change? It’s to unify the VBIOS post
>> checking.
>> > —
>> > Sincerely Yours,
>> > Pixel
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> > On 18/10/2017, 10:25 AM, "Ding, Pixel" <pixel.d...@amd.com> wrote:
>> >
>> >> Hi all,
>> >>
>> >> Could someone review this patch?
>> >>
>> >> —
>> >> Sincerely Yours,
>> >> Pixel
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >>
>> >> On 17/10/2017, 4:06 PM, "amd-gfx on behalf of Ding, Pixel" <amd-gfx-boun...@lists.freedesktop.org on behalf of pixel.d...@amd.com> wrote:
>> >>
>> >>> you can see the difference of function amdgpu_need_post. Generally
>> speaking, there were 2 functions to check VBIOS posting, one considers VF
>> and passthru while the other doesn’t. In fact we should always consider VF
>> and PT for checking, right? For example, this checking here believe VF needs
>> a posting because SCRATCH registers are not the expected value. Is it clear?
>> >>> —
>> >>> Sincerely Yours,
>> >>> Pixel
>> >>>
>> >>>
>> >>>
>> >>>
>> >>>
>> >>>
>> >>>
>> >>>
>> >>> On 17/10/2017, 5:00 PM, "Liu, Monk" <monk@amd.com> wrote:
>> >>>
>> >>> >From the patch itself I still couldn't tell the difference
>> >>>> -Original Message-
>> >>>> From: Ding, Pixel
>> >>>> Sent: October 17, 2017 15:54
>> >>>> To: Liu, Monk <monk@amd.com>; amd-gfx@lists.freedesktop.org;
>> Koenig, Christian <christian.koe...@amd.com>
>> >>>> Cc: Li, Bingley <bingley...@amd.com>; Sun, Gary
>> <gary@amd.com>
>> >>>> Subject: Re: [PATCH 1/3] drm/amdgpu: always consider virualised
>> device for checking post
>> >>>>
>> >>>> It fixes a issue hidden in:
>> >>>>
>> >>>> 95 static bool igp_read_bios_from_vram(struct amdgpu_device *adev)
>> >>>> 96 {
>> >>>> 97  uint8_t __iomem *bios;
>> >>>> 98  resource_size_t vram_base;
>> >>>> 99  resource_size_t size = 256 * 1024; /* ??? */
>> >>>> 100
>> >>>> 101 if (!(adev->flags & AMD_IS_APU))
>> >>>> 102 if (amdgpu_need_post(adev))
>> >>>> 103 return false;
>> >>>>
>> >>>>
>> >>>> This makes bios reading fallback to SMC INDEX/DATA register case.
>> >>>>
>> >>>> —
>> >>>> Sincerely Yours,
>> >>>> Pixel
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >

Re: [PATCH] drm/amdgpu: drm/amdgpu: always consider virualised device for checking post (v2)

2017-10-18 Thread Ding, Pixel
Thanks Alex. Some comments inline.
— 
Sincerely Yours,
Pixel








On 19/10/2017, 10:34 AM, "Alex Deucher"  wrote:

>On Wed, Oct 18, 2017 at 9:44 PM, Pixel Ding  wrote:
>> From: pding 
>>
>> The post checking on scratch registers isn't reliable for virtual function.
>>
>> v2: only change in IGP reading bios.
>
>
>Subject has "drm/amdgpu: drm/amdgpu: " drop one of them.
>[Pixel] get it.
>
>
>>
>> Signed-off-by: pding 
>> ---
>>  drivers/gpu/drm/amd/amdgpu/amdgpu.h| 1 +
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_bios.c   | 2 +-
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +-
>>  3 files changed, 3 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> index 0280ae5..caabc5b 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> @@ -1840,6 +1840,7 @@ amdgpu_get_sdma_instance(struct amdgpu_ring *ring)
>>  /* Common functions */
>>  int amdgpu_gpu_reset(struct amdgpu_device *adev);
>>  bool amdgpu_need_backup(struct amdgpu_device *adev);
>> +bool amdgpu_vpost_needed(struct amdgpu_device *adev);
>>  void amdgpu_pci_config_reset(struct amdgpu_device *adev);
>>  bool amdgpu_need_post(struct amdgpu_device *adev);
>
>amdgpu_need_post can be dropped now.
[Pixel] Some other functions still invoke amdgpu_need_post(); it can only be 
dropped if we change them all to use amdgpu_vpost_needed(). That’s what the v1 
patch did, but it was confusing. The only difference is that the original patch 
keeps the amdgpu_need_post() function name instead of amdgpu_vpost_needed(), 
which I thought was more generic.
>
>>  void amdgpu_update_display_priority(struct amdgpu_device *adev);
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_bios.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_bios.c
>> index c21adf6..25f43eb 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_bios.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_bios.c
>> @@ -99,7 +99,7 @@ static bool igp_read_bios_from_vram(struct amdgpu_device 
>> *adev)
>> resource_size_t size = 256 * 1024; /* ??? */
>>
>> if (!(adev->flags & AMD_IS_APU))
>> -   if (amdgpu_need_post(adev))
>> +   if (amdgpu_vpost_needed(adev))
>> return false;
>>
>> adev->bios = NULL;
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> index a601d87..098cd44 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> @@ -764,7 +764,7 @@ bool amdgpu_need_post(struct amdgpu_device *adev)
>>
>>  }
>>
>> -static bool amdgpu_vpost_needed(struct amdgpu_device *adev)
>> +bool amdgpu_vpost_needed(struct amdgpu_device *adev)
>
>Please also make amdgpu_need_post() static now.
>
>Is there a way we can just merge amdgpu_need_post() into
>amdgpu_vpost_needed()?  For bare metal it should be fine.  Might need
>some logic adjustments for sr-iov amdgpu_device_resume().  That can be
>a follow on patch if you want.
[Pixel] The original patch replaces all amdgpu_need_post() calls with 
amdgpu_vpost_needed(), then renames amdgpu_vpost_needed() to 
amdgpu_need_post(), and the old amdgpu_need_post() to a static 
amdgpu_check_post(). After these transforms it looked unclear. By now I think 
we can go back to that one.

>
>Alex
>
>
>>  {
>> if (amdgpu_sriov_vf(adev))
>> return false;
>> --
>> 2.9.5
>>


Re: [PATCH] drm/amdgpu: drm/amdgpu: always consider virualised device for checking post (v2)

2017-10-18 Thread Ding, Pixel
Hi Alex,

This is the v2 patch as you suggested; it only fixes the issue found in 
amdgpu_bios.c.

Please review, thanks.

— 
Sincerely Yours,
Pixel








On 19/10/2017, 9:44 AM, "amd-gfx on behalf of Pixel Ding" 
 wrote:

>From: pding 
>
>The post checking on scratch registers isn't reliable for virtual function.
>
>v2: only change in IGP reading bios.
>
>Signed-off-by: pding 
>---
> drivers/gpu/drm/amd/amdgpu/amdgpu.h| 1 +
> drivers/gpu/drm/amd/amdgpu/amdgpu_bios.c   | 2 +-
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +-
> 3 files changed, 3 insertions(+), 2 deletions(-)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
>b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>index 0280ae5..caabc5b 100644
>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>@@ -1840,6 +1840,7 @@ amdgpu_get_sdma_instance(struct amdgpu_ring *ring)
> /* Common functions */
> int amdgpu_gpu_reset(struct amdgpu_device *adev);
> bool amdgpu_need_backup(struct amdgpu_device *adev);
>+bool amdgpu_vpost_needed(struct amdgpu_device *adev);
> void amdgpu_pci_config_reset(struct amdgpu_device *adev);
> bool amdgpu_need_post(struct amdgpu_device *adev);
> void amdgpu_update_display_priority(struct amdgpu_device *adev);
>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_bios.c 
>b/drivers/gpu/drm/amd/amdgpu/amdgpu_bios.c
>index c21adf6..25f43eb 100644
>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_bios.c
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_bios.c
>@@ -99,7 +99,7 @@ static bool igp_read_bios_from_vram(struct amdgpu_device 
>*adev)
>   resource_size_t size = 256 * 1024; /* ??? */
> 
>   if (!(adev->flags & AMD_IS_APU))
>-  if (amdgpu_need_post(adev))
>+  if (amdgpu_vpost_needed(adev))
>   return false;
> 
>   adev->bios = NULL;
>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
>b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>index a601d87..098cd44 100644
>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>@@ -764,7 +764,7 @@ bool amdgpu_need_post(struct amdgpu_device *adev)
> 
> }
> 
>-static bool amdgpu_vpost_needed(struct amdgpu_device *adev)
>+bool amdgpu_vpost_needed(struct amdgpu_device *adev)
> {
>   if (amdgpu_sriov_vf(adev))
>   return false;
>-- 
>2.9.5
>


Re: [PATCH 1/3] drm/amdgpu: always consider virualised device for checking post

2017-10-18 Thread Ding, Pixel
Sorry for the confusion, but I still believe this is the right way to go.

amdgpu_need_post and amdgpu_vpost_needed both check whether a VBIOS post is 
needed, but amdgpu_need_post doesn’t consider VF/PT devices. They were both in 
use.

This patch unifies these 2 functions into a single one that always considers 
the VF/PT devices.

So amdgpu_vpost_needed is renamed to amdgpu_need_post, and the previous 
amdgpu_need_post becomes a static function.
— 
Sincerely Yours,
Pixel







On 18/10/2017, 3:23 PM, "amd-gfx on behalf of Christian König" 
<amd-gfx-boun...@lists.freedesktop.org on behalf of christian.koe...@amd.com> 
wrote:

>I've already did, but honestly have no idea what you do here.
>
>ASIC post is not something I've every worked on.
>
>It looks odd that you add/remove some static keyword here, but I can't 
>judge the technical correctness.
>
>Monk, Alex what do you think of this?
>
>Sorry,
>Christian.
>
>On 18.10.2017 at 09:19, Ding, Pixel wrote:
>> Hi Christian,
>>
>> Would you please take a look at this change? It’s to unify the VBIOS post 
>> checking.
>> —
>> Sincerely Yours,
>> Pixel
>>
>>
>>
>>
>>
>>
>>
>>
>> On 18/10/2017, 10:25 AM, "Ding, Pixel" <pixel.d...@amd.com> wrote:
>>
>>> Hi all,
>>>
>>> Could someone review this patch?
>>>
>>> —
>>> Sincerely Yours,
>>> Pixel
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> On 17/10/2017, 4:06 PM, "amd-gfx on behalf of Ding, Pixel" 
>>> <amd-gfx-boun...@lists.freedesktop.org on behalf of pixel.d...@amd.com> 
>>> wrote:
>>>
>>>> you can see the difference of function amdgpu_need_post. Generally 
>>>> speaking, there were 2 functions to check VBIOS posting, one considers VF 
>>>> and passthru while the other doesn’t. In fact we should always consider VF 
>>>> and PT for checking, right? For example, this checking here believe VF 
>>>> needs a posting because SCRATCH registers are not the expected value. Is 
>>>> it clear?
>>>> —
>>>> Sincerely Yours,
>>>> Pixel
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On 17/10/2017, 5:00 PM, "Liu, Monk" <monk@amd.com> wrote:
>>>>
>>>> >From the patch itself I still couldn't tell the difference 
>>>>> -Original Message-
>>>>> From: Ding, Pixel
>>>>> Sent: October 17, 2017 15:54
>>>>> To: Liu, Monk <monk@amd.com>; amd-gfx@lists.freedesktop.org; Koenig, 
>>>>> Christian <christian.koe...@amd.com>
>>>>> Cc: Li, Bingley <bingley...@amd.com>; Sun, Gary <gary@amd.com>
>>>>> Subject: Re: [PATCH 1/3] drm/amdgpu: always consider virualised device 
>>>>> for checking post
>>>>>
>>>>> It fixes a issue hidden in:
>>>>>
>>>>> 95 static bool igp_read_bios_from_vram(struct amdgpu_device *adev)
>>>>> 96 {
>>>>> 97uint8_t __iomem *bios;
>>>>> 98resource_size_t vram_base;
>>>>> 99resource_size_t size = 256 * 1024; /* ??? */
>>>>> 100
>>>>> 101   if (!(adev->flags & AMD_IS_APU))
>>>>> 102   if (amdgpu_need_post(adev))
>>>>> 103   return false;
>>>>>
>>>>>
>>>>> This makes bios reading fallback to SMC INDEX/DATA register case.
>>>>>
>>>>> —
>>>>> Sincerely Yours,
>>>>> Pixel
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On 17/10/2017, 3:48 PM, "Liu, Monk" <monk@amd.com> wrote:
>>>>>
>>>>>> I don't understand how this patch works??? Looks like just rename 
>>>>>> vpost_needed to check_post
>>>>>>
>>>>>> -Original Message-
>>>>>> From: Pixel Ding [mailto:pixel.d...@amd.com]
>>>>>> Sent: October 17, 2017 14:38
>>>>>> To: amd-gfx@lists.freedesktop.org; Liu, Monk <monk@amd.com>; Koenig, 
>>>>>> Christian <christian.koe...@amd.com>
>>>>>> Cc: Li, Bingley <bingley...@amd.com>; Sun, Gary <gary@amd.com>

Re: [PATCH 1/3] drm/amdgpu: always consider virualised device for checking post

2017-10-18 Thread Ding, Pixel
Hi Christian,

Would you please take a look at this change? It’s to unify the VBIOS post 
checking.
— 
Sincerely Yours,
Pixel








On 18/10/2017, 10:25 AM, "Ding, Pixel" <pixel.d...@amd.com> wrote:

>Hi all,
>
>Could someone review this patch? 
>
>— 
>Sincerely Yours,
>Pixel
>
>
>
>
>
>
>
>On 17/10/2017, 4:06 PM, "amd-gfx on behalf of Ding, Pixel" 
><amd-gfx-boun...@lists.freedesktop.org on behalf of pixel.d...@amd.com> wrote:
>
>>you can see the difference of function amdgpu_need_post. Generally speaking, 
>>there were 2 functions to check VBIOS posting, one considers VF and passthru 
>>while the other doesn’t. In fact we should always consider VF and PT for 
>>checking, right? For example, this checking here believe VF needs a posting 
>>because SCRATCH registers are not the expected value. Is it clear?
>>— 
>>Sincerely Yours,
>>Pixel
>>
>>
>>
>>
>>
>>
>>
>>
>>On 17/10/2017, 5:00 PM, "Liu, Monk" <monk@amd.com> wrote:
>>
>>>From the patch itself I still couldn't tell the difference 
>>>
>>>-Original Message-
>>>From: Ding, Pixel 
>>>Sent: October 17, 2017 15:54
>>>To: Liu, Monk <monk@amd.com>; amd-gfx@lists.freedesktop.org; Koenig, 
>>>Christian <christian.koe...@amd.com>
>>>Cc: Li, Bingley <bingley...@amd.com>; Sun, Gary <gary@amd.com>
>>>Subject: Re: [PATCH 1/3] drm/amdgpu: always consider virualised device for 
>>>checking post
>>>
>>>It fixes a issue hidden in:
>>>
>>>95 static bool igp_read_bios_from_vram(struct amdgpu_device *adev)
>>> 96 {
>>> 97  uint8_t __iomem *bios;
>>> 98  resource_size_t vram_base;
>>> 99  resource_size_t size = 256 * 1024; /* ??? */
>>>100
>>>101  if (!(adev->flags & AMD_IS_APU))
>>>102  if (amdgpu_need_post(adev))
>>>103  return false;
>>>
>>>
>>>This makes bios reading fallback to SMC INDEX/DATA register case.
>>>
>>>— 
>>>Sincerely Yours,
>>>Pixel
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>On 17/10/2017, 3:48 PM, "Liu, Monk" <monk@amd.com> wrote:
>>>
>>>>I don't understand how this patch works??? Looks like just rename 
>>>>vpost_needed to check_post
>>>>
>>>>-Original Message-
>>>>From: Pixel Ding [mailto:pixel.d...@amd.com] 
>>>>Sent: October 17, 2017 14:38
>>>>To: amd-gfx@lists.freedesktop.org; Liu, Monk <monk@amd.com>; Koenig, 
>>>>Christian <christian.koe...@amd.com>
>>>>Cc: Li, Bingley <bingley...@amd.com>; Sun, Gary <gary@amd.com>; Ding, 
>>>>Pixel <pixel.d...@amd.com>
>>>>Subject: [PATCH 1/3] drm/amdgpu: always consider virualised device for 
>>>>checking post
>>>>
>>>>From: pding <pixel.d...@amd.com>
>>>>
>>>>The post checking on scratch registers isn't reliable for virtual function.
>>>>
>>>>Signed-off-by: pding <pixel.d...@amd.com>
>>>>---
>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 8 
>>>> 1 file changed, 4 insertions(+), 4 deletions(-)
>>>>
>>>>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
>>>>b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>index 683965b..ab8f0d6 100644
>>>>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>>@@ -669,7 +669,7 @@ void amdgpu_gart_location(struct amdgpu_device *adev, 
>>>>struct amdgpu_mc *mc)
>>>>  * or post is needed if  hw reset is performed.
>>>>  * Returns true if need or false if not.
>>>>  */
>>>>-bool amdgpu_need_post(struct amdgpu_device *adev)
>>>>+static bool amdgpu_check_post(struct amdgpu_device *adev)
>>>> {
>>>>uint32_t reg;
>>>> 
>>>>@@ -692,7 +692,7 @@ bool amdgpu_need_post(struct amdgpu_device *adev)
>>>> 
>>>> }
>>>> 
>>>>-static bool amdgpu_vpost_needed(struct amdgpu_device *adev)
>>>>+bool amdgpu_need_post(struct amdgpu_device *adev)
>>>> {
>>>>if (amdgpu_sriov_vf(adev))
>>>>return false;
>>>>@@ -716,7 +716,7 @@ static bool amdgpu_vpost_needed(struct amdgpu_device 
>>>>*adev)
>>>>return true;
>>>>}
>>>>}
>>>>-   return amdgpu_need_post(adev);
>>>>+   return amdgpu_check_post(adev);
>>>> }
>>>> 
>>>> /**
>>>>@@ -2208,7 +2208,7 @@ int amdgpu_device_init(struct amdgpu_device *adev,
>>>>amdgpu_device_detect_sriov_bios(adev);
>>>> 
>>>>/* Post card if necessary */
>>>>-   if (amdgpu_vpost_needed(adev)) {
>>>>+   if (amdgpu_need_post(adev)) {
>>>>if (!adev->bios) {
>>>>dev_err(adev->dev, "no vBIOS found\n");
>>>>amdgpu_vf_error_put(AMDGIM_ERROR_VF_NO_VBIOS, 0, 0);
>>>>--
>>>>2.9.5
>>>>


Re: [PATCH 3/3] drm/amdgpu: busywait KIQ register accessing v3

2017-10-18 Thread Ding, Pixel
Thanks Christian :) I will change the return value as you suggested.
— 
Sincerely Yours,
Pixel








On 18/10/2017, 3:05 PM, "Koenig, Christian"  wrote:

>On 18.10.2017 at 02:56, Pixel Ding wrote:
>> From: pding 
>>
>> Register accessing is performed when IRQ is disabled. Never sleep in
>> this function.
>>
>> Known issue: dead sleep in many use cases of index/data registers.
>>
>> v2: wrap polling fence functions. don't trigger IRQ for polling in
>> case of wrongly fence signal.
>>
>> v3: handle wrap round gracefully. add comments
>>
>> Signed-off-by: pding 
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu.h   |  2 +-
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c |  4 +-
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c |  4 +-
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c|  8 +---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 50 
>> +++
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c   |  2 +-
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h  |  4 ++
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c  | 30 ++
>>   8 files changed, 75 insertions(+), 29 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> index ca212ef..cc06e62 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> @@ -885,7 +885,7 @@ struct amdgpu_mec {
>>   struct amdgpu_kiq {
>>  u64 eop_gpu_addr;
>>  struct amdgpu_bo*eop_obj;
>> -struct mutexring_mutex;
>> +spinlock_t  ring_lock;
>>  struct amdgpu_ring  ring;
>>  struct amdgpu_irq_src   irq;
>>   };
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c
>> index 9d9965d..6d83573 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c
>> @@ -788,7 +788,7 @@ static int invalidate_tlbs_with_kiq(struct amdgpu_device 
>> *adev, uint16_t pasid)
>>  struct dma_fence *f;
>>  struct amdgpu_ring *ring = &adev->gfx.kiq.ring;
>>   
>> -mutex_lock(&adev->gfx.kiq.ring_mutex);
>> +spin_lock(&adev->gfx.kiq.ring_lock);
>>  amdgpu_ring_alloc(ring, 12); /* fence + invalidate_tlbs package*/
>>  amdgpu_ring_write(ring, PACKET3(PACKET3_INVALIDATE_TLBS, 0));
>>  amdgpu_ring_write(ring,
>> @@ -796,7 +796,7 @@ static int invalidate_tlbs_with_kiq(struct amdgpu_device 
>> *adev, uint16_t pasid)
>>  PACKET3_INVALIDATE_TLBS_PASID(pasid));
>>  amdgpu_fence_emit(ring, &f);
>>  amdgpu_ring_commit(ring);
>> -mutex_unlock(&adev->gfx.kiq.ring_mutex);
>> +spin_unlock(&adev->gfx.kiq.ring_lock);
>>   
>>  r = dma_fence_wait(f, false);
>>  if (r)
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
>> index edbae19..c92217f 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
>> @@ -973,7 +973,7 @@ static int invalidate_tlbs_with_kiq(struct amdgpu_device 
>> *adev, uint16_t pasid)
>>  struct dma_fence *f;
>>  struct amdgpu_ring *ring = &adev->gfx.kiq.ring;
>>   
>> -mutex_lock(&adev->gfx.kiq.ring_mutex);
>> +spin_lock(&adev->gfx.kiq.ring_lock);
>>  amdgpu_ring_alloc(ring, 12); /* fence + invalidate_tlbs package*/
>>  amdgpu_ring_write(ring, PACKET3(PACKET3_INVALIDATE_TLBS, 0));
>>  amdgpu_ring_write(ring,
>> @@ -983,7 +983,7 @@ static int invalidate_tlbs_with_kiq(struct amdgpu_device 
>> *adev, uint16_t pasid)
>>  PACKET3_INVALIDATE_TLBS_FLUSH_TYPE(2));
>>  amdgpu_fence_emit(ring, &f);
>>  amdgpu_ring_commit(ring);
>> -mutex_unlock(&adev->gfx.kiq.ring_mutex);
>> +spin_unlock(&adev->gfx.kiq.ring_lock);
>>   
>>  r = dma_fence_wait(f, false);
>>  if (r)
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> index ab8f0d6..1197274 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> @@ -109,10 +109,8 @@ uint32_t amdgpu_mm_rreg(struct amdgpu_device *adev, 
>> uint32_t reg,
>>   {
>>  uint32_t ret;
>>   
>> -if (!(acc_flags & AMDGPU_REGS_NO_KIQ) && amdgpu_sriov_runtime(adev)) {
>> -BUG_ON(in_interrupt());
>> +if (!(acc_flags & AMDGPU_REGS_NO_KIQ) && amdgpu_sriov_runtime(adev))
>>  return amdgpu_virt_kiq_rreg(adev, reg);
>> -}
>>   
>>  if ((reg * 4) < adev->rmmio_size && !(acc_flags & AMDGPU_REGS_IDX))
>>  ret = readl(((void __iomem *)adev->rmmio) + (reg * 4));
>> @@ -137,10 +135,8 @@ void amdgpu_mm_wreg(struct amdgpu_device *adev, 
>> uint32_t reg, uint32_t v,
>>  adev->last_mm_index = v;
>>  }
>>   
>> -if (!(acc_flags & AMDGPU_REGS_NO_KIQ) && 

Re: [PATCH 1/3] drm/amdgpu: always consider virualised device for checking post

2017-10-17 Thread Ding, Pixel
Hi all,

Could someone review this patch? 

— 
Sincerely Yours,
Pixel







On 17/10/2017, 4:06 PM, "amd-gfx on behalf of Ding, Pixel" 
<amd-gfx-boun...@lists.freedesktop.org on behalf of pixel.d...@amd.com> wrote:

>you can see the difference of function amdgpu_need_post. Generally speaking, 
>there were 2 functions to check VBIOS posting, one considers VF and passthru 
>while the other doesn’t. In fact we should always consider VF and PT for 
>checking, right? For example, this checking here believe VF needs a posting 
>because SCRATCH registers are not the expected value. Is it clear?
>— 
>Sincerely Yours,
>Pixel
>
>
>
>
>
>
>
>
>On 17/10/2017, 5:00 PM, "Liu, Monk" <monk@amd.com> wrote:
>
>>From the patch itself I still couldn't tell the difference 
>>
>>-Original Message-
>>From: Ding, Pixel 
>>Sent: October 17, 2017 15:54
>>To: Liu, Monk <monk@amd.com>; amd-gfx@lists.freedesktop.org; Koenig, 
>>Christian <christian.koe...@amd.com>
>>Cc: Li, Bingley <bingley...@amd.com>; Sun, Gary <gary@amd.com>
>>Subject: Re: [PATCH 1/3] drm/amdgpu: always consider virualised device for 
>>checking post
>>
>>It fixes a issue hidden in:
>>
>>95 static bool igp_read_bios_from_vram(struct amdgpu_device *adev)
>> 96 {
>> 97   uint8_t __iomem *bios;
>> 98   resource_size_t vram_base;
>> 99   resource_size_t size = 256 * 1024; /* ??? */
>>100
>>101   if (!(adev->flags & AMD_IS_APU))
>>102   if (amdgpu_need_post(adev))
>>103   return false;
>>
>>
>>This makes bios reading fallback to SMC INDEX/DATA register case.
>>
>>— 
>>Sincerely Yours,
>>Pixel
>>
>>
>>
>>
>>
>>
>>
>>
>>On 17/10/2017, 3:48 PM, "Liu, Monk" <monk@amd.com> wrote:
>>
>>>I don't understand how this patch works??? Looks like just rename 
>>>vpost_needed to check_post
>>>
>>>-Original Message-
>>>From: Pixel Ding [mailto:pixel.d...@amd.com] 
>>>Sent: October 17, 2017 14:38
>>>To: amd-gfx@lists.freedesktop.org; Liu, Monk <monk@amd.com>; Koenig, 
>>>Christian <christian.koe...@amd.com>
>>>Cc: Li, Bingley <bingley...@amd.com>; Sun, Gary <gary@amd.com>; Ding, 
>>>Pixel <pixel.d...@amd.com>
>>>Subject: [PATCH 1/3] drm/amdgpu: always consider virualised device for 
>>>checking post
>>>
>>>From: pding <pixel.d...@amd.com>
>>>
>>>The post checking on scratch registers isn't reliable for virtual function.
>>>
>>>Signed-off-by: pding <pixel.d...@amd.com>
>>>---
>>> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 8 
>>> 1 file changed, 4 insertions(+), 4 deletions(-)
>>>
>>>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
>>>b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>index 683965b..ab8f0d6 100644
>>>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>@@ -669,7 +669,7 @@ void amdgpu_gart_location(struct amdgpu_device *adev, 
>>>struct amdgpu_mc *mc)
>>>  * or post is needed if  hw reset is performed.
>>>  * Returns true if need or false if not.
>>>  */
>>>-bool amdgpu_need_post(struct amdgpu_device *adev)
>>>+static bool amdgpu_check_post(struct amdgpu_device *adev)
>>> {
>>> uint32_t reg;
>>> 
>>>@@ -692,7 +692,7 @@ bool amdgpu_need_post(struct amdgpu_device *adev)
>>> 
>>> }
>>> 
>>>-static bool amdgpu_vpost_needed(struct amdgpu_device *adev)
>>>+bool amdgpu_need_post(struct amdgpu_device *adev)
>>> {
>>> if (amdgpu_sriov_vf(adev))
>>> return false;
>>>@@ -716,7 +716,7 @@ static bool amdgpu_vpost_needed(struct amdgpu_device 
>>>*adev)
>>> return true;
>>> }
>>> }
>>>-return amdgpu_need_post(adev);
>>>+return amdgpu_check_post(adev);
>>> }
>>> 
>>> /**
>>>@@ -2208,7 +2208,7 @@ int amdgpu_device_init(struct amdgpu_device *adev,
>>> amdgpu_device_detect_sriov_bios(adev);
>>> 
>>> /* Post card if necessary */
>>>-if (amdgpu_vpost_needed(adev)) {
>>>+if (amdgpu_need_post(adev)) {
>>> if (!adev->bios) {
>>> dev_err(adev->dev, "no vBIOS found\n");
>>> amdgpu_vf_error_put(AMDGIM_ERROR_VF_NO_VBIOS, 0, 0);
>>>--
>>>2.9.5
>>>


Re: [PATCH 3/3] drm/amdgpu: busywait KIQ register accessing v2

2017-10-17 Thread Ding, Pixel
Thanks. Logically it can’t handle the extreme case where seq almost catches up 
with wait_seq across a wrap-around, where this trick breaks down; however, I 
understand that can’t happen in reality.

I will test and modify.

— 
Sincerely Yours,
Pixel







On 17/10/2017, 4:59 PM, "Koenig, Christian" <christian.koe...@amd.com> wrote:

>On 17.10.2017 at 10:38, Ding, Pixel wrote:
>> [SNIP]
>>>> +  if (seq >= wait_seq && wait_seq >= last_seq)
>>>> +  break;
>>>> +  if (seq <= last_seq &&
>>>> +  (wait_seq <= seq || wait_seq >= last_seq))
>>>> +  break;
>>> Why not just "(int32_t)(wait_seq - seq) > 0" ? IIRC that should be
>>> sufficient for a wrap around test.
>>> [pixel] seq could be larger (executed) or smaller (not yet) than wait_seq 
>>> even without a wrap round, I think you mean "(int32_t)(last_seq - seq) > 0”
>>> it actually can know there’s a wrap around, but it still doesn't know if 
>>> the waited seq is in the range between or not, we still need other 
>>> conditions. Code here is to identify cases to break as:
>> ===last_seq==wait_seq==seq===
>> ===wait_seq==seq==last_seq===
>> ===seq==last_seq==wait_seq===
>>
>> Does it make sense?
>
>No, let me explain a bit more. The full code I had in mind is this:
>
>do {
> seq = amdgpu_fence_read(ring)
>} while ((int32_t)(wait_seq - seq) > 0);
>
>This should handle the following cases:
>1. wait_seq is larger than seq, in this case we still need to wait.
>
>(wait_seq - seq) is larger than zero and the loop continues.
>
>2. wait_seq is smaller than seq, in this case the we can stop waiting.
>
>(wait_seq -seq) is smaller or equal to zero and the loop stops.
>
>3. wait_seq is much smaller than seq because of a wrap around, in this 
>case we still need to wait.
>
>(wait_seq - seq) is larger than zero because the subtraction wraps 
>around the numeric range as well and the loop will continue.
>
>4. wait_seq is much larger than seq because of a wrap around, in this 
>case we can stop waiting.
>
>(wait_seq - seq) is smaller than zero because the cast to a signed 
>number makes it negative and the loop stops.
>
>This is rather common code for wrap around testing and you don't need to 
>fiddle with last_seq at all here.
>
>If you want you can also use the __dma_fence_is_later() helper function 
>which implements exactly that test as well.
>
>Regards,
>Christian.


Re: [PATCH 3/3] drm/amdgpu: busywait KIQ register accessing v2

2017-10-17 Thread Ding, Pixel

On 17/10/2017, 4:20 PM, "Koenig, Christian"  wrote:

>Am 17.10.2017 um 08:37 schrieb Pixel Ding:
>> From: pding 
>>
>> Register accessing is performed when IRQ is disabled. Never sleep in
>> this function.
>>
>> Known issue: dead sleep in many use cases of index/data registers.
>>
>> v2: wrap polling fence functions. don't trigger IRQ for polling in
>> case of wrongly fence signal.
>>
>> Signed-off-by: pding 
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu.h   |  2 +-
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c |  4 +-
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c |  4 +-
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c|  8 +---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 53 
>> +++
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c   |  2 +-
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h  |  4 ++
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c  | 30 ++---
>>   8 files changed, 78 insertions(+), 29 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> index ca212ef..cc06e62 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> @@ -885,7 +885,7 @@ struct amdgpu_mec {
>>   struct amdgpu_kiq {
>>  u64 eop_gpu_addr;
>>  struct amdgpu_bo*eop_obj;
>> -struct mutex    ring_mutex;
>> +spinlock_t  ring_lock;
>>  struct amdgpu_ring  ring;
>>  struct amdgpu_irq_src   irq;
>>   };
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c
>> index 9d9965d..6d83573 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v8.c
>> @@ -788,7 +788,7 @@ static int invalidate_tlbs_with_kiq(struct amdgpu_device 
>> *adev, uint16_t pasid)
>>  struct dma_fence *f;
>>  struct amdgpu_ring *ring = &adev->gfx.kiq.ring;
>>   
>> -mutex_lock(&adev->gfx.kiq.ring_mutex);
>> +spin_lock(&adev->gfx.kiq.ring_lock);
>>  amdgpu_ring_alloc(ring, 12); /* fence + invalidate_tlbs package*/
>>  amdgpu_ring_write(ring, PACKET3(PACKET3_INVALIDATE_TLBS, 0));
>>  amdgpu_ring_write(ring,
>> @@ -796,7 +796,7 @@ static int invalidate_tlbs_with_kiq(struct amdgpu_device 
>> *adev, uint16_t pasid)
>>  PACKET3_INVALIDATE_TLBS_PASID(pasid));
>>  amdgpu_fence_emit(ring, &f);
>>  amdgpu_ring_commit(ring);
>> -mutex_unlock(&adev->gfx.kiq.ring_mutex);
>> +spin_unlock(&adev->gfx.kiq.ring_lock);
>>   
>>  r = dma_fence_wait(f, false);
>>  if (r)
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
>> index edbae19..c92217f 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
>> @@ -973,7 +973,7 @@ static int invalidate_tlbs_with_kiq(struct amdgpu_device 
>> *adev, uint16_t pasid)
>>  struct dma_fence *f;
>>  struct amdgpu_ring *ring = &adev->gfx.kiq.ring;
>>   
>> -mutex_lock(&adev->gfx.kiq.ring_mutex);
>> +spin_lock(&adev->gfx.kiq.ring_lock);
>>  amdgpu_ring_alloc(ring, 12); /* fence + invalidate_tlbs package*/
>>  amdgpu_ring_write(ring, PACKET3(PACKET3_INVALIDATE_TLBS, 0));
>>  amdgpu_ring_write(ring,
>> @@ -983,7 +983,7 @@ static int invalidate_tlbs_with_kiq(struct amdgpu_device 
>> *adev, uint16_t pasid)
>>  PACKET3_INVALIDATE_TLBS_FLUSH_TYPE(2));
>>  amdgpu_fence_emit(ring, &f);
>>  amdgpu_ring_commit(ring);
>> -mutex_unlock(&adev->gfx.kiq.ring_mutex);
>> +spin_unlock(&adev->gfx.kiq.ring_lock);
>>   
>>  r = dma_fence_wait(f, false);
>>  if (r)
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> index ab8f0d6..1197274 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> @@ -109,10 +109,8 @@ uint32_t amdgpu_mm_rreg(struct amdgpu_device *adev, 
>> uint32_t reg,
>>   {
>>  uint32_t ret;
>>   
>> -if (!(acc_flags & AMDGPU_REGS_NO_KIQ) && amdgpu_sriov_runtime(adev)) {
>> -BUG_ON(in_interrupt());
>> +if (!(acc_flags & AMDGPU_REGS_NO_KIQ) && amdgpu_sriov_runtime(adev))
>>  return amdgpu_virt_kiq_rreg(adev, reg);
>> -}
>>   
>>  if ((reg * 4) < adev->rmmio_size && !(acc_flags & AMDGPU_REGS_IDX))
>>  ret = readl(((void __iomem *)adev->rmmio) + (reg * 4));
>> @@ -137,10 +135,8 @@ void amdgpu_mm_wreg(struct amdgpu_device *adev, 
>> uint32_t reg, uint32_t v,
>>  adev->last_mm_index = v;
>>  }
>>   
>> -if (!(acc_flags & AMDGPU_REGS_NO_KIQ) && amdgpu_sriov_runtime(adev)) {
>> -BUG_ON(in_interrupt());
>> +if (!(acc_flags & AMDGPU_REGS_NO_KIQ) && amdgpu_sriov_runtime(adev))
>>  

Re: [PATCH 1/3] drm/amdgpu: always consider virualised device for checking post

2017-10-17 Thread Ding, Pixel
You can see the difference in the function amdgpu_need_post. Generally speaking, 
there were two functions to check VBIOS posting: one considers VF and passthrough 
while the other doesn't. In fact we should always consider VF and PT for this 
check, right? For example, the check here believes a VF needs posting because the 
SCRATCH registers don't hold the expected value. Is it clear?
— 
Sincerely Yours,
Pixel








On 17/10/2017, 5:00 PM, "Liu, Monk" <monk@amd.com> wrote:

>From the patch itself I still couldn't tell the difference 
>
>-Original Message-
>From: Ding, Pixel 
>Sent: 2017年10月17日 15:54
>To: Liu, Monk <monk@amd.com>; amd-gfx@lists.freedesktop.org; Koenig, 
>Christian <christian.koe...@amd.com>
>Cc: Li, Bingley <bingley...@amd.com>; Sun, Gary <gary@amd.com>
>Subject: Re: [PATCH 1/3] drm/amdgpu: always consider virualised device for 
>checking post
>
>It fixes an issue hidden in:
>
>95 static bool igp_read_bios_from_vram(struct amdgpu_device *adev)
> 96 {
> 97uint8_t __iomem *bios;
> 98resource_size_t vram_base;
> 99resource_size_t size = 256 * 1024; /* ??? */
>100
>101if (!(adev->flags & AMD_IS_APU))
>102if (amdgpu_need_post(adev))
>103return false;
>
>
>This makes bios reading fallback to SMC INDEX/DATA register case.
>
>— 
>Sincerely Yours,
>Pixel
>
>
>
>
>
>
>
>
>On 17/10/2017, 3:48 PM, "Liu, Monk" <monk@amd.com> wrote:
>
>>I don't understand how this patch works??? Looks like just rename 
>>vpost_needed to check_post
>>
>>-Original Message-
>>From: Pixel Ding [mailto:pixel.d...@amd.com] 
>>Sent: 2017年10月17日 14:38
>>To: amd-gfx@lists.freedesktop.org; Liu, Monk <monk@amd.com>; Koenig, 
>>Christian <christian.koe...@amd.com>
>>Cc: Li, Bingley <bingley...@amd.com>; Sun, Gary <gary@amd.com>; Ding, 
>>Pixel <pixel.d...@amd.com>
>>Subject: [PATCH 1/3] drm/amdgpu: always consider virualised device for 
>>checking post
>>
>>From: pding <pixel.d...@amd.com>
>>
>>The post checking on scratch registers isn't reliable for virtual functions.
>>
>>Signed-off-by: pding <pixel.d...@amd.com>
>>---
>> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 8 
>> 1 file changed, 4 insertions(+), 4 deletions(-)
>>
>>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
>>b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>index 683965b..ab8f0d6 100644
>>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>@@ -669,7 +669,7 @@ void amdgpu_gart_location(struct amdgpu_device *adev, 
>>struct amdgpu_mc *mc)
>>  * or post is needed if  hw reset is performed.
>>  * Returns true if need or false if not.
>>  */
>>-bool amdgpu_need_post(struct amdgpu_device *adev)
>>+static bool amdgpu_check_post(struct amdgpu_device *adev)
>> {
>>  uint32_t reg;
>> 
>>@@ -692,7 +692,7 @@ bool amdgpu_need_post(struct amdgpu_device *adev)
>> 
>> }
>> 
>>-static bool amdgpu_vpost_needed(struct amdgpu_device *adev)
>>+bool amdgpu_need_post(struct amdgpu_device *adev)
>> {
>>  if (amdgpu_sriov_vf(adev))
>>  return false;
>>@@ -716,7 +716,7 @@ static bool amdgpu_vpost_needed(struct amdgpu_device 
>>*adev)
>>  return true;
>>  }
>>  }
>>- return amdgpu_need_post(adev);
>>+ return amdgpu_check_post(adev);
>> }
>> 
>> /**
>>@@ -2208,7 +2208,7 @@ int amdgpu_device_init(struct amdgpu_device *adev,
>>  amdgpu_device_detect_sriov_bios(adev);
>> 
>>  /* Post card if necessary */
>>- if (amdgpu_vpost_needed(adev)) {
>>+ if (amdgpu_need_post(adev)) {
>>  if (!adev->bios) {
>>  dev_err(adev->dev, "no vBIOS found\n");
>>  amdgpu_vf_error_put(AMDGIM_ERROR_VF_NO_VBIOS, 0, 0);
>>--
>>2.9.5
>>


Re: [PATCH 2/3] drm/amdgpu: report more amdgpu_fence_info v2

2017-10-17 Thread Ding, Pixel
Why? The fence WB is always 8 DW.
— 
Sincerely Yours,
Pixel







On 17/10/2017, 3:49 PM, "Liu, Monk" <monk@amd.com> wrote:

>Please use if (amdgpu_sriov_vf())
>To protect your added part
>
>-Original Message-
>From: Pixel Ding [mailto:pixel.d...@amd.com] 
>Sent: 2017年10月17日 14:38
>To: amd-gfx@lists.freedesktop.org; Liu, Monk <monk@amd.com>; Koenig, 
>Christian <christian.koe...@amd.com>
>Cc: Li, Bingley <bingley...@amd.com>; Sun, Gary <gary@amd.com>; Ding, 
>Pixel <pixel.d...@amd.com>
>Subject: [PATCH 2/3] drm/amdgpu: report more amdgpu_fence_info v2
>
>From: pding <pixel.d...@amd.com>
>
>Only for the GFX ring. This can help check the MCBP feature.
>
>v2: report more fence offs.
>
>The fence at the end of the frame will indicate the completion status.
>If the frame completed normally, the fence is written to the address given in 
>the EVENT_WRITE_EOP packet. If preemption occurred in the previous IB the 
>address is adjusted by 2 DWs. If work submitted in the frame was reset before 
>completion, the fence address is adjusted by four DWs. In the case that 
>preemption occurred, and before preemption completed a reset was initiated, 
>the address will be adjusted by six DWs.
>
>Signed-off-by: pding <pixel.d...@amd.com>
>---
> drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 13 +
> 1 file changed, 13 insertions(+)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c 
>b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>index 09d5a5c..688740e 100644
>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>@@ -645,6 +645,19 @@ static int amdgpu_debugfs_fence_info(struct seq_file *m, 
>void *data)
>  atomic_read(&ring->fence_drv.last_seq));
>   seq_printf(m, "Last emitted0x%08x\n",
>  ring->fence_drv.sync_seq);
>+
>+  if (ring->funcs->type != AMDGPU_RING_TYPE_GFX)
>+  continue;
>+
>+  /* set in CP_VMID_PREEMPT and preemption occurred */
>+  seq_printf(m, "Last preempted  0x%08x\n",
>+ le32_to_cpu(*(ring->fence_drv.cpu_addr + 2)));
>+  /* set in CP_VMID_RESET and reset occurred */
>+  seq_printf(m, "Last reset  0x%08x\n",
>+ le32_to_cpu(*(ring->fence_drv.cpu_addr + 4)));
>+  /* Both preemption and reset occurred */
>+  seq_printf(m, "Last both   0x%08x\n",
>+ le32_to_cpu(*(ring->fence_drv.cpu_addr + 6)));
>   }
>   return 0;
> }
>--
>2.9.5
>


Re: [PATCH 1/3] drm/amdgpu: always consider virualised device for checking post

2017-10-17 Thread Ding, Pixel
It fixes an issue hidden in:

95 static bool igp_read_bios_from_vram(struct amdgpu_device *adev)
 96 {
 97 uint8_t __iomem *bios;
 98 resource_size_t vram_base;
 99 resource_size_t size = 256 * 1024; /* ??? */
100
101 if (!(adev->flags & AMD_IS_APU))
102 if (amdgpu_need_post(adev))
103 return false;


This makes bios reading fallback to SMC INDEX/DATA register case.

— 
Sincerely Yours,
Pixel








On 17/10/2017, 3:48 PM, "Liu, Monk" <monk@amd.com> wrote:

>I don't understand how this patch works??? Looks like just rename vpost_needed 
>to check_post
>
>-Original Message-
>From: Pixel Ding [mailto:pixel.d...@amd.com] 
>Sent: 2017年10月17日 14:38
>To: amd-gfx@lists.freedesktop.org; Liu, Monk <monk@amd.com>; Koenig, 
>Christian <christian.koe...@amd.com>
>Cc: Li, Bingley <bingley...@amd.com>; Sun, Gary <gary@amd.com>; Ding, 
>Pixel <pixel.d...@amd.com>
>Subject: [PATCH 1/3] drm/amdgpu: always consider virualised device for 
>checking post
>
>From: pding <pixel.d...@amd.com>
>
>The post checking on scratch registers isn't reliable for virtual functions.
>
>Signed-off-by: pding <pixel.d...@amd.com>
>---
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 8 
> 1 file changed, 4 insertions(+), 4 deletions(-)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
>b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>index 683965b..ab8f0d6 100644
>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>@@ -669,7 +669,7 @@ void amdgpu_gart_location(struct amdgpu_device *adev, 
>struct amdgpu_mc *mc)
>  * or post is needed if  hw reset is performed.
>  * Returns true if need or false if not.
>  */
>-bool amdgpu_need_post(struct amdgpu_device *adev)
>+static bool amdgpu_check_post(struct amdgpu_device *adev)
> {
>   uint32_t reg;
> 
>@@ -692,7 +692,7 @@ bool amdgpu_need_post(struct amdgpu_device *adev)
> 
> }
> 
>-static bool amdgpu_vpost_needed(struct amdgpu_device *adev)
>+bool amdgpu_need_post(struct amdgpu_device *adev)
> {
>   if (amdgpu_sriov_vf(adev))
>   return false;
>@@ -716,7 +716,7 @@ static bool amdgpu_vpost_needed(struct amdgpu_device *adev)
>   return true;
>   }
>   }
>-  return amdgpu_need_post(adev);
>+  return amdgpu_check_post(adev);
> }
> 
> /**
>@@ -2208,7 +2208,7 @@ int amdgpu_device_init(struct amdgpu_device *adev,
>   amdgpu_device_detect_sriov_bios(adev);
> 
>   /* Post card if necessary */
>-  if (amdgpu_vpost_needed(adev)) {
>+  if (amdgpu_need_post(adev)) {
>   if (!adev->bios) {
>   dev_err(adev->dev, "no vBIOS found\n");
>   amdgpu_vf_error_put(AMDGIM_ERROR_VF_NO_VBIOS, 0, 0);
>--
>2.9.5
>


Re: [PATCH 4/4] drm/amdgpu: busywait KIQ register accessing

2017-10-16 Thread Ding, Pixel
Hi Monk,

So far I notice that it's almost impossible to have both methods working 
together.

For example, if num_fence_mask is 2 and there's a series of sequence numbers 
like "fence seq 1, busy-wait seq 2, busy-wait seq 3, fence seq 4", then if the 
IRQ handler for seq 1 catches the HW seq number as 2, the fence slot for seq 4 
will be wrongly woken up.

If only dma_fence slots are used, the emit function ensures that each slot has 
only one valid seq number by waiting for the old one. But things get complicated 
if there's busy waiting that registers seq numbers at the same time.


— 
Sincerely Yours,
Pixel







On 16/10/2017, 9:25 AM, "Ding, Pixel" <pixel.d...@amd.com> wrote:

>OK, I would keep both methods working together.
>— 
>Sincerely Yours,
>Pixel
>
>
>
>
>
>
>
>On 13/10/2017, 6:04 PM, "Liu, Monk" <monk@amd.com> wrote:
>
>>Why would there be a racing issue?
>>
>>Polling or sleep waiting only have a different result for the caller, not for 
>>the job scheduled to KIQ.
>>
>>The sleep wait is a synchronous sleep; it just releases the CPU to other 
>>processes/threads, so the order is guaranteed.
>>
>>BR Monk
>>
>>-Original Message-
>>From: Ding, Pixel 
>>Sent: 2017年10月13日 17:39
>>To: Liu, Monk <monk@amd.com>; amd-gfx@lists.freedesktop.org
>>Cc: Li, Bingley <bingley...@amd.com>
>>Subject: Re: [PATCH 4/4] drm/amdgpu: busywait KIQ register accessing
>>
>>I’m afraid there’s a racing issue if polling and IRQ use cases are mixed at 
>>the same time.
>>
>>The original implementation is as you suggested. Is there any benefit to 
>>keeping the sleepy version?
>>— 
>>Sincerely Yours,
>>Pixel
>>
>>
>>
>>
>>
>>
>>
>>
>>On 13/10/2017, 5:34 PM, "Liu, Monk" <monk@amd.com> wrote:
>>
>>>Pixel
>>>
>>>I don't think this will work well, my suggestion is you add a new function 
>>>like:
>>>amdgpu_wreg_kiq_busy(),
>>>
>>>which will write registers through KIQ and use polling/busy wait, and the 
>>>original amdgpu_wreg_no_kiq() can still be there.
>>>
>>>When you need to disable sleep like in IRQ CONTEXT, you can call 
>>>wreg_kiq_busy() or wreg_no_kiq(),
>>>
>>>But don't just change the original function 
>>>
>>>BR Monk
>>>
>>>-Original Message-
>>>From: amd-gfx [mailto:amd-gfx-boun...@lists.freedesktop.org] On Behalf Of 
>>>Pixel Ding
>>>Sent: 2017年10月13日 16:26
>>>To: amd-gfx@lists.freedesktop.org
>>>Cc: Ding, Pixel <pixel.d...@amd.com>; Li, Bingley <bingley...@amd.com>
>>>Subject: [PATCH 4/4] drm/amdgpu: busywait KIQ register accessing
>>>
>>>From: pding <pixel.d...@amd.com>
>>>
>>>Register accessing is performed when IRQ is disabled. Never sleep in this 
>>>function.
>>>
>>>Known issue: dead sleep in many use cases of index/data registers.
>>>
>>>Signed-off-by: pding <pixel.d...@amd.com>
>>>---
>>> drivers/gpu/drm/amd/amdgpu/amdgpu.h|  1 +
>>> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  8 ++---  
>>> drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c  |  2 +-
>>> drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c|  1 +
>>> drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h   |  1 +
>>> drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c   | 55 
>>> ++
>>> 6 files changed, 40 insertions(+), 28 deletions(-)
>>>
>>>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
>>>b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>>index ca212ef..f9de756 100644
>>>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>>@@ -886,6 +886,7 @@ struct amdgpu_kiq {
>>> u64 eop_gpu_addr;
>>> struct amdgpu_bo*eop_obj;
>>> struct mutex    ring_mutex;
>>>+spinlock_t  ring_lock;
>>> struct amdgpu_ring  ring;
>>> struct amdgpu_irq_src   irq;
>>> };
>>>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
>>>b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>index ab8f0d6..1197274 100644
>>>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>>@@ -109,10 +109,8 @@ uint32_t amdgpu_mm_rreg(struct amdgpu_device *adev, 
>>>uint32_t reg,  {
>>> uint32_t ret;
>>> 

Re: [PATCH 4/4] drm/amdgpu: busywait KIQ register accessing

2017-10-15 Thread Ding, Pixel
OK, I would keep both methods working together.
— 
Sincerely Yours,
Pixel







On 13/10/2017, 6:04 PM, "Liu, Monk" <monk@amd.com> wrote:

>Why would there be a racing issue?
>
>Polling or sleep waiting only have a different result for the caller, not for 
>the job scheduled to KIQ.
>
>The sleep wait is a synchronous sleep; it just releases the CPU to other 
>processes/threads, so the order is guaranteed.
>
>BR Monk
>
>-Original Message-
>From: Ding, Pixel 
>Sent: 2017年10月13日 17:39
>To: Liu, Monk <monk@amd.com>; amd-gfx@lists.freedesktop.org
>Cc: Li, Bingley <bingley...@amd.com>
>Subject: Re: [PATCH 4/4] drm/amdgpu: busywait KIQ register accessing
>
>I’m afraid there’s a racing issue if polling and IRQ use cases are mixed at 
>the same time.
>
>The original implementation is as you suggested. Is there any benefit to 
>keeping the sleepy version?
>— 
>Sincerely Yours,
>Pixel
>
>
>
>
>
>
>
>
>On 13/10/2017, 5:34 PM, "Liu, Monk" <monk@amd.com> wrote:
>
>>Pixel
>>
>>I don't think this will work well, my suggestion is you add a new function 
>>like:
>>amdgpu_wreg_kiq_busy(),
>>
>>which will write registers through KIQ and use polling/busy wait, and the 
>>original amdgpu_wreg_no_kiq() can still be there.
>>
>>When you need to disable sleep like in IRQ CONTEXT, you can call 
>>wreg_kiq_busy() or wreg_no_kiq(),
>>
>>But don't just change the original function 
>>
>>BR Monk
>>
>>-Original Message-
>>From: amd-gfx [mailto:amd-gfx-boun...@lists.freedesktop.org] On Behalf Of 
>>Pixel Ding
>>Sent: 2017年10月13日 16:26
>>To: amd-gfx@lists.freedesktop.org
>>Cc: Ding, Pixel <pixel.d...@amd.com>; Li, Bingley <bingley...@amd.com>
>>Subject: [PATCH 4/4] drm/amdgpu: busywait KIQ register accessing
>>
>>From: pding <pixel.d...@amd.com>
>>
>>Register accessing is performed when IRQ is disabled. Never sleep in this 
>>function.
>>
>>Known issue: dead sleep in many use cases of index/data registers.
>>
>>Signed-off-by: pding <pixel.d...@amd.com>
>>---
>> drivers/gpu/drm/amd/amdgpu/amdgpu.h|  1 +
>> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  8 ++---  
>> drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c  |  2 +-
>> drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c|  1 +
>> drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h   |  1 +
>> drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c   | 55 
>> ++
>> 6 files changed, 40 insertions(+), 28 deletions(-)
>>
>>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
>>b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>index ca212ef..f9de756 100644
>>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>@@ -886,6 +886,7 @@ struct amdgpu_kiq {
>>  u64 eop_gpu_addr;
>>  struct amdgpu_bo*eop_obj;
>>  struct mutex    ring_mutex;
>>+ spinlock_t  ring_lock;
>>  struct amdgpu_ring  ring;
>>  struct amdgpu_irq_src   irq;
>> };
>>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
>>b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>index ab8f0d6..1197274 100644
>>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>@@ -109,10 +109,8 @@ uint32_t amdgpu_mm_rreg(struct amdgpu_device *adev, 
>>uint32_t reg,  {
>>  uint32_t ret;
>> 
>>- if (!(acc_flags & AMDGPU_REGS_NO_KIQ) && amdgpu_sriov_runtime(adev)) {
>>- BUG_ON(in_interrupt());
>>+ if (!(acc_flags & AMDGPU_REGS_NO_KIQ) && amdgpu_sriov_runtime(adev))
>>  return amdgpu_virt_kiq_rreg(adev, reg);
>>- }
>> 
>>  if ((reg * 4) < adev->rmmio_size && !(acc_flags & AMDGPU_REGS_IDX))
>>  ret = readl(((void __iomem *)adev->rmmio) + (reg * 4)); @@ 
>> -137,10 +135,8 @@ void amdgpu_mm_wreg(struct amdgpu_device *adev, uint32_t 
>> reg, uint32_t v,
>>  adev->last_mm_index = v;
>>  }
>> 
>>- if (!(acc_flags & AMDGPU_REGS_NO_KIQ) && amdgpu_sriov_runtime(adev)) {
>>- BUG_ON(in_interrupt());
>>+ if (!(acc_flags & AMDGPU_REGS_NO_KIQ) && amdgpu_sriov_runtime(adev))
>>  return amdgpu_virt_kiq_wreg(adev, reg, v);
>>- }
>> 
>>  if ((reg * 4) < adev->rmmio_size && !(acc_flags & AMDGPU_REGS_IDX))
>>  writel(v, ((void __iomem *)a

Re: [PATCH 4/4] drm/amdgpu: busywait KIQ register accessing

2017-10-13 Thread Ding, Pixel
I’m afraid there’s a racing issue if polling and IRQ use cases are mixed at the 
same time.

The original implementation is as you suggested. Is there any benefit to keeping 
the sleepy version?
— 
Sincerely Yours,
Pixel








On 13/10/2017, 5:34 PM, "Liu, Monk" <monk@amd.com> wrote:

>Pixel
>
>I don't think this will work well, my suggestion is you add a new function 
>like:
>amdgpu_wreg_kiq_busy(),
>
>which will write registers through KIQ and use polling/busy wait, and the 
>original amdgpu_wreg_no_kiq() can still be there.
>
>When you need to disable sleep like in IRQ CONTEXT, you can call 
>wreg_kiq_busy() or wreg_no_kiq(),
>
>But don't just change the original function 
>
>BR Monk
>
>-Original Message-
>From: amd-gfx [mailto:amd-gfx-boun...@lists.freedesktop.org] On Behalf Of 
>Pixel Ding
>Sent: 2017年10月13日 16:26
>To: amd-gfx@lists.freedesktop.org
>Cc: Ding, Pixel <pixel.d...@amd.com>; Li, Bingley <bingley...@amd.com>
>Subject: [PATCH 4/4] drm/amdgpu: busywait KIQ register accessing
>
>From: pding <pixel.d...@amd.com>
>
>Register accessing is performed when IRQ is disabled. Never sleep in this 
>function.
>
>Known issue: dead sleep in many use cases of index/data registers.
>
>Signed-off-by: pding <pixel.d...@amd.com>
>---
> drivers/gpu/drm/amd/amdgpu/amdgpu.h|  1 +
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  8 ++---  
> drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c  |  2 +-
> drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c|  1 +
> drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h   |  1 +
> drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c   | 55 ++
> 6 files changed, 40 insertions(+), 28 deletions(-)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
>b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>index ca212ef..f9de756 100644
>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>@@ -886,6 +886,7 @@ struct amdgpu_kiq {
>   u64 eop_gpu_addr;
>   struct amdgpu_bo*eop_obj;
>   struct mutex    ring_mutex;
>+  spinlock_t  ring_lock;
>   struct amdgpu_ring  ring;
>   struct amdgpu_irq_src   irq;
> };
>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
>b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>index ab8f0d6..1197274 100644
>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>@@ -109,10 +109,8 @@ uint32_t amdgpu_mm_rreg(struct amdgpu_device *adev, 
>uint32_t reg,  {
>   uint32_t ret;
> 
>-  if (!(acc_flags & AMDGPU_REGS_NO_KIQ) && amdgpu_sriov_runtime(adev)) {
>-  BUG_ON(in_interrupt());
>+  if (!(acc_flags & AMDGPU_REGS_NO_KIQ) && amdgpu_sriov_runtime(adev))
>   return amdgpu_virt_kiq_rreg(adev, reg);
>-  }
> 
>   if ((reg * 4) < adev->rmmio_size && !(acc_flags & AMDGPU_REGS_IDX))
>   ret = readl(((void __iomem *)adev->rmmio) + (reg * 4)); @@ 
> -137,10 +135,8 @@ void amdgpu_mm_wreg(struct amdgpu_device *adev, uint32_t 
> reg, uint32_t v,
>   adev->last_mm_index = v;
>   }
> 
>-  if (!(acc_flags & AMDGPU_REGS_NO_KIQ) && amdgpu_sriov_runtime(adev)) {
>-  BUG_ON(in_interrupt());
>+  if (!(acc_flags & AMDGPU_REGS_NO_KIQ) && amdgpu_sriov_runtime(adev))
>   return amdgpu_virt_kiq_wreg(adev, reg, v);
>-  }
> 
>   if ((reg * 4) < adev->rmmio_size && !(acc_flags & AMDGPU_REGS_IDX))
>   writel(v, ((void __iomem *)adev->rmmio) + (reg * 4)); diff 
> --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>index 2044758..c6c7bf3 100644
>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>@@ -109,7 +109,7 @@ static void amdgpu_fence_write(struct amdgpu_ring *ring, 
>u32 seq)
>  * Reads a fence value from memory (all asics).
>  * Returns the value of the fence read from memory.
>  */
>-static u32 amdgpu_fence_read(struct amdgpu_ring *ring)
>+u32 amdgpu_fence_read(struct amdgpu_ring *ring)
> {
>   struct amdgpu_fence_driver *drv = &ring->fence_drv;
>   u32 seq = 0;
>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c 
>b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
>index 4f6c68f..46fa88c 100644
>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
>@@ -186,6 +186,7 @@ int amdgpu_gfx_kiq_init_ring(struct amdgpu_device *adev,
>   int r = 0;
> 
>   mutex_init(&kiq->ring_mutex);
>+  spin_lock_init(&kiq->ring_lock);
> 
> 

Re: [PATCH 3/4] drm/amdgpu: report preemption fence via amdgpu_fence_info

2017-10-13 Thread Ding, Pixel
Get it.
— 
Sincerely Yours,
Pixel








On 13/10/2017, 5:28 PM, "Liu, Monk" <monk@amd.com> wrote:

>@Ding, Pixel
>
>+  seq_printf(m, "Last preempted  0x%08x\n",
>+ le32_to_cpu(*(ring->fence_drv.cpu_addr + 2)));
>
>Please handle the other fence types as well:
>
>Preempted fence
>Reset fence
>Reset and preempted fence
>
>
>BR Monk
>-Original Message-
>From: amd-gfx [mailto:amd-gfx-boun...@lists.freedesktop.org] On Behalf Of 
>Christian K?nig
>Sent: 2017年10月13日 17:10
>To: Ding, Pixel <pixel.d...@amd.com>; amd-gfx@lists.freedesktop.org
>Cc: Li, Bingley <bingley...@amd.com>
>Subject: Re: [PATCH 3/4] drm/amdgpu: report preemption fence via 
>amdgpu_fence_info
>
>Hi Pixel,
>
>> My understanding is that CP will write seqno back to preempted fence offset 
>> when preemption occurred.
>That is correct.
>
>But my question is why you want to print a different value here:
>> +seq_printf(m, "Last preempted  0x%08x\n",
>> +   le32_to_cpu(*(ring->fence_drv.cpu_addr + 2)));
>
>That code prints two dw after the address where the CPU writes the seqno. 
>Where is the code which setups the CP to do this?
>
>As far as I can see that line should always print just zero (or some random 
>number if the memory is actually not initialized to anything).
>
>Regards,
>Christian.
>
>Am 13.10.2017 um 11:03 schrieb Ding, Pixel:
>> My understanding is that the CP will write the seqno back to the preempted 
>> fence offset when preemption occurs. Then if there is a value here we can 
>> generally tell which packet's fence was preempted. Should the driver handle 
>> anything else for this?
>>  
>>  
>>  
>>  
>>  
>>
>>
>> —
>> Sincerely Yours,
>> Pixel
>>
>>
>>
>>
>>
>>
>>
>> On 13/10/2017, 4:51 PM, "Koenig, Christian" <christian.koe...@amd.com> wrote:
>>
>>>> Is it OK to limit this information only for SRIOV VF on Tonga and Vega 
>>>> whose format is known? It can help use to identify if MCBP is working 
>>>> correctly or not.
>>> The question is where is this code for Tonga and Vega? I can't find a 
>>> reference to fence_offs in the GFX8 nor GFX9 code we have in 
>>> amd-staging-drm-next.
>>>
>>> And if it doesn't depend on the fence_offs in the ring printing it 
>>> like this is just pure speculation.
>>>
>>> Regards,
>>> Christian.
>>>
>>> Am 13.10.2017 um 10:41 schrieb Ding, Pixel:
>>>> Thanks Christian,
>>>>
>>>> I’m not sure if I get your point, but yes the preemption fence offset 
>>>> could be changed.
>>>>
>>>> Is it OK to limit this information only for SRIOV VF on Tonga and Vega 
>>>> whose format is known? It can help use to identify if MCBP is working 
>>>> correctly or not.
>>>>
>>>>
>>>> —
>>>> Sincerely Yours,
>>>> Pixel
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On 13/10/2017, 4:34 PM, "Christian König" 
>>>> <ckoenig.leichtzumer...@gmail.com> wrote:
>>>>
>>>>> Am 13.10.2017 um 10:26 schrieb Pixel Ding:
>>>>>> From: pding <pixel.d...@amd.com>
>>>>>>
>>>>>> Only report fence for GFX ring. This can help checking MCBP feature.
>>>>>>
>>>>>> Signed-off-by: pding <pixel.d...@amd.com>
>>>>>> ---
>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 7 +++
>>>>>> 1 file changed, 7 insertions(+)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c 
>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>>>>> index 09d5a5c..2044758 100644
>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>>>>> @@ -645,6 +645,13 @@ static int amdgpu_debugfs_fence_info(struct 
>>>>>> seq_file *m, void *data)
>>>>>> atomic_read(&ring->fence_drv.last_seq));
>>>>>>  seq_printf(m, "Last emitted0x%08x\n",
>>>>>> ring->fence_drv.sync_seq);
>>>>>> +
>>>>>> +if (ring->funcs->type != AMDGPU_RING_TYPE_GFX)
>>>>>> +break;
>>>>> That should probably be "continue" instead of break, or otherwise 
>>>>> you don't print the other fences any more.
>>>>>
>>>>>> +
>>>>>> +seq_printf(m, "Last preempted  0x%08x\n",
>>>>>> +   le32_to_cpu(*(ring->fence_drv.cpu_addr + 
>>>>>> 2)));
>>>>> Is the code to put the preemption fence there already upstream?
>>>>>
>>>>> If yes do we really do this like that for all supported generations?
>>>>>
>>>>> Regards,
>>>>> Christian.
>>>>>
>>>>>> +
>>>>>>  }
>>>>>>  return 0;
>>>>>> }
>
>


Re: [PATCH 4/4] drm/amdgpu: busywait KIQ register accessing

2017-10-13 Thread Ding, Pixel
On 13/10/2017, 5:16 PM, "Christian König"  
wrote:



>Am 13.10.2017 um 10:26 schrieb Pixel Ding:
>> From: pding 
>>
>> Register accessing is performed when IRQ is disabled. Never sleep in
>> this function.
>>
>> Known issue: dead sleep in many use cases of index/data registers.
>>
>> Signed-off-by: pding 
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu.h|  1 +
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |  8 ++---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c  |  2 +-
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c|  1 +
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h   |  1 +
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c   | 55 
>> ++
>>   6 files changed, 40 insertions(+), 28 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> index ca212ef..f9de756 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> @@ -886,6 +886,7 @@ struct amdgpu_kiq {
>>  u64 eop_gpu_addr;
>>  struct amdgpu_bo*eop_obj;
>>  struct mutexring_mutex;
>
>You can remove the ring_mutex if you don't use it any more.
[Pixel] KFD still uses the mutex; I didn’t also change it to a spinlock. Should I?
>
>> +spinlock_t  ring_lock;
>>  struct amdgpu_ring  ring;
>>  struct amdgpu_irq_src   irq;
>>   };
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> index ab8f0d6..1197274 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> @@ -109,10 +109,8 @@ uint32_t amdgpu_mm_rreg(struct amdgpu_device *adev, 
>> uint32_t reg,
>>   {
>>  uint32_t ret;
>>   
>> -if (!(acc_flags & AMDGPU_REGS_NO_KIQ) && amdgpu_sriov_runtime(adev)) {
>> -BUG_ON(in_interrupt());
>> +if (!(acc_flags & AMDGPU_REGS_NO_KIQ) && amdgpu_sriov_runtime(adev))
>>  return amdgpu_virt_kiq_rreg(adev, reg);
>> -}
>>   
>>  if ((reg * 4) < adev->rmmio_size && !(acc_flags & AMDGPU_REGS_IDX))
>>  ret = readl(((void __iomem *)adev->rmmio) + (reg * 4));
>> @@ -137,10 +135,8 @@ void amdgpu_mm_wreg(struct amdgpu_device *adev, 
>> uint32_t reg, uint32_t v,
>>  adev->last_mm_index = v;
>>  }
>>   
>> -if (!(acc_flags & AMDGPU_REGS_NO_KIQ) && amdgpu_sriov_runtime(adev)) {
>> -BUG_ON(in_interrupt());
>> +if (!(acc_flags & AMDGPU_REGS_NO_KIQ) && amdgpu_sriov_runtime(adev))
>>  return amdgpu_virt_kiq_wreg(adev, reg, v);
>> -}
>>   
>>  if ((reg * 4) < adev->rmmio_size && !(acc_flags & AMDGPU_REGS_IDX))
>>  writel(v, ((void __iomem *)adev->rmmio) + (reg * 4));
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>> index 2044758..c6c7bf3 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>> @@ -109,7 +109,7 @@ static void amdgpu_fence_write(struct amdgpu_ring *ring, 
>> u32 seq)
>>* Reads a fence value from memory (all asics).
>>* Returns the value of the fence read from memory.
>>*/
>> -static u32 amdgpu_fence_read(struct amdgpu_ring *ring)
>> +u32 amdgpu_fence_read(struct amdgpu_ring *ring)
>>   {
>>  struct amdgpu_fence_driver *drv = &ring->fence_drv;
>>  u32 seq = 0;
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
>> index 4f6c68f..46fa88c 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
>> @@ -186,6 +186,7 @@ int amdgpu_gfx_kiq_init_ring(struct amdgpu_device *adev,
>>  int r = 0;
>>   
>>  mutex_init(>ring_mutex);
>> +spin_lock_init(>ring_lock);
>>   
>>  r = amdgpu_wb_get(adev, >virt.reg_val_offs);
>>  if (r)
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
>> index af8e544..a4fa923 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
>> @@ -89,6 +89,7 @@ int amdgpu_fence_driver_start_ring(struct amdgpu_ring 
>> *ring,
>>   void amdgpu_fence_driver_suspend(struct amdgpu_device *adev);
>>   void amdgpu_fence_driver_resume(struct amdgpu_device *adev);
>>   int amdgpu_fence_emit(struct amdgpu_ring *ring, struct dma_fence **fence);
>> +u32 amdgpu_fence_read(struct amdgpu_ring *ring);
>>   void amdgpu_fence_process(struct amdgpu_ring *ring);
>>   int amdgpu_fence_wait_empty(struct amdgpu_ring *ring);
>>   unsigned amdgpu_fence_count_emitted(struct amdgpu_ring *ring);
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
>> index ab05121..9757df1 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
>> @@ -22,7 

Re: [PATCH 3/4] drm/amdgpu: report preemption fence via amdgpu_fence_info

2017-10-13 Thread Ding, Pixel
OK I get it…
When we use fence_offs to submit the fence to HW, it’s in fact an 8-DW fence,
not a 2-DW one.

The format is:
Completed Fence  0x0 Fence written here if frame completed normally
Preempted Fence  0x2 Bit set in CP_VMID_PREEMPT and preemption occurred
Reset Fence  0x4 Bit is set in CP_VMID_RESET and reset occurred
Preempted then Reset 0x6 Both preemption and reset occurred






Then I noticed that the 8-DW WB allocation patch is missing from the staging driver.


Hi Monk,

Can you take a look? I thought the 8DW WB patch is already upstream.

— 
Sincerely Yours,
Pixel







On 13/10/2017, 5:10 PM, "Koenig, Christian" <christian.koe...@amd.com> wrote:

>Hi Pixel,
>
>> My understanding is that CP will write seqno back to preempted fence offset 
>> when preemption occurred.
>That is correct.
>
>But my question is why you want to print a different value here:
>> +seq_printf(m, "Last preempted  0x%08x\n",
>> +   le32_to_cpu(*(ring->fence_drv.cpu_addr + 2)));
>
>That code prints two dw after the address where the CPU writes the 
>seqno. Where is the code which setups the CP to do this?
>
>As far as I can see that line should always print just zero (or some 
>random number if the memory is actually not initialized to anything).
>
>Regards,
>Christian.
>
>On 13.10.2017 at 11:03, Ding, Pixel wrote:
>> My understanding is that the CP will write the seqno back to the preempted
>> fence offset when preemption occurs. Then, if there is a value here, we can
>> generally tell which packet’s fence was preempted. Should the driver handle
>> anything else for this?
>>  
>>  
>>  
>>  
>>  
>>
>>
>> —
>> Sincerely Yours,
>> Pixel
>>
>>
>>
>>
>>
>>
>>
>> On 13/10/2017, 4:51 PM, "Koenig, Christian" <christian.koe...@amd.com> wrote:
>>
>>>> Is it OK to limit this information only for SRIOV VF on Tonga and Vega 
>>>>> whose format is known? It can help us identify if MCBP is working 
>>>> correctly or not.
>>> The question is where is this code for Tonga and Vega? I can't find a
>>> reference to fence_offs in the GFX8 nor GFX9 code we have in
>>> amd-staging-drm-next.
>>>
>>> And if it doesn't depend on the fence_offs in the ring printing it like
>>> this is just pure speculation.
>>>
>>> Regards,
>>> Christian.
>>>
>>> On 13.10.2017 at 10:41, Ding, Pixel wrote:
>>>> Thanks Christian,
>>>>
>>>> I’m not sure if I get your point, but yes the preemption fence offset 
>>>> could be changed.
>>>>
>>>> Is it OK to limit this information only for SRIOV VF on Tonga and Vega 
>>>> whose format is known? It can help us identify if MCBP is working 
>>>> correctly or not.
>>>>
>>>>
>>>> —
>>>> Sincerely Yours,
>>>> Pixel
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On 13/10/2017, 4:34 PM, "Christian König" 
>>>> <ckoenig.leichtzumer...@gmail.com> wrote:
>>>>
>>>>> On 13.10.2017 at 10:26, Pixel Ding wrote:
>>>>>> From: pding <pixel.d...@amd.com>
>>>>>>
>>>>>> Only report fence for GFX ring. This can help checking MCBP feature.
>>>>>>
>>>>>> Signed-off-by: pding <pixel.d...@amd.com>
>>>>>> ---
>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 7 +++
>>>>>> 1 file changed, 7 insertions(+)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c 
>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>>>>> index 09d5a5c..2044758 100644
>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>>>>> @@ -645,6 +645,13 @@ static int amdgpu_debugfs_fence_info(struct 
>>>>>> seq_file *m, void *data)
>>>>>> atomic_read(&ring->fence_drv.last_seq));
>>>>>>  seq_printf(m, "Last emitted        0x%08x\n",
>>>>>> ring->fence_drv.sync_seq);
>>>>>> +
>>>>>> +if (ring->funcs->type != AMDGPU_RING_TYPE_GFX)
>>>>>> +break;
>>>>> That should probably be "continue" instead of break, or otherwise you
>>>>> don't print the other fences any more.
>>>>>
>>>>>> +
>>>>>> +seq_printf(m, "Last preempted  0x%08x\n",
>>>>>> +   le32_to_cpu(*(ring->fence_drv.cpu_addr + 
>>>>>> 2)));
>>>>> Is the code to put the preemption fence there already upstream?
>>>>>
>>>>> If yes do we really do this like that for all supported generations?
>>>>>
>>>>> Regards,
>>>>> Christian.
>>>>>
>>>>>> +
>>>>>>  }
>>>>>>  return 0;
>>>>>> }
>
>


Re: [PATCH 3/4] drm/amdgpu: report preemption fence via amdgpu_fence_info

2017-10-13 Thread Ding, Pixel
Yes I see the “drm/amdgpu: use 256 bit buffers for all wb allocations (v2)”.

Hi Christian, 

So it seems all good, right?

— 
Sincerely Yours,
Pixel







On 13/10/2017, 5:19 PM, "Liu, Monk" <monk@amd.com> wrote:

>Pixel,
>
>On drm-next, we always allocate 8DW for all WB, you can check the wb_get 
>routine on detail
>
>BR Monk
>
>-Original Message-
>From: Ding, Pixel 
>Sent: 13 October 2017 17:19
>To: Koenig, Christian <christian.koe...@amd.com>; 
>amd-gfx@lists.freedesktop.org; Liu, Monk <monk@amd.com>
>Cc: Li, Bingley <bingley...@amd.com>
>Subject: Re: [PATCH 3/4] drm/amdgpu: report preemption fence via 
>amdgpu_fence_info
>
>OK I get it…
>When we use fence_offs to submit the fence to HW, it’s in fact an 8-DW fence,
>not a 2-DW one.
>
>The format is:
>Completed Fence  0x0 Fence written here if frame completed normally
>Preempted Fence  0x2 Bit set in CP_VMID_PREEMPT and preemption occurred
>Reset Fence  0x4 Bit is set in CP_VMID_RESET and reset occurred
>Preempted then Reset 0x6 Both preemption and reset occurred
>
>   
>   
>   
>   
>   
>Then I noticed that the 8-DW WB allocation patch is missing from the staging driver.
>
>
>Hi Monk,
>
>Can you take a look? I thought the 8DW WB patch is already upstream.
>
>— 
>Sincerely Yours,
>Pixel
>
>
>
>
>
>
>
>On 13/10/2017, 5:10 PM, "Koenig, Christian" <christian.koe...@amd.com> wrote:
>
>>Hi Pixel,
>>
>>> My understanding is that CP will write seqno back to preempted fence offset 
>>> when preemption occurred.
>>That is correct.
>>
>>But my question is why you want to print a different value here:
>>> +   seq_printf(m, "Last preempted  0x%08x\n",
>>> +  le32_to_cpu(*(ring->fence_drv.cpu_addr + 2)));
>>
>>That code prints two dw after the address where the CPU writes the 
>>seqno. Where is the code which setups the CP to do this?
>>
>>As far as I can see that line should always print just zero (or some 
>>random number if the memory is actually not initialized to anything).
>>
>>Regards,
>>Christian.
>>
>>On 13.10.2017 at 11:03, Ding, Pixel wrote:
>>> My understanding is that the CP will write the seqno back to the preempted
>>> fence offset when preemption occurs. Then, if there is a value here, we can
>>> generally tell which packet’s fence was preempted. Should the driver handle
>>> anything else for this?
>>> 
>>> 
>>> 
>>> 
>>> 
>>>
>>>
>>> —
>>> Sincerely Yours,
>>> Pixel
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> On 13/10/2017, 4:51 PM, "Koenig, Christian" <christian.koe...@amd.com> 
>>> wrote:
>>>
>>>>> Is it OK to limit this information only for SRIOV VF on Tonga and Vega 
>>>>> whose format is known? It can help us identify if MCBP is working 
>>>>> correctly or not.
>>>> The question is where is this code for Tonga and Vega? I can't find a
>>>> reference to fence_offs in the GFX8 nor GFX9 code we have in
>>>> amd-staging-drm-next.
>>>>
>>>> And if it doesn't depend on the fence_offs in the ring printing it like
>>>> this is just pure speculation.
>>>>
>>>> Regards,
>>>> Christian.
>>>>
>>>> On 13.10.2017 at 10:41, Ding, Pixel wrote:
>>>>> Thanks Christian,
>>>>>
>>>>> I’m not sure if I get your point, but yes the preemption fence offset 
>>>>> could be changed.
>>>>>
>>>>> Is it OK to limit this information only for SRIOV VF on Tonga and Vega 
>>>>> whose format is known? It can help us identify if MCBP is working 
>>>>> correctly or not.
>>>>>
>>>>>
>>>>> —
>>>>> Sincerely Yours,
>>>>> Pixel
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On 13/10/2017, 4:34 PM, "Christian König" 
>>>>> <ckoenig.leichtzumer...@gmail.com> wrote:
>>>>>
>>>>>> On 13.10.2017 at 10:26, Pixel Ding wrote:

Re: [PATCH 3/4] drm/amdgpu: report preemption fence via amdgpu_fence_info

2017-10-13 Thread Ding, Pixel
This patch is not an implementation of MCBP handled by the driver itself. In the
SRIOV use case, preemption is handled in the host layer. What I do here is just
check whether preemption occurred, so there’s no code to handle it.

—
Sincerely Yours,
Pixel







On 13/10/2017, 5:03 PM, "Ding, Pixel" <pixel.d...@amd.com> wrote:

>My understanding is that the CP will write the seqno back to the preempted
>fence offset when preemption occurs. Then, if there is a value here, we can
>generally tell which packet’s fence was preempted. Should the driver handle
>anything else for this?
>   
>   
>   
>   
>   
>
>
>— 
>Sincerely Yours,
>Pixel
>
>
>
>
>
>
>
>On 13/10/2017, 4:51 PM, "Koenig, Christian" <christian.koe...@amd.com> wrote:
>
>>> Is it OK to limit this information only for SRIOV VF on Tonga and Vega 
>>> whose format is known? It can help us identify if MCBP is working 
>>> correctly or not.
>>The question is where is this code for Tonga and Vega? I can't find a 
>>reference to fence_offs in the GFX8 nor GFX9 code we have in 
>>amd-staging-drm-next.
>>
>>And if it doesn't depend on the fence_offs in the ring printing it like 
>>this is just pure speculation.
>>
>>Regards,
>>Christian.
>>
>>On 13.10.2017 at 10:41, Ding, Pixel wrote:
>>> Thanks Christian,
>>>
>>> I’m not sure if I get your point, but yes the preemption fence offset could 
>>> be changed.
>>>
>>> Is it OK to limit this information only for SRIOV VF on Tonga and Vega 
>>> whose format is known? It can help use to identify if MCBP is working 
>>> correctly or not.
>>>
>>>
>>> —
>>> Sincerely Yours,
>>> Pixel
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> On 13/10/2017, 4:34 PM, "Christian König" 
>>> <ckoenig.leichtzumer...@gmail.com> wrote:
>>>
>>>> On 13.10.2017 at 10:26, Pixel Ding wrote:
>>>>> From: pding <pixel.d...@amd.com>
>>>>>
>>>>> Only report fence for GFX ring. This can help checking MCBP feature.
>>>>>
>>>>> Signed-off-by: pding <pixel.d...@amd.com>
>>>>> ---
>>>>>drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 7 +++
>>>>>1 file changed, 7 insertions(+)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c 
>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>>>> index 09d5a5c..2044758 100644
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>>>> @@ -645,6 +645,13 @@ static int amdgpu_debugfs_fence_info(struct seq_file 
>>>>> *m, void *data)
>>>>>  atomic_read(&ring->fence_drv.last_seq));
>>>>>   seq_printf(m, "Last emitted        0x%08x\n",
>>>>>  ring->fence_drv.sync_seq);
>>>>> +
>>>>> + if (ring->funcs->type != AMDGPU_RING_TYPE_GFX)
>>>>> + break;
>>>> That should probably be "continue" instead of break, or otherwise you
>>>> don't print the other fences any more.
>>>>
>>>>> +
>>>>> + seq_printf(m, "Last preempted  0x%08x\n",
>>>>> +le32_to_cpu(*(ring->fence_drv.cpu_addr + 2)));
>>>> Is the code to put the preemption fence there already upstream?
>>>>
>>>> If yes do we really do this like that for all supported generations?
>>>>
>>>> Regards,
>>>> Christian.
>>>>
>>>>> +
>>>>>   }
>>>>>   return 0;
>>>>>}
>>>>
>>


Re: [PATCH 2/4] drm/amdgpu: workaround for VM fault caused by SDMA set_wptr

2017-10-13 Thread Ding, Pixel
Yes, I tried smp_mb but it doesn’t help…
We will keep following up on this issue until the root cause is fixed.
— 
Sincerely Yours,
Pixel







On 13/10/2017, 5:17 PM, "Christian König"  
wrote:

>On 13.10.2017 at 10:26, Pixel Ding wrote:
>> From: pding 
>>
>> The polling memory was standalone in VRAM before, so the HDP flush
>> introduced latency that hides a VM fault issue. Now polling memory
>> leverages the WB in system memory and HDP flush is not required, the
>> VM fault at same page happens.
>>
>> Add the delay back as a workaround until the root cause is found.
>>
>> Tests: VP1 or launch 40 instances of glxinfo at the same time.
>>
>> Signed-off-by: pding 
>> ---
>>   drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c | 3 +++
>>   1 file changed, 3 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c 
>> b/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c
>> index b1de44f..5c4bbe1 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c
>> @@ -381,6 +381,9 @@ static void sdma_v3_0_ring_set_wptr(struct amdgpu_ring 
>> *ring)
>>  if (ring->use_doorbell) {
>>  /* XXX check if swapping is necessary on BE */
>>  adev->wb.wb[ring->wptr_offs] = lower_32_bits(ring->wptr) << 2;
>> +/* workaround: VM fault always happen at page 2046 */
>> +if (amdgpu_sriov_vf(adev))
>> +udelay(500);
>
>Have you tried using a memory barrier here?
>
>That looks like it will have massive impact on performance.
>
>Regards,
>Christian.
>
>>  WDOORBELL32(ring->doorbell_index, lower_32_bits(ring->wptr) << 
>> 2);
>>  } else {
>>  int me = (ring == >adev->sdma.instance[0].ring) ? 0 : 1;
>
>


Re: [PATCH 3/4] drm/amdgpu: report preemption fence via amdgpu_fence_info

2017-10-13 Thread Ding, Pixel
My understanding is that the CP will write the seqno back to the preempted fence
offset when preemption occurs. Then, if there is a value here, we can generally
tell which packet’s fence was preempted. Should the driver handle anything else
for this?







— 
Sincerely Yours,
Pixel







On 13/10/2017, 4:51 PM, "Koenig, Christian" <christian.koe...@amd.com> wrote:

>> Is it OK to limit this information only for SRIOV VF on Tonga and Vega whose 
>> format is known? It can help us identify if MCBP is working correctly or 
>> not.
>The question is where is this code for Tonga and Vega? I can't find a 
>reference to fence_offs in the GFX8 nor GFX9 code we have in 
>amd-staging-drm-next.
>
>And if it doesn't depend on the fence_offs in the ring printing it like 
>this is just pure speculation.
>
>Regards,
>Christian.
>
>On 13.10.2017 at 10:41, Ding, Pixel wrote:
>> Thanks Christian,
>>
>> I’m not sure if I get your point, but yes the preemption fence offset could 
>> be changed.
>>
>> Is it OK to limit this information only for SRIOV VF on Tonga and Vega whose 
>> format is known? It can help us identify if MCBP is working correctly or 
>> not.
>>
>>
>> —
>> Sincerely Yours,
>> Pixel
>>
>>
>>
>>
>>
>>
>>
>> On 13/10/2017, 4:34 PM, "Christian König" <ckoenig.leichtzumer...@gmail.com> 
>> wrote:
>>
>>> On 13.10.2017 at 10:26, Pixel Ding wrote:
>>>> From: pding <pixel.d...@amd.com>
>>>>
>>>> Only report fence for GFX ring. This can help checking MCBP feature.
>>>>
>>>> Signed-off-by: pding <pixel.d...@amd.com>
>>>> ---
>>>>drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 7 +++
>>>>1 file changed, 7 insertions(+)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c 
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>>> index 09d5a5c..2044758 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>>>> @@ -645,6 +645,13 @@ static int amdgpu_debugfs_fence_info(struct seq_file 
>>>> *m, void *data)
>>>>   atomic_read(&ring->fence_drv.last_seq));
>>>>    seq_printf(m, "Last emitted        0x%08x\n",
>>>>   ring->fence_drv.sync_seq);
>>>> +
>>>> +  if (ring->funcs->type != AMDGPU_RING_TYPE_GFX)
>>>> +  break;
>>> That should probably be "continue" instead of break, or otherwise you
>>> don't print the other fences any more.
>>>
>>>> +
>>>> +  seq_printf(m, "Last preempted  0x%08x\n",
>>>> + le32_to_cpu(*(ring->fence_drv.cpu_addr + 2)));
>>> Is the code to put the preemption fence there already upstream?
>>>
>>> If yes do we really do this like that for all supported generations?
>>>
>>> Regards,
>>> Christian.
>>>
>>>> +
>>>>}
>>>>return 0;
>>>>}
>>>
>


Re: [PATCH 3/4] drm/amdgpu: report preemption fence via amdgpu_fence_info

2017-10-13 Thread Ding, Pixel
Thanks Christian,

I’m not sure if I get your point, but yes the preemption fence offset could be 
changed.

Is it OK to limit this information only for SRIOV VF on Tonga and Vega whose 
format is known? It can help us identify if MCBP is working correctly or 
not.


— 
Sincerely Yours,
Pixel







On 13/10/2017, 4:34 PM, "Christian König"  
wrote:

>On 13.10.2017 at 10:26, Pixel Ding wrote:
>> From: pding 
>>
>> Only report fence for GFX ring. This can help checking MCBP feature.
>>
>> Signed-off-by: pding 
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 7 +++
>>   1 file changed, 7 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>> index 09d5a5c..2044758 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>> @@ -645,6 +645,13 @@ static int amdgpu_debugfs_fence_info(struct seq_file 
>> *m, void *data)
>> atomic_read(&ring->fence_drv.last_seq));
>>  seq_printf(m, "Last emitted        0x%08x\n",
>> ring->fence_drv.sync_seq);
>> +
>> +if (ring->funcs->type != AMDGPU_RING_TYPE_GFX)
>> +break;
>
>That should probably be "continue" instead of break, or otherwise you 
>don't print the other fences any more.
>
>> +
>> +seq_printf(m, "Last preempted  0x%08x\n",
>> +   le32_to_cpu(*(ring->fence_drv.cpu_addr + 2)));
>
>Is the code to put the preemption fence there already upstream?
>
>If yes do we really do this like that for all supported generations?
>
>Regards,
>Christian.
>
>> +
>>  }
>>  return 0;
>>   }
>
>


Re: [PATCH] drm/amdgpu: right shift 2 bits for SDMA_GFX_RB_WPTR_POLL_ADDR_LO v2

2017-09-25 Thread Ding, Pixel
Hi Alex,

We found that this statement introduces an issue.

In sdma_v3_0_ring_set_wptr():
WRITE_ONCE(*wb, (lower_32_bits(ring->wptr) << 2));

When we change it to a normal assignment, the issue is gone; likewise, if we add
some dumps here to make it slower, the issue is gone…

Any idea? Is there some reason to use this macro?

---
Sincerely Yours,
Pixel







On 25/09/2017, 3:12 PM, "Ding, Pixel" <pixel.d...@amd.com> wrote:

>Yes, it seems unrelated to the observed issue. The previous change simplifies
>the shift operations while the logic stays the same. Please ignore.
>— 
>Sincerely Yours,
>Pixel
>
>
>
>
>
>
>
>
>On 25/09/2017, 3:08 PM, "Christian König" <ckoenig.leichtzumer...@gmail.com> 
>wrote:
>
>>NAK, that doesn't look correct to me.
>>
>>> Both Tonga and Vega register SPECs indicate that this register only
>>> uses bits 31:2 in the DW.
>>This means that the value must be DW aligned and NOT that it needs to be 
>>shifted by 2!
>>
>>Regards,
>>Christian.
>>
>>On 25.09.2017 at 08:38, Pixel Ding wrote:
>>> Both Tonga and Vega register SPECs indicate that this register only
>>> uses bits 31:2 in the DW. The SRIOV test case immediately fails without this
>>> shift.
>>>
>>> v2: write to ADDR field
>>>
>>> Signed-off-by: Pixel Ding <pixel.d...@amd.com>
>>> ---
>>>   drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c | 9 +
>>>   drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 8 +---
>>>   2 files changed, 10 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c 
>>> b/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c
>>> index 72f31cc..8b83b96 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c
>>> @@ -643,7 +643,7 @@ static void sdma_v3_0_enable(struct amdgpu_device 
>>> *adev, bool enable)
>>>   static int sdma_v3_0_gfx_resume(struct amdgpu_device *adev)
>>>   {
>>> struct amdgpu_ring *ring;
>>> -   u32 rb_cntl, ib_cntl, wptr_poll_cntl;
>>> +   u32 rb_cntl, ib_cntl, wptr_poll_cntl, wptr_poll_addr_lo;
>>> u32 rb_bufsz;
>>> u32 wb_offset;
>>> u32 doorbell;
>>> @@ -712,9 +712,10 @@ static int sdma_v3_0_gfx_resume(struct amdgpu_device 
>>> *adev)
>>>   
>>> /* setup the wptr shadow polling */
>>> wptr_gpu_addr = adev->wb.gpu_addr + (ring->wptr_offs * 4);
>>> -
>>> -   WREG32(mmSDMA0_GFX_RB_WPTR_POLL_ADDR_LO + sdma_offsets[i],
>>> -  lower_32_bits(wptr_gpu_addr));
>>> +   wptr_poll_addr_lo = RREG32(mmSDMA0_GFX_RB_WPTR_POLL_ADDR_LO + 
>>> sdma_offsets[i]);
>>> +   wptr_poll_addr_lo = REG_SET_FIELD(wptr_poll_addr_lo, 
>>> SDMA0_GFX_RB_WPTR_POLL_ADDR_LO,
>>> + ADDR, 
>>> lower_32_bits(wptr_gpu_addr) >> 2);
>>> +   WREG32(mmSDMA0_GFX_RB_WPTR_POLL_ADDR_LO + sdma_offsets[i], 
>>> wptr_poll_addr_lo);
>>> WREG32(mmSDMA0_GFX_RB_WPTR_POLL_ADDR_HI + sdma_offsets[i],
>>>upper_32_bits(wptr_gpu_addr));
>>> wptr_poll_cntl = RREG32(mmSDMA0_GFX_RB_WPTR_POLL_CNTL + 
>>> sdma_offsets[i]);
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c 
>>> b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
>>> index c26d205..8b8338d 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
>>> @@ -574,7 +574,7 @@ static void sdma_v4_0_enable(struct amdgpu_device 
>>> *adev, bool enable)
>>>   static int sdma_v4_0_gfx_resume(struct amdgpu_device *adev)
>>>   {
>>> struct amdgpu_ring *ring;
>>> -   u32 rb_cntl, ib_cntl, wptr_poll_cntl;
>>> +   u32 rb_cntl, ib_cntl, wptr_poll_cntl, wptr_poll_addr_lo
>>> u32 rb_bufsz;
>>> u32 wb_offset;
>>> u32 doorbell;
>>> @@ -664,8 +664,10 @@ static int sdma_v4_0_gfx_resume(struct amdgpu_device 
>>> *adev)
>>>   
>>> /* setup the wptr shadow polling */
>>> wptr_gpu_addr = adev->wb.gpu_addr + (ring->wptr_offs * 4);
>>> -   WREG32(sdma_v4_0_get_reg_offset(i, 
>>> mmSDMA0_GFX_RB_WPTR_POLL_ADDR_LO),
>>> -  lower_32_bits(wptr_gpu_addr));
>>> +   wptr_poll_addr_lo = RREG32(sdma_v4_0_get_reg_offset(i, 
>>> mmSDMA0_GFX_RB_WPTR_POLL_ADDR_LO));
>>> +   wptr_poll_addr_lo = REG_SET_FIELD(wptr_poll_addr_lo, 
>>> SDMA0_GFX_RB_WPTR_POLL_ADDR_LO,
>>> + ADDR, 
>>> lower_32_bits(wptr_gpu_addr) >> 2);
>>> +   WREG32(sdma_v4_0_get_reg_offset(i, 
>>> mmSDMA0_GFX_RB_WPTR_POLL_ADDR_LO), wptr_poll_addr_lo);
>>> WREG32(sdma_v4_0_get_reg_offset(i, 
>>> mmSDMA0_GFX_RB_WPTR_POLL_ADDR_HI),
>>>upper_32_bits(wptr_gpu_addr));
>>> wptr_poll_cntl = RREG32(sdma_v4_0_get_reg_offset(i, 
>>> mmSDMA0_GFX_RB_WPTR_POLL_CNTL));
>>
>>


Re: [PATCH] drm/amdgpu: right shift 2 bits for SDMA_GFX_RB_WPTR_POLL_ADDR_LO

2017-09-25 Thread Ding, Pixel
Hi Monk,

The world switch fails immediately. According to Xiangliang’s comment, I think
17.50 also has this issue. Other G branches use the original patch from Frank,
which doesn’t have this issue. Please confirm.

Also refer to 
http://adcweb02.amd.com/orlvalid/regspec/web_regspec/vega11/regspec/vega11_chip/public/index.html

SDMA0_GFX_RB_WPTR_POLL_ADDR_LO
ADDR 31:2


I updated a v2 patch for this. I think it must fail if you really write the
address into bits 31:0.

— 
Sincerely Yours,
Pixel







On 25/09/2017, 2:37 PM, "Liu, Monk" <monk@amd.com> wrote:

>Hold on,
>
>We didn't hit a test failure without your patch; at least VEGA10 doesn't
>have the issue you mentioned.
>Can you elaborate on which issue or test case this patch fixes?
>Also, please don't change anything on Vega10 before you have verified it.
>
>BR Monk
>
>-Original Message-
>From: Pixel Ding [mailto:pixel.d...@amd.com] 
>Sent: Monday, September 25, 2017 2:16 PM
>To: amd-gfx@lists.freedesktop.org; Ding, Pixel <pixel.d...@amd.com>; Min, 
>Frank <frank@amd.com>; Liu, Monk <monk@amd.com>; Deucher, Alexander 
><alexander.deuc...@amd.com>
>Subject: [PATCH] drm/amdgpu: right shift 2 bits for 
>SDMA_GFX_RB_WPTR_POLL_ADDR_LO
>
>Both Tonga and Vega register SPECs indicate that this register only uses bits
>31:2 in the DW. The SRIOV test case immediately fails without this shift.
>
>Signed-off-by: Pixel Ding <pixel.d...@amd.com>
>---
> drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c | 2 +-  
> drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 2 +-
> 2 files changed, 2 insertions(+), 2 deletions(-)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c 
>b/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c
>index 72f31cc..947f019 100644
>--- a/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c
>+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c
>@@ -714,7 +714,7 @@ static int sdma_v3_0_gfx_resume(struct amdgpu_device *adev)
>   wptr_gpu_addr = adev->wb.gpu_addr + (ring->wptr_offs * 4);
> 
>   WREG32(mmSDMA0_GFX_RB_WPTR_POLL_ADDR_LO + sdma_offsets[i],
>- lower_32_bits(wptr_gpu_addr));
>+ lower_32_bits(wptr_gpu_addr) >> 2);
>   WREG32(mmSDMA0_GFX_RB_WPTR_POLL_ADDR_HI + sdma_offsets[i],
>  upper_32_bits(wptr_gpu_addr));
>   wptr_poll_cntl = RREG32(mmSDMA0_GFX_RB_WPTR_POLL_CNTL + 
> sdma_offsets[i]); diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c 
> b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
>index c26d205..26d7f03 100644
>--- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
>+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
>@@ -665,7 +665,7 @@ static int sdma_v4_0_gfx_resume(struct amdgpu_device *adev)
>   /* setup the wptr shadow polling */
>   wptr_gpu_addr = adev->wb.gpu_addr + (ring->wptr_offs * 4);
>   WREG32(sdma_v4_0_get_reg_offset(i, 
> mmSDMA0_GFX_RB_WPTR_POLL_ADDR_LO),
>- lower_32_bits(wptr_gpu_addr));
>+ lower_32_bits(wptr_gpu_addr) >> 2);
>   WREG32(sdma_v4_0_get_reg_offset(i, 
> mmSDMA0_GFX_RB_WPTR_POLL_ADDR_HI),
>  upper_32_bits(wptr_gpu_addr));
>   wptr_poll_cntl = RREG32(sdma_v4_0_get_reg_offset(i, 
> mmSDMA0_GFX_RB_WPTR_POLL_CNTL));
>--
>2.7.4
>


Re: [PATCH] drm/amdgpu: right shift 2 bits for SDMA_GFX_RB_WPTR_POLL_ADDR_LO v2

2017-09-25 Thread Ding, Pixel
Yes, it seems unrelated to the observed issue. The previous change simplifies
the shift operations while the logic stays the same. Please ignore.
— 
Sincerely Yours,
Pixel








On 25/09/2017, 3:08 PM, "Christian König"  
wrote:

>NAK, that doesn't look correct to me.
>
>> Both Tonga and Vega register SPECs indicate that this register only
>> uses bits 31:2 in the DW.
>This means that the value must be DW aligned and NOT that it needs to be 
>shifted by 2!
>
>Regards,
>Christian.
>
>On 25.09.2017 at 08:38, Pixel Ding wrote:
>> Both Tonga and Vega register SPECs indicate that this register only
>> uses bits 31:2 in the DW. The SRIOV test case immediately fails without this
>> shift.
>>
>> v2: write to ADDR field
>>
>> Signed-off-by: Pixel Ding 
>> ---
>>   drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c | 9 +
>>   drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 8 +---
>>   2 files changed, 10 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c 
>> b/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c
>> index 72f31cc..8b83b96 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c
>> @@ -643,7 +643,7 @@ static void sdma_v3_0_enable(struct amdgpu_device *adev, 
>> bool enable)
>>   static int sdma_v3_0_gfx_resume(struct amdgpu_device *adev)
>>   {
>>  struct amdgpu_ring *ring;
>> -u32 rb_cntl, ib_cntl, wptr_poll_cntl;
>> +u32 rb_cntl, ib_cntl, wptr_poll_cntl, wptr_poll_addr_lo;
>>  u32 rb_bufsz;
>>  u32 wb_offset;
>>  u32 doorbell;
>> @@ -712,9 +712,10 @@ static int sdma_v3_0_gfx_resume(struct amdgpu_device 
>> *adev)
>>   
>>  /* setup the wptr shadow polling */
>>  wptr_gpu_addr = adev->wb.gpu_addr + (ring->wptr_offs * 4);
>> -
>> -WREG32(mmSDMA0_GFX_RB_WPTR_POLL_ADDR_LO + sdma_offsets[i],
>> -   lower_32_bits(wptr_gpu_addr));
>> +wptr_poll_addr_lo = RREG32(mmSDMA0_GFX_RB_WPTR_POLL_ADDR_LO + 
>> sdma_offsets[i]);
>> +wptr_poll_addr_lo = REG_SET_FIELD(wptr_poll_addr_lo, 
>> SDMA0_GFX_RB_WPTR_POLL_ADDR_LO,
>> +  ADDR, 
>> lower_32_bits(wptr_gpu_addr) >> 2);
>> +WREG32(mmSDMA0_GFX_RB_WPTR_POLL_ADDR_LO + sdma_offsets[i], 
>> wptr_poll_addr_lo);
>>  WREG32(mmSDMA0_GFX_RB_WPTR_POLL_ADDR_HI + sdma_offsets[i],
>> upper_32_bits(wptr_gpu_addr));
>>  wptr_poll_cntl = RREG32(mmSDMA0_GFX_RB_WPTR_POLL_CNTL + 
>> sdma_offsets[i]);
>> diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c 
>> b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
>> index c26d205..8b8338d 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
>> @@ -574,7 +574,7 @@ static void sdma_v4_0_enable(struct amdgpu_device *adev, 
>> bool enable)
>>   static int sdma_v4_0_gfx_resume(struct amdgpu_device *adev)
>>   {
>>  struct amdgpu_ring *ring;
>> -u32 rb_cntl, ib_cntl, wptr_poll_cntl;
>> +u32 rb_cntl, ib_cntl, wptr_poll_cntl, wptr_poll_addr_lo;
>>  u32 rb_bufsz;
>>  u32 wb_offset;
>>  u32 doorbell;
>> @@ -664,8 +664,10 @@ static int sdma_v4_0_gfx_resume(struct amdgpu_device 
>> *adev)
>>   
>>  /* setup the wptr shadow polling */
>>  wptr_gpu_addr = adev->wb.gpu_addr + (ring->wptr_offs * 4);
>> -WREG32(sdma_v4_0_get_reg_offset(i, 
>> mmSDMA0_GFX_RB_WPTR_POLL_ADDR_LO),
>> -   lower_32_bits(wptr_gpu_addr));
>> +wptr_poll_addr_lo = RREG32(sdma_v4_0_get_reg_offset(i, 
>> mmSDMA0_GFX_RB_WPTR_POLL_ADDR_LO));
>> +wptr_poll_addr_lo = REG_SET_FIELD(wptr_poll_addr_lo, 
>> SDMA0_GFX_RB_WPTR_POLL_ADDR_LO,
>> +  ADDR, 
>> lower_32_bits(wptr_gpu_addr) >> 2);
>> +WREG32(sdma_v4_0_get_reg_offset(i, 
>> mmSDMA0_GFX_RB_WPTR_POLL_ADDR_LO), wptr_poll_addr_lo);
>>  WREG32(sdma_v4_0_get_reg_offset(i, 
>> mmSDMA0_GFX_RB_WPTR_POLL_ADDR_HI),
>> upper_32_bits(wptr_gpu_addr));
>>  wptr_poll_cntl = RREG32(sdma_v4_0_get_reg_offset(i, 
>> mmSDMA0_GFX_RB_WPTR_POLL_CNTL));
>
>
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu: right shift 2 bits for SDMA_GFX_RB_WPTR_POLL_ADDR_LO v2

2017-09-25 Thread Ding, Pixel
+Xiangliang,
— 
Sincerely Yours,
Pixel








On 25/09/2017, 2:38 PM, "Pixel Ding"  wrote:

>Both Tonga and Vega register SPECs indicate that this register only
>uses bits 31:2 of the DW. The SRIOV test case fails immediately without
>this shift.
>
>v2: write to ADDR field
>
>Signed-off-by: Pixel Ding 
>---
> drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c | 9 +
> drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 8 +---
> 2 files changed, 10 insertions(+), 7 deletions(-)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c 
>b/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c
>index 72f31cc..8b83b96 100644
>--- a/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c
>+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c
>@@ -643,7 +643,7 @@ static void sdma_v3_0_enable(struct amdgpu_device *adev, 
>bool enable)
> static int sdma_v3_0_gfx_resume(struct amdgpu_device *adev)
> {
>   struct amdgpu_ring *ring;
>-  u32 rb_cntl, ib_cntl, wptr_poll_cntl;
>+  u32 rb_cntl, ib_cntl, wptr_poll_cntl, wptr_poll_addr_lo;
>   u32 rb_bufsz;
>   u32 wb_offset;
>   u32 doorbell;
>@@ -712,9 +712,10 @@ static int sdma_v3_0_gfx_resume(struct amdgpu_device 
>*adev)
> 
>   /* setup the wptr shadow polling */
>   wptr_gpu_addr = adev->wb.gpu_addr + (ring->wptr_offs * 4);
>-
>-  WREG32(mmSDMA0_GFX_RB_WPTR_POLL_ADDR_LO + sdma_offsets[i],
>- lower_32_bits(wptr_gpu_addr));
>+  wptr_poll_addr_lo = RREG32(mmSDMA0_GFX_RB_WPTR_POLL_ADDR_LO + 
>sdma_offsets[i]);
>+  wptr_poll_addr_lo = REG_SET_FIELD(wptr_poll_addr_lo, 
>SDMA0_GFX_RB_WPTR_POLL_ADDR_LO,
>+ADDR, 
>lower_32_bits(wptr_gpu_addr) >> 2);
>+  WREG32(mmSDMA0_GFX_RB_WPTR_POLL_ADDR_LO + sdma_offsets[i], 
>wptr_poll_addr_lo);
>   WREG32(mmSDMA0_GFX_RB_WPTR_POLL_ADDR_HI + sdma_offsets[i],
>  upper_32_bits(wptr_gpu_addr));
>   wptr_poll_cntl = RREG32(mmSDMA0_GFX_RB_WPTR_POLL_CNTL + 
> sdma_offsets[i]);
>diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c 
>b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
>index c26d205..8b8338d 100644
>--- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
>+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
>@@ -574,7 +574,7 @@ static void sdma_v4_0_enable(struct amdgpu_device *adev, 
>bool enable)
> static int sdma_v4_0_gfx_resume(struct amdgpu_device *adev)
> {
>   struct amdgpu_ring *ring;
>-  u32 rb_cntl, ib_cntl, wptr_poll_cntl;
>+  u32 rb_cntl, ib_cntl, wptr_poll_cntl, wptr_poll_addr_lo;
>   u32 rb_bufsz;
>   u32 wb_offset;
>   u32 doorbell;
>@@ -664,8 +664,10 @@ static int sdma_v4_0_gfx_resume(struct amdgpu_device 
>*adev)
> 
>   /* setup the wptr shadow polling */
>   wptr_gpu_addr = adev->wb.gpu_addr + (ring->wptr_offs * 4);
>-  WREG32(sdma_v4_0_get_reg_offset(i, 
>mmSDMA0_GFX_RB_WPTR_POLL_ADDR_LO),
>- lower_32_bits(wptr_gpu_addr));
>+  wptr_poll_addr_lo = RREG32(sdma_v4_0_get_reg_offset(i, 
>mmSDMA0_GFX_RB_WPTR_POLL_ADDR_LO));
>+  wptr_poll_addr_lo = REG_SET_FIELD(wptr_poll_addr_lo, 
>SDMA0_GFX_RB_WPTR_POLL_ADDR_LO,
>+ADDR, 
>lower_32_bits(wptr_gpu_addr) >> 2);
>+  WREG32(sdma_v4_0_get_reg_offset(i, 
>mmSDMA0_GFX_RB_WPTR_POLL_ADDR_LO), wptr_poll_addr_lo);
>   WREG32(sdma_v4_0_get_reg_offset(i, 
> mmSDMA0_GFX_RB_WPTR_POLL_ADDR_HI),
>  upper_32_bits(wptr_gpu_addr));
>   wptr_poll_cntl = RREG32(sdma_v4_0_get_reg_offset(i, 
> mmSDMA0_GFX_RB_WPTR_POLL_CNTL));
>-- 
>2.7.4
>
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 2/6] drm/amdgpu: reset GDW, GWS and OA software copy to update them

2017-05-04 Thread Ding, Pixel
Hi Xiangliang,

Please verify as Monk requested.

— 
Sincerely Yours,
Pixel







On 04/05/2017, 4:22 PM, "Liu, Monk" <monk@amd.com> wrote:

>Like I said, you need to test it on amd-staging-4.9 branch
>
>I don't think this patch is necessary unless you prove it 
>
>-----Original Message-
>From: Ding, Pixel 
>Sent: Thursday, May 04, 2017 4:14 PM
>To: Liu, Monk <monk@amd.com>; Yu, Xiangliang <xiangliang...@amd.com>; 
>amd-gfx@lists.freedesktop.org
>Subject: Re: [PATCH 2/6] drm/amdgpu: reset GDW, GWS and OA software copy to 
>update them
>
>In theory this patch is mandatory. It fixes the VM fault issue after TDR on 
>Tonga. You can remove it when you rewrite TDR, but currently Tonga GPU reset 
>has a problem without this patch if MCBP is enabled. We need to flush GDS 
>after reset; it's bypassed if the SW state is not cleaned.
>
>— 
>Sincerely Yours,
>Pixel
>
>
>
>
>
>
>
>On 04/05/2017, 3:28 PM, "Liu, Monk" <monk@amd.com> wrote:
>
>>NAK, this patch is not needed currently
>>
>>1, Because the TDR feature is still in progress, all patches related to TDR 
>>should be pending for now.
>>2, I don't think this is needed. @Pixel, can you get the latest 
>>amd-staging-4.9 driver and try it, to see if the "vm fault" is duplicated 
>>without this patch? Because I remember no such issue in the staging driver.
>>
>>BR Monk
>>
>>-Original Message-
>>From: amd-gfx [mailto:amd-gfx-boun...@lists.freedesktop.org] On Behalf Of 
>>Xiangliang Yu
>>Sent: Thursday, May 04, 2017 2:34 PM
>>To: amd-gfx@lists.freedesktop.org
>>Cc: Ding, Pixel <pixel.d...@amd.com>; Yu, Xiangliang <xiangliang...@amd.com>
>>Subject: [PATCH 2/6] drm/amdgpu: reset GDW, GWS and OA software copy to 
>>update them
>>
>>From: Pixel Ding <pixel.d...@amd.com>
>>
>>Reset GDW, GWS and OA when SRIOV do reset.
>>
>>Signed-off-by: Pixel Ding <pixel.d...@amd.com>
>>Signed-off-by: Xiangliang Yu <xiangliang...@amd.com>
>>---
>> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 10 +-
>> 1 file changed, 9 insertions(+), 1 deletion(-)
>>
>>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
>>b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>index aef2019..f11241d 100644
>>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>>@@ -2538,7 +2538,7 @@ static int amdgpu_recover_vram_from_shadow(struct 
>>amdgpu_device *adev,
>>  */
>> int amdgpu_sriov_gpu_reset(struct amdgpu_device *adev, bool voluntary)  {
>>- int i, r = 0;
>>+ int i, j, r = 0;
>>  int resched;
>>  struct amdgpu_bo *bo, *tmp;
>>  struct amdgpu_ring *ring;
>>@@ -2616,6 +2616,14 @@ int amdgpu_sriov_gpu_reset(struct amdgpu_device *adev, 
>>bool voluntary)
>>  }
>>  fence_put(fence);
>> 
>>+ for (i = 0; i < AMDGPU_MAX_VMHUBS; i++) {
>>+ struct amdgpu_vm_id_manager *id_mgr =
>>+ &adev->vm_manager.id_mgr[i];
>>+
>>+ for (j = 1; j < id_mgr->num_ids; j++)
>>+ amdgpu_vm_reset_id(adev, i, j);
>>+ }
>>+
>>  for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
>>  struct amdgpu_ring *ring = adev->rings[i];
>>  if (!ring || !ring->sched.thread)
>>--
>>2.7.4
>>
>>___
>>amd-gfx mailing list
>>amd-gfx@lists.freedesktop.org
>>https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 2/6] drm/amdgpu: reset GDW, GWS and OA software copy to update them

2017-05-04 Thread Ding, Pixel
In theory this patch is mandatory. It fixes the VM fault issue after TDR on 
Tonga. You can remove it when you rewrite TDR, but currently Tonga GPU reset 
has a problem without this patch if MCBP is enabled. We need to flush GDS 
after reset; it's bypassed if the SW state is not cleaned.

— 
Sincerely Yours,
Pixel







On 04/05/2017, 3:28 PM, "Liu, Monk" <monk@amd.com> wrote:

>NAK, this patch is not needed currently
>
>1, Because the TDR feature is still in progress, all patches related to TDR 
>should be pending for now.
>2, I don't think this is needed. @Pixel, can you get the latest amd-staging-4.9 
>driver and try it, to see if the "vm fault" is duplicated without this patch? 
>Because I remember no such issue in the staging driver.
>
>BR Monk
>
>-Original Message-
>From: amd-gfx [mailto:amd-gfx-boun...@lists.freedesktop.org] On Behalf Of 
>Xiangliang Yu
>Sent: Thursday, May 04, 2017 2:34 PM
>To: amd-gfx@lists.freedesktop.org
>Cc: Ding, Pixel <pixel.d...@amd.com>; Yu, Xiangliang <xiangliang...@amd.com>
>Subject: [PATCH 2/6] drm/amdgpu: reset GDW, GWS and OA software copy to update 
>them
>
>From: Pixel Ding <pixel.d...@amd.com>
>
>Reset GDW, GWS and OA when SRIOV do reset.
>
>Signed-off-by: Pixel Ding <pixel.d...@amd.com>
>Signed-off-by: Xiangliang Yu <xiangliang...@amd.com>
>---
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 10 +-
> 1 file changed, 9 insertions(+), 1 deletion(-)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
>b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>index aef2019..f11241d 100644
>--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>@@ -2538,7 +2538,7 @@ static int amdgpu_recover_vram_from_shadow(struct 
>amdgpu_device *adev,
>  */
> int amdgpu_sriov_gpu_reset(struct amdgpu_device *adev, bool voluntary)  {
>-  int i, r = 0;
>+  int i, j, r = 0;
>   int resched;
>   struct amdgpu_bo *bo, *tmp;
>   struct amdgpu_ring *ring;
>@@ -2616,6 +2616,14 @@ int amdgpu_sriov_gpu_reset(struct amdgpu_device *adev, 
>bool voluntary)
>   }
>   fence_put(fence);
> 
>+  for (i = 0; i < AMDGPU_MAX_VMHUBS; i++) {
>+  struct amdgpu_vm_id_manager *id_mgr =
>+  &adev->vm_manager.id_mgr[i];
>+
>+  for (j = 1; j < id_mgr->num_ids; j++)
>+  amdgpu_vm_reset_id(adev, i, j);
>+  }
>+
>   for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
>   struct amdgpu_ring *ring = adev->rings[i];
>   if (!ring || !ring->sched.thread)
>--
>2.7.4
>
>___
>amd-gfx mailing list
>amd-gfx@lists.freedesktop.org
>https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu/virt: skip VM fault handler for VF (v2)

2017-02-06 Thread Ding, Pixel
Thank you, Jerry. You can find a more detailed discussion in the previous thread.
— 
Sincerely Yours,
Pixel








On 07/02/2017, 3:03 PM, "Zhang, Jerry" <jerry.zh...@amd.com> wrote:

>If that info is enough for vf.
>Reviewed-by: Junwei Zhang <jerry.zh...@amd.com>
>
>Regards,
>Jerry (Junwei Zhang)
>
>Linux Base Graphics
>SRDC Software Development
>_
>
>
>> -Original Message-
>> From: Pixel Ding [mailto:pixel.d...@amd.com]
>> Sent: Tuesday, February 07, 2017 14:07
>> To: Zhou, David(ChunMing); Koenig, Christian; Zhang, Jerry
>> Cc: amd-gfx@lists.freedesktop.org; Ding, Pixel
>> Subject: [PATCH] drm/amdgpu/virt: skip VM fault handler for VF (v2)
>> 
>> VF uses KIQ to access registers. When VM fault occurs, the driver can't get 
>> back
>> the fence of KIQ submission and runs into CPU soft lockup.
>> 
>> v2: print IV entry info
>> 
>> Signed-off-by: Pixel Ding <pixel.d...@amd.com>
>> ---
>>  drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c | 7 +++
>>  1 file changed, 7 insertions(+)
>> 
>> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
>> b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
>> index 7669b32..6502508 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
>> @@ -1237,6 +1237,13 @@ static int gmc_v8_0_process_interrupt(struct
>> amdgpu_device *adev,  {
>>  u32 addr, status, mc_client;
>> 
>> +if (amdgpu_sriov_vf(adev)) {
>> +dev_err(adev->dev, "GPU fault detected: %d 0x%08x\n",
>> +entry->src_id, entry->src_data);
>> +dev_err(adev->dev, " Can't decode VM fault info here on SRIOV
>> VF\n");
>> +return 0;
>> +}
>> +
>>  addr = RREG32(mmVM_CONTEXT1_PROTECTION_FAULT_ADDR);
>>  status = RREG32(mmVM_CONTEXT1_PROTECTION_FAULT_STATUS);
>>  mc_client =
>> RREG32(mmVM_CONTEXT1_PROTECTION_FAULT_MCCLIENT);
>> --
>> 2.7.4
>
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu/virt: skip VM fault handler for VF

2017-02-06 Thread Ding, Pixel
Good point.
— 
Sincerely Yours,
Pixel








On 07/02/2017, 1:54 PM, "Zhang, Jerry" <jerry.zh...@amd.com> wrote:

>Maybe we can get some useful info from the IV entry.
>
>Regards,
>Jerry (Junwei Zhang)
>
>Linux Base Graphics
>SRDC Software Development
>_
>
>
>> -Original Message-
>> From: amd-gfx [mailto:amd-gfx-boun...@lists.freedesktop.org] On Behalf Of
>> Pixel Ding
>> Sent: Tuesday, February 07, 2017 13:49
>> To: Zhou, David(ChunMing); Koenig, Christian
>> Cc: Ding, Pixel; amd-gfx@lists.freedesktop.org
>> Subject: [PATCH] drm/amdgpu/virt: skip VM fault handler for VF
>> 
>> VF uses KIQ to access registers. When VM fault occurs, the driver can't get 
>> back
>> the fence of KIQ submission and runs into CPU soft lockup.
>> 
>> Signed-off-by: Pixel Ding <pixel.d...@amd.com>
>> ---
>>  drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c | 5 +
>>  1 file changed, 5 insertions(+)
>> 
>> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
>> b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
>> index 7669b32..ff4fb37 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
>> @@ -1237,6 +1237,11 @@ static int gmc_v8_0_process_interrupt(struct
>> amdgpu_device *adev,  {
>>  u32 addr, status, mc_client;
>> 
>> +if (amdgpu_sriov_vf(adev)) {
>> +dev_err(adev->dev, "GPU fault detected on VF, can't access VM
>> registers\n");
>> +return 0;
>> +}
>> +
>>  addr = RREG32(mmVM_CONTEXT1_PROTECTION_FAULT_ADDR);
>>  status = RREG32(mmVM_CONTEXT1_PROTECTION_FAULT_STATUS);
>>  mc_client =
>> RREG32(mmVM_CONTEXT1_PROTECTION_FAULT_MCCLIENT);
>> --
>> 2.7.4
>> 
>> ___
>> amd-gfx mailing list
>> amd-gfx@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 2/2] drm/amdgpu/virt: schedule work to handle vm fault for VF

2017-02-06 Thread Ding, Pixel
Hi David/Christian,
Tested the proposed method, which prints a warning only; there's no problem. I 
will send another review because it's a totally different change.
— 
Sincerely Yours,
Pixel







On 06/02/2017, 4:56 PM, "Christian König" <deathsim...@vodafone.de> wrote:

>Hi Pixel,
>
>yeah agree with David here, but even busy waiting for the KIQ register 
>read is not really an option in the interrupt handler.
>
>Additional to that when we have a VM fault we usually see a mass storm 
>of them. So allocating and scheduling a work item for each fault like 
>you do here will certainly only result in a whole system crash.
>
>For now I would say just use DRM_ERROR() to print a warning that a VM 
>fault happened and we can't decode it because we can't access the registers 
>under SRIOV.
>
>Regards,
>Christian.
>
>Am 06.02.2017 um 09:36 schrieb zhoucm1:
>> Hi Pixel,
>> I got your point just now: your VF must use KIQ to read/write 
>> registers, which uses fence_wait to wait until the register read completes.
>> The alternative is to implement a new KIQ register read/write path 
>> that uses udelay instead of fence_wait when accessing 
>> registers in interrupt context.
>>
>> Regards,
>> David Zhou
>>
>> On 2017年02月06日 15:55, Ding, Pixel wrote:
>>> Thanks you for your comments, David. I totally agree on your point.
>>>
>>> However, the VM fault status registers record the latest VM fault 
>>> info no matter when they're changed; that's what we care about, since 
>>> we don't do much for VM faults even on a bare-metal system.
>>>
>>> On the other hand, what do you think if we insist on sleeping in the 
>>> interrupt context and let the driver run into a software bug?
>>>
>>> The VM registers are shared among all VFs and we don’t have a copy 
>>> for each in hardware, I think there's no other way to access them in 
>>> this case. Do you have any suggestion?
>>>
>>> —
>>> Sincerely Yours,
>>> Pixel
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> On 06/02/2017, 3:33 PM, "Zhou, David(ChunMing)" <david1.z...@amd.com> 
>>> wrote:
>>>
>>>> INIT_WORK(>base, gmc_v8_0_vm_fault_sched);
>>>> However VF is used or not, schedule work shouldn't handle registers 
>>>> reading for interrupt, especially for status register, which could 
>>>> have been changed when you handle it in schedule work after interrupt.
>>>>
>>>> Regards,
>>>> David Zhou
>>>>
>>>> -Original Message-
>>>> From: amd-gfx [mailto:amd-gfx-boun...@lists.freedesktop.org] On 
>>>> Behalf Of Pixel Ding
>>>> Sent: Monday, February 06, 2017 3:00 PM
>>>> To: amd-gfx@lists.freedesktop.org
>>>> Cc: Ding, Pixel <pixel.d...@amd.com>
>>>> Subject: [PATCH 2/2] drm/amdgpu/virt: schedule work to handle vm 
>>>> fault for VF
>>>>
>>>> VF uses KIQ to access registers, which invokes fence_wait until the 
>>>> access completes. When a VM fault occurs, the driver can't sleep in 
>>>> interrupt context.
>>>>
>>>> For some test cases, VM fault is 'legal' and shouldn't cause driver 
>>>> soft lockup.
>>>>
>>>> Signed-off-by: Pixel Ding <pixel.d...@amd.com>
>>>> ---
>>>> drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c | 46 
>>>> ---
>>>> 1 file changed, 43 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c 
>>>> b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
>>>> index 7669b32..75c913f 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
>>>> @@ -1231,9 +1231,9 @@ static int 
>>>> gmc_v8_0_vm_fault_interrupt_state(struct amdgpu_device *adev,
>>>> return 0;
>>>> }
>>>>
>>>> -static int gmc_v8_0_process_interrupt(struct amdgpu_device *adev,
>>>> -  struct amdgpu_irq_src *source,
>>>> -  struct amdgpu_iv_entry *entry)
>>>> +static int gmc_v8_0_process_vm_fault(struct amdgpu_device *adev,
>>>> + struct amdgpu_irq_src *source,
>>>> + struct amdgpu_iv_entry *entry)
>>>> {
>>>> u32 addr, status, mc_client;
>>>>
>&

Re: [PATCH 1/2] drm/amdgpu: don't clean the framebuffer for VF

2017-02-06 Thread Ding, Pixel
Hi Christian,

I think by “reuse” you mean loading KMS multiple times. Each time KMS loads, 
the guest driver tells the host to clear the FB during early init. I can’t see 
a case where fb_probe is invoked outside of KMS loading, is there?

Anyway I understand we don’t want to have many SRIOV conditional code paths. If 
I remove memset_io here and add GPU clear flag, should it be common logic or 
specific for VF?

— 
Sincerely Yours,
Pixel







On 06/02/2017, 5:17 PM, "Koenig, Christian" <christian.koe...@amd.com> wrote:

>Hi Pixel,
>
>you don't seem to understand the reason for the clear here.
>
>It is completely irrelevant that the host is clearing the memory for the 
>guest; the problem is that the guest reuses the memory it got assigned 
>from the host multiple times.
>
>IIRC we added this because you could see leftovers of the splash screen 
>in the text console when the resolution wasn't a multiple of the 
>character height.
>
>Regards,
>Christian.
>
>Am 06.02.2017 um 10:09 schrieb Ding, Pixel:
>> Hi Christian,
>>
>> The underlying host driver clears the VF’s framebuffer when the guest 
>> driver shakes hands with it; that is done before guest driver init. I think 
>> it’s unnecessary to clear the FB again, even with the GPU, for a VF.
>>
>> —
>> Sincerely Yours,
>> Pixel
>>
>>
>>
>>
>>
>> On 06/02/2017, 4:49 PM, "Koenig, Christian" <christian.koe...@amd.com> wrote:
>>
>>> Am 06.02.2017 um 07:24 schrieb Pixel Ding:
>>>> The SRIOV host driver cleans the framebuffer for each VF, so the guest
>>>> driver doesn't need this action, which costs much time on some
>>>> virtualization platforms; otherwise initialization might time out.
>>>>
>>>> Signed-off-by: Pixel Ding <pixel.d...@amd.com>
>>>> ---
>>>>drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c | 4 +++-
>>>>1 file changed, 3 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c 
>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c
>>>> index 1e735c4..f1eb4f5 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c
>>>> @@ -242,7 +242,9 @@ static int amdgpufb_create(struct drm_fb_helper 
>>>> *helper,
>>>>/* setup helper */
>>>>rfbdev->helper.fb = fb;
>>>>
>>>> -  memset_io(abo->kptr, 0x0, amdgpu_bo_size(abo));
>>>> +  if (!amdgpu_sriov_vf(adev)) {
>>>> +  memset_io(abo->kptr, 0x0, amdgpu_bo_size(abo));
>>>> +  }
>>> Nit pick only, but coding style says to not use "{" "}" in an if without
>>> else and only a single line of code.
>>>
>>> Additional to that, I'm not sure if that's a good idea. The memory
>>> allocated here might already be used and so we need to clear it no
>>> matter where it came from.
>>>
>>> It's probably easier to just set the AMDGPU_GEM_CREATE_VRAM_CLEARED in
>>> the call to amdgpu_gem_object_create(). This makes the GPU clear the
>>> memory before the first CPU access to it.
>>>
>>> Regards,
>>> Christian.
>>>
>>>>
>>>>strcpy(info->fix.id, "amdgpudrmfb");
>>>>
>>>
>
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH 1/2] drm/amdgpu: don't clean the framebuffer for VF

2017-02-06 Thread Ding, Pixel
Hi Christian,

The underlying host driver clears the VF’s framebuffer when the guest driver 
shakes hands with it; that is done before guest driver init. I think it’s 
unnecessary to clear the FB again, even with the GPU, for a VF.

— 
Sincerely Yours,
Pixel





On 06/02/2017, 4:49 PM, "Koenig, Christian"  wrote:

>Am 06.02.2017 um 07:24 schrieb Pixel Ding:
>> The SRIOV host driver cleans the framebuffer for each VF, so the guest
>> driver doesn't need this action, which costs much time on some
>> virtualization platforms; otherwise initialization might time out.
>>
>> Signed-off-by: Pixel Ding 
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c | 4 +++-
>>   1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c
>> index 1e735c4..f1eb4f5 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c
>> @@ -242,7 +242,9 @@ static int amdgpufb_create(struct drm_fb_helper *helper,
>>  /* setup helper */
>>  rfbdev->helper.fb = fb;
>>   
>> -memset_io(abo->kptr, 0x0, amdgpu_bo_size(abo));
>> +if (!amdgpu_sriov_vf(adev)) {
>> +memset_io(abo->kptr, 0x0, amdgpu_bo_size(abo));
>> +}
>
>Nit pick only, but coding style says to not use "{" "}" in an if without 
>else and only a single line of code.
>
>Additional to that, I'm not sure if that's a good idea. The memory 
>allocated here might already be used and so we need to clear it no 
>matter where it came from.
>
>It's probably easier to just set the AMDGPU_GEM_CREATE_VRAM_CLEARED in 
>the call to amdgpu_gem_object_create(). This makes the GPU clear the 
>memory before the first CPU access to it.
>
>Regards,
>Christian.
>
>>   
>>  strcpy(info->fix.id, "amdgpudrmfb");
>>   
>
>
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu/virt: increase mailbox timeout value

2017-01-19 Thread Ding, Pixel
Reviewed-by: Pixel Ding 


— 
Sincerely Yours,
Pixel







On 19/01/2017, 5:46 PM, "amd-gfx on behalf of Xiangliang Yu" 
 
wrote:

>If all VFs start at the same time, the GPU hypervisor needs more time
>to handle mailbox accesses. Set it to five seconds based on
>test experience.
>
>Signed-off-by: Xiangliang Yu 
>---
> drivers/gpu/drm/amd/amdgpu/mxgpu_vi.h | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
>diff --git a/drivers/gpu/drm/amd/amdgpu/mxgpu_vi.h 
>b/drivers/gpu/drm/amd/amdgpu/mxgpu_vi.h
>index fd6216e..2db7411 100644
>--- a/drivers/gpu/drm/amd/amdgpu/mxgpu_vi.h
>+++ b/drivers/gpu/drm/amd/amdgpu/mxgpu_vi.h
>@@ -23,7 +23,7 @@
> #ifndef __MXGPU_VI_H__
> #define __MXGPU_VI_H__
> 
>-#define VI_MAILBOX_TIMEDOUT   150
>+#define VI_MAILBOX_TIMEDOUT   5000
> #define VI_MAILBOX_RESET_TIME 12
> 
> /* VI mailbox messages request */
>-- 
>2.7.4
>
>___
>amd-gfx mailing list
>amd-gfx@lists.freedesktop.org
>https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu: check ring being ready before using

2017-01-18 Thread Ding, Pixel
Christian,

Thank you for the quick comments. The revision is coming soon.

— 
Sincerely Yours,
Pixel








On 18/01/2017, 9:26 PM, "Christian König" <deathsim...@vodafone.de> wrote:

>Am 18.01.2017 um 11:37 schrieb Pixel Ding:
>> From: Ding Pixel <pd...@amd.com>
>>
>> Return success when the ring is properly initialized, otherwise return
>> failure.
>>
>> Tonga SRIOV VF doesn't have UVD and VCE engines, so the initialization of
>> these IPs is bypassed. The system crashes if an application submits an IB
>> to their rings, which are not ready to use. It could be a common issue
>> whenever an IP with a ring buffer is disabled for some reason on a specific
>> ASIC, so the driver should check that the ring is ready to use.
>>
>> Bug: amdgpu_test crashes system on Tonga VF.
>
>Good catch, we probably have problem using the second VCE ring as well.
>
>>
>> Signed-off-by: Ding Pixel <pd...@amd.com>
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 6 ++
>>   1 file changed, 6 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>> index 08dd97b..0a235b9 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>> @@ -83,6 +83,12 @@ int amdgpu_cs_get_ring(struct amdgpu_device *adev, u32 
>> ip_type,
>>  }
>>  break;
>>  }
>> +
>> +if (!(*out_ring && (*out_ring)->ready)) {
>
>Don't check ring->ready here, that member is used during GPU reset to 
>test if we have successfully restarted the ring.
>
>Instead check if ring->adev is set, that is used in ring_init() to check 
>if a ring is initialized or not.
>
>> +DRM_ERROR("invalid ip type: %d\n", ip_type);
>
>Please adjust the error message to something like "ring %d not 
>initialized on IP %d\n".
>
>Regards,
>Christian.
>
>> +return -EINVAL;
>> +}
>> +
>>  return 0;
>>   }
>>   
>
>
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx