feel free to add
Reviewed-by: Jingwen Chen
On 2024/6/12 11:41, Jane Jian wrote:
> add jpeg table size to ctx table size rather than override it
>
> v2:
> save jpeg header info otherwise it will lose debug info
> Signed-off-by: Jane Jian
> ---
> drivers/gpu/drm/amd/amd
Acked-by: Jingwen Chen
On 2024/3/27 11:52, chongli2 wrote:
> support MES command SET_HW_RESOURCE1 in sriov
>
> Signed-off-by: chongli2
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h | 6 +++
> drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 5 +++
> drive
Acked-by: Jingwen Chen
On 2024/3/15 14:31, Lin.Cao wrote:
> pp_dpm_*clk should be set as read only for SRIOV one VF mode, remove
> S_IWUGO flag and _store function of these debugfs in one VF mode.
>
> Signed-off-by: Lin.Cao
> ---
> drivers/gpu/drm/amd/pm/amdgpu_pm.c | 10 ++
Reviewed-by: jingwen.ch...@amd.com
On 4/14/23 4:41 PM, Chong Li wrote:
> [WHY]
> Function "amdgpu_irq_update()" called by "amdgpu_device_ip_late_init()" is
> an atomic context.
> We shouldn't access registers through KIQ since "msleep()" may be called in
> "amdgpu_kiq_rreg()".
>
> [HOW]
>
Acked-by: Jingwen Chen
still need confirmation from Christian
On 9/1/22 5:29 PM, ZhenGuo Yin wrote:
> [Why]
> Ghost BO is released with non-empty bulk move object. There is a
> warning trace:
> WARNING: CPU: 19 PID: 1582 at ttm/ttm_bo.c:366 ttm_bo_release+0x2e1/0x2f0
> [amdtt
feel free to add
Reviewed-by: Jingwen Chen
On 7/14/22 10:31 AM, lin cao wrote:
> In the case of SRIOV, the register smnMp1_PMI_3_FIFO will get an invalid
> value which will cause the "shift out of bound". In Ubuntu22.04, this
> issue will be checked an related call tra
Hi Andrey,
Most of the patches are OK, but the code will introduce an IB test failure on
the disabled VCN of sienna_cichlid.
In the SRIOV use case we disable one VCN on sienna_cichlid. I have attached a
patch to fix this issue; please check the attachment.
Best Regards,
Jingwen Chen
On 2
istian
>> ; dan...@ffwll.ch
>> *Subject:* Re: [RFC v4 02/11] drm/amdgpu: Move scheduler init to after XGMI
>> is ready
>> No because all the patch-set including this patch was landed into
>> drm-misc-next and will reach amd-staging-drm-next on the next upstream
>> r
Hi Andrey,
Will you port this patch into amd-staging-drm-next?
On 2/10/22 2:06 AM, Andrey Grodzovsky wrote:
> All comments are fixed and code pushed. Thanks for everyone
> who helped reviewing.
>
> Andrey
>
> On 2022-02-09 02:53, Christian König wrote:
>> On 09.02.22 at 01:23, Andrey
Hi Andrey,
I have been testing your patch and it seems fine so far.
Best Regards,
Jingwen Chen
On 2022/2/3 2:57 AM, Andrey Grodzovsky wrote:
> Just another ping, with Shyun's help I was able to do some smoke testing on
> XGMI SRIOV system (booting and triggering hive reset)
> an
Hi Andrey,
I don't have any XGMI machines here; maybe you can reach out to Shaoyun for help.
On 2022/1/29 12:57 AM, Grodzovsky, Andrey wrote:
> Just a gentle ping.
>
> Andrey
>
:
call amdgpu_virt_init_data_exchange after gmc sw_init to make data
exchange workqueue run
v3:
clean up the code logic
v4:
add some comment and make the code more readable
Signed-off-by: Jingwen Chen
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
:
call amdgpu_virt_init_data_exchange after gmc sw_init to make data
exchange workqueue run
v3:
clean up the code logic
Signed-off-by: Jingwen Chen
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 12
2 files changed, 5 insertions(+), 9
:
call amdgpu_virt_init_data_exchange after gmc sw_init to make data
exchange workqueue run
Signed-off-by: Jingwen Chen
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 10 +++---
2 files changed, 4 insertions(+), 8 deletions(-)
diff --git
-off-by: Jingwen Chen
---
drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 10 +++---
1 file changed, 3 insertions(+), 7 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
index 89ab0032..0b887a49b604 100644
--- a/drivers/gpu/drm/amd
Hi Andrey,
Please go ahead and push your change. I will prepare the RFC later.
On 2022/1/8 12:02 AM, Andrey Grodzovsky wrote:
>
> On 2022-01-07 12:46 a.m., JingWen Chen wrote:
>> On 2022/1/7 11:57 AM, JingWen Chen wrote:
>>> On 2022/1/7 3:13 AM, Andrey Grodzovsky wrote:
>>
On 2022/1/7 11:57 AM, JingWen Chen wrote:
> On 2022/1/7 3:13 AM, Andrey Grodzovsky wrote:
>> On 2022-01-06 12:18 a.m., JingWen Chen wrote:
>>> On 2022/1/6 12:59 PM, JingWen Chen wrote:
>>>> On 2022/1/6 2:24 AM, Andrey Grodzovsky wrote:
>>>>> On 2022-01-0
On 2022/1/7 3:13 AM, Andrey Grodzovsky wrote:
>
> On 2022-01-06 12:18 a.m., JingWen Chen wrote:
>> On 2022/1/6 12:59 PM, JingWen Chen wrote:
>>> On 2022/1/6 2:24 AM, Andrey Grodzovsky wrote:
>>>> On 2022-01-05 2:59 a.m., Christian König wrote:
>>>>
On 2022/1/6 12:59 PM, JingWen Chen wrote:
> On 2022/1/6 2:24 AM, Andrey Grodzovsky wrote:
>> On 2022-01-05 2:59 a.m., Christian König wrote:
>>> On 05.01.22 at 08:34, JingWen Chen wrote:
>>>> On 2022/1/5 12:56 AM, Andrey Grodzovsky wrote:
>>>>> O
On 2022/1/6 2:24 AM, Andrey Grodzovsky wrote:
>
> On 2022-01-05 2:59 a.m., Christian König wrote:
>> On 05.01.22 at 08:34, JingWen Chen wrote:
>>> On 2022/1/5 12:56 AM, Andrey Grodzovsky wrote:
>>>> On 2022-01-04 6:36 a.m., Christian König wrote:
>>>
implementation in amdgpu to
>>> actually match the requirements.
>>>
>>> Could be that the reset sequence is questionable in general, but I doubt so
>>> at least for now.
>>>
>>> See the FLR request from the hypervisor is just another source of s
>
>> See the FLR request from the hypervisor is just another source of signaling
>> the need for a reset, similar to each job timeout on each queue. Otherwise
>> you have a race condition between the hypervisor and the scheduler.
>>
>> Properly setting in_gpu_reset
e_unlock_adev in flr_work instead of
try_lock since no one will conflict with this thread with reset_domain
introduced.
But we do need the reset_sem and adev->in_gpu_reset to keep the device untouched
by user space.
Best Regards,
Jingwen Chen
On 2022/1/3 6:17 PM, Christian König wrote
Reviewed-by: Jingwen Chen
On 2021/12/29 6:38 PM, James Yao wrote:
> [why]
> Malicious mailbox event1 fails driver loading on vega10.
> A dummy event6 prevents the driver from taking the response from malicious event1 as
> its own.
>
> [how]
> On vega10, send a mailbox event6
I agree with Shaoyun. If the host finds that the GPU engine hangs first and does
the FLR, guest-side threads may not know this and may still try to access the HW
(e.g. kfd uses amdgpu_in_reset and reset_sem in many places to identify the reset
status). This may lead to very bad results.
On 2021/12/24
[Why]
gmc bo will be pinned during loading amdgpu and reset in SRIOV while
only unpinned in unload amdgpu
[How]
add amdgpu_in_reset and sriov judgement to skip pin bo
v2: fix wrong judgement
Signed-off-by: Jingwen Chen
---
drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c | 4
drivers/gpu/drm/amd
[Why]
psp tmr bo will be pinned during loading amdgpu and reset in SRIOV while
only unpinned in unload amdgpu
[How]
add amdgpu_in_reset and sriov judgement to skip pin bo
v2: fix wrong judgement
Signed-off-by: Jingwen Chen
---
drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 4
1 file changed
[Why]
gmc bo will be pinned during loading amdgpu and reset in SRIOV while
only unpinned in unload amdgpu
[How]
add amdgpu_in_reset and sriov judgement to skip pin bo
Signed-off-by: Jingwen Chen
---
drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c | 4
drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 4
[Why]
psp tmr bo will be pinned during loading amdgpu and reset in SRIOV while
only unpinned in unload amdgpu
[How]
add amdgpu_in_reset and sriov judgement to skip pin bo
Signed-off-by: Jingwen Chen
---
drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 4
1 file changed, 4 insertions(+)
diff
patch abandoned
On 2021/12/14 11:52 AM, Jingwen Chen wrote:
> [Why]
> gmc bo will be pinned during loading amdgpu and reset in SRIOV while
> only unpinned in unload amdgpu
>
> [How]
> add amdgpu_in_reset and sriov judgement for pin bo in gart_enable
>
> Sig
[Why]
gmc bo will be pinned during loading amdgpu and reset in SRIOV while
only unpinned in unload amdgpu
[How]
add amdgpu_in_reset and sriov judgement for pin bo in gart_enable
Signed-off-by: Jingwen Chen
---
drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c | 8 +---
drivers/gpu/drm/amd/amdgpu
[Why]
psp tmr bo will be pinned during loading amdgpu and reset in SRIOV while
only unpinned in unload amdgpu
[How]
add amdgpu_in_reset and sriov judgement for psp_tmr_init
Signed-off-by: Jingwen Chen
---
drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 11 ++-
1 file changed, 6 insertions
Hi Bokun,
Please remove the Change-Id from your commit message when submitting this patch.
Acked-by: Jingwen Chen
On 2021/11/27 8:57 AM, Bokun Zhang wrote:
> From: Bokun Zhang
>
> In the patch about advanced TDR mode, we force to always set
> amdgpu_gpu_recovery=2 under SRIOV. This
On 2021/10/28 3:43 AM, Andrey Grodzovsky wrote:
>
> On 2021-10-25 10:57 p.m., JingWen Chen wrote:
>> On 2021/10/25 11:18 PM, Andrey Grodzovsky wrote:
>>> On 2021-10-24 10:56 p.m., JingWen Chen wrote:
>>>> On 2021/10/23 4:41 AM, Andrey Grodzovsky wrote:
>
On 2021/10/25 11:18 PM, Andrey Grodzovsky wrote:
>
> On 2021-10-24 10:56 p.m., JingWen Chen wrote:
>> On 2021/10/23 4:41 AM, Andrey Grodzovsky wrote:
>>> What do you mean by underflow in this case ? You mean use after free
>>> because of extra dma_fence_put() ?
On 2021/10/23 4:41 AM, Andrey Grodzovsky wrote:
>
> What do you mean by underflow in this case ? You mean use after free because
> of extra dma_fence_put() ?
yes
>
> On 2021-10-22 4:14 a.m., JingWen Chen wrote:
>> ping
>>
>> On 2021/10/22 AM11:33, Jingwen Chen wr
ping
On 2021/10/22 AM11:33, Jingwen Chen wrote:
> [Why]
> In advanced TDR mode, the real bad job will be resubmitted twice, while
> in drm_sched_resubmit_jobs_ext, there's a dma_fence_put, so the bad job
> is put one more time than other jobs.
>
> [How]
> Adding dma_fence_ge
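As a rough illustration of the imbalance described in the quoted [Why] above, the toy model below counts references on a fence. It is plain Python, not kernel code. The Fence class, the resubmit helper and the starting count of 2 are hypothetical; the only facts taken from the thread are that each resubmission drops one reference and that the bad job is resubmitted twice.

```python
class Fence:
    """Toy reference-counted fence; a stand-in, not the kernel's dma_fence."""

    def __init__(self, refcount):
        self.refcount = refcount

    def get(self):
        self.refcount += 1

    def put(self):
        if self.refcount == 0:
            raise RuntimeError("refcount underflow")
        self.refcount -= 1


def resubmit(fence, times, balance):
    """Resubmit a job `times` times; each resubmission drops one reference.

    With balance=True (an assumed fix), an extra reference is taken before
    every repeated resubmission, so the job ends with the same count as a
    job that was resubmitted only once.
    """
    for attempt in range(times):
        if balance and attempt > 0:
            fence.get()
        fence.put()
    return fence.refcount


normal_job = resubmit(Fence(2), times=1, balance=False)
bad_job_unbalanced = resubmit(Fence(2), times=2, balance=False)
bad_job_balanced = resubmit(Fence(2), times=2, balance=True)
```

In this sketch the unbalanced bad job ends one reference lower than a normal job, which is the kind of extra put that can later show up as a use after free.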
for normal jobs
Signed-off-by: Jingwen Chen
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4
1 file changed, 4 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 41ce86244144..975f069f6fe8 100644
--- a/drivers/gpu/drm/amd
Add dummy_page_addr to sriov msg for host driver to set
GCVM_L2_PROTECTION_DEFAULT_ADDR* registers correctly.
v2:
should update vf2pf msg instead
Signed-off-by: Jingwen Chen
---
drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c| 1 +
drivers/gpu/drm/amd/amdgpu/amdgv_sriovmsg.h | 3 ++-
2 files
Add dummy_page_addr to sriov msg for host driver to set
GCVM_L2_PROTECTION_DEFAULT_ADDR* registers correctly.
Signed-off-by: Jingwen Chen
---
drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c| 1 +
drivers/gpu/drm/amd/amdgpu/amdgv_sriovmsg.h | 4 +++-
2 files changed, 4 insertions(+), 1 deletion
deleted from pending list. While if we use the ordered
workqueue for timedout in the driver, there will be no bailing job.
Do you have any suggestions?
Best Regards,
JingWen Chen
On Mon Sep 06, 2021 at 02:36:52PM +0800, Liu, Monk wrote:
> [AMD Official Use Only]
>
> > I'm fearing that ju
On Wed Sep 01, 2021 at 12:28:59AM -0400, Andrey Grodzovsky wrote:
>
> On 2021-09-01 12:25 a.m., Jingwen Chen wrote:
> > On Wed Sep 01, 2021 at 12:04:47AM -0400, Andrey Grodzovsky wrote:
> > > I will answer everything here -
> > >
> > > O
On Wed Sep 01, 2021 at 12:04:47AM -0400, Andrey Grodzovsky wrote:
> I will answer everything here -
>
> On 2021-08-31 9:58 p.m., Liu, Monk wrote:
>
>
> [AMD Official Use Only]
>
>
>
> In the previous discussion, you guys stated that we should drop the
> “kthread_should_park”
Reviewed-by: Jingwen Chen
On Fri Aug 27, 2021 at 02:56:51PM +0800, YuBiao Wang wrote:
> Send response to host after received the flr notification from host.
> Port NV change to vega10.
>
> Signed-off-by: YuBiao Wang
> ---
> drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c | 2 ++
>
> -Original Message-
> > From: Daniel Vetter
> > Sent: Thursday, August 19, 2021 5:31 PM
> > To: Grodzovsky, Andrey
> > Cc: Daniel Vetter ; Alex Deucher ;
> > Chen, JingWen ; Mailing list - DRI developers
> > ; amd-gfx list
> > ; Liu, Monk ; Koenig,
>
revert this
commit.
This reverts commit 135517d3565b48f4def3b1b82008bc17eb5d1c90.
v2:
add dma_fence_get/put() around timedout_job to avoid concurrent delete
during processing timedout_job
v3:
park sched->thread instead during timedout_job.
Signed-off-by: Jingwen Chen
---
drivers/gpu/drm/schedu
Sorry, I just understood what you mean; I will submit a v2 patch.
On Wed Aug 18, 2021 at 04:08:37PM +0800, Jingwen Chen wrote:
> On Tue Aug 17, 2021 at 03:43:58PM +0200, Christian König wrote:
> >
> >
> > On 17.08.21 at 15:37, Andrey Grodzovsky wrote:
> > > On 2021-08-17
On Tue Aug 17, 2021 at 03:43:58PM +0200, Christian König wrote:
>
>
> On 17.08.21 at 15:37, Andrey Grodzovsky wrote:
> > On 2021-08-17 12:28 a.m., Jingwen Chen wrote:
> > > [Why]
> > > for bailing job, this commit will delete it from pending list thus the
revert this
commit.
This reverts commit 135517d3565b48f4def3b1b82008bc17eb5d1c90.
v2:
add dma_fence_get/put() around timedout_job to avoid concurrent delete
during processing timedout_job
Signed-off-by: Jingwen Chen
---
drivers/gpu/drm/scheduler/sched_main.c | 23 +--
1 file
revert this
commit.
This reverts commit 135517d3565b48f4def3b1b82008bc17eb5d1c90.
Signed-off-by: Jingwen Chen
---
drivers/gpu/drm/scheduler/sched_main.c | 27 --
1 file changed, 27 deletions(-)
diff --git a/drivers/gpu/drm/scheduler/sched_main.c
b/drivers/gpu/drm/scheduler
st 11, 2021 12:41 AM
> To: Chen, JingWen ; amd-gfx@lists.freedesktop.org
> Cc: Liu, Monk ; Koenig, Christian
> ; Jack Zhang ; Jack Zhang
>
> Subject: Re: [PATCH v4] drm/amd/amdgpu embed hw_fence into amdgpu_job
>
> Reviewed-by: Andrey Grodzovsky
>
> Andrey
>
:
add tdr sequence support for this feature. Add a job_run_counter to
indicate whether this job is a resubmit job.
v5
add missing handling in amdgpu_fence_enable_signaling
Signed-off-by: Jingwen Chen
Signed-off-by: Jack Zhang
Reviewed-by: Andrey Grodzovsky
---
drivers/gpu/drm/amd/amdgpu
Hi Andrey,
The latest patch [PATCH v4] drm/amd/amdgpu embed hw_fence into
amdgpu_job has been sent to amd-gfx. Can you help review this patch?
Best Regards,
Jingwen
On Tue Aug 10, 2021 at 10:51:17AM +0800, Jingwen Chen wrote:
> On Mon Aug 09, 2021 at 12:24:37PM -0400, Andrey Grodzovsky wr
On Mon Aug 09, 2021 at 12:24:37PM -0400, Andrey Grodzovsky wrote:
>
> On 2021-08-05 4:31 a.m., Jingwen Chen wrote:
> > [Why]
> > After embedding hw_fence into amdgpu_job, we need to add tdr support
> > for this feature.
> >
> > [How]
> > 1. Add a resubmi
:
add tdr sequence support for this feature. Add a job_run_counter to
indicate whether this job is a resubmit job.
Signed-off-by: Jingwen Chen
Signed-off-by: Jack Zhang
---
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 1 -
drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 2 +-
drivers/gpu/drm/amd
On Mon Aug 09, 2021 at 10:18:37AM +0800, Jingwen Chen wrote:
> On Fri Aug 06, 2021 at 11:48:04AM +0200, Christian König wrote:
> >
> >
> > On 06.08.21 at 07:52, Jingwen Chen wrote:
> > > On Thu Aug 05, 2021 at 05:13:22PM -0400, Andrey Grodzovsky wrote:
> >
On Fri Aug 06, 2021 at 11:48:04AM +0200, Christian König wrote:
>
>
> On 06.08.21 at 07:52, Jingwen Chen wrote:
> > On Thu Aug 05, 2021 at 05:13:22PM -0400, Andrey Grodzovsky wrote:
> > > On 2021-08-05 4:31 a.m., Jingwen Chen wrote:
> > > > From: Jack Zhang
Signed-off-by: Jingwen Chen
Signed-off-by: Jack Zhang
---
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 1 -
drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 2 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 62 -
drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c | 2 +-
drivers/gpu
On Thu Aug 05, 2021 at 05:13:22PM -0400, Andrey Grodzovsky wrote:
>
> On 2021-08-05 4:31 a.m., Jingwen Chen wrote:
> > From: Jack Zhang
> >
> > Why: Previously the hw fence was allocated separately from the job.
> > It caused historical lifetime issues and corner cases.
>
for guilty jobs.
v2:
use a job_run_counter in amdgpu_job to replace the resubmit_flag in
drm_sched_job. When the job_run_counter >= 1, it means this job is a
resubmit job.
Signed-off-by: Jack Zhang
Signed-off-by: Jingwen Chen
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 12 +++-
drivers/
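The v2 note above says that a job_run_counter value of 1 or more marks a resubmitted job. A minimal sketch of that check follows, in plain Python rather than kernel C. The Job class and the run_job helper are hypothetical names; only the counter name and the ">= 1" rule come from the commit message.

```python
from dataclasses import dataclass


@dataclass
class Job:
    """Toy job carrying only the counter named in the commit message."""
    job_run_counter: int = 0


def run_job(job):
    """Return True when this run is a resubmission, then count the run."""
    is_resubmit = job.job_run_counter >= 1
    job.job_run_counter += 1
    return is_resubmit


job = Job()
first_run_is_resubmit = run_job(job)
second_run_is_resubmit = run_job(job)
```

Keeping the counter in the driver's own job structure, rather than a flag in the shared scheduler job, is what the v2 note describes; the sketch only shows the counting rule itself.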
into amdgpu_job.
1. We cover the normal job submission by this method.
2. For ib_test, and submit without a parent job keep the
legacy way to create a hw fence separately.
v2:
use AMDGPU_FENCE_FLAG_EMBED_IN_JOB_BIT to show that the fence is
embedded in a job.
Signed-off-by: Jingwen Chen
Signed-off-by: Jack
On Fri Jul 23, 2021 at 10:45:49AM +0200, Christian König wrote:
> On 23.07.21 at 09:07, Jingwen Chen wrote:
> > [SNIP]
> > Hi Christian,
> >
> > The thing is vm flush fence has no job passed to amdgpu_fence_emit, so
> > go through the jobs cannot help
On Fri Jul 23, 2021 at 08:33:02AM +0200, Christian König wrote:
> On 22.07.21 at 18:47, Jingwen Chen wrote:
> > On Thu Jul 22, 2021 at 06:24:28PM +0200, Christian König wrote:
> > > On 22.07.21 at 16:45, Andrey Grodzovsky wrote:
> > > > On 2021-07-22
for guilty jobs.
Signed-off-by: Jack Zhang
Signed-off-by: Jingwen Chen
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 12 +++-
drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 13 +
drivers/gpu/drm/amd/amdgpu/amdgpu_job.c| 4 +++-
drivers/gpu/drm/scheduler/sched_main.c | 1
into amdgpu_job.
1. We cover the normal job submission by this method.
2. For ib_test, and submit without a parent job keep the
legacy way to create a hw fence separately.
Signed-off-by: Jingwen Chen
Signed-off-by: Jack Zhang
---
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 1 -
drivers/gpu/drm/amd/amdgpu
for guilty jobs.
Signed-off-by: Jack Zhang
Signed-off-by: Jingwen Chen
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 12 +++-
drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 13 +
drivers/gpu/drm/amd/amdgpu/amdgpu_job.c| 4 +++-
drivers/gpu/drm/scheduler/sched_main.c | 1
into amdgpu_job.
1. We cover the normal job submission by this method.
2. For ib_test, and submit without a parent job keep the
legacy way to create a hw fence separately.
Signed-off-by: Jingwen Chen
Signed-off-by: Jack Zhang
---
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 1 -
drivers/gpu/drm/amd/amdgpu
On Fri Jul 23, 2021 at 12:06:32AM -0400, Andrey Grodzovsky wrote:
>
> On 2021-07-22 8:20 p.m., Jingwen Chen wrote:
> > On Thu Jul 22, 2021 at 01:50:09PM -0400, Andrey Grodzovsky wrote:
> > > On 2021-07-22 1:27 p.m., Jingwen Chen wrote:
> > > > On Thu Jul 22,
On Thu Jul 22, 2021 at 01:50:09PM -0400, Andrey Grodzovsky wrote:
>
> On 2021-07-22 1:27 p.m., Jingwen Chen wrote:
> > On Thu Jul 22, 2021 at 01:17:13PM -0400, Andrey Grodzovsky wrote:
> > > On 2021-07-22 12:47 p.m., Jingwen Chen wrote:
> > > > On Thu Jul 22, 20
On Thu Jul 22, 2021 at 01:17:13PM -0400, Andrey Grodzovsky wrote:
>
> On 2021-07-22 12:47 p.m., Jingwen Chen wrote:
> > On Thu Jul 22, 2021 at 06:24:28PM +0200, Christian König wrote:
> > > On 22.07.21 at 16:45, Andrey Grodzovsky wrote:
> > > > On 2021-07
On Thu Jul 22, 2021 at 06:24:28PM +0200, Christian König wrote:
> On 22.07.21 at 16:45, Andrey Grodzovsky wrote:
> >
> > On 2021-07-22 6:45 a.m., Jingwen Chen wrote:
> > > On Wed Jul 21, 2021 at 12:53:51PM -0400, Andrey Grodzovsky wrote:
> > > > On 2021-
On Thu Jul 22, 2021 at 10:45:40AM -0400, Andrey Grodzovsky wrote:
>
> On 2021-07-22 6:45 a.m., Jingwen Chen wrote:
> > On Wed Jul 21, 2021 at 12:53:51PM -0400, Andrey Grodzovsky wrote:
> > > On 2021-07-20 11:13 p.m., Jingwen Chen wrote:
> > > > [Why]
> > >
On Wed Jul 21, 2021 at 12:53:51PM -0400, Andrey Grodzovsky wrote:
>
> On 2021-07-20 11:13 p.m., Jingwen Chen wrote:
> > [Why]
> > After embedding hw_fence into amdgpu_job, we need to add tdr support
> > for this feature.
> >
> > [How]
> > 1. Add a resubmi
into amdgpu_job.
1. We cover the normal job submission by this method.
2. For ib_test, and submit without a parent job keep the
legacy way to create a hw fence separately.
Signed-off-by: Jingwen Chen
Signed-off-by: Jack Zhang
---
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 1 -
drivers/gpu/drm/amd/amdgpu
for guilty jobs.
Signed-off-by: Jack Zhang
Signed-off-by: Jingwen Chen
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 12 +++-
drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 16 +++-
drivers/gpu/drm/amd/amdgpu/amdgpu_job.c| 4 +++-
drivers/gpu/drm/scheduler/sched_main.c | 1
, then the innocent sdma job will be set to guilty. This will lead
to a page fault after resubmitting job.
[How]
If the job is a kernel job, we will always consider it not guilty
Signed-off-by: Jingwen Chen
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 6 +++---
1 file changed, 3 insertions(+), 3
, then the innocent sdma job will be set to guilty. This will lead
to a page fault after resubmitting job.
[How]
If the job is a paging job, we will always consider it not guilty
Signed-off-by: Jingwen Chen
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 6 +++---
1 file changed, 3 insertions(+), 3
job will be set to guilty as it only
has NORMAL priority. This will lead to a page fault after
resubmitting job.
[How]
sdma should always have KERNEL priority. The kernel job will always
be resubmitted.
Signed-off-by: Jingwen Chen
---
drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 4 ++--
1 file
[Why]
If flr_work takes read_lock, then other threads that take
read_lock can access hardware while the host is doing vf flr.
[How]
flr_work should take write_lock to avoid this case.
Signed-off-by: Jingwen Chen
---
drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c | 4 ++--
drivers/gpu/drm/amd/amdgpu
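The difference between the two lock modes in the [Why]/[How] above can be shown with a toy reader/writer semaphore. This is plain Python, not the kernel's rw_semaphore. RWSem and other_thread_can_touch_hw are hypothetical names, and the model is non-blocking (it reports whether a lock attempt would succeed instead of sleeping).

```python
class RWSem:
    """Toy non-blocking reader/writer semaphore (illustrative only)."""

    def __init__(self):
        self.readers = 0
        self.writer = False

    def try_read(self):
        # Readers are admitted unless a writer holds the semaphore.
        if self.writer:
            return False
        self.readers += 1
        return True

    def try_write(self):
        # A writer needs the semaphore completely free.
        if self.writer or self.readers:
            return False
        self.writer = True
        return True


def other_thread_can_touch_hw(flr_takes_write_lock):
    """Model flr_work taking the semaphore first, then a second reader."""
    sem = RWSem()
    held = sem.try_write() if flr_takes_write_lock else sem.try_read()
    assert held
    # Another thread takes the read side before accessing hardware.
    return sem.try_read()


with_read_lock = other_thread_can_touch_hw(flr_takes_write_lock=False)
with_write_lock = other_thread_can_touch_hw(flr_takes_write_lock=True)
```

With the read side held by flr_work, the second reader is still admitted; with the write side held, it is turned away, which is the exclusion the patch description asks for.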
: Idee78e8c1c781463048f2f6311fdc70488ef05b2
Signed-off-by: Victor Zhao
Signed-off-by: Jingwen Chen
---
drivers/gpu/drm/amd/amdgpu/amdgpu_psp.h | 1 +
drivers/gpu/drm/amd/amdgpu/psp_v11_0.c | 3 ++-
drivers/gpu/drm/amd/amdgpu/psp_v3_1.c | 3 ++-
3 files changed, 5 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu
From: Victor Zhao
save psp ring wptr in SRIOV to avoid attacks and to avoid extra changes to
MP0_SMN_C2PMSG_102 reg
Change-Id: Idee78e8c1c781463048f2f6311fdc70488ef05b2
Signed-off-by: Victor Zhao
Signed-off-by: Jingwen Chen
---
drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 1 +
drivers/gpu/drm/amd
[Why]
the gem object rfb->base.obj[0] is get according to num_planes
in amdgpufb_create, but is not put according to num_planes
[How]
put rfb->base.obj[0] in amdgpu_fbdev_destroy according to num_planes
Signed-off-by: Jingwen Chen
---
drivers/gpu/drm/amd/amdgpu/amdgpu_fb.c | 3 +++
092ae120 R08: 7ffdfa3d6551 R09:
[324584.592188] R10: 7fea6f660c40 R11: 0206 R12:
55b9092ae188
[324584.592189] R13: 0001 R14: 55b9092ae188 R15:
7ffdfa3d8990
[324584.592190] ---[ end trace 4ea03bb6309ad6c3 ]---
Signed-off-by:
[Why]
When shutting down a guest VM in SRIOV mode, virt data
exchange is not finalized. After VRAM is lost, trying to write
VRAM could hang the CPU.
[How]
add fini virt data exchange in ip_suspend
Signed-off-by: Jingwen Chen
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 +++-
1 file changed, 3
Do fini data exchange every time on req_gpu_fini in SRIOV
Signed-off-by: Jingwen Chen
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 +---
drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 3 +++
2 files changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu
Move gpu_reset_counter after drm_sched_stop to avoid race
condition caused by job submitted between reset_count +1 and
drm_sched_stop.
Signed-off-by: Jingwen Chen
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm
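One way to picture the race described above is an event replay in plain Python. This is only a sketch under an assumption that is not stated in the message: that a job snapshots the reset counter when it is submitted and later treats "snapshot differs from current counter" as the signal that a reset happened. The function and event names are hypothetical.

```python
def jobs_seeing_reset(events):
    """Replay events and report, per accepted job, whether it sees the reset.

    events is a list of "inc_counter", "sched_stop" and "submit".
    A submit after sched_stop is not accepted in this model.
    """
    counter = 0
    scheduler_stopped = False
    snapshots = []
    for event in events:
        if event == "inc_counter":
            counter += 1
        elif event == "sched_stop":
            scheduler_stopped = True
        elif event == "submit" and not scheduler_stopped:
            snapshots.append(counter)
    return [snapshot != counter for snapshot in snapshots]


# Old order: the counter is bumped before the scheduler is stopped,
# so a job can slip in between with an already-bumped snapshot.
old_order = jobs_seeing_reset(["inc_counter", "submit", "sched_stop"])

# New order: stop the scheduler first, then bump the counter.
new_order = jobs_seeing_reset(["submit", "sched_stop", "inc_counter"])
```

In the old ordering the job that slipped in does not notice the reset; in the new ordering every job accepted before the stop does.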
[Why]
When a job is scheduled during TDR (after the device reset count
is increased and before drm_sched_stop), this job won't do vm_flush
when it is resubmitted after the GPU reset is done. This can lead to
a page fault.
[How]
Always do vm_flush for resubmit job.
Signed-off-by: Jingwen Chen
---
drivers/gpu/drm
Add a flag in drm_sched_job to indicate the job resubmit.
Signed-off-by: Jingwen Chen
---
drivers/gpu/drm/scheduler/sched_main.c | 2 ++
include/drm/gpu_scheduler.h| 2 ++
2 files changed, 4 insertions(+)
diff --git a/drivers/gpu/drm/scheduler/sched_main.c
b/drivers/gpu/drm
[Why]
When VRAM is lost in the guest, trying to write VRAM can lead to
the kernel getting stuck.
[How]
When the readback data is invalid, don't do write work, directly
reschedule a new work.
Signed-off-by: Jingwen Chen
---
drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 6 +-
1 file changed, 5 insertions
When using cancel_delayed_work_sync, there's no need
to flush_delayed_work first. This sequence can lead to
a redundant loop of work executing.
Signed-off-by: Jingwen Chen
---
drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu
The power profile switch in vcn needs to send a SetWorkLoad msg to
smu, which is not supported in sriov.
Signed-off-by: Jingwen Chen
---
drivers/gpu/drm/amd/pm/amdgpu_dpm.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
b/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
The power profile switch in vcn needs to send a SetWorkLoad msg to
smu, which is not supported in sriov.
Signed-off-by: Jingwen Chen
---
drivers/gpu/drm/amd/pm/amdgpu_dpm.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
b/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
smc, sdma, sos, ta and asd fw are not used in SRIOV. Skip them to
accelerate sw_init for navi12.
v2: skip above fw in SRIOV for vega10 and sienna_cichlid
v3: directly skip psp fw loading in SRIOV
Signed-off-by: Jingwen Chen
---
drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 10
smc fw is not needed in SRIOV, thus driver should not try to get smc
fw data.
Signed-off-by: Jingwen Chen
---
.../gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c| 61 ++-
1 file changed, 32 insertions(+), 29 deletions(-)
diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c
b
smc fw is not needed in SRIOV, thus driver should not try to get smc
fw data.
Signed-off-by: Jingwen Chen
---
.../gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c| 61 ++-
1 file changed, 32 insertions(+), 29 deletions(-)
diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c
b
smc, sdma, sos, ta and asd fw are not used in SRIOV. Skip them to
accelerate sw_init for navi12.
v2: skip above fw in SRIOV for vega10 and sienna_cichlid
Signed-off-by: Jingwen Chen
---
drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 9 +
drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c
smc fw is not needed in SRIOV, thus driver should not try to get smc
fw data.
Signed-off-by: Jingwen Chen
---
.../gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c| 61 ++-
1 file changed, 32 insertions(+), 29 deletions(-)
diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c
b
smc, sdma, sos and asd fw are not used in SRIOV. Skip them to
accelerate sw_init.
Signed-off-by: Jingwen Chen
---
drivers/gpu/drm/amd/amdgpu/psp_v11_0.c | 16 +---
drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c | 3 +++
drivers/gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c | 3
smu_post_init needs to enable the SMU feature, which requires
virtualization to be off. Skip it since this feature is not used in SRIOV.
v2: move the check to the early stage of smu_post_init.
v3: fix typo
Signed-off-by: Jingwen Chen
---
drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c | 3 +++
1
smu_post_init needs to enable the SMU feature, which requires
virtualization to be off. Skip it since this feature is not used in SRIOV.
v2: move the check to the early stage of smu_post_init.
Signed-off-by: Jingwen Chen
---
drivers/gpu/drm/amd/pm/swsmu/smu11/navi10_ppt.c | 3 +++
1 file changed, 3