Re:Re: [PATCH v4 00/14] drm: Add a driver for CSF-based Mali GPUs
Hi Boris:

On 2024-02-05 17:03:21, "Boris Brezillon" wrote:
>+Danilo for the panthor gpuvm-needs update.
>
>On Sun, 4 Feb 2024 09:14:44 +0800 (CST)
>"Andy Yan" wrote:
>
>> Hi Boris:
>> I saw this warning sometimes (running on an Armbian-based Bookworm); not sure
>> whether it is a known issue or something else.
>> [15368.293031] systemd-journald[715]: Received client request to relinquish
>> /var/log/journal/1bc4a340506142af9bd31a6a3d2170ba access.
>> [37743.040737] [ cut here ]
>> [37743.040764] panthor fb00.gpu: drm_WARN_ON(shmem->pages_use_count)
>> [37743.040890] WARNING: CPU: 2 PID: 5702 at
>> drivers/gpu/drm/drm_gem_shmem_helper.c:158 drm_gem_shmem_free+0x144/0x14c
>> [drm_shmem_helper]
>> [37743.040929] Modules linked in: joydev rfkill sunrpc lz4hc lz4 zram
>> binfmt_misc hantro_vpu crct10dif_ce v4l2_vp9 v4l2_h264
>> snd_soc_simple_amplifier v4l2_mem2mem videobuf2_dma_contig
>> snd_soc_es8328_i2c videobuf2_memops rk_crypto2 snd_soc_es8328 videobuf2_v4l2
>> sm3_generic videodev crypto_engine sm3 rockchip_rng videobuf2_common
>> nvmem_rockchip_otp snd_soc_rockchip_i2s_tdm snd_soc_hdmi_codec
>> snd_soc_simple_card mc snd_soc_simple_card_utils snd_soc_core snd_compress
>> ac97_bus snd_pcm_dmaengine snd_pcm snd_timer snd soundcore dm_mod ip_tables
>> x_tables autofs4 dw_hdmi_qp_i2s_audio dw_hdmi_qp_cec rk808_regulator
>> rockchipdrm dw_mipi_dsi dw_hdmi_qp dw_hdmi analogix_dp drm_dma_helper
>> fusb302 display_connector rk8xx_spi drm_display_helper
>> phy_rockchip_snps_pcie3 phy_rockchip_samsung_hdptx_hdmi panthor tcpm
>> rk8xx_core cec drm_gpuvm gpu_sched drm_kms_helper drm_shmem_helper drm_exec
>> r8169 drm pwm_bl adc_keys
>> [37743.041108] CPU: 2 PID: 5702 Comm: kworker/u16:8 Not tainted
>> 6.8.0-rc1-edge-rockchip-rk3588 #2
>> [37743.041115] Hardware name: Rockchip RK3588 EVB1 V10 Board (DT)
>> [37743.041120] Workqueue: panthor-cleanup
>> panthor_vm_bind_job_cleanup_op_ctx_work [panthor]
>> [37743.041151] pstate: 6049 (nZCv daif +PAN -UAO -TCO -DIT -SSBS
>> BTYPE=--)
>> [37743.041157] pc : drm_gem_shmem_free+0x144/0x14c [drm_shmem_helper]
>> [37743.041169] lr : drm_gem_shmem_free+0x144/0x14c [drm_shmem_helper]
>> [37743.041181] sp : 80008d37bcc0
>> [37743.041184] x29: 80008d37bcc0 x28: 800081d379c0 x27:
>> 800081d37000
>> [37743.041196] x26: 00019909a280 x25: 00019909a2c0 x24:
>> 0001017a4c05
>> [37743.041206] x23: dead0100 x22: dead0122 x21:
>> 0001627ac1a0
>> [37743.041217] x20: x19: 0001627ac000 x18:
>>
>> [37743.041227] x17: 00040044 x16: 005000f2b5503510 x15:
>> fff91b77
>> [37743.041238] x14: 0001 x13: 03c5 x12:
>> ffea
>> [37743.041248] x11: dfff x10: dfff x9 :
>> 800081e0e818
>> [37743.041259] x8 : 0002ffe8 x7 : c000dfff x6 :
>> 000affa8
>> [37743.041269] x5 : 1fff x4 : x3 :
>> 8000819a6008
>> [37743.041279] x2 : x1 : x0 :
>> 00018465e900
>> [37743.041290] Call trace:
>> [37743.041293] drm_gem_shmem_free+0x144/0x14c [drm_shmem_helper]
>> [37743.041306] panthor_gem_free_object+0x24/0xa0 [panthor]
>> [37743.041321] drm_gem_object_free+0x1c/0x30 [drm]
>> [37743.041452] panthor_vm_bo_put+0xc4/0x12c [panthor]
>> [37743.041475] panthor_vm_cleanup_op_ctx.constprop.0+0xb0/0x104 [panthor]
>> [37743.041491] panthor_vm_bind_job_cleanup_op_ctx_work+0x28/0xd0 [panthor]
>
>Ok, I think I found the culprit: there's a race between
>the drm_gpuvm_bo_put() call in panthor_vm_bo_put() and the list
>iteration done by drm_gpuvm_prepare_objects(). Because we're not
>setting DRM_GPUVM_RESV_PROTECTED, the code goes through the 'lockless'
>iteration loop, and takes/releases a vm_bo ref at each iteration. This
>means our 'were we the last vm_bo user?' test in panthor_vm_bo_put()
>might return false even if we were actually the last user, and when
>for_each_vm_bo_in_list() releases the ref it acquired, it not only leaks
>the pin reference, thus leaving GEM pages pinned (which explains this
>WARN_ON() splat), but it also calls drm_gpuvm_bo_destroy() in a path
>where we don't hold the GPUVA list lock, which is bad.
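To make the race described above concrete, here is a minimal, single-threaded C model of the bad interleaving. It assumes a plain integer refcount in place of the real kref and a `pages_use_count` field in place of the shmem pin count; all names (`struct vm_bo`, `driver_vm_bo_put()`, `replay_lockless_race()`) are hypothetical, and this is not the drm_gpuvm or panthor code. It shows how a temporary reference held by the lockless iterator makes the driver's "last user?" check return false, so the final put comes from the iterator, which never releases the pin.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins; these are NOT the real drm_gem_shmem/drm_gpuvm types. */
struct gem {
	int pages_use_count; /* plays the role of shmem->pages_use_count */
};

struct vm_bo {
	int refcount;        /* plays the role of the drm_gpuvm_bo kref */
	struct gem *obj;     /* the vm_bo holds one pin on this GEM while it lives */
};

/* Drop one vm_bo reference; returns true if this call destroyed the vm_bo. */
static bool vm_bo_put(struct vm_bo *vm_bo)
{
	return --vm_bo->refcount == 0;
}

/* Driver-side put: only unpins if *this* put was the one that destroyed the vm_bo. */
static void driver_vm_bo_put(struct vm_bo *vm_bo)
{
	struct gem *obj = vm_bo->obj;

	if (vm_bo_put(vm_bo))
		obj->pages_use_count--;
}

/* GEM free path: mimics drm_WARN_ON(shmem->pages_use_count). */
static bool gem_free_warns(const struct gem *obj)
{
	if (obj->pages_use_count) {
		fprintf(stderr, "WARN_ON(pages_use_count == %d)\n", obj->pages_use_count);
		return true;
	}
	return false;
}

/* Replays the bad interleaving sequentially; returns the number of leaked pins. */
int replay_lockless_race(void)
{
	struct gem obj = { .pages_use_count = 1 }; /* pinned when the vm_bo was created */
	struct vm_bo vm_bo = { .refcount = 1, .obj = &obj };

	vm_bo.refcount++;          /* lockless iterator takes a temporary ref (1 -> 2) */
	driver_vm_bo_put(&vm_bo);  /* driver drops its ref (2 -> 1): "not last", no unpin */
	(void)vm_bo_put(&vm_bo);   /* iterator drops its ref (1 -> 0): destroyed, pin leaked;
				    * in the real driver this path also runs the destroy
				    * without the GPUVA list lock */

	assert(vm_bo.refcount == 0);
	(void)gem_free_warns(&obj);
	return obj.pages_use_count;
}
```

Calling `replay_lockless_race()` prints the simulated WARN and reports one leaked pin, matching the `drm_WARN_ON(shmem->pages_use_count)` splat in the log.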
>
>Long story short, I'll have to use DRM_GPUVM_RESV_PROTECTED, which is
>fine because I'm deferring vm_bo removal to a work where taking the VM
>resv lock is allowed. Since I was the one asking for this lockless
>iterator in the first place, I wonder if we should kill that and make
>DRM_GPUVM_RESV_PROTECTED the default (this would greatly simplify
>the code). AFAICT, the PowerVR driver shouldn't be impacted because it's
>using drm_gpuvm in synchronous mode only, and Xe already uses the
>resv-protected mode. That leaves Nouveau, but IIRC, it's also doing VM
>updates in the ioctl path.
>
>Danilo, any opinions?
>
>Andy, I pushed a new version to the panthor-next [1] and
>panthor-next+rk3588 [2] branches. The fix
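A sketch of why the resv-protected mode avoids this, assuming a single pthread mutex can stand in for the VM resv lock: when the list walk holds the lock it needs no per-entry references, and when the driver's put takes the same lock, its "last user" answer is reliable, so the pin is released on the right path. Names such as `vm_walk_objects()` and `replay_resv_protected()` are made up for illustration, and the model runs both steps sequentially rather than concurrently.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: one mutex plays the role of the VM's dma_resv lock. */
struct gem {
	int pages_use_count;
};

struct vm_bo {
	int refcount;
	struct gem *obj;
	struct vm_bo *next;   /* link in a VM-level list (e.g. external objects) */
};

struct vm {
	pthread_mutex_t resv; /* stand-in for the VM resv lock */
	struct vm_bo *list;
};

/* Protected walk: holding the lock keeps entries alive, so no per-entry ref is taken. */
static int vm_walk_objects(struct vm *vm)
{
	int n = 0;

	pthread_mutex_lock(&vm->resv);
	for (struct vm_bo *it = vm->list; it; it = it->next)
		n++;          /* e.g. real code would lock it->obj's reservation here */
	pthread_mutex_unlock(&vm->resv);
	return n;
}

/* Put under the same lock, so a concurrent walker cannot skew "were we last?". */
static void driver_vm_bo_put(struct vm *vm, struct vm_bo *vm_bo)
{
	struct gem *obj = vm_bo->obj; /* grab before the vm_bo may go away */
	bool last;

	pthread_mutex_lock(&vm->resv);
	last = --vm_bo->refcount == 0;
	if (last) {
		/* Unlink while the lock is held (vm_bo must be on the list). */
		struct vm_bo **pp = &vm->list;

		while (*pp != vm_bo)
			pp = &(*pp)->next;
		*pp = vm_bo->next;
	}
	pthread_mutex_unlock(&vm->resv);

	if (last)
		obj->pages_use_count--; /* release the pin the vm_bo was holding */
}

/* Returns the pin count left when the GEM would be freed (0 means no WARN). */
int replay_resv_protected(void)
{
	struct gem obj = { .pages_use_count = 1 };
	struct vm_bo vm_bo = { .refcount = 1, .obj = &obj, .next = NULL };
	struct vm vm = { .list = &vm_bo };
	int seen;

	pthread_mutex_init(&vm.resv, NULL);
	seen = vm_walk_objects(&vm);   /* walking no longer touches refcounts */
	driver_vm_bo_put(&vm, &vm_bo); /* 1 -> 0: we really are the last user, so we unpin */
	pthread_mutex_destroy(&vm.resv);

	assert(seen == 1 && vm.list == NULL);
	return obj.pages_use_count;
}
```

The trade-off the thread points at: the walker and the put path must both be allowed to take the VM resv lock, which is why deferring vm_bo removal to a work item makes this mode usable.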
Re:Re: [PATCH v4 00/14] drm: Add a driver for CSF-based Mali GPUs
Hi Boris:

On 2024-02-04 18:07:56, "Boris Brezillon" wrote:
>On Sun, 4 Feb 2024 09:14:44 +0800 (CST)
>"Andy Yan" wrote:
>
>> Hi Boris:
>> I saw this warning sometimes (running on an Armbian-based Bookworm); not sure
>> whether it is a known issue or something else.
>
>No, it's not, and I didn't manage to reproduce it locally. Looks like
>you're using a 6.8 kernel, but my panthor-v4/next branches are still
>based on drm-misc-next from 2 weeks ago, which was based on a 6.7
>kernel. Can you share the kernel branch you're using?
>

Here is my kernel branch:
https://github.com/andyshrk/linux/commits/linux-6.8-rc1-rk3588_panthor_v4/

>> [15368.293031] systemd-journald[715]: Received client request to relinquish
>> /var/log/journal/1bc4a340506142af9bd31a6a3d2170ba access.
>> [37743.040737] [ cut here ]
>> [37743.040764] panthor fb00.gpu: drm_WARN_ON(shmem->pages_use_count)
>> [37743.040890] WARNING: CPU: 2 PID: 5702 at
>> drivers/gpu/drm/drm_gem_shmem_helper.c:158 drm_gem_shmem_free+0x144/0x14c
>> [drm_shmem_helper]
>> [37743.040929] Modules linked in: joydev rfkill sunrpc lz4hc lz4 zram
>> binfmt_misc hantro_vpu crct10dif_ce v4l2_vp9 v4l2_h264
>> snd_soc_simple_amplifier v4l2_mem2mem videobuf2_dma_contig
>> snd_soc_es8328_i2c videobuf2_memops rk_crypto2 snd_soc_es8328 videobuf2_v4l2
>> sm3_generic videodev crypto_engine sm3 rockchip_rng videobuf2_common
>> nvmem_rockchip_otp snd_soc_rockchip_i2s_tdm snd_soc_hdmi_codec
>> snd_soc_simple_card mc snd_soc_simple_card_utils snd_soc_core snd_compress
>> ac97_bus snd_pcm_dmaengine snd_pcm snd_timer snd soundcore dm_mod ip_tables
>> x_tables autofs4 dw_hdmi_qp_i2s_audio dw_hdmi_qp_cec rk808_regulator
>> rockchipdrm dw_mipi_dsi dw_hdmi_qp dw_hdmi analogix_dp drm_dma_helper
>> fusb302 display_connector rk8xx_spi drm_display_helper
>> phy_rockchip_snps_pcie3 phy_rockchip_samsung_hdptx_hdmi panthor tcpm
>> rk8xx_core cec drm_gpuvm gpu_sched drm_kms_helper drm_shmem_helper drm_exec
>> r8169 drm pwm_bl adc_keys
>> [37743.041108] CPU: 2 PID: 5702 Comm: kworker/u16:8 Not tainted
>> 6.8.0-rc1-edge-rockchip-rk3588 #2
>> [37743.041115] Hardware name: Rockchip RK3588 EVB1 V10 Board (DT)
>> [37743.041120] Workqueue: panthor-cleanup
>> panthor_vm_bind_job_cleanup_op_ctx_work [panthor]
>> [37743.041151] pstate: 6049 (nZCv daif +PAN -UAO -TCO -DIT -SSBS
>> BTYPE=--)
>> [37743.041157] pc : drm_gem_shmem_free+0x144/0x14c [drm_shmem_helper]
>> [37743.041169] lr : drm_gem_shmem_free+0x144/0x14c [drm_shmem_helper]
>> [37743.041181] sp : 80008d37bcc0
>> [37743.041184] x29: 80008d37bcc0 x28: 800081d379c0 x27:
>> 800081d37000
>> [37743.041196] x26: 00019909a280 x25: 00019909a2c0 x24:
>> 0001017a4c05
>> [37743.041206] x23: dead0100 x22: dead0122 x21:
>> 0001627ac1a0
>> [37743.041217] x20: x19: 0001627ac000 x18:
>>
>> [37743.041227] x17: 00040044 x16: 005000f2b5503510 x15:
>> fff91b77
>> [37743.041238] x14: 0001 x13: 03c5 x12:
>> ffea
>> [37743.041248] x11: dfff x10: dfff x9 :
>> 800081e0e818
>> [37743.041259] x8 : 0002ffe8 x7 : c000dfff x6 :
>> 000affa8
>> [37743.041269] x5 : 1fff x4 : x3 :
>> 8000819a6008
>> [37743.041279] x2 : x1 : x0 :
>> 00018465e900
>> [37743.041290] Call trace:
>> [37743.041293] drm_gem_shmem_free+0x144/0x14c [drm_shmem_helper]
>> [37743.041306] panthor_gem_free_object+0x24/0xa0 [panthor]
>> [37743.041321] drm_gem_object_free+0x1c/0x30 [drm]
>> [37743.041452] panthor_vm_bo_put+0xc4/0x12c [panthor]
>
>I checked the _pin/_unpin calls in panthor, and they seem to be
>balanced (we take a ref when we allocate a gpuvm_bo and release it
>when the gpuvm_bo is gone). I wonder if something else is calling
>_pin_pages() or _get_pages() without holding a GEM ref...
>
>While investigating I found a double-cleanup in the code (see below)
>which explains why those memset(0) were required in
>panthor_vm_cleanup_op_ctx(), but I doubt it fixes your issue.
>
>--->8---
>diff --git a/drivers/gpu/drm/panthor/panthor_mmu.c
>b/drivers/gpu/drm/panthor/panthor_mmu.c
>index d3ce29cd0662..5606ab4d6289 100644
>--- a/drivers/gpu/drm/panthor/panthor_mmu.c
>+++ b/drivers/gpu/drm/panthor/panthor_mmu.c
>@@ -1085,17 +1085,12 @@ static void panthor_vm_cleanup_op_ctx(struct
>panthor_vm_op_ctx *op_ctx,
> 	}
> 
> 	kfree(op_ctx->rsvd_page_tables.pages);
>-	memset(&op_ctx->rsvd_page_tables, 0, sizeof(op_ctx->rsvd_page_tables));
> 
> 	if (op_ctx->map.vm_bo)
> 		panthor_vm_bo_put(op_ctx->map.vm_bo);
> 
>-	memset(&op_ctx->map, 0, sizeof(op_ctx->map));
>-
>-	for (u32 i = 0; i < ARRAY_SIZE(op_ctx->preallocated_vmas); i++) {
>+	for (u32 i = 0; i < ARRAY_SIZE(op_ctx->preallocated_vmas); i++)
> 		kfree(op_ctx->preallocated_vmas[i]);
>-
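Why the memset(0) calls were "required" can be shown with a deliberately tiny model. The struct and function names are hypothetical, not panthor's real structures, and a single refcount field stands in for `op_ctx->map.vm_bo`. Zeroing the context after the first cleanup turns an accidental second cleanup into a silent no-op, which hides the bug; without the memsets, the duplicate call shows up as a double put. In other words, the memsets can only be dropped once the double cleanup itself is gone.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical op context: one field standing in for op_ctx->map.vm_bo. */
struct op_ctx {
	int *vm_bo_refs;  /* points at the refcount of the vm_bo this op holds */
};

/* Pre-fix shape: drop the ref, then memset so a repeated call finds nothing to do. */
static void cleanup_with_memset(struct op_ctx *ctx)
{
	if (ctx->vm_bo_refs)
		(*ctx->vm_bo_refs)--;   /* plays the role of panthor_vm_bo_put() */
	memset(ctx, 0, sizeof(*ctx));
}

/* Post-fix shape: no memset, so it is correct only if cleanup runs exactly once. */
static void cleanup_without_memset(struct op_ctx *ctx)
{
	if (ctx->vm_bo_refs)
		(*ctx->vm_bo_refs)--;
}

/* Runs cleanup twice (the accidental double cleanup) and returns the final refcount. */
int refs_after_double_cleanup(bool use_memset)
{
	int refs = 1;
	struct op_ctx ctx = { .vm_bo_refs = &refs };
	void (*cleanup)(struct op_ctx *) =
		use_memset ? cleanup_with_memset : cleanup_without_memset;

	cleanup(&ctx);   /* the legitimate cleanup */
	cleanup(&ctx);   /* the duplicate call */
	return refs;
}
```

With the memset the refcount ends at 0 and the duplicate call goes unnoticed; without it the count underflows to -1, exposing the double put.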
Re:Re: [PATCH v4 00/14] drm: Add a driver for CSF-based Mali GPUs
Hi Boris:
I saw this warning sometimes (running on an Armbian-based Bookworm); not sure whether it is a known issue or something else.

[15368.293031] systemd-journald[715]: Received client request to relinquish /var/log/journal/1bc4a340506142af9bd31a6a3d2170ba access.
[37743.040737] [ cut here ]
[37743.040764] panthor fb00.gpu: drm_WARN_ON(shmem->pages_use_count)
[37743.040890] WARNING: CPU: 2 PID: 5702 at drivers/gpu/drm/drm_gem_shmem_helper.c:158 drm_gem_shmem_free+0x144/0x14c [drm_shmem_helper]
[37743.040929] Modules linked in: joydev rfkill sunrpc lz4hc lz4 zram binfmt_misc hantro_vpu crct10dif_ce v4l2_vp9 v4l2_h264 snd_soc_simple_amplifier v4l2_mem2mem videobuf2_dma_contig snd_soc_es8328_i2c videobuf2_memops rk_crypto2 snd_soc_es8328 videobuf2_v4l2 sm3_generic videodev crypto_engine sm3 rockchip_rng videobuf2_common nvmem_rockchip_otp snd_soc_rockchip_i2s_tdm snd_soc_hdmi_codec snd_soc_simple_card mc snd_soc_simple_card_utils snd_soc_core snd_compress ac97_bus snd_pcm_dmaengine snd_pcm snd_timer snd soundcore dm_mod ip_tables x_tables autofs4 dw_hdmi_qp_i2s_audio dw_hdmi_qp_cec rk808_regulator rockchipdrm dw_mipi_dsi dw_hdmi_qp dw_hdmi analogix_dp drm_dma_helper fusb302 display_connector rk8xx_spi drm_display_helper phy_rockchip_snps_pcie3 phy_rockchip_samsung_hdptx_hdmi panthor tcpm rk8xx_core cec drm_gpuvm gpu_sched drm_kms_helper drm_shmem_helper drm_exec r8169 drm pwm_bl adc_keys
[37743.041108] CPU: 2 PID: 5702 Comm: kworker/u16:8 Not tainted 6.8.0-rc1-edge-rockchip-rk3588 #2
[37743.041115] Hardware name: Rockchip RK3588 EVB1 V10 Board (DT)
[37743.041120] Workqueue: panthor-cleanup panthor_vm_bind_job_cleanup_op_ctx_work [panthor]
[37743.041151] pstate: 6049 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[37743.041157] pc : drm_gem_shmem_free+0x144/0x14c [drm_shmem_helper]
[37743.041169] lr : drm_gem_shmem_free+0x144/0x14c [drm_shmem_helper]
[37743.041181] sp : 80008d37bcc0
[37743.041184] x29: 80008d37bcc0 x28: 800081d379c0 x27: 800081d37000
[37743.041196] x26: 00019909a280 x25: 00019909a2c0 x24: 0001017a4c05
[37743.041206] x23: dead0100 x22: dead0122 x21: 0001627ac1a0
[37743.041217] x20: x19: 0001627ac000 x18:
[37743.041227] x17: 00040044 x16: 005000f2b5503510 x15: fff91b77
[37743.041238] x14: 0001 x13: 03c5 x12: ffea
[37743.041248] x11: dfff x10: dfff x9 : 800081e0e818
[37743.041259] x8 : 0002ffe8 x7 : c000dfff x6 : 000affa8
[37743.041269] x5 : 1fff x4 : x3 : 8000819a6008
[37743.041279] x2 : x1 : x0 : 00018465e900
[37743.041290] Call trace:
[37743.041293] drm_gem_shmem_free+0x144/0x14c [drm_shmem_helper]
[37743.041306] panthor_gem_free_object+0x24/0xa0 [panthor]
[37743.041321] drm_gem_object_free+0x1c/0x30 [drm]
[37743.041452] panthor_vm_bo_put+0xc4/0x12c [panthor]
[37743.041475] panthor_vm_cleanup_op_ctx.constprop.0+0xb0/0x104 [panthor]
[37743.041491] panthor_vm_bind_job_cleanup_op_ctx_work+0x28/0xd0 [panthor]
[37743.041507] process_one_work+0x15c/0x3a4
[37743.041526] worker_thread+0x32c/0x438
[37743.041536] kthread+0x108/0x10c
[37743.041546] ret_from_fork+0x10/0x20
[37743.041557] ---[ end trace ]---

rk3588@rk3588-evb1:~$ neofetch
OS: Armbian (24.2.0-trunk) aarch64
Host: Rockchip RK3588 EVB1 V10 Board
Kernel: 6.8.0-rc1-edge-rockchip-rk3588
Uptime: 13 hours, 28 mins
Packages: 1455 (dpkg)
Shell: bash 5.2.15
Resolution: 3840x2160
Terminal: /dev/pts/1
CPU: (8) @ 1.800GHz
Memory: 2062MiB / 7687MiB

On 2024-01-29 18:41:47, "Boris Brezillon" wrote:
>On Mon, 29 Jan 2024 17:20:47 +0800 (CST)
>"Andy Yan" wrote:
>
>> Hi Boris:
>>
>> Thanks for your great work.
>>
>> One thing to note:
>> commit (arm64: dts: rockchip: rk3588: Add GPU nodes) in [1] seems to remove
>> the "disabled" status
>> of usb_host2_xhci; this may cause a boot issue on some boards that use
>> combphy2_psu phy for other functions.
>
>Oops, should be fixed in
>https://gitlab.freedesktop.org/panfrost/linux/-/commits/panthor-next+rk3588
>now.
>
>Thanks,
>
>Boris