Hi Rob,

On 07/09/2017 04:19 PM, Rob Clark wrote:
> Not entirely sure what triggers it, but with venus build as kernel
> module and in initrd, we hit this crash:

Is it happens occasionally or everytime in the initrd? And also with
your patch it will bail out on venus_probe, does it crash again on next
venus_probe?

> 
>   Unable to handle kernel paging request at virtual address ffff80003c039000
>   pgd = ffff00000a14f000
>   [ffff80003c039000] *pgd=00000000bd9f7003, *pud=00000000bd9f6003, 
> *pmd=00000000bd9f0003, *pte=0000000000000000
>   Internal error: Oops: 96000007 [#1] SMP
>   Modules linked in: qcom_wcnss_pil(E+) crc32_ce(E) qcom_common(E) 
> venus_core(E+) remoteproc(E) snd_soc_msm8916_digital(E) virtio_ring(E) 
> cdc_ether(E) snd_soc_lpass_apq8016(E) snd_soc_lpass_cpu(E) 
> snd_soc_apq8016_sbc(E) snd_soc_lpass_platform(E) v4l2_mem2mem(E) virtio(E) 
> snd_soc_core(E) ac97_bus(E) snd_pcm_dmaengine(E) snd_seq(E) leds_gpio(E) 
> videobuf2_v4l2(E) videobuf2_core(E) snd_seq_device(E) sndi_pcm(E) videodev(E) 
> media(E) nvmem_qfprom(E) msm(E) snd_timer(E) snd(E) soundcore(E) spi_qup(E) 
> mdt_loader(E) qcom_tsens(E) qcom_spmi_temp_alarm(E) nvmem_core(E) msm_rng(E) 
> uas(E) usb_storage(E) dm9601(E) usbnet(E) mii(E) mmc_block(E) adv7511(E) 
> drm_kms_helper(E) syscopyarea(E) sysfillrect(E) sysimgblt(E) fb_sys_fops(E) 
> qcom_spmi_vadc(E) qcom_vadc_common(PE) industrialio(E) pinctrl_spmi_mpp(E)
>    pinctrl_spmi_gpio(E) rtc_pm8xxx(E) clk_smd_rpm(E) sdhci_msm(E) 
> sdhci_pltfm(E) qcom_smd_regulator(E) drm(E) smd_rpm(E) qcom_spmi_pmic(E) 
> regmap_spmi(E) ci_hdrc_msm(E) ci_hdrc(E) usb3503(E) extcon_usb_gpio(E) 
> phy_msm_usb(E) udc_core(E) qcom_hwspinlock(E) extcon_core(E) ehci_msm(E) 
> i2c_qup(E) sdhci(E) mmc_core(E) spmi_pmic_arb(E) spmi(E) qcom_smd(E) smsm(E) 
> rpmsg_core(E) smp2p(E) smem(E) hwspinlock_core(E) gpio_keys(E)
>   CPU: 2 PID: 551 Comm: irq/150-venus Tainted: P            E   4.12.0+ #1625
>   Hardware name: qualcomm dragonboard410c/dragonboard410c, BIOS 
> 2017.07-rc2-00144-ga97bdbdf72-dirty 07/08/2017
>   task: ffff800037338000 task.stack: ffff800038e00000
>   PC is at hfi_sys_init_done+0x64/0x140 [venus_core]
>   LR is at hfi_process_msg_packet+0xcc/0x1e8 [venus_core]
>   pc : [<ffff00000118b384>] lr : [<ffff00000118c11c>] pstate: 20400145
>   sp : ffff800038e03c60
>   x29: ffff800038e03c60 x28: 0000000000000000
>   x27: 00000000000df018 x26: ffff00000118f4d0
>   x25: 0000000000020003 x24: ffff80003a8d3010
>   x23: ffff00000118f760 x22: ffff800037b40028
>   x21: ffff8000382981f0 x20: ffff800037b40028
>   x19: ffff80003c039000 x18: 0000000000000020
>   x17: 0000000000000000 x16: ffff800037338000
>   x15: ffffffffffffffff x14: 0000001000000014
>   x13: 0000000100001007 x12: 0000000100000020
>   x11: 0000100e00000000 x10: 0000000000000001
>   x9 : 0000000200000000 x8 : 0000001400000001
>   x7 : 0000000000001010 x6 : 0000000000000148
>   x5 : 0000000000001009 x4 : ffff80003c039000
>   x3 : 00000000cd770abb x2 : 0000000000000042
>   x1 : 0000000000000788 x0 : 0000000000000002
>   Process irq/150-venus (pid: 551, stack limit = 0xffff800038e00000)
>   Call trace:
>   [<ffff00000118b384>] hfi_sys_init_done+0x64/0x140 [venus_core]
>   [<ffff00000118c11c>] hfi_process_msg_packet+0xcc/0x1e8 [venus_core]
>   [<ffff00000118a2b4>] venus_isr_thread+0x1b4/0x208 [venus_core]
>   [<ffff00000118e750>] hfi_isr_thread+0x28/0x38 [venus_core]
>   [<ffff000008161550>] irq_thread_fn+0x30/0x70
>   [<ffff0000081617fc>] irq_thread+0x14c/0x1c8
>   [<ffff000008105e68>] kthread+0x138/0x140
>   [<ffff000008083590>] ret_from_fork+0x10/0x40
>   Code: 52820125 52820207 7a431820 54000249 (b9400263)
>   ---[ end trace c963460f20a984b6 ]---
> 
> The problem is that in the error case, we've incremented the data ptr
> but not decremented rem_bytes, and keep reading (presumably garbage)
> until eventually we go beyond the end of the buffer.
> 
> Instead, on first error, we should probably just bail out.  Other
> option is to increment read_bytes by sizeof(u32) before the switch,
> rather than only accounting for the ptype header in the non-error
> case.  Note that in this case it is HFI_ERR_SYS_INVALID_PARAMETER,
> ie. an unrecognized/unsupported parameter, so interpreting the next
> word as a property type would be bogus.  The other error cases are
> due to truncated buffer, so there isn't likely to be anything valid
> to interpret in the remainder of the buffer.  So just bailing seems
> like a reasonable solution.

I have a WIP patch which rewrite the message parsing, it would be nice
if I can reproduce this crash.

> 
> Signed-off-by: Rob Clark <robdcl...@gmail.com>
> ---
>  drivers/media/platform/qcom/venus/hfi_msgs.c | 11 ++++++-----
>  1 file changed, 6 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/media/platform/qcom/venus/hfi_msgs.c 
> b/drivers/media/platform/qcom/venus/hfi_msgs.c
> index debf80a92797..4190825b20a1 100644
> --- a/drivers/media/platform/qcom/venus/hfi_msgs.c
> +++ b/drivers/media/platform/qcom/venus/hfi_msgs.c
> @@ -239,11 +239,12 @@ static void hfi_sys_init_done(struct venus_core *core, 
> struct venus_inst *inst,
>                       break;
>               }
>  
> -             if (!error) {
> -                     rem_bytes -= read_bytes;
> -                     data += read_bytes;
> -                     num_properties--;
> -             }
> +             if (error)
> +                     break;
> +
> +             rem_bytes -= read_bytes;
> +             data += read_bytes;
> +             num_properties--;
>       }
>  
>  err_no_prop:
> 

-- 
regards,
Stan

Reply via email to