Hi,

On 29.01.2025 15:09, Rodrigo Vivi wrote:
> On Tue, Jan 28, 2025 at 08:54:10AM +0000, MARDI Youness wrote:
>> Hello,
>>
>> Could you help us on this issue:
>>
>> [1]https://github.com/intel/linux-intel-lts/issues/54
>>
Once you have enabled all VFs, try to capture and attach to [1] all SR-IOV
provisioning details; you may use something like:

 $ grep . -r /sys/class/drm/card0/iov

Also attach the full dmesg and the GuC log captured right after the failure.
For a larger GuC log buffer, please select CONFIG_DRM_I915_DEBUG_GUC
and use the modparam i915.guc_log_level=4

You can also try the following (once VFs are enabled, but before starting VMs):

 - set explicit "execution_quantum_ms" for PF and all VFs to 20
 - set explicit "preemption_timeout_us" for PF and all VFs to 20000
 - enable "engine_reset" policy

 $ echo 20 > /sys/class/drm/card0/iov/pf/gt0/execution_quantum_ms
 $ echo 20 > /sys/class/drm/card0/iov/vf1/gt0/execution_quantum_ms
 ...
 $ echo 1 > /sys/class/drm/card0/iov/pf/gt0/policies/engine_reset

>> Host environment
>>
>> Operating system: Gentoo Base System release 2.14
>> OS/kernel version:
>> [2]https://github.com/intel/linux-intel-lts/tree/lts-v6.6.34-linux-240626T131354Z
>
> https://github.com/intel/linux-intel-lts/blob/lts-v6.6.34-linux-240626T131354Z/drivers/gpu/drm/i915/README.sriov
>
> Michal, could you please help here?
>
> Thanks,
> Rodrigo.
>
>> Architecture: x86_64
>> QEMU flavor: qemu-system-x86_64
>> QEMU version: latest qemu (master branch)
>> CPU: 12th Gen Intel(R) Core(TM) i7-1270P
>> igpu: Alder Lake-P
>> firmware:
>> [3]https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/snapshot/linux-firmware-20241110.tar.gz
>>
>> Emulated/Virtualized environment
>>
>> Operating system: Windows 10 21H1
>>
>> Description of problem
>>
>> After setting up SR-IOV (kernel compilation, kernel cmdline, vfio-pci
>> driver attribution to the new pci..)
>> I've got my two new pci.
>>
>> 00:02.0 VGA compatible controller: Intel Corporation Alder Lake-P
>> Integrated Graphics Controller (rev 0c)
>> DeviceName: Onboard IGD
>> Subsystem: Hewlett-Packard Company Alder Lake-P Integrated Graphics
>> Controller
>> Kernel driver in use: i915
>>
>> 00:02.1 VGA compatible controller: Intel Corporation Alder Lake-P
>> Integrated Graphics Controller (rev 0c)
>> Subsystem: Hewlett-Packard Company Alder Lake-P Integrated Graphics
>> Controller
>> Kernel driver in use: vfio-pci
>>
>> 00:02.2 VGA compatible controller: Intel Corporation Alder Lake-P
>> Integrated Graphics Controller (rev 0c)
>> Subsystem: Hewlett-Packard Company Alder Lake-P Integrated Graphics
>> Controller
>> Kernel driver in use: vfio-pci
>>
>> I gave one of those pci to my VM with this qemu cmdline:
>>
>> -cpu
>> host,migratable=on,hv-time,hv-relaxed,hv-vapic,hv-spinlocks=0x1fff,hv-passthrough,hv-vendor-id=IrisXE
>> ...
>> -device
>> vfio-pci-nohotplug,host=0000:00:02.1,id=hostdev0,bus=pci.4,addr=0x0
>>
>> Sometimes it working properly when I start the qemu cmdline but most of
>> the time I've got those kernel errors and a GPU hang:
>>
>> kernel [ 2252.208134] i915 0000:00:02.0: [drm] ERROR GT0: GUC: TLB
>> invalidation response timed out for seqno 9679
>> kernel [ 2252.208134] i915 0000:00:02.0: [drm] ERROR GT0: GUC: TLB
>> invalidation response timed out for seqno 9679
>> kernel i915 0000:00:02.0: [drm] ERROR GT0: GUC: TLB invalidation
>> response timed out for seqno 9679
>> kernel i915 0000:00:02.0: [drm] ERROR GT0: GUC: TLB invalidation
>> response timed out for seqno 9679
>> ....
>> kernel Fence expiration time out
>> i915-0000:00:02.0:renderThread22381:6e0!
>> kernel i915 0000:00:02.0: [drm] GT0: GuC firmware i915/adlp_guc_70.bin
>> version 70.13.1
>> kernel i915 0000:00:02.0: [drm] GT0: HuC firmware i915/tgl_huc.bin
>> version 7.9.3
>> kernel i915 0000:00:02.0: [drm] GT0: HuC: authenticated for all
>> workloads
>> kernel i915 0000:00:02.0: [drm] GT0: GUC: submission enabled
>> kernel i915 0000:00:02.0: [drm] GT0: GUC: SLPC enabled
>> kernel [ 2730.991019] i915 0000:00:02.0: [drm] GPU HANG: ecode
>> 12:1:85dfbfff, in renderThread [22381]
>> kernel [ 2730.991084] i915 0000:00:02.0: [drm] renderThread22381
>> context reset due to GPU hang
>>
>> It mostly appears when Qemu is starting..
>> Any help would be appreciated, thanks a lot
>>
>> Best Regards,
>>
>> Youness MARDI
>>
>> C2 – Usage restreint
>>
>> References
>>
>> Visible links
>> 1. https://github.com/intel/linux-intel-lts/issues/54
>> 2. https://github.com/intel/linux-intel-lts/tree/lts-v6.6.34-linux-240626T131354Z
>> 3. https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/snapshot/linux-firmware-20241110.tar.gz
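
P.S. The provisioning steps suggested at the top of this mail (explicit execution quantum and preemption timeout for the PF and every VF, plus the engine_reset policy) could be scripted roughly as below. This is only a sketch under assumptions: apply_iov_tuning is a hypothetical helper name, the preemption_timeout_us attribute is assumed to sit next to execution_quantum_ms under gt0, and only gt0 is handled, as in the echo commands above. Please check the paths against your own "grep . -r /sys/class/drm/card0/iov" output before relying on it.

```shell
#!/bin/sh
# Sketch: apply the suggested scheduling parameters to the PF and all VFs.
# apply_iov_tuning is a hypothetical helper; the optional first argument
# overrides the sysfs root taken from the commands in this mail.
apply_iov_tuning() {
    iov_root="${1:-/sys/class/drm/card0/iov}"

    for fn in "$iov_root"/pf "$iov_root"/vf*; do
        # Skip unmatched globs (e.g. no VFs enabled yet).
        [ -d "$fn/gt0" ] || continue
        echo 20 > "$fn/gt0/execution_quantum_ms"
        # Assumption: this attribute lives alongside execution_quantum_ms.
        echo 20000 > "$fn/gt0/preemption_timeout_us"
    done

    # engine_reset is set once, on the PF, as in the command above.
    echo 1 > "$iov_root/pf/gt0/policies/engine_reset"
}
```

Run apply_iov_tuning as root once the VFs are enabled, but before starting any VM. Passing a directory as the first argument lets you dry-run it against a scratch tree instead of the real sysfs.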
