On Tue, 14 Jan 2020 19:02:41 +0300
yurij <lnk...@gmail.com> wrote:

> On 1/14/20 5:04 PM, Alex Williamson wrote:
> > On Tue, 14 Jan 2020 17:14:33 +1100
> > Alexey Kardashevskiy <a...@ozlabs.ru> wrote:
> >
> >> On 14/01/2020 03:28, Alex Williamson wrote:
> >>> On Mon, 13 Jan 2020 18:49:21 +0300
> >>> yurij <lnk...@gmail.com> wrote:
> >>>
> >>>> Hello everybody!
> >>>>
> >>>> I have a specific PCIe device (sorry, I can't say what it is or
> >>>> what it does) whose PCI configuration space contains 4 BARs
> >>>> (abbreviated lspci output):
> >>>>
> >>>> lspci -s 84:00.00 -vvv
> >>>>
> >>>> . . .
> >>>> Region 0: Memory at fa000000 (64-bit, non-prefetchable) [size=16M]
> >>>> Region 2: Memory at fb001000 (32-bit, non-prefetchable) [size=4K]
> >>>> Region 3: Memory at fb000000 (32-bit, non-prefetchable) [size=4K]
> >>>> Region 4: Memory at f9000000 (64-bit, non-prefetchable) [size=16M]
> >>>> . . .
> >>>> Kernel driver in use: vfio-pci
> >>>> . . .
> >>>>
> >>>> BAR0 is merged with BAR1, and BAR4 with BAR5, so they are 64 bits
> >>>> wide.
> >>>>
> >>>> I passed this PCIe device to a virtual machine via vfio:
> >>>>
> >>>> -device vfio-pci,host=84:00.0,id=hostdev0,bus=pci.6,addr=0x0
> >>>>
> >>>> The virtual machine boots successfully, and the PCI configuration
> >>>> space in the virtual environment looks OK (abbreviated lspci
> >>>> output):
> >>>>
> >>>> lspci -s 06:00.0 -vvv
> >>>>
> >>>> . . .
> >>>> Region 0: Memory at f8000000 (64-bit, non-prefetchable) [size=16M]
> >>>> Region 2: Memory at fa000000 (32-bit, non-prefetchable) [size=4K]
> >>>> Region 3: Memory at fa001000 (32-bit, non-prefetchable) [size=4K]
> >>>> Region 4: Memory at f9000000 (64-bit, non-prefetchable) [size=16M]
> >>>> . . .
> >>>> Kernel driver in use: custom_driver
> >>>>
> >>>> BAR0 is again merged with BAR1 and BAR4 with BAR5, so they are
> >>>> also 64 bits wide.
> >>>>
> >>>> The main problem is a 4K hole in Region 0 in the virtual
> >>>> environment, so some device features don't work.
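[As an aside: the 64-bit merging yurij describes follows from the low bits of the BAR register. A minimal sketch of the decoding, with the register layout per the PCI spec and the sample value taken from the lspci output above; the function name and return shape are illustrative, not from any real tool:]

```python
# Sketch: decode the low dword of a PCI BAR register to see why BAR0/BAR1
# (and BAR4/BAR5) merge into single 64-bit regions. Bit 0 selects I/O vs
# memory space; for memory BARs, bits 2:1 give the type (0b00 = 32-bit,
# 0b10 = 64-bit, consuming the next BAR slot too) and bit 3 = prefetchable.
def decode_bar(low32):
    if low32 & 0x1:                    # bit 0 set: I/O space BAR
        return {"space": "io", "base": low32 & ~0x3}
    bar_type = (low32 >> 1) & 0x3      # bits 2:1: memory BAR type
    return {
        "space": "mem",
        "is_64bit": bar_type == 0b10,  # upper 32 bits live in the next BAR
        "prefetchable": bool(low32 & 0x8),
        "base": low32 & ~0xF,
    }

# BAR0 at 0xfa000000, 64-bit, non-prefetchable => low dword 0xfa000004
print(decode_bar(0xfa000004))
```
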
> >>>>
> >>>> I enabled iommu tracing on the host system (trace_event=iommu) and
> >>>> turned on all events (for i in $(find
> >>>> /sys/kernel/debug/tracing/events/iommu/ -name enable); do echo 1 > $i;
> >>>> done). I saw the following events while the virtual machine booted:
> >>>>
> >>>> # cat /sys/kernel/debug/tracing/trace
> >>>> . . .
> >>>> CPU 0/KVM-3046 [051] .... 63113.338894: map: IOMMU:
> >>>> iova=0x00000000f8000000 paddr=0x00000000fa000000 size=24576
> >>>> CPU 0/KVM-3046 [051] .... 63113.339177: map: IOMMU:
> >>>> iova=0x00000000f8007000 paddr=0x00000000fa007000 size=16748544
> >>>> CPU 0/KVM-3046 [051] .... 63113.339444: map: IOMMU:
> >>>> iova=0x00000000fa000000 paddr=0x00000000fb001000 size=4096
> >>>> CPU 0/KVM-3046 [051] .... 63113.339697: map: IOMMU:
> >>>> iova=0x00000000fa001000 paddr=0x00000000fb000000 size=4096
> >>>> CPU 0/KVM-3046 [051] .... 63113.340209: map: IOMMU:
> >>>> iova=0x00000000f9000000 paddr=0x00000000f9000000 size=16777216
> >>>> . . .
> >>>>
> >>>> I also enabled qemu tracing (-trace events=/root/qemu/trace_events).
> >>>> The trace file contains the following functions:
> >>>> vfio_region_mmap
> >>>> vfio_get_dev_region
> >>>> vfio_pci_size_rom
> >>>> vfio_pci_read_config
> >>>> vfio_pci_write_config
> >>>> vfio_iommu_map_notify
> >>>> vfio_listener_region_add_iommu
> >>>> vfio_listener_region_add_ram
> >>>>
> >>>> Some important excerpts from the qemu trace:
> >>>> . . .
> >>>> Jan 13 18:17:24 VM qemu-system-x86_64[7131]: vfio_region_mmap Region
> >>>> 0000:84:00.0 BAR 0 mmaps[0] [0x0 - 0xffffff]
> >>>> Jan 13 18:17:24 VM qemu-system-x86_64[7131]: vfio_region_mmap Region
> >>>> 0000:84:00.0 BAR 2 mmaps[0] [0x0 - 0xfff]
> >>>> Jan 13 18:17:24 VM qemu-system-x86_64[7131]: vfio_region_mmap Region
> >>>> 0000:84:00.0 BAR 3 mmaps[0] [0x0 - 0xfff]
> >>>> Jan 13 18:17:24 VM qemu-system-x86_64[7131]: vfio_region_mmap Region
> >>>> 0000:84:00.0 BAR 4 mmaps[0] [0x0 - 0xffffff]
> >>>> . . .
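[The two BAR0 mappings in the IOMMU trace above make the hole computable: the first mapping covers 24576 bytes (0x6000) from iova 0xf8000000, and the next one resumes at 0xf8007000. An illustrative sketch, using only the numbers from the trace:]

```python
# Sketch: find gaps between consecutive IOMMU mappings. The two BAR0
# mappings from the trace above leave exactly one 4 KiB hole at guest
# physical 0xf8006000, i.e. offset 0x6000 into the 16M BAR.
def find_holes(mappings):
    """mappings: list of (iova, size) tuples, sorted by iova."""
    holes = []
    for (iova, size), (next_iova, _) in zip(mappings, mappings[1:]):
        end = iova + size
        if next_iova > end:
            holes.append((end, next_iova - end))
    return holes

bar0_maps = [(0xf8000000, 24576), (0xf8007000, 16748544)]
print([(hex(a), hex(s)) for a, s in find_holes(bar0_maps)])
```
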
> >>>> Jan 13 18:17:37 VM qemu-system-x86_64[7131]:
> >>>> vfio_listener_region_add_ram region_add [ram] 0xf8000000 - 0xf8005fff
> >>>> [0x7f691e800000]
> >>>> Jan 13 18:17:37 VM qemu-system-x86_64[7131]:
> >>>> vfio_listener_region_add_ram region_add [ram] 0xf8007000 - 0xf8ffffff
> >>>> [0x7f691e807000]
> >>>> Jan 13 18:17:37 VM qemu-system-x86_64[7131]:
> >>>> vfio_listener_region_add_ram region_add [ram] 0xfa000000 - 0xfa000fff
> >>>> [0x7f6b5de37000]
> >>>> Jan 13 18:17:37 VM qemu-system-x86_64[7131]:
> >>>> vfio_listener_region_add_ram region_add [ram] 0xfa001000 - 0xfa001fff
> >>>> [0x7f6b58004000]
> >>>> Jan 13 18:17:37 VM qemu-system-x86_64[7131]:
> >>>> vfio_listener_region_add_ram region_add [ram] 0xf9000000 - 0xf9ffffff
> >>>> [0x7f691d800000]
> >>>>
> >>>> I use qemu 4.0.0, which I rebuilt with tracing support
> >>>> (--enable-trace-backends=syslog).
> >>>>
> >>>> Please help me solve this issue. Thank you!
> >>>
> >>> Something has probably created a QEMU MemoryRegion overlapping the
> >>> BAR; we do this for quirks where we want to intercept a range of
> >>> MMIO for emulation, but the offset 0x6000 into BAR0 doesn't sound
> >>> familiar to me. Run the VM with a monitor and see if 'info mtree'
> >>> provides any info on the handling of that overlap. Thanks,
> >>
> >>
> >> Could it be an MSI-X region? 'info mtree -f' should tell exactly
> >> what is going on.
> >
> > Oh, good call, that's probably it. The PCI spec specifically
> > recommends against placing non-MSI-X-related registers within the same
> > 4K page as the vector table to avoid such things:
> >
> > If a Base Address register that maps address space for the MSI-X Table
> > or MSI-X PBA also maps other usable address space that is not
> > associated with MSI-X structures, locations (e.g., for CSRs) used in
> > the other address space must not share any naturally aligned 4-KB
> > address range with one where either MSI-X structure resides.
> > This allows system software where applicable to use different
> > processor attributes for MSI-X structures and the other address space.
> >
> > We have the following QEMU vfio-pci device option to relocate the
> > MSI-X structures to another BAR for hardware that violates that
> > recommendation, or for cases where the PCI spec's recommended
> > alignment isn't sufficient:
> >
> > x-msix-relocation=<OffAutoPCIBAR> - off/auto/bar0/bar1/bar2/bar3/bar4/bar5
> >
> > In this case I'd probably recommend bar2 or bar3, as those BARs would
> > only be extended to 8K, versus bar0/4, which would be extended to 32M.
> > Thanks,
> >
> > Alex
>
> > x-msix-relocation=<OffAutoPCIBAR> -
> > off/auto/bar0/bar1/bar2/bar3/bar4/bar5
>
> I have successfully used the 'x-msix-relocation' option:
>
> -device
> vfio-pci,host=84:00.0,id=hostdev0,bus=pci.6,addr=0x0,x-msix-relocation=bar2
>
> Now the IOMMU trace looks like:
> . . .
> CPU 0/KVM-4237 [055] .... 4750.918416: map: IOMMU:
> iova=0x00000000f8000000 paddr=0x00000000fa000000 size=16777216
> CPU 0/KVM-4237 [055] .... 4750.918740: map: IOMMU:
> iova=0x00000000fa000000 paddr=0x00000000fb001000 size=4096
> CPU 0/KVM-4237 [055] .... 4750.919069: map: IOMMU:
> iova=0x00000000fa002000 paddr=0x00000000fb000000 size=4096
> CPU 0/KVM-4237 [055] .... 4750.919698: map: IOMMU:
> iova=0x00000000f9000000 paddr=0x00000000f9000000 size=16777216
> . . .
>
> Everything seems to be OK.
>
> Thank you very much!
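[Alex's 8K-versus-32M numbers can be reproduced with a small sketch. BAR sizes must be powers of two, so moving the MSI-X structures into a BAR rounds its size up to the next power of two that also covers the added pages. The assumption here, hedged because the thread doesn't state the vector count, is that this device's MSI-X table and PBA fit in a single extra 4 KiB page:]

```python
# Sketch: why relocating MSI-X grows bar2/bar3 only to 8K but bar0/bar4
# to 32M. BAR sizes are powers of two; we assume (not confirmed in the
# thread) that this device's MSI-X table + PBA fit in one extra 4 KiB
# page appended after the BAR's existing contents.
def relocated_bar_size(current_size, msix_bytes=0x1000):
    need = current_size + msix_bytes
    size = 0x1000                  # start from one page, for this sketch
    while size < need:             # round up to the next power of two
        size <<= 1
    return size

print(hex(relocated_bar_size(0x1000)))     # bar2/bar3: 4K grows to 8K
print(hex(relocated_bar_size(0x1000000)))  # bar0/bar4: 16M grows to 32M
```

This also matches the resulting IOMMU trace above: after relocation to bar2, BAR0 is mapped as a single contiguous 16M region again, with no hole.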
Glad it worked, but please also tell your hardware developers to follow
the PCI spec recommendations for alignment of MSI-X data structures,
and ideally to use a BAR dedicated to MSI-X so the placement is
independent of the processor page size. If this is a device still under
development, that would spare future users this headache. Thanks,

Alex