Re: [Intel-gfx] i915 dma faults on Xen
On Fri, Feb 19, 2021 at 12:30:23PM -0500, Jason Andryuk wrote:
> On Wed, Oct 21, 2020 at 9:59 AM Jan Beulich wrote:
> >
> > On 21.10.2020 15:36, Jason Andryuk wrote:
> > > On Wed, Oct 21, 2020 at 8:53 AM Jan Beulich wrote:
> > >>
> > >> On 21.10.2020 14:45, Jason Andryuk wrote:
> > >>> On Wed, Oct 21, 2020 at 5:58 AM Roger Pau Monné
> > >>> wrote:
> > >>>> Hm, it's hard to tell what's going on. My limited experience with
> > >>>> IOMMU faults on broken systems there's a small range that initially
> > >>>> triggers those, and then the device goes wonky and starts accessing a
> > >>>> whole load of invalid addresses.
> > >>>>
> > >>>> You could try adding those manually using the rmrr Xen command line
> > >>>> option [0], maybe you can figure out which range(s) are missing?
> > >>>
> > >>> They seem to change, so it's hard to know. Would there be harm in
> > >>> adding one to cover the end of RAM ( 0x04,7c80, ) to (
> > >>> 0xff,, )? Maybe that would just quiet the pointless faults
> > >>> while leaving the IOMMU enabled?
> > >>
> > >> While they may quieten the faults, I don't think those faults are
> > >> pointless. They indicate some problem with the software (less
> > >> likely the hardware, possibly the firmware) that you're using.
> > >> Also there's the question of what the overall behavior is going
> > >> to be when devices are permitted to access unpopulated address
> > >> ranges. I assume you did check already that no devices have their
> > >> BARs placed in that range?
> > >
> > > Isn't no-igfx already letting them try to read those unpopulated
> > > addresses?
> >
> > Yes, and it is for the reason that the documentation for the
> > option says "If specifying `no-igfx` fixes anything, please
> > report the problem." I imply from in in particular that one
> > better wouldn't use it for non-development purposes of whatever
> > kind.
>
> I stopped seeing these DMA faults, but I didn't know what made them go
> away. Then when working with an older 5.4.64 kernel, I saw them
> again. Eric bisected down to the 5.4.y version of mainline linux
> commit:
>
> commit 8195400f7ea95399f721ad21f4d663a62c65036f
> Author: Chris Wilson
> Date: Mon Oct 19 11:15:23 2020 +0100
>
> drm/i915: Force VT'd workarounds when running as a guest OS
>
> If i915.ko is being used as a passthrough device, it does not know if
> the host is using intel_iommu. Mixing the iommu and gfx causes a few
> issues (such as scanout overfetch) which we need to workaround inside
> the driver, so if we detect we are running under a hypervisor, also
> assume the device access is being virtualised.

So the commit above fixes the DMA faults seen on Linux when using a
i915 gfx card?

Thanks for digging into this.

Roger.
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Re: [Intel-gfx] i915 dma faults on Xen
On Wed, Oct 21, 2020 at 12:33:05PM +0200, Jan Beulich wrote:
> On 21.10.2020 11:58, Roger Pau Monné wrote:
> > On Fri, Oct 16, 2020 at 12:23:22PM -0400, Jason Andryuk wrote:
> >> The RMRRs are:
> >> (XEN) [VT-D]Host address width 39
> >> (XEN) [VT-D]found ACPI_DMAR_DRHD:
> >> (XEN) [VT-D] dmaru->address = fed9
> >> (XEN) [VT-D]drhd->address = fed9 iommu->reg = 82c00021d000
> >> (XEN) [VT-D]cap = 1cc40660462 ecap = 19e2ff0505e
> >> (XEN) [VT-D] endpoint: :00:02.0
> >> (XEN) [VT-D]found ACPI_DMAR_DRHD:
> >> (XEN) [VT-D] dmaru->address = fed91000
> >> (XEN) [VT-D]drhd->address = fed91000 iommu->reg = 82c00021f000
> >> (XEN) [VT-D]cap = d2008c40660462 ecap = f050da
> >> (XEN) [VT-D] IOAPIC: :00:1e.7
> >> (XEN) [VT-D] MSI HPET: :00:1e.6
> >> (XEN) [VT-D] flags: INCLUDE_ALL
> >> (XEN) [VT-D]found ACPI_DMAR_RMRR:
> >> (XEN) [VT-D] endpoint: :00:14.0
> >> (XEN) [VT-D]dmar.c:615: RMRR region: base_addr 78863000 end_addr 78882fff
> >> (XEN) [VT-D]found ACPI_DMAR_RMRR:
> >> (XEN) [VT-D] endpoint: :00:02.0
> >> (XEN) [VT-D]dmar.c:615: RMRR region: base_addr 7d00 end_addr 7f7f
> >> (XEN) [VT-D]found ACPI_DMAR_RMRR:
> >> (XEN) [VT-D] endpoint: :00:16.7
> >> (XEN) [VT-D]dmar.c:581: Non-existent device (:00:16.7) is
> >> reported in RMRR (78907000, 78986fff)'s scope!
> >> (XEN) [VT-D]dmar.c:596: Ignore the RMRR (78907000, 78986fff) due to
> >
> > This is also part of a reserved region, so should be added to the
> > iommu page tables anyway regardless of this message.
>
> Could you clarify why you think so? RMRRs are tied to devices, so
> if a device in reality doesn't exist (and no other one uses the
> same range), I don't see why an IOMMU mapping would be needed
> (unless to work around some related firmware bug). Plus aiui none
> of the IOMMU faults actually report this range as having got
> accessed.

Since it's the hardware domain that gets the gfx card assigned here it
will get any reserved regions added to the IOMMU page tables in
arch_iommu_hwdom_init.

I agree it's not relevant here, since those are not the regions
reported in the IOMMU faults.

Roger.
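
The point that the faulting addresses lie outside the reported RMRRs can be checked with a few lines of Python. This is only an illustrative sketch: the helper name `in_any_rmrr` is hypothetical, the two ranges are the ones that appear complete in the quoted Xen log (the range for endpoint :00:02.0 is left out because its addresses look truncated in this archive), and the fault addresses are the two quoted in the original report.

```python
# Inclusive (base_addr, end_addr) pairs taken from the quoted Xen log.
# Only the ranges that appear complete in the archive are listed.
RMRR_RANGES = [
    (0x78863000, 0x78882FFF),  # RMRR for endpoint :00:14.0
    (0x78907000, 0x78986FFF),  # RMRR that Xen ignores (device :00:16.7 absent)
]

# Fault addresses from the "[VT-D]DMAR:[DMA Read]" lines in the report.
FAULT_ADDRS = [0x39B5845000, 0x4238D0A000]


def in_any_rmrr(addr, ranges=RMRR_RANGES):
    """Return True if addr falls inside one of the inclusive ranges."""
    return any(base <= addr <= end for base, end in ranges)


for addr in FAULT_ADDRS:
    print(f"{addr:#x}: inside a listed RMRR = {in_any_rmrr(addr)}")
```

Neither address falls inside the listed ranges, which matches the observation that the ignored RMRR is not what the faults are hitting.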
Re: [Intel-gfx] i915 dma faults on Xen
On Fri, Oct 16, 2020 at 12:23:22PM -0400, Jason Andryuk wrote:
> On Thu, Oct 15, 2020 at 11:16 AM Jason Andryuk wrote:
> >
> > On Thu, Oct 15, 2020 at 7:31 AM Roger Pau Monné
> > wrote:
> > >
> > > On Wed, Oct 14, 2020 at 08:37:06PM +0100, Andrew Cooper wrote:
> > > > On 14/10/2020 20:28, Jason Andryuk wrote:
> > > > > Hi,
> > > > >
> > > > > Bug opened at https://gitlab.freedesktop.org/drm/intel/-/issues/2576
> > > > >
> > > > > I'm seeing DMA faults for the i915 graphics hardware on a Dell
> > > > > Latitude 5500. These were captured when I plugged into a Dell
> > > > > Thunderbolt dock with two DisplayPort monitors attached. Xen 4.12.4
> > > > > staging and Linux 5.4.70 (and some earlier versions).
> > > > >
> > > > > Oct 14 18:41:49.056490 kernel:[ 85.570347] [drm:gen8_de_irq_handler
> > > > > [i915]] *ERROR* Fault errors on pipe A: 0x0080
> > > > > Oct 14 18:41:49.056494 kernel:[ 85.570395] [drm:gen8_de_irq_handler
> > > > > [i915]] *ERROR* Fault errors on pipe A: 0x0080
> > > > > Oct 14 18:41:49.056589 VM hypervisor: (XEN) [VT-D]DMAR:[DMA Read]
> > > > > Request device [:00:02.0] fault addr 39b5845000, iommu reg =
> > > > > 82c00021d000
> > > > > Oct 14 18:41:49.056594 VM hypervisor: (XEN) [VT-D]DMAR: reason 06 -
> > > > > PTE Read access is not set
> > > > > Oct 14 18:41:49.056784 kernel:[ 85.570668] [drm:gen8_de_irq_handler
> > > > > [i915]] *ERROR* Fault errors on pipe A: 0x0080
> > > > > Oct 14 18:41:49.056789 kernel:[ 85.570687] [drm:gen8_de_irq_handler
> > > > > [i915]] *ERROR* Fault errors on pipe A: 0x0080
> > > > > Oct 14 18:41:49.056885 VM hypervisor: (XEN) [VT-D]DMAR:[DMA Read]
> > > > > Request device [:00:02.0] fault addr 4238d0a000, iommu reg =
> > > > > 82c00021d000
> > > > > Oct 14 18:41:49.056890 VM hypervisor: (XEN) [VT-D]DMAR: reason 06 -
> > > > > PTE Read access is not set
> > > > >
> > > > > They repeat. In the log attached to
> > > > > https://gitlab.freedesktop.org/drm/intel/-/issues/2576, they start at
> > > > > "Oct 14 18:41:49.056589" and continue until I unplug the dock around
> > > > > "Oct 14 18:41:54.801802".
> > > > >
> > > > > I've also seen similar messages when attaching the laptop's HDMI port
> > > > > to a 4k monitor. The eDP display by itself seems okay.
> > > > >
> > > > > I tried Fedora 31 & 32 live images with intel_iommu=on, so no Xen, and
> > > > > didn't see any errors
> > > > >
> > > > > This is a kernel & xen log with drm.debug=0x1e. It also includes some
> > > > > application (glass) logging when it changes resolutions which seems to
> > > > > set off the DMA faults. 5500-igfx-messages-kern-xen-glass
> > > > >
> > > > > Running xen with iommu=no-igfx disables the iommu for the i915
> > > > > graphics and no faults are reported. However, that breaks some other
> > > > > devices (Dell Latitude 7200 and 5580) giving a black screen with:
> > > > >
> > > > > Oct 10 13:24:37.022117 kernel:[ 14.884759] i915 :00:02.0: Failed
> > > > > to idle engines, declaring wedged!
> > > > > Oct 10 13:24:37.022118 kernel:[ 14.964794] i915 :00:02.0: Failed
> > > > > to initialize GPU, declaring it wedged!
> > > > >
> > > > > Any suggestions welcome.
> > > >
> > > > Presumably this is with a PV dom0. What are 39b5845000 and 4238d0a000
> > > > in the machine memory map?
> >
> > They are bogus?
> > End of RAM is 0x47c80
> > Thats:
> > 0x047c80
> > vs.
> > 0x39b5845000
> > 0x4238d0a000
> >
> > > > This smells like a missing RMRR in the ACPI tables.
>
> The RMRRs are:
> (XEN) [VT-D]Host address width 39
> (XEN) [VT-D]found ACPI_DMAR_DRHD:
> (XEN) [VT-D] dmaru->address = fed9
> (XEN) [VT-D]drhd->address = fed9 iommu->reg = 82c00021d000
> (XEN) [VT-D]cap = 1cc40660462 ecap = 19e2ff0505e
> (XEN) [VT-D] endpoint: :00:02.0
> (XEN) [VT-D]found ACPI_DMAR_DRHD:
> (XEN) [VT-D] dmaru->address = fed91000
> (XEN) [VT-D]drhd->address = fed91000 iommu->reg = 82c00021f000
> (XEN) [VT-D]cap = d
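
The quoted log reports "Host address width 39". As a rough sketch (the helper name `fits_address_width` is hypothetical, and this is not how Xen itself performs the check), the two fault addresses can be compared against that width:

```python
# Width reported by "(XEN) [VT-D]Host address width 39" in the quoted log.
HOST_ADDRESS_WIDTH = 39

# Fault addresses quoted in the original report.
FAULT_ADDRS = [0x39B5845000, 0x4238D0A000]


def fits_address_width(addr, width=HOST_ADDRESS_WIDTH):
    """Return True if addr is representable in `width` address bits."""
    return 0 <= addr < (1 << width)


for addr in FAULT_ADDRS:
    print(f"{addr:#x}: fits in {HOST_ADDRESS_WIDTH} bits = {fits_address_width(addr)}")
```

Both addresses fit within 39 bits, so they are addresses the IOMMU can translate but for which no readable mapping exists. That is consistent with the reported "PTE Read access is not set" reason, though it says nothing about why the device issued those reads.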
Re: [Intel-gfx] i915 dma faults on Xen
On Wed, Oct 14, 2020 at 08:37:06PM +0100, Andrew Cooper wrote:
> On 14/10/2020 20:28, Jason Andryuk wrote:
> > Hi,
> >
> > Bug opened at https://gitlab.freedesktop.org/drm/intel/-/issues/2576
> >
> > I'm seeing DMA faults for the i915 graphics hardware on a Dell
> > Latitude 5500. These were captured when I plugged into a Dell
> > Thunderbolt dock with two DisplayPort monitors attached. Xen 4.12.4
> > staging and Linux 5.4.70 (and some earlier versions).
> >
> > Oct 14 18:41:49.056490 kernel:[ 85.570347] [drm:gen8_de_irq_handler
> > [i915]] *ERROR* Fault errors on pipe A: 0x0080
> > Oct 14 18:41:49.056494 kernel:[ 85.570395] [drm:gen8_de_irq_handler
> > [i915]] *ERROR* Fault errors on pipe A: 0x0080
> > Oct 14 18:41:49.056589 VM hypervisor: (XEN) [VT-D]DMAR:[DMA Read]
> > Request device [:00:02.0] fault addr 39b5845000, iommu reg =
> > 82c00021d000
> > Oct 14 18:41:49.056594 VM hypervisor: (XEN) [VT-D]DMAR: reason 06 -
> > PTE Read access is not set
> > Oct 14 18:41:49.056784 kernel:[ 85.570668] [drm:gen8_de_irq_handler
> > [i915]] *ERROR* Fault errors on pipe A: 0x0080
> > Oct 14 18:41:49.056789 kernel:[ 85.570687] [drm:gen8_de_irq_handler
> > [i915]] *ERROR* Fault errors on pipe A: 0x0080
> > Oct 14 18:41:49.056885 VM hypervisor: (XEN) [VT-D]DMAR:[DMA Read]
> > Request device [:00:02.0] fault addr 4238d0a000, iommu reg =
> > 82c00021d000
> > Oct 14 18:41:49.056890 VM hypervisor: (XEN) [VT-D]DMAR: reason 06 -
> > PTE Read access is not set
> >
> > They repeat. In the log attached to
> > https://gitlab.freedesktop.org/drm/intel/-/issues/2576, they start at
> > "Oct 14 18:41:49.056589" and continue until I unplug the dock around
> > "Oct 14 18:41:54.801802".
> >
> > I've also seen similar messages when attaching the laptop's HDMI port
> > to a 4k monitor. The eDP display by itself seems okay.
> >
> > I tried Fedora 31 & 32 live images with intel_iommu=on, so no Xen, and
> > didn't see any errors
> >
> > This is a kernel & xen log with drm.debug=0x1e. It also includes some
> > application (glass) logging when it changes resolutions which seems to
> > set off the DMA faults. 5500-igfx-messages-kern-xen-glass
> >
> > Running xen with iommu=no-igfx disables the iommu for the i915
> > graphics and no faults are reported. However, that breaks some other
> > devices (Dell Latitude 7200 and 5580) giving a black screen with:
> >
> > Oct 10 13:24:37.022117 kernel:[ 14.884759] i915 :00:02.0: Failed
> > to idle engines, declaring wedged!
> > Oct 10 13:24:37.022118 kernel:[ 14.964794] i915 :00:02.0: Failed
> > to initialize GPU, declaring it wedged!
> >
> > Any suggestions welcome.
>
> Presumably this is with a PV dom0. What are 39b5845000 and 4238d0a000
> in the machine memory map?
>
> This smells like a missing RMRR in the ACPI tables.

I agree. Can you paste the memory map as printed by Xen when booting,
and what command line are you using to boot Xen.

Have you tried adding dom0-iommu=map-inclusive to the Xen command
line?

Roger.
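
When comparing fault addresses against the memory map, it can help to pull them out of the hypervisor log mechanically. The following is a minimal sketch, assuming the fault lines look like the ones quoted in the report; the function name `fault_addresses` and the `SAMPLE` text are illustrative only, with each wrapped log entry joined onto a single line and the device field copied as it appears in this archive.

```python
import re
from collections import Counter

# Matches the "fault addr <hex>" field of a Xen VT-d DMAR fault message.
FAULT_ADDR_RE = re.compile(r"fault addr ([0-9a-f]+)", re.IGNORECASE)


def fault_addresses(log_text):
    """Count how often each faulting address appears in log_text."""
    return Counter(int(m.group(1), 16) for m in FAULT_ADDR_RE.finditer(log_text))


# Log lines copied from the report above.
SAMPLE = """\
(XEN) [VT-D]DMAR:[DMA Read] Request device [:00:02.0] fault addr 39b5845000, iommu reg = 82c00021d000
(XEN) [VT-D]DMAR: reason 06 - PTE Read access is not set
(XEN) [VT-D]DMAR:[DMA Read] Request device [:00:02.0] fault addr 4238d0a000, iommu reg = 82c00021d000
(XEN) [VT-D]DMAR: reason 06 - PTE Read access is not set
"""

for addr, count in sorted(fault_addresses(SAMPLE).items()):
    print(f"{addr:#x} seen {count} time(s)")
```

Run over a full log, the resulting set of addresses can then be compared by hand with the e820 map Xen prints at boot.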
Re: [Intel-gfx] [PATCH v2 15/21] xen-blkfront: Make use of the new sg_map helper function
On Tue, Apr 25, 2017 at 12:21:02PM -0600, Logan Gunthorpe wrote:
> Straightforward conversion to the new helper, except due to the lack
> of error path, we have to use SG_MAP_MUST_NOT_FAIL which may BUG_ON in
> certain cases in the future.
>
> Signed-off-by: Logan Gunthorpe <log...@deltatee.com>
> Cc: Boris Ostrovsky <boris.ostrov...@oracle.com>
> Cc: Juergen Gross <jgr...@suse.com>
> Cc: Konrad Rzeszutek Wilk <konrad.w...@oracle.com>
> Cc: "Roger Pau Monné" <roger@citrix.com>
> ---
> drivers/block/xen-blkfront.c | 20 +++-
> 1 file changed, 11 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
> index 3945963..ed62175 100644
> --- a/drivers/block/xen-blkfront.c
> +++ b/drivers/block/xen-blkfront.c
> @@ -816,8 +816,9 @@ static int blkif_queue_rw_req(struct request *req, struct
> blkfront_ring_info *ri
> BUG_ON(sg->offset + sg->length > PAGE_SIZE);
>
> if (setup.need_copy) {
> - setup.bvec_off = sg->offset;
> - setup.bvec_data = kmap_atomic(sg_page(sg));
> + setup.bvec_off = 0;
> + setup.bvec_data = sg_map(sg, 0, SG_KMAP_ATOMIC |
> + SG_MAP_MUST_NOT_FAIL);

I assume that sg_map already adds sg->offset to the address?

Also wondering whether we can get rid of bvec_off and just increment
bvec_data, adding Julien who IIRC added this code.

> }
>
> gnttab_foreach_grant_in_range(sg_page(sg),
> @@ -827,7 +828,7 @@ static int blkif_queue_rw_req(struct request *req, struct
> blkfront_ring_info *ri
> );
>
> if (setup.need_copy)
> - kunmap_atomic(setup.bvec_data);
> + sg_unmap(sg, setup.bvec_data, 0, SG_KMAP_ATOMIC);
> }
> if (setup.segments)
> kunmap_atomic(setup.segments);
> @@ -1053,7 +1054,7 @@ static int xen_translate_vdev(int vdevice, int *minor,
> unsigned int *offset)
> case XEN_SCSI_DISK5_MAJOR:
> case XEN_SCSI_DISK6_MAJOR:
> case XEN_SCSI_DISK7_MAJOR:
> - *offset = (*minor / PARTS_PER_DISK) +
> + *offset = (*minor / PARTS_PER_DISK) +
> ((major - XEN_SCSI_DISK1_MAJOR + 1) * 16) +
> EMULATED_SD_DISK_NAME_OFFSET;
> *minor = *minor +
> @@ -1068,7 +1069,7 @@ static int xen_translate_vdev(int vdevice, int *minor,
> unsigned int *offset)
> case XEN_SCSI_DISK13_MAJOR:
> case XEN_SCSI_DISK14_MAJOR:
> case XEN_SCSI_DISK15_MAJOR:
> - *offset = (*minor / PARTS_PER_DISK) +
> + *offset = (*minor / PARTS_PER_DISK) +
> ((major - XEN_SCSI_DISK8_MAJOR + 8) * 16) +
> EMULATED_SD_DISK_NAME_OFFSET;
> *minor = *minor +
> @@ -1119,7 +1120,7 @@ static int xlvbd_alloc_gendisk(blkif_sector_t capacity,
> if (!VDEV_IS_EXTENDED(info->vdevice)) {
> err = xen_translate_vdev(info->vdevice, , );
> if (err)
> - return err;
> + return err;

Cosmetic changes should go in a separate patch please.

Roger.