Re: [Intel-gfx] i915 dma faults on Xen

2021-02-22 Thread Roger Pau Monné
On Fri, Feb 19, 2021 at 12:30:23PM -0500, Jason Andryuk wrote:
> On Wed, Oct 21, 2020 at 9:59 AM Jan Beulich wrote:
> >
> > On 21.10.2020 15:36, Jason Andryuk wrote:
> > > On Wed, Oct 21, 2020 at 8:53 AM Jan Beulich wrote:
> > >>
> > >> On 21.10.2020 14:45, Jason Andryuk wrote:
> > > >>> On Wed, Oct 21, 2020 at 5:58 AM Roger Pau Monné wrote:
> > > >>>> Hm, it's hard to tell what's going on. In my limited experience with
> > > >>>> IOMMU faults on broken systems, there's a small range that initially
> > >>>> triggers those, and then the device goes wonky and starts accessing a
> > >>>> whole load of invalid addresses.
> > >>>>
> > >>>> You could try adding those manually using the rmrr Xen command line
> > >>>> option [0], maybe you can figure out which range(s) are missing?
> > >>>
> > >>> They seem to change, so it's hard to know.  Would there be harm in
> > >>> adding one to cover the end of RAM ( 0x04,7c80, ) to (
> > >>> 0xff,, )?  Maybe that would just quiet the pointless faults
> > >>> while leaving the IOMMU enabled?
> > >>
> > >> While they may quieten the faults, I don't think those faults are
> > >> pointless. They indicate some problem with the software (less
> > >> likely the hardware, possibly the firmware) that you're using.
> > >> Also there's the question of what the overall behavior is going
> > >> to be when devices are permitted to access unpopulated address
> > >> ranges. I assume you did check already that no devices have their
> > >> BARs placed in that range?
> > >
> > > Isn't no-igfx already letting them try to read those unpopulated 
> > > addresses?
> >
> > Yes, and it is for the reason that the documentation for the
> > option says "If specifying `no-igfx` fixes anything, please
> > report the problem." I infer from this in particular that one
> > had better not use it for non-development purposes of any
> > kind.
> 
> I stopped seeing these DMA faults, but I didn't know what made them go
> away.  Then when working with an older 5.4.64 kernel, I saw them
> again.  Eric bisected down to the 5.4.y version of mainline linux
> commit:
> 
> commit 8195400f7ea95399f721ad21f4d663a62c65036f
> Author: Chris Wilson 
> Date:   Mon Oct 19 11:15:23 2020 +0100
> 
> drm/i915: Force VT'd workarounds when running as a guest OS
> 
> If i915.ko is being used as a passthrough device, it does not know if
> the host is using intel_iommu. Mixing the iommu and gfx causes a few
> issues (such as scanout overfetch) which we need to workaround inside
> the driver, so if we detect we are running under a hypervisor, also
> assume the device access is being virtualised.

So the commit above fixes the DMA faults seen on Linux when using an
i915 gfx card?

Thanks for digging into this.

Roger.
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


Re: [Intel-gfx] i915 dma faults on Xen

2020-10-21 Thread Roger Pau Monné
On Wed, Oct 21, 2020 at 12:33:05PM +0200, Jan Beulich wrote:
> On 21.10.2020 11:58, Roger Pau Monné wrote:
> > On Fri, Oct 16, 2020 at 12:23:22PM -0400, Jason Andryuk wrote:
> >> The RMRRs are:
> >> (XEN) [VT-D]Host address width 39
> >> (XEN) [VT-D]found ACPI_DMAR_DRHD:
> >> (XEN) [VT-D]  dmaru->address = fed9
> >> (XEN) [VT-D]drhd->address = fed9 iommu->reg = 82c00021d000
> >> (XEN) [VT-D]cap = 1cc40660462 ecap = 19e2ff0505e
> >> (XEN) [VT-D] endpoint: :00:02.0
> >> (XEN) [VT-D]found ACPI_DMAR_DRHD:
> >> (XEN) [VT-D]  dmaru->address = fed91000
> >> (XEN) [VT-D]drhd->address = fed91000 iommu->reg = 82c00021f000
> >> (XEN) [VT-D]cap = d2008c40660462 ecap = f050da
> >> (XEN) [VT-D] IOAPIC: :00:1e.7
> >> (XEN) [VT-D] MSI HPET: :00:1e.6
> >> (XEN) [VT-D]  flags: INCLUDE_ALL
> >> (XEN) [VT-D]found ACPI_DMAR_RMRR:
> >> (XEN) [VT-D] endpoint: :00:14.0
> >> (XEN) [VT-D]dmar.c:615:   RMRR region: base_addr 78863000 end_addr 78882fff
> >> (XEN) [VT-D]found ACPI_DMAR_RMRR:
> >> (XEN) [VT-D] endpoint: :00:02.0
> >> (XEN) [VT-D]dmar.c:615:   RMRR region: base_addr 7d00 end_addr 7f7f
> >> (XEN) [VT-D]found ACPI_DMAR_RMRR:
> >> (XEN) [VT-D] endpoint: :00:16.7
> >> (XEN) [VT-D]dmar.c:581:  Non-existent device (:00:16.7) is
> >> reported in RMRR (78907000, 78986fff)'s scope!
> >> (XEN) [VT-D]dmar.c:596:   Ignore the RMRR (78907000, 78986fff) due to
> > 
> > This is also part of a reserved region, so should be added to the
> > iommu page tables anyway regardless of this message.
> 
> Could you clarify why you think so? RMRRs are tied to devices, so
> if a device in reality doesn't exist (and no other one uses the
> same range), I don't see why an IOMMU mapping would be needed
> (unless to work around some related firmware bug). Plus aiui none
> of the IOMMU faults actually report this range as having got
> accessed.

Since it's the hardware domain that gets the gfx card assigned here it
will get any reserved regions added to the IOMMU page tables in
arch_iommu_hwdom_init. I agree it's not relevant here, since those are
not the regions reported in the IOMMU faults.
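
A rough sketch of that check in Python, for anyone who wants to repeat it
against their own logs. It uses only the values that are fully legible in the
boot log and fault messages quoted in this thread (the RMRR for the gfx device
itself is not legible in the quote, so it is left out here). The helper name
classify is made up for illustration and is not part of Xen.

```python
# Values copied from the Xen boot log and fault messages quoted in this thread.
HOST_ADDRESS_WIDTH = 39  # "(XEN) [VT-D]Host address width 39"

# Only the RMRR ranges that are fully legible in the quoted log (inclusive).
RMRR_RANGES = [
    (0x78863000, 0x78882FFF),
    (0x78907000, 0x78986FFF),
]

# "fault addr" values from the DMAR fault messages.
FAULT_ADDRS = [0x39B5845000, 0x4238D0A000]


def classify(addr, ranges, width):
    """Return a short description of where a faulting address falls."""
    if addr >= 1 << width:
        return "beyond host address width"
    for start, end in ranges:
        if start <= addr <= end:
            return f"inside RMRR {start:x}-{end:x}"
    return "outside listed RMRRs"


for addr in FAULT_ADDRS:
    print(f"{addr:x}: {classify(addr, RMRR_RANGES, HOST_ADDRESS_WIDTH)}")
```

Both faulting addresses fit in 39 bits but sit outside the two ranges listed,
which matches the observation that the ignored RMRR is not what is being hit.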

Roger.


Re: [Intel-gfx] i915 dma faults on Xen

2020-10-21 Thread Roger Pau Monné
On Fri, Oct 16, 2020 at 12:23:22PM -0400, Jason Andryuk wrote:
> On Thu, Oct 15, 2020 at 11:16 AM Jason Andryuk wrote:
> >
> > On Thu, Oct 15, 2020 at 7:31 AM Roger Pau Monné wrote:
> > >
> > > On Wed, Oct 14, 2020 at 08:37:06PM +0100, Andrew Cooper wrote:
> > > > On 14/10/2020 20:28, Jason Andryuk wrote:
> > > > > Hi,
> > > > >
> > > > > Bug opened at https://gitlab.freedesktop.org/drm/intel/-/issues/2576
> > > > >
> > > > > I'm seeing DMA faults for the i915 graphics hardware on a Dell
> > > > > Latitude 5500. These were captured when I plugged into a Dell
> > > > > Thunderbolt dock with two DisplayPort monitors attached.  Xen 4.12.4
> > > > > staging and Linux 5.4.70 (and some earlier versions).
> > > > >
> > > > > Oct 14 18:41:49.056490 kernel:[   85.570347] [drm:gen8_de_irq_handler
> > > > > [i915]] *ERROR* Fault errors on pipe A: 0x0080
> > > > > Oct 14 18:41:49.056494 kernel:[   85.570395] [drm:gen8_de_irq_handler
> > > > > [i915]] *ERROR* Fault errors on pipe A: 0x0080
> > > > > Oct 14 18:41:49.056589 VM hypervisor: (XEN) [VT-D]DMAR:[DMA Read]
> > > > > Request device [:00:02.0] fault addr 39b5845000, iommu reg =
> > > > > 82c00021d000
> > > > > Oct 14 18:41:49.056594 VM hypervisor: (XEN) [VT-D]DMAR: reason 06 -
> > > > > PTE Read access is not set
> > > > > Oct 14 18:41:49.056784 kernel:[   85.570668] [drm:gen8_de_irq_handler
> > > > > [i915]] *ERROR* Fault errors on pipe A: 0x0080
> > > > > Oct 14 18:41:49.056789 kernel:[   85.570687] [drm:gen8_de_irq_handler
> > > > > [i915]] *ERROR* Fault errors on pipe A: 0x0080
> > > > > Oct 14 18:41:49.056885 VM hypervisor: (XEN) [VT-D]DMAR:[DMA Read]
> > > > > Request device [:00:02.0] fault addr 4238d0a000, iommu reg =
> > > > > 82c00021d000
> > > > > Oct 14 18:41:49.056890 VM hypervisor: (XEN) [VT-D]DMAR: reason 06 -
> > > > > PTE Read access is not set
> > > > >
> > > > > They repeat. In the log attached to
> > > > > https://gitlab.freedesktop.org/drm/intel/-/issues/2576, they start at
> > > > > "Oct 14 18:41:49.056589" and continue until I unplug the dock around
> > > > > "Oct 14 18:41:54.801802".
> > > > >
> > > > > I've also seen similar messages when attaching the laptop's HDMI port
> > > > > to a 4k monitor. The eDP display by itself seems okay.
> > > > >
> > > > > I tried Fedora 31 & 32 live images with intel_iommu=on, so no Xen, and
> > > > > > didn't see any errors.
> > > > >
> > > > > This is a kernel & xen log with drm.debug=0x1e. It also includes some
> > > > > application (glass) logging when it changes resolutions which seems to
> > > > > set off the DMA faults. 5500-igfx-messages-kern-xen-glass
> > > > >
> > > > > Running xen with iommu=no-igfx disables the iommu for the i915
> > > > > graphics and no faults are reported. However, that breaks some other
> > > > > devices (Dell Latitude 7200 and 5580) giving a black screen with:
> > > > >
> > > > > Oct 10 13:24:37.022117 kernel:[   14.884759] i915 :00:02.0: Failed
> > > > > to idle engines, declaring wedged!
> > > > > Oct 10 13:24:37.022118 kernel:[   14.964794] i915 :00:02.0: Failed
> > > > > to initialize GPU, declaring it wedged!
> > > > >
> > > > > Any suggestions welcome.
> > > >
> > > > Presumably this is with a PV dom0.  What are 39b5845000 and 4238d0a000
> > > > in the machine memory map?
> >
> > They are bogus?
> > End of RAM is 0x47c80
> > That's:
> > 0x047c80
> > vs.
> > 0x39b5845000
> > 0x4238d0a000
> >
> > > > This smells like a missing RMRR in the ACPI tables.
> 
> The RMRRs are:
> (XEN) [VT-D]Host address width 39
> (XEN) [VT-D]found ACPI_DMAR_DRHD:
> (XEN) [VT-D]  dmaru->address = fed9
> (XEN) [VT-D]drhd->address = fed9 iommu->reg = 82c00021d000
> (XEN) [VT-D]cap = 1cc40660462 ecap = 19e2ff0505e
> (XEN) [VT-D] endpoint: :00:02.0
> (XEN) [VT-D]found ACPI_DMAR_DRHD:
> (XEN) [VT-D]  dmaru->address = fed91000
> (XEN) [VT-D]drhd->address = fed91000 iommu->reg = 82c00021f000
> (XEN) [VT-D]cap = d

Re: [Intel-gfx] i915 dma faults on Xen

2020-10-15 Thread Roger Pau Monné
On Wed, Oct 14, 2020 at 08:37:06PM +0100, Andrew Cooper wrote:
> On 14/10/2020 20:28, Jason Andryuk wrote:
> > Hi,
> >
> > Bug opened at https://gitlab.freedesktop.org/drm/intel/-/issues/2576
> >
> > I'm seeing DMA faults for the i915 graphics hardware on a Dell
> > Latitude 5500. These were captured when I plugged into a Dell
> > Thunderbolt dock with two DisplayPort monitors attached.  Xen 4.12.4
> > staging and Linux 5.4.70 (and some earlier versions).
> >
> > Oct 14 18:41:49.056490 kernel:[   85.570347] [drm:gen8_de_irq_handler
> > [i915]] *ERROR* Fault errors on pipe A: 0x0080
> > Oct 14 18:41:49.056494 kernel:[   85.570395] [drm:gen8_de_irq_handler
> > [i915]] *ERROR* Fault errors on pipe A: 0x0080
> > Oct 14 18:41:49.056589 VM hypervisor: (XEN) [VT-D]DMAR:[DMA Read]
> > Request device [:00:02.0] fault addr 39b5845000, iommu reg =
> > 82c00021d000
> > Oct 14 18:41:49.056594 VM hypervisor: (XEN) [VT-D]DMAR: reason 06 -
> > PTE Read access is not set
> > Oct 14 18:41:49.056784 kernel:[   85.570668] [drm:gen8_de_irq_handler
> > [i915]] *ERROR* Fault errors on pipe A: 0x0080
> > Oct 14 18:41:49.056789 kernel:[   85.570687] [drm:gen8_de_irq_handler
> > [i915]] *ERROR* Fault errors on pipe A: 0x0080
> > Oct 14 18:41:49.056885 VM hypervisor: (XEN) [VT-D]DMAR:[DMA Read]
> > Request device [:00:02.0] fault addr 4238d0a000, iommu reg =
> > 82c00021d000
> > Oct 14 18:41:49.056890 VM hypervisor: (XEN) [VT-D]DMAR: reason 06 -
> > PTE Read access is not set
> >
> > They repeat. In the log attached to
> > https://gitlab.freedesktop.org/drm/intel/-/issues/2576, they start at
> > "Oct 14 18:41:49.056589" and continue until I unplug the dock around
> > "Oct 14 18:41:54.801802".
> >
> > I've also seen similar messages when attaching the laptop's HDMI port
> > to a 4k monitor. The eDP display by itself seems okay.
> >
> > I tried Fedora 31 & 32 live images with intel_iommu=on, so no Xen, and
> > didn't see any errors.
> >
> > This is a kernel & xen log with drm.debug=0x1e. It also includes some
> > application (glass) logging when it changes resolutions which seems to
> > set off the DMA faults. 5500-igfx-messages-kern-xen-glass
> >
> > Running xen with iommu=no-igfx disables the iommu for the i915
> > graphics and no faults are reported. However, that breaks some other
> > devices (Dell Latitude 7200 and 5580) giving a black screen with:
> >
> > Oct 10 13:24:37.022117 kernel:[   14.884759] i915 :00:02.0: Failed
> > to idle engines, declaring wedged!
> > Oct 10 13:24:37.022118 kernel:[   14.964794] i915 :00:02.0: Failed
> > to initialize GPU, declaring it wedged!
> >
> > Any suggestions welcome.
> 
> Presumably this is with a PV dom0.  What are 39b5845000 and 4238d0a000
> in the machine memory map?
> 
> This smells like a missing RMRR in the ACPI tables.

I agree.

Can you paste the memory map as printed by Xen when booting, and the
command line you are using to boot Xen?

Have you tried adding dom0-iommu=map-inclusive to the Xen command
line?
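
To line the faults up against the memory map, it may help to pull the distinct
fault addresses out of the log first. A minimal Python sketch, assuming the
log lines look like the ones quoted above (each message on a single line); the
function name fault_addresses is hypothetical:

```python
import re

# Matches the "fault addr <hex>" field of the Xen VT-d fault messages
# as they appear in the log quoted in this thread.
FAULT_RE = re.compile(r"fault addr ([0-9a-f]+)")


def fault_addresses(log_text):
    """Return the sorted, de-duplicated fault addresses found in log_text."""
    return sorted({int(m.group(1), 16) for m in FAULT_RE.finditer(log_text)})


# Two fault messages from the report, rejoined onto single lines.
SAMPLE = (
    "(XEN) [VT-D]DMAR:[DMA Read] Request device [:00:02.0] "
    "fault addr 39b5845000, iommu reg = 82c00021d000\n"
    "(XEN) [VT-D]DMAR:[DMA Read] Request device [:00:02.0] "
    "fault addr 4238d0a000, iommu reg = 82c00021d000\n"
)

for addr in fault_addresses(SAMPLE):
    print(hex(addr))
```

The resulting list can then be compared by hand with the ranges in the memory
map that Xen prints at boot.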

Roger.


Re: [Intel-gfx] [PATCH v2 15/21] xen-blkfront: Make use of the new sg_map helper function

2017-05-03 Thread Roger Pau Monné
On Tue, Apr 25, 2017 at 12:21:02PM -0600, Logan Gunthorpe wrote:
> Straightforward conversion to the new helper, except due to the lack
> of error path, we have to use SG_MAP_MUST_NOT_FAIL which may BUG_ON in
> certain cases in the future.
> 
> Signed-off-by: Logan Gunthorpe <log...@deltatee.com>
> Cc: Boris Ostrovsky <boris.ostrov...@oracle.com>
> Cc: Juergen Gross <jgr...@suse.com>
> Cc: Konrad Rzeszutek Wilk <konrad.w...@oracle.com>
> Cc: "Roger Pau Monné" <roger@citrix.com>
> ---
>  drivers/block/xen-blkfront.c | 20 +++-
>  1 file changed, 11 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
> index 3945963..ed62175 100644
> --- a/drivers/block/xen-blkfront.c
> +++ b/drivers/block/xen-blkfront.c
> @@ -816,8 +816,9 @@ static int blkif_queue_rw_req(struct request *req, struct 
> blkfront_ring_info *ri
>   BUG_ON(sg->offset + sg->length > PAGE_SIZE);
>  
>   if (setup.need_copy) {
> - setup.bvec_off = sg->offset;
> - setup.bvec_data = kmap_atomic(sg_page(sg));
> + setup.bvec_off = 0;
> + setup.bvec_data = sg_map(sg, 0, SG_KMAP_ATOMIC |
> +  SG_MAP_MUST_NOT_FAIL);

I assume that sg_map already adds sg->offset to the address?

Also wondering whether we can get rid of bvec_off and just increment bvec_data,
adding Julien who IIRC added this code.

>   }
>  
>   gnttab_foreach_grant_in_range(sg_page(sg),
> @@ -827,7 +828,7 @@ static int blkif_queue_rw_req(struct request *req, struct 
> blkfront_ring_info *ri
> );
>  
>   if (setup.need_copy)
> - kunmap_atomic(setup.bvec_data);
> + sg_unmap(sg, setup.bvec_data, 0, SG_KMAP_ATOMIC);
>   }
>   if (setup.segments)
>   kunmap_atomic(setup.segments);
> @@ -1053,7 +1054,7 @@ static int xen_translate_vdev(int vdevice, int *minor, 
> unsigned int *offset)
>   case XEN_SCSI_DISK5_MAJOR:
>   case XEN_SCSI_DISK6_MAJOR:
>   case XEN_SCSI_DISK7_MAJOR:
> - *offset = (*minor / PARTS_PER_DISK) + 
> + *offset = (*minor / PARTS_PER_DISK) +
>   ((major - XEN_SCSI_DISK1_MAJOR + 1) * 16) +
>   EMULATED_SD_DISK_NAME_OFFSET;
>   *minor = *minor +
> @@ -1068,7 +1069,7 @@ static int xen_translate_vdev(int vdevice, int *minor, 
> unsigned int *offset)
>   case XEN_SCSI_DISK13_MAJOR:
>   case XEN_SCSI_DISK14_MAJOR:
>   case XEN_SCSI_DISK15_MAJOR:
> - *offset = (*minor / PARTS_PER_DISK) + 
> + *offset = (*minor / PARTS_PER_DISK) +
>   ((major - XEN_SCSI_DISK8_MAJOR + 8) * 16) +
>   EMULATED_SD_DISK_NAME_OFFSET;
>   *minor = *minor +
> @@ -1119,7 +1120,7 @@ static int xlvbd_alloc_gendisk(blkif_sector_t capacity,
>   if (!VDEV_IS_EXTENDED(info->vdevice)) {
>   err = xen_translate_vdev(info->vdevice, &minor, &offset);
>   if (err)
> - return err; 
> + return err;

Cosmetic changes should go in a separate patch please.

Roger.
