Re: [PATCH] iommu/intel: quirk to disable DMAR for QM57 igfx

2019-01-23 Thread Joerg Roedel
On Wed, Jan 23, 2019 at 05:02:38PM +0200, Joonas Lahtinen wrote:
> We have many reports where just having intel_iommu=on (and using the
> system normally, without any virtualization stuff going on) will cause
> unexplained GPU hangs. For those users, simply switching to
> intel_iommu=igfx_off solves the problems, and the debug often ends
> there.

If you can reproduce problems on your side, then you can try to enable
CONFIG_INTEL_IOMMU_BROKEN_GFX_WA to force the GFX devices into the
identity mapping. We can also add a boot-parameter and workarounds if it
turns out that this is sufficient to make the GFX devices work with
IOMMU enabled.


Regards,

Joerg


Re: [PATCH] iommu/intel: quirk to disable DMAR for QM57 igfx

2019-01-23 Thread Joonas Lahtinen
Quoting Joerg Roedel (2019-01-22 18:51:35)
> On Tue, Jan 22, 2019 at 04:48:26PM +0200, Joonas Lahtinen wrote:
> > According to our IOMMU folks there exists some desire to be able to assign
> > the iGFX device aka have intel_iommu=on instead of intel_iommu=igfx_off
> > due to how the devices might be grouped in IOMMU groups. Even when you
> > would not be using the iGFX device.
> 
> You can force the igfx device into a SI domain, or does that also
> trigger the iommu issues on the chipset?

To be honest, we've had a mixture different issues on different SKUs
that have not been hit in the past when intel_iommu was just disabled by
default.

I know that in one group of the problems, the issue has been debugged
into the GPU having its own set of virtualization mapping translation
hardware with caching and it fails to track changes to the mapping. So
if a identity mapping was established and never changed, I'd assume that
to fix at least that class of problems.

Would just passing intel_iommu=on already cause a non-identity mapping to
possibly be used for the integrated GPU? If it did, then it would
explain quite few of the issues.

We have many reports where just having intel_iommu=on (and using the
system normally, without any virtualization stuff going on) will cause
unexplained GPU hangs. For those users, simply switching to
intel_iommu=igfx_off solves the problems, and the debug often ends
there.

Regards, Joonas

> In any case, if iommu=on breaks these systems I want to make them work
> again with opt-out, even at the cost of breaking assignability.
> 
> Regards,
> 
> Joerg


Re: [PATCH] iommu/intel: quirk to disable DMAR for QM57 igfx

2019-01-22 Thread Joerg Roedel
On Tue, Jan 22, 2019 at 04:48:26PM +0200, Joonas Lahtinen wrote:
> According to our IOMMU folks there exists some desire to be able to assign
> the iGFX device aka have intel_iommu=on instead of intel_iommu=igfx_off
> due to how the devices might be grouped in IOMMU groups. Even when you
> would not be using the iGFX device.

You can force the igfx device into a SI domain, or does that also
trigger the iommu issues on the chipset?

In any case, if iommu=on breaks these systems I want to make them work
again with opt-out, even at the cost of breaking assignability.

Regards,

Joerg


Re: [PATCH] iommu/intel: quirk to disable DMAR for QM57 igfx

2019-01-22 Thread Joonas Lahtinen
Quoting Joerg Roedel (2019-01-22 13:01:09)
> Hi Daniel,
> 
> On Tue, Jan 22, 2019 at 11:46:39AM +0100, Daniel Vetter wrote:
> > Note that the string of platforms which have various issues with iommu
> > and igfx is very long, thus far we only disabled it where there's no
> > workaround to stop it from hanging the box, but otherwise left it
> > enabled. So if we make a policy change to also disable it anywhere
> > where it doesn't work well (instead of not at all), there's a pile
> > more platforms to switch.
> 
> I think its best to just disable iommu for the igfx devices on these
> platforms. Can you pick up Eric's patch and extend it with the list of
> affected platforms?

We've been discussing this again more actively since a few months ago,
and the discussion is still ongoing internally.

According to our IOMMU folks there exists some desire to be able to assign
the iGFX device aka have intel_iommu=on instead of intel_iommu=igfx_off
due to how the devices might be grouped in IOMMU groups. Even when you
would not be using the iGFX device.

So for some uses, the fact that the device (group) is assignable seems
to be more important than the iGFX device to be working. I'm afraid
that retroactively disabling the assignment for such an old platform
might break those usage scenarios. By my quick reading of the code,
there's no way for user to turn the iGFX DMAR on once the quirk
disables it.

I guess one solution would be to default to intel_iommu=igfx_off for
platforms that are older than certain threshold. But still allow
user to enable. But that then requires duplicating the PCI ID database
into iommu code.

I don't really have winning moves to present, but I'm open to hearing
how we can avoid more damage than starting to default to intel_iommu=on
did already.

Regards, Joonas

> 
> Thanks,
> 
> Joerg


Re: [PATCH] iommu/intel: quirk to disable DMAR for QM57 igfx

2019-01-22 Thread Joerg Roedel
Hi Daniel,

On Tue, Jan 22, 2019 at 11:46:39AM +0100, Daniel Vetter wrote:
> Note that the string of platforms which have various issues with iommu
> and igfx is very long, thus far we only disabled it where there's no
> workaround to stop it from hanging the box, but otherwise left it
> enabled. So if we make a policy change to also disable it anywhere
> where it doesn't work well (instead of not at all), there's a pile
> more platforms to switch.

I think its best to just disable iommu for the igfx devices on these
platforms. Can you pick up Eric's patch and extend it with the list of
affected platforms?

Thanks,

Joerg


Re: [PATCH] iommu/intel: quirk to disable DMAR for QM57 igfx

2019-01-22 Thread Daniel Vetter
On Tue, Jan 22, 2019 at 11:39 AM Joerg Roedel  wrote:
>
> On Fri, Jan 18, 2019 at 12:17:05PM +, Eric Wong wrote:
> > @@ -5411,6 +5411,7 @@ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x2e20, 
> > quirk_iommu_g4x_gfx);
> >  DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x2e30, quirk_iommu_g4x_gfx);
> >  DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x2e40, quirk_iommu_g4x_gfx);
> >  DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x2e90, quirk_iommu_g4x_gfx);
> > +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x0044, quirk_iommu_g4x_gfx);
> >
> >  static void quirk_iommu_rwbf(struct pci_dev *dev)
> >  {
> > @@ -5457,7 +5458,6 @@ static void quirk_calpella_no_shadow_gtt(struct 
> > pci_dev *dev)
> > }
> >  }
> >  DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x0040, 
> > quirk_calpella_no_shadow_gtt);
> > -DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x0044, 
> > quirk_calpella_no_shadow_gtt);
> >  DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x0062, 
> > quirk_calpella_no_shadow_gtt);
> >  DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x006a, 
> > quirk_calpella_no_shadow_gtt);
>
> This seems to make sense to me. Joonas, any comments or objections?

This is ironlake, which has a huge iommu hack in the gpu driver to
work around hard hangs, which:
- causes massive stalls and kills performance
- isn't well tested (it's the only one that needs this), so tends to break

So if we do this then imo we should:
- probably nuke that w/a too (check for needs_idle_maps and all the
related stuff in i915_gem_gtt.c)
- roll it out for all affected chips (i.e. need to include 0x0040).

Note that the string of platforms which have various issues with iommu
and igfx is very long, thus far we only disabled it where there's no
workaround to stop it from hanging the box, but otherwise left it
enabled. So if we make a policy change to also disable it anywhere
where it doesn't work well (instead of not at all), there's a pile
more platforms to switch.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch


Re: [PATCH] iommu/intel: quirk to disable DMAR for QM57 igfx

2019-01-22 Thread Joerg Roedel
On Fri, Jan 18, 2019 at 12:17:05PM +, Eric Wong wrote:
> @@ -5411,6 +5411,7 @@ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x2e20, 
> quirk_iommu_g4x_gfx);
>  DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x2e30, quirk_iommu_g4x_gfx);
>  DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x2e40, quirk_iommu_g4x_gfx);
>  DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x2e90, quirk_iommu_g4x_gfx);
> +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x0044, quirk_iommu_g4x_gfx);
>  
>  static void quirk_iommu_rwbf(struct pci_dev *dev)
>  {
> @@ -5457,7 +5458,6 @@ static void quirk_calpella_no_shadow_gtt(struct pci_dev 
> *dev)
> }
>  }
>  DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x0040, 
> quirk_calpella_no_shadow_gtt);
> -DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x0044, 
> quirk_calpella_no_shadow_gtt);
>  DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x0062, 
> quirk_calpella_no_shadow_gtt);
>  DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x006a, 
> quirk_calpella_no_shadow_gtt);

This seems to make sense to me. Joonas, any comments or objections?

Regards,

Joerg