WARNING: CPU: 4 PID: 863 at include/drm/drm_crtc.h:1577 drm_helper_choose_encoder_dpms+0x88/0x90() - evildoer found and neutralized

2015-10-09 Thread Joerg Roedel
On Tue, Oct 06, 2015 at 09:13:11PM +0800, Jiang Liu wrote:
>   I am on leave for the Chinese National Holiday and have limited
> access to my working environment. It would be appreciated if you could
> help to send out a patch for it. Otherwise I will send out a patch
> within 2-3 days.

Okay, I just sent the patch.


Joerg



2015-10-06 Thread Jiang Liu
On 2015/10/5 18:03, Joerg Roedel wrote:
> Hi Jiang,
> 
> On Sat, Oct 03, 2015 at 03:36:35PM +0800, Jiang Liu wrote:
>> So to summarize, I think we only need the following change to fix the
>> regression:
>>  int pcibios_alloc_irq(struct pci_dev *dev)
>>  {
>> +	if (pci_dev_msi_enabled(dev))
>> +		return -EBUSY;
>>
>> What do you think?
> 
> Yes, that works too and has the added benefit that no driver can attach
> to the iommu device and get in the way of the driver.
> 
> Will you send the patch for this change or should I do it?
Hi Joerg,
I am on leave for the Chinese National Holiday and have limited
access to my working environment. It would be appreciated if you could
help to send out a patch for it. Otherwise I will send out a patch
within 2-3 days.
Thanks!
Gerry


2015-10-05 Thread Joerg Roedel
Hi Jiang,

On Sat, Oct 03, 2015 at 03:36:35PM +0800, Jiang Liu wrote:
> So to summarize, I think we only need the following change to fix the
> regression:
>  int pcibios_alloc_irq(struct pci_dev *dev)
>  {
> +	if (pci_dev_msi_enabled(dev))
> +		return -EBUSY;
> 
> What do you think?

Yes, that works too and has the added benefit that no driver can attach
to the iommu device and get in the way of the driver.

Will you send the patch for this change or should I do it?



Joerg



2015-10-03 Thread Jiang Liu


On 2015/10/1 1:36, Borislav Petkov wrote:
> On Thu, Oct 01, 2015 at 01:00:44AM +0800, Jiang Liu wrote:
>> Thanks Joerg, that makes sense. If some driver tries to bind to
>> the IOMMU device, it will trigger the scenario you described. For
>> example, the Xen backend driver will try to probe all PCI devices if
>> enabled. I will do more investigation tomorrow.
> 
> Right, so this fixes the issue on my box, courtesy of Joerg. We
> basically don't disable the IRQ on MSI-enabled devices. The AMD IOMMU
> uses a barebones PCI device but not a PCI driver, which would be an
> overkill.
> 
> ---
> diff --git a/arch/x86/pci/common.c b/arch/x86/pci/common.c
> index 09d3afc..29ec2eb 100644
> --- a/arch/x86/pci/common.c
> +++ b/arch/x86/pci/common.c
> @@ -674,12 +674,15 @@ int pcibios_add_device(struct pci_dev *dev)
>  
>  int pcibios_alloc_irq(struct pci_dev *dev)
>  {
> +	if (pci_dev_msi_enabled(dev))
> +		return 0;
We may return -EBUSY here to reject the probe operation. It
doesn't make sense to continue the probe if MSI is already enabled.
It also helps to avoid calling pcibios_free_irq() in function
pci_device_probe().

> +
>   return pcibios_enable_irq(dev);
>  }
>  
>  void pcibios_free_irq(struct pci_dev *dev)
>  {
> - if (pcibios_disable_irq)
> + if (!pci_dev_msi_enabled(dev) && pcibios_disable_irq)
The above change is not needed: pcibios_disable_irq() will
first check !pci_has_managed_irq(dev) before actually freeing the
PCI irq. pci_has_managed_irq(dev) only returns true if
pcibios_alloc_irq() succeeds.

So to summarize, I think we only need the following change to fix the
regression:
 int pcibios_alloc_irq(struct pci_dev *dev)
 {
+	if (pci_dev_msi_enabled(dev))
+		return -EBUSY;

What do you think?
Thanks!
Gerry

>   pcibios_disable_irq(dev);
>  }
> --
> 


2015-10-03 Thread Borislav Petkov
On Sat, Oct 03, 2015 at 03:36:35PM +0800, Jiang Liu wrote:
> The above change is not needed, pcibios_disable_irq() will
> first check !pci_has_managed_irq(dev) before actually freeing
> PCI irq. pci_has_managed_irq(dev) only returns true if
> pcibios_alloc_irq() succeeds.
> 
> So to summarize, I think we only need the following change to fix the
> regression:
>  int pcibios_alloc_irq(struct pci_dev *dev)
>  {
> +	if (pci_dev_msi_enabled(dev))
> +		return -EBUSY;
> 
> What do you think?

Yap, that works too. I've got only this on top of 4.3+tip:

---
diff --git a/arch/x86/pci/common.c b/arch/x86/pci/common.c
index dc78a4a9a466..a4687aa6c1fb 100644
--- a/arch/x86/pci/common.c
+++ b/arch/x86/pci/common.c
@@ -675,6 +675,9 @@ int pcibios_add_device(struct pci_dev *dev)

 int pcibios_alloc_irq(struct pci_dev *dev)
 {
+	if (pci_dev_msi_enabled(dev))
+		return -EBUSY;
+
 	return pcibios_enable_irq(dev);
 }

---

and it suspend+resumed fine.
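
To see why the early -EBUSY return is enough on its own, here is a small userspace sketch of the probe path. It is not kernel code: struct sketch_dev and the sketch_* functions are made-up stand-ins for struct pci_dev, pcibios_alloc_irq(), pcibios_free_irq() and pci_device_probe(), and the "legacy IRQ setup always succeeds" behaviour is an assumption made for brevity.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct pci_dev; only the fields used here. */
struct sketch_dev {
	unsigned int irq;	/* mirrors dev->irq */
	bool msi_enabled;	/* mirrors pci_dev_msi_enabled(dev) */
};

/* Models the patched pcibios_alloc_irq(): refuse if MSI is already on. */
int sketch_alloc_irq(const struct sketch_dev *dev)
{
	assert(dev != NULL);
	if (dev->msi_enabled)
		return -EBUSY;
	return 0;	/* assumption: legacy IRQ setup succeeds */
}

/* Models pcibios_free_irq() as described in the thread: clears dev->irq. */
void sketch_free_irq(struct sketch_dev *dev)
{
	dev->irq = 0;
}

/* Models pci_device_probe() for a device that no driver matches. */
int sketch_device_probe(struct sketch_dev *dev)
{
	int error = sketch_alloc_irq(dev);

	if (error < 0)
		return error;	/* bails out before the free path can run */

	error = -ENODEV;	/* no matching driver */
	sketch_free_irq(dev);
	return error;
}
```

With msi_enabled set, sketch_device_probe() returns -EBUSY and leaves irq untouched; without it, the probe still fails with -ENODEV and irq is cleared as before.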

I guess it is time for Joerg to write a proper patch. :-)

Thanks.

-- 
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.


2015-10-01 Thread Jiang Liu
On 2015/9/30 20:44, Joerg Roedel wrote:
> On Wed, Sep 30, 2015 at 03:45:39PM +0800, Jiang Liu wrote:
>> So we need to figure out why we got irq number 0 after enabling
>> MSI for the AMD IOMMU device. The only hint I have is that the IOMMU driver just
>> grabs the PCI device without providing a PCI device driver for it;
>> we have solved a similar case for the eata driver. So could you
>> please help to apply this debug patch to gather more info and send me
>> /proc/interrupts?
> 
> I think I have an idea on how dev->irq got 0 after pci_enable_msi(). The
> PCI probe code calls pcibios_alloc_irq() and after a failed probe it calls
> pcibios_free_irq(), which sets dev->irq to 0.
> The AMD IOMMU driver does not register a pci_driver for itself, it just
> doesn't make sense for it. But the PCI device containing the IOMMU gets
> probed later, which fails because there is no driver for it. So the
> following call to pcibios_free_irq() clears dev->irq, so that it is 0 on
> the next resume. Does that make sense?

Thanks Joerg, that makes sense. If some driver tries to bind to the
IOMMU device, it will trigger the scenario you described. For
example, the Xen backend driver will try to probe all PCI devices
if enabled. I will do more investigation tomorrow.
Thanks!
Gerry



2015-09-30 Thread Joerg Roedel
On Wed, Sep 30, 2015 at 07:36:19PM +0200, Borislav Petkov wrote:
> Right, so this fixes the issue on my box, courtesy of Joerg. We
> basically don't disable the IRQ on MSI-enabled devices. The AMD IOMMU
> uses a barebones PCI device but not a PCI driver, which would be an
> overkill.

Well, not only overkill, but actually harmful. As I just wrote to
Jiang, a device can be forcibly unbound from its driver, which is
something we don't want for the IOMMU.


Joerg



2015-09-30 Thread Joerg Roedel
On Thu, Oct 01, 2015 at 01:00:44AM +0800, Jiang Liu wrote:
> Thanks Joerg, that makes sense. If some driver tries to bind to the
> IOMMU device, it will trigger the scenario you described. For
> example, the Xen backend driver will try to probe all PCI devices
> if enabled. I will do more investigation tomorrow.

Not only that, the probe code looks like this in __pci_device_probe:

	error = -ENODEV;

	id = pci_match_device(drv, pci_dev);
	if (id)
		error = pci_call_probe(drv, pci_dev, id);
	if (error >= 0)
		error = 0;

The pci_match_device() function will always return NULL for the iommu
pci_dev, because no driver matches the ids of it. So the function
returns -ENODEV, which will be handled in the caller (pci_device_probe):


	error = pcibios_alloc_irq(pci_dev);
	if (error < 0)
		return error;

	pci_dev_get(pci_dev);
	error = __pci_device_probe(drv, pci_dev);
	if (error) {
		pcibios_free_irq(pci_dev);
		pci_dev_put(pci_dev);
	}

For the IOMMU pci_dev a pcibios-irq will be allocated (if there is one,
like on Boris' system) and because __pci_device_probe returns -ENODEV it
will be freed again with pcibios_free_irq().

The pcibios_free_irq() function will set dev->irq = 0, which overwrites
the value that pci_enable_msi() wrote there. So later in suspend/resume
code the msi-handling part tries to fetch the irq-descriptor for the
wrong irq (which is NULL) and causes the crash.
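
The sequence described above can be replayed in a few lines of userspace C. This is only a rough model, not kernel code: struct model_pci_dev and every model_* function are invented for the illustration, and the unpatched alloc step is assumed to always succeed.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct pci_dev; only the fields used here. */
struct model_pci_dev {
	unsigned int irq;	/* mirrors dev->irq */
	bool msi_enabled;	/* mirrors pci_dev_msi_enabled(dev) */
};

/* Models pci_enable_msi(): stores the MSI irq number in dev->irq. */
void model_enable_msi(struct model_pci_dev *dev, unsigned int msi_irq)
{
	assert(dev != NULL);
	dev->msi_enabled = true;
	dev->irq = msi_irq;
}

/* Models the unpatched pcibios_alloc_irq(): assumed to succeed. */
int model_alloc_irq(struct model_pci_dev *dev)
{
	(void)dev;
	return 0;
}

/* Models pcibios_free_irq() as described above: clears dev->irq. */
void model_free_irq(struct model_pci_dev *dev)
{
	dev->irq = 0;
}

/* Models __pci_device_probe() when no driver matches the device ids. */
int model_probe_no_match(struct model_pci_dev *dev)
{
	(void)dev;
	return -ENODEV;
}

/* Models pci_device_probe(): alloc, probe, free again on failure. */
int model_device_probe(struct model_pci_dev *dev)
{
	int error = model_alloc_irq(dev);

	if (error < 0)
		return error;

	error = model_probe_no_match(dev);
	if (error)
		model_free_irq(dev);	/* clobbers the MSI irq number */
	return error;
}
```

Calling model_enable_msi() and then model_device_probe() on the same device leaves msi_enabled set but irq at 0, which is the state the resume code later trips over.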

The issue got introduced because with your changes pci_enable_msi() is
only allowed after a pci-device was successfully probed by the driver.
But this assumption is not true, as the AMD IOMMU driver does not
register as a pci-driver.

Registering a pci-driver would actually be harmful, because a device can
be forcibly unbound from its driver, which would be pretty bad for an
IOMMU in the running system.

So the right fix is to allow pci_enable_msi() for pci-devices not
registered against a driver. The fix I sent Boris has issues (I think it
leaks pcibios irqs when MSI is in use), but I was thinking about fixing it
in pci_device_probe by not allocating a pcibios-irq when MSI is already
active. What do you think?

Regards,

Joerg


2015-09-30 Thread Borislav Petkov
On Thu, Oct 01, 2015 at 01:00:44AM +0800, Jiang Liu wrote:
> Thanks Joerg, that makes sense. If some driver tries to bind to
> the IOMMU device, it will trigger the scenario you described. For
> example, the Xen backend driver will try to probe all PCI devices if
> enabled. I will do more investigation tomorrow.

Right, so this fixes the issue on my box, courtesy of Joerg. We
basically don't disable the IRQ on MSI-enabled devices. The AMD IOMMU
uses a barebones PCI device but not a PCI driver, which would be
overkill.

---
diff --git a/arch/x86/pci/common.c b/arch/x86/pci/common.c
index 09d3afc..29ec2eb 100644
--- a/arch/x86/pci/common.c
+++ b/arch/x86/pci/common.c
@@ -674,12 +674,15 @@ int pcibios_add_device(struct pci_dev *dev)

 int pcibios_alloc_irq(struct pci_dev *dev)
 {
+	if (pci_dev_msi_enabled(dev))
+		return 0;
+
 	return pcibios_enable_irq(dev);
 }

 void pcibios_free_irq(struct pci_dev *dev)
 {
-	if (pcibios_disable_irq)
+	if (!pci_dev_msi_enabled(dev) && pcibios_disable_irq)
 		pcibios_disable_irq(dev);
 }
--

-- 
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.


2015-09-30 Thread Jiang Liu
On 2015/9/29 18:51, Borislav Petkov wrote:
> On Tue, Sep 29, 2015 at 04:50:36PM +0800, Jiang Liu wrote:
>> So could you please help to apply the attached debug patch to gather
>> more information about the regression?
> 
> Sure, just did.
> 
> I'm sending you a full s/r cycle attempt caught over serial in a private
> message.

Hi Boris,
From the log file, we learned that the NULL pointer dereference
was caused by the AMD IOMMU device. For normal MSI-enabled PCI devices, we get
valid irq numbers such as:
[ 74.661170] ahci :04:00.0: irqdomain: freeze msi 1 irq28
[ 74.661297] radeon :01:00.0: irqdomain: freeze msi 1 irq47
But for the AMD IOMMU device, we got an invalid irq number (0) after
enabling MSI:
[ 74.662488] pci :00:00.2: irqdomain: freeze msi 1 irq0
which then caused a NULL pointer dereference when __pci_restore_msi_state()
gets called by the system resume code.
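
As a toy illustration of why a stale irq number turns into a NULL pointer dereference, consider the lookup below. Everything here is hypothetical (the toy_* names, the fixed-size table, the -1 error return); it only mimics the shape of "look up the MSI descriptor by irq, then read a field from it" and adds a guard that the real restore path is not claimed to have.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct msi_desc. */
struct toy_msi_desc {
	unsigned int masked;
};

#define TOY_NR_IRQS 64

/* Hypothetical irq -> MSI descriptor table; unused slots stay NULL. */
static struct toy_msi_desc *toy_desc_table[TOY_NR_IRQS];

/* Registers a descriptor for an irq number, ignoring out-of-range irqs. */
void toy_set_msi_desc(unsigned int irq, struct toy_msi_desc *desc)
{
	if (irq < TOY_NR_IRQS)
		toy_desc_table[irq] = desc;
}

/* Returns the descriptor for an irq, or NULL if none was registered. */
struct toy_msi_desc *toy_get_msi_desc(unsigned int irq)
{
	return irq < TOY_NR_IRQS ? toy_desc_table[irq] : NULL;
}

/*
 * Reads the saved mask for an irq. Returns 0 on success and -1 when the
 * irq has no descriptor; without the NULL check, the read of
 * entry->masked is where a crash would happen.
 */
int toy_restore_msi_mask(unsigned int irq, unsigned int *mask_out)
{
	struct toy_msi_desc *entry = toy_get_msi_desc(irq);

	if (!entry)
		return -1;
	*mask_out = entry->masked;
	return 0;
}
```

A device whose irq is still 28 finds its descriptor; a device whose irq was reset to 0 gets NULL back from the lookup.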
So we need to figure out why we got irq number 0 after enabling
MSI for the AMD IOMMU device. The only hint I have is that the IOMMU driver just
grabs the PCI device without providing a PCI device driver for it;
we have solved a similar case for the eata driver. So could you
please help to apply this debug patch to gather more info and send me
/proc/interrupts?
Thanks!
Gerry

>
> Thanks.
> 
-- next part --
A non-text attachment was scrubbed...
Name: 0001-Debug-Gather-more-information-about-AMD-iommu-device.patch
Type: text/x-patch
Size: 4545 bytes



2015-09-30 Thread Joerg Roedel
On Wed, Sep 30, 2015 at 03:45:39PM +0800, Jiang Liu wrote:
> So we need to figure out why we got irq number 0 after enabling
> MSI for the AMD IOMMU device. The only hint I have is that the IOMMU driver just
> grabs the PCI device without providing a PCI device driver for it;
> we have solved a similar case for the eata driver. So could you
> please help to apply this debug patch to gather more info and send me
> /proc/interrupts?

I think I have an idea on how dev->irq got 0 after pci_enable_msi(). The
PCI probe code calls pcibios_alloc_irq() and after a failed probe it calls
pcibios_free_irq(), which sets dev->irq to 0.
The AMD IOMMU driver does not register a pci_driver for itself, it just
doesn't make sense for it. But the PCI device containing the IOMMU gets
probed later, which fails because there is no driver for it. So the
following call to pcibios_free_irq() clears dev->irq, so that it is 0 on
the next resume. Does that make sense?


Joerg



2015-09-29 Thread Jiang Liu
On 2015/9/27 0:46, Borislav Petkov wrote:
> On Wed, Sep 23, 2015 at 06:18:39PM +0200, Borislav Petkov wrote:
>> On Wed, Sep 23, 2015 at 06:06:21PM +0200, Borislav Petkov wrote:
>>> On Wed, Sep 23, 2015 at 04:44:50PM +0200, Daniel Vetter wrote:
>>>> sorry I sprinkled the locking stuff in the wrong places. Still confused
>>>> why the resume side doesn't blow up anywhere
>>>
>>> But it does:

> 
> Ok, I bisected it.
> 
> First of all, Daniel, you didn't see the resume side blow up because
> of the NULL ptr deref f*cking up the box much earlier. Once I reverted
> the bad commit by hand (it wouldn't revert cleanly) the resume splats
> showed.
> 
> And in talking about the bad commit, it is this one:
> 
> 991de2e59090e55c65a7f59a049142e3c480f7bd is the first bad commit
> commit 991de2e59090e55c65a7f59a049142e3c480f7bd
> Author: Jiang Liu 
> Date:   Wed Jun 10 16:54:59 2015 +0800
> 
> PCI, x86: Implement pcibios_alloc_irq() and pcibios_free_irq()
> 
> To support IOAPIC hotplug, we need to allocate PCI IRQ resources on demand
> and free them when not used anymore.
> 
> Implement pcibios_alloc_irq() and pcibios_free_irq() to dynamically
> allocate and free PCI IRQs.
> 
> Remove mp_should_keep_irq(), which is no longer used.
> 
> [bhelgaas: changelog]
> Signed-off-by: Jiang Liu 
> Signed-off-by: Bjorn Helgaas 
> Acked-by: Thomas Gleixner 
> 
> :04 04 765e2d5232d53247ec260b34b51589c3bccb36ae 
> f680234a27685e94b1a35ae2a7218f8eafa9071a M  arch
> :04 04 d55a682bcde72682e883365e88ad1df6186fd54d 
> f82c470a04a6845fcf5e0aa934512c75628f798d M  drivers
> 
> Jiang, you have to stop breaking my box with your changes. This is
> maybe the third time I'm bisecting fallout from your patches. If you're
> touching all x86, you need to test on an AMD box too. Like everyone else
> testing on the hardware their changes affect. It is that simple.
Hi Boris and Daniel,
Sorry for the regression!
I have tried to reproduce the regression by doing
suspend/resume on a laptop, but failed. The PCI MSI suspend/resume
code works as expected. I have also checked msi.c and the radeon driver,
but haven't found any hint about the cause.
So could you please help to apply the attached debug patch
to gather more information about the regression?
Thanks!
Gerry

> 
> Anyway, reverting that commit by hand fixes my resume splat.
> 
> Here's the partial revert I did by hand:
> 
> ---
> diff --git a/arch/x86/include/asm/pci_x86.h b/arch/x86/include/asm/pci_x86.h
> index fa1195dae425..164e3f8d3c3d 100644
> --- a/arch/x86/include/asm/pci_x86.h
> +++ b/arch/x86/include/asm/pci_x86.h
> @@ -93,6 +93,8 @@ extern raw_spinlock_t pci_config_lock;
>  extern int (*pcibios_enable_irq)(struct pci_dev *dev);
>  extern void (*pcibios_disable_irq)(struct pci_dev *dev);
>  
> +extern bool mp_should_keep_irq(struct device *dev);
> +
>  struct pci_raw_ops {
>   int (*read)(unsigned int domain, unsigned int bus, unsigned int devfn,
>   int reg, int len, u32 *val);
> diff --git a/arch/x86/pci/common.c b/arch/x86/pci/common.c
> index 09d3afc0a181..3bff24438b00 100644
> --- a/arch/x86/pci/common.c
> +++ b/arch/x86/pci/common.c
> @@ -672,20 +672,22 @@ int pcibios_add_device(struct pci_dev *dev)
>   return 0;
>  }
>  
> -int pcibios_alloc_irq(struct pci_dev *dev)
> +int pcibios_enable_device(struct pci_dev *dev, int mask)
>  {
> - return pcibios_enable_irq(dev);
> -}
> + int err;
>  
> -void pcibios_free_irq(struct pci_dev *dev)
> -{
> - if (pcibios_disable_irq)
> - pcibios_disable_irq(dev);
> + if ((err = pci_enable_resources(dev, mask)) < 0)
> + return err;
> +
> + if (!pci_dev_msi_enabled(dev))
> + return pcibios_enable_irq(dev);
> + return 0;
>  }
>  
> -int pcibios_enable_device(struct pci_dev *dev, int mask)
> +void pcibios_disable_device (struct pci_dev *dev)
>  {
> - return pci_enable_resources(dev, mask);
> + if (!pci_dev_msi_enabled(dev) && pcibios_disable_irq)
> + pcibios_disable_irq(dev);
>  }
>  
>  int pci_ext_cfg_avail(void)
> diff --git a/arch/x86/pci/irq.c b/arch/x86/pci/irq.c
> index 32e70343e6fd..f229834b36d4 100644
> --- a/arch/x86/pci/irq.c
> +++ b/arch/x86/pci/irq.c
> @@ -1186,6 +1186,18 @@ void pcibios_penalize_isa_irq(int irq, int active)
>   pirq_penalize_isa_irq(irq, active);
>  }
>  
> +bool mp_should_keep_irq(struct device *dev)
> +{
> + if (dev->power.is_prepared)
> + return true;
> +#ifdef CONFIG_PM
> + if (dev->power.runtime_status == RPM_SUSPENDING)
> + return true;
> +#endif
> +
> + return false;
> +}
> +
>  static int pirq_enable_irq(struct pci_dev *dev)
>  {
>   u8 pin = 0;
> @@ -1258,7 +1270,8 @@ static int pirq_enable_irq(struct pci_dev *dev)
>  
>  static void pirq_disable_irq(struct pci_dev *dev)
>  {
> - if (io_apic_assign_pci_irqs && pci_has_managed_irq(dev)) 

2015-09-29 Thread Borislav Petkov
On Tue, Sep 29, 2015 at 04:50:36PM +0800, Jiang Liu wrote:
> So could you please help to apply the attached debug patch to gather
> more information about the regression?

Sure, just did.

I'm sending you a full s/r cycle attempt caught over serial in a private
message.

Thanks.

-- 
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.
--


2015-09-26 Thread Borislav Petkov
On Wed, Sep 23, 2015 at 06:18:39PM +0200, Borislav Petkov wrote:
> On Wed, Sep 23, 2015 at 06:06:21PM +0200, Borislav Petkov wrote:
> > On Wed, Sep 23, 2015 at 04:44:50PM +0200, Daniel Vetter wrote:
> > > sorry I sprinkled the locking stuff in the wrong places. Still confused
> > > why the resume side doesn't blow up anywhere
> > 
> > But it does:
> > 
> > [   69.394204] BUG: unable to handle kernel NULL pointer dereference at 
> > 0034
> > [   69.402080] IP: [] pci_restore_msi_state+0x196/0x240
> > [   69.408624] PGD 4162b8067 PUD 416581067 PMD 0 
> > [   69.413122] Oops:  [#1] PREEMPT SMP 
> > [   69.417101] Modules linked in: tun sha256_ssse3 sha256_generic drbg 
> > binfmt_misc ipv6 vfat fat fuse dm_crypt dm_mod kv
> > m_amd kvm crc32_pclmul aesni_intel aes_x86_64 lrw gf128mul glue_helper 
> > ablk_helper cryptd amd64_edac_mod edac_mce_amd fa
> > m15h_power k10temp amdkfd amd_iommu_v2 radeon acpi_cpufreq
> > [   69.443647] CPU: 4 PID: 814 Comm: kworker/u16:5 Not tainted 4.3.0-rc2+ #3
> > [   69.450430] Hardware name: To be filled by O.E.M. To be filled by 
> > O.E.M./M5A97 EVO R2.0, BIOS 1503 01/16/2013
> > [   69.460336] Workqueue: events_unbound async_run_entry_fn
> > [   69.465667] task: 88042a255f00 ti: 880428a68000 task.ti: 
> > 880428a68000
> > [   69.473145] RIP: 0010:[]  [] 
> > pci_restore_msi_state+0x196/0x240
> > [   69.482131] RSP: 0018:880428a6bc28  EFLAGS: 00010286
> > [   69.487436] RAX:  RBX: 88042a308000 RCX: 
> > 
> > [   69.494568] RDX: 0001 RSI: 81304448 RDI: 
> > 816c7a1b
> > [   69.501700] RBP: 880428a6bc40 R08: 0001 R09: 
> > 00522000
> > [   69.508833] R10:  R11:  R12: 
> > 
> > [   69.515965] R13: 88042a3087b0 R14: 88042a308010 R15: 
> > 88042a308038
> > [   69.523097] FS:  7fc91328a700() GS:88042ce0() 
> > knlGS:
> > [   69.531185] CS:  0010 DS:  ES:  CR0: 8005003b
> > [   69.536931] CR2: 0034 CR3: 0004164c7000 CR4: 
> > 000406e0
> > [   69.544061] Stack:
> > [   69.546073]  0080002c2a3087b0  88042a308000 
> > 880428a6bc78
> > [   69.553525]  8130c141 88042a308098 88042a308000 
> > 
> > [   69.560996]  8804284e77a8 81961ef1 880428a6bc88 
> > 8130c2b8
> > [   69.568450] Call Trace:
> > [   69.571044]  [] pci_restore_state.part.18+0xf1/0x250
> > [   69.577706]  [] pci_restore_state+0x18/0x20
> > [   69.583591]  [] pci_pm_restore_noirq+0x4c/0xd0
> > [   69.589734]  [] ? pci_pm_freeze_noirq+0xf0/0xf0
> > [   69.595966]  [] dpm_run_callback+0x77/0x2a0
> > [   69.601850]  [] device_resume_noirq+0x93/0x150
> > [   69.607994]  [] async_resume_noirq+0x1d/0x50
> > [   69.613967]  [] async_run_entry_fn+0x46/0xf0
> > [   69.619939]  [] process_one_work+0x1f8/0x640
> > [   69.625910]  [] ? process_one_work+0x154/0x640
> > [   69.632054]  [] worker_thread+0x4b/0x440
> > [   69.637677]  [] ? process_one_work+0x640/0x640
> > [   69.643822]  [] kthread+0xf6/0x110
> > [   69.648927]  [] ? kthread_create_on_node+0x1f0/0x1f0
> > [   69.655591]  [] ret_from_fork+0x3f/0x70
> > [   69.661128]  [] ? kthread_create_on_node+0x1f0/0x1f0
> > [   69.667794] Code: 66 89 4d ee 0f b7 c9 e8 79 41 fe ff 48 89 df e8 d1 7a 
> > ce ff 0f b6 53 4b 8b 73 38 48 8d 4d ee 48 8b 7b 10 83 c2 02 e8 1a 31 fe ff 
> > <41> 0f b6 4c 24 34 41 8b 54 24 30 be ff ff ff ff c0 e9 04 83 e1 
> > [   69.687986] RIP  [] pci_restore_msi_state+0x196/0x240
> > [   69.694772]  RSP 
> > [   69.698412] CR2: 0034
> > [   69.701879] ---[ end trace 814dd8cc56e427ae ]---
> 
> Ok, after some quick staring, we're at __pci_restore_msi_state():
> 
> pci_read_config_word(dev, dev->msi_cap + PCI_MSI_FLAGS, );
> msi_mask_irq(entry, msi_mask(entry->msi_attrib.multi_cap),
>  entry->masked);
> 
> which is:
> 
>   .loc 1 411 0
>   movq	%rbx, %rdi	# dev,
>   call	arch_restore_msi_irqs	#
> .LBB1921:
> .LBB1922:
>   .loc 2 902 0
>   movzbl	75(%rbx), %edx	# dev_2(D)->msi_cap, D.31945
>   movl	56(%rbx), %esi	# MEM[(const struct pci_dev *)dev_2(D)].devfn, MEM[(const struct pci_dev *)dev_2(D)].devfn
>   leaq	-18(%rbp), %rcx	#, tmp266
>   movq	16(%rbx), %rdi	# MEM[(const struct pci_dev *)dev_2(D)].bus, MEM[(const struct pci_dev *)dev_2(D)].bus
>   addl	$2, %edx	#, D.31945
>   call	pci_bus_read_config_word	#
> .LBE1922:
> .LBE1921:
>   .loc 1 414 0
>   movzbl	52(%r12), %ecx	# *_85, tmp208	<--- faulting insn
>   movl	48(%r12), %edx	# _85->D.27233.D.27231.masked, D.31946
> .LBB1923:
> .LBB1924:
>   .loc 1 176 0
>   movl	$-1, %esi	#, D.31951
> 
> and that %r12 is supposed to contain struct msi_desc *entry in
>