Re: [PATCH] drm/i915: stop using GMBUS IRQs on Gen4 chips (was Re: [3.9-rc1] irq 16: nobody cared (was [3.9-rc1] very poor interrupt responses))
On Mon, 18 Mar 2013, Chris Wilson wrote:

> > +#define HAS_GMBUS_IRQ(dev) (INTEL_INFO(dev)->gen >= 5)
> >  void
> >  intel_i2c_reset(struct drm_device *dev)
> >  {
> >  	struct drm_i915_private *dev_priv = dev->dev_private;
> >  	I915_WRITE(dev_priv->gpio_mmio_base + GMBUS0, 0);
> > -	I915_WRITE(dev_priv->gpio_mmio_base + GMBUS4, 0);
> > +	if (HAS_GMBUS_IRQ(dev))
> > +		I915_WRITE(dev_priv->gpio_mmio_base + GMBUS4, 0);
>
> There should not be any harm in always clearing GMBUS4, it exists on
> all platforms.
>
> >  }
> >
> >  static void intel_i2c_quirk_set(struct drm_i915_private *dev_priv, bool enable)
> > @@ -203,7 +205,6 @@ intel_gpio_setup(struct intel_gmbus *bus, u32 pin)
> >  	algo->data = bus;
> >  }
> >
> > -#define HAS_GMBUS_IRQ(dev) (INTEL_INFO(dev)->gen >= 4)
> >  static int
> >  gmbus_wait_hw_status(struct drm_i915_private *dev_priv,
> >  		     u32 gmbus2_status,
> > @@ -214,6 +215,13 @@ gmbus_wait_hw_status(struct drm_i915_private *dev_priv,
> >  	u32 gmbus2 = 0;
> >  	DEFINE_WAIT(wait);
> >
> > +	if (!HAS_GMBUS_IRQ(dev_priv->dev)) {
> > +		int ret;
> > +		ret = wait_for((gmbus2 = I915_READ(GMBUS2 + reg_offset)) &
> > +			       (GMBUS_SATOER | gmbus2_status),
> > +			       50);
>
> This should couple up to the normal return code that chooses the
> appropriate return value based on gmbus2.
>
> How about just using:
> 	if (!HAS_GMBUS_IRQ(dev_priv->dev)) gmbus4_irq_en = 0;
> and the existing wait loop?

I explicitly wanted to avoid touching GMBUS4 register, as the real cause
of the failure is not clear.

But, as Yinghai Lu points out, the problem is most likely caused by
interrupt disabling not working properly (see his very good point
regarding DisINTx+ and INTx+ discrepancy), so zeroing the register out
should work and it indeed does in my case, hence the (tested) patch below.

I think it's a 3.9-rc material, and I am all open to debug this further
for 3.10 so that the race is closed and gmbus irqs can be used on Gen4
platform properly.

From: Jiri Kosina <jkos...@suse.cz>
Subject: [PATCH] drm/i915: stop using GMBUS IRQs on Gen4 chips

Commit 28c70f162 (drm/i915: use the gmbus irq for waits) switched to
using GMBUS irqs instead of GPIO bit-banging for chipset generations 4
and above.

It turns out though that on many systems this leads to spurious interrupts
being generated, long after the register write to disable the IRQs has been
issued.

Flushing of the register writes by POSTING_READ() directly after the
register write doesn't work either.

Disable using of GMBUS IRQs on Gen4 systems before the root cause is
found and revert back to old behavior.

Signed-off-by: Jiri Kosina <jkos...@suse.cz>
---
 drivers/gpu/drm/i915/intel_i2c.c |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_i2c.c b/drivers/gpu/drm/i915/intel_i2c.c
index acf8aec..9934724 100644
--- a/drivers/gpu/drm/i915/intel_i2c.c
+++ b/drivers/gpu/drm/i915/intel_i2c.c
@@ -203,7 +203,7 @@ intel_gpio_setup(struct intel_gmbus *bus, u32 pin)
 	algo->data = bus;
 }
 
-#define HAS_GMBUS_IRQ(dev) (INTEL_INFO(dev)->gen >= 4)
+#define HAS_GMBUS_IRQ(dev) (INTEL_INFO(dev)->gen >= 5)
 static int
 gmbus_wait_hw_status(struct drm_i915_private *dev_priv,
 		     u32 gmbus2_status,
@@ -214,6 +214,8 @@ gmbus_wait_hw_status(struct drm_i915_private *dev_priv,
 	u32 gmbus2 = 0;
 	DEFINE_WAIT(wait);
 
+	if (!HAS_GMBUS_IRQ(dev_priv->dev))
+		gmbus4_irq_en = 0;
 	/* Important: The hw handles only the first bit, so set only one! Since
 	 * we also need to check for NAKs besides the hw ready/idle signal, we
 	 * need to wake up periodically and check that ourselves. */
-- 
Jiri Kosina
SUSE Labs
--
To unsubscribe from this list: send the line unsubscribe linux-usb in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
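Editorial sketch of the gating logic in the patch above, written as stand-alone userspace C rather than driver code. The struct and function names (`fake_device_info`, `has_gmbus_irq`, `effective_irq_enable`) are hypothetical stand-ins, not i915 symbols; only the `gen >= 5` threshold and the "zero the enable value, keep the existing wait loop" idea come from the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the driver's device info; only `gen` matters here. */
struct fake_device_info {
	int gen;
};

/* Mirrors the patched macro: GMBUS interrupts are used on gen 5 and newer only. */
static bool has_gmbus_irq(const struct fake_device_info *info)
{
	assert(info != NULL);
	return info->gen >= 5;
}

/*
 * Returns the interrupt-enable value the wait path would program.
 * Zero means "request no interrupt"; the caller then depends on the
 * periodic wake-up of its wait loop to notice completion or a NAK.
 */
static uint32_t effective_irq_enable(const struct fake_device_info *info,
				     uint32_t requested_irq_en)
{
	if (!has_gmbus_irq(info))
		return 0;
	return requested_irq_en;
}
```

Calling `effective_irq_enable()` with a gen 4 descriptor yields 0 for any requested bit, which is the behaviour the two added lines in `gmbus_wait_hw_status()` aim for without introducing a second wait path.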
Re: [PATCH] drm/i915: stop using GMBUS IRQs on Gen4 chips (was Re: [3.9-rc1] irq 16: nobody cared (was [3.9-rc1] very poor interrupt responses))
On Tue, Mar 19, 2013 at 09:56:57AM +0100, Jiri Kosina wrote: On Mon, 18 Mar 2013, Chris Wilson wrote: +#define HAS_GMBUS_IRQ(dev) (INTEL_INFO(dev)-gen = 5) void intel_i2c_reset(struct drm_device *dev) { struct drm_i915_private *dev_priv = dev-dev_private; I915_WRITE(dev_priv-gpio_mmio_base + GMBUS0, 0); - I915_WRITE(dev_priv-gpio_mmio_base + GMBUS4, 0); + if (HAS_GMBUS_IRQ(dev)) + I915_WRITE(dev_priv-gpio_mmio_base + GMBUS4, 0); There should not be any harm in always clearing GMBUS4, it exists on all platforms. } static void intel_i2c_quirk_set(struct drm_i915_private *dev_priv, bool enable) @@ -203,7 +205,6 @@ intel_gpio_setup(struct intel_gmbus *bus, u32 pin) algo-data = bus; } -#define HAS_GMBUS_IRQ(dev) (INTEL_INFO(dev)-gen = 4) static int gmbus_wait_hw_status(struct drm_i915_private *dev_priv, u32 gmbus2_status, @@ -214,6 +215,13 @@ gmbus_wait_hw_status(struct drm_i915_private *dev_priv, u32 gmbus2 = 0; DEFINE_WAIT(wait); + if (!HAS_GMBUS_IRQ(dev_priv-dev)) { + int ret; + ret = wait_for((gmbus2 = I915_READ(GMBUS2 + reg_offset)) + (GMBUS_SATOER | gmbus2_status), + 50); This should couple up to the normal return code that chooses the appropriate return value based on gmbus2. How about just using: if (!HAS_GMBUS_IRQ(dev_priv-dev)) gmbus4_irq_en = 0; and the existing wait loop? I explicitly wanted to avoid touching GMBUS4 register, as the real cause of the failure is not clear. But, as Yinghai Lu points out, the problem is most likely caused by interrupt disabling not working properly (see his very good point regarding DisINTx+ and INTx+ discrepancy), so zeroing the register out should work and it indeed does in my case, hence the (tested) patch below. I think it's a 3.9-rc material, and I am all open to debug this further for 3.10 so that the race is closed and gmbus irqs can be used on Gen4 platform properly. Agreed. 
Using the IRQ for GMBUS is just a performance feature that can be deferred until after we determine the root cause - and hope that the failure is somehow peculiar to GMBUS. Acked-by: Chris Wilson ch...@chris-wilson.co.uk -Chris -- Chris Wilson, Intel Open Source Technology Centre
gm45 intel gfx can generate non-MSI irq# in MSI mode (was Re: [PATCH] drm/i915: stop using GMBUS IRQs on Gen4 chips (was Re: [3.9-rc1] irq 16: nobody cared (was [3.9-rc1] very poor interrupt responses
On Tue, Mar 19, 2013 at 10:03 AM, Chris Wilson <ch...@chris-wilson.co.uk> wrote:
>> > How about just using:
>> >     if (!HAS_GMBUS_IRQ(dev_priv->dev)) gmbus4_irq_en = 0;
>> > and the existing wait loop?
>>
>> I explicitly wanted to avoid touching GMBUS4 register, as the real cause
>> of the failure is not clear.
>>
>> But, as Yinghai Lu points out, the problem is most likely caused by
>> interrupt disabling not working properly (see his very good point
>> regarding DisINTx+ and INTx+ discrepancy), so zeroing the register out
>> should work and it indeed does in my case, hence the (tested) patch below.
>>
>> I think it's a 3.9-rc material, and I am all open to debug this further
>> for 3.10 so that the race is closed and gmbus irqs can be used on Gen4
>> platform properly.
>
> Agreed. Using the IRQ for GMBUS is just a performance feature that can be
> deferred until after we determine the root cause - and hope that the
> failure is somehow peculiar to GMBUS.

Ok, I've merged this patch. But some further investigation points at a
much more severe dragon hiding here: The MSI interrupt for the intel
gfx is commonly in the 40+ range, but the interrupt vector with the
spurious interrupts is 16. Which is the irq of the intel gfx when MSI
is disabled!

So it looks like gmbus on the intel gfx is capable of generating
non-MSI interrupts in parallel to the MSI interrupts (since apparently
gmbus still works, so we get the interrupts we expect). I have no idea
how that could happen. Hence adding a bunch of people with more clue
than me.

For reference below the updated commit message.

Cheers, Daniel

Author: Jiri Kosina <jkos...@suse.cz>
Date:   Tue Mar 19 09:56:57 2013 +0100

    drm/i915: stop using GMBUS IRQs on Gen4 chips

    Commit 28c70f162 (drm/i915: use the gmbus irq for waits) switched to
    using GMBUS irqs instead of GPIO bit-banging for chipset generations 4
    and above.

    It turns out though that on many systems this leads to spurious interrupts
    being generated, long after the register write to disable the IRQs has been
    issued.

    Typically this results in the spurious interrupt source getting
    disabled:

    [9.636345] irq 16: nobody cared (try booting with the irqpoll option)
    [9.637915] Pid: 4157, comm: ifup Tainted: GF 3.9.0-rc2-00341-g0863702 #422
    [9.639484] Call Trace:
    [9.640731]  <IRQ>  [8109b40d] __report_bad_irq+0x1d/0xc7
    [9.640731]  [8109b7db] note_interrupt+0x15b/0x1e8
    [9.640731]  [810999f7] handle_irq_event_percpu+0x1bf/0x214
    [9.640731]  [81099a88] handle_irq_event+0x3c/0x5c
    [9.640731]  [8109c139] handle_fasteoi_irq+0x7a/0xb0
    [9.640731]  [8100400e] handle_irq+0x1a/0x24
    [9.640731]  [81003d17] do_IRQ+0x48/0xaf
    [9.640731]  [8142f1ea] common_interrupt+0x6a/0x6a
    [9.640731]  <EOI>  [8142f952] ? system_call_fastpath+0x16/0x1b
    [9.640731] handlers:
    [9.640731] [a000d771] usb_hcd_irq [usbcore]
    [9.640731] [a0306189] yenta_interrupt [yenta_socket]
    [9.640731] Disabling IRQ #16

    The really curious thing is now that irq 16 is _not_ the interrupt for
    the i915 driver when using MSI, but it _is_ the interrupt when not
    using MSI. So by all indications it seems like gmbus is able to
    generate a legacy (shared) interrupt in MSI mode on some
    configurations. I've tried to reproduce this and the differentiating
    thing seems to be that on unaffected systems no other device uses irq
    16 (which seems to be the non-MSI intel gfx interrupt on all gm45).

    I have no idea how that even can happen.

    To avoid tempting this elephant into a rage, just disable gmbus
    interrupt support on gen 4.

    v2: Improve the commit message with exact details of what's going on.
    Also add a comment in the code to warn against this particular
    elephant in the room.

    Signed-off-by: Jiri Kosina <jkos...@suse.cz> (v1)
    Acked-by: Chris Wilson <ch...@chris-wilson.co.uk> (v1)
    References: https://lkml.org/lkml/2013/3/8/325
    Signed-off-by: Daniel Vetter <daniel.vet...@ffwll.ch>

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
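Editorial sketch of why a stray legacy interrupt on a shared line ends in "Disabling IRQ #16". The kernel's spurious-interrupt accounting, as far as I understand it, counts interrupts that no registered handler claimed and shuts the line down once nearly all interrupts in a window went unclaimed. The code below is a simplified userspace model of that idea, not the kernel's implementation: the struct, the function names and the two threshold constants are assumptions chosen for illustration, and the real logic has additional details this model omits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed thresholds for this model: judge after this many interrupts... */
#define SPURIOUS_WINDOW   100000u
/* ...and disable the line if more than this many of them were unclaimed. */
#define SPURIOUS_LIMIT     99900u

/* Hypothetical per-line bookkeeping. */
struct irq_stats {
	unsigned int count;
	unsigned int unhandled;
	bool disabled;
};

/* A shared line counts as handled if at least one handler claimed the interrupt. */
static bool any_handler_claimed(const bool *claimed, size_t n)
{
	assert(claimed != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (claimed[i])
			return true;
	return false;
}

/* Account for one interrupt on the line; may flip `disabled`. */
static void note_interrupt_sketch(struct irq_stats *s, bool handled)
{
	if (s->disabled)
		return;
	if (!handled)
		s->unhandled++;
	if (++s->count < SPURIOUS_WINDOW)
		return;
	if (s->unhandled > SPURIOUS_LIMIT)
		s->disabled = true;	/* corresponds to "Disabling IRQ #16" */
	s->count = 0;
	s->unhandled = 0;
}
```

In the reported trace the only handlers on irq 16 are `usb_hcd_irq` and `yenta_interrupt`. If the interrupt really originates from the graphics device, neither of them would claim it, so every occurrence counts as unhandled and the line is eventually disabled for the innocent sharers too.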
Re: [3.9-rc1] irq 16: nobody cared (was [3.9-rc1] very poor interrupt responses)
My laptop is an Acer 1810T. I see this error message each boot.

Kind regards
Thomas

Jiri Kosina <jkos...@suse.cz> wrote:

> On Fri, 15 Mar 2013, Jiri Kosina wrote:
>
> > > I have the same problem on my Lenovo T500. I think the graphics
> > > card is involved. This laptop has hybrid graphics - one Intel GMA
> > > 4500MHD and one ATI Mobility Radeon HD 3650. When I boot with the
> > > Intel card, I get irq 16: nobody cared during boot, not when I
> > > boot with the ATI card.
> >
> > Confirming this.
> >
> > After a lot of hassle, I have bisected this reliably to
> >
> > commit 28c70f162a315bdcfbe0bf940a740ef8bfb918d6
> > Author: Daniel Vetter <daniel.vet...@ffwll.ch>
> > Date:   Sat Dec 1 13:53:45 2012 +0100
> >
> >     drm/i915: use the gmbus irq for waits
> >
> > Adding Daniel, Imre and Daniel to CC while I will try to figure out
> > what's happening in parallel.
> >
> > Attaching dmesg.txt from the machine with 28c70f162a as head, with
> > drm.debug=0xe.
>
> Just a datapoint -- I have put a trivial debugging patch in place, and it
> reveals that nobody cared for irq 16 happens long after last
>
> 	I915_WRITE(GMBUS4 + reg_offset, 0);
>
> has been performed in gmbus_wait_hw_status().
>
> On the other hand, if I comment out both GMBUS4 register offset writes in
> gmbus_wait_hw_status(), then it of course falls back to GPIO bit-banging,
> but the nobody cared for irq 16 is gone. So it seems like something gets
> severely confused by the I915_WRITE to GMBUS4 + reg_offset.
>
> So far this seems to have been reported solely on Lenovos as far as I can
> see (although completely different types), so it might be some
> platform-specific quirk?
>
> Honestly, I still don't understand how all the GMBUS stuff relates to IRQ
> 16 at all.
>
> -- 
> Jiri Kosina
> SUSE Labs
Re: [3.9-rc1] irq 16: nobody cared (was [3.9-rc1] very poor interrupt responses)
On Fri, Mar 15, 2013 at 08:47:39AM -0700, Greg KH wrote: On Fri, Mar 15, 2013 at 04:37:56PM +0100, Jiri Kosina wrote: On Fri, 15 Mar 2013, Greg KH wrote: I have the same problem on my Lenovo T500. I think the graphics card is involved. This laptop has hybrid graphics - one Intel GMA 4500MHD and one ATI Mobility Radeon HD 3650. When I boot with the Intel card, I get irq 16: nobody cared during boot, not when I boot with the ATI card. Confirming this. After a lot of hassle, I have bisected this reliably to commit 28c70f162a315bdcfbe0bf940a740ef8bfb918d6 Author: Daniel Vetter daniel.vet...@ffwll.ch Date: Sat Dec 1 13:53:45 2012 +0100 drm/i915: use the gmbus irq for waits Adding Daniel, Imre and Daniel to CC while I will try to figure out what's happening in parallel. Wasn't this fixed by the merge from David (2cc79544bd0aabb4b3cf467ead5df526d9134c64)? Why do you think it should, please? The line: - Fix PCH irq handling race which resulted in missed gmbus/dp aux irqs and subsequent fallout (Paulo) (I am seeing this with a2362d247 still). Ok, I guess it isn't still fixed properly, just was guessing :) Yeah, the above fix is for pch split platforms, whereas these reports here are for gm45 (which doesn't have the pch display split). Acking of gmbus interrupts works differently on those, I'm testing right now whether I can reproduce this fail. -Daniel -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch -- To unsubscribe from this list: send the line unsubscribe linux-usb in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [3.9-rc1] irq 16: nobody cared (was [3.9-rc1] very poor interrupt responses)
On Fri, 15 Mar 2013, Yinghai Lu wrote:

> > Just a datapoint -- I have put a trivial debugging patch in place, and it
> > reveals that nobody cared for irq 16 happens long after last
> >
> > 	I915_WRITE(GMBUS4 + reg_offset, 0);
> >
> > has been performed in gmbus_wait_hw_status().
> >
> > On the other hand, if I comment out both GMBUS4 register offset writes in
> > gmbus_wait_hw_status(), then it of course falls back to GPIO bit-banging,
> > but the nobody cared for irq 16 is gone. So it seems like something gets
> > severely confused by the I915_WRITE to GMBUS4 + reg_offset.
> >
> > So far this seems to have been reported solely on Lenovos as far as I can
> > see (although completely different types), so it might be some
> > platform-specific quirk?
> >
> > Honestly, I still don't understand how all the GMBUS stuff relates to IRQ
> > 16 at all.
>
> that device is using
>
> i915 :00:02.0: irq 44 for MSI/MSI-X
>
> so can you try to boot with pci=nomsi?

Yes, switching from MSI to IO-APIC-fasteoi makes the report about lost
interrupts go away.

My understanding from the other mail is that Daniel Vetter already has an
idea what might be going wrong with IRQ acking on GM45 chipsets; hopefully
this datapoint regarding MSI will fit into it.

-- 
Jiri Kosina
SUSE Labs
[PATCH] drm/i915: stop using GMBUS IRQs on Gen4 chips (was Re: [3.9-rc1] irq 16: nobody cared (was [3.9-rc1] very poor interrupt responses))
Okay, so I think that for 3.9 we want the patch below, and if eventually
hardware root cause / workaround is found for GM45, we can have it merged
later.

From: Jiri Kosina <jkos...@suse.cz>
Subject: [PATCH] drm/i915: stop using GMBUS IRQs on Gen4 chips

Commit 28c70f162 (drm/i915: use the gmbus irq for waits) switched to
using GMBUS irqs instead of GPIO bit-banging for chipset generations 4
and above.

It turns out though that on many systems this leads to spurious interrupts
being generated, long after the register write to disable the IRQs has been
issued.

Flushing of the register writes by POSTING_READ() directly after the
register write doesn't work either.

Disable using of GMBUS IRQs on Gen4 systems before the root cause is
found and revert back to old behavior. Also be more careful about not
issuing GMBUS4 register reads in gmbus_wait_hw_status() if we are not
using GMBUS IRQs.

Signed-off-by: Jiri Kosina <jkos...@suse.cz>
---
 drivers/gpu/drm/i915/intel_i2c.c |   12 ++--
 1 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_i2c.c b/drivers/gpu/drm/i915/intel_i2c.c
index acf8aec..8638036 100644
--- a/drivers/gpu/drm/i915/intel_i2c.c
+++ b/drivers/gpu/drm/i915/intel_i2c.c
@@ -58,12 +58,14 @@ to_intel_gmbus(struct i2c_adapter *i2c)
 	return container_of(i2c, struct intel_gmbus, adapter);
 }
 
+#define HAS_GMBUS_IRQ(dev) (INTEL_INFO(dev)->gen >= 5)
 void
 intel_i2c_reset(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	I915_WRITE(dev_priv->gpio_mmio_base + GMBUS0, 0);
-	I915_WRITE(dev_priv->gpio_mmio_base + GMBUS4, 0);
+	if (HAS_GMBUS_IRQ(dev))
+		I915_WRITE(dev_priv->gpio_mmio_base + GMBUS4, 0);
 }
 
 static void intel_i2c_quirk_set(struct drm_i915_private *dev_priv, bool enable)
@@ -203,7 +205,6 @@ intel_gpio_setup(struct intel_gmbus *bus, u32 pin)
 	algo->data = bus;
 }
 
-#define HAS_GMBUS_IRQ(dev) (INTEL_INFO(dev)->gen >= 4)
 static int
 gmbus_wait_hw_status(struct drm_i915_private *dev_priv,
 		     u32 gmbus2_status,
@@ -214,6 +215,13 @@ gmbus_wait_hw_status(struct drm_i915_private *dev_priv,
 	u32 gmbus2 = 0;
 	DEFINE_WAIT(wait);
 
+	if (!HAS_GMBUS_IRQ(dev_priv->dev)) {
+		int ret;
+		ret = wait_for((gmbus2 = I915_READ(GMBUS2 + reg_offset)) &
+			       (GMBUS_SATOER | gmbus2_status),
+			       50);
+		return ret;
+	}
 	/* Important: The hw handles only the first bit, so set only one! Since
 	 * we also need to check for NAKs besides the hw ready/idle signal, we
 	 * need to wake up periodically and check that ourselves. */
-- 
Jiri Kosina
SUSE Labs
Re: [PATCH] drm/i915: stop using GMBUS IRQs on Gen4 chips (was Re: [3.9-rc1] irq 16: nobody cared (was [3.9-rc1] very poor interrupt responses))
On Mon, Mar 18, 2013 at 04:56:02PM +0100, Jiri Kosina wrote:
> Okay, so I think that for 3.9 we want the patch below, and if eventually
> hardware root cause / workaround is found for GM45, we can have it merged
> later.
>
> From: Jiri Kosina <jkos...@suse.cz>
> Subject: [PATCH] drm/i915: stop using GMBUS IRQs on Gen4 chips
>
> Commit 28c70f162 (drm/i915: use the gmbus irq for waits) switched to
> using GMBUS irqs instead of GPIO bit-banging for chipset generations 4
> and above.
>
> It turns out though that on many systems this leads to spurious interrupts
> being generated, long after the register write to disable the IRQs has been
> issued.
>
> Flushing of the register writes by POSTING_READ() directly after the
> register write doesn't work either.
>
> Disable using of GMBUS IRQs on Gen4 systems before the root cause is
> found and revert back to old behavior. Also be more careful about not
> issuing GMBUS4 register reads in gmbus_wait_hw_status() if we are not
> using GMBUS IRQs.
>
> Signed-off-by: Jiri Kosina <jkos...@suse.cz>
> ---
>  drivers/gpu/drm/i915/intel_i2c.c |   12 ++--
>  1 files changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/intel_i2c.c b/drivers/gpu/drm/i915/intel_i2c.c
> index acf8aec..8638036 100644
> --- a/drivers/gpu/drm/i915/intel_i2c.c
> +++ b/drivers/gpu/drm/i915/intel_i2c.c
> @@ -58,12 +58,14 @@ to_intel_gmbus(struct i2c_adapter *i2c)
>  	return container_of(i2c, struct intel_gmbus, adapter);
>  }
>
> +#define HAS_GMBUS_IRQ(dev) (INTEL_INFO(dev)->gen >= 5)
>  void
>  intel_i2c_reset(struct drm_device *dev)
>  {
>  	struct drm_i915_private *dev_priv = dev->dev_private;
>  	I915_WRITE(dev_priv->gpio_mmio_base + GMBUS0, 0);
> -	I915_WRITE(dev_priv->gpio_mmio_base + GMBUS4, 0);
> +	if (HAS_GMBUS_IRQ(dev))
> +		I915_WRITE(dev_priv->gpio_mmio_base + GMBUS4, 0);

There should not be any harm in always clearing GMBUS4, it exists on all
platforms.

>  }
>
>  static void intel_i2c_quirk_set(struct drm_i915_private *dev_priv, bool enable)
> @@ -203,7 +205,6 @@ intel_gpio_setup(struct intel_gmbus *bus, u32 pin)
>  	algo->data = bus;
>  }
>
> -#define HAS_GMBUS_IRQ(dev) (INTEL_INFO(dev)->gen >= 4)
>  static int
>  gmbus_wait_hw_status(struct drm_i915_private *dev_priv,
>  		     u32 gmbus2_status,
> @@ -214,6 +215,13 @@ gmbus_wait_hw_status(struct drm_i915_private *dev_priv,
>  	u32 gmbus2 = 0;
>  	DEFINE_WAIT(wait);
>
> +	if (!HAS_GMBUS_IRQ(dev_priv->dev)) {
> +		int ret;
> +		ret = wait_for((gmbus2 = I915_READ(GMBUS2 + reg_offset)) &
> +			       (GMBUS_SATOER | gmbus2_status),
> +			       50);

This should couple up to the normal return code that chooses the
appropriate return value based on gmbus2.

How about just using:
	if (!HAS_GMBUS_IRQ(dev_priv->dev)) gmbus4_irq_en = 0;
and the existing wait loop?

> +		return ret;
> +	}
>  	/* Important: The hw handles only the first bit, so set only one! Since
>  	 * we also need to check for NAKs besides the hw ready/idle signal, we
>  	 * need to wake up periodically and check that ourselves. */

-- 
Chris Wilson, Intel Open Source Technology Centre
Re: [3.9-rc1] irq 16: nobody cared (was [3.9-rc1] very poor interrupt responses)
On Mon, Mar 18, 2013 at 2:12 AM, Jiri Kosina jkos...@suse.cz wrote: On Fri, 15 Mar 2013, Yinghai Lu wrote: Just a datapoint -- I have put a trivial debugging patch in place, and it reveals that nobody cared for irq 16 happens long after last I915_WRITE(GMBUS4 + reg_offset, 0); has been performed in gmbus_wait_hw_status(). On the other hand, if I comment out both GMBUS4 register offset writes in gmbus_wait_hw_status(), then it of course falls back to GPIO bit-banging, but the nobody cared for irq 16 is gone. So it seems like something gets severely confused by the I915_WRITE to GMBUS4 + reg_offset. So far this seems to have been reported solely on Lenovos as far as I can see (although a completely different types), so it might be some platform-specific quirk? Honestly, I still don't understand how all the GMBUS stuff relates to IRQ 16 at all. that device is using i915 :00:02.0: irq 44 for MSI/MSI-X so can you try to boot with pci=nomsi? Yes, switching from MSI to IO-APIC-fasteoi makes the report about lost interrupts go away. My understanding from the other mail is that DAniel Vetter already has an idea what might be going wrong with IRQ acking on GM45 chipsets; hopefully this datapoint regarding MSI will fit into it. What is /proc/interrupts difference between with and without pci=nomsi ? drm is forced to share irq 16? Thanks Yinghai -- To unsubscribe from this list: send the line unsubscribe linux-usb in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [3.9-rc1] irq 16: nobody cared (was [3.9-rc1] very poor interrupt responses)
On Mon, Mar 18, 2013 at 10:12:49AM +0100, Jiri Kosina wrote:
> On Fri, 15 Mar 2013, Yinghai Lu wrote:
>
> > > Just a datapoint -- I have put a trivial debugging patch in place, and it
> > > reveals that nobody cared for irq 16 happens long after last
> > >
> > > 	I915_WRITE(GMBUS4 + reg_offset, 0);
> > >
> > > has been performed in gmbus_wait_hw_status().
> > >
> > > On the other hand, if I comment out both GMBUS4 register offset writes in
> > > gmbus_wait_hw_status(), then it of course falls back to GPIO bit-banging,
> > > but the nobody cared for irq 16 is gone. So it seems like something gets
> > > severely confused by the I915_WRITE to GMBUS4 + reg_offset.
> > >
> > > So far this seems to have been reported solely on Lenovos as far as I can
> > > see (although completely different types), so it might be some
> > > platform-specific quirk?
> > >
> > > Honestly, I still don't understand how all the GMBUS stuff relates to IRQ
> > > 16 at all.
> >
> > that device is using
> >
> > i915 :00:02.0: irq 44 for MSI/MSI-X
> >
> > so can you try to boot with pci=nomsi?
>
> Yes, switching from MSI to IO-APIC-fasteoi makes the report about lost
> interrupts go away.
>
> My understanding from the other mail is that Daniel Vetter already has an
> idea what might be going wrong with IRQ acking on GM45 chipsets; hopefully
> this datapoint regarding MSI will fit into it.

Yep, there's a big comment in the irq handler for that chipset that we
have a gaping race when using MSI interrupts. Although the comment
boldly claims that the race is small enough to avoid the dreaded nobody
cared message.

Looks like gmbus is good at hitting that race - on newer chips it already
brought up a similar race in handling pch interrupts.

Can you please give the below patch a whirl? It removes the probably racy
MSI race avoidance code and replaces it with the same trick Paulo used to
fix pch irq handling races.

Thanks, Daniel
---
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 3c7bb04..13de12e 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -2684,7 +2684,7 @@ static irqreturn_t i965_irq_handler(int irq, void *arg)
 {
 	struct drm_device *dev = (struct drm_device *) arg;
 	drm_i915_private_t *dev_priv = (drm_i915_private_t *) dev->dev_private;
-	u32 iir, new_iir;
+	u32 iir, new_iir, ier;
 	u32 pipe_stats[I915_MAX_PIPES];
 	unsigned long irqflags;
 	int irq_received;
@@ -2692,9 +2692,14 @@ static irqreturn_t i965_irq_handler(int irq, void *arg)
 
 	atomic_inc(&dev_priv->irq_received);
 
+	/* irq race avoidance, copypasta from Paulo's PCH irq fix */
+	ier = I915_READ(IER);
+	I915_WRITE(IER, 0);
+	POSTING_READ(IER);
+
 	iir = I915_READ(IIR);
 
-	for (;;) {
+	do {
 		bool blc_event = false;
 
 		irq_received = iir != 0;
@@ -2792,7 +2797,10 @@ static irqreturn_t i965_irq_handler(int irq, void *arg)
 		 * stray interrupts.
 		 */
 		iir = new_iir;
-	}
+	} while (0);
+
+	I915_WRITE(IER, ier);
+	POSTING_READ(IER);
 
 	i915_update_dri1_breadcrumb(dev);
 
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
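Editorial sketch of the ordering the test patch above introduces in the handler: save the enable mask, mask everything, process what is pending, then restore the mask. This is a userspace model with a fake register file, not i915 code. The `fake_regs` struct and the function name are hypothetical, and modelling the acknowledge step as clearing the handled bits is an assumption made so the example is self-contained; the patch itself only shows the IER save, mask and restore.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register file: interrupt enable and interrupt identity. */
struct fake_regs {
	uint32_t ier;
	uint32_t iir;
};

/*
 * One pass of a handler using the "mask while handling" ordering.
 * Returns the set of interrupt bits that were pending and handled.
 */
static uint32_t handle_irq_masked(struct fake_regs *regs)
{
	uint32_t saved_ier;
	uint32_t pending;

	assert(regs != NULL);

	saved_ier = regs->ier;	/* ier = I915_READ(IER) in the patch */
	regs->ier = 0;		/* I915_WRITE(IER, 0): nothing may signal now */

	pending = regs->iir;	/* snapshot of what needs servicing */
	regs->iir &= ~pending;	/* assumed ack: clear exactly the handled bits */

	regs->ier = saved_ier;	/* I915_WRITE(IER, ier): re-enable at the end */
	return pending;
}
```

The single pass corresponds to the patch turning `for (;;)` into `do { ... } while (0)`. As the replies in the thread report, this ordering did not make the irq 16 message go away on the affected machines, so the sketch documents the experiment rather than a fix.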
Re: [3.9-rc1] irq 16: nobody cared (was [3.9-rc1] very poor interrupt responses)
On Mon, Mar 18, 2013 at 08:19:03PM +0100, Daniel Vetter wrote: On Mon, Mar 18, 2013 at 10:12:49AM +0100, Jiri Kosina wrote: On Fri, 15 Mar 2013, Yinghai Lu wrote: Just a datapoint -- I have put a trivial debugging patch in place, and it reveals that nobody cared for irq 16 happens long after last I915_WRITE(GMBUS4 + reg_offset, 0); has been performed in gmbus_wait_hw_status(). On the other hand, if I comment out both GMBUS4 register offset writes in gmbus_wait_hw_status(), then it of course falls back to GPIO bit-banging, but the nobody cared for irq 16 is gone. So it seems like something gets severely confused by the I915_WRITE to GMBUS4 + reg_offset. So far this seems to have been reported solely on Lenovos as far as I can see (although a completely different types), so it might be some platform-specific quirk? Honestly, I still don't understand how all the GMBUS stuff relates to IRQ 16 at all. that device is using i915 :00:02.0: irq 44 for MSI/MSI-X so can you try to boot with pci=nomsi? Yes, switching from MSI to IO-APIC-fasteoi makes the report about lost interrupts go away. My understanding from the other mail is that DAniel Vetter already has an idea what might be going wrong with IRQ acking on GM45 chipsets; hopefully this datapoint regarding MSI will fit into it. Yep, there's a big comment in the irq handler for that chipset that we have a gaping race with when using MSI interrupts. Although the comment bodly claims that the race is small enough to avoid the dreaded nobody cared message. Looks like gmbus is good at hitting that race - on newer chips it already brought up a similar race in handling pch interrupts. Can you please give the below patch a whirl? It removes the probably race msi race avoidance code and replaces it with the same trick Paulo used to fix pch irq handling races. Still nobody cares about irq16. 
-Chris -- Chris Wilson, Intel Open Source Technology Centre
Re: [3.9-rc1] irq 16: nobody cared (was [3.9-rc1] very poor interrupt responses)
On Mon, 18 Mar 2013, Daniel Vetter wrote:

> Yep, there's a big comment in the irq handler for that chipset that we
> have a gaping race when using MSI interrupts. Although the comment
> boldly claims that the race is small enough to avoid the dreaded nobody
> cared message.
>
> Looks like gmbus is good at hitting that race - on newer chips it already
> brought up a similar race in handling pch interrupts.

I see ... will target my focus in that direction, thanks.

> Can you please give the below patch a whirl? It removes the probably racy
> MSI race avoidance code and replaces it with the same trick Paulo used to
> fix pch irq handling races.

Unfortunately it didn't change anything, the spurious interrupt report is
still there.

-- 
Jiri Kosina
SUSE Labs
Re: [3.9-rc1] irq 16: nobody cared (was [3.9-rc1] very poor interrupt responses)
On Mon, 18 Mar 2013, Yinghai Lu wrote:

> > Yes, switching from MSI to IO-APIC-fasteoi makes the report about lost
> > interrupts go away.
> >
> > My understanding from the other mail is that Daniel Vetter already has an
> > idea what might be going wrong with IRQ acking on GM45 chipsets; hopefully
> > this datapoint regarding MSI will fit into it.
>
> What is /proc/interrupts difference between with and without pci=nomsi ?
>
> drm is forced to share irq 16?

Yup, IRQ 16 is being shared, and one of the owners is i915.

-- 
Jiri Kosina
SUSE Labs
Re: [3.9-rc1] irq 16: nobody cared (was [3.9-rc1] very poor interrupt responses)
On Mon, Mar 18, 2013 at 3:05 PM, Jiri Kosina jkos...@suse.cz wrote:
> On Mon, 18 Mar 2013, Yinghai Lu wrote:
>
> > > Yes, switching from MSI to IO-APIC-fasteoi makes the report about lost
> > > interrupts go away. My understanding from the other mail is that Daniel
> > > Vetter already has an idea what might be going wrong with IRQ acking on
> > > GM45 chipsets; hopefully this datapoint regarding MSI will fit into it.
> >
> > What is the /proc/interrupts difference between with and without
> > pci=nomsi? Is drm forced to share irq 16?
>
> Yup, IRQ 16 is being shared, and one of the owners is i915.

The VGA device reports a strange INTx status:

00:02.0 VGA compatible controller: Intel Corporation Mobile 4 Series Chipset Integrated Graphics Controller (rev 07) (prog-if 00 [VGA controller])
	Subsystem: Lenovo Device 20e4
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast TAbort- TAbort- MAbort- SERR- PERR- INTx+
	Latency: 0
	Interrupt: pin A routed to IRQ 44
	Region 0: Memory at f200 (64-bit, non-prefetchable) [size=4M]
	Region 2: Memory at d000 (64-bit, prefetchable) [size=256M]
	Region 4: I/O ports at 1800 [size=8]
	Expansion ROM at unassigned [disabled]
	Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit-
		Address: fee0100c  Data: 4142
	Capabilities: [d0] Power Management version 3
		Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Kernel driver in use: i915
	Kernel modules: i915

It should be INTx- after we have set DisINTx+ in the control register.

So INTx cannot be disabled once it has been enabled?

The VGA device on my T420 looks right:

00:02.0 VGA compatible controller: Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller (rev 09) (prog-if 00 [VGA controller])
	Subsystem: Lenovo Device 21ce
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast TAbort- TAbort- MAbort- SERR- PERR- INTx-

Thanks

Yinghai
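For readers following the DisINTx+/INTx+ observation above, here is a minimal sketch of the check being made by eye on the lspci output. The two bit masks are the standard PCI Command and Status register bits (lspci prints them as DisINTx and INTx); the helper name `intx_pending_while_disabled` is hypothetical and is not part of any kernel or pciutils API. It only decodes the flags; it says nothing about why the hardware ends up in that state.

```c
#include <stdbool.h>
#include <stdint.h>

/* Standard PCI configuration-space bits (same values as in Linux's pci_regs.h). */
#define PCI_COMMAND_INTX_DISABLE 0x0400 /* Command register bit 10: "DisINTx" in lspci */
#define PCI_STATUS_INTERRUPT     0x0008 /* Status register bit 3: "INTx" in lspci */

/*
 * Hypothetical helper: returns true when the device reports a pending
 * legacy interrupt (INTx+) although INTx delivery is disabled (DisINTx+),
 * i.e. the combination seen on the GM45 device in this thread.
 */
static bool intx_pending_while_disabled(uint16_t command, uint16_t status)
{
    return (command & PCI_COMMAND_INTX_DISABLE) &&
           (status & PCI_STATUS_INTERRUPT);
}
```

Assuming register values reconstructed from the flags shown above (not read from the hardware), the GM45 device corresponds to command 0x0407 and status 0x0098, for which the helper returns true; the T420 status 0x0090 makes it return false.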
Re: [3.9-rc1] irq 16: nobody cared (was [3.9-rc1] very poor interrupt responses)
On Thu, 14 Mar 2013, Rafael J. Wysocki wrote:

> > > commit 181380b702eee1a9aca51354d7b87c7b08541fcf
> > > Author: Yinghai Lu ying...@kernel.org
> > > Date:   Sat Feb 16 11:58:34 2013 -0700
> > >
> > >     PCI/ACPI: Don't cache _PRT, and don't associate them with bus numbers
> >
> > This patch __fixed__ this problem for me in linux-next back in February.
> >
> > Rafael, did you hold back some ACPI patches from 3.9 that would have
> > made the fix no longer applicable?
>
> No, I didn't.
>
> I'm afraid, though, that the fix might not be effective on some systems
> for a reason that's unclear at the moment. So in fact the one to check is
> commit 4f535093cf (PCI: Put pci_dev in device tree as early as possible)
> and if the problem doesn't appear before that, we need to figure out why
> the fix may not be sufficient.

With either 4f535093cf or 181380b702 I do *not* see the problem, i.e. these
commits are not the culprit and it was caused by some later change.

I will proceed with a bisect now; hopefully it'll produce a meaningful
result.

--
Jiri Kosina
SUSE Labs
Re: [3.9-rc1] irq 16: nobody cared (was [3.9-rc1] very poor interrupt responses)
I have the same problem on my Lenovo T500.

I think the graphics card is involved. This laptop has hybrid graphics -
one Intel GMA 4500MHD and one ATI Mobility Radeon HD 3650.

When I boot with the Intel card, I get "irq 16: nobody cared" during boot,
not when I boot with the ATI card.

And /proc/interrupts are surely different with the two cards. Look at the
irq 16 line:

$ cat intel-interrupts.txt
           CPU0       CPU1
  0:      23658      22859   IO-APIC-edge      timer
  1:        168        177   IO-APIC-edge      i8042
  8:          1          0   IO-APIC-edge      rtc0
  9:        329        347   IO-APIC-fasteoi   acpi
 12:       3065       3166   IO-APIC-edge      i8042
 16:      49732      50269   IO-APIC-fasteoi   yenta, uhci_hcd:usb6
 17:          1          0   IO-APIC-fasteoi   firewire_ohci, uhci_hcd:usb7
 18:          0          0   IO-APIC-fasteoi   mmc0, uhci_hcd:usb8
 19:        216        204   IO-APIC-fasteoi   ehci_hcd:usb2
 20:          0          0   IO-APIC-fasteoi   uhci_hcd:usb3
 21:        114        103   IO-APIC-fasteoi   uhci_hcd:usb4
 22:          0          0   IO-APIC-fasteoi   uhci_hcd:usb5
 23:          9          9   IO-APIC-fasteoi   i801_smbus, ehci_hcd:usb1
 40:          0          0   DMAR_MSI-edge     dmar2
 41:          0          0   DMAR_MSI-edge     dmar0
 42:          0          0   DMAR_MSI-edge     dmar3
 43:          0          0   PCI-MSI-edge      PCIe PME
 44:          0          0   PCI-MSI-edge      PCIe PME
 45:          0          0   PCI-MSI-edge      PCIe PME
 46:          0          0   PCI-MSI-edge      PCIe PME
 47:      10023      10173   PCI-MSI-edge      ahci
 48:         10          8   PCI-MSI-edge      mei
 49:         22         30   PCI-MSI-edge      eth0
 50:         66         71   PCI-MSI-edge      i915
 51:       2508       2348   PCI-MSI-edge      iwlwifi
 52:        168        169   PCI-MSI-edge      snd_hda_intel
NMI:         17         17   Non-maskable interrupts
LOC:      27988      25243   Local timer interrupts
SPU:          0          0   Spurious interrupts
PMI:         17         17   Performance monitoring interrupts
IWI:          0          0   IRQ work interrupts
RTR:          0          0   APIC ICR read retries
RES:       4584       2746   Rescheduling interrupts
CAL:       6178       7492   Function call interrupts
TLB:        702        651   TLB shootdowns
TRM:          0          0   Thermal event interrupts
THR:          0          0   Threshold APIC interrupts
MCE:          0          0   Machine check exceptions
MCP:          1          1   Machine check polls
ERR:          0
MIS:          0

$ cat ati-interrupts.txt
           CPU0       CPU1
  0:      15488      15268   IO-APIC-edge      timer
  1:        182        189   IO-APIC-edge      i8042
  8:          1          0   IO-APIC-edge      rtc0
  9:        328        339   IO-APIC-fasteoi   acpi
 12:       2071       1997   IO-APIC-edge      i8042
 16:         55         47   IO-APIC-fasteoi   yenta, uhci_hcd:usb4
 17:          1          1   IO-APIC-fasteoi   firewire_ohci, uhci_hcd:usb5
 18:          0          0   IO-APIC-fasteoi   uhci_hcd:usb6, mmc0
 19:        219        202   IO-APIC-fasteoi   ehci_hcd:usb8
 20:          0          0   IO-APIC-fasteoi   uhci_hcd:usb1
 21:        112        104   IO-APIC-fasteoi   uhci_hcd:usb2
 22:          0          0   IO-APIC-fasteoi   uhci_hcd:usb3
 23:         10          8   IO-APIC-fasteoi   i801_smbus, ehci_hcd:usb7
 40:          0          0   DMAR_MSI-edge     dmar1
 41:          0          0   DMAR_MSI-edge     dmar0
 42:          0          0   DMAR_MSI-edge     dmar2
 43:          0          0   PCI-MSI-edge      PCIe PME
 44:          0          0   PCI-MSI-edge      PCIe PME
 45:          0          0   PCI-MSI-edge      PCIe PME
 46:          0          0   PCI-MSI-edge      PCIe PME
 47:          0          0   PCI-MSI-edge      PCIe PME
 48:       9733       9932   PCI-MSI-edge      ahci
 49:          9          9   PCI-MSI-edge      mei
 50:       2308       2196   PCI-MSI-edge      iwlwifi
 51:         15         35   PCI-MSI-edge      eth0
 52:        818        815   PCI-MSI-edge      radeon
 53:        167        167   PCI-MSI-edge      snd_hda_intel
NMI:         17         16   Non-maskable interrupts
LOC:      18139      34223   Local timer interrupts
SPU:          0          0   Spurious interrupts
PMI:         17         16   Performance monitoring interrupts
IWI:          0          0   IRQ work interrupts
RTR:          0          0   APIC ICR read retries
RES:       3788       3563   Rescheduling interrupts
CAL:       6303       5894   Function call interrupts
TLB:        711        711   TLB shootdowns
TRM:          0          0   Thermal event interrupts
THR:          0          0   Threshold APIC interrupts
MCE:          0          0   Machine check exceptions
MCP:          1          1   Machine check polls
ERR:          0
MIS:          0

--
Regards,
Harald
Re: [3.9-rc1] irq 16: nobody cared (was [3.9-rc1] very poor interrupt responses)
On Fri, 15 Mar 2013, Harald Arnesen wrote:

> I have the same problem on my Lenovo T500.
>
> I think the graphics card is involved. This laptop has hybrid graphics -
> one Intel GMA 4500MHD and one ATI Mobility Radeon HD 3650.
>
> When I boot with the Intel card, I get "irq 16: nobody cared" during
> boot, not when I boot with the ATI card.

Confirming this. After a lot of hassle, I have bisected this reliably to

commit 28c70f162a315bdcfbe0bf940a740ef8bfb918d6
Author: Daniel Vetter daniel.vet...@ffwll.ch
Date:   Sat Dec 1 13:53:45 2012 +0100

    drm/i915: use the gmbus irq for waits

Adding Daniel, Imre and Daniel to CC while I will try to figure out what's
happening in parallel.

Attaching dmesg.txt from the machine with 28c70f162a as head, with
drm.debug=0xe.

--
Jiri Kosina
SUSE Labs
Re: [3.9-rc1] irq 16: nobody cared (was [3.9-rc1] very poor interrupt responses)
On Fri, 15 Mar 2013, Jiri Kosina wrote:

> > I have the same problem on my Lenovo T500.
> >
> > I think the graphics card is involved. This laptop has hybrid graphics -
> > one Intel GMA 4500MHD and one ATI Mobility Radeon HD 3650.
> >
> > When I boot with the Intel card, I get "irq 16: nobody cared" during
> > boot, not when I boot with the ATI card.
>
> Confirming this. After a lot of hassle, I have bisected this reliably to
>
> commit 28c70f162a315bdcfbe0bf940a740ef8bfb918d6
> Author: Daniel Vetter daniel.vet...@ffwll.ch
> Date:   Sat Dec 1 13:53:45 2012 +0100
>
>     drm/i915: use the gmbus irq for waits
>
> Adding Daniel, Imre and Daniel to CC while I will try to figure out what's
> happening in parallel.
>
> Attaching dmesg.txt from the machine with 28c70f162a as head, with
> drm.debug=0xe.

Just a datapoint -- I have put a trivial debugging patch in place, and it
reveals that the "nobody cared" for irq 16 happens long after the last

	I915_WRITE(GMBUS4 + reg_offset, 0);

has been performed in gmbus_wait_hw_status().

On the other hand, if I comment out both GMBUS4 register writes in
gmbus_wait_hw_status(), then it of course falls back to GPIO bit-banging,
but the "nobody cared" for irq 16 is gone.

So it seems like something gets severely confused by the I915_WRITE to
GMBUS4 + reg_offset.

So far this seems to have been reported solely on Lenovos as far as I can
see (although on completely different models), so it might be some
platform-specific quirk?

Honestly, I still don't understand how all the GMBUS stuff relates to
IRQ 16 at all.

--
Jiri Kosina
SUSE Labs
Re: [3.9-rc1] irq 16: nobody cared (was [3.9-rc1] very poor interrupt responses)
On Fri, Mar 15, 2013 at 02:33:13PM +0100, Jiri Kosina wrote:
> On Fri, 15 Mar 2013, Harald Arnesen wrote:
>
> > I have the same problem on my Lenovo T500.
> >
> > I think the graphics card is involved. This laptop has hybrid graphics -
> > one Intel GMA 4500MHD and one ATI Mobility Radeon HD 3650.
> >
> > When I boot with the Intel card, I get "irq 16: nobody cared" during
> > boot, not when I boot with the ATI card.
>
> Confirming this. After a lot of hassle, I have bisected this reliably to
>
> commit 28c70f162a315bdcfbe0bf940a740ef8bfb918d6
> Author: Daniel Vetter daniel.vet...@ffwll.ch
> Date:   Sat Dec 1 13:53:45 2012 +0100
>
>     drm/i915: use the gmbus irq for waits
>
> Adding Daniel, Imre and Daniel to CC while I will try to figure out what's
> happening in parallel.

Wasn't this fixed by the merge from David
(2cc79544bd0aabb4b3cf467ead5df526d9134c64)? I can't figure out the exact
commit that the merge message referred to though...

greg k-h
Re: [3.9-rc1] irq 16: nobody cared (was [3.9-rc1] very poor interrupt responses)
On Fri, 15 Mar 2013, Greg KH wrote:

> > > I have the same problem on my Lenovo T500.
> > >
> > > I think the graphics card is involved. This laptop has hybrid
> > > graphics - one Intel GMA 4500MHD and one ATI Mobility Radeon HD 3650.
> > >
> > > When I boot with the Intel card, I get "irq 16: nobody cared" during
> > > boot, not when I boot with the ATI card.
> >
> > Confirming this. After a lot of hassle, I have bisected this reliably to
> >
> > commit 28c70f162a315bdcfbe0bf940a740ef8bfb918d6
> > Author: Daniel Vetter daniel.vet...@ffwll.ch
> > Date:   Sat Dec 1 13:53:45 2012 +0100
> >
> >     drm/i915: use the gmbus irq for waits
> >
> > Adding Daniel, Imre and Daniel to CC while I will try to figure out
> > what's happening in parallel.
>
> Wasn't this fixed by the merge from David
> (2cc79544bd0aabb4b3cf467ead5df526d9134c64)?

Why do you think it should, please?

(I am seeing this with a2362d247 still).

--
Jiri Kosina
SUSE Labs
Re: [3.9-rc1] irq 16: nobody cared (was [3.9-rc1] very poor interrupt responses)
On Fri, Mar 15, 2013 at 04:37:56PM +0100, Jiri Kosina wrote:
> On Fri, 15 Mar 2013, Greg KH wrote:
>
> > > > I have the same problem on my Lenovo T500.
> > > >
> > > > I think the graphics card is involved. This laptop has hybrid
> > > > graphics - one Intel GMA 4500MHD and one ATI Mobility Radeon HD 3650.
> > > >
> > > > When I boot with the Intel card, I get "irq 16: nobody cared" during
> > > > boot, not when I boot with the ATI card.
> > >
> > > Confirming this. After a lot of hassle, I have bisected this reliably
> > > to
> > >
> > > commit 28c70f162a315bdcfbe0bf940a740ef8bfb918d6
> > > Author: Daniel Vetter daniel.vet...@ffwll.ch
> > > Date:   Sat Dec 1 13:53:45 2012 +0100
> > >
> > >     drm/i915: use the gmbus irq for waits
> > >
> > > Adding Daniel, Imre and Daniel to CC while I will try to figure out
> > > what's happening in parallel.
> >
> > Wasn't this fixed by the merge from David
> > (2cc79544bd0aabb4b3cf467ead5df526d9134c64)?
>
> Why do you think it should, please?

The line:
  - Fix PCH irq handling race which resulted in missed gmbus/dp aux irqs
    and subsequent fallout (Paulo)

> (I am seeing this with a2362d247 still).

Ok, I guess it isn't still fixed properly, just was guessing :)

greg k-h
Re: [3.9-rc1] irq 16: nobody cared (was [3.9-rc1] very poor interrupt responses)
On Fri, 15 Mar 2013, Greg KH wrote:

> > > > > I have the same problem on my Lenovo T500.
> > > > >
> > > > > I think the graphics card is involved. This laptop has hybrid
> > > > > graphics - one Intel GMA 4500MHD and one ATI Mobility Radeon HD
> > > > > 3650.
> > > > >
> > > > > When I boot with the Intel card, I get "irq 16: nobody cared"
> > > > > during boot, not when I boot with the ATI card.
> > > >
> > > > Confirming this. After a lot of hassle, I have bisected this
> > > > reliably to
> > > >
> > > > commit 28c70f162a315bdcfbe0bf940a740ef8bfb918d6
> > > > Author: Daniel Vetter daniel.vet...@ffwll.ch
> > > > Date:   Sat Dec 1 13:53:45 2012 +0100
> > > >
> > > >     drm/i915: use the gmbus irq for waits
> > > >
> > > > Adding Daniel, Imre and Daniel to CC while I will try to figure out
> > > > what's happening in parallel.
> > >
> > > Wasn't this fixed by the merge from David
> > > (2cc79544bd0aabb4b3cf467ead5df526d9134c64)?
> >
> > Why do you think it should, please?
>
> The line:
>   - Fix PCH irq handling race which resulted in missed gmbus/dp aux irqs
>     and subsequent fallout (Paulo)

Ah, that one. I believe that should be irrelevant for GM chipsets, as they
don't have an AUX line, right?

> > (I am seeing this with a2362d247 still).
>
> Ok, I guess it isn't still fixed properly, just was guessing :)

Seems like this is a different issue.

Thanks,

--
Jiri Kosina
SUSE Labs
Re: [3.9-rc1] irq 16: nobody cared (was [3.9-rc1] very poor interrupt responses)
On Fri, Mar 15, 2013 at 8:14 AM, Jiri Kosina jkos...@suse.cz wrote:
> Just a datapoint -- I have put a trivial debugging patch in place, and it
> reveals that the "nobody cared" for irq 16 happens long after the last
>
> 	I915_WRITE(GMBUS4 + reg_offset, 0);
>
> has been performed in gmbus_wait_hw_status().
>
> On the other hand, if I comment out both GMBUS4 register writes in
> gmbus_wait_hw_status(), then it of course falls back to GPIO bit-banging,
> but the "nobody cared" for irq 16 is gone.
>
> So it seems like something gets severely confused by the I915_WRITE to
> GMBUS4 + reg_offset.
>
> So far this seems to have been reported solely on Lenovos as far as I can
> see (although on completely different models), so it might be some
> platform-specific quirk?
>
> Honestly, I still don't understand how all the GMBUS stuff relates to
> IRQ 16 at all.

That device is using MSI:

	i915 :00:02.0: irq 44 for MSI/MSI-X

so can you try to boot with pci=nomsi?
Re: [3.9-rc1] irq 16: nobody cared (was [3.9-rc1] very poor interrupt responses)
On Wed, 13 Mar 2013, Jiri Kosina wrote:

> OK, this is a "me too", on Thinkpad x200s.
>
> [4.116847] irq 16: nobody cared (try booting with the irqpoll option)
> [4.116849] Pid: 1, comm: systemd Not tainted 3.9.0-rc2-00188-g6c23cbb #186
> [4.116850] Call Trace:
> [4.116860] IRQ [810db0f8] __report_bad_irq+0x38/0xf0
> [4.116862] [810db3a3] note_interrupt+0x1f3/0x240
> [4.116865] [810d8977] handle_irq_event_percpu+0x147/0x230
> [4.116867] [810d8aa9] handle_irq_event+0x49/0x70
> [4.116869] [810dbbc1] handle_fasteoi_irq+0x61/0x100
> [4.116873] [81004689] handle_irq+0x59/0x150
> [4.116877] [8104e916] ? irq_enter+0x16/0x80
> [4.116879] [81003d4b] do_IRQ+0x5b/0xe0
> [4.116883] [815563ad] common_interrupt+0x6d/0x6d
> [4.116888] EOI [81320dc1] ? cfb_imageblit+0x581/0x5b0
> [4.116891] [8131e019] bit_putcs+0x329/0x560
> [4.116893] [8131dc8f] ? bit_cursor+0x5cf/0x630
> [4.116896] [81317a28] fbcon_putcs+0xf8/0x130
> [4.116898] [8131dcf0] ? bit_cursor+0x630/0x630
> [4.116900] [8131a27e] fbcon_redraw+0x16e/0x1e0
> [4.116902] [8131a554] fbcon_scroll+0x264/0xe40
> [4.116905] [8138a263] scrup+0x113/0x120
> [4.116907] [8138a4d0] lf+0x80/0x90
> [4.116910] [8138e1dd] do_con_trol+0xcd/0x1360
> [4.116912] [8138f725] do_con_write+0x2b5/0xa10
> [4.116915] [81552bb7] ? __mutex_lock_slowpath+0x237/0x2e0
> [4.116917] [8138fed9] con_write+0x19/0x30
> [4.116920] [8137823b] do_output_char+0x1eb/0x220
> [4.116922] [813782b6] process_output+0x46/0x70
> [4.116924] [81378b0f] n_tty_write+0x13f/0x2f0
> [4.116928] [8107a900] ? try_to_wake_up+0x2b0/0x2b0
> [4.116930] [8137553c] tty_write+0x1cc/0x280
> [4.116932] [813789d0] ? n_tty_ioctl+0x110/0x110
> [4.116934] [8137569d] redirected_tty_write+0xad/0xc0
> [4.116937] [811733ab] vfs_write+0xcb/0x130
> [4.116939] [81173bac] sys_write+0x5c/0xa0
> [4.116943] [8155e4a9] system_call_fastpath+0x16/0x1b
> [4.116943] handlers:
> [4.116959] [a0048450] usb_hcd_irq [usbcore]
> [4.116960] Disabling IRQ #16
>
> I don't think I have seen this message on rc1+ (8343bce, to be precise),
> but I have definitely seen sluggish system response on that kernel as
> well.
>
> Attaching lspci, /proc/interrupts and dmesg.

Can you try to do a git bisect for this? Is the sluggish system response
clear enough that you can tell reliably when it is present and when it
isn't?

Alan Stern
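The trace above goes through note_interrupt() into __report_bad_irq(), which is where the kernel gives up on an interrupt line. As background, here is a simplified userspace model of that bookkeeping. The thresholds (a window of 100,000 interrupts, disabled when more than 99,900 of them went unhandled) are what I recall from kernel/irq/spurious.c of this era, so treat them as an assumption; the real code also has a time-based reset and polling fallback that this sketch leaves out. `struct irq_stats`, `note_interrupt_model` and `window_disables_irq` are hypothetical names.

```c
#include <stdbool.h>

#define IRQ_WINDOW          100000u /* interrupts per accounting window (assumed) */
#define UNHANDLED_THRESHOLD  99900u /* more than this many unhandled => stuck (assumed) */

struct irq_stats {
    unsigned int irq_count;      /* interrupts seen in the current window */
    unsigned int irqs_unhandled; /* of those, how many no handler claimed */
};

/*
 * Model of the per-interrupt accounting: returns true at the end of a
 * window in which almost every interrupt was unclaimed, which is the
 * point where the kernel prints "nobody cared" and disables the line.
 */
static bool note_interrupt_model(struct irq_stats *s, bool handled)
{
    if (!handled)
        s->irqs_unhandled++;

    s->irq_count++;
    if (s->irq_count < IRQ_WINDOW)
        return false;

    /* Window complete: evaluate it, then start a fresh one. */
    bool stuck = s->irqs_unhandled > UNHANDLED_THRESHOLD;
    s->irq_count = 0;
    s->irqs_unhandled = 0;
    return stuck;
}

/* Run one full window in which the first `unhandled` interrupts are unclaimed. */
static bool window_disables_irq(unsigned int unhandled)
{
    struct irq_stats s = { 0, 0 };
    bool disabled = false;

    for (unsigned int i = 0; i < IRQ_WINDOW; i++)
        disabled = note_interrupt_model(&s, i >= unhandled);
    return disabled;
}
```

Under these assumptions a shared line only gets disabled when some device keeps asserting it while none of the registered handlers (here only usb_hcd_irq) claims the interrupt, which fits the later finding that the culprit was a device other than the USB controller.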
Re: [3.9-rc1] irq 16: nobody cared (was [3.9-rc1] very poor interrupt responses)
On Thu, 14 Mar 2013, Alan Stern wrote:

> > [4.116847] irq 16: nobody cared (try booting with the irqpoll option)
> > [4.116849] Pid: 1, comm: systemd Not tainted 3.9.0-rc2-00188-g6c23cbb #186
> > [4.116850] Call Trace:
> > [4.116860] IRQ [810db0f8] __report_bad_irq+0x38/0xf0
> > [4.116862] [810db3a3] note_interrupt+0x1f3/0x240
> > [4.116865] [810d8977] handle_irq_event_percpu+0x147/0x230
> > [4.116867] [810d8aa9] handle_irq_event+0x49/0x70
> > [4.116869] [810dbbc1] handle_fasteoi_irq+0x61/0x100
> > [4.116873] [81004689] handle_irq+0x59/0x150
> > [4.116877] [8104e916] ? irq_enter+0x16/0x80
> > [4.116879] [81003d4b] do_IRQ+0x5b/0xe0
> > [4.116883] [815563ad] common_interrupt+0x6d/0x6d
> > [4.116888] EOI [81320dc1] ? cfb_imageblit+0x581/0x5b0
> > [4.116891] [8131e019] bit_putcs+0x329/0x560
> > [4.116893] [8131dc8f] ? bit_cursor+0x5cf/0x630
> > [4.116896] [81317a28] fbcon_putcs+0xf8/0x130
> > [4.116898] [8131dcf0] ? bit_cursor+0x630/0x630
> > [4.116900] [8131a27e] fbcon_redraw+0x16e/0x1e0
> > [4.116902] [8131a554] fbcon_scroll+0x264/0xe40
> > [4.116905] [8138a263] scrup+0x113/0x120
> > [4.116907] [8138a4d0] lf+0x80/0x90
> > [4.116910] [8138e1dd] do_con_trol+0xcd/0x1360
> > [4.116912] [8138f725] do_con_write+0x2b5/0xa10
> > [4.116915] [81552bb7] ? __mutex_lock_slowpath+0x237/0x2e0
> > [4.116917] [8138fed9] con_write+0x19/0x30
> > [4.116920] [8137823b] do_output_char+0x1eb/0x220
> > [4.116922] [813782b6] process_output+0x46/0x70
> > [4.116924] [81378b0f] n_tty_write+0x13f/0x2f0
> > [4.116928] [8107a900] ? try_to_wake_up+0x2b0/0x2b0
> > [4.116930] [8137553c] tty_write+0x1cc/0x280
> > [4.116932] [813789d0] ? n_tty_ioctl+0x110/0x110
> > [4.116934] [8137569d] redirected_tty_write+0xad/0xc0
> > [4.116937] [811733ab] vfs_write+0xcb/0x130
> > [4.116939] [81173bac] sys_write+0x5c/0xa0
> > [4.116943] [8155e4a9] system_call_fastpath+0x16/0x1b
> > [4.116943] handlers:
> > [4.116959] [a0048450] usb_hcd_irq [usbcore]
> > [4.116960] Disabling IRQ #16
> >
> > I don't think I have seen this message on rc1+ (8343bce, to be precise),
> > but I have definitely seen sluggish system response on that kernel as
> > well.
> >
> > Attaching lspci, /proc/interrupts and dmesg.
>
> Can you try to do a git bisect for this? Is the sluggish system response
> clear enough that you can tell reliably when it is present and when it
> isn't?

That was my first thought, but unfortunately I am afraid there will be a
point at which I will easily make a bisection mistake, as the
responsiveness of the system varies over time, so it's not really a 100%
objective measure.

--
Jiri Kosina
SUSE Labs
Re: [3.9-rc1] irq 16: nobody cared (was [3.9-rc1] very poor interrupt responses)
On Thu, 14 Mar 2013, Jiri Kosina wrote:

> On Thu, 14 Mar 2013, Alan Stern wrote:
>
> > > [4.116847] irq 16: nobody cared (try booting with the irqpoll option)
> > > [4.116849] Pid: 1, comm: systemd Not tainted 3.9.0-rc2-00188-g6c23cbb #186
> > > [4.116850] Call Trace:
> > > [4.116860] IRQ [810db0f8] __report_bad_irq+0x38/0xf0
> > > [4.116862] [810db3a3] note_interrupt+0x1f3/0x240
> > > [4.116865] [810d8977] handle_irq_event_percpu+0x147/0x230
> > > [4.116867] [810d8aa9] handle_irq_event+0x49/0x70
> > > [4.116869] [810dbbc1] handle_fasteoi_irq+0x61/0x100
> > > [4.116873] [81004689] handle_irq+0x59/0x150
> > > [4.116877] [8104e916] ? irq_enter+0x16/0x80
> > > [4.116879] [81003d4b] do_IRQ+0x5b/0xe0
> > > [4.116883] [815563ad] common_interrupt+0x6d/0x6d
> > > [4.116888] EOI [81320dc1] ? cfb_imageblit+0x581/0x5b0
> > > [4.116891] [8131e019] bit_putcs+0x329/0x560
> > > [4.116893] [8131dc8f] ? bit_cursor+0x5cf/0x630
> > > [4.116896] [81317a28] fbcon_putcs+0xf8/0x130
> > > [4.116898] [8131dcf0] ? bit_cursor+0x630/0x630
> > > [4.116900] [8131a27e] fbcon_redraw+0x16e/0x1e0
> > > [4.116902] [8131a554] fbcon_scroll+0x264/0xe40
> > > [4.116905] [8138a263] scrup+0x113/0x120
> > > [4.116907] [8138a4d0] lf+0x80/0x90
> > > [4.116910] [8138e1dd] do_con_trol+0xcd/0x1360
> > > [4.116912] [8138f725] do_con_write+0x2b5/0xa10
> > > [4.116915] [81552bb7] ? __mutex_lock_slowpath+0x237/0x2e0
> > > [4.116917] [8138fed9] con_write+0x19/0x30
> > > [4.116920] [8137823b] do_output_char+0x1eb/0x220
> > > [4.116922] [813782b6] process_output+0x46/0x70
> > > [4.116924] [81378b0f] n_tty_write+0x13f/0x2f0
> > > [4.116928] [8107a900] ? try_to_wake_up+0x2b0/0x2b0
> > > [4.116930] [8137553c] tty_write+0x1cc/0x280
> > > [4.116932] [813789d0] ? n_tty_ioctl+0x110/0x110
> > > [4.116934] [8137569d] redirected_tty_write+0xad/0xc0
> > > [4.116937] [811733ab] vfs_write+0xcb/0x130
> > > [4.116939] [81173bac] sys_write+0x5c/0xa0
> > > [4.116943] [8155e4a9] system_call_fastpath+0x16/0x1b
> > > [4.116943] handlers:
> > > [4.116959] [a0048450] usb_hcd_irq [usbcore]
> > > [4.116960] Disabling IRQ #16
> > >
> > > I don't think I have seen this message on rc1+ (8343bce, to be
> > > precise), but I have definitely seen sluggish system response on that
> > > kernel as well.
> > >
> > > Attaching lspci, /proc/interrupts and dmesg.
> >
> > Can you try to do a git bisect for this? Is the sluggish system response
> > clear enough that you can tell reliably when it is present and when it
> > isn't?
>
> That was my first thought, but unfortunately I am afraid there will be a
> point at which I will easily make a bisection mistake, as the
> responsiveness of the system varies over time, so it's not really a 100%
> objective measure.

All right. There have been only three significant changes to uhci-hcd
since last summer, and two of them appear to be completely unrelated to
this issue. The three commits are

	3171fcabb169 USB: uhci: beautify source code
	13996ca7afd5 USB: uhci: check buffer length to avoid memory overflow
	0f815a0a700b USB: UHCI: fix IRQ race during initialization

Reverting the first two almost certainly will not have any effect, but you
may as well try it anyway. The third commit may be relevant.

If you revert all three and still see the problem then it must be caused
by changes outside of the USB stack. Differences in interrupt routing
could be a result of changes to PCI or ACPI. Have you compared the current
/proc/interrupts with versions from earlier kernels without this problem?

Is occurrence of the "nobody cared" connected with any particular device?
Somebody reported a similar problem not long ago (although IIRC it was for
OHCI rather than UHCI) which appeared to be related to activity on the
built-in webcam.

Alan Stern
Re: [3.9-rc1] irq 16: nobody cared (was [3.9-rc1] very poor interrupt responses)
On Thu, 14 Mar 2013, Alan Stern wrote:

> > > Can you try to do a git bisect for this? Is the sluggish system
> > > response clear enough that you can tell reliably when it is present
> > > and when it isn't?
> >
> > That was my first thought, but unfortunately I am afraid there will be a
> > point at which I will easily make a bisection mistake, as the
> > responsiveness of the system varies over time, so it's not really a 100%
> > objective measure.
>
> All right. There have been only three significant changes to uhci-hcd
> since last summer, and two of them appear to be completely unrelated to
> this issue. The three commits are
>
> 	3171fcabb169 USB: uhci: beautify source code
> 	13996ca7afd5 USB: uhci: check buffer length to avoid memory overflow
> 	0f815a0a700b USB: UHCI: fix IRQ race during initialization
>
> Reverting the first two almost certainly will not have any effect, but you
> may as well try it anyway. The third commit may be relevant.

I have reverted all three commits, and the "nobody cared" is still there.

> If you revert all three and still see the problem then it must be caused
> by changes outside of the USB stack. Differences in interrupt routing
> could be a result of changes to PCI or ACPI. Have you compared the
> current /proc/interrupts with versions from earlier kernels without this
> problem?

The diff of stripped-down (without CPU statistics) /proc/interrupts from
some oldish working 3.1 and the current tree:

--- /tmp/interrupts-old.txt	2013-03-14 16:30:46.938710286 +0100
+++ /tmp/interrupts-new.txt	2013-03-14 16:30:18.954571413 +0100
@@ -3,27 +3,28 @@
   8:IO-APIC-edge rtc0
   9:IO-APIC-fasteoi acpi
  12:IO-APIC-edge i8042
- 16:IO-APIC-fasteoi uhci_hcd:usb6
- 17:IO-APIC-fasteoi uhci_hcd:usb7
- 18:IO-APIC-fasteoi ata_generic, uhci_hcd:usb8
- 19:IO-APIC-fasteoi ehci_hcd:usb2
- 20:IO-APIC-fasteoi uhci_hcd:usb3
- 21:IO-APIC-fasteoi uhci_hcd:usb4
- 22:IO-APIC-fasteoi uhci_hcd:usb5
- 23:IO-APIC-fasteoi ehci_hcd:usb1
+ 16:IO-APIC-fasteoi uhci_hcd:usb4
+ 17:IO-APIC-fasteoi uhci_hcd:usb5
+ 18:IO-APIC-fasteoi ata_generic, uhci_hcd:usb6
+ 19:IO-APIC-fasteoi ehci_hcd:usb8
+ 20:IO-APIC-fasteoi uhci_hcd:usb1
+ 21:IO-APIC-fasteoi uhci_hcd:usb2
+ 22:IO-APIC-fasteoi uhci_hcd:usb3
+ 23:IO-APIC-fasteoi ehci_hcd:usb7, i801_smbus
  40:PCI-MSI-edge PCIe PME
  41:PCI-MSI-edge PCIe PME
  42:PCI-MSI-edge PCIe PME
  43:PCI-MSI-edge ahci
  44:PCI-MSI-edge i915
  45:PCI-MSI-edge eth0
- 46:PCI-MSI-edge iwlagn
+ 46:PCI-MSI-edge iwlwifi
  47:PCI-MSI-edge snd_hda_intel
 NMI:Non-maskable interrupts
 LOC:Local timer interrupts
 SPU:Spurious interrupts
 PMI:Performance monitoring interrupts
 IWI:IRQ work interrupts
+RTR:APIC ICR read retries
 RES:Rescheduling interrupts
 CAL:Function call interrupts
 TLB:TLB shootdowns

IRQ16 is routed differently (usb4 vs usb6), so that might be relevant.

> Is occurrence of the "nobody cared" connected with any particular device?
> Somebody reported a similar problem not long ago (although IIRC it was
> for OHCI rather than UHCI) which appeared to be related to activity on
> the built-in webcam.

Will check this. No external devices are plugged in; I think the only
internal one it has is the bluetooth chip. I'll try turning it off.

--
Jiri Kosina
SUSE Labs
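The message above compares /proc/interrupts from two kernels after removing the per-CPU counters, so that only the routing differences show up in the diff. The thread does not say how the stripping was done; the following is a minimal sketch of one way to do it, with a hypothetical helper `strip_cpu_counts` that drops the numeric columns following the "NN:" label of a single line.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: copy one /proc/interrupts line into `out`,
 * removing the per-CPU counter columns that follow the "NN:" label.
 * Lines without a colon (such as the CPU0/CPU1 header) are copied as-is.
 */
static void strip_cpu_counts(const char *line, char *out, size_t outlen)
{
    const char *colon = strchr(line, ':');
    if (colon == NULL) {
        snprintf(out, outlen, "%s", line);
        return;
    }

    const char *p = colon + 1;
    for (;;) {
        const char *q = p;
        while (*q == ' ' || *q == '\t')
            q++;
        const char *digits = q;
        while (isdigit((unsigned char)*q))
            q++;
        /* A counter column is a run of digits ended by whitespace or end of line. */
        if (q == digits ||
            (*q != ' ' && *q != '\t' && *q != '\n' && *q != '\0'))
            break;
        p = q;
    }
    while (*p == ' ' || *p == '\t')
        p++;

    /* Emit the label including the colon, then whatever follows the counters. */
    snprintf(out, outlen, "%.*s%s", (int)(colon - line + 1), line, p);
}
```

Applying this to every line of both files before running diff gives output of the shape shown in the message, so that changing counters do not drown out the routing changes.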
Re: [3.9-rc1] irq 16: nobody cared (was [3.9-rc1] very poor interrupt responses)
On Thu, 14 Mar 2013, Jiri Kosina wrote:

> > Is occurrence of the "nobody cared" connected with any particular
> > device? Somebody reported a similar problem not long ago (although IIRC
> > it was for OHCI rather than UHCI) which appeared to be related to
> > activity on the built-in webcam.
>
> Will check this. No external devices are plugged in; I think the only
> internal one it has is the bluetooth chip. I'll try turning it off.

That didn't help (I disabled it via hard rfkill and it vanished from
lsusb), i.e. it happens even with only the hubs being there.

--
Jiri Kosina
SUSE Labs
Re: [3.9-rc1] irq 16: nobody cared (was [3.9-rc1] very poor interrupt responses)
On Thu, 14 Mar 2013, Jiri Kosina wrote:

> > > I don't think I have seen this message on rc1+ (8343bce, to be
> > > precise), but I have definitely seen sluggish system response on that
> > > kernel as well.
> > >
> > > Attaching lspci, /proc/interrupts and dmesg.
> >
> > Can you try to do a git bisect for this? Is the sluggish system response
> > clear enough that you can tell reliably when it is present and when it
> > isn't?
>
> That was my first thought, but unfortunately I am afraid there will be a
> point at which I will easily make a bisection mistake, as the
> responsiveness of the system varies over time, so it's not really a 100%
> objective measure.

So I will try a bisect, but it'll take some time before I can claim it to
be trustworthy.

Therefore in case anyone has any idea in parallel, I am all ears.

--
Jiri Kosina
SUSE Labs
Re: [3.9-rc1] irq 16: nobody cared (was [3.9-rc1] very poor interrupt responses)
On Thu, 14 Mar 2013, Jiri Kosina wrote:

> I have reverted all three commits, and the "nobody cared" is still there.
>
> > If you revert all three and still see the problem then it must be caused
> > by changes outside of the USB stack. Differences in interrupt routing
> > could be a result of changes to PCI or ACPI. Have you compared the
> > current /proc/interrupts with versions from earlier kernels without
> > this problem?
>
> The diff of stripped-down (without CPU statistics) /proc/interrupts from
> some oldish working 3.1 and the current tree:
>
> --- /tmp/interrupts-old.txt	2013-03-14 16:30:46.938710286 +0100
> +++ /tmp/interrupts-new.txt	2013-03-14 16:30:18.954571413 +0100
> @@ -3,27 +3,28 @@
>    8:IO-APIC-edge rtc0
>    9:IO-APIC-fasteoi acpi
>   12:IO-APIC-edge i8042
> - 16:IO-APIC-fasteoi uhci_hcd:usb6
> - 17:IO-APIC-fasteoi uhci_hcd:usb7
> - 18:IO-APIC-fasteoi ata_generic, uhci_hcd:usb8
> - 19:IO-APIC-fasteoi ehci_hcd:usb2
> - 20:IO-APIC-fasteoi uhci_hcd:usb3
> - 21:IO-APIC-fasteoi uhci_hcd:usb4
> - 22:IO-APIC-fasteoi uhci_hcd:usb5
> - 23:IO-APIC-fasteoi ehci_hcd:usb1
> + 16:IO-APIC-fasteoi uhci_hcd:usb4
> + 17:IO-APIC-fasteoi uhci_hcd:usb5
> + 18:IO-APIC-fasteoi ata_generic, uhci_hcd:usb6
> + 19:IO-APIC-fasteoi ehci_hcd:usb8
> + 20:IO-APIC-fasteoi uhci_hcd:usb1
> + 21:IO-APIC-fasteoi uhci_hcd:usb2
> + 22:IO-APIC-fasteoi uhci_hcd:usb3
> + 23:IO-APIC-fasteoi ehci_hcd:usb7, i801_smbus
>   40:PCI-MSI-edge PCIe PME
>   41:PCI-MSI-edge PCIe PME
>   42:PCI-MSI-edge PCIe PME
>   43:PCI-MSI-edge ahci
>   44:PCI-MSI-edge i915
>   45:PCI-MSI-edge eth0
> - 46:PCI-MSI-edge iwlagn
> + 46:PCI-MSI-edge iwlwifi
>   47:PCI-MSI-edge snd_hda_intel
>  NMI:Non-maskable interrupts
>  LOC:Local timer interrupts
>  SPU:Spurious interrupts
>  PMI:Performance monitoring interrupts
>  IWI:IRQ work interrupts
> +RTR:APIC ICR read retries
>  RES:Rescheduling interrupts
>  CAL:Function call interrupts
>  TLB:TLB shootdowns
>
> IRQ16 is routed differently (usb4 vs usb6), so that might be relevant.

It looks like the order of probing changed. The old kernel did ehci-hcd
before uhci-hcd and the new kernel did them in the opposite order.
Consequently usb3-usb8 in the old kernel (the UHCI devices) are the same
as usb1-usb6 in the new kernel. Likewise, usb1-usb2 in the old kernel are
usb7-usb8 in the new kernel.

In fact, the only major difference appears to be i801_smbus on IRQ 23.
It's hard to see how that could have any effect.

> > Is occurrence of the "nobody cared" connected with any particular
> > device? Somebody reported a similar problem not long ago (although IIRC
> > it was for OHCI rather than UHCI) which appeared to be related to
> > activity on the built-in webcam.
>
> Will check this. No external devices are plugged in; I think the only
> internal one it has is the bluetooth chip. I'll try turning it off.

All right. One other thing you could try: Transplant the entire uhci-hcd
driver from 3.1 (or whatever) into 3.9-rc1. It should go okay -- you may
have to apply by hand the appropriate parts of commits bc677d5b6464,
90ab5ee94171, and 9ffc93f203c1.

Alan Stern
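The renumbering argument above can be checked mechanically against the diff. The sketch below encodes the mapping as stated (old usb1-usb2 become usb7-usb8, old usb3-usb8 become usb1-usb6) and compares it with the bus numbers listed for IRQs 16-23 in the two /proc/interrupts versions. The function and array names are hypothetical, and the fixed count of two EHCI plus six UHCI buses is specific to this machine.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Map a USB bus number from the old probe order (ehci-hcd first) to the
 * new one (uhci-hcd first). Assumes 2 EHCI and 6 UHCI buses, as on the
 * machine in this thread. Returns -1 for numbers outside 1..8.
 */
static int old_to_new_usb_bus(int old_bus)
{
    if (old_bus < 1 || old_bus > 8)
        return -1;
    return old_bus <= 2 ? old_bus + 6 : old_bus - 2;
}

/* Bus numbers listed on IRQ 16..23 in the old and new /proc/interrupts. */
static const int old_bus_on_irq[] = { 6, 7, 8, 2, 3, 4, 5, 1 };
static const int new_bus_on_irq[] = { 4, 5, 6, 8, 1, 2, 3, 7 };

/* True if every IRQ line carries the same controller under both numberings. */
static bool mapping_matches_diff(void)
{
    for (size_t i = 0; i < sizeof old_bus_on_irq / sizeof old_bus_on_irq[0]; i++) {
        if (old_to_new_usb_bus(old_bus_on_irq[i]) != new_bus_on_irq[i])
            return false;
    }
    return true;
}
```

With the values taken from the diff, `mapping_matches_diff()` returns true, which supports the reading that the routing of the USB controllers did not change between the two kernels, only their numbering.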
Re: [3.9-rc1] irq 16: nobody cared (was [3.9-rc1] very poor interrupt responses)
On Thu, 14 Mar 2013, Jiri Kosina wrote:

> > There have been only three significant changes to uhci-hcd since last
> > summer, and two of them appear to be completely unrelated to this
> > issue. The three commits are
> >
> > 	3171fcabb169 USB: uhci: beautify source code
> > 	13996ca7afd5 USB: uhci: check buffer length to avoid memory overflow
> > 	0f815a0a700b USB: UHCI: fix IRQ race during initialization
> >
> > Reverting the first two almost certainly will not have any effect, but
> > you may as well try it anyway. The third commit may be relevant.
>
> I have reverted all three commits, and the "nobody cared" is still there.

There's one other commit I failed to find at first: 840008bb5162 (USB:
UHCI: notify usbcore about port resumes). Probably not relevant, but you
should check to make sure.

Alan Stern
Re: [3.9-rc1] irq 16: nobody cared (was [3.9-rc1] very poor interrupt responses)
On Thursday, March 14, 2013 05:09:59 PM Jiri Kosina wrote: On Thu, 14 Mar 2013, Jiri Kosina wrote: I don't think I have seen this message on rc1+ (8343bce, to be precise), but I have definitely seen sluggish system response on that kernel as well. Attaching lspci, /proc/interrupts and dmesg. Can you try to do a git bisect for this? Is the sluggish system response clear enough that you can tell reliably when it is present and when it isn't? That was my first thought, but unfortunately I am afraid there will be a point at which I will easily make a bisection mistake, as the responsiveness of the system varies over time, so it's not really a 100% objective measure. So I will try a bisect, but it'll take some time so that I could claim it to be trustworthy. Therefore in case anyone has any idea in parallel, I am all ears. This one is a candidate to focus on I think: commit 181380b702eee1a9aca51354d7b87c7b08541fcf Author: Yinghai Lu ying...@kernel.org Date: Sat Feb 16 11:58:34 2013 -0700 PCI/ACPI: Don't cache _PRT, and don't associate them with bus numbers Thanks, Rafael -- I speak only for myself. Rafael J. Wysocki, Intel Open Source Technology Center.
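Since perceived responsiveness is hard to judge reliably during a bisect, one possible way to get a more objective good/bad signal is to compare how fast the counter for IRQ 16 grows in /proc/interrupts. The helper names below (`irq_total`, `irq_delta`) are made up for this sketch and are not part of any kernel tooling; whether the interrupt rate actually tracks the sluggishness on a given machine is an assumption that would need checking against a known-good and a known-bad kernel first.

```shell
# irq_total IRQ [FILE]
# Print the sum of the per-CPU counters for one IRQ line.
# FILE defaults to /proc/interrupts; a saved copy can be passed instead.
irq_total() {
    awk -v want="$1:" '
        $1 == want {
            total = 0
            # The per-CPU counters are the numeric fields right after "N:".
            for (i = 2; i <= NF && $i ~ /^[0-9]+$/; i++)
                total += $i
            print total
        }
    ' "${2:-/proc/interrupts}"
}

# irq_delta IRQ SECONDS
# Print how many interrupts arrived on IRQ during SECONDS of wall time.
irq_delta() {
    before=$(irq_total "$1")
    sleep "$2"
    after=$(irq_total "$1")
    # Treat a missing IRQ line as zero rather than failing.
    echo $(( ${after:-0} - ${before:-0} ))
}
```

For example, `irq_delta 16 10` run on an otherwise idle console after each bisect step gives a number that can be compared between kernels before deciding on `git bisect good` or `git bisect bad`.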
Re: [3.9-rc1] irq 16: nobody cared (was [3.9-rc1] very poor interrupt responses)
On Thu, 2013-03-14 at 17:09 +0100, Jiri Kosina wrote: On Thu, 14 Mar 2013, Jiri Kosina wrote: I don't think I have seen this message on rc1+ (8343bce, to be precise), but I have definitely seen sluggish system response on that kernel as well. Attaching lspci, /proc/interrupts and dmesg. Can you try to do a git bisect for this? Is the sluggish system response clear enough that you can tell reliably when it is present and when it isn't? That was my first thought, but unfortunately I am afraid there will be a point at which I will easily make a bisection mistake, as the responsiveness of the system varies over time, so it's not really a 100% objective measure. So I will try a bisect, but it'll take some time so that I could claim it to be trustworthy. Therefore in case anyone has any idea in parallel, I am all ears. When I had this happen on -next, it was PCI + ACPI-related and I had to bisect it. But for me the problem was quite noticeable and showed up right at the login prompt. Regards, Peter Hurley PS - I already confirmed that the commit that fixes that is in 3.9-rc1
Re: [3.9-rc1] irq 16: nobody cared (was [3.9-rc1] very poor interrupt responses)
On Thu, 2013-03-14 at 17:46 +0100, Rafael J. Wysocki wrote: On Thursday, March 14, 2013 05:09:59 PM Jiri Kosina wrote: On Thu, 14 Mar 2013, Jiri Kosina wrote: I don't think I have seen this message on rc1+ (8343bce, to be precise), but I have definitely seen sluggish system response on that kernel as well. Attaching lspci, /proc/interrupts and dmesg. Can you try to do a git bisect for this? Is the sluggish system response clear enough that you can tell reliably when it is present and when it isn't? That was my first thought, but unfortunately I am afraid there will be a point at which I will easily make a bisection mistake, as the responsiveness of the system varies over time, so it's not really a 100% objective measure. So I will try a bisect, but it'll take some time so that I could claim it to be trustworthy. Therefore in case anyone has any idea in parallel, I am all ears. This one is a candidate to focus on I think: commit 181380b702eee1a9aca51354d7b87c7b08541fcf Author: Yinghai Lu ying...@kernel.org Date: Sat Feb 16 11:58:34 2013 -0700 PCI/ACPI: Don't cache _PRT, and don't associate them with bus numbers This patch __fixed__ this problem for me in linux-next back in February. Rafael, did you hold back some ACPI patches from 3.9 that would have made the fix no longer applicable?
Re: [3.9-rc1] irq 16: nobody cared (was [3.9-rc1] very poor interrupt responses)
On Thursday, March 14, 2013 01:06:04 PM Peter Hurley wrote: On Thu, 2013-03-14 at 17:46 +0100, Rafael J. Wysocki wrote: On Thursday, March 14, 2013 05:09:59 PM Jiri Kosina wrote: On Thu, 14 Mar 2013, Jiri Kosina wrote: I don't think I have seen this message on rc1+ (8343bce, to be precise), but I have definitely seen sluggish system response on that kernel as well. Attaching lspci, /proc/interrupts and dmesg. Can you try to do a git bisect for this? Is the sluggish system response clear enough that you can tell reliably when it is present and when it isn't? That was my first thought, but unfortunately I am afraid there will be a point at which I will easily make a bisection mistake, as the responsiveness of the system varies over time, so it's not really a 100% objective measure. So I will try a bisect, but it'll take some time so that I could claim it to be trustworthy. Therefore in case anyone has any idea in parallel, I am all ears. This one is a candidate to focus on I think: commit 181380b702eee1a9aca51354d7b87c7b08541fcf Author: Yinghai Lu ying...@kernel.org Date: Sat Feb 16 11:58:34 2013 -0700 PCI/ACPI: Don't cache _PRT, and don't associate them with bus numbers This patch __fixed__ this problem for me in linux-next back in February. Rafael, did you hold back some ACPI patches from 3.9 that would have made the fix no longer applicable? No, I didn't. I'm afraid, though, that the fix might not be effective on some systems for a reason that's unclear at the moment. So in fact the one to check is commit 4f535093cf (PCI: Put pci_dev in device tree as early as possible) and if the problem doesn't appear before that, we need to figure out why the fix may not be sufficient. Thanks, Rafael -- I speak only for myself. Rafael J. Wysocki, Intel Open Source Technology Center.
Re: [3.9-rc1] irq 16: nobody cared (was [3.9-rc1] very poor interrupt responses)
On Thu, 2013-03-14 at 18:22 +0100, Rafael J. Wysocki wrote: On Thursday, March 14, 2013 01:06:04 PM Peter Hurley wrote: On Thu, 2013-03-14 at 17:46 +0100, Rafael J. Wysocki wrote: On Thursday, March 14, 2013 05:09:59 PM Jiri Kosina wrote: On Thu, 14 Mar 2013, Jiri Kosina wrote: I don't think I have seen this message on rc1+ (8343bce, to be precise), but I have definitely seen sluggish system response on that kernel as well. Attaching lspci, /proc/interrupts and dmesg. Can you try to do a git bisect for this? Is the sluggish system response clear enough that you can tell reliably when it is present and when it isn't? That was my first thought, but unfortunately I am afraid there will be a point at which I will easily make a bisection mistake, as the responsiveness of the system varies over time, so it's not really a 100% objective measure. So I will try a bisect, but it'll take some time so that I could claim it to be trustworthy. Therefore in case anyone has any idea in parallel, I am all ears. This one is a candidate to focus on I think: commit 181380b702eee1a9aca51354d7b87c7b08541fcf Author: Yinghai Lu ying...@kernel.org Date: Sat Feb 16 11:58:34 2013 -0700 PCI/ACPI: Don't cache _PRT, and don't associate them with bus numbers This patch __fixed__ this problem for me in linux-next back in February. Rafael, did you hold back some ACPI patches from 3.9 that would have made the fix no longer applicable? No, I didn't. I'm afraid, though, that the fix might not be effective on some systems for a reason that's unclear at the moment. So in fact the one to check is commit 4f535093cf (PCI: Put pci_dev in device tree as early as possible) and if the problem doesn't appear before that, we need to figure out why the fix may not be sufficient. I agree. Commit 4f535093cf (PCI: Put pci_dev in device tree as early as possible) is the likely culprit, and "Don't cache _PRT..." is probably an insufficient fix.
Not so sure about the other reporters though because they had active devices on those USB ports. Regards, Peter Hurley
Re: [3.9-rc1] irq 16: nobody cared (was [3.9-rc1] very poor interrupt responses)
On Wed, Mar 13, 2013 at 2:35 PM, Jiri Kosina jkos...@suse.cz wrote: OK, this is a "me too", on Thinkpad x200s.
[4.116847] irq 16: nobody cared (try booting with the irqpoll option)
[4.116849] Pid: 1, comm: systemd Not tainted 3.9.0-rc2-00188-g6c23cbb #186
[4.116850] Call Trace:
[4.116860] IRQ [810db0f8] __report_bad_irq+0x38/0xf0
[4.116862] [810db3a3] note_interrupt+0x1f3/0x240
[4.116865] [810d8977] handle_irq_event_percpu+0x147/0x230
[4.116867] [810d8aa9] handle_irq_event+0x49/0x70
[4.116869] [810dbbc1] handle_fasteoi_irq+0x61/0x100
[4.116873] [81004689] handle_irq+0x59/0x150
[4.116877] [8104e916] ? irq_enter+0x16/0x80
[4.116879] [81003d4b] do_IRQ+0x5b/0xe0
[4.116883] [815563ad] common_interrupt+0x6d/0x6d
[4.116888] EOI [81320dc1] ? cfb_imageblit+0x581/0x5b0
[4.116891] [8131e019] bit_putcs+0x329/0x560
[4.116893] [8131dc8f] ? bit_cursor+0x5cf/0x630
[4.116896] [81317a28] fbcon_putcs+0xf8/0x130
[4.116898] [8131dcf0] ? bit_cursor+0x630/0x630
[4.116900] [8131a27e] fbcon_redraw+0x16e/0x1e0
[4.116902] [8131a554] fbcon_scroll+0x264/0xe40
[4.116905] [8138a263] scrup+0x113/0x120
[4.116907] [8138a4d0] lf+0x80/0x90
[4.116910] [8138e1dd] do_con_trol+0xcd/0x1360
[4.116912] [8138f725] do_con_write+0x2b5/0xa10
[4.116915] [81552bb7] ? __mutex_lock_slowpath+0x237/0x2e0
[4.116917] [8138fed9] con_write+0x19/0x30
[4.116920] [8137823b] do_output_char+0x1eb/0x220
[4.116922] [813782b6] process_output+0x46/0x70
[4.116924] [81378b0f] n_tty_write+0x13f/0x2f0
[4.116928] [8107a900] ? try_to_wake_up+0x2b0/0x2b0
[4.116930] [8137553c] tty_write+0x1cc/0x280
[4.116932] [813789d0] ? n_tty_ioctl+0x110/0x110
[4.116934] [8137569d] redirected_tty_write+0xad/0xc0
[4.116937] [811733ab] vfs_write+0xcb/0x130
[4.116939] [81173bac] sys_write+0x5c/0xa0
[4.116943] [8155e4a9] system_call_fastpath+0x16/0x1b
[4.116943] handlers:
[4.116959] [a0048450] usb_hcd_irq [usbcore]
[4.116960] Disabling IRQ #16
I don't think I have seen this message on rc1+ (8343bce, to be precise), but I have definitely seen sluggish system response on that kernel as well. Attaching lspci, /proc/interrupts and dmesg. Can you post dmesg with debug ignore_loglevel pci=routeirq with the current Linus tree and v3.8 or a previous working kernel? Thanks Yinghai
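When comparing the requested dmesg captures from a working and a failing kernel, it may help to pull out just the IRQ numbers the kernel complained about. The following is a minimal sketch that matches the "irq N: nobody cared" and "Disabling IRQ #N" lines as they appear in the report above; the function names are hypothetical, and the patterns assume the message wording is unchanged in the kernels being compared.

```shell
# unhandled_irqs [FILE...]
# List IRQ numbers reported as "nobody cared", one per line.
# Reads the given files, or standard input when none are given.
unhandled_irqs() {
    sed -n 's/.*irq \([0-9][0-9]*\): nobody cared.*/\1/p' "$@" | sort -un
}

# disabled_irqs [FILE...]
# List IRQ numbers the kernel gave up on ("Disabling IRQ #N").
disabled_irqs() {
    sed -n 's/.*Disabling IRQ #\([0-9][0-9]*\).*/\1/p' "$@" | sort -un
}
```

Typical use would be `dmesg | unhandled_irqs` on the running kernel, or `unhandled_irqs dmesg-good.txt dmesg-bad.txt` on saved logs (the file names here are placeholders).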
Re: [3.9-rc1] irq 16: nobody cared (was [3.9-rc1] very poor interrupt responses)
On Friday, 08.03.2013, 21:19 -0500, Alan Stern wrote: On Fri, 8 Mar 2013, Peter Hurley wrote: [ +linux-usb ] On Fri, 2013-03-08 at 14:12 -0500, Shawn Starr wrote: Hello folks, I am noticing since rc0 and now rc1, very poor interrupt handling. Keyboard response, mouse movements, display refreshing etc. General input/display sluggishness. Did something break IRQ handling somewhere? I need to validate whether this happens with X not running, and also whether it is i915 related somehow. The behavior is noticed in a console login, however. Device: Lenovo W500 laptop Hi Shawn, Unhandled interrupts are the problem. Is the device below being id'd properly? If you remove this device, does the problem go away? Does either of the kernels in question have commit 0f815a0a700b (USB: UHCI: fix IRQ race during initialization)? That commit was added to fix precisely this sort of thing. I think so:
$ git describe
v3.9-rc1-211-g47b3bc9
$ git branch --contains 0f815a0a700b
* master
Alan Stern
Re: [3.9-rc1] irq 16: nobody cared (was [3.9-rc1] very poor interrupt responses)
[ +linux-pci, +linux-acpi, +Rafael Wysocki, +Bjorn Helgaas ] On Sat, 2013-03-09 at 09:53 +0100, Thomas Meyer wrote: On Friday, 08.03.2013, 21:19 -0500, Alan Stern wrote: On Fri, 8 Mar 2013, Peter Hurley wrote: [ +linux-usb ] On Fri, 2013-03-08 at 14:12 -0500, Shawn Starr wrote: Hello folks, I am noticing since rc0 and now rc1, very poor interrupt handling. Keyboard response, mouse movements, display refreshing etc. General input/display sluggishness. Did something break IRQ handling somewhere? I need to validate whether this happens with X not running, and also whether it is i915 related somehow. The behavior is noticed in a console login, however. Device: Lenovo W500 laptop Hi Shawn, Unhandled interrupts are the problem. Is the device below being id'd properly? If you remove this device, does the problem go away? Does either of the kernels in question have commit 0f815a0a700b (USB: UHCI: fix IRQ race during initialization)? That commit was added to fix precisely this sort of thing. I think so:
$ git describe
v3.9-rc1-211-g47b3bc9
$ git branch --contains 0f815a0a700b
* master
This might not be caused by USB. There were a lot of changes to PCI and ACPI for 3.9. Probably best for each of you to file a bug at bugzilla.kernel.org with:
Last known good kernel version
-- For both good and bad kernels (preferably as attachments) --
/proc/interrupts
lsusb
lspci
dmesg
and reply back with the bugzilla #. It may be necessary to bisect this problem. Regards, Peter Hurley PS - I know it can be difficult to get those things on the bad kernel. It's easier if you boot to console.
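The items requested for the bug report can be gathered in one step on each kernel. The following is only a sketch: `collect_irq_report` and the output file names are invented for illustration, and it assumes the script is run with enough privilege for dmesg to be readable. A missing tool or unreadable file is noted in the corresponding output file instead of aborting the run.

```shell
# collect_irq_report DIR
# Save the running kernel version, /proc/interrupts, lsusb, lspci and dmesg
# into DIR, one file per item, ready to attach to a bug report.
collect_irq_report() {
    dir=$1
    mkdir -p "$dir" || return 1
    uname -r > "$dir/kernel-version.txt"
    cat /proc/interrupts > "$dir/interrupts.txt" 2>&1 ||
        echo "/proc/interrupts: not readable" >> "$dir/interrupts.txt"
    for tool in lsusb lspci dmesg; do
        if command -v "$tool" >/dev/null 2>&1; then
            "$tool" > "$dir/$tool.txt" 2>&1 ||
                echo "$tool: exited with an error" >> "$dir/$tool.txt"
        else
            echo "$tool: not installed" > "$dir/$tool.txt"
        fi
    done
    return 0
}
```

Running it once under the good kernel and once under the bad one, for example `collect_irq_report "report-$(uname -r)"`, yields two directories that can be compared with `diff -ru` before attaching them.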
Re: [3.9-rc1] irq 16: nobody cared (was [3.9-rc1] very poor interrupt responses)
On Fri, 8 Mar 2013, Peter Hurley wrote: [ +linux-usb ] On Fri, 2013-03-08 at 14:12 -0500, Shawn Starr wrote: Hello folks, I am noticing since rc0 and now rc1, very poor interrupt handling. Keyboard response, mouse movements, display refreshing etc. General input/display sluggishness. Did something break IRQ handling somewhere? I need to validate whether this happens with X not running, and also whether it is i915 related somehow. The behavior is noticed in a console login, however. Device: Lenovo W500 laptop Hi Shawn, Unhandled interrupts are the problem. Is the device below being id'd properly? If you remove this device, does the problem go away? Does either of the kernels in question have commit 0f815a0a700b (USB: UHCI: fix IRQ race during initialization)? That commit was added to fix precisely this sort of thing. Alan Stern