Re: PROBLEM: 4.15.0-rc3 APIC causes lockups on Core 2 Duo laptop
On Fri, Dec 29, 2017 at 02:09:40PM +0100, Thomas Gleixner wrote: > On Fri, 29 Dec 2017, Alexandru Chirvasitu wrote: > > All right, I tried to do some more digging around, in the hope of > > getting as close to the source of the problem as I can. > > > > I went back to the very first commit that went astray for me, 2db1f95 > > (which is the only one actually panicking), and tried to move from its > > parent 90ad9e2 (that boots fine) to it gradually, altering the code in > > small chunks. > > > > I tried to ignore the stuff that clearly shouldn't make a difference, > > such as definitions. So in the end I get defined-but-unused-function > > errors in my compilations, but I'm ignoring those for now. Some > > results: > > > > (1) When I move from the good commit 90ad9e2 according to the attached > > bad-diff (which moves partly towards 2db1f95), I get a panic. > > > > (2) On the other hand, when I further change this last panicking > > commit by simply doing > > > > > > > > removed activate / deactivate from x86_vector_domain_ops > > > > diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c > > index 7317ba5a..063594d 100644 > > --- a/arch/x86/kernel/apic/vector.c > > +++ b/arch/x86/kernel/apic/vector.c > > @@ -514,8 +514,6 @@ void x86_vector_debug_show(struct seq_file *m, struct > > irq_domain *d, > > static const struct irq_domain_ops x86_vector_domain_ops = { > > .alloc = x86_vector_alloc_irqs, > > .free = x86_vector_free_irqs, > > - .activate = x86_vector_activate, > > - .deactivate = x86_vector_deactivate, > > #ifdef CONFIG_GENERIC_IRQ_DEBUGFS > > .debug_show = x86_vector_debug_show, > > #endif > > > > > > all is well. > > Nice detective work. Unfortunately that's not a real solution ... > Oh, of course. It was never intended as a solution, only as information perhaps enabling someone who knows what they're doing (unlike myself :) ) to find one. > Can you try the patch below on top of Linus tree, please? 
> > Thanks, > Applied it to 464e1d5 4.15-rc5 just now. It appears to be trouble-free: booted, logged me in fine, the works. > tglx > > 8<- > --- a/kernel/irq/msi.c > +++ b/kernel/irq/msi.c > @@ -339,6 +339,40 @@ int msi_domain_populate_irqs(struct irq_ > return ret; > } > > +/* > + * Carefully check whether the device can use reservation mode. If > + * reservation mode is enabled then the early activation will assign a > + * dummy vector to the device. If the PCI/MSI device does not support > + * masking of the entry then this can result in spurious interrupts when > + * the device driver is not absolutely careful. But even then a malfunction > + * of the hardware could result in a spurious interrupt on the dummy vector > + * and render the device unusable. If the entry can be masked then the core > + * logic will prevent the spurious interrupt and reservation mode can be > + * used. For now reservation mode is restricted to PCI/MSI. > + */ > +static bool msi_check_reservation_mode(struct irq_domain *domain, > + struct msi_domain_info *info, > + struct device *dev) > +{ > + struct msi_desc *desc; > + > + if (domain->bus_token != DOMAIN_BUS_PCI_MSI) > + return false; > + > + if (!(info->flags & MSI_FLAG_MUST_REACTIVATE)) > + return false; > + > + if (IS_ENABLED(CONFIG_PCI_MSI) && pci_msi_ignore_mask) > + return false; > + > + /* > + * Checking the first MSI descriptor is sufficient. MSIX supports > + * masking and MSI does so when the maskbit is set.
> + */ > + desc = first_msi_entry(dev); > + return desc->msi_attrib.is_msix || desc->msi_attrib.maskbit; > +} > + > /** > * msi_domain_alloc_irqs - Allocate interrupts from a MSI interrupt domain > * @domain: The domain to allocate from > @@ -353,9 +387,11 @@ int msi_domain_alloc_irqs(struct irq_dom > { > struct msi_domain_info *info = domain->host_data; > struct msi_domain_ops *ops = info->ops; > - msi_alloc_info_t arg; > + struct irq_data *irq_data; > struct msi_desc *desc; > + msi_alloc_info_t arg; > int i, ret, virq; > + bool can_reserve; > > ret = msi_domain_prepare_irqs(domain, dev, nvec, &arg); > if (ret) > @@ -385,6 +421,8 @@ int msi_domain_alloc_irqs(struct irq_dom > if (ops->msi_finish) > ops->msi_finish(&arg, 0); > > + can_reserve = msi_check_reservation_mode(domain, info, dev); > + > for_each_msi_entry(desc, dev) { > virq = desc->irq; > if (desc->nvec_used == 1) > @@ -397,17 +435,28 @@ int msi_domain_alloc_irqs(struct irq_dom >* the MSI entries before the PCI layer enables MSI in the >
Re: PROBLEM: 4.15.0-rc3 APIC causes lockups on Core 2 Duo laptop
On Fri, 29 Dec 2017, Alexandru Chirvasitu wrote: > All right, I tried to do some more digging around, in the hope of > getting as close to the source of the problem as I can. > > I went back to the very first commit that went astray for me, 2db1f95 > (which is the only one actually panicking), and tried to move from its > parent 90ad9e2 (that boots fine) to it gradually, altering the code in > small chunks. > > I tried to ignore the stuff that clearly shouldn't make a difference, > such as definitions. So in the end I get defined-but-unused-function > errors in my compilations, but I'm ignoring those for now. Some > results: > > (1) When I move from the good commit 90ad9e2 according to the attached > bad-diff (which moves partly towards 2db1f95), I get a panic. > > (2) On the other hand, when I further change this last panicking > commit by simply doing > > > > removed activate / deactivate from x86_vector_domain_ops > > diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c > index 7317ba5a..063594d 100644 > --- a/arch/x86/kernel/apic/vector.c > +++ b/arch/x86/kernel/apic/vector.c > @@ -514,8 +514,6 @@ void x86_vector_debug_show(struct seq_file *m, struct > irq_domain *d, > static const struct irq_domain_ops x86_vector_domain_ops = { > .alloc = x86_vector_alloc_irqs, > .free = x86_vector_free_irqs, > - .activate = x86_vector_activate, > - .deactivate = x86_vector_deactivate, > #ifdef CONFIG_GENERIC_IRQ_DEBUGFS > .debug_show = x86_vector_debug_show, > #endif > > > all is well. Nice detective work. Unfortunately that's not a real solution ... Can you try the patch below on top of Linus tree, please? Thanks, tglx 8<- --- a/kernel/irq/msi.c +++ b/kernel/irq/msi.c @@ -339,6 +339,40 @@ int msi_domain_populate_irqs(struct irq_ return ret; } +/* + * Carefully check whether the device can use reservation mode. If + * reservation mode is enabled then the early activation will assign a + * dummy vector to the device. 
If the PCI/MSI device does not support + * masking of the entry then this can result in spurious interrupts when + * the device driver is not absolutely careful. But even then a malfunction + * of the hardware could result in a spurious interrupt on the dummy vector + * and render the device unusable. If the entry can be masked then the core + * logic will prevent the spurious interrupt and reservation mode can be + * used. For now reservation mode is restricted to PCI/MSI. + */ +static bool msi_check_reservation_mode(struct irq_domain *domain, + struct msi_domain_info *info, + struct device *dev) +{ + struct msi_desc *desc; + + if (domain->bus_token != DOMAIN_BUS_PCI_MSI) + return false; + + if (!(info->flags & MSI_FLAG_MUST_REACTIVATE)) + return false; + + if (IS_ENABLED(CONFIG_PCI_MSI) && pci_msi_ignore_mask) + return false; + + /* + * Checking the first MSI descriptor is sufficient. MSIX supports + * masking and MSI does so when the maskbit is set. + */ + desc = first_msi_entry(dev); + return desc->msi_attrib.is_msix || desc->msi_attrib.maskbit; +} + /** * msi_domain_alloc_irqs - Allocate interrupts from a MSI interrupt domain * @domain: The domain to allocate from @@ -353,9 +387,11 @@ int msi_domain_alloc_irqs(struct irq_dom { struct msi_domain_info *info = domain->host_data; struct msi_domain_ops *ops = info->ops; - msi_alloc_info_t arg; + struct irq_data *irq_data; struct msi_desc *desc; + msi_alloc_info_t arg; int i, ret, virq; + bool can_reserve; ret = msi_domain_prepare_irqs(domain, dev, nvec, &arg); if (ret) @@ -385,6 +421,8 @@ int msi_domain_alloc_irqs(struct irq_dom if (ops->msi_finish) ops->msi_finish(&arg, 0); + can_reserve = msi_check_reservation_mode(domain, info, dev); + for_each_msi_entry(desc, dev) { virq = desc->irq; if (desc->nvec_used == 1) @@ -397,17 +435,28 @@ int msi_domain_alloc_irqs(struct irq_dom * the MSI entries before the PCI layer enables MSI in the * card. Otherwise the card latches a random msi message.
*/ - if (info->flags & MSI_FLAG_ACTIVATE_EARLY) { - struct irq_data *irq_data; + if (!(info->flags & MSI_FLAG_ACTIVATE_EARLY)) + continue; + irq_data = irq_domain_get_irq_data(domain, desc->irq); + if (!can_reserve) + irqd_clr_can_reserve(irq_data); + ret = irq_domain_activate_irq(i
Re: PROBLEM: 4.15.0-rc3 APIC causes lockups on Core 2 Duo laptop
On Fri, Dec 29, 2017 at 06:49:15AM -0500, Alexandru Chirvasitu wrote: > All right, I tried to do some more digging around, in the hope of > getting as close to the source of the problem as I can. > > I went back to the very first commit that went astray for me, 2db1f95 > (which is the only one actually panicking), and tried to move from its > parent 90ad9e2 (that boots fine) to it gradually, altering the code in > small chunks. > > I tried to ignore the stuff that clearly shouldn't make a difference, > such as definitions. So in the end I get defined-but-unused-function > errors in my compilations, but I'm ignoring those for now. Some > results: > > (1) When I move from the good commit 90ad9e2 according to the attached > bad-diff (which moves partly towards 2db1f95), I get a panic. > > (2) On the other hand, when I further change this last panicking > commit by simply doing > > > > removed activate / deactivate from x86_vector_domain_ops > > diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c > index 7317ba5a..063594d 100644 > --- a/arch/x86/kernel/apic/vector.c > +++ b/arch/x86/kernel/apic/vector.c > @@ -514,8 +514,6 @@ void x86_vector_debug_show(struct seq_file *m, struct > irq_domain *d, > static const struct irq_domain_ops x86_vector_domain_ops = { > .alloc = x86_vector_alloc_irqs, > .free = x86_vector_free_irqs, > - .activate = x86_vector_activate, > - .deactivate = x86_vector_deactivate, > #ifdef CONFIG_GENERIC_IRQ_DEBUGFS > .debug_show = x86_vector_debug_show, > #endif > > > all is well. 
> And sure enough, simply diffing removed activate / deactivate from x86_vector_domain_ops diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c index 3f53572..e6cb55d 100644 --- a/arch/x86/kernel/apic/vector.c +++ b/arch/x86/kernel/apic/vector.c @@ -511,8 +511,6 @@ void x86_vector_debug_show(struct seq_file *m, struct irq_domain *d, static const struct irq_domain_ops x86_vector_domain_ops = { .alloc = x86_vector_alloc_irqs, .free = x86_vector_free_irqs, - .activate = x86_vector_activate, - .deactivate = x86_vector_deactivate, #ifdef CONFIG_GENERIC_IRQ_DEBUGFS .debug_show = x86_vector_debug_show, #endif directly against 2db1f95 fixes the issues (no freezes, lockups, or panics). > > > > On Fri, Dec 29, 2017 at 09:07:45AM +0100, Thomas Gleixner wrote: > > On Thu, 28 Dec 2017, Alexandru Chirvasitu wrote: > > > On Fri, Dec 29, 2017 at 12:36:37AM +0100, Thomas Gleixner wrote: > > > > On Thu, 28 Dec 2017, Alexandru Chirvasitu wrote: > > > > > > > > > Attached, but heads up on this: when redirecting the output of lspci > > > > > -vvv to a text file as root I get > > > > > > > > > > pcilib: sysfs_read_vpd: read failed: Input/output error > > > > > > > > > > I can find bugs filed for various distros to this same effect, but > > > > > haven't tracked down any explanations. > > > > > > > > Weird, but the info looks complete. > > > > > > > > Can you please add 'pci=nomsi' to the 4.15 kernel command line and see > > > > whether that works? > > > > > > It does (emailing from that successful boot as we speak). I'm on a > > > clean 4.15-rc5 (as in no patches, etc.). > > > > > > This was also suggested way at the top of this thread by Dexuan Cui > > > for 4.15-rc3 (where this exchange started), and it worked back then > > > too. > > > > I missed that part of the conversation. Let me stare into the MSI code > > again. 
> > > > Thanks, > > > > tglx > diff --git a/arch/x86/include/asm/irq_vectors.h > b/arch/x86/include/asm/irq_vectors.h > index aaf8d28..1e9bd28 100644 > --- a/arch/x86/include/asm/irq_vectors.h > +++ b/arch/x86/include/asm/irq_vectors.h > @@ -101,12 +101,8 @@ > #define POSTED_INTR_NESTED_VECTOR 0xf0 > #endif > > -/* > - * Local APIC timer IRQ vector is on a different priority level, > - * to work around the 'lost local interrupt if more than 2 IRQ > - * sources per level' errata. > - */ > -#define LOCAL_TIMER_VECTOR 0xef > +#define MANAGED_IRQ_SHUTDOWN_VECTOR 0xef > +#define LOCAL_TIMER_VECTOR 0xee > > #define NR_VECTORS 256 > > diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c > index f08d44f..7317ba5a 100644 > --- a/arch/x86/kernel/apic/vector.c > +++ b/arch/x86/kernel/apic/vector.c > @@ -32,7 +32,8 @@ struct apic_chip_data { > unsigned int prev_cpu; > unsigned int irq; > struct hlist_node clist; > - u8 move_in_progress : 1; > + unsigned int move_in_progress : 1, >
Re: PROBLEM: 4.15.0-rc3 APIC causes lockups on Core 2 Duo laptop
All right, I tried to do some more digging around, in the hope of getting as close to the source of the problem as I can. I went back to the very first commit that went astray for me, 2db1f95 (which is the only one actually panicking), and tried to move from its parent 90ad9e2 (that boots fine) to it gradually, altering the code in small chunks. I tried to ignore the stuff that clearly shouldn't make a difference, such as definitions. So in the end I get defined-but-unused-function errors in my compilations, but I'm ignoring those for now. Some results: (1) When I move from the good commit 90ad9e2 according to the attached bad-diff (which moves partly towards 2db1f95), I get a panic. (2) On the other hand, when I further change this last panicking commit by simply doing removed activate / deactivate from x86_vector_domain_ops diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c index 7317ba5a..063594d 100644 --- a/arch/x86/kernel/apic/vector.c +++ b/arch/x86/kernel/apic/vector.c @@ -514,8 +514,6 @@ void x86_vector_debug_show(struct seq_file *m, struct irq_domain *d, static const struct irq_domain_ops x86_vector_domain_ops = { .alloc = x86_vector_alloc_irqs, .free = x86_vector_free_irqs, - .activate = x86_vector_activate, - .deactivate = x86_vector_deactivate, #ifdef CONFIG_GENERIC_IRQ_DEBUGFS .debug_show = x86_vector_debug_show, #endif all is well. On Fri, Dec 29, 2017 at 09:07:45AM +0100, Thomas Gleixner wrote: > On Thu, 28 Dec 2017, Alexandru Chirvasitu wrote: > > On Fri, Dec 29, 2017 at 12:36:37AM +0100, Thomas Gleixner wrote: > > > On Thu, 28 Dec 2017, Alexandru Chirvasitu wrote: > > > > > > > Attached, but heads up on this: when redirecting the output of lspci > > > > -vvv to a text file as root I get > > > > > > > > pcilib: sysfs_read_vpd: read failed: Input/output error > > > > > > > > I can find bugs filed for various distros to this same effect, but > > > > haven't tracked down any explanations. 
> > > > > > Weird, but the info looks complete. > > > > > > Can you please add 'pci=nomsi' to the 4.15 kernel command line and see > > > whether that works? > > > > It does (emailing from that successful boot as we speak). I'm on a > > clean 4.15-rc5 (as in no patches, etc.). > > > > This was also suggested way at the top of this thread by Dexuan Cui > > for 4.15-rc3 (where this exchange started), and it worked back then > > too. > > I missed that part of the conversation. Let me stare into the MSI code > again. > > Thanks, > > tglx diff --git a/arch/x86/include/asm/irq_vectors.h b/arch/x86/include/asm/irq_vectors.h index aaf8d28..1e9bd28 100644 --- a/arch/x86/include/asm/irq_vectors.h +++ b/arch/x86/include/asm/irq_vectors.h @@ -101,12 +101,8 @@ #define POSTED_INTR_NESTED_VECTOR 0xf0 #endif -/* - * Local APIC timer IRQ vector is on a different priority level, - * to work around the 'lost local interrupt if more than 2 IRQ - * sources per level' errata. - */ -#define LOCAL_TIMER_VECTOR 0xef +#define MANAGED_IRQ_SHUTDOWN_VECTOR 0xef +#define LOCAL_TIMER_VECTOR 0xee #define NR_VECTORS 256 diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c index f08d44f..7317ba5a 100644 --- a/arch/x86/kernel/apic/vector.c +++ b/arch/x86/kernel/apic/vector.c @@ -32,7 +32,8 @@ struct apic_chip_data { unsigned int prev_cpu; unsigned int irq; struct hlist_node clist; - u8 move_in_progress : 1; + unsigned int move_in_progress : 1, + is_managed : 1; }; struct irq_domain *x86_vector_domain; @@ -152,6 +153,28 @@ static void apic_update_vector(struct irq_data *irqd, unsigned int newvec, per_cpu(vector_irq, newcpu)[newvec] = desc; } +static void vector_assign_managed_shutdown(struct irq_data *irqd) +{ + unsigned int cpu = cpumask_first(cpu_online_mask); + + apic_update_irq_cfg(irqd, MANAGED_IRQ_SHUTDOWN_VECTOR, cpu); +} + +static int reserve_managed_vector(struct irq_data *irqd) +{ + const struct cpumask *affmsk = irq_data_get_affinity_mask(irqd); + struct apic_chip_data
*apicd = apic_chip_data(irqd); + unsigned long flags; + int ret; + + raw_spin_lock_irqsave(&vector_lock, flags); + apicd->is_managed = true; + ret = irq_matrix_reserve_managed(vector_matrix, affmsk); + raw_spin_unlock_irqrestore(&vector_lock, flags); + trace_vector_reserve_managed(irqd->irq, ret); + return ret; +} + static int allocate_vector(struct irq_data *irqd, const struct cpumask *dest) { struct apic_chip_data *apicd = apic_chip_data(irqd); @@ -211,9 +234,58 @@ static int assign_irq_v
Re: PROBLEM: 4.15.0-rc3 APIC causes lockups on Core 2 Duo laptop
On Thu, 28 Dec 2017, Alexandru Chirvasitu wrote: > On Fri, Dec 29, 2017 at 12:36:37AM +0100, Thomas Gleixner wrote: > > On Thu, 28 Dec 2017, Alexandru Chirvasitu wrote: > > > > > Attached, but heads up on this: when redirecting the output of lspci > > > -vvv to a text file as root I get > > > > > > pcilib: sysfs_read_vpd: read failed: Input/output error > > > > > > I can find bugs filed for various distros to this same effect, but > > > haven't tracked down any explanations. > > > > Weird, but the info looks complete. > > > > Can you please add 'pci=nomsi' to the 4.15 kernel command line and see > > whether that works? > > It does (emailing from that successful boot as we speak). I'm on a > clean 4.15-rc5 (as in no patches, etc.). > > This was also suggested way at the top of this thread by Dexuan Cui > for 4.15-rc3 (where this exchange started), and it worked back then > too. I missed that part of the conversation. Let me stare into the MSI code again. Thanks, tglx ___ devel mailing list de...@linuxdriverproject.org http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel
Re: PROBLEM: 4.15.0-rc3 APIC causes lockups on Core 2 Duo laptop
On Thu, Dec 28, 2017 at 06:15:19PM -0600, Bjorn Helgaas wrote: > On Thu, Dec 28, 2017 at 06:30:58PM -0500, Alexandru Chirvasitu wrote: > > Attached, but heads up on this: when redirecting the output of lspci > > -vvv to a text file as root I get > > > > pcilib: sysfs_read_vpd: read failed: Input/output error > > > > I can find bugs filed for various distros to this same effect, but > > haven't tracked down any explanations. > > This is a tangent, but I think you should *always* see "Input/output > error" on this system when running "lspci -vvv" as root, regardless of > whether you redirect the output (the error probably goes to stderr, > not stdout, so it's probably easy to miss when not redirecting the > output). > > I think this is the -EIO return from pci_vpd_read(), which probably > means pci_vpd_size() returned 0 for one of your devices, which means > the VPD data provided by the device wasn't formatted correctly. If > this happens, you should see a warning in dmesg about it ("invalid VPD > tag" or similar) -- could you verify that? > This in dmesg: pci 0000:06:00.0: [Firmware Bug]: disabling VPD access (can't determine size of non-standard VPD format) So yes, looks like you pinned it down good. No other VPD instances in dmesg. And yes, the error does seem to always be present. I see it with lspci -vvv 2>&1 | grep pcilib so it was there in stderr all along. > It's possible we should return something other than -EIO, or maybe > pcilib should do something other than emitting the warning. In > pcilib, sysfs_read_vpd() emits the warning [1], and it would seem sort > of ugly to special-case EIO, so maybe we should change this in the > kernel.
> > It looks like your Qualcomm Atheros Attansic NIC at 06:00.0 is the > only device with VPD, so that's probably the one: > > 06:00.0 Ethernet controller: Qualcomm Atheros Attansic L2 Fast Ethernet > Capabilities: [6c] Vital Product Data > Not readable > > I think lspci would still print "Not readable" if we just made the > kernel return 0 instead of -EIO [2]. > > Bjorn > > [1] > https://git.kernel.org/pub/scm/utils/pciutils/pciutils.git/tree/lib/sysfs.c#n410 > [2] > https://git.kernel.org/pub/scm/utils/pciutils/pciutils.git/tree/ls-vpd.c#n87
Re: PROBLEM: 4.15.0-rc3 APIC causes lockups on Core 2 Duo laptop
On Thu, Dec 28, 2017 at 06:30:58PM -0500, Alexandru Chirvasitu wrote: > Attached, but heads up on this: when redirecting the output of lspci > -vvv to a text file as root I get > > pcilib: sysfs_read_vpd: read failed: Input/output error > > I can find bugs filed for various distros to this same effect, but > haven't tracked down any explanations. This is a tangent, but I think you should *always* see "Input/output error" on this system when running "lspci -vvv" as root, regardless of whether you redirect the output (the error probably goes to stderr, not stdout, so it's probably easy to miss when not redirecting the output). I think this is the -EIO return from pci_vpd_read(), which probably means pci_vpd_size() returned 0 for one of your devices, which means the VPD data provided by the device wasn't formatted correctly. If this happens, you should see a warning in dmesg about it ("invalid VPD tag" or similar) -- could you verify that? It's possible we should return something other than -EIO, or maybe pcilib should do something other than emitting the warning. In pcilib, sysfs_read_vpd() emits the warning [1], and it would seem sort of ugly to special-case EIO, so maybe we should change this in the kernel. It looks like your Qualcomm Atheros Attansic NIC at 06:00.0 is the only device with VPD, so that's probably the one: 06:00.0 Ethernet controller: Qualcomm Atheros Attansic L2 Fast Ethernet Capabilities: [6c] Vital Product Data Not readable I think lspci would still print "Not readable" if we just made the kernel return 0 instead of -EIO [2]. Bjorn [1] https://git.kernel.org/pub/scm/utils/pciutils/pciutils.git/tree/lib/sysfs.c#n410 [2] https://git.kernel.org/pub/scm/utils/pciutils/pciutils.git/tree/ls-vpd.c#n87
Re: PROBLEM: 4.15.0-rc3 APIC causes lockups on Core 2 Duo laptop
On Fri, Dec 29, 2017 at 12:36:37AM +0100, Thomas Gleixner wrote: > On Thu, 28 Dec 2017, Alexandru Chirvasitu wrote: > > > Attached, but heads up on this: when redirecting the output of lspci > > -vvv to a text file as root I get > > > > pcilib: sysfs_read_vpd: read failed: Input/output error > > > > I can find bugs filed for various distros to this same effect, but > > haven't tracked down any explanations. > > Weird, but the info looks complete. > > Can you please add 'pci=nomsi' to the 4.15 kernel command line and see > whether that works? It does (emailing from that successful boot as we speak). I'm on a clean 4.15-rc5 (as in no patches, etc.). This was also suggested way at the top of this thread by Dexuan Cui for 4.15-rc3 (where this exchange started), and it worked back then too. > > Thanks, > > tglx
Re: PROBLEM: 4.15.0-rc3 APIC causes lockups on Core 2 Duo laptop
On Thu, 28 Dec 2017, Alexandru Chirvasitu wrote: > Attached, but heads up on this: when redirecting the output of lspci > -vvv to a text file as root I get > > pcilib: sysfs_read_vpd: read failed: Input/output error > > I can find bugs filed for various distros to this same effect, but > haven't tracked down any explanations. Weird, but the info looks complete. Can you please add 'pci=nomsi' to the 4.15 kernel command line and see whether that works? Thanks, tglx
Re: PROBLEM: 4.15.0-rc3 APIC causes lockups on Core 2 Duo laptop
Attached, but heads up on this: when redirecting the output of lspci -vvv to a text file as root I get pcilib: sysfs_read_vpd: read failed: Input/output error I can find bugs filed for various distros to this same effect, but haven't tracked down any explanations. On Fri, Dec 29, 2017 at 12:19:19AM +0100, Thomas Gleixner wrote: > On Thu, 28 Dec 2017, Thomas Gleixner wrote: > > > On Thu, 28 Dec 2017, Alexandru Chirvasitu wrote: > > > > > Attached. > > > > > > I don't have a 4.14 family kernel available at the moment on that > > > machine. What I'm attaching comes from the 4.13 one I was playing with > > > yesterday, what with kexec and all. > > > > Good enough. Thanks ! > > Spoke too fast. Could you please run that command as root? > > And while at it please provide the output of > > cat /proc/interrupts > > as well. > > Thanks, > > tglx > 00:00.0 Host bridge: Advanced Micro Devices, Inc. [AMD/ATI] RC410 Host Bridge (rev 01) Subsystem: ASUSTeK Computer Inc. RC410 Host Bridge Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap- 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- SERR- TAbort- SERR- TAbort- Reset- FastB2B- PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn- Capabilities: [b0] Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] RC4xx/RS4xx PCI Bridge [int gfx] Kernel modules: shpchp 00:04.0 PCI bridge: Advanced Micro Devices, Inc. 
[AMD/ATI] RC4xx/RS4xx PCI Express Port 1 (prog-if 00 [Normal decode]) Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx- Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR- TAbort- Reset- FastB2B- PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn- Capabilities: [50] Power Management version 3 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+) Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME- Capabilities: [58] Express (v1) Root Port (Slot+), MSI 00 DevCap: MaxPayload 128 bytes, PhantFunc 0 ExtTag+ RBE- DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported- RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ MaxPayload 128 bytes, MaxReadReq 128 bytes DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend- LnkCap: Port #1, Speed 2.5GT/s, Width x1, ASPM L0s L1, Exit Latency L0s <64ns, L1 <1us ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp- LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk- ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt- LnkSta: Speed 2.5GT/s, Width x1, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt- SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise- Slot #0, PowerLimit 25.000W; Interlock- NoCompl- SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg- Control: AttnInd Off, PwrInd Off, Power- Interlock- SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock- Changed: MRL- PresDet+ LinkState- RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna- CRSVisible- RootCap: CRSVisible- RootSta: PME ReqID , PMEStatus- PMEPending- Capabilities: [80] MSI: Enable- Count=1/1 Maskable- 64bit- Address: Data: Capabilities: [b0] Subsystem: ASUSTeK Computer Inc. RC4xx/RS4xx PCI Express Port 1 Capabilities: [b8] HyperTransport: MSI Mapping Enable+ Fixed+ Kernel driver in use: pcieport Kernel modules: shpchp 00:05.0 PCI bridge: Advanced Micro Devices, Inc. 
[AMD/ATI] RC4xx/RS4xx PCI Express Port 2 (prog-if 00 [Normal decode]) Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx- Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR- TAbort- Reset- FastB2B- PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn- Capabilities: [50] Power Management version 3 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+) Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME- Capabilities: [58] Express (v1) Root Port (Slot+), MSI 00 DevCap: MaxPayload 128 bytes, PhantFunc 0 ExtTag+ RBE- DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported- RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ MaxPayload 128 bytes, MaxReadReq 128 bytes DevSta: CorrErr-
Re: PROBLEM: 4.15.0-rc3 APIC causes lockups on Core 2 Duo laptop
On Thu, 28 Dec 2017, Thomas Gleixner wrote: > On Thu, 28 Dec 2017, Alexandru Chirvasitu wrote: > > > Attached. > > > > I don't have a 4.14 family kernel available at the moment on that > > machine. What I'm attaching comes from the 4.13 one I was playing with > > yesterday, what with kexec and all. > > Good enough. Thanks ! Spoke too fast. Could you please run that command as root? And while at it please provide the output of cat /proc/interrupts as well. Thanks, tglx
Re: PROBLEM: 4.15.0-rc3 APIC causes lockups on Core 2 Duo laptop
On Thu, 28 Dec 2017, Alexandru Chirvasitu wrote: > Attached. > > I don't have a 4.14 family kernel available at the moment on that > machine. What I'm attaching comes from the 4.13 one I was playing with > yesterday, what with kexec and all. Good enough. Thanks !
Re: PROBLEM: 4.15.0-rc3 APIC causes lockups on Core 2 Duo laptop
Attached. I don't have a 4.14 family kernel available at the moment on that machine. What I'm attaching comes from the 4.13 one I was playing with yesterday, what with kexec and all. On Thu, Dec 28, 2017 at 10:54:25PM +0100, Thomas Gleixner wrote: > On Thu, 28 Dec 2017, Thomas Gleixner wrote: > > On Thu, 28 Dec 2017, Alexandru Chirvasitu wrote: > > > > > No; it seems to be tied to this specific issue, and I was seeing it even > > > before getting logs just now, whenever I'd start one of the bad > > > kernels in recovery mode. > > > > > > But no, I've never seen that in any other logs, or on any other > > > screens outside of those popping up in relation to this problem. > > > > Ok. I'll dig into it and we have a 100% reproducer reported by someone else > > now, which might give us simpler insight into that issue. I'll let you know > > once I have something to test. Might be a couple of days though. > > Can you please provide the output of > > lspci -vvv > > from a working kernel, preferably 4.14.y > > Thanks, > > tglx 00:00.0 Host bridge: Advanced Micro Devices, Inc. [AMD/ATI] RC410 Host Bridge (rev 01) Subsystem: ASUSTeK Computer Inc. RC410 Host Bridge Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx- Status: Cap- 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- SERR- TAbort- SERR- TAbort- Reset- FastB2B- PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn- Capabilities: Kernel modules: shpchp 00:04.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] RC4xx/RS4xx PCI Express Port 1 (prog-if 00 [Normal decode]) Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx- Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR- TAbort- Reset- FastB2B- PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn- Capabilities: Kernel driver in use: pcieport Kernel modules: shpchp 00:05.0 PCI bridge: Advanced Micro Devices, Inc.
[AMD/ATI] RC4xx/RS4xx PCI Express Port 2 (prog-if 00 [Normal decode]) Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx- Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR- TAbort- Reset- FastB2B- PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn- Capabilities: Kernel driver in use: pcieport Kernel modules: shpchp 00:06.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] RC4xx/RS4xx PCI Express Port 3 (prog-if 00 [Normal decode]) Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx- Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR- TAbort- Reset- FastB2B- PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn- Capabilities: Kernel driver in use: pcieport Kernel modules: shpchp 00:07.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] RC4xx/RS4xx PCI Express Port 4 (prog-if 00 [Normal decode]) Control: I/O- Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx- Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- SERR- TAbort- Reset- FastB2B- PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn- Capabilities: Kernel driver in use: pcieport Kernel modules: shpchp 00:12.0 SATA controller: Advanced Micro Devices, Inc. [AMD/ATI] SB600 Non-Raid-5 SATA (prog-if 01 [AHCI 1.0]) Subsystem: ASUSTeK Computer Inc. SB600 Non-Raid-5 SATA Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx- Status: Cap+ 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- SERR- Kernel driver in use: ahci Kernel modules: ahci 00:13.0 USB controller: Advanced Micro Devices, Inc. [AMD/ATI] SB600 USB (OHCI0) (prog-if 10 [OHCI]) Subsystem: ASUSTeK Computer Inc. 
SB600 USB (OHCI0) Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx- Status: Cap- 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- Kernel driver in use: ehci-pci Kernel modules: ehci_pci 00:14.0 SMBus: Advanced Micro Devices, Inc. [AMD/ATI] SBx00 SMBus Controller (rev 13) Subsystem: ASUSTeK Computer Inc. SBx00 SMBus Controller Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+ Status: Cap- 66MHz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- SERR- TAbort- SERR- TAbort- SERR- Kernel driver in use: snd_hda_intel Kernel modules: snd_hda_intel 00:14.3 ISA bridge: Advanced Micro Devices, Inc. [AMD/ATI] SB600 PCI to LPC Bridge
Re: PROBLEM: 4.15.0-rc3 APIC causes lockups on Core 2 Duo laptop
On Thu, 28 Dec 2017, Thomas Gleixner wrote: > On Thu, 28 Dec 2017, Alexandru Chirvasitu wrote: > > > No; it seems to be tied to this specific issue, and I was seeing it even > > before getting logs just now, whenever I'd start one of the bad > > kernels in recovery mode. > > > > But no, I've never seen that in any other logs, or on any other > > screens outside of those popping up in relation to this problem. > > Ok. I'll dig into it and we have a 100% reproducer reported by someone else > now, which might give us simpler insight into that issue. I'll let you know > once I have something to test. Might be a couple of days though. Can you please provide the output of lspci -vvv from a working kernel, preferably 4.14.y Thanks, tglx ___ devel mailing list de...@linuxdriverproject.org http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel
RE: PROBLEM: 4.15.0-rc3 APIC causes lockups on Core 2 Duo laptop
On Thu, 28 Dec 2017, Dexuan Cui wrote: > > From: Thomas Gleixner [mailto:t...@linutronix.de] > > Sent: Thursday, December 28, 2017 03:03 > > > > On Wed, Dec 20, 2017 at 02:12:05AM +, Dexuan Cui wrote: > > > > > > For Linux VM running on Hyper-V, we did get "spurious APIC interrupt > > > > through vector " and a patchset, which included the patch you identified > > > > ("genirq: Add config option for reservation mode"), was made to fix the > > > > issue. But since you're using a physical machine rather than a VM, I > > > > suspect it should be a different issue. > > > > Aaargh! Why was this never reported and where is that magic patchset? > > tglx > > Hi Thomas, > The Hyper-V specific issue was reported and you made a patchset to fix it: > https://patchwork.kernel.org/patch/10006171/ > https://lkml.org/lkml/2017/10/17/120 Ah, ok. I did not make the connection. I'll have a scan through the tree to figure out whether there is some other weird place which is missing that. Thanks, tglx
RE: PROBLEM: 4.15.0-rc3 APIC causes lockups on Core 2 Duo laptop
> From: Thomas Gleixner [mailto:t...@linutronix.de] > Sent: Thursday, December 28, 2017 03:03 > > > On Wed, Dec 20, 2017 at 02:12:05AM +, Dexuan Cui wrote: > > > > For Linux VM running on Hyper-V, we did get "spurious APIC interrupt > > > through vector " and a patchset, which included the patch you identified > > > ("genirq: Add config option for reservation mode"), was made to fix the > > > issue. But since you're using a physical machine rather than a VM, I > > > suspect it should be a different issue. > > Aaargh! Why was this never reported and where is that magic patchset? > tglx Hi Thomas, The Hyper-V specific issue was reported and you made a patchset to fix it: https://patchwork.kernel.org/patch/10006171/ https://lkml.org/lkml/2017/10/17/120 Thanks, -- Dexuan
Re: PROBLEM: 4.15.0-rc3 APIC causes lockups on Core 2 Duo laptop
On Thu, 28 Dec 2017, Alexandru Chirvasitu wrote: > No; it seems to be tied to this specific issue, and I was seeing it even > before getting logs just now, whenever I'd start one of the bad > kernels in recovery mode. > > But no, I've never seen that in any other logs, or on any other > screens outside of those popping up in relation to this problem. Ok. I'll dig into it and we have a 100% reproducer reported by someone else now, which might give us simpler insight into that issue. I'll let you know once I have something to test. Might be a couple of days though. Thanks, tglx
Re: PROBLEM: 4.15.0-rc3 APIC causes lockups on Core 2 Duo laptop
No; it seems to be tied to this specific issue, and I was seeing it even before getting logs just now, whenever I'd start one of the bad kernels in recovery mode. But no, I've never seen that in any other logs, or on any other screens outside of those popping up in relation to this problem. On Thu, Dec 28, 2017 at 06:29:05PM +0100, Thomas Gleixner wrote: > On Thu, 28 Dec 2017, Alexandru Chirvasitu wrote: > > On Thu, Dec 28, 2017 at 05:10:28PM +0100, Thomas Gleixner wrote: > > > On Thu, 28 Dec 2017, Alexandru Chirvasitu wrote: > > > > Actually, it decided to cooperate for just long enough for me to get > > > > the dmesg out. Attached. > > > > > > > > This is from the kernel you asked about: Dou's patch + yours, i.e. the > > > > latest one in that git log I just sent, booted up with 'apic=debug'. > > > > > > Ok. As I suspected that warning does not trigger. I would have been > > > massively surprised if that happened. So Dou's patch is just a red herring > > > and just might change the timing enough to make the problem 'hide'. > > > > > > Can you try something completely different please? > > > > > > Just use plain Linus tree without any additional patches on top and > > > disable > > > CONFIG_NO_HZ_IDLE, i.e. select CONFIG_HZ_PERIODIC. > > > > > > If that works, then reenable it and add 'nohz=off' to the kernel command > > > line. > > > > > > > No go here I'm afraid: > > > > Linus' clean 4.15-rc5 compiled with CONFIG_HZ_PERIODIC exhibits the > > familiar behaviour: lockups, sometimes instant upon trying to log in, > > sometimes logging me in and freaking out seconds later. > > Ok. So it's not the issue I had in mind. > > Back to some of the interesting bits in the logs: > > [ 36.017942] spurious APIC interrupt through vector ff on CPU#0, should > never happen. > > Does that message ever show up in 4.14 or 4.9? > > Thanks, > > tglx
Re: PROBLEM: 4.15.0-rc3 APIC causes lockups on Core 2 Duo laptop
On Thu, 28 Dec 2017, Alexandru Chirvasitu wrote: > On Thu, Dec 28, 2017 at 05:10:28PM +0100, Thomas Gleixner wrote: > > On Thu, 28 Dec 2017, Alexandru Chirvasitu wrote: > > > Actually, it decided to cooperate for just long enough for me to get > > > the dmesg out. Attached. > > > > > > This is from the kernel you asked about: Dou's patch + yours, i.e. the > > > latest one in that git log I just sent, booted up with 'apic=debug'. > > > > Ok. As I suspected that warning does not trigger. I would have been > > massively surprised if that happened. So Dou's patch is just a red herring > > and just might change the timing enough to make the problem 'hide'. > > > > Can you try something completely different please? > > > > Just use plain Linus tree without any additional patches on top and disable > > CONFIG_NO_HZ_IDLE, i.e. select CONFIG_HZ_PERIODIC. > > > > If that works, then reenable it and add 'nohz=off' to the kernel command > > line. > > > > No go here I'm afraid: > > Linus' clean 4.15-rc5 compiled with CONFIG_HZ_PERIODIC exhibits the > familiar behaviour: lockups, sometimes instant upon trying to log in, > sometimes logging me in and freaking out seconds later. Ok. So it's not the issue I had in mind. Back to some of the interesting bits in the logs: [ 36.017942] spurious APIC interrupt through vector ff on CPU#0, should never happen. Does that message ever show up in 4.14 or 4.9? Thanks, tglx
Re: PROBLEM: 4.15.0-rc3 APIC causes lockups on Core 2 Duo laptop
On Thu, 28 Dec 2017, Alexandru Chirvasitu wrote: > Actually, it decided to cooperate for just long enough for me to get > the dmesg out. Attached. > > This is from the kernel you asked about: Dou's patch + yours, i.e. the > latest one in that git log I just sent, booted up with 'apic=debug'. Ok. As I suspected that warning does not trigger. I would have been massively surprised if that happened. So Dou's patch is just a red herring and just might change the timing enough to make the problem 'hide'. Can you try something completely different please? Just use plain Linus tree without any additional patches on top and disable CONFIG_NO_HZ_IDLE, i.e. select CONFIG_HZ_PERIODIC. If that works, then reenable it and add 'nohz=off' to the kernel command line. Thanks, tglx
Re: PROBLEM: 4.15.0-rc3 APIC causes lockups on Core 2 Duo laptop
On Thu, Dec 28, 2017 at 10:48:35AM -0500, Alexandru Chirvasitu wrote: > On Thu, Dec 28, 2017 at 03:48:15PM +0100, Thomas Gleixner wrote: > > On Thu, 28 Dec 2017, Alexandru Chirvasitu wrote: > > > On Thu, Dec 28, 2017 at 12:00:47PM +0100, Thomas Gleixner wrote: > > > > Ok, let's take a step back. The bisect/kexec attempts led us away from > > > > the > > > > initial problem which is the machine locking up after login, right? > > > > > > > > > > Yes; sorry about that.. > > > > Nothing to be sorry about. > > > > > x86/vector: Replace the raw_spin_lock() with > > > > > > diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c > > > index 7504491..e5bab02 100644 > > > --- a/arch/x86/kernel/apic/vector.c > > > +++ b/arch/x86/kernel/apic/vector.c > > > @@ -726,6 +726,7 @@ static int apic_set_affinity(struct irq_data *irqd, > > > const struct cpumask *dest, bool force) > > > { > > > struct apic_chip_data *apicd = apic_chip_data(irqd); > > > + unsigned long flags; > > > int err; > > > > > > /* > > > @@ -740,13 +741,13 @@ static int apic_set_affinity(struct irq_data *irqd, > > > (apicd->is_managed || apicd->can_reserve)) > > > return IRQ_SET_MASK_OK; > > > > > > - raw_spin_lock(&vector_lock); > > > + raw_spin_lock_irqsave(&vector_lock, flags); > > > cpumask_and(vector_searchmask, dest, cpu_online_mask); > > > if (irqd_affinity_is_managed(irqd)) > > > err = assign_managed_vector(irqd, vector_searchmask); > > > else > > > err = assign_vector_locked(irqd, vector_searchmask); > > > - raw_spin_unlock(&vector_lock); > > > + raw_spin_unlock_irqrestore(&vector_lock, flags); > > > return err ? err : IRQ_SET_MASK_OK; > > > } > > > > > > With this, I still get the lockup messages after login, but not the > > > freezes! > > > > That's really interesting. There should be no code path which calls into > > that with interrupts enabled. I assume you never ran that kernel with > > CONFIG_PROVE_LOCKING=y. > > > > Correct. 
That option is not set in .config.
> > > Find below a debug patch which should show us the call chain for that > > case. Please apply that on top of Dou's patch so the machine stays > > accessible. Plain output from dmesg is sufficient. > > > > > The lockups register in the log, which I am attaching (see below for > > > attachment naming conventions). > > > > Hmm. That's RCU lockups and that backtrace on the CPU which gets the stall > > looks very familiar. I'd like to see the above result first and then I'll > > send you another pile of patches which might cure that RCU issue. > > > > Thanks, > > > > tglx > > > > 8<--- > > --- a/arch/x86/kernel/apic/vector.c > > +++ b/arch/x86/kernel/apic/vector.c > > @@ -729,6 +729,8 @@ static int apic_set_affinity(struct irq_ > > unsigned long flags; > > int err; > > > > + WARN_ON_ONCE(!irqs_disabled()); > > + > > /* > > * Core code can call here for inactive interrupts. For inactive > > * interrupts which use managed or reservation mode there is no > > > > > > > > Bit of a step back here: the kernel treated with Dou's patch no longer > logs me in reliably as before, with or without this newest patch on > top.. > > So now I sometimes get immediate lockups and freezes upon trying to > log in, and other times I get logged in but get a freeze seconds > later. > > In no case can I roam around long enough to get a dmesg, and I no > longer get the non-freezing lockups from before. I can't imagine what > I could possibly have changed.. > > Here's the output of `git log --pretty=oneline -5` on the branch I'm > working in.
> > > > f2c02af5cc1d620c039b21fab0ca5948a06daf90 2nd tglx patch > 7715575170bacf3566d400b9f2210a10ce152880 x86/vector: Replace the > raw_spin_lock() with raw_spin_lock_irqsave() > 8d9d56caf33d78bfe6b6087767b1b84acee58458 x86-32: fix kexec with stack canary > (CONFIG_CC_STACKPROTECTOR) > a197e9dea4ccb72e1a6457fac15329bd5319e719 irq/matrix: Remove the overused > BUGON() in irq_matrix_assign_system() > 464e1d5f23cca236b930ef068c328a64cab78fb1 Linux 4.15-rc5 > > > > 7715575170bacf3566d400b9f2210a10ce152880, which is the kernel with > Dou's patch, logged me in and allowed me to produce the dmesg from > before. I did this a couple of times back then. I no longer can, for > some reason, as it's reverted back to the no-go lockups from before. > > And the next one, f2c02af5cc1d620c039b21fab0ca5948a06daf90, where I > applied the patch you just sent, behaves identically. > > Actually, it decided to cooperate for just long enough for me to get the dmesg out. Attached. This is from the kernel you asked about: Dou's patch + yours, i.e. the latest one in that git log I just sent, booted up with 'apic=debug'. [0.00] Linux version 4.15.0-rc5-dou-thms-p2+ (root@ax
Re: PROBLEM: 4.15.0-rc3 APIC causes lockups on Core 2 Duo laptop
On Thu, Dec 28, 2017 at 03:48:15PM +0100, Thomas Gleixner wrote: > On Thu, 28 Dec 2017, Alexandru Chirvasitu wrote: > > On Thu, Dec 28, 2017 at 12:00:47PM +0100, Thomas Gleixner wrote: > > > Ok, let's take a step back. The bisect/kexec attempts led us away from the > > > initial problem which is the machine locking up after login, right? > > > > > > > Yes; sorry about that.. > > Nothing to be sorry about. > > > x86/vector: Replace the raw_spin_lock() with > > > > diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c > > index 7504491..e5bab02 100644 > > --- a/arch/x86/kernel/apic/vector.c > > +++ b/arch/x86/kernel/apic/vector.c > > @@ -726,6 +726,7 @@ static int apic_set_affinity(struct irq_data *irqd, > > const struct cpumask *dest, bool force) > > { > > struct apic_chip_data *apicd = apic_chip_data(irqd); > > + unsigned long flags; > > int err; > > > > /* > > @@ -740,13 +741,13 @@ static int apic_set_affinity(struct irq_data *irqd, > > (apicd->is_managed || apicd->can_reserve)) > > return IRQ_SET_MASK_OK; > > > > - raw_spin_lock(&vector_lock); > > + raw_spin_lock_irqsave(&vector_lock, flags); > > cpumask_and(vector_searchmask, dest, cpu_online_mask); > > if (irqd_affinity_is_managed(irqd)) > > err = assign_managed_vector(irqd, vector_searchmask); > > else > > err = assign_vector_locked(irqd, vector_searchmask); > > - raw_spin_unlock(&vector_lock); > > + raw_spin_unlock_irqrestore(&vector_lock, flags); > > return err ? err : IRQ_SET_MASK_OK; > > } > > > > With this, I still get the lockup messages after login, but not the > > freezes! > > That's really interesting. There should be no code path which calls into > that with interrupts enabled. I assume you never ran that kernel with > CONFIG_PROVE_LOCKING=y. > Correct. That option is not set in .config. > Find below a debug patch which should show us the call chain for that > case. Please apply that on top of Dou's patch so the machine stays > accessible. Plain output from dmesg is sufficient.
> > > The lockups register in the log, which I am attaching (see below for > > attachment naming conventions). > > Hmm. That's RCU lockups and that backtrace on the CPU which gets the stall > looks very familiar. I'd like to see the above result first and then I'll > send you another pile of patches which might cure that RCU issue. > > Thanks, > > tglx > > 8<--- > --- a/arch/x86/kernel/apic/vector.c > +++ b/arch/x86/kernel/apic/vector.c > @@ -729,6 +729,8 @@ static int apic_set_affinity(struct irq_ > unsigned long flags; > int err; > > + WARN_ON_ONCE(!irqs_disabled()); > + > /* > * Core code can call here for inactive interrupts. For inactive > * interrupts which use managed or reservation mode there is no > > > Bit of a step back here: the kernel treated with Dou's patch no longer logs me in reliably as before, with or without this newest patch on top.. So now I sometimes get immediate lockups and freezes upon trying to log in, and other times I get logged in but get a freeze seconds later. In no case can I roam around long enough to get a dmesg, and I no longer get the non-freezing lockups from before. I can't imagine what I could possibly have changed.. Here's the output of `git log --pretty=oneline -5` on the branch I'm working in. f2c02af5cc1d620c039b21fab0ca5948a06daf90 2nd tglx patch 7715575170bacf3566d400b9f2210a10ce152880 x86/vector: Replace the raw_spin_lock() with raw_spin_lock_irqsave() 8d9d56caf33d78bfe6b6087767b1b84acee58458 x86-32: fix kexec with stack canary (CONFIG_CC_STACKPROTECTOR) a197e9dea4ccb72e1a6457fac15329bd5319e719 irq/matrix: Remove the overused BUGON() in irq_matrix_assign_system() 464e1d5f23cca236b930ef068c328a64cab78fb1 Linux 4.15-rc5 7715575170bacf3566d400b9f2210a10ce152880, which is the kernel with Dou's patch, logged me in and allowed me to produce the dmesg from before. I did this a couple of times back then. I no longer can, for some reason, as it's reverted back to the no-go lockups from before.
And the next one, f2c02af5cc1d620c039b21fab0ca5948a06daf90, where I applied the patch you just sent, behaves identically.
Re: PROBLEM: 4.15.0-rc3 APIC causes lockups on Core 2 Duo laptop
On Thu, 28 Dec 2017, Alexandru Chirvasitu wrote:
> On Thu, Dec 28, 2017 at 12:00:47PM +0100, Thomas Gleixner wrote:
> > Ok, let's take a step back. The bisect/kexec attempts led us away from the
> > initial problem which is the machine locking up after login, right?
> >
>
> Yes; sorry about that..

Nothing to be sorry about.

> x86/vector: Replace the raw_spin_lock() with
>
> diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
> index 7504491..e5bab02 100644
> --- a/arch/x86/kernel/apic/vector.c
> +++ b/arch/x86/kernel/apic/vector.c
> @@ -726,6 +726,7 @@ static int apic_set_affinity(struct irq_data *irqd,
>  			     const struct cpumask *dest, bool force)
>  {
>  	struct apic_chip_data *apicd = apic_chip_data(irqd);
> +	unsigned long flags;
>  	int err;
>  
>  	/*
> @@ -740,13 +741,13 @@ static int apic_set_affinity(struct irq_data *irqd,
>  	    (apicd->is_managed || apicd->can_reserve))
>  		return IRQ_SET_MASK_OK;
>  
> -	raw_spin_lock(&vector_lock);
> +	raw_spin_lock_irqsave(&vector_lock, flags);
>  	cpumask_and(vector_searchmask, dest, cpu_online_mask);
>  	if (irqd_affinity_is_managed(irqd))
>  		err = assign_managed_vector(irqd, vector_searchmask);
>  	else
>  		err = assign_vector_locked(irqd, vector_searchmask);
> -	raw_spin_unlock(&vector_lock);
> +	raw_spin_unlock_irqrestore(&vector_lock, flags);
>  	return err ? err : IRQ_SET_MASK_OK;
>  }
>
> With this, I still get the lockup messages after login, but not the
> freezes!

That's really interesting. There should be no code path which calls into
that with interrupts enabled. I assume you never ran that kernel with
CONFIG_PROVE_LOCKING=y.

Find below a debug patch which should show us the call chain for that
case. Please apply that on top of Dou's patch so the machine stays
accessible. Plain output from dmesg is sufficient.

> The lockups register in the log, which I am attaching (see below for
> attachment naming conventions).

Hmm. That's RCU lockups and that backtrace on the CPU which gets the stall
looks very familiar. I'd like to see the above result first and then I'll
send you another pile of patches which might cure that RCU issue.

Thanks,

	tglx

8<---
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -729,6 +729,8 @@ static int apic_set_affinity(struct irq_
 	unsigned long flags;
 	int err;
 
+	WARN_ON_ONCE(!irqs_disabled());
+
 	/*
 	 * Core code can call here for inactive interrupts. For inactive
 	 * interrupts which use managed or reservation mode there is no
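The debug patch above relies on WARN_ON_ONCE() reporting only the first time its condition is true at a given call site. In the kernel that report includes a stack trace, which is what exposes the caller chain tglx is after. The following is a minimal user-space sketch of that "once per call site" behaviour. It assumes a plain boolean (`irqs_on`, hypothetical) standing in for the CPU's interrupt-enable flag. All `*_sketch` names are illustrative and are not kernel API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the CPU interrupt-enable flag (EFLAGS.IF on x86). */
static bool irqs_on = true;

static bool irqs_disabled_sketch(void) { return !irqs_on; }

static int warnings_emitted;

/* Report 'cond' the first time it is true for a given flag. The kernel's
 * WARN_ON_ONCE() also dumps a backtrace; this sketch only prints a line. */
static bool warn_on_once_sketch(bool cond, bool *already_warned, const char *expr)
{
    if (cond && !*already_warned) {
        *already_warned = true;
        warnings_emitted++;
        fprintf(stderr, "WARNING: %s\n", expr);
    }
    return cond;
}

/* Each expansion gets its own static flag, so "once" means once per call site. */
#define WARN_ON_ONCE_SKETCH(cond)                                   \
    do {                                                            \
        static bool warned_here_;                                   \
        (void)warn_on_once_sketch((cond), &warned_here_, #cond);    \
    } while (0)

/* Hypothetical stand-in for the check added at the top of apic_set_affinity(). */
static void set_affinity_sketch(void)
{
    WARN_ON_ONCE_SKETCH(!irqs_disabled_sketch());
    /* ... vector assignment under vector_lock would follow here ... */
}

/* Usage: a call with interrupts off stays silent; repeated calls with
 * interrupts on produce exactly one report. */
static void warn_once_demo(void)
{
    irqs_on = false;
    set_affinity_sketch();
    assert(warnings_emitted == 0);

    irqs_on = true;
    set_affinity_sketch();
    set_affinity_sketch();
    assert(warnings_emitted == 1);
}
```

Warning only once keeps a frequently called path from flooding dmesg, while still capturing the first offending call chain.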
Re: PROBLEM: 4.15.0-rc3 APIC causes lockups on Core 2 Duo laptop
On Wed, 20 Dec 2017, Alexandru Chirvasitu wrote: > Merging the contents of another exchange spawned from the original > > On Wed, Dec 20, 2017 at 02:12:05AM +, Dexuan Cui wrote: > > For Linux VM running on Hyper-V, we did get "spurious APIC interrupt > > through vector " and a patchset, which included the patch you identified > > ("genirq: Add config option for reservation mode"), was made to fix the > > issue. But since you're using a physical machine rather than a VM, I > > suspect it should be a different issue. Aaargh! Why was this never reported and where is that magic patchset? Thanks, tglx
Re: PROBLEM: 4.15.0-rc3 APIC causes lockups on Core 2 Duo laptop
On Wed, 20 Dec 2017, Alexandru Chirvasitu wrote:
> On Wed, Dec 20, 2017 at 11:58:57AM +0800, Dou Liyang wrote:
> > At 12/20/2017 08:31 AM, Thomas Gleixner wrote:
> > > > I had never heard of 'bisect' before this casual mention (you might tell
> > > > I am a bit out of my depth). I've since applied it to Linus' tree
> > > > between
> > > >
> > > > bebc608 Linux 4.14 (good)
> > > >
> > > > and
> > > >
> > > > 4fbd8d1 Linux 4.15-rc1 (bad)
> > >
> > > Is Linus current head 4.15-rc4 bad as well?
> > >
> > [...]
>
> Yes. Exactly the same symptoms on
>
> 1291a0d5 Linux 4.15-rc4
>
> compiled just now from Linus' tree.

Ok, let's take a step back. The bisect/kexec attempts led us away from the
initial problem which is the machine locking up after login, right?

Could you try the patch below on top of Linus tree (rc5+)?

Thanks,

	tglx

8<---
--- a/arch/x86/kernel/apic/apic_flat_64.c
+++ b/arch/x86/kernel/apic/apic_flat_64.c
@@ -151,7 +151,7 @@ static struct apic apic_flat __ro_after_
 	.apic_id_valid			= default_apic_id_valid,
 	.apic_id_registered		= flat_apic_id_registered,
 
-	.irq_delivery_mode		= dest_LowestPrio,
+	.irq_delivery_mode		= dest_Fixed,
 	.irq_dest_mode			= 1, /* logical */
 
 	.disable_esr			= 0,
--- a/arch/x86/kernel/apic/probe_32.c
+++ b/arch/x86/kernel/apic/probe_32.c
@@ -105,7 +105,7 @@ static struct apic apic_default __ro_aft
 	.apic_id_valid			= default_apic_id_valid,
 	.apic_id_registered		= default_apic_id_registered,
 
-	.irq_delivery_mode		= dest_LowestPrio,
+	.irq_delivery_mode		= dest_Fixed,
 	/* logical delivery broadcast to all CPUs: */
 	.irq_dest_mode			= 1,
 
--- a/arch/x86/kernel/apic/x2apic_cluster.c
+++ b/arch/x86/kernel/apic/x2apic_cluster.c
@@ -184,7 +184,7 @@ static struct apic apic_x2apic_cluster _
 	.apic_id_valid			= x2apic_apic_id_valid,
 	.apic_id_registered		= x2apic_apic_id_registered,
 
-	.irq_delivery_mode		= dest_LowestPrio,
+	.irq_delivery_mode		= dest_Fixed,
 	.irq_dest_mode			= 1, /* logical */
 
 	.disable_esr			= 0,
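The patch above changes only `.irq_delivery_mode` from `dest_LowestPrio` to `dest_Fixed`. It leaves `.irq_dest_mode = 1` (logical) untouched. As a rough illustration of what those two settings mean at the hardware level, here is a sketch of where they land in the low bits of an I/O APIC redirection-table entry. It follows the documented layout: vector in bits 0-7, delivery mode in bits 8-10 (000 = Fixed, 001 = Lowest Priority), and destination mode in bit 11 (0 = physical, 1 = logical). The helper name is hypothetical, because the kernel builds these entries through its own structures. The mask, trigger and polarity bits are left at zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Delivery-mode encodings from the APIC documentation; the names mirror
 * the kernel's dest_Fixed / dest_LowestPrio. */
enum delivery_mode_sketch {
    DM_FIXED = 0,        /* 000b: deliver to every CPU named by the destination */
    DM_LOWEST_PRIO = 1,  /* 001b: hardware arbitrates and picks one of them */
};

/* Hypothetical helper: low 12 bits of a redirection-table entry.
 * bits 0-7 vector, bits 8-10 delivery mode, bit 11 destination mode. */
static uint32_t rte_low_sketch(uint8_t vector, enum delivery_mode_sketch dm, bool logical)
{
    assert((unsigned)dm <= 7);  /* the delivery-mode field is 3 bits wide */
    return (uint32_t)vector
         | ((uint32_t)dm << 8)
         | ((uint32_t)(logical ? 1u : 0u) << 11);
}
```

For example, vector 0x31 in logical mode encodes as 0x931 with lowest-priority delivery and 0x831 with fixed delivery. Only the delivery-mode field differs, which matches the one-line change in each hunk.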
Re: PROBLEM: 4.15.0-rc3 APIC causes lockups on Core 2 Duo laptop
Hi Alexandru,

At 12/28/2017 10:51 AM, Alexandru Chirvasitu wrote:
> Ah, of course. Attached is the output of `journalctl --boot=-1` after
> booting, getting locked up, and then rebooting a good kernel.

For the hard lockups on both CPUs after login:

Please try the patch in the attachment by

  git am ./0001-x86-vector-Replace-the-raw_spin_lock-with-raw_spin_l.patch

or

  patch -p1 < ./0001-x86-vector-Replace-the-raw_spin_lock-with-raw_spin_l.patch

> Slightly different version of 4.15-rc5; this one has both patches
> applied, yours and Linus' for kexec, but the latter shouldn't make a
> difference.
>
> ---
>
> You'll see another trace in there that's been bugging me, about W+X
> checking. I'm not qualified to judge how related they are, but during
> these past few days I've compiled and tested many kernels, and many of
> them have exhibited the W+X thing but *not* the lockups.

Yes, I found it, but I am not familiar with it and have no idea.

Thanks,
	dou.

-8<
From 57d8543ea4dcf2a53b1c37757da12866a52aaf57 Mon Sep 17 00:00:00 2001
From: Dou Liyang
Date: Thu, 28 Dec 2017 16:20:48 +0800
Subject: [PATCH] x86/vector: Replace the raw_spin_lock() with
 raw_spin_lock_irqsave()

Signed-off-by: Dou Liyang
---
 arch/x86/kernel/apic/vector.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index 750449152b04..a43ca26d5dfd 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -726,6 +726,7 @@ static int apic_set_affinity(struct irq_data *irqd,
 			     const struct cpumask *dest, bool force)
 {
 	struct apic_chip_data *apicd = apic_chip_data(irqd);
+	unsigned long flags;
 	int err;
 
 	/*
@@ -740,13 +741,13 @@ static int apic_set_affinity(struct irq_data *irqd,
 	    (apicd->is_managed || apicd->can_reserve))
 		return IRQ_SET_MASK_OK;
 
-	raw_spin_lock(&vector_lock);
+	raw_spin_lock_irqsave(&vector_lock, flags);
 	cpumask_and(vector_searchmask, dest, cpu_online_mask);
 	if (irqd_affinity_is_managed(irqd))
 		err = assign_managed_vector(irqd, vector_searchmask);
 	else
 		err = assign_vector_locked(irqd, vector_searchmask);
-	raw_spin_unlock(&vector_lock);
+	raw_spin_unlock_irqrestore(&vector_lock, flags);
 	return err ? err : IRQ_SET_MASK_OK;
 }
 
-- 
2.14.3
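The patch swaps raw_spin_lock() for raw_spin_lock_irqsave(). The irqsave variant disables local interrupts before taking the lock and, on unlock, restores whatever interrupt state the caller had. The plain variant leaves interrupts alone. The sketch below is a single-CPU, user-space model of that difference. It assumes a boolean standing in for the interrupt-enable flag, and all names are hypothetical. Holding the plain lock with interrupts enabled leaves a window in which an interrupt handler on the same CPU could try to take the same lock and spin forever. The sketch also shows why tglx found the result surprising: if apic_set_affinity() is always entered with interrupts already disabled, as he expects, both variants behave identically.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-CPU model: a flag standing in for EFLAGS.IF. */
static bool irq_enabled = true;

typedef struct { bool locked; } raw_lock_sketch;

static unsigned long local_irq_save_sketch(void)
{
    unsigned long flags = irq_enabled;  /* remember the caller's state */
    irq_enabled = false;                /* then disable "interrupts" */
    return flags;
}

static void local_irq_restore_sketch(unsigned long flags)
{
    irq_enabled = flags != 0;           /* put back exactly what was saved */
}

/* raw_spin_lock() analogue: takes the lock, leaves interrupt state alone.
 * On one CPU a real spinlock would spin forever if already held; the
 * sketch asserts instead so the self-deadlock is visible. */
static void lock_plain(raw_lock_sketch *l)   { assert(!l->locked); l->locked = true; }
static void unlock_plain(raw_lock_sketch *l) { l->locked = false; }

/* raw_spin_lock_irqsave() analogue: interrupts off first, then the lock. */
static unsigned long lock_irqsave(raw_lock_sketch *l)
{
    unsigned long flags = local_irq_save_sketch();
    lock_plain(l);
    return flags;
}

/* raw_spin_unlock_irqrestore() analogue: drop the lock, restore the state. */
static void unlock_irqrestore(raw_lock_sketch *l, unsigned long flags)
{
    unlock_plain(l);
    local_irq_restore_sketch(flags);
}

/* Could an interrupt handler (possibly wanting the same lock) run now? */
static bool irq_could_preempt(void) { return irq_enabled; }
```

Because the saved flags are restored rather than unconditionally re-enabled, a caller that already had interrupts off still has them off after the unlock.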
Re: PROBLEM: 4.15.0-rc3 APIC causes lockups on Core 2 Duo laptop
On Thu, Dec 28, 2017 at 10:06:25AM +0800, Dou Liyang wrote: > Hi Alexandru, > > Thanks for testing! > At 12/28/2017 12:18 AM, Alexandru Chirvasitu wrote: > > As per instructions, I did the following: > > > > (1) > > > > Checked out > > > > 464e1d5 Linux 4.15-rc5 > > > > (after getting my copy up to date, fetching, pulling, etc.) and > > compiled it as-is. Config attached (the one labeled 'np' for 'no > > patch'). > > > > Result: > > > > Boot with no extra parameters locks up after login, as before; > > > > apic=debug does not panic, but locks up after login, as before; > > > I also hope to see the log with "apic=debug" by "journalctl" command, > though the logs don't have the lockup trace. Ah, of course. Attached is the output of `journalctl --boot=-1` after booting, getting locked up, and then rebooting a good kernel. Slightly different version of 4.15-rc5; this one has both patches applied, yours and Linus' for kexec, but the latter shouldn't make a difference. --- You'll see another trace in there that's been bugging me, about W+X checking. I'm not qualified to judge how related they are, but during these past few days I've compiled and tested many kernels, and many of them have exhibited the W+X thing but *not* the lockups. I hope to trace that one back to the original commit with another bisect one of these days, but they do seem to be different issues. > > Thanks, > dou.

-- Logs begin at Sat 2017-12-23 08:45:59 EST, end at Wed 2017-12-27 21:42:46 EST. --
Dec 27 21:39:03 D-69-91-141-110 kernel: Linux version 4.15.0-rc5-kex-fix+ (root@axiomatic) (gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.5)) #1 SMP Wed Dec 27 17:37:47 EST 2017
Dec 27 21:39:03 D-69-91-141-110 kernel: x86/fpu: x87 FPU will use FXSAVE
Dec 27 21:39:03 D-69-91-141-110 kernel: e820: BIOS-provided physical RAM map:
Dec 27 21:39:03 D-69-91-141-110 kernel: BIOS-e820: [mem 0x-0x0009fbff] usable
Dec 27 21:39:03 D-69-91-141-110 kernel: BIOS-e820: [mem 0x0009fc00-0x0009] reserved
Dec 27 21:39:03 D-69-91-141-110 kernel: BIOS-e820: [mem 0x000e-0x000f] reserved
Dec 27 21:39:03 D-69-91-141-110 kernel: BIOS-e820: [mem 0x0010-0xb7f9] usable
Dec 27 21:39:03 D-69-91-141-110 kernel: BIOS-e820: [mem 0xb7fa-0xb7fadfff] ACPI data
Dec 27 21:39:03 D-69-91-141-110 kernel: BIOS-e820: [mem 0xb7fae000-0xb7fe] ACPI NVS
Dec 27 21:39:03 D-69-91-141-110 kernel: BIOS-e820: [mem 0xb7ff-0xb7ff] reserved
Dec 27 21:39:03 D-69-91-141-110 kernel: BIOS-e820: [mem 0xfee0-0xfee00fff] reserved
Dec 27 21:39:03 D-69-91-141-110 kernel: BIOS-e820: [mem 0xffb8-0x] reserved
Dec 27 21:39:03 D-69-91-141-110 kernel: NX (Execute Disable) protection: active
Dec 27 21:39:03 D-69-91-141-110 kernel: random: fast init done
Dec 27 21:39:03 D-69-91-141-110 kernel: SMBIOS 2.4 present.
Dec 27 21:39:03 D-69-91-141-110 kernel: DMI: ASUSTeK Computer Inc. F5RL /F5RL , BIOS 210 06/12/2008
Dec 27 21:39:03 D-69-91-141-110 kernel: e820: update [mem 0x-0x0fff] usable ==> reserved
Dec 27 21:39:03 D-69-91-141-110 kernel: e820: remove [mem 0x000a-0x000f] usable
Dec 27 21:39:03 D-69-91-141-110 kernel: e820: last_pfn = 0xb7fa0 max_arch_pfn = 0x100
Dec 27 21:39:03 D-69-91-141-110 kernel: MTRR default type: uncachable
Dec 27 21:39:03 D-69-91-141-110 kernel: MTRR fixed ranges enabled:
Dec 27 21:39:03 D-69-91-141-110 kernel: 0-9 write-back
Dec 27 21:39:03 D-69-91-141-110 kernel: A-B uncachable
Dec 27 21:39:03 D-69-91-141-110 kernel: C-C write-protect
Dec 27 21:39:03 D-69-91-141-110 kernel: D-D uncachable
Dec 27 21:39:03 D-69-91-141-110 kernel: E-E write-through
Dec 27 21:39:03 D-69-91-141-110 kernel: F-F write-protect
Dec 27 21:39:03 D-69-91-141-110 kernel: MTRR variable ranges enabled:
Dec 27 21:39:03 D-69-91-141-110 kernel: 0 base 0 mask F8000 write-back
Dec 27 21:39:03 D-69-91-141-110 kernel: 1 base 08000 mask FE000 write-back
Dec 27 21:39:03 D-69-91-141-110 kernel: 2 base 0A000 mask FF000 write-back
Dec 27 21:39:03 D-69-91-141-110 kernel: 3 base 0B000 mask FF800 write-back
Dec 27 21:39:03 D-69-91-141-110 kernel: 4 base 0B800 mask FFC00 write-back
Dec 27 21:39:03 D-69-91-141-110 kernel: 5 base 0BC00 mask FFF00 write-back
Dec 27 21:39:03 D-69-91-141-110 kernel: 6 base 0C000 mask FF000 write-combining
Dec 27 21:39:03 D-69-91-141-110 kernel: 7 disabled
Dec 27 21:39:03 D-69-91-141-110 kernel: x86/PAT: Configuration [0-7]: WB WC UC- UC WB WP UC- WT
Dec 27 21:39:03 D-69-91-141-110 kernel: Scan for SMP in [mem 0x-0x03ff]
Dec 27 21:39:03 D-69-91-141-110 kernel: Scan for SMP in [mem 0x0009fc00-0x0009]
Dec 27 21:39:03 D-69-91-141-110 kernel: Scan
Re: PROBLEM: 4.15.0-rc3 APIC causes lockups on Core 2 Duo laptop
Hi Alexandru,

Thanks for testing!

At 12/28/2017 12:18 AM, Alexandru Chirvasitu wrote:
> As per instructions, I did the following:
>
> (1) Checked out 464e1d5 Linux 4.15-rc5 (after getting my copy up to date, fetching, pulling, etc.) and compiled it as-is. Config attached (the one labeled 'np' for 'no patch').
>
> Result: Boot with no extra parameters locks up after login, as before; apic=debug does not panic, but locks up after login, as before;

I would still like to see the log of the "apic=debug" boot, collected with the "journalctl" command, even though the logs don't contain the lockup trace.

Thanks,
dou.
___ devel mailing list de...@linuxdriverproject.org http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel
Re: PROBLEM: 4.15.0-rc3 APIC causes lockups on Core 2 Duo laptop
Hi Alexandru,

At 12/24/2017 04:01 AM, Alexandru Chirvasitu wrote:
> On Sat, Dec 23, 2017 at 02:32:52PM +0100, Thomas Gleixner wrote:
> > On Sat, 23 Dec 2017, Dexuan Cui wrote:
> > > > From: Alexandru Chirvasitu [mailto:achirva...@gmail.com]
> > > > Sent: Friday, December 22, 2017 14:29
> > > >
> > > > The output of that precise command run just now on a freshly-compiled
> > > > copy of that commit is attached.
> > > >
> > > > On Fri, Dec 22, 2017 at 09:31:28PM +, Dexuan Cui wrote:
> > > > > > From: Alexandru Chirvasitu [mailto:achirva...@gmail.com]
> > > > > > Sent: Friday, December 22, 2017 06:21
> > > > > >
> > > > > > In the absence of logs, the best I can do at the moment is attach a
> > > > > > picture of the screen I am presented with on the boot attempt.
> > > > > > Alex
> > > > >
> > > > > The panic happens in irq_matrix_assign_system+0x4e/0xd0 in your picture.
> > > > > IMO we should find which line of code causes the panic. I suppose
> > > > > "objdump -D kernel/irq/matrix.o" can help to do that.
> > > > >
> > > > > Thanks,
> > > > > -- Dexuan
> > >
> > > The BUG_ON panic happens at line 147:
> > >    BUG_ON(!test_and_clear_bit(bit, cm->alloc_map));

There are two bugs on your laptop:

  1. Hard lockups on both CPUs after login
  2. A panic with "apic=debug"

For the second bug, please try the following patch (it still needs Thomas's confirmation :) ) on Linux 4.15-rc5. I think it can fix the panic.

If the second bug is fixed, let's get back to the first one: is Linus' current head, 4.15-rc5, bad as well? If yes, please boot with "apic=debug" and send the dmesg log.

Thanks,
dou.

8<---
irq/matrix: Remove the overused BUGON() in irq_matrix_assign_system()

Currently, x86 marks the preallocated legacy interrupts when initializing IRQs (native_init_IRQ()), but clears them again in vector_configure_legacy() if they are not activated.

So, in irq_matrix_assign_system(), replacing a legacy vector which may not be allocated in cpumap->alloc_map[] with a system vector triggers the BUG_ON().

Remove the BUG_ON().

Signed-off-by: Dou Liyang
---
 kernel/irq/matrix.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/kernel/irq/matrix.c b/kernel/irq/matrix.c
index 0ba0dd8863a7..876cbeab9ca2 100644
--- a/kernel/irq/matrix.c
+++ b/kernel/irq/matrix.c
@@ -143,11 +143,12 @@ void irq_matrix_assign_system(struct irq_matrix *m, unsigned int bit,
 	BUG_ON(m->online_maps > 1 || (m->online_maps && !replace));
 
 	set_bit(bit, m->system_map);
-	if (replace) {
-		BUG_ON(!test_and_clear_bit(bit, cm->alloc_map));
+
+	if (replace && test_and_clear_bit(bit, cm->alloc_map)){
 		cm->allocated--;
 		m->total_allocated--;
 	}
+
 	if (bit >= m->alloc_start && bit < m->alloc_end)
 		m->systembits_inalloc++;
 
-- 
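The difference between the original check and the patch above can be illustrated with a small userspace model. This is a sketch under stated assumptions, not kernel code: struct model_cpumap, struct model_matrix, model_test_and_clear_bit() and the two assign functions are hypothetical stand-ins that keep only the alloc_map/system_map bookkeeping for a single CPU with at most 64 vectors and non-atomic bit operations. Where the kernel would hit the BUG_ON(), the pre-patch model returns false instead of crashing.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified stand-ins for struct cpumap and
 * struct irq_matrix: one CPU, at most 64 vectors, non-atomic bit ops.
 * online_maps, alloc_start/alloc_end and tracing are left out. */
#define MODEL_BITS (sizeof(unsigned long long) * CHAR_BIT)

struct model_cpumap {
	unsigned long long alloc_map;  /* vectors allocated on this CPU */
	unsigned int allocated;
};

struct model_matrix {
	unsigned long long system_map; /* vectors reserved as system vectors */
	unsigned int total_allocated;
	struct model_cpumap cm;        /* the single CPU of this model */
};

/* Userspace analogue of test_and_clear_bit(): clear the bit and
 * report whether it was set before. */
bool model_test_and_clear_bit(unsigned int bit, unsigned long long *map)
{
	unsigned long long mask = 1ULL << bit;
	bool was_set = (*map & mask) != 0;

	*map &= ~mask;
	return was_set;
}

/* Pre-patch logic: returns false where the kernel's
 * BUG_ON(!test_and_clear_bit(bit, cm->alloc_map)) would fire. */
bool assign_system_orig(struct model_matrix *m, unsigned int bit, bool replace)
{
	assert(bit < MODEL_BITS);
	m->system_map |= 1ULL << bit;
	if (replace) {
		if (!model_test_and_clear_bit(bit, &m->cm.alloc_map))
			return false;
		m->cm.allocated--;
		m->total_allocated--;
	}
	return true;
}

/* Patched logic: only undo the allocation accounting when the bit
 * really was set in alloc_map (e.g. not for a legacy vector that was
 * marked at init but cleared again because it was never activated). */
void assign_system_patched(struct model_matrix *m, unsigned int bit, bool replace)
{
	assert(bit < MODEL_BITS);
	m->system_map |= 1ULL << bit;
	if (replace && model_test_and_clear_bit(bit, &m->cm.alloc_map)) {
		m->cm.allocated--;
		m->total_allocated--;
	}
}
```

In this model, with vector 5 (an arbitrary number) absent from alloc_map, assign_system_orig(&m, 5, true) returns false, i.e. the path that panics, while assign_system_patched() just marks the system bit and leaves both counters at zero. When the bit was allocated, both versions clear it and decrement the counters identically.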
Re: PROBLEM: 4.15.0-rc3 APIC causes lockups on Core 2 Duo laptop
Hi Thomas,

At 12/23/2017 09:32 PM, Thomas Gleixner wrote:
[...]
> > The BUG_ON panic happens at line 147:
> >    BUG_ON(!test_and_clear_bit(bit, cm->alloc_map));
> >
> > I'm sure Thomas and Dou know it better than me.
>
> I'll have a look after the holidays.

Merry Christmas! :-)

I am trying to look into it.

Thanks,
dou
Re: PROBLEM: 4.15.0-rc3 APIC causes lockups on Core 2 Duo laptop
On Sat, Dec 23, 2017 at 02:32:52PM +0100, Thomas Gleixner wrote:
> On Sat, 23 Dec 2017, Dexuan Cui wrote:
>
> > > From: Alexandru Chirvasitu [mailto:achirva...@gmail.com]
> > > Sent: Friday, December 22, 2017 14:29
> > >
> > > The output of that precise command run just now on a freshly-compiled
> > > copy of that commit is attached.
> > >
> > > On Fri, Dec 22, 2017 at 09:31:28PM +, Dexuan Cui wrote:
> > > > > From: Alexandru Chirvasitu [mailto:achirva...@gmail.com]
> > > > > Sent: Friday, December 22, 2017 06:21
> > > > >
> > > > > In the absence of logs, the best I can do at the moment is attach a
> > > > > picture of the screen I am presented with on the apic=debug boot
> > > > > attempt.
> > > > > Alex
> > > >
> > > > The panic happens in irq_matrix_assign_system+0x4e/0xd0 in your picture.
> > > > IMO we should find which line of code causes the panic. I suppose
> > > > "objdump -D kernel/irq/matrix.o" can help to do that.
> > > >
> > > > Thanks,
> > > > -- Dexuan
> >
> > The BUG_ON panic happens at line 147:
> >    BUG_ON(!test_and_clear_bit(bit, cm->alloc_map));
> >
> > I'm sure Thomas and Dou know it better than me.
>
> I'll have a look after the holidays.
>

Thanks for that!

A quick follow-up on my inability to make kexec / kdump work, which might otherwise let me produce better logs: I've done another bisect for that, with this result:

# first bad commit: [e802a51ede91350438c051da2f238f5e8c918ead] x86/idt: Consolidate IDT invalidation

I am quite certain this is the one for that issue. Its only parent is

# good: [8f55868f9e42fea56021b17421914b9e4fda4960] x86/idt: Remove unused set_trap_gate()

(i.e. one of the "good" commits I hit upon during the bisect).

On the Core 2 Duo machine I've been referring to, e802a51 and later commits simply return me to a regular BIOS boot when issuing either kexec -e on a loaded crash kernel or crashing with echo c > /proc/sysrq-trigger.

Alex
RE: PROBLEM: 4.15.0-rc3 APIC causes lockups on Core 2 Duo laptop
On Sat, 23 Dec 2017, Dexuan Cui wrote:
> > From: Alexandru Chirvasitu [mailto:achirva...@gmail.com]
> > Sent: Friday, December 22, 2017 14:29
> >
> > The output of that precise command run just now on a freshly-compiled
> > copy of that commit is attached.
> >
> > On Fri, Dec 22, 2017 at 09:31:28PM +, Dexuan Cui wrote:
> > > > From: Alexandru Chirvasitu [mailto:achirva...@gmail.com]
> > > > Sent: Friday, December 22, 2017 06:21
> > > >
> > > > In the absence of logs, the best I can do at the moment is attach a
> > > > picture of the screen I am presented with on the apic=debug boot
> > > > attempt.
> > > > Alex
> > >
> > > The panic happens in irq_matrix_assign_system+0x4e/0xd0 in your picture.
> > > IMO we should find which line of code causes the panic. I suppose
> > > "objdump -D kernel/irq/matrix.o" can help to do that.
> > >
> > > Thanks,
> > > -- Dexuan
>
> The BUG_ON panic happens at line 147:
>    BUG_ON(!test_and_clear_bit(bit, cm->alloc_map));
>
> I'm sure Thomas and Dou know it better than me.

I'll have a look after the holidays.

Thanks,
tglx
Re: PROBLEM: 4.15.0-rc3 APIC causes lockups on Core 2 Duo laptop
I was just now trying to track down my other issue, where kexec stops working properly somewhere along the tree. In the process I realized I had initially made one change to the original 4.9 config beyond oldconfig: I had turned off WX debugging. I've now compiled a bunch of versions with WX debugging back on, and new behavior shows up.

I am attaching the journalctl log of a booted 4.13 kernel (from Linus' tree, commit 569dbb8). It boots and logs me in, but produces a call trace I wasn't seeing without WX debugging. I am sending it over in case it provides any useful information. The trace bears the 23:24:09 timestamp.

On Sat, Dec 23, 2017 at 01:35:12AM +, Dexuan Cui wrote:
> > From: Alexandru Chirvasitu [mailto:achirva...@gmail.com]
> > Sent: Friday, December 22, 2017 14:29
> >
> > The output of that precise command run just now on a freshly-compiled
> > copy of that commit is attached.
> >
> > On Fri, Dec 22, 2017 at 09:31:28PM +, Dexuan Cui wrote:
> > > > From: Alexandru Chirvasitu [mailto:achirva...@gmail.com]
> > > > Sent: Friday, December 22, 2017 06:21
> > > >
> > > > In the absence of logs, the best I can do at the moment is attach a
> > > > picture of the screen I am presented with on the apic=debug boot
> > > > attempt.
> > > > Alex
> > >
> > > The panic happens in irq_matrix_assign_system+0x4e/0xd0 in your picture.
> > > IMO we should find which line of code causes the panic. I suppose
> > > "objdump -D kernel/irq/matrix.o" can help to do that.
> > >
> > > Thanks,
> > > -- Dexuan
>
> The BUG_ON panic happens at line 147:
>    BUG_ON(!test_and_clear_bit(bit, cm->alloc_map));
>
> I'm sure Thomas and Dou know it better than me.
>
> 137 void irq_matrix_assign_system(struct irq_matrix *m, unsigned int bit,
> 138                               bool replace)
> 139 {
> 140         struct cpumap *cm = this_cpu_ptr(m->maps);
> 141
> 142         BUG_ON(bit > m->matrix_bits);
> 143         BUG_ON(m->online_maps > 1 || (m->online_maps && !replace));
> 144
> 145         set_bit(bit, m->system_map);
> 146         if (replace) {
> 147                 BUG_ON(!test_and_clear_bit(bit, cm->alloc_map));
> 148                 cm->allocated--;
> 149                 m->total_allocated--;
> 150         }
> 151         if (bit >= m->alloc_start && bit < m->alloc_end)
> 152                 m->systembits_inalloc++;
> 153
> 154         trace_irq_matrix_assign_system(bit, m);
> 155 }
>
> -- Dexuan
>

-- Logs begin at Thu 2017-12-14 19:59:20 EST, end at Fri 2017-12-22 23:45:29 EST. --
Dec 22 23:24:09 D-69-91-141-110 kernel: random: get_random_bytes called from start_kernel+0x32/0x3a0 with crng_init=0
Dec 22 23:24:09 D-69-91-141-110 kernel: Linux version 4.13.0 (root@axiomatic) (gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.5)) #1 SMP Fri Dec 22 22:18:58 EST 2017
Dec 22 23:24:09 D-69-91-141-110 kernel: x86/fpu: x87 FPU will use FXSAVE
Dec 22 23:24:09 D-69-91-141-110 kernel: e820: BIOS-provided physical RAM map:
Dec 22 23:24:09 D-69-91-141-110 kernel: BIOS-e820: [mem 0x-0x0009fbff] usable
Dec 22 23:24:09 D-69-91-141-110 kernel: BIOS-e820: [mem 0x0009fc00-0x0009] reserved
Dec 22 23:24:09 D-69-91-141-110 kernel: BIOS-e820: [mem 0x000e-0x000f] reserved
Dec 22 23:24:09 D-69-91-141-110 kernel: BIOS-e820: [mem 0x0010-0xb7f9] usable
Dec 22 23:24:09 D-69-91-141-110 kernel: BIOS-e820: [mem 0xb7fa-0xb7fadfff] ACPI data
Dec 22 23:24:09 D-69-91-141-110 kernel: BIOS-e820: [mem 0xb7fae000-0xb7fe] ACPI NVS
Dec 22 23:24:09 D-69-91-141-110 kernel: BIOS-e820: [mem 0xb7ff-0xb7ff] reserved
Dec 22 23:24:09 D-69-91-141-110 kernel: BIOS-e820: [mem 0xfee0-0xfee00fff] reserved
Dec 22 23:24:09 D-69-91-141-110 kernel: BIOS-e820: [mem 0xffb8-0x] reserved
Dec 22 23:24:09 D-69-91-141-110 kernel: NX (Execute Disable) protection: active
Dec 22 23:24:09 D-69-91-141-110 kernel: random: fast init done
Dec 22 23:24:09 D-69-91-141-110 kernel: SMBIOS 2.4 present.
Dec 22 23:24:09 D-69-91-141-110 kernel: DMI: ASUSTeK Computer Inc. F5RL /F5RL , BIOS 210 06/12/2008
Dec 22 23:24:09 D-69-91-141-110 kernel: tsc: Fast TSC calibration using PIT
Dec 22 23:24:09 D-69-91-141-110 kernel: e820: update [mem 0x-0x0fff] usable ==> reserved
Dec 22 23:24:09 D-69-91-141-110 kernel: e820: remove [mem 0x000a-0x000f] usable
Dec 22 23:24:09 D-69-91-141-110 kernel: e820: last_pfn = 0xb7fa0 max_arch_pfn = 0x100
Dec 22 23:24:09 D-69-91-141-110 kernel: MTRR default type: uncachable
Dec 22 23:24:09 D-69-91-141-110 kernel: MTRR fixed ranges enabled:
Dec 22 23:24:09 D-69-91-141-110 kernel: 0-9 write-back
Dec 22 23:24:09 D-69-91-141-110 kernel: A-B uncachable
Dec 22 23:24:09 D-69-91-141-110 kernel: C-C write-protect
Dec 22 23:24:09 D-69-91-141-110 kernel
RE: PROBLEM: 4.15.0-rc3 APIC causes lockups on Core 2 Duo laptop
> From: Alexandru Chirvasitu [mailto:achirva...@gmail.com]
> Sent: Friday, December 22, 2017 14:29
>
> The output of that precise command run just now on a freshly-compiled
> copy of that commit is attached.
>
> On Fri, Dec 22, 2017 at 09:31:28PM +, Dexuan Cui wrote:
> > > From: Alexandru Chirvasitu [mailto:achirva...@gmail.com]
> > > Sent: Friday, December 22, 2017 06:21
> > >
> > > In the absence of logs, the best I can do at the moment is attach a
> > > picture of the screen I am presented with on the apic=debug boot
> > > attempt.
> > > Alex
> >
> > The panic happens in irq_matrix_assign_system+0x4e/0xd0 in your picture.
> > IMO we should find which line of code causes the panic. I suppose
> > "objdump -D kernel/irq/matrix.o" can help to do that.
> >
> > Thanks,
> > -- Dexuan

The BUG_ON panic happens at line 147:
   BUG_ON(!test_and_clear_bit(bit, cm->alloc_map));

I'm sure Thomas and Dou know it better than me.

137 void irq_matrix_assign_system(struct irq_matrix *m, unsigned int bit,
138                               bool replace)
139 {
140         struct cpumap *cm = this_cpu_ptr(m->maps);
141
142         BUG_ON(bit > m->matrix_bits);
143         BUG_ON(m->online_maps > 1 || (m->online_maps && !replace));
144
145         set_bit(bit, m->system_map);
146         if (replace) {
147                 BUG_ON(!test_and_clear_bit(bit, cm->alloc_map));
148                 cm->allocated--;
149                 m->total_allocated--;
150         }
151         if (bit >= m->alloc_start && bit < m->alloc_end)
152                 m->systembits_inalloc++;
153
154         trace_irq_matrix_assign_system(bit, m);
155 }

-- Dexuan
RE: PROBLEM: 4.15.0-rc3 APIC causes lockups on Core 2 Duo laptop
> From: Alexandru Chirvasitu [mailto:achirva...@gmail.com]
> Sent: Friday, December 22, 2017 06:21
>
> In the absence of logs, the best I can do at the moment is attach a
> picture of the screen I am presented with on the apic=debug boot
> attempt.
> Alex

The panic happens in irq_matrix_assign_system+0x4e/0xd0 in your picture.
IMO we should find which line of code causes the panic. I suppose
"objdump -D kernel/irq/matrix.o" can help to do that.

Thanks,
-- Dexuan
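The location irq_matrix_assign_system+0x4e/0xd0 uses the kernel's symbol+offset/size notation: the faulting instruction lies 0x4e bytes into irq_matrix_assign_system, a function 0xd0 bytes long, and that offset from the function's start is what one looks for in the objdump disassembly. A minimal C sketch of splitting such a string follows; parse_symbol_offset() is a hypothetical helper (not a kernel or binutils function), and it assumes both numbers are hexadecimal, as printed in the trace.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: split "symbol+0xOFFSET/0xSIZE" (the shape of the
 * location in the panic screen) into symbol name, offset and size.
 * Returns false if the string does not have that shape or the name does
 * not fit into name_len bytes. */
bool parse_symbol_offset(const char *loc, char *name, size_t name_len,
			 unsigned long *offset, unsigned long *size)
{
	const char *plus, *slash;
	char *end;
	size_t len;

	assert(loc && name && offset && size);

	plus = strrchr(loc, '+');
	if (!plus || plus == loc)
		return false;
	slash = strchr(plus, '/');
	if (!slash)
		return false;

	len = (size_t)(plus - loc);
	if (len >= name_len)
		return false;
	memcpy(name, loc, len);
	name[len] = '\0';

	/* Base 16 lets strtoul() accept the optional "0x" prefix. */
	*offset = strtoul(plus + 1, &end, 16);
	if (end == plus + 1 || end != slash)
		return false;
	*size = strtoul(slash + 1, &end, 16);
	return end != slash + 1 && *end == '\0';
}
```

Applied to the location from the photo, this yields the name irq_matrix_assign_system, offset 0x4e (78 bytes) and size 0xd0 (208 bytes).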
Re: PROBLEM: 4.15.0-rc3 APIC causes lockups on Core 2 Duo laptop
Hi Alexandru,

At 12/21/2017 10:23 AM, Alexandru Chirvasitu wrote:
> This might be more helpful. I ran another bisect with the following final log:
>
> ---
>
> git bisect start
> # bad: [d6ffc6ac83b1f9f12652d89b9cb5bcbfbea7796c] x86/vector: Respect affinity mask in irq descriptor
> git bisect bad d6ffc6ac83b1f9f12652d89b9cb5bcbfbea7796c
> # good: [e4ae4c8ea7c65f61fde29c689d148c8c9e05305a] Merge branch 'irq/core' into x86/apic
> git bisect good e4ae4c8ea7c65f61fde29c689d148c8c9e05305a
> # good: [4ef76eb6de734dc03a7f3b8f80884362364e6049] x86/apic: Get rid of the legacy irq data storage
> git bisect good 4ef76eb6de734dc03a7f3b8f80884362364e6049
> # good: [ba801640b10d87b1c4e26cbcbe414a001255404f] x86/vector: Compile SMP only code conditionally
> git bisect good ba801640b10d87b1c4e26cbcbe414a001255404f
> # good: [90ad9e2d91067983f3328e21b306323877e5f48a] x86/io_apic: Reevaluate vector configuration on activate()
> git bisect good 90ad9e2d91067983f3328e21b306323877e5f48a
> # bad: [4900be83602b6be07366d3e69f756c1959f4169a] x86/vector/msi: Switch to global reservation mode
> git bisect bad 4900be83602b6be07366d3e69f756c1959f4169a
> # bad: [2db1f959d9dc16035f2eb44ed5fdb2789b754d6a] x86/vector: Handle managed interrupts proper
> git bisect bad 2db1f959d9dc16035f2eb44ed5fdb2789b754d6a
> # first bad commit: [2db1f959d9dc16035f2eb44ed5fdb2789b754d6a] x86/vector: Handle managed interrupts proper

It's helpful to me. I tried it in QEMU with an Intel(R) Core(TM)2 Duo CPU T7700 @ 2.40GHz, but couldn't reproduce the bug.

> ---
>
> That first bad commit 2db1f95 identified at the end is interesting: it's the only one I've tried through all of this that actually gives me a kernel panic when unadorned with kernel options (so unlike all of the others it fails to even drop me at a tty login prompt).
>
> I tried a number of things to fiddle with it: it boots fine with either nolapic or noapic. The former results in seeing a single CPU with lscpu, but the latter (noapic) seems to give me as much functionality as I'd need. I'm not seeing the issue noted before,

Because "noapic" only disables the I/O APIC, whereas "nolapic" disables both the Local APIC and the I/O APIC.

> whereby noapic for 4.15.0-rc3 was somehow disabling my ethernet card.
>
> I hope this second bisect went down better than the last one..

Could you add "apic=debug" to the kernel command line and then send me the dmesg log?

Thanks,
dou.
Re: PROBLEM: 4.15.0-rc3 APIC causes lockups on Core 2 Duo laptop
Hi Thomas,

At 12/20/2017 08:31 AM, Thomas Gleixner wrote:
> On Tue, 19 Dec 2017, Alexandru Chirvasitu wrote:
> > I had never heard of 'bisect' before this casual mention (you might tell I am a bit out of my depth). I've since applied it to Linus' tree between
> >
> > bebc608 Linux 4.14 (good)
> >
> > and
> >
> > 4fbd8d1 Linux 4.15-rc1 (bad)
>
> Is Linus current head 4.15-rc4 bad as well?
>
> [...]
>
> Thanks for doing that bisect, but unfortunately this commit cannot be the problematic one. It merely adds a config symbol, but it does not change any code at all. It has no effect whatsoever.
>
> So something might have gone wrong in your bisecting.

Agreed.

> I CC'ed Dou Liyang. He has changed the early APIC setup code and there has been an issue reported already. Though I lost track of that. Dou, any

Is it this one? https://marc.info/?l=linux-kernel&m=151188084018443

> pointers?

Not sure, but it seems the APIC failed to start on that 32-bit system. I will look into it.

Alex,

Could you give me your .config file and the dmesg log of 4.15.0-rc3?

Thanks,
dou
Re: PROBLEM: 4.15.0-rc3 APIC causes lockups on Core 2 Duo laptop
On Tue, 19 Dec 2017, Alexandru Chirvasitu wrote:
> I had never heard of 'bisect' before this casual mention (you might tell
> I am a bit out of my depth). I've since applied it to Linus' tree between
> bebc608 Linux 4.14 (good)
>
> and
>
> 4fbd8d1 Linux 4.15-rc1 (bad)

Is Linus current head 4.15-rc4 bad as well?

> It took about 13 attempts (I had access to a faster machine to compile
> on, and ccache helped once the cache built up some momentum). The result
> is (as presented by 'git bisect' at the end of the process, between the
> --- dividers added by me for clarity):
> --- start of output ---
>
> 2b5175c4fa974b6aa05bbd2ee8d443a8036a1714 is the first bad commit
> commit 2b5175c4fa974b6aa05bbd2ee8d443a8036a1714
> Author: Thomas Gleixner
> Date:   Tue Oct 17 09:54:57 2017 +0200
>
>     genirq: Add config option for reservation mode
>
>     The interrupt reservation mode requires reactivation of PCI/MSI
>     interrupts. Create a config option, so the PCI code can set the
>     corresponding flag when required.
>
>     Signed-off-by: Thomas Gleixner
>     Cc: Josh Poulson
>     Cc: Mihai Costache
>     Cc: Stephen Hemminger
>     Cc: Marc Zyngier
>     Cc: linux-...@vger.kernel.org
>     Cc: Haiyang Zhang
>     Cc: Dexuan Cui
>     Cc: Simon Xiao
>     Cc: Saeed Mahameed
>     Cc: Jork Loeser
>     Cc: Bjorn Helgaas
>     Cc: de...@linuxdriverproject.org
>     Cc: KY Srinivasan
>     Link: https://lkml.kernel.org/r/20171017075600.369375...@linutronix.de
>
> :04 04 5e73031cc0c8411a20722cce7876ab7b82ed3858
> dcf98e7a6b7d5f7c5353b7ccab02125e6d332ec8 M	kernel
>
> --- end of output ---
>
> Consequently, I am cc-ing in the listed addresses.

Thanks for doing that bisect, but unfortunately this commit cannot be the problematic one. It merely adds a config symbol, but it does not change any code at all. It has no effect whatsoever.

So something might have gone wrong in your bisecting.

I CC'ed Dou Liyang. He has changed the early APIC setup code and there has been an issue reported already. Though I lost track of that. Dou, any pointers?

Thanks,
tglx
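Thomas's conclusion that the bisect went wrong follows from how bisection works. A minimal sketch, assuming a linear history of numbered commits in which every commit from the first bad one onward is bad: each test halves the remaining range, so the number of tests grows roughly as log2 of the range size, and a single wrong good/bad verdict silently steers the search to an unrelated commit. Real git bisect walks a commit graph with merges rather than a numbered list; bisect_first_bad(), demo_is_bad() and struct bisect_demo are hypothetical names used only for illustration, and a misjudged test is just one possible explanation of what happened here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Verdict for commit idx: nonzero means "bad" (in real life: build,
 * boot and test that commit). */
typedef int (*is_bad_fn)(size_t idx, void *ctx);

/* Binary search for the first bad commit, given that commit `good` is
 * known good and commit `bad` is known bad. Assumes every commit after
 * the first bad one is also bad. *steps receives the number of tests. */
size_t bisect_first_bad(size_t good, size_t bad, is_bad_fn is_bad,
			void *ctx, unsigned int *steps)
{
	assert(good < bad);
	*steps = 0;
	while (bad - good > 1) {
		size_t mid = good + (bad - good) / 2;

		++*steps;
		if (is_bad(mid, ctx))
			bad = mid;
		else
			good = mid;
	}
	return bad;
}

/* Hypothetical test setup: the regression really starts at first_bad,
 * but the test of commit `mismarked` wrongly reports "good"
 * (use SIZE_MAX for an error-free run). */
struct bisect_demo {
	size_t first_bad;
	size_t mismarked;
};

int demo_is_bad(size_t idx, void *ctx)
{
	const struct bisect_demo *d = ctx;

	if (idx == d->mismarked)
		return 0;
	return idx >= d->first_bad;
}
```

With a hypothetical range of 8192 commits and the regression at index 5000, the clean run needs 13 tests and returns 5000. If the single test at index 6144 (a bad commit) is misjudged as good, the same search returns 6145, which is simply the commit right after the misjudged one rather than the real culprit.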
Re: PROBLEM: 4.15.0-rc3 APIC causes lockups on Core 2 Duo laptop
Thank you!

On Mon, Dec 18, 2017 at 11:11:31AM +0100, Pavel Machek wrote:
> Hi!
> On Mon 2017-12-18 03:20:11, Alexandru Chirvasitu wrote:
> > Short description of the problem: latest rc kernel results in seemingly
> > APIC-caused hard lockups, whereas latest stable kernel works fine.
> >
> > I have an old ASUS F5RL laptop with an Intel Core 2 Duo CPU T5450 @1.66GHz.
> > It is currently running Debian 9.3 stable 32 bit (by default on a
> > 4.9-series kernel), but I have been compiling and installing the latest
> > kernels.
> >
>
> Thanks for doing that.
>
> > The latest rc kernel at the time of this writing (4.15.0-rc3) boots but
> > then results in hard lockups on both CPUs after login. Starting in recovery
> > mode returns the error
> >
> > "spurious APIC interrupt through vector ff on CPU#0, should never happen"
> >
> > before locking up the CPUs again. A hard reboot is necessary.
> >
> > Starting with kernel option noapic logs me in uneventfully, but for some
> > reason has the effect of rendering my ethernet card inoperable. It is a
> > Qualcomm Atheros Attansic L2 Fast Ethernet (rev a0), handled by kernel
> > module atl2. In noapic mode the card is still seen by the system, can be
> > brought up / down, etc., but dhclient never manages to acquire a lease.
> >
> > Starting with kernel option nolapic instead brings up the network and logs
> > me in, but only sees one CPU instead of two, as usual.
> >
> > The latest kernel that exhibits none of these issues is the latest stable
> > one as of this writing: 4.14.7.
> >
> > ---
> >
> > As this seems to be APIC-related, I am sending the message to the
> > maintainers mentioned in arch/x86/kernel/apic/apic.c. I am unsure whether
> > this is the correct procedure however.
> >
>
> Good enough procedure. You want to always copy linux-kernel mailing
> list, and you should probably look for X86 maintainers in MAINTAINERS
> file, and cc them, too.
>
> If you run out of other options, you can always do "git bisect"...
>

I had never heard of 'bisect' before this casual mention (you might tell I am a bit out of my depth). I've since applied it to Linus' tree between

bebc608 Linux 4.14 (good)

and

4fbd8d1 Linux 4.15-rc1 (bad)

It took about 13 attempts (I had access to a faster machine to compile on, and ccache helped once the cache built up some momentum). The result is (as presented by 'git bisect' at the end of the process, between the --- dividers added by me for clarity):

--- start of output ---

2b5175c4fa974b6aa05bbd2ee8d443a8036a1714 is the first bad commit
commit 2b5175c4fa974b6aa05bbd2ee8d443a8036a1714
Author: Thomas Gleixner
Date:   Tue Oct 17 09:54:57 2017 +0200

    genirq: Add config option for reservation mode

    The interrupt reservation mode requires reactivation of PCI/MSI
    interrupts. Create a config option, so the PCI code can set the
    corresponding flag when required.

    Signed-off-by: Thomas Gleixner
    Cc: Josh Poulson
    Cc: Mihai Costache
    Cc: Stephen Hemminger
    Cc: Marc Zyngier
    Cc: linux-...@vger.kernel.org
    Cc: Haiyang Zhang
    Cc: Dexuan Cui
    Cc: Simon Xiao
    Cc: Saeed Mahameed
    Cc: Jork Loeser
    Cc: Bjorn Helgaas
    Cc: de...@linuxdriverproject.org
    Cc: KY Srinivasan
    Link: https://lkml.kernel.org/r/20171017075600.369375...@linutronix.de

:04 04 5e73031cc0c8411a20722cce7876ab7b82ed3858 dcf98e7a6b7d5f7c5353b7ccab02125e6d332ec8 M	kernel

--- end of output ---

Consequently, I am cc-ing in the listed addresses.

Thank you,
Alex Chirvasitu

> Best regards,
> Pavel
> --
> (english) http://www.livejournal.com/~pavelmachek
> (cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html