Re: PROBLEM: 4.15.0-rc3 APIC causes lockups on Core 2 Duo laptop

2017-12-28 Thread Dou Liyang

Hi Alexandru,

At 12/28/2017 10:51 AM, Alexandru Chirvasitu wrote:

Ah, of course. Attached is the output of `journalctl --boot=-1` after
booting, getting locked up, and then rebooting a good kernel.


For the hard lockups on both CPUs after login:

Please try the attached patch, applying it with either

git am ./0001-x86-vector-Replace-the-raw_spin_lock-with-raw_spin_l.patch

or

patch -p1 < ./0001-x86-vector-Replace-the-raw_spin_lock-with-raw_spin_l.patch



Slightly different version of 4.15-rc5; this one has both patches
applied, yours and Linus' for kexec, but the latter shouldn't make a
difference.

---

You'll see another trace in there that's been bugging me, about W+X
checking. I'm not qualified to judge how related they are, but during
these past few days I've compiled and tested many kernels, and many of
them have exhibited the W+X thing but *not* the lockups.



Yes, I saw it too, but I am not familiar with it and have no idea what
causes it.

Thanks,
dou.

-8<


From 57d8543ea4dcf2a53b1c37757da12866a52aaf57 Mon Sep 17 00:00:00 2001
From: Dou Liyang <douly.f...@cn.fujitsu.com>
Date: Thu, 28 Dec 2017 16:20:48 +0800
Subject: [PATCH] x86/vector: Replace the raw_spin_lock() with
 raw_spin_lock_irqsave()

Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
---
 arch/x86/kernel/apic/vector.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
index 750449152b04..a43ca26d5dfd 100644
--- a/arch/x86/kernel/apic/vector.c
+++ b/arch/x86/kernel/apic/vector.c
@@ -726,6 +726,7 @@ static int apic_set_affinity(struct irq_data *irqd,
 			 const struct cpumask *dest, bool force)
 {
 	struct apic_chip_data *apicd = apic_chip_data(irqd);
+	unsigned long flags;
 	int err;
 
 	/*
@@ -740,13 +741,13 @@ static int apic_set_affinity(struct irq_data *irqd,
 	(apicd->is_managed || apicd->can_reserve))
 		return IRQ_SET_MASK_OK;
 
-	raw_spin_lock(&vector_lock);
+	raw_spin_lock_irqsave(&vector_lock, flags);
 	cpumask_and(vector_searchmask, dest, cpu_online_mask);
 	if (irqd_affinity_is_managed(irqd))
 		err = assign_managed_vector(irqd, vector_searchmask);
 	else
 		err = assign_vector_locked(irqd, vector_searchmask);
-	raw_spin_unlock(&vector_lock);
+	raw_spin_unlock_irqrestore(&vector_lock, flags);
 	return err ? err : IRQ_SET_MASK_OK;
 }
 
-- 
2.14.3
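
The patch carries no commit message, so the following is only a general
illustration of why an _irqsave lock variant can matter, not a statement
about the exact path behind the lockups in this thread. It is a toy
single-CPU model in userspace C; every name in it (toy_cpu, toy_lock, and
so on) is hypothetical and none of it is kernel API. It shows that if a lock
is held with interrupts still enabled, an interrupt handler on the same CPU
that wants the same lock can never get it, whereas disabling interrupts
while holding the lock holds the handler off until the lock is released.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy single-CPU model; these names are illustrative, not kernel APIs. */
struct toy_cpu {
	bool irqs_enabled;	/* models the CPU interrupt-enable flag */
	bool lock_held;		/* models a raw spinlock owned by this CPU */
};

enum toy_irq_result {
	TOY_IRQ_MASKED,		/* interrupt is held off until irqs are re-enabled */
	TOY_IRQ_HANDLED,	/* handler ran and could take the lock */
	TOY_IRQ_DEADLOCK	/* handler would spin forever on a lock its own CPU holds */
};

/* Models a plain lock: takes the lock, leaves the interrupt state alone. */
void toy_lock(struct toy_cpu *cpu)
{
	cpu->lock_held = true;
}

void toy_unlock(struct toy_cpu *cpu)
{
	assert(cpu->lock_held);
	cpu->lock_held = false;
}

/* Models an irqsave lock: remember the irq state, disable irqs, take the lock. */
bool toy_lock_irqsave(struct toy_cpu *cpu)
{
	bool flags = cpu->irqs_enabled;

	cpu->irqs_enabled = false;
	cpu->lock_held = true;
	return flags;
}

/* Models the matching unlock: drop the lock, then restore the saved irq state. */
void toy_unlock_irqrestore(struct toy_cpu *cpu, bool flags)
{
	assert(cpu->lock_held);
	cpu->lock_held = false;
	cpu->irqs_enabled = flags;
}

/* An interrupt whose handler needs the same lock arrives on this CPU. */
enum toy_irq_result toy_irq_arrives(const struct toy_cpu *cpu)
{
	if (!cpu->irqs_enabled)
		return TOY_IRQ_MASKED;
	return cpu->lock_held ? TOY_IRQ_DEADLOCK : TOY_IRQ_HANDLED;
}

/* Interrupt arrives while the lock is held via the plain variant. */
enum toy_irq_result toy_scenario_plain_lock(void)
{
	struct toy_cpu cpu = { .irqs_enabled = true, .lock_held = false };
	enum toy_irq_result res;

	toy_lock(&cpu);
	res = toy_irq_arrives(&cpu);
	toy_unlock(&cpu);
	return res;
}

/* Interrupt arrives while the lock is held via the irqsave variant. */
enum toy_irq_result toy_scenario_irqsave(void)
{
	struct toy_cpu cpu = { .irqs_enabled = true, .lock_held = false };
	enum toy_irq_result res;
	bool flags;

	flags = toy_lock_irqsave(&cpu);
	res = toy_irq_arrives(&cpu);
	toy_unlock_irqrestore(&cpu, flags);
	return res;
}
```

In this model toy_scenario_plain_lock() reports TOY_IRQ_DEADLOCK and
toy_scenario_irqsave() reports TOY_IRQ_MASKED. In the real kernel the saved
state is an unsigned long and the lock calls are macros, which is why the
patch adds the "unsigned long flags;" local.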

___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


Re: PROBLEM: 4.15.0-rc3 APIC causes lockups on Core 2 Duo laptop

2017-12-27 Thread Dou Liyang

Hi Alexandru,

Thanks for testing!
At 12/28/2017 12:18 AM, Alexandru Chirvasitu wrote:

As per instructions, I did the following:

(1)

Checked out

464e1d5 Linux 4.15-rc5

(after getting my copy up to date, fetching, pulling, etc.) and
compiled it as-is. Config attached (the one labeled 'np' for 'no
patch').

Result:

Boot with no extra parameters locks up after login, as before;

apic=debug does not panic, but locks up after login, as before;


I would also like to see the log from the "apic=debug" boot, captured with
the "journalctl" command, even though the logs don't contain the lockup trace.

Thanks,
dou.








Re: PROBLEM: 4.15.0-rc3 APIC causes lockups on Core 2 Duo laptop

2017-12-27 Thread Dou Liyang

Hi Alexandru,

At 12/24/2017 04:01 AM, Alexandru Chirvasitu wrote:

On Sat, Dec 23, 2017 at 02:32:52PM +0100, Thomas Gleixner wrote:

On Sat, 23 Dec 2017, Dexuan Cui wrote:


From: Alexandru Chirvasitu [mailto:achirva...@gmail.com]
Sent: Friday, December 22, 2017 14:29

The output of that precise command run just now on a freshly-compiled
copy of that commit is attached.

On Fri, Dec 22, 2017 at 09:31:28PM +, Dexuan Cui wrote:

From: Alexandru Chirvasitu [mailto:achirva...@gmail.com]
Sent: Friday, December 22, 2017 06:21

In the absence of logs, the best I can do at the moment is attach a
picture of the screen I am presented with on the  boot
attempt.
Alex


The panic happens in irq_matrix_assign_system+0x4e/0xd0 in your picture.
IMO we should find which line of code causes the panic. I suppose
"objdump -D kernel/irq/matrix.o" can help to do that.

Thanks,
-- Dexuan


The BUG_ON panic happens at line 147:
BUG_ON(!test_and_clear_bit(bit, cm->alloc_map));



There are two bugs on your laptop:

  1. Hard lockups on both CPUs after login
  2. A panic with "apic=debug"

For the second bug, please try the following patch (it still needs Thomas'
confirmation :) ) on Linux 4.15-rc5. I think it can fix the panic.

If the second bug is fixed, let's go back to the first bug:

Is Linus' current head, 4.15-rc5, bad as well?

If yes, please boot with "apic=debug" and send the dmesg log.

Thanks,
dou.

8<---

irq/matrix: Remove the overused BUG_ON() in irq_matrix_assign_system()

Currently, x86 marks the preallocated legacy interrupts when initializing
IRQs (native_init_IRQ()), but clears them in vector_configure_legacy() if
they are not activated.

So, in irq_matrix_assign_system(), replacing a legacy vector that may not
be allocated in cpumap->alloc_map[] with a system vector triggers the
BUG_ON().

Remove the BUG_ON().

Signed-off-by: Dou Liyang <douly.f...@cn.fujitsu.com>
---
 kernel/irq/matrix.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/kernel/irq/matrix.c b/kernel/irq/matrix.c
index 0ba0dd8863a7..876cbeab9ca2 100644
--- a/kernel/irq/matrix.c
+++ b/kernel/irq/matrix.c
@@ -143,11 +143,12 @@ void irq_matrix_assign_system(struct irq_matrix *m, unsigned int bit,

 	BUG_ON(m->online_maps > 1 || (m->online_maps && !replace));

 	set_bit(bit, m->system_map);
-	if (replace) {
-		BUG_ON(!test_and_clear_bit(bit, cm->alloc_map));
+
+	if (replace && test_and_clear_bit(bit, cm->alloc_map)){
 		cm->allocated--;
 		m->total_allocated--;
 	}
+
 	if (bit >= m->alloc_start && bit < m->alloc_end)
 		m->systembits_inalloc++;

--
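
To make the behavioural change in the patch above concrete, here is a
minimal userspace sketch of the patched accounting. It is not the kernel
implementation: the toy_* types and helpers are hypothetical stand-ins, the
bitmaps are reduced to a single 64-bit word, and only the lines touched by
the patch are modelled. The point it illustrates is that a "replace" request
for a bit that is not set in alloc_map now leaves the counters alone instead
of hitting BUG_ON().

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_BITS 64	/* this sketch only models bit numbers 0..63 */

/* Hypothetical, heavily reduced stand-ins for the per-CPU map and the matrix. */
struct toy_cpumap {
	unsigned long long alloc_map;
	unsigned int allocated;
};

struct toy_matrix {
	unsigned long long system_map;
	unsigned int total_allocated;
};

/* Userspace stand-in for test_and_clear_bit(): clear the bit, return its old value. */
bool toy_test_and_clear_bit(unsigned int bit, unsigned long long *map)
{
	unsigned long long mask = 1ULL << bit;
	bool was_set = (*map & mask) != 0;

	*map &= ~mask;
	return was_set;
}

/* Models the patched logic: only undo the accounting if the bit was really allocated. */
void toy_assign_system(struct toy_matrix *m, struct toy_cpumap *cm,
		       unsigned int bit, bool replace)
{
	assert(bit < TOY_BITS);

	m->system_map |= 1ULL << bit;

	if (replace && toy_test_and_clear_bit(bit, &cm->alloc_map)) {
		cm->allocated--;
		m->total_allocated--;
	}
}
```

Calling toy_assign_system() with replace set for a bit that was never
allocated marks the bit in system_map and leaves allocated and
total_allocated unchanged; calling it for an allocated bit clears that bit
from alloc_map and decrements both counters by one.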




Re: PROBLEM: 4.15.0-rc3 APIC causes lockups on Core 2 Duo laptop

2017-12-23 Thread Dou Liyang

Hi Thomas,

At 12/23/2017 09:32 PM, Thomas Gleixner wrote:
[...]


The BUG_ON panic happens at line 147:
BUG_ON(!test_and_clear_bit(bit, cm->alloc_map));

I'm sure Thomas and Dou know it better than me.


I'll have a look after the holidays.



Merry Christmas!  :-)

I am trying to look into it.

Thanks,
dou




Re: PROBLEM: 4.15.0-rc3 APIC causes lockups on Core 2 Duo laptop

2017-12-22 Thread Dou Liyang

Hi Alexandru,

At 12/21/2017 10:23 AM, Alexandru Chirvasitu wrote:

This might be more helpful. I ran another bisect with the following
final log:

---

git bisect start
# bad: [d6ffc6ac83b1f9f12652d89b9cb5bcbfbea7796c] x86/vector: Respect affinity mask in irq descriptor
git bisect bad d6ffc6ac83b1f9f12652d89b9cb5bcbfbea7796c
# good: [e4ae4c8ea7c65f61fde29c689d148c8c9e05305a] Merge branch 'irq/core' into x86/apic
git bisect good e4ae4c8ea7c65f61fde29c689d148c8c9e05305a
# good: [4ef76eb6de734dc03a7f3b8f80884362364e6049] x86/apic: Get rid of the legacy irq data storage
git bisect good 4ef76eb6de734dc03a7f3b8f80884362364e6049
# good: [ba801640b10d87b1c4e26cbcbe414a001255404f] x86/vector: Compile SMP only code conditionally
git bisect good ba801640b10d87b1c4e26cbcbe414a001255404f
# good: [90ad9e2d91067983f3328e21b306323877e5f48a] x86/io_apic: Reevaluate vector configuration on activate()
git bisect good 90ad9e2d91067983f3328e21b306323877e5f48a
# bad: [4900be83602b6be07366d3e69f756c1959f4169a] x86/vector/msi: Switch to global reservation mode
git bisect bad 4900be83602b6be07366d3e69f756c1959f4169a
# bad: [2db1f959d9dc16035f2eb44ed5fdb2789b754d6a] x86/vector: Handle managed interrupts proper
git bisect bad 2db1f959d9dc16035f2eb44ed5fdb2789b754d6a
# first bad commit: [2db1f959d9dc16035f2eb44ed5fdb2789b754d6a] x86/vector: Handle managed interrupts proper
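
For readers new to bisecting, the search behind a log like the one above can
be sketched as a binary search between a known-good and a known-bad commit.
This is a simplified illustration only: it assumes a linear history (git
bisect itself works on the full commit graph), and toy_bisect(),
toy_is_bad_fn and toy_bad_since() are hypothetical names, not part of git.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical test hook: returns nonzero if the commit at this index is bad. */
typedef int (*toy_is_bad_fn)(size_t index, void *ctx);

/*
 * Commits are indexed oldest to newest. `good` is a commit known to work and
 * `bad` a later commit known to fail. Returns the index of the first bad
 * commit, assuming every commit after the first bad one is also bad.
 */
size_t toy_bisect(size_t good, size_t bad, toy_is_bad_fn is_bad, void *ctx)
{
	assert(good < bad);

	while (bad - good > 1) {
		size_t mid = good + (bad - good) / 2;

		if (is_bad(mid, ctx))
			bad = mid;	/* first bad commit is at or before mid */
		else
			good = mid;	/* first bad commit is after mid */
	}
	return bad;
}

/* Example hook: ctx points to the index of the commit that introduced the bug. */
int toy_bad_since(size_t index, void *ctx)
{
	return index >= *(size_t *)ctx;
}
```

Each test halves the remaining range, so a single wrong "good" or "bad"
answer sends the search into the wrong half and the reported first bad
commit can end up being an unrelated one.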


That is helpful. I tried it in QEMU with

  (Intel(R) Core(TM)2 Duo CPU  T7700  @ 2.40GHz)
but could not reproduce the bug.



---

That first bad commit 2db1f95 identified at the end is interesting:
it's the only one I've tried through all of this that actually gives
me a kernel panic when unadorned with kernel options (so unlike all of
the others it fails to even drop me at a tty login prompt).

I tried a number of things to fiddle with it: it boots fine with
either nolapic or noapic. The former results in seeing a single cpu
with lscpu, but the latter (noapic) seems to give me as much
functionality as I'd need. I'm not seeing the issue noted before,


That is because "noapic" only disables the I/O APIC, whereas "nolapic"
disables both the Local APIC and the I/O APIC.


whereby noapic for 4.15.0-rc3 was somehow disabling my ethernet card.

I hope this second bisect went down better than the last one..


Could you add "apic=debug" to the kernel command line and then send me the
dmesg log?


Thanks,
dou.




Re: PROBLEM: 4.15.0-rc3 APIC causes lockups on Core 2 Duo laptop

2017-12-19 Thread Dou Liyang

Hi Thomas,

At 12/20/2017 08:31 AM, Thomas Gleixner wrote:

On Tue, 19 Dec 2017, Alexandru Chirvasitu wrote:


I had never heard of 'bisect' before this casual mention (you might tell
I am a bit out of my depth). I've since applied it to Linus' tree between



bebc608 Linux 4.14 (good)

and

4fbd8d1 Linux 4.15-rc1 (bad)


Is Linus' current head, 4.15-rc4, bad as well?


[...]


Thanks for doing that bisect, but unfortunately this commit cannot be the
problematic one. It merely adds a config symbol, but it does not change any
code at all. It has no effect whatsoever. So something might have gone
wrong in your bisecting.



Agree.


I CC'ed Dou Liyang. He has changed the early APIC setup code and there has
been an issue reported already. Though I lost track of that. Dou, any

     Is it this one?
               https://marc.info/?l=linux-kernel=151188084018443

pointers?



Not sure, but it seems the APIC failed to start on that 32-bit system.

I will look into it.

Alex,

Could you send me your .config file and the dmesg log of 4.15.0-rc3?

Thanks,
dou

