Bug#883938: Bug #883938: linux-image-3.16.0-4-amd64: Kernel panic on boot after upgrading to debian 8.10 kernel 3.16.51

2017-12-31 Thread ผู้หญิง ธรรมดา
On Mon, 11 Dec 2017 19:07:05 +0100 Karsten Heiken <
hei...@luis.uni-hannover.de> wrote:
> Bernhard Schmidt wrote:
> >Can you check whether numa=off on the kernel command line fixes this as well?
>
> Indeed it does. Appending numa=off to the kernel command line fixes
> the bug in my KVM virtual machines as well as my physical servers (at
> least the ones I've tested so far).
>
> So this might be a viable workaround for the people that already
> patched their systems.
>
>
>


Bug#883938: linux-image-3.16.0-4-amd64: Kernel panic on boot after upgrading to debian 8.10 kernel 3.16.51

2017-12-14 Thread Ian Jackson
The Xen Project CI has six machines which were affected by this
regression.  (Kernel messages are near-identical to those reported
by others in this bug.)  As suggested I downloaded this kernel;

  
https://people.debian.org/~benh/packages/jessie-pu/linux-image-3.16.0-4-amd64_3.16.51-3~a.test_amd64.deb

and I can confirm that this fixes the issue.

We will continue to use this kernel for the affected machines, until
the next Debian point release (in which I hope this will be fixed).

Thanks to Ben for the quick fix and to #debian-kernel for the pointer
to the right bug report.

Regards,
Ian.



Bug#883938: linux-image-3.16.0-4-amd64: Kernel panic on boot after upgrading to debian 8.10 kernel 3.16.51

2017-12-14 Thread Jim Cobley
Also unable to boot since update to 3.16.51 on Intel 2nd Gen i7-2600 @ 
3.4GHz on Asus P8 H67-M (1 CPU, 8 core)

***UUID*** not found

Booted from another drive, downloaded linux-image-3.16.0-4-amd64 with 
3.16.43-2+deb8u2 and extracted files over boot/lib/usr on main drive. 
The system now boots with no apparent loss of configuration, but the question
is how to get back on track with updates again.




Bug#883938: linux-image-3.16.0-4-amd64: Kernel panic on boot after upgrading to debian 8.10 kernel 3.16.51

2017-12-14 Thread O.Neupert
The same issue here with CPU Intel(R) Xeon(R) CPU E5-2603 0 @ 1.80GHz(8
Core).
Back to 3.16.43-2+deb8u5 works for me.
This was a production system.


-- 
Best regards

Olaf Neupert




Bug#883938: linux-image-3.16.0-4-amd64: Kernel panic on boot after upgrading to debian 8.10 kernel 3.16.51

2017-12-13 Thread Vladislav Kurz
Package: src:linux
Followup-For: Bug #883938

Dear Maintainer,

I have run into this same problem on HP ProLiant DL380 G6 (fortunately this is 
only a backup server)

I confirm that numa=off workaround works for me.  That allowed me to install 
linux-image-3.16.0-4-amd64_3.16.51-3~a.test_amd64.deb which now boots OK even 
without numa=off.

Thanks for preparing the fixed version.

As a precaution I have also installed the fixed kernel on an HP ProLiant DL380
Gen9, but have not yet dared to reboot that one as it is the production box.

Best Regards
Vladislav Kurz



Bug#883938: linux-image-3.16.0-4-amd64: Kernel panic on boot after upgrading to debian 8.10 kernel 3.16.51

2017-12-13 Thread Sebastian.Gerke
Hello Ben,


Thank you for the fix. We have the same issue as other people in this conversation.
When do you think the fix will be rolled out to stable?


Sebastian Gerke




Bug#883938: linux-image-3.16.0-4-amd64: Kernel panic on boot after upgrading to debian 8.10 kernel 3.16.51

2017-12-11 Thread Salvatore Bonaccorso
On Mon, Dec 11, 2017 at 11:10:42PM +0100, Salvatore Bonaccorso wrote:
> Hi Ben,
> 
> On Mon, Dec 11, 2017 at 09:47:49PM +0000, Ben Hutchings wrote:
> > On Mon, 2017-12-11 at 21:29 +0000, Ben Hutchings wrote:
> > > There were several commits interspersed with sched/topology fixes
> > > upstream that I thought were cleanup and therefore didn't backport to
> > > the 3.16 stable branch.  Now I suspect that at least some of them are
> > > needed.
> > > 
> > > I'm attaching backports of 3 of the commits that I left out.  Can you
> > > test whether they fix the regression for you?
> 
> I built v3.16.51 with the three patches on top, and it looks good: I have
> not seen any further problems with those.

To confirm, I have now also booted two affected systems with your
test kernel containing the patches, and they boot now.

Regards,
Salvatore



Bug#883938: Bug #883938: linux-image-3.16.0-4-amd64: Kernel panic on boot after upgrading to debian 8.10 kernel 3.16.51

2017-12-11 Thread Heiko Schlittermann
The provided work-around (setting numa=off) worked here.
Do you need further information about the system in use?

BTW: Thank you for the NUMA advice, it was about 21.00 CET when
I started a search engine and found the bug entry and the suggested
work-around, so, your advice came just in time.

Best regards from Dresden/Germany
Heiko Schlittermann
-- 
 SCHLITTERMANN.de  internet & unix support -
 Heiko Schlittermann, Dipl.-Ing. (TU) - {fon,fax}: +49.351.802998{1,3} -
 gnupg encrypted messages are welcome --- key ID: F69376CE -
 ! key id 7CBF764A and 972EAC9F are revoked since 2015-01  -




Bug#883938: linux-image-3.16.0-4-amd64: Kernel panic on boot after upgrading to debian 8.10 kernel 3.16.51

2017-12-11 Thread Salvatore Bonaccorso
Hi Ben,

On Mon, Dec 11, 2017 at 09:47:49PM +0000, Ben Hutchings wrote:
> On Mon, 2017-12-11 at 21:29 +0000, Ben Hutchings wrote:
> > There were several commits interspersed with sched/topology fixes
> > upstream that I thought were cleanup and therefore didn't backport to
> > the 3.16 stable branch.  Now I suspect that at least some of them are
> > needed.
> > 
> > I'm attaching backports of 3 of the commits that I left out.  Can you
> > test whether they fix the regression for you?

I built v3.16.51 with the three patches on top, and it looks good: I have
not seen any further problems with those.

Regards,
Salvatore



Bug#883938: linux-image-3.16.0-4-amd64: Kernel panic on boot after upgrading to debian 8.10 kernel 3.16.51

2017-12-11 Thread Ben Hutchings
On Mon, 2017-12-11 at 21:29 +0000, Ben Hutchings wrote:
> There were several commits interspersed with sched/topology fixes
> upstream that I thought were cleanup and therefore didn't backport to
> the 3.16 stable branch.  Now I suspect that at least some of them are
> needed.
> 
> I'm attaching backports of 3 of the commits that I left out.  Can you
> test whether they fix the regression for you?
> 
> If this doesn't help, I think we may just have to revert the
> sched/topology fixes.

Sorry, the first patch should be this one and not
0001-sched-topology-Simplify-build_overlap_sched_groups.patch.

Ben.

-- 
Ben Hutchings
A free society is one where it is safe to be unpopular. - Adlai
Stevenson
From aae254d8033b9d41e7915dcb3fe3642336408171 Mon Sep 17 00:00:00 2001
From: Peter Zijlstra 
Date: Wed, 26 Apr 2017 17:36:41 +0200
Subject: [PATCH 1/3] sched/topology: Remove FORCE_SD_OVERLAP

commit af85596c74de2fd9abb87501ae280038ac28a3f4 upstream.

Its an obsolete debug mechanism and future code wants to rely on
properties this undermines.

Namely, it would be good to assume that SD_OVERLAP domains have
children, but if we build the entire hierarchy with SD_OVERLAP this is
obviously false.

Signed-off-by: Peter Zijlstra (Intel) 
Cc: Linus Torvalds 
Cc: Mike Galbraith 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: linux-ker...@vger.kernel.org
Signed-off-by: Ingo Molnar 
[bwh: Backported to 3.16: adjust filename]
Signed-off-by: Ben Hutchings 
---
 kernel/sched/core.c | 2 +-
 kernel/sched/features.h | 1 -
 2 files changed, 1 insertion(+), 2 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 0940ee9603ae..0692ee2dd889 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -6631,7 +6631,7 @@ static int build_sched_domains(const struct cpumask *cpu_map,
 			sd = build_sched_domain(tl, cpu_map, attr, sd, i);
 			if (tl == sched_domain_topology)
 				*per_cpu_ptr(d.sd, i) = sd;
-			if (tl->flags & SDTL_OVERLAP || sched_feat(FORCE_SD_OVERLAP))
+			if (tl->flags & SDTL_OVERLAP)
 				sd->flags |= SD_OVERLAP;
 			if (cpumask_equal(cpu_map, sched_domain_span(sd)))
 				break;
diff --git a/kernel/sched/features.h b/kernel/sched/features.h
index 90284d117fe6..79bad00ea840 100644
--- a/kernel/sched/features.h
+++ b/kernel/sched/features.h
@@ -56,7 +56,6 @@ SCHED_FEAT(NONTASK_CAPACITY, true)
  */
 SCHED_FEAT(TTWU_QUEUE, true)
 
-SCHED_FEAT(FORCE_SD_OVERLAP, false)
 SCHED_FEAT(RT_RUNTIME_SHARE, true)
 SCHED_FEAT(LB_MIN, false)
 




Bug#883938: linux-image-3.16.0-4-amd64: Kernel panic on boot after upgrading to debian 8.10 kernel 3.16.51

2017-12-11 Thread Ben Hutchings
There were several commits interspersed with sched/topology fixes
upstream that I thought were cleanup and therefore didn't backport to
the 3.16 stable branch.  Now I suspect that at least some of them are
needed.

I'm attaching backports of 3 of the commits that I left out.  Can you
test whether they fix the regression for you?

If this doesn't help, I think we may just have to revert the
sched/topology fixes.

Ben.

-- 
Ben Hutchings
A free society is one where it is safe to be unpopular. - Adlai
Stevenson

From: Peter Zijlstra 
Date: Fri, 14 Apr 2017 17:32:07 +0200
Subject: sched/topology: Simplify build_overlap_sched_groups()
Origin: https://git.kernel.org/linus/91eaed0d61319f58a9f8e43d41a8cbb069b4f73d

Now that the first group will always be the previous domain of this
@cpu this can be simplified.

In fact, writing the code now removed should've been a big clue I was
doing it wrong :/

Signed-off-by: Peter Zijlstra (Intel) 
Cc: Linus Torvalds 
Cc: Mike Galbraith 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: linux-ker...@vger.kernel.org
Signed-off-by: Ingo Molnar 
[bwh: Backported to 3.16: adjust filename, context]
---
 kernel/sched/topology.c | 13 ++---
 1 file changed, 2 insertions(+), 11 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 921dedde2ee1..6b10e0a956c7 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -557,7 +557,7 @@ static void init_overlap_sched_group(struct sched_domain *sd,
 static int
 build_overlap_sched_groups(struct sched_domain *sd, int cpu)
 {
-	struct sched_group *first = NULL, *last = NULL, *groups = NULL, *sg;
+	struct sched_group *first = NULL, *last = NULL, *sg;
 	const struct cpumask *span = sched_domain_span(sd);
 	struct cpumask *covered = sched_domains_tmpmask;
 	struct sd_data *sdd = sd->private;
@@ -587,15 +587,6 @@ build_overlap_sched_groups(struct sched_domain *sd, int cpu)
 
 		init_overlap_sched_group(sd, sg);
 
-		/*
-		 * Make sure the first group of this domain contains the
-		 * canonical balance cpu. Otherwise the sched_domain iteration
-		 * breaks. See update_sg_lb_stats().
-		 */
-		if ((!groups && cpumask_test_cpu(cpu, sg_span)) ||
-		group_balance_cpu(sg) == cpu)
-			groups = sg;
-
 		if (!first)
 			first = sg;
 		if (last)
@@ -603,7 +594,7 @@ build_overlap_sched_groups(struct sched_domain *sd, int cpu)
 		last = sg;
 		last->next = first;
 	}
-	sd->groups = groups;
+	sd->groups = first;
 
 	return 0;
 
From 5727151be2afd55868244c2621ce0e350eecdc8d Mon Sep 17 00:00:00 2001
From: Peter Zijlstra 
Date: Fri, 14 Apr 2017 17:32:07 +0200
Subject: [PATCH 2/3] sched/topology: Simplify build_overlap_sched_groups()

commit 91eaed0d61319f58a9f8e43d41a8cbb069b4f73d upstream.

Now that the first group will always be the previous domain of this
@cpu this can be simplified.

In fact, writing the code now removed should've been a big clue I was
doing it wrong :/

Signed-off-by: Peter Zijlstra (Intel) 
Cc: Linus Torvalds 
Cc: Mike Galbraith 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: linux-ker...@vger.kernel.org
Signed-off-by: Ingo Molnar 
[bwh: Backported to 3.16: adjust filename, context]
Signed-off-by: Ben Hutchings 
---
 kernel/sched/core.c | 13 ++---
 1 file changed, 2 insertions(+), 11 deletions(-)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 0692ee2dd889..68bc1bf9f5ab 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5869,7 +5869,7 @@ static void init_overlap_sched_group(struct sched_domain *sd,
 static int
 build_overlap_sched_groups(struct sched_domain *sd, int cpu)
 {
-	struct sched_group *first = NULL, *last = NULL, *groups = NULL, *sg;
+	struct sched_group *first = NULL, *last = NULL, *sg;
 	const struct cpumask *span = sched_domain_span(sd);
 	struct cpumask *covered = sched_domains_tmpmask;
 	struct sd_data *sdd = sd->private;
@@ -5899,15 +5899,6 @@ build_overlap_sched_groups(struct sched_domain *sd, int cpu)
 
 		init_overlap_sched_group(sd, sg);
 
-		/*
-		 * Make sure the first group of this domain contains the
-		 * canonical balance cpu. Otherwise the sched_domain iteration
-		 * breaks. See update_sg_lb_stats().
-		 */
-		if ((!groups && cpumask_test_cpu(cpu, sg_span)) ||
-		group_balance_cpu(sg) == cpu)
-			groups = sg;
-
 		if (!first)
 			first = sg;
 		if (last)
@@ -5915,7 +5906,7 @@ build_overlap_sched_groups(struct sched_domain *sd, int cpu)
 		last = sg;
 		last->next = first;
 	}
-	sd->groups = groups;
+	sd->groups = first;
 
 	return 0;
 
From 10f83418ab83f5d4ff5f7bc53cbd3f9455981186 Mon Sep 17 00:00:00 2001
From: Lauro Ramos Venancio 
Date: Thu, 20 Apr 2017 16:51:40 -0300
Subject: [PATCH 3/3] sched/topology: 

Bug#883938: linux-image-3.16.0-4-amd64: Kernel panic on boot after upgrading to debian 8.10 kernel 3.16.51

2017-12-11 Thread Salvatore Bonaccorso
Hi

Here is the bisect log (unless I overlooked something with the last good commit):

git bisect start
# good: [3717265153a66c2ecbd745ea8eef213289b4a55e] Linux 3.16.48
git bisect good 3717265153a66c2ecbd745ea8eef213289b4a55e
# bad: [c45c05f42d5d3baf5d18e648c064788381fcfa1c] Linux 3.16.51
git bisect bad c45c05f42d5d3baf5d18e648c064788381fcfa1c
# bad: [36284647e2096b8bcc238772b644151967a5189b] parisc: pci memory bar assignment fails with 64bit kernels on dino/cujo
git bisect bad 36284647e2096b8bcc238772b644151967a5189b
# bad: [bfc20d54acae250fde0c8d56fc57fd41ea6dbe99] Revert "ACPI / EC: Add support to disallow QR_EC to be issued before completing previous QR_EC"
git bisect bad bfc20d54acae250fde0c8d56fc57fd41ea6dbe99
# bad: [b657f2742871a071a015328ec45c26679998096c] PCI/PM: Restore the status of PCI devices across hibernation
git bisect bad b657f2742871a071a015328ec45c26679998096c
# bad: [6d8c76588c1b03d63c8b76eaabac7a537e2d6714] drm/msm/hdmi: Use bitwise operators when building register values
git bisect bad 6d8c76588c1b03d63c8b76eaabac7a537e2d6714
# bad: [95ac78eafd026fb38c38a8f312acdd6fd8aee747] kvm: x86: Guest BNDCFGS requires guest MPX support
git bisect bad 95ac78eafd026fb38c38a8f312acdd6fd8aee747
# bad: [54749559969963ea435013122941643a90f1c6b3] f2fs: load inode's flag from disk
git bisect bad 54749559969963ea435013122941643a90f1c6b3
# good: [49d4283d847987fccbd7c7ce8f59f0fd765702ff] sched/topology: Fix building of overlapping sched-groups
git bisect good 49d4283d847987fccbd7c7ce8f59f0fd765702ff
# bad: [5a7aba3f27f7262cc5fad1b85804ea0ad9f4275c] sched/topology: Fix overlapping sched_group_capacity
git bisect bad 5a7aba3f27f7262cc5fad1b85804ea0ad9f4275c
# bad: [00c978eada13e2cd1bc7da485e99ab5fb7c3418c] sched/topology: Fix overlapping sched_group_mask
git bisect bad 00c978eada13e2cd1bc7da485e99ab5fb7c3418c
# first bad commit: [00c978eada13e2cd1bc7da485e99ab5fb7c3418c] sched/topology: Fix overlapping sched_group_mask
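
For anyone who wants to re-run or inspect a saved log like the one above, here is a minimal sketch. It assumes the log was saved to a file with every entry on a single line (mail wrapping undone). `first_bad_commit` and the file name `bisect.log` are hypothetical names chosen for illustration; `git bisect replay` is the stock git subcommand for re-applying a recorded bisection.

```shell
#!/bin/sh
# Print the commit hash from the "# first bad commit: [...]" line of a
# saved bisect log. first_bad_commit is a hypothetical helper name.
first_bad_commit() {
    sed -n 's/^# first bad commit: \[\([0-9a-f]*\)].*/\1/p' "$1"
}

# Usage, inside a linux-stable checkout (bisect.log is an assumed file name):
#   git bisect replay bisect.log   # re-applies the recorded good/bad marks
#   git show --stat "$(first_bad_commit bisect.log)"
```

The helper only reads the log, so it can be used on a machine without the kernel tree at hand.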

With 00c978eada13e2cd1bc7da485e99ab5fb7c3418c the system boots, but warns
once:

cut-cut-cut-cut-cut-cut-
[...]
[1.958685] smpboot: CPU0: Quad-Core AMD Opteron(tm) Processor 2356 (fam: 10, model: 02, stepping: 03)
[2.176232] Performance Events: AMD PMU driver.
[2.230503] ... version:0
[2.278424] ... bit width:  48
[2.327396] ... generic registers:  4
[2.375308] ... value mask: 
[2.438837] ... max period: 7fff
[2.502353] ... fixed-purpose events:   0
[2.550271] ... event mask: 000f
[2.615975] NMI watchdog: enabled on all CPUs, permanently consumes one hw-PMU counter.
[2.711926] x86: Booting SMP configuration:
[2.761931]  node  #0, CPUs:#1  #2  #3
[2.859347]  node  #1, CPUs:#4  #5  #6  #7
[3.064352] x86: Booted up 2 nodes, 8 CPUs
[3.115572] smpboot: Total of 8 processors activated (36803.45 BogoMIPS)
[3.200017] [ cut here ]
[3.255328] WARNING: CPU: 0 PID: 1 at kernel/sched/core.c:5807 build_sched_domains+0x7da/0xc80()
[3.360528] Modules linked in:
[3.397169] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.16.48+ #10
[3.471097] Hardware name: Sun Microsystems Sun Fire X4140/Sun Fire X4140, BIOS 0ABMN068 10/26/2009
[3.583508]   81514999  
0009
[3.672426]  810699f7 88021d699598 88021d69ab40 
88021d69ab58
[3.761306]   88041db520c0 8109bb0a 

[3.850185] Call Trace:
[3.879420]  [] ? dump_stack+0x5d/0x78
[3.942942]  [] ? warn_slowpath_common+0x77/0x90
[4.016855]  [] ? build_sched_domains+0x7da/0xc80
[4.091806]  [] ? sched_init_smp+0x38b/0x444
[4.161558]  [] ? mutex_lock+0xe/0x30
[4.224020]  [] ? put_online_cpus+0x23/0x90
[4.292744]  [] ? stop_machine+0x2c/0x40
[4.358340]  [] ? kernel_init_freeable+0x106/0x207
[4.434331]  [] ? rest_init+0x80/0x80
[4.496793]  [] ? kernel_init+0xa/0xf0
[4.560319]  [] ? ret_from_fork+0x58/0x90
[4.626955]  [] ? rest_init+0x80/0x80
[4.689436] ---[ end trace 4340fcfda5f7e5a6 ]---
[4.745076] devtmpfs: initialized
[4.791668] PM: Registering ACPI NVS region [mem 0xd7fbe000-0xd7fe] (204800 bytes)
[4.886661] futex hash table entries: 8192 (order: 7, 524288 bytes)
[...]
cut-cut-cut-cut-cut-cut-

With the subsequent commit:

cut-cut-cut-cut-cut-cut-
[...]
[1.958675] smpboot: CPU0: Quad-Core AMD Opteron(tm) Processor 2356 (fam: 10, model: 02, stepping: 03)
[2.176232] Performance Events: AMD PMU driver.
[2.230510] ... version:0
[2.278440] ... bit width:  48
[2.327401] ... generic registers:  4
[2.375315] ... value mask: 
[ 

Bug#883938: Bug #883938: linux-image-3.16.0-4-amd64: Kernel panic on boot after upgrading to debian 8.10 kernel 3.16.51

2017-12-11 Thread Karsten Heiken

Bernhard Schmidt wrote:
> Can you check whether numa=off on the kernel command line fixes this as well?


Indeed it does. Appending numa=off to the kernel command line fixes
the bug in my KVM virtual machines as well as my physical servers (at
least the ones I've tested so far).

So this might be a viable workaround for the people that already
patched their systems.
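
For readers who need this workaround to persist across reboots, a minimal sketch follows. It assumes GRUB 2 with the stock Debian `/etc/default/grub` layout and a double-quoted `GRUB_CMDLINE_LINUX_DEFAULT` line; `add_numa_off` is a hypothetical helper name, not an existing tool. For a single boot, the same option can instead be appended by hand to the `linux` line in the GRUB menu editor.

```shell
#!/bin/sh
# Append numa=off to GRUB_CMDLINE_LINUX_DEFAULT in a GRUB defaults file.
# add_numa_off is a hypothetical helper; the file defaults to /etc/default/grub.
add_numa_off() {
    file="${1:-/etc/default/grub}"
    # Do nothing if the option is already present on that line.
    if grep -q '^GRUB_CMDLINE_LINUX_DEFAULT=.*numa=off' "$file"; then
        return 0
    fi
    # Insert " numa=off" just before the closing double quote (GNU sed -i).
    sed -i 's/^\(GRUB_CMDLINE_LINUX_DEFAULT="[^"]*\)"/\1 numa=off"/' "$file"
}

# Usage (as root), followed by regenerating grub.cfg:
#   add_numa_off /etc/default/grub && update-grub
```

The option should be removed again once a fixed kernel is installed, since it disables NUMA awareness entirely.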



Bug#883938: Bug #883938: linux-image-3.16.0-4-amd64: Kernel panic on boot after upgrading to debian 8.10 kernel 3.16.51

2017-12-11 Thread Bernhard Schmidt
Hi Karsten,

Thanks for the test. Can you check whether numa=off on the kernel command line 
fixes this as well?

Bernhard

Bug#883938: Bug #883938: linux-image-3.16.0-4-amd64: Kernel panic on boot after upgrading to debian 8.10 kernel 3.16.51

2017-12-11 Thread Karsten Heiken

I can reproduce the bug on multiple (physical) servers running in our
data center.

When booting my virtual servers with the new kernel, the issue did not
arise...

...until I enabled NUMA.

Booting a virtual machine with 2 sockets and 16 cores each works fine
when NUMA is disabled in KVM.

With NUMA enabled the machine boots only if I set the number of
sockets to "1". When I set the number of sockets to two, the machine
crashes.



Bug#883938: Bug #883938: linux-image-3.16.0-4-amd64: Kernel panic on boot after upgrading to debian 8.10 kernel 3.16.51

2017-12-11 Thread Christoph Martin
Just as a note:

Upgrading to jessie-backports version 4.9.51-1~bpo8+1 is also an option,
which works.

Christoph


Bug#883938: linux-image-3.16.0-4-amd64: Kernel panic on boot after upgrading to debian 8.10 kernel 3.16.51

2017-12-11 Thread Salvatore Bonaccorso
Hi

The issue also happens with vanilla 3.16.51 (config generated with
localmodconfig, built with make deb-pkg).

Trying to bisect now.

Regards,
Salvatore



Bug#883938: linux-image-3.16.0-4-amd64: Kernel panic on boot after upgrading to debian 8.10 kernel 3.16.51

2017-12-11 Thread Miquel van Smoorenburg
Same here, whole cluster of machines down here with this kernel. Same 
panic message, so I won't repeat it here. These are Supermicro boxes 
with Xeon CPUs:


$ lscpu
Architecture:  x86_64
CPU op-mode(s):    32-bit, 64-bit
Byte Order:    Little Endian
CPU(s):    16
On-line CPU(s) list:   0-15
Thread(s) per core:    2
Core(s) per socket:    4
Socket(s): 2
NUMA node(s):  2
Vendor ID: GenuineIntel
CPU family:    6
Model: 26
Model name:    Intel(R) Xeon(R) CPU   L5520  @ 2.27GHz
Stepping:  5
CPU MHz:   1600.000
CPU max MHz:   2268.
CPU min MHz:   1600.
BogoMIPS:  4533.36
Virtualization:    VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache:  256K
L3 cache:  8192K
NUMA node0 CPU(s): 0-3,8-11
NUMA node1 CPU(s): 4-7,12-15

Mike.
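
Several reports in this thread involve machines with more than one NUMA node, like the one above. A small sketch for checking that on a host, assuming `lscpu` prints a "NUMA node(s):" line in the format shown; `numa_node_count` is a hypothetical helper name.

```shell
#!/bin/sh
# Read lscpu output on stdin and print the value of the "NUMA node(s):" field.
# numa_node_count is a hypothetical helper name.
numa_node_count() {
    awk -F: '/^NUMA node\(s\)/ { gsub(/[ \t]/, "", $2); print $2 }'
}

# Usage:
#   lscpu | numa_node_count
```

A result greater than 1 only indicates that the machine has the multi-node topology seen in these reports; it does not by itself prove the kernel will panic.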



Bug#883938: linux-image-3.16.0-4-amd64: Kernel panic on boot after upgrading to debian 8.10 kernel 3.16.51

2017-12-11 Thread Bernhard Schmidt
On Mon, Dec 11, 2017 at 09:23:45AM +0100, Salvatore Bonaccorso wrote:

> The issue seems not present when rolling back to 3.16.48-1 (this is
> one kernel version which was only present in jessie-proposed-update).
> 
> Can someone confirm? If yes it has to be a change between 3.16.48-1
> and 3.16.51-2.

Yes, I can confirm that 3.16.48-1 boots fine on my affected servers.

Bernhard



Bug#883938: linux-image-3.16.0-4-amd64: Kernel panic on boot after upgrading to debian 8.10 kernel 3.16.51

2017-12-11 Thread Salvatore Bonaccorso
The issue does not seem to be present when rolling back to 3.16.48-1 (a
kernel version that was only ever present in jessie-proposed-updates).

Can someone confirm? If yes it has to be a change between 3.16.48-1
and 3.16.51-2.

Regards,
Salvatore



Bug#883938: linux-image-3.16.0-4-amd64: Kernel panic on boot after upgrading to debian 8.10 kernel 3.16.51

2017-12-10 Thread Dominic Benson
I encountered this on an old Intel SR1600 box with Westmere Xeons.

I have an Ivy Bridge E5-1620v2 physical box that has booted successfully
with this kernel, and I haven't seen any problem among the dozen or so
VMs that have so far rebooted onto it (VMware ESXi 6.0/6.5, on dual E5
Xeon Sandy Bridge/Ivy Bridge/Haswell/Broadwell).

As with previous reporters, reverting to 3.16.43-2+deb8u5 avoids the
problem.

Details of the problematic system as follows:

** Command line:
BOOT_IMAGE=/vmlinuz-3.16.0-4-amd64 root=/dev/mapper/raid1-root ro quiet

** Tainted: I (2048)
 * Working around severe firmware bug.

** Kernel log:
[    5.810814] ioatdma :00:16.4: irq 95 for MSI/MSI-X
[    5.811126] ioatdma :00:16.5: irq 96 for MSI/MSI-X
[    5.811438] ioatdma :00:16.6: irq 97 for MSI/MSI-X
[    5.811752] ioatdma :00:16.7: irq 98 for MSI/MSI-X
[    5.875627] shpchp: Standard Hot Plug PCI Controller Driver version: 0.4
[    5.913072] ipmi message handler version 39.2
[    5.950042] EDAC MC: Ver: 3.0.0
[    5.962572] input: Sleep Button as
/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0C0E:00/input/input0
[    5.962578] ACPI: Sleep Button [SLPB]
[    5.962625] input: Power Button as
/devices/LNXSYSTM:00/LNXPWRBN:00/input/input1
[    5.962627] ACPI: Power Button [PWRF]
[    5.974649] SSE version of gcm_enc/dec engaged.
[    5.978346] alg: No test for __gcm-aes-aesni (__driver-gcm-aes-aesni)
[    6.022862] EDAC MC1: Giving out device to module i7core_edac.c
controller i7 core #1: DEV :fe:03.0 (POLLED)
[    6.022894] EDAC PCI0: Giving out device to module i7core_edac
controller EDAC PCI controller: DEV :fe:03.0 (POLLED)
[    6.023304] EDAC MC0: Giving out device to module i7core_edac.c
controller i7 core #0: DEV :ff:03.0 (POLLED)
[    6.023329] EDAC PCI1: Giving out device to module i7core_edac
controller EDAC PCI controller: DEV :ff:03.0 (POLLED)
[    6.023600] EDAC i7core: Driver loaded, 2 memory controller(s) found.
[    6.062760] cdc_acm 7-1:1.0: This device cannot do calls on its own.
It is not a modem.
[    6.062885] cdc_acm 7-1:1.0: ttyACM0: USB ACM device
[    6.064546] usbcore: registered new interface driver cdc_acm
[    6.064549] cdc_acm: USB Abstract Control Model driver for USB modems
and ISDN adapters
[    6.072093] IPMI System Interface driver.
[    6.072121] ipmi_si: probing via ACPI
[    6.072143] ipmi_si 00:02: [io  0x0ca2] regsize 1 spacing 1 irq 0
[    6.072144] ipmi_si: Adding ACPI-specified kcs state machine
[    6.072156] ipmi_si: probing via SMBIOS
[    6.072158] ipmi_si: SMBIOS: io 0xca2 regsize 1 spacing 1 irq 0
[    6.072159] ipmi_si: Adding SMBIOS-specified kcs state machine
duplicate interface
[    6.072161] ipmi_si: Trying ACPI-specified kcs state machine at i/o
address 0xca2, slave address 0x0, irq 0
[    6.122134] input: PC Speaker as /devices/platform/pcspkr/input/input2
[    6.215855] ipmi_si 00:02: Found new BMC (man_id: 0x000157, prod_id:
0x003e, dev_id: 0x21)
[    6.215869] ipmi_si 00:02: IPMI kcs interface initialized
[    6.639127] hidraw: raw HID events driver (C) Jiri Kosina
[    6.727507] alg: No test for crc32 (crc32-pclmul)
[    6.853789] iTCO_vendor_support: vendor-support=0
[    6.857666] usbcore: registered new interface driver usbhid
[    6.857668] usbhid: USB HID core driver
[    6.885272] iTCO_wdt: Intel TCO WatchDog Timer Driver v1.11
[    6.885304] iTCO_wdt: unable to reset NO_REBOOT flag, device disabled
by hardware/BIOS
[    6.911461] [drm] Initialized drm 1.1.0 20060810
[    6.995935] input: Dell Dell USB Keyboard as
/devices/pci:00/:00:1a.1/usb4/4-1/4-1:1.0/0003:413C:2003.0001/input/input3
[    6.996179] hid-generic 0003:413C:2003.0001: input,hidraw0: USB HID
v1.10 Keyboard [Dell Dell USB Keyboard] on usb-:00:1a.1-1/input0
[    6.996349] input: American Megatrends Inc. Virtual Keyboard and
Mouse as
/devices/pci:00/:00:1a.2/usb5/5-1/5-1:1.0/0003:046B:FF10.0002/input/input4
[    6.996455] hid-generic 0003:046B:FF10.0002: input,hidraw1: USB HID
v1.10 Keyboard [American Megatrends Inc. Virtual Keyboard and Mouse] on
usb-:00:1a.2-1/input0
[    6.996622] input: American Megatrends Inc. Virtual Keyboard and
Mouse as
/devices/pci:00/:00:1a.2/usb5/5-1/5-1:1.1/0003:046B:FF10.0003/input/input5
[    6.996956] hid-generic 0003:046B:FF10.0003: input,hidraw2: USB HID
v1.10 Mouse [American Megatrends Inc. Virtual Keyboard and Mouse] on
usb-:00:1a.2-1/input1
[    7.014469] kvm: VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL does not work
properly. Using workaround
[    7.565424] Adding 11717628k swap on /dev/sdb5.  Priority:-1
extents:1 across:11717628k FS
[    8.905845] XFS (dm-1): Mounting V4 Filesystem
[    8.921306] XFS (dm-2): Mounting V4 Filesystem
[    9.319538] XFS (dm-4): Mounting V4 Filesystem
[    9.484788] XFS (dm-5): Mounting V4 Filesystem
[    9.564591] XFS (dm-5): Starting recovery (logdev: internal)
[    9.666941] XFS (dm-5): Ending recovery (logdev: internal)
[    9.696815] XFS (dm-6): Mounting V4 Filesystem
[    9.718313] 

Bug#883938: linux-image-3.16.0-4-amd64: Kernel panic on boot after upgrading to debian 8.10 kernel 3.16.51

2017-12-10 Thread Marek Skrobacki
FWIW, I ran into the exact same kernel panic on an R810. Downgrading
to 3.16.43-2+deb8u5 "resolved" the problem.


-- Marek


Bug#883938: linux-image-3.16.0-4-amd64: Kernel panic on boot after upgrading to debian 8.10 kernel 3.16.51

2017-12-09 Thread Salvatore Bonaccorso
Hi

> I only have a terrible java KVM app available for seeing the console, and from
> what I can see, it logs with timestamps [0.811] through [0.841] (typing in
> from an image - excuse any typos, and leaving out long hexadecimal numbers
> that might not be interesting)

here is one captured from a console (for completeness):

[0.425445] general protection fault:  [#1] SMP
[0.485003] Modules linked in:
[0.521588] CPU: 0 PID: 1 Comm: swapper/0 Tainted: GW 3.16.0-4-amd64 #1 Debian 3.16.51-2
[0.630874] Hardware name: Sun Microsystems Sun Fire X4140/Sun Fire X4140, BIOS 0ABMN068 10/26/2009
[0.743251] task: 88021f4db2d0 ti: 88021f4f task.ti: 88021f4f
[0.832756] RIP: 0010:[]  [] build_sched_domains+0x72d/0xcf0
[0.937939] RSP: :88021f4f3df8  EFLAGS: 00010216
[1.001457] RAX:  RBX:  RCX: 0004
[1.086807] RDX: 00016d48 RSI:  RDI: 0200
[1.172154] RBP: 88021f6b2598 R08: 88021f6b3960 R09: 0124
[1.257488] R10:  R11: 88021f4f3b06 R12: 88021f6b3940
[1.342845] R13: 0200 R14: 88021f6a20c0 R15: 0200
[1.428191] FS:  () GS:880227c0() knlGS:
[1.524975] CS:  0010 DS:  ES:  CR0: 8005003b
[1.593668] CR2: 880427fff000 CR3: 01813000 CR4: 07f0
[1.679030] Stack:
[1.703015]  8802 88021f6b3958 88021f6b2500 
88021f6a20c0
[1.791889]     
880426f3c080
[1.880812]   f1c8  

[1.969682] Call Trace:
[1.998903]  [] ? sched_init_smp+0x398/0x452
[2.068660]  [] ? mutex_lock+0xe/0x2a
[2.131133]  [] ? put_online_cpus+0x23/0x80
[2.199847]  [] ? stop_machine+0x2c/0x40
[2.265451]  [] ? kernel_init_freeable+0xdd/0x1e1
[2.340404]  [] ? rest_init+0x80/0x80
[2.402867]  [] ? kernel_init+0xa/0xf0
[2.466382]  [] ? ret_from_fork+0x58/0x90
[2.533018]  [] ? rest_init+0x80/0x80
[2.595491] Code: c0 0f 85 46 05 00 00 48 8b 74 24 08 48 c7 c2 00 dd a6 81 bf ff ff ff ff e8 91 78 21 00 48 98 49 8b 56 10 48 8b 04 c5 a0 1e 8e 81 <48> 8b 14 10 b8 01 00 00 00 49 89 54 24 10 f0 0f c1 02 85 c0 75
[2.827913] RIP  [] build_sched_domains+0x72d/0xcf0
[2.905062]  RSP 
[2.946759] ---[ end trace d66b66b984085567 ]---
[3.001983] Kernel panic - not syncing: Attempted to kill init! exitcode=0x000b
[3.001983]
[3.111285] ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x000b
[3.111285]

Salvatore



Bug#883938: linux-image-3.16.0-4-amd64: Kernel panic on boot after upgrading to debian 8.10 kernel 3.16.51

2017-12-09 Thread debbug
On Sat, Dec 09, 2017 at 18:46:08 +0100, Rene Engelhard wrote:
> On Sat, Dec 09, 2017 at 06:23:04PM +0100, debbug wrote:
> > Package: src:linux
> > Version: 3.16.43-2+deb8u5
> > Severity: grave
> > Justification: renders package unusable
> > 
> > (Note: This bug affects version 3.16.51-2, not 3.16.43-2+deb8u5,
> > but that's the version that "reportbug" filled in after downgrading
> > to get this system back up running)
> 
> Then you should have edited the report accordingly...
> 

I was going to, but I wasn't sure about the proper version number or 
syntax to put in (3.16.51 or 3.16.51-2 or something else) and I didn't 
want to risk losing the bug in a void or failing to submit. 

Thanks for fixing up the version number!



Bug#883938: linux-image-3.16.0-4-amd64: Kernel panic on boot after upgrading to debian 8.10 kernel 3.16.51

2017-12-09 Thread Rene Engelhard
notfound 883938 3.16.43-2+deb8u5
found 883938 3.16.51-2
thanks

Hi,

On Sat, Dec 09, 2017 at 06:23:04PM +0100, debbug wrote:
> Package: src:linux
> Version: 3.16.43-2+deb8u5
> Severity: grave
> Justification: renders package unusable
> 
> (Note: This bug affects version 3.16.51-2, not 3.16.43-2+deb8u5,
> but that's the version that "reportbug" filled in after downgrading
> to get this system back up running)

Then you should have edited the report accordingly...

Regards,

Rene
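
For reference, the version correction at the top of this message uses the
BTS control commands notfound and found, terminated by thanks. A minimal
sketch of generating such a block is shown here; the function name is
hypothetical, and actually mailing the result to the control address is left
to the mail client.

```shell
# Hypothetical helper: print a BTS control block that moves a bug from a
# wrongly recorded version to the version that is actually affected.
bts_fix_version() {
    bug=$1
    wrong=$2
    right=$3
    printf 'notfound %s %s\n' "$bug" "$wrong"
    printf 'found %s %s\n' "$bug" "$right"
    printf 'thanks\n'
}

# Reproduces the three control lines used in this message.
bts_fix_version 883938 3.16.43-2+deb8u5 3.16.51-2
```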



Processed: Re: Bug#883938: linux-image-3.16.0-4-amd64: Kernel panic on boot after upgrading to debian 8.10 kernel 3.16.51

2017-12-09 Thread Debian Bug Tracking System
Processing commands for cont...@bugs.debian.org:

> notfound 883938 3.16.43-2+deb8u5
Bug #883938 [src:linux] linux-image-3.16.0-4-amd64: Kernel panic on boot after 
upgrading to debian 8.10 kernel 3.16.51
No longer marked as found in versions linux/3.16.43-2+deb8u5.
> found 883938 3.16.51-2
Bug #883938 [src:linux] linux-image-3.16.0-4-amd64: Kernel panic on boot after 
upgrading to debian 8.10 kernel 3.16.51
Marked as found in versions linux/3.16.51-2.
> thanks
Stopping processing here.

Please contact me if you need assistance.
-- 
883938: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=883938
Debian Bug Tracking System
Contact ow...@bugs.debian.org with problems



Bug#883938: linux-image-3.16.0-4-amd64: Kernel panic on boot after upgrading to debian 8.10 kernel 3.16.51

2017-12-09 Thread debbug
Package: src:linux
Version: 3.16.43-2+deb8u5
Severity: grave
Justification: renders package unusable

(Note: This bug affects version 3.16.51-2, not 3.16.43-2+deb8u5,
but that's the version that "reportbug" filled in after downgrading
to get this system back up running)

After upgrading from Debian 8.9 to Debian 8.10, this Dell PowerEdge R430 server
immediately and consistently throws a kernel panic on boot. (kernel 3.16.51-2)

Booting from debian-8.9.0-amd64-netinst.iso in rescue mode let me drop into 
a shell for the target installation, where I performed something like:
  cd /var/cache/apt/archives
  dpkg -i *3.16.43*deb8u5*
  reboot
which restored the previous kernel (3.16.43-2+deb8u5). This kernel works fine.
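
The recovery steps above can be sketched as a small script, to be run as root
from a shell inside the target installation. This is only a sketch under
stated assumptions: the function names are hypothetical, the older .deb must
still be present in /var/cache/apt/archives, and the "apt-mark hold" step is
an addition that was not part of the steps described in this report.

```shell
#!/bin/sh
# Sketch only: function names are made up for illustration.

# Print the first cached linux-image .deb in directory $1 whose file name
# contains the version string $2; return non-zero if there is none.
find_cached_kernel() {
    dir=$1
    want=$2
    for deb in "$dir"/linux-image-*"$want"*.deb; do
        if [ -e "$deb" ]; then
            printf '%s\n' "$deb"
            return 0
        fi
    done
    return 1
}

# Reinstall the older kernel from the apt cache, then hold the package
# (assumption, not in the report) so that the next upgrade does not pull
# the broken version back in.
restore_previous_kernel() {
    deb=$(find_cached_kernel /var/cache/apt/archives "$1") || {
        echo "no cached kernel package matching $1" >&2
        return 1
    }
    dpkg -i "$deb" && apt-mark hold linux-image-3.16.0-4-amd64
}

# Example (as root): restore_previous_kernel 3.16.43-2+deb8u5
```

Once a fixed kernel is available, "apt-mark unhold linux-image-3.16.0-4-amd64"
releases the hold again.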

I only have a terrible Java KVM app available for seeing the console. From what
I can see, it logs the following with timestamps [0.811] through [0.841] (typed
in from an image, so excuse any typos; I left out long hexadecimal numbers that
might not be interesting):
  general protection fault:  [#1] SMP
  CPU: 0 PID: 1 Comm: swapper/0 Tainted: G   W  3.16.0-4-amd64 #1 Debian 
3.16.51-2
  Hardware name: Dell PowerEdge R430/03XKDV, BIOS 1.2.6 06/08/2015
  task: 88085fa532d0 ti: 88085fa58000 task.ti: 88085fa58000
  RIP: 0010:[] [<8109be3d>] build_sched_domains+0x72d/0xcf0
  (lots of x86_64 registers)
  Call trace:
sched_init_smp+0x398/0x452
mutex_lock+0xe/0x2a
put_online_cpus+...
stop_machine+...
kernel_init_freeable+...
rest_init+...
kernel_init+...
ret_from_fork+...
rest_init+...
  Kernel panic - not syncing: Attempted to kill init! exitcode=0x000b
  ---[ end Kernel panic - not syncing: Attempted to kill init! 
exitcode=0x000b

I also stopped by #debian on irc.debian.org, where other people mentioned getting
similar (or the same) panics on different types of hardware (HP ProLiant DL380 G7
and a Supermicro server). Downgrading to 3.16.43-2+deb8u5 seemed to work for them
as well.

-- Package-specific info:
** Version:
Linux version 3.16.0-4-amd64 (debian-ker...@lists.debian.org) (gcc version 
4.8.4 (Debian 4.8.4-1) ) #1 SMP Debian 3.16.43-2+deb8u5 (2017-09-19)

** Command line:
BOOT_IMAGE=/vmlinuz-3.16.0-4-amd64 
root=UUID=89e3af9f-5bf1-4e93-99c0-0eca2f4fd312 ro quiet

** Tainted: O (4096)
 * Out-of-tree module has been loaded.

** Kernel log:
[5.957932] systemd[1]: Starting udev Kernel Device Manager...
[5.961475] ipmi message handler version 39.2
[5.961963] ipmi device interface
[5.962401] Copyright (C) 2004 MontaVista Software - IPMI Powerdown via 
sys_reboot.
[5.963211] IPMI System Interface driver.
[5.963253] ipmi_si: probing via SMBIOS
[5.963255] ipmi_si: SMBIOS: io 0xca8 regsize 1 spacing 4 irq 10
[5.963256] ipmi_si: Adding SMBIOS-specified kcs state machine
[5.963259] ipmi_si: Trying SMBIOS-specified kcs state machine at i/o 
address 0xca8, slave address 0x20, irq 10
[5.978757] systemd-udevd[396]: starting version 215
[5.978816] systemd[1]: Started udev Kernel Device Manager.
[5.978877] systemd[1]: Starting Copy rules generated while the root was 
ro...
[5.979249] systemd[1]: Starting LSB: Set preliminary keymap...
[6.010575] systemd[1]: Started Copy rules generated while the root was ro.
[6.031435] systemd[1]: Mounted FUSE Control File System.
[6.055061] wmi: Mapper loaded
[6.060621] input: Power Button as 
/devices/LNXSYSTM:00/LNXPWRBN:00/input/input3
[6.060626] ACPI: Power Button [PWRF]
[6.082592] ACPI Error: No handler for Region [SYSI] (88107ec271e8) 
[IPMI] (20140424/evregion-163)
[6.082600] ACPI Error: Region IPMI (ID=7) has no handler 
(20140424/exfldio-297)
[6.082605] ACPI Error: Method parse/execution failed [\_SB_.PMI0._GHL] 
(Node 88107ec26310), AE_NOT_EXIST (20140424/psparse-536)
[6.082617] ACPI Error: Method parse/execution failed [\_SB_.PMI0._PMC] 
(Node 88107ec262c0), AE_NOT_EXIST (20140424/psparse-536)
[6.082628] ACPI Exception: AE_NOT_EXIST, Evaluating _PMC 
(20140424/power_meter-755)
[6.120161] ipmi_si ipmi_si.0: Using irq 10
[6.123175] ipmi_si ipmi_si.0: Couldn't set irq info: cc.
[6.123177] ipmi_si ipmi_si.0: Maybe ok, but ipmi might run very slowly.
[6.142529] ipmi_si ipmi_si.0: Found new BMC (man_id: 0x0002a2, prod_id: 
0x0100, dev_id: 0x20)
[6.145569] IPMI poweroff: ATCA Detect mfg 0x2A2 prod 0x100
[6.145571] IPMI poweroff: Found a chassis style poweroff function
[6.145613] ipmi_si ipmi_si.0: IPMI kcs interface initialized
[6.189769] shpchp: Standard Hot Plug PCI Controller Driver version: 0.4
[6.229237] mei_me :00:16.0: Device doesn't have valid ME Interface
[6.237853] IPMI Watchdog: driver initialized
[6.288912] systemd[1]: Mounted Debug File System.
[6.288984] systemd[1]: Mounted Huge Pages File System.
[6.289004] systemd[1]: Mounted POSIX Message Queue File System.
[6.289448] systemd[1]: Started Increase datagram queue length.
[6.289850]