Public bug reported:
We started testing the jammy/linux-nvidia-6.2 kernels on NVIDIA servers
(DGX-1, DGX-2, H100) and hit the following warning during boot:
[ 7.690486] ------------[ cut here ]------------
[ 7.690487] Interrupts were enabled early
[ 7.690490] WARNING: CPU: 0 PID: 0 at init/main.c:1065 start_kernel+0x4da/0x540
[ 7.690498] Modules linked in:
[ 7.690501] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.2.0-1004-nvidia #4~22.04.1-Ubuntu
[ 7.690504] Hardware name: NVIDIA NVIDIA DGX-2/NVIDIA DGX-2, BIOS 0.29 06/07/2021
[ 7.690505] RIP: 0010:start_kernel+0x4da/0x540
[ 7.690508] Code: ff 48 c7 c7 e8 26 f0 97 e8 b3 59 a8 fd 0f 0b e9 96 fd ff ff e8 a7 1d 04 00 e9 7c fe ff ff 48 c7 c7 18 27 f0 97 e8 96 59 a8 fd <0f> 0b e9 ed fd ff ff 48 c7 c7 b0 26 f0 97 e8 83 59 a8 fd 0f 0b ff
[ 7.690510] RSP: 0000:ffffffff98803f08 EFLAGS: 00010246
[ 7.690512] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 7.690513] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[ 7.690514] RBP: ffffffff98803f20 R08: 0000000000000000 R09: 0000000000000000
[ 7.690515] R10: 0000000000000000 R11: 0000000000000000 R12: 00000000000000e0
[ 7.690516] R13: 000000005a1ccde0 R14: 000000005a1c7469 R15: 000000005a1d7ee0
[ 7.690518] FS: 0000000000000000(0000) GS:ffff964900600000(0000) knlGS:0000000000000000
[ 7.690520] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 7.690521] CR2: ffff970bfffff000 CR3: 000000ecd7810001 CR4: 00000000000606f0
[ 7.690522] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 7.690523] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 7.690524] Call Trace:
[ 7.690526] <TASK>
[ 7.690529] x86_64_start_kernel+0x102/0x180
[ 7.690536] secondary_startup_64_no_verify+0xe5/0xeb
[ 7.690544] </TASK>
[ 7.690544] ---[ end trace 0000000000000000 ]---
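Decoding the saved EFLAGS in the x86 dump (EFLAGS: 00010246) shows IF (bit 9) set, which is consistent with the warning: maskable interrupts really were enabled when start_kernel reached the check. A minimal Python sketch, assuming the standard EFLAGS bit positions from the Intel SDM; the helper names are hypothetical and only for illustration:

```python
# Sketch: decode the x86 EFLAGS value printed in a kernel register dump.
# Bit positions follow the Intel SDM (bit 1 is reserved and always reads 1,
# so it is not listed here).
EFLAGS_BITS = {
    0: "CF", 2: "PF", 4: "AF", 6: "ZF", 7: "SF",
    8: "TF", 9: "IF", 10: "DF", 11: "OF", 16: "RF",
}

IF_BIT = 9  # Interrupt Enable Flag: 1 means maskable interrupts are enabled


def decode_eflags(value: int) -> list:
    """Return the names of the known flags that are set, lowest bit first."""
    return [name for bit, name in sorted(EFLAGS_BITS.items()) if value & (1 << bit)]


def interrupts_enabled_x86(value: int) -> bool:
    """True when IF is set in the saved EFLAGS."""
    return bool(value & (1 << IF_BIT))


if __name__ == "__main__":
    eflags = 0x00010246  # "EFLAGS: 00010246" from the DGX-2 dump
    print(decode_eflags(eflags))           # ['PF', 'ZF', 'IF', 'RF']
    print(interrupts_enabled_x86(eflags))  # True
```

The WARN is raised by the ud2 instruction marked <0f> 0b in the Code line, so the dumped EFLAGS should reflect the state at the check itself.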
I also see essentially the same warning on some Ampere-based arm64
servers:
[ 0.000519] ------------[ cut here ]------------
[ 0.000521] Interrupts were enabled early
[ 0.000525] WARNING: CPU: 0 PID: 0 at init/main.c:1065 start_kernel+0x3ac/0x514
[ 0.000531] Modules linked in:
[ 0.000535] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.2.0-1004-nvidia #4~22.04.1-Ubuntu
[ 0.000538] pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 0.000540] pc : start_kernel+0x3ac/0x514
[ 0.000543] lr : start_kernel+0x3ac/0x514
[ 0.000545] sp : ffffdec5ff733e60
[ 0.000546] x29: ffffdec5ff733e60 x28: 00000819aa09baac x27: 0000403ffdd124e0
[ 0.000549] x26: 00000000bfdf3788 x25: 000000009b6fc000 x24: 00000000001dba7b
[ 0.000552] x23: 00005ec57c980000 x22: 00000819ab2a0000 x21: ffffdec5ff749140
[ 0.000555] x20: ffffdec5ff73d9c0 x19: ffffdec5ffbe4000 x18: ffffdec5ff74a1c8
[ 0.000558] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
[ 0.000560] x14: 0000000000000000 x13: 0a796c7261652064 x12: 656c62616e652065
[ 0.000563] x11: 656820747563205b x10: 2d2d2d2d2d2d2d2d x9 : 0000000000000000
[ 0.000565] x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000000000
[ 0.000568] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000000
[ 0.000571] x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000000
[ 0.000573] Call trace:
[ 0.000574] start_kernel+0x3ac/0x514
[ 0.000577] __primary_switched+0xc0/0xc8
[ 0.000580] ---[ end trace 0000000000000000 ]---
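The arm64 dump tells the same story. The kernel prints each PSTATE flag in uppercase when the bit is set and lowercase when it is clear, so an all-lowercase "daif" means none of the D/A/I/F mask bits were set, i.e. IRQs were unmasked at the check. The sketch below reproduces the start of the printed pstate line from 0x60400009, assuming the architectural bit positions (N..V at 31..28, D/A/I/F at 9..6, PAN at 22); the helper names are hypothetical:

```python
# Sketch: render an arm64 PSTATE value the way the kernel's register dump
# does, uppercase for a set bit and lowercase for a clear one.
# Assumed bit positions: N=31 Z=30 C=29 V=28, D=9 A=8 I=7 F=6, PAN=22.
NZCV_BITS = ((31, "n"), (30, "z"), (29, "c"), (28, "v"))
DAIF_BITS = ((9, "d"), (8, "a"), (7, "i"), (6, "f"))
PAN_BIT = 22
I_BIT = 7  # IRQ mask: 1 means IRQs are masked


def _flags(value: int, bits) -> str:
    # Uppercase letter if the bit is set, lowercase if it is clear.
    return "".join(ch.upper() if value & (1 << bit) else ch for bit, ch in bits)


def format_pstate(value: int) -> str:
    """Render the NZCV, DAIF and PAN fields of the pstate line."""
    pan = "+PAN" if value & (1 << PAN_BIT) else "-PAN"
    return f"{_flags(value, NZCV_BITS)} {_flags(value, DAIF_BITS)} {pan}"


def irqs_masked_arm64(value: int) -> bool:
    """True when the I bit is set, i.e. IRQs are masked."""
    return bool(value & (1 << I_BIT))


if __name__ == "__main__":
    pstate = 0x60400009  # "pstate: 60400009" from the Ampere dump
    print(format_pstate(pstate))      # nZCv daif +PAN
    print(irqs_masked_arm64(pstate))  # False
```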
The warning does not appear on an older ThunderX2 server.
** Affects: linux-nvidia-6.2 (Ubuntu)
Importance: Undecided
Status: New
--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-nvidia-6.2 in Ubuntu.
https://bugs.launchpad.net/bugs/2026891
Title:
linux-nvidia-6.2 on DGX servers: "WARNING: CPU: 0 PID: 0 at init/main.c:1065 start_kernel+0x4da/0x540"
Status in linux-nvidia-6.2 package in Ubuntu:
New
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux-nvidia-6.2/+bug/2026891/+subscriptions