---------- Forwarded message ---------
From: Gabriel Francisco <frc.gabr...@gmail.com>
Date: Thu, Oct 12, 2023 at 8:23 PM
Subject: Re: Bug#1053122: linux-image-6.5.0-1-amd64: using smp_processor_id() in preemptible
To: Ben Hutchings <b...@decadent.org.uk>
Hi,

> The CPU registers contain several addresses starting ffff89, except for
> rbx which starts ffff99 (and is the faulting address). That looks like
> a single bit got flipped.

Thanks for the explanation! (Now I know how to detect bit flips.) :D

> The first BUG message should be more meaningful than what comes after.
> This shows the kernel tried to access non-existent memory.

Yes, I should have reported the first one. I overthought it and ended up
reporting the second one instead. Sorry about that.

> This could be due to a kernel bug, but is more likely a hardware
> problem. Please test the RAM with memtest86+. Also if you've enabled
> any overclocking options, turn those off.

Even with XMP (3000@1.35V) enabled on my F4-3000C16-16GISB modules,
memtest86+ ran for 3 hours and printed PASS on the screen. I have since
disabled the XMP profile and ordered new RAM to check whether my current
modules are faulty.

The message in dmesg appeared only once (but I reported it anyway).

The hang still happens with the 6.5.x kernel series, with or without XMP.
It occurs when maximizing a video (and occasionally when my cursor enters
the video area). It does not happen with the 6.1.x kernel series. I'm
using the amdgpu module.

Greetings,
*Gabriel Francisco*
Linux User #507840
email: frc.gabriel[at]gmail.com <frc.gabr...@gmail.com>

On Thu, Oct 5, 2023 at 1:15 AM Ben Hutchings <b...@decadent.org.uk> wrote:

> Control: retitle -1 linux-image-6.5.0-1-amd64: Kernel page fault in process exit due to bit flip
> Control: tag -1 moreinfo
>
> On Wed, 2023-09-27 at 20:45 +0200, Gabriel Francisco wrote:
> > Package: src:linux
> > Version: 6.5.3-1
> > Severity: important
> > Tags: upstream
> > X-Debbugs-Cc: frc.gabr...@gmail.com
> >
> > Dear Maintainer,
> >
> > First of all thanks for your hard work!
> >
> > I noticed my computer started freezing for a few seconds when
> > entering/exiting full-screen videos on YouTube in Firefox, and while
> > checking whether the issue also affected Chromium I saw the following
> > message in dmesg:
> >
> > [12569.564300] BUG: unable to handle page fault for address: ffff991989e936b8
> > [12569.564304] #PF: supervisor write access in kernel mode
> > [12569.564306] #PF: error_code(0x0002) - not-present page
>
> The first BUG message should be more meaningful than what comes after.
> This shows the kernel tried to access non-existent memory.
>
> > [12569.564308] PGD 0 P4D 0
> > [12569.564311] Oops: 0002 [#1] PREEMPT SMP NOPTI
> > [12569.564314] CPU: 10 PID: 328649 Comm: Chroot Helper Not tainted 6.5.0-1-amd64 #1 Debian 6.5.3-1
> > [12569.564317] Hardware name: ASUS System Product Name/ROG STRIX B550-F GAMING WIFI II, BIOS 3205 08/14/2023
> > [12569.564318] RIP: 0010:down_write+0x23/0x70
> > [12569.564324] Code: 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 53 48 89 fb e8 2e bc ff ff bf 01 00 00 00 e8 74 3a 53 ff 31 c0 ba 01 00 00 00 <f0> 48 0f b1 13 75 33 65 48 8b 04 25 80 29 03 00 48 89 43 08 bf 01
> > [12569.564326] RSP: 0018:ffffa189d736fc70 EFLAGS: 00010246
> > [12569.564328] RAX: 0000000000000000 RBX: ffff991989e936b8 RCX: ffff891797aaef00
> > [12569.564330] RDX: 0000000000000001 RSI: ffff891989e645c0 RDI: ffffffff8e7c95dc
> > [12569.564331] RBP: ffffffffffffffff R08: 0000000000000060 R09: 0000000080400014
> > [12569.564333] R10: ffff8918cbfeb7f8 R11: 0000000000000006 R12: 00007f7e5fd00000
> > [12569.564334] R13: 0000000000000001 R14: ffff891989e645c0 R15: ffff891989e64958
>
> The CPU registers contain several addresses starting ffff89, except for
> rbx which starts ffff99 (and is the faulting address). That looks like
> a single bit got flipped.
>
> This could be due to a kernel bug, but is more likely a hardware
> problem. Please test the RAM with memtest86+. Also if you've enabled
> any overclocking options, turn those off.
>
> [...]
> > After that the computer can't shut down and systemd keeps waiting on
> > process PID 328649 (Chroot Helper).
>
> This (and the other BUG messages) are because that process crashed in
> kernel mode and couldn't properly exit.
>
> Ben.
>
> --
> Ben Hutchings
> Beware of bugs in the above code;
> I have only proved it correct, not tried it. - Donald Knuth
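The bit-flip diagnosis in this thread can be reproduced with a little integer arithmetic: XOR two values and check whether the result is a power of two, which is true exactly when they differ in one bit. The Python sketch below applies that test to the top 24 bits (the ffff89/ffff99 prefixes Ben points to) of RBX and RSI as printed in the oops. The helper name `single_bit_index` and the choice of RSI as the reference pointer are illustrative assumptions, and a single-bit match is consistent with a memory error rather than proof of one.

```python
from typing import Optional

# Register values copied from the oops above.
RBX = 0xffff991989e936b8  # faulting address
RSI = 0xffff891989e645c0  # assumption: used as the reference "unflipped" pointer

PREFIX_SHIFT = 40  # keep only the top 24 bits (6 hex digits), e.g. 0xffff89


def single_bit_index(a: int, b: int) -> Optional[int]:
    """Return the index of the one bit where a and b differ, or None if they
    differ in zero bits or in two or more bits (hypothetical helper)."""
    diff = a ^ b
    # diff is a non-zero power of two exactly when a single bit differs
    if diff != 0 and diff & (diff - 1) == 0:
        return diff.bit_length() - 1
    return None


# Compare only the address prefixes, as in the explanation in the thread.
prefix_bit = single_bit_index(RBX >> PREFIX_SHIFT, RSI >> PREFIX_SHIFT)
if prefix_bit is None:
    print("prefixes differ in zero or several bits")
else:
    bit = prefix_bit + PREFIX_SHIFT          # position within the full 64-bit value
    corrected = RBX ^ (1 << bit)             # flip that bit back
    print(f"single flipped bit: {bit}")
    print(f"RBX with that bit restored: {corrected:#018x}")
```

Run as-is, it reports bit 44 and restores RBX to 0xffff891989e936b8, which carries the same ffff89 prefix as RCX, RSI, R10, R14 and R15.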