---------- Forwarded message ---------
From: Gabriel Francisco <frc.gabr...@gmail.com>
Date: Thu, Oct 12, 2023 at 8:23 PM
Subject: Re: Bug#1053122: linux-image-6.5.0-1-amd64: using
smp_processor_id() in preemptible
To: Ben Hutchings <b...@decadent.org.uk>


Hi,

> The CPU registers contain several addresses starting ffff89, except for
> rbx which starts ffff99 (and is the faulting address).  That looks like
> a single bit got flipped.

Thanks for the explanation! (now I know how to detect bit flips) :D
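
A minimal sketch of that check in Python, using the register values from the oops quoted further down. The helper name `prefix_bit_flips` and the 40-bit shift (which keeps the first six hex digits of each address) are assumptions chosen for illustration, not anything the kernel provides.

```python
# Register values copied from the oops quoted in this thread.
RBX = 0xffff991989e936b8  # faulting address
OTHERS = {
    "RCX": 0xffff891797aaef00,
    "RSI": 0xffff891989e645c0,
    "R10": 0xffff8918cbfeb7f8,
    "R15": 0xffff891989e64958,
}

# Assumption: only the top 24 bits (first six hex digits) are compared,
# since the low bits legitimately differ between unrelated pointers.
PREFIX_SHIFT = 40


def prefix_bit_flips(a, b):
    """Return the bit positions (in the full 64-bit value) at which the
    upper parts of a and b differ."""
    diff = (a ^ b) >> PREFIX_SHIFT
    return [i + PREFIX_SHIFT for i in range(diff.bit_length()) if (diff >> i) & 1]


if __name__ == "__main__":
    for name, value in OTHERS.items():
        flips = prefix_bit_flips(RBX, value)
        print(f"RBX vs {name}: differing prefix bits {flips}")
```

Against every other pointer-like register the prefix of rbx differs in exactly one position, bit 44, which is what a single flipped bit would look like. This only suggests a bit flip; it does not prove where the flip happened.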

> The first BUG message should be more meaningful than what comes after.
> This shows the kernel tried to access non-existent memory.

Yes, I should have reported the first one. I overthought it and ended up
reporting the second one. Sorry about that.

> This could be due to a kernel bug, but is more likely a hardware
> problem.  Please test the RAM with memtest86+.  Also if you've enabled
> any overclocking options, turn those off.

Even with XMP (3000 @ 1.35 V) enabled (F4-3000C16-16GISB), memtest86+ ran for
3 hours and printed PASS on the screen.
I have since disabled the XMP profile for my memory and ordered new RAM to
check whether my current modules are faulty.

The message in dmesg appeared only once (but I reported it anyway).

The hang still happens with or without XMP when running the 6.5.x kernel
series. It happens when maximizing a video (or from time to time when my
cursor enters the video area). It does not happen with the 6.1.x kernel
series.

I'm using the amdgpu module.

Greetings,

*Gabriel Francisco*
Linux User #507840
email: frc.gabriel[at]gmail.com <frc.gabr...@gmail.com>


On Thu, Oct 5, 2023 at 1:15 AM Ben Hutchings <b...@decadent.org.uk> wrote:

> Control: retitle -1 linux-image-6.5.0-1-amd64: Kernel page fault in
> process exit due to bit flip
> Control: tag -1 moreinfo
>
> On Wed, 2023-09-27 at 20:45 +0200, Gabriel Francisco wrote:
> > Package: src:linux
> > Version: 6.5.3-1
> > Severity: important
> > Tags: upstream
> > X-Debbugs-Cc: frc.gabr...@gmail.com
> >
> > Dear Maintainer,
> >
> > First of all thanks for your hard work!
> >
> > I noticed my computer started freezing for a few seconds when
> > entering/exiting full-screen videos on YouTube using Firefox, and while
> > trying to check whether the issue also affected Chromium I saw the
> > following message in dmesg:
> >
> > [12569.564300] BUG: unable to handle page fault for address: ffff991989e936b8
> > [12569.564304] #PF: supervisor write access in kernel mode
> > [12569.564306] #PF: error_code(0x0002) - not-present page
>
> The first BUG message should be more meaningful than what comes after.
> This shows the kernel tried to access non-existent memory.
>
> > [12569.564308] PGD 0 P4D 0
> > [12569.564311] Oops: 0002 [#1] PREEMPT SMP NOPTI
> > [12569.564314] CPU: 10 PID: 328649 Comm: Chroot Helper Not tainted 6.5.0-1-amd64 #1  Debian 6.5.3-1
> > [12569.564317] Hardware name: ASUS System Product Name/ROG STRIX B550-F GAMING WIFI II, BIOS 3205 08/14/2023
> > [12569.564318] RIP: 0010:down_write+0x23/0x70
> > [12569.564324] Code: 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 53 48 89 fb e8 2e bc ff ff bf 01 00 00 00 e8 74 3a 53 ff 31 c0 ba 01 00 00 00 <f0> 48 0f b1 13 75 33 65 48 8b 04 25 80 29 03 00 48 89 43 08 bf 01
> > [12569.564326] RSP: 0018:ffffa189d736fc70 EFLAGS: 00010246
> > [12569.564328] RAX: 0000000000000000 RBX: ffff991989e936b8 RCX: ffff891797aaef00
> > [12569.564330] RDX: 0000000000000001 RSI: ffff891989e645c0 RDI: ffffffff8e7c95dc
> > [12569.564331] RBP: ffffffffffffffff R08: 0000000000000060 R09: 0000000080400014
> > [12569.564333] R10: ffff8918cbfeb7f8 R11: 0000000000000006 R12: 00007f7e5fd00000
> > [12569.564334] R13: 0000000000000001 R14: ffff891989e645c0 R15: ffff891989e64958
>
> The CPU registers contain several addresses starting ffff89, except for
> rbx which starts ffff99 (and is the faulting address).  That looks like
> a single bit got flipped.
>
> This could be due to a kernel bug, but is more likely a hardware
> problem.  Please test the RAM with memtest86+.  Also if you've enabled
> any overclocking options, turn those off.
>
> [...]
> > After that the computer can't shut down and systemd keeps waiting on
> > process PID 328649 (Chroot Helper).
>
> This (and the other BUG messages) are because that process crashed in
> kernel mode and couldn't properly exit.
>
> Ben.
>
> --
> Ben Hutchings
> Beware of bugs in the above code;
> I have only proved it correct, not tried it. - Donald Knuth
>
>
