Re: 5.17.0 boot issue on Miata
On Mon, Apr 25, 2022 at 10:26:46AM +0100, John Garry wrote:
> Please try v5.18-rc2 as it should have a fix in commit eaba83b5b850

Up and running on v5.18-rc5 as I type this. Fix confirmed. Thanks!

--Bob
Re: 5.17.0 boot issue on Miata
(Adding linux-scsi and linux-kernel, now that bisection is complete.)

On Wed, Apr 06, 2022 at 05:44:01PM -0500, Bob Tracy wrote:
> v5.17-rc2 ok. v5.17-rc3 I get the disk sector errors and hang that I
> reported in the first message in this thread.

This is on an Alpha Miata platform (PWS 433au) with QLogic ISP1020
controller.

Here's the implicated commit:

edb854a3680bacc9ef9b91ec0c5ff6105886f6f3 is the first bad commit
commit edb854a3680bacc9ef9b91ec0c5ff6105886f6f3
Author: Ming Lei
Date:   Thu Jan 27 23:37:33 2022 +0800

    scsi: core: Reallocate device's budget map on queue depth change

    We currently use ->cmd_per_lun as initial queue depth for setting up
    the budget_map. Martin Wilck reported that it is common for the
    queue_depth to be subsequently updated in slave_configure() based on
    detected hardware characteristics.

    As a result, for some drivers, the static host template settings for
    cmd_per_lun and can_queue won't actually get used in practice. And if
    the default values are used to allocate the budget_map, memory may be
    consumed unnecessarily.

    Fix the issue by reallocating the budget_map after ->slave_configure()
    returns. At that time the device queue_depth should accurately reflect
    what the hardware needs.

    Link: https://lore.kernel.org/r/20220127153733.409132-1-ming@redhat.com
    Cc: Bart Van Assche
    Reported-by: Martin Wilck
    Suggested-by: Martin Wilck
    Tested-by: Martin Wilck
    Reviewed-by: Martin Wilck
    Reviewed-by: Bart Van Assche
    Signed-off-by: Ming Lei
    Signed-off-by: Martin K. Petersen

 drivers/scsi/scsi_scan.c | 55 +++-
 1 file changed, 50 insertions(+), 5 deletions(-)

Respectfully,
--Bob
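For readers unfamiliar with the commit text above, the idea it describes can be pictured with a toy model. This is only an illustrative sketch in Python, not kernel code: the class and method names (ToyDevice, realloc_budget_map) are hypothetical, and a plain list stands in for whatever structure the kernel really uses for the budget map. It shows only the ordering the commit message describes: size the map from cmd_per_lun, let the driver adjust queue_depth, then resize the map.

```python
# Toy model only: NOT kernel code. All names here are hypothetical and
# exist purely to illustrate the ordering described in the commit message.

class ToyDevice:
    def __init__(self, cmd_per_lun):
        # Initial queue depth comes from the static template value.
        self.queue_depth = cmd_per_lun
        # One slot per command that may be in flight (a list stands in
        # for the real budget map structure).
        self.budget_map = [False] * cmd_per_lun

    def slave_configure(self, detected_depth):
        # The driver updates the depth based on what the hardware reports.
        self.queue_depth = detected_depth

    def realloc_budget_map(self):
        # Resize the map so that it matches the configured queue depth.
        if len(self.budget_map) != self.queue_depth:
            self.budget_map = [False] * self.queue_depth


dev = ToyDevice(cmd_per_lun=3)
dev.slave_configure(detected_depth=32)
stale_size = len(dev.budget_map)   # still sized from cmd_per_lun
dev.realloc_budget_map()
fresh_size = len(dev.budget_map)   # now follows queue_depth
```

The sketch says nothing about why the change broke this particular controller; it only restates what the commit message claims to do.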
Re: 5.17.0 boot issue on Miata
On Wed, Apr 06, 2022 at 05:44:01PM -0500, Bob Tracy wrote:
> v5.17-rc2 ok. v5.17-rc3 I get the disk sector errors and hang that I
> reported in the first message in this thread.
>
> I'm going to try a native build of '-rc3' just to rule out any
> cross-compiler strangeness. Should have something to report in another
> 34 hours or so :-(.

Confirmed: the native build was just as broken as the cross build. The
bug was introduced somewhere between v5.17-rc2 and v5.17-rc3. But at
least I have a bit more confidence in the integrity of what the cross
tools build.

Interesting aside: the cross build's vmlinux.gz was approx. 200k larger.
That might be due to gcc version differences (native toolchain is 11.2,
and the cross toolchain is 11.1).

I'll start the actual bisection process today. If I don't finish today,
it will be at least another week before I can get back to this, so
apologies in advance.

--Bob
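As a rough illustration of what the bisection between -rc2 and -rc3 involves, here is a minimal Python sketch of the binary search that bisection performs over a linear list of commits. The commit count (1000) and the position of the bad commit (700) are made-up placeholders, not figures from this thread; only the build times (about 2 hours cross, about 34 hours native) come from the messages here. The function name first_bad is hypothetical.

```python
def first_bad(commits, is_bad):
    """Binary search for the first bad commit.

    Assumes commits[0] is known good, commits[-1] is known bad, and
    every commit after the first bad one is also bad.
    Returns (index of first bad commit, number of kernels tested).
    """
    lo, hi = 0, len(commits) - 1   # lo: known good, hi: known bad
    tests = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        tests += 1                 # one build-and-boot cycle
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return hi, tests


# Hypothetical history: 1000 commits after the good tag, bug lands at #700.
commits = list(range(1001))
index, tests = first_bad(commits, lambda c: c >= 700)

cross_hours = tests * 2     # about 2 hours per cross build
native_hours = tests * 34   # about 34 hours per native build
```

Under these assumptions the search needs roughly log2(N) build-and-boot cycles, which is why being able to trust the cross build makes such a difference to the total time.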
Re: 5.17.0 boot issue on Miata
On Tue, Apr 05, 2022 at 08:22:48PM +0200, Helge Deller wrote:
> You don't need to enable it, but for alpha it's probably beneficial to
> enable it.
> When enabled, you will see a big speed improvement when logging in to a
> graphics text console and printing info. E.g. try "time dmesg" with and
> without that option...
> The "dmesg" will scroll the screen, and that's what it accelerates (only
> if the driver has such hardware bitblt-support).

v5.17-rc2 ok. v5.17-rc3 I get the disk sector errors and hang that I
reported in the first message in this thread. (Unrelated, I *did* enable
the framebuffer option, and that part of the boot worked just fine.)

I'm going to try a native build of '-rc3' just to rule out any
cross-compiler strangeness. Should have something to report in another
34 hours or so :-(.

--Bob
Re: 5.17.0 boot issue on Miata
On 4/5/22 15:55, Bob Tracy wrote:
> On Tue, Apr 05, 2022 at 05:01:25PM +1200, Michael Cree wrote:
>> On Mon, Apr 04, 2022 at 08:42:38PM -0500, Bob Tracy wrote:
>>> On Sun, Mar 27, 2022 at 11:21:57AM +1300, Michael Cree wrote:
>>>> On Thu, Mar 24, 2022 at 09:54:15PM -0500, Bob Tracy wrote:
>>>>> When I attempt to boot a 5.17.0 kernel built from the kernel.org
>>>>> sources, I see disk sector errors on my "sda" device, and the boot
>>>>> process hangs at the point where "systemd-udevd.service" starts.
>>>>>
>>>>> Rebooting on 5.16.0 works with no disk I/O errors of any kind.
>>>>
>>>> Oh, you can run a 5.16.y kernel on Alpha? I have had problems
>>>> with everything since 5.9.y with rare, random, corruptions in
>>>> memory in user space (exhibiting as glibc detected memory
>>>> corruptions or segfaults).
>>>
>>> Did we have this painted into the "SMP vs. not-SMP" corner at one point?
>>
>> No, this affects both ES45 (with 3 cpus) and XP1000 (one cpu).
>>
>> The problem is rare. I often have to run tests for 12 hours on
>> the XP1000 before I see a problem. On miata it might occur even
>> less often.
>>
>> I hope I am getting close to the bad commit, but it is taking
>> time when I run testing for a whole day before I feel confident
>> enough to mark the kernel as good. And I have been wrong on
>> that one a couple of times now, having to repeat part of the
>> bisection.
>
> Stupid question, but possibly related to what I'm seeing in v5.17-final.
> Beginning with "-rc3" there's a new FRAMEBUFFER_CONSOLE_LEGACY_ACCELERATION
> configuration option. Do I need this enabled on Miata if I normally
> boot in a video mode that displays a logo? I'll try "no" for the "-rc3"
> build if/when "-rc2" boots properly.

You don't need to enable it, but for alpha it's probably beneficial to
enable it.
When enabled, you will see a big speed improvement when logging in to a
graphics text console and printing info. E.g. try "time dmesg" with and
without that option...
The "dmesg" will scroll the screen, and that's what it accelerates (only
if the driver has such hardware bitblt-support).

Helge
Re: 5.17.0 boot issue on Miata
On Tue, Apr 05, 2022 at 05:01:25PM +1200, Michael Cree wrote:
> On Mon, Apr 04, 2022 at 08:42:38PM -0500, Bob Tracy wrote:
> > On Sun, Mar 27, 2022 at 11:21:57AM +1300, Michael Cree wrote:
> > > On Thu, Mar 24, 2022 at 09:54:15PM -0500, Bob Tracy wrote:
> > > > When I attempt to boot a 5.17.0 kernel built from the kernel.org
> > > > sources, I see disk sector errors on my "sda" device, and the boot
> > > > process hangs at the point where "systemd-udevd.service" starts.
> > > >
> > > > Rebooting on 5.16.0 works with no disk I/O errors of any kind.
> > >
> > > Oh, you can run a 5.16.y kernel on Alpha? I have had problems
> > > with everything since 5.9.y with rare, random, corruptions in
> > > memory in user space (exhibiting as glibc detected memory
> > > corruptions or segfaults).
> >
> > Did we have this painted into the "SMP vs. not-SMP" corner at one point?
>
> No, this affects both ES45 (with 3 cpus) and XP1000 (one cpu).
>
> The problem is rare. I often have to run tests for 12 hours on
> the XP1000 before I see a problem. On miata it might occur even
> less often.
>
> I hope I am getting close to the bad commit, but it is taking
> time when I run testing for a whole day before I feel confident
> enough to mark the kernel as good. And I have been wrong on
> that one a couple of times now, having to repeat part of the
> bisection.

Stupid question, but possibly related to what I'm seeing in v5.17-final.
Beginning with "-rc3" there's a new FRAMEBUFFER_CONSOLE_LEGACY_ACCELERATION
configuration option. Do I need this enabled on Miata if I normally
boot in a video mode that displays a logo? I'll try "no" for the "-rc3"
build if/when "-rc2" boots properly.

--Bob
Re: 5.17.0 boot issue on Miata
On Mon, Apr 04, 2022 at 08:42:38PM -0500, Bob Tracy wrote:
> On Sun, Mar 27, 2022 at 11:21:57AM +1300, Michael Cree wrote:
> > On Thu, Mar 24, 2022 at 09:54:15PM -0500, Bob Tracy wrote:
> > > When I attempt to boot a 5.17.0 kernel built from the kernel.org
> > > sources, I see disk sector errors on my "sda" device, and the boot
> > > process hangs at the point where "systemd-udevd.service" starts.
> > >
> > > Rebooting on 5.16.0 works with no disk I/O errors of any kind.
> >
> > Oh, you can run a 5.16.y kernel on Alpha? I have had problems
> > with everything since 5.9.y with rare, random, corruptions in
> > memory in user space (exhibiting as glibc detected memory
> > corruptions or segfaults).
>
> Did we have this painted into the "SMP vs. not-SMP" corner at one point?

No, this affects both ES45 (with 3 cpus) and XP1000 (one cpu).

The problem is rare. I often have to run tests for 12 hours on
the XP1000 before I see a problem. On miata it might occur even
less often.

I hope I am getting close to the bad commit, but it is taking
time when I run testing for a whole day before I feel confident
enough to mark the kernel as good. And I have been wrong on
that one a couple of times now, having to repeat part of the
bisection.

Cheers,
Michael.
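The difficulty described above (marking a kernel "good" after a day of testing and later finding that was wrong) can be put in rough numbers. The sketch below is only a back-of-the-envelope model and rests on a strong assumption that is not from the thread: that on a bad kernel the failure arrives at a constant random rate, so the time to first failure is exponentially distributed. The 12-hour mean is borrowed loosely from the "12 hours on the XP1000" remark; the function names are hypothetical.

```python
import math

def p_false_good(test_hours, mean_hours_to_failure):
    """Chance that a BAD kernel survives the whole test window and is
    wrongly marked good, under the assumed exponential model."""
    return math.exp(-test_hours / mean_hours_to_failure)

def p_any_false_good(test_hours, mean_hours_to_failure, bad_steps):
    """Chance that at least one of `bad_steps` bad kernels tested during
    a bisection is wrongly marked good."""
    p = p_false_good(test_hours, mean_hours_to_failure)
    return 1.0 - (1.0 - p) ** bad_steps


# Assumed numbers: 24-hour test window, 12-hour mean time to failure,
# and 5 bisection steps that happen to land on bad kernels.
single = p_false_good(24, 12)            # equals exp(-2)
overall = p_any_false_good(24, 12, 5)
```

If the model is even roughly right, a single wrong "good" somewhere in a bisection is not unlikely, which fits the report of having to repeat part of it. Longer test windows shrink that risk exponentially but lengthen every step.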
Re: 5.17.0 boot issue on Miata
On Sun, Mar 27, 2022 at 11:21:57AM +1300, Michael Cree wrote:
> On Thu, Mar 24, 2022 at 09:54:15PM -0500, Bob Tracy wrote:
> > When I attempt to boot a 5.17.0 kernel built from the kernel.org
> > sources, I see disk sector errors on my "sda" device, and the boot
> > process hangs at the point where "systemd-udevd.service" starts.
> >
> > Rebooting on 5.16.0 works with no disk I/O errors of any kind.
>
> Oh, you can run a 5.16.y kernel on Alpha? I have had problems
> with everything since 5.9.y with rare, random, corruptions in
> memory in user space (exhibiting as glibc detected memory
> corruptions or segfaults).

Did we have this painted into the "SMP vs. not-SMP" corner at one point?
Miata is an automatic not-SMP case for hand-built kernels for that
architecture, which might explain why I'm not seeing the problems with
user space memory corruption.

Just successfully booted on v5.17-rc1 a little while ago. Moving on to
"-rc2".

--Bob
Re: 5.17.0 boot issue on Miata
On Thu, Mar 24, 2022 at 09:54:15PM -0500, Bob Tracy wrote:
> When I attempt to boot a 5.17.0 kernel built from the kernel.org
> sources, I see disk sector errors on my "sda" device, and the boot
> process hangs at the point where "systemd-udevd.service" starts.
>
> Rebooting on 5.16.0 works with no disk I/O errors of any kind.

Oh, you can run a 5.16.y kernel on Alpha? I have had problems
with everything since 5.9.y with rare, random, corruptions in
memory in user space (exhibiting as glibc detected memory
corruptions or segfaults). This is why I am still running a 5.8.y
kernel on the Debian Ports buildd.

I just compiled up a 5.16.y kernel and the problem is still there.
It did take a bit to trigger the bug (about 10 hours of testing
before it happened).

I had done a bisection between 5.8.0 and 5.9.0 last year but I think
it went astray (as testing is difficult and not foolproof). Your email
has prompted me to go back to it and see if I can nail down the
offending commit. We really want to get it fixed.

Cheers,
Michael.
5.17.0 boot issue on Miata
When I attempt to boot a 5.17.0 kernel built from the kernel.org
sources, I see disk sector errors on my "sda" device, and the boot
process hangs at the point where "systemd-udevd.service" starts.

Rebooting on 5.16.0 works with no disk I/O errors of any kind.

Assuming the 5.17.0 kernel or its associated initrd had bad sectors, I
rebuilt both and saw no I/O errors during the build nor afterward when
copying the new kernel into place under "/boot". Even tried a
cross-compile build of a 5.17.0 alpha kernel on my x86_64 platform to
save build time (34 hours for a native build on a PWS 433au vs. 2 hours
on the x86_64 platform). That build produced identical results when I
tried booting on it.

If anyone else is seeing this and can get a head-start on bisecting,
that would be very much appreciated. I won't be able to get to it for
about a week and a half :-(. 5.16.0 works. 5.17.0 doesn't. Might get
lucky and find that the offending changes happened in the first 5.17.0
release candidate.

As always, sincere thanks in advance.

--Bob