Re: update-grub causes a system lockup

2021-01-19 Thread John Paul Adrian Glaubitz
Hi Dennis!

On 1/12/21 5:58 PM, Dennis Clarke wrote:
> I was thinking that the architecture may be the issue. The age I mean.
> So I dragged out a newer Oracle T4 unit to try. I have no idea what will
> happen with the newer unit and have never tried to run the installer via
> the new SP/console serial interface but will give it a try.

There are known issues on older SPARC CPUs that are caused by a bug in the kernel.

However, I currently don't know how to reproduce the crash. If you have
something that reproducibly causes the kernel to crash on the old SPARC CPUs
and that I can use, it would be very helpful for fixing the problem.

Adrian

-- 
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer - glaub...@debian.org
`. `'   Freie Universitaet Berlin - glaub...@physik.fu-berlin.de
  `-GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913



Re: update-grub causes a system lockup

2021-01-12 Thread John Paul Adrian Glaubitz
On 1/12/21 6:18 PM, John Paul Adrian Glaubitz wrote:
> If you could find a reliable reproducer to trigger the crash, I can later use
> that to bisect the problem to find which particular commit introduced the
> regression.
> 
> So, if you want to help with the SPARC port, this would be an excellent 
> opportunity.

Alternatively, could you just send me your grub.conf which causes the crash when
running update-grub?

Adrian




Re: update-grub causes a system lockup

2021-01-12 Thread John Paul Adrian Glaubitz
On 1/12/21 5:58 PM, Dennis Clarke wrote:
>> Either way, it's good to have something that makes the bug reproducible.
>>
> 
> I was thinking that the architecture may be the issue. The age I mean.
> So I dragged out a newer Oracle T4 unit to try. I have no idea what will
> happen with the newer unit and have never tried to run the installer via
> the new SP/console serial interface but will give it a try.

There are known issues with kernel stability on older SPARC CPUs which have
not been resolved yet.

If you could find a reliable reproducer to trigger the crash, I can later use
that to bisect the problem to find which particular commit introduced the
regression.

So, if you want to help with the SPARC port, this would be an excellent 
opportunity.

Adrian




Re: update-grub causes a system lockup

2021-01-12 Thread Dennis Clarke
On 1/12/21 12:47 AM, John Paul Adrian Glaubitz wrote:
> On 1/12/21 1:39 AM, Dennis Clarke wrote:
>>
>> I made a few minor edits to /etc/default/grub and then :
>>
>> root@ceres:~# update-grub
>> [  303.211729] watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [grub-probe:261]
>> [  303.306793] Modules linked in: sg(E) envctrl(E) display7seg(E) flash(E) fuse(E) configfs(E) ip_tables(E) x_tables(E) autofs4(E) ext4(E) crc16(E) mbcache(E) jbd2(E) crc32c_generic(E) sd_mod(E) t10_pi(E) crc_t10dif(E) crct10dif_generic(E) crct10dif_common(E) ata_generic(E) pata_cmd64x(E) sym53c8xx(E) libata(E) scsi_transport_spi(E) scsi_mod(E) sunhme(E)
>> (...)
>> Also this has been happening for months.
> 
> I would suggest installing a 4.x kernel and seeing if that helps. I know that
> 5.x kernels can be a bit unstable on certain older SPARC machines.
> 
> If the issue goes away with an older kernel, try bisecting to find the commit
> that introduced the issue.
> 
> Either way, it's good to have something that makes the bug reproducible.
> 

I was thinking that the architecture may be the issue. The age I mean.
So I dragged out a newer Oracle T4 unit to try. I have no idea what will
happen with the newer unit and have never tried to run the installer via
the new SP/console serial interface but will give it a try.

Dennis



Re: update-grub causes a system lockup

2021-01-11 Thread John Paul Adrian Glaubitz
On 1/12/21 1:39 AM, Dennis Clarke wrote:
> 
> I made a few minor edits to /etc/default/grub and then :
> 
> root@ceres:~# update-grub
> [  303.211729] watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [grub-probe:261]
> [  303.306793] Modules linked in: sg(E) envctrl(E) display7seg(E) flash(E) fuse(E) configfs(E) ip_tables(E) x_tables(E) autofs4(E) ext4(E) crc16(E) mbcache(E) jbd2(E) crc32c_generic(E) sd_mod(E) t10_pi(E) crc_t10dif(E) crct10dif_generic(E) crct10dif_common(E) ata_generic(E) pata_cmd64x(E) sym53c8xx(E) libata(E) scsi_transport_spi(E) scsi_mod(E) sunhme(E)
> (...)
> Also this has been happening for months.

I would suggest installing a 4.x kernel and seeing if that helps. I know that
5.x kernels can be a bit unstable on certain older SPARC machines.

If the issue goes away with an older kernel, try bisecting to find the commit
that introduced the issue.
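
As a rough illustration of what bisecting does: given an ordered list of
builds where the first is known good and the last is known bad, a binary
search finds the first bad one in about log2(n) tests. The sketch below is
only a model of the idea. The build names and the `is_bad` check are
hypothetical placeholders; in practice `git bisect` walks the kernel commits,
and the check would be booting each kernel and running update-grub.

```python
from typing import Callable, Sequence


def first_bad(builds: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first build for which is_bad() is true.

    Assumes builds[0] is good, builds[-1] is bad, and every build after
    the first bad one is bad as well (the usual bisection assumption).
    """
    lo, hi = 0, len(builds) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(builds[mid]):
            hi = mid  # regression is at mid or earlier
        else:
            lo = mid  # regression is after mid
    return builds[hi]


# Hypothetical data: eight placeholder builds, with the lockup pretended
# to start at build-5. This says nothing about the real regression.
builds = [f"build-{i}" for i in range(8)]
tested = []


def is_bad(build: str) -> bool:
    tested.append(build)
    return builds.index(build) >= 5


print(first_bad(builds, is_bad), "found after", len(tested), "tests")
```

With eight builds this needs three test boots instead of up to six, which is
why a reliable reproducer matters: each step is only as trustworthy as the
good/bad verdict behind it.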

Either way, it's good to have something that makes the bug reproducible.

Adrian




update-grub causes a system lockup

2021-01-11 Thread Dennis Clarke


I made a few minor edits to /etc/default/grub and then :

root@ceres:~# update-grub
[  303.211729] watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [grub-probe:261]
[  303.306793] Modules linked in: sg(E) envctrl(E) display7seg(E) flash(E) fuse(E) configfs(E) ip_tables(E) x_tables(E) autofs4(E) ext4(E) crc16(E) mbcache(E) jbd2(E) crc32c_generic(E) sd_mod(E) t10_pi(E) crc_t10dif(E) crct10dif_generic(E) crct10dif_common(E) ata_generic(E) pata_cmd64x(E) sym53c8xx(E) libata(E) scsi_transport_spi(E) scsi_mod(E) sunhme(E)
[  303.716582] CPU: 0 PID: 261 Comm: grub-probe Tainted: GE 5.10.0-1-sparc64 #1 Debian 5.10.5-1
[  303.845889] TSTATE: 11001606 TPC: 0094c4f0 TNPC: 0094c4f4 Y: Tainted: GE
[  303.993559] TPC: 
[  304.043951] g0: f800068f5ec0 g1: 0098 g2:  g3: 0196df50
[  304.158439] g4: f8000ac388a0 g5: 5ff099f6 g6: f8000b6fc000 g7: 0ef10180
[  304.272918] o0: 00f24960 o1: f8000b6ff8ec o2: f800042833d0 o3: 
[  304.387399] o4:  o5:  sp: f8000b6fef81 ret_pc: 0094c4c0
[  304.506456] RPC: 
[  304.556875] l0: 00f24800 l1:  l2: 00664c00 l3: 000661c58e90
[  304.671360] l4: 0002 l5: f8000b6ff8f0 l6: 00e12000 l7: 0001
[  304.785838] i0: f8000ad93048 i1: f8000b47b600 i2: 00f24800 i3: 00f24978
[  304.900318] i4: 00ec i5: 10076818 i6: f8000b6ff031 i7: 00665838
[  305.014814] I7: 
[  305.066356] Call Trace:
[  305.098501] [<00665838>] chrdev_open+0x98/0x1e0
[  305.167245] [<0065ae30>] do_dentry_open+0x170/0x420
[  305.240529] [<0065ca68>] vfs_open+0x28/0x40
[  305.304691] [<00671348>] path_openat+0x988/0x1100
[  305.375707] [<00673dd0>] do_filp_open+0x50/0x100
[  305.445573] [<0065cd30>] do_sys_openat2+0x70/0x180
[  305.517732] [<0065d268>] sys_openat+0x48/0xc0
[  305.584186] [<00406174>] linux_sparc_syscall+0x34/0x44
~

At this point I have to signal a break to the console.

I am not yet sure exactly which binary causes this problem, but my
wild guess is that somewhere in /usr/sbin/grub-mkconfig we end up
with a show-stopping fault.  I am walking through it line by line
and trying to find the culprit.
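
The watchdog line in the trace names the stuck task (grub-probe, PID 261), so
the culprit can be read off a saved console log rather than guessed. A minimal
sketch of that, assuming the console output has been captured as plain text;
the function name `find_soft_lockups` and the dictionary keys are made up for
this example and are not part of any existing tool.

```python
import re

# Matches the watchdog message even when a mail client or terminal has
# wrapped the "[comm:pid]" part onto the next line (\s* matches a newline).
SOFT_LOCKUP = re.compile(
    r"soft lockup - CPU#(\d+) stuck for (\d+)s!\s*\[(.+?):(\d+)\]"
)


def find_soft_lockups(log_text):
    """Return one dict per soft-lockup report found in log_text."""
    return [
        {"cpu": int(cpu), "seconds": int(secs), "comm": comm, "pid": int(pid)}
        for cpu, secs, comm, pid in SOFT_LOCKUP.findall(log_text)
    ]


# The first two lines of the trace, wrapped as they appear in the mail.
sample = (
    "[  303.211729] watchdog: BUG: soft lockup - CPU#0 stuck for 22s!\n"
    "[grub-probe:261]\n"
)
print(find_soft_lockups(sample))
```

Run against the full console capture, this lists every task the watchdog
flagged, which helps show whether grub-probe is the only process that gets
stuck.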

Also this has been happening for months.



-- 
Dennis Clarke
RISC-V/SPARC/PPC/ARM/CISC
UNIX and Linux spoken
GreyBeard and suspenders optional