Re: Linux 6.1.27, cgroup: Instruction fault 4 with systemd
Hi,

let me add some additional data points:

After some testing on different machines and with different kernel types, it looks like this problem is exclusive to MP kernels. It also occurs when running an MP kernel on a single-processor machine (tested on an AlphaServer 800 5/400 w/EV56). Running an SP kernel does not trigger the problem.

I posted a diff between the -alpha-generic and -alpha-smp kernel configurations on [1].

[1]: https://pastebin.com/AwZQjHD9

On 22.05.23 11:37, John Paul Adrian Glaubitz wrote:
Hello Frank!

On Mon, 2023-05-22 at 11:34 +0200, Frank Scheiner wrote:
Maybe someone on linux-alpha has an idea what could be the reason?

Try reproducing it with libcgroup to see if it's a systemd or a kernel bug:

https://wiki.archlinux.org/title/cgroups#Examples

Took me a while to get back to this and actually get it working...

Following misc. examples and manpages (e.g. [2] and [3]), I did the following to test cgroup functionality with System V init installed and running instead of systemd:

```
root@ds25:~# uname -a
Linux ds25 6.3.0-1-alpha-smp #1 SMP Debian 6.3.7-1 (2023-06-12) alpha GNU/Linux
root@ds25:~# mount
[...]
cgroup on /sys/fs/cgroup type tmpfs (rw,relatime,mode=755,inode64)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,relatime,cpuset)
cgroup on /sys/fs/cgroup/cpu type cgroup (rw,relatime,cpu)
[...]
cgroup on /sys/fs/cgroup/rdma type cgroup (rw,relatime,rdma)
cgroup on /sys/fs/cgroup/misc type cgroup (rw,relatime,misc)
root@ds25:~# CGROUP=/sys/fs/cgroup
root@ds25:~# mkdir $CGROUP/red
root@ds25:~# mount -t cgroup -o cpuset red $CGROUP/red
root@ds25:~# mkdir -p $CGROUP/red/shells/bash
root@ds25:~# chown root:root $CGROUP/red/shells/bash/*
root@ds25:~# id johndoe
uid=1001(johndoe) gid=1001(johndoe) groups=1001(johndoe),100(users)
root@ds25:~# chown root:johndoe $CGROUP/red/shells/bash/tasks
root@ds25:~# echo $(cgget -n -v -r cpuset.mems /) > $CGROUP/red/shells/cpuset.mems
root@ds25:~# echo $(cgget -n -v -r cpuset.cpus /) > $CGROUP/red/shells/cpuset.cpus
root@ds25:~# echo 0 > $CGROUP/red/shells/bash/cpuset.mems
root@ds25:~# echo 0 > $CGROUP/red/shells/bash/cpuset.cpus
root@ds25:~# cat /proc/self/cgroup
13:misc:/
12:rdma:/
11:pids:/
10:net_prio:/
9:perf_event:/
8:net_cls:/
7:freezer:/
6:devices:/
5:memory:/
4:blkio:/
3:cpuacct:/
2:cpu:/
1:cpuset:/
root@ds25:~# echo $$
1496
root@ds25:~# cgexec -g cpuset:shells/bash bash
root@ds25:~# echo $$
1695
root@ds25:~# cat /proc/self/cgroup
13:misc:/
[...]
2:cpu:/
1:cpuset:/shells/bash
```

[2]: https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/resource_management_guide/ch-using_control_groups
[3]: https://wiki.archlinux.org/title/cgroups#Examples

I then ran `7za b` in that shell and, although `7za` runs two threads assuming it has access to both CPUs, `htop` showed both of them running on the first processor only. So it looks like at least this part of the cgroup functionality is working with Linux 6.3.0-1 from Debian when using System V init.

So it could be that this problem is only triggered by one or more specific controllers. But I don't know exactly how to determine the controller(s) used for the target "graphical.target", where this seems to happen according to the following (see [4] for more details):

```
[...]
[ 11.864251] systemd[1]: Queued start job for default target graphical.target.
[ 11.958978] CPU 1
[ 11.958978] systemd(1): Instruction fault 4
[...]
```

[4]: https://lists.debian.org/debian-alpha/2023/05/msg00012.html

Cheers,
Frank
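On the open question of which controllers are involved, one low-tech starting point is to compare what the kernel reports per process under both init systems. The following is a minimal sketch and not something taken from the thread: it parses `/proc/<pid>/cgroup`-style text (the `ID:controllers:path` lines shown in the transcript, plus the `0::/path` form used by the unified hierarchy) into a mapping. The function name `parse_proc_cgroup` and the key `"unified"` are made up for illustration.

```python
def parse_proc_cgroup(text):
    """Map each controller name to its cgroup path.

    Accepts the contents of /proc/<pid>/cgroup. Lines look like
    "1:cpuset:/shells/bash" (v1) or "0::/init.scope" (v2, where the
    controller field is empty; stored here under the key "unified").
    """
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line == "[...]":
            continue  # skip blanks and elision markers from pasted logs
        # Split only twice: the path itself may contain colons.
        _hierarchy_id, controllers, path = line.split(":", 2)
        if controllers == "":
            result["unified"] = path
            continue
        # v1 can co-mount controllers on one hierarchy, e.g. "cpu,cpuacct".
        for name in controllers.split(","):
            result[name] = path
    return result


sample = "13:misc:/\n[...]\n2:cpu:/\n1:cpuset:/shells/bash\n"
print(parse_proc_cgroup(sample))
```

Feeding it the contents of `/proc/1/cgroup` under systemd and under System V init would show which hierarchy layout each setup uses. On the unified hierarchy, the controllers enabled for a cgroup's children are listed in that cgroup's `cgroup.subtree_control` file, so reading those files is another way to narrow down the candidates.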
Re: [PATCH] module: fix module load for ia64
On 29.05.23 01:00, Song Liu wrote: Frank reported boot regression in ia64 as: ELILO v3.16 for EFI/IA-64 .. Uncompressing Linux... done Loading file AC100221.initrd.img...done [0.00] Linux version 6.4.0-rc3 (root@x4270) (ia64-linux-gcc (GCC) 12.2.0, GNU ld (GNU Binutils) 2.39) #1 SMP Thu May 25 15:52:20 CEST 2023 [0.00] efi: EFI v1.1 by HP [0.00] efi: SALsystab=0x3ee7a000 ACPI 2.0=0x3fe2a000 ESI=0x3ee7b000 SMBIOS=0x3ee7c000 HCDP=0x3fe28000 [0.00] PCDP: v3 at 0x3fe28000 [0.00] earlycon: uart8250 at MMIO 0xf405 (options '9600n8') [0.00] printk: bootconsole [uart8250] enabled [0.00] ACPI: Early table checksum verification disabled [0.00] ACPI: RSDP 0x3FE2A000 28 (v02 HP) [0.00] ACPI: XSDT 0x3FE2A02C CC (v01 HP rx2620 HP ) [...] [3.793350] Run /init as init process Loading, please wait... Starting systemd-udevd version 252.6-1 [3.951100] [ cut here ] [3.951100] WARNING: CPU: 6 PID: 140 at kernel/module/main.c:1547 __layout_sections+0x370/0x3c0 [3.949512] Unable to handle kernel paging request at virtual address 1000 [3.951100] Modules linked in: [3.951100] CPU: 6 PID: 140 Comm: (udev-worker) Not tainted 6.4.0-rc3 #1 [3.956161] (udev-worker)[142]: Oops 11003706212352 [1] [3.951774] Hardware name: hp server rx2620 , BIOS 04.29 11/30/2007 [3.951774] [3.951774] Call Trace: [3.958339] Unable to handle kernel paging request at virtual address 1000 [3.956161] Modules linked in: [3.951774] [] show_stack.part.0+0x30/0x60 [3.951774] sp=e00183a67b20 bsp=e00183a61628 [3.956161] [3.956161] which bisect to module_memory change [1]. Debug showed that ia64 uses some special sections: __layout_sections: section .got (sh_flags 1002) matched to MOD_INVALID __layout_sections: section .sdata (sh_flags 1003) matched to MOD_INVALID __layout_sections: section .sbss (sh_flags 1003) matched to MOD_INVALID All these sections are loaded to module core memory before [1]. Fix ia64 boot by loading these sections to MOD_DATA (core rw data). 
[1] commit ac3b43283923 ("module: replace module_layout with module_memory") Fixes: ac3b43283923 ("module: replace module_layout with module_memory") Reported-by: Frank Scheiner Closes: https://lists.debian.org/debian-ia64/2023/05/msg00010.html Closes: https://marc.info/?l=linux-ia64&m=168509859125505 Cc: Linus Torvalds Signed-off-by: Song Liu --- kernel/module/main.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/kernel/module/main.c b/kernel/module/main.c index b4c7e925fdb0..9da4b551321e 100644 --- a/kernel/module/main.c +++ b/kernel/module/main.c @@ -1521,14 +1521,14 @@ static void __layout_sections(struct module *mod, struct load_info *info, bool i MOD_RODATA, MOD_RO_AFTER_INIT, MOD_DATA, - MOD_INVALID,/* This is needed to match the masks array */ + MOD_DATA, }; static const int init_m_to_mem_type[] = { MOD_INIT_TEXT, MOD_INIT_RODATA, MOD_INVALID, MOD_INIT_DATA, - MOD_INVALID,/* This is needed to match the masks array */ + MOD_INIT_DATA, }; for (m = 0; m < ARRAY_SIZE(masks); ++m) { Just want to add another observation (though not strictly ia64 but I wanted to keep the context): Testing showed that this patch also fixes module loading for alpha (tested on an AlphaServer DS25 w/v6.4-rc4 w/ and w/o the patch applied). Cheers, Frank
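To make the effect of the two changed table entries easier to see, here is a simplified model of a first-match section classifier. It is a sketch under assumptions, not the kernel code: the mask table, the flag values for `SHF_RO_AFTER_INIT` and `ARCH_SHF_SMALL`, and the leading `MOD_TEXT` entry are filled in from my reading of the loop around this hunk and are not shown in the patch itself. The `sh_flags` values in the debug output above are truncated in this archive, so the example builds its own flag combinations.

```python
# ELF section flag bits. SHF_WRITE/ALLOC/EXECINSTR are standard ELF values;
# the other two are ASSUMED values for this sketch.
SHF_WRITE = 0x1
SHF_ALLOC = 0x2
SHF_EXECINSTR = 0x4
SHF_RO_AFTER_INIT = 0x00200000  # assumption: kernel-internal flag
ARCH_SHF_SMALL = 0x10000000     # assumption: arch "small data" flag

# (required bits, forbidden bits), tried in order; the first match wins.
# ASSUMED layout, modelled on the masks array the patch comment refers to.
MASKS = [
    (SHF_EXECINSTR | SHF_ALLOC, ARCH_SHF_SMALL),
    (SHF_ALLOC, SHF_WRITE | ARCH_SHF_SMALL),
    (SHF_RO_AFTER_INIT | SHF_ALLOC, ARCH_SHF_SMALL),
    (SHF_WRITE | SHF_ALLOC, ARCH_SHF_SMALL),
    (ARCH_SHF_SMALL | SHF_ALLOC, 0),
]

# Core memory types before and after the patch (MOD_TEXT entry assumed).
CORE_BEFORE = ["MOD_TEXT", "MOD_RODATA", "MOD_RO_AFTER_INIT", "MOD_DATA", "MOD_INVALID"]
CORE_AFTER = ["MOD_TEXT", "MOD_RODATA", "MOD_RO_AFTER_INIT", "MOD_DATA", "MOD_DATA"]


def classify(sh_flags, table):
    """Return the memory type of the first mask the section flags satisfy."""
    for (required, forbidden), mem_type in zip(MASKS, table):
        if (sh_flags & required) == required and not (sh_flags & forbidden):
            return mem_type
    return None  # section is not laid out at all


sdata_like = SHF_WRITE | SHF_ALLOC | ARCH_SHF_SMALL
print(classify(sdata_like, CORE_BEFORE), "->", classify(sdata_like, CORE_AFTER))
```

In this model a section carrying the small-data flag is rejected by the first four masks and only matches the last one, which is why the value in the last table slot decides where `.got`, `.sdata` and `.sbss` end up.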
Re: systemd 252.6-1 produces an Instruction fault, sysvinit works
Dear Michael,

On 21.05.23 02:11, Michael Cree wrote:
On Thu, May 18, 2023 at 01:01:25PM +0200, Frank Scheiner wrote:
Welcome to Debian GNU/Linux 12 (bookworm)!
[ 11.958978] CPU 1
[ 11.958978] systemd(1): Instruction fault 4
[ 12.032220] pc = [] ra = [] ps = Not tainted
[ 12.131829] pc is at 0xfc0005163bfc
[ 12.177728] ra is at 0xfc0005163bf8

Yeah, I think I have seen this one too when I tried out the new kernel.

Instruction fault 4 occurs on the execution of a reserved instruction (i.e. an invalid opcode) or the execution of a privileged instruction in user mode. Interestingly the program counter is in kernel space. That raises the question: is it the kernel that tried to execute an invalid instruction?

Looking at the code:

[ 13.559563] Code:
[ 13.559563] fc00
[ 13.582024]
[ 13.610344]
[ 13.638664] 05163bfc
[ 13.666985] fc00
[ 13.695305] 02871148
[ 13.723625]
[ 13.751946]
[ 13.779289]

These do not appear to be valid code. They look more like addresses. It has the appearance that the kernel has jumped into data and tried to execute it as code!

Thanks for your explanations and analysis, that's more than helpful.

Maybe report it to the Linux Kernel Mailing List?

The thread is here:

https://marc.info/?l=linux-alpha&m=168474811430816&w=2

I would have posted a link to the debian-alpha mailing list, but the assumed greylisting there has so far prevented my message from appearing.

Cheers,
Frank
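Michael's reading of the "Code:" dump can be checked mechanically. An Alpha instruction is a 32-bit word whose major opcode sits in bits 31..26, so a small decoder shows what the CPU would have made of these words. This is a rough sketch: the set of reserved opcodes (0x01 to 0x07) is an assumption from memory of the architecture's opcode table, and only the two words that survive intact in this archive are used.

```python
# ASSUMPTION for this sketch: Alpha major opcodes 0x01..0x07 are reserved.
RESERVED_OPCODES = set(range(0x01, 0x08))


def alpha_opcode(word):
    """Return the major opcode (bits 31..26) of a 32-bit instruction word."""
    return (word >> 26) & 0x3F


def looks_like_data(words):
    """True if any word decodes to an opcode in RESERVED_OPCODES."""
    return any(alpha_opcode(w) in RESERVED_OPCODES for w in words)


# The two complete words that survive in the "Code:" dump.
dump = [0x05163BFC, 0x02871148]
for w in dump:
    print(f"{w:08x}: opcode 0x{alpha_opcode(w):02x}")
print("looks like data:", looks_like_data(dump))
```

The word at the faulting position, `05163bfc`, decodes to opcode 0x01, which would fit an instruction fault under the assumption above. The two words also match the low halves of the `pc` and `gp` values printed in the register dump, which supports the observation that these are addresses and not code.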
Linux 6.1.27, cgroup: Instruction fault 4 with systemd
Dear all, as already outlined on the debian-alpha mailing list ([1]), I get an instruction fault 4 with Linux 6.1.27 (6.1.0-9 on Debian actually) and systemd on my DS25: ``` aboot: Linux/Alpha SRM bootloader version 1.0_pre20040408 aboot: switching to OSF/1 PALcode version 1.92 aboot: loading initrd (5376720 bytes/10502 blocks) at 0xfc00ffacc000 aboot: starting kernel network with arguments root=/dev/nfs ip=:enP2p2s5:dhcp console=ttyS0,9600n8 [0.00] Linux version 6.1.0-9-alpha-smp (debian-ker...@lists.debian.org) (gcc-12 (Debian 12.2.0-9) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40) #1 SMP Debian 6.1.27-1 (2023-05-08) [0.00] Booting GENERIC on Titan variation Granite using machine vector PRIVATEER from SRM [0.00] Major Options: SMP MAGIC_SYSRQ [0.00] Command line: root=/dev/nfs ip=:enP2p2s5:dhcp console=ttyS0,9600n8 [...] Begin: Running /scripts/nfs-bottom ... done. Begin: Running /scripts/init-bottom ... done. [9.820307] systemd[1]: systemd 252.6-1 running in system mode (+PAM +AUDIT +SELINUX +APPARMOR +IMA +SMACK -SECCOMP +GCRYPT -GNUTLS +OPENSSL +ACL +BLKID +CURL +ELFUTILS +FIDO2 +IDN2 -IDN +IPTC +KMOD +LIBCRYPTSETUP +LIBFDISK +PCRE2 -PWQUALITY +P11KIT +QRENCODE +TPM2 +BZIP2 +LZ4 +XZ +ZLIB +ZSTD -BPF_FRAMEWORK -XKBCOMMON +UTMP +SYSVINIT default-hierarchy=unified) [ 10.202143] systemd[1]: Detected architecture alpha. Welcome to Debian GNU/Linux 12 (bookworm)! [ 11.864251] systemd[1]: Queued start job for default target graphical.target. 
[ 11.958978] CPU 1 [ 11.958978] systemd(1): Instruction fault 4 [ 12.032220] pc = [] ra = [] ps = Not tainted [ 12.131829] pc is at 0xfc0005163bfc [ 12.177728] ra is at 0xfc0005163bf8 [ 12.223626] v0 = t0 = 0023 t1 = fc00066eb800 [ 12.310540] t2 = fc000512e680 t3 = 00f0 t4 = 0008 [ 12.398431] t5 = 0001 t6 = t7 = fc000516 [ 12.486321] a0 = a1 = fc0005163bc0 a2 = fc0005163bf8 [ 12.573235] a3 = 0001 a4 = 0002c8cf86cc a5 = 0001 [ 12.661126] t8 = 0080 t9 = 0001 t10= fc0002891148 [ 12.749016] t11= pv = fc00011d4a40 at = 5f19e10505e118bf [ 12.835930] gp = fc0002871148 sp = 440a695e [ 12.899407] Disabling lock debugging due to kernel taint [ 12.962883] Trace: [ 12.987298] [] cgroup_migrate_execute+0x338/0x600 [ 13.062493] [] cgroup_update_dfl_csses+0x2c8/0x330 [ 13.138665] [] cgroup_subtree_control_write+0x56c/0x5e0 [ 13.219719] [] cgroup_file_write+0xa4/0x1a0 [ 13.288079] [] kernfs_fop_write_iter+0x1a4/0x330 [ 13.362297] [] vfs_write+0x250/0x4c0 [ 13.423821] [] ksys_write+0x8c/0x140 [ 13.485344] [] entSys+0xac/0xc0 [ 13.541985] [ 13.559563] Code: [ 13.559563] fc00 [ 13.582024] [ 13.610344] [ 13.638664] 05163bfc [ 13.666985] fc00 [ 13.695305] 02871148 [ 13.723625] [ 13.751946] [ 13.779289] ``` [1]: https://lists.debian.org/debian-alpha/2023/05/msg7.html Checking with a few alternatives, this already seems to happen with Linux 6.0.7 and systemd 251.6-1 and 250.4-1. When using sysvinit, the system comes up OK and runs stable over a few runs of `7z b` and `openssl speed -elapsed`. It does also not happen when using Linux 5.3.0-3 from Debian with the same systemd versions on the same machine. Michael provided a first analysis on [2], Adrian locates it in the cgroup code. [2]: https://lists.debian.org/debian-alpha/2023/05/msg00010.html Maybe someone on linux-alpha has an idea what could be the reason? Cheers, Frank
systemd 252.6-1 produces an Instruction fault, sysvinit works
Hi all, subject says it all: I yesterday upgraded my root FS(es) on my DS25 and noticed the following issue with the systemd version right where the login prompt should appear (I seem to remember that I recognized something similar already late last year with a self-compiled kernel but attributed it to the kernel being self-compiled): ``` aboot: Linux/Alpha SRM bootloader version 1.0_pre20040408 aboot: switching to OSF/1 PALcode version 1.92 aboot: loading initrd (5376720 bytes/10502 blocks) at 0xfc00ffacc000 aboot: starting kernel network with arguments root=/dev/nfs ip=:enP2p2s5:dhcp console=ttyS0,9600n8 [0.00] Linux version 6.1.0-9-alpha-smp (debian-ker...@lists.debian.org) (gcc-12 (Debian 12.2.0-9) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40) #1 SMP Debian 6.1.27-1 (2023-05-08) [0.00] Booting GENERIC on Titan variation Granite using machine vector PRIVATEER from SRM [0.00] Major Options: SMP MAGIC_SYSRQ [0.00] Command line: root=/dev/nfs ip=:enP2p2s5:dhcp console=ttyS0,9600n8 [...] Begin: Running /scripts/nfs-bottom ... done. Begin: Running /scripts/init-bottom ... done. [9.820307] systemd[1]: systemd 252.6-1 running in system mode (+PAM +AUDIT +SELINUX +APPARMOR +IMA +SMACK -SECCOMP +GCRYPT -GNUTLS +OPENSSL +ACL +BLKID +CURL +ELFUTILS +FIDO2 +IDN2 -IDN +IPTC +KMOD +LIBCRYPTSETUP +LIBFDISK +PCRE2 -PWQUALITY +P11KIT +QRENCODE +TPM2 +BZIP2 +LZ4 +XZ +ZLIB +ZSTD -BPF_FRAMEWORK -XKBCOMMON +UTMP +SYSVINIT default-hierarchy=unified) [ 10.202143] systemd[1]: Detected architecture alpha. Welcome to Debian GNU/Linux 12 (bookworm)! [ 11.864251] systemd[1]: Queued start job for default target graphical.target. 
[ 11.958978] CPU 1 [ 11.958978] systemd(1): Instruction fault 4 [ 12.032220] pc = [] ra = [] ps = Not tainted [ 12.131829] pc is at 0xfc0005163bfc [ 12.177728] ra is at 0xfc0005163bf8 [ 12.223626] v0 = t0 = 0023 t1 = fc00066eb800 [ 12.310540] t2 = fc000512e680 t3 = 00f0 t4 = 0008 [ 12.398431] t5 = 0001 t6 = t7 = fc000516 [ 12.486321] a0 = a1 = fc0005163bc0 a2 = fc0005163bf8 [ 12.573235] a3 = 0001 a4 = 0002c8cf86cc a5 = 0001 [ 12.661126] t8 = 0080 t9 = 0001 t10= fc0002891148 [ 12.749016] t11= pv = fc00011d4a40 at = 5f19e10505e118bf [ 12.835930] gp = fc0002871148 sp = 440a695e [ 12.899407] Disabling lock debugging due to kernel taint [ 12.962883] Trace: [ 12.987298] [] cgroup_migrate_execute+0x338/0x600 [ 13.062493] [] cgroup_update_dfl_csses+0x2c8/0x330 [ 13.138665] [] cgroup_subtree_control_write+0x56c/0x5e0 [ 13.219719] [] cgroup_file_write+0xa4/0x1a0 [ 13.288079] [] kernfs_fop_write_iter+0x1a4/0x330 [ 13.362297] [] vfs_write+0x250/0x4c0 [ 13.423821] [] ksys_write+0x8c/0x140 [ 13.485344] [] entSys+0xac/0xc0 [ 13.541985] [ 13.559563] Code: [ 13.559563] fc00 [ 13.582024] [ 13.610344] [ 13.638664] 05163bfc [ 13.666985] fc00 [ 13.695305] 02871148 [ 13.723625] [ 13.751946] [ 13.779289] ``` With the sysvinit version of the root FS and initramfs everything works (did some `7z b` runs and an `openssl speed -elapsed` w/o an issue after booting, too): ``` [...] INIT: Entering runlevel: 2 Using makefile-style concurrent boot in runlevel 2. Setting up console font and keymap...done. [ 55.348604] loop0: detected capacity change from 0 to 2097152 [ 55.389620] Adding 1048568k swap on /dev/loop0. Priority:-2 extents:1 across:1048568k SSFS Starting enhanced syslogd: rsyslogd. Starting SMP IRQ Balancer: irqbalance. Starting periodic command scheduler: cron. Loading cpufreq kernel modules...done (none). Starting system message bus: dbus. Starting OpenBSD Secure Shell server: sshd. 
[ 57.059541] exim4(1425): unaligned trap at 02900088: 5f9dc335 28 2
[ 57.060517] exim4(1425): unaligned trap at 028fff14: 5f9dc335 28 1
[ 57.062470] exim4(1425): unaligned trap at 02900088: 5f9dc335 28 2
Starting MTA: exim4.
Debian GNU/Linux 12 ds25 ttyS0
ds25 login: [ 63.064420] do_entUnaUser: 15 callbacks suppressed
[ 63.064420] exim4(1445): unaligned trap at 02900088: 979d736e 28 2
[ 63.212858] exim4(1445): unaligned trap at 028fff14: 979d736e 28 1
[ 63.304655] exim4(1445): unaligned trap at 02900088: 979d736e 28 2
[ 63.395475] exim4(1445): unaligned trap at 028fff14: 979d736e 28 1
[ 63.488248] exim4(1445): unaligned trap at 02900088: 979d736e 28 2
ds25 login:
```

I'll try to downgrade systemd with some versions from snapshot.debian.org.

UPDATE: I did downgrade to 251.6-1 and 250.4-1, neither of which helped with Linux 6.1.0-9, though. Interestingly when booting with Linux 5.3.0-3 eve
Re: Bug#1036158: gcc-13: Please raise baseline for alpha to EV56
Hi all,

On 17.05.23 11:27, John Paul Adrian Glaubitz wrote:
Hi Michael!

On Tue, 2023-05-16 at 20:25 +1200, Michael Cree wrote:
On Tue, May 16, 2023 at 09:38:56AM +0200, John Paul Adrian Glaubitz wrote:
After a long discussion on IRC and the mailing list, we have agreed to raise the baseline for the alpha architecture to EV56 to improve the generated code and fix a number of issues.

The change is already being implemented in the glibc packages which switches to EV56 [1] since hwcaps are no longer available with glibc 2.37 [2].

Could you raise the baseline for gcc on alpha to EV56? I assume, it should be "--with-cpu=ev56" or "--with-arch=ev56".

Yes, please! I suggest the following in debian/rules2:

ifneq (,$(findstring alpha,$(DEB_TARGET_ARCH)))
  CONFARGS += --with-cpu=ev56 --with-tune=ev6
endif

(the --with-tune only affects instruction scheduling and better tunes code for ev6 and more recent machines, but allows execution down to ev56.)

I have tested this in the past with a rebuild of most packages that are in the base essential chroot and it works well.

Doesn't that come with a speed penalty for EV56 machines? I'm asking because EV56 is currently the baseline for QEMU when emulating Alpha.

How high will it be? I have some numbers from my PWS 500au for 7z 16.02 and openssl 3.0.7, so we could compare that later on.

With everything below EV56 dropped, I'd say let's get everything out of the later (real) machines and use "ev67" here. Even my DS20E already uses EV67s.

UPDATE: Reading through [1] and [2], it looks like there's no difference between EV6 and EV67 for instruction scheduling. So fine as proposed.

[1]: https://gcc.gnu.org/onlinedocs/gcc/DEC-Alpha-Options.html
[2]: https://uprrp2.uprrp.edu/help?key=SQLPRE72~SQLPRE_Command_Line~Arguments~ARCHITECTURE&title=VMS%20Help&referer=

Cheers,
Frank
Re: Unversioned symbols when building kernel package
Hi Adrian,

On 04.01.23 22:15, John Paul Adrian Glaubitz wrote:
Hello!

I just tried to build the Debian kernel package for alpha which fails with:

debian/bin/buildcheck.py debian/build/build_alpha_none_alpha-generic alpha none alpha-generic
ABI is not completely versioned! Refusing to continue.
Unversioned symbols:
strcat module: vmlinux, version: 0x, export: EXPORT_SYMBOL
strcpy module: vmlinux, version: 0x, export: EXPORT_SYMBOL
strncat module: vmlinux, version: 0x, export: EXPORT_SYMBOL
strncpy module: vmlinux, version: 0x, export: EXPORT_SYMBOL
Can't read ABI reference. ABI not checked!
make[2]: *** [debian/rules.real:218: debian/stamps/build_alpha_none_alpha-generic] Error 1
make[2]: Leaving directory '/<>'
make[1]: *** [debian/rules.gen:426: build-arch_alpha_none_alpha-generic_real] Error 2
make[1]: Leaving directory '/<>'
make: *** [debian/rules:39: build-arch] Error 2
dpkg-buildpackage: error: debian/rules binary-arch subprocess returned exit status 2

According to this comment by Ben [1], this is an issue that is trivially fixed by adding the appropriate header to arch/$ARCH/include/asm-prototypes.h.

However, looking at the header file, "#include " is already present so I'm not sure what else we're missing.

Indeed, and that file ([1]) hasn't been touched in 5 years. I wonder whether this is actually an error in the Debian build scripts.

[1]: https://github.com/torvalds/linux/blob/master/arch/alpha/include/asm/string.h

BTW the same error is already present for 5.19.6-1 ([2]); that build ran on 2022-09-18. Maybe the error corresponds with a change in the Debian linux repo ([3]) before that date.

[2]: https://buildd.debian.org/status/fetch.php?pkg=linux&arch=alpha&ver=5.19.6-1&stamp=1663530012&raw=0
[3]: https://salsa.debian.org/kernel-team/linux

Looking at [4], the message "Can't read ABI reference." could indicate a missing file: the filename is defined in line 51 ([5]), and `/debian/abi/` does not exist in [3].

[4]: https://salsa.debian.org/kernel-team/linux/-/blob/master/debian/bin/buildcheck.py#L54-L72
[5]: https://salsa.debian.org/kernel-team/linux/-/blob/master/debian/bin/buildcheck.py#L51

Cheers,
Frank
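To check for unversioned exports independently of the Debian tooling, one can scan the kernel's `Module.symvers` directly. The sketch below is not the logic of `buildcheck.py`; it only assumes the usual `Module.symvers` layout of tab-separated fields (CRC, symbol name, module, export type) and treats a CRC of zero as "unversioned", which is presumably what the truncated `version: 0x` in the log above stands for. The function name `find_unversioned` is made up for illustration.

```python
def find_unversioned(symvers_text):
    """Return names of exported symbols whose CRC field is zero.

    Expects Module.symvers-style lines: tab-separated fields starting
    with the CRC (e.g. 0x12345678), then the symbol name and the module.
    """
    unversioned = []
    for line in symvers_text.splitlines():
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        crc, symbol = fields[0], fields[1]
        if int(crc, 16) == 0:
            unversioned.append(symbol)
    return unversioned


sample = (
    "0x00000000\tstrcat\tvmlinux\tEXPORT_SYMBOL\t\n"
    "0x1a2b3c4d\tmemcpy\tvmlinux\tEXPORT_SYMBOL\t\n"
)
print(find_unversioned(sample))
```

Running this against the `Module.symvers` of an alpha build and of a build for another architecture would show whether the four string functions are the only symbols affected.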
Re: glibc regression on alpha with 2.34+
Hi Adrian,

On 15.12.22 11:06, Frank Scheiner wrote:
Hi,

On 15.12.22 11:02, John Paul Adrian Glaubitz wrote:
Hi!

On 12/15/22 10:49, Frank Scheiner wrote:
Maybe adding [1] might help, but the patch actually removes it.

It's missing this hunk:

diff --git a/sysdeps/unix/sysv/linux/sysconf-sigstksz.h b/sysdeps/unix/sysv/linux/sysconf-sigstksz.h
index 64d450b22c..4552e77d59 100644
--- a/sysdeps/unix/sysv/linux/sysconf-sigstksz.h
+++ b/sysdeps/unix/sysv/linux/sysconf-sigstksz.h
@@ -21,7 +21,7 @@
 static long int
 sysconf_sigstksz (void)
 {
-  long int minsigstacksize = GLRO(dl_minsigstacksize);
+  long int minsigstacksize = 4096 ; //GLRO(dl_minsigstacksize);
   assert (minsigstacksize != 0);
   _Static_assert (__builtin_constant_p (MINSIGSTKSZ),
                   "MINSIGSTKSZ is constant");

I was experimenting with a custom sysconf-sigstksz.h like on ia64 which I forgot to purge, sorry.

Ok, I will use this and run it again.

I renamed the build directory to reflect that the build was optimized for EV67. The result confirms your findings:

```
root@ds15:/srv/storage/build# LD_LIBRARY_PATH=$PWD/glibc-2.34-plus-patch-ev67 /bin/bash
Floating point exception
```

Cheers,
Frank
Re: glibc regression on alpha with 2.34+
Hi,

On 15.12.22 11:02, John Paul Adrian Glaubitz wrote:
Hi!

On 12/15/22 10:49, Frank Scheiner wrote:
Maybe adding [1] might help, but the patch actually removes it.

It's missing this hunk:

diff --git a/sysdeps/unix/sysv/linux/sysconf-sigstksz.h b/sysdeps/unix/sysv/linux/sysconf-sigstksz.h
index 64d450b22c..4552e77d59 100644
--- a/sysdeps/unix/sysv/linux/sysconf-sigstksz.h
+++ b/sysdeps/unix/sysv/linux/sysconf-sigstksz.h
@@ -21,7 +21,7 @@
 static long int
 sysconf_sigstksz (void)
 {
-  long int minsigstacksize = GLRO(dl_minsigstacksize);
+  long int minsigstacksize = 4096 ; //GLRO(dl_minsigstacksize);
   assert (minsigstacksize != 0);
   _Static_assert (__builtin_constant_p (MINSIGSTKSZ),
                   "MINSIGSTKSZ is constant");

I was experimenting with a custom sysconf-sigstksz.h like on ia64 which I forgot to purge, sorry.

Ok, I will use this and run it again.

Cheers,
Frank
Re: glibc regression on alpha with 2.34+
Hi, On 15.12.22 09:09, John Paul Adrian Glaubitz wrote: Hi! On 12/14/22 21:44, Frank Scheiner wrote: I'm attaching the second diff as a patch. I think there's some whitespace difference. I manually applied the rejected stuff, made a `git diff` and comparing that to your attached patch gives: Or just use the attached patch file from my previous mail. Yeah, did that in the end to be sure, but it looks like both are incomplete (because both versions gave the following result): ``` root@ds15:/srv/storage/glibc# git status HEAD detached at glibc-2.34 nothing to commit, working tree clean root@ds15:/srv/storage/glibc# patch -p1 < ../../bz20305-workaround2.patch patching file elf/dl-sysdep.c patching file elf/rtld_static_init.c patching file sysdeps/generic/ldsodefs.h patching file sysdeps/unix/sysv/linux/sysconf-pthread_stack_min.h patching file sysdeps/unix/sysv/linux/sysconf.c root@ds15:/srv/storage/glibc# cd ../build/glibc-2.34-plus-patch/ root@ds15:/srv/storage/build/glibc-2.34-plus-patch# CC="alpha-linux-gnu-gcc-12 -mcpu=ev67 -mtune=ev67 " CXX="alpha-linux-gnu-g++-12 -mcpu=ev67 -mtune=ev67 " MIG="alpha-linux-gnu-mig" ../../glibc/configure --host=alphaev67-linux-gnu --disable-werror --prefix=/usr --disable-sanity-checks [...] root@ds15:/srv/storage/build/glibc-2.34-plus-patch# time make [...] 
alpha-linux-gnu-gcc-12 -mcpu=ev67 -mtune=ev67 ../sysdeps/unix/sysv/linux/alpha/sysconf.c -c -std=gnu11 -fgnu89-inline -g -O2 -Wall -Wwrite-strings -Wundef -fmerge-all-constants -frounding-math -fno-stack-protector -fno-common -Wstrict-prototypes -Wold-style-definition -fmath-errno -mlong-double-128 -mieee -mfp-rounding-mode=d -fexceptions -DGETCONF_DIR='"/usr/libexec/getconf"' -ftls-model=initial-exec -I../include -I/srv/storage/build/glibc-2.34-plus-patch/posix -I/srv/storage/build/glibc-2.34-plus-patch -I../sysdeps/unix/sysv/linux/alpha/alphaev67/fpu -I../sysdeps/alpha/alphaev67/fpu -I../sysdeps/unix/sysv/linux/alpha/alphaev67 -I../sysdeps/unix/sysv/linux/alpha/fpu -I../sysdeps/alpha/fpu -I../sysdeps/unix/sysv/linux/alpha -I../sysdeps/alpha/nptl -I../sysdeps/unix/sysv/linux/wordsize-64 -I../sysdeps/ieee754/ldbl-64-128 -I../sysdeps/ieee754/ldbl-opt -I../sysdeps/unix/sysv/linux/include -I../sysdeps/unix/sysv/linux -I../sysdeps/nptl -I../sysdeps/pthread -I../sysdeps/gnu -I../sysdeps/unix/inet -I../sysdeps/unix/sysv -I../sysdeps/unix/alpha -I../sysdeps/unix -I../sysdeps/posix -I../sysdeps/alpha/alphaev67 -I../sysdeps/alpha/alphaev6 -I../sysdeps/alpha/alphaev5 -I../sysdeps/alpha -I../sysdeps/wordsize-64 -I../sysdeps/ieee754/ldbl-128 -I../sysdeps/ieee754/dbl-64 -I../sysdeps/ieee754/flt-32 -I../sysdeps/ieee754 -I../sysdeps/generic -I.. -I../libio -I. 
-D_LIBC_REENTRANT -include /srv/storage/build/glibc-2.34-plus-patch/libc-modules.h -DMODULE_NAME=libc -include ../include/libc-symbols.h -DTOP_NAMESPACE=glibc -o /srv/storage/build/glibc-2.34-plus-patch/posix/sysconf.o -MD -MP -MF /srv/storage/build/glibc-2.34-plus-patch/posix/sysconf.o.dt -MT /srv/storage/build/glibc-2.34-plus-patch/posix/sysconf.o In file included from ../sysdeps/alpha/ldsodefs.h:40, from ../sysdeps/gnu/ldsodefs.h:46, from ../sysdeps/unix/sysv/linux/ldsodefs.h:25, from ../sysdeps/unix/sysv/linux/sysconf.c:29, from ../sysdeps/unix/sysv/linux/alpha/sysconf.c:127: ../sysdeps/unix/sysv/linux/sysconf-sigstksz.h: In function ‘sysconf_sigstksz’: ../sysdeps/generic/ldsodefs.h:512:21: error: ‘_dl_minsigstacksize’ undeclared (first use in this function); did you mean ‘minsigstacksize’? 512 | # define GLRO(name) _##name | ^ ../sysdeps/unix/sysv/linux/sysconf-sigstksz.h:24:30: note: in expansion of macro ‘GLRO’ 24 | long int minsigstacksize = GLRO(dl_minsigstacksize); | ^~~~ ../sysdeps/generic/ldsodefs.h:512:21: note: each undeclared identifier is reported only once for each function it appears in 512 | # define GLRO(name) _##name | ^ ../sysdeps/unix/sysv/linux/sysconf-sigstksz.h:24:30: note: in expansion of macro ‘GLRO’ 24 | long int minsigstacksize = GLRO(dl_minsigstacksize); | ^~~~ In file included from ../sysdeps/unix/sysv/linux/sysconf.c:30: ../sysdeps/unix/sysv/linux/sysconf-sigstksz.h: At top level: ../sysdeps/unix/sysv/linux/sysconf-sigstksz.h:22:1: warning: ‘sysconf_sigstksz’ defined but not used [-Wunused-function] 22 | sysconf_sigstksz (void) | ^~~~ make[2]: *** [/srv/storage/build/glibc-2.34-plus-patch/sysd-rules:179: /srv/storage/build/glibc-2.34-plus-patch/posix/sysconf.o] Error 1 make[2]: Leaving directory '/srv/storage/glibc/posix' make[1]: *** [Makefile:478: posix/subdir_lib] Error 2 make[1]: Leaving directory '/srv/storage/glibc' make: *** [Makefile:9: all] Error 2
```
Re: glibc regression on alpha with 2.34+
On 14.12.22 21:32, John Paul Adrian Glaubitz wrote: Hi! On 12/14/22 21:16, Frank Scheiner wrote: I'll do that tomorrow. The thing is that this diff doesn't apply cleanly: Which version of the workaround diff did you use? There are two. There is one that applies cleanly on top of 6c57d320484988e87e446e2e60ce42816bf51d53 and a second one that applies cleanly on top of glibc-2.34, I posted both. There were some changes between 6c57d320484988e87e446e2e60ce42816bf51d53 and glibc-2.34 in the minstksize/stksize code which is why you need the second diff that was also part of my mail. I used the one from the bottom of your mail, just below "Interestingly, when I checkout the tag glibc-2.34 and disabled the _dl_minsigstacksize symbol in "struct rtld_global_ro {}" again with the following hack, I'm no longer getting a segfault but a floating point exception: " I'm attaching the second diff as a patch. I think there's some whitespace difference. I manually applied the rejected stuff, made a `git diff` and comparing that to your attached patch gives: ``` root@nfs:/srv/nfs/ds15/root/srv# diff -Nur glibc-fix-2.patch bz20305-workaround2.patch --- glibc-fix-2.patch 2022-12-14 21:24:01.259696291 +0100 +++ bz20305-workaround2.patch 2022-12-14 21:37:25.439904377 +0100 @@ -1,5 +1,5 @@ diff --git a/elf/dl-sysdep.c b/elf/dl-sysdep.c -index d47bef1340..d3dc6e5c57 100644 +index d47bef1340..8462e5859a 100644 --- a/elf/dl-sysdep.c +++ b/elf/dl-sysdep.c @@ -116,10 +116,10 @@ _dl_sysdep_start (void **start_argptr, @@ -12,7 +12,7 @@ - GLRO(dl_minsigstacksize) = CONSTANT_MINSIGSTKSZ; + /* /\* NB: Default to a constant CONSTANT_MINSIGSTKSZ. 
*\/ */ + /* _Static_assert (__builtin_constant_p (CONSTANT_MINSIGSTKSZ), */ -+ /* "CONSTANT_MINSIGSTKSZ is constant"); */ ++ /*"CONSTANT_MINSIGSTKSZ is constant"); */ + /* GLRO(dl_minsigstacksize) = CONSTANT_MINSIGSTKSZ; */ for (av = GLRO(dl_auxv); av->a_type != AT_NULL; set_seen (av++)) @@ -25,8 +25,8 @@ - GLRO(dl_minsigstacksize) = av->a_un.a_val; - break; + /* case AT_MINSIGSTKSZ: */ -+ /* GLRO(dl_minsigstacksize) = av->a_un.a_val; */ -+ /* break; */ ++ /* GLRO(dl_minsigstacksize) = av->a_un.a_val; */ ++ /* break; */ DL_PLATFORM_AUXV } ``` ...so I think we're covered, unless the difference in the index line is important. I'll compile that tomorrow and see what happens. Cheers, Frank
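Whether two patch files differ only in whitespace and in their `index` lines can also be checked without reading a diff of diffs. The following is a small sketch with hypothetical helper names: it drops the `index` lines (the blob hashes there change whenever any byte of the file changes, and as far as I know `patch -p1` does not need them) and collapses runs of whitespace before comparing.

```python
import re


def normalize_patch(text):
    """Drop 'index' lines and collapse whitespace runs on every line."""
    lines = []
    for line in text.splitlines():
        if line.startswith("index "):
            continue  # blob hashes differ as soon as any byte differs
        lines.append(re.sub(r"\s+", " ", line).rstrip())
    return lines


def same_modulo_whitespace(patch_a, patch_b):
    """True if both patches are equal after normalize_patch()."""
    return normalize_patch(patch_a) == normalize_patch(patch_b)


a = "index d47bef1340..d3dc6e5c57 100644\n+  /*   GLRO(x) = 1;  */\n"
b = "index d47bef1340..8462e5859a 100644\n+ /* GLRO(x) = 1; */\n"
print(same_modulo_whitespace(a, b))
```

Note the limitation: whitespace that is removed entirely (for example between `/*` and a following quote) still counts as a difference here, so a `False` result needs a manual look.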
Re: glibc regression on alpha with 2.34+
Hi Adrian, On 14.12.22 20:51, John Paul Adrian Glaubitz wrote: [...] Can we be sure that this reproducer identifies the same problem than the build failures from the original post ([1])? [1]: https://lists.debian.org/debian-alpha/2022/11/msg3.html Well, this is how I identified that there was a problem with glibc on alpha. I built the packages manually with the testsuite enabled and installed them into a chroot for testing which resulted in a segfault when dpkg tried to configure the libc-bin package. I assume the many testsuite failures are a direct result of this bug which just causes many tests to segfault. We had a similar problem on sparc64 where a single bug in the static build caused many testsuite failures. I see. Interestingly, when I checkout the tag glibc-2.34 and disabled the _dl_minsigstacksize symbol in "struct rtld_global_ro {}" again with the following hack, I'm no longer getting a segfault but a floating point exception: [...] Could you verify this on your DS-15? I'll do that tomorrow. The thing is that this diff doesn't apply cleanly: ``` root@ds15:/srv/storage/glibc# git checkout glibc-2.34 Note: switching to 'glibc-2.34'. You are in 'detached HEAD' state. You can look around, make experimental changes and commit them, and you can discard any commits you make in this state without impacting any branches by switching back to a branch. [...] HEAD is now at ae37d06c7d Update ChangeLog.old/ChangeLog.23. root@ds15:/srv/storage/glibc# patch -p1 < ../../glibc-fix.patch patching file elf/dl-sysdep.c Hunk #1 FAILED at 116. Hunk #2 FAILED at 185. 2 out of 2 hunks FAILED -- saving rejects to file elf/dl-sysdep.c.rej patching file elf/rtld_static_init.c patching file sysdeps/generic/ldsodefs.h patching file sysdeps/unix/sysv/linux/sysconf-pthread_stack_min.h Hunk #1 succeeded at 22 with fuzz 1. patching file sysdeps/unix/sysv/linux/sysconf.c Hunk #1 succeeded at 84 with fuzz 2. ``` Not sure why, shouldn't we have the same source state? 
Should I try to apply the rejected stuff manually?

Cheers,
Frank
Re: glibc regression on alpha with 2.34+
On 14.12.22 20:55, John Paul Adrian Glaubitz wrote:
[...]
Unfortunately it also doesn't work here when optimized for EV67.

OK, this just confirms my cross-compile tests with "-mcpu=ev67 -mtune=ev67", where raising the baseline didn't fix the segfault either.

If you have a user account for glibc bugzilla, you should subscribe to the bug report I opened for this particular issue [1].

Or can you just put me on the CC list?

H. J. Lu raises a good question, namely whether alpha has any hardcoded values for "struct rtld_global_ro {}".

I have no answer for that.

Cheers,
Frank
Re: glibc regression on alpha with 2.34+
On 14.12.22 18:21, Frank Scheiner wrote: [...] Regardless, I can confirm this on my DS15: ``` root@ds15:/srv/storage/build# LD_LIBRARY_PATH=$PWD/glibc-at-36231bee7ab36d59dd121ea85b91411ae86945f3 /bin/bash root@ds15:/srv/storage/build# echo $? 0 root@ds15:/srv/storage/build# exit exit root@ds15:/srv/storage/build# LD_LIBRARY_PATH=$PWD/glibc-at-6c57d320484988e87e446e2e60ce42816bf51d53 /bin/bash Segmentation fault root@ds15:/srv/storage/build# echo $? 139 ``` ...6c57d320484988e87e446e2e60ce42816bf51d53 is the first bad commit and 36231bee7ab36d59dd121ea85b91411ae86945f3 is its parent. Do we also have a result for glibc@6c57d320484988e87e446e2e60ce42816bf51d53 with `-mcpu=ev67`? ``` root@ds15:/srv/storage/build/glibc-at-6c57d320484988e87e446e2e60ce42816bf51d53-ev67# CC="alpha-linux-gnu-gcc-12 -mcpu=ev67 -mtune=ev67 " CXX="alpha-linux-gnu-g++-12 -mcpu=ev67 -mtune=ev67 " MIG="alpha-linux-gnu-mig" ../../glibc/configure --host=alphaev67-linux-gnu --disable-werror --prefix=/usr --disable-sanity-checks [...] root@ds15:/srv/storage/build# LD_LIBRARY_PATH=$PWD/glibc-at-6c57d320484988e87e446e2e60ce42816bf51d53-ev67 /bin/bash Segmentation fault ``` Unfortunately it also doesn't work here when optimized for EV67. Cheers, Frank
Re: glibc regression on alpha with 2.34+
Hi Adrian, On 13.12.22 17:21, John Paul Adrian Glaubitz wrote: Hi! On 12/13/22 10:52, John Paul Adrian Glaubitz wrote: You could cross-compile glibc. That's most likely what I am going to do. For the record, here's how I am doing it. [...] Thanks for that, this is quite useful. 4. Enter alpha schroot and run the following command from the build directory: (sid-alpha-sbuild)glaubitz@z6:~/glibc-git/build$ LD_LIBRARY_PATH=/home/glaubitz/glibc-git/build /bin/bash If the bug is present, this command will segfault: Segmentation fault Otherwise it will just spawn another bash which can be exited with "exit": (sid-alpha-sbuild)glaubitz@z6:~/glibc-git/build$ LD_LIBRARY_PATH=/home/glaubitz/glibc-git/build /bin/bash (sid-alpha-sbuild)glaubitz@z6:~/glibc-git/build$ exit (sid-alpha-sbuild)glaubitz@z6:~/glibc-git/build$ Can we be sure that this reproducer identifies the same problem as the build failures from the original post ([1])? [1]: https://lists.debian.org/debian-alpha/2022/11/msg3.html Regardless, I can confirm this on my DS15: ``` root@ds15:/srv/storage/build# LD_LIBRARY_PATH=$PWD/glibc-at-36231bee7ab36d59dd121ea85b91411ae86945f3 /bin/bash root@ds15:/srv/storage/build# echo $? 0 root@ds15:/srv/storage/build# exit exit root@ds15:/srv/storage/build# LD_LIBRARY_PATH=$PWD/glibc-at-6c57d320484988e87e446e2e60ce42816bf51d53 /bin/bash Segmentation fault root@ds15:/srv/storage/build# echo $? 139 ``` ...6c57d320484988e87e446e2e60ce42816bf51d53 is the first bad commit and 36231bee7ab36d59dd121ea85b91411ae86945f3 is its parent. Do we also have a result for glibc@6c57d320484988e87e446e2e60ce42816bf51d53 with `-mcpu=ev67`? Cheers, Frank
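The reproducer above lends itself to automation with `git bisect run`. What follows is only a sketch under assumptions: the function names and the build-directory argument are hypothetical, and running `/bin/bash -c 'exit 0'` instead of an interactive shell is a substitution that assumes the crash already happens while bash starts up. The one detail worth spelling out is the exit status: the shell reports a segfault as 139 (128 plus 11 for SIGSEGV), but `git bisect run` only accepts 1 to 127 (except 125) as "bad" and aborts on anything else, so the status has to be mapped down.

```shell
#!/bin/sh
# Map the reproducer's exit status onto what `git bisect run` expects:
#   0 -> good, 1..127 (except 125) -> bad, 125 -> skip, anything else aborts.
bisect_status() {
    status=$1
    if [ "$status" -eq 0 ]; then
        echo 0      # bash started and exited cleanly: good commit
    elif [ "$status" -ge 128 ]; then
        echo 1      # killed by a signal (139 = 128 + SIGSEGV): bad commit
    else
        echo 125    # any other failure is unexpected here: ask bisect to skip
    fi
}

# Run the reproducer against one glibc build directory (hypothetical argument)
# and print the status a bisect script should exit with.
try_build() {
    build_dir=$1
    LD_LIBRARY_PATH="$build_dir" /bin/bash -c 'exit 0'
    bisect_status $?
}
```

A bisect script would build the checked-out commit, call `try_build` on the result and `exit` with the printed value. Mapping every other non-zero status to "skip" is a design choice of this sketch, not something the thread prescribes.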
Re: glibc regression on alpha with 2.34+
Hi, On 13.12.22 10:52, John Paul Adrian Glaubitz wrote: [...] During this compilation I got 4 segfaults from the compiler (gcc-12) and a "gcc: internal compiler error: Aborted signal terminated program cc1". If you are interested in the details, I have all the error messages available. Is that glibc from upstream or the Debian package? Upstream, I followed [1]. [1]: https://sourceware.org/glibc/wiki/Testing/Builds Also, is the machine's memory known to be good? Please make sure to test it. I don't know for sure - you never know for software running on the same hardware that's gonna be tested - but the SROM testing didn't show any problems at least: ``` SROM V1.0-0 CPU # 00 @ 1000 MHz SROM program starting Reloading SROM SROM V1.0-1 CPU # 00 @ 1000 MHz System Bus Speed @ 0125 MHz SROM program starting Bcache data tests in progress Bcache address test in progress CPU parity and ECC detection in progress Bcache ECC data tests in progress Bcache TAG lines tests in progress Memory sizing in progress Memory configuration in progress Testing AAR2 Memory data test in progress Memory address test in progress Memory pattern test in progress Testing AAR0 Memory data test in progress Memory address test in progress Memory pattern test in progress Memory initialization Loading console Code execution complete (transfer control) ``` I can try to increase the depth of testing - if possible - but I'd expect some ECC related messaging for any failures happening and there was none on the system console. [...] Summarizing it, I'd be grateful if someone could do the bisecting on one of the buildds or developer machines. You could cross-compile glibc. That's most likely what I am going to do. Can the testing happen on a different arch, too? Cheers, Frank
Re: glibc regression on alpha with 2.34+
Hi again, just wanted to clarify something I saw in the build logs from the buildds - imago to be specific. On 13.12.22 10:33, Frank Scheiner wrote: [...] Summarizing it, I'd be grateful if someone could do the bisecting on one of the buildds or developer machines. According to the logs on [1] and [2], the machine seems to run w/o Bcache: ``` uname -a Linux imago 5.8.18-titan-p1+ #63 SMP Sat Jan 8 16:18:01 NZDT 2022 alpha GNU/Linux if [ -f /proc/cpuinfo ] ; then cat /proc/cpuinfo ; fi cpu : Alpha cpu model : EV68CB cpu variation : 7 cpu revision: 0 cpu serial number : JA44900165 system type : Titan system variation: Privateer system revision : 0 system serial number: AY50901023 cycle frequency [Hz]: 125000 timer frequency [Hz]: 1024.00 page size [bytes] : 8192 phys. address bits : 44 max. addr. space # : 255 BogoMIPS: 2480.92 kernel unaligned acc: 0 (pc=0,va=0) user unaligned acc : 338897612 (pc=20e100c,va=1200e71cc) platform string : AlphaServer ES45 Model 1B cpus detected : 3 cpus active : 3 cpu active mask : 0007 L1 Icache : 64K, 2-way, 64b line L1 Dcache : 64K, 2-way, 64b line L2 cache: n/a L3 cache: n/a ``` [1]: https://buildd.debian.org/status/fetch.php?pkg=glibc&arch=alpha&ver=2.34-8&stamp=1662963628&raw=0 [2]: https://buildd.debian.org/status/fetch.php?pkg=glibc&arch=alpha&ver=2.36-4&stamp=1667607306&raw=0 See "n/a" for the L2 cache line? Is that (1) an error in the kernel not being able to detect the 16 MiB Bcache of the 1250 MHz processor modules or (2) is this machine really running w/o active Bcache? If (2) I don't know how much effect this has on compilation, but I recently had a similar issue with the DS15 - i.e. Bcache not activated - and could see a noticeable difference in performance for e.g. `7za b` and `openssl speed -elapsed` when compared to runs with active Bcache later. Though I found a solution for the DS15, I didn't find anything related yet for an ES45. But maybe this is just a kernel bug and the Bcache is active despite that message.
You can check in SRM with `show config | more`: ``` >>>show config | more hp AlphaStation DS15 [...] Processors CPU 0 Alpha EV68CB pass 4.0 1000 MHz 2MB Bcache [...] ``` If you see "0MB Bcache" something is wrong. Alternatively the RMC should also be able to tell you the state of the processors, using the `cpu` command: ``` RMC>cpu CPU Powerup Status Translation EV6 BIST: PASS CPU ID: 0 (primary) STR Test: PASS CSC Test: PASS PCHIP0 Test: PASS DIMx Test: PASS TIG Bus Test: PASS DPR Test: STARTED - PASS CPU Speed Test: PASS - 1000MHz SROM Power-On Time: 12-13-42 09:45:36 SROM Power-On Error: No error System Bus Speed: 125MHz Last Synch State Test: PASS Bcache Size: 2MB ``` Though I'm not sure how this will look for multiple processors. Cheers, Frank
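The `/proc/cpuinfo` check discussed above can also be scripted, for instance to scan several saved build logs. A minimal sketch, assuming the cache field keeps the `L2 cache: ...` layout shown in the buildd log and sits on its own line as in a live `/proc/cpuinfo`; the function name `l2_cache_state` is hypothetical. Since an `n/a` may only reflect what the kernel reports, the result flags a machine for a closer look in SRM or the RMC rather than proving that the Bcache is off.

```shell
# Read cpuinfo-style text on stdin and report what the kernel says about the L2 cache.
l2_cache_state() {
    # Take the value of the first "L2 cache" field; tolerate spaces or tabs around the colon.
    value=$(sed -n 's/^L2 cache[[:space:]]*:[[:space:]]*//p' | head -n 1)
    case "$value" in
        "")  echo "unknown" ;;            # no L2 line at all
        n/a) echo "not-reported" ;;       # what the log from imago shows
        *)   echo "reported: $value" ;;   # the kernel gives a size/geometry
    esac
}
```

On a running system this would be called as `l2_cache_state < /proc/cpuinfo`.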
Re: glibc regression on alpha with 2.34+
Hi guys, On 13.12.22 06:15, John Paul Adrian Glaubitz wrote: [...] I am still interested in fixing the glibc bug and will work on bisecting it. I did give that a try yesterday on a DS15, but it already took hours to get glibc 2.33 compiled. During this compilation I got 4 segfaults from the compiler (gcc-12) and a "gcc: internal compiler error: Aborted signal terminated program cc1". If you are interested in the details, I have all the error messages available. This went on during `make test` with another segfault and this one here after more than 2 hours of processing: ``` root@ds15:/srv/storage/build/glibc-2.33# time make test [...] g++ tst-thread_local1.cc -c -I/srv/storage/build/glibc-2.33/ -g -O2 -Wall -Wwrite-strings -Wundef -fmerge-all-constants -frounding-math -fno-stack-protector -mlong-double-128 -mieee -mfp-rounding-mode=d -std=gnu++11 -I../include -I/srv/storage/build/glibc-2.33/nptl -I/srv/storage/build/glibc-2.33 -I../sysdeps/unix/sysv/linux/alpha/fpu -I../sysdeps/alpha/fpu -I../sysdeps/unix/sysv/linux/alpha -I../sysdeps/alpha/nptl -I../sysdeps/unix/sysv/linux/wordsize-64 -I../sysdeps/ieee754/ldbl-64-128 -I../sysdeps/ieee754/ldbl-opt -I../sysdeps/unix/sysv/linux/include -I../sysdeps/unix/sysv/linux -I../sysdeps/nptl -I../sysdeps/pthread -I../sysdeps/gnu -I../sysdeps/unix/inet -I../sysdeps/unix/sysv -I../sysdeps/unix/alpha -I../sysdeps/unix -I../sysdeps/posix -I../sysdeps/alpha -I../sysdeps/wordsize-64 -I../sysdeps/ieee754/ldbl-128 -I../sysdeps/ieee754/dbl-64 -I../sysdeps/ieee754/flt-32 -I../sysdeps/ieee754 -I../sysdeps/generic -I.. -I../libio -I. 
-D_LIBC_REENTRANT -include /srv/storage/build/glibc-2.33/libc-modules.h -DMODULE_NAME=testsuite -include ../include/libc-symbols.h -DTOP_NAMESPACE=glibc -o /srv/storage/build/glibc-2.33/nptl/tst-thread_local1.o -MD -MP -MF /srv/storage/build/glibc-2.33/nptl/tst-thread_local1.o.dt -MT /srv/storage/build/glibc-2.33/nptl/tst-thread_local1.o tst-thread_local1.cc: In function ‘int do_test()’: tst-thread_local1.cc:177:5: error: variable ‘std::arraychar*, std::function >, 2> do_thread_X’ has initializer but incomplete type 177 | do_thread_X | ^~~ tst-thread_local1.cc: At global scope: tst-thread_local1.cc:133:1: warning: ‘void* thread_with_access(void*)’ defined but not used [-Wunused-function] 133 | thread_with_access (void *) | ^~ tst-thread_local1.cc:127:1: warning: ‘void* thread_without_access(void*)’ defined but not used [-Wunused-function] 127 | thread_without_access (void *) | ^ make[2]: *** [../o-iterator.mk:9: /srv/storage/build/glibc-2.33/nptl/tst-thread_local1.o] Error 1 make[2]: Leaving directory '/srv/storage/glibc/nptl' make[1]: *** [Makefile:479: nptl/tests] Error 2 make[1]: Leaving directory '/srv/storage/glibc' make: *** [Makefile:9: check] Error 2 real 129m3.940s user 104m25.441s sys 11m36.611s ``` ...after which I stopped. I did use `--disable-werror` during the configure step, but maybe this is not enough. OTOH it's only a warning so why does it err? Ah, I see, `-Wundef` is set. I'll have a look what the buildds use for the configure step. Today I'll also give it another try, but with gcc-11 this time - just in case something is wrong with gcc-12 - but frankly I don't think this goes anywhere on the DS15: Even with a DS25 I have available here I can only speed up the compilation and it already takes more than twice the power of the DS15, so nothing gained. My ES45s are in cold storage and I don't dare to start them up in such a low temperature environment. 
Summarizing it, I'd be grateful if someone could do the bisecting on one of the buildds or developer machines. Cheers, Frank
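When going through a long `make test` log like the one above, it can help to separate the diagnostics GCC tags `error:` from those tagged `warning:`. In the excerpt, the only line tagged `error:` is the one about the incomplete type at `tst-thread_local1.cc:177`, while the two `-Wunused-function` lines are tagged `warning:`. A small sketch that counts both kinds in a log read from stdin; the function name is hypothetical and the matching relies only on the ` error: ` and ` warning: ` markers visible in the output above.

```shell
# Count GCC diagnostics by severity in a build log read from stdin.
count_diagnostics() {
    awk '
        / error: /   { errors++ }      # e.g. "file.cc:177:5: error: ..."
        / warning: / { warnings++ }    # e.g. "file.cc:133:1: warning: ..."
        END { printf "errors=%d warnings=%d\n", errors, warnings }
    '
}
```

Usage would be along the lines of `count_diagnostics < build.log`, with `build.log` standing in for wherever the output was saved.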
Re: glibc regression on alpha with 2.34+
On 12.12.22 09:17, Michael Cree wrote: On Mon, Dec 12, 2022 at 08:56:40AM +0100, Frank Scheiner wrote: Dear Michael, On 12.12.22 08:27, Michael Cree wrote: With the usrmerge uploads now depending on a recent libc version Alpha is now dead in the water. Nothing can be built. Thus we have to fix glibc to continue building. I am not prepared to fix ev4 issues so if no one else is prepared to fix them then without an architecture baseline raise this is the end of Alpha on Debian Ports. I'm not sure I fully understand the issue here: See, glibc used to work for alpha up until 2.33 as I read. Then a change broke it for alpha with 2.34. Does the respective glibc maintainer for alpha (Richard Henderson according to [1]) really have no interest in fixing it? RTH hasn't had working Alpha hardware for quite some time. One of the glibc maintainers did have access to one of my Alphas until last year but unfortunately the hosting site is no longer prepared to host it so I can no longer make that Alpha available to developers. Thanks for clarifying. So with that glibc Alpha support is rotting fast. Many of the other ports (e.g. armel, armhf, i386) have had architecture baseline increases in the last few years, and none support hardware anywhere near as old as alpha ev4. I am no longer personally prepared to support Alpha unless the architecture baseline increase is done. I have no ev4/ev45 hardware and no longer have any interest in supporting them. Yeah, I figured that already from your first email today. In your email to Adrian you write about BWX capable processors as new baseline. So EV56 instead of EV67? Cheers, Frank
Re: glibc regression on alpha with 2.34+
Dear Michael, On 12.12.22 08:27, Michael Cree wrote: On Sun, Nov 20, 2022 at 01:47:59PM +0100, Frank Scheiner wrote: On 20.11.22 10:03, Michael Cree wrote: On Sun, Nov 13, 2022 at 12:45:17AM +0100, John Paul Adrian Glaubitz wrote: I just noticed that there is a regression in glibc on alpha with version 2.34 or later. Interestingly the vast number of the failing tests pass if one builds with a compiler that raises the baseline to EV67. This has been proposed a number of times in the past for the Debian distribution. I think it is time we did it. One of our last EV56 users has recently bowed out due to hardware failure and I am only running EV67 hardware. I still have the following pre EV67 machines available and in working order: * AXPpci 33 (LCA4) * AlphaStation 200 (EV4) / 255 (EV45) / 500 (EV56) * PWS 500au (EV56) * AlphaServer 800 (EV56) ...and can provide testing on them. All of them eventually ran Debian Can you fix the ev4 based bugs in glibc? If not, I am not interested. I already told you what I can provide. With the usrmerge uploads now depending on a recent libc version Alpha is now dead in the water. Nothing can be built. Thus we have to fix glibc to continue building. I am not prepared to fix ev4 issues so if no one else is prepared to fix them then without an architecture baseline raise this is the end of Alpha on Debian Ports. I'm not sure I fully understand the issue here: See, glibc used to work for alpha up until 2.33 as I read. Then a change broke it for alpha with 2.34. Does the respective glibc maintainer for alpha (Richard Henderson according to [1]) really have no interest in fixing it? [1]: https://sourceware.org/glibc/wiki/MAINTAINERS#Machine_maintainers Cheers, Frank
Re: glibc regression on alpha with 2.34+
On 20.11.22 10:03, Michael Cree wrote: On Sun, Nov 13, 2022 at 12:45:17AM +0100, John Paul Adrian Glaubitz wrote: I just noticed that there is a regression in glibc on alpha with version 2.34 or later. Looking at the build logs for Debian's 2.34-8 [1], 2.35-4 [2] and 2.36-4 [3], it's obvious there is something wrong given the many "Segmentation Fault" errors. I had hoped I could fix this issue by passing "--disable-default-pie" like we already did on sparc64, but it seems it's not the same bug [4]. At least, this particular workaround does not help. Interestingly the vast number of the failing tests pass if one builds with a compiler that raises the baseline to EV67. This has been proposed a number of times in the past for the Debian distribution. I think it is time we did it. One of our last EV56 users has recently bowed out due to hardware failure and I am only running EV67 hardware. I still have the following pre EV67 machines available and in working order: * AXPpci 33 (LCA4) * AlphaStation 200 (EV4) / 255 (EV45) / 500 (EV56) * PWS 500au (EV56) * AlphaServer 800 (EV56) ...and can provide testing on them. All of them eventually ran Debian GNU/Linux Sid with up to Linux 5.x.x IIRC and I will also try them with 6.0.x. And I believe the majority of still existing, still working Alpha systems are pre EV67 systems. Given the fact that EV6[...] and EV7[...] based systems are nowadays very expensive for hobby use (I don't want to say unobtainium), I expect that dropping support for pre EV67 will kill off most of the user base for Debian on Alpha (and also Gentoo I assume). Phrasing it differently: Who needs a port that only runs on the buildds and a handful of (hobbyist) machines around the world (like ppc64le ;-))? My two cents. All the best, Frank
Re: Linux 6.0.7 MP kernel works on DS25
Hi Adrian, On 12.11.22 11:17, John Paul Adrian Glaubitz wrote: Don't know what to make out of this. Is this a problem in the kernel sources or a problem for the Debian kernel team? [2]: https://buildd.debian.org/status/logs.php?pkg=linux&arch=alpha No idea, really. We need to ask someone from the Debian kernel team, they probably know how to fix this. How can we contact them and who for Alpha specifically? Cheers, Frank
Linux 6.0.7 MP kernel works on DS25
Hi all, just a short update for all who might not have noticed yet: It looks like MP operation is working **again** on Alpha with recent kernels - which is just a pleasure to see! I'm unsure what was fixed in the kernel to make it work again, quickly scanning through the changes I didn't find anything related to Titan, only changes regarding Marvel. Nevertheless a **big thank you** to all people involved in "fixing" this! I tested 6.0.7 on my DS25, running: * `7za b` * `openssl speed -elapsed` * and nearly two hours of kernel compilation (with `localmodconfig`) ...which worked w/o an apparent issue. I couldn't get it to work with systemd though, so had to resort to replacing systemd with sysvinit, see [1] for details. [1]: https://pastebin.com/FbFG7jHx I don't know if my kernel config misses something needed by systemd, though I took the one of the last working kernel 5.3.0-3 as base (`olddefconfig`, then `localmodconfig` plus small changes via `menuconfig`), which works fine with the same userland (incl. recreated current initramfs). I actually built the 6.0.7 kernel manually because I wanted to verify it builds and really works on Alpha. [2] only shows build failures for the last attempted kernel builds, though these seem to happen in the Debian part of the build. Since 6.x the build process complains about unversioned symbols: ``` [...] make[3]: Leaving directory '/<>/debian/build/build_alpha_none_alpha-generic' debian/bin/buildcheck.py debian/build/build_alpha_none_alpha-generic alpha none alpha-generic ABI is not completely versioned! Refusing to continue. Unversioned symbols: strcat module: vmlinux, version: 0x, export: EXPORT_SYMBOL strcpy module: vmlinux, version: 0x, export: EXPORT_SYMBOL strncat module: vmlinux, version: 0x, export: EXPORT_SYMBOL strncpy module: vmlinux, version: 0x, export: EXPORT_SYMBOL Can't read ABI reference. ABI not checked! 
make[2]: *** [debian/rules.real:218: debian/stamps/build_alpha_none_alpha-generic] Error 1 make[2]: Leaving directory '/<>' make[1]: *** [debian/rules.gen:426: build-arch_alpha_none_alpha-generic_real] Error 2 make[1]: Leaving directory '/<>' make: *** [debian/rules:39: build-arch] Error 2 dpkg-buildpackage: error: debian/rules binary-arch subprocess returned exit status 2 ``` Don't know what to make out of this. Is this a problem in the kernel sources or a problem for the Debian kernel team? [2]: https://buildd.debian.org/status/logs.php?pkg=linux&arch=alpha Cheers, Frank
Re: Alpha has gone to its reward
Dear Bob, On 02.07.22 04:43, Bob Tracy wrote: We had a horrific electrical storm on the 28th, and a lightning strike took out my home air-conditioning units, my cable modem, my Wifi router, a 16-port switch, my Ooma Telo, my main computer, my printer, and... my PWS-433au :-(. My condolences, what a pity, another classic gone... :,(, not to speak of the other hardware you lost. :-/ The Alpha isn't worth repairing, and I'm not going to go to the trouble of replacing it. I had my fun with it, and frankly, it lasted far longer in 24x7x365 use than I had any right to expect. All this with the original hardware (PSU, system board, memory)? Insane! Bottom line: After more years than I care to remember, I'm out of the race. Will continue to lurk and help where/when I can, but I won't be doing any actual testing or debugging on Alpha. Sincere thanks to the experts hanging out here who have helped me through many a rough spot with the Alpha platform. Thanks for all your involvement here, I myself started my Alpha machine collection with a PWS (a 500au to be exact) and I found it always helpful and encouraging to see your machine running a GUI and what not. Cheers, Frank
Re: MP kernels broken with version 5.4.0-1
On 20.02.22 19:34, Michael Cree wrote: On Sun, Feb 20, 2022 at 01:39:44PM +0100, Frank Scheiner wrote: I'm unsure if someone already noticed, but it looks like the MP kernels for the alpha arch are broken since at least 5.4.0-1 ([1]), tested on: * quad processor ES45 (with 5.4.0-1, 5.7.0-1) * single processor AS 800 (with 5.7.0-1) * dual processor DS25 (with 5.4.0-1 and 5.15.0-1) Commit f2f84b05e02b7710a201f0017b3272ad7ef703d1 is the problem. Reverting that results in a working SMP kernel, at least up to version 5.8.y. Great! I hoped you would have a look. :-) There is another problem (causing occasional memory corruptions in user space) introduced into the 5.9.0 kernel which has proved tricky to nail down. I have an idea where it is but not what the actual problem is. I think I can confirm that, because the panic on my DS25 with 5.15.0-1 looks different than with 5.4.0-1. I didn't create a log, but IIRC the panic originated from systemd there. How should we proceed from here? Cheers, Frank
MP kernels broken with version 5.4.0-1
Dear all, I'm unsure if someone already noticed, but it looks like the MP kernels for the alpha arch are broken since at least 5.4.0-1 ([1]), tested on: * quad processor ES45 (with 5.4.0-1, 5.7.0-1) * single processor AS 800 (with 5.7.0-1) * dual processor DS25 (with 5.4.0-1 and 5.15.0-1) [1]: http://snapshot.debian.org/archive/debian-ports/20191229T085938Z/pool-alpha/main/l/linux/linux-image-5.4.0-1-alpha-smp_5.4.6-1_alpha.deb 5.3.x still works fine, tested on: * DS25 (with 5.3.0-6) * AS 800 (with 5.3.0-2) And now guess what, the SP kernel still works, up to 5.15.0-1, and tested on the DS25. I assume the multiprocessor alpha buildds should have the same problem, but the one I checked still ran 5.3.6 [2], though the linked log is already from 2020. [2]: https://buildd.debian.org/status/fetch.php?pkg=ppx-tools-versioned&arch=alpha&ver=5.4.0-1%2Bb2&stamp=1608280942&raw=0 I wonder if there is some current or better live information about the hardware and software of the buildds available somewhere on the web? If time allows I will try to bisect this issue, but as I am already behind with figuring out what broke the kernel for UltraSPARC III(i), I wouldn't mind someone else being faster with bisecting the Alpha MP issue. @Michael: If you remember my first email to you and this list from early 2017, we had a similar situation, though the SP kernels were toast back then and now it's the MP ones. Well, the situation back then was definitely better for us. :-) Cheers, Frank
Re: systemd woes continue
On 7/17/19 15:50, Frank Scheiner wrote: On 7/17/19 15:41, John Paul Adrian Glaubitz wrote: On 7/17/19 3:16 PM, Frank Scheiner wrote: To sum things up: what Adrian intends to do for Alpha - pre-include the firmware on the installer discs - seems to be the only way to get this problem fixed w/o manual intervention during installation. It's also the standard way how it's done. debian-cd (the software used to build the CD images) supports adding firmware packages. But the problem is that firmware packages are usually part of the non-free suite which is not available in Debian Ports but on the main Debian repositories only. However, the main repositories and the Debian Ports repositories are separate mirrors and currently, debian-cd does not support more than one mirror during CD image build. Ok, could it work when using the CDN (deb.debian.org) as described on [2]? Or does this just redirect to the actual ports mirror for ports architectures? [2]: https://www.ports.debian.org/mirrors Probably not, as [3] shows that one still needs to use different directories for the different targets. [3]: http://deb.debian.org/ Ok, what about a redirect on the Debian-Ports mirror for a "/debian-ports/pool/non-free/" directory to e.g. "http://deb.debian.org/debian/pool/non-free"? Apt seems to support redirects. Cheers, Frank
Re: systemd woes continue
On 7/17/19 15:41, John Paul Adrian Glaubitz wrote: On 7/17/19 3:16 PM, Frank Scheiner wrote: debian-installer doesn't use fdisk (anymore), it uses partman. Did you try any of the recent installation images, see [1]. Please note these images are currently shipped without proprietary firmware. Yeah, that's a problem for any Alpha with a Qlogic SCSI controller. If you want to use e100 (e.g. DE602) and tg3 (e.g. DEGXA) driven NICs during installation, these should be added here, too. What do you mean by adding them? The drivers? Or the firmware? If the firmware is not packaged, someone needs to add them to one of the firmware packages. I just wanted to "add" these NICs to the list of hardware that needs firmware to work correctly on Alpha, i.e. like: ``` Yeah, that's a problem for any Alpha with a Qlogic SCSI controller... ...and a e100 or tg3 driven NIC. ``` To sum things up: what Adrian intends to do for Alpha - pre-include the firmware on the installer discs - seems to be the only way to get this problem fixed w/o manual intervention during installation. It's also the standard way how it's done. debian-cd (the software used to build the CD images) supports adding firmware packages. But the problem is that firmware packages are usually part of the non-free suite which is not available in Debian Ports but on the main Debian repositories only. However, the main repositories and the Debian Ports repositories are separate mirrors and currently, debian-cd does not support more than one mirror during CD image build. Ok, could it work when using the CDN (deb.debian.org) as described on [2]? Or does this just redirect to the actual ports mirror for ports architectures? [2]: https://www.ports.debian.org/mirrors So, while I can enable including firmware during image build, debian-cd is unable to find the firmware packages as they are not part of the Debian Ports mirror. I hope that clarifies the problem. It's on my TODO list. 
It's just not trivial since I need to modify debian-cd to be able to merge the contrib and non-free repositories from the main FTP servers during CD image build. JH Chatenet once created a debootstrap "addon" (see [1] for details) that merges "unstable" and "unreleased" suites, maybe functionality from that addon can be reused here? That just merges two suites but not two mirrors. Oh, I see, didn't anticipate that the non-free suite is on different mirrors. But see above. Cheers, Frank
Re: systemd woes continue
On 7/17/19 11:54, John Paul Adrian Glaubitz wrote: On 7/17/19 11:00 AM, Michael Cree wrote: On Thu, Jul 11, 2019 at 04:11:44PM +0200, John Paul Adrian Glaubitz wrote: I assume you are talking about the non-functionality of a separate /usr partition, but this is something that isn't guaranteed to work well on Linux, Pardon? A separate /usr partition has always been supported on Linux, so I am not sure what you are talking about... It's not really supported anymore: https://freedesktop.org/wiki/Software/systemd/separate-usr-is-broken/ https://lwn.net/Articles/670071/ We have recently had a similar discussion on the debian-68k mailing list. The gist is: A lot of projects don't test their code on systems with separate /usr partitions anymore, so things get silently broken. Very unfortunate. debian-installer doesn't use fdisk (anymore), it uses partman. Did you try any of the recent installation images, see [1]. Please note these images are currently shipped without proprietary firmware. Yeah, that's a problem for any Alpha with a Qlogic SCSI controller. If you want to use e100 (e.g. DE602) and tg3 (e.g. DEGXA) driven NICs during installation, these should be added here, too. I actually wanted to provide a write-up of my in-depth testing of some older ISOs from Adrian on DS20E, DS25 and ES45, but I didn't find the time yet and there are already newer ISOs available to try, so I'll just put some parts of what I learned here, as it seems to fit the discussion: What I saw during my testing (when a Qlogic SCSI controller (e.g. KZPBA) is used - mainly relevant for ES45 which has no integrated SCSI controller, DS20E and DS25 instead have integrated Adaptec SCSI controllers which aren't affected) is, that the qla1280 driver gets loaded during startup of the installer (kernel) automatically. It doesn't work though due to missing firmware. 
Then when you provide the firmware DEBs to the installer it takes multiple attempts to actually install the required firmware for the Qlogic SCSI controller. When it finally succeeds (I actually don't remember if it succeeded at all, or if I manually put the firmware in place), the qla1280 driver is unloaded and then reloaded (this time with firmware in place). But unfortunately the Qlogic SCSI controller is no longer responsive. I don't know if this is due to the first load w/o firmware or due to the unloading, but it doesn't matter, providing the firmware as intended by the installer doesn't work for such a configuration. The only way to get it working was to blacklist the qla1280 module during startup, manually mount a prepared firmware directory to `/lib/firmware` and manually load the qla1280 module afterwards but before entering the partitioning step. Similar for the NIC modules, when using e100 or tg3 driven NICs on Alpha. BTW here other architectures differ, e.g. an rx2660 (ia64) with two tg3 driven NICs works perfectly fine w/o firmware available and I've also seen it working perfectly with e100 driven NICs on x86 IIRC. If you have a tulip driven NIC and/or a sym53c8xx driven SCSI controller (e.g. KZPCA) you're fine, as these don't require firmware. Same for machines with Adaptec controllers, though the integrated NICs of the DS25 still require firmware to operate correctly. To sum things up: what Adrian intends to do for Alpha - pre-include the firmware on the installer discs - seems to be the only way to get this problem fixed w/o manual intervention during installation. It's on my TODO list. It's just not trivial since I need to modify debian-cd to be able to merge the contrib and non-free repositories from the main FTP servers during CD image build. JH Chatenet once created a debootstrap "addon" (see [1] for details) that merges "unstable" and "unreleased" suites, maybe functionality from that addon can be reused here? 
[1]: https://lists.debian.org/debian-alpha/2014/06/msg00012.html Cheers, Frank
Re: Updated installation images 2019-01-20
On 5/10/19 23:03, Michael Cree wrote: On Fri, May 10, 2019 at 09:31:02PM +0200, Frank Scheiner wrote: On 5/10/19 21:11, John Paul Adrian Glaubitz wrote: On May 10, 2019, at 9:07 PM, Skye wrote: I was able to boot from CDROM and the loading ran fine until it got to the hardware discovery phase. It was unable to find the CD-ROM hardware. My AlphaStations all have the Toshiba SCSI CD-ROM player. Good to hear that! Last time I tested Debian GNU/Linux Sid (not the installer!) on my older AlphaStations with EV4(5) they hung during kernel boot. Looks like I should give them another try with the current kernel and userland. Thanks for the encouraging information. :-) Yes, the generic kernel has been fixed and uploaded into the archive. It should now boot correctly. Yes, Michael, I think so, too. But last time I tested my AlphaStations 200 and 255 I used the MP kernel, as the SP kernel did just crash the machine(s) and they did hang with the MP kernel about halfway during the kernel boot and couldn't finish booting then - while other SP Alpha machines worked flawlessly with the MP kernel. UPDATE: Just finished my testing on my AlphaStations 200 and 255 and also tested my AXPpci 33 in addition and can confirm, that the current SP kernel (4.19.0-5) works on these. Great to have the AlphaStations back on Linux :-), IIRC the AXPpci 33 didn't show the hangs mentioned earlier. I only had some strange issue on the AlphaStation 255, where it looked like the machine "hung" for a while (maybe 15 or 30 seconds) and then continued during kernel boot. The strange thing here was, that the timestamps of the following kernel messages didn't reflect these hangs. E.g. although the machine looked like it was blocked for 20 seconds or so, the next message didn't have that amount of time added to the timestamp value. Maybe a hardware issue, as the NVRAM battery is depleted on this machine. But could be unrelated, as the AXPpci 33's NVRAM battery is also depleted and no such issues there. 
Ever noticed something like that on your Alpha machines? Cheers, Frank
Re: Updated installation images 2019-01-20
On 5/10/19 21:11, John Paul Adrian Glaubitz wrote: On May 10, 2019, at 9:07 PM, Skye wrote: I was able to boot from CDROM and the loading ran fine until it got to the hardware discovery phase. It was unable to find the CD-ROM hardware. My AlphaStations all have the Toshiba SCSI CD-ROM player. Good to hear that! Last time I tested Debian GNU/Linux Sid (not the installer!) on my older AlphaStations with EV4(5) they hang during kernel boot. Looks like I should give them another try with the current kernel and userland. Thanks for the encouraging information. :-) Cheers, Frank
Re: pata drivers in the alpha installer
Hi again, On 5/8/19 23:16, Frank Scheiner wrote: Hi all, On 5/8/19 22:24, John Paul Adrian Glaubitz wrote: Hello JH Chatenet! On 5/8/19 10:19 PM, jhcha54...@free.fr wrote: I wonder if the line in the patch of bug #920353 [1] : pata-modules-${kernel:Version} would still be helpful. (I haven't tested since I submitted the bug report) I missed that part back then because you put two different topics into one bug report. I only addressed the issue with the release name but not the missing pata-modules entry. I will have a look if it's indeed missing and add the drivers. The pata-modules are indeed missing in the initramfs for the installer CDROM and the cause of my installation problems on my DS25 (see [1] and follow-ups). I haven't yet tested on my DS20E and ES45, but assume they are also affected, if their disc drives are attached to a PATA port. [1]: https://lists.debian.org/debian-alpha/2019/05/msg0.html I now think we also need to add the "ata-modules" UDEB to the initramfs for the installer CDROM, as this contains the `libata` module, which seems to be a dependency of at least the `pata_ali` module: ``` root@ds25:/lib/modules/4.19.0-4-alpha-smp/kernel/drivers/ata# modprobe -v pata_ali insmod /lib/modules/4.19.0-4-alpha-smp/kernel/drivers/ata/libata.ko insmod /lib/modules/4.19.0-4-alpha-smp/kernel/drivers/ata/pata_ali.ko ``` ...and possibly others (see [2]). [2]: https://salsa.debian.org/kernel-team/linux/blob/master/debian/installer/modules/ata-modules Cheers, Frank
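The `modprobe -v` output above shows `libata` being pulled in before `pata_ali`. As a rough sketch (not part of the original message), the same dependency can be read from the `modules.dep` file that `depmod` generates. The helper name `module_deps` is made up for illustration, and it assumes the usual `path/to/module.ko: dep1 dep2 ...` line format:

```shell
#!/bin/sh
# Hypothetical helper: print the dependencies recorded for one module
# in a modules.dep file (one dependency path per output line).
# Usage: module_deps <modules.dep> <module-name>
module_deps() {
    depfile=$1
    module=$2
    awk -v m="$module" '
        {
            # $1 looks like "kernel/drivers/ata/pata_ali.ko:"
            n = split($1, parts, "/")
            name = parts[n]
            sub(/\.ko.*$/, "", name)   # strip ".ko:", ".ko.xz:" and similar
            if (name == m) {
                for (i = 2; i <= NF; i++) print $i
            }
        }
    ' "$depfile"
}
```

On a running system this could be called as `module_deps "/lib/modules/$(uname -r)/modules.dep" pata_ali`. An empty result means no dependencies are recorded for that module.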
Re: pata drivers in the alpha installer
Hi all, On 5/8/19 22:24, John Paul Adrian Glaubitz wrote: Hello JH Chatenet! On 5/8/19 10:19 PM, jhcha54...@free.fr wrote: I wonder if the line in the patch of bug #920353 [1] : pata-modules-${kernel:Version} would still be helpful. (I haven't tested since I submitted the bug report) I missed that part back then because you put two different topics into one bug report. I only addressed the issue with the release name but not the missing pata-modules entry. I will have a look if it's indeed missing and add the drivers. The pata-modules are indeed missing in the initramfs for the installer CDROM and the cause of my installation problems on my DS25 (see [1] and follow-ups). I haven't yet tested on my DS20E and ES45, but assume they are also affected, if their disc drives are attached to a PATA port. [1]: https://lists.debian.org/debian-alpha/2019/05/msg0.html @JH: I must have overlooked that part like Adrian or maybe forgot it, because I remember your message to the debian-alpha list back in January. IIRC testing at that time was not possible on real hardware because of a bug in the kernel that only showed in non-MP kernels - and unfortunately the installer kernel for alpha is non-MP. Thankfully that's fixed now. Cheers, Frank
Re: Updated installation images for Debian Ports 2019-04-20
On 5/6/19 23:37, John Paul Adrian Glaubitz wrote: On May 6, 2019, at 11:17 PM, Frank Scheiner mailto:frank.schei...@web.de>> wrote: I just tried the 2019-04-20 ISO on my DS25. Unfortunately the disc drive cannot be detected by the installer so I'm stuck at this point. As far as I know, a lot of Alpha-specific hardware needs firmware but I haven’t looked into creating images with firmware yet. Yeah, but this problems seems purely due to the pata-modules missing in the initramfs. UPDATE: On second thought, [2] and [3] seem to be unrelated to what is available in the initramfs of the installer. From what I saw, the drivers from the sata-modules UDEB are actually included in the initramfs, but the needed drivers from the pata-modules UDEB are not. Where is this configured? [4] includes the sata-modules UDEB as optional, but no pata-modules UDEB. So is this maybe configured in [4]? If yes, I can provide a patch on salsa.d.o to fix that issue, just need a confirmation. Here: https://salsa.debian.org/kernel-team/linux/tree/master/debian/installer/modules/alpha-generic Try to understand the include logic first before sending a PR. A lot of modules are actually included using the “common” directory. Could you explain a little more? With the "common" directory you mean [5]? [5]: https://salsa.debian.org/kernel-team/linux/tree/master/debian/installer/modules/ The way I understand it, the configuration files in [6] control what UDEBs are built for the alpha kernel used for the installer, as they seem to include the configuration files from the assumed "common" super directory which list the drivers to include for the respective devices/protocols/standards/etc.. And the configuration in [7] seems to control what UDEBs are installed in the initramfs. This would make sense for [7] as it looks like the alpha configuration file for the installer CDROM and the initramfs for this CDROM should contain all needed drivers to be able to access the used disc drive. 
Hence I think we need to update [7]. The question remains if one needs to specify dependencies explicitly or if dependencies are resolved during build. I'll try to correlate the modules in the initramfs with what is configured in [7] and come to a conclusion. [6]: https://salsa.debian.org/kernel-team/linux/tree/master/debian/installer/modules/alpha-generic [7]: https://salsa.debian.org/installer-team/debian-installer/blob/master/build/pkg-lists/cdrom/alpha.cfg Cheers, Frank
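To make the comparison between the package list in [7] and the initramfs contents a bit more mechanical, something like the following could help. It is only a sketch: the function name `pkglist_entries` is invented here, and it assumes the pkg-list file holds one package entry per line with `#` starting a comment.

```shell
#!/bin/sh
# Hypothetical helper: print the non-comment entries of a pkg-list file
# that contain the given udeb name. Exit status is non-zero if none match.
# Usage: pkglist_entries <pkg-list file> <udeb name>
pkglist_entries() {
    listfile=$1
    name=$2
    # Drop comment lines first, then do a fixed-string search for the name.
    grep -v '^[[:space:]]*#' "$listfile" | grep -F -- "$name"
}
```

Run against a local checkout of the file referenced as [7], `pkglist_entries alpha.cfg pata-modules` would print nothing and return a non-zero status if the pata-modules UDEB is indeed not listed.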
Re: Updated installation images for Debian Ports 2019-04-20
On 4/20/19 10:09, John Paul Adrian Glaubitz wrote: Hello! I just uploaded updated installation images 2019-04-20 for the following Debian Ports architectures: * alpha [...] I uploaded both CD images [1] as well as netboot images [2]. Please test those images and report back over the mailing list for the corresponding architecture. Known issues: * alpha - There have been reports about missing firmware for storage devices. The firmware required for these devices is located in the "firmware-linux" and "firmware-linux-nonfree" packages but currently don't have a floppy driver in debian-installer to load additional firmware. I will figure out how to build CD images including firmware similar to the images available for i386 and amd64. I just tried the 2019-04-20 ISO on my DS25. Unfortunately the disc drive cannot be detected by the installer so I'm stuck at this point. Checking the drivers available in the shipped initramfs, there seems to be only one PATA driver available - `pata_sis` - and according to [1] this is actually a dependency of a SATA driver. [1]: https://salsa.debian.org/kernel-team/linux/blob/master/debian/installer/modules/pata-modules#L4-5 The missing driver for my machine is `pata_ali`, but as said the other PATA drivers are also missing. From [2] and [3] I'd expect them to be included, but this seems to be not the case for an unknown reason. [2]: https://salsa.debian.org/kernel-team/linux/blob/master/debian/installer/modules/pata-modules#L1-2 [3]: https://salsa.debian.org/kernel-team/linux/blob/master/debian/installer/modules/alpha-generic/pata-modules Loading this driver manually when using a NFS root FS makes the disc drive accessible on my DS25: ``` root@ds25:/lib/modules/4.19.0-4-alpha-smp/kernel/drivers/ata# modprobe -v pata_ali insmod /lib/modules/4.19.0-4-alpha-smp/kernel/drivers/ata/libata.ko insmod /lib/modules/4.19.0-4-alpha-smp/kernel/drivers/ata/pata_ali.ko [ 182.457914] libata version 3.00 loaded. 
[ 182.463773] scsi host0: pata_ali [ 182.520414] scsi host1: pata_ali [ 182.559476] ata1: PATA max UDMA/33 cmd 0x1f0 ctl 0x3f6 bmdma 0x10050 irq 14 [ 182.643461] ata2: PATA max UDMA/33 cmd 0x170 ctl 0x376 bmdma 0x10058 irq 15 [ 182.882718] ata1.00: ATAPI: SAMSUNG DVD-ROM SD-616Q, F401, max UDMA/33 [ 182.960843] ata1.00: WARNING: ATAPI DMA disabled for reliability issues. It can be enabled [ 183.061429] ata1.00: WARNING: via pata_ali.atapi_dma modparam or corresponding sysfs node. [ 183.162015] scsi 0:0:0:0: CD-ROMSAMSUNG DVD-ROM SD-616Q F401 PQ: 0 ANSI: 5 [ 183.192289] scsi 0:0:0:0: Attached scsi generic sg0 type 5 [ 183.733304] sr 0:0:0:0: [sr0] scsi3-mmc drive: 16x/48x cd/rw xa/form2 cdda tray [ 183.734280] cdrom: Uniform CD-ROM driver Revision: 3.20 [ 183.736234] sr 0:0:0:0: Attached scsi CD-ROM sr0 ``` UPDATE: On second thought, [2] and [3] seem to be unrelated to what is available in the initramfs of the installer. From what I saw, the drivers from the sata-modules UDEB are actually included in the initramfs, but the needed drivers from the pata-modules UDEB are not. Where is this configured? [4] includes the sata-modules UDEB as optional, but no pata-modules UDEB. So is this maybe configured in [4]? If yes, I can provide a patch on salsa.d.o to fix that issue, just need a confirmation. [4]: https://salsa.debian.org/installer-team/debian-installer/blob/master/build/pkg-lists/cdrom/alpha.cfg Cheers, Frank
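For checking which PATA drivers an initramfs actually ships, a small filter over a file listing may be enough. This is a sketch under assumptions: the function name `list_pata_modules` is made up, and the input is expected to be one path per line, as produced by a listing tool such as `cpio -t`.

```shell
#!/bin/sh
# Hypothetical helper: read file paths on stdin and print the file names
# of all modules under drivers/ata/ whose name starts with "pata_".
list_pata_modules() {
    grep -E '/drivers/ata/pata_[^/]*\.ko' | sed 's|.*/||' | sort
}
```

Assuming the installer's `initrd.gz` is a gzip-compressed cpio archive, it could be used as `zcat initrd.gz | cpio -t 2>/dev/null | list_pata_modules`. If only `pata_sis` shows up, that would match the observation above.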
Re: Updated installation images for Debian Ports 2019-04-09
On 4/16/19 14:55, Darren Goossens wrote:
> Hi, and thanks for the interesting discussion.
>
> Yes, the QLOGIC chip seems to be the issue. The Debian 5 installer
> works flawlessly. I used full disk not netinstall for that.

I believe at that time the Linux kernel (tree) still included a lot of firmware files. These remaining in-tree firmware files were removed with [1]. And the `qlogic/1040.bin.ihex` firmware file was among them.

[1]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=b38923a068c10fc36ca8f596d650d095ce390b85

Not sure though, if this was also the case for the Linux kernel in Lenny.

> I cannot say what would work best for the most people, and thus be
> best place to put your efforts, but I do know that the floppy works on
> this machine (AlphaServer 1200), though in Deb 5 I had to modprobe
> floppy first.

A PCI to USB adapter could also work and avoid the size limitations of floppy disks - if you have one at hand.

Cheers, Frank
Re: Updated installation images for Debian Ports 2019-04-09
On 4/15/19 03:53, Darren Goossens wrote: Hi Sorry If I am replying in the wrong way, I tried to understand how these lists work but I always seem to get it wrong. I have tried to use the Alpha netboot image. It booted very nicely, but then I got the message that I needed to load some firmware from removable media -- in this case, qlogic/1040.bin. The installer did not detect the CDROM drive, which is a standard SCSI one that came with the AlphaServer. I put the files on a floppy (AlphaServer 1200 does not have USB) but there is no floppy kernel module on the installer, so it could not read the floppy. So I'm kind of stuck. Can you get your network up already at that point - netcat (as `nc`) is available in the installer environment - or does this also require access to the installer disc? Cheers, Frank
Re: is there a working UP generic 4.X kernel available?
Hi Bob, On 3/7/19 09:59, Bob Tracy wrote: I'm finally starting to get a bit of traction on Debian bug #919825, but Michael Biebl would really like to see me testing with a Debian-provided kernel instead of my hand-built kernel.org versions (now running 5.0.0). I saw where Ben Hutchings grabbed the fix referenced here (https://salsa.debian.org/kernel-team/linux/merge_requests/79) for inclusion in sid, and the corresponding issue was closed approximately four weeks ago. A quick check of the kernels available over at "http://ftp.ports.debian.org/debian-ports/pool-alpha/main/l/linux/" doesn't show anything with a late enough date stamp to include the fix. You could use an SMP kernel from Debian Ports right now instead. They do work without the fix on MP **and** SP Alpha machines. If I'm mistaken as to the availability of a working generic kernel for single-processor alpha systems, kindly point me to it and I'll be happy to give it a try. Otherwise, what's the targeted kernel version for the fix in sid? And approximately when might that show up? Ben included the fix in [1] and 4.19.0-3 (4.19.20-1) should have it included ([2]). [1]: https://salsa.debian.org/kernel-team/linux/commit/886c02b804afbc9bdf6f5bb955d1778dd7f60dff [2]: https://salsa.debian.org/kernel-team/linux/commit/9050e91ac08f3411937c14288f986b674e5f59bf Cheers, Frank
Re: PWS 433au (Miata) recovery update
On 1/27/19 22:53, Alex Winbow wrote: On Mon, 28 Jan 2019, Michael Cree wrote: samba build-depends on ceph [1] but ceph hasn't built on Alpha for some time [2]. Looks like dtp-relative relocation errors during linking in the build of ceph [3] is the reason. I have a theory that gcc is not taking the spec file listed in its invocation arguments in the correct order with other passed arguments thus we are not getting correct linking for some shared libraries in the repository. That leads to FTBFS in other packages with these dtp-relative relocation errors. I haven't explored my theory enough to make a bug report against gcc. Thanks, Michael. How can I help debug this? The alpha I'm bringing up will be a replacement server. Does anyone have an archive of "samba-common_4.7.3+dfsg-1_all.deb"? That seems like it may be missing piece for installing the older samba 4.7.3 for now on alpha (the binary packages still being present in the archive). snapshot.debian.org seems to still have it on: https://snapshot.debian.org/package/samba/2%3A4.7.3%2Bdfsg-1/#samba-common_2:3a:4.7.3:2b:dfsg-1 Direct download from: https://snapshot.debian.org/archive/debian/20171124T034111Z/pool/main/s/samba/samba-common_4.7.3%2Bdfsg-1_all.deb
Re: Updated installation images 2019-01-20
Dear Adrian, On 1/21/19 00:42, John Paul Adrian Glaubitz wrote: Hi! On 1/20/19 9:40 PM, Bob Tracy wrote: Thank you! I'll give the Alpha version a try in the next few days. Waiting on a 4.18 kernel build: should be done in the next 24 hours or so. The klibc package included in the current images is affected by a serious bug which breaks the boot on nearly all architectures [1]. I will therefore have to build new images once the new klibc package has been synced on the FTP mirrors. I have not yet checked the ISO for Alpha, but which kernel version and type (non-MP/MP) does it actually include? I ask because we had that problem earlier with non-MP kernels not working on many - if not all - Alpha machines. And although it was actually fixed by Michael, the fix has only been included since Linux v5.0-rc1 according to the tagging as I understand it, see [1] for details. [1]: https://github.com/torvalds/linux/commit/6ab7d47bcbf0144a8cb81536c2cead4cde18acfe So maybe it'd be better to delay the rebuild until after Linux 5.0 reaches unstable? Cheers, Frank
Re: Use SMP kernel for Alpha (udeb) builds
On 12/13/18 09:08, Michael Cree wrote: On Sun, Dec 09, 2018 at 08:21:11AM +1300, Michael Cree wrote: On Sun, Dec 09, 2018 at 07:54:52AM +1300, Michael Cree wrote: On Sat, Dec 08, 2018 at 12:01:23PM +1300, Michael Cree wrote: On Fri, Dec 07, 2018 at 10:39:58PM +0100, Frank Scheiner wrote: On 12/7/18 22:06, Michael Cree wrote: On Tue, Dec 04, 2018 at 05:38:51PM +0100, Frank Scheiner wrote: As per [1] and our recent discussions the generic 4.x kernels seem to no longer work on Alpha machines which also renders any installer images using the generic 4.x kernels non-working. Bisection leads to: dca496451bddea9aa87b7510dc2eb413d1a19dfd is the first bad commit Actually I am not so sure about that. It appears that sometimes a bad kernel can boot which might have led me astray. That commit after failing once (assuming I did not make a mistake in the bisection) is now booting... It would appear that I accidentally marked one step in the bisection incorrectly but I have now identified the problem commit and it makes a lot more sense. The first bad commit is commit b38d08f3181c5025a7ce84646494cc4748492a3b Author: Tejun Heo Date: Tue Sep 2 14:46:02 2014 -0400 percpu: restructure locking The commit prior to that one boots reliably but this one fails to boot a generic kernel. I'll report it to the linux kernel mail list. That's great! Thanks for moving things forward. Cheers, Frank
Re: Use SMP kernel for Alpha (udeb) builds
Hi Bob, Michael, On 12/8/18 21:03, Bob Tracy wrote: On Sat, Dec 08, 2018 at 07:41:15PM +0100, Frank Scheiner wrote: On 12/8/18 15:05, Bob Tracy wrote: So can we assume `CONFIG_ALPHA_GENERIC=y` also activates `CONFIG_ALPHA_LEGACY_START_ADDRESS`? I wouldn't assume so, particularly for the Gentoo kernel source tree to whatever extent it differs from the kernel.org source tree. What the dependency is saying is, you can't have the legacy start address config option force-enabled unless you're building a generic kernel. Thanks for the clarification. Otherwise, the (alpha) processor-specific config options presumably dictate whether the legacy start address is used. This is, I think, why Gentoo includes a generic+lsa kernel and a generic+nolsa kernel in their install image. Not helpful for our problem, but say, does the generic+nolsa kernel also boot on your PWS? I'd actually expect it to work. Because if that lsa is really only needed for older bootloaders (as mentioned on [1]), using a nolsa kernel on an older Alpha with a current bootloader shouldn't be a problem. [1]: https://cateee.net/lkddb/web-lkddb/ALPHA_LEGACY_START_ADDRESS.html But then I don't understand why Gentoo "today" still needs two different kernels. Just to be clear, Gentoo's generic kernel *does* have SMP configured, and *with* the legacy start address enabled should boot just fine on your PWS as it does on mine. Yes, I expect that, too. But if SMP support really has a play in our problem, than the Gentoo kernels (being both SMP capable) cannot provide "new" information for our problem. BTW, the patches applied by Gentoo for a slightly newer kernel (4.14.72) are available on [2]. [2]: https://dev.gentoo.org/~mpagano/genpatches/patches-4.14-72.html The kernel version is 4.14(.65). I was missing the time yesterday, so tested the Debian generic kernel on my ES45 today. it behaves like the DS25, i.e. 
it seems to hang after aboot starts the kernel: As I'm still missing a "MMJ to whatever" adapter I "copied" this from a glass console: ``` [...] bootstrap code read in base = 2fc000, image_start = 0, image_bytes = 90b86c(9484396) initializing HWRPB at 2000 initializing page table at initializing machine state setting affinity to the primary CPU jumping to bootstrap code aboot: Linux/Alpha SRM bootloader version 1.0_pre20040408 aboot: switching to OSF/1 PALcode version 1.92 aboot: loading initrd (4874860 bytes/9522 blocks) at 0xfc00ffb46000 aboot: starting kernel network with arguments root=/dev/nfs ip=:eth0:dhcp console=tty1 net.ifnames=0 biosdevname=0 ``` Another thing to note: Pushing (and releasing) the halt button on the ES45's OCP has no effect afterwards (I haven't yet checked that on my DS25). Doing the same when the Debian SMP kernel has started and the OS is running returns me immediately to the SRM prompt. So this mechanism seems to be broken by loading the Debian generic kernel. Cheers, Frank
Re: Use SMP kernel for Alpha (udeb) builds
Hi Philippe, On 12/9/18 19:12, Philippe Mathieu-Daudé wrote: FYI I've added few tests to QEMU to avoid regressions, one is booting the DP264 machine (not yet merged, the specific test is here:) Wow, didn't knew Alpha emulation is already that good with QEMU! https://lists.gnu.org/archive/html/qemu-devel/2018-04/msg03082.html I tested a recent Debian SMP kernel and got: alpha-softmmu/qemu-system-alpha \ -kernel vmlinuz-4.18.0-3-alpha-generic \ This is the non-SMP kernel, but I assume you meant that actually. -append console=srm -initrd initrd.gz \ -nographic -net nic -net user -d mmu,unimp \ -drive file=debian-503-alpha-businesscard.iso,if=ide,media=cdrom PCI: 00:00:0 class 0300 id 1013:00b8 PCI: region 0: 1000 PCI: region 1: 1200 PCI: 00:01:0 class 0200 id 8086:100e PCI: region 0: 1202 PCI: region 1: c000 PCI: 00:02:0 class 0101 id 1095:0646 PCI: region 0: c040 PCI: region 1: c048 PCI: region 3: c04c [0.00] Linux version 4.18.0-3-alpha-generic (debian-ker...@lists.debian.org) (gcc version 7.3.0 (Debian 7.3.0-30)) #1 Debian 4.18.20-2 (2018-11-23) [0.00] bootconsole [srm0] enabled [0.00] Booting GENERIC on Tsunami variation Clipper using machine vector Clipper from SRM [...] [0.003906] [ cut here ] [0.004882] WARNING: CPU: 0 PID: 0 at /build/linux-kQe68U/linux-4.18.20/init/main.c:650 start_kernel+0x4dc/0x754 [0.004882] Interrupts were enabled early [0.004882] Modules linked in: [0.005859] CPU: 0 PID: 0 Comm: swapper Not tainted 4.18.0-3-alpha-generic #1 Debian 4.18.20-2 [0.006835]fc00018f3dc8 fc000216ee70 fc000103597c fc0001898ddc [0.007812]fc00010359f4 fc00018ce1b0 fc0002171704 fc000216ee70 [0.007812]fc000216ee70 028a fc0001898ddc [0.007812]fc0001898ddc fc000173e371 fc00018f3e88 [0.008789]fc18 fc000216ee70 0001 [0.008789]fc00018acab8 0001 [0.008789] Trace: [0.009765] [] __warn+0x15c/0x180 [0.009765] [] warn_slowpath_fmt+0x54/0x70 [0.009765] [] _stext+0x1c/0x20 [0.009765] [] _stext+0x0/0x20 [0.010742] [0.010742] ---[ end trace c85a0517f87d04be ]--- [...] 
[2.127928] ledtrig-cpu: registered to indicate activity on CPUs [2.131834] NET: Registered protocol family 10 [2.264647] input: AT Translated Set 2 keyboard as /devices/platform/i8042/serio0/input/input0 [...] Maybe the warning at init/main.c:650 is useful for your real hw? Maybe Michael and Bob can make something out of this. But a problem is, that we actually don't get that far on real hardware with the non-SMP kernel. All machines I tested so far either (1) fall back to SRM or (2) seem to hang (DS25/ES45 and most likely other Titan based systems) after aboot starts the kernel. Cheers, Frank P.S. BTW, that Avocado stuff looks interesting. I wonder if that could also be used to verify (Linux kernel) bootups on real hardware. That would definitely ease up testing new kernel versions.
Re: Use SMP kernel for Alpha (udeb) builds
On 12/8/18 15:05, Bob Tracy wrote: On Sat, Dec 08, 2018 at 11:15:21AM +0100, Frank Scheiner wrote: Is this Gentoo generic installer kernel SMP capable? I believe these Gentoo kernels have the config included in the kernel image, so available as `/proc/config.gz` during runtime, I think. From the "image.squashfs" file on the Gentoo "install-alpha-minimal" image, attached is "etc/kernels/kernel-config-alpha-4.14.65-gentoo" which appears to correspond to the "nolsa" kernel variant. To your question about whether SMP is configured, most definitely "yes" with CONFIG_NR_CPUS=32. Thanks for checking. This seems to be definitely a SMP capable kernel, as `CONFIG_SMP=y` is also set. About the `CONFIG_ALPHA_LEGACY_START_ADDRESS`, [1] mentions this is actually needed for older boot loaders only which hardcoded the kernel start address. And the Gentoo config shows it as inactive: `# CONFIG_ALPHA_LEGACY_START_ADDRESS is not set` [1]: https://cateee.net/lkddb/web-lkddb/ALPHA_LEGACY_START_ADDRESS.html But interesting, [1] also says, that this option depends on CONFIG_ALPHA_GENERIC, which is actually set (`CONFIG_ALPHA_GENERIC=y`) in the Gentoo config. So can we assume `CONFIG_ALPHA_GENERIC=y` also activates `CONFIG_ALPHA_LEGACY_START_ADDRESS`? If yes this could correspond to the behaviour of the generic Debian kernel on my DS25. I just tested a `netabootwrap`ped `4.18.0-2-alpha-generic` and after aboot emits the "starting kernel [...]" message nothing happens: ``` >>>boot (boot ega0.0.0.5.2 -flags root=/dev/nfs ip=:enP2p2s5:dhcp console=ttyS0,9600n8) Trying BOOTP boot. Broadcasting BOOTP Request... Received BOOTP Packet File Name is: /AC100259 local inet address: 172.16.2.89 remote inet address: 172.16.0.2 TFTP Read File Name: /AC100259 netmask = 255.255.0.0 Server is on same subnet as client. block number= 0 port_number= 35092 . 
bootstrap code read in base = 39c000, image_start = 0, image_bytes = 90b86c(9484396) initializing HWRPB at 2000 initializing page table at initializing machine state setting affinity to the primary CPU jumping to bootstrap code aboot: Linux/Alpha SRM bootloader version 1.0_pre20040408 aboot: switching to OSF/1 PALcode version 1.92 aboot: loading initrd (4874860 bytes/9522 blocks) at 0xfc00ffb46000 aboot: starting kernel network with arguments root=/dev/nfs ip=:enP2p2s5:dhcp console=ttyS0,9600n8 ``` And as [1] says, the SRM firmware of Titan machines is bigger than on older Alpha machines, so the kernel start address for the generic kernel might have ended somewhere inside the SRM. I'll check that with my ES45, too. The same kernel leads to: ``` CPU 0 booting (boot ewa0.0.0.3.0 -flags root=/dev/nfs ip=dhcp console=tty1 console=ttyS0,9600n8) Trying BOOTP boot. Broadcasting BOOTP Request... .Received BOOTP Packet File Name is: /AC10020F local inet address: 172.16.2.15 remote inet address: 172.16.0.2 TFTP Read File Name: /AC10020F netmask = 255.255.0.0 Server is on same subnet as client. . bootstrap code read in base = 1e6000, image_start = 0, image_bytes = 90b86c initializing HWRPB at 2000 initializing page table at 1d8000 initializing machine state setting affinity to the primary CPU jumping to bootstrap code aboot: Linux/Alpha SRM bootloader version 1.0_pre20040408 aboot: switching to OSF/1 PALcode version 1.22 aboot: loading initrd (4874860 bytes/9522 blocks) at 0xfc0023b56000 aboot: starting kernel network with arguments root=/dev/nfs ip=dhcp console=tty1 console=ttyS0,9600n8 halted CPU 0 halt code = 6 double error halt PC = fc000107f868 boot failure ``` ...on my PWS 500au. I hence assume, the SRM is small enough on this machine, so the kernel start address doesn't end up in the SRM. The SMP kernel boots without an issue on both machines. 
Strangely, though, the kernel configuration files for both `4.18.0-2-alpha-generic` and `4.18.0-2-alpha-smp` contain:
```
# grep -n CONFIG_ALPHA_GENERIC config-4.18.0-2-alpha-generic config-4.18.0-2-alpha-smp
config-4.18.0-2-alpha-generic:288:CONFIG_ALPHA_GENERIC=y
config-4.18.0-2-alpha-smp:296:CONFIG_ALPHA_GENERIC=y
```
So shouldn't this setting also imply that `CONFIG_ALPHA_LEGACY_START_ADDRESS` is active for both kernels (so also for the SMP kernel)? But maybe some other active/inactive option in the SMP kernel remedies the dependent `CONFIG_ALPHA_LEGACY_START_ADDRESS`. A unified diff between both configurations is attached. Oh btw, the generic config also has "CONFIG_BROKEN_ON_SMP=y" but I am not sure what this means. [2] mentions this is sort of attached to drivers unsafe on S
Re: Use SMP kernel for Alpha (udeb) builds
On 12/8/18 06:58, Bob Tracy wrote: On Sat, Dec 08, 2018 at 10:06:25AM +1300, Michael Cree wrote: On Tue, Dec 04, 2018 at 05:38:51PM +0100, Frank Scheiner wrote: As per [1] and our recent discussions the generic 4.x kernels seem to no longer work on Alpha machines which also renders any installer images using the generic 4.x kernels non-working. Yes, that was noted some time ago. A generic kernel does not boot since about 3.13. I can't remember why I never attempted bisecting this back when it was first noted to be a problem, maybe because it didn't affect me because I normally run my own spun kernels. Ditto on this end. I figure a first pass at the problem would be to compare our respective kernel configs against the generic one, just to get a reading on what code *may* be involved. I can provide my Miata config for a 4.14 kernel (and that's about all I can do until I'm back up and running) if that would be helpful. Another data point to consider would be the kernel config for the current (as of the end of November) Gentoo "install-alpha-minimal" image, which works on Miata at least (modulo the missing Qlogic firmware issue). The associated kernel is "4.14.65-gentoo", and two variants are present on the image -- a "generic" one, and one without a "legacy start address". The "aboot.conf" file has the following comment: # Some later alphas need a special kernel without legacy start address, most # notably the DS15A and DS25 workstations as well as the ES45, ES47 and GS # series of servers. The Miata boots fine with the "generic" kernel, and panics when I try the "nolsa" kernel. Is this Gentoo generic installer kernel SMP capable? I believe these Gentoo kernels have the config included in the kernel image, so available as `/proc/config.gz` during runtime, I think. Bottom line: I think the way forward will be easier from a Debian perspective if the Debian installer for alpha includes a >= 4.14 kernel, because the 4.8 and 4.9 kernels are known to have issues anyway. 
An upgrade would also put alpha closer to being in-sync with the "testing" distro on Intel/AMD platforms. I think the kernel version used on the installers will be the same version that's available as `linux-image-[...].deb` at the time of creation, as kernel-wedge creates the udebs from the `linux-image-[...].deb` IIUIC. Cheers, Frank
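Since the question of whether a given kernel is SMP capable comes down to a few config options, here is a small sketch for querying them. The function name `config_state` is invented for this example. It reads a kernel config from stdin and assumes the standard `CONFIG_FOO=value` and `# CONFIG_FOO is not set` line formats.

```shell
#!/bin/sh
# Hypothetical helper: report the state of one option in a kernel config
# read from stdin. Prints the value, "not set", or "absent".
# Usage: config_state CONFIG_SMP < config-file
config_state() {
    option=$1
    awk -v o="$option" '
        $0 ~ ("^" o "=")               { sub(/^[^=]*=/, ""); print; found = 1 }
        $0 == ("# " o " is not set")   { print "not set"; found = 1 }
        END                            { if (!found) print "absent" }
    '
}
```

On a kernel that exposes its configuration at runtime, this could be used as `zcat /proc/config.gz | config_state CONFIG_SMP`. It works the same way on an extracted config file, e.g. `config_state CONFIG_ALPHA_LEGACY_START_ADDRESS < kernel-config-alpha-4.14.65-gentoo`.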
Re: Use SMP kernel for Alpha (udeb) builds
On 12/7/18 22:06, Michael Cree wrote: On Tue, Dec 04, 2018 at 05:38:51PM +0100, Frank Scheiner wrote: As per [1] and our recent discussions the generic 4.x kernels seem to no longer work on Alpha machines which also renders any installer images using the generic 4.x kernels non-working. Yes, that was noted some time ago. A generic kernel does not boot since about 3.13. Must be after 3.16 because a Debian 3.16 generic kernel still worked on my PWS 500au back in 2017 or even earlier. I can't remember why I never attempted bisecting this back when it was first noted to be a problem, maybe because it didn't affect me because I normally run my own spun kernels. Yes, you mentioned that it doesn't affect non-generic kernels, e.g. kernels built for specific hardware like DP264. Confirmed on: * AlphaStation 200 (w/EV4 x 1) * AlphaStation 255 (w/EV45 x 1) * Personal Workstation 500au (w/EV56 x 1) * AlphaServer DS20E (w/EV67 x 2) Also on XP1000 so I would presume on any DP264 based machine. Also expected on: * AlphaServer ES45 (w/EV68CB x 4) Actually no. I seem to recall that the generic kernel does boot on ES45 (Titan). Interesting, maybe I should also give that a try on my DS25. I can check that at some point when the buildds are not busy. If you want to avoid a reboot on the buildd machine, I can have a look with my ES45 on Sunday. Cheers, Frank
Re: Use SMP kernel for Alpha (udeb) builds
On 12/5/18 07:33, Bob Tracy wrote: Can you open PRs so that these changes can get merged? I will then build new images. Sure, created them now: * First part: https://salsa.debian.org/kernel-team/linux/merge_requests/79 * Second part: https://salsa.debian.org/installer-team/debian-installer/merge_requests/6 @all: Unfortunately both patches weren't included in the latest (maintainer?) commits/releases - not that I expected that ;-). I keep rebasing them (due to constant changes to `debian/changelog`) but would it help if people (with influence :-)) upvote these patches with that "thumbs up" buttons? Much appreciated, gentlemen. Wish I could do more than offer my system up as a test platform, but so it goes... I'll be happy to help with determining the "actual problem which is yet unknown" with the alpha generic kernel, once my system is back up and running :-). Hey, if someone knows how to use `kernel-wedge` manually, we could build a netboot image right away, assuming that `kernel-wedge` can use the existing linux-image-[...]-alpha-smp package to build the needed udebs. That would not require a rebuild of the linux-image-[...]-alpha-smp package and save us a lot of time. Cheers, Frank
Re: Use SMP kernel for Alpha (udeb) builds
Hi Adrian,

On 12/4/18 17:45, John Paul Adrian Glaubitz wrote:

## Patches ##

1. https://salsa.debian.org/frank-scheiner-guest/linux/commit/865cacfd7722b346629082ab3094b6ad93964095
2. https://salsa.debian.org/frank-scheiner-guest/debian-installer/commit/7269679bec8bae997ef5ed7619e9f8df2e184134

I think both patches are already enough to produce the needed alpha-smp udebs and will allow to produce working installer images (e.g. netboot images might work instantly and could be an alternative way for Bob to reinstall his PWS). What do you think? Is there anything obvious missing?

Can you open PRs so that these changes can get merged? I will then build new images.

Sure, created them now:

* First part: https://salsa.debian.org/kernel-team/linux/merge_requests/79
* Second part: https://salsa.debian.org/installer-team/debian-installer/merge_requests/6

Cheers, Frank
Use SMP kernel for Alpha (udeb) builds
Dear all,

As per [1] and our recent discussions, the generic 4.x kernels seem to no longer work on Alpha machines, which also renders any installer images using the generic 4.x kernels non-working.

[1]: https://lists.debian.org/debian-alpha/2017/03/msg7.html

Confirmed on:

* AlphaStation 200 (w/EV4 x 1)
* AlphaStation 255 (w/EV45 x 1)
* Personal Workstation 500au (w/EV56 x 1)
* AlphaServer DS20E (w/EV67 x 2)

Also expected on:

* AXPpci33 (w/LCA4 x 1)
* AlphaStation 500 (w/EV56 x 1)
* AlphaServer DS25 (w/EV68CB x 2)
* AlphaServer ES45 (w/EV68CB x 4)

The following two patches should switch the used kernels to the SMP version. As:

(1) I don't exactly know how to build images using multiple kernels (i.e. what happens if $TEMP_KERNEL has multiple kernel names in it, which seems to be supported according to [2]; will the image creation in e.g. [3] then run multiple times automatically?) and I don't want to break things,

[2]: https://salsa.debian.org/installer-team/debian-installer/blob/master/build/config/dir#L79
[3]: https://salsa.debian.org/installer-team/debian-installer/blob/master/build/config/alpha/netboot.cfg

(2) I can't find a similar example for another architecture and

(3) the images with the generic kernels are non-working anyhow,

...I just omitted the generic ones for now.

This is sort of a workaround and does not fix the actual problem, which is as yet unknown, but I believe getting working installer images is more important at the moment. With working installer images more people could get involved and maybe sometime in the future someone has enough time and effort to invest in fixing the actual problem.

## Patches ##

1. https://salsa.debian.org/frank-scheiner-guest/linux/commit/865cacfd7722b346629082ab3094b6ad93964095
2. https://salsa.debian.org/frank-scheiner-guest/debian-installer/commit/7269679bec8bae997ef5ed7619e9f8df2e184134

I think both patches are already enough to produce the needed alpha-smp udebs and will allow to produce working installer images (e.g.
netboot images might work instantly and could be an alternative way for Bob to reinstall his PWS). What do you think? Is there anything obvious missing? Cheers, Frank
Re: [alpha] Debian 9.0 NETINST fails
Dear Bob,

sorry, looks like I missed your mails to the debian-alpha list until now.

On 11/2/18 19:56, Bob Tracy wrote:

Additional info... Frank Scheiner reported similar badness on his PWS back in March of 2017. See the "debian-alpha" archive link: https://lists.debian.org/debian-alpha/2017/03/msg7.html Executive summary: SMP 4.x kernels work fine, but the generic Debian kernel does *not* (or at least didn't at that time).

I can confirm this. I don't remember exactly when I tried the Debian 9 Sid installer image with 4.x generic kernel provided by Adrian, but I remember that it produced the same result, i.e.:

```
[...]
halted CPU 0
halt code = 5
HALT instruction executed
[...]
```

I believe since then no newer installer image for Alpha was produced.

As a workaround, could it work to netboot the matching stock SMP kernel (4.9.0-3) with the netboot installer initrd from [1]? I don't know how to extract the initrd from the `netabootwrap`ed image though. Or could it work to netboot the SMP kernel with the cdrom installer initrd from [1] and the installer CDROM in the CDROM drive?

[1]: http://ftp.ports.debian.org/debian-ports/pool-alpha/main/d/debian-installer/debian-installer-images_20170615_alpha.tar.gz

Other approach: as per [2] hppa for example uses two kernels. So could we just change [2] for alpha to also include the SMP kernel, with e.g. that patch:

```
--- debian/installer/kernel-versions	2018-11-06 13:30:54.152319148 +0100
+++ debian/installer/kernel-versions-new	2018-11-06 13:31:38.992320296 +0100
@@ -1,5 +1,6 @@
 # arch version flavour installedname suffix build-depends
 alpha - alpha-generic - y -
+alpha - alpha-smp - y -
 amd64 - amd64 - - -
 arm64 - arm64 - - -
 armel - marvell - y -
```

...and the installer images will also include the SMP kernel?

[2]: https://salsa.debian.org/kernel-team/linux/raw/master/debian/installer/kernel-versions

Cheers, Frank
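The one-line change proposed in the patch above can also be sketched as a small script, which may be handy when the patch has to be re-applied after each rebase. This is only an illustration: it assumes `kernel-versions` is a plain whitespace-separated table with the flavour in the third column, as the quoted diff suggests, and `add_flavour` is a hypothetical helper name, not something shipped in the Debian kernel packaging.

```python
def add_flavour(text: str, arch: str, existing: str, new: str) -> str:
    """Return `text` with a row for flavour `new` added after the
    (arch, existing) row. Assumes whitespace-separated columns with the
    architecture in column 1 and the flavour in column 3."""
    lines = text.splitlines()

    # Do nothing if the new flavour is already listed for this arch.
    for line in lines:
        fields = line.split()
        if len(fields) >= 3 and fields[0] == arch and fields[2] == new:
            return text

    out = []
    for line in lines:
        out.append(line)
        fields = line.split()
        if len(fields) >= 3 and fields[0] == arch and fields[2] == existing:
            # Copy the existing row and swap only the flavour name.
            out.append(line.replace(existing, new, 1))
    return "\n".join(out) + "\n"


# Sample content taken from the context lines of the quoted diff.
SAMPLE = """\
# arch version flavour installedname suffix build-depends
alpha - alpha-generic - y -
amd64 - amd64 - - -
arm64 - arm64 - - -
armel - marvell - y -
"""

if __name__ == "__main__":
    print(add_flavour(SAMPLE, "alpha", "alpha-generic", "alpha-smp"), end="")
```

Run on the sample, this adds the same `alpha - alpha-smp - y -` row as the diff; whether that row alone is enough for the installer build to pick up the SMP kernel is exactly the open question asked above.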
Re: Latest install images
Hi, On 09/11/2017 09:06 AM, Gyenes Istvan wrote: Hello, I have tried out the latest images but it fails to boot both on AS800 and Miata. The same things happen on DS10 As per [1] and [2] I was partly successful in netbooting my Alpha machines (which include a PWS 500au and a DS20E (similar to a DS10 from a chipset and CPU point of view)) with the SMP versions of the Debian Linux kernels (v4.9.18 at that time), although most of the machines are uniprocessor machines. [1]: https://lists.debian.org/debian-alpha/2017/03/msg7.html [2]: https://lists.debian.org/debian-alpha/2017/04/msg00018.html But I assume this won't help much for the current installer images. :-/ Cheers, Frank
Re: Debian kernel boot failure on Alpha [was Re: systemd FTBFS]
Hi Michael, ooops, looks like I have to correct myself. :-( The Debian v4.8 and v4.9 (up to 4.9.18) SMP kernels do not successfully boot on my AlphaStations 200 and 255 - at least not fully, they do start though. I didn't test the `agp=off` kernel command line option on the AlphaStations before finishing my last email, but assumed it would just make them work, as it didn't do any harm when using it on the PWS 500au. :-/ It looks like it only works until or after the agpgart message appears. But then it hangs. And when using `agp=off` it hangs just before the agpgart message would have appeared. Maybe this is related to the commit you mention at the end, but that reads like it should affect all my other Alpha machines, too. I have seen the AlphaStation 255 working until the IP auto configuration with the v4.8.x (and the v4.9.18 one just today) SMP kernel one single time each, but the IP auto configuration didn't succeed, because the NIC couldn't or didn't accept the IP address given by the DHCP server. After several tries it looks like it only gets to the IP auto configuration for the first boot after the machine was powered off for a longer time. So could be a heat issue, too... But even when it reaches the IP auto configuration it does not succeed, though I now also use an additional NIC as described below to avoid the usage of the de2104x based one during IP auto configuration. I think the IP auto configuration problem is related to the de2104x driver, as I have had the same problem on my AXPpci33 with a DE435 NIC. But there I could work around it by using the DE435 just for netbooting the combined kernel and initrd image and using a second NIC for mounting the NFS root FS during kernel boot. The AXPpci33 does not show the other problem of the AlphaStations and works just like the PWS 500au and AlphaServer DS20E. But the AlphaStations nearly always start to hang before reaching the IP auto configuration, even if using another NIC in addition. 
On 04/10/2017 10:38 AM, Michael Cree wrote: [...] There were some messages about enabling BWX (starting with [1]), but as it should still work with EV56 when enabled I wondered if my problems were related at all. [1]: https://lists.debian.org/debian-alpha/2014/09/msg0.html In mid-2015 you wrote in [2] that you haven't yet enabled BWX, I've managed to switch the defaults in gcc in a test run and started a local repository rebuild to test it but it exposed a nasty bug in libc so I couldn't continue to test. That took some time to fix and I never returned to exploring switching Debian Ports to using BWX. My feeling is that we should in fact do that considering that the i386 arch recently dropped support for Intel chips that are more recent than the old-time Alpha cpus we still support! Yeah, but those old-time Alpha CPUs are so much more interesting than any i386 based CPU, don't you agree? :-D Think about the implications of using BWX: I don't know for sure what Alpha machines other people have available, but I assume that using BWX would lock out a big part of the machines that are available to hobbyists nowadays (everything pre EV56). Which would also lock out a lot of people that could test things on their machines in the past and at least help in this respect. So I'm afraid this change could shrink the Debian Alpha user base considerably unless people manage to procure newer Alpha gear, which might prove difficult. I'm afraid with fewer machines supported by Debian GNU/Linux it will be even easier to give up on the Alpha architecture completely in the future. On the other hand I understand your arguments in [1]. I am no kernel developer so I can't offer you my help in this regard to spread the load, but I can offer to test things on my Alpha machines if it helps. And keep in mind that Debian Alpha is still alive despite BWX not being used for more than two years. 
:-) I'm thankful that I can still run Debian GNU/Linux on my Alpha machines nowadays and I'd love to also keep the older machines running with Debian GNU/Linux in the future. But that of course doesn't buy you extra time for maintaining Debian Ports. :-/ [1]: https://lists.debian.org/debian-alpha/2014/09/msg0.html Until recently I didn't have a reason to also try the SMP variant of the v4.x Linux kernels, but now that I have a DS20E and had the same problems with the generic Linux kernel v4.x from Debian I also tried the SMP variant on it. And the SMP variant of the v4.x kernel just works (tested with "4.8.0-2-alpha-smp", "Debian 4.8.11-1" from 2016-12-02 to be exact). Now that is interesting! The SMP kernel also boots on my XP1000. The only downside to using an SMP kernel is extra allocations of per-cpu data structures, and the loss of some optimised code that ejects SMP barriers. Well, I was happy that it ran at all on what I have available - except for the AlphaStations of course - so I'm fine with some downsides.
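To make the BWX concern raised in this mail concrete, here is a rough sketch that sorts the machines listed earlier in this thread by whether their CPU implements BWX. The machine/CPU pairs are the ones given in the "Use SMP kernel for Alpha (udeb) builds" mail; the only outside fact used is that BWX first appeared with EV56. The names `MACHINES`, `BWX_CPUS` and `locked_out` are made up for this illustration.

```python
# CPU types from this thread that implement BWX (EV56 and later).
BWX_CPUS = {"EV56", "EV67", "EV68CB"}

# Machines and CPUs as listed earlier in the thread.
MACHINES = {
    "AXPpci33": "LCA4",
    "AlphaStation 200": "EV4",
    "AlphaStation 255": "EV45",
    "AlphaStation 500": "EV56",
    "Personal Workstation 500au": "EV56",
    "AlphaServer DS20E": "EV67",
    "AlphaServer DS25": "EV68CB",
    "AlphaServer ES45": "EV68CB",
}


def locked_out(machines: dict[str, str]) -> list[str]:
    """Return the machines whose CPU lacks BWX, sorted by name."""
    return sorted(name for name, cpu in machines.items() if cpu not in BWX_CPUS)


if __name__ == "__main__":
    for name in locked_out(MACHINES):
        print(f"{name} ({MACHINES[name]}) could not run BWX-enabled binaries")
```

For this particular list that leaves out the AXPpci33 and both AlphaStations (200 and 255), i.e. three of the eight machines, all of them pre-EV56.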
Re: Debian kernel boot failure on Alpha [was Re: systemd FTBFS]
Dear Michael,

just a quick note because it might be related or at least helpful: I have had a problem similar to yours, i.e. no generic Linux kernel v4.x from Debian runs or ran on my Alpha gear (the newest one is a PWS 500au with EV56). E.g. this is the output of an attempted boot with a generic Linux kernel v4.3.0-1 from Debian:

```
>>>show boot*
boot_dev                ewa0.0.0.3.0
boot_file
boot_osflags            root=/dev/nfs ip=dhcp console=tty1 console=ttyS0,9600n8
boot_reset              OFF
bootdef_dev             ewa0.0.0.3.0
booted_dev
booted_file
booted_osflags
>>>boot
(boot ewa0.0.0.3.0 -flags root=/dev/nfs ip=dhcp console=tty1 console=ttyS0,9600n8)
Trying BOOTP boot.
Broadcasting BOOTP Request...
Received BOOTP Packet File Name is: /AC10020F
local inet address: 172.16.5.1
remote inet address: 172.16.3.6
TFTP Read File Name: /AC10020F
netmask = 255.255.0.0
Server is on same subnet as client.
..
bootstrap code read in
base = 1e6000, image_start = 0, image_bytes = 79a5c6
initializing HWRPB at 2000
initializing page table at 1d8000
initializing machine state
setting affinity to the primary CPU
jumping to bootstrap code
aboot: Linux/Alpha SRM bootloader version 1.0_pre20040408
aboot: switching to OSF/1 PALcode version 1.22
aboot: loading initrd (4241862 bytes/8285 blocks) at 0xfc0023bf2000
aboot: starting kernel network with arguments root=/dev/nfs ip=dhcp console=tty1 console=ttyS0,9600n8
halted CPU 0
halt code = 5
HALT instruction executed
PC = fc000134b8a0
boot failure
>>>
```

The older 3.16.x kernel worked without a problem on my machines though. Hence I thought that maybe something was changed in the v4.x kernels that made them incompatible with pre EV6 CPUs. There were some messages about enabling BWX (starting with [1]), but as it should still work with EV56 when enabled I wondered if my problems were related at all.
[1]: https://lists.debian.org/debian-alpha/2014/09/msg0.html In mid-2015 you wrote in [2] that you haven't yet enabled BWX, so also for my other gear (LCA4, EV4, EV45) this couldn't be the problem, as it wasn't enabled yet. [2]: https://lists.debian.org/debian-alpha/2015/06/msg1.html As no v4.x kernel worked with my machines I stayed with the working 3.16.x kernels. Until recently I didn't have a reason to also try the SMP variant of the v4.x Linux kernels, but now that I have a DS20E and had the same problems with the generic Linux kernel v4.x from Debian I also tried the SMP variant on it. And the SMP variant of the v4.x kernel just works (tested with "4.8.0-2-alpha-smp", "Debian 4.8.11-1" from 2016-12-02 to be exact). Today I thought about also trying the SMP variant of the v4.x kernel on my older Alpha non-SMP machines - well, there's no reason that an SMP kernel shouldn't work on a non-SMP machine I believe, if it is not prohibited by some configuration. And guess what, the SMP v4.x kernel also works there. For my AlphaStations 200 and 255 I had to disable AGP support with `agp=off` to be able to continue booting after the agpgart message. I don't really know why it bothers these machines, but disabling AGP support on a system without AGP ports doesn't hurt, so no issue. I currently still have a problem with the kernel level IP autoconfiguration on the older machines, but the SMP kernel seems to be booting on all machines. Find the dmesg output for my PWS 500au below. So maybe you could try the v4.x SMP kernel on your XP1000 and see if it works there, too? Please CC me because I'm not on the list. 
Cheers, Frank ``` [0.00] Linux version 4.8.0-2-alpha-smp (debian-ker...@lists.debian.org) (gcc version 5.4.1 20161019 (Debian 5.4.1-3) ) #1 SMP Debian 4.8.11-1 (2016-12-02) [0.00] Booting GENERIC on Miata using machine vector Miata from SRM [0.00] Major Options: SMP MAGIC_SYSRQ [0.00] Command line: root=/dev/nfs ip=dhcp console=tty1 console=ttyS0,9600n8 [0.00] memcluster 0, usage 1, start0, end 243 [0.00] memcluster 1, usage 0, start 243, end73727 [0.00] memcluster 2, usage 1, start73727, end73728 [0.00] freeing pages 243:2048 [0.00] freeing pages 4271:73727 [0.00] reserving pages 4271:4273 [0.00] Initial ramdisk at: 0xfc0023b8e000 (4648815 bytes) [0.00] 2048K Bcache detected; load hit latency 29 cycles, load miss latency 121 cycles [0.00] pci: cia revision 1 (pyxis) [0.00] SMP: 1 CPUs probed -- cpu_present_mask = 1 [0.00] On node 0 totalpages: 73727 [0.00] free_area_init_node: node 0, pgdat fc000207fac0, node_mem_map fc1f0400 [0.00] DMA zone: 576 pages used for memmap [0.00] DMA zone: 0 pages reserved [0.00] DMA zone: 73727 pages, LIFO batch:15 [0.00] percpu: Embedded