Bug#1004392: systemd: Incorrect location of configuration files
Package: systemd
Version: 247.3-6
Severity: serious
Justification: Policy 10.7

Dear Maintainer,

/usr/lib/tmpfiles.d/x11.conf should be a configuration file. Entries in it must be disabled in order to run containers with accelerated X11 and DRI access. As it is under lib, changes to it are overwritten on every systemd update, breaking all containers which run X apps with direct access to the local X server.

1. There is no way to disable it permanently.
2. There is no way to override it in a way which disables the defaults.

Actually, most of that directory does not belong in /usr - it should be under /etc as per Debian policy for configuration files and should be handled as config on system upgrades and updates.

-- Package-specific info:

-- System Information:
Debian Release: 11.2
  APT prefers stable-updates
  APT policy: (500, 'stable-updates'), (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 5.10.0-10-amd64 (SMP w/8 CPU threads)
Kernel taint flags: TAINT_OOT_MODULE, TAINT_UNSIGNED_MODULE
Locale: LANG=en_GB.UTF-8, LC_CTYPE=en_GB.UTF-8 (charmap=UTF-8), LANGUAGE=en_GB:en
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages systemd depends on:
ii  adduser            3.118
ii  libacl1            2.2.53-10
ii  libapparmor1       2.13.6-10
ii  libaudit1          1:3.0-2
ii  libblkid1          2.36.1-8
ii  libc6              2.31-13+deb11u2
ii  libcap2            1:2.44-1
ii  libcrypt1          1:4.4.18-4
ii  libcryptsetup12    2:2.3.5-1
ii  libgcrypt20        1.8.7-6
ii  libgnutls30        3.7.1-5
ii  libgpg-error0      1.38-2
ii  libip4tc2          1.8.7-1
ii  libkmod2           28-1
ii  liblz4-1           1.9.3-2
ii  liblzma5           5.2.5-2
ii  libmount1          2.36.1-8
ii  libpam0g           1.4.0-9+deb11u1
ii  libseccomp2        2.5.1-1+deb11u1
ii  libselinux1        3.1-3
ii  libsystemd0        247.3-6
ii  libzstd1           1.4.8+dfsg-2.1
ii  mount              2.36.1-8
ii  ntp [time-daemon]  1:4.2.8p15+dfsg-1
ii  util-linux         2.36.1-8

Versions of packages systemd recommends:
ii  dbus  1.12.20-2

Versions of packages systemd suggests:
ii  policykit-1        0.105-31
pn  systemd-container

Versions of packages systemd is related to:
pn  dracut
ii  initramfs-tools  0.140
ii  libnss-systemd   247.3-6
ii  libpam-systemd   247.3-6
ii  udev             247.3-6

-- Configuration Files:
/etc/systemd/logind.conf changed:
[Login]
KillUserProcesses=yes
KillExcludeUsers=root

-- no debconf information
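For reference, tmpfiles.d(5) documents a precedence order between its directories: a file in /etc/tmpfiles.d overrides a file of the same name in /run/tmpfiles.d or /usr/lib/tmpfiles.d, and the man page suggests a symlink to /dev/null in /etc/tmpfiles.d as the way to disable a vendor-supplied file. The following is only a toy model of that rule in Python, not systemd's implementation; the function name and the input layout are hypothetical and chosen for illustration.

```python
# Toy model of the tmpfiles.d(5) precedence rule; NOT systemd's real code.
# Directories are listed from highest to lowest priority.
SEARCH_ORDER = ["/etc/tmpfiles.d", "/run/tmpfiles.d", "/usr/lib/tmpfiles.d"]


def effective_files(present):
    """Map each config file name to the directory whose copy wins.

    `present` maps a directory to the set of file names found in it
    (hypothetical input; a real tool would list the directories).
    """
    winners = {}
    for directory in SEARCH_ORDER:
        for name in sorted(present.get(directory, ())):
            # setdefault keeps the first (highest-priority) directory seen.
            winners.setdefault(name, directory)
    return winners


example = {
    "/usr/lib/tmpfiles.d": {"x11.conf", "tmp.conf"},
    "/etc/tmpfiles.d": {"x11.conf"},  # e.g. a symlink to /dev/null
}
result = effective_files(example)
print(result)
```

Under this model the x11.conf entry is taken from /etc/tmpfiles.d, while tmp.conf still comes from /usr/lib/tmpfiles.d. Whether that is sufficient for the container use case described in the report is not something the sketch can answer.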
Bug#989571: linux-image-5.10.0-0.bpo.3-amd64: Incorrect large USB disk sizing leading to data corruption
Package: src:linux
Version: 5.10.13-1~bpo10+1
Severity: critical
Justification: causes serious data loss

Dear Maintainer,

Large USB drives (for example, a Seagate 4TB Backup), which work perfectly fine with 4.19, are detected with an incorrect size. The 4TB USB drive is identified as a 17GB device and, for some unfathomable reason, mounted as loop. The result is severe data corruption, making all 4TB of data on the drive unrecoverable.

Tested with the original USB bridge that came with the drive, and again after attaching the SATA drive inside to an alternative USB bridge. Same result in both cases.

-- Package-specific info:
** Version:
Linux version 5.10.0-0.bpo.3-amd64 (debian-ker...@lists.debian.org) (gcc-8 (Debian 8.3.0-6) 8.3.0, GNU ld (GNU Binutils for Debian) 2.31.1) #1 SMP Debian 5.10.13-1~bpo10+1 (2021-02-11)

** Command line:
BOOT_IMAGE=diskless/amd64/vmlinuz-5.10.0-0.bpo.3-amd64 initrd=diskless/amd64/initrd.img-5.10.0-0.bpo.3-amd64 root=/dev/nfs ip=dhcp nfsroot=192.168.3.3:/exports/boot/madding mitigations=off rw --

** Tainted: S (4)
 * SMP kernel oops on an officially SMP incapable processor

** Kernel log:
[754632.929276] nfs: server 192.168.3.3 OK
[754635.600887] rpc_check_timeout: 443 callbacks suppressed
[754635.600889] nfs: server 192.168.3.3 not responding, still trying
[754635.612996] nfs: server 192.168.3.3 not responding, still trying
[754635.625266] nfs: server 192.168.3.3 not responding, still trying
[754635.625462] nfs: server 192.168.3.3 not responding, still trying
[754635.637374] nfs: server 192.168.3.3 not responding, still trying
[754635.649472] nfs: server 192.168.3.3 not responding, still trying
[754635.661739] nfs: server 192.168.3.3 not responding, still trying
[754635.661922] nfs: server 192.168.3.3 not responding, still trying
[754635.673850] nfs: server 192.168.3.3 not responding, still trying
[754635.686131] nfs: server 192.168.3.3 not responding, still trying
[791938.374623] lxc-bridge0: port 3(tap-opsft2-0) entered blocking state
[791938.374628] lxc-bridge0: port 3(tap-opsft2-0) entered forwarding state
[791938.374654] lxc-bridge0: port 4(tap-opsft3-0) entered blocking state
[791938.374655] lxc-bridge0: port 4(tap-opsft3-0) entered forwarding state
[791938.375075] lxc-bridge0: port 2(tap-opsft1-0) entered blocking state
[791938.375078] lxc-bridge0: port 2(tap-opsft1-0) entered forwarding state
[791938.388241] k8-bridge0: port 2(tap-opsft1-1) entered blocking state
[791938.388243] k8-bridge0: port 2(tap-opsft1-1) entered forwarding state
[791938.388402] k8-bridge0: port 4(tap-opsft3-1) entered blocking state
[791938.388405] k8-bridge0: port 4(tap-opsft3-1) entered forwarding state
[791938.388481] k8-bridge0: port 3(tap-opsft2-1) entered blocking state
[791938.388484] k8-bridge0: port 3(tap-opsft2-1) entered forwarding state
[801076.265404] usb 4-2.4: new SuperSpeed Gen 1 USB device number 5 using xhci_hcd
[801076.289933] usb 4-2.4: New USB device found, idVendor=174c, idProduct=55aa, bcdDevice= 1.00
[801076.289937] usb 4-2.4: New USB device strings: Mfr=2, Product=3, SerialNumber=1
[801076.289939] usb 4-2.4: Product: ASM105x
[801076.289940] usb 4-2.4: Manufacturer: ASMT
[801076.289942] usb 4-2.4: SerialNumber:
[801076.291139] scsi host10: uas
[801076.291557] scsi 10:0:0:0: Direct-Access ASMT 2115 0 PQ: 0 ANSI: 6
[801076.292065] sd 10:0:0:0: Attached scsi generic sg0 type 0
[801076.292232] sd 10:0:0:0: [sda] Spinning up disk...
[801077.321342] ..ready
[801082.447597] sd 10:0:0:0: [sda] 7814037168 512-byte logical blocks: (4.00 TB/3.64 TiB)
[801082.447600] sd 10:0:0:0: [sda] 4096-byte physical blocks
[801082.447673] sd 10:0:0:0: [sda] Write Protect is off
[801082.447674] sd 10:0:0:0: [sda] Mode Sense: 43 00 00 00
[801082.447832] sd 10:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[801082.448032] sd 10:0:0:0: [sda] Optimal transfer size 33553920 bytes not a multiple of physical block size (4096 bytes)
[801082.494646] sd 10:0:0:0: [sda] Attached SCSI disk
[801150.687429] loop: module loaded
[801150.815997] EXT4-fs (loop0): mounted filesystem with ordered data mode. Opts: (null)
[803002.579925] blk_update_request: I/O error, dev loop0, sector 0 op 0x1:(WRITE) flags 0x800 phys_seg 0 prio class 0
[803002.579960] blk_update_request: I/O error, dev loop0, sector 0 op 0x1:(WRITE) flags 0x800 phys_seg 0 prio class 0
[803017.725341] EXT4-fs (loop0): mounted filesystem with ordered data mode. Opts: (null)
[803081.125594] blk_update_request: I/O error, dev loop0, sector 0 op 0x1:(WRITE) flags 0x800 phys_seg 0 prio class 0
[803081.125635] blk_update_request: I/O error, dev loop0, sector 0 op 0x1:(WRITE) flags 0x800 phys_seg 0 prio class 0
[803085.522063] EXT4-fs (loop0): mounted filesystem with ordered data mode. Opts: (null)
[803239.336895] blk_update_request: I/O error, dev loop0, sector 0 op 0x1:(WRITE) flags 0x800 phys_seg 0 prio class 0
[803239.336950] blk_update_request: I/O
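The size figures in the kernel log can be cross-checked with a little arithmetic. The following is a minimal sketch in Python; the variable names are illustrative and the numbers are copied from the [sda] lines of the log. It checks that 7814037168 logical blocks of 512 bytes correspond to the reported 4.00 TB/3.64 TiB, and that the optimal transfer size is not a multiple of the 4096-byte physical block size, as the warning says. Note that in this excerpt the sd layer reports the expected capacity; the 17GB figure mentioned in the report does not appear in it.

```python
# Figures copied from the "[sda]" lines in the kernel log above.
LOGICAL_BLOCKS = 7814037168
LOGICAL_BLOCK_SIZE = 512          # bytes
PHYSICAL_BLOCK_SIZE = 4096        # bytes
OPTIMAL_TRANSFER_SIZE = 33553920  # bytes

size_bytes = LOGICAL_BLOCKS * LOGICAL_BLOCK_SIZE
size_tb = size_bytes / 10**12     # decimal terabytes
size_tib = size_bytes / 2**40     # binary tebibytes
remainder = OPTIMAL_TRANSFER_SIZE % PHYSICAL_BLOCK_SIZE

print(f"{size_bytes} bytes = {size_tb:.2f} TB = {size_tib:.2f} TiB")
print(f"optimal transfer size leaves a remainder of {remainder} bytes")
```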
Bug#983379: [PATCH] um: mark all kernel symbols as local
On 05/03/2021 20:43, Johannes Berg wrote:

From: Johannes Berg

Ritesh reported a bug [1] against UML, noting that it crashed on startup. The backtrace shows the following (heavily redacted):

(gdb) bt
...
#26 0x60015b5d in sem_init () at ipc/sem.c:268
#27 0x7f89906d92f7 in ?? () from /lib/x86_64-linux-gnu/libcom_err.so.2
#28 0x7f8990ab8fb2 in call_init (...) at dl-init.c:72
...
#40 0x7f89909bf3a6 in nss_load_library (...) at nsswitch.c:359
...
#44 0x7f8990895e35 in _nss_compat_getgrnam_r (...) at nss_compat/compat-grp.c:486
#45 0x7f8990968b85 in __getgrnam_r [...]
#46 0x7f89909d6b77 in grantpt [...]
#47 0x7f8990a9394e in __GI_openpty [...]
#48 0x604a1f65 in openpty_cb (...) at arch/um/os-Linux/sigio.c:407
#49 0x604a58d0 in start_idle_thread (...) at arch/um/os-Linux/skas/process.c:598
#50 0x60004a3d in start_uml () at arch/um/kernel/skas/process.c:45
#51 0x600047b2 in linux_main (...) at arch/um/kernel/um_arch.c:334
#52 0x6000574f in main (...) at arch/um/os-Linux/main.c:144

indicating that the UML function openpty_cb() calls openpty(), which internally calls __getgrnam_r(), which causes the nsswitch machinery to get started.

This loads, through lots of indirection that I snipped, the libcom_err.so.2 library, which (in an unknown function, "??") calls sem_init().

Now, of course it wants to get libpthread's sem_init(), since it's linked against libpthread. However, the dynamic linker looks up that symbol against the binary first, and gets the kernel's sem_init().

Hajime Tazaki noted that "objcopy -L" can localize a symbol, so the dynamic linker wouldn't do the lookup this way. I tried, but for some reason that didn't seem to work.

Doing the same thing in the linker script instead does seem to work, though I cannot entirely explain - it *also* works if I just add "VERSION { { global: *; }; }" instead, indicating that something else is happening that I don't really understand. It may be that explicitly doing that marks them with some kind of empty version, and that's different from the default.

Explicitly marking them with a version breaks kallsyms, so that doesn't seem to be possible.

Marking all the symbols as local seems correct, and does seem to address the issue, so do that. Also do it for static link, nsswitch libraries could still be loaded there.

[1] https://bugs.debian.org/983379

Reported-by: Ritesh Raj Sarraf
Signed-off-by: Johannes Berg
---
 arch/um/kernel/dyn.lds.S | 6 ++++++
 arch/um/kernel/uml.lds.S | 6 ++++++
 2 files changed, 12 insertions(+)

diff --git a/arch/um/kernel/dyn.lds.S b/arch/um/kernel/dyn.lds.S
index dacbfabf66d8..2f2a8ce92f1e 100644
--- a/arch/um/kernel/dyn.lds.S
+++ b/arch/um/kernel/dyn.lds.S
@@ -6,6 +6,12 @@ OUTPUT_ARCH(ELF_ARCH)
 ENTRY(_start)
 jiffies = jiffies_64;
 
+VERSION {
+  {
+    local: *;
+  };
+}
+
 SECTIONS
 {
   PROVIDE (__executable_start = START);
diff --git a/arch/um/kernel/uml.lds.S b/arch/um/kernel/uml.lds.S
index 45d957d7004c..7a8e2b123e29 100644
--- a/arch/um/kernel/uml.lds.S
+++ b/arch/um/kernel/uml.lds.S
@@ -7,6 +7,12 @@ OUTPUT_ARCH(ELF_ARCH)
 ENTRY(_start)
 jiffies = jiffies_64;
 
+VERSION {
+  {
+    local: *;
+  };
+}
+
 SECTIONS
 {
   /* This must contain the right address - not quite the default ELF one.*/

Acked-By: Anton Ivanov

--
Anton R. Ivanov
Cambridgegreys Limited. Registered in England. Company Number 10273661
https://www.cambridgegreys.com/
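The interposition problem described in the commit message can be illustrated with a toy model. The sketch below is not how ld.so is implemented; it only mimics the rule stated in the message, that the dynamic linker looks a symbol up in the binary first, and shows why marking the kernel's symbols local would let libpthread's sem_init() win. All names in the code (resolve, the dictionaries and their keys) are hypothetical.

```python
# Toy model of ELF global symbol lookup order; NOT the real ld.so algorithm.
# Each object lists the symbols it defines and whether they are exported
# ("global") or hidden from other objects ("local").


def resolve(symbol, search_order):
    """Return the name of the first object that exports `symbol`."""
    for obj in search_order:
        if obj["symbols"].get(symbol) == "global":
            return obj["name"]
    return None


libpthread = {"name": "libpthread", "symbols": {"sem_init": "global"}}

# Before the patch: the UML binary exports the kernel's sem_init().
uml_before = {"name": "linux (UML)", "symbols": {"sem_init": "global"}}
# After the patch: "local: *;" hides every kernel symbol from the lookup.
uml_after = {"name": "linux (UML)", "symbols": {"sem_init": "local"}}

# The executable is searched before the shared libraries.
before = resolve("sem_init", [uml_before, libpthread])
after = resolve("sem_init", [uml_after, libpthread])
print(before, "->", after)
```

In this model the call from libcom_err.so.2 lands in the UML binary before the change and in libpthread after it, which matches the behaviour the patch aims for. It does not explain the "global: *;" observation, which the author also leaves open.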
Bug#983379: linux uml segfault
On 05/03/2021 18:32, Johannes Berg wrote: On 5 March 2021 18:39:42 CET, Anton Ivanov wrote: On 04/03/2021 07:47, Johannes Berg wrote: On Thu, 2021-03-04 at 14:38 +0900, Hajime Tazaki wrote: Now, I don't know how to fix it (short of changing your nsswitch configuration) - maybe we could somehow rename sem_init()? Or maybe we can somehow give the kernel binary a lower symbol resolution than the libc/libpthread. objcopy (from binutils) can localize symbols (i.e., objcopy -L sem_init $orig_file $new_file). It also does renaming symbols. But not sure this is the ideal solution. Yes, we started thinking about it but it was too late at night when I replied ... I think there's basically a way to have an external list of symbols to export, for symbol versioning, that we could/should use to basically not export any of the kernel symbols out to libs. How does UML handle symbol conflicts between userspace code and Linux kernel (like this case sem_init) ? AFAIK, libnl has a same symbol as Linux kernel (genlmsg_put) and others can possibly do as well. I fear it doesn't? Let's assume it does not, and try to fix this by de-conflicting the symbol. For the time being, also, let's aim for a Debian specific patch just to go into their "patches" dir for build so that UML is not dropped out of the release. This should make all internal uses of sem_init be um_sem_init in the actual object files. I will chase the issue of it picking up glibc memcpy separately. Upon close inspection it looks like a different issue - it is in the other direction (picking a dynamic symbol instead of the one from the tree). I spent all day chasing it today and I cannot reproduce it. At the same time it was reproducible yesterday without any problems :( +#ifdef CONFIG_UML +void __init um_sem_init(void) +#else void __init sem_init(void) +#endif Might be easier to just #define sem_init um_sem_init in an appropriate header file, perhaps even in arch/um/? 
I thought of that, but surrendered to the "dark side" of the quick and ugly fix. We can do that for ipc/sem.c - it brings in uaccess.h, which ultimately pulls uaccess from our asm tree. So if we do it there, it will end up in sem.c. However, that function is also referenced and invoked from ipc/util.c, which does not pull in that include. I am going to dig through the rest of our includes to see if we can find a suitable one which will be picked up by both sem.c and util.c. I hope there is a place which we can use for a "proper" fix. By the way, I actually remember seeing a couple of includes like that somewhere dealing with other um symbol conflicts, I just can't remember where I saw it. johannes -- Anton R. Ivanov https://www.kot-begemot.co.uk/
Bug#983379: linux uml segfault
On 04/03/2021 07:47, Johannes Berg wrote: On Thu, 2021-03-04 at 14:38 +0900, Hajime Tazaki wrote: Now, I don't know how to fix it (short of changing your nsswitch configuration) - maybe we could somehow rename sem_init()? Or maybe we can somehow give the kernel binary a lower symbol resolution than the libc/libpthread. objcopy (from binutils) can localize symbols (i.e., objcopy -L sem_init $orig_file $new_file). It also does renaming symbols. But not sure this is the ideal solution. Yes, we started thinking about it but it was too late at night when I replied ... I think there's basically a way to have an external list of symbols to export, for symbol versioning, that we could/should use to basically not export any of the kernel symbols out to libs. How does UML handle symbol conflicts between userspace code and Linux kernel (like this case sem_init) ? AFAIK, libnl has a same symbol as Linux kernel (genlmsg_put) and others can possibly do as well. I fear it doesn't? Let's assume it does not, and try to fix this by de-conflicting the symbol. For the time being, also, let's aim for a Debian specific patch just to go into their "patches" dir for build so that UML is not dropped out of the release. This should make all internal uses of sem_init be um_sem_init in the actual object files. I will chase the issue of it picking up glibc memcpy separately. Upon close inspection it looks like a different issue - it is in the other direction (picking a dynamic symbol instead of the one from the tree). I spent all day chasing it today and I cannot reproduce it. At the same time it was reproducible yesterday without any problems :( Ritesh, can you give the following a spin - it renames sem_init as um_sem_init for UML only? 
diff --git a/ipc/sem.c b/ipc/sem.c
index f6c30a85dadf..5157796daf54 100644
--- a/ipc/sem.c
+++ b/ipc/sem.c
@@ -263,7 +263,11 @@ void sem_exit_ns(struct ipc_namespace *ns)
 }
 #endif
 
+#ifdef CONFIG_UML
+void __init um_sem_init(void)
+#else
 void __init sem_init(void)
+#endif
 {
 	sem_init_ns(&init_ipc_ns);
 	ipc_init_proc_interface("sysvipc/sem",
diff --git a/ipc/util.h b/ipc/util.h
index 5766c61aed0e..b3356efb3c96 100644
--- a/ipc/util.h
+++ b/ipc/util.h
@@ -47,7 +47,12 @@ extern int ipc_min_cycle;
 #define IPCMNI_IDX_MASK		((1 << IPCMNI_SHIFT) - 1)
 #endif /* CONFIG_SYSVIPC_SYSCTL */
 
+#ifdef CONFIG_UML
+void um_sem_init(void);
+#define sem_init() um_sem_init()
+#else
 void sem_init(void);
+#endif
 void msg_init(void);
 void shm_init(void);

johannes

--
Anton R. Ivanov
https://www.kot-begemot.co.uk/
Bug#983379: linux uml segfault
On 04/03/2021 18:41, Anton Ivanov wrote: On 04/03/2021 08:05, Benjamin Berg wrote: On Thu, 2021-03-04 at 08:47 +0100, Johannes Berg wrote: On Thu, 2021-03-04 at 14:38 +0900, Hajime Tazaki wrote: Now, I don't know how to fix it (short of changing your nsswitch configuration) - maybe we could somehow rename sem_init()? Or maybe we can somehow give the kernel binary a lower symbol resolution than the libc/libpthread. objcopy (from binutils) can localize symbols (i.e., objcopy -L sem_init $orig_file $new_file). It also does renaming symbols. But not sure this is the ideal solution. Yes, we started thinking about it but it was too late at night when I replied ... I think there's basically a way to have an external list of symbols to export, for symbol versioning, that we could/should use to basically not export any of the kernel symbols out to libs. Maybe using the ld --version-script= option here works to mark all kernel symbols as being "local" and prevent them from being picked up by libraries. Benjamin How does UML handle symbol conflicts between userspace code and Linux kernel (like this case sem_init) ? AFAIK, libnl has a same symbol as Linux kernel (genlmsg_put) and others can possibly do as well. I fear it doesn't? I can confirm that it did and this bug is bisect-able. with 5.7 # dd if=/dev/ubda of=/dev/null bs=1M 16384+1 records in 16384+1 records out 17179869696 bytes (17 GB, 16 GiB) copied, 10.6973 s, 1.6 GB/s with 5.10 the speed is 2.2 5.7 with "strings from glibc" patch speed is 2.2 As we did not do anything else in this timeframe to jack up the speed from 1.6GB/s to 2.2GB/s and as it is identical to the speed you get with the "use glibc strings.h" this looks like a good criteria to bisect on. I am going to do a bisect with 5.7 "good" and 5.10 "bad" using the speed test as a working hypothesis. This is proving very "interesting" to try to chase down, because the "picking the wrong library" does not happen every time. F.E. 
yesterday my 5.10 builds were picking glibc memcpy and friends. Today, with the same config and everything else the same, it is picking built-ins. I need to find some better way to reproduce this. A. A. johannes ___ linux-um mailing list linux...@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-um -- Anton R. Ivanov https://www.kot-begemot.co.uk/
Bug#983379: linux uml segfault
On 04/03/2021 08:05, Benjamin Berg wrote: On Thu, 2021-03-04 at 08:47 +0100, Johannes Berg wrote: On Thu, 2021-03-04 at 14:38 +0900, Hajime Tazaki wrote: Now, I don't know how to fix it (short of changing your nsswitch configuration) - maybe we could somehow rename sem_init()? Or maybe we can somehow give the kernel binary a lower symbol resolution than the libc/libpthread. objcopy (from binutils) can localize symbols (i.e., objcopy -L sem_init $orig_file $new_file). It also does renaming symbols. But not sure this is the ideal solution. Yes, we started thinking about it but it was too late at night when I replied ... I think there's basically a way to have an external list of symbols to export, for symbol versioning, that we could/should use to basically not export any of the kernel symbols out to libs. Maybe using the ld --version-script= option here works to mark all kernel symbols as being "local" and prevent them from being picked up by libraries. Benjamin How does UML handle symbol conflicts between userspace code and Linux kernel (like this case sem_init) ? AFAIK, libnl has a same symbol as Linux kernel (genlmsg_put) and others can possibly do as well. I fear it doesn't? I can confirm that it did and this bug is bisect-able. with 5.7 # dd if=/dev/ubda of=/dev/null bs=1M 16384+1 records in 16384+1 records out 17179869696 bytes (17 GB, 16 GiB) copied, 10.6973 s, 1.6 GB/s with 5.10 the speed is 2.2 5.7 with "strings from glibc" patch speed is 2.2 As we did not do anything else in this timeframe to jack up the speed from 1.6GB/s to 2.2GB/s and as it is identical to the speed you get with the "use glibc strings.h" this looks like a good criteria to bisect on. I am going to do a bisect with 5.7 "good" and 5.10 "bad" using the speed test as a working hypothesis. A. 
johannes -- Anton R. Ivanov https://www.kot-begemot.co.uk/
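The dd figures quoted in this message are easy to sanity-check, and the proposed speed-based bisect criterion can be written down as a small predicate. This is a minimal sketch in Python under stated assumptions: the byte count and duration are copied from the dd output, while the 1.9 GB/s threshold and the classify() helper are hypothetical, picked halfway between the two reported speeds (1.6 and 2.2 GB/s).

```python
# Figures copied from the dd output quoted above.
BYTES_COPIED = 17179869696
SECONDS = 10.6973
BLOCK_SIZE = 1024 * 1024  # bs=1M

# "16384+1 records" means 16384 full blocks plus one partial block.
full_records, partial_bytes = divmod(BYTES_COPIED, BLOCK_SIZE)
throughput_gb_s = BYTES_COPIED / SECONDS / 10**9

# Hypothetical bisect predicate: the threshold is an assumption, chosen
# halfway between the two reported speeds (1.6 and 2.2 GB/s).
THRESHOLD_GB_S = 1.9


def classify(speed_gb_s):
    """Label a build "good" (5.7-like speed) or "bad" (5.10-like speed)."""
    return "good" if speed_gb_s < THRESHOLD_GB_S else "bad"


print(full_records, partial_bytes, round(throughput_gb_s, 1))
print(classify(throughput_gb_s), classify(2.2))
```

The labels follow the message's own convention: the slower 5.7 result counts as "good" and the faster 5.10 result as "bad", because the higher speed is taken as a sign that the glibc string functions were picked up.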
Bug#983379: linux uml segfault
On 04/03/2021 05:38, Hajime Tazaki wrote: On Thu, 04 Mar 2021 07:40:00 +0900, Johannes Berg wrote: I think the problem is here: #24 0x6080f234 in ipc_init_ids (ids=0x60c60de8 ) at ipc/util.c:119 #25 0x60813c6d in sem_init_ns (ns=0x60d895bb ) at ipc/sem.c:254 #26 0x60015b5d in sem_init () at ipc/sem.c:268 #27 0x7f89906d92f7 in ?? () from /lib/x86_64-linux- gnu/libcom_err.so.2 You're in the init of libcom_err.so.2, which is loaded by "libnss_nis.so.2" which is loaded by normal NSS code (getgrnam): #40 0x7f89909bf3a6 in nss_load_library (ni=ni@entry=0x61497db0) at nsswitch.c:359 #41 0x7f89909bfc39 in __GI___nss_lookup_function (ni=0x61497db0, fct_name=, fct_name@entry=0x7f899089b020 "setgrent") at nsswitch.c:467 #42 0x7f899089554b in init_nss_interface () at nss_compat/compat- grp.c:83 #43 init_nss_interface () at nss_compat/compat-grp.c:79 #44 0x7f8990895e35 in _nss_compat_getgrnam_r (name=0x7f8990a2a1e0 "tty", grp=0x7ffe3e7a2910, buffer=0x7ffe3e7a24e0 "", buflen=1024, errnop=0x7f899089eb00) at nss_compat/compat-grp.c:486 #45 0x7f8990968b85 in __getgrnam_r (name=name@entry=0x7f8990a2a1e0 "tty", resbuf=resbuf@entry=0x7ffe3e7a2910, buffer=buffer@entry=0x7ffe3e7a24e0 "", buflen=1024, result=result@entry=0x7ffe3e7a2908) at ../nss/getXXbyYY_r.c:315 You have a strange nsswitch configuration that causes all of this (libnss_nis.so.2 -> libcom_err.so.2) to get loaded. Now libcom_err.so.2 is trying to call sem_init(), and that gets ... tada ... Linux's sem_init() instead of libpthread's. And then the crash. Now, I don't know how to fix it (short of changing your nsswitch configuration) - maybe we could somehow rename sem_init()? Or maybe we can somehow give the kernel binary a lower symbol resolution than the libc/libpthread. objcopy (from binutils) can localize symbols (i.e., objcopy -L sem_init $orig_file $new_file). It also does renaming symbols. But not sure this is the ideal solution. 
How does UML handle symbol conflicts between userspace code and Linux kernel (like this case sem_init) ? AFAIK, libnl has a same symbol as Linux kernel (genlmsg_put) and others can possibly do as well.

It used to handle them. I do not think it does now - something broke and it's fairly recent.

I actually have something which confirms this. I worked on a patch around 5.8-5.9 which would give the option to pick up libc equivalents for the functions from string.h, and there was a clear performance difference of ~20%+. This is because UML has no means of optimizing them and picks up the worst case scenario x86 version.

I parked that for a while, because I had to look at other stuff at work. I restarted working on it after 5.10. My first observation was that, despite not changing anything in the patches, the gain was no longer there. The performance was the same as if it picked up libc equivalents.

I can either try to reproduce the nss config which causes the sem_init issue or use my own libc patchset to try to dissect. The problem commit will be roughly around the time the performance difference from applying the "switch to libc" goes away.

Brgds,
A.

-- Hajime

--
Anton R. Ivanov
https://www.kot-begemot.co.uk/
Bug#983379: linux uml segfault
On 03/03/2021 22:40, Johannes Berg wrote: I think the problem is here: #24 0x6080f234 in ipc_init_ids (ids=0x60c60de8 ) at ipc/util.c:119 #25 0x60813c6d in sem_init_ns (ns=0x60d895bb ) at ipc/sem.c:254 #26 0x60015b5d in sem_init () at ipc/sem.c:268 #27 0x7f89906d92f7 in ?? () from /lib/x86_64-linux- gnu/libcom_err.so.2 You're in the init of libcom_err.so.2, which is loaded by "libnss_nis.so.2" which is loaded by normal NSS code (getgrnam): #40 0x7f89909bf3a6 in nss_load_library (ni=ni@entry=0x61497db0) at nsswitch.c:359 #41 0x7f89909bfc39 in __GI___nss_lookup_function (ni=0x61497db0, fct_name=, fct_name@entry=0x7f899089b020 "setgrent") at nsswitch.c:467 #42 0x7f899089554b in init_nss_interface () at nss_compat/compat- grp.c:83 #43 init_nss_interface () at nss_compat/compat-grp.c:79 #44 0x7f8990895e35 in _nss_compat_getgrnam_r (name=0x7f8990a2a1e0 "tty", grp=0x7ffe3e7a2910, buffer=0x7ffe3e7a24e0 "", buflen=1024, errnop=0x7f899089eb00) at nss_compat/compat-grp.c:486 #45 0x7f8990968b85 in __getgrnam_r (name=name@entry=0x7f8990a2a1e0 "tty", resbuf=resbuf@entry=0x7ffe3e7a2910, buffer=buffer@entry=0x7ffe3e7a24e0 "", buflen=1024, result=result@entry=0x7ffe3e7a2908) at ../nss/getXXbyYY_r.c:315 You have a strange nsswitch configuration that causes all of this (libnss_nis.so.2 -> libcom_err.so.2) to get loaded. Now libcom_err.so.2 is trying to call sem_init(), and that gets ... tada ... Linux's sem_init() instead of libpthread's. And then the crash. Now, I don't know how to fix it (short of changing your nsswitch configuration) - maybe we could somehow rename sem_init()? Or maybe we can somehow give the kernel binary a lower symbol resolution than the libc/libpthread. I have not looked in depth in how the linking process works, but it should have picked up the sem_init from the kernel library, not libc. We are already supposed to do that regarding kernel vs libc string.h functions - memcpy, etc. 
Though for all of them the libc versions do the same thing, so invoking the wrong one does not kill you; this may have been broken for a while without us noticing. johannes -- Anton R. Ivanov https://www.kot-begemot.co.uk/
Bug#983379: linux uml segfault
On 03/03/2021 10:45, Ritesh Raj Sarraf wrote:

Hi Anton,

On Wed, 2021-03-03 at 09:30 +, Anton Ivanov wrote:

OTOH, I have one more user (other than you) who's not been able to reproduce the issue.

I will do a dissect the moment I figure out how to reproduce it. I will try to do some more experiments on that tomorrow. I tried to alter the userspace a bit, but it makes no difference. Out of curiosity, what are you running it on?

Bare-metal machines. 3 different machines, all Intel processors. And it fails on all 3 of them.

Hmmm... All mine are AMD. I can try to boot up an Intel later today with Bullseye to see if it makes a difference.

On the distribution side, all 3 of them run Debian Unstable, with Linux 5.10.13

The code here is:

    static inline u32 printk_caller_id(void)
    {
        return in_task() ? task_pid_nr(current) : 0x8000 + raw_smp_processor_id();
    }

That is something which should not bomb out unless we have memory corruption or something along those lines - current being invalid.

Must be something different. Not all machines could have bad memory at the same time.

I did not mean bad memory. I meant memory corruption as a result of a race, buffer overrun or anything else like that.

--
Anton R. Ivanov
https://www.kot-begemot.co.uk/
Bug#983379: linux uml segfault
On 02/03/2021 17:27, Ritesh Raj Sarraf wrote: On Tue, 2021-03-02 at 17:05 +, Anton Ivanov wrote: So the best I can extract for you is to compile the kernel with as much information as possible. Can you try using one of the older kernels so we can verify if this is indeed a 5.10 thing. That was the first thing I tried. I tested it with 5.10, 5.9 and 5.4. All 3 crashed. That's when I knew this one was going to be painful one to conclude. The only other input I have is that I have one more user who's reported to be able to reproduce the issue. OTOH, I have one more user (other than you) who's not been able to reproduce the issue. I will do a dissect the moment I figure out how to reproduce it. I will try to do some more experiments on that tomorrow. I tried to alter the userspace a bit, but it makes no difference. Out of curiosity, what are you running it on? Meanwhile, I enabled some debug info in the kernel. Here's what I have got so far: ``` (gdb) bt #0 0x7f89908dc087 in kill () at ../sysdeps/unix/syscall- template.S:120 #1 0x604a3514 in uml_abort () at arch/um/os-Linux/util.c:94 #2 0x604a3791 in os_dump_core () at arch/um/os- Linux/util.c:149 #3 0x6048d126 in panic_exit (self=0x2e66d5, unused1=6, unused2=0x0) at arch/um/kernel/um_arch.c:217 #4 0x604c725a in notifier_call_chain (nl=0x2e66d5, val=0, v=0x60d82f40 , nr_to_call=-1, nr_calls=0x0) at kernel/notifier.c:83 #5 0x604c72f6 in atomic_notifier_call_chain (nh=0x2e66d5, val=6, v=0x0) at kernel/notifier.c:217 #6 0x60a54607 in panic (fmt=0x60a55225 "UH\211\345H\201\354", ) at kernel/panic.c:272 #7 0x6048cca3 in segv (fi=, ip=1615717312, is_user=0, regs=0x60c2ee58 ) at arch/um/kernel/trap.c:246 #8 0x6048ce64 in segv_handler (sig=3040981, unused_si=0x6, regs=0x60c2ee58 ) at arch/um/kernel/trap.c:190 #9 0x604a2556 in sig_handler_common (sig=11, si=0x60c2fbf0 , mc=0x60c2fae8 ) at arch/um/os-Linux/signal.c:48 #10 0x604a2aa2 in sig_handler (sig=3040981, si=0x6, mc=0x0) at arch/um/os-Linux/signal.c:81 #11 0x604a265f in 
hard_handler (sig=3040981, si=0x60c2fbf0 , p=0x0) at arch/um/os-Linux/signal.c:180 #12 The code here is: static inline u32 printk_caller_id(void) { return in_task() ? task_pid_nr(current) : 0x8000 + raw_smp_processor_id(); } That is something which should not bomb out unless we have memory corruption or something along those lines - current being invalid. A. #13 0x604de3c0 in printk_caller_id () at kernel/printk/printk.c:1924 #14 log_output (text_len=, text=, dev_info=, lflags=, level=, facility=) at kernel/printk/printk.c:1932 #15 vprintk_store (facility=1624806843, level=5, dev_info=0x0, fmt=0x35 , args=0x1) at kernel/printk/printk.c:2004 #16 0x604de8b7 in vprintk_emit (facility=1624806843, level=1622768673, dev_info=0x35, fmt=0x1 , args=0x60b97c22) at kernel/printk/printk.c:2029 #17 0x604debad in vprintk_deferred (fmt=0x1 , args=0x60b97c21) at kernel/printk/printk.c:3079 #18 0x60a554de in printk_deferred (fmt=0x60d895bb "\n") at kernel/printk/printk.c:3091 #19 0x6092680f in _warn_unseeded_randomness (previous=, caller=, func_name=) at drivers/char/random.c:1534 #20 _warn_unseeded_randomness (func_name=0x60abf380 <__func__.38> "get_random_u32", caller=0x608b5f25 , previous=0x35) at drivers/char/random.c:1516 #21 0x60927d47 in get_random_u32 () at drivers/char/random.c:2221 #22 0x608b5f25 in bucket_table_alloc (nbuckets=64, gfp=3264, ht=) at lib/rhashtable.c:203 #23 0x608b6733 in rhashtable_init (ht=0x60c60e30 , params=0x608b5e06 ) at lib/rhashtable.c:1061 #24 0x6080f234 in ipc_init_ids (ids=0x60c60de8 ) at ipc/util.c:119 #25 0x60813c6d in sem_init_ns (ns=0x60d895bb ) at ipc/sem.c:254 #26 0x60015b5d in sem_init () at ipc/sem.c:268 #27 0x7f89906d92f7 in ?? 
() from /lib/x86_64-linux- gnu/libcom_err.so.2 #28 0x7f8990ab8fb2 in call_init (l=, argc=argc@entry=5, argv=argv@entry=0x7ffe3e7a4c98, env=env@entry=0x7ffe3e7a4cc8) at dl-init.c:72 #29 0x7f8990ab90b9 in call_init (env=0x7ffe3e7a4cc8, argv=0x7ffe3e7a4c98, argc=5, l=) at dl-init.c:30 #30 _dl_init (main_map=0x61497ea0, argc=5, argv=0x7ffe3e7a4c98, env=0x7ffe3e7a4cc8) at dl-init.c:119 #31 0x7f89909d82bd in __GI__dl_catch_exception (exception=exception@entry=0x0, operate=operate@entry=0x7f8990abc5a0 , args=args@entry=0x7ffe3e7a1e80) at dl-error- skeleton.c:182 #32 0x7f8990abd028 in dl_open_worker (a=a@entry=0x7ffe3e7a2020) at dl-open.c:758 #33 0x7f89909d8260 in __GI__dl_catch_exception (exception=exception@entry=0x7ffe3e7a2000, operate=operate@entry=0x7f8990abcc70 , args=args@entry=0x7ffe3e7a2020) at dl-error-skeleton.c:208 #34 0x7f8990abc8ca in _dl_open (file=0x7ffe3e7a22a0 "libnss_nis.so.2", mode=-2147483646, caller_dlopen=0x7f89909bf3a6 , nsid=-2, argc=5, argv=0x7ffe3e7a2
Bug#983379: linux uml segfault
On 02/03/2021 14:23, Ritesh Raj Sarraf wrote: On Tue, 2021-03-02 at 11:34 +, Anton Ivanov wrote:

If gdb gives you the exact lines, that may be helpful.

It doesn't. But it does show drawbacks in my packaging. The debug symbols packaged are not read/honored by gdb at all.

```
Reading symbols from /usr/bin/linux.uml...
Reading symbols from /usr/lib/debug/.build-id/6f/ea141539149074c72e80fb8004de124fda115b.debug...
(No debugging symbols found in /usr/lib/debug/.build-id/6f/ea141539149074c72e80fb8004de124fda115b.debug)
warning: Can't open file /dev/shm/#20817 (deleted) during file-backed mapping note processing
[New LWP 18788]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `linux ubd0=qemu-linux-image.img'.
Program terminated with signal SIGABRT, Aborted.
#0  0x7f51842c0087 in kill () at ../sysdeps/unix/syscall-template.S:120
120     ../sysdeps/unix/syscall-template.S: No such file or directory.
(gdb) bt
#0  0x7f51842c0087 in kill () at ../sysdeps/unix/syscall-template.S:120
#1  0x6049dc20 in uml_abort ()
#2  0x6049de7a in os_dump_core ()
#3  0x60486e47 in panic_exit ()
#4  0x604c0a03 in notifier_call_chain ()
#5  0x604c0a98 in atomic_notifier_call_chain ()
#6  0x60a26b85 in panic ()
#7  0x604869e1 in segv ()
#8  0x60486ba9 in segv_handler ()
#9  0x6049ccc0 in sig_handler_common ()
#10 0x6049d1ec in sig_handler ()
#11 0x6049cdc6 in hard_handler ()
#12 <signal handler called>
#13 0x604d45b4 in vprintk_store ()
#14 0x604d4aa8 in vprintk_emit ()
#15 0x604d4d86 in vprintk_deferred ()
#16 0x60a27a02 in printk_deferred ()
#17 0x609031b2 in get_random_u32 ()
#18 0x6088ff65 in bucket_table_alloc.isra ()
#19 0x60890740 in rhashtable_init ()
#20 0x607efaa2 in ipc_init_ids ()
#21 0x600153c9 in sem_init ()
```

So the best I can extract for you is to compile the kernel with as much information as possible.

Can you try using one of the older kernels so we can verify if this is indeed a 5.10 thing? I will do a bisect the moment I figure out how to reproduce it.

I will try to do some more experiments on that tomorrow. Thanks, Ritesh

-- Anton R. Ivanov https://www.kot-begemot.co.uk/
Bug#983379: linux uml segfault
On 02/03/2021 09:09, Ritesh Raj Sarraf wrote: On Wed, 2021-02-24 at 11:44 +, Anton Ivanov wrote:

In all cases it boots cleanly and there are no segfaults. So, frankly, no idea what is causing it to crash - I have run most combinations of 5.10 on a 5.10, all work fine here.

Is there any other way I can help you with this issue? I do have the core dump available on my local machine.

If gdb gives you the exact lines, that may be helpful. I have looked through the bt several times; it is something my set-up cruises through. The actual moment you see in the backtrace is this one:

[0.08] random: get_random_u32 called from bucket_table_alloc.isra.0+0x115/0x13d with crng_init=0

However, in your case, instead of getting this printk warning out, it blows up. Why - I don't know.

A.

-- Anton R. Ivanov https://www.kot-begemot.co.uk/
Bug#983379: linux uml segfault
On 23/02/2021 17:26, Ritesh Raj Sarraf wrote: Added the debian bug report in CC. On Tue, 2021-02-23 at 17:19 +, Anton Ivanov wrote: The current Debian user-mode-linux package in unstable is based on the 5.10.5 stable source which includes the mentioned patch, but is still causing an error for some users. After updating the tree to 5.10.5 and applying all Debian patches from the package, I cannot reproduce the bug. I am running it on 5.10, 5.2 and 4.19 hosts with the same parameters without issues. Hosts are all up to date Debian 10.8 and so is the UML userspace. Did you mean 5.10, 5.2 and 4.19 (UML) guests ? We've seen this happen on Debian Testing and Unstable Host (of which the former would soon be the next stable i.e. Debian Bullseye). In our tests, when running the same linux uml binary (5.10) on a Debian Stable Host, it is working fine. I cannot reproduce it on a physical Bullseye host using the Debian user-mode-linux package compiled from source. Environment - Bullseye minimal install and build deps. 6 cores/12 threads Ryzen I cannot reproduce it using the upstream source and the patches from the user-mode-linux package Environment - same as above. I cannot reproduce it using the upstream source + patches and compiling on Buster using the following: 1. Bullseye physical host, minimal install, same hardware 2. Bullseye VM, minimal install, running with 4 vCPUs on the same host 3. Bullseye LXC container running on a Debian Buster host, minimal install, same hardware In all cases it boots cleanly and there are no segfaults. So, frankly, no idea what is causing it to crash - I have run most combinations of 5.10 on a 5.10, all work fine here. -- Anton R. Ivanov https://www.kot-begemot.co.uk/
Bug#983379: linux uml segfault
On 23/02/2021 17:26, Ritesh Raj Sarraf wrote: Added the debian bug report in CC. On Tue, 2021-02-23 at 17:19 +, Anton Ivanov wrote: The current Debian user-mode-linux package in unstable is based on the 5.10.5 stable source which includes the mentioned patch, but is still causing an error for some users. After updating the tree to 5.10.5 and applying all Debian patches from the package, I cannot reproduce the bug. I am running it on 5.10, 5.2 and 4.19 hosts with the same parameters without issues. Hosts are all up to date Debian 10.8 and so is the UML userspace. Did you mean 5.10, 5.2 and 4.19 (UML) guests ? No. Hosts. I have several 6core/12thread Ryzens which are used for development testing. They all use identical userspace with the sole difference being the kernel. They all use a selection of 5.x because 4.19 does not support the hardware properly. The 4.19 testing is done on my old "test farm" which is all A8s and Athlon X760. We've seen this happen on Debian Testing and Unstable Host (of which the former would soon be the next stable i.e. Debian Bullseye). In our tests, when running the same linux uml binary (5.10) on a Debian Stable Host, it is working fine. OK. I will upgrade one of my systems to Debian testing to try to reproduce this. -- Anton R. Ivanov https://www.kot-begemot.co.uk/
Bug#940821: NFS Caching broken in 4.19.37
On 20/02/2021 20:04, Salvatore Bonaccorso wrote: Hi, On Mon, Jul 08, 2019 at 07:19:54PM +0100, Anton Ivanov wrote: Hi list, NFS caching appears broken in 4.19.37. The more cores/threads the easier to reproduce. Tested with identical results on Ryzen 1600 and 1600X. 1. Mount an openwrt build tree over NFS v4 2. Run make -j `cat /proc/cpuinfo | grep vendor | wc -l` ; make clean in a loop 3. Result after 3-4 iterations: State on the client ls -laF /var/autofs/local/src/openwrt/build_dir/target-mips_24kc_musl/linux-ar71xx_tiny/linux-4.14.125/arch/mips/include/generated/uapi/asm total 8 drwxr-xr-x 2 anivanov anivanov 4096 Jul 8 11:40 ./ drwxr-xr-x 3 anivanov anivanov 4096 Jul 8 11:40 ../ State as seen on the server (mounted via nfs from localhost): ls -laF /var/autofs/local/src/openwrt/build_dir/target-mips_24kc_musl/linux-ar71xx_tiny/linux-4.14.125/arch/mips/include/generated/uapi/asm total 12 drwxr-xr-x 2 anivanov anivanov 4096 Jul 8 11:40 ./ drwxr-xr-x 3 anivanov anivanov 4096 Jul 8 11:40 ../ -rw-r--r-- 1 anivanov anivanov 32 Jul 8 11:40 ipcbuf.h Actual state on the filesystem: ls -laF /exports/work/src/openwrt/build_dir/target-mips_24kc_musl/linux-ar71xx_tiny/linux-4.14.125/arch/mips/include/generated/uapi/asm total 12 drwxr-xr-x 2 anivanov anivanov 4096 Jul 8 11:40 ./ drwxr-xr-x 3 anivanov anivanov 4096 Jul 8 11:40 ../ -rw-r--r-- 1 anivanov anivanov 32 Jul 8 11:40 ipcbuf.h So the client has quite clearly lost the plot. Telling it to drop caches and re-reading the directory shows the file present. It is possible to reproduce this using a linux kernel tree too, just takes much more iterations - 10+ at least. Both client and server run 4.19.37 from Debian buster. This is filed as debian bug 931500. I originally thought it to be autofs related, but IMHO it is actually something fundamentally broken in nfs caching resulting in cache corruption. 
According to the reporter downstream in Debian, at https://bugs.debian.org/940821#26 this still seems reproducible with more recent kernels than the one initially reported. Is there anything Anton can provide to try to track down the issue? Anton, can you reproduce with current stable series?

100% reproducible with any kernel from 4.9 to 5.4, stable or backports. It may exist in earlier versions, but I do not have a machine with anything before 4.9 to test at present. It takes anywhere from 1-2 make clean && make cycles to one afternoon, depending on the number of machine cores. The more cores/threads, the faster it triggers. I tried playing with protocol minor versions, caching options, etc - it is still reproducible for any nfs4 settings as long as there is client side caching of metadata.

A.

Regards, Salvatore

-- Anton R. Ivanov Cambridgegreys Limited. Registered in England. Company Number 10273661 https://www.cambridgegreys.com/
Bug#940821: closed by Bastian Blank (No response by submitter)
On 20/02/2021 10:33, Debian Bug Tracking System wrote:

This is an automatic notification regarding your Bug report which was filed against the src:linux package: #940821: linux-image-5.2.0-2-amd64: file cache corruption with nfs4 It has been closed by Bastian Blank. Their explanation is attached below along with your original report. If this explanation is unsatisfactory and you have not received a better one in a separate message then please contact Bastian Blank by replying to this email.

I missed the question. Probably hit the spam bucket for some reason. I am able to reproduce it with more recent versions as well. The most recent one I have around is 5.4.0-0.bpo.2-amd64. Still reproducible 100% - just tested it.

It is trivial to reproduce if anyone actually bothers to do so. Just grab a big enough tree where make runs truly in parallel - openwrt is best, but even the Linux kernel does the job. Mount it via nfs4 from another server (it will work even locally, but takes longer to reproduce - may take a whole afternoon). Run

while make -j 12 clean && make -j 12 ; do true ; done

Leave it to run. On 6 cores/12 threads it takes 2-3 builds of openwrt or ~ 5-8 linux kernel builds to blow up. More cores - faster. Fewer cores - slower.

I sent it to the mailing list too, but nobody could be bothered to even ask any questions.

-- Anton R. Ivanov https://www.kot-begemot.co.uk/
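The reproduction recipe above can be automated so that the loop stops as soon as the client view of a directory goes stale. What follows is only a sketch under stated assumptions: the function names `same_listing` and `repro_loop` are hypothetical, and it assumes that both the NFS-mounted path and the underlying exported path are readable from the machine running the loop (as in the "mounted via nfs from localhost" case).

```sh
#!/bin/sh
# Hypothetical helper: succeed only if both directories list the same entries.
# $1 = directory as seen through the NFS mount, $2 = same directory on the export.
same_listing() {
    client_view=$(ls -A "$1") || return 2
    server_view=$(ls -A "$2") || return 2
    [ "$client_view" = "$server_view" ]
}

# Hypothetical driver: rebuild in a loop until the two views diverge.
# $1 = build tree on the NFS mount, $2/$3 = the two paths to compare,
# $4 = parallel job count (defaults to 12, as in the message above).
repro_loop() {
    tree=$1
    client_dir=$2
    server_dir=$3
    jobs=${4:-12}
    n=0
    while make -C "$tree" -j "$jobs" clean && make -C "$tree" -j "$jobs"; do
        n=$((n + 1))
        if ! same_listing "$client_dir" "$server_dir"; then
            echo "listings diverged after $n iterations"
            return 1
        fi
    done
}
```

Stopping at the first divergence keeps the stale client cache in place for inspection instead of letting the next `make clean` overwrite the evidence.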
Bug#938962: [PATCH] um: Add back support for extra userspace libraries
PCAP and VDE network transports require linking with userspace libraries. The current build system has no means of passing these as arguments. This patch adds a script to expand the library list for linking for these transports as well as any future driver that needs to rely on additional libraries on the userspace side. Signed-off-by: Anton Ivanov --- arch/um/scripts/extra-libs.sh | 10 ++ scripts/link-vmlinux.sh | 4 +++- 2 files changed, 13 insertions(+), 1 deletion(-) create mode 100644 arch/um/scripts/extra-libs.sh diff --git a/arch/um/scripts/extra-libs.sh b/arch/um/scripts/extra-libs.sh new file mode 100644 index ..0592485e0675 --- /dev/null +++ b/arch/um/scripts/extra-libs.sh @@ -0,0 +1,10 @@ +#!/bin/sh + +# This file should be included from link-vmlinux, not executed!!! + +if [ "${CONFIG_UML_NET_VDE}" = "y" ] ; then + UML_EXTRA_LIBS="$UML_EXTRA_LIBS -lvde -lvdeplug" +fi +if [ "${CONFIG_UML_NET_PCAP}" = "y" ] ; then + UML_EXTRA_LIBS="$UML_EXTRA_LIBS -lpcap" +fi diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh index 06495379fcd8..15f9e5096da0 100755 --- a/scripts/link-vmlinux.sh +++ b/scripts/link-vmlinux.sh @@ -90,11 +90,13 @@ vmlinux_link() -Wl,--end-group \ ${@}" + . arch/um/scripts/extra-libs.sh + ${CC} ${CFLAGS_vmlinux} \ -o ${output}\ -Wl,-T,${lds} \ ${objects} \ - -lutil -lrt -lpthread + -lutil -lrt -lpthread ${UML_EXTRA_LIBS} rm -f linux fi } -- 2.20.1
Bug#938962: [PATCH] um: Fix pcap and vde driver builds
On 16/10/2019 08:53, Anton Ivanov wrote: Signed-off-by: Anton Ivanov --- arch/um/drivers/Makefile | 8 scripts/link-vmlinux.sh | 2 +- 2 files changed, 9 insertions(+), 1 deletion(-) diff --git a/arch/um/drivers/Makefile b/arch/um/drivers/Makefile index 693319839f69..34355057ec85 100644 --- a/arch/um/drivers/Makefile +++ b/arch/um/drivers/Makefile @@ -24,6 +24,14 @@ LDFLAGS_vde.o := -r $(shell $(CC) $(CFLAGS) -print-file-name=libvdeplug.a) targets := pcap_kern.o pcap_user.o vde_kern.o vde_user.o +ifeq ($(CONFIG_UML_NET_PCAP),y) + export UML_EXTRA_LIBS += -lpcap +endif +ifeq ($(CONFIG_UML_NET_VDE),y) + export UML_EXTRA_LIBS += -lvde -lvdeplug +endif + + $(obj)/pcap.o: $(obj)/pcap_kern.o $(obj)/pcap_user.o $(LD) -r -dp -o $@ $^ $(ld_flags) diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh index 915775eb2921..d3e6a6cdfc13 100755 --- a/scripts/link-vmlinux.sh +++ b/scripts/link-vmlinux.sh @@ -86,7 +86,7 @@ vmlinux_link() ${CC} ${CFLAGS_vmlinux} -o ${2} \ -Wl,-T,${lds} \ ${objects} \ - -lutil -lrt -lpthread + -lutil -lrt -lpthread ${UML_EXTRA_LIBS} rm -f linux fi } This will not work as advertised unfortunately - I have to write out the libs list somewhere and load it again in the link script instead of passing it as an environment variable. A fixed patch will follow shortly. -- Anton R. Ivanov Cambridgegreys Limited. Registered in England. Company Number 10273661 https://www.cambridgegreys.com/
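The approach described in the last paragraph (write the library list out, then load it again at link time) can be sketched in plain sh. This is an illustration of the general idea only, not the patch itself: the function names `write_extra_libs` and `read_extra_libs` and the location of the generated file are hypothetical. Only the `CONFIG_UML_NET_*` symbols and the library names come from the thread.

```sh
#!/bin/sh
# Hypothetical generator step: derive the extra libraries from the config
# symbols and write them to a file that the link step can read back.
# $1 = path of the file to write.
write_extra_libs() {
    out=$1
    libs=""
    if [ "${CONFIG_UML_NET_VDE:-n}" = "y" ]; then
        libs="$libs -lvde -lvdeplug"
    fi
    if [ "${CONFIG_UML_NET_PCAP:-n}" = "y" ]; then
        libs="$libs -lpcap"
    fi
    # Strip the leading space and store the list as a shell assignment.
    printf 'UML_EXTRA_LIBS="%s"\n' "${libs# }" > "$out"
}

# Hypothetical link-side step: source the file and print the resulting list.
# $1 must contain a slash, otherwise "." searches PATH for the file.
read_extra_libs() {
    UML_EXTRA_LIBS=""
    . "$1"
    printf '%s\n' "$UML_EXTRA_LIBS"
}
```

A file survives the boundary between the make invocation that builds the drivers and the separate shell process that performs the final link, which an `export` inside a sub-make does not.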
Bug#938962: [PATCH] um: Fix pcap and vde driver builds
Signed-off-by: Anton Ivanov --- arch/um/drivers/Makefile | 8 scripts/link-vmlinux.sh | 2 +- 2 files changed, 9 insertions(+), 1 deletion(-) diff --git a/arch/um/drivers/Makefile b/arch/um/drivers/Makefile index 693319839f69..34355057ec85 100644 --- a/arch/um/drivers/Makefile +++ b/arch/um/drivers/Makefile @@ -24,6 +24,14 @@ LDFLAGS_vde.o := -r $(shell $(CC) $(CFLAGS) -print-file-name=libvdeplug.a) targets := pcap_kern.o pcap_user.o vde_kern.o vde_user.o +ifeq ($(CONFIG_UML_NET_PCAP),y) + export UML_EXTRA_LIBS += -lpcap +endif +ifeq ($(CONFIG_UML_NET_VDE),y) + export UML_EXTRA_LIBS += -lvde -lvdeplug +endif + + $(obj)/pcap.o: $(obj)/pcap_kern.o $(obj)/pcap_user.o $(LD) -r -dp -o $@ $^ $(ld_flags) diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh index 915775eb2921..d3e6a6cdfc13 100755 --- a/scripts/link-vmlinux.sh +++ b/scripts/link-vmlinux.sh @@ -86,7 +86,7 @@ vmlinux_link() ${CC} ${CFLAGS_vmlinux} -o ${2} \ -Wl,-T,${lds} \ ${objects} \ - -lutil -lrt -lpthread + -lutil -lrt -lpthread ${UML_EXTRA_LIBS} rm -f linux fi } -- 2.20.1
Bug#938962: Build fix for VDE and PCAP drivers
Hi all, A patch to fix the build for these follows. I will stick to my original suggestion - pcap should be obsoleted in favour of vector raw + BPF firmware load. The latter will work on interfaces where gso/gro is enabled. The original pcap will fail on that due to the 1500 bytes size limit in the legacy net code. I had to dig the root cause here and figure out what is going on while working on an AF_XDP transport as that had the same problem - it needed to pass -lbpf -lelf -lz which could not be passed under the current build system. A.
Bug#938962: [PATCH] um: Loadable BPF "Firmware" for vector drivers
On 01/10/2019 08:50, Johannes Berg wrote: On Mon, 2019-09-30 at 14:19 +0100, Anton Ivanov wrote: All vector drivers now allow a BPF program to be loaded and associated with the RX socket in the host kernel. 1. The program can be loaded as an extra kernel command line option to any of the drivers. 2. The program can also be loaded as "firmware", using the ethtool flash option. It is possible to turn this facility on or off using a command line option. A simplistic wrapper for generating the BPF firmware for the raw socket driver out of a tcpdump/libpcap filter expression can be found at: https://github.com/kot-begemot-uk/uml_vector_utilities/ That's kinda cool. Why just BPF though, not eBPF with all that brings? The filter language for the SOCKOPT is specified as BPF everywhere. I have not looked at what the sockopt does in the host kernel under the hood and if it takes eBPF. Also, the intention is to provide backward compatible wrappers for the existing pcap functionality as per the Debian bug which is cc-ed and that generates/uses basic BPF out of a pcap expression. We can add those to the "uml-utilities" package present in Debian and other distros. I will try to get around and write a wrapper which takes legacy UML network interface arguments and rewrites them as options for the new vector drivers. Or is that because the BPF filter is actually attached to the socket in the host, if I'm reading this correctly? Yes. The idea is to offload it from the guest to the host. I have had this idea as well as some PoC code to do that since like 2012. (e)BPF is an excellent way to represent "firmware" for vNICs, I am surprised it is not in active use :) It should be possible to expand the concept for other stuff like AF_XDP, etc but I need to get around to implement that in the first place. Couple of style nits below: +static bool get_bpf_flash(struct arglist *def) +{ + return uml_vector_fetch_arg(def, "bpfflash") != NULL; +} + + Needs just one blank line? 
@@ -1125,6 +1142,7 @@ static int vector_net_close(struct net_device *dev) netif_stop_queue(dev); del_timer(>tl); + if (vp->fds == NULL) return 0; not needed @@ -1139,6 +1157,8 @@ static int vector_net_close(struct net_device *dev) } tasklet_kill(>tx_poll); if (vp->fds->rx_fd > 0) { + if (vp->bpf) + uml_vector_detach_bpf(vp->fds->rx_fd, vp->bpf); os_close_file(vp->fds->rx_fd); vp->fds->rx_fd = -1; } I guess you moved some code here or something and the blank line was left? +/* + * We cannot use the firmware.c loader API here because this is not a module + * and we do not have a proper device structure to pass to it as required + * by the firmware API + */ You just have to make up a platform device, see e.g. net/wireless/reg.c. IMHO better than open-coding all this. Good idea. @@ -1528,8 +1618,9 @@ static void vector_eth_configure( .in_write_poll = false, .coalesce = 2, .req_size = get_req_size(def), - .in_error = false - }); + .in_error = false, + .bpf= NULL + }); That's not really needed, but I guess you have everything here anyway. +int uml_vector_detach_bpf(int fd, void *bpf) +{ + struct sock_fprog *prog = bpf; + + int err = setsockopt(fd, SOL_SOCKET, SO_DETACH_FILTER, bpf, sizeof(struct sock_fprog)); Spurious blank line, line too long. 
-void *uml_vector_default_bpf(int fd, void *mac) + if (err < 0) + printk(KERN_ERR BPF_DETACH_FAIL, prog->len, prog->filter, fd, -errno); also looks pretty long + return err; +} +void *uml_vector_default_bpf(void *mac) { struct sock_filter *bpf; uint32_t *mac1 = (uint32_t *)(mac + 2); uint16_t *mac2 = (uint16_t *) mac; - struct sock_fprog bpf_prog = { - .len = 6, - .filter = NULL, - }; + struct sock_fprog *bpf_prog; + bpf_prog = uml_kmalloc(sizeof(struct sock_fprog), UM_GFP_KERNEL); + if (bpf_prog != NULL) { generally, kernel coding style prefers to remove " != NULL" (per checkpatch, anyway) + bpf_prog->len = DEFAULT_BPF_LEN; + bpf_prog->filter = NULL; + } else + return NULL; and braces on all branches of if statements johannes Ack - I will look at the other bits, thanks for reviewing it. ___ linux-um mailing list linux...@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-um -- Anton R. Ivanov Cambridgegreys Limited. Registered in England. Company Number 10273661 https://www.cambridgegreys.com/
Bug#938962: [PATCH] um: Loadable BPF "Firmware" for vector drivers
All vector drivers now allow a BPF program to be loaded and associated with the RX socket in the host kernel. 1. The program can be loaded as an extra kernel command line option to any of the drivers. 2. The program can also be loaded as "firmware", using the ethtool flash option. It is possible to turn this facility on or off using a command line option. A simplistic wrapper for generating the BPF firmware for the raw socket driver out of a tcpdump/libpcap filter expression can be found at: https://github.com/kot-begemot-uk/uml_vector_utilities/ Signed-off-by: Anton Ivanov --- arch/um/drivers/vector_kern.c | 109 +++--- arch/um/drivers/vector_kern.h | 8 ++- arch/um/drivers/vector_user.c | 94 +++-- arch/um/drivers/vector_user.h | 8 ++- 4 files changed, 190 insertions(+), 29 deletions(-) diff --git a/arch/um/drivers/vector_kern.c b/arch/um/drivers/vector_kern.c index af27d5c41776..7453b99ac1d2 100644 --- a/arch/um/drivers/vector_kern.c +++ b/arch/um/drivers/vector_kern.c @@ -1,5 +1,5 @@ /* - * Copyright (C) 2017 - Cambridge Greys Limited + * Copyright (C) 2017 - 2019 Cambridge Greys Limited * Copyright (C) 2011 - 2014 Cisco Systems Inc * Copyright (C) 2001 - 2007 Jeff Dike (jdike@{addtoit,linux.intel}.com) * Copyright (C) 2001 Lennert Buytenhek (buyt...@gnu.org) and @@ -21,6 +21,9 @@ #include #include #include +#include +#include +#include #include #include #include @@ -128,6 +131,17 @@ static int get_mtu(struct arglist *def) return ETH_MAX_PACKET; } +static char *get_bpf_file(struct arglist *def) +{ + return uml_vector_fetch_arg(def, "bpffile"); +} + +static bool get_bpf_flash(struct arglist *def) +{ + return uml_vector_fetch_arg(def, "bpfflash") != NULL; +} + + static int get_depth(struct arglist *def) { char *mtu = uml_vector_fetch_arg(def, "depth"); @@ -176,6 +190,7 @@ static int get_transport_options(struct arglist *def) int vec_rx = VECTOR_RX; int vec_tx = VECTOR_TX; long parsed; + int result = 0; if (vector != NULL) { if (kstrtoul(vector, 10, ) == 0) { @@ 
-186,14 +201,16 @@ static int get_transport_options(struct arglist *def) } } + if (get_bpf_flash(def)) + result = VECTOR_BPF_FLASH; if (strncmp(transport, TRANS_TAP, TRANS_TAP_LEN) == 0) - return 0; + return result; if (strncmp(transport, TRANS_HYBRID, TRANS_HYBRID_LEN) == 0) - return (vec_rx | VECTOR_BPF); + return (result | vec_rx | VECTOR_BPF); if (strncmp(transport, TRANS_RAW, TRANS_RAW_LEN) == 0) - return (vec_rx | vec_tx | VECTOR_QDISC_BYPASS); - return (vec_rx | vec_tx); + return (result | vec_rx | vec_tx | VECTOR_QDISC_BYPASS); + return (result | vec_rx | vec_tx); } @@ -1125,6 +1142,7 @@ static int vector_net_close(struct net_device *dev) netif_stop_queue(dev); del_timer(>tl); + if (vp->fds == NULL) return 0; @@ -1139,6 +1157,8 @@ static int vector_net_close(struct net_device *dev) } tasklet_kill(>tx_poll); if (vp->fds->rx_fd > 0) { + if (vp->bpf) + uml_vector_detach_bpf(vp->fds->rx_fd, vp->bpf); os_close_file(vp->fds->rx_fd); vp->fds->rx_fd = -1; } @@ -1146,7 +1166,10 @@ static int vector_net_close(struct net_device *dev) os_close_file(vp->fds->tx_fd); vp->fds->tx_fd = -1; } + if (vp->bpf != NULL) + kfree(vp->bpf->filter); kfree(vp->bpf); + vp->bpf = NULL; kfree(vp->fds->remote_addr); kfree(vp->transport_data); kfree(vp->header_rxbuffer); @@ -1196,6 +1219,8 @@ static int vector_net_open(struct net_device *dev) vp->opened = true; spin_unlock_irqrestore(>lock, flags); + vp->bpf = uml_vector_user_bpf(get_bpf_file(vp->parsed)); + vp->fds = uml_vector_user_open(vp->unit, vp->parsed); if (vp->fds == NULL) @@ -1267,8 +1292,11 @@ static int vector_net_open(struct net_device *dev) if (!uml_raw_enable_qdisc_bypass(vp->fds->rx_fd)) vp->options |= VECTOR_BPF; } - if ((vp->options & VECTOR_BPF) != 0) - vp->bpf = uml_vector_default_bpf(vp->fds->rx_fd, dev->dev_addr); + if (((vp->options & VECTOR_BPF) != 0) && (vp->bpf == NULL)) + vp->bpf = uml_vector_default_bpf(dev->dev_addr); + + if (vp->bpf != NULL) + uml_vector_attach_bpf(vp->fds->rx_fd, vp->bpf); 
netif_start_queue(dev); @@ -1347,6 +1375,67 @@ static v
Bug#940821: linux-image-5.2.0-2-amd64: file cache corruption with nfs4
Package: src:linux Version: 5.2.9-2 Severity: critical Justification: breaks unrelated software Dear Maintainer, NFSv4 caching is completely broken on SMP. How to reproduce: Option 1. clone openwrt, run while make clean && make -j `nproc` ; do true ; done It will break depending on number of CPUs within several runs. Symptoms of breakage. A directory on the client looks empty. Example (mnt is an NFSv4 mount): ls -laF /mnt/src/openwrt/build_dir/target-mips_24kc_musl/linux-ar71xx_tiny/linux-4.14.125/arch/mips/include/generated/uapi/asm total 8 drwxr-xr-x 2 anivanov anivanov 4096 Sep 20 10:51 ./ drwxr-xr-x 3 anivanov anivanov 4096 Sep 20 10:51 ../ While it actually has a file in it (same on server): ls -laF /exports/work/src/openwrt/build_dir/target-mips_24kc_musl/linux-ar71xx_tiny/linux-4.14.125/arch/mips/include/generated/uapi/asm total 12 drwxr-xr-x 2 anivanov anivanov 4096 Sep 20 10:51 ./ drwxr-xr-x 3 anivanov anivanov 4096 Sep 20 10:51 ../ -rw-r--r-- 1 anivanov anivanov 32 Sep 20 10:51 ipcbuf.h This cache entry on the client does not expire as it should per the NFSv4 caching documentation - the only way of dealing with it is reboot, unmount or caches drop. Option 2. Have your $HOME on nfsv4 and use thunderbird. Move mails between folders. Sooner or later (usually sooner) you will lose an email. So this is both "breaks unrelated software" and "data loss" depending on what you are doing. Tested on: AMD Ryzen 5 2400G, AMD Ryzen 5 1600X, AMD Ryzen 5 1600, AMD A8-6500 Shows up on all. Fastest on the 6 core 12 thread ryzens, slowest on the AMD A8 (takes up to 3 iterations of make there). Brgds, A. 
-- Package-specific info: ** Version: Linux version 5.2.0-2-amd64 (debian-ker...@lists.debian.org) (gcc version 8.3.0 (Debian 8.3.0-21)) #1 SMP Debian 5.2.9-2 (2019-08-21) ** Command line: BOOT_IMAGE=/boot/vmlinuz-5.2.0-2-amd64 root=UUID=8eb17efb-6574-42d0-885e-487b98364059 ro mitigations=off noht quiet ** Not tainted ** Kernel log: [3.684402] input: HD-Audio Generic Front Mic as /devices/pci:00/:00:08.1/:09:00.3/sound/card0/input8 [3.684490] input: HD-Audio Generic Rear Mic as /devices/pci:00/:00:08.1/:09:00.3/sound/card0/input9 [3.684555] input: HD-Audio Generic Line as /devices/pci:00/:00:08.1/:09:00.3/sound/card0/input10 [3.685553] input: HD-Audio Generic Line Out as /devices/pci:00/:00:08.1/:09:00.3/sound/card0/input11 [3.685627] input: HD-Audio Generic Front Headphone as /devices/pci:00/:00:08.1/:09:00.3/sound/card0/input12 [3.806626] kvm: Nested Virtualization enabled [3.806636] kvm: Nested Paging enabled [3.806637] SVM: Virtual VMLOAD VMSAVE supported [3.806637] SVM: Virtual GIF supported [3.820371] MCE: In-kernel MCE decoding enabled. [3.824533] EDAC amd64: Node 0: DRAM ECC disabled. [3.824536] EDAC amd64: ECC disabled in the BIOS or no ECC capability, module will not load. Either enable ECC checking or force module loading by setting 'ecc_enable_override'. (Note that use of the override may cause unknown side effects.) [3.872569] pktcdvd: pktcdvd0: writer mapped to sr0 [3.900858] EDAC amd64: Node 0: DRAM ECC disabled. [3.900860] EDAC amd64: ECC disabled in the BIOS or no ECC capability, module will not load. Either enable ECC checking or force module loading by setting 'ecc_enable_override'. (Note that use of the override may cause unknown side effects.) [3.948661] EDAC amd64: Node 0: DRAM ECC disabled. [3.948662] EDAC amd64: ECC disabled in the BIOS or no ECC capability, module will not load. Either enable ECC checking or force module loading by setting 'ecc_enable_override'. (Note that use of the override may cause unknown side effects.) 
[3.996651] EDAC amd64: Node 0: DRAM ECC disabled. [3.996652] EDAC amd64: ECC disabled in the BIOS or no ECC capability, module will not load. Either enable ECC checking or force module loading by setting 'ecc_enable_override'. (Note that use of the override may cause unknown side effects.) [4.002382] audit: type=1400 audit(1568973482.655:2): apparmor="STATUS" operation="profile_load" profile="unconfined" name="libreoffice-xpdfimport" pid=706 comm="apparmor_parser" [4.002712] audit: type=1400 audit(1568973482.655:3): apparmor="STATUS" operation="profile_load" profile="unconfined" name="libreoffice-senddoc" pid=701 comm="apparmor_parser" [4.005254] audit: type=1400 audit(1568973482.659:4): apparmor="STATUS" operation="profile_load" profile="unconfined" name="libreoffice-oopslash" pid=699 comm="apparmor_parser" [4.007555] audit: type=1400 audit(1568973482.659:5): apparmor="STATUS" operation="profile_load" profile="unconfined" name="nvidia_modprobe" pid=702 comm="apparmor_parser" [4.007558] audit: type=1400 audit(1568973482.659:6):
Bug#938962: user-mode-linux needs update for new linux
On 12/09/2019 15:42, Anton Ivanov wrote: On 12/09/2019 13:14, Ritesh Raj Sarraf wrote: Hi, I am not sure if this has been reported upstream but with libpcap 1.9, user mode linux fails to build. The build failure happens with both, 5.2 and 4.19 LTS kernels. A much detailed report is available at: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=938962 libpcap 1.9 introduces `pcap_open` which is also declared in linux headers in arch/um/drivers/pcap_user.c I think the best way forward here is to kill the old libpcap driver altogether. You get the same functionality from vector raw including the ability to load a bpf filter. The only thing that needs is a wrapper to compile the filter before handing it to UML. A side effect is that it is ~ 10+ time faster - in the multigigabit range. Alternatively, I can wrap it so it looks like pcap to any existing scripts and is actually vector underneath, but that will lose some of the tunables, like offloads, vector depth, etc. Thanks, Ritesh On Sat, 2019-09-07 at 17:18 +0200, Romain Francoise wrote: Hi, On Tue, Sep 3, 2019 at 3:21 PM Ritesh Raj Sarraf wrote: [...] In file included from /usr/include/pcap.h:43, from arch/um/drivers/pcap_user.c:7: /usr/include/pcap/pcap.h:835:18: note: previous declaration of ‘pcap_open’ was here PCAP_API pcap_t *pcap_open(const char *source, int snaplen, int flags, ^ make[2]: *** [scripts/Makefile.build:309: arch/um/drivers/pcap_user.o] Error 1 libpcap 1.9 includes support for remote capture, which was originally a part of WinPcap extensions. The `pcap_open()' symbol is part of that API and that's why it's defined in the header file even though remote support is not enabled in Debian. I suggest you rename the function defined in your program so that it doesn't conflict with libpcap. 
I am going to try to write a wrapper to form arguments for the current vector raw driver and see if there is anything that needs to be fixed in it. I will post it as a proposed patch against the Debian package once it's ready. Brgds, A -- Anton R. Ivanov https://www.kot-begemot.co.uk/
Bug#938962: user-mode-linux needs update for new linux
On 12/09/2019 13:14, Ritesh Raj Sarraf wrote: Hi, I am not sure if this has been reported upstream but with libpcap 1.9, user mode linux fails to build. The build failure happens with both, 5.2 and 4.19 LTS kernels. A much detailed report is available at: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=938962 libpcap 1.9 introduces `pcap_open` which is also declared in linux headers in arch/um/drivers/pcap_user.c I think the best way forward here is to kill the old libpcap driver altogether. You get the same functionality from vector raw including the ability to load a bpf filter. The only thing that needs is a wrapper to compile the filter before handing it to UML. A side effect is that it is ~ 10+ time faster - in the multigigabit range. Alternatively, I can wrap it so it looks like pcap to any existing scripts and is actually vector underneath, but that will lose some of the tunables, like offloads, vector depth, etc. Thanks, Ritesh On Sat, 2019-09-07 at 17:18 +0200, Romain Francoise wrote: Hi, On Tue, Sep 3, 2019 at 3:21 PM Ritesh Raj Sarraf wrote: [...] In file included from /usr/include/pcap.h:43, from arch/um/drivers/pcap_user.c:7: /usr/include/pcap/pcap.h:835:18: note: previous declaration of ‘pcap_open’ was here PCAP_API pcap_t *pcap_open(const char *source, int snaplen, int flags, ^ make[2]: *** [scripts/Makefile.build:309: arch/um/drivers/pcap_user.o] Error 1 libpcap 1.9 includes support for remote capture, which was originally a part of WinPcap extensions. The `pcap_open()' symbol is part of that API and that's why it's defined in the header file even though remote support is not enabled in Debian. I suggest you rename the function defined in your program so that it doesn't conflict with libpcap. ___ linux-um mailing list linux...@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-um -- Anton R. Ivanov https://www.kot-begemot.co.uk/
Bug#939877: closed by Josue Ortega (Bug#939877: fixed in rpcbind 1.2.5-7)
I missed the actual receive line in 1.2.5-7, apologies.

On its own, though, it DOES NOT fix the problem. There is breakage in libwrap to accompany it.

Once the fix in 1.2.5-7 is in, rpcbind starts receiving messages (according to strace), and each receive is followed by interrogation of addresses and interfaces via netlink. As I do not see any netlink references anywhere in rpcbind or libtirpc-dev, I believe this is libwrap, which now has a broken broadcast check.

So anything compiled with libwrap which needs to receive broadcasts needs to be set as ALL:ALL in hosts.allow, otherwise the traffic is dropped.

Upgrading to 1.2.5-7 _AND_ setting hosts.allow to ALL:ALL provides a viable workaround. The remaining part of this bug is libwrap; you can refile it against that.

Best Regards,

--
Anton R. Ivanov https://www.kot-begemot.co.uk/
Bug#939877: closed by Josue Ortega (Bug#939877: fixed in rpcbind 1.2.5-7)
That's not it. Same story with 1.2.5-7 from unstable.

This is what the NIS server sees after a NIS restart on the client:

root@jain:# tcpdump -nvvv -i enp7s0f1.502 udp and port 111
tcpdump: listening on enp7s0f1.502, link-type EN10MB (Ethernet), capture size 262144 bytes
    192.168.20.41.36268 > 192.168.20.63.111: [udp sum ok] UDP, length 92
09:02:57.820457 IP (tos 0x0, ttl 64, id 55627, offset 0, flags [DF], proto UDP (17), length 120)
    192.168.20.41.36268 > 192.168.20.63.111: [udp sum ok] UDP, length 92
09:03:03.826888 IP (tos 0x0, ttl 64, id 55969, offset 0, flags [DF], proto UDP (17), length 120)

And so on: the RPC request is retransmitted to the broadcast address (.63 on this subnet, which is a /26). Traffic flows only one way; strace on rpcbind shows only netlink messages and no UDP recv.

Same setup after setting a NIS server address on the client and restarting NIS gives an immediate response:

tcpdump -nvvv -i enp7s0f1.502 udp and port 111
    192.168.20.41.800 > 192.168.3.3.111: [udp sum ok] UDP, length 56
09:05:00.429940 IP (tos 0x0, ttl 64, id 22755, offset 0, flags [DF], proto UDP (17), length 56)
    192.168.3.3.111 > 192.168.20.41.800: [bad udp cksum 0x98b2 -> 0x1245!] UDP,

strace of the rpcbind process:

sendmsg(6, {msg_name={sa_family=AF_INET, sin_port=htons(800), sin_addr=inet_addr("192.168.20.41")}, msg_namelen=16, msg_iov=[{iov_base=".{\272q\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\2\265", iov_len=28}], msg_iovlen=1, msg_control=[{cmsg_len=28, cmsg_level=SOL_IP, cmsg_type=IP_PKTINFO, cmsg_data={ipi_ifindex=0, ipi_spec_dst=inet_addr("192.168.3.3"), ipi_addr=inet_addr("192.168.3.3")}}], msg_controllen=32, msg_flags=0}, 0) = 28

That strace line never occurs in the broadcast case. rpcbind simply is not listening to broadcast queries. I will try to wade through the source to see exactly how it manages that, because listening on INADDR_ANY should in theory get you broadcasts.

On 09/09/2019 22:00, Debian Bug Tracking System wrote: This is an automatic notification regarding your Bug report which was filed against the rpcbind package: #939877: rpcbind: Does not receive any broadcast queries resulting in complete breakage of NIS It has been closed by Josue Ortega . Their explanation is attached below along with your original report. If this explanation is unsatisfactory and you have not received a better one in a separate message then please contact Josue Ortega by replying to this email. -- Anton R. Ivanov https://www.kot-begemot.co.uk/
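The subnet arithmetic in the capture above (broadcast address .63 on a /26) can be checked with a few lines of Python. This is only an illustrative sketch: the helper name `is_subnet_broadcast` is made up for this example and is not part of rpcbind, libtirpc or libwrap.

```python
import ipaddress


def is_subnet_broadcast(destination: str, interface: str) -> bool:
    """Return True if `destination` is the directed broadcast address
    of the subnet that `interface` (address/prefix) belongs to."""
    iface = ipaddress.ip_interface(interface)
    return ipaddress.ip_address(destination) == iface.network.broadcast_address


# Client address from the capture, with the /26 prefix stated in the message.
client = ipaddress.ip_interface("192.168.20.41/26")
print(client.network)                    # 192.168.20.0/26
print(client.network.broadcast_address)  # 192.168.20.63
```

With these values, the first capture is addressed to the subnet broadcast and the second (to 192.168.3.3) is plain unicast, which matches the "broadcast fails, unicast works" description.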
Bug#939877: rpcbind: Does not receive any broadcast queries resulting in complete breakage of NIS
Package: rpcbind Version: 1.2.5-0.3 Severity: grave Justification: renders package unusable Dear Maintainer, After an upgrade to buster rpcbind no longer receives any broadcast queries. Unicast works. This is verified via strace - it has occasional netlink messages, but any of the broadcast traffic to port 111 never hit it. As a result clients can no longer find a nis server which has been upgraded to buster. -- System Information: Debian Release: 10.1 APT prefers stable-updates APT policy: (500, 'stable-updates'), (500, 'stable') Architecture: amd64 (x86_64) Kernel: Linux 4.19.0-6-amd64 (SMP w/8 CPU cores) Kernel taint flags: TAINT_OOT_MODULE, TAINT_UNSIGNED_MODULE Locale: LANG=en_GB.UTF-8, LC_CTYPE=en_GB.UTF-8 (charmap=UTF-8), LANGUAGE=en_GB:en (charmap=UTF-8) Shell: /bin/sh linked to /bin/dash Init: systemd (via /run/systemd/system) LSM: AppArmor: enabled Versions of packages rpcbind depends on: ii adduser 3.118 ii libc62.28-10 ii libsystemd0 241-7~deb10u1 ii libtirpc31.1.4-0.4 ii libwrap0 7.6.q-28 ii lsb-base 10.2019051400 rpcbind recommends no packages. rpcbind suggests no packages. -- no debconf information
Bug#926305: closed by Elimar Riesebieter (Re: Bug#926305: nis startup scripts are completely broken)
Please reopen. Advice is no replacement for a Depends in the package control file. As shipped the package is still broken and at the reported severity - breaking most of the system A. On 18/04/2019 14:48, Debian Bug Tracking System wrote: This is an automatic notification regarding your Bug report which was filed against the nis package: #926305: nis startup scripts are completely broken It has been closed by Elimar Riesebieter . Their explanation is attached below along with your original report. If this explanation is unsatisfactory and you have not received a better one in a separate message then please contact Elimar Riesebieter by replying to this email. -- Anton R. Ivanov https://www.kot-begemot.co.uk/
Bug#926305: nis startup scripts are completely broken
That is not an advice. If nscd is a required dependency, NIS should bring it in. Presently it is not. Still broken A. On 18/04/2019 14:43, Elimar Riesebieter wrote: * Elimar Riesebieter [2019-04-03 11:06 +0200]: * Anton Ivanov [2019-04-03 09:43 +0100]: Package: nis Version: 3.17.1-3+b1 Severity: critical Justification: breaks unrelated software Dear Maintainer, Startup scripts are completely broken. Something in the systemd conversion/autogeneration. The ypbind binary is never started, the script goes into "backgrounded" and fails. From there on the system is unusable - you cannot log in, UIDs and groups do not resolve, etc. The same system operated correctly before buster upgrade and will operate correctly if ypbind is invoked from the command line. This looks like a pure systemd conversion issue of some sort. At my systems installing nscd helped. As well setting "YPBINDARGS=" in /etc/default/nis must be. This bug should be closed as there is no response from the reporter. As well it seems to be fixed following the advices given above, though. Elimar -- Anton R. Ivanov https://www.kot-begemot.co.uk/
Bug#926305: nis startup scripts are completely broken
Package: nis Version: 3.17.1-3+b1 Severity: critical Justification: breaks unrelated software Dear Maintainer, Startup scripts are completely broken. Something in the systemd conversion/autogeneration. The ypbind binary is never started, the script goes into "backgrounded" and fails. From there on the system is unusable - you cannot log in, UIDs and groups do not resolve, etc. The same system operated correctly before buster upgrade and will operate correctly if ypbind is invoked from the command line. This looks like a pure systemd conversion issue of some sort. -- Package-specific info: NIS domain: home -- System Information: Debian Release: buster/sid APT prefers testing APT policy: (500, 'testing') Architecture: amd64 (x86_64) Kernel: Linux 4.19.0-4-amd64 (SMP w/2 CPU cores) Locale: LANG=en_GB.UTF-8, LC_CTYPE=en_GB.UTF-8 (charmap=UTF-8), LANGUAGE=en_GB:en (charmap=UTF-8) Shell: /bin/sh linked to /bin/dash Init: systemd (via /run/systemd/system) LSM: AppArmor: enabled Versions of packages nis depends on: ii debconf [debconf-2.0] 1.5.71 ii hostname 3.21 ii libc6 2.28-8 ii libgdbm6 1.18.1-4 ii libsystemd0241-1 ii lsb-base 10.2019031300 ii make 4.2.1-1.2 ii netbase5.6 ii rpcbind [portmap] 1.2.5-0.3 nis recommends no packages. Versions of packages nis suggests: pn nscd -- Configuration Files: /etc/yp.conf changed [not included] -- debconf information: * nis/domain: home
Bug#878046: amanda-server: Fails all backups if one or more hosts are down
I am OK to wait for the upload On 22 October 2017 13:26:56 EEST, Jose M Calhariz <j...@calhariz.com> wrote: >That is an old problem of amanda that is solved on v3.5. But the error >messages are usually different from what you see. > >I have been working on a new package that I should upload very shortly, >to sid and backports. If you are dead on water I >can provide my working in progress packages for stretch on amd64. > >Kind regards >Jose M Calhariz > >On 09/10/17 06:55, Anton Ivanov wrote: >> Package: amanda-server >> Version: 1:3.3.9-5 >> Severity: grave >> Justification: renders package unusable >> >> Dear Maintainer, >> >> If one or more backup host is unreachable, the backup of all hosts >fails. >> >> Example - backing up two hosts - smaug and TerriblTerror: >> >> If the latter is unreachable >> >> TerribleTerror1 /etc lev 0 FAILED [Request to TerribleTerror1 >failed: Connection timed out] >> >> The former (and all other hosts in the backup sequence) fail with: >> >> smaug /exports/md0/home/aivanov lev 0 FAILED [Request to smaug >failed: error sending REQ: write error to: Broken pipe] >> >> -- System Information: >> Debian Release: 9.0 >> APT prefers stable >> APT policy: (500, 'stable') >> Architecture: amd64 (x86_64) >> >> Kernel: Linux 4.9.0-3-amd64 (SMP w/4 CPU cores) >> Locale: LANG=en_GB.utf8, LC_CTYPE=en_GB.utf8 (charmap=UTF-8), >LANGUAGE=en_GB:en (charmap=UTF-8) >> Shell: /bin/sh linked to /bin/dash >> Init: systemd (via /run/systemd/system) >> >> Versions of packages amanda-server depends on: >> ii amanda-common 1:3.3.9-5 >> ii bsd-mailx [mailx] 8.1.2-0.20160123cvs-4 >> ii libc6 2.24-11+deb9u1 >> ii libcurl3 7.52.1-5 >> ii libglib2.0-0 2.50.3-2 >> ii libssl1.1 1.1.0f-3 >> ii perl 5.24.1-3 >> >> amanda-server recommends no packages. 
>> >> Versions of packages amanda-server suggests: >> ii amanda-client 1:3.3.9-5 >> ii cpio 2.11+dfsg-6 >> ii gnuplot5.0.5+dfsg1-6 >> ii mt-st 1.3-1 >> >> -- no debconf information -- Sent from my Android device with K-9 Mail. Please excuse my brevity.
Bug#878046: amanda-server: Fails all backups if one or more hosts are down
Package: amanda-server
Version: 1:3.3.9-5
Severity: grave
Justification: renders package unusable

Dear Maintainer,

If one or more backup hosts are unreachable, the backup of all hosts fails.

Example - backing up two hosts, smaug and TerribleTerror1:

If the latter is unreachable, it fails with:

TerribleTerror1 /etc lev 0 FAILED [Request to TerribleTerror1 failed: Connection timed out]

The former (and all other hosts in the backup sequence) fail with:

smaug /exports/md0/home/aivanov lev 0 FAILED [Request to smaug failed: error sending REQ: write error to: Broken pipe]

-- System Information:
Debian Release: 9.0
APT prefers stable
APT policy: (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 4.9.0-3-amd64 (SMP w/4 CPU cores)
Locale: LANG=en_GB.utf8, LC_CTYPE=en_GB.utf8 (charmap=UTF-8), LANGUAGE=en_GB:en (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)

Versions of packages amanda-server depends on:
ii  amanda-common      1:3.3.9-5
ii  bsd-mailx [mailx]  8.1.2-0.20160123cvs-4
ii  libc6              2.24-11+deb9u1
ii  libcurl3           7.52.1-5
ii  libglib2.0-0       2.50.3-2
ii  libssl1.1          1.1.0f-3
ii  perl               5.24.1-3

amanda-server recommends no packages.

Versions of packages amanda-server suggests:
ii  amanda-client  1:3.3.9-5
ii  cpio           2.11+dfsg-6
ii  gnuplot        5.0.5+dfsg1-6
ii  mt-st          1.3-1

-- no debconf information
Bug#844584: dhclient should perform additional validity checks
Package: isc-dhcp-client
Version: 4.3.1-6+deb8u2
Severity: serious
File: /sbin/dhclient
Tags: security

https://samy.pl/poisontap/

This is a variation on an ancient "gem" by a DSL modem vendor, where the router pretends to be the entire internet by spoofing ARP so that it captures all traffic.

The best way to deal with this is to set an upper limit on the size of the network a lease may cover (that is, a minimum acceptable prefix length) in /etc/default/isc-dhcp-client and verify it in a hook (which can be Debian-specific).

This way a DHCP reply of 0.0.0.0/0, or of any network larger than a class A, will raise a security alert instead of blindly exposing the machine to a spoofing attack.

-- System Information:
Debian Release: 8.6
APT prefers stable-updates
APT policy: (500, 'stable-updates'), (500, 'stable')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 3.16.0-4-amd64 (SMP w/4 CPU cores)
Locale: LANG=en_GB.utf8, LC_CTYPE=en_GB.utf8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)

Versions of packages isc-dhcp-client depends on:
ii  debianutils       4.4+b1
ii  iproute2          3.16.0-2
ii  isc-dhcp-common   4.3.1-6+deb8u2
ii  libc6             2.19-18+deb8u6
ii  libdns-export100  1:9.9.5.dfsg-9+deb8u7
ii  libirs-export91   1:9.9.5.dfsg-9+deb8u7
ii  libisc-export95   1:9.9.5.dfsg-9+deb8u7

isc-dhcp-client recommends no packages.

Versions of packages isc-dhcp-client suggests:
pn  avahi-autoipd
pn  resolvconf

-- no debconf information
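The check proposed in the report above could look roughly like the following. This is a minimal sketch in Python rather than the shell a real dhclient hook would use, and everything in it is an assumption made for illustration: the name `lease_is_acceptable`, the `MIN_PREFIX_LEN` setting and its default of 8 (a class A) do not exist in isc-dhcp-client.

```python
import ipaddress

# Hypothetical policy setting; the report suggests keeping it in
# /etc/default/isc-dhcp-client. A prefix length of 8 is a class A.
MIN_PREFIX_LEN = 8


def lease_is_acceptable(ip: str, netmask: str,
                        min_prefix_len: int = MIN_PREFIX_LEN) -> bool:
    """Return True if the offered subnet is no larger than the policy allows."""
    try:
        # strict=False because `ip` is a host address, not a network address.
        network = ipaddress.IPv4Network(f"{ip}/{netmask}", strict=False)
    except ValueError:
        # Malformed address or non-contiguous netmask: refuse the lease.
        return False
    # A shorter prefix means a larger network.
    return network.prefixlen >= min_prefix_len
```

A hook built on this idea would refuse to configure the interface (and log an alert) when the function returns False, for example for an offer of 0.0.0.0/0.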
Bug#798178: warzone2100: Major regressions compared to squeeze
Package: warzone2100
Version: 3.1.1-1
Severity: grave
Justification: renders package unusable

Dear Maintainer,

The new version is unplayable.

1. Units produced during a remote mission are not delivered to the delivery area of the factory producing them. Instead they are locked in a rock somewhere off-map, which renders them, and most preparation strategies, unusable. This is an obvious bug and it worked correctly in the squeeze version.

2. The "nudge your neighbour" part of the unit obstacle avoidance algorithm does not work. The result is unit deadlock, because units that need to give way for you to get past them just sit and wait unless moved manually. This can be resolved only by picking every unit separately and moving it by hand. Units definitely cannot be moved in formation any more. This renders commanders, sensors, etc. mostly unusable. You can no longer retreat a group under command because the "subordinates" will not move out of the way for the commander to pass. They will also not move out of the way for damaged units going for repair. Again, this worked in squeeze.

Frankly, can we have the squeeze version recompiled and released as an "update"? This "improvement" is unplayable. I am definitely recompiling it locally from squeeze sources, as the main users (the kids) are revolting because this version is unusable.

-- System Information: Debian Release: 8.1 APT prefers stable APT policy: (500, 'stable') Architecture: amd64 (x86_64) Kernel: Linux 3.16.0-4-amd64 (SMP w/2 CPU cores) Locale: LANG=en_GB.utf8, LC_CTYPE=en_GB.utf8 (charmap=UTF-8) Shell: /bin/sh linked to /bin/dash Init: systemd (via /run/systemd/system) Versions of packages warzone2100 depends on: ii libc6 2.19-18 ii libfontconfig12.11.0-6.3 ii libfreetype6 2.5.2-3 ii libfribidi0 0.19.6-3 ii libgcc1 1:4.9.2-10 ii libgl1-mesa-glx [libgl1] 10.3.2-1 ii libglc0 0.7.2-5+b2 ii libglew1.10 1.10.0-3 ii libglu1-mesa [libglu1]9.0.0-2 ii libminiupnpc101.9.20140610-2 ii libogg0 1.3.2-1 ii libopenal11:1.15.1-5 ii libphysfs12.0.3-2 ii libpng12-01.2.50-2+b2 ii libqt4-network4:4.8.6+git64-g5dc8b2b+dfsg-3+deb8u1 ii libqt4-script 4:4.8.6+git64-g5dc8b2b+dfsg-3+deb8u1 ii libqtcore44:4.8.6+git64-g5dc8b2b+dfsg-3+deb8u1 ii libsdl1.2debian 1.2.15-10+b1 ii libstdc++64.9.2-10 ii libtheora01.1.1+dfsg.1-6 ii libvorbis0a 1.3.4-2 ii libvorbisfile31.3.4-2 ii libx11-6 2:1.6.2-3 ii libxrandr22:1.4.2-1+b1 ii warzone2100-data 3.1.1-1 ii zlib1g1:1.2.8.dfsg-2+b1 Versions of packages warzone2100 recommends: ii warzone2100-music 3.1.1-1 warzone2100 suggests no packages. -- no debconf information
Bug#741075: user-mode-linux: Occasional memory corruption on startup under high load
On 09/03/14 21:35, Mattia Dongili wrote: On Sat, Mar 08, 2014 at 07:04:56AM +, Anton Ivanov wrote: Package: user-mode-linux Version: 3.2-2um-1+deb7u2+b1 Severity: grave Tags: patch Justification: causes non-serious data loss Dear Maintainer, This bug is perennial. If we go through old bugs with cannot reproduce tag 50% of them are this one, the other 50% are the you should not use pipe for interprocess IPC which we will submit shortly. Manifestation of the problem - UML dies on startup for no reason with a memory corruption message. Occurs only on heavily loaded systems and usually when running a lot of UMLs. Thanks for the patch. I have noticed that you submitted these patch-set (together with the other two you sent here and more) upstream and they will be in the stable branch. The easiest path here is also to go through the stable release of linux-source where uml is built from. I'll keep an eye on the stable tree but it'd be very helpful if you could add the stable tree commit ids once the patches get included. Same story for the other two bugs. All 3 bugs have now patches submitted upstream. I have submitted our other improvements as well. While they do not make a speed daemon of uml userspace they get it reasonably close to kvm. Kernel itself is now faster than qemu-kvm for most networking stuff. A. Thanks! -- To UNSUBSCRIBE, email to debian-bugs-rc-requ...@lists.debian.org with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
Bug#741075: user-mode-linux: Occasional memory corruption on startup under high load
On 09/03/14 21:35, Mattia Dongili wrote: On Sat, Mar 08, 2014 at 07:04:56AM +, Anton Ivanov wrote: Package: user-mode-linux Version: 3.2-2um-1+deb7u2+b1 Severity: grave Tags: patch Justification: causes non-serious data loss Dear Maintainer, This bug is perennial. If we go through old bugs with cannot reproduce tag 50% of them are this one, the other 50% are the you should not use pipe for interprocess IPC which we will submit shortly. Manifestation of the problem - UML dies on startup for no reason with a memory corruption message. Occurs only on heavily loaded systems and usually when running a lot of UMLs. Thanks for the patch. I have noticed that you submitted these patch-set (together with the other two you sent here and more) upstream and they will be in the stable branch. The easiest path here is also to go through the stable release of linux-source where uml is built from. I'll keep an eye on the stable tree but it'd be very helpful if you could add the stable tree commit ids once the patches get included. Same story for the other two bugs. Thanks! You are welcome. I will update them once Richard Weinberger gets around to merge them (hopefully soon). -- If you think it's expensive to hire a professional to do the job, wait until you hire an amateur. Paul Neal Red Adair A. R. Ivanov E-mail: anton.iva...@kot-begemot.co.uk -- To UNSUBSCRIBE, email to debian-bugs-rc-requ...@lists.debian.org with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
Bug#741075: user-mode-linux: Occasional memory corruption on startup under high load
Package: user-mode-linux
Version: 3.2-2um-1+deb7u2+b1
Severity: grave
Tags: patch
Justification: causes non-serious data loss

Dear Maintainer,

This bug is perennial. If we go through old bugs with the "cannot reproduce" tag, 50% of them are this one; the other 50% are the "you should not use pipe for interprocess IPC" bug, which we will submit shortly.

Manifestation of the problem: UML dies on startup for no apparent reason with a memory corruption message. It occurs only on heavily loaded systems, usually when running a lot of UMLs.

-- System Information:
Debian Release: 7.3
APT prefers stable-updates
APT policy: (500, 'stable-updates'), (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 3.2.0-4-amd64 (SMP w/2 CPU cores)
Locale: LANG=en_GB.UTF-8, LC_CTYPE=en_GB.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages user-mode-linux depends on:
ii  libc6          2.13-38
ii  uml-utilities  20070815-1.1-ai-1.8

user-mode-linux recommends no packages.

Versions of packages user-mode-linux suggests:
ii  gnome-terminal [x-terminal-emulator]  3.4.1.1-2
ii  konsole [x-terminal-emulator]         4:4.8.4-2
pn  rootstrap                             none
pn  slirp                                 none
pn  user-mode-linux-doc                   none
pn  vde2                                  none
ii  xfce4-terminal [x-terminal-emulator]  0.4.8-1+b1
ii  xterm [x-terminal-emulator]           278-4

-- no debconf information

From 9c3a9af21c0bfeca27eac958fde215594b4ee3fa Mon Sep 17 00:00:00 2001
From: Anton Ivanov <antiv...@cisco.com>
Date: Sat, 8 Mar 2014 06:49:27 +
Subject: [PATCH 2/3] BUG: Memory corruption on startup

The reverse case of this race (you must msync before read) is well known. This is the not so common one. It can be triggered only on systems which do a lot of task switching, and only at UML startup.

If you are starting 200+ UMLs, ~0.5% will always die without this fix.
---
 arch/um/include/shared/os.h |    1 +
 arch/um/kernel/physmem.c    |    1 +
 arch/um/os-Linux/file.c     |    6 ++
 3 files changed, 8 insertions(+)

diff --git a/arch/um/include/shared/os.h b/arch/um/include/shared/os.h
index 89b686c1..3c9738d 100644
--- a/arch/um/include/shared/os.h
+++ b/arch/um/include/shared/os.h
@@ -136,6 +136,7 @@ extern int os_ioctl_generic(int fd, unsigned int cmd, unsigned long arg);
 extern int os_get_ifname(int fd, char *namebuf);
 extern int os_set_slip(int fd);
 extern int os_mode_fd(int fd, int mode);
+extern int os_fsync_file(int fd);
 
 extern int os_seek_file(int fd, unsigned long long offset);
 extern int os_open_file(const char *file, struct openflags flags, int mode);
diff --git a/arch/um/kernel/physmem.c b/arch/um/kernel/physmem.c
index f116db1..30fdd5d0 100644
--- a/arch/um/kernel/physmem.c
+++ b/arch/um/kernel/physmem.c
@@ -103,6 +103,7 @@ void __init setup_physmem(unsigned long start, unsigned long reserve_end,
 	 */
 	os_seek_file(physmem_fd, __pa(__syscall_stub_start));
 	os_write_file(physmem_fd, __syscall_stub_start, PAGE_SIZE);
+	os_fsync_file(physmem_fd);
 
 	bootmap_size = init_bootmem(pfn, pfn + delta);
 	free_bootmem(__pa(reserve_end) + bootmap_size,
diff --git a/arch/um/os-Linux/file.c b/arch/um/os-Linux/file.c
index b049a63..a4f0e65 100644
--- a/arch/um/os-Linux/file.c
+++ b/arch/um/os-Linux/file.c
@@ -237,6 +237,12 @@ void os_close_file(int fd)
 {
 	close(fd);
 }
+int os_fsync_file(int fd)
+{
+	if (fsync(fd) < 0)
+		return -errno;
+	return 0;
+}
 
 int os_seek_file(int fd, unsigned long long offset)
 {
--
1.7.10.4
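The ordering that the patch above enforces (write the page through the file descriptor, sync it, and only then rely on a mapping of the same file) can be sketched in user space as follows. This is an illustration of the order of operations only, written in Python instead of the kernel's C. It does not reproduce the race, and the names `write_page_synced` and `demo` are invented for the example.

```python
import mmap
import os
import tempfile

PAGE_SIZE = mmap.PAGESIZE


def write_page_synced(fd: int, offset: int, page: bytes) -> None:
    """Write one page at `offset` and flush it before the file is mapped."""
    os.lseek(fd, offset, os.SEEK_SET)
    written = os.write(fd, page)
    if written != len(page):
        raise OSError(f"short write: {written} of {len(page)} bytes")
    # Counterpart of the os_fsync_file() call the patch adds after
    # os_write_file(): make sure the data has reached the backing file.
    os.fsync(fd)


def demo() -> bytes:
    """Write a page to a scratch file, then read it back through mmap."""
    with tempfile.TemporaryFile() as backing:
        fd = backing.fileno()
        os.ftruncate(fd, 2 * PAGE_SIZE)
        write_page_synced(fd, PAGE_SIZE, b"\xAA" * PAGE_SIZE)
        with mmap.mmap(fd, 2 * PAGE_SIZE, access=mmap.ACCESS_READ) as view:
            return view[PAGE_SIZE:2 * PAGE_SIZE]
```

On an idle machine the read through the mapping would normally succeed even without the `os.fsync()` call; the report says the failure shows up only under heavy task switching, which this sketch makes no attempt to simulate.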
Bug#622652: alsa-driver: fails to build on powermac
On 10/11/11 09:32, Anton Ivanov wrote:
> On 10/11/11 08:49, Jonathan Nieder wrote:
>> Anton Ivanov wrote:
>>> I can try to patch it to build some time next week. However, looking at the supported kernels file in the package it may be better to go straight for 1.0.24 which is current alsa stable.
>> Any news on that? (No problem if the answer is no. :))
>>
>> By the way, for reference, what kernel were you building against?
> Apologies, I have been overloaded with other stuff for the last few months. I have 3 nearly free weeks until Dec will get around to look at this and other Mac specific bugs (I have a few more filed vs X, etc).

It fails to build because the alsa driver source expects pdev_archdata to contain the same information as dev_archdata, which on ppc includes references to the OpenFirmware tree.

In 2.6.32 as shipped in squeeze that structure is blank, so rather unsurprisingly it FTBFS.

I am going to pull .38 to see how this structure evolved over time.

Brgds,
Bug#622652: alsa-driver: fails to build on powermac
On 10/11/11 08:49, Jonathan Nieder wrote: Anton Ivanov wrote: I can try to patch it to build some time next week. However, looking at the supported kernels file in the package it may be better to go straight for 1.0.24 which is current alsa stable. Any news on that? (No problem if the answer is no. :)) By the way, for reference, what kernel were you building against? Apologies, I have been overloaded with other stuff for the last few months. I have 3 nearly free weeks until Dec will get around to look at this and other Mac specific bugs (I have a few more filed vs X, etc). -- If you think it's expensive to hire a professional to do the job, wait until you hire an amateur. Paul Neal Red Adair A. R. Ivanov E-mail: anton.iva...@kot-begemot.co.uk -- To UNSUBSCRIBE, email to debian-bugs-rc-requ...@lists.debian.org with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
Bug#622652: [Pkg-alsa-devel] Bug#622652: alsa-driver: fails to build on powermac
On 04/14/11 18:27, Elimar Riesebieter wrote: * Anton Ivanov [110413 17:57 +0100]: Package: alsa-driver Which version? Severity: serious Justification: fails to build from source (but built successfully in the past) Elimar Standard squeeze one. 1.0.23+dfsg-2 This is a ppc only problem. Unless I am mistaken, that part of the build is not invoked on other platforms. I can try to patch it to build some time next week. However, looking at the supported kernels file in the package it may be better to go straight for 1.0.24 which is current alsa stable. Brgds, -- Understanding is a three-edged sword: your side, their side, and the truth. --Kosh Naranek A. R. Ivanov E-mail: aiva...@sigsegv.cx WWW: http://www.sigsegv.cx/ pub 1024D/DDE5E715 2002-03-03 Anton R. Ivanovai...@sigsegv.cx Fingerprint: C824 CBD7 EE4B D7F8 5331 89D5 FCDA 572E DDE5 E715 -- To UNSUBSCRIBE, email to debian-bugs-rc-requ...@lists.debian.org with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
Bug#622652: Acknowledgement (alsa-driver: fails to build on powermac)
Some digging points to this: http://permalink.gmane.org/gmane.linux.kernel.commits.head/226657 as a likely culprit. Brgds,
Bug#622652: alsa-driver: fails to build on powermac
Package: alsa-driver
Severity: serious
Justification: fails to build from source (but built successfully in the past)

I am still getting from time to time (not always) sound glitches similar to the ones reported in Bug#610859, so I decided to try building more recent alsa from source. However, it does not build.

In file included from /usr/src/modules/alsa-driver/ppc/pmac.c:13:
/usr/src/modules/alsa-driver/ppc/../alsa-kernel/ppc/pmac.c: In function ‘detect_byte_swap’:
/usr/src/modules/alsa-driver/ppc/../alsa-kernel/ppc/pmac.c:925: error: implicit declaration of function ‘of_machine_is_compatible’
make[7]: *** [/usr/src/modules/alsa-driver/ppc/pmac.o] Error 1
make[6]: *** [/usr/src/modules/alsa-driver/ppc] Error 2
make[5]: *** [_module_/usr/src/modules/alsa-driver] Error 2
make[4]: *** [sub-make] Error 2
make[3]: *** [all] Error 2
make[3]: Leaving directory `/usr/src/linux-headers-2.6.32-5-powerpc'
make[2]: *** [compile] Error 2
make[2]: Leaving directory `/usr/src/modules/alsa-driver'
make[1]: *** [build-stamp] Error 2
make[1]: Leaving directory `/usr/src/modules/alsa-driver'

This is the tail of m-a a-i alsa

-- System Information:
Debian Release: 6.0
APT prefers stable
APT policy: (500, 'stable')
Architecture: powerpc (ppc)

Kernel: Linux 2.6.32-5-powerpc
Locale: LANG=en_GB.UTF-8, LC_CTYPE=en_GB.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash
Bug#520095: removes the toplevel mountpoint directories and fails to start the next time
On Sun, 2009-08-02 at 01:40 +0200, Jan Christoph Nordholz wrote: Hi Michael, No idea. If I were knew, I'd attach a patch for this issue. The code is quite.. funny and fragile, I tried to understand it right before submitting a bugreport but that wasn't quite successful. I ran it under strace - pure automountd, without any startu scripts but with the same args. It never ever tried to mkdir or rename. It created two random dirs in /tmp, mounted a tmpfs over one of them (running mount(8)), bind-mounted it on second dir, next did stat(/misc) (which returned ENOENT) and immediately gave up returning it can't mount /misc. this is the strace log on my system after the spawned umount process has terminated: ] 30451 --- SIGCHLD (Child exited) @ 0 (0) --- ] 30451 rmdir(/tmp/autoa1Aqlv) = 0 ] 30451 rmdir(/tmp/autohY6Rkm) = 0 ] 30451 rt_sigaction(SIGTERM, {0xb801dd70, [HUP USR1 USR2 ALRM TERM], SA_RESTART}, NULL, 8) = 0 ] 30451 rt_sigaction( several more ) ] 30451 open(/etc/mtab, O_RDONLY) = 8 ] 30451 fstat64(8, {st_mode=S_IFREG|0644, st_size=701, ...}) = 0 ] 30451 mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7fe7000 ] 30451 read(8, /dev/sda2 / ext3 rw,errors=remou..., 4096) = 701 ] 30451 read(8, , 4096) = 0 ] 30451 close(8) = 0 ] 30451 munmap(0xb7fe7000, 4096) = 0 ] 30451 stat64(/misc, 0xbfbca894) = -1 ENOENT (No such file or directory) ] 30451 open(/etc/mtab, O_RDONLY) = 8 ] 30451 fstat64(8, {st_mode=S_IFREG|0644, st_size=701, ...}) = 0 ] 30451 mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7fe7000 ] 30451 read(8, /dev/sda2 / ext3 rw,errors=remou..., 4096) = 701 ] 30451 read(8, , 4096) = 0 ] 30451 close(8) = 0 ] 30451 munmap(0xb7fe7000, 4096) = 0 ] 30451 statfs(/, {f_type=EXT2_SUPER_MAGIC, f_bsize=4096, f_blocks=9612195, f_bfree=5459645, f_bavail=4971364, f_files=2444624, f_ffree=2142 ] 30451 mkdir(/misc, 0555) = 0 ] 30451 pipe([8, 11]) = 0 ] 30451 pipe([12, 13])= 0 ] 30451 rt_sigprocmask(SIG_BLOCK, ~[RTMIN 
RT_1], [], 8) = 0 ] 30451 pipe([14, 15])= 0 ] 30451 clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xb7e7fb48) = 30454 ] [...] which calls: ] 30454 execve(/bin/mount, [/bin/mount, -t, autofs, -o, fd=11,pgrp=30451,minproto=2,maxp..., automount(pid30451), /misc], [/* 44 vars */]) = 0 Maybe you can spot the difference that's causing your automountd to give up - but I'd suggest switching to v5 anyway because upstream development on v4 has ceased, and I'd like to drop v4 before Squeeze is released. Can I propose a simple workaround until v5 is out. Once upon a time the automount init.d script used to create the dirs. What exactly is the problem in doing this once again? It is a one-liner after all. Brgds, Regards, Jan -- Understanding is a three-edged sword: your side, their side, and the truth. --Kosh Naranek A. R. Ivanov E-mail: aiva...@sigsegv.cx WWW: http://www.sigsegv.cx/ pub 1024D/DDE5E715 2002-03-03 Anton R. Ivanov ariva...@sigsegv.cx Fingerprint: C824 CBD7 EE4B D7F8 5331 89D5 FCDA 572E DDE5 E715 -- To UNSUBSCRIBE, email to debian-bugs-rc-requ...@lists.debian.org with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
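The workaround proposed in the message above, having the init script create any missing top-level mountpoint directories before automount starts, amounts to very little code. The sketch below shows the idea in Python rather than the shell an init script would use. The function name `ensure_mountpoints` is invented for this example, and the mode 0555 is simply the one visible in the `mkdir(/misc, 0555)` call of the strace log.

```python
import os


def ensure_mountpoints(paths, mode=0o555):
    """Create each missing mountpoint directory; return the ones created."""
    created = []
    for path in paths:
        if not os.path.isdir(path):
            # exist_ok guards against another process creating it meanwhile.
            os.makedirs(path, mode=mode, exist_ok=True)
            created.append(path)
    return created
```

An init script would call the equivalent of `ensure_mountpoints(["/misc"])` for every mountpoint listed in the master map before starting the daemon; how the list of mountpoints is obtained is left out here.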
Bug#520095: autofs: 100% reproducible on NFS root for the last two releases
Package: autofs Version: 4.1.4+debian-2.1 Followup-For: Bug #520095 I had the same problem on NFS root with Sarge and it still exists in Lenny. Prior to Sarge the autofs init script was checking if the mountpoint dirs exist and if not - creating them. Without this it is broken on NFS root systems (100% reproducible). -- System Information: Debian Release: 5.0 APT prefers stable APT policy: (500, 'stable') Architecture: i386 (i686) Kernel: Linux 2.6.26-1-686 (SMP w/2 CPU cores) Locale: LANG=en_GB.UTF-8, LC_CTYPE=en_GB.UTF-8 (charmap=UTF-8) Shell: /bin/sh linked to /bin/bash Versions of packages autofs depends on: ii libc6 2.7-18 GNU C Library: Shared libraries ii ucf 3.0016 Update Configuration File: preserv Versions of packages autofs recommends: ii module-init-tools3.4-1 tools for managing Linux kernel mo ii nfs-common 1:1.1.2-6lenny1 NFS support files common to client autofs suggests no packages. -- no debconf information -- To UNSUBSCRIBE, email to debian-bugs-rc-requ...@lists.debian.org with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
Bug#403915: RIPD loses sanity after changing a large chunk of iptable rules
Package: quagga Version: 0.98.3-7.2 Severity: serious This is observed only on one of several of our firewall systems (not the most loaded and most complex ones). They have 1000+ iptable rules generated by scripts and after reloading them ripd goes south. The process is still running but it does not generate any further updates. The vtysh interface shows all relevant RIP commands and a correct RIP configuration. Nothing obvious in the log so far. I will try to build the version from testing and test it after the 27th of December to see if it suffers from the same bug. -- System Information: Debian Release: 3.1 Architecture: i386 (i686) Kernel: Linux 2.6.14-1-686 Locale: LANG=C, LC_CTYPE=C (charmap=ANSI_X3.4-1968) Versions of packages quagga depends on: ii iproute 20041019-3 Professional tools to control the ii libc6 2.3.2.ds1-22sarge4 GNU C Library: Shared libraries an ii libcap1 1:1.10-14 support for getting/setting POSIX. ii libncurses5 5.4-4 Shared libraries for terminal hand ii libpam0g 0.76-22Pluggable Authentication Modules l ii libreadline4 4.3-11 GNU readline and history libraries ii logrotate 3.7-5 Log rotation utility -- no debconf information -- To UNSUBSCRIBE, email to [EMAIL PROTECTED] with a subject of unsubscribe. Trouble? Contact [EMAIL PROTECTED]
Bug#375967: Segfaults
Package: fam
Version: 2.7.0-6sarge1
Severity: serious

fam segfaults when running on a heavily loaded server. The machine in question is an IMAP server running Courier (with fam support) and an NFS server as well (circa 100 users). When started as /usr/sbin/famd -T 0, it exits after a few minutes. Running it in the foreground with -v does not produce anything useful: you see multiple messages about clients closing connections and a segfault at the end.

-- System Information:
Debian Release: 3.1
APT prefers testing
APT policy: (500, 'testing')
Architecture: i386 (i686)
Kernel: Linux 2.6.10-1-k8-smp
Locale: LANG=C, LC_CTYPE=C (charmap=ANSI_X3.4-1968)

Versions of packages fam depends on:
ii  libc6       2.3.2.ds1-22  GNU C Library: Shared libraries an
ii  libgcc1     1:3.4.3-13    GCC support library
ii  libstdc++5  1:3.3.5-13    The GNU Standard C++ Library v3
ii  portmap     5-9           The RPC portmapper

-- no debconf information
Bug#375967: Acknowledgement (Segfaults)
I tried to run it under gdb without success. It gets a signal 33 after a while which I think is actually a GDB artefact.

Any ideas on how to debug this will be appreciated.

--
Understanding is a three-edged sword: your side, their side, and the truth. --Kosh Naranek

A. R. Ivanov
E-mail: [EMAIL PROTECTED]
WWW: http://www.sigsegv.cx/
pub 1024D/DDE5E715 2002-03-03 Anton R. Ivanov [EMAIL PROTECTED]
Fingerprint: C824 CBD7 EE4B D7F8 5331 89D5 FCDA 572E DDE5 E715
Bug#375967: Acknowledgement (Segfaults)
Thomas Girard wrote:
> Quoting Anton Ivanov [EMAIL PROTECTED]:
>> I tried to run it under gdb without success. It gets a signal 33 after a while which I think is actually a GDB artefact. Any ideas on how to debug this will be appreciated.
>
> If this is *really* an artefact you can use handle SIG33 noprint nostop.

famd: NetConnection.c++:252: void NetConnection::flush(): Assertion `ret == omsgList->len' failed.

Program received signal SIGABRT, Aborted.
0xb7def83b in raise () from /lib/tls/libc.so.6
Bug#375967: Acknowledgement (Segfaults)
Thomas Girard wrote:
> Quoting Anton Ivanov [EMAIL PROTECTED]:
>> famd: NetConnection.c++:252: void NetConnection::flush(): Assertion `ret == omsgList->len' failed.
>>
>> Program received signal SIGABRT, Aborted.
>> 0xb7def83b in raise () from /lib/tls/libc.so.6
>
> Great. And what does `bt full' give you then ?

Program received signal SIGABRT, Aborted.
0xb7def83b in raise () from /lib/tls/libc.so.6
(gdb) bt full
#0 0xb7def83b in raise () from /lib/tls/libc.so.6
No symbol table info available.
#1 0xb7df0fa2 in abort () from /lib/tls/libc.so.6
No symbol table info available.
#2 0xb7de92df in __assert_fail () from /lib/tls/libc.so.6
No symbol table info available.
#3 0x080559aa in NetConnection::flush (this=0x80c684c) at NetConnection.c++:263
        ret = 0
#4 0x08055853 in NetConnection::mprintf (this=0xb7ef8e80, format=0x0) at NetConnection.c++:236
        msg = (NetConnection::msgList_s *) 0x80f5d88
#5 0x0804a321 in ClientConnection::send_event (this=0x80c684c, [EMAIL PROTECTED], request=409, name=0x80f5c40 X-Debian-Apps-Net-komba2.desktop) at ClientConnection.c++:54
        code = 0 '\0'
#6 0x0805c012 in TCP_Client::post_event (this=0x80c6820, [EMAIL PROTECTED], request=409, path=0x80f5c40 X-Debian-Apps-Net-komba2.desktop) at TCP_Client.c++:312
No locals.
#7 0x0804a9fb in ClientInterest::post_event (this=0x80edcc0, [EMAIL PROTECTED], eventpath=0x80f5c40 X-Debian-Apps-Net-komba2.desktop) at ClientInterest.c++:131
No locals.
#8 0x0804bce5 in DirEntry::post_event (this=0x80f5c98, [EMAIL PROTECTED], eventpath=0x0) at Interest.h:61
No locals.
#9 0x0804cdc6 in DirectoryScanner::done (this=0x8076560) at DirectoryScanner.c++:149
        dp = (dirent *) 0x0
        ep = (class DirEntry *) 0x80f5c98
        epp2 = (class DirEntry **) 0x6
        ready = true
#10 0x0804c0f4 in Directory (this=0x80edcc0, name=0x0, c=0x0, r=0, [EMAIL PROTECTED]) at Directory.c++:54
No locals.
---Type <return> to continue, or q <return> to quit---
#11 0x0805342c in MxClient::monitor_dir (this=0x80c6820, request=409, path=0xbfffe6d0 /exports/systems-team-home/mf2/.local/share/applications/menu-xdg, [EMAIL PROTECTED]) at MxClient.c++:92
        ip = (class ClientInterest *) 0xbfffe668
#12 0x0805bb5f in TCP_Client::input_msg (this=0x80c6820, msg=0xbfffe6c0 À\\\f\b, size=104) at TCP_Client.c++:198
        i = 6
        grouplist = (gid_t *) 0x80edca0
        ngroups = -1073748272
        c = {static SuperUser = {static SuperUser = <same as static member of an already seen type>, p = 0x806e008, static untrusted = {static SuperUser = <same as static member of an already seen type>, p = 0x8073ca0, static untrusted = <same as static member of an already seen type>, static insecure_compat = false, static impllist = 0x80727e0, static nimpl = 16, static nimpl_alloc = 22}, static insecure_compat = false, static impllist = 0x80727e0, static nimpl = 16, static nimpl_alloc = 22}, p = 0x80c5cc0, static untrusted = <same as static member of an already seen type>, static insecure_compat = false, static impllist = 0x80727e0, static nimpl = 16, static nimpl_alloc = 22}
        got_N_with_groups = false
        p = 0x80c6b49
        q = 0x80c6b49
        opcode = 77 'M'
        reqnum = 409
        uid = 1039
        gid = 1039
        filename = /exports/systems-team-home/mf2/.local/share/applications/menu-xdg\000\000/xdg-applications/Windows+Applications/Programs/FirstClass\000\000martSketch\000\000dia+Browser\000\000P\221þ·\000\000\000\000µØ\006\000\000èÿ¿ètò·¤çÿ¿\000\020\000\000X.Ý·ètò·\000\000\000\000\2256Ê\006 èÿ¿...
        i = 6
---Type <return> to continue, or q <return> to quit---
        msg_cred = {static SuperUser = {static SuperUser = <same as static member of an already seen type>, p = 0x806e008, static untrusted = {static SuperUser = <same as static member of an already seen type>, p = 0x8073ca0, static untrusted = <same as static member of an already seen type>, static insecure_compat = false, static impllist = 0x80727e0, static nimpl = 16, static nimpl_alloc = 22}, static insecure_compat = false, static impllist = 0x80727e0, static nimpl = 16, static nimpl_alloc = 22}, p = 0x80c5cc0, static untrusted = <same as static member of an already seen type>, static insecure_compat = false, static impllist = 0x80727e0, static nimpl = 16, static nimpl_alloc = 22}
#13 0x0805b766 in TCP_Client::input_handler (msg=0x6 <Address 0x6 out of bounds>, nbytes=0, closure=0x80c6820) at TCP_Client.c++:69
No locals.
#14 0x0804a2c6 in ClientConnection::input_msg (this=0x6, msg=0x0, nbytes=0) at ClientConnection.c++:40
No locals.
#15 0x08055692 in NetConnection::deliver_input (this=0x80c684c) at NetConnection.c++:170
        ihead = 0x80c6ade
        remaining = 135031626
#16 0x08059354 in Scheduler::handle_io (fds=0xb840, iotype=Scheduler::FDInfo::read) at Scheduler.c++:315
        fp = (Scheduler::FDInfo *) 0x6
        fd = 364
#17 0x08059431 in Scheduler::select () at Scheduler.c
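Note on frame #3: the failed assertion compares ret with omsgList->len, and the backtrace shows ret = 0. One plausible reading, which the thread does not confirm, is that flush() issues a single write-style call and asserts that the whole queued message went out, so a short or zero-length write on a busy connection aborts the daemon. The usual way to tolerate that is to loop until every byte has been written. The sketch below shows that general technique in Python; it is not famd code, and `write_all` is a hypothetical helper name.

```python
import os
import select


def write_all(fd, data):
    """Write every byte of data to fd, retrying after short writes.

    Returns the total number of bytes written.
    """
    view = memoryview(data)
    sent = 0
    while sent < len(view):
        # Wait until the descriptor can accept more data, so that a
        # non-blocking descriptor does not make the loop spin.
        select.select([], [fd], [])
        try:
            sent += os.write(fd, view[sent:])
        except BlockingIOError:
            # The buffer filled up between select() and write(); try again.
            continue
    return sent


if __name__ == "__main__":
    # A pipe stands in for the client socket in this small demonstration.
    read_end, write_end = os.pipe()
    print(write_all(write_end, b"event data"))
    os.close(write_end)
    os.close(read_end)
```

A daemon built around its own select() loop, as the Scheduler frames suggest famd is, would more likely keep the unsent remainder queued and resume from its writability callback instead of blocking in a helper like this.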
Bug#308792: After the last update Via C3 systems give assertion failed in ld.so at boot
Package: libc6
Version: 2.3.2.ds1-21
Severity: critical

After the last update, a C3 Version 1 system with a kernel 2.6 image fails on boot with:

Inconsistency detected by ld.so: do_rel.h: 109 elf_dynamic_do_rel: Assertion '(map->l_info[(34+0+(0x6ff - (0x6ff0)))] != ((void *0))' failed!

Tested with the following 2.6 images:
- older 2.6.10-1-386 (subversion 2)
- current 2.6.10-1-386 (subversion 10)
- whatever the Debian installer tries to put on the system when booted with 2.6, most likely current 2.6.10-1-386
- 2.6.9 (686 config altered to optimize for 386)

With the default 2.4.18 image from woody the system boots normally. With 2.6.10-2 and 2.6.9 it boots normally if libc6 is 2.3.2.ds1-20 or earlier.

I am looking at the changelog for ds1-21 and so far I have no idea what could have caused it.

-- System Information:
Debian Release: 3.1
APT prefers testing
APT policy: (500, 'testing')
Architecture: i386 (i686)
Kernel: Linux 2.4.18-bf2.4
Locale: LANG=C, LC_CTYPE=C (charmap=ANSI_X3.4-1968)

Versions of packages libc6 depends on:
ii  libdb1-compat  2.1.3-7  The Berkeley database routines [gl

-- no debconf information
Bug#308792: After the last update Via C3 systems give assertion failed in ld.so at boot
Daniel Jacobowitz wrote:
> On Thu, May 12, 2005 at 11:39:23AM +, Anton Ivanov wrote:
>> Package: libc6
>> Version: 2.3.2.ds1-21
>> Severity: critical
>>
>> After the last update, a C3 Version 1 system with a kernel 2.6 image fails on boot with:
>>
>> Inconsistency detected by ld.so: do_rel.h: 109 elf_dynamic_do_rel: Assertion '(map->l_info[(34+0+(0x6ff - (0x6ff0)))] != ((void *0))' failed!
>>
>> Tested with the following 2.6 images:
>> - older 2.6.10-1-386 (subversion 2)
>> - current 2.6.10-1-386 (subversion 10)
>> - whatever the Debian installer tries to put on the system when booted with 2.6, most likely current 2.6.10-1-386
>> - 2.6.9 (686 config altered to optimize for 386)
>>
>> With the default 2.4.18 image from woody the system boots normally. With 2.6.10-2 and 2.6.9 it boots normally if libc6 is 2.3.2.ds1-20 or earlier.
>>
>> I am looking at the changelog for ds1-21 and so far I have no idea what could have caused it.
>
> That's:
>
> #ifdef RTLD_BOOTSTRAP
>   /* The dynamic linker always uses versioning. */
>   assert (map->l_info[VERSYMIDX (DT_VERSYM)] != NULL);
> #else
>
> The problem is not going to be anywhere near there. That is a check on the ld.so binary, which works elsewhere. Probably your mmap is busted. I do not see anything that would cause this in -21 either.

I tested with a few more versions and some alternative hardware. 2.4.27 also does not boot unless you turn off ACPI and APIC; with both turned off it boots. All 2.6 images and 2.4.27 boot OK on C3 V2.

It is starting to look like an interaction between specific hardware and drivers: C3 V1, CMD649 IDE and a few others. I still have no idea why it bombs out in mmap/ld.so. With a hardware problem I would have expected it to barf much earlier and in a more consistent manner.

A.

--
Le Châtelier's Law: If some stress is brought to bear on a system in equilibrium, the equilibrium is displaced in the direction which tends to undo the effect of the stress.
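For reference, the numeric index in the assertion text can be tied back to the VERSYMIDX (DT_VERSYM) macro in the snippet quoted above. The sketch below redoes the macro arithmetic in Python. DT_VERNEEDNUM and DT_VERSYM are the values from glibc's elf.h; it is an assumption that the shortened hex constants in the report (0x6ff and 0x6ff0) stand for those two values, and DT_NUM = 34 and DT_THISPROCNUM = 0 are simply read off the "34+0" in the reported message.

```python
# Values from glibc's elf.h.
DT_VERNEEDNUM = 0x6FFFFFFF
DT_VERSYM = 0x6FFFFFF0

# Taken from the "34+0" in the reported assertion text (assumed to be the
# i386 values for that glibc build).
DT_NUM = 34
DT_THISPROCNUM = 0


def dt_versiontagidx(tag):
    """Mirror of the DT_VERSIONTAGIDX(tag) macro."""
    return DT_VERNEEDNUM - tag


def versymidx(tag):
    """Mirror of the VERSYMIDX(sym) macro used to index map->l_info."""
    return DT_NUM + DT_THISPROCNUM + dt_versiontagidx(tag)


if __name__ == "__main__":
    print(versymidx(DT_VERSYM))
```

Under those assumptions the assertion checks a single slot of l_info, the one holding the DT_VERSYM entry of ld.so itself, which fits the remark that this is a check on the ld.so binary and not on the program being loaded.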