[systemd-devel] Re: systemd prerelease 243-rc1
>>> systemd tag bot wrote on 30.07.2019 at 19:09 in message <20190730170916.1.c7b12db1b9d29...@refi64.com>:
> A new systemd ☠️ pre-release ☠️ has just been tagged. Please download the
> tarball here:
>
> https://github.com/systemd/systemd/archive/v243-rc1.tar.gz
>
> NOTE: This is ☠️ pre-release ☠️ software. Do not run this on production
> systems, but please test this and report any issues you find to GitHub:
>
> https://github.com/systemd/systemd/issues/new?template=Bug_report.md
>
> Changes since the previous release:
> [...]
> * Previously, filters defined with SystemCallFilter= would have the
> effect that any calling of an offending system call would terminate
> the calling thread. This behaviour never made much sense, since
> killing individual threads of unsuspecting processes is likely to
> create more problems than it solves. With this release the default
> action changed from killing the thread to killing the whole
> process. For this to work correctly both a kernel version (>= 4.14)

I never used that feature, but I feel an error code like EPERM would be most appropriate, because that's what it really is.

> and a libseccomp version (>= 2.4.0) supporting this new seccomp
> action is required. If an older kernel or libseccomp is used the old
> behaviour continues to be used. This change does not affect any
> services that have no system call filters defined, or that use
> SystemCallErrorNumber= (and thus see EPERM or another error instead
> of being killed when calling an offending system call). Note that
> systemd documentation always claimed that the whole process is
> killed. With this change behaviour is thus adjusted to match the
> documentation.
[...]

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel
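The quoted NEWS entry describes three possible outcomes for a blocked system call, and the reply's EPERM suggestion corresponds to the SystemCallErrorNumber= branch. The sketch below encodes only the version thresholds (kernel >= 4.14, libseccomp >= 2.4.0) and the SystemCallErrorNumber= exception stated in the entry; `filter_action` is a hypothetical helper for illustration, not systemd code or API.

```python
from typing import Optional, Tuple

Version = Tuple[int, ...]


def _pad(version: Version, width: int = 3) -> Version:
    # (2, 4) -> (2, 4, 0), so tuple comparison treats "2.4" like "2.4.0".
    return tuple(version) + (0,) * (width - len(version))


def filter_action(kernel: Version, libseccomp: Version,
                  error_number: Optional[str] = None) -> str:
    """Hypothetical model of what a service experiences when it calls a
    system call blocked by SystemCallFilter= (not systemd's actual code)."""
    if error_number is not None:
        # SystemCallErrorNumber= is set: the call fails with that errno
        # instead of anything being killed.
        return f"fail with {error_number}"
    if _pad(kernel) >= (4, 14, 0) and _pad(libseccomp) >= (2, 4, 0):
        # Kernel and libseccomp both support the "kill process" action.
        return "kill process"
    # Older kernel or libseccomp: the old behaviour is kept.
    return "kill thread"


print(filter_action((5, 2), (2, 4, 1)))           # kill process
print(filter_action((4, 9), (2, 4, 1)))           # kill thread
print(filter_action((5, 2), (2, 4, 1), "EPERM"))  # fail with EPERM
```

Both version checks must pass; failing either one falls back to killing only the thread, as the entry states.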
Re: [systemd-devel] systemd's connections to /run/systemd/private ?
On Tue, 2019-07-30 at 14:56 -0400, Brian Reichert wrote:
> Between 13:49:30 and 13:50:01, I see 25 'successful' calls
> for close(), e.g.:
>
> 13:50:01 close(19) = 0
>
> Followed by getsockopt(), and a received message on the supposedly-closed
> file descriptor:
>
> 13:50:01 getsockopt(19, SOL_SOCKET, SO_PEERCRED, {pid=3323, uid=0, gid=0},
> [12]) = 0

Are you sure it's the same file descriptor? You don't explicitly say anything about there not being any relevant lines between those. Does systemd really just call getsockopt() on fd 19 after closing it, with nothing to trigger that? Obvious candidates to check in the strace would be an accept call returning a new fd 19, or epoll indicating activity on the fd (though I'd expect systemd to remove the fd from the epoll set after closing it).
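The check suggested above can be automated by walking the strace log and tracking the state of one descriptor number: a use of fd N after a successful `close(N)` is only suspicious if no call in between returned N again (for example `accept4(...) = N`). This is a minimal sketch assuming the `strace -t` line format quoted in this thread; `fd_timeline` is a hypothetical helper, and descriptors that arrive via `pipe2`, `socketpair` or SCM_RIGHTS are not tracked.

```python
import re
from typing import Iterable, List, Tuple

# Matches `strace -t` lines such as "13:50:01 close(19) = 0". The greedy
# (.*) stops at the last ") = ", so trailing errno text such as
# "-1 EPIPE (Broken pipe)" is ignored.
STRACE_LINE = re.compile(
    r"^(?P<time>\S+) (?P<call>\w+)\((?P<args>.*)\) = (?P<ret>-?\d+)")

# Calls whose return value is a (possibly reused) descriptor number.
FD_RETURNING = {"accept", "accept4", "socket", "open", "openat", "dup"}


def fd_timeline(lines: Iterable[str], fd: int) -> List[Tuple[str, str, str]]:
    """Return (time, call, state) for every line that touches `fd`.

    state is "opened", "closed", "used", or "used-after-close" (a use of
    `fd` with no intervening call that could have handed the number out).
    """
    events = []
    is_open = True  # strace may attach after the descriptor was created
    for line in lines:
        m = STRACE_LINE.match(line)
        if m is None:
            continue
        call, ret = m["call"], int(m["ret"])
        first_arg = m["args"].split(",", 1)[0].strip()
        if call in FD_RETURNING and ret == fd:
            is_open = True
            events.append((m["time"], call, "opened"))
        elif first_arg == str(fd):
            if call == "close" and ret == 0:
                is_open = False
                events.append((m["time"], call, "closed"))
            else:
                state = "used" if is_open else "used-after-close"
                events.append((m["time"], call, state))
    return events


trace = [
    "13:50:01 close(19) = 0",
    "13:50:01 getsockopt(19, SOL_SOCKET, SO_PEERCRED, {pid=3323, uid=0, gid=0}, [12]) = 0",
]
print(fd_timeline(trace, 19))
# [('13:50:01', 'close', 'closed'), ('13:50:01', 'getsockopt', 'used-after-close')]
```

On just the two quoted lines the helper reports a use after close, which is exactly why the intervening lines matter: if the full trace has an `accept4(...) = 19` between them, the later calls belong to a new connection that happens to reuse the number 19.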
Re: [systemd-devel] systemd.journald.forward_to doesn't forward all journal messages
On Mon, Jul 29, 2019 at 1:26 AM Lennart Poettering wrote:
>
> On Sun, 28.07.19 22:11, Chris Murphy (li...@colorremedies.com) wrote:
>
> > Using either of the following:
> >
> > systemd.log_level=debug systemd.journald.forward_to_kmsg log_buf_len=8M
> >
> > systemd.log_level=debug systemd.log_target=kmsg log_buf_len=8M
>
> Note that this is not sufficient. You also have to pass
> "printk.devkmsg=on" too, otherwise the kernel ratelimits log output
> from userspace ridiculously a lot, and you will see lots of dropped
> messages.
>
> I have documented this now here:
>
> https://github.com/systemd/systemd/pull/13208

BOOT_IMAGE=/images/pxeboot/vmlinuz root=live:CDLABEL=Fedora-WS-Live-rawh-20190728-n-1 rd.live.image systemd.wants=zram-swap.service systemd.log_level=debug systemd.journald.forward_to_kmsg log_buf_len=8M printk.devkmsg=on

Many messages I see in the journal still do not appear in kmsg. For example, from /dev/kmsg:

6,20619,201107529,-;zram: Cannot change disksize for initialized device
12,23154,208596765,-;org.fedoraproject.Anaconda.Modules.Network[2498]: DEBUG:anaconda.modules.network.network:Applying boot options KernelArguments([('BOOT_IMAGE', '/images/pxeboot/vmlinuz'), ('root', 'live:CDLABEL=Fedora-WS-Live-rawh-20190728-n-1'), ('rd.live.image', None), ('systemd.wants', 'zram-swap.service'), ('systemd.log_level', 'debug'), ('systemd.journald.forward_to_kmsg', None), ('log_buf_len', '8M'), ('printk.devkmsg', 'on')])
12,25049,210822858,-;org.fedoraproject.Anaconda.Modules.Storage[2498]: DEBUG:anaconda.modules.storage.disk_selection.selection:Protected devices are set to '['/dev/zram0']'.
^C
[root@localhost-live liveuser]# journalctl -o short-monotonic | grep zram
[ 203.224915] localhost-live systemd[1477]: Added job dev-zram0.device/nop to transaction.
[ 203.225017] localhost-live systemd[1477]: dev-zram0.device: Installed new job dev-zram0.device/nop as 295
[ 203.225143] localhost-live systemd[1477]: Added job sys-devices-virtual-block-zram0.device/nop to transaction.
[ 203.225245] localhost-live systemd[1477]: sys-devices-virtual-block-zram0.device: Installed new job sys-devices-virtual-block-zram0.device/nop as 296
[ 203.225355] localhost-live systemd[1477]: sys-devices-virtual-block-zram0.device: Job 296 sys-devices-virtual-block-zram0.device/nop finished, result=done
[ 203.225570] localhost-live systemd[1477]: dev-zram0.device: Job 295 dev-zram0.device/nop finished, result=done
[ 208.959944] localhost-live systemd[1477]: Added job dev-zram0.device/nop to transaction.
[ 208.961015] localhost-live systemd[1477]: dev-zram0.device: Installed new job dev-zram0.device/nop as 340
[ 208.961324] localhost-live systemd[1477]: Added job sys-devices-virtual-block-zram0.device/nop to transaction.
[ 208.961508] localhost-live systemd[1477]: sys-devices-virtual-block-zram0.device: Installed new job sys-devices-virtual-block-zram0.device/nop as 341
[ 208.961789] localhost-live systemd[1477]: sys-devices-virtual-block-zram0.device: Job 341 sys-devices-virtual-block-zram0.device/nop finished, result=done
[ 208.962021] localhost-live systemd[1477]: dev-zram0.device: Job 340 dev-zram0.device/nop finished, result=done
[ 209.822448] localhost-live systemd[1477]: Added job dev-zram0.device/nop to transaction.
[ 209.822625] localhost-live systemd[1477]: dev-zram0.device: Installed new job dev-zram0.device/nop as 377
[ 209.822757] localhost-live systemd[1477]: Added job sys-devices-virtual-block-zram0.device/nop to transaction.
[ 209.822861] localhost-live systemd[1477]: sys-devices-virtual-block-zram0.device: Installed new job sys-devices-virtual-block-zram0.device/nop as 378
[ 209.822983] localhost-live systemd[1477]: sys-devices-virtual-block-zram0.device: Job 378 sys-devices-virtual-block-zram0.device/nop finished, result=done
[ 209.823106] localhost-live systemd[1477]: dev-zram0.device: Job 377 dev-zram0.device/nop finished, result=done
[ 213.866820] localhost-live anaconda[2490]: blivet: DeviceTree.get_device_by_path: path: /dev/zram0 ; incomplete: False ; hidden: False ;
[ 213.868392] localhost-live anaconda[2490]: blivet: failed to resolve '/dev/zram0'
[root@localhost-live liveuser]#

Literally zero of those lines appear in kmsg:

6,20619,201107529,-;zram: Cannot change disksize for initialized device
12,23154,208596765,-;org.fedoraproject.Anaconda.Modules.Network[2498]: DEBUG:anaconda.modules.network.network:Applying boot options KernelArguments([('BOOT_IMAGE', '/images/pxeboot/vmlinuz'), ('root', 'live:CDLABEL=Fedora-WS-Live-rawh-20190728-n-1'), ('rd.live.image', None), ('systemd.wants', 'zram-swap.service'), ('systemd.log_level', 'debug'), ('systemd.journald.forward_to_kmsg', None), ('log_buf_len', '8M'), ('printk.devkmsg', 'on')])
12,25049,210822858,-;org.fedoraproject.Anaconda.Modules.Storage[2498]: DEBUG:anaconda.modules.storage.disk_selection.selection:Protected devices are set to '['/dev/zram0']'.

The first is a kernel message, the next two are anaconda messages that don't appear
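Each /dev/kmsg record above starts with a comma-separated prefix before the `;`: the syslog priority (facility × 8 + level), a sequence number, a timestamp in microseconds since boot, and a flags field. Decoding that prefix makes it easier to line kmsg records up against the `journalctl -o short-monotonic` timestamps. The sketch follows the kernel's documented /dev/kmsg record format; `parse_kmsg` is a hypothetical helper, and continuation records and the indented key/value lines that can follow a record are not handled.

```python
from typing import NamedTuple

# Syslog facility names for the values seen in this thread (subset only).
FACILITIES = {0: "kern", 1: "user", 3: "daemon"}
LEVELS = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]


class KmsgRecord(NamedTuple):
    facility: int
    level: int
    seq: int
    monotonic_s: float  # seconds since boot
    message: str


def parse_kmsg(record: str) -> KmsgRecord:
    """Split a /dev/kmsg record "prio,seq,usec,flags[,...];message"."""
    prefix, _, message = record.partition(";")
    fields = prefix.split(",")
    prio, seq, usec = int(fields[0]), int(fields[1]), int(fields[2])
    # prio packs facility and level as facility * 8 + level.
    return KmsgRecord(prio >> 3, prio & 7, seq, usec / 1_000_000, message)


rec = parse_kmsg("6,20619,201107529,-;zram: Cannot change disksize for initialized device")
print(FACILITIES[rec.facility], LEVELS[rec.level], rec.monotonic_s)
# kern info 201.107529
```

Decoded this way, the zram record is a kernel-facility message logged about 201.1 s after boot, while the two anaconda records carry priority 12, i.e. facility 1 (user) at level 4.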
Re: [systemd-devel] systemd's connections to /run/systemd/private ?
On Thu, Jul 11, 2019 at 08:35:38PM +, Zbigniew Jędrzejewski-Szmek wrote:
> On Thu, Jul 11, 2019 at 10:08:43AM -0400, Brian Reichert wrote:
> > Does that sound like expected behavior?
>
> No, this shouldn't happen.
>
> What I was trying to say, is that if you have the strace log, you
> can figure out what created the stale connection and what the dbus
> call was, and from all that info it should be fairly simple to figure
> out what the calling command was. Once you have that, it'll be much
> easier to reproduce the issue in a controlled setting and look for the
> fix.

I'm finally revisiting this.

I haven't found a way to get a trace to start early enough to catch the initial open() on all of the targeted file descriptors, but I'm trying to make do with what I have.

To sum up, in my naive analysis, I see close() called many times on a file descriptor. I then see more messages come in on that same descriptor. But the timestamp of the descriptor in /proc never changes.

I created a service to launch strace as early as I can figure:

localhost:~ # cat /usr/lib/systemd/system/systemd_strace.service
[Unit]
Description=strace systemd
DefaultDependencies=no
After=local-fs.target
Before=sysinit.target
ConditionPathExists=!/etc/initrd-release

[Service]
ExecStart=/usr/bin/strace -p1 -t -o /home/systemd.strace -e recvmsg,close,accept4,getsockname,getsockopt,sendmsg -s999
ExecStop=/bin/echo systemd_strace.service will soon exit
Type=simple

[Install]
WantedBy=multi-user.target

I introduced the '-t' flag, so I'd get timestamps on the recorded entries.

I rebooted the server, and after ~20 minutes, I found stale descriptors that seem to date to when the host first booted. Note their age relative to the boot time, and that they have no connected peers.
localhost:~ # uptime
14:10pm up 0:21, 3 users, load average: 0.81, 0.24, 0.15
localhost:~ # date
Tue Jul 30 14:10:09 EDT 2019
localhost:~ # lsof -nP /run/systemd/private | awk '/systemd/ { sub(/u/, "", $4); print $4}' | ( cd /proc/1/fd; xargs ls -t --full-time ) | tail -5
lrwx------ 1 root root 64 2019-07-30 13:49:25.458694632 -0400 14 -> socket:[28742]
lrwx------ 1 root root 64 2019-07-30 13:49:25.458694632 -0400 16 -> socket:[35430]
lrwx------ 1 root root 64 2019-07-30 13:49:25.458694632 -0400 17 -> socket:[37758]
lrwx------ 1 root root 64 2019-07-30 13:49:25.458694632 -0400 18 -> socket:[41044]
lrwx------ 1 root root 64 2019-07-30 13:49:25.458694632 -0400 19 -> socket:[43411]
localhost:~ # ss -x | grep /run/systemd/private | grep -v -e '* 0' | wc -l
0

This is an XFS filesystem, so I can't directly get the creation time of my trace file, but I can see the first entry is timestamped '13:49:07'.

I copied the trace file aside, and edited that copy to trim everything off after 14:10:09, when I ran that 'date' command above.

Even though I started this trace as early as I could, dozens of file descriptors had already been created.
Trying to focus on FD 19 (the oldest connection to /run/systemd/private):

Between 13:49:30 and 13:50:01, I see 25 'successful' calls for close(), e.g.:

13:50:01 close(19) = 0

Followed by getsockopt(), and a received message on the supposedly-closed file descriptor:

13:50:01 getsockopt(19, SOL_SOCKET, SO_PEERCRED, {pid=3323, uid=0, gid=0}, [12]) = 0
13:50:01 getsockopt(19, SOL_SOCKET, SO_RCVBUF, [4194304], [4]) = 0
13:50:01 getsockopt(19, SOL_SOCKET, SO_SNDBUF, [262144], [4]) = 0
13:50:01 getsockopt(19, SOL_SOCKET, SO_PEERCRED, {pid=3323, uid=0, gid=0}, [12]) = 0
13:50:01 getsockopt(19, SOL_SOCKET, SO_ACCEPTCONN, [0], [4]) = 0
13:50:01 getsockname(19, {sa_family=AF_LOCAL, sun_path="/run/systemd/private"}, [23]) = 0
13:50:01 recvmsg(19, {msg_name(0)=NULL, msg_iov(1)=[{"\0AUTH EXTERNAL 30\r\nNEGOTIATE_UNIX_FD\r\nBEGIN\r\n", 256}], msg_controllen=0, msg_flags=MSG_CMSG_CLOEXEC}, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_CMSG_CLOEXEC) = 45
13:50:01 sendmsg(19, {msg_name(0)=NULL, msg_iov(3)=[{"OK 9fcf621ece0a4fe897586e28058cd2fb\r\nAGREE_UNIX_FD\r\n", 52}, {NULL, 0}, {NULL, 0}], msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL) = 52
13:50:01 sendmsg(19, {msg_name(0)=NULL, msg_iov(2)=[{"l\4\1\1P\0\0\0\1\0\0\0p\0\0\0\1\1o\0\31\0\0\0/org/freedesktop/systemd1\0\0\0\0\0\0\0\2\1s\0\0\0\0org.freedesktop.systemd1.Manager\0\0\0\0\0\0\0\0\3\1s\0\7\0\0\0UnitNew\0\10\1g\0\2so\0", 128}, {"\20\0\0\0session-11.scope\0\0\0\0003\0\0\0/org/freedesktop/systemd1/unit/session_2d11_2escope\0", 80}], msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL) = -1 EPIPE (Broken pipe)

I see a continuous stream of messages coming in on FD 19 through the end of the trace, but the age of the file descriptor in /proc never seems to change.

Am I misinterpreting something?

> Zbyszek

--
Brian Reichert
BSD admin/developer at large
[systemd-devel] systemd prerelease 243-rc1
A new systemd ☠️ pre-release ☠️ has just been tagged. Please download the tarball here:

https://github.com/systemd/systemd/archive/v243-rc1.tar.gz

NOTE: This is ☠️ pre-release ☠️ software. Do not run this on production systems, but please test this and report any issues you find to GitHub:

https://github.com/systemd/systemd/issues/new?template=Bug_report.md

Changes since the previous release:

* This release enables unprivileged programs (i.e. requiring neither setuid nor file capabilities) to send ICMP Echo (i.e. ping) requests by turning on the "net.ipv4.ping_group_range" sysctl of the Linux kernel for the whole UNIX group range, i.e. all processes. This change should be reasonably safe, as the kernel support for it was specifically implemented to allow safe access to ICMP Echo for processes lacking any privileges. If this is not desirable, it can be disabled again by setting the parameter to "1 0".

* Previously, filters defined with SystemCallFilter= would have the effect that any call to an offending system call would terminate the calling thread. This behaviour never made much sense, since killing individual threads of unsuspecting processes is likely to create more problems than it solves. With this release the default action changed from killing the thread to killing the whole process. For this to work correctly, both a kernel version (>= 4.14) and a libseccomp version (>= 2.4.0) supporting this new seccomp action are required. If an older kernel or libseccomp is used, the old behaviour continues to be used. This change does not affect any services that have no system call filters defined, or that use SystemCallErrorNumber= (and thus see EPERM or another error instead of being killed when calling an offending system call). Note that systemd documentation always claimed that the whole process is killed. With this change, behaviour is thus adjusted to match the documentation.
* On 64-bit systems, the "kernel.pid_max" sysctl is now bumped to 4194304 by default, i.e. the full 22-bit range the kernel allows, up from the old 16-bit range. This should improve security and robustness, as PID collisions are made less likely (though certainly still possible). There are rumours this might create compatibility problems, though at this moment no practical ones are known to us. Downstream distributions are hence advised to undo this change in their builds if they are concerned about maximum compatibility, but for everybody else we recommend leaving the value bumped. Besides improving security and robustness, this should also simplify things, as the maximum number of allowed concurrent tasks was previously bounded by both "kernel.pid_max" and "kernel.threads-max", and now effectively only a single knob is left ("kernel.threads-max"). There have been concerns that usability is affected by this change because larger PID numbers are harder to type, but we believe the change from 5 digits to 7 digits doesn't hamper usability.

* MemoryLow= and MemoryMin= gained hierarchy-aware counterparts, DefaultMemoryLow= and DefaultMemoryMin=, which can be used to hierarchically set default memory protection values for a particular subtree of the unit hierarchy.

* Memory protection directives can now take a value of zero, allowing explicitly opting out of a default value propagated by an ancestor.

* A new setting DisableControllers= has been added that may be used to explicitly disable one or more cgroups controllers for a unit and all its children.

* systemd now defaults to the "unified" cgroup hierarchy setup during build-time, i.e. -Ddefault-hierarchy=unified is now the build-time default. Previously, -Ddefault-hierarchy=hybrid was the default. This change reflects the fact that cgroupsv2 support has matured substantially in both systemd and in the kernel, and is clearly the way forward. Downstream production distributions might want to continue to use -Ddefault-hierarchy=hybrid (or even =legacy) for their builds, as unfortunately the popular container managers have not caught up with the kernel API changes.

* Man pages are not built by default anymore (html pages were already disabled by default), to make development builds quicker. When building systemd for a full installation with documentation, meson should be called with -Dman=true and/or -Dhtml=true as appropriate. The default was changed based on the
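Two of the NEWS items above reduce to simple arithmetic. `net.ipv4.ping_group_range` holds an inclusive pair of group IDs "low high", and a process may open an unprivileged ICMP Echo socket only if one of its groups lies in that range, so "1 0" (low > high) is an empty range that disables the feature for everyone. Likewise, the largest PID the kernel can hand out is pid_max - 1, which is where "5 digits to 7 digits" comes from. The helpers below are hypothetical illustrations, not systemd or kernel code; the old pid_max value of 32768 is assumed to be the usual kernel default, and "0 2147483647" is only an example of a wide range.

```python
from typing import Iterable


def ping_allowed(ping_group_range: str, groups: Iterable[int]) -> bool:
    """True if any of the process's group IDs lies in the inclusive range
    "low high" read from net.ipv4.ping_group_range (hypothetical helper)."""
    low, high = (int(x) for x in ping_group_range.split())
    return any(low <= gid <= high for gid in groups)


def max_pid_digits(pid_max: int) -> int:
    # PIDs are allocated strictly below pid_max, so the largest is pid_max - 1.
    return len(str(pid_max - 1))


print(ping_allowed("1 0", [0]))              # False: empty range, not even root
print(ping_allowed("0 2147483647", [1000]))  # True
print(4194304 == 2 ** 22)                    # True
print(max_pid_digits(32768), max_pid_digits(4194304))  # 5 7
```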
Re: [systemd-devel] KExecWatchdogSec NEWS entry needs work
On Tue, Jul 30, 2019 at 08:32:44AM +1000, Clinton Roy wrote:
> Particularly the following sentence:
>
> This option defaults to off, since it depends on drivers and
> software setup whether the watchdog is correctly reset again after
> the kexec completed, and thus for the general case not clear if safe
> (since it might cause unwanted watchdog reboots after the kexec
> completed otherwise).
>
> I can't quite work out what the intent is, otherwise I'd take a stab myself.

https://github.com/systemd/systemd/pull/13227

Zbyszek
Re: [systemd-devel] vconsole.conf, systemd-localed and the console keymap in the initrd
Hi,

On 30-07-19 10:49, Hans de Goede wrote:
> Hi All,
>
> When using full-disk encryption, the console keymap is used in the initrd to enter the disk-crypt password. There are a couple of issues with this:
>
> 1) Keymap changes do not become effective until a new kernel (which generates a new initrd that includes the updated vconsole.conf) gets installed:
> https://bugzilla.redhat.com/show_bug.cgi?id=1405539
> Note this one is part of:
> https://fedoraproject.org/wiki/Fedora_Program_Management/Prioritized_bugs_and_issues
>
> We could have the tools re-generate the existing initrds when the keymap changes, but that is not 100% bulletproof: if some bug has snuck in that causes new initrds not to boot, then we've just overwritten the older fallback initrds with ones that will also not boot...
>
> Also, in the future we want to move to using a single generic pre-generated initrd everywhere, and Silverblue is already doing this, which brings me to 2:
>
> 2) When using a generic initrd which does not include /etc/vconsole.conf, the keymap will also be "us", regardless of what the system is configured to use.

I forgot to put a link to the issue for this here; for those who are interested, this is being tracked / discussed here:
https://github.com/fedora-silverblue/issue-tracker/issues/3

Regards,

Hans

> I believe the best way to fix this is probably to specify the keymap on the kernel commandline using vconsole.keymap=.
>
> So 2 questions:
>
> 1) What is your (systemd devs') take on this? Does using vconsole.keymap= on the kernel commandline sound like the right solution, or do you have other suggestions?
>
> 2) I wonder what will happen when changing the keymap at runtime while vconsole.keymap=foo is specified on the kernel commandline. systemd-vconsole-setup will use the values on the kernel commandline over those in /etc/vconsole.conf, and until we reboot those two will no longer be in sync.
>
> systemd-vconsole-setup runs when a new vtconsole gets added, but that should (normally) not happen after boot, so that is not a problem. But I wonder how systemd-localed applies changes to the current vtconsole(s): does it do this itself, or does it use systemd-vconsole-setup for this?
>
> I ask because if it uses systemd-vconsole-setup, and that prefers the kernel commandline value, then the change will not happen until reboot, which I believe would be a regression compared to how things work now...
>
> Regards,
>
> Hans
[systemd-devel] vconsole.conf, systemd-localed and the console keymap in the initrd
Hi All,

When using full-disk encryption, the console keymap is used in the initrd to enter the disk-crypt password. There are a couple of issues with this:

1) Keymap changes do not become effective until a new kernel (which generates a new initrd that includes the updated vconsole.conf) gets installed:
https://bugzilla.redhat.com/show_bug.cgi?id=1405539
Note this one is part of:
https://fedoraproject.org/wiki/Fedora_Program_Management/Prioritized_bugs_and_issues

We could have the tools re-generate the existing initrds when the keymap changes, but that is not 100% bulletproof: if some bug has snuck in that causes new initrds not to boot, then we've just overwritten the older fallback initrds with ones that will also not boot...

Also, in the future we want to move to using a single generic pre-generated initrd everywhere, and Silverblue is already doing this, which brings me to 2:

2) When using a generic initrd which does not include /etc/vconsole.conf, the keymap will also be "us", regardless of what the system is configured to use.

I believe the best way to fix this is probably to specify the keymap on the kernel commandline using vconsole.keymap=.

So 2 questions:

1) What is your (systemd devs') take on this? Does using vconsole.keymap= on the kernel commandline sound like the right solution, or do you have other suggestions?

2) I wonder what will happen when changing the keymap at runtime while vconsole.keymap=foo is specified on the kernel commandline. systemd-vconsole-setup will use the values on the kernel commandline over those in /etc/vconsole.conf, and until we reboot those two will no longer be in sync.

systemd-vconsole-setup runs when a new vtconsole gets added, but that should (normally) not happen after boot, so that is not a problem. But I wonder how systemd-localed applies changes to the current vtconsole(s): does it do this itself, or does it use systemd-vconsole-setup for this?

I ask because if it uses systemd-vconsole-setup, and that prefers the kernel commandline value, then the change will not happen until reboot, which I believe would be a regression compared to how things work now...

Regards,

Hans
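The precedence question in point 2 can be modelled in a few lines: a `vconsole.keymap=` entry on the kernel commandline wins over `KEYMAP=` in /etc/vconsole.conf, which in turn wins over the "us" fallback a generic initrd ends up with. `effective_keymap` is a hypothetical helper written for this illustration, not systemd-vconsole-setup's code; it assumes the last occurrence of a key wins and ignores the `rd.`-prefixed initrd variants and all font settings.

```python
import shlex
from typing import Optional


def _cmdline_value(cmdline: str, key: str) -> Optional[str]:
    value = None
    for word in shlex.split(cmdline):
        name, sep, val = word.partition("=")
        if name == key and sep:
            value = val  # last occurrence wins (assumption of this sketch)
    return value


def _conf_value(conf: str, key: str) -> Optional[str]:
    value = None
    for line in conf.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, sep, val = line.partition("=")
        if name.strip() == key and sep:
            value = val.strip().strip("\"'")  # drop optional quoting
    return value


def effective_keymap(cmdline: str, vconsole_conf: Optional[str]) -> str:
    """Hypothetical model: commandline, then vconsole.conf, then "us"."""
    from_cmdline = _cmdline_value(cmdline, "vconsole.keymap")
    if from_cmdline:
        return from_cmdline
    if vconsole_conf is not None:
        from_conf = _conf_value(vconsole_conf, "KEYMAP")
        if from_conf:
            return from_conf
    return "us"


print(effective_keymap("root=/dev/sda1 vconsole.keymap=de", 'KEYMAP="fr"\n'))  # de
print(effective_keymap("root=/dev/sda1", 'KEYMAP="fr"\n'))  # fr
print(effective_keymap("root=/dev/sda1", None))  # us
```

The first call mirrors the situation in question 2: even after KEYMAP= in /etc/vconsole.conf is changed to "fr", anything that resolves the keymap in this order keeps returning the commandline value until the commandline itself is updated.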