[systemd-devel] Antw: systemd prerelease 243-rc1

2019-07-30 Thread Ulrich Windl
>>> systemd tag bot wrote on 30.07.2019 at 19:09 in message
<20190730170916.1.c7b12db1b9d29...@refi64.com>:
> A new systemd ☠️ pre-release ☠️ has just been tagged. Please download the 
> tarball here:
> 
> https://github.com/systemd/systemd/archive/v243-rc1.tar.gz 
> 
> NOTE: This is ☠️ pre-release☠️ software. Do not run this on production 
> systems, but please test this and report any issues you find to GitHub:
> 
> https://github.com/systemd/systemd/issues/new?template=Bug_report.md
>
> Changes since the previous release:
> 
[...]
> * Previously, filters defined with SystemCallFilter= would have the
>   effect that any calling of an offending system call would terminate
>   the calling thread. This behaviour never made much sense, since
>   killing individual threads of unsuspecting processes is likely to
>   create more problems than it solves. With this release the default
>   action changed from killing the thread to killing the whole
>   process. For this to work correctly both a kernel version (>= 4.14)

I never used that feature, but I feel an error code like EPERM would be most
appropriate, because that's what it really is.

>   and a libseccomp version (>= 2.4.0) supporting this new seccomp
>   action is required. If an older kernel or libseccomp is used the old
>   behaviour continues to be used. This change does not affect any
>   services that have no system call filters defined, or that use
>   SystemCallErrorNumber= (and thus see EPERM or another error instead
>   of being killed when calling an offending system call). Note that
>   systemd documentation always claimed that the whole process is
>   killed. With this change behaviour is thus adjusted to match the
>   documentation.
[...]


___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel

Re: [systemd-devel] systemd's connections to /run/systemd/private ?

2019-07-30 Thread Uoti Urpala
On Tue, 2019-07-30 at 14:56 -0400, Brian Reichert wrote:
> Between 13:49:30 and 13:50:01, I see 25 'successful' calls
> to close(), e.g.:
> 
>   13:50:01 close(19)  = 0
> 
> Followed by getsockopt(), and a received message on the supposedly-closed
> file descriptor:
> 
>   13:50:01 getsockopt(19, SOL_SOCKET, SO_PEERCRED, {pid=3323, uid=0, gid=0}, [12]) = 0

Are you sure it's the same file descriptor? You don't explicitly say
anything about there not being any relevant lines between those. Does
systemd really just call getsockopt() on fd 19 after closing it, with
nothing to trigger that? Obvious candidates to check in the strace
would be an accept call returning a new fd 19, or epoll indicating
activity on the fd (though I'd expect systemd to remove the fd from the
epoll set after closing it).
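
The descriptor-reuse possibility raised above can be reproduced in isolation. The following is a minimal Python sketch, not taken from systemd: the function name reopen_after_close is hypothetical, and /dev/null stands in for the socket. It relies only on the POSIX rule that a new descriptor gets the lowest free number, so a number that was just closed is normally handed out again by the next open or accept.

```python
import os


def reopen_after_close():
    """Open a file, close it, open again; return both descriptor numbers."""
    first = os.open(os.devnull, os.O_RDONLY)
    os.close(first)
    # The kernel allocates the lowest-numbered free descriptor, so the
    # number released by close() is the one the next open() receives
    # (as long as no other thread opens or closes anything in between).
    second = os.open(os.devnull, os.O_RDONLY)
    os.close(second)
    return first, second


if __name__ == "__main__":
    first, second = reopen_after_close()
    print(f"first open -> fd {first}; next open after close() -> fd {second}")
```

In a trace, the same effect shows up as close(19) followed shortly by a call such as accept4() returning 19, after which calls on "fd 19" refer to a different socket.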



Re: [systemd-devel] systemd.journald.forward_to doesn't forward all journal messages

2019-07-30 Thread Chris Murphy
On Mon, Jul 29, 2019 at 1:26 AM Lennart Poettering wrote:
>
> On So, 28.07.19 22:11, Chris Murphy (li...@colorremedies.com) wrote:
>
> > Using either of the following:
> >
> > systemd.log_level=debug systemd.journald.forward_to_kmsg log_buf_len=8M
> >
> > systemd.log_level=debug systemd.log_target=kmsg log_buf_len=8M
>
> Note that this is not sufficient. You also have to pass
> "printk.devkmsg=on"; otherwise the kernel rate-limits log output
> from userspace very aggressively, and you will see lots of dropped
> messages.
>
> I have documented this now here:
>
> https://github.com/systemd/systemd/pull/13208

BOOT_IMAGE=/images/pxeboot/vmlinuz
root=live:CDLABEL=Fedora-WS-Live-rawh-20190728-n-1 rd.live.image
systemd.wants=zram-swap.service systemd.log_level=debug
systemd.journald.forward_to_kmsg log_buf_len=8M printk.devkmsg=on

Many messages I see in the journal still do not appear in kmsg. For
example, from /dev/kmsg:

6,20619,201107529,-;zram: Cannot change disksize for initialized device
12,23154,208596765,-;org.fedoraproject.Anaconda.Modules.Network[2498]:
DEBUG:anaconda.modules.network.network:Applying boot options
KernelArguments([('BOOT_IMAGE', '/images/pxeboot/vmlinuz'), ('root',
'live:CDLABEL=Fedora-WS-Live-rawh-20190728-n-1'), ('rd.live.image',
None), ('systemd.wants', 'zram-swap.service'), ('systemd.log_level',
'debug'), ('systemd.journald.forward_to_kmsg', None), ('log_buf_len',
'8M'), ('printk.devkmsg', 'on')])
12,25049,210822858,-;org.fedoraproject.Anaconda.Modules.Storage[2498]:
DEBUG:anaconda.modules.storage.disk_selection.selection:Protected
devices are set to '['/dev/zram0']'.
^C
[root@localhost-live liveuser]# journalctl -o short-monotonic | grep zram
[  203.224915] localhost-live systemd[1477]: Added job
dev-zram0.device/nop to transaction.
[  203.225017] localhost-live systemd[1477]: dev-zram0.device:
Installed new job dev-zram0.device/nop as 295
[  203.225143] localhost-live systemd[1477]: Added job
sys-devices-virtual-block-zram0.device/nop to transaction.
[  203.225245] localhost-live systemd[1477]:
sys-devices-virtual-block-zram0.device: Installed new job
sys-devices-virtual-block-zram0.device/nop as 296
[  203.225355] localhost-live systemd[1477]:
sys-devices-virtual-block-zram0.device: Job 296
sys-devices-virtual-block-zram0.device/nop finished, result=done
[  203.225570] localhost-live systemd[1477]: dev-zram0.device: Job 295
dev-zram0.device/nop finished, result=done
[  208.959944] localhost-live systemd[1477]: Added job
dev-zram0.device/nop to transaction.
[  208.961015] localhost-live systemd[1477]: dev-zram0.device:
Installed new job dev-zram0.device/nop as 340
[  208.961324] localhost-live systemd[1477]: Added job
sys-devices-virtual-block-zram0.device/nop to transaction.
[  208.961508] localhost-live systemd[1477]:
sys-devices-virtual-block-zram0.device: Installed new job
sys-devices-virtual-block-zram0.device/nop as 341
[  208.961789] localhost-live systemd[1477]:
sys-devices-virtual-block-zram0.device: Job 341
sys-devices-virtual-block-zram0.device/nop finished, result=done
[  208.962021] localhost-live systemd[1477]: dev-zram0.device: Job 340
dev-zram0.device/nop finished, result=done
[  209.822448] localhost-live systemd[1477]: Added job
dev-zram0.device/nop to transaction.
[  209.822625] localhost-live systemd[1477]: dev-zram0.device:
Installed new job dev-zram0.device/nop as 377
[  209.822757] localhost-live systemd[1477]: Added job
sys-devices-virtual-block-zram0.device/nop to transaction.
[  209.822861] localhost-live systemd[1477]:
sys-devices-virtual-block-zram0.device: Installed new job
sys-devices-virtual-block-zram0.device/nop as 378
[  209.822983] localhost-live systemd[1477]:
sys-devices-virtual-block-zram0.device: Job 378
sys-devices-virtual-block-zram0.device/nop finished, result=done
[  209.823106] localhost-live systemd[1477]: dev-zram0.device: Job 377
dev-zram0.device/nop finished, result=done
[  213.866820] localhost-live anaconda[2490]: blivet:
DeviceTree.get_device_by_path: path: /dev/zram0 ; incomplete: False ;
hidden: False ;
[  213.868392] localhost-live anaconda[2490]: blivet: failed to
resolve '/dev/zram0'
[root@localhost-live liveuser]#


Literally zero of those lines appear in kmsg:

6,20619,201107529,-;zram: Cannot change disksize for initialized device
12,23154,208596765,-;org.fedoraproject.Anaconda.Modules.Network[2498]:
DEBUG:anaconda.modules.network.network:Applying boot options
KernelArguments([('BOOT_IMAGE', '/images/pxeboot/vmlinuz'), ('root',
'live:CDLABEL=Fedora-WS-Live-rawh-20190728-n-1'), ('rd.live.image',
None), ('systemd.wants', 'zram-swap.service'), ('systemd.log_level',
'debug'), ('systemd.journald.forward_to_kmsg', None), ('log_buf_len',
'8M'), ('printk.devkmsg', 'on')])
12,25049,210822858,-;org.fedoraproject.Anaconda.Modules.Storage[2498]:
DEBUG:anaconda.modules.storage.disk_selection.selection:Protected
devices are set to '['/dev/zram0']'.
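
The /dev/kmsg records shown here follow the kernel's "priority,sequence,timestamp,flags;message" layout, where the priority is the syslog facility times 8 plus the level and the timestamp is in microseconds. As a rough aid for comparing them with journalctl -o short-monotonic output, here is a minimal Python sketch; the names KmsgRecord and parse_kmsg_record are hypothetical, and continuation lines and extra header fields are ignored.

```python
from typing import NamedTuple


class KmsgRecord(NamedTuple):
    facility: int             # 0 = kernel, 1 = user-level
    level: int                # 0 (emerg) .. 7 (debug)
    sequence: int
    monotonic_seconds: float
    message: str


def parse_kmsg_record(line: str) -> KmsgRecord:
    """Split one /dev/kmsg record into its header fields and message."""
    header, _, message = line.partition(";")
    fields = header.split(",")
    priority = int(fields[0])
    return KmsgRecord(
        facility=priority >> 3,   # upper bits: syslog facility
        level=priority & 7,       # lower three bits: log level
        sequence=int(fields[1]),
        monotonic_seconds=int(fields[2]) / 1_000_000,  # microseconds -> s
        message=message,
    )


if __name__ == "__main__":
    sample = "6,20619,201107529,-;zram: Cannot change disksize for initialized device"
    print(parse_kmsg_record(sample))
```

Read this way, priority 6 is facility 0 (kernel) at level info, while priority 12 is facility 1 (user-level) at level warning, which is how records injected from userspace can be told apart from kernel messages.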

The first is a kernel message, the next two are anaconda messages that
don't appear 

Re: [systemd-devel] systemd's connections to /run/systemd/private ?

2019-07-30 Thread Brian Reichert
On Thu, Jul 11, 2019 at 08:35:38PM +, Zbigniew Jędrzejewski-Szmek wrote:
> On Thu, Jul 11, 2019 at 10:08:43AM -0400, Brian Reichert wrote:
> > Does that sound like expected behavior?
> 
> No, this shouldn't happen.
> 
> What I was trying to say is that if you have the strace log, you
> can figure out what created the stale connection and what the dbus
> call was, and from all that info it should be fairly simple to figure
> out what the calling command was. Once you have that, it'll be much
> easier to reproduce the issue in a controlled setting and look for the
> fix.

I'm finally revisiting this. I haven't found a way to get a trace
to start early enough to catch the initial open() on all of the
targeted file descriptors, but I'm trying to make do with what I
have.

To sum up, in my naive analysis, I see close() called many times
on a file descriptor. I then see more messages come in on that same
descriptor.  But the timestamp of the descriptor in /proc never
changes.

I created a service to launch strace as early as I can figure:

  localhost:~ # cat /usr/lib/systemd/system/systemd_strace.service
  [Unit]
  Description=strace systemd
  DefaultDependencies=no
  After=local-fs.target
  Before=sysinit.target
  ConditionPathExists=!/etc/initrd-release
  
  [Service]
  ExecStart=/usr/bin/strace -p1 -t -o /home/systemd.strace -e recvmsg,close,accept4,getsockname,getsockopt,sendmsg -s999
  ExecStop=/bin/echo systemd_strace.service will soon exit
  Type=simple
  
  [Install]
  WantedBy=multi-user.target
  
I introduced the '-t' flag, so I'd get timestamps on the recorded
entries.

I rebooted the server, and after ~20 minutes, I found stale
descriptors, that seem to date to when the host first booted.

Note their age relative to the boot time, and that they have no
connected peers.

  localhost:~ # uptime
   14:10pm  up   0:21,  3 users,  load average: 0.81, 0.24, 0.15
  localhost:~ # date
  Tue Jul 30 14:10:09 EDT 2019
  localhost:~ # lsof -nP /run/systemd/private | awk '/systemd/ { sub(/u/, "", $4); print $4}' | ( cd /proc/1/fd; xargs ls -t --full-time ) | tail -5
  lrwx------ 1 root root 64 2019-07-30 13:49:25.458694632 -0400 14 -> socket:[28742]
  lrwx------ 1 root root 64 2019-07-30 13:49:25.458694632 -0400 16 -> socket:[35430]
  lrwx------ 1 root root 64 2019-07-30 13:49:25.458694632 -0400 17 -> socket:[37758]
  lrwx------ 1 root root 64 2019-07-30 13:49:25.458694632 -0400 18 -> socket:[41044]
  lrwx------ 1 root root 64 2019-07-30 13:49:25.458694632 -0400 19 -> socket:[43411]
  localhost:~ # ss -x | grep /run/systemd/private | grep -v -e '* 0' | wc -l
  0

This is an XFS filesystem, so I can't directly get the creation
time of my trace file, but I can see the first entry is timestamped
'13:49:07'.

I copied the trace file aside, and edited that copy to trim everything
off after 14:10:09, when I ran that 'date' command above.

As early as I tried to start this trace, dozens of file descriptors
had already been created.

Trying to focus on FD 19 (the oldest connection to /run/systemd/private):

Between 13:49:30 and 13:50:01, I see 25 'successful' calls
to close(), e.g.:

  13:50:01 close(19)  = 0

Followed by getsockopt(), and a received message on the supposedly-closed
file descriptor:

  13:50:01 getsockopt(19, SOL_SOCKET, SO_PEERCRED, {pid=3323, uid=0, gid=0}, [12]) = 0
  13:50:01 getsockopt(19, SOL_SOCKET, SO_RCVBUF, [4194304], [4]) = 0
  13:50:01 getsockopt(19, SOL_SOCKET, SO_SNDBUF, [262144], [4]) = 0
  13:50:01 getsockopt(19, SOL_SOCKET, SO_PEERCRED, {pid=3323, uid=0, gid=0}, [12]) = 0
  13:50:01 getsockopt(19, SOL_SOCKET, SO_ACCEPTCONN, [0], [4]) = 0
  13:50:01 getsockname(19, {sa_family=AF_LOCAL, sun_path="/run/systemd/private"}, [23]) = 0
  13:50:01 recvmsg(19, {msg_name(0)=NULL, msg_iov(1)=[{"\0AUTH EXTERNAL 30\r\nNEGOTIATE_UNIX_FD\r\nBEGIN\r\n", 256}], msg_controllen=0, msg_flags=MSG_CMSG_CLOEXEC}, MSG_DONTWAIT|MSG_NOSIGNAL|MSG_CMSG_CLOEXEC) = 45
  13:50:01 sendmsg(19, {msg_name(0)=NULL, msg_iov(3)=[{"OK 9fcf621ece0a4fe897586e28058cd2fb\r\nAGREE_UNIX_FD\r\n", 52}, {NULL, 0}, {NULL, 0}], msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL) = 52
  13:50:01 sendmsg(19, {msg_name(0)=NULL, msg_iov(2)=[{"l\4\1\1P\0\0\0\1\0\0\0p\0\0\0\1\1o\0\31\0\0\0/org/freedesktop/systemd1\0\0\0\0\0\0\0\2\1s\0\0\0\0org.freedesktop.systemd1.Manager\0\0\0\0\0\0\0\0\3\1s\0\7\0\0\0UnitNew\0\10\1g\0\2so\0", 128}, {"\20\0\0\0session-11.scope\0\0\0\0003\0\0\0/org/freedesktop/systemd1/unit/session_2d11_2escope\0", 80}], msg_controllen=0, msg_flags=0}, MSG_DONTWAIT|MSG_NOSIGNAL) = -1 EPIPE (Broken pipe)

I see a continuous stream of messages coming in on FD 19 through
the end of the trace, but the age of the file descriptor in /proc
never seems to change.

Am I misinterpreting something?
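
One way to check a trace like this for descriptor-number reuse is to list, in order, every successful close() of a given number and every accept4() that returns it. The following is a minimal Python sketch under stated assumptions: it expects the "strace -t" line format shown above, the function name fd_events is hypothetical, and the accept4 line in the sample (including listening descriptor 12) is invented for illustration and does not come from the actual trace.

```python
import re

# "HH:MM:SS close(N) = 0": a successful close of descriptor N.
CLOSE_RE = re.compile(r"^(\S+) close\((\d+)\)\s*= 0")
# "HH:MM:SS accept4(...) = N": a successful accept returning descriptor N.
ACCEPT_RE = re.compile(r"^(\S+) accept4\(.*\)\s*= (\d+)")


def fd_events(lines, fd):
    """Return ordered (timestamp, event) pairs for close/accept4 on fd."""
    events = []
    for raw in lines:
        line = raw.strip()
        match = CLOSE_RE.match(line)
        if match and int(match.group(2)) == fd:
            events.append((match.group(1), "close"))
            continue
        match = ACCEPT_RE.match(line)
        if match and int(match.group(2)) == fd:
            events.append((match.group(1), "accept4"))
    return events


if __name__ == "__main__":
    sample = [
        "13:50:01 close(19)  = 0",
        # Hypothetical line, not from the real trace:
        "13:50:01 accept4(12, NULL, NULL, SOCK_CLOEXEC|SOCK_NONBLOCK) = 19",
        "13:50:01 getsockopt(19, SOL_SOCKET, SO_RCVBUF, [4194304], [4]) = 0",
    ]
    for timestamp, event in fd_events(sample, 19):
        print(timestamp, event)
```

If the output alternates between close and accept4, the number 19 is being recycled for new connections; if close events repeat with nothing creating the descriptor in between, something else is going on.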

> Zbyszek

-- 
Brian Reichert  
BSD admin/developer at large

[systemd-devel] systemd prerelease 243-rc1

2019-07-30 Thread systemd tag bot
A new systemd ☠️ pre-release ☠️ has just been tagged. Please download the 
tarball here:

https://github.com/systemd/systemd/archive/v243-rc1.tar.gz

NOTE: This is ☠️ pre-release☠️ software. Do not run this on production systems, 
but please test this and report any issues you find to GitHub:

https://github.com/systemd/systemd/issues/new?template=Bug_report.md

Changes since the previous release:

* This release enables unprivileged programs (i.e. requiring neither
  setuid nor file capabilities) to send ICMP Echo (i.e. ping) requests
  by turning on the "net.ipv4.ping_group_range" sysctl of the Linux
  kernel for the whole UNIX group range, i.e. all processes. This
  change should be reasonably safe, as the kernel support for it was
  specifically implemented to allow safe access to ICMP Echo for
  processes lacking any privileges. If this is not desirable, it can be
  disabled again by setting the parameter to "1 0".
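
To illustrate how the sysctl value is interpreted, here is a minimal Python sketch. The function names are hypothetical, and the "whole range" value of "0 2147483647" is an assumption not stated in the announcement. The kernel treats the two numbers as an inclusive range of group IDs, so "1 0" is an empty range that matches no group. The sketch checks a single GID only, whereas the kernel also considers supplementary groups.

```python
def parse_ping_group_range(text):
    """Parse the contents of /proc/sys/net/ipv4/ping_group_range."""
    low, high = (int(part) for part in text.split())
    return low, high


def gid_may_ping(gid, group_range):
    """True if gid falls inside the inclusive [low, high] range."""
    low, high = group_range
    return low <= gid <= high


def read_ping_group_range(path="/proc/sys/net/ipv4/ping_group_range"):
    """Read the live setting (Linux only)."""
    with open(path) as handle:
        return parse_ping_group_range(handle.read())


if __name__ == "__main__":
    disabled = parse_ping_group_range("1\t0\n")
    everyone = parse_ping_group_range("0 2147483647")  # assumed full range
    print("gid 1000 with '1 0':", gid_may_ping(1000, disabled))
    print("gid 1000 with full range:", gid_may_ping(1000, everyone))
```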

* Previously, filters defined with SystemCallFilter= would have the
  effect that any calling of an offending system call would terminate
  the calling thread. This behaviour never made much sense, since
  killing individual threads of unsuspecting processes is likely to
  create more problems than it solves. With this release the default
  action changed from killing the thread to killing the whole
  process. For this to work correctly both a kernel version (>= 4.14)
  and a libseccomp version (>= 2.4.0) supporting this new seccomp
  action is required. If an older kernel or libseccomp is used the old
  behaviour continues to be used. This change does not affect any
  services that have no system call filters defined, or that use
  SystemCallErrorNumber= (and thus see EPERM or another error instead
  of being killed when calling an offending system call). Note that
  systemd documentation always claimed that the whole process is
  killed. With this change behaviour is thus adjusted to match the
  documentation.
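
The version requirement in the entry above can be expressed as a simple comparison. This is a minimal Python sketch with hypothetical function names; it compares version numbers only, so it cannot account for distribution kernels that backport the seccomp action to an older version, and it is not how systemd itself detects support.

```python
import re


def version_tuple(text):
    """Extract leading dotted integers, e.g. '4.14.0-rc1' -> (4, 14, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"no version number in {text!r}")
    return tuple(int(part) for part in match.group().split("."))


def kills_whole_process(kernel_release, libseccomp_version):
    """True if both versions meet the minimums named in the NEWS entry."""
    return (
        version_tuple(kernel_release) >= (4, 14)
        and version_tuple(libseccomp_version) >= (2, 4)
    )


if __name__ == "__main__":
    import platform

    # The libseccomp version here is an example value, not detected.
    print(kills_whole_process(platform.release(), "2.4.0"))
```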

* On 64 bit systems, the "kernel.pid_max" sysctl is now bumped to
  4194304 by default, i.e. the full 22bit range the kernel allows, up
  from the old 16bit range. This should improve security and
  robustness, as PID collisions are made less likely (though certainly
  still possible). There are rumours this might create compatibility
  problems, though at this moment no practical ones are known to
  us. Downstream distributions are hence advised to undo this change in
  their builds if they are concerned about maximum compatibility, but
  for everybody else we recommend leaving the value bumped. Besides
  improving security and robustness this should also simplify things as
  the maximum number of allowed concurrent tasks was previously bounded
  by both "kernel.pid_max" and "kernel.threads-max" and now effectively
  only a single knob is left ("kernel.threads-max"). There have been
  concerns that usability is affected by this change because larger PID
  numbers are harder to type, but we believe the change from 5 digits
  to 7 digits doesn't hamper usability.
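
The arithmetic behind the "5 digits to 7 digits" remark can be checked directly. In this minimal Python sketch the old limit of 32768 is an assumption: the announcement only says "16bit range", and 32768 is the kernel's historical default for kernel.pid_max. The constant and function names are hypothetical.

```python
OLD_PID_MAX = 1 << 15   # 32768, assumed historical default
NEW_PID_MAX = 1 << 22   # 4194304, the value named in the announcement


def widest_pid_digits(pid_max):
    """Number of decimal digits in the largest PID, which is pid_max - 1."""
    return len(str(pid_max - 1))


if __name__ == "__main__":
    for label, limit in (("old", OLD_PID_MAX), ("new", NEW_PID_MAX)):
        print(f"{label}: pid_max={limit}, up to {widest_pid_digits(limit)} digits")
```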

* MemoryLow= and MemoryMin= gained hierarchy-aware counterparts,
  DefaultMemoryLow= and DefaultMemoryMin=, which can be used to
  hierarchically set default memory protection values for a particular
  subtree of the unit hierarchy.

* Memory protection directives can now take a value of zero, allowing
  explicit opting out of a default value propagated by an ancestor.

* A new setting DisableControllers= has been added that may be used to
  explicitly disable one or more cgroups controllers for a unit and all
  its children.

* systemd now defaults to the "unified" cgroup hierarchy setup during
  build-time, i.e. -Ddefault-hierarchy=unified is now the build-time
  default. Previously, -Ddefault-hierarchy=hybrid was the default. This
  change reflects the fact that cgroupsv2 support has matured
  substantially in both systemd and in the kernel, and is clearly the
  way forward. Downstream production distributions might want to
  continue to use -Ddefault-hierarchy=hybrid (or even =legacy) for
  their builds as unfortunately the popular container managers have not
  caught up with the kernel API changes.

* Man pages are not built by default anymore (html pages were already
  disabled by default), to make development builds quicker. When
  building systemd for a full installation with documentation, meson
  should be called with -Dman=true and/or -Dhtml=true as appropriate.
  The default was changed based on the 

Re: [systemd-devel] KExecWatchdogSec NEWS entry needs work

2019-07-30 Thread Zbigniew Jędrzejewski-Szmek
On Tue, Jul 30, 2019 at 08:32:44AM +1000, Clinton Roy wrote:
> Particularly the following sentence:
> 
> This option defaults to off, since it depends on drivers and
> software setup whether the watchdog is correctly reset again after
> the kexec completed, and thus for the general case not clear if safe
> (since it might cause unwanted watchdog reboots after the kexec
> completed otherwise).
> 
> I can't quite work out what intent is, otherwise I'd take a stab myself.

https://github.com/systemd/systemd/pull/13227

Zbyszek

Re: [systemd-devel] vconsole.conf, systemd-localed and the console keymap in the initrd

2019-07-30 Thread Hans de Goede

Hi,

On 30-07-19 10:49, Hans de Goede wrote:

> Hi All,
>
> When using full-disk encryption the console keymap is used in the
> initrd to enter the disk-crypt password.
>
> There are a couple of issues with this:
>
> 1) keymap changes do not become effective until a new kernel
> (which generated a new initrd which includes the updated vconsole.conf)
> gets installed:
> https://bugzilla.redhat.com/show_bug.cgi?id=1405539
> Note this one is part of:
> https://fedoraproject.org/wiki/Fedora_Program_Management/Prioritized_bugs_and_issues
>
> We could have the tools re-generate the existing initrds when the
> keymap changes, but that is not 100% bulletproof: if some bug has
> snuck in which causes new initrds to not boot, then we've just
> overwritten the older fallback initrds with ones which will also
> not boot...  Also in the future we want to move to using a single
> generic pre-generated initrd everywhere and silverblue is already
> doing this, which brings me to 2:
>
> 2) When using a generic initrd which does not include /etc/vconsole.conf
> the keymap will also be "us" independent of what the system is
> configured to use.

I forgot to put a link to the issue for this here. For those who are
interested, this is being tracked / discussed here:

https://github.com/fedora-silverblue/issue-tracker/issues/3

Regards,

Hans

> I believe that the best way to fix this is probably to specify the
> keymap on the kernel commandline using vconsole.keymap=.
>
> So 2 questions:
>
> 1) What is your (systemd devs) take on this, does using vconsole.keymap=
> on the kernel commandline sound like the right solution, or do you have
> other suggestions?
>
> 2) I wonder what will happen when changing the keymap at runtime while
> vconsole.keymap=foo is specified on the kernel commandline?
>
> systemd-vconsole-setup will use the values on the kernel commandline
> over those in /etc/vconsole.conf, and until we reboot those 2 will
> no longer be in sync. systemd-vconsole-setup runs when a new vtconsole
> gets added, but that should (normally) not happen after boot so that
> is not a problem. But I wonder how systemd-localed applies changes
> to the current vtconsole(s): does it do this itself, or does it use
> systemd-vconsole-setup for this?
>
> I ask because if it uses systemd-vconsole-setup and that prefers the
> kernel commandline value then the change will not happen until reboot.
> Which I believe would be a regression compared to how things work now...
>
> Regards,
>
> Hans


[systemd-devel] vconsole.conf, systemd-localed and the console keymap in the initrd

2019-07-30 Thread Hans de Goede

Hi All,

When using full-disk encryption the console keymap is used in the
initrd to enter the disk-crypt password.

There are a couple of issues with this:

1) keymap changes do not become effective until a new kernel
(which generated a new initrd which includes the updated vconsole.conf)
gets installed:
https://bugzilla.redhat.com/show_bug.cgi?id=1405539
Note this one is part of:
https://fedoraproject.org/wiki/Fedora_Program_Management/Prioritized_bugs_and_issues

We could have the tools re-generate the existing initrds when the
keymap changes, but that is not 100% bulletproof: if some bug has
snuck in which causes new initrds to not boot, then we've just
overwritten the older fallback initrds with ones which will also
not boot...  Also in the future we want to move to using a single
generic pre-generated initrd everywhere and silverblue is already
doing this, which brings me to 2:

2) When using a generic initrd which does not include /etc/vconsole.conf
the keymap will also be "us" independent of what the system is
configured to use.

I believe that the best way to fix this is probably to specify the
keymap on the kernel commandline using vconsole.keymap=.

So 2 questions:

1) What is your (systemd devs) take on this, does using vconsole.keymap=
on the kernel commandline sound like the right solution, or do you have
other suggestions?

2) I wonder what will happen when changing the keymap at runtime while
vconsole.keymap=foo is specified on the kernel commandline?

systemd-vconsole-setup will use the values on the kernel commandline
over those in /etc/vconsole.conf, and until we reboot those 2 will
no longer be in sync. systemd-vconsole-setup runs when a new vtconsole
gets added, but that should (normally) not happen after boot so that
is not a problem. But I wonder how systemd-localed applies changes
to the current vtconsole(s): does it do this itself, or does it use
systemd-vconsole-setup for this?

I ask because if it uses systemd-vconsole-setup and that prefers the
kernel commandline value then the change will not happen until reboot.
Which I believe would be a regression compared to how things work now...
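
The precedence described above (kernel commandline over /etc/vconsole.conf, falling back to "us") can be modelled as follows. This is a minimal Python sketch of the behaviour as described in this message, not systemd's actual implementation; the function names are hypothetical, and the parsing of vconsole.conf is deliberately simplified to plain KEYMAP= lines.

```python
import shlex


def keymap_from_cmdline(cmdline):
    """Return the value of the last vconsole.keymap= option, or None."""
    value = None
    for token in shlex.split(cmdline):
        key, sep, val = token.partition("=")
        if key == "vconsole.keymap" and sep:
            value = val
    return value


def keymap_from_vconsole_conf(text):
    """Return the KEYMAP= value from vconsole.conf contents, or None."""
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("#") or "=" not in line:
            continue
        key, _, val = line.partition("=")
        if key.strip() == "KEYMAP":
            return val.strip().strip("\"'")
    return None


def effective_keymap(cmdline, vconsole_conf, default="us"):
    """Kernel commandline wins, then vconsole.conf, then the default."""
    return (
        keymap_from_cmdline(cmdline)
        or keymap_from_vconsole_conf(vconsole_conf)
        or default
    )


if __name__ == "__main__":
    # A runtime change to vconsole.conf is masked by the commandline value.
    print(effective_keymap("root=/dev/sda1 vconsole.keymap=de", 'KEYMAP="fr"\n'))
```

Under this model, editing KEYMAP= at runtime has no visible effect while vconsole.keymap= is present on the commandline, which is the out-of-sync situation the question is about.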

Regards,

Hans