[systemd-devel] ConditionNeedsUpdate, read-only /usr, and sysext
Hello everybody,

The behavior of ConditionNeedsUpdate is that it is true if /etc/.updated is older than /usr/. I have some issues with this. But maybe I do not use it the right way.

First, when using a read-only /usr partition (updated through sysupdate), the timestamp of /usr is that of the build of that filesystem. In the case of GNOME OS, to ensure bit-by-bit reproducibility, we set all times to some time in 2011. So that does not work for us.

But now let's say we work around that, and we make our system take a date that is reproducible, say the git commit of our metadata. Then we have a second issue. Because of systemd-sysext, /usr might no longer carry the timestamp of the /usr filesystem, but that of a directory created on the fly by systemd-sysext (or maybe it keeps the time from the / filesystem, I do not know, but for sure the timestamp is from when systemd-sysext was started). If systemd-update-done runs after systemd-sysext (and it effectively does on 254), then the date of /etc/.updated will become the time when systemd-sysext started.

Let's imagine that I do not boot that machine often. My system is booting a new version, and there is already another new version available on the sysupdate server. My system will download a build of /usr that is likely to be older than the boot time. So on the next reboot, the condition will be false, even though I did have an update. And it will stay false until I download a version that was built after the boot time of my last successful update.

So my question is: is there a plan to replace the timestamp comparison for ConditionNeedsUpdate with something that works better with sysupdate and sysext? Maybe copying IMAGE_VERSION from /usr/lib/os-release into /etc/.updated, for example?

Thanks,
-- 
Valentin David
m...@valentindavid.com
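To make the comparison concrete, here is a minimal model (my own sketch, not systemd code; the function names are made up) of the current mtime-based check and of the version-based alternative suggested above. IMAGE_VERSION is a real os-release field; everything else is illustrative:

```python
import os

def needs_update_mtime(etc_updated="/etc/.updated", usr="/usr"):
    """Model of today's ConditionNeedsUpdate: true when /usr is newer
    than the /etc/.updated stamp file (or the stamp does not exist)."""
    try:
        stamp = os.stat(etc_updated).st_mtime_ns
    except FileNotFoundError:
        return True
    return os.stat(usr).st_mtime_ns > stamp

def needs_update_version(stamped_version, os_release_text):
    """Hypothetical alternative: compare an IMAGE_VERSION recorded in
    /etc/.updated against the one in /usr/lib/os-release, so build
    timestamps and sysext mount times no longer matter."""
    for line in os_release_text.splitlines():
        if line.startswith("IMAGE_VERSION="):
            current = line.split("=", 1)[1].strip('"')
            return current != stamped_version
    return False  # no IMAGE_VERSION in os-release: nothing to compare
```

With the version comparison, the "older build downloaded after boot" scenario above no longer produces a false negative, since only the recorded version string changes the outcome.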
[systemd-devel] udev database cross-version compatibility
Hello,

Back in 2014 and again in 2020, there were discussions on the mailing list about udev database version safety. This mattered for knowing whether libudev from a container could safely access the /run/udev/data files, given that libudev and systemd-udevd could potentially be different versions. The conclusion was that there was no guarantee, and based on that, flatpak has not provided /run/udev/data to applications.

Later, the format changed in #16853 (udev: make uevents "sticky"). But this caused issue #17605 (Units with BindsTo= are being killed on upgrade from v246 to v247), which was fixed by #17622 (sd-device: make sd_device_has_current_tag() compatible with udev database generated by older udevd).

It seems to me that, because udev needs to handle upgrades and downgrades, it will continue to maintain some compatibility across versions. Is it safe now for flatpak to provide /run/udev/data to containers? (Also, snapd already does it, oops.)

-- 
Valentin David
m...@valentindavid.com
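For readers unfamiliar with the #17622 fix, here is a simplified model of the compatibility fallback (my own sketch; the real logic lives in sd-device and may differ in detail). In /run/udev/data files, "G:" lines list all tags ever set on a device, and the "Q:" lines introduced by #16853 list the tags from the most recent uevent; databases written by pre-v247 udevd have no "Q:" lines at all:

```python
def has_current_tag(db_text, tag):
    """Simplified model of sd_device_has_current_tag() with the v247
    compat fallback: if the database carries no 'Q:' (current tag)
    entries, it was written by an older udevd, and all 'G:' tags are
    treated as current."""
    lines = db_text.splitlines()
    all_tags = {l[2:] for l in lines if l.startswith("G:")}
    current = {l[2:] for l in lines if l.startswith("Q:")}
    if not current:          # old-format database: no sticky-tag info
        return tag in all_tags
    return tag in current
```

This is exactly the kind of cross-version tolerance the question is about: a newer libudev reading an older daemon's database degrades gracefully instead of misbehaving.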
Re: [systemd-devel] systemd-repart very slow creation of partitions with Encrypt=
On Mon, Jun 5, 2023 at 11:09 AM Lennart Poettering wrote:
> On Mo, 05.06.23 10:41, Valentin David (valentin.da...@canonical.com)
> wrote:
>
> > On Mon, Jun 5, 2023 at 9:56 AM Lennart Poettering
> > <lenn...@poettering.net> wrote:
> >
> > > On So, 04.06.23 14:25, Valentin David (valentin.da...@canonical.com)
> > > wrote:
> > >
> > > > I have been trying to create a root partition from initrd with
> > > > systemd-repart. The repart.d file for this partition is as follows:
> > > >
> > > > [Partition]
> > > > Type=root
> > > > Label=root
> > > > Encrypt=tpm2
> > > > Format=ext4
> > > > FactoryReset=yes
> > > >
> > > > I am just using systemd-repart.service in initrd, without
> > > > modification (that is, it finds the disk from /sysusr/usr). Even
> > > > though this is working, the problem I have is that it takes a very
> > > > long time for the partition to be created. Looking at the logs, it
> > > > spends most of the time in the reencryption.
> > >
> > > Reencryption? We don't do any reencryption really. i.e. we do not
> > > actually support anything like "cryptsetup reencrypt" at all. All we
> > > do is the equivalent of "cryptsetup luksFormat". Are you suggesting
> > > that repart is slower at formatting a block device via LUKS than
> > > invoking cryptsetup directly would be? I'd find that very surprising...
> >
> > This is what it looks like in src/partition/repart.c. Function
> > partition_encrypt calls sym_crypt_reencrypt_init_by_passphrase and
> > then sym_crypt_reencrypt. And make_filesystem is called before
> > partition_encrypt. So it must reencrypt since mkfs was called before.
>
> Oh, fuck, yeah, Daan added that.
>
> This is a bug really.

I will open an issue on github then.
Re: [systemd-devel] systemd-repart very slow creation of partitions with Encrypt=
I think that behavior was introduced by
https://github.com/systemd/systemd/commit/48a09a8fff480aab9a68e95e95cc37f6b1438751

On Mon, Jun 5, 2023 at 10:41 AM Valentin David wrote:
>
> On Mon, Jun 5, 2023 at 9:56 AM Lennart Poettering wrote:
>
>> On So, 04.06.23 14:25, Valentin David (valentin.da...@canonical.com)
>> wrote:
>>
>> > I have been trying to create a root partition from initrd with
>> > systemd-repart. The repart.d file for this partition is as follows:
>> >
>> > [Partition]
>> > Type=root
>> > Label=root
>> > Encrypt=tpm2
>> > Format=ext4
>> > FactoryReset=yes
>> >
>> > I am just using systemd-repart.service in initrd, without modification
>> > (that is, it finds the disk from /sysusr/usr). Even though this is
>> > working, the problem I have is that it takes a very long time for the
>> > partition to be created. Looking at the logs, it spends most of the
>> > time in the reencryption.
>>
>> Reencryption? We don't do any reencryption really. i.e. we do not
>> actually support anything like "cryptsetup reencrypt" at all. All we
>> do is the equivalent of "cryptsetup luksFormat". Are you suggesting
>> that repart is slower at formatting a block device via LUKS than
>> invoking cryptsetup directly would be? I'd find that very surprising...
>
> This is what it looks like in src/partition/repart.c. Function
> partition_encrypt calls sym_crypt_reencrypt_init_by_passphrase and then
> sym_crypt_reencrypt. And make_filesystem is called before
> partition_encrypt. So it must reencrypt since mkfs was called before.
>
>> > For an 11 GB partition on a VM, it takes more than 2 minutes. On bare
>> > metal with a 512 GB NVMe disk, it has been running for 3 hours. And it
>> > is still not finished.
>>
>> This is really strange. The LUKS formatting should just write a
>> superblock onto the disk, which is just a couple of sectors, and should
>> barely take any time.
>>
>> Or are you saying "mke2fs" takes that long?
>>
>> Note that we specify lazy_itable_init=1 during formatting ext4, hence
>> it should actually be super fast too...
>
> No. mkfs was done. In the logs it was all about reencryption. See
> https://gitlab.gnome.org/-/snippets/5809/raw/main/snippetfile1.txt
>
>> > I do not think cryptsetup reencryption supports holes. Is it normal to
>> > have a full reencryption of a disk that was just initialized with
>> > mkfs.ext4? If so, could we at least move the effective reencryption
>> > after systemd-repart.service, so that the rest of the system can
>> > continue to boot?
>> >
>> > I am running:
>> > systemd 253.4 running in system mode (+PAM +AUDIT +SELINUX +APPARMOR
>> > +IMA +SMACK +SECCOMP +GCRYPT -GNUTLS +OPENSSL +ACL +BLKID +CURL
>> > +ELFUTILS +FIDO2 +IDN2 -IDN -IPTC +KMOD +LIBCRYPTSETUP +LIBFDISK
>> > +PCRE2 +PWQUALITY +P11KIT -QRENCODE +TPM2 +BZIP2 +LZ4 +XZ +ZLIB +ZSTD
>> > -BPF_FRAMEWORK -XKBCOMMON +UTMP +SYSVINIT default-hierarchy=unified)
>> >
>> > Cryptsetup: v2.6.1
>>
>> I am a bit puzzled by this. Would be good to figure out what actually
>> is so slow here? formatting luks? formatting ext4? discarding?
>>
>> Lennart
>>
>> --
>> Lennart Poettering, Berlin
Re: [systemd-devel] systemd-repart very slow creation of partitions with Encrypt=
On Mon, Jun 5, 2023 at 9:56 AM Lennart Poettering wrote:
> On So, 04.06.23 14:25, Valentin David (valentin.da...@canonical.com)
> wrote:
>
> > I have been trying to create a root partition from initrd with
> > systemd-repart. The repart.d file for this partition is as follows:
> >
> > [Partition]
> > Type=root
> > Label=root
> > Encrypt=tpm2
> > Format=ext4
> > FactoryReset=yes
> >
> > I am just using systemd-repart.service in initrd, without modification
> > (that is, it finds the disk from /sysusr/usr). Even though this is
> > working, the problem I have is that it takes a very long time for the
> > partition to be created. Looking at the logs, it spends most of the
> > time in the reencryption.
>
> Reencryption? We don't do any reencryption really. i.e. we do not
> actually support anything like "cryptsetup reencrypt" at all. All we
> do is the equivalent of "cryptsetup luksFormat". Are you suggesting
> that repart is slower at formatting a block device via LUKS than
> invoking cryptsetup directly would be? I'd find that very surprising...

This is what it looks like in src/partition/repart.c. Function
partition_encrypt calls sym_crypt_reencrypt_init_by_passphrase and then
sym_crypt_reencrypt. And make_filesystem is called before
partition_encrypt. So it must reencrypt since mkfs was called before.

> > For an 11 GB partition on a VM, it takes more than 2 minutes. On bare
> > metal with a 512 GB NVMe disk, it has been running for 3 hours. And it
> > is still not finished.
>
> This is really strange. The LUKS formatting should just write a
> superblock onto the disk, which is just a couple of sectors, and should
> barely take any time.
>
> Or are you saying "mke2fs" takes that long?
>
> Note that we specify lazy_itable_init=1 during formatting ext4, hence
> it should actually be super fast too...

No. mkfs was done. In the logs it was all about reencryption. See
https://gitlab.gnome.org/-/snippets/5809/raw/main/snippetfile1.txt

> > I do not think cryptsetup reencryption supports holes. Is it normal to
> > have a full reencryption of a disk that was just initialized with
> > mkfs.ext4? If so, could we at least move the effective reencryption
> > after systemd-repart.service, so that the rest of the system can
> > continue to boot?
> >
> > I am running:
> > systemd 253.4 running in system mode (+PAM +AUDIT +SELINUX +APPARMOR
> > +IMA +SMACK +SECCOMP +GCRYPT -GNUTLS +OPENSSL +ACL +BLKID +CURL
> > +ELFUTILS +FIDO2 +IDN2 -IDN -IPTC +KMOD +LIBCRYPTSETUP +LIBFDISK
> > +PCRE2 +PWQUALITY +P11KIT -QRENCODE +TPM2 +BZIP2 +LZ4 +XZ +ZLIB +ZSTD
> > -BPF_FRAMEWORK -XKBCOMMON +UTMP +SYSVINIT default-hierarchy=unified)
> >
> > Cryptsetup: v2.6.1
>
> I am a bit puzzled by this. Would be good to figure out what actually
> is so slow here? formatting luks? formatting ext4? discarding?
>
> Lennart
>
> --
> Lennart Poettering, Berlin
[systemd-devel] systemd-repart very slow creation of partitions with Encrypt=
I have been trying to create a root partition from initrd with systemd-repart. The repart.d file for this partition is as follows:

[Partition]
Type=root
Label=root
Encrypt=tpm2
Format=ext4
FactoryReset=yes

I am just using systemd-repart.service in initrd, without modification (that is, it finds the disk from /sysusr/usr). Even though this is working, the problem I have is that it takes a very long time for the partition to be created. Looking at the logs, it spends most of the time in the reencryption.

For an 11 GB partition on a VM, it takes more than 2 minutes. On bare metal with a 512 GB NVMe disk, it has been running for 3 hours and it is still not finished.

I do not think cryptsetup reencryption supports holes. Is it normal to have a full reencryption of a disk that was just initialized with mkfs.ext4? If so, could we at least move the effective reencryption after systemd-repart.service, so that the rest of the system can continue to boot?

I am running:
systemd 253.4 running in system mode (+PAM +AUDIT +SELINUX +APPARMOR +IMA +SMACK +SECCOMP +GCRYPT -GNUTLS +OPENSSL +ACL +BLKID +CURL +ELFUTILS +FIDO2 +IDN2 -IDN -IPTC +KMOD +LIBCRYPTSETUP +LIBFDISK +PCRE2 +PWQUALITY +P11KIT -QRENCODE +TPM2 +BZIP2 +LZ4 +XZ +ZLIB +ZSTD -BPF_FRAMEWORK -XKBCOMMON +UTMP +SYSVINIT default-hierarchy=unified)

Cryptsetup: v2.6.1
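The reported timings are roughly what a back-of-envelope model predicts once in-place reencryption is involved. This sketch is entirely my own (not repart code), and the 50 MiB/s effective throughput and 16 MiB LUKS2 header size are assumptions: in-place reencryption must read, encrypt, and rewrite every byte of the partition, while a plain luksFormat only writes the header:

```python
MIB = 1024 * 1024
GIB = 1024 ** 3

def inplace_encrypt_seconds(partition_bytes, throughput=50 * MIB):
    """Cost model: crypt_reencrypt() rewrites every byte in place,
    so time scales with partition size (throughput is an assumption)."""
    return partition_bytes / throughput

def luksformat_seconds(header_bytes=16 * MIB, throughput=50 * MIB):
    """Cost model: luksFormat writes only the LUKS2 header; the data
    area starts out encrypted, so mkfs can run on the mapped device."""
    return header_bytes / throughput

print(f"in-place, 11 GiB VM disk: ~{inplace_encrypt_seconds(11 * GIB) / 60:.0f} min")
print(f"in-place, 512 GiB NVMe:   ~{inplace_encrypt_seconds(512 * GIB) / 3600:.1f} h")
print(f"luksFormat header only:   ~{luksformat_seconds():.2f} s")
```

Under these assumptions the model lands in the same ballpark as the observed "more than 2 minutes" and "3 hours" figures, which supports the reencryption hypothesis.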
Re: [systemd-devel] Unmountable mounts and systemd-fsck@.service conflicting with shutdown.target
It is a call to systemd-mount done from the initramfs. It ends up in /run/systemd/transient and survives the root switch. The generated unit contains Requires=systemd-fsck@.service.

Is the conflict on shutdown.target there to make shutdown kill fsck if it is running? Generated systemd-cryptsetup@.service units have "DefaultDependencies=no" and no conflict on shutdown, yet "cryptsetup attach" might be running. Maybe this is missing then.

On Fri, Jan 6, 2023 at 1:34 PM Lennart Poettering wrote:
> On Do, 05.01.23 14:18, Valentin David (valentin.da...@canonical.com)
> wrote:
>
> > Hello,
> >
> > In Ubuntu Core, we have some mounts that cannot be unmounted until we
> > have switched root.
> >
> > To simplify, it looks like this:
> >
> > / mounts a read-only loop device backed by /some/disk/some/path/image.img
> > /some/disk mounts a block device (let's say /dev/some-block0p1)
> >
> > In this case, /some/disk cannot be unmounted.
> >
> > We do not want to lazily unmount it, as we could not get errors if
> > something fails. (Unless we had a lazy unmount that would only work
> > when read-only.)
> >
> > We do remount /some/disk read-only on shutdown. And in the shutdown
> > initramfs, we unmount /oldroot/some/disk.
> >
> > However, we get an error message from systemd trying to unmount it.
> > While functionally it does not matter, it is still very problematic
> > to have error messages.
> >
> > Using `DefaultDependencies=no` is not enough. I have tried to be
> > clever and add some-disk.mount to shutdown.target.wants so it would
> > not try to unmount it. But systemd got confused with the conflicts
> > and randomly killed stop jobs until there was no conflict.
> >
> > Debugging it, I have found that this is because some-disk.mount
> > depends on systemd-fsck@some\x2dblock0p1.service, and
> > systemd-fsck@.service conflicts with shutdown.target.
> >
> > I wonder if the conflict on shutdown.target is really needed. Could
> > we remove it? (And also add DefaultDependencies=no to
> > system-systemd\x2dfsck.slice.) With this, mounts with
> > DefaultDependencies=no do not get unmounted as part of
> > shutdown.target. (They do during systemd-shutdown.)
>
> Hmm, so we generally want system services to go away before shutdown.
> This is a very special case though. I wonder if we can just override
> systemd-fsck@….service for that specific case?
>
> How are those mounts established? i.e. by which unit is the
> systemd-fsck@.service instance pulled in? And how was that configured?
> fstab? Ubuntu's own code?
>
> Lennart
>
> --
> Lennart Poettering, Berlin
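For the per-instance override Lennart suggests, a drop-in along these lines might work (a sketch, not tested in Ubuntu Core; the instance name is taken from the original report). An empty assignment to a dependency list setting in a drop-in resets anything assigned earlier, including what the main unit file set:

```ini
# /etc/systemd/system/systemd-fsck@some\x2dblock0p1.service.d/override.conf
[Unit]
# Reset the dependency list so this one instance no longer carries
# the Conflicts=shutdown.target from the main systemd-fsck@.service.
Conflicts=
DefaultDependencies=no
```

Whether dropping the conflict for this instance has unwanted side effects at shutdown (e.g. fsck still running) would need to be verified.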
[systemd-devel] Unmountable mounts and systemd-fsck@.service conflicting with shutdown.target
Hello,

In Ubuntu Core, we have some mounts that cannot be unmounted until we have switched root.

To simplify, it looks like this:

/ mounts a read-only loop device backed by /some/disk/some/path/image.img
/some/disk mounts a block device (let's say /dev/some-block0p1)

In this case, /some/disk cannot be unmounted.

We do not want to lazily unmount it, as we could not get errors if something fails. (Unless we had a lazy unmount that would only work when read-only.)

We do remount /some/disk read-only on shutdown. And in the shutdown initramfs, we unmount /oldroot/some/disk.

However, we get an error message from systemd trying to unmount it. While functionally it does not matter, it is still very problematic to have error messages.

Using `DefaultDependencies=no` is not enough. I have tried to be clever and add some-disk.mount to shutdown.target.wants so it would not try to unmount it. But systemd got confused with the conflicts and randomly killed stop jobs until there was no conflict.

Debugging it, I have found that this is because some-disk.mount depends on systemd-fsck@some\x2dblock0p1.service, and systemd-fsck@.service conflicts with shutdown.target.

I wonder if the conflict on shutdown.target is really needed. Could we remove it? (And also add DefaultDependencies=no to system-systemd\x2dfsck.slice.) With this, mounts with DefaultDependencies=no do not get unmounted as part of shutdown.target. (They do during systemd-shutdown.)