Re: [systemd-devel] BTI interaction between seccomp filters in systemd and glibc mprotect calls, causing service failures
* Dave Martin via Libc-alpha: > On Mon, Oct 26, 2020 at 05:45:42PM +0100, Florian Weimer via Libc-alpha wrote: >> * Dave Martin via Libc-alpha: >> >> > Would it now help to add something like: >> > >> > int mchangeprot(void *addr, size_t len, int old_flags, int new_flags) >> > { >> >int ret = -EINVAL; >> >mmap_write_lock(current->mm); >> >if (all vmas in [addr .. addr + len) have >> >their mprotect flags set to old_flags) { >> > >> >ret = mprotect(addr, len, new_flags); >> >} >> > >> >mmap_write_unlock(current->mm); >> >return ret; >> > } >> >> I suggested something similar as well. Ideally, the interface would >> subsume pkey_mprotect, though, and have a separate flags argument from >> the protection flags. But then we run into argument list length limits. >> >> Thanks, >> Florian > > I suppose. Assuming that a syscall filter can inspect memory, we might > be able to bundle arguments into a struct if necessary. But that leads to a discussion about batch mmap/mprotect/munmap, and that's again incompatible with seccomp (it would need a checking loop). Thanks, Florian -- Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn, Commercial register: Amtsgericht Muenchen, HRB 153243, Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neill ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] BTI interaction between seccomp filters in systemd and glibc mprotect calls, causing service failures
* Dave Martin via Libc-alpha: > Would it now help to add something like: > > int mchangeprot(void *addr, size_t len, int old_flags, int new_flags) > { > int ret = -EINVAL; > mmap_write_lock(current->mm); > if (all vmas in [addr .. addr + len) have > their mprotect flags set to old_flags) { > > ret = mprotect(addr, len, new_flags); > } > > mmap_write_unlock(current->mm); > return ret; > } I suggested something similar as well. Ideally, the interface would subsume pkey_mprotect, though, and have a separate flags argument from the protection flags. But then we run into argument list length limits. Thanks, Florian -- Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn, Commercial register: Amtsgericht Muenchen, HRB 153243, Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neill ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] BTI interaction between seccomp filters in systemd and glibc mprotect calls, causing service failures
* Topi Miettinen: > Allowing mprotect(PROT_EXEC|PROT_BTI) would mean that all you need to > circumvent MDWX is to add PROT_BTI flag. I'd suggest getting the flags > right at mmap() time or failing that, reverting the PROT_BTI for > legacy programs later. > > Could the kernel tell the loader of the BTI situation with auxiliary > vectors? Then it would be easy for the loader to always use the best > mmap() flags without ever needing to mprotect(). I think what we want is a mprotect2 call with a flags argument (separate from protection flags) that tells the kernel that the request *removes* protection flags and should fail otherwise. seccomp could easily filter that then. But like the other proposals, the migration story isn't great. You would need kernel and seccomp/systemd etc. updates before glibc starts working, even if glibc has a fallback from mprotect2 to mprotect (because the latter would be blocked). Thanks, Florian -- Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn, Commercial register: Amtsgericht Muenchen, HRB 153243, Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neill ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] BTI interaction between seccomp filters in systemd and glibc mprotect calls, causing service failures
* Topi Miettinen: >> The dynamic loader has to process the LOAD segments to get to the ELF >> note that says to enable BTI. Maybe we could do a first pass and >> load only the segments that cover notes. But that requires lots of >> changes to generic code in the loader. > > What if the loader always enabled BTI for PROT_EXEC pages, but then > when discovering that this was a mistake, mprotect() the pages without > BTI? Is that architecturally supported? How costly is the mprotect change if the pages have not been faulted in yet? > Then both BTI and MDWX would work and the penalty of not getting > MDWX would fall to non-BTI programs. What's the expected proportion of > BTI enabled code vs. disabled in the future, is it perhaps expected > that a distro would enable the flag globally so eventually only a few > legacy programs might be unprotected? Eventually, I expect that mainstream distributions build everything for BTI, so yes, the PROT_BTI removal would only be needed for legacy programs. Thanks, Florian -- Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn, Commercial register: Amtsgericht Muenchen, HRB 153243, Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neill ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] BTI interaction between seccomp filters in systemd and glibc mprotect calls, causing service failures
* Lennart Poettering: > On Mi, 21.10.20 22:44, Jeremy Linton (jeremy.lin...@arm.com) wrote: > >> Hi, >> >> There is a problem with glibc+systemd on BTI enabled systems. Systemd >> has a service flag "MemoryDenyWriteExecute" which uses seccomp to deny >> PROT_EXEC changes. Glibc enables BTI only on segments which are marked as >> being BTI compatible by calling mprotect PROT_EXEC|PROT_BTI. That call is >> caught by the seccomp filter, resulting in service failures. >> >> So, at the moment one has to pick either denying PROT_EXEC changes, or BTI. >> This is obviously not desirable. >> >> Various changes have been suggested, replacing the mprotect with mmap calls >> having PROT_BTI set on the original mapping, re-mmapping the segments, >> implying PROT_EXEC on mprotect PROT_BTI calls when VM_EXEC is already set, >> and various modification to seccomp to allow particular mprotect cases to >> bypass the filters. In each case there seems to be an undesirable attribute >> to the solution. >> >> So, whats the best solution? > > Did you see Topi's comments on the systemd issue? > > https://github.com/systemd/systemd/issues/17368#issuecomment-710485532 > > I think I agree with this: it's a bit weird to alter the bits after > the fact. Can't glibc set up everything right from the begining? That > would keep both concepts working. The dynamic loader has to process the LOAD segments to get to the ELF note that says to enable BTI. Maybe we could do a first pass and load only the segments that cover notes. But that requires lots of changes to generic code in the loader. Thanks, Florian -- Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn, Commercial register: Amtsgericht Muenchen, HRB 153243, Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neill ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] rdrand generated with march=winchip-c6 in systemd-241
* tedheadster: > Here is the full disassembly with a few comments: > > (gdb) disassemble > Dump of assembler code for function rdrand: >0xb7e21440 <+0>: push %esi >0xb7e21441 <+1>: push %ebx >0xb7e21442 <+2>: call 0xb7e0af5d <__x86.get_pc_thunk.si> >0xb7e21447 <+7>: add$0x1acb39,%esi >0xb7e2144d <+13>:mov0x2870(%esi),%ecx >0xb7e21453 <+19>:test %ecx,%ecx >0xb7e21455 <+21>:js 0xb7e21480 >0xb7e21457 <+23>:test %ecx,%ecx >0xb7e21459 <+25>:je 0xb7e214e0 >0xb7e2145f <+31>:rdrand %ecx < illegal instruction was > attempted here Can you capture register contents at the point of the crash? Does this reproduce in a chroot? Maybe you can trace the whole thing with a debugger. Does the crash reproduce if you single-step through the whole function? ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] Running “telinit u” on glibc update
On 05/16/2018 11:52 PM, Lennart Poettering wrote: Hence, please go aahead and drop it from the rpm scripts. Thanks for the clear advice. I filed https://bugzilla.redhat.com/show_bug.cgi?id=1579225 to track the removal. Florian ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/systemd-devel
[systemd-devel] Running “telinit u” on glibc update
In Fedora, for historic reasons, we run “/sbin/telinit u” after installing a new glibc RPM package version. Does this still make sense? Should we remove the code which invokes telinit from the glibc package? Thanks, Florian ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] Significant performance loss caused by commit a65f06b: journal: return -ECHILD after a fork
* Lennart Poettering: > On Wed, 12.07.17 09:53, Florian Weimer (f...@deneb.enyo.de) wrote: > >> * Lennart Poettering: >> >> > On Tue, 11.07.17 21:24, Florian Weimer (f...@deneb.enyo.de) wrote: >> > >> >> * Lennart Poettering: >> >> >> >> > This all stems from my experiences with PulseAudio back in the day: >> >> > People do not grok the effect of fork(): it only duplicates the >> >> > invoking thread, not any other threads of the process, moreover all >> >> > data structures are copied as they are, and that's a time bomb really: >> >> >> >> These days, the PID can change even without a fork, so the story is a >> >> bit different. >> > >> > Can you elaborate? >> >> I'm no longer sure that the PID can change with the current kernel, >> but I cannot rule it out, either. But other weirdness is possible: >> for example, after a fork, getpid and getppid could be equal. > > Please be less vague. So I understand correctly that the PID cannot > change without fork after all? I don't know. I don't see such a guarantee provided by the kernel. I think the important aspect is that as far as glibc is concerned, the PID *can* change even if its fork and vfork functions are never called, so we cannot reliably invalidate a cache. We could decide that it is undefined to call the fork/vfork/clone system calls directly, but that would break systemd functionality, I think. >> > Are you talking about cases where you invoke clone() directly, instead >> > of via glibc's wrappers? We do that too in systemd, but I am not sure >> > this is really reason enough to introduce this regression in glibc: >> > this is easily worked around (which we do in systemd), and given that >> > the time between clone() and execve() should be short, and the >> > code between the two minimal this isn't really much of a problem. >> >> There are other uses of clone which do not immediately lead to an >> execve. > > Can you elaborate? Have a real-life user of this? src/nspawn/nspawn.c? A lot of things happen in the inner_child function. >> > Where was this discussed in detail? Do you have any links about the >> > discussions about this? >> >> It was on libc-alpha and the glibc bug tracker. > > Link? https://sourceware.org/ml/libc-alpha/2016-10/msg00136.html https://sourceware.org/ml/libc-alpha/2016-10/msg00233.html https://sourceware.org/ml/libc-alpha/2016-11/msg00247.html https://sourceware.org/ml/libc-alpha/2016-11/msg00303.html https://www.postgresql.org/message-id/e1ukzbn-0006c6...@gemulon.postgresql.org The last reason explains why using getpid for fork detection does not work in the way one would expect. ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] Significant performance loss caused by commit a65f06b: journal: return -ECHILD after a fork
* Lennart Poettering: > On Wed, 12.07.17 09:51, Florian Weimer (f...@deneb.enyo.de) wrote: > >> * Lennart Poettering: >> >> > On Tue, 11.07.17 21:26, Florian Weimer (f...@deneb.enyo.de) wrote: >> > >> >> * Lennart Poettering: >> >> >> >> > Apparently, this regressed between this version and >> >> > glibc-2.24-9.fc25.x86_64 hence. >> >> >> >> Yes, I backported the fork cache removal to Fedora 25. There is no >> >> longer a good way to main such a cache in userspace because glibc >> >> cannot intercept anymore all the ways that can change the PID of the >> >> current process because the kernel interfaces for process management >> >> are incredibly rich these days. >> > >> > Please be more specific here. What is this all about? >> >> We got many bug reports over the years about sandboxes and other heavy >> users of namespaces and clone that the glibc PID cache got out of >> sync, both in child and parent (!) processes. > > have any links? <https://bugs.chromium.org/p/chromium/issues/detail?id=484870> You guys ran into this as well and wrote a raw_getpid function which calls the system call. (You should have reported the bug instead.) >> > What triggered this specifically? is this about docker? docker is >> > written in golang anyway, iirc, which doesn't bother with linking to >> > libc anyway? >> >> It needs glibc for access to the host and user databases. > > can you elaborate? I fail to see any relationship between > unshare()/fork()/getpid() and NSS? You asked why docker links against glibc. >> > Is this a glibc upstream choice primarily? Were the regressions this >> > causes considered? >> >> I raised the problem of applications calling getpid frequently and >> named OpenSSL as an example. > > Link? See the collection of links in the other message. > And I am pretty sure the usecase is very valid... And yes, > even if checking getpid() misses some theoretical corner cases, > pthread_atfork() or whatever else you propose will miss others too, > and is much uglier codewise, introduces deps, yadda yadda... We actually increased the accuracy of your fork detection logic (even though it's still broken), so I'm puzzled why you keep calling this a regression. ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] Significant performance loss caused by commit a65f06b: journal: return -ECHILD after a fork
* Lennart Poettering: > On Tue, 11.07.17 21:26, Florian Weimer (f...@deneb.enyo.de) wrote: > >> * Lennart Poettering: >> >> > Apparently, this regressed between this version and >> > glibc-2.24-9.fc25.x86_64 hence. >> >> Yes, I backported the fork cache removal to Fedora 25. There is no >> longer a good way to main such a cache in userspace because glibc >> cannot intercept anymore all the ways that can change the PID of the >> current process because the kernel interfaces for process management >> are incredibly rich these days. > > BTW, with this change you are breaking expressly documented behaviour: > > http://man7.org/linux/man-pages/man2/getpid.2.html > > "Since glibc version 2.3.4, the glibc wrapper function for > getpid() caches PIDs, so as to avoid additional system calls when > a process calls getpid() repeatedly." Note that the manpages project, while extremely valuable, is not official glibc documentation and sometimes explains library internals which can change over time. ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] Significant performance loss caused by commit a65f06b: journal: return -ECHILD after a fork
* Lennart Poettering: > On Tue, 11.07.17 21:24, Florian Weimer (f...@deneb.enyo.de) wrote: > >> * Lennart Poettering: >> >> > This all stems from my experiences with PulseAudio back in the day: >> > People do not grok the effect of fork(): it only duplicates the >> > invoking thread, not any other threads of the process, moreover all >> > data structures are copied as they are, and that's a time bomb really: >> >> These days, the PID can change even without a fork, so the story is a >> bit different. > > Can you elaborate? I'm no longer sure that the PID can change with the current kernel, but I cannot rule it out, either. But other weirdness is possible: for example, after a fork, getpid and getppid could be equal. > Are you talking about cases where you invoke clone() directly, instead > of via glibc's wrappers? We do that too in systemd, but I am not sure > this is really reason enough to introduce this regression in glibc: > this is easily worked around (which we do in systemd), and given that > the time between clone() and execve() should be short, and the > code between the two minimal this isn't really much of a problem. There are other uses of clone which do not immediately lead to an execve. > Where was this discussed in detail? Do you have any links about the > discussions about this? It was on libc-alpha and the glibc bug tracker. ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] Significant performance loss caused by commit a65f06b: journal: return -ECHILD after a fork
* Lennart Poettering: > On Tue, 11.07.17 21:26, Florian Weimer (f...@deneb.enyo.de) wrote: > >> * Lennart Poettering: >> >> > Apparently, this regressed between this version and >> > glibc-2.24-9.fc25.x86_64 hence. >> >> Yes, I backported the fork cache removal to Fedora 25. There is no >> longer a good way to main such a cache in userspace because glibc >> cannot intercept anymore all the ways that can change the PID of the >> current process because the kernel interfaces for process management >> are incredibly rich these days. > > Please be more specific here. What is this all about? We got many bug reports over the years about sandboxes and other heavy users of namespaces and clone that the glibc PID cache got out of sync, both in child and parent (!) processes. > What triggered this specifically? is this about docker? docker is > written in golang anyway, iirc, which doesn't bother with linking to > libc anyway? It needs glibc for access to the host and user databases. > Is this a glibc upstream choice primarily? Were the regressions this > causes considered? I raised the problem of applications calling getpid frequently and named OpenSSL as an example. > I mean, the getpid() checking code is not only in use in systemd, but > in various other bits, in particular PulseAudio, where I started > adding these checks for a good reason. It sounds pretty strange to me > to just regress all that... Fork detection using getpid is not reliable. It gives false negatives in the case of double-forks, where the process can be different but the PID is the same due to reuse. Considering that this use case is broken, I don't think it's worthwhile to jump through hoops to support code which is fundamentally broken anyway. ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] Significant performance loss caused by commit a65f06b: journal: return -ECHILD after a fork
* Lennart Poettering: > Apparently, this regressed between this version and > glibc-2.24-9.fc25.x86_64 hence. Yes, I backported the fork cache removal to Fedora 25. There is no longer a good way to main such a cache in userspace because glibc cannot intercept anymore all the ways that can change the PID of the current process because the kernel interfaces for process management are incredibly rich these days. ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] Significant performance loss caused by commit a65f06b: journal: return -ECHILD after a fork
* Lennart Poettering: > This all stems from my experiences with PulseAudio back in the day: > People do not grok the effect of fork(): it only duplicates the > invoking thread, not any other threads of the process, moreover all > data structures are copied as they are, and that's a time bomb really: These days, the PID can change even without a fork, so the story is a bit different. ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] Significant performance loss caused by commit a65f06b: journal: return -ECHILD after a fork
* Michael Chapman: > It's a pity glibc doesn't provide an equivalent for pthread_atfork() > outside of the pthread library. Having a notification that a fork has just > occurred would allow us to do the PID caching ourselves. In fact, it does, as a public symbol: __register_atfork. It's just not really documented, so it lives a bit a vacuum, like the old __secure_getenv call. To fix this, we need to make the dynamic linker disregard sonames when resolving non-interposed versioned symbols, and then move the definition from libpthread to libc. ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] Is sd_journal_send non-blocking?
On 02/06/2017 12:36 PM, Lennart Poettering wrote: On Mon, 06.02.17 11:12, Florian Weimer (fwei...@redhat.com) wrote: The manual page I've got does not say it is, but the implementation does not wait for acknowledgment from the journal. The implementation does not block explicitly, but I think the sendmsg call can block until the receiver queue is empty. The background for this question is that we have a feature request for a non-blocking logging interface. I wonder if the journal fits this requirement. I assume that with “non-blocking”, the feature request submitter means that the function does not block indefinitely, say due to a service process outage. Memory allocations can take a fairly long time as well (due to paging), but I think that doesn't count here. The socket we use is blocking, but we try to increase the socket buffer to 8M, to move the point where we start to block out late. But that only works with sufficient privileges. The sender buffer size is currently a limit on the datagram size. The data goes straight into the receiver's buffer, so it doesn't have any impact on blocking behavior. So yes, we are always blocking, we don't throw data away. Sure. I was wondering if indefinite blocking is considered a critical service failure and if there is a watchdog which would catch a stuck journal daemon. (Hanging syslog servers are apparently a fairly common problem, and this is where the RFE originally came from.) I'd be willing to take a patch however, that adds a call sd_journal_set_block_timeout() or so, that takes a time value we pass to SO_SNDTIMEO for the logging socket. This would permit clients to precisely control how long they want us to wait before we give up. And in the case where a zero timeout is set we'd instead set O_NONBLOCK, thus making logging entirely non-blocking. This would help with discarding data. It would not help those who want to integrate logging into an event loop. Which is probably bad idea anyway, but I'm not quite sure yet what the purpose of non-blocking logging is. Perhaps there is a desire to apply some back-pressure to reduce the rate at which logging messages are generated. But in general, this merely introduces deadlocks. Thanks, Florian ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/systemd-devel
[systemd-devel] Is sd_journal_send non-blocking?
The manual page I've got does not say it is, but the implementation does not wait for acknowledgment from the journal. The implementation does not block explicitly, but I think the sendmsg call can block until the receiver queue is empty. The background for this question is that we have a feature request for a non-blocking logging interface. I wonder if the journal fits this requirement. I assume that with “non-blocking”, the feature request submitter means that the function does not block indefinitely, say due to a service process outage. Memory allocations can take a fairly long time as well (due to paging), but I think that doesn't count here. Thanks, Florian ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] Running ldconfig at boot
On 05/23/2016 11:59 AM, Lennart Poettering wrote: On Mon, 23.05.16 11:34, Florian Weimer (fwei...@redhat.com) wrote: On 05/20/2016 04:10 PM, Lennart Poettering wrote: If such systems require specialized unit files, then you can put ldconfig.service there, instead of exposing all systemd users to the service. No they don't. Basic Fedora works fine in this mode, without any changes. It isn#t round, and it isn't supported fully by Fedora, but the basics do work just fine. Hmm. And this is sufficient justification to force ldconfig.service on all users by default, despite its impact on correctness (see below) and boot time (as reported on this thread as well)? Well, I'd claim that expecting that if you install libraries on mounts not included in local-fs.target they still appear in ldconfig is wrong. I mean, the promise "local-fs.target" makes is that that's all needed for services to run, and for upgrades of the OS to take place. That's not what the documentation in systemd.special(7) says. It essentially defines local-fs.target based on file system type. It says nothing about service availability. I expect there are still installations out there which run /opt (or something like it) off NFS. I think the trigger is just implemented incorrectly. It should keep track of UUIDs in files (such as /usr/.change-uuid), rather than looking at mtimes. Then you can just write a new UUID once you update parts of the file system externally and completely avoid running such triggers when all subtrees are locally maintained. I am not sure I grok why uuids would be a better option than mtimes here... Time-base synchronization tends to cause problems. If /etc is modified before the system time is set from NTP, you get an incorrect timestamp. If /usr goes backwards in time, you don't run the update actions. And the update actions are triggered by unexpected package installations. If you used a UUID file instead, you could (a) mark /usr as externally supplied and (b) avoid all issues related to clock skew. Florian ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] Running ldconfig at boot
On 05/20/2016 04:04 PM, Lennart Poettering wrote: On Fri, 20.05.16 15:55, Florian Weimer (fwei...@redhat.com) wrote: On 05/20/2016 02:59 PM, Lennart Poettering wrote: On Fri, 20.05.16 14:01, Florian Weimer (fwei...@redhat.com) wrote: The default systemd configuration runs ldconfig at boot. Why? It's conditionalized via ConditionNeedsUpdate=, which means it is only run when /etc is older than /usr. (This is tested via checking modification times of /etc/.updated and /usr), see explanation on systemd.unit(5). The usecase for this is to permit systems where a single /usr tree is shared among multiple systems, and might be updated at any time, and the changes need to be propagated to /etc on each individual systems. The keyword is "stateless systems". Do such systems need systemd configuration changes? Not sure I understand this question? If such systems require specialized unit files, then you can put ldconfig.service there, instead of exposing all systemd users to the service. There are some more packages where installation or upgrades would update the /usr mtime. I don't have a current Fedora rawhide list, but I'm attaching an older version. The /usr mtime is not a reliable indicator for what you are trying to detect, I think. Well, it really doesn't have to be. I mean, in the worst case we'll run ldconfig once too often, which should be idempotent... As I explained, the way you run it, it is not necessarily idempotent. Florian ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] Running ldconfig at boot
On 05/20/2016 02:59 PM, Lennart Poettering wrote: On Fri, 20.05.16 14:01, Florian Weimer (fwei...@redhat.com) wrote: The default systemd configuration runs ldconfig at boot. Why? It's conditionalized via ConditionNeedsUpdate=, which means it is only run when /etc is older than /usr. (This is tested via checking modification times of /etc/.updated and /usr), see explanation on systemd.unit(5). The usecase for this is to permit systems where a single /usr tree is shared among multiple systems, and might be updated at any time, and the changes need to be propagated to /etc on each individual systems. The keyword is "stateless systems". Do such systems need systemd configuration changes? Note that normally this should not be triggered at all, since this only works on systems where /usr itsel is explicitly touched after each updated so that the mtime is updated. That should normally not happen, except when your distro is prepared for that, and does that explicitly. It happens if the filesystem package is upgraded. I think this is what triggered the last mtime update on my workstation. There are some more packages where installation or upgrades would update the /usr mtime. I don't have a current Fedora rawhide list, but I'm attaching an older version. The /usr mtime is not a reliable indicator for what you are trying to detect, I think. Florian nevra | name +- arm-none-eabi-binutils-cs-2014.05.28-3.fc22.x86_64 | /usr/arm-none-eabi arm-none-eabi-newlib-2.1.0-5.fc21.noarch | /usr/arm-none-eabi avr-binutils-1:2.24-4.fc22.x86_64 | /usr/avr avr-libc-1.8.0-9.fc21.noarch | /usr/avr cinnamon-control-center-filesystem-2.4.2-1.fc22.i686 | /usr/share cinnamon-control-center-filesystem-2.4.2-1.fc22.x86_64 | /usr/share filesystem-3.2-32.fc22.x86_64 | /usr/bin filesystem-3.2-32.fc22.x86_64 | /usr/games filesystem-3.2-32.fc22.x86_64 | /usr/include filesystem-3.2-32.fc22.x86_64 | /usr/lib filesystem-3.2-32.fc22.x86_64 | /usr/lib64 filesystem-3.2-32.fc22.x86_64 | /usr/libexec filesystem-3.2-32.fc22.x86_64 | /usr/local filesystem-3.2-32.fc22.x86_64 | /usr/sbin filesystem-3.2-32.fc22.x86_64 | /usr/share filesystem-3.2-32.fc22.x86_64 | /usr/src krb5-appl-clients-1.0.3-9.fc22.x86_64 | /usr/kerberos krb5-appl-servers-1.0.3-9.fc22.x86_64 | /usr/kerberos mingw32-filesystem-99-5.fc21.noarch| /usr/i686-w64-mingw32 mingw64-filesystem-99-5.fc21.noarch| /usr/x86_64-w64-mingw32 msp430-binutils-2.21.1a-8.fc22.x86_64 | /usr/msp430 nqp-jvm-0.0.2014.04-3.fc22.noarch | /usr/languages nqp-moar-0.0.2014.04-3.fc22.x86_64 | /usr/languages sblim-cmpi-network-1.4.0-13.fc22.i686 | /usr/share sblim-cmpi-network-1.4.0-13.fc22.x86_64| /usr/share (25 rows) ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/systemd-devel
[systemd-devel] Running ldconfig at boot
The default systemd configuration runs ldconfig at boot. Why? Most deployments of systemd appear to be dynamically linked, so if the ld.so caches are corrupted, you will never get to the point where you can run ldconfig. Running ldconfig too early tends to cause problems because the file system might not have been set up completely, and the cache does not match what the system administrator has configured. Florian ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] [PATCH] gssd: Improve scalability by not waiting for child processes
* Steve Dickson: > +static void > +sig_child(int signal) > +{ > + int err; > + pid_t pid; > + > + /* Parent: just wait on child to exit and return */ > + do { > + pid = wait(); > + } while(pid == -1 && errno != -ECHILD); > + > + if (WIFSIGNALED(err)) > + printerr(0, "WARNING: forked child was killed" > + "with signal %d\n", WTERMSIG(err)); > +} prinerr calls vfprintf or vsyslog. Neither is safe to use in signal handlers, so you need to log this message in some other way. Florian ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] Use of capabilities in default service files
On 07/20/2015 02:34 PM, Reindl Harald wrote: Am 20.07.2015 um 13:58 schrieb Florian Weimer: On 07/20/2015 01:52 PM, Reindl Harald wrote: Am 20.07.2015 um 13:24 schrieb Florian Weimer: CapabilityBoundingSet=CAP_IPC_OWNER CAP_SETUID CAP_SETGID CAP_SETPCAP m4_ifdef(`HAVE_SMACK', CAP_MAC_ADMIN ) … What's the intent of these settings? Is it a form of hardening? If yes, it is rather ineffective because UID=0 does not need any capabilities to completely compromise the system. UID=0 *does* need capabilities, drwxr-xr-x. 2 root root 37 Jun 13 10:09 /etc/cron.d -rw-r--r--. 1 root root 3068 Jul 17 19:47 /etc/passwd UID=0 without CAP_DAC_OVERRIDE (or any other capabilities) can write to these locations and escalate fairly directly to full root. why should it need CAP_DAC_OVERRIDE when it *owns* the files and has write permissions? Exactly, it's the reason why I suspect something fishy is going on if people to harden services running UID=0 by taking away capabilities. chown the file to a different user and root no longer can write there to protect /etc and /usr ReadOnlyDirectories is the way to go ReadOnlyDirectories=/etc ReadOnlyDirectories=/usr Then you still have instant root through: drwx--. 2 root root 20 Feb 3 12:36 /var/spool/cron/ I found the “CapabilityBoundingSet=” line (empty set) somewhat worrying, it seems to me that this service should use a separate UID, not 0 that's a different story and works for services wich don't need to read files only readable by root and not listening on privileged ports like mysqld and in that case even if there is a root exploit CapabilityBoundingSet would reduce the damage [Service] Type=simple User=mysql Group=mysql And that's fine. But doing hardening for UID=0 services seems a very bad practice to me because it looks like someone is assuming that UID=0 without capabilities is just another “nobody” user. Which is not surprising, because capabilities are often advertised that way. -- Florian Weimer / Red Hat Product Security ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
[systemd-devel] Does socket activation block a TCP port for listening by other processes?
We have quite a zoo of services which listen on localhost, on a fixed TCP port, for use by local clients. The canonical example is PostgreSQL on 5432/TCP, for the benefit of Java clients (which cannot use the UNIX domain socket). This has the obvious issue that if a local attacker crashes the service, they can impersonate it by binding to the same port. Does socket activation reliably prevent such impersonation attacks? Or is there race, say during systemd configuration reloading or service restarts, where systemd temporarily does not listen to that port? -- Florian Weimer / Red Hat Product Security ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] Does socket activation block a TCP port for listening by other processes?
On 07/21/2015 01:52 PM, David Herrmann wrote: Hi On Tue, Jul 21, 2015 at 1:37 PM, Florian Weimer fwei...@redhat.com wrote: We have quite a zoo of services which listen on localhost, on a fixed TCP port, for use by local clients. The canonical example is PostgreSQL on 5432/TCP, for the benefit of Java clients (which cannot use the UNIX domain socket). This has the obvious issue that if a local attacker crashes the service, they can impersonate it by binding to the same port. Does socket activation reliably prevent such impersonation attacks? Or is there race, say during systemd configuration reloading or service restarts, where systemd temporarily does not listen to that port? This race *does* exist with socket activation. The sockets may be re-created if the unit transitions between states (like restart). Thanks. Would it make sense to fix this in some way, so that the socket sticks around? However, this is only triggered by actions on the socket-unit, not the actually activated unit. A crash of the activated service will thus not trigger state-changes on the socket-unit. Good to know. Furthermore, we do not treat occupying a resource as security mechanism. If an attacker has access to bind such a port, we do not place restrictions to prevent that. Instead, you should place LSM restrictions to prevent this. I'm stuck with SELinux and the unconfined_t user, so I don't think I can set an LSM ACL on the port. And please note, the actual activated unit usually does *not* have rights to bind the socket. This is done by pid1. So an attacker would require rights of PID1 for such an attack. unconfined_t unfortunately has this right for port numbers larger than 1023, unfortunately. -- Florian Weimer / Red Hat Product Security ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
[systemd-devel] Use of capabilities in default service files
There are a few examples similar to these in the source tree: [Unit] Description=Legacy D-Bus Protocol Compatibility Daemon [Service] ExecStart=@rootlibexecdir@/systemd-bus-proxyd --address=kernel:path=/sys/fs/kdbus/0-system/bus NotifyAccess=main CapabilityBoundingSet=CAP_IPC_OWNER CAP_SETUID CAP_SETGID CAP_SETPCAP m4_ifdef(`HAVE_SMACK', CAP_MAC_ADMIN ) … [Unit] Description=Hostname Service Documentation=man:systemd-hostnamed.service(8) man:hostname(5) man:machine-info(5) Documentation=http://www.freedesktop.org/wiki/Software/systemd/hostnamed [Service] ExecStart=@rootlibexecdir@/systemd-hostnamed BusName=org.freedesktop.hostname1 CapabilityBoundingSet=CAP_SYS_ADMIN … [Unit] Description=Locale Service Documentation=man:systemd-localed.service(8) man:locale.conf(5) man:vconsole.conf(5) Documentation=http://www.freedesktop.org/wiki/Software/systemd/localed [Service] ExecStart=@rootlibexecdir@/systemd-localed BusName=org.freedesktop.locale1 CapabilityBoundingSet= … What's the intent of these settings? Is it a form of hardening? If yes, it is rather ineffective because UID=0 does not need any capabilities to completely compromise the system. -- Florian Weimer / Red Hat Product Security ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] Use of capabilities in default service files
On 07/20/2015 01:52 PM, Reindl Harald wrote: Am 20.07.2015 um 13:24 schrieb Florian Weimer: CapabilityBoundingSet=CAP_IPC_OWNER CAP_SETUID CAP_SETGID CAP_SETPCAP m4_ifdef(`HAVE_SMACK', CAP_MAC_ADMIN ) … What's the intent of these settings? Is it a form of hardening? If yes, it is rather ineffective because UID=0 does not need any capabilities to completely compromise the system. UID=0 *does* need capabilities, drwxr-xr-x. 2 root root 37 Jun 13 10:09 /etc/cron.d -rw-r--r--. 1 root root 3068 Jul 17 19:47 /etc/passwd UID=0 without CAP_DAC_OVERRIDE (or any other capabilities) can write to these locations and escalate fairly directly to full root. that's the whole purpose of CapabilityBoundingSet and so yes it is a form of hardening To me, it looks someone misunderstood the interactions between UID=0 and capabilities. our internal httpd package is using the following options and when you remove CAP_NET_BIND_SERVICE it could not bind to port 80, PrivateTmp=yes PrivateDevices=yes NoNewPrivileges=yes CapabilityBoundingSet=CAP_DAC_OVERRIDE CAP_IPC_LOCK CAP_NET_BIND_SERVICE CAP_SETGID CAP_SETUID Without code in the daemon to switch away from UID=0, this is not a very strong restriction (but CAP_DAC_OVERRIDE is root-equivalent anyway, so it probably does not matter). I found the “CapabilityBoundingSet=” line (empty set) somewhat worrying, it seems to me that this service should use a separate UID, not 0. -- Florian Weimer / Red Hat Product Security ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
[systemd-devel] systemd-resolved: Concerns raised about cache handling
On the oss-security mailing list, Sebastian Kramer raised some concerns about the DNS implementation in systemd-resolved: http://www.openwall.com/lists/oss-security/2014/11/12/5 I share his concerns, particularly those about caching data not directly pertaining to a response (and they were the reason why I asked about cache dumping because it's so much easier to show this with this debugging aid). I don't consider this so much a security vulnerability, but an interoperability failure in the making (because there are networks where broken recursive resolvers do not filter out incorrect or misleading data). So I'm more worried about accidents than attacks. Some of the other recommendations in RFC 5452 are also relevant to caching stubs. (Sadly, the RFC is incomplete, there is little public documentation on how to actually write interoperable DNS resolvers.) For example, I'm not sure if it is necessary to implement elaborate CNAME processing, or just cache everything in the answer section with the expected RR type, irrespective of the owner name of the resource records, and under the minimum TTL of the entire answer section. Even if you follow CNAME chains, you should only the initial name (QNAME) as a cache lookup key, adding the entire CNAME chain still can lead to cache poisoning. -- Florian Weimer / Red Hat Product Security ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] systemd-resolved D-Bus API documentation
On 10/20/2014 09:36 PM, Lennart Poettering wrote: On Tue, 14.10.14 11:54, Florian Weimer (fwei...@redhat.com) wrote: Is there some documentation of the D-Bus API and its expected behavior? No. We have not documented this so far, as we don't want to guarantee API stabilitiy for this yet. Is there any way to test this code, short of writing my own client? And is there a way to dump the contents of the internal cache? No, this is not available. (Note that strictly speaking the singular word cache is misleading, as there are multiple caches in place, one for each interface/protocol.) Not opposed to adding this though as a debug feature, but I wonder what your particular usecase for that would be? It would extremely helpful if you notice the cache has been poisoned and you want to figure out why. -- Florian Weimer / Red Hat Product Security ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
[systemd-devel] systemd-resolved D-Bus API documentation
Is there some documentation of the D-Bus API and its expected behavior? And is there a way to dump the contents of the internal cache? -- Florian Weimer / Red Hat Product Security ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] You are invited to give your thoughts on some async-signal-safety issues.
On 09/03/2014 08:45 PM, Lennart Poettering wrote: On Sun, 31.08.14 03:31, Steven Stewart-Gallus (sstewartgallu...@mylangara.bc.ca) wrote: I understand that systemd uses sandboxing and multithreading. After a fork, many things are messed up so it's only practical to use async-signal-safe functions after a fork from a threaded program. If you have ideas on what sort of functionality GLibc needs to change to make systemd more async-signal-safe after a fork you are invited to give your thoughts on the GLibc mailing list in the thread at https://sourceware.org/ml/libc-help/2014-08/msg00039.html. All our APIs contain checks that you cannot continue reuse contexts you create with them after a fork(). If you try the functions will return ESRCH. Steven is asking about what you need for glibc beyond what the standards provide. Things like malloc support, access to NSS databases, and so on. For example, bus_kernel_make_starter is called after forking and does a lot of things. But the real question is what you would need to get rid of the syscall(__NR_clone) in src/nspawn/nspawn.c. -- Florian Weimer / Red Hat Product Security ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] Delaying (SSH) key generation until the urandom pool is initialized
On 05/05/2014 04:35 PM, Lennart Poettering wrote: On Wed, 30.04.14 17:06, Florian Weimer (fwei...@redhat.com) wrote: On 04/30/2014 02:28 PM, Daniel P. Berrange wrote: Interesting suggestion. I just used virt-manager to create the VM. I don't see any trace for rng or random in the domain XML file. If it is supported, I think it should be enabled by default. I'm told that it isn't turned on by default, but you can add it to a VM post-install. Since it feeds VMs from the host's /dev/random or /dev/hwrng, there was a question mark as to whether it was right to enable it by default or not, and if so what kind of rate limiting might be wanted by default. Ah, so it builds down to our distrust of hardware RNGs? How annoying. We should be able to trust Fedora-on-Fedora (or Debian-on-Debian or whatever) scenarios. But I get that in the general case, it's impossible to know what's on the other side of the virtio_rng side, so reservations remain. Hmm? Well, a virtualized OS has to trust the hypervisor, there's no way around that. I'm referring to this: * This function will use the architecture-specific hardware random * number generator if it is available. The arch-specific hw RNG will * almost certainly be faster than what we can do in software, but it * is impossible to verify that it is implemented securely (as * opposed, to, say, the AES encryption of a sequence number using a * key known by the NSA). So it's useful if we need the speed, but * only if we're willing to trust the hardware manufacturer not to * have put in a back door. I think this is the reason why the pool isn't considered initialized even if its contents has been randomized with RDRAND or similar instructions. I wouldn't be surprised if these minds have a similar concern about randomness coming from a hypervisor. -- Florian Weimer / Red Hat Product Security Team ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] Delaying (SSH) key generation until the urandom pool is initialized
On 05/02/2014 01:05 PM, Lennart Poettering wrote: No, /dev/random can (and will) block long after booting. But that's what you want in this case, no? You want this to block after booting if there never has been enough entropy in the pool, right? What I want is no blocking at all. If the pool is initialized, this is safe unless there is a crippling bug in the kernel random driver (or complete break of SHA-1, significantly worse than what's known for, say MD4). If the pool is not initialized, it is practically possible to enumerate all possible pool states, and thus predict the data that comes out of /dev/urandom. So I grudgingly accept that some things have to block until /dev/urandom is initialized. Is there any kernel API currently available to at least query if the pool is properly initialized? No, there isn't. I proposed a patch for a new variable in /proc, and the question came up whether this is what userspace needs. There is now a separate discussion about injecting a few bits of entropy that in the virt case (where entropy is generally scarce initially), the pool is always initialized. So you have multiple options: a) fix the kernel to make /dev/urandom block until enough initial entropy has been gathered. Would fix the roblem, but probably slow down systemd. In systemd the hash functions used in hash tables are keyed off /dev/urandom, in order to make collision attacks difficult. This only requires low-quality entropy, as we rehash with a new key anyway should the hashtable reach its initial size limit. Hence blocking reading of /dev/urandom until the pool is initialized will slow down systemd since would have to block without actually needing any quality entropy... If systemd is blocked on /dev/unrandom at this stage, will anything else happen on the system that generates more entropy? b) fix the kernel to make the behaviour explained in a) something optional, maybe something that is enabled on /dev/random when O_DIRECT or so is passed during opening. i.e. if that flag is specifiied that /dev/random will behave like classic /dev/random until the initial pool is filled up, and like /dev/urandom afterwards. Or perhaps a separate device node. I am very conservative howver to simply delay the boot until the pool is initialized. I want this for specific services only. For things like hash tables, it's possible to reseed once more randomness becomes available, but that's not true for long-term key generation. -- Florian Weimer / Red Hat Product Security Team ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] Delaying (SSH) key generation until the urandom pool is initialized
On 04/29/2014 09:30 PM, Tom Gundersen wrote: You can easily start the sockets early, but make the daemon itself wait for the key generation to finish. Thanks. Can you provide an example? (I don't want to change the daemon code.) The only thing you then have to make sure is that the key generation blocks until the non-blocking pool is initialized (I assume that is what's being used?). For that I suppose you just need to make the kernel block /dev/urandom until that's the case, I have seen this being discussed, but don't know the status of those patches. Would it be possible to do the blocking in a separate service? This way, it would be more visible in diagnostic tools, and it's not necessary to change all key generation code (including programs which just generation session keys). I don't know if we can change /dev/urandom to block because that doesn't look very backwards-compatible to me. -- Florian Weimer / Red Hat Product Security Team ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] Delaying (SSH) key generation until the urandom pool is initialized
On 04/30/2014 01:08 PM, Lennart Poettering wrote: On Tue, 29.04.14 20:43, Florian Weimer (fwei...@redhat.com) wrote: The message at https://mail.gnome.org/archives/ostree-list/2014-February/msg00010.html contains two boot traces from virtual machines which show that the SSH key is generated before the kernel pool is sufficiently seeded. Are you saying ssh reads from /dev/urandom rather than /dev/random, but it should be reading from the latter? No, that's not what I wrote. Using /dev/urandom for key generation is fine once its pool is seeded. Using existing key generation algorithms with /dev/random instead does not work because they consume too much entropy and can block for significantly more time than just a few minutes. WHat does that have to do with systemd? It seems related to boot ordering, device availability, kernel event signaling and so on (although we have little of that at present). In particular, what I want is that a one-shot service for key generation only blocks on pool seeding when it actually needs to run (because the keys do not exist yet). From a support perspective, it seems better to do the waiting outside the one-shot service, so that systemd tools can be used to examine what is causing the delay. Would it be possible using socket activation to create the listening socket for SSH, but block the actual service startup until the keys have been generated after sufficient entropy became available? What would you need on the kernel side to implement the waiting? (Textual comparison of a log message is only good for a prototype.) THis already exists. It's called /dev/random... No, /dev/random can (and will) block long after booting. -- Florian Weimer / Red Hat Product Security Team ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] Delaying (SSH) key generation until the urandom pool is initialized
On 04/30/2014 01:14 PM, Daniel P. Berrange wrote: On Tue, Apr 29, 2014 at 08:43:38PM +0200, Florian Weimer wrote: The message at https://mail.gnome.org/archives/ostree-list/2014-February/msg00010.html contains two boot traces from virtual machines which show that the SSH key is generated before the kernel pool is sufficiently seeded. I'm wondering if the VMs that ostree is creating are being given a virtio-rng device ? If not that would probably be a good idea to enable to allow them to get entropy. VMs are generally starved of entropy even beyond the initial boot up stage, so a virtual RNG is generally useful. Interesting suggestion. I just used virt-manager to create the VM. I don't see any trace for rng or random in the domain XML file. If it is supported, I think it should be enabled by default. (But I see a similar issue on bare metal.) -- Florian Weimer / Red Hat Product Security Team ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] Delaying (SSH) key generation until the urandom pool is initialized
On 04/30/2014 02:28 PM, Daniel P. Berrange wrote: Interesting suggestion. I just used virt-manager to create the VM. I don't see any trace for rng or random in the domain XML file. If it is supported, I think it should be enabled by default. I'm told that it isn't turned on by default, but you can add it to a VM post-install. Since it feeds VMs from the host's /dev/random or /dev/hwrng, there was a question mark as to whether it was right to enable it by default or not, and if so what kind of rate limiting might be wanted by default. Ah, so it builds down to our distrust of hardware RNGs? How annoying. We should be able to trust Fedora-on-Fedora (or Debian-on-Debian or whatever) scenarios. But I get that in the general case, it's impossible to know what's on the other side of the virtio_rng side, so reservations remain. -- Florian Weimer / Red Hat Product Security Team ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
[systemd-devel] Delaying (SSH) key generation until the urandom pool is initialized
The message at https://mail.gnome.org/archives/ostree-list/2014-February/msg00010.html contains two boot traces from virtual machines which show that the SSH key is generated before the kernel pool is sufficiently seeded. Would it be possible using socket activation to create the listening socket for SSH, but block the actual service startup until the keys have been generated after sufficient entropy became available? What would you need on the kernel side to implement the waiting? (Textual comparison of a log message is only good for a prototype.) -- Florian Weimer / Red Hat Product Security Team ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] threads and cgroups
On 02/12/2014 11:46 AM, Lennart Poettering wrote: The plan by the kernel guys is to not allow cgroup management on threads anymore, only entire processes. The APIs in systemd reflect this decision by the kernel guys: you cannot arrange individual threads for organization by cgroups. Interesting. Will there be an alternative for I/O bandwidth throttling? -- Florian Weimer / Red Hat Product Security Team ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
Re: [systemd-devel] [PATCH 00/11] *** SUBJECT HERE ***
On 12/20/2013 08:50 PM, Colin Walters wrote: On Thu, 2013-12-19 at 14:20 +0100, Florian Weimer wrote: readdir_r is rather broken on Linux because there are some directories it cannot read. Citation? Are you talking about http://womble.decadent.org.uk/readdir_r-advisory.html ? Partly, yes. Current glibc documentation has this to say: https://sourceware.org/git/?p=glibc.git;a=blob;f=manual/filesys.texi;h=1cac45393d554a7a6a83c184262e2ce0be7c8885;hb=HEAD#l495 (Disclaimer: I helped to write these paragraphs. Also see the comments about NAME_MAX in conf.texi.) Previous glibc versions truncated the buffer so that it wasn't necessarily NUL-terminated, or had a buffer overflow (on certain non-mainstream architectures). readdir_r also makes an necessary copy of the dirent struct. -- Florian Weimer / Red Hat Product Security Team ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
[systemd-devel] [PATCH 00/11] *** SUBJECT HERE ***
This patch series removes all uses of readdir_r. The changes are not entirely mechanical, some of the patches fix bugs in error handling. readdir_r is rather broken on Linux because there are some directories it cannot read. Florian Weimer (11): tmpfiles: replace readdir_r with readdir login: replace readdir_r with readdir delta: replace readdir_r with readdir core: replace readdir_r with readdir conf-files: replace readdir_r with readdir install: replace readdir_r with readdir util: replace readdir_r with readdir journal/vacuum: replace readdir_r with readdir journald/server: replace readdir_r with readdir journal: replace readdir_r with readdir util: remove union dirent_storage TODO | 1 - src/core/load-dropin.c| 7 --- src/delta/delta.c | 7 +++ src/journal/journal-vacuum.c | 9 - src/journal/journald-server.c | 8 src/journal/sd-journal.c | 24 ++-- src/login/sd-login.c | 8 src/shared/conf-files.c | 8 src/shared/install.c | 22 ++ src/shared/util.c | 37 + src/shared/util.h | 6 -- src/tmpfiles/tmpfiles.c | 8 12 files changed, 72 insertions(+), 73 deletions(-) -- 1.8.3.1 ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
[systemd-devel] [PATCH 02/11] login: replace readdir_r with readdir
--- src/login/sd-login.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/src/login/sd-login.c b/src/login/sd-login.c index c9a2e8a..2930b87 100644 --- a/src/login/sd-login.c +++ b/src/login/sd-login.c @@ -555,13 +555,13 @@ _public_ int sd_get_uids(uid_t **users) { for (;;) { struct dirent *de; -union dirent_storage buf; int k; uid_t uid; -k = readdir_r(d, buf.de, de); -if (k != 0) -return -k; +errno = 0; +de = readdir(d); +if (!de errno != 0) +return -errno; if (!de) break; -- 1.8.3.1 ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
[systemd-devel] [PATCH 06/11] install: replace readdir_r with readdir
The old code incorrectly assumed that readdir_r updates errno. --- src/shared/install.c | 22 ++ 1 file changed, 10 insertions(+), 12 deletions(-) diff --git a/src/shared/install.c b/src/shared/install.c index 17e8a75..5001ad4 100644 --- a/src/shared/install.c +++ b/src/shared/install.c @@ -212,11 +212,10 @@ static int remove_marked_symlinks_fd( for (;;) { struct dirent *de; -union dirent_storage buf; -int k; -k = readdir_r(d, buf.de, de); -if (k != 0) { +errno = 0; +de = readdir(d); +if (!de errno != 0) { r = -errno; break; } @@ -373,12 +372,11 @@ static int find_symlinks_fd( } for (;;) { -int k; struct dirent *de; -union dirent_storage buf; -k = readdir_r(d, buf.de, de); -if (k != 0) +errno = 0; +de = readdir(d); +if (!de errno != 0) return -errno; if (!de) @@ -1938,12 +1936,12 @@ int unit_file_get_list( for (;;) { struct dirent *de; -union dirent_storage buffer; _cleanup_unitfilelist_free_ UnitFileList *f = NULL; -r = readdir_r(d, buffer.de, de); -if (r != 0) -return -r; +errno = 0; +de = readdir(d); +if (!de errno != 0) +return -errno; if (!de) break; -- 1.8.3.1 ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
[systemd-devel] [PATCH 01/11] tmpfiles: replace readdir_r with readdir
--- src/tmpfiles/tmpfiles.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/src/tmpfiles/tmpfiles.c b/src/tmpfiles/tmpfiles.c index b7f6a2e..e83a73e 100644 --- a/src/tmpfiles/tmpfiles.c +++ b/src/tmpfiles/tmpfiles.c @@ -543,15 +543,15 @@ static int recursive_relabel_children(Item *i, const char *path) { for (;;) { struct dirent *de; -union dirent_storage buf; bool is_dir; int r; _cleanup_free_ char *entry_path = NULL; -r = readdir_r(d, buf.de, de); -if (r != 0) { +errno = 0; +de = readdir(d); +if (!de errno != 0) { if (ret == 0) -ret = -r; +ret = -errno; break; } -- 1.8.3.1 ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
[systemd-devel] [PATCH 09/11] journald/server: replace readdir_r with readdir
The available_space function now returns 0 if reading the directory fails. Previously, such errors were silently ignored. --- src/journal/journald-server.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/src/journal/journald-server.c b/src/journal/journald-server.c index 3f8b95d..a3bacda 100644 --- a/src/journal/journald-server.c +++ b/src/journal/journald-server.c @@ -135,11 +135,11 @@ static uint64_t available_space(Server *s, bool verbose) { for (;;) { struct stat st; struct dirent *de; -union dirent_storage buf; -r = readdir_r(d, buf.de, de); -if (r != 0) -break; +errno = 0; +de = readdir(d); +if (!de errno != 0) +return 0; if (!de) break; -- 1.8.3.1 ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
[systemd-devel] [PATCH 08/11] journal/vacuum: replace readdir_r with readdir
--- src/journal/journal-vacuum.c | 9 - 1 file changed, 4 insertions(+), 5 deletions(-) diff --git a/src/journal/journal-vacuum.c b/src/journal/journal-vacuum.c index d4a1c6c..b2b47d6 100644 --- a/src/journal/journal-vacuum.c +++ b/src/journal/journal-vacuum.c @@ -180,9 +180,7 @@ int journal_directory_vacuum( return -errno; for (;;) { -int k; struct dirent *de; -union dirent_storage buf; size_t q; struct stat st; char *p; @@ -190,9 +188,10 @@ int journal_directory_vacuum( sd_id128_t seqnum_id; bool have_seqnum; -k = readdir_r(d, buf.de, de); -if (k != 0) { -r = -k; +errno = 0; +de = readdir(d); +if (!de errno != 0) { +r = -errno; goto finish; } -- 1.8.3.1 ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
[systemd-devel] [PATCH 07/11] util: replace readdir_r with readdir
This fixes rm_rf_children_dangerous to detect errors during directory reading. Previously, it could dereference an uninitialized pointer. --- src/shared/util.c | 37 + 1 file changed, 17 insertions(+), 20 deletions(-) diff --git a/src/shared/util.c b/src/shared/util.c index f598971..481c172 100644 --- a/src/shared/util.c +++ b/src/shared/util.c @@ -2306,7 +2306,6 @@ bool is_device_path(const char *path) { int dir_is_empty(const char *path) { _cleanup_closedir_ DIR *d; -int r; d = opendir(path); if (!d) @@ -2314,11 +2313,11 @@ int dir_is_empty(const char *path) { for (;;) { struct dirent *de; -union dirent_storage buf; -r = readdir_r(d, buf.de, de); -if (r 0) -return -r; +errno = 0; +de = readdir(d); +if (!de errno != 0) +return -errno; if (!de) return 1; @@ -2660,14 +2659,15 @@ int rm_rf_children_dangerous(int fd, bool only_dirs, bool honour_sticky, struct for (;;) { struct dirent *de; -union dirent_storage buf; bool is_dir, keep_around; struct stat st; int r; -r = readdir_r(d, buf.de, de); -if (r != 0 ret == 0) { -ret = -r; +errno = 0; +de = readdir(d); +if (!de errno != 0) { +if (ret == 0) +ret = -errno; break; } @@ -4485,13 +4485,11 @@ int get_files_in_directory(const char *path, char ***list) { for (;;) { struct dirent *de; -union dirent_storage buf; -int k; -k = readdir_r(d, buf.de, de); -assert(k = 0); -if (k 0) -return -k; +errno = 0; +de = readdir(d); +if (!de errno != 0) +return -errno; if (!de) break; @@ -5617,15 +5615,14 @@ int on_ac_power(void) { for (;;) { struct dirent *de; -union dirent_storage buf; _cleanup_close_ int fd = -1, device = -1; char contents[6]; ssize_t n; -int k; -k = readdir_r(d, buf.de, de); -if (k != 0) -return -k; +errno = 0; +de = readdir(d); +if (!de errno != 0) +return -errno; if (!de) break; -- 1.8.3.1 ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
[systemd-devel] [PATCH 04/11] core: replace readdir_r with readdir
--- src/core/load-dropin.c | 7 --- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/src/core/load-dropin.c b/src/core/load-dropin.c index a877e66..3504009 100644 --- a/src/core/load-dropin.c +++ b/src/core/load-dropin.c @@ -63,12 +63,13 @@ static int iterate_dir( for (;;) { struct dirent *de; -union dirent_storage buf; _cleanup_free_ char *f = NULL; int k; -k = readdir_r(d, buf.de, de); -if (k != 0) { +errno = 0; +de = readdir(d); +if (!de errno != 0) { +k = errno; log_error(Failed to read directory %s: %s, path, strerror(k)); return -k; } -- 1.8.3.1 ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
[systemd-devel] [PATCH 05/11] conf-files: replace readdir_r with readdir
--- src/shared/conf-files.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/src/shared/conf-files.c b/src/shared/conf-files.c index 92204a6..c86bb03 100644 --- a/src/shared/conf-files.c +++ b/src/shared/conf-files.c @@ -53,13 +53,13 @@ static int files_add(Hashmap *h, const char *root, const char *path, const char for (;;) { struct dirent *de; -union dirent_storage buf; char *p; int r; -r = readdir_r(dir, buf.de, de); -if (r != 0) -return -r; +errno = 0; +de = readdir(dir); +if (!de errno != 0) +return -errno; if (!de) break; -- 1.8.3.1 ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
[systemd-devel] [PATCH 11/11] util: remove union dirent_storage
--- TODO | 1 - src/shared/util.h | 6 -- 2 files changed, 7 deletions(-) diff --git a/TODO b/TODO index 2e3becc..1e8f2b0 100644 --- a/TODO +++ b/TODO @@ -48,7 +48,6 @@ Features: - ensure scope units may be started only a single time * code cleanup - - get rid of readdir_r/dirent_storage stuff, it's unnecessary on Linux - we probably should replace the left-over uses of strv_append() and replace them by strv_push() or strv_extend() * switch to SipHash for hashmaps/sets? diff --git a/src/shared/util.h b/src/shared/util.h index 3e0a6d5..488ce3b 100644 --- a/src/shared/util.h +++ b/src/shared/util.h @@ -44,12 +44,6 @@ #include macro.h #include time-util.h -union dirent_storage { -struct dirent de; -uint8_t storage[offsetof(struct dirent, d_name) + -((NAME_MAX + 1 + sizeof(long)) ~(sizeof(long) - 1))]; -}; - /* What is interpreted as whitespace? */ #define WHITESPACE \t\n\r #define NEWLINE \n\r -- 1.8.3.1 ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel
[systemd-devel] [PATCH 10/11] journal: replace readdir_r with readdir
This commit also adds error handling for failures during directory reading. --- src/journal/sd-journal.c | 24 ++-- 1 file changed, 18 insertions(+), 6 deletions(-) diff --git a/src/journal/sd-journal.c b/src/journal/sd-journal.c index bb116df..283d593 100644 --- a/src/journal/sd-journal.c +++ b/src/journal/sd-journal.c @@ -1442,10 +1442,16 @@ static int add_directory(sd_journal *j, const char *prefix, const char *dirname) for (;;) { struct dirent *de; -union dirent_storage buf; -r = readdir_r(d, buf.de, de); -if (r != 0 || !de) +errno = 0; +de = readdir(d); +if (!de errno != 0) { +r = -errno; +log_debug(Failed to read directory %s: %s, + m-path, strerror(errno)); +return r; +} +if (!de) break; if (dirent_is_file_with_suffix(de, .journal) || @@ -1526,11 +1532,17 @@ static int add_root_directory(sd_journal *j, const char *p) { for (;;) { struct dirent *de; -union dirent_storage buf; sd_id128_t id; -r = readdir_r(d, buf.de, de); -if (r != 0 || !de) +errno = 0; +de = readdir(d); +if (!de errno != 0) { +r = -errno; +log_debug(Failed to read directory %s: %s, + m-path, strerror(errno)); +return r; +} +if (!de) break; if (dirent_is_file_with_suffix(de, .journal) || -- 1.8.3.1 ___ systemd-devel mailing list systemd-devel@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/systemd-devel