Re: [systemd-devel] BTI interaction between seccomp filters in systemd and glibc mprotect calls, causing service failures

2020-10-27 Thread Florian Weimer
* Dave Martin via Libc-alpha:

> On Mon, Oct 26, 2020 at 05:45:42PM +0100, Florian Weimer via Libc-alpha wrote:
>> * Dave Martin via Libc-alpha:
>> 
>> > Would it now help to add something like:
>> >
>> > int mchangeprot(void *addr, size_t len, int old_flags, int new_flags)
>> > {
>> >int ret = -EINVAL;
>> >mmap_write_lock(current->mm);
>> >if (all vmas in [addr .. addr + len) have
>> >their mprotect flags set to old_flags) {
>> >
>> >ret = mprotect(addr, len, new_flags);
>> >}
>> >
>> >mmap_write_unlock(current->mm);
>> >return ret;
>> > }
>> 
>> I suggested something similar as well.  Ideally, the interface would
>> subsume pkey_mprotect, though, and have a separate flags argument from
>> the protection flags.  But then we run into argument list length limits.
>>
>> Thanks,
>> Florian
>
> I suppose.  Assuming that a syscall filter can inspect memory, we might
> be able to bundle arguments into a struct if necessary.

But that leads to a discussion about batch mmap/mprotect/munmap, and
that's again incompatible with seccomp (it would need a checking loop).

Thanks,
Florian
-- 
Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn,
Commercial register: Amtsgericht Muenchen, HRB 153243,
Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neill

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] BTI interaction between seccomp filters in systemd and glibc mprotect calls, causing service failures

2020-10-26 Thread Florian Weimer
* Dave Martin via Libc-alpha:

> Would it now help to add something like:
>
> int mchangeprot(void *addr, size_t len, int old_flags, int new_flags)
> {
>   int ret = -EINVAL;
>   mmap_write_lock(current->mm);
>   if (all vmas in [addr .. addr + len) have
>   their mprotect flags set to old_flags) {
>
>   ret = mprotect(addr, len, new_flags);
>   }
>   
>   mmap_write_unlock(current->mm);
>   return ret;
> }

I suggested something similar as well.  Ideally, the interface would
subsume pkey_mprotect, though, and have a separate flags argument from
the protection flags.  But then we run into argument list length limits.

Thanks,
Florian
-- 
Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn,
Commercial register: Amtsgericht Muenchen, HRB 153243,
Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neill

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] BTI interaction between seccomp filters in systemd and glibc mprotect calls, causing service failures

2020-10-22 Thread Florian Weimer
* Topi Miettinen:

> Allowing mprotect(PROT_EXEC|PROT_BTI) would mean that all you need to
> circumvent MDWX is to add PROT_BTI flag. I'd suggest getting the flags 
> right at mmap() time or failing that, reverting the PROT_BTI for
> legacy programs later.
>
> Could the kernel tell the loader of the BTI situation with auxiliary
> vectors? Then it would be easy for the loader to always use the best 
> mmap() flags without ever needing to mprotect().

I think what we want is a mprotect2 call with a flags argument (separate
from protection flags) that tells the kernel that the request *removes*
protection flags and should fail otherwise.  seccomp could easily filter
that then.

But like the other proposals, the migration story isn't great.  You
would need kernel and seccomp/systemd etc. updates before glibc starts
working, even if glibc has a fallback from mprotect2 to mprotect
(because the latter would be blocked).

Thanks,
Florian
-- 
Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn,
Commercial register: Amtsgericht Muenchen, HRB 153243,
Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neill

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] BTI interaction between seccomp filters in systemd and glibc mprotect calls, causing service failures

2020-10-22 Thread Florian Weimer
* Topi Miettinen:

>> The dynamic loader has to process the LOAD segments to get to the ELF
>> note that says to enable BTI.  Maybe we could do a first pass and
>> load only the segments that cover notes.  But that requires lots of
>> changes to generic code in the loader.
>
> What if the loader always enabled BTI for PROT_EXEC pages, but then
> when discovering that this was a mistake, mprotect() the pages without
> BTI?

Is that architecturally supported?  How costly is the mprotect change if
the pages have not been faulted in yet?

> Then both BTI and MDWX would work and the penalty of not getting
> MDWX would fall to non-BTI programs. What's the expected proportion of
> BTI enabled code vs. disabled in the future, is it perhaps expected
> that a distro would enable the flag globally so eventually only a few
> legacy programs might be unprotected?

Eventually, I expect that mainstream distributions build everything for
BTI, so yes, the PROT_BTI removal would only be needed for legacy
programs.

Thanks,
Florian
-- 
Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn,
Commercial register: Amtsgericht Muenchen, HRB 153243,
Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neill

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] BTI interaction between seccomp filters in systemd and glibc mprotect calls, causing service failures

2020-10-22 Thread Florian Weimer
* Lennart Poettering:

> On Mi, 21.10.20 22:44, Jeremy Linton (jeremy.lin...@arm.com) wrote:
>
>> Hi,
>>
>> There is a problem with glibc+systemd on BTI enabled systems. Systemd
>> has a service flag "MemoryDenyWriteExecute" which uses seccomp to deny
>> PROT_EXEC changes. Glibc enables BTI only on segments which are marked as
>> being BTI compatible by calling mprotect PROT_EXEC|PROT_BTI. That call is
>> caught by the seccomp filter, resulting in service failures.
>>
>> So, at the moment one has to pick either denying PROT_EXEC changes, or BTI.
>> This is obviously not desirable.
>>
>> Various changes have been suggested, replacing the mprotect with mmap calls
>> having PROT_BTI set on the original mapping, re-mmapping the segments,
>> implying PROT_EXEC on mprotect PROT_BTI calls when VM_EXEC is already set,
>> and various modification to seccomp to allow particular mprotect cases to
>> bypass the filters. In each case there seems to be an undesirable attribute
>> to the solution.
>>
>> So, whats the best solution?
>
> Did you see Topi's comments on the systemd issue?
>
> https://github.com/systemd/systemd/issues/17368#issuecomment-710485532
>
> I think I agree with this: it's a bit weird to alter the bits after
> the fact. Can't glibc set up everything right from the begining? That
> would keep both concepts working.

The dynamic loader has to process the LOAD segments to get to the ELF
note that says to enable BTI.  Maybe we could do a first pass and load
only the segments that cover notes.  But that requires lots of changes
to generic code in the loader.

Thanks,
Florian
-- 
Red Hat GmbH, https://de.redhat.com/ , Registered seat: Grasbrunn,
Commercial register: Amtsgericht Muenchen, HRB 153243,
Managing Directors: Charles Cachera, Brian Klemm, Laurie Krebs, Michael O'Neill

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] rdrand generated with march=winchip-c6 in systemd-241

2019-05-11 Thread Florian Weimer
* tedheadster:

> Here is the full disassembly with a few comments:
>
> (gdb) disassemble
> Dump of assembler code for function rdrand:
>0xb7e21440 <+0>: push   %esi
>0xb7e21441 <+1>: push   %ebx
>0xb7e21442 <+2>: call   0xb7e0af5d <__x86.get_pc_thunk.si>
>0xb7e21447 <+7>: add$0x1acb39,%esi
>0xb7e2144d <+13>:mov0x2870(%esi),%ecx
>0xb7e21453 <+19>:test   %ecx,%ecx
>0xb7e21455 <+21>:js 0xb7e21480 
>0xb7e21457 <+23>:test   %ecx,%ecx
>0xb7e21459 <+25>:je 0xb7e214e0 
>0xb7e2145f <+31>:rdrand %ecx  < illegal instruction was
> attempted here

Can you capture register contents at the point of the crash?

Does this reproduce in a chroot?  Maybe you can trace the whole thing
with a debugger.  Does the crash reproduce if you single-step through
the whole function?
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel

Re: [systemd-devel] Running “telinit u” on glibc update

2018-05-17 Thread Florian Weimer

On 05/16/2018 11:52 PM, Lennart Poettering wrote:


Hence, please go aahead and drop it from the rpm scripts.


Thanks for the clear advice.  I filed

  https://bugzilla.redhat.com/show_bug.cgi?id=1579225

to track the removal.

Florian
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] Running “telinit u” on glibc update

2018-05-16 Thread Florian Weimer
In Fedora, for historic reasons, we run “/sbin/telinit u” after 
installing a new glibc RPM package version.


Does this still make sense?  Should we remove the code which invokes 
telinit from the glibc package?


Thanks,
Florian
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Significant performance loss caused by commit a65f06b: journal: return -ECHILD after a fork

2017-07-12 Thread Florian Weimer
* Lennart Poettering:

> On Wed, 12.07.17 09:53, Florian Weimer (f...@deneb.enyo.de) wrote:
>
>> * Lennart Poettering:
>> 
>> > On Tue, 11.07.17 21:24, Florian Weimer (f...@deneb.enyo.de) wrote:
>> >
>> >> * Lennart Poettering:
>> >> 
>> >> > This all stems from my experiences with PulseAudio back in the day:
>> >> > People do not grok the effect of fork(): it only duplicates the
>> >> > invoking thread, not any other threads of the process, moreover all
>> >> > data structures are copied as they are, and that's a time bomb really:
>> >> 
>> >> These days, the PID can change even without a fork, so the story is a
>> >> bit different.
>> >
>> > Can you elaborate?
>> 
>> I'm no longer sure that the PID can change with the current kernel,
>> but I cannot rule it out, either.  But other weirdness is possible:
>> for example, after a fork, getpid and getppid could be equal.
>
> Please be less vague. So I understand correctly that the PID cannot
> change without fork after all?

I don't know.  I don't see such a guarantee provided by the kernel.

I think the important aspect is that as far as glibc is concerned, the
PID *can* change even if its fork and vfork functions are never
called, so we cannot reliably invalidate a cache.

We could decide that it is undefined to call the fork/vfork/clone
system calls directly, but that would break systemd functionality, I
think.

>> > Are you talking about cases where you invoke clone() directly, instead
>> > of via glibc's wrappers? We do that too in systemd, but I am not sure
>> > this is really reason enough to introduce this regression in glibc:
>> > this is easily worked around (which we do in systemd), and given that
>> > the time between clone() and execve() should be short, and the
>> > code between the two minimal this isn't really much of a problem.
>> 
>> There are other uses of clone which do not immediately lead to an
>> execve.
>
> Can you elaborate? Have a real-life user of this?

src/nspawn/nspawn.c?  A lot of things happen in the inner_child
function.

>> > Where was this discussed in detail? Do you have any links about the
>> > discussions about this?
>> 
>> It was on libc-alpha and the glibc bug tracker.
>
> Link?

https://sourceware.org/ml/libc-alpha/2016-10/msg00136.html
https://sourceware.org/ml/libc-alpha/2016-10/msg00233.html
https://sourceware.org/ml/libc-alpha/2016-11/msg00247.html
https://sourceware.org/ml/libc-alpha/2016-11/msg00303.html
https://www.postgresql.org/message-id/e1ukzbn-0006c6...@gemulon.postgresql.org

The last reason explains why using getpid for fork detection does not
work in the way one would expect.
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Significant performance loss caused by commit a65f06b: journal: return -ECHILD after a fork

2017-07-12 Thread Florian Weimer
* Lennart Poettering:

> On Wed, 12.07.17 09:51, Florian Weimer (f...@deneb.enyo.de) wrote:
>
>> * Lennart Poettering:
>> 
>> > On Tue, 11.07.17 21:26, Florian Weimer (f...@deneb.enyo.de) wrote:
>> >
>> >> * Lennart Poettering:
>> >> 
>> >> > Apparently, this regressed between this version and
>> >> > glibc-2.24-9.fc25.x86_64 hence.
>> >> 
>> >> Yes, I backported the fork cache removal to Fedora 25.  There is no
>> >> longer a good way to main such a cache in userspace because glibc
>> >> cannot intercept anymore all the ways that can change the PID of the
>> >> current process because the kernel interfaces for process management
>> >> are incredibly rich these days.
>> >
>> > Please be more specific here. What is this all about?
>> 
>> We got many bug reports over the years about sandboxes and other heavy
>> users of namespaces and clone that the glibc PID cache got out of
>> sync, both in child and parent (!) processes.
>
> have any links?

<https://bugs.chromium.org/p/chromium/issues/detail?id=484870>

You guys ran into this as well and wrote a raw_getpid function which
calls the system call.  (You should have reported the bug instead.)

>> > What triggered this specifically? is this about docker? docker is
>> > written in golang anyway, iirc, which doesn't bother with linking to
>> > libc anyway?
>> 
>> It needs glibc for access to the host and user databases.
>
> can you elaborate? I fail to see any relationship between
> unshare()/fork()/getpid() and NSS?

You asked why docker links against glibc.

>> > Is this a glibc upstream choice primarily? Were the regressions this
>> > causes considered?
>> 
>> I raised the problem of applications calling getpid frequently and
>> named OpenSSL as an example.
>
> Link?

See the collection of links in the other message.

> And I am pretty sure the usecase is very valid... And yes,
> even if checking getpid() misses some theoretical corner cases,
> pthread_atfork() or whatever else you propose will miss others too,
> and is much uglier codewise, introduces deps, yadda yadda...

We actually increased the accuracy of your fork detection logic (even
though it's still broken), so I'm puzzled why you keep calling this a
regression.
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Significant performance loss caused by commit a65f06b: journal: return -ECHILD after a fork

2017-07-12 Thread Florian Weimer
* Lennart Poettering:

> On Tue, 11.07.17 21:26, Florian Weimer (f...@deneb.enyo.de) wrote:
>
>> * Lennart Poettering:
>> 
>> > Apparently, this regressed between this version and
>> > glibc-2.24-9.fc25.x86_64 hence.
>> 
>> Yes, I backported the fork cache removal to Fedora 25.  There is no
>> longer a good way to main such a cache in userspace because glibc
>> cannot intercept anymore all the ways that can change the PID of the
>> current process because the kernel interfaces for process management
>> are incredibly rich these days.
>
> BTW, with this change you are breaking expressly documented behaviour:
>
> http://man7.org/linux/man-pages/man2/getpid.2.html
>
> "Since glibc version 2.3.4, the glibc wrapper function for
>  getpid() caches PIDs, so as to avoid additional system calls when
>  a process calls getpid() repeatedly."

Note that the manpages project, while extremely valuable, is not
official glibc documentation and sometimes explains library internals
which can change over time.
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Significant performance loss caused by commit a65f06b: journal: return -ECHILD after a fork

2017-07-12 Thread Florian Weimer
* Lennart Poettering:

> On Tue, 11.07.17 21:24, Florian Weimer (f...@deneb.enyo.de) wrote:
>
>> * Lennart Poettering:
>> 
>> > This all stems from my experiences with PulseAudio back in the day:
>> > People do not grok the effect of fork(): it only duplicates the
>> > invoking thread, not any other threads of the process, moreover all
>> > data structures are copied as they are, and that's a time bomb really:
>> 
>> These days, the PID can change even without a fork, so the story is a
>> bit different.
>
> Can you elaborate?

I'm no longer sure that the PID can change with the current kernel,
but I cannot rule it out, either.  But other weirdness is possible:
for example, after a fork, getpid and getppid could be equal.

> Are you talking about cases where you invoke clone() directly, instead
> of via glibc's wrappers? We do that too in systemd, but I am not sure
> this is really reason enough to introduce this regression in glibc:
> this is easily worked around (which we do in systemd), and given that
> the time between clone() and execve() should be short, and the
> code between the two minimal this isn't really much of a problem.

There are other uses of clone which do not immediately lead to an
execve.

> Where was this discussed in detail? Do you have any links about the
> discussions about this?

It was on libc-alpha and the glibc bug tracker.
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Significant performance loss caused by commit a65f06b: journal: return -ECHILD after a fork

2017-07-12 Thread Florian Weimer
* Lennart Poettering:

> On Tue, 11.07.17 21:26, Florian Weimer (f...@deneb.enyo.de) wrote:
>
>> * Lennart Poettering:
>> 
>> > Apparently, this regressed between this version and
>> > glibc-2.24-9.fc25.x86_64 hence.
>> 
>> Yes, I backported the fork cache removal to Fedora 25.  There is no
>> longer a good way to main such a cache in userspace because glibc
>> cannot intercept anymore all the ways that can change the PID of the
>> current process because the kernel interfaces for process management
>> are incredibly rich these days.
>
> Please be more specific here. What is this all about?

We got many bug reports over the years about sandboxes and other heavy
users of namespaces and clone that the glibc PID cache got out of
sync, both in child and parent (!) processes.

> What triggered this specifically? is this about docker? docker is
> written in golang anyway, iirc, which doesn't bother with linking to
> libc anyway?

It needs glibc for access to the host and user databases.

> Is this a glibc upstream choice primarily? Were the regressions this
> causes considered?

I raised the problem of applications calling getpid frequently and
named OpenSSL as an example.

> I mean, the getpid() checking code is not only in use in systemd, but
> in various other bits, in particular PulseAudio, where I started
> adding these checks for a good reason. It sounds pretty strange to me
> to just regress all that...

Fork detection using getpid is not reliable.  It gives false negatives
in the case of double-forks, where the process can be different but
the PID is the same due to reuse.  Considering that this use case is
broken, I don't think it's worthwhile to jump through hoops to support
code which is fundamentally broken anyway.
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Significant performance loss caused by commit a65f06b: journal: return -ECHILD after a fork

2017-07-11 Thread Florian Weimer
* Lennart Poettering:

> Apparently, this regressed between this version and
> glibc-2.24-9.fc25.x86_64 hence.

Yes, I backported the fork cache removal to Fedora 25.  There is no
longer a good way to main such a cache in userspace because glibc
cannot intercept anymore all the ways that can change the PID of the
current process because the kernel interfaces for process management
are incredibly rich these days.
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Significant performance loss caused by commit a65f06b: journal: return -ECHILD after a fork

2017-07-11 Thread Florian Weimer
* Lennart Poettering:

> This all stems from my experiences with PulseAudio back in the day:
> People do not grok the effect of fork(): it only duplicates the
> invoking thread, not any other threads of the process, moreover all
> data structures are copied as they are, and that's a time bomb really:

These days, the PID can change even without a fork, so the story is a
bit different.
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Significant performance loss caused by commit a65f06b: journal: return -ECHILD after a fork

2017-07-11 Thread Florian Weimer
* Michael Chapman:

> It's a pity glibc doesn't provide an equivalent for pthread_atfork() 
> outside of the pthread library. Having a notification that a fork has just 
> occurred would allow us to do the PID caching ourselves.

In fact, it does, as a public symbol: __register_atfork.

It's just not really documented, so it lives a bit a vacuum, like the
old __secure_getenv call.  To fix this, we need to make the dynamic
linker disregard sonames when resolving non-interposed versioned
symbols, and then move the definition from libpthread to libc.
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Is sd_journal_send non-blocking?

2017-02-06 Thread Florian Weimer

On 02/06/2017 12:36 PM, Lennart Poettering wrote:

On Mon, 06.02.17 11:12, Florian Weimer (fwei...@redhat.com) wrote:


The manual page I've got does not say it is, but the implementation does not
wait for acknowledgment from the journal.  The implementation does not block
explicitly, but I think the sendmsg call can block until the receiver queue
is empty.

The background for this question is that we have a feature request for a
non-blocking logging interface.  I wonder if the journal fits this
requirement.

I assume that with “non-blocking”, the feature request submitter means that
the function does not block indefinitely, say due to a service process
outage.  Memory allocations can take a fairly long time as well (due to
paging), but I think that doesn't count here.


The socket we use is blocking, but we try to increase the socket
buffer to 8M, to move the point where we start to block out late. But
that only works with sufficient privileges.


The sender buffer size is currently a limit on the datagram size.  The 
data goes straight into the receiver's buffer, so it doesn't have any 
impact on blocking behavior.



So yes, we are always blocking, we don't throw data away.


Sure.  I was wondering if indefinite blocking is considered a critical 
service failure and if there is a watchdog which would catch a stuck 
journal daemon.


(Hanging syslog servers are apparently a fairly common problem, and this 
is where the RFE originally came from.)



I'd be willing to take a patch however, that adds a call
sd_journal_set_block_timeout() or so, that takes a time value we pass
to SO_SNDTIMEO for the logging socket. This would permit clients to
precisely control how long they want us to wait before we give up. And
in the case where a zero timeout is set we'd instead set O_NONBLOCK,
thus making logging entirely non-blocking.


This would help with discarding data.  It would not help those who want 
to integrate logging into an event loop.  Which is probably bad idea 
anyway, but I'm not quite sure yet what the purpose of non-blocking 
logging is.  Perhaps there is a desire to apply some back-pressure to 
reduce the rate at which logging messages are generated.  But in 
general, this merely introduces deadlocks.


Thanks,
Florian
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] Is sd_journal_send non-blocking?

2017-02-06 Thread Florian Weimer
The manual page I've got does not say it is, but the implementation does 
not wait for acknowledgment from the journal.  The implementation does 
not block explicitly, but I think the sendmsg call can block until the 
receiver queue is empty.


The background for this question is that we have a feature request for a 
non-blocking logging interface.  I wonder if the journal fits this 
requirement.


I assume that with “non-blocking”, the feature request submitter means 
that the function does not block indefinitely, say due to a service 
process outage.  Memory allocations can take a fairly long time as well 
(due to paging), but I think that doesn't count here.


Thanks,
Florian
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Running ldconfig at boot

2016-05-23 Thread Florian Weimer

On 05/23/2016 11:59 AM, Lennart Poettering wrote:

On Mon, 23.05.16 11:34, Florian Weimer (fwei...@redhat.com) wrote:


On 05/20/2016 04:10 PM, Lennart Poettering wrote:


If such systems require specialized unit files, then you can put
ldconfig.service there, instead of exposing all systemd users to the
service.


No they don't. Basic Fedora works fine in this mode, without any
changes. It isn#t round, and it isn't supported fully by Fedora, but
the basics do work just fine.


Hmm.  And this is sufficient justification to force ldconfig.service on all
users by default, despite its impact on correctness (see below) and boot
time (as reported on this thread as well)?


Well, I'd claim that expecting that if you install libraries on mounts
not included in local-fs.target they still appear in ldconfig is
wrong. I mean, the promise "local-fs.target" makes is that that's all
needed for services to run, and for upgrades of the OS to take
place.


That's not what the documentation in systemd.special(7) says.  It 
essentially defines local-fs.target based on file system type.  It says 
nothing about service availability.


I expect there are still installations out there which run /opt (or 
something like it) off NFS.



I think the trigger is just implemented incorrectly.  It should keep track
of UUIDs in files (such as /usr/.change-uuid), rather than looking at
mtimes.  Then you can just write a new UUID once you update parts of the
file system externally and completely avoid running such triggers when all
subtrees are locally maintained.


I am not sure I grok why uuids would be a better option than mtimes here...


Time-base synchronization tends to cause problems.  If /etc is modified 
before the system time is set from NTP, you get an incorrect timestamp. 
If /usr goes backwards in time, you don't run the update actions.  And 
the update actions are triggered by unexpected package installations.


If you used a UUID file instead, you could (a) mark /usr as externally 
supplied and (b) avoid all issues related to clock skew.


Florian
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Running ldconfig at boot

2016-05-20 Thread Florian Weimer

On 05/20/2016 04:04 PM, Lennart Poettering wrote:

On Fri, 20.05.16 15:55, Florian Weimer (fwei...@redhat.com) wrote:


On 05/20/2016 02:59 PM, Lennart Poettering wrote:

On Fri, 20.05.16 14:01, Florian Weimer (fwei...@redhat.com) wrote:


The default systemd configuration runs ldconfig at boot.  Why?


It's conditionalized via ConditionNeedsUpdate=, which means it is only
run when /etc is older than /usr. (This is tested via checking
modification times of /etc/.updated and /usr), see explanation on
systemd.unit(5).

The usecase for this is to permit systems where a single /usr tree is
shared among multiple systems, and might be updated at any time, and
the changes need to be propagated to /etc on each individual
systems. The keyword is "stateless systems".


Do such systems need systemd configuration changes?


Not sure I understand this question?


If such systems require specialized unit files, then you can put 
ldconfig.service there, instead of exposing all systemd users to the 
service.



There are some more packages where installation or upgrades would update the
/usr mtime.  I don't have a current Fedora rawhide list, but I'm attaching
an older version.  The /usr mtime is not a reliable indicator for what you
are trying to detect, I think.


Well, it really doesn't have to be. I mean, in the worst case we'll
run ldconfig once too often, which should be idempotent...


As I explained, the way you run it, it is not necessarily idempotent.

Florian

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Running ldconfig at boot

2016-05-20 Thread Florian Weimer

On 05/20/2016 02:59 PM, Lennart Poettering wrote:

On Fri, 20.05.16 14:01, Florian Weimer (fwei...@redhat.com) wrote:


The default systemd configuration runs ldconfig at boot.  Why?


It's conditionalized via ConditionNeedsUpdate=, which means it is only
run when /etc is older than /usr. (This is tested via checking
modification times of /etc/.updated and /usr), see explanation on
systemd.unit(5).

The usecase for this is to permit systems where a single /usr tree is
shared among multiple systems, and might be updated at any time, and
the changes need to be propagated to /etc on each individual
systems. The keyword is "stateless systems".


Do such systems need systemd configuration changes?


Note that normally this should not be triggered at all, since this
only works on systems where /usr itsel is explicitly touched after
each updated so that the mtime is updated. That should normally not
happen, except when your distro is prepared for that, and does that
explicitly.


It happens if the filesystem package is upgraded.  I think this is what 
triggered the last mtime update on my workstation.


There are some more packages where installation or upgrades would update 
the /usr mtime.  I don't have a current Fedora rawhide list, but I'm 
attaching an older version.  The /usr mtime is not a reliable indicator 
for what you are trying to detect, I think.


Florian
 nevra  |  name 
  
+-
 arm-none-eabi-binutils-cs-2014.05.28-3.fc22.x86_64 | /usr/arm-none-eabi
 arm-none-eabi-newlib-2.1.0-5.fc21.noarch   | /usr/arm-none-eabi
 avr-binutils-1:2.24-4.fc22.x86_64  | /usr/avr
 avr-libc-1.8.0-9.fc21.noarch   | /usr/avr
 cinnamon-control-center-filesystem-2.4.2-1.fc22.i686   | /usr/share
 cinnamon-control-center-filesystem-2.4.2-1.fc22.x86_64 | /usr/share
 filesystem-3.2-32.fc22.x86_64  | /usr/bin
 filesystem-3.2-32.fc22.x86_64  | /usr/games
 filesystem-3.2-32.fc22.x86_64  | /usr/include
 filesystem-3.2-32.fc22.x86_64  | /usr/lib
 filesystem-3.2-32.fc22.x86_64  | /usr/lib64
 filesystem-3.2-32.fc22.x86_64  | /usr/libexec
 filesystem-3.2-32.fc22.x86_64  | /usr/local
 filesystem-3.2-32.fc22.x86_64  | /usr/sbin
 filesystem-3.2-32.fc22.x86_64  | /usr/share
 filesystem-3.2-32.fc22.x86_64  | /usr/src
 krb5-appl-clients-1.0.3-9.fc22.x86_64  | /usr/kerberos
 krb5-appl-servers-1.0.3-9.fc22.x86_64  | /usr/kerberos
 mingw32-filesystem-99-5.fc21.noarch| /usr/i686-w64-mingw32
 mingw64-filesystem-99-5.fc21.noarch| 
/usr/x86_64-w64-mingw32
 msp430-binutils-2.21.1a-8.fc22.x86_64  | /usr/msp430
 nqp-jvm-0.0.2014.04-3.fc22.noarch  | /usr/languages
 nqp-moar-0.0.2014.04-3.fc22.x86_64 | /usr/languages
 sblim-cmpi-network-1.4.0-13.fc22.i686  | /usr/share
 sblim-cmpi-network-1.4.0-13.fc22.x86_64| /usr/share
(25 rows)

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] Running ldconfig at boot

2016-05-20 Thread Florian Weimer

The default systemd configuration runs ldconfig at boot.  Why?

Most deployments of systemd appear to be dynamically linked, so if the 
ld.so caches are corrupted, you will never get to the point where you 
can run ldconfig.


Running ldconfig too early tends to cause problems because the file 
system might not have been set up completely, and the cache does not 
match what the system administrator has configured.


Florian
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] [PATCH] gssd: Improve scalability by not waiting for child processes

2015-10-04 Thread Florian Weimer
* Steve Dickson:

> +static void
> +sig_child(int signal)
> +{
> + int err;
> + pid_t pid;
> +
> + /* Parent: just wait on child to exit and return */
> + do {
> + pid = wait();
> + } while(pid == -1 && errno != -ECHILD);
> +
> + if (WIFSIGNALED(err))
> + printerr(0, "WARNING: forked child was killed"
> +  "with signal %d\n", WTERMSIG(err));
> +}

prinerr calls vfprintf or vsyslog.  Neither is safe to use in signal
handlers, so you need to log this message in some other way.

Florian
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Use of capabilities in default service files

2015-07-21 Thread Florian Weimer
On 07/20/2015 02:34 PM, Reindl Harald wrote:
 
 
 Am 20.07.2015 um 13:58 schrieb Florian Weimer:
 On 07/20/2015 01:52 PM, Reindl Harald wrote:


 Am 20.07.2015 um 13:24 schrieb Florian Weimer:
 CapabilityBoundingSet=CAP_IPC_OWNER CAP_SETUID CAP_SETGID CAP_SETPCAP
 m4_ifdef(`HAVE_SMACK', CAP_MAC_ADMIN )
 …
 What's the intent of these settings?  Is it a form of hardening?  If
 yes, it is rather ineffective because UID=0 does not need any
 capabilities to completely compromise the system.

 UID=0 *does* need capabilities,

 drwxr-xr-x. 2 root root   37 Jun 13 10:09 /etc/cron.d
 -rw-r--r--. 1 root root 3068 Jul 17 19:47 /etc/passwd

 UID=0 without CAP_DAC_OVERRIDE (or any other capabilities) can write to
 these locations and escalate fairly directly to full root.
 
 why should it need CAP_DAC_OVERRIDE when it *owns* the files and has
 write permissions?

Exactly, it's the reason why I suspect something fishy is going on if
people to harden services running UID=0 by taking away capabilities.

 chown the file to a different user and root no longer
 can write there
 
 to protect /etc and /usr ReadOnlyDirectories is the way to go
 ReadOnlyDirectories=/etc
 ReadOnlyDirectories=/usr

Then you still have instant root through:

drwx--. 2 root root 20 Feb  3 12:36 /var/spool/cron/

 I found the “CapabilityBoundingSet=” line (empty set) somewhat worrying,
 it seems to me that this service should use a separate UID, not 0
 
 that's a different story and works for services wich don't need to read
 files only readable by root and not listening on privileged ports like
 mysqld and in that case even if there is a root exploit
 CapabilityBoundingSet would reduce the damage
 
 [Service]
 Type=simple
 User=mysql
 Group=mysql

And that's fine.  But doing hardening for UID=0 services seems a very
bad practice to me because it looks like someone is assuming that UID=0
without capabilities is just another “nobody” user.  Which is not
surprising, because capabilities are often advertised that way.

-- 
Florian Weimer / Red Hat Product Security
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] Does socket activation block a TCP port for listening by other processes?

2015-07-21 Thread Florian Weimer
We have quite a zoo of services which listen on localhost, on a fixed
TCP port, for use by local clients.  The canonical example is PostgreSQL
on 5432/TCP, for the benefit of Java clients (which cannot use the UNIX
domain socket).  This has the obvious issue that if a local attacker
crashes the service, they can impersonate it by binding to the same port.

Does socket activation reliably prevent such impersonation attacks?  Or
is there race, say during systemd configuration reloading or service
restarts, where systemd temporarily does not listen to that port?

-- 
Florian Weimer / Red Hat Product Security
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Does socket activation block a TCP port for listening by other processes?

2015-07-21 Thread Florian Weimer
On 07/21/2015 01:52 PM, David Herrmann wrote:
 Hi
 
 On Tue, Jul 21, 2015 at 1:37 PM, Florian Weimer fwei...@redhat.com wrote:
 We have quite a zoo of services which listen on localhost, on a fixed
 TCP port, for use by local clients.  The canonical example is PostgreSQL
 on 5432/TCP, for the benefit of Java clients (which cannot use the UNIX
 domain socket).  This has the obvious issue that if a local attacker
 crashes the service, they can impersonate it by binding to the same port.

 Does socket activation reliably prevent such impersonation attacks?  Or
 is there race, say during systemd configuration reloading or service
 restarts, where systemd temporarily does not listen to that port?
 
 This race *does* exist with socket activation. The sockets may be
 re-created if the unit transitions between states (like restart).

Thanks.  Would it make sense to fix this in some way, so that the socket
sticks around?

 However, this is only triggered by actions on the socket-unit, not the
 actually activated unit. A crash of the activated service will thus
 not trigger state-changes on the socket-unit.

Good to know.

 Furthermore, we do not treat occupying a resource as security
 mechanism. If an attacker has access to bind such a port, we do not
 place restrictions to prevent that. Instead, you should place LSM
 restrictions to prevent this.

I'm stuck with SELinux and the unconfined_t user, so I don't think I can
set an LSM ACL on the port.

 And please note, the actual activated
 unit usually does *not* have rights to bind the socket. This is done
 by pid1. So an attacker would require rights of PID1 for such an
 attack.

unconfined_t unfortunately has this right for port numbers larger than
1023, unfortunately.

-- 
Florian Weimer / Red Hat Product Security
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] Use of capabilities in default service files

2015-07-20 Thread Florian Weimer
There are a few examples similar to these in the source tree:

[Unit]
Description=Legacy D-Bus Protocol Compatibility Daemon

[Service]
ExecStart=@rootlibexecdir@/systemd-bus-proxyd
--address=kernel:path=/sys/fs/kdbus/0-system/bus
NotifyAccess=main
CapabilityBoundingSet=CAP_IPC_OWNER CAP_SETUID CAP_SETGID CAP_SETPCAP
m4_ifdef(`HAVE_SMACK', CAP_MAC_ADMIN )
…

[Unit]
Description=Hostname Service
Documentation=man:systemd-hostnamed.service(8) man:hostname(5)
man:machine-info(5)
Documentation=http://www.freedesktop.org/wiki/Software/systemd/hostnamed

[Service]
ExecStart=@rootlibexecdir@/systemd-hostnamed
BusName=org.freedesktop.hostname1
CapabilityBoundingSet=CAP_SYS_ADMIN
…

[Unit]
Description=Locale Service
Documentation=man:systemd-localed.service(8) man:locale.conf(5)
man:vconsole.conf(5)
Documentation=http://www.freedesktop.org/wiki/Software/systemd/localed

[Service]
ExecStart=@rootlibexecdir@/systemd-localed
BusName=org.freedesktop.locale1
CapabilityBoundingSet=
…


What's the intent of these settings?  Is it a form of hardening?  If
yes, it is rather ineffective because UID=0 does not need any
capabilities to completely compromise the system.

-- 
Florian Weimer / Red Hat Product Security
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Use of capabilities in default service files

2015-07-20 Thread Florian Weimer
On 07/20/2015 01:52 PM, Reindl Harald wrote:
 
 
 Am 20.07.2015 um 13:24 schrieb Florian Weimer:
 CapabilityBoundingSet=CAP_IPC_OWNER CAP_SETUID CAP_SETGID CAP_SETPCAP
 m4_ifdef(`HAVE_SMACK', CAP_MAC_ADMIN )
 …
 What's the intent of these settings?  Is it a form of hardening?  If
 yes, it is rather ineffective because UID=0 does not need any
 capabilities to completely compromise the system.
 
 UID=0 *does* need capabilities,

drwxr-xr-x. 2 root root   37 Jun 13 10:09 /etc/cron.d
-rw-r--r--. 1 root root 3068 Jul 17 19:47 /etc/passwd

UID=0 without CAP_DAC_OVERRIDE (or any other capabilities) can write to
these locations and escalate fairly directly to full root.

 that's the whole purpose of
 CapabilityBoundingSet and so yes it is a form of hardening

To me, it looks someone misunderstood the interactions between UID=0 and
capabilities.

 our internal httpd package is using the following options and when you
 remove CAP_NET_BIND_SERVICE it could not bind to port 80,
 
 PrivateTmp=yes
 PrivateDevices=yes
 NoNewPrivileges=yes
 CapabilityBoundingSet=CAP_DAC_OVERRIDE CAP_IPC_LOCK CAP_NET_BIND_SERVICE
 CAP_SETGID CAP_SETUID

Without code in the daemon to switch away from UID=0, this is not a very
strong restriction (but CAP_DAC_OVERRIDE is root-equivalent anyway, so
it probably does not matter).

I found the “CapabilityBoundingSet=” line (empty set) somewhat worrying,
it seems to me that this service should use a separate UID, not 0.

-- 
Florian Weimer / Red Hat Product Security
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] systemd-resolved: Concerns raised about cache handling

2014-11-17 Thread Florian Weimer
On the oss-security mailing list, Sebastian Kramer raised some concerns 
about the DNS implementation in systemd-resolved:


  http://www.openwall.com/lists/oss-security/2014/11/12/5

I share his concerns, particularly those about caching data not directly 
pertaining to a response (and they were the reason why I asked about 
cache dumping because it's so much easier to show this with this 
debugging aid).  I don't consider this so much a security vulnerability, 
but an interoperability failure in the making (because there are 
networks where broken recursive resolvers do not filter out incorrect or 
misleading data).  So I'm more worried about accidents than attacks.


Some of the other recommendations in RFC 5452 are also relevant to 
caching stubs.  (Sadly, the RFC is incomplete, there is little public 
documentation on how to actually write interoperable DNS resolvers.)


For example, I'm not sure if it is necessary to implement elaborate 
CNAME processing, or just cache everything in the answer section with 
the expected RR type, irrespective of the owner name of the resource 
records, and under the minimum TTL of the entire answer section.  Even 
if you follow CNAME chains, you should only the initial name (QNAME) as 
a cache lookup key, adding the entire CNAME chain still can lead to 
cache poisoning.


--
Florian Weimer / Red Hat Product Security
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] systemd-resolved D-Bus API documentation

2014-10-20 Thread Florian Weimer

On 10/20/2014 09:36 PM, Lennart Poettering wrote:

On Tue, 14.10.14 11:54, Florian Weimer (fwei...@redhat.com) wrote:


Is there some documentation of the D-Bus API and its expected
behavior?


No. We have not documented this so far, as we don't want to guarantee
API stabilitiy for this yet.


Is there any way to test this code, short of writing my own client?


And is there a way to dump the contents of the internal cache?


No, this is not available. (Note that strictly speaking the singular
word cache is misleading, as there are multiple caches in place, one
for each interface/protocol.) Not opposed to adding this though as a
debug feature, but I wonder what your particular usecase for that
would be?


It would extremely helpful if you notice the cache has been poisoned and 
you want to figure out why.


--
Florian Weimer / Red Hat Product Security
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] systemd-resolved D-Bus API documentation

2014-10-14 Thread Florian Weimer
Is there some documentation of the D-Bus API and its expected behavior? 
 And is there a way to dump the contents of the internal cache?


--
Florian Weimer / Red Hat Product Security
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] You are invited to give your thoughts on some async-signal-safety issues.

2014-09-05 Thread Florian Weimer

On 09/03/2014 08:45 PM, Lennart Poettering wrote:

On Sun, 31.08.14 03:31, Steven Stewart-Gallus 
(sstewartgallu...@mylangara.bc.ca) wrote:

I understand that systemd uses sandboxing and multithreading.  After a
fork, many things are messed up so it's only practical to use
async-signal-safe functions after a fork from a threaded program.  If
you have ideas on what sort of functionality GLibc needs to change to
make systemd more async-signal-safe after a fork you are invited to
give your thoughts on the GLibc mailing list in the thread at
https://sourceware.org/ml/libc-help/2014-08/msg00039.html.


All our APIs contain checks that you cannot continue reuse contexts
you create with them after a fork(). If you try the functions will
return ESRCH.


Steven is asking about what you need for glibc beyond what the standards 
provide.  Things like malloc support, access to NSS databases, and so 
on.  For example, bus_kernel_make_starter is called after forking and 
does a lot of things.


But the real question is what you would need to get rid of the 
syscall(__NR_clone) in src/nspawn/nspawn.c.


--
Florian Weimer / Red Hat Product Security
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Delaying (SSH) key generation until the urandom pool is initialized

2014-06-06 Thread Florian Weimer

On 05/05/2014 04:35 PM, Lennart Poettering wrote:

On Wed, 30.04.14 17:06, Florian Weimer (fwei...@redhat.com) wrote:



On 04/30/2014 02:28 PM, Daniel P. Berrange wrote:


Interesting suggestion.  I just used virt-manager to create the VM.
I don't see any trace for rng or random in the domain XML file.
If it is supported, I think it should be enabled by default.


I'm told that it isn't turned on by default, but you can add it to
a VM post-install. Since it feeds VMs from the host's /dev/random
or /dev/hwrng, there was a question mark as to whether it was right
to enable it by default or not, and if so what kind of rate limiting
might be wanted by default.


Ah, so it builds down to our distrust of hardware RNGs?  How
annoying. We should be able to trust Fedora-on-Fedora (or
Debian-on-Debian or whatever) scenarios.  But I get that in the
general case, it's impossible to know what's on the other side of
the virtio_rng side, so reservations remain.


Hmm? Well, a virtualized OS has to trust the hypervisor, there's no way
around that.


I'm referring to this:

 * This function will use the architecture-specific hardware random
 * number generator if it is available.  The arch-specific hw RNG will
 * almost certainly be faster than what we can do in software, but it
 * is impossible to verify that it is implemented securely (as
 * opposed, to, say, the AES encryption of a sequence number using a
 * key known by the NSA).  So it's useful if we need the speed, but
 * only if we're willing to trust the hardware manufacturer not to
 * have put in a back door.

I think this is the reason why the pool isn't considered initialized 
even if its contents has been randomized with RDRAND or similar 
instructions.


I wouldn't be surprised if these minds have a similar concern about 
randomness coming from a hypervisor.


--
Florian Weimer / Red Hat Product Security Team
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Delaying (SSH) key generation until the urandom pool is initialized

2014-05-02 Thread Florian Weimer

On 05/02/2014 01:05 PM, Lennart Poettering wrote:


No, /dev/random can (and will) block long after booting.


But that's what you want in this case, no? You want this to block after
booting if there never has been enough entropy in the pool, right?


What I want is no blocking at all.  If the pool is initialized, this is 
safe unless there is a crippling bug in the kernel random driver (or 
complete break of SHA-1, significantly worse than what's known for, say 
MD4).  If the pool is not initialized, it is practically possible to 
enumerate all possible pool states, and thus predict the data that comes 
out of /dev/urandom.


So I grudgingly accept that some things have to block until /dev/urandom 
is initialized.



Is there any kernel API currently available to at least query if the
pool is properly initialized?


No, there isn't.  I proposed a patch for a new variable in /proc, and 
the question came up whether this is what userspace needs.


There is now a separate discussion about injecting a few bits of entropy 
that in the virt case (where entropy is generally scarce initially), the 
pool is always initialized.



So you have multiple options:

a) fix the kernel to make /dev/urandom block until enough initial
entropy has been gathered. Would fix the roblem, but probably slow
down systemd. In systemd the hash functions used in hash tables are
keyed off /dev/urandom, in order to make collision attacks
difficult. This only requires low-quality entropy, as we rehash with
a new key anyway should the hashtable reach its initial size
limit. Hence blocking reading of /dev/urandom until the pool is
initialized will slow down systemd since would have to block without
actually needing any quality entropy...


If systemd is blocked on /dev/unrandom at this stage, will anything else 
happen on the system that generates more entropy?



b) fix the kernel to make the behaviour explained in a) something
optional, maybe something that is enabled on /dev/random when
O_DIRECT or so is passed during opening. i.e. if that flag is
specifiied that /dev/random will behave like classic /dev/random
until the initial pool is filled up, and like /dev/urandom
afterwards.


Or perhaps a separate device node.


I am very conservative howver to simply delay the boot until the pool is
initialized.


I want this for specific services only.  For things like hash tables, 
it's possible to reseed once more randomness becomes available, but 
that's not true for long-term key generation.


--
Florian Weimer / Red Hat Product Security Team
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Delaying (SSH) key generation until the urandom pool is initialized

2014-04-30 Thread Florian Weimer

On 04/29/2014 09:30 PM, Tom Gundersen wrote:


You can easily start the sockets early, but make the daemon itself
wait for the key generation to finish.


Thanks.  Can you provide an example?

(I don't want to change the daemon code.)


The only thing you then have to make sure is that the key generation
blocks until the non-blocking pool is initialized (I assume that is
what's being used?). For that I suppose you just need to make the
kernel block /dev/urandom until that's the case, I have seen this
being discussed, but don't know the status of those patches.


Would it be possible to do the blocking in a separate service?  This 
way, it would be more visible in diagnostic tools, and it's not 
necessary to change all key generation code (including programs which 
just generation session keys).


I don't know if we can change /dev/urandom to block because that doesn't 
look very backwards-compatible to me.


--
Florian Weimer / Red Hat Product Security Team
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Delaying (SSH) key generation until the urandom pool is initialized

2014-04-30 Thread Florian Weimer

On 04/30/2014 01:08 PM, Lennart Poettering wrote:

On Tue, 29.04.14 20:43, Florian Weimer (fwei...@redhat.com) wrote:


The message at 
https://mail.gnome.org/archives/ostree-list/2014-February/msg00010.html
contains two boot traces from virtual machines which show that the
SSH key is generated before the kernel pool is sufficiently seeded.


Are you saying ssh reads from /dev/urandom rather than /dev/random, but
it should be reading from the latter?


No, that's not what I wrote.

Using /dev/urandom for key generation is fine once its pool is seeded. 
Using existing key generation algorithms with /dev/random instead does 
not work because they consume too much entropy and can block for 
significantly more time than just a few minutes.



WHat does that have to do with systemd?


It seems related to boot ordering, device availability, kernel event 
signaling and so on (although we have little of that at present).


In particular, what I want is that a one-shot service for key generation 
only blocks on pool seeding when it actually needs to run (because the 
keys do not exist yet).  From a support perspective, it seems better to 
do the waiting outside the one-shot service, so that systemd tools can 
be used to examine what is causing the delay.



Would it be possible using socket activation to create the listening
socket for SSH, but block the actual service startup until the keys
have been generated after sufficient entropy became available?

What would you need on the kernel side to implement the waiting?
(Textual comparison of a log message is only good for a prototype.)


THis already exists. It's called /dev/random...


No, /dev/random can (and will) block long after booting.

--
Florian Weimer / Red Hat Product Security Team
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Delaying (SSH) key generation until the urandom pool is initialized

2014-04-30 Thread Florian Weimer

On 04/30/2014 01:14 PM, Daniel P. Berrange wrote:

On Tue, Apr 29, 2014 at 08:43:38PM +0200, Florian Weimer wrote:

The message at 
https://mail.gnome.org/archives/ostree-list/2014-February/msg00010.html
contains two boot traces from virtual machines which show that the
SSH key is generated before the kernel pool is sufficiently seeded.


I'm wondering if the VMs that ostree is creating are being given a
virtio-rng device ? If not that would probably be a good idea to
enable to allow them to get entropy. VMs are generally starved of
entropy even beyond the initial boot up stage, so a virtual RNG is
generally useful.


Interesting suggestion.  I just used virt-manager to create the VM.  I 
don't see any trace for rng or random in the domain XML file.  If it 
is supported, I think it should be enabled by default.


(But I see a similar issue on bare metal.)

--
Florian Weimer / Red Hat Product Security Team
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] Delaying (SSH) key generation until the urandom pool is initialized

2014-04-30 Thread Florian Weimer

On 04/30/2014 02:28 PM, Daniel P. Berrange wrote:


Interesting suggestion.  I just used virt-manager to create the VM.
I don't see any trace for rng or random in the domain XML file.
If it is supported, I think it should be enabled by default.


I'm told that it isn't turned on by default, but you can add it to
a VM post-install. Since it feeds VMs from the host's /dev/random
or /dev/hwrng, there was a question mark as to whether it was right
to enable it by default or not, and if so what kind of rate limiting
might be wanted by default.


Ah, so it builds down to our distrust of hardware RNGs?  How annoying. 
We should be able to trust Fedora-on-Fedora (or Debian-on-Debian or 
whatever) scenarios.  But I get that in the general case, it's 
impossible to know what's on the other side of the virtio_rng side, so 
reservations remain.


--
Florian Weimer / Red Hat Product Security Team
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] Delaying (SSH) key generation until the urandom pool is initialized

2014-04-29 Thread Florian Weimer
The message at 
https://mail.gnome.org/archives/ostree-list/2014-February/msg00010.html contains 
two boot traces from virtual machines which show that the SSH key is 
generated before the kernel pool is sufficiently seeded.


Would it be possible using socket activation to create the listening 
socket for SSH, but block the actual service startup until the keys have 
been generated after sufficient entropy became available?


What would you need on the kernel side to implement the waiting? 
(Textual comparison of a log message is only good for a prototype.)


--
Florian Weimer / Red Hat Product Security Team
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] threads and cgroups

2014-02-13 Thread Florian Weimer

On 02/12/2014 11:46 AM, Lennart Poettering wrote:


The plan by the kernel guys is to not allow cgroup management on threads
anymore, only entire processes. The APIs in systemd reflect this
decision by the kernel guys: you cannot arrange individual threads for
organization by cgroups.


Interesting.  Will there be an alternative for I/O bandwidth throttling?

--
Florian Weimer / Red Hat Product Security Team
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] [PATCH 00/11] *** SUBJECT HERE ***

2013-12-20 Thread Florian Weimer

On 12/20/2013 08:50 PM, Colin Walters wrote:

On Thu, 2013-12-19 at 14:20 +0100, Florian Weimer wrote:


readdir_r is rather broken on Linux because there are some directories
it cannot read.


Citation?  Are you talking about
http://womble.decadent.org.uk/readdir_r-advisory.html

?


Partly, yes.  Current glibc documentation has this to say:

https://sourceware.org/git/?p=glibc.git;a=blob;f=manual/filesys.texi;h=1cac45393d554a7a6a83c184262e2ce0be7c8885;hb=HEAD#l495

(Disclaimer: I helped to write these paragraphs.  Also see the comments 
about NAME_MAX in conf.texi.)


Previous glibc versions truncated the buffer so that it wasn't 
necessarily NUL-terminated, or had a buffer overflow (on certain 
non-mainstream architectures).  readdir_r also makes an necessary copy 
of the dirent struct.


--
Florian Weimer / Red Hat Product Security Team
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] [PATCH 00/11] *** SUBJECT HERE ***

2013-12-19 Thread Florian Weimer
This patch series removes all uses of readdir_r.  The changes are not
entirely mechanical, some of the patches fix bugs in error handling.

readdir_r is rather broken on Linux because there are some directories
it cannot read.

Florian Weimer (11):
  tmpfiles: replace readdir_r with readdir
  login: replace readdir_r with readdir
  delta: replace readdir_r with readdir
  core: replace readdir_r with readdir
  conf-files: replace readdir_r with readdir
  install: replace readdir_r with readdir
  util: replace readdir_r with readdir
  journal/vacuum: replace readdir_r with readdir
  journald/server: replace readdir_r with readdir
  journal: replace readdir_r with readdir
  util: remove union dirent_storage

 TODO  |  1 -
 src/core/load-dropin.c|  7 ---
 src/delta/delta.c |  7 +++
 src/journal/journal-vacuum.c  |  9 -
 src/journal/journald-server.c |  8 
 src/journal/sd-journal.c  | 24 ++--
 src/login/sd-login.c  |  8 
 src/shared/conf-files.c   |  8 
 src/shared/install.c  | 22 ++
 src/shared/util.c | 37 +
 src/shared/util.h |  6 --
 src/tmpfiles/tmpfiles.c   |  8 
 12 files changed, 72 insertions(+), 73 deletions(-)

-- 
1.8.3.1

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] [PATCH 02/11] login: replace readdir_r with readdir

2013-12-19 Thread Florian Weimer
---
 src/login/sd-login.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/src/login/sd-login.c b/src/login/sd-login.c
index c9a2e8a..2930b87 100644
--- a/src/login/sd-login.c
+++ b/src/login/sd-login.c
@@ -555,13 +555,13 @@ _public_ int sd_get_uids(uid_t **users) {
 
 for (;;) {
 struct dirent *de;
-union dirent_storage buf;
 int k;
 uid_t uid;
 
-k = readdir_r(d, buf.de, de);
-if (k != 0)
-return -k;
+errno = 0;
+de = readdir(d);
+if (!de  errno != 0)
+return -errno;
 
 if (!de)
 break;
-- 
1.8.3.1

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] [PATCH 06/11] install: replace readdir_r with readdir

2013-12-19 Thread Florian Weimer
The old code incorrectly assumed that readdir_r updates errno.
---
 src/shared/install.c | 22 ++
 1 file changed, 10 insertions(+), 12 deletions(-)

diff --git a/src/shared/install.c b/src/shared/install.c
index 17e8a75..5001ad4 100644
--- a/src/shared/install.c
+++ b/src/shared/install.c
@@ -212,11 +212,10 @@ static int remove_marked_symlinks_fd(
 
 for (;;) {
 struct dirent *de;
-union dirent_storage buf;
-int k;
 
-k = readdir_r(d, buf.de, de);
-if (k != 0) {
+errno = 0;
+de = readdir(d);
+if (!de  errno != 0) {
 r = -errno;
 break;
 }
@@ -373,12 +372,11 @@ static int find_symlinks_fd(
 }
 
 for (;;) {
-int k;
 struct dirent *de;
-union dirent_storage buf;
 
-k = readdir_r(d, buf.de, de);
-if (k != 0)
+errno = 0;
+de = readdir(d);
+if (!de  errno != 0)
 return -errno;
 
 if (!de)
@@ -1938,12 +1936,12 @@ int unit_file_get_list(
 
 for (;;) {
 struct dirent *de;
-union dirent_storage buffer;
 _cleanup_unitfilelist_free_ UnitFileList *f = NULL;
 
-r = readdir_r(d, buffer.de, de);
-if (r != 0)
-return -r;
+errno = 0;
+de = readdir(d);
+if (!de  errno != 0)
+return -errno;
 
 if (!de)
 break;
-- 
1.8.3.1

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] [PATCH 01/11] tmpfiles: replace readdir_r with readdir

2013-12-19 Thread Florian Weimer
---
 src/tmpfiles/tmpfiles.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/src/tmpfiles/tmpfiles.c b/src/tmpfiles/tmpfiles.c
index b7f6a2e..e83a73e 100644
--- a/src/tmpfiles/tmpfiles.c
+++ b/src/tmpfiles/tmpfiles.c
@@ -543,15 +543,15 @@ static int recursive_relabel_children(Item *i, const char 
*path) {
 
 for (;;) {
 struct dirent *de;
-union dirent_storage buf;
 bool is_dir;
 int r;
 _cleanup_free_ char *entry_path = NULL;
 
-r = readdir_r(d, buf.de, de);
-if (r != 0) {
+errno = 0;
+de = readdir(d);
+if (!de  errno != 0) {
 if (ret == 0)
-ret = -r;
+ret = -errno;
 break;
 }
 
-- 
1.8.3.1

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] [PATCH 09/11] journald/server: replace readdir_r with readdir

2013-12-19 Thread Florian Weimer
The available_space function now returns 0 if reading the directory
fails.  Previously, such errors were silently ignored.
---
 src/journal/journald-server.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/src/journal/journald-server.c b/src/journal/journald-server.c
index 3f8b95d..a3bacda 100644
--- a/src/journal/journald-server.c
+++ b/src/journal/journald-server.c
@@ -135,11 +135,11 @@ static uint64_t available_space(Server *s, bool verbose) {
 for (;;) {
 struct stat st;
 struct dirent *de;
-union dirent_storage buf;
 
-r = readdir_r(d, buf.de, de);
-if (r != 0)
-break;
+errno = 0;
+de = readdir(d);
+if (!de  errno != 0)
+return 0;
 
 if (!de)
 break;
-- 
1.8.3.1

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] [PATCH 08/11] journal/vacuum: replace readdir_r with readdir

2013-12-19 Thread Florian Weimer
---
 src/journal/journal-vacuum.c | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/src/journal/journal-vacuum.c b/src/journal/journal-vacuum.c
index d4a1c6c..b2b47d6 100644
--- a/src/journal/journal-vacuum.c
+++ b/src/journal/journal-vacuum.c
@@ -180,9 +180,7 @@ int journal_directory_vacuum(
 return -errno;
 
 for (;;) {
-int k;
 struct dirent *de;
-union dirent_storage buf;
 size_t q;
 struct stat st;
 char *p;
@@ -190,9 +188,10 @@ int journal_directory_vacuum(
 sd_id128_t seqnum_id;
 bool have_seqnum;
 
-k = readdir_r(d, buf.de, de);
-if (k != 0) {
-r = -k;
+errno = 0;
+de = readdir(d);
+if (!de  errno != 0) {
+r = -errno;
 goto finish;
 }
 
-- 
1.8.3.1

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] [PATCH 07/11] util: replace readdir_r with readdir

2013-12-19 Thread Florian Weimer
This fixes rm_rf_children_dangerous to detect errors during directory
reading.  Previously, it could dereference an uninitialized pointer.
---
 src/shared/util.c | 37 +
 1 file changed, 17 insertions(+), 20 deletions(-)

diff --git a/src/shared/util.c b/src/shared/util.c
index f598971..481c172 100644
--- a/src/shared/util.c
+++ b/src/shared/util.c
@@ -2306,7 +2306,6 @@ bool is_device_path(const char *path) {
 
 int dir_is_empty(const char *path) {
 _cleanup_closedir_ DIR *d;
-int r;
 
 d = opendir(path);
 if (!d)
@@ -2314,11 +2313,11 @@ int dir_is_empty(const char *path) {
 
 for (;;) {
 struct dirent *de;
-union dirent_storage buf;
 
-r = readdir_r(d, buf.de, de);
-if (r  0)
-return -r;
+errno = 0;
+de = readdir(d);
+if (!de  errno != 0)
+return -errno;
 
 if (!de)
 return 1;
@@ -2660,14 +2659,15 @@ int rm_rf_children_dangerous(int fd, bool only_dirs, 
bool honour_sticky, struct
 
 for (;;) {
 struct dirent *de;
-union dirent_storage buf;
 bool is_dir, keep_around;
 struct stat st;
 int r;
 
-r = readdir_r(d, buf.de, de);
-if (r != 0  ret == 0) {
-ret = -r;
+errno = 0;
+de = readdir(d);
+if (!de  errno != 0) {
+if (ret == 0)
+ret = -errno;
 break;
 }
 
@@ -4485,13 +4485,11 @@ int get_files_in_directory(const char *path, char 
***list) {
 
 for (;;) {
 struct dirent *de;
-union dirent_storage buf;
-int k;
 
-k = readdir_r(d, buf.de, de);
-assert(k = 0);
-if (k  0)
-return -k;
+errno = 0;
+de = readdir(d);
+if (!de  errno != 0)
+return -errno;
 if (!de)
 break;
 
@@ -5617,15 +5615,14 @@ int on_ac_power(void) {
 
 for (;;) {
 struct dirent *de;
-union dirent_storage buf;
 _cleanup_close_ int fd = -1, device = -1;
 char contents[6];
 ssize_t n;
-int k;
 
-k = readdir_r(d, buf.de, de);
-if (k != 0)
-return -k;
+errno = 0;
+de = readdir(d);
+if (!de  errno != 0)
+return -errno;
 
 if (!de)
 break;
-- 
1.8.3.1

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] [PATCH 04/11] core: replace readdir_r with readdir

2013-12-19 Thread Florian Weimer
---
 src/core/load-dropin.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/src/core/load-dropin.c b/src/core/load-dropin.c
index a877e66..3504009 100644
--- a/src/core/load-dropin.c
+++ b/src/core/load-dropin.c
@@ -63,12 +63,13 @@ static int iterate_dir(
 
 for (;;) {
 struct dirent *de;
-union dirent_storage buf;
 _cleanup_free_ char *f = NULL;
 int k;
 
-k = readdir_r(d, buf.de, de);
-if (k != 0) {
+errno = 0;
+de = readdir(d);
+if (!de  errno != 0) {
+k = errno;
 log_error(Failed to read directory %s: %s, path, 
strerror(k));
 return -k;
 }
-- 
1.8.3.1

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] [PATCH 05/11] conf-files: replace readdir_r with readdir

2013-12-19 Thread Florian Weimer
---
 src/shared/conf-files.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/src/shared/conf-files.c b/src/shared/conf-files.c
index 92204a6..c86bb03 100644
--- a/src/shared/conf-files.c
+++ b/src/shared/conf-files.c
@@ -53,13 +53,13 @@ static int files_add(Hashmap *h, const char *root, const 
char *path, const char
 
 for (;;) {
 struct dirent *de;
-union dirent_storage buf;
 char *p;
 int r;
 
-r = readdir_r(dir, buf.de, de);
-if (r != 0)
-return -r;
+errno = 0;
+de = readdir(dir);
+if (!de  errno != 0)
+return -errno;
 
 if (!de)
 break;
-- 
1.8.3.1

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] [PATCH 11/11] util: remove union dirent_storage

2013-12-19 Thread Florian Weimer
---
 TODO  | 1 -
 src/shared/util.h | 6 --
 2 files changed, 7 deletions(-)

diff --git a/TODO b/TODO
index 2e3becc..1e8f2b0 100644
--- a/TODO
+++ b/TODO
@@ -48,7 +48,6 @@ Features:
   - ensure scope units may be started only a single time
 
 * code cleanup
-  - get rid of readdir_r/dirent_storage stuff, it's unnecessary on Linux
   - we probably should replace the left-over uses of strv_append() and replace 
them by strv_push() or strv_extend()
 
 * switch to SipHash for hashmaps/sets?
diff --git a/src/shared/util.h b/src/shared/util.h
index 3e0a6d5..488ce3b 100644
--- a/src/shared/util.h
+++ b/src/shared/util.h
@@ -44,12 +44,6 @@
 #include macro.h
 #include time-util.h
 
-union dirent_storage {
-struct dirent de;
-uint8_t storage[offsetof(struct dirent, d_name) +
-((NAME_MAX + 1 + sizeof(long))  ~(sizeof(long) - 1))];
-};
-
 /* What is interpreted as whitespace? */
 #define WHITESPACE  \t\n\r
 #define NEWLINE \n\r
-- 
1.8.3.1

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


[systemd-devel] [PATCH 10/11] journal: replace readdir_r with readdir

2013-12-19 Thread Florian Weimer
This commit also adds error handling for failures during
directory reading.
---
 src/journal/sd-journal.c | 24 ++--
 1 file changed, 18 insertions(+), 6 deletions(-)

diff --git a/src/journal/sd-journal.c b/src/journal/sd-journal.c
index bb116df..283d593 100644
--- a/src/journal/sd-journal.c
+++ b/src/journal/sd-journal.c
@@ -1442,10 +1442,16 @@ static int add_directory(sd_journal *j, const char 
*prefix, const char *dirname)
 
 for (;;) {
 struct dirent *de;
-union dirent_storage buf;
 
-r = readdir_r(d, buf.de, de);
-if (r != 0 || !de)
+errno = 0;
+de = readdir(d);
+if (!de  errno != 0) {
+r = -errno;
+log_debug(Failed to read directory %s: %s,
+  m-path, strerror(errno));
+return r;
+}
+if (!de)
 break;
 
 if (dirent_is_file_with_suffix(de, .journal) ||
@@ -1526,11 +1532,17 @@ static int add_root_directory(sd_journal *j, const char 
*p) {
 
 for (;;) {
 struct dirent *de;
-union dirent_storage buf;
 sd_id128_t id;
 
-r = readdir_r(d, buf.de, de);
-if (r != 0 || !de)
+errno = 0;
+de = readdir(d);
+if (!de  errno != 0) {
+r = -errno;
+log_debug(Failed to read directory %s: %s,
+  m-path, strerror(errno));
+return r;
+}
+if (!de)
 break;
 
 if (dirent_is_file_with_suffix(de, .journal) ||
-- 
1.8.3.1

___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel