Re: [systemd-devel] systemd kills mdmon if it was started manually by user

2011-11-08 Thread Lennart Poettering
On Mon, 07.11.11 11:09, Williams, Dan J (dan.j.willi...@intel.com) wrote:

  What exactly is kill_all_processes()?   is it SIGTERM or SIGKILL or both
  with a gap or ???
 
  SIGTERM followed by SIGKILL after 5s if the programs do not react to
  that in time. But note that this logic only applies to processes which
  for some reason managed to escape systemd's usual cgroup-based killing
  logic. Normal services are hence already killed at that time, and only
  processes which moved themselves out of any cgroup or for which the
  service files disabled killing might survive to this point.
 
 So I think mdmon should always try to escape itself from cgroup based
 killing.  It follows the lifespan of the array, and if the array is
 not stopped by the cgroup exit (or the array lifespan is not
 controlled in a service file), then mdmon must keep running.

Well, I think when it gets killed by the cgroup-based killer then it
should try to tear down its MD device.

In the mdmon service file, use SendSIGKILL=no to disable sending
SIGKILL after the initial SIGTERM. With KillSignal= you can choose the
signal you want to be killed with first, if you don't want it to be
SIGTERM. With KillMode= you can choose whether only the main process of
the service, all processes of the service, or no processes of the
service shall be killed. With TimeoutSec= you can set the timeout
between the SIGTERM and the SIGKILL. See systemd.service(5) for more
information.

  You have relatively flexible control of the first step in this code. The
  second step is then the hammer that tries to fix up what this step
  didn't accomplish. My suggestion to check argv[0][0] was to avoid the
  hammer.
 
 I notice that if the rootfs is on a dm or md device systemd/shutdown
 will always fall through to ultimate_send_signal() which will not
 discriminate against processes flagged with '@'.  Since we aren't
 stopping the root md device I wonder if ultimate_send_signal() should
 also ignore flagged processes, or whether the failure to stop the root
 device is to be expected and let shutdown skip ultimate_send_signal()
 if the only remaining work is shutting down the rootfs-blockdev.  I'm
 leaning towards the latter.

The idea was to skip processes flagged with '@' in both the
ultimate_send_signal() and send_signal() calls.
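A minimal C sketch of the '@' check discussed in this thread, assuming the criteria from the earlier proposal: spare a process if its /proc/$PID/cmdline is empty (a kernel thread) or if argv[0][0] is '@' and the process is owned by root. The helper names read_cmdline() and should_skip_process() are hypothetical; this is not systemd's actual shutdown code.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read up to size bytes of /proc/<pid>/cmdline.
 * Returns the number of bytes read (0 for kernel threads) or -1 on error. */
ssize_t read_cmdline(pid_t pid, char *buf, size_t size)
{
    char path[64];
    assert(buf != NULL && size > 0);
    snprintf(path, sizeof(path), "/proc/%ld/cmdline", (long) pid);

    int fd = open(path, O_RDONLY | O_CLOEXEC);
    if (fd < 0)
        return -1;
    ssize_t n = read(fd, buf, size);
    close(fd);
    return n;
}

/* Hypothetical predicate: should the final killing spree spare this process? */
bool should_skip_process(const char *cmdline, ssize_t len, uid_t owner)
{
    if (len <= 0)
        return true;                        /* empty or unreadable: kernel thread or gone */
    return cmdline[0] == '@' && owner == 0; /* flagged and owned by root */
}
```

The owner could be obtained with stat() on /proc/$PID; the same predicate would then be applied in both send_signal() and ultimate_send_signal().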

Lennart

-- 
Lennart Poettering - Red Hat, Inc.
___
systemd-devel mailing list
systemd-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/systemd-devel


Re: [systemd-devel] systemd kills mdmon if it was started manually by user

2011-11-08 Thread Michal Soltys

On 11-11-08 01:11, Michal Soltys wrote:


I've peeked into systemd, and from what I can see, it /only/ jumps back
to initramfs (prepare_new_root() and pivot_to_new_root()) if the shutdown
binary is present on the initramfs. And whether mdmon is still running or
not is not in any way a determining factor for whether the pivot_root(2)
call succeeds (or ... ?).

If /run/initramfs/shutdown is not present, then systemd just does
things the old way as far as I can see - it doesn't even attempt to
pivot. And if it doesn't, then it can't umount the root (being itself
tied to it)?

So essentially, if systemd execs /shutdown (after pivoting to
/run/initramfs), then it's dracut's modules.d/99shutdown, which itself
sources hooks from other modules to do the rest of the cleaning job. And
that should take care of all the remaining stuff (including terminating
mdmon in a graceful way, and then umounting /oldroot). Either way, it's
pretty simple to add the necessary functionality to dracut.

So wouldn't a simple systemd cgroup named, say, "immortals", with mdmon
in it by default, suffice? Pivot back as usual, leave mdmon alive, and let
dracut (or whatever else is used for the initramfs) do the rest of the job
(properly).


I did some testing today, and it's all working nicely, as expected.
Actually, I modified the classic rc scripts I'm using under sysinit to
perform a full umount/detach (using methods similar to systemd's), with
mdmon happily living through everything. The only things needed after
pivot_root were:


mdmon --takeover --all
telinit U

(so obviously my dracut image had mdmon, telinit and init, and slightly 
adjusted shutdown script).


Then everything from oldroot could be nicely and cleanly umounted.

Even more elegant would be if mdmon had an option such as:

--reroot newroot

to chroot() and reopen its files under newroot, and then systemd would 
call


mdmon --reroot /run/initramfs --all --takeover

after prepare_new_root() and before pivot_to_new_root().

Then even an existing initramfs image could (probably) be mdmon-agnostic.


Re: [systemd-devel] systemd kills mdmon if it was started manually by user

2011-11-08 Thread Michal Soltys

On 11-11-08 17:46, Michal Soltys wrote:

Then even an existing initramfs image could (probably) be mdmon-agnostic.


Actually:

chroot /run/initramfs mdmon --takeover --all

worked just fine (after preparing new root - so after all mount --binds, 
and before pivot_root(8)).


So in the context of systemd instead of sysv scripts, a fork / chroot /
exec mdmon / wait, instead of killing it, would do the trick, followed
by pivot_to_new_root().
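The fork / chroot / exec / wait sequence could look roughly like this in C. It is a sketch under assumptions: run_in_chroot() and its exit codes 125 and 127 are invented for illustration, and chroot() requires CAP_SYS_CHROOT.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork, chroot() into new_root, exec argv[0] there,
 * and wait for it. Returns the child's exit status, 125 if chroot()/chdir()
 * failed, 127 if exec failed, or -1 on fork/wait failure or abnormal exit. */
int run_in_chroot(const char *new_root, char *const argv[])
{
    assert(new_root != NULL && argv != NULL && argv[0] != NULL);

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: the chroot only affects this process and what it execs. */
        if (chroot(new_root) < 0 || chdir("/") < 0)
            _exit(125);
        execv(argv[0], argv);   /* argv[0] is resolved inside new_root */
        _exit(127);
    }

    int status;
    while (waitpid(pid, &status, 0) < 0)
        if (errno != EINTR)
            return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For the experiment above, the call would be along the lines of run_in_chroot("/run/initramfs", args) with args = { "/sbin/mdmon", "--takeover", "--all", NULL }, assuming mdmon lives at /sbin/mdmon inside the initramfs, issued after prepare_new_root() and before pivot_to_new_root().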


Actually, anything that could benefit from immortality in one way or
another (perhaps udevd, so that e.g. lvm doesn't need --noudevsync when
taken over inside dracut's shutdown or anything similar after going back
to the initramfs?) and that can be pre-chrooted into /run/initramfs and
exec'ed should work just fine. For the record, udevd survived the pivot
and kept working properly.




Re: [systemd-devel] systemd kills mdmon if it was started manually by user

2011-11-08 Thread Williams, Dan J
On Tue, Nov 8, 2011 at 6:43 AM, Lennart Poettering
lenn...@poettering.net wrote:
 On Mon, 07.11.11 11:09, Williams, Dan J (dan.j.willi...@intel.com) wrote:
 So I think mdmon should always try to escape itself from cgroup based
 killing.  It follows the lifespan of the array, and if the array is
 not stopped by the cgroup exit (or the array lifespan is not
 controlled in a service file), then mdmon must keep running.

 Well, I think when it gets killed by the cgroup-based killer then it
 should try to tear down its MD device.

We can easily fall off the complexity cliff trying to tear down the MD
device because it can be pinned by a mounted filesystem or being
claimed anywhere in an arbitrary stack of DM or MD devices.  I did not
think cgroup exit would umount() filesystems?

[..]
 I notice that if the rootfs is on a dm or md device systemd/shutdown
 will always fall through to ultimate_send_signal() which will not
 discriminate against processes flagged with '@'.  Since we aren't
 stopping the root md device I wonder if ultimate_send_signal() should
 also ignore flagged processes, or whether the failure to stop the root
 device is to be expected and let shutdown skip ultimate_send_signal()
 if the only remaining work is shutting down the rootfs-blockdev.  I'm
 leaning towards the latter.

 The idea was to skip processes flagged with '@' in both the
 ultimate_send_signal() and send_signal() calls.

Ok, that makes it easier.

--
Dan


Re: [systemd-devel] systemd kills mdmon if it was started manually by user

2011-11-07 Thread Lennart Poettering
On Mon, 07.11.11 13:52, NeilBrown (ne...@suse.de) wrote:

  Why doesn't the kernel do that on its own?
 
 Because the kernel doesn't know about the format of the metadata that
 describes the array.

Yupp, my suggestion would be to change that. 

  What we do right now is this:
  
  kill_all_processes();
  do {
   umount_all_file_systems_we_can();
   read_only_mount_all_remaining_file_systems();
  } while (we_had_some_success_with_that());
  jump_into_initrd();
  
  As long as mdmon references a file from the root disk we cannot umount
  it, so the loop wouldn't be effective.
 
 What exactly is kill_all_processes()?   is it SIGTERM or SIGKILL or both
 with a gap or ???

SIGTERM followed by SIGKILL after 5s if the programs do not react to
that in time. But note that this logic only applies to processes which
for some reason managed to escape systemd's usual cgroup-based killing
logic. Normal services are hence already killed at that time, and only
processes which moved themselves out of any cgroup or for which the
service files disabled killing might survive to this point.
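The two-stage kill described here (SIGTERM first, SIGKILL only if the process has not exited after a timeout) can be sketched in C for a single child process. This is an illustration under assumptions, not systemd's code: the function name, the millisecond timeout parameter and the 10 ms polling interval are all made up.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

enum kill_result {
    KILL_ERROR = -1,
    KILL_EXITED_AFTER_TERM = 0,  /* process reacted to SIGTERM in time */
    KILL_NEEDED_SIGKILL = 1      /* timeout expired, SIGKILL was sent */
};

/* Hypothetical helper: send SIGTERM to one of our own children, give it
 * timeout_ms to exit, then fall back to SIGKILL. The child is reaped. */
enum kill_result terminate_with_timeout(pid_t pid, long timeout_ms)
{
    assert(pid > 0 && timeout_ms >= 0);
    const struct timespec step = { .tv_sec = 0, .tv_nsec = 10 * 1000 * 1000 };

    if (kill(pid, SIGTERM) < 0)
        return KILL_ERROR;

    for (long waited = 0; waited <= timeout_ms; waited += 10) {
        pid_t r = waitpid(pid, NULL, WNOHANG);
        if (r == pid)
            return KILL_EXITED_AFTER_TERM;
        if (r < 0)
            return KILL_ERROR;
        nanosleep(&step, NULL);  /* poll again in about 10 ms */
    }

    /* Still alive: use the hammer. SIGKILL cannot be caught or ignored. */
    if (kill(pid, SIGKILL) < 0)
        return KILL_ERROR;
    return waitpid(pid, NULL, 0) == pid ? KILL_NEEDED_SIGKILL : KILL_ERROR;
}
```

terminate_with_timeout(pid, 5000) would correspond to the 5 s grace period mentioned above; a process that ignores SIGTERM still ends up in the SIGKILL branch.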

 I assume a SIGKILL.  I don't mind a SIGTERM and it could be useful to
 expedite mdmon cleaning up.
 
 However there is an important piece missing.  When you remount,ro a
 filesystem, the block device doesn't get told so it thinks it is still open
 read/write.  So md cannot tell mdmon that the array is now read-only.
 It would make a lot of sense for mdmon to exit after receiving a SIGTERM as
 soon as the device is marked read-only.  But it just doesn't know.

As mentioned by Kay, you can get notifications for this by poll()ing on
/proc/self/mountinfo. Note again however, that we kill first, and only
then try to unmount/remount.
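A sketch of the mountinfo approach, assuming the proc(5) semantics: on an fd opened from /proc/self/mountinfo, a change to the mount table is reported by poll() as POLLPRI (with POLLERR), and the sixth field of each line holds the per-mount options such as "rw,noatime". The function names are hypothetical. As Neil notes elsewhere in the thread, this only covers filesystems, not swap, dm tables or other holders of the block device.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <poll.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: wait up to timeout_ms for a mount table change on an
 * fd opened from /proc/self/mountinfo. Returns 1 on change, 0 on timeout,
 * -1 on error. Keeping the fd open lets a change that happened between
 * calls still be reported on the next call. */
int wait_for_mount_change(int mountinfo_fd, int timeout_ms)
{
    assert(mountinfo_fd >= 0);
    struct pollfd pfd = { .fd = mountinfo_fd, .events = POLLPRI };
    int r = poll(&pfd, 1, timeout_ms);
    if (r < 0)
        return -1;
    return (r > 0 && (pfd.revents & (POLLPRI | POLLERR))) ? 1 : 0;
}

/* Hypothetical parser: true if the per-mount options (6th field) contain
 * "ro". The superblock's own ro/rw flag (last field) is not checked here.
 * Example line: 36 35 98:0 /mnt1 /mnt2 rw,noatime master:1 - ext3 ... */
bool mountinfo_line_is_ro(const char *line)
{
    char copy[4096];
    char *save = NULL;
    assert(line != NULL);
    snprintf(copy, sizeof(copy), "%s", line);

    char *field = strtok_r(copy, " \n", &save);
    for (int i = 1; field != NULL && i < 6; i++)
        field = strtok_r(NULL, " \n", &save);
    if (field == NULL)
        return false;

    for (char *opt = strtok_r(field, ",", &save); opt != NULL;
         opt = strtok_r(NULL, ",", &save))
        if (strcmp(opt, "ro") == 0)
            return true;
    return false;
}
```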

 We can probably fix that, but that doesn't really help for now.
 
 I think I would like:
 
  - add to the above loop: stop any virtual devices that we can.
Exactly how to do that if /proc and /sys are already unmounted
is unclear.  Is one or both of these kept around somewhere?

/proc and /sys are not unmounted in this loop. Being virtual API fs we
exclude them from this logic and leave them around until the initrd
unmounts them if it wants to.

Actually, in the loop above there are three more steps: in each
iteration we also try to detach all swap devices, all loopback devices
and all DM devices. We probably could add a similar operation for MD
devices here too. But note that this loop is more of a last-resort
thing, and normally shouldn't do much.

  - allow processes to be marked some way so they get SIGTERM but not
SIGKILL.  I'm happy adding magic char to argv[0].

Note that you can configure how you are killed relatively flexibly in
the service files and that the loop pointed out above is only this last
resort thing which is applied to all processes/mount points/... which
stick around after this normal shutdown.

Here's another attempt at explaining how this works:

<snip>
terminate_all_mount_and_service_units();
kill_all_remaining_processes();
do {
    umount_all_remaining_file_systems_we_can();
    read_only_mount_all_remaining_file_systems();
    detach_all_remaining_loop_devices();
    detach_all_remaining_swap_devices();
    detach_all_remaining_dm_devices();
} while (we_had_some_success_with_that());
jump_into_initrd();
</snip>
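The do/while in this pseudocode is a fixed-point loop: every pass runs all detach steps, and the loop stops once a complete pass achieves nothing. A compilable sketch of just that control flow, with each step modeled as a hypothetical callback returning how many objects it unmounted or detached:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* One shutdown step, e.g. "umount what we can" or "detach loop devices".
 * Returns how many objects it unmounted/detached in this pass (0 = none). */
typedef int (*shutdown_step)(void *ctx);

/* Repeat all steps until a complete pass makes no progress. Returns the
 * number of passes, including the final unproductive one. */
int run_until_no_progress(const shutdown_step *steps, size_t n_steps, void *ctx)
{
    assert(steps != NULL || n_steps == 0);
    int passes = 0;
    bool progress;
    do {
        progress = false;
        for (size_t i = 0; i < n_steps; i++)
            if (steps[i](ctx) > 0)
                progress = true;
        passes++;
    } while (progress);
    return passes;
}

/* Toy step for illustration only: "unmounts" one of *ctx remaining mounts. */
int toy_umount_one(void *ctx)
{
    int *remaining = ctx;
    if (*remaining <= 0)
        return 0;
    (*remaining)--;
    return 1;
}
```

A step that can never succeed, such as unmounting a root filesystem that a running mdmon still references, just returns 0, so the loop terminates instead of spinning, which is why the loop by itself cannot fix the mdmon case.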

You have relatively flexible control of the first step in this code. The
second step is then the hammer that tries to fix up what this step
didn't accomplish. My suggestion to check argv[0][0] was to avoid the
hammer.

Lennart

-- 
Lennart Poettering - Red Hat, Inc.


Re: [systemd-devel] systemd kills mdmon if it was started manually by user

2011-11-07 Thread Williams, Dan J
On Mon, Nov 7, 2011 at 4:00 AM, Lennart Poettering
lenn...@poettering.net wrote:
 On Mon, 07.11.11 13:52, NeilBrown (ne...@suse.de) wrote:

  Why doesn't the kernel do that on its own?

 Because the kernel doesn't know about the format of the metadata that
 describes the array.

 Yupp, my suggestion would be to change that.

It's quite a bit of idiosyncratic code that needs to be duplicated in
kernel space and userspace (since userspace always needs to know how
to parse the metadata for array assembly).  All to record a dirty bit
that flips at most every 5 seconds, or a disk failure event which is
even less frequent.  Throw in policy constraints like restricting
which block devices can become part of the raid set.  Rinse and repeat
for every possible metadata format.

[..]
 What exactly is kill_all_processes()?   is it SIGTERM or SIGKILL or both
 with a gap or ???

 SIGTERM followed by SIGKILL after 5s if the programs do not react to
 that in time. But note that this logic only applies to processes which
 for some reason managed to escape systemd's usual cgroup-based killing
 logic. Normal services are hence already killed at that time, and only
 processes which moved themselves out of any cgroup or for which the
 service files disabled killing might survive to this point.

So I think mdmon should always try to escape itself from cgroup based
killing.  It follows the lifespan of the array, and if the array is
not stopped by the cgroup exit (or the array lifespan is not
controlled in a service file), then mdmon must keep running.

[..]

 Here's another attempt at explaining how this works:

 snip
 terminate_all_mount_and_service_units();
 kill_all_remaining_processes();
 do {
     umount_all_remaining_file_systems_we_can();
     read_only_mount_all_remaining_file_systems();
     detach_all_remaining_loop_devices();
     detach_all_remaining_swap_devices();
     detach_all_remaining_dm_devices();

So I've started putting together an md_detach_all() routine that will
attempt to stop all md devices (via sysfs).  By that point all mdmon
instances will have escaped the initial killall thanks to the argv '@'
flagging.

Like the dm implementation it will address all but the root md device.
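md_detach_all() itself is not shown in the thread, so the following is only a guess at its shape: walk /sys/block, skip the md device that backs the root filesystem, and ask every other array to stop through its md/array_state attribute. Writing "inactive" (which the kernel's md documentation describes as stopping the array without tearing it down) is an assumption here, as is the sysfs_block parameter, which exists so the function can be pointed at a scratch directory.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical md_detach_all()-style pass: for every md* entry under
 * sysfs_block (normally "/sys/block") that has an md/ subdirectory, except
 * skip_name (the md device backing the root fs), write "inactive" to
 * md/array_state. Returns the number of arrays that accepted the write. */
int md_detach_all(const char *sysfs_block, const char *skip_name)
{
    assert(sysfs_block != NULL);
    DIR *dir = opendir(sysfs_block);
    if (dir == NULL)
        return 0;

    int stopped = 0;
    struct dirent *de;
    while ((de = readdir(dir)) != NULL) {
        if (strncmp(de->d_name, "md", 2) != 0)
            continue;
        if (skip_name != NULL && strcmp(de->d_name, skip_name) == 0)
            continue;               /* leave the root md device to the initramfs */

        char path[4096];
        struct stat st;
        snprintf(path, sizeof(path), "%s/%s/md", sysfs_block, de->d_name);
        if (stat(path, &st) < 0 || !S_ISDIR(st.st_mode))
            continue;               /* not an md array */

        snprintf(path, sizeof(path), "%s/%s/md/array_state", sysfs_block, de->d_name);
        FILE *f = fopen(path, "w");
        if (f == NULL)
            continue;
        /* An array still in use is expected to reject the write; the error
         * shows up when fclose() flushes the buffered write. */
        int ok = fputs("inactive", f) >= 0;
        if (fclose(f) != 0)
            ok = 0;
        if (ok)
            stopped++;
    }
    closedir(dir);
    return stopped;
}
```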

 } while (we_had_some_success_with_that());
 jump_into_initrd();

The final act of the initramfs is then mdadm --wait-clean --scan to
communicate with the rootfs-blockdev-mdmon to be sure the metadata has
been marked clean.  All other mdmon instances should have exited
naturally when their md devices stopped, but the --wait-clean --scan
will have ensured shutdown can progress safely.

 You have relatively flexible control of the first step in this code. The
 second step is then the hammer that tries to fix up what this step
 didn't accomplish. My suggestion to check argv[0][0] was to avoid the
 hammer.

I notice that if the rootfs is on a dm or md device systemd/shutdown
will always fall through to ultimate_send_signal() which will not
discriminate against processes flagged with '@'.  Since we aren't
stopping the root md device I wonder if ultimate_send_signal() should
also ignore flagged processes, or whether the failure to stop the root
device is to be expected and let shutdown skip ultimate_send_signal()
if the only remaining work is shutting down the rootfs-blockdev.  I'm
leaning towards the latter.

--
Dan


Re: [systemd-devel] systemd kills mdmon if it was started manually by user

2011-11-07 Thread Michal Soltys

On 11-11-02 14:32, Lennart Poettering wrote:

What we do right now is this:

kill_all_processes();
do {
  umount_all_file_systems_we_can();
  read_only_mount_all_remaining_file_systems();
} while (we_had_some_success_with_that());
jump_into_initrd();

As long as mdmon references a file from the root disk we cannot umount
it, so the loop wouldn't be effective.



I've peeked into systemd, and from what I can see, it /only/ jumps back
to initramfs (prepare_new_root() and pivot_to_new_root()) if the shutdown
binary is present on the initramfs. And whether mdmon is still running or
not is not in any way a determining factor for whether the pivot_root(2)
call succeeds (or ... ?).


If /run/initramfs/shutdown is not present, then systemd just does
things the old way as far as I can see - it doesn't even attempt to
pivot. And if it doesn't, then it can't umount the root (being itself
tied to it)?


So essentially, if systemd execs /shutdown (after pivoting to
/run/initramfs), then it's dracut's modules.d/99shutdown, which itself
sources hooks from other modules to do the rest of the cleaning job. And
that should take care of all the remaining stuff (including terminating
mdmon in a graceful way, and then umounting /oldroot). Either way, it's
pretty simple to add the necessary functionality to dracut.


So wouldn't a simple systemd cgroup named, say, "immortals", with mdmon
in it by default, suffice? Pivot back as usual, leave mdmon alive, and let
dracut (or whatever else is used for the initramfs) do the rest of the job
(properly).



p.s.
Sorry if I missed something obvious, it was a quick and late peek over 
systemd's shutdown.c.



Re: [systemd-devel] systemd kills mdmon if it was started manually by user

2011-11-06 Thread NeilBrown
On Wed, 2 Nov 2011 14:32:25 +0100 Lennart Poettering lenn...@poettering.net
wrote:

 On Wed, 02.11.11 13:03, NeilBrown (ne...@suse.de) wrote:

  Each instance of mdmon manages a set of arrays and must remain running
  until all of those arrays are readonly (or shut down).  This allows it to
  record that all writes have completed and mark the array as 'clean' so a
  resync isn't needed at next boot.
 
 Why doesn't the kernel do that on its own?

Because the kernel doesn't know about the format of the metadata that
describes the array.

  
  You couldn't just do the equivalent of
fuser -k /some/filesystem
umount /some/filesystem
  
  iterating over filesystems with '/' last?
 
  Then anything that only uses the /run filesystem will survive.
 
 What we do right now is this:
 
 kill_all_processes();
 do {
  umount_all_file_systems_we_can();
  read_only_mount_all_remaining_file_systems();
 } while (we_had_some_success_with_that());
 jump_into_initrd();
 
 As long as mdmon references a file from the root disk we cannot umount
 it, so the loop wouldn't be effective.

What exactly is kill_all_processes()?   is it SIGTERM or SIGKILL or both
with a gap or ???

I assume a SIGKILL.  I don't mind a SIGTERM and it could be useful to
expedite mdmon cleaning up.

However there is an important piece missing.  When you remount,ro a
filesystem, the block device doesn't get told so it thinks it is still open
read/write.  So md cannot tell mdmon that the array is now read-only.
It would make a lot of sense for mdmon to exit after receiving a SIGTERM as
soon as the device is marked read-only.  But it just doesn't know.

We can probably fix that, but that doesn't really help for now.

I think I would like:

 - add to the above loop: stop any virtual devices that we can.
   Exactly how to do that if /proc and /sys are already unmounted
   is unclear.  Is one or both of these kept around somewhere?

 - allow processes to be marked some way so they get SIGTERM but not
   SIGKILL.  I'm happy adding magic char to argv[0].

We should be able to make it work with those changes - if they are possible.

Thanks,

NeilBrown





Re: [systemd-devel] systemd kills mdmon if it was started manually by user

2011-11-06 Thread Kay Sievers
On Mon, Nov 7, 2011 at 03:52, NeilBrown ne...@suse.de wrote:

 However there is an important piece missing.  When you remount,ro a
 filesystem, the block device doesn't get told so it thinks it is still open
 read/write.  So md cannot tell mdmon that the array is now read-only

That ro/rw flag is visible in /proc/self/mountinfo, shouldn't it be
possible for mdmon to poll() that file and let the kernel wake stuff
up when the ro/rw flag changes, like we do for the usual mount changes
already?

Kay


Re: [systemd-devel] systemd kills mdmon if it was started manually by user

2011-11-06 Thread NeilBrown
On Mon, 7 Nov 2011 04:42:54 +0100 Kay Sievers kay.siev...@vrfy.org wrote:

 On Mon, Nov 7, 2011 at 03:52, NeilBrown ne...@suse.de wrote:
 
  However there is an important piece missing.  When you remount,ro a
  filesystem, the block device doesn't get told so it thinks it is still open
  read/write.  So md cannot tell mdmon that the array is now read-only
 
 That ro/rw flag is visible in /proc/self/mountinfo, shouldn't it be
 possible for mdmon to poll() that file and let the kernel wake stuff
 up when the ro/rw flag changes, like we do for the usual mount changes
 already?
 
 Kay

The ro/rw flag for file systems is in /proc/self/mountinfo.

However I want the ro/rw flag for the block device.
A block device can be partitioned, so it might have multiple filesystems on it,
and it might have swap too, or a dm table, or another md device, or an open
file descriptor, or ...
Yes, I could maybe parse various different files and try to work out what is
going on.  But the kernel can easily *know* what is going on.

Making this work perfectly would require md dropping its write-access to
member devices when the last write-access to the top level device goes.  And
the same for dm and loop and ...

But just filesystems would go a long way to catching the common cases
correctly.

Thanks,
NeilBrown





Re: [systemd-devel] systemd kills mdmon if it was started manually by user

2011-11-02 Thread Williams, Dan J
On Wed, Nov 2, 2011 at 7:33 AM, Kay Sievers kay.siev...@vrfy.org wrote:
 People who like to put their rootfs on a userspace managed raid device
 just get what they asked for. :)

Proper care and feeding of mdmon and userspace managed block devices /
filesystems is a solvable problem.  To me the :) runs the risk of
implying we don't think we can get this right.

--
Dan


Re: [systemd-devel] systemd kills mdmon if it was started manually by user

2011-11-02 Thread Kay Sievers
On Wed, Nov 2, 2011 at 20:31, Williams, Dan J dan.j.willi...@intel.com wrote:
 On Wed, Nov 2, 2011 at 11:49 AM, Kay Sievers kay.siev...@vrfy.org wrote:
 On Wed, Nov 2, 2011 at 19:16, Williams, Dan J dan.j.willi...@intel.com 
 wrote:
 On Wed, Nov 2, 2011 at 7:33 AM, Kay Sievers kay.siev...@vrfy.org wrote:
 People who like to put their rootfs on a userspace managed raid device
 just get what they asked for. :)

 Proper care and feeding of mdmon and userspace managed block devices /
 filesystems is a solvable problem.  To me the :) runs the risk of
 implying we don't think we can get this right.

 It implied that I think it is totally insane what you guys try to
 accomplish. Managing the rootfs blockdev with tools contained in the
 rootfs itself is just crazy. No smiley this time.


 Yes, much clearer.  Which is why the "never let mdmon run from an fs
 it is managing" approach is better than the current dance that was implemented
 to address the need to drop initramfs memory and get around a lack of
 having a filesystem (like /run) that persisted from early boot.  But
 we now run back into the problem of pinning initramfs memory.  Does
 systemd already expect that the full initramfs sticks around to handle
 shutdown?  If so then we have come full circle and don't really need
 the mdmon --takeover functionality versus just letting the
 initramfs-mdmon handle their entire lifetime of the rootfs blockdev.

It all depends on the initramfs implementation. Systemd is not
involved here and has no knowledge about what was left behind, it just
checks if there is a binary in /run provided by the initramfs, and then it
calls this binary instead of just bringing down the box itself.
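The hand-over described here comes down to a single check before the final stage. A hypothetical sketch of that decision (the enum and function names are made up, and systemd's real shutdown code may differ in detail):

```c
#include <assert.h>
#include <stdbool.h>
#include <unistd.h>

enum final_shutdown {
    SHUTDOWN_BY_INITRAMFS,   /* pivot back and exec the initramfs' shutdown */
    SHUTDOWN_DIRECTLY        /* nothing left behind: finish the job ourselves */
};

/* Hypothetical decision helper: hand the final stage over only if the
 * initramfs left an executable shutdown program at the given path. */
enum final_shutdown choose_final_shutdown(const char *initramfs_shutdown)
{
    assert(initramfs_shutdown != NULL);
    return access(initramfs_shutdown, X_OK) == 0 ? SHUTDOWN_BY_INITRAMFS
                                                 : SHUTDOWN_DIRECTLY;
}
```

In real use the argument would be /run/initramfs/shutdown; with dracut, that is the program provided by its 99shutdown module.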

So far only dracut implements this shutdown logic, which just goes
back to the initramfs and disassembles/shuts down everything that was
assembled before the initramfs started the real init.

I wouldn't be surprised if we see more of these use cases from
subsystems which put their rootfs on something that needs to be
managed from userspace.

The pinned memory for the tools in initramfs that stay around in tmpfs
is probably the price to pay for these kinds of setups of the rootfs,
when subsystems want to avoid adding the needed logic to the kernel to
safely shut down the rootfs.

Kay


Re: [systemd-devel] systemd kills mdmon if it was started manually by user

2011-11-02 Thread Williams, Dan J
On Wed, Nov 2, 2011 at 8:29 AM, Lennart Poettering
lenn...@poettering.net wrote:
 On Wed, 02.11.11 16:21, Kay Sievers (kay.siev...@vrfy.org) wrote:


 On Wed, Nov 2, 2011 at 16:17, Lennart Poettering lenn...@poettering.net 
 wrote:
  Kernel threads we detect by checking whether /proc/$PID/cmdline is
  empty, hence I'd suggest we use the first char of argv[0][0] here, to
  detect whether something is a process to avoid killing. Question is
  which char to choose for that. I am tempted to use '@'.

 Maybe introduce an 'initramfs' cgroup and move the pids there?

 Well, in which hierarchy? I am a bit concerned about having other
 subsystems muck with the systemd cgroup hierarchy, before systemd has
 set it up.

 I can see some elegance in having all code from the initrd that remains
 running during boot in a cgroup of its own, but that's probably
 orthogonal to finding a way to recognize processes not to kill at
 shutdown. Why? Because there's stuff like Plymouth which also stays
 around from the initramfs, but actually is something we *do* want to
 kill on shutdown.

So how about, rather than binaries modifying themselves to say "please
don't kill me" via argv[][], shutdown just avoids processes whose
/proc/$PID/cmdline starts with /run/initramfs?  Then it's up to the
initramfs, by choosing where it runs each binary from, to determine which
instances it wants provenance over versus leaving to the init system.

For manually started arrays maybe we should arrange for an
initramfs-started-mdmon to spawn new instances for user started
containers, rather than using the local /sbin/mdmon.  Then the mdadm
-Ss initiated by /run/initramfs/shutdown can reliably stop any md
device regardless of how it was started.

--
Dan


Re: [systemd-devel] systemd kills mdmon if it was started manually by user

2011-11-02 Thread Lennart Poettering
On Wed, 02.11.11 10:21, Williams, Dan J (dan.j.willi...@intel.com) wrote:

  That means we'd:
 
  a) patch systemd to check whether argv[0][0] of a process is '@' and
  owned by root and exclude it from killing on shutdown.
 
  b) patch mdmon to set argv[0][0] of itself to '@' iff it is running from
  the initrd. If it is run from the main system it should not set that and
  just be shut down like any other service.
 
 Well, there are two cases to consider:
 
 1/ user starts the array manually and stops it with mdadm -Ss (mdmon
 automatically shuts down).  No need for a service; mdmon just follows
 the lifespan of the array.
 
 2/ user starts the array but then expects it to be around until system 
 shutdown
 
 In the latter case let the initramfs-mdmon takeover all arrays with
 mdmon --takeover --all.  But if all arrays may eventually be
 re-parented to an mdmon instance from /run, why not always start mdmon
 from there?

Well I am not sure how mdmon works, but let's say you booted up with an
initrd lacking mdmon. Then, while the machine is up, you set up some md
device, and start mdmon for that. At this point it will be independent
of the initrd. But that should be OK since at shutdown time it can be
detached cleanly without any special magic, too, since mdmon is not
stored on that md device. So if you boot from md you need mdmon in the
initrd. If you just use md outside of the root disk, then running mdmon
as a normal service (i.e. one that is shut down like any other) should
be perfectly fine.

This is why I suggested that only mdmon run from the initrd should set
argv[0][0] = '@', because only that one needs the special handling that
it cannot be terminated properly on shut down. The one running from the
normal system can be treated like a standard systemd service.
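On the mdmon side, the proposal amounts to overwriting the first byte of argv[0] in place; on Linux, /proc/$PID/cmdline reads the process's own argument memory, so the change becomes visible to the shutdown code. A sketch, assuming a hypothetical running_from_initrd flag (how mdmon should detect that is exactly what is still being debated here):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: mark this process as "do not kill at shutdown" by
 * turning argv[0][0] into '@'. The string is modified in place. Only the
 * instance started from the initrd should do this; an mdmon started from
 * the main system is stopped like any other service.
 * Returns true if the mark was applied. */
bool mark_argv0_for_shutdown(char **argv, bool running_from_initrd)
{
    assert(argv != NULL);
    if (!running_from_initrd || argv[0] == NULL || argv[0][0] == '\0')
        return false;
    argv[0][0] = '@';
    return true;
}
```

In a real program this would be called early in main() with the argv passed to main(), so that "/sbin/mdmon" shows up as "@sbin/mdmon" in /proc/$PID/cmdline.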

Lennart

-- 
Lennart Poettering - Red Hat, Inc.


Re: [systemd-devel] systemd kills mdmon if it was started manually by user

2011-11-02 Thread Lennart Poettering
On Wed, 02.11.11 15:18, Williams, Dan J (dan.j.willi...@intel.com) wrote:

 
 On Wed, Nov 2, 2011 at 8:29 AM, Lennart Poettering
 lenn...@poettering.net wrote:
  On Wed, 02.11.11 16:21, Kay Sievers (kay.siev...@vrfy.org) wrote:
 
 
  On Wed, Nov 2, 2011 at 16:17, Lennart Poettering lenn...@poettering.net 
  wrote:
   Kernel threads we detect by checking whether /proc/$PID/cmdline is
   empty, hence I'd suggest we use the first char of argv[0][0] here, to
   detect whether something is a process to avoid killing. Question is
   which char to choose for that. I am tempted to use '@'.
 
  Maybe introduce an 'initramfs' cgroup and move the pids there?
 
  Well, in which hierarchy? I am a bit concerned about having other
  subsystems muck with the systemd cgroup hierarchy, before systemd has
  set it up.
 
  I can see some elegance in having all code from the initrd that remains
  running during boot in a cgroup of its own, but that's probably
  orthogonal to finding a way to recognize processes not to kill at
  shutdown. Why? Because there's stuff like Plymouth which also stays
  around from the initramfs, but actually is something we *do* want to
  kill on shutdown.
 
 So how about, rather than binaries modifying themselves to say "please
 don't kill me" via argv[][], shutdown just avoids processes whose
 /proc/$PID/cmdline starts with /run/initramfs?  Then it's up to the
 initramfs, by choosing where it runs each binary from, to determine which
 instances it wants provenance over versus leaving to the init system.

Nope, whether something should be excluded from killing during shutdown is
orthogonal to being part of the initramfs. For example, Plymouth
(i.e. the graphical boot splash thingy) is started from the initrd too, but
we definitely want to kill it on shut down.

I am a bit concerned about checks against paths since initrd might play
some namespacing games and the paths might not appear to the main system
the way you'd expect.

Lennart

-- 
Lennart Poettering - Red Hat, Inc.


Re: [systemd-devel] systemd kills mdmon if it was started manually by user

2011-11-02 Thread Williams, Dan J
On Wed, Nov 2, 2011 at 4:39 PM, Lennart Poettering
lenn...@poettering.net wrote:
 On Wed, 02.11.11 15:18, Williams, Dan J (dan.j.willi...@intel.com) wrote:


 On Wed, Nov 2, 2011 at 8:29 AM, Lennart Poettering
 lenn...@poettering.net wrote:
  On Wed, 02.11.11 16:21, Kay Sievers (kay.siev...@vrfy.org) wrote:
 
 
  On Wed, Nov 2, 2011 at 16:17, Lennart Poettering lenn...@poettering.net 
  wrote:
   Kernel threads we detect by checking whether /proc/$PID/cmdline is
   empty, hence I'd suggest we use the first char of argv[0][0] here, to
   detect whether something is a process to avoid killing. Question is
   which char to choose for that. I am tempted to use '@'.
 
  Maybe introduce an 'initramfs' cgroup and move the pids there?
 
  Well, in which hierarchy? I am a bit concerned about having other
  subsystems muck with the systemd cgroup hierarchy, before systemd has
  set it up.
 
  I can see some elegance in having all code from the initrd that remains
  running during boot in a cgroup of its own, but that's probably
  orthogonal to finding a way to recognize processes not to kill at
  shutdown. Why? Because there's stuff like Plymouth which also stays
  around from the initramfs, but actually is something we *do* want to
  kill on shutdown.

 So how about, rather than binaries modifying their own argv[][] to say
 "please don't kill me", shutdown just avoids processes whose
 /proc/$PID/cmdline starts with /run/initramfs?  Then it's up to where
 the initramfs runs the binary to determine which instances it wants
 provenance over versus leaving to the init system.

 Nope, whether something should be excluded from killing during shutdown is
 orthogonal to being part of the initramfs. For example, Plymouth
 (i.e. the graphical boot splash thingy) is started from the initrd too, but
 we definitely want to kill it on shutdown.

In the Plymouth case the path would be /bin/plymouth; the initramfs
would need to take special care to run mdmon from /run/initramfs to
identify it as needing the initramfs environment to carry out its
shutdown.

 I am a bit concerned about checks against paths since the initrd might play
 some namespacing games and the paths might not appear to the main system
 the way you'd expect.

The initramfs needs to be modified to either tell mdmon it is running
from the initramfs or arrange for /proc/$MDMON_PID/cwd to appear to be
from /run/initramfs.  I prefer the latter only because it works with
existing mdmon binaries, but we may need shutdown to always leave
mdmon alone...
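A sketch of the path-based test discussed here, assuming the shutdown code may look at either argv[0] from /proc/$PID/cmdline or the /proc/$PID/cwd symlink. As Lennart points out, namespace games in the initrd can make such paths misleading, so this is illustration only; the function names are hypothetical.

```python
import os
from typing import Optional

INITRAMFS_PREFIX = "/run/initramfs"  # prefix proposed in this thread


def _under_prefix(path: str, prefix: str = INITRAMFS_PREFIX) -> bool:
    # Match the directory itself or anything below it, but not "/run/initramfs2".
    return path == prefix or path.startswith(prefix + "/")


def started_from_initramfs(cmdline: bytes, cwd: Optional[str]) -> bool:
    """True if argv[0] or the working directory lies under /run/initramfs."""
    argv0 = cmdline.split(b"\0", 1)[0].decode(errors="replace")
    if _under_prefix(argv0):
        return True
    return cwd is not None and _under_prefix(cwd)


def probe(pid: int) -> bool:
    """Apply the test to a live process; unreadable processes count as False."""
    try:
        with open(f"/proc/{pid}/cmdline", "rb") as f:
            cmdline = f.read()
    except OSError:
        return False
    try:
        cwd: Optional[str] = os.readlink(f"/proc/{pid}/cwd")
    except OSError:
        cwd = None  # e.g. permission denied for another user's process
    return started_from_initramfs(cmdline, cwd)
```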

For user started md arrays the shutdown sequence still goes:

killall -- umount

...and we would need to express:

killall (but mdmon) -- umount -- mdadm -Ss (stops all not in use arrays)

So maybe we do the argv @ tagging in all cases, and systemd never
kills mdmon but arranges for all (stoppable) md devices to be stopped,
then relies on /run/initramfs/shutdown to handle the rootfs blockdev.
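The ordering described here (kill everything except mdmon, unmount, then mdadm -Ss) can be sketched as a dry-run plan in Python. It only builds command lists and executes nothing; the PIDs and mount points are caller-supplied inputs, and skipping / follows the idea of leaving the rootfs to /run/initramfs/shutdown. mdadm -Ss is the short form of mdadm --stop --scan. The function name is hypothetical.

```python
from typing import List, Sequence


def shutdown_plan(pids: Sequence[int], spared: Sequence[int],
                  mountpoints: Sequence[str]) -> List[List[str]]:
    """Build the ordered commands for: killall (but mdmon) -- umount -- mdadm -Ss."""
    spared_set = set(spared)
    plan: List[List[str]] = []

    # Step 1: signal every process except the spared ones (e.g. mdmon).
    victims = [str(pid) for pid in pids if pid not in spared_set]
    if victims:
        plan.append(["kill", "-TERM", *victims])

    # Step 2: unmount longer (nested) paths before their parents. The root
    # file system is skipped and left to /run/initramfs/shutdown.
    for mp in sorted((m for m in mountpoints if m != "/"), key=len, reverse=True):
        plan.append(["umount", mp])

    # Step 3: stop every md array that is no longer in use.
    plan.append(["mdadm", "--stop", "--scan"])
    return plan
```

In this plan mdmon is still alive when the arrays are stopped, which matches the suggestion elsewhere in the thread that mdmon should simply go away once its arrays are shut down.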

--
Dan


Re: [systemd-devel] systemd kills mdmon if it was started manually by user

2011-11-01 Thread NeilBrown
On Mon, 31 Oct 2011 12:06:13 +0100 Lennart Poettering
lenn...@poettering.net wrote:

 On Sun, 23.10.11 01:00, Dan Williams (dan.j.willi...@intel.com) wrote:
 
   Well, it would be nice if the md utils would offer something doing this
   without spawning multiple processes and killing them again.
  
  
  /me wonders why his raid5 resyncs every boot on Fedora 15 and has
  found this old thread.
  
  I'm tempted to:
  
  1/ teach ignore_proc() to scan for pid files in /dev/md/ (MDMON_DIR on
  Fedora)
 
 This will not help you.
 
 We nowadays jump back into the initrd when we shut down, so that the
 initrd disassembles everything it assembled at boot time. This for the
 first time enables us to ensure that all layers of our stack are in a
 sane state (i.e. fully offline) when we shut down, regardless in which
 way you stack it.

This sounds particularly elegant.
Is there some part of the filesystem that survives through the whole process
- from before / is mounted until after it is unmounted?
Presumably this would be /run if anything.

mdmon must be running from the time that / becomes writable until after it
becomes readonly.
If we can have it from before it is mounted until after it is unmounted, that
might be even better.
(It is possible to start a new one which replaces the old one but if that was
only used for version upgrades, that would be best).

So if mdmon has a 'cwd' and all open files in /run (and the executable
elsewhere in the same filesystem), could it easily survive the 'kill all
processes before unmounting /' thing?

 
 However, just excluding mdmon from being killed will not make this work,
 simply because jumping into initrd only works sensibly if we can drop
 all references to all previous mounts which requires us to have only one
 process running at that time, and one process only.
 
 It always boils down to the same thing: mdmon must be something we can
 shut down cleanly like every other process. Excluding it from that will
 just move the problem around, but not fix it.

My ideal would be that you just ignore mdmon.
After unmounting '/', you shut down md arrays with mdadm -Ss and then mdmon
will spontaneously disappear.


 
  2/ arrange for mdadm --wait-clean --scan to be called after all
  filesystems have been mounted read only
 
 Won't help you really either, since we have to kill all processes before
 we jump into the initrd to fully disassemble mounts and storage.
 
 There'll always be this chicken and egg problem: we cannot disassemble
 all storage until all processes are gone and we are back in the
 initrd. But mdmon wants to stay running after we 
 
  ...but a few things strike me.  This does not seem to be what was
  being proposed above.  Systemd does not treat dm devices like a
  service and takes care to shut them down explicitly (but in that case
  there is an api that it can call).  Is it time for a libmd.so,  so
  systemd can invoke the --wait-clean --scan process itself?  Probably
  simpler to just SIGTERM mdmon and wait for it.
 
 We actually try to disassemble md already, i.e. we call the
 DM_DEV_REMOVE ioctl for all left-over devices. I am not really
 interested to link against libdm itself.

:-)
I'm getting used to this... people confusing md and dm, people confusing nfs-client
with nfs-server, people confusing me with some other Mr Brown :-)

NeilBrown





Re: [systemd-devel] systemd kills mdmon if it was started manually by user

2011-10-31 Thread Lennart Poettering
On Sun, 23.10.11 01:00, Dan Williams (dan.j.willi...@intel.com) wrote:

  Well, it would be nice if the md utils would offer something doing this
  without spawning multiple processes and killing them again.
 
 
 /me wonders why his raid5 resyncs every boot on Fedora 15 and has
 found this old thread.
 
 I'm tempted to:
 
 1/ teach ignore_proc() to scan for pid files in /dev/md/ (MDMON_DIR on
 Fedora)

This will not help you.

We nowadays jump back into the initrd when we shut down, so that the
initrd disassembles everything it assembled at boot time. This for the
first time enables us to ensure that all layers of our stack are in a
sane state (i.e. fully offline) when we shut down, regardless in which
way you stack it.

However, just excluding mdmon from being killed will not make this work,
simply because jumping into initrd only works sensibly if we can drop
all references to all previous mounts which requires us to have only one
process running at that time, and one process only.

It always boils down to the same thing: mdmon must be something we can
shut down cleanly like every other process. Excluding it from that will
just move the problem around, but not fix it.

 2/ arrange for mdadm --wait-clean --scan to be called after all
 filesystems have been mounted read only

Won't help you really either, since we have to kill all processes before
we jump into the initrd to fully disassemble mounts and storage.

There'll always be this chicken and egg problem: we cannot disassemble
all storage until all processes are gone and we are back in the
initrd. But mdmon wants to stay running after we 

 ...but a few things strike me.  This does not seem to be what was
 being proposed above.  Systemd does not treat dm devices like a
 service and takes care to shut them down explicitly (but in that case
 there is an api that it can call).  Is it time for a libmd.so,  so
 systemd can invoke the --wait-clean --scan process itself?  Probably
 simpler to just SIGTERM mdmon and wait for it.

We actually try to disassemble md already, i.e. we call the
DM_DEV_REMOVE ioctl for all left-over devices. I am not really
interested to link against libdm itself.

Lennart

-- 
Lennart Poettering - Red Hat, Inc.


Re: [systemd-devel] systemd kills mdmon if it was started manually by user

2011-10-31 Thread Lennart Poettering
On Mon, 31.10.11 12:06, Lennart Poettering (lenn...@poettering.net) wrote:

 We actually try to disassemble md already, i.e. we call the
 DM_DEV_REMOVE ioctl for all left-over devices. I am not really
 interested to link against libdm itself.

Sorry, wasn't fully woken up yet and mixed up dm and md here. Ignore
this sentence...

Lennart

-- 
Lennart Poettering - Red Hat, Inc.


Re: [systemd-devel] systemd kills mdmon if it was started manually by user

2011-10-24 Thread Thomas Jarosch
On Sunday, 23. October 2011 10:00:36 Dan Williams wrote:
 Is it time for a libmd.so, so systemd can invoke the --wait-clean --scan
 process itself?  Probably simpler to just SIGTERM mdmon and wait for it.

The mdadm code makes good use of non-reentrant functions like ctime(), 
readdir() and others. Luckily systemd is single threaded.

If we provide a public interface, that would need fixing though.

Cheers,
Thomas


Re: [systemd-devel] systemd kills mdmon if it was started manually by user

2011-10-24 Thread NeilBrown
On Sun, 23 Oct 2011 01:00:36 -0700 Dan Williams dan.j.willi...@intel.com
wrote:

 On Tue, Feb 8, 2011 at 9:28 AM, Lennart Poettering
 lenn...@poettering.net wrote:
  On Tue, 08.02.11 16:54, Andrey Borzenkov (arvidj...@mail.ru) wrote:
 
   a) mdmon is perfectly capable of restarting, it is already used to
   take over mdmon launched in initrd. The problem is to know when to
   restart - i.e. when respective libraries are changed. This is a job
   for package management in distribution. It is already employed for
   glibc, systemd and some others and can just as well be employed for
   mdmon. And this is totally unrelated to systemd :)
  
   Really, you are saying there is a synchronous way to make mdmon reexec
   itself? How does that work?
  
 
  I am not sure whether it qualifies as synchronous, but mdmon
  --takeover will kill any existing mdmon for this and start monitoring
  itself.
 
  I wonder if this is really fully synchronous, i.e. that a) there is no
  point in time where mdmon is not running during this restart and b) the
  mdmon --takeover command returns when the new daemon is fully up, and
  not right-away.
 
   Well, the root file systems cannot be unmounted, only remounted.
  
   So, if there is a way to invoke mdmon so that it flushes all metadata
   changes to disk and immediately terminates, then this should be all we
   need for a clean solution. We'd then shut the normal instances of
   mdmon down like any other daemon and simply invoke this metadata
   flushing command as part of late shutdown.
 
 
  Hmm ... it looks like you just need to
 
  start mdmon
  do mdadm --wait-clean
 
  After this you can kill mdmon again (assuming the device is no longer in
  use).
 
 
  Well, it would be nice if the md utils would offer something doing this
  without spawning multiple processes and killing them again.
 
 
 /me wonders why his raid5 resyncs every boot on Fedora 15 and has
 found this old thread.
 
 I'm tempted to:
 
 1/ teach ignore_proc() to scan for pid files in /dev/md/ (MDMON_DIR on Fedora)
 2/ arrange for mdadm --wait-clean --scan to be called after all
 filesystems have been mounted read only
 
 ...but a few things strike me.  This does not seem to be what was
 being proposed above.  Systemd does not treat dm devices like a
 service and takes care to shut them down explicitly (but in that case
 there is an api that it can call).  Is it time for a libmd.so,  so
 systemd can invoke the --wait-clean --scan process itself?  Probably
 simpler to just SIGTERM mdmon and wait for it.
 
 --
 Dan

Hi Dan,
  could you please explain in a bit more detail exactly what you think it is
  that is going wrong for you?

  I don't think it is anything like the original problem, as I don't think
  you are starting the array manually.

  I think your problem is that 'mdmon' is being killed too early at shutdown.
  Clearly we need to get whatever-kills-user-processes to skip mdmon - maybe by
  writing the pid to some magic file that 'ignore_proc' already knows about?

  Ultimately we probably want to get udev to start mdmon for us and have
  mdadm notice and not start it itself.
  We also need to get udev to notice arrays that are being reshaped and to
  start the mdadm which monitors the reshape so that mdadm doesn't have to
  fork it itself.

  That should fix the original problem, but I don't think it addresses your
  problem at all.

  I don't have a Fedora install so I cannot hunt around to see what is
  happening.

  I don't like the idea for a 'libmd.so' at all - certainly not until the
  problem is properly understood and other solutions (like running
  scripts) prove ineffective.

NeilBrown




Re: [systemd-devel] systemd kills mdmon if it was started manually by user

2011-10-23 Thread Dan Williams
On Tue, Feb 8, 2011 at 9:28 AM, Lennart Poettering
lenn...@poettering.net wrote:
 On Tue, 08.02.11 16:54, Andrey Borzenkov (arvidj...@mail.ru) wrote:

  a) mdmon is perfectly capable of restarting, it is already used to
  take over mdmon launched in initrd. The problem is to know when to
  restart - i.e. when respective libraries are changed. This is a job
  for package management in distribution. It is already employed for
  glibc, systemd and some others and can just as well be employed for
  mdmon. And this is totally unrelated to systemd :)
 
  Really, you are saying there is a synchronous way to make mdmon reexec
  itself? How does that work?
 

 I am not sure whether it qualifies as synchronous, but mdmon
 --takeover will kill any existing mdmon for this and start monitoring
 itself.

 I wonder if this is really fully synchronous, i.e. that a) there is no
 point in time where mdmon is not running during this restart and b) the
 mdmon --takeover command returns when the new daemon is fully up, and
 not right-away.

  Well, the root file systems cannot be unmounted, only remounted.
 
  So, if there is a way to invoke mdmon so that it flushes all metadata
  changes to disk and immediately terminates, then this should be all we
  need for a clean solution. We'd then shut the normal instances of
  mdmon down like any other daemon and simply invoke this metadata
  flushing command as part of late shutdown.


 Hmm ... it looks like you just need to

 start mdmon
 do mdadm --wait-clean

 After this you can kill mdmon again (assuming the device is no longer in
 use).


 Well, it would be nice if the md utils would offer something doing this
 without spawning multiple processes and killing them again.


/me wonders why his raid5 resyncs every boot on Fedora 15 and has
found this old thread.

I'm tempted to:

1/ teach ignore_proc() to scan for pid files in /dev/md/ (MDMON_DIR on Fedora)
2/ arrange for mdadm --wait-clean --scan to be called after all
filesystems have been mounted read only
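A rough Python sketch of idea 1/, assuming mdmon leaves one <name>.pid file per instance in MDMON_DIR (/dev/md/ on Fedora, as stated above; other distributions use other directories). mdmon_pids and should_ignore_pid are hypothetical stand-ins, not systemd's ignore_proc(). A stale pid file could name an unrelated process that reused the PID, which a real implementation would have to guard against.

```python
import os
from typing import Set


def mdmon_pids(mdmon_dir: str = "/dev/md") -> Set[int]:
    """Collect PIDs from the *.pid files found in mdmon_dir."""
    pids: Set[int] = set()
    try:
        names = os.listdir(mdmon_dir)
    except OSError:
        return pids  # directory missing or unreadable: nothing to protect
    for name in names:
        if not name.endswith(".pid"):
            continue  # skip sockets and other files
        try:
            with open(os.path.join(mdmon_dir, name)) as f:
                pids.add(int(f.read().strip()))
        except (OSError, ValueError):
            continue  # unreadable or malformed pid file
    return pids


def should_ignore_pid(pid: int, mdmon_dir: str = "/dev/md") -> bool:
    """True if the final kill loop should leave this PID alone."""
    return pid in mdmon_pids(mdmon_dir)
```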

...but a few things strike me.  This does not seem to be what was
being proposed above.  Systemd does not treat dm devices like a
service and takes care to shut them down explicitly (but in that case
there is an api that it can call).  Is it time for a libmd.so,  so
systemd can invoke the --wait-clean --scan process itself?  Probably
simpler to just SIGTERM mdmon and wait for it.

--
Dan


Re: [systemd-devel] systemd kills mdmon if it was started manually by user

2011-02-09 Thread Lennart Poettering
On Tue, 08.02.11 12:07, Lennart Poettering (lenn...@poettering.net) wrote:

  At this point we know it is a container, know that it has external
  metadata and know that we need external metadata handler (mdmon). But
  it is too late for systemd.
 
 Kay, do you know why this change event is used here? Any chance we can
 get rid of it?

So, it seems that the change event does make some sense here. I have
now added a new property to systemd: if you set SYSTEMD_READY=0 on a
udev device then systemd will consider it unplugged even if it shows up
in the udev tree. If this property is not set for a device, or is set to
1 we will consider the device plugged.

To make this md stuff compatible with systemd we hence just need to set
SYSTEMD_READY=0 during the new event and drop it when the device is
fully set up. 
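The plugged/unplugged rule described here can be modelled in a few lines of Python, using udev property dictionaries like the ones shown elsewhere in this thread. This is a simplified model, not systemd's code: it assumes the device must carry the "systemd" udev tag (as in TAGS=:systemd:) and treats only the literal value 0 as "not ready". The function name is hypothetical.

```python
from typing import Mapping


def systemd_considers_plugged(udev_env: Mapping[str, str]) -> bool:
    """Simplified model of when systemd treats a udev device as plugged."""
    # Only devices tagged "systemd" (e.g. TAGS=:systemd:) get device units.
    tags = udev_env.get("TAGS", "").strip(":").split(":")
    if "systemd" not in tags:
        return False
    # Unset or "1" means plugged; SYSTEMD_READY=0 hides the device.
    return udev_env.get("SYSTEMD_READY", "1") != "0"
```

For md, the idea is that a rule sets SYSTEMD_READY=0 on the add event, when nothing is known about the array yet, and no longer sets it once the change event reports the array as fully set up.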

Andrey, since you are playing around with this, do you happen to know
which attribute we should check to set SYSTEMD_READY=0 properly? It
would be cool if we could come up with a default rule for inclusion in
our systemd rules file that will ensure the device only shows up when it
is ready.

Lennart

-- 
Lennart Poettering - Red Hat, Inc.


Re: [systemd-devel] systemd kills mdmon if it was started manually by user

2011-02-08 Thread Lennart Poettering
On Fri, 04.02.11 22:55, Andrey Borzenkov (arvidj...@mail.ru) wrote:

  That's right, but the names are not known in advance and can change
  between reboots. This means such units have to be generated
  dynamically, exist until reboot (ramfs?) and be removed when array is
  destroyed. Not sure it is really manageable.
 
  Hmm? It should be sufficient to just write the service template properly
  (mdmon@.service) and then instantiate it when needed with systemctl
  start mdmon@xyz.service or something equivalent. It's a matter of
  issuing a single dbus call.
 
  And which instance should generate them? mdadm?
 
  I think it is much nicer to spawn the necessary mdadm service instance
  from a udev rule,
 
 Yes, this can be done relatively easily; as proof of concept:
 
 SUBSYSTEM!="block", GOTO="systemd_md_end"
 ACTION!="change", GOTO="systemd_md_end"
 KERNEL!="md*", GOTO="systemd_md_end"
 ATTR{md/metadata_version}=="external:[A-Za-z]*", RUN+="/bin/systemctl start mdmon@%k.service"
 LABEL="systemd_md_end"

Nah, it's much better to simply use the SYSTEMD_WANTS var on the device.

Something like this:

, ENV{SYSTEMD_WANTS}="mdmon@%k.service"

That way the device unit will simply have a wants dep on the service
unit, and this is perfectly discoverable.

 Setting SYSTEMD_WANTS would be more elegant solution, but it does not
 work with current systemd implementation. It is capable of starting
 requested units only on add event (effectively the very first time
 device becomes plugged), while mdmon must be started on change
 event, as only then we know whether mdmon is required at all.

Oha, so you are actually aware of SYSTEMD_WANTS. Hmm. I need to think
about this. Why does md employ the change event? Is this really
necessary, smells a bit foul.

 Running mdmon via systemd in this way opens up an interesting
 possibility. E.g. the service could be declared immortal and be exempt
 from usual shutdown sequence ... or is it possible to do already?

A service needs to conflict with shutdown.target to be shut down when we
go down normally. If your service does not conflict with shutdown.target
then it will stay around and be killed only after systemd is gone and
PID1 is systemd-shutdown which then kills all processes remaining
(independent of any idea of service) and then unmounts all file
systems. Normally all services conflict with shutdown.target implicitly,
which you can turn off by setting DefaultDependencies=no.

 Actually it can be implemented even without mdadm patches; apparently
 it is possible to suppress normal starting of mdmon by setting
 MDADM_NO_MDMON=1

At this point mdmon is simply broken: if glibc or mdmon itself (or any
lib it is using) is upgraded, then mdmon will keep referencing the old
.so or binary as long as it is running. This means that the fs these
files are on cannot be remounted r/o. However mdmon insists on being
shut down only after all fs got remounted ro. So you have a cyclic
ordering loop here: mdmon wants to be shut down after the remount, but
we need to shut it down before the remount. 
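One way to find processes in the situation described here is to look for file-backed mappings that /proc/$PID/maps reports with a " (deleted)" suffix, which is how Linux marks a mapped file whose directory entry has been removed or replaced. A minimal Python sketch follows; the helper names are hypothetical, and the suffix test is a heuristic (a file genuinely named "... (deleted)" would be misreported).

```python
from typing import List

DELETED_SUFFIX = " (deleted)"


def deleted_mappings(maps_text: str) -> List[str]:
    """Return paths of mapped files marked deleted in /proc/<pid>/maps text."""
    paths: List[str] = []
    for line in maps_text.splitlines():
        # Fields: address perms offset dev inode [pathname]; the pathname
        # may contain spaces, so split at most five times.
        parts = line.split(None, 5)
        if len(parts) < 6:
            continue  # anonymous mapping without a pathname
        path = parts[5]
        if path.endswith(DELETED_SUFFIX):
            path = path[: -len(DELETED_SUFFIX)]
            if path not in paths:
                paths.append(path)  # keep first-seen order, no duplicates
    return paths


def pid_holds_deleted_files(pid: int) -> List[str]:
    """Deleted files still mapped by a live process ([] if unreadable)."""
    try:
        with open(f"/proc/{pid}/maps") as f:
            return deleted_mappings(f.read())
    except OSError:
        return []
```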

This is unfixable unless a) mdmon learns reexecution of itself without
losing state (like most init systems do), or b) mdmon would stop
insisting on being shutdown only after the remount.

In my eyes b) is very much preferable: It should be possible to shut
down mdmon like any other service. And if then some md related code
still needs to be run on late shutdown this should be done from a new
process. I would be willing to add some hooks for this, so that we can
execute arbitrary drop-in processes as part of the final shutdown loop.

Lennart

-- 
Lennart Poettering - Red Hat, Inc.


Re: [systemd-devel] systemd kills mdmon if it was started manually by user

2011-02-08 Thread Andrey Borzenkov
On Tue, Feb 8, 2011 at 12:48 PM, Lennart Poettering
lenn...@poettering.net wrote:
 On Fri, 04.02.11 22:55, Andrey Borzenkov (arvidj...@mail.ru) wrote:

  That's right, but the names are not known in advance and can change
  between reboots. This means such units have to be generated
  dynamically, exist until reboot (ramfs?) and be removed when array is
  destroyed. Not sure it is really manageable.
 
  Hmm? It should be sufficient to just write the service template properly
  (mdmon@.service) and then instantiate it when needed with systemctl
  start mdmon@xyz.service or something equivalent. It's a matter of
  issuing a single dbus call.
 
  And which instance should generate them? mdadm?
 
  I think it is much nicer to spawn the necessary mdadm service instance
  from a udev rule,

 Yes, this can be done relatively easily; as proof of concept:

 SUBSYSTEM!="block", GOTO="systemd_md_end"
 ACTION!="change", GOTO="systemd_md_end"
 KERNEL!="md*", GOTO="systemd_md_end"
 ATTR{md/metadata_version}=="external:[A-Za-z]*", RUN+="/bin/systemctl start mdmon@%k.service"
 LABEL="systemd_md_end"

 Nah, it's much better to simply use the SYSTEMD_WANTS var on the device.

 Something like this:

 , ENV{SYSTEMD_WANTS}="mdmon@%k.service"

 That way the device unit will simply have a wants dep on the service
 unit, and this is perfectly discoverable.

 Setting SYSTEMD_WANTS would be more elegant solution, but it does not
 work with current systemd implementation. It is capable of starting
 requested units only on add event (effectively the very first time
 device becomes plugged), while mdmon must be started on change
 event, as only then we know whether mdmon is required at all.

 Oha, so you are actually aware of SYSTEMD_WANTS. Hmm. I need to think
 about this. Why does md employ the change event? Is this really
 necessary, smells a bit foul.


I am probably the wrong one to ask, but here is what happens when an
array is started (from udev's perspective):

UDEV  [1297507039.109828] add  /devices/virtual/block/md127 (block)
UDEV_LOG=3
ACTION=add
DEVPATH=/devices/virtual/block/md127
SUBSYSTEM=block
DEVNAME=/dev/md127
DEVTYPE=disk
SEQNUM=1742
UDISKS_PRESENTATION_NOPOLICY=1
MAJOR=9
MINOR=127
TAGS=:systemd:

After this event the device becomes plugged and SYSTEMD_WANTS (if any) are
triggered. But at this point we have zero information about the array to
decide anything.

UDEV  [1297507039.211940] change   /devices/virtual/block/md127 (block)
UDEV_LOG=3
ACTION=change
DEVPATH=/devices/virtual/block/md127
SUBSYSTEM=block
DEVNAME=/dev/md127
DEVTYPE=disk
SEQNUM=1743
MD_LEVEL=container
MD_DEVICES=2
MD_METADATA=ddf
MD_UUID=f8362f39:0436b20f:cf338104:afec436e
MD_DEVNAME=ddf0
UDISKS_PRESENTATION_NOPOLICY=1
MAJOR=9
MINOR=127
DEVLINKS=/dev/disk/by-id/md-uuid-f8362f39:0436b20f:cf338104:afec436e
/dev/md/ddf0
TAGS=:systemd:

At this point we know it is a container, know that it has external
metadata and know that we need external metadata handler (mdmon). But
it is too late for systemd.
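The decision that only becomes possible at the change event can be written as a small Python predicate. It reuses the pattern from the proof-of-concept udev rule in this thread, ATTR{md/metadata_version}=="external:[A-Za-z]*", on the assumption that a container such as this DDF one reports a value like external:ddf in sysfs, while member arrays typically report external:/<container>/<n> and native-metadata arrays a plain version such as 1.2. The function names are hypothetical.

```python
import re
from typing import Optional

# Equivalent of the udev glob "external:[A-Za-z]*": "external:" followed by
# a letter, then anything. re.match anchors at the start of the string.
_CONTAINER_RE = re.compile(r"external:[A-Za-z]")


def needs_mdmon(metadata_version: str) -> bool:
    """True if md/metadata_version denotes a container with external metadata."""
    return _CONTAINER_RE.match(metadata_version.strip()) is not None


def mdmon_unit_for(kernel_name: str, metadata_version: str) -> Optional[str]:
    """Service instance to pull in for this device, or None if not needed."""
    if needs_mdmon(metadata_version):
        return f"mdmon@{kernel_name}.service"
    return None
```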


 Actually it can be implemented even without mdadm patches; apparently
 it is possible to suppress normal starting of mdmon by setting
 MDADM_NO_MDMON=1

 At this point mdmon is simply broken: if glibc or mdmon itself (or any
 lib it is using) is upgraded, then mdmon will keep referencing the old
 .so or binary as long as it is running. This means that the fs these
 files are on cannot be remounted r/o. However mdmon insists on being
 shut down only after all fs got remounted ro. So you have a cyclic
 ordering loop here: mdmon wants to be shut down after the remount, but
 we need to shut it down before the remount.


Ehh ...

a) mdmon is perfectly capable of restarting, it is already used to
take over mdmon launched in initrd. The problem is to know when to
restart - i.e. when respective libraries are changed. This is a job
for package management in distribution. It is already employed for
glibc, systemd and some others and can just as well be employed for
mdmon. And this is totally unrelated to systemd :)

b) having a binary launched off some fs should not prevent this fs from being
remounted ro - binaries are not opened rw

 This is unfixable unless a) mdmon learns reexecution of itself without
 losing state (like most init systems do), or b) mdmon would stop
 insisting on being shutdown only after the remount.


As far as I can tell, both are true today; but remounting is not
enough, unfortunately.

 In my eyes b) is very much preferable: It should be possible to shut
 down mdmon like any other service. And if then some md related code
 still needs to be run on late shutdown this should be done from a new
 process. I would be willing to add some hooks for this, so that we can
 execute arbitrary drop-in processes as part of the final shutdown loop.


mdmon is needed to ensure metadata were correctly updated. So it needs
to exist as long as metadata *may* be updated. For practical purposes
it means - until file system is unmounted and flushed to disks. I am
not sure that remounting ro stops 

Re: [systemd-devel] systemd kills mdmon if it was started manually by user

2011-02-08 Thread Andrey Borzenkov
On Tue, Feb 8, 2011 at 2:07 PM, Lennart Poettering
lenn...@poettering.net wrote:
 On Tue, 08.02.11 13:52, Andrey Borzenkov (arvidj...@mail.ru) wrote:

 I am probably the wrong one to ask, but here is what happens when an
 array is started (from udev's perspective):

 [...]

 After this event the device becomes plugged and SYSTEMD_WANTS (if any) are
 triggered. But at this point we have zero information about the array to
 decide anything.

 [...]

 At this point we know it is a container, know that it has external
 metadata and know that we need external metadata handler (mdmon). But
 it is too late for systemd.

 Kay, do you know why this change event is used here? Any chance we can
 get rid of it?


 
  Actually it can be implemented even without mdadm patches; apparently
  it is possible to suppress normal starting of mdmon by setting
  MDADM_NO_MDMON=1
 
  At this point mdmon is simply broken: if glibc or mdmon itself (or any
  lib it is using) is upgraded, then mdmon will keep referencing the old
  .so or binary as long as it is running. This means that the fs these
  files are on cannot be remounted r/o. However mdmon insists on being
  shut down only after all fs got remounted ro. So you have a cyclic
  ordering loop here: mdmon wants to be shut down after the remount, but
  we need to shut it down before the remount.
 

 Ehh ...

 a) mdmon is perfectly capable of restarting, it is already used to
 take over mdmon launched in initrd. The problem is to know when to
 restart - i.e. when respective libraries are changed. This is a job
 for package management in distribution. It is already employed for
 glibc, systemd and some others and can just as well be employed for
 mdmon. And this is totally unrelated to systemd :)

 Really, you are saying there is a synchronous way to make mdmon reexec
 itself? How does that work?


I am not sure whether it qualifies as synchronous, but mdmon
--takeover will kill any existing mdmon for this and start monitoring
itself.

 b) having a binary launched off some fs should not prevent this fs from being
 remounted ro - binaries are not opened rw

 If you run a binary and then the package manager replaces it then the
 running instance will still refer to the old copy and this will have the
 effect that the file isn't actually deleted until the process
 exits/execs. And because that is the way it is the kernel will refuse
 unmounting of the fs until you terminated/reexeced your process.

  This is unfixable unless a) mdmon learns reexecution of itself without
  losing state (like most init systems do), or b) mdmon would stop
  insisting on being shutdown only after the remount.

 As far as I can tell, both are true today; but remounting is not
 enough, unfortunately.

 So, you are saying we can shut down mdmon without ill effects early?


At least that's what I see. You can shut down mdmon and continue to
work with file system, even if it is mounted rw. Under some conditions
mount will hang; i.e.

start array
kill mdmon
try to mount

mount will hang. If you start mdmon, it is mounted. But if you now

umount
kill mdmon
mount

it is mounted just fine.

  In my eyes b) is very much preferable: It should be possible to shut
  down mdmon like any other service. And if then some md related code
  still needs to be run on late shutdown this should be done from a new
  process. I would be willing to add some hooks for this, so that we can
  execute arbitrary drop-in processes as part of the final shutdown loop.

 mdmon is needed to ensure metadata were correctly updated. So it needs
 to exist as long as metadata *may* be updated. For practical purposes
 it means - until file system is unmounted and flushed to disks. I am
 not sure that remounting ro stops all activity (at least, mounting ro
 definitely *writes* to device using some filesystems).

 Well, the root file systems cannot be unmounted, only remounted.

 So, if there is a way to invoke mdmon so that it flushes all metadata
 changes to disk and immediately terminates, then this should be all we
 need for a clean solution. We'd then shut the normal instances of
 mdmon down like any other daemon and simply invoke this metadata
 flushing command as part of late shutdown.


Hmm ... it looks like you just need to

start mdmon
do mdadm --wait-clean

After this you can kill mdmon again (assuming the device is no longer in use).
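This recipe can be sketched as a dry-run Python helper that only builds the command list unless told to execute it. Assumptions: the container is named on the mdmon command line (md127 below is illustrative), and mdmon detaches into the background so the first command returns. Killing mdmon afterwards is left out because the location of its pid file differs between distributions. The function names are hypothetical.

```python
import subprocess
from typing import List


def wait_clean_commands(container: str) -> List[List[str]]:
    """Commands for: start mdmon, then wait for the arrays to be marked clean."""
    return [
        ["mdmon", container],                 # start the metadata handler for this container
        ["mdadm", "--wait-clean", "--scan"],  # wait until the arrays are recorded as clean
    ]


def run_wait_clean(container: str, dry_run: bool = True) -> List[List[str]]:
    """Return the commands; execute them in order only if dry_run is False."""
    cmds = wait_clean_commands(container)
    if not dry_run:
        for cmd in cmds:
            subprocess.run(cmd, check=True)
    return cmds
```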


Re: [systemd-devel] systemd kills mdmon if it was started manually by user

2011-02-08 Thread Lennart Poettering
On Tue, 08.02.11 16:54, Andrey Borzenkov (arvidj...@mail.ru) wrote:

  a) mdmon is perfectly capable of restarting, it is already used to
  take over mdmon launched in initrd. The problem is to know when to
  restart - i.e. when respective libraries are changed. This is a job
  for package management in distribution. It is already employed for
  glibc, systemd and some others and can just as well be employed for
  mdmon. And this is totally unrelated to systemd :)
 
  Really, you are saying there is a synchronous way to make mdmon reexec
  itself? How does that work?
 
 
 I am not sure whether it qualifies as synchronous, but mdmon
 --takeover will kill any existing mdmon for this and start monitoring
 itself.

I wonder if this is really fully synchronous, i.e. that a) there is no
point in time where mdmon is not running during this restart and b) the
mdmon --takeover command returns when the new daemon is fully up, and
not right-away.
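Point b) is the usual contract of a "forking" daemon: systemd's Type=forking treats the exit of the parent process as the end of start-up, so the parent should exit only once the child is ready. A minimal C sketch of that handshake, assuming a pipe carries a one-byte readiness message; fork_and_wait_ready() and notify_ready() are hypothetical names, not mdmon code, and the sketch does not address point a), the gap-free handover:

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper. In the child: returns 0 and stores the write end
 * of a readiness pipe in *ready_fd. In the parent: blocks until the child
 * calls notify_ready(), then returns the child's PID, or returns -1 if the
 * child exited (closing the pipe) without reporting readiness. */
pid_t fork_and_wait_ready(int *ready_fd)
{
    assert(ready_fd != NULL);
    int fds[2];
    if (pipe(fds) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {                     /* child: keep only the write end */
        close(fds[0]);
        *ready_fd = fds[1];
        return 0;
    }

    close(fds[1]);                      /* parent: keep only the read end */
    char byte;
    ssize_t n;
    do {
        n = read(fds[0], &byte, 1);
    } while (n < 0 && errno == EINTR);
    close(fds[0]);
    if (n != 1) {                       /* EOF: child died before ready */
        waitpid(pid, NULL, 0);
        return -1;
    }
    return pid;
}

/* Hypothetical: called by the child once it is fully initialised. */
void notify_ready(int ready_fd)
{
    const char ok = 1;
    ssize_t n;
    do {
        n = write(ready_fd, &ok, 1);
    } while (n < 0 && errno == EINTR);
    close(ready_fd);
}
```

The child calls notify_ready() only after it has taken over monitoring; the parent exits (or returns to its caller) only after fork_and_wait_ready() has returned a PID, so the start command cannot return early.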

  Well, the root file systems cannot be unmounted, only remounted.
 
  So, if there is a way to invoke mdmon so that it flushes all metadata
  changes to disk and immediately terminates, then this should be all we
  need for a clean solution. We'd then shut the normal instances of
  mdmon down like any other daemon and simply invoke this metadata
  flushing command as part of late shutdown.
 
 
 Hmm ... it looks like you just need to
 
 start mdmon
 do mdadm --wait-clean
 
 After this you can kill mdmon again (assuming the device is no longer in
 use).


Well, it would be nice if the md utilities offered something that does this
without spawning multiple processes and killing them again.

Lennart

-- 
Lennart Poettering - Red Hat, Inc.


Re: [systemd-devel] systemd kills mdmon if it was started manually by user

2011-02-04 Thread Andrey Borzenkov
On Tue, Jan 25, 2011 at 7:28 AM, Lennart Poettering
lenn...@poettering.net wrote:
 On Tue, 25.01.11 06:58, Andrey Borzenkov (arvidj...@mail.ru) wrote:

  systemd supports instantiated services, for example to deal with the
  gettys (e.g. getty@tty5.service). It should be trivial to use the same
  for mdmon (e.g. mdmon@md3.service).
 
 That's right, but the names are not known in advance and can change
 between reboots. This means such units have to be generated
 dynamically, exist until reboot (ramfs?) and be removed when the array is
 destroyed. Not sure it is really manageable.

 Hmm? It should be sufficient to just write the service template properly
 (mdmon@.service) and then instantiate it when needed with systemctl
 start mdmon@xyz.service or something equivalent. It's a matter of
 issuing a single D-Bus call.

 And which instance should generate them? mdadm?

 I think it is much nicer to spawn the necessary mdmon service instance
 from a udev rule,

Yes, this can be done relatively easily; as a proof of concept:

SUBSYSTEM!="block", GOTO="systemd_md_end"
ACTION!="change", GOTO="systemd_md_end"
KERNEL!="md*", GOTO="systemd_md_end"
ATTR{md/metadata_version}=="external:[A-Za-z]*", RUN+="/bin/systemctl start mdmon@%k.service"
LABEL="systemd_md_end"

where mdmon@.service is


[Unit]
Description=mdmon service
BindTo=dev-%i.device
After=dev-%i.device

[Service]
Type=forking
PIDFile=/dev/.mdadm/%i.pid
ExecStart=/sbin/mdmon --takeover %i


With the result:

[root@localhost ~]# systemctl status mdmon@md127.service
mdmon@md127.service - mdmon service
  Loaded: loaded (/etc/systemd/system/mdmon@.service)
  Active: active (running) since Tue, 08 Feb 2011 09:43:30 -0500; 5min ago
 Process: 1467 ExecStart=/sbin/mdmon --takeover %i (code=exited, status=0/SUCCESS)
Main PID: 1468 (mdmon)
  CGroup: name=systemd:/system/mdmon@.service/md127
          └ 1468 /sbin/mdmon --takeover md127

Setting SYSTEMD_WANTS would be a more elegant solution, but it does not
work with the current systemd implementation. It can start the
requested units only on the add event (effectively, the very first time
the device becomes plugged), while mdmon must be started on the change
event, as only then do we know whether mdmon is required at all.
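The decision the rule above makes on the change event can be illustrated outside udev. A minimal C sketch, assuming the same glob as the ATTR{md/metadata_version} match and using POSIX fnmatch(3); needs_mdmon() and device_needs_mdmon() are hypothetical helpers, not part of mdadm or udev. The [A-Za-z] presumably restricts the match to container devices such as external:imsm, while member arrays report a value that continues with a "/" path:

```c
#include <assert.h>
#include <fnmatch.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Same glob as the udev rule: externally managed metadata whose name
 * starts with a letter. Hypothetical helper for illustration. */
bool needs_mdmon(const char *metadata_version)
{
    assert(metadata_version != NULL);
    return fnmatch("external:[A-Za-z]*", metadata_version, 0) == 0;
}

/* Read /sys/block/<dev>/md/metadata_version (the attribute udev matches
 * as ATTR{md/metadata_version}); false if it cannot be read. */
bool device_needs_mdmon(const char *dev)
{
    assert(dev != NULL);
    char path[256], value[128];
    snprintf(path, sizeof path, "/sys/block/%s/md/metadata_version", dev);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return false;
    bool ok = fgets(value, sizeof value, f) != NULL;
    fclose(f);
    if (!ok)
        return false;
    value[strcspn(value, "\n")] = '\0';     /* strip trailing newline */
    return needs_mdmon(value);
}
```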

Running mdmon via systemd in this way opens up an interesting
possibility. E.g. the service could be declared immortal and be exempt
from the usual shutdown sequence ... or is that possible already?

Actually, it can be implemented even without mdadm patches; apparently
it is possible to suppress mdadm's normal starting of mdmon by setting
MDADM_NO_MDMON=1.

   but you could even run it from mdadm by invoking one
 dbus call from it.


It may still turn out to be necessary. If a container needs mdmon, the
arrays it contains won't become read-write until mdmon is started. If
mdmon is started asynchronously by udev, there is a window where someone
may try to use an array before it is rw. A trivial example is a mount
unit which depends on the md device unit.

I do not think the mdadm maintainer will be happy to add a D-Bus
dependency to something that is likely to be included in initrd, though :)
But maybe we could simply try execl("/bin/systemctl", "start", ...) before
the current execl("/sbin/mdmon", ...)?
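Note that execl() only returns if the binary cannot be executed at all, so trying an execl() of systemctl first would fall back to mdmon only when /bin/systemctl is missing, not when systemd is not running or the start job fails. A minimal C sketch of the idea that forks and checks the exit status instead; run_wait() and spawn_mdmon() are hypothetical helpers, not mdadm code, and the paths are the ones used in this thread:

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run path with argv in a child process and wait for it. Returns the
 * exit status, 127 if the binary could not be executed, or -1 if fork or
 * wait failed or the child was killed by a signal. */
int run_wait(const char *path, char *const argv[])
{
    assert(path != NULL && argv != NULL);
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        execv(path, argv);
        _exit(127);                     /* reached only if execv failed */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Hypothetical: ask systemd to start mdmon@<devname>.service and, if that
 * fails for any reason, start mdmon for the device directly. */
int spawn_mdmon(const char *devname)
{
    assert(devname != NULL);
    char unit[256];
    int len = snprintf(unit, sizeof unit, "mdmon@%s.service", devname);
    if (len < 0 || (size_t)len >= sizeof unit)
        return -1;

    char *systemctl_argv[] = { "systemctl", "start", unit, NULL };
    if (run_wait("/bin/systemctl", systemctl_argv) == 0)
        return 0;

    char *mdmon_argv[] = { "mdmon", (char *)devname, NULL };
    return run_wait("/sbin/mdmon", mdmon_argv) == 0 ? 0 : -1;
}
```

Because systemctl start waits for the start job by default, a zero exit status here would also close the read-write window described above, provided the unit itself only reports success once mdmon is ready.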


Re: [systemd-devel] systemd kills mdmon if it was started manually by user

2011-01-24 Thread Lennart Poettering
On Sat, 22.01.11 20:55, Andrey Borzenkov (arvidj...@mail.ru) wrote:

  mdmon does not belong to the user. The user is not even aware that it
  is started. And it is likely not the last such case. So systemd does
  need some framework which can move such processes out of the user
  session. It probably needs some sd_daemon API to notify systemd that it
  is a system-level task even if it was started as a result of user
  interaction.
 
   Well, it is started by the user, so it belongs to the user. And
  systemd has an API to start a system-level task as a result of
  user interaction: it is called systemctl start mdmon.service.
 
 
 mdmon is not a singleton - it is started for every array that needs it
 (not every array needs it). Can you pass extra parameters via systemctl
 that identify the object mdmon should monitor?

systemd supports instantiated services, for example to deal with the
gettys (e.g. getty@tty5.service). It should be trivial to use the same
for mdmon (e.g. mdmon@md3.service).
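For an instantiated unit, systemd substitutes the part of the unit name between "@" and the suffix for the %i specifier in the template, so mdmon@md3.service runs the template's ExecStart= with %i set to md3. A minimal C sketch of that split, for illustration only; unit_instance() is a hypothetical helper that ignores systemd's name-escaping rules:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Extract the instance part of a templated unit name, i.e. what %i
 * expands to: "mdmon@md127.service" -> "md127". Returns false for plain
 * units ("sshd.service") and for the template itself ("mdmon@.service").
 * Hypothetical helper; does not undo systemd's escaping. */
bool unit_instance(const char *unit, char *out, size_t outlen)
{
    assert(unit != NULL && out != NULL && outlen > 0);
    const char *at = strchr(unit, '@');
    const char *dot = strrchr(unit, '.');   /* start of the unit suffix */
    if (at == NULL || dot == NULL || dot <= at + 1)
        return false;
    size_t n = (size_t)(dot - (at + 1));
    if (n >= outlen)
        return false;
    memcpy(out, at + 1, n);
    out[n] = '\0';
    return true;
}
```

One template file (mdmon@.service) thus covers every array; nothing per-array has to be generated on disk, which is the point of instantiation.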

Lennart

-- 
Lennart Poettering - Red Hat, Inc.


Re: [systemd-devel] systemd kills mdmon if it was started manually by user

2011-01-24 Thread Andrey Borzenkov
On Tue, Jan 25, 2011 at 6:44 AM, Lennart Poettering
lenn...@poettering.net wrote:
 On Sat, 22.01.11 20:55, Andrey Borzenkov (arvidj...@mail.ru) wrote:

  mdmon does not belong to the user. The user is not even aware that it
  is started. And it is likely not the last such case. So systemd does
  need some framework which can move such processes out of the user
  session. It probably needs some sd_daemon API to notify systemd that it
  is a system-level task even if it was started as a result of user
  interaction.
 
   Well, it is started by the user, so it belongs to the user. And
  systemd has an API to start a system-level task as a result of
  user interaction: it is called systemctl start mdmon.service.
 

 mdmon is not a singleton - it is started for every array that needs it
 (not every array needs it). Can you pass extra parameters via systemctl
 that identify the object mdmon should monitor?

 systemd supports instantiated services, for example to deal with the
 gettys (e.g. getty@tty5.service). It should be trivial to use the same
 for mdmon (e.g. mdmon@md3.service).


That's right, but the names are not known in advance and can change
between reboots. This means such units have to be generated
dynamically, exist until reboot (ramfs?) and be removed when the array is
destroyed. Not sure it is really manageable.

And which instance should generate them? mdadm?


Re: [systemd-devel] systemd kills mdmon if it was started manually by user

2011-01-22 Thread Andrey Borzenkov
2010/12/4 Tomasz Torcz to...@pipebreaker.pl:
 On Sat, Dec 04, 2010 at 03:08:05PM +0300, Andrey Borzenkov wrote:
  (/etc/pam.d/system-auth), which automatically creates cgroups per login
  session, which in turn gets killed when the user has completely logged out.
  That is why your mdadm gets terminated, too.

 Sure.

  You can avoid that by adding create-session=0 to it, like:
 
  # grep pam_systemd /etc/pam.d/system-auth
  session     optional    pam_systemd.so create-session=0
 

 But I do want the user session to be created, and systemd was specifically
 extended to properly terminate user sessions on shutdown. It is just
 that mdmon does not belong to the user session at all.

  The man page talks about kill-session= and kill-user= parameters, which
 may be useful to you.


  Which is the recommended way if you want processes (created by the user)
  to live on even when this user has fully logged out.
 

 mdmon does not belong to the user. The user is not even aware that it
 is started. And it is likely not the last such case. So systemd does
 need some framework which can move such processes out of the user
 session. It probably needs some sd_daemon API to notify systemd that it
 is a system-level task even if it was started as a result of user
 interaction.

  Well, it is started by the user, so it belongs to the user. And
 systemd has an API to start a system-level task as a result of
 user interaction: it is called systemctl start mdmon.service.


mdmon is not a singleton - it is started for every array that needs it
(not every array needs it). Can you pass extra parameters via systemctl
that identify the object mdmon should monitor?

Using udev to listen for new array events and starting mdmon from there
looks promising. I do not know whether it is possible to identify such
arrays at that point, though, nor do I have the hardware to test it.


Re: [systemd-devel] systemd kills mdmon if it was started manually by user

2011-01-06 Thread Lennart Poettering
On Sat, 04.12.10 11:41, Andrey Borzenkov (arvidj...@gmail.com) wrote:

 If the user starts an array manually (mdadm -A -s, for example) from
 within a user session and the array needs mdmon, mdmon becomes part of
 the user session control group:

Are you suggesting that mdadm forks off mdmon from within the user
session? This is horribly ugly and broken and they shouldn't do that.

 
 ├ user
 │ └ root
 │   └ 1
 │ ├ 1916 login -- root
 │ ├ 1930 -bash
 │ ├ 1964 gpg-agent --keep-display --daemon --write-env-file /root/.gnup...
 │ └ 2062 mdmon md127
 
 
 It is then killed by systemd during shutdown as part of the user session.

Well, only if you enable complete killing of the user session on
logout, which we currently don't do by default.

I wonder if it would make sense to add an option which kills user
sessions on logout only for uid != 0. This might help here, but only
halfway, since sudo would still break. But anyway, I'll add this to the
todo list.

 This results in a dirty array on the next boot.

Hmm, that shouldn't happen.

 Is there any magic that allows a daemon to be exempted from killing?

Well, I have been discussing this with Kay and we'll most likely add
something like DontKillOnShutdown=yes or so, which, if added to a unit
file, will exempt the service from killing during the normal service
shutdown phase and the first killing spree (but not the second,
post-umount killing spree). But that would of course require mdmon to be
started like any other daemon, and not forked off by mdadm.

That should mostly fix the problem, but then again I do believe that the
whole idea of mdmon is just borked, since it will necessarily pin pages
from the root fs into memory, which creates all kinds of problems, for
example after upgrades (e.g. mdmon maps libc into memory, libc gets
updated and the old libc is deleted, but the deletion cannot be written
to disk as long as mdmon keeps running and pinning it, which prevents the
ultimate unmounting/remounting of the fs).

Lennart

-- 
Lennart Poettering - Red Hat, Inc.


Re: [systemd-devel] systemd kills mdmon if it was started manually by user

2011-01-06 Thread Lennart Poettering
On Sat, 04.12.10 15:08, Andrey Borzenkov (arvidj...@gmail.com) wrote:

  It is then killed by systemd during shutdown as part of the user session.
  This results in a dirty array on the next boot.
 
  Is there any magic that allows a daemon to be exempted from killing?
 
  While your raid should absolutely not be corrupted on the next reboot
  when mdmon receives a SIGTERM,
 
 It won't be corrupted, but it will initiate a rebuild. I have reports
 that such a rebuild may take hours, costing performance and redundancy
 while it runs.

Well, eventually we need to be able to kill mdmon. Otherwise we might
not be able to remount the root dir r/o. How exactly is mdmon supposed
to behave on shutdown?

Lennart

-- 
Lennart Poettering - Red Hat, Inc.


Re: [systemd-devel] systemd kills mdmon if it was started manually by user

2011-01-06 Thread NeilBrown
On Fri, 7 Jan 2011 01:38:27 +0100 Lennart Poettering lenn...@poettering.net
wrote:

 On Sat, 04.12.10 11:41, Andrey Borzenkov (arvidj...@gmail.com) wrote:
 
  If the user starts an array manually (mdadm -A -s, for example) from
  within a user session and the array needs mdmon, mdmon becomes part of
  the user session control group:
 
 Are you suggesting that mdadm forks off mdmon from within the user
 session? This is horribly ugly and broken and they shouldn't do that.

What alternative would you suggest?

A daemon needs to be running while certain md arrays are running and writable.

NeilBrown


Re: [systemd-devel] systemd kills mdmon if it was started manually by user

2011-01-06 Thread Roman Mamedov
On Fri, 7 Jan 2011 02:09:32 +0100
Michael Biebl mbi...@gmail.com wrote:

 2011/1/7 Lennart Poettering lenn...@poettering.net:
 
  Well, I have been discussing this with Kay and we'll most likely add
  something like DontKillOnShutdown=yes or so, which if added to a unit
 
 Make that KillOnShutdown=no, please.

Agreed :) That reminds me of hal-disable-polling --enable-polling
( http://ur1.ca/2rmis )

-- 
With respect,
Roman




Re: [systemd-devel] systemd kills mdmon if it was started manually by user

2011-01-06 Thread Lennart Poettering
On Fri, 07.01.11 12:16, NeilBrown (ne...@suse.de) wrote:

 
 On Fri, 7 Jan 2011 01:38:27 +0100 Lennart Poettering lenn...@poettering.net
 wrote:
 
  On Sat, 04.12.10 11:41, Andrey Borzenkov (arvidj...@gmail.com) wrote:
  
   If the user starts an array manually (mdadm -A -s, for example) from
   within a user session and the array needs mdmon, mdmon becomes part of
   the user session control group:
  
  Are you suggesting that mdadm forks off mdmon from within the user
  session? This is horribly ugly and broken and they shouldn't do that.
 
 What alternative would you suggest?

Start it as a normal service like any other. But if you fork off the
daemon from the user session, then the daemon will run in a very broken
context: the resource limits of the user apply, the audit trail will
point to the user (i.e. /proc/self/loginuid), the cgroup will be the
user's, and the daemon cannot be supervised like every other daemon.
Also, the daemon will inherit all the other process properties from the
user, which is almost certainly wrong: the env block, the signal mask and
so on, gazillions of small properties. Of course, you can reset a big
bunch of them in your code, but that's a race you cannot win: the kernel
adds new process properties all the time, and you'd have to reset them
manually.
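To make the point concrete, here is a minimal C sketch of the kind of clean-up a self-forking daemon would need; reset_inherited_state() is a hypothetical helper, clearenv() is a glibc/Linux extension, and only the signal mask, signal dispositions, environment block and umask are reset. It leaves the resource limits, audit loginuid and cgroup mentioned above untouched, which is exactly the kind of gap described here:

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE                 /* clearenv() and NSIG on glibc */
#endif
#include <assert.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical, deliberately incomplete reset of a few properties a
 * daemon forked from a login shell inherits. path becomes the new PATH.
 * Returns 0 on success, -1 on error. */
int reset_inherited_state(const char *path)
{
    assert(path != NULL);

    /* Unblock every signal the user's shell may have blocked. */
    sigset_t none;
    sigemptyset(&none);
    if (sigprocmask(SIG_SETMASK, &none, NULL) < 0)
        return -1;

    /* Restore default dispositions (ignored signals otherwise stay ignored
     * across fork and exec). SIGKILL/SIGSTOP cannot be changed, and
     * signal() fails harmlessly for numbers reserved by the C library. */
    for (int sig = 1; sig < NSIG; sig++) {
        if (sig != SIGKILL && sig != SIGSTOP)
            (void)signal(sig, SIG_DFL);
    }

    /* Drop the user's environment block and set a minimal PATH. */
    if (clearenv() != 0 || setenv("PATH", path, 1) != 0)
        return -1;

    /* Do not inherit the user's file creation mask. */
    umask(022);
    return 0;
}
```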

It's really essential that daemons are started from a clean process
environment and are detached from the user session. SysV kinda provides
that for everything started on boot, and in a limited way for stuff
started via /sbin/service. systemd provides that too, and much more
correctly. But just forking things off like that is not a good
solution.

A conceivable, relatively simple solution in a systemd world is to pull in
the mdmon service from the udev device. The udev rules would do all the
necessary matching to figure out whether mdmon is needed or not. If you
care about non-systemd environments, something like this of course
becomes a lot more complex.

 A daemon needs to be running while certain md arrays are running and writable.

Well, but auto-spawning it from the user session is not really a usable
solution.

Lennart

-- 
Lennart Poettering - Red Hat, Inc.


Re: [systemd-devel] systemd kills mdmon if it was started manually by user

2010-12-04 Thread Christian Parpart
On Saturday, December 04, 2010 09:41:26 am Andrey Borzenkov wrote:
 If the user starts an array manually (mdadm -A -s, for example) from
 within a user session and the array needs mdmon, mdmon becomes part of
 the user session control group:
 
 ├ user
 │ └ root
 │   └ 1
 │ ├ 1916 login -- root
 │ ├ 1930 -bash
 │ ├ 1964 gpg-agent --keep-display --daemon --write-env-file /root/.gnup...
 │ └ 2062 mdmon md127
 
 
 It is then killed by systemd during shutdown as part of the user session.
 This results in a dirty array on the next boot.
 
 Is there any magic that allows a daemon to be exempted from killing?

While your raid should absolutely not be corrupted on the next reboot
when mdmon receives a SIGTERM, I suspect you're using pam_systemd.so
(/etc/pam.d/system-auth), which automatically creates cgroups per login
session, which in turn gets killed when the user has completely logged out.
That is why your mdadm gets terminated, too.
You can avoid that by adding create-session=0 to it, like:

# grep pam_systemd /etc/pam.d/system-auth
session     optional    pam_systemd.so create-session=0

Which is the recommended way if you want processes (created by the user) to
live on even when this user has fully logged out.

Regards,
Christian Parpart.

p.s.: see pam_systemd(8)


Re: [systemd-devel] systemd kills mdmon if it was started manually by user

2010-12-04 Thread Andrey Borzenkov
On Sat, Dec 4, 2010 at 12:12 PM, Christian Parpart tra...@gentoo.org wrote:
 On Saturday, December 04, 2010 09:41:26 am Andrey Borzenkov wrote:
 If the user starts an array manually (mdadm -A -s, for example) from
 within a user session and the array needs mdmon, mdmon becomes part of
 the user session control group:

 ├ user
 │ └ root
 │   └ 1
 │     ├ 1916 login -- root
 │     ├ 1930 -bash
 │     ├ 1964 gpg-agent --keep-display --daemon --write-env-file /root/.gnup...
 │     └ 2062 mdmon md127


 It is then killed by systemd during shutdown as part of the user session.
 This results in a dirty array on the next boot.

 Is there any magic that allows a daemon to be exempted from killing?

 While your raid should absolutely not be corrupted on the next reboot
 when mdmon receives a SIGTERM,

It won't be corrupted, but it will initiate a rebuild. I have reports
that such a rebuild may take hours, costing performance and redundancy
while it runs.

   I suspect you're using pam_systemd.so

Yes

 (/etc/pam.d/system-auth), which automatically creates cgroups per login
 session, which in turn gets killed when the user has completely logged out.
 That is why your mdadm gets terminated, too.

Sure.

 You can avoid that by adding create-session=0 to it, like:

 # grep pam_systemd /etc/pam.d/system-auth
 session     optional    pam_systemd.so create-session=0


But I do want the user session to be created, and systemd was specifically
extended to properly terminate user sessions on shutdown. It is just
that mdmon does not belong to the user session at all.

 Which is the recommended way if you want processes (created by the user) to
 live on even when this user has fully logged out.


mdmon does not belong to the user. The user is not even aware that it
is started. And it is likely not the last such case. So systemd does
need some framework which can move such processes out of the user
session. It probably needs some sd_daemon API to notify systemd that it
is a system-level task even if it was started as a result of user
interaction.