On Mon, 17 Nov 2014 17:39:07 Gaudenz Steinlin wrote:
> I don't particularly like this in its current state. The line
> "ExecStartPre=-/bin/systemctl start ceph-osd*" seems very wrong to me.

"Very wrong"? Why? IMHO it is elegant, because you can't define a
dependency on a service if you don't know its name (or maybe I just
couldn't find how to write a dependency on something like "ceph-osd@#").
It also works correctly, since it can start only _enabled_ services.

> I'm not a systemd expert but I did not find an easy way to create
> something like a meta-service in a way that looks like integrated into
> systemd. But then I don't think that's needed either. The way the
> current init script tries to start all the different daemons in one
> script always seemed odd to me. Do we need a meta service like this?

Unfortunately we need it for compatibility. Otherwise systemd tries to
start the SysV script, and we end up with a mess much worse than just
having a no-op meta service...

> I agree with this. Having multiple instances per machine of ceph-mon or
> ceph-mds does not make sense. On the other hand your proposed
> implementation uses "%H" which resolves to the hostname. This is not
> compatible with the current implementation in the init script which
> parses the configuration file to find the id of the mds and mon. I'm not
> sure how to solve this, but IMO all distributions should do this in the
> same way and at the very least we need an upgrade path for users that
> don't have the hostname as the id of their mon and mds (like having
> mon.1, mon.2, ... instead of node1, node2, ...). I see 3 possible
> solutions:
>
> - Add a script similar to the code in the current init script which
>   parses the config file to get the id and use that when starting the
>   daemon.
> - Agreement that mons and mds should have their ids equal to the
>   hostname. I don't really like that solution as it seems quite
>   inflexible.
> - Use a service template (with the @) nonetheless. This is probably the
>   simplest solution but requires more manual intervention by the cluster
>   administrator. He has to set the id manually when enabling the service.

I didn't look deeper into this argument, but if I recall correctly you
can't start a daemon without passing the hostname, right? It _seems_ to
work, since it may try to bring up the service even when it doesn't have
a corresponding section in ceph.conf... Would it be enough if upstream
modified the service to ignore the host name supplied on the command
line and use ceph.conf only? I reckon it is merely a documentation
issue, which may be worth mentioning for the transition to systemd,
rather than a reason to introduce fairly ugly workarounds...

> Some other discussion points:
> - Restart policy: I think we should take advantage of the fact that
>   systemd can monitor processes and restart them if they fail. I propose
>   to start the daemon in the foreground (like it's done already) and set
>   "Restart=on-failure". See man systemd.service[1] for the details what
>   this means. Do we need custom values for RestartSec (time to sleep
>   before restart, default 100ms), StartLimitInterval, StartLimitBurst
>   (both related to start rate limiting, default 5 times in 10 seconds)?

See the boilerplate in my "[email protected]". Yes, we need all this, both
to avoid infinite restarts and to avoid restarting services too fast. A
malfunctioning OSD which dies soon after it is restarted can put the
cluster into a permanent "peering" state if it is restarted too often.

One thing which makes me _very very_ unhappy about Ceph is that its OSDs
are unstable, because upstream does not treat them like a
mission-critical service and plugs untested code paths with asserts. I
have had a cascade of OSD failures spreading like bushfire over the
cluster more than twice, and I just can't trust a system like this with
my data. Weeks of down time is just not acceptable...

Nevertheless, the SysV init script does not handle restarts, so for
compatibility and due to the above concerns we may decide not to use
systemd's auto restart facilities. In reality it helps little once an
OSD is crashing, and the crash may be for a good reason, like when read
errors are detected on the HDD...

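For illustration, the restart throttling discussed above could be
expressed roughly as in the fragment below. This is only a sketch and
not the contents of the attached unit file: the Description, the
ExecStart command line, and all three numeric values are assumptions
chosen for the example. The start-limit settings are placed in
[Service], which is where systemd releases current at the time of this
thread accept them.

```ini
# Sketch only: hypothetical OSD template unit, not the attached file.
[Unit]
Description=Ceph object storage daemon (osd.%i)

[Service]
# Assumed command line: stay in the foreground (-f) so systemd can
# supervise the process; %i is the instance name, used as the OSD id.
ExecStart=/usr/bin/ceph-osd -f --cluster ceph --id %i
# Restart only after an unclean exit, as proposed in the quoted text.
Restart=on-failure
# Illustrative values: pause between restarts, and give up after
# 3 starts within 30 minutes instead of restarting forever.
RestartSec=20s
StartLimitInterval=30min
StartLimitBurst=3
```

With settings along these lines, an OSD that keeps dying shortly after
start is left in the failed state once the burst limit is reached,
rather than being restarted in a tight loop.
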
> - Mounting OSD filesystems: For sysvinit the init script mounts the OSD
>   filesystem. None of the proposed systemd solutions mounts any
>   filesystems.

How did you miss "RequiresMountsFor=/var/lib/ceph/osd/ceph-%i" in my
"ceph- [email protected]" file?

>   I think that mounting filesystems should not be done in
>   the ceph init scripts (independent of init system used). What's the
>   reason this was added to the init scripts and can't be done from
>   /etc/fstab like all other filesystems? My preferred solution for
>   systemd is to mount filesystems from /etc/fstab and to have
>   "RequiresMountsFor=/var/lib/ceph/mds/ceph-%i" in the individual
>   service files to ensure that the filesystem is mounted. An alternative
>   would be to create mount units or a generator similar to
>   systemd-fstab-generator. But this sounds like a lot of work for little
>   gain.

I do not create a dedicated mount point for MDS... It sits on the same
partition (RAID-1 + hot spare) as the operating system... I don't want
to assume that MON and MDS services need their own dedicated mount
points.

-- 
All the best,
 Dmitry Smirnov.

---

Each generation imagines itself to be more intelligent than the one that
went before it, and wiser than the one that comes after it.
        -- George Orwell, Review of "A Coat of Many Colours: Occasional
           Essays" by Herbert Read, Poetry Quarterly (Winter 1945)

