On Mon, 17 Nov 2014 17:39:07 Gaudenz Steinlin wrote:
> I don't particularly like this in its current state. The line
> "ExecStartPre=-/bin/systemctl start ceph-osd*" seems very wrong to me.

"Very wrong"? Why? IMHO it is elegant: you can't define a dependency on a
service if you don't know its name (or maybe I just couldn't find how to write
a dependency on something like "ceph-osd@#"). It also works correctly, since
it can start only _enabled_ services.


> I'm not a systemd expert but I did not find an easy way to create
> something like a meta-service in a way that looks integrated into
> systemd. But then I don't think that's needed either. The way the
> current init script tries to start all the different daemons in one
> script always seemed odd to me. Do we need a meta service like this?

Unfortunately we need it for compatibility. Otherwise systemd tries to start the
SysV script and we end up with a mess much worse than just having a no-op meta
service...
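A minimal sketch of such a no-op meta-service, assuming it is installed as
ceph.service so that it shadows /etc/init.d/ceph (systemd does not generate a
unit from a SysV script when a native unit of the same name exists). Only the
ExecStartPre line is taken from this discussion; the Description, the
ExecStart=/bin/true placeholder and the [Install] target are illustrative
assumptions, not the file from the proposed packaging.

```ini
# ceph.service: illustrative compatibility meta-service (sketch, not the packaged file).
# A native unit named ceph.service takes precedence over /etc/init.d/ceph,
# so systemd no longer runs the SysV script under that name.
[Unit]
Description=Ceph meta-service (illustrative sketch)

[Service]
Type=oneshot
RemainAfterExit=yes
# The leading "-" tells systemd to ignore a failure of this command.
# Exec lines are not run through a shell, so the "*" reaches systemctl
# unexpanded and systemctl matches it against unit names itself.
ExecStartPre=-/bin/systemctl start ceph-osd*
# Placeholder main command: the unit itself does nothing else.
ExecStart=/bin/true

[Install]
WantedBy=multi-user.target
```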


> I agree with this. Having multiple instances per machine of ceph-mon or
> ceph-mds does not make sense. On the other hand your proposed
> implementation uses "%H" which resolves to the hostname. This is not
> compatible with the current implementation in the init script which
> parses the configuration file to find the id of the mds and mon. I'm not
> sure how to solve this, but IMO all distributions should do this in the
> same way and at the very least we need an upgrade path for users that
> don't have the hostname as the id of their mon and mds (like having
> mon.1, mon.2, ... instead of node1, node2, ...). I see 3 possible
> solutions:
> 
> - Add a script similar to the code in the current init script which
>   parses the config file to get the id and use that when starting the
>   daemon.
> - Agreement that mons and mds should have their ids equal to the
>   hostname. I don't really like that solution as it seems quite
>   inflexible.
> - Use a service template (with the @) nonetheless. This is probably the
>   simplest solution but requires more manual intervention by the cluster
>   administrator. He has to set the id manually when enabling the service.

I didn't look deeper into this argument, but if I recall correctly you can't
start the daemon without passing the hostname, right? It _seems_ to work since
it may try to bring up the service even when there is no corresponding section
in ceph.conf... Would it be enough if upstream modified the service to ignore
the host name supplied on the command line and use ceph.conf only? I reckon it
is merely a documentation issue which may be worth mentioning for the
transition to systemd, rather than introducing fairly ugly workarounds...



> Some other discussion points:
> - Restart policy: I think we should take advantage of the fact that
>   systemd can monitor processes and restart them if they fail. I propose
>   to start the daemon in the foreground (like it's done already) and set
>   "Restart=on-failure". See man systemd.service[1] for details on what
>   this means. Do we need custom values for RestartSec (time to sleep
>   before restart, default 100ms), StartLimitInterval, StartLimitBurst
>   (both related to start rate limiting, default 5 times in 10 seconds)?

See the boilerplate in my "ceph-osd@.service". Yes, we need all this to avoid
infinite restarts as well as to avoid restarting services too fast. A
malfunctioning OSD which dies soon after it is restarted can put the cluster
into a permanent "peering" state if restarted too often.
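For illustration, the restart-related directives asked about above would sit
in the [Service] section of the instantiated OSD unit roughly as follows. The
numbers are placeholders chosen for the example, not the values from the
packaging, and the ExecStart line is an assumed foreground invocation; the
point is only the shape of a policy that restarts on failure, waits between
attempts, and gives up after a burst of failures.

```ini
# ceph-osd@.service excerpt: illustrative restart policy, values are placeholders.
[Service]
# Assumed foreground invocation (-f), so systemd supervises the daemon directly.
ExecStart=/usr/bin/ceph-osd -f --cluster ceph --id %i
# Restart only on unclean exit: non-zero status, fatal signal, timeout or watchdog.
Restart=on-failure
# Wait longer than the 100ms default before each restart attempt.
RestartSec=20s
# Rate limit: more than 3 starts within 30 minutes are refused and the unit
# is left in the failed state. Later systemd versions spell the interval
# StartLimitIntervalSec= and place both settings in [Unit].
StartLimitInterval=30min
StartLimitBurst=3
```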

One thing which makes me _very very_ unhappy about Ceph is that its OSDs are
unstable because upstream does not treat them as a mission-critical service and
plugs untested code paths with asserts. More than twice I have had a cascade of
OSD failures spreading like a bushfire over the cluster, and I just can't trust
a system like this with my data. Weeks of downtime are just not acceptable...

Nevertheless, the SysV init script does not handle restarts, so for
compatibility and due to the above concerns we may decide not to use systemd's
auto-restart facilities. In practice it helps little once an OSD starts
crashing, and the crash may be for a good reason, e.g. when read errors are
detected on the HDD...


> - Mounting OSD filesystems: For sysvinit the init script mounts the OSD
>   filesystem. None of the proposed systemd solutions mounts any
>   filesystems.

How did you miss "RequiresMountsFor=/var/lib/ceph/osd/ceph-%i" in my
"ceph-osd@.service" file?


>   I think that mounting filesystems should not be done in
>   the ceph init scripts (independent of init system used). What's the
>   reason this was added to the init scripts and can't be done from
>   /etc/fstab like all other filesystems? My preferred solution for
>   systemd is to mount filesystems from /etc/fstab and to have
>   "RequiresMountsFor=/var/lib/ceph/mds/ceph-%i" in the individual
>   service files to ensure that the filesystem is mounted. An alternative
>   would be to create mount units or a generator similar to
>   systemd-fstab-generator. But this sounds like a lot of work for little
>   gain.

I do not create a dedicated mount point for the MDS... It sits on the same
partition (RAID-1 + hot spare) as the operating system...
I don't want to assume that the MON and MDS services need their own dedicated
mount points.
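For reference, the /etc/fstab plus RequiresMountsFor= approach discussed in
this thread could look roughly like this. The fstab label, filesystem type and
mount options are hypothetical placeholders; only the RequiresMountsFor= line
comes from the discussion.

```ini
# /etc/fstab line (hypothetical label, filesystem type and options):
#   LABEL=osd-0  /var/lib/ceph/osd/ceph-0  xfs  noatime  0  0
#
# ceph-osd@.service excerpt (illustrative):
[Unit]
Description=Ceph object storage daemon %i
# Adds Requires= and After= on the mount units needed for this path,
# e.g. var-lib-ceph-osd-ceph\x2d0.mount for instance 0.
RequiresMountsFor=/var/lib/ceph/osd/ceph-%i
```

If the directory has no dedicated mount point and sits on the root filesystem,
as with the MDS setup described above, the directive resolves to the root mount
and adds no extra requirement.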

-- 
All the best,
 Dmitry Smirnov.

---

Each generation imagines itself to be more intelligent than the one that
went before it, and wiser than the one that comes after it.
        -- George Orwell, Review of "A Coat of Many Colours: Occasional
           Essays" by Herbert Read, Poetry Quarterly (Winter 1945)
