On Wed, 29 Jul 2015, Alex Elsayed wrote:
> Sage Weil wrote:
>
> > On Wed, 29 Jul 2015, Alex Elsayed wrote:
> <snip for gmane>
> >> My thinking is more that the "osd data = " key makes a lot less sense in
> >> the systemd world overall - passing the OSD the full path on the
> >> commandline via some --datadir would mean you could trivially use
> >> systemd's instance templating, and just do
> >>
> >> ExecStart=/usr/bin/ceph-osd -f --datadir=/var/lib/ceph/osd/%i
> >>
> >> and be done with it. Could even do RequiresMountsFor=/var/lib/ceph/osd/%i
> >> too, which would order it after (and make it depend on) any systemd.mount
> >> units for that path.
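For reference, a rough sketch of what that template unit could look like
(unverified; --datadir is the flag proposed above, not an existing ceph-osd
option, and %i is taken to be the data dir basename):

```ini
[Unit]
Description=Ceph object storage daemon (data dir %i)
# order after, and depend on, any mount units covering the data dir
RequiresMountsFor=/var/lib/ceph/osd/%i

[Service]
ExecStart=/usr/bin/ceph-osd -f --datadir=/var/lib/ceph/osd/%i
Restart=on-failure

[Install]
WantedBy=multi-user.target
```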
> >
> > Note that there is a 1:1 equivalence between command line options and
> > config options, so osd data = /foo and --osd-data foo are the same thing.
> > Not that I think that matters here--although it's possible to manually
> > specify paths in ceph.conf users can't do that if they want the udev magic
> > to work (that's already true today, without systemd).
>
> Sure, though my thought was that the udev magic would work more sanely _via_
> this. The missing part is loading the cluster and ID from the OSD data dir.
>
> > In any case, though, if your %i above is supposed to be the uuid, that's
> > much less friendly than what we have now, where users can do
> >
> > systemctl stop ceph-osd@12
> >
> > to stop osd.12.
> >
> > I'm not sure it's worth giving up the bind mount complexity unless it
> > really becomes painful to support, given how much nicer the admin
> > experience is...
>
> Well, that does presuppose that they've either SSHed into the machine
> manually, or are using systemctl -H to reach it remotely. That's already
> not an especially nice user experience, since they need to manually
> consider the cluster's structure.
>
> Something more like 'ceph tell osd.N die' or similar could work, and
> SuccessExitStatus= could make it even nicer: even if the daemon exits
> with a different status for "die" than for other clean shutdowns, systemd
> can be told "any of these exit codes are okay, don't autorestart".
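As a sketch, that could look like the following (the exit code 42 for "die"
is made up for illustration, and --osd-uuid as a standalone flag is part of
the proposal, not current behavior):

```ini
[Service]
ExecStart=/usr/bin/ceph-osd -f --osd-uuid %i
Restart=on-failure
# treat the hypothetical 'told to die' exit code as success,
# so systemd does not autorestart the daemon
SuccessExitStatus=42
```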
>
> However, neither of those handles unmounting, and they still don't handle
> starting. All of the above are still partial solutions; hopefully iteration
> can result in something better in all ways.
>
> Also, note that if RequiresMountsFor= is used, unmounting the filesystem -
> by device or by mountpoint - will stop the unit due to proper dependency
> handling. (If RMF doesn't, BindsTo does - BindsTo will additionally do so if
> the device is unmounted or suddenly unplugged without systemd intervention)
>
> systemctl stop dev-sdc.device # all OSDs running off of sdc stop
> systemctl stop dev-sdd1.device # Just one partition this time
>
> Nice and tidy.
So, it seems like plan B would be something like:
- mounts on /var/lib/ceph/osd/data/$uuid. For new backends that have
multiple mounts (newstore likely will), we may also have something like
/var/lib/ceph/osd/data-fast/$uuid as an SSD partition or something.
- systemd ceph-osd@$uuid unit runs
ceph-osd --cluster ceph --id 123 --osd-uuid $uuid
- simpler udev rules
- simpler ceph-disk behavior
- The 'one cluster per host' restriction would go away. This is currently
there because we only have a single systemd parameter for the @ services
and we're using the osd id (which is not unique across clusters). The
uuid would be, so that's a win.
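To make the flow concrete, here's a runnable sketch of how the unit could
recover the osd id from the data dir alone, given only the uuid. The
'whoami' marker file is what ceph-disk writes into the data dir today, but
the wrapper itself is hypothetical, and a temp dir stands in for
/var/lib/ceph/osd/data so the sketch runs standalone:

```shell
#!/bin/sh
set -e
# stand-in for /var/lib/ceph/osd/data (illustration only)
base=$(mktemp -d)
uuid="66f354f2-752e-409f-8194-be05f6b071d9"
mkdir -p "$base/$uuid"
echo 123 > "$base/$uuid/whoami"   # ceph-disk writes this marker today

data="$base/$uuid"
id=$(cat "$data/whoami")          # recover the osd id from the data dir
cluster="ceph"

# the unit's ExecStart would then boil down to:
cmd="/usr/bin/ceph-osd -f --cluster $cluster --id $id --osd-uuid $uuid"
echo "$cmd"
rm -rf "$base"
```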
But,
- admin can't tell from 'systemctl | grep ceph' or from 'df' or 'mount'
which OSD is which, but they could from 'ps ax | grep ceph-osd'.
- stopping an individual osd would be done by $uuid instead of osd id:
systemctl stop ceph-osd@66f354f2-752e-409f-8194-be05f6b071d9
For an admin this is probably a cut&paste from ps ax output?
- we could perhaps add 'ceph-disk stop' and 'ceph-disk umount' commands
  to make this a bit simpler?
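On the discoverability point, a small sketch of a hypothetical helper that
maps uuids back to osd ids by reading each data dir's 'whoami' file (again
using a temp dir as a stand-in for /var/lib/ceph/osd/data so it runs
standalone; not an existing ceph-disk command):

```shell
#!/bin/sh
set -e
# stand-in for /var/lib/ceph/osd/data (illustration only)
base=$(mktemp -d)
uuid="66f354f2-752e-409f-8194-be05f6b071d9"
mkdir -p "$base/$uuid"
echo 12 > "$base/$uuid/whoami"

# print 'uuid -> osd.N' for each data dir found
out=$(for d in "$base"/*; do
    printf '%s -> osd.%s\n' "$(basename "$d")" "$(cat "$d/whoami")"
done)
echo "$out"
rm -rf "$base"
```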
What do people think? I like simple, but I don't want to make life too
hard on the admin.
sage