On 11/7/18 1:00 AM, Hayashida, Mami wrote:
> I see.  Thank you for clarifying lots of things along the way -- this
> has been extremely helpful.   Neither "df | grep osd" nor "mount | grep
> osd" shows ceph-60 through 69.

OK, that isn't right then. I suggest you try this:

1) Bring down OSDs 60-69 (`systemctl stop ceph-osd@60`, etc.)

2) Move those directories out of the way, as in:

mkdir /var/lib/ceph/osd_old
mv /var/lib/ceph/osd/ceph-6[0-9] /var/lib/ceph/osd_old

(If this all works out you can delete them; I just want to make sure you
don't accidentally wipe something important.)
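
If you would rather not type ten stop commands by hand, the two steps so far
can be sketched as a small script. This is only an illustration: the `run`
and `stop_and_move_osds` helpers and the `DRY_RUN` variable are names I made
up, and the paths are the ones used above. By default it only prints what it
would do.

```shell
# Sketch only: stop OSDs 60-69, then move their directories aside.
# DRY_RUN (hypothetical variable) defaults to 1, which prints the
# commands instead of running them. Set DRY_RUN=0 to execute.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

stop_and_move_osds() {
    # Stop every OSD first so nothing is using the directories.
    for id in $(seq 60 69); do
        run systemctl stop "ceph-osd@${id}"
    done
    # Keep the old directories around until the new setup is confirmed.
    run mkdir -p /var/lib/ceph/osd_old
    for id in $(seq 60 69); do
        run mv "/var/lib/ceph/osd/ceph-${id}" /var/lib/ceph/osd_old/
    done
}
```

Run `stop_and_move_osds` once to review the printed commands, then
`DRY_RUN=0 stop_and_move_osds` as root if they look right.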

3) Run `find /etc/systemd/system | grep ceph-volume` and check the
output. You're looking for symlinks in multi-user.target.wants or similar.

There should be a single "ceph-volume@lvm-<id>-<uuid>" entry for each
OSD, and the id and uuid should match the "ceph.osd_id" and
"ceph.osd_fsid" LVM tags from `ceph-volume lvm list`. You can also use
`lvs -o vg_name,name,lv_tags` to see those tags directly.

If you see anything of the format "ceph-volume@simple-..." then that is
old junk from previous attempts at using ceph-volume. Those entries should
be symlinks; delete them and run `systemctl daemon-reload`.
Same story if you see any @lvm symlinks but with incorrect OSD IDs or
fsids. All of this should be recreated by the next step anyway if
deleted, so it should be safe to delete any symlinks in there that you
think might be wrong.
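
To make the comparison less error-prone, here is a rough sketch of a helper
that splits a unit name of the form "ceph-volume@lvm-<id>-<uuid>" into the
OSD id and the fsid, so you can put them side by side with the LVM tags. The
function name `parse_lvm_unit` is made up for illustration, and I'm assuming
the symlink names end in ".service". It rejects anything that is not an @lvm
unit, which also flags the @simple leftovers.

```shell
# Sketch only: print "<osd_id> <osd_fsid>" for a ceph-volume@lvm unit
# name, or return non-zero for anything else (e.g. ceph-volume@simple-...).
# parse_lvm_unit is a hypothetical helper name.
parse_lvm_unit() {
    unit_name=$(basename "$1")
    unit_name=${unit_name%.service}
    case "$unit_name" in
        ceph-volume@lvm-*-*) ;;
        *) return 1 ;;
    esac
    unit_rest=${unit_name#ceph-volume@lvm-}
    # The OSD id is everything before the first dash; the fsid is the rest.
    echo "${unit_rest%%-*} ${unit_rest#*-}"
}

# Possible usage (not run here):
#   find /etc/systemd/system -name 'ceph-volume@*' | while read -r unit; do
#       parse_lvm_unit "$unit" || echo "suspect: $unit"
#   done
```

Each printed pair should match one "ceph.osd_id" / "ceph.osd_fsid" pair from
`ceph-volume lvm list`, with no OSD appearing twice.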

4) Run `ceph-volume lvm activate --all`

At this point `df` and `mount` should show tmpfs mounts for all your LVM
OSDs, and they should be up. List the OSD directories and check that
both `block` and `block.db` entries are symlinks to the right devices.
The right target symlinks should also have been created/enabled in
/etc/systemd/system/multi-user.target.wants.
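
The symlink check on the OSD directories can be sketched like this. Again it
is only an illustration: `check_osd_dir` is a made-up name, and it assumes
every OSD has a separate DB device (as yours do), so a missing `block.db`
is reported as a problem.

```shell
# Sketch only: report where block and block.db point inside one OSD
# directory. Returns non-zero if either entry is not a symlink.
# check_osd_dir is a hypothetical helper name.
check_osd_dir() {
    osd_dir=$1
    osd_status=0
    for entry in block block.db; do
        if [ -L "$osd_dir/$entry" ]; then
            echo "$entry -> $(readlink "$osd_dir/$entry")"
        else
            echo "$entry: not a symlink"
            osd_status=1
        fi
    done
    return $osd_status
}

# Possible usage (not run here):
#   for d in /var/lib/ceph/osd/ceph-6[0-9]; do
#       echo "$d"; check_osd_dir "$d"
#   done
```

Compare the printed targets against the devices listed by
`ceph-volume lvm list` for the same OSD.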

The LVM dump you provided is correct. I suspect what happened is that
somewhere during this experiment OSDs were activated into the root
filesystem (instead of a tmpfs), perhaps using the ceph-volume simple
mode, perhaps something else. Since all the metadata is in LVM, it's
safe to move or delete all those OSD directories for BlueStore OSDs and
try activating them cleanly again, which hopefully will do the right thing.

In the end this all might fix your device ownership woes too, making the
udev rule unnecessary. If it all works out, try a reboot and see if
everything comes back up as it should.

-- 
Hector Martin (hec...@marcansoft.com)
Public Key: https://mrcn.st/pub
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
