I definitely had all the rbd volumes unmounted. I am not sure if they were
unmapped. I can try that.
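
For reference, here is roughly what I plan to run on the host before the
next reboot (assuming ceph-common, and with it the rbd CLI, is installed
there):

rbd showmapped        # lists kernel-mapped images; empty output means none
rbd unmap /dev/rbd0   # unmap any device that is still listed, e.g. rbd0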

On Fri, Feb 10, 2017 at 9:10 PM, Brad Hubbard <bhubb...@redhat.com> wrote:

> On Sat, Feb 11, 2017 at 2:58 PM, Brad Hubbard <bhubb...@redhat.com> wrote:
> > Just making sure the list sees this, for those who are following.
> >
> > On Sat, Feb 11, 2017 at 2:49 PM, Michael Andersen <mich...@steelcode.com> wrote:
> >> Right, so yes, libceph is loaded:
> >>
> >> root@compound-7:~# lsmod | egrep "ceph|rbd"
> >> rbd                    69632  0
> >> libceph               245760  1 rbd
> >> libcrc32c              16384  3 xfs,raid456,libceph
> >>
> >> I stopped all the ceph services and unloaded the modules:
> >>
> >> root@compound-7:~# systemctl stop ceph\*.service ceph\*.target
> >> root@compound-7:~# modprobe -r rbd
> >> root@compound-7:~# modprobe -r libceph
> >> root@compound-7:~# lsmod | egrep "ceph|rbd"
> >>
> >> Then rebooted
> >> root@compound-7:~# reboot
> >>
> >> And sure enough the reboot happened OK.
> >>
> >> So that solves my immediate problem, and I now know how to work around
> >> it (thanks!), but I would love to work out how to not need this step.
>
> Can you double-check that all rbd volumes are unmounted on this host
> when shutting down? Maybe unmap them just for good measure.
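>
> Something along these lines would confirm it (just a sanity check; the
> rbdX device name is a placeholder):
>
> mount | grep rbd      # any rbd-backed filesystems still mounted?
> rbd showmapped        # any images still mapped through the kernel?
> rbd unmap /dev/rbdX   # unmap anything that remains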
>
> I don't believe the libceph module should need to talk to the cluster
> unless it has active connections at the time of shutdown.
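>
> If you want to check for live sessions before a reboot, something like
> this should show them (assuming the mons listen on the default port
> 6789):
>
> ss -tn | grep 6789    # open TCP connections to the monitors, if any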
>
> >> Any further info I can give to help?
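> >>
> >> One thought on avoiding the manual step: if unloading the modules is the
> >> reliable fix, I could wrap it in a one-shot systemd unit whose ExecStop
> >> runs before the network is torn down. An untested sketch, and the
> >> ordering may well need tweaking:
> >>
> >> [Unit]
> >> Description=Unload rbd/libceph before network shutdown
> >> After=network.target
> >>
> >> [Service]
> >> Type=oneshot
> >> RemainAfterExit=yes
> >> ExecStart=/bin/true
> >> ExecStop=/sbin/modprobe -r rbd libceph
> >>
> >> [Install]
> >> WantedBy=multi-user.target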
> >>
> >>
> >>
> >> On Fri, Feb 10, 2017 at 8:42 PM, Michael Andersen <mich...@steelcode.com> wrote:
> >>>
> >>>
> >>> Sorry this email arrived out of order. I will do the modprobe -r test
> >>>
> >>> On Fri, Feb 10, 2017 at 8:20 PM, Brad Hubbard <bhubb...@redhat.com> wrote:
> >>>>
> >>>> On Sat, Feb 11, 2017 at 2:08 PM, Michael Andersen <mich...@steelcode.com> wrote:
> >>>> > I believe I did shut down the mon process. Is that not done by the
> >>>> >
> >>>> > sudo systemctl stop ceph\*.service ceph\*.target
> >>>> >
> >>>> > command? Also, as I noted, the mon process does not show up in ps
> >>>> > after I do that, but I still get the shutdown halting.
> >>>> >
> >>>> > The libceph kernel module may be installed. I did not do so
> >>>> > deliberately, but I used ceph-deploy, so if it installs that then that
> >>>> > is why it's there. I also run some kubernetes pods with rbd persistent
> >>>> > volumes on these machines, although no rbd volumes are in use or
> >>>> > mounted when I try to shut down. In fact I unmapped all rbd volumes
> >>>> > across the whole cluster to make sure. Is libceph required for rbd?
> >>>>
> >>>> For kernel rbd (/dev/rbd0, etc.) yes, for librbd, no.
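> >>>>
> >>>> A rough way to tell the two apart on a host: kernel rbd shows up as
> >>>> block devices and as the rbd module in lsmod, while librbd is purely
> >>>> userspace.
> >>>>
> >>>> ls /dev/rbd* 2>/dev/null    # block devices mean kernel rbd is in use
> >>>> lsmod | grep -w rbd         # is the kernel module loaded at all?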
> >>>>
> >>>> As a test try modprobe -r on both the libceph and rbd modules before
> >>>> shutdown and see if that helps ("modprobe -r rbd" should unload
> >>>> libceph as well but verify that).
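> >>>>
> >>>> i.e. roughly:
> >>>>
> >>>> modprobe -r rbd             # should pull libceph out with it
> >>>> lsmod | egrep "ceph|rbd"    # no output means both are gone
> >>>> modprobe -r libceph         # only needed if libceph is still listed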
> >>>>
> >>>> >
> >>>> > But even so, is it normal for the libceph kernel module to prevent
> >>>> > shutdown?
> >>>> > Is there another stage in the shutdown procedure that I am missing?
> >>>> >
> >>>> >
> >>>> > On Feb 10, 2017 7:49 PM, "Brad Hubbard" <bhubb...@redhat.com> wrote:
> >>>> >
> >>>> > That looks like dmesg output from the libceph kernel module. Do you
> >>>> > have the libceph kernel module loaded?
> >>>> >
> >>>> > If the answer to that question is "yes", the follow-up question is
> >>>> > "Why?", as it is not required for a MON or OSD host.
> >>>> >
> >>>> > On Sat, Feb 11, 2017 at 1:18 PM, Michael Andersen
> >>>> > <mich...@steelcode.com> wrote:
> >>>> >> Yeah, all three mons have OSDs on the same machines.
> >>>> >>
> >>>> >> On Feb 10, 2017 7:13 PM, "Shinobu Kinjo" <ski...@redhat.com> wrote:
> >>>> >>>
> >>>> >>> Is your primary MON running on a host that some OSDs are also
> >>>> >>> running on?
> >>>> >>>
> >>>> >>> On Sat, Feb 11, 2017 at 11:53 AM, Michael Andersen
> >>>> >>> <mich...@steelcode.com> wrote:
> >>>> >>> > Hi
> >>>> >>> >
> >>>> >>> > I am running a small cluster of 8 machines (80 OSDs) with three
> >>>> >>> > monitors, on Ubuntu 16.04, Ceph version 10.2.5.
> >>>> >>> >
> >>>> >>> > I cannot reboot the monitors without physically going into the
> >>>> >>> > datacenter and power cycling them. What happens is that while
> >>>> >>> > shutting down, ceph gets stuck trying to contact the other
> >>>> >>> > monitors, but networking has already shut down or something like
> >>>> >>> > that. I get an endless stream of:
> >>>> >>> >
> >>>> >>> > libceph: connect 10.20.0.10:6789 error -101
> >>>> >>> > libceph: connect 10.20.0.13:6789 error -101
> >>>> >>> > libceph: connect 10.20.0.17:6789 error -101
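> >>>> >>> >
> >>>> >>> > For what it's worth, error -101 is ENETUNREACH ("Network is
> >>>> >>> > unreachable"), which fits the idea that networking is already
> >>>> >>> > gone. A one-liner to look the code up:
> >>>> >>> >
> >>>> >>> > python3 -c "import errno,os; print(errno.errorcode[101], os.strerror(101))"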
> >>>> >>> >
> >>>> >>> > where in this case 10.20.0.10 is the machine I am trying to shut
> >>>> >>> > down, and all three IPs are the MONs.
> >>>> >>> >
> >>>> >>> > At this stage of the shutdown, the machine doesn't respond to
> >>>> >>> > pings, and I cannot even log in on any of the virtual terminals.
> >>>> >>> > There is nothing to do but power off at the server.
> >>>> >>> >
> >>>> >>> > The other non-mon servers shut down just fine, and the cluster
> >>>> >>> > was healthy at the time I was rebooting the mon (I only reboot one
> >>>> >>> > machine at a time, waiting for it to come up before I do the next
> >>>> >>> > one).
> >>>> >>> >
> >>>> >>> > Also worth mentioning that if I execute
> >>>> >>> >
> >>>> >>> > sudo systemctl stop ceph\*.service ceph\*.target
> >>>> >>> >
> >>>> >>> > on the server, the only ceph-related things left in ps are:
> >>>> >>> >
> >>>> >>> > root     11143     2  0 18:40 ?        00:00:00 [ceph-msgr]
> >>>> >>> > root     11162     2  0 18:40 ?        00:00:00 [ceph-watch-noti]
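> >>>> >>> >
> >>>> >>> > (The square brackets in ps, plus the parent PID of 2 (kthreadd),
> >>>> >>> > mean these are kernel threads, so they are not something systemctl
> >>>> >>> > can stop; ps -o ppid= -p 11143 prints 2, for example.)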
> >>>> >>> >
> >>>> >>> > and even then, when no ceph daemons are left running, doing a
> >>>> >>> > reboot goes into the same loop.
> >>>> >>> >
> >>>> >>> > I can't really find any mention of this online, but I feel
> >>>> >>> > someone must have hit this. Any idea how to fix it? It's really
> >>>> >>> > annoying because it's hard for me to get access to the datacenter.
> >>>> >>> >
> >>>> >>> > Thanks
> >>>> >>> > Michael
> >>>> >>> >
> >>>> >>
> >>>> >>
> >>>> >
> >>>> >
> >>>> >
> >>>> > --
> >>>> > Cheers,
> >>>> > Brad
> >>>> >
> >>>> >
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>> Cheers,
> >>>> Brad
> >>>
> >>>
> >>
> >
> >
> >
> > --
> > Cheers,
> > Brad
>
>
>
> --
> Cheers,
> Brad
>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
