I definitely had all the rbd volumes unmounted. I am not sure if they were unmapped. I can try that.
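For the record, this is roughly what I plan to run on each host before the next reboot to confirm nothing is still mapped. Just a sketch; the /dev/rbd0 device name below is only an example, substitute whatever showmapped reports:

    sudo rbd showmapped          # list any rbd images still mapped on this host
    sudo rbd unmap /dev/rbd0     # repeat for each device showmapped lists
    sudo rbd showmapped          # should print nothing once everything is unmapped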
On Fri, Feb 10, 2017 at 9:10 PM, Brad Hubbard <bhubb...@redhat.com> wrote:
> On Sat, Feb 11, 2017 at 2:58 PM, Brad Hubbard <bhubb...@redhat.com> wrote:
> > Just making sure the list sees this for those that are following.
> >
> > On Sat, Feb 11, 2017 at 2:49 PM, Michael Andersen <mich...@steelcode.com> wrote:
> >> Right, so yes, libceph is loaded
> >>
> >> root@compound-7:~# lsmod | egrep "ceph|rbd"
> >> rbd                    69632  0
> >> libceph               245760  1 rbd
> >> libcrc32c              16384  3 xfs,raid456,libceph
> >>
> >> I stopped all the services and unloaded the modules
> >>
> >> root@compound-7:~# systemctl stop ceph\*.service ceph\*.target
> >> root@compound-7:~# modprobe -r rbd
> >> root@compound-7:~# modprobe -r libceph
> >> root@compound-7:~# lsmod | egrep "ceph|rbd"
> >>
> >> Then rebooted
> >> root@compound-7:~# reboot
> >>
> >> And sure enough the reboot happened OK.
> >>
> >> So that solves my immediate problem, I now know how to work around it
> >> (thanks!), but I would love to work out how to not need this step. Any
>
> Can you double-check that all rbd volumes are unmounted on this host
> when shutting down? Maybe unmap them just for good measure.
>
> I don't believe the libceph module should need to talk to the cluster
> unless it has active connections at the time of shutdown.
>
> >> further info I can give to help?
> >>
> >> On Fri, Feb 10, 2017 at 8:42 PM, Michael Andersen <mich...@steelcode.com> wrote:
> >>>
> >>> Sorry, this email arrived out of order. I will do the modprobe -r test.
> >>>
> >>> On Fri, Feb 10, 2017 at 8:20 PM, Brad Hubbard <bhubb...@redhat.com> wrote:
> >>>>
> >>>> On Sat, Feb 11, 2017 at 2:08 PM, Michael Andersen <mich...@steelcode.com> wrote:
> >>>> > I believe I did shut down the mon process. Is that not done by the
> >>>> >
> >>>> > sudo systemctl stop ceph\*.service ceph\*.target
> >>>> >
> >>>> > command? Also, as I noted, the mon process does not show up in ps
> >>>> > after I do that, but I still get the shutdown halting.
> >>>> >
> >>>> > The libceph kernel module may be installed. I did not do so
> >>>> > deliberately, but I used ceph-deploy, so if it installs that then
> >>>> > that is why it's there. I also run some kubernetes pods with rbd
> >>>> > persistent volumes on these machines, although no rbd volumes are
> >>>> > in use or mounted when I try to shut down. In fact I unmapped all
> >>>> > rbd volumes across the whole cluster to make sure. Is libceph
> >>>> > required for rbd?
> >>>>
> >>>> For kernel rbd (/dev/rbd0, etc.) yes, for librbd, no.
> >>>>
> >>>> As a test try modprobe -r on both the libceph and rbd modules before
> >>>> shutdown and see if that helps ("modprobe -r rbd" should unload
> >>>> libceph as well but verify that).
> >>>>
> >>>> > But even so, is it normal for the libceph kernel module to prevent
> >>>> > shutdown? Is there another stage in the shutdown procedure that I
> >>>> > am missing?
> >>>> >
> >>>> > On Feb 10, 2017 7:49 PM, "Brad Hubbard" <bhubb...@redhat.com> wrote:
> >>>> >
> >>>> > That looks like dmesg output from the libceph kernel module. Do you
> >>>> > have the libceph kernel module loaded?
> >>>> >
> >>>> > If the answer to that question is "yes" the follow-up question is
> >>>> > "Why?" as it is not required for a MON or OSD host.
> >>>> >
> >>>> > On Sat, Feb 11, 2017 at 1:18 PM, Michael Andersen <mich...@steelcode.com> wrote:
> >>>> >> Yeah, all three mons have OSDs on the same machines.
> >>>> >>
> >>>> >> On Feb 10, 2017 7:13 PM, "Shinobu Kinjo" <ski...@redhat.com> wrote:
> >>>> >>>
> >>>> >>> Is your primary MON running on the host which some OSDs are
> >>>> >>> running on?
> >>>> >>>
> >>>> >>> On Sat, Feb 11, 2017 at 11:53 AM, Michael Andersen
> >>>> >>> <mich...@steelcode.com> wrote:
> >>>> >>> > Hi
> >>>> >>> >
> >>>> >>> > I am running a small cluster of 8 machines (80 osds), with
> >>>> >>> > three monitors, on Ubuntu 16.04. Ceph version 10.2.5.
> >>>> >>> >
> >>>> >>> > I cannot reboot the monitors without physically going into the
> >>>> >>> > datacenter and power cycling them. What happens is that while
> >>>> >>> > shutting down, ceph gets stuck trying to contact the other
> >>>> >>> > monitors, but networking has already shut down, or something
> >>>> >>> > like that. I get an endless stream of:
> >>>> >>> >
> >>>> >>> > libceph: connect 10.20.0.10:6789 error -101
> >>>> >>> > libceph: connect 10.20.0.13:6789 error -101
> >>>> >>> > libceph: connect 10.20.0.17:6789 error -101
> >>>> >>> >
> >>>> >>> > where in this case 10.20.0.10 is the machine I am trying to
> >>>> >>> > shut down and all three IPs are the MONs.
> >>>> >>> >
> >>>> >>> > At this stage of the shutdown, the machine doesn't respond to
> >>>> >>> > pings, and I cannot even log in on any of the virtual
> >>>> >>> > terminals. Nothing to do but poweroff at the server.
> >>>> >>> >
> >>>> >>> > The other non-mon servers shut down just fine, and the cluster
> >>>> >>> > was healthy at the time I was rebooting the mon (I only reboot
> >>>> >>> > one machine at a time, waiting for it to come up before I do
> >>>> >>> > the next one).
> >>>> >>> >
> >>>> >>> > Also worth mentioning that if I execute
> >>>> >>> >
> >>>> >>> > sudo systemctl stop ceph\*.service ceph\*.target
> >>>> >>> >
> >>>> >>> > on the server, the only things I see are:
> >>>> >>> >
> >>>> >>> > root     11143     2  0 18:40 ?        00:00:00 [ceph-msgr]
> >>>> >>> > root     11162     2  0 18:40 ?        00:00:00 [ceph-watch-noti]
> >>>> >>> >
> >>>> >>> > and even then, when no ceph daemons are left running, doing a
> >>>> >>> > reboot goes into the same loop.
> >>>> >>> >
> >>>> >>> > I can't really find any mention of this online, but I feel
> >>>> >>> > someone must have hit this. Any idea how to fix it? It's really
> >>>> >>> > annoying because it's hard for me to get access to the
> >>>> >>> > datacenter.
> >>>> >>> >
> >>>> >>> > Thanks
> >>>> >>> > Michael
> >>>> >>> >
> >>>> >>> > _______________________________________________
> >>>> >>> > ceph-users mailing list
> >>>> >>> > ceph-users@lists.ceph.com
> >>>> >>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >>>> >>
> >>>> >> _______________________________________________
> >>>> >> ceph-users mailing list
> >>>> >> ceph-users@lists.ceph.com
> >>>> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >>>> >
> >>>> > --
> >>>> > Cheers,
> >>>> > Brad
> >>>>
> >>>> --
> >>>> Cheers,
> >>>> Brad
> >
> > --
> > Cheers,
> > Brad
>
> --
> Cheers,
> Brad