Just making sure the list sees this, for those who are following.

On Sat, Feb 11, 2017 at 2:49 PM, Michael Andersen <mich...@steelcode.com> wrote:
> Right, so yes libceph is loaded
>
> root@compound-7:~# lsmod | egrep "ceph|rbd"
> rbd                    69632  0
> libceph               245760  1 rbd
> libcrc32c              16384  3 xfs,raid456,libceph
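> (If I read lsmod right, the third column is the use count; the same
> number can be read straight from sysfs, e.g.:
>
> root@compound-7:~# cat /sys/module/libceph/refcnt
>
> which should show 1 here, held by rbd.)
>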
>
> I stopped all the services and unloaded the modules
>
> root@compound-7:~# systemctl stop ceph\*.service ceph\*.target
> root@compound-7:~# modprobe -r rbd
> root@compound-7:~# modprobe -r libceph
> root@compound-7:~# lsmod | egrep "ceph|rbd"
>
> Then rebooted
> root@compound-7:~# reboot
>
> And sure enough the reboot happened OK.
>
> So that solves my immediate problem: I now know how to work around it
> (thanks!), but I would love to work out how to avoid needing this step.
> Is there any further info I can give to help?
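>
> One idea I might try, so the modules get unloaded automatically: a
> oneshot systemd unit whose ExecStop runs at shutdown while the network
> is still up (untested sketch; the unit name and the ordering are my
> guesses):
>
> # /etc/systemd/system/unload-ceph-modules.service (hypothetical)
> [Unit]
> Description=Unload rbd/libceph kernel modules on shutdown
> After=network.target
>
> [Service]
> Type=oneshot
> RemainAfterExit=yes
> ExecStart=/bin/true
> ExecStop=/sbin/modprobe -r rbd libceph
>
> [Install]
> WantedBy=multi-user.target
>
> Since stop order is the reverse of start order, After=network.target
> should make the modprobe -r run before networking goes down.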
>
>
>
> On Fri, Feb 10, 2017 at 8:42 PM, Michael Andersen <mich...@steelcode.com>
> wrote:
>>
>> Sorry, this email arrived out of order. I will do the modprobe -r test.
>>
>> On Fri, Feb 10, 2017 at 8:20 PM, Brad Hubbard <bhubb...@redhat.com> wrote:
>>>
>>> On Sat, Feb 11, 2017 at 2:08 PM, Michael Andersen <mich...@steelcode.com>
>>> wrote:
>>> > I believe I did shut down the mon process. Is that not done by the
>>> >
>>> > sudo systemctl stop ceph\*.service ceph\*.target
>>> >
>>> > command? Also, as I noted, the mon process does not show up in ps
>>> > after I do that, but the shutdown still hangs.
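>>> >
>>> > (I also double-checked the units themselves with
>>> >
>>> > systemctl list-units 'ceph*' --all
>>> >
>>> > and, as far as I can tell, everything ceph-related is inactive.)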
>>> >
>>> > The libceph kernel module may be loaded. I did not load it
>>> > deliberately, but I used ceph-deploy, so if that loads it, that is
>>> > why it's there. I also run some kubernetes pods with rbd persistent
>>> > volumes on these machines, although no rbd volumes are in use or
>>> > mounted when I try to shut down. In fact I unmapped all rbd volumes
>>> > across the whole cluster to make sure. Is libceph required for rbd?
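>>> >
>>> > (To be sure nothing was still mapped, I checked with
>>> >
>>> > rbd showmapped
>>> > ls /sys/bus/rbd/devices
>>> >
>>> > and both come back empty on these hosts.)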
>>>
>>> For kernel rbd (/dev/rbd0, etc.) yes, for librbd, no.
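>>>
>>> For example, the kernel client path is something like
>>>
>>> rbd map rbd/myimage   # image name made up; creates /dev/rbd0 via rbd+libceph
>>>
>>> whereas a librbd consumer (qemu, or anything linked against librbd)
>>> talks to the cluster entirely from userspace and loads neither module.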
>>>
>>> As a test, try modprobe -r on both the libceph and rbd modules
>>> before shutdown and see if that helps ("modprobe -r rbd" should
>>> unload libceph as well, but verify that).
>>>
>>> >
>>> > But even so, is it normal for the libceph kernel module to prevent
>>> > shutdown?
>>> > Is there another stage in the shutdown procedure that I am missing?
>>> >
>>> >
>>> > On Feb 10, 2017 7:49 PM, "Brad Hubbard" <bhubb...@redhat.com> wrote:
>>> >
>>> > That looks like dmesg output from the libceph kernel module. Do you
>>> > have the libceph kernel module loaded?
>>> >
>>> > If the answer to that question is "yes", the follow-up question is
>>> > "Why?", as it is not required for a MON or OSD host.
>>> >
>>> > On Sat, Feb 11, 2017 at 1:18 PM, Michael Andersen
>>> > <mich...@steelcode.com>
>>> > wrote:
>>> >> Yeah, all three mons have OSDs on the same machines.
>>> >>
>>> >> On Feb 10, 2017 7:13 PM, "Shinobu Kinjo" <ski...@redhat.com> wrote:
>>> >>>
>>> >>> Is your primary MON running on a host that some OSDs are also
>>> >>> running on?
>>> >>>
>>> >>> On Sat, Feb 11, 2017 at 11:53 AM, Michael Andersen
>>> >>> <mich...@steelcode.com> wrote:
>>> >>> > Hi
>>> >>> >
>>> >>> > I am running a small cluster of 8 machines (80 OSDs), with
>>> >>> > three monitors, on Ubuntu 16.04. Ceph version 10.2.5.
>>> >>> >
>>> >>> > I cannot reboot the monitors without physically going into the
>>> >>> > datacenter and power cycling them. What happens is that while
>>> >>> > shutting down, ceph gets stuck trying to contact the other
>>> >>> > monitors, but networking has already shut down, or something
>>> >>> > like that. I get an endless stream of:
>>> >>> >
>>> >>> > libceph: connect 10.20.0.10:6789 error -101
>>> >>> > libceph: connect 10.20.0.13:6789 error -101
>>> >>> > libceph: connect 10.20.0.17:6789 error -101
>>> >>> >
>>> >>> > where in this case 10.20.0.10 is the machine I am trying to
>>> >>> > shut down and all three IPs are the MONs.
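>>> >>> >
>>> >>> > (If I'm reading it right, -101 is ENETUNREACH, i.e. "Network is
>>> >>> > unreachable":
>>> >>> >
>>> >>> > python3 -c 'import os; print(os.strerror(101))'
>>> >>> >
>>> >>> > which fits networking already being gone.)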
>>> >>> >
>>> >>> > At this stage of the shutdown, the machine doesn't respond to
>>> >>> > pings, and I cannot even log in on any of the virtual terminals.
>>> >>> > Nothing to do but power it off at the server.
>>> >>> >
>>> >>> > The other non-mon servers shut down just fine, and the cluster
>>> >>> > was healthy at the time I was rebooting the mon (I only reboot
>>> >>> > one machine at a time, waiting for it to come up before I do the
>>> >>> > next one).
>>> >>> >
>>> >>> > Also worth mentioning that if I execute
>>> >>> >
>>> >>> > sudo systemctl stop ceph\*.service ceph\*.target
>>> >>> >
>>> >>> > on the server, the only ceph-related processes I still see are:
>>> >>> >
>>> >>> > root     11143     2  0 18:40 ?        00:00:00 [ceph-msgr]
>>> >>> > root     11162     2  0 18:40 ?        00:00:00 [ceph-watch-noti]
>>> >>> >
>>> >>> > and even then, when no ceph daemons are left running, doing a
>>> >>> > reboot goes into the same loop.
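>>> >>> >
>>> >>> > (The square brackets in the ps output suggest those two are
>>> >>> > kernel threads, presumably spawned by the libceph/rbd modules
>>> >>> > rather than userspace daemons; e.g. a kernel thread has an empty
>>> >>> > /proc/<pid>/cmdline:
>>> >>> >
>>> >>> > cat /proc/11143/cmdline | wc -c
>>> >>> >
>>> >>> > prints 0 for a kernel thread.)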
>>> >>> >
>>> >>> > I can't really find any mention of this online, but I feel
>>> >>> > someone must have hit this before. Any idea how to fix it? It's
>>> >>> > really annoying because it's hard for me to get access to the
>>> >>> > datacenter.
>>> >>> >
>>> >>> > Thanks
>>> >>> > Michael
>>> >>> >
>>> >>
>>> >>
>>> >
>>> >
>>> >
>>
>>
>



-- 
Cheers,
Brad
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
