We have a cuttlefish (0.61.9) 192-OSD cluster that has lost network
availability several times since this past Thursday and whose nodes were
all rebooted twice (hastily and inadvisably each time). The final reboot,
which was supposed to be "the last thing" before recovery according to our
data center team, resulted in a failure of the cluster's 4 monitors. This
happened yesterday afternoon.

[ By the way, we use Ceph to back Cinder and Glance in our OpenStack cloud,
block storage only. Also, these network problems were the result of our data
center team performing maintenance on our switches that was supposed to be
quick and painless. ]

After working all day on various troubleshooting techniques found here and
there, we have this situation on our monitor nodes (debug 20):


node-10: dead. ceph-mon will not start
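
About the only further thing I know to try on node-10 is running the
daemon in the foreground to catch whatever it dies on. A minimal sketch,
with the mon id as a placeholder for whatever this node's id really is:

    # run the monitor in the foreground, logging to stderr
    ceph-mon -i <mon-id> -d --debug-mon 20 --debug-ms 1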

node-14: Seemed to rebuild its monmap. The log has stopped updating; here
are its final 100 lines (tail -100): http://pastebin.com/tLiq2ewV

node-16: Same as 14, similar outcome in the log:
http://pastebin.com/W87eT7Mw
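
On node-14 and node-16 the next thing I want to do is dump the on-disk
monmap to confirm what they actually rebuilt. I believe this is the right
incantation, though I'm not certain --extract-monmap is available in
0.61.9; mon id and output path are placeholders:

    # with ceph-mon stopped on that node:
    ceph-mon -i <mon-id> --extract-monmap /tmp/monmap
    monmaptool --print /tmp/monmap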

node-15: ceph-mon starts, but even at debug 20 it will only output this
line, over and over again:

       2015-03-18 14:54:35.859511 7f8c82ad3700 -1 asok(0x2e560e0)
AdminSocket: request 'mon_status' not defined
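
I assume that message is the daemon rejecting admin socket queries like
the one below. Asking the socket what commands it does accept might narrow
things down; the socket path is just the default and may differ:

    # list whatever commands the admin socket has registered
    ceph --admin-daemon /var/run/ceph/ceph-mon.<mon-id>.asok help
    # the query that comes back "not defined"
    ceph --admin-daemon /var/run/ceph/ceph-mon.<mon-id>.asok mon_status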

node-02: I added this guy to replace node-10. I updated ceph.conf and
pushed it to all the monitor nodes (the OSD nodes without monitors did not
get the config push). Since he's a new guy, the log output is obviously
different, but again, here are the last 50 lines:
http://pastebin.com/pfixdD3d
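
For reference, my understanding is that a brand-new mon like this has to
be seeded with a monmap and the mon keyring from a surviving monitor
before it will start; something along these lines, with the mon id and
paths as placeholders rather than exactly what I used:

    # seed the new mon's data dir from a monmap and keyring copied off an existing mon
    ceph-mon --mkfs -i <mon-id> --monmap /tmp/monmap --keyring /tmp/ceph.mon.keyring
    # then start it in the foreground and watch what it says
    ceph-mon -i <mon-id> -d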


I run my ceph client from my OpenStack controller. All ceph -s shows me is
faults, and only against node-15:

2015-03-18 16:47:27.145194 7ff762cff700  0 -- 192.168.241.100:0/15112 >>
192.168.241.115:6789/0 pipe(0x7ff75000cf00 sd=3 :0 s=1 pgs=0 cs=0 l=1).fault
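
One more data point I can gather is pointing the client at each monitor
explicitly, to see whether any of them answer at all (substitute each
mon's address for the placeholder):

    # bypass the mon list in ceph.conf and ask one specific monitor directly
    ceph -m <mon-ip>:6789 -s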


Finally, here is our ceph.conf: http://pastebin.com/Gmiq2V8S

So that's where we stand. Did we kill our Ceph Cluster (and thus our
OpenStack Cloud)? Or is there hope? Any suggestions would be greatly
appreciated.


-- 
*..+.-
--Greg Chavez
+//..;};