Hi Steve
Thanks for your answer. I don't have a private network defined. Furthermore, in my current testing configuration, there is only one OSD, so communication between OSDs should be a non-issue.
Do you know how OSD up/down state is determined when there is only one OSD?
Best,
Jeff

On 01/18/2016 03:59 PM, Steve Taylor wrote:
Do you have a ceph private network defined in your config file? I've seen this 
before in that situation where the private network isn't functional. The osds 
can talk to the mon(s) but not to each other, so they report each other as down 
when they're all running just fine.


Steve Taylor | Senior Software Engineer | StorageCraft Technology Corporation
380 Data Drive Suite 300 | Draper | Utah | 84020
Office: 801.871.2799 | Fax: 801.545.4705



-----Original Message-----
From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Jeff 
Epstein
Sent: Friday, January 15, 2016 7:28 PM
To: ceph-users <ceph-users@lists.ceph.com>
Subject: [ceph-users] OSDs are down, don't know why

Hello,

I'm setting up a small test instance of ceph and I'm running into a situation 
where the OSDs are being shown as down, but I don't know why.

Connectivity seems to be working. The OSD hosts are able to communicate with the MON hosts: running 
"ceph status" and "ceph osd in" from an OSD host works fine, but "ceph status" reports a 
HEALTH_WARN and shows 2 osds: 0 up, 2 in.
Both the OSD and MON daemons seem to be running fine. Network connectivity 
seems to be okay: I can nc from the OSD to port 6789 on the MON, and from the 
MON to port 6800-6803 on the OSD (I have constrained the ms bind port min/max 
config options so that the OSDs will use only these ports). Neither OSD nor MON 
logs show anything that seems unusual, nor why the OSD is marked as being down.
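
For anyone who wants to repeat the nc checks above in a scriptable form, here is a minimal sketch in Python. It only tests whether a TCP connection can be opened, exactly like nc, so it says nothing about whether the daemons are actually exchanging valid Ceph messages. The function names and the host names in the comments are hypothetical, not part of Ceph.

```python
import socket


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def check_ports(host, ports, timeout=2.0):
    """Map each port to whether it accepted a TCP connection."""
    return {port: port_open(host, port, timeout) for port in ports}


# Example usage (hypothetical host names, substitute your own):
#   check_ports("mon1.example.com", [6789])              # run from the OSD host
#   check_ports("osd1.example.com", range(6800, 6804))   # run from the MON host
```

The port numbers in the usage comments are the ones from this setup: 6789 for the MON, and 6800-6803 for the OSD as constrained by the ms bind port min/max options.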

Furthermore, using tcpdump I've watched network traffic between the OSD and the 
MON, and it seems that the OSD is sending heartbeats and getting an ack from 
the MON. So I really don't know why the MON thinks the OSD is down.

Some questions:
- How does the MON determine if the OSD is down?
- Is there a way to get the MON to report on why an OSD is down, e.g. no 
heartbeat?
- Is there any need to open ports other than TCP 6789 and 6800-6803?
- Any other suggestions?
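
To make the "in but not up" state easier to watch while testing, the following is a rough Python sketch that lists such OSDs from the JSON form of the OSD map. It assumes that "ceph osd dump --format json" returns an object with an "osds" list whose entries carry "osd", "up" and "in" fields; that is how I understand the output, but please verify against your own version. The helper names are hypothetical. It only shows the state the MON has recorded, not the reason for it.

```python
import json
import subprocess


def down_osds(osd_dump):
    """Return ids of OSDs marked 'in' but not 'up' in a parsed osd dump.

    Assumes each entry in osd_dump["osds"] has "osd", "up" and "in" keys.
    """
    return [
        entry["osd"]
        for entry in osd_dump.get("osds", [])
        if entry.get("in") and not entry.get("up")
    ]


def fetch_osd_dump():
    """Run the ceph CLI and parse its JSON output (needs a working ceph client)."""
    out = subprocess.check_output(["ceph", "osd", "dump", "--format", "json"])
    return json.loads(out.decode("utf-8"))


# Example usage on a host with ceph client access:
#   print(down_osds(fetch_osd_dump()))
```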

ceph 0.94 on Debian Jessie

Best,
Jeff
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
