[ceph-users] Incomplete MON removal
Ceph newbie here; Ceph 0.94.2, CentOS 6.6 x86_64, kernel 2.6.32. Initial test cluster of five OSD nodes, 3 MONs, 1 MDS. Working well.

I was testing the removal of two MONs, just to see how it works. The second MON was stopped and removed: no problems. The third MON was stopped and removed: apparently no problems, and ceph told me that only one MON remained. However, "ceph -s", along with many other commands, now hangs for 5 minutes and then gives me an authentication timeout. On the initial MON node, anderson, I get:

  # ceph daemon mon.anderson mon_status
  {
      "name": "anderson",
      "rank": 1,
      "state": "probing",
      "election_epoch": 0,
      "quorum": [],
      "outside_quorum": [
          "anderson"
      ],
      "extra_probe_peers": [],
      "sync_provider": [],
      "monmap": {
          "epoch": 4,
          "fsid": "b9aeb134-fe63-46b4-a939-152a6c188f6a",
          "modified": "2015-07-07 17:18:02.816853",
          "created": "0.00",
          "mons": [
              {
                  "rank": 0,
                  "name": "benford",
                  "addr": "10.22.200.13:6789\/0"
              },
              {
                  "rank": 1,
                  "name": "anderson",
                  "addr": "10.22.200.16:6789\/0"
              }
          ]
      }
  }

So, no quorum. Here benford is the third MON, which I had already removed; it is still listed in the monmap, so the removal, which initially appeared to work, evidently did not complete fully. I cannot start a MON on benford, however ("mon.benford not present in monmap"), and I cannot start the OSDs on any node.

How do I recover from this situation?

Steve
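One documented recovery path for this state (the procedure for removing monitors from an unhealthy cluster) is to edit the monmap by hand: stop the surviving monitor, extract its monmap, remove the stale entry, and inject the edited map back. A sketch, assuming the monitor names from the post and a scratch file at /tmp/monmap:

  # stop the last surviving monitor
  service ceph stop mon.anderson

  # extract the current monmap from its store
  ceph-mon -i anderson --extract-monmap /tmp/monmap

  # drop the monitor that was supposed to be gone
  monmaptool /tmp/monmap --rm benford

  # inject the edited map and bring the monitor back up
  ceph-mon -i anderson --inject-monmap /tmp/monmap
  service ceph start mon.anderson

With benford gone from the map, anderson forms a quorum of one, "ceph -s" should respond again, and the OSDs can then be started normally.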
Re: [ceph-users] Deadly slow Ceph cluster revisited
On Fri, 17 Jul 2015, J David wrote:
> f16 inbound: 6Gbps
> f16 outbound: 6Gbps
> f17 inbound: 6Gbps
> f17 outbound: 6Gbps
> f18 inbound: 6Gbps
> f18 outbound: 1.2Mbps

Unless the network was very busy when you ran this, I think that 6 Gb/s may not be very good either; iperf will usually give you much more than that on 10GbE. For example, between two of my OSD nodes I get 9.4 Gb/s, or up to 9.9 Gb/s when nothing else is happening.

Steve
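To measure each direction independently, run an iperf server on one end and point the client at it from the other; a sketch, with hostnames assumed from the quoted figures:

  # on f16:
  iperf -s

  # on f18, to test f18 -> f16 (the slow direction above):
  iperf -c f16 -t 30

  # then swap server and client to test f16 -> f18;
  # parallel streams can help saturate a 10GbE link:
  iperf -c f16 -t 30 -P 4

A 6 Gb/s single-stream result can simply mean one TCP stream is not filling the pipe, but 1.2 Mb/s in one direction only usually points at a bad cable, port, or negotiated speed/duplex somewhere on that path.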
[ceph-users] Ceph experiences
Ceph newbie (three weeks). Ceph 0.94.2, CentOS 6.6 x86_64, kernel 2.6.32. Twelve identical OSDs (1 TB each), three MONs, one active MDS and two standby MDSs. 10GbE cluster network, 1GbE public network. Using CephFS on a single client via the 4.1.1 kernel from elrepo; using rsync to copy data to the Ceph file system (mostly small files). Only one client (me). All set up with ceph-deploy.

For this test setup, the OSDs are spread across two quad-core 3.16GHz hosts with 16GB memory each: six OSDs per node. Journals are on the OSD drives for now. The two hosts are not user-accessible, and so are doing mostly OSD duty only (they also carry some light-duty iSCSI targets).

First surprise: I have noticed that the OSD drives do not fill at the same rate. For example, when the Ceph file system was 71% full, one OSD went into the full state at 95%, while another OSD was only 51% full, and another 60%.

Second surprise: one full OSD results in ENOSPC for *all* writes, even though there is plenty of space available on other OSDs.

I marked the full OSD as out to attempt a rebalance ("ceph osd out osd.0"). This appeared to be working, albeit very slowly, so I stopped client writes.

Third surprise: when I restarted client writes after about an hour, data was still being written to the full OSD, but the full condition was no longer recognized; it reached 96% before I stopped the client writes once more. That was yesterday evening; today it is down to 91%. The file system is not going to be usable until the rebalance completes (which looks like taking days).

I did not expect any of this. Any thoughts?

Steve

--
----------------------------------------------------------------------------
Steve Thompson                 E-mail:      smt AT vgersoft DOT com
Voyager Software LLC           Web:         http://www DOT vgersoft DOT com
39 Smugglers Path              VSW Support: support AT vgersoft DOT com
Ithaca, NY 14850
   "186,282 miles per second: it's not just a good idea, it's the law"
----------------------------------------------------------------------------
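On Hammer, the per-OSD utilization spread can be inspected and nudged with reweighting rather than marking the full OSD out; a sketch (the 110 threshold and 0.85 weight are illustrative values, not recommendations):

  # show per-OSD utilization (available since Hammer)
  ceph osd df

  # reduce the weight of any OSD more than 10% above the mean
  # utilization, so data migrates off it
  ceph osd reweight-by-utilization 110

  # or reweight a single overloaded OSD by hand
  ceph osd reweight 0 0.85

  # as a temporary escape hatch only: raise the full ratio a
  # little so writes can resume while the rebalance proceeds
  ceph pg set_full_ratio 0.97

Unlike "ceph osd out", which drains the OSD completely, reweighting keeps it serving data while shrinking its share, so it tends to trigger much less data movement.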