On 09/29/16 12:08, Ranjan Ghosh wrote:
> Yes, all the pools have min_size 1:
>
> root@uhu2 /scripts # ceph osd lspools
> 0 rbd,1 cephfs_data,2 cephfs_metadata,
> root@uhu2 /scripts # ceph osd pool get cephfs_data min_size
> min_size: 1
> root@uhu2 /scripts # ceph osd pool get cephfs_metadata min_size
> min_size: 1
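
(Side note, only a sketch in case you ever want to adjust those settings
yourself -- size 2 is just the usual value for a 2-OSD cluster, not
something read from your output:

ceph osd pool set cephfs_data size 2        # keep 2 copies of each object
ceph osd pool set cephfs_data min_size 1    # keep serving I/O with 1 copy left

The same pair of commands works for cephfs_metadata and rbd.)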
What about the rbd pool?
(FYI you can see all pool sizes with: ceph osd pool ls detail)

> I stopped all the ceph services gracefully on the first machine. But,
> just to get this straight: What if the first machine really suffered a
> catastrophic failure? My expectation was that the second machine just
> keeps on running and serving files. This is why we are using a cluster
> in the first place... Or is this expectation already wrong?
>
> When I stop the services on node1, I get this:
>
> # ceph pg stat
> 2016-09-29 11:51:09.514814 7fcba012f700 0 -- :/1939885874 >>
> 136.243.82.227:6789/0 pipe(0x7fcb9c05a730 sd=3 :0 s=1 pgs=0 cs=0 l=1
> c=0x7fcb9c05c3f0).fault
> v41732: 264 pgs: 264 active+clean; 18514 MB data, 144 GB used, 3546 GB
> / 3690 GB avail; 1494 B/s rd, 0 op/s

You could also try:

ceph osd down <osd id>

That marks the OSD down immediately instead of letting it time out
first. With only 2 OSDs there may be no consensus about whether it is
down, so it takes a long time for the timeout to kick in. The mons
aren't doing the consensus here... it's the OSDs that inform the mons.
(See the example commands at the bottom of this mail.)

> So, my question still is: Is there a way to (preferably) automatically
> avoid such a situation? Or at least manually tell the second node to
> keep on working and forget about those files?
>
> BR,
> Ranjan
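
On the "manually tell the second node" part, a rough sketch of what I
mean (osd id 0 is only an example, not taken from your cluster -- look
up the real id of the OSD on node1 first):

ceph osd tree      # find the id of the OSD that lives on the stopped node
ceph osd down 0    # tell the mons right away that this OSD is down
ceph -s            # with min_size 1 the PGs should stay active (undersized)

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com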