On 09/29/16 12:08, Ranjan Ghosh wrote:
> Yes, all the pools have min_size 1:
>
> root@uhu2 /scripts # ceph osd lspools
> 0 rbd,1 cephfs_data,2 cephfs_metadata,
> root@uhu2 /scripts # ceph osd pool get cephfs_data min_size
> min_size: 1
> root@uhu2 /scripts # ceph osd pool get cephfs_metadata min_size
> min_size: 1

What about the rbd pool?

(FYI you can see all pool sizes with: ceph osd pool ls detail)
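
For example, to check the rbd pool directly, the same way you already did for
the cephfs pools:

    ceph osd pool get rbd size
    ceph osd pool get rbd min_size

If size is 2 and min_size is 1, the remaining replica should keep serving I/O
once the dead OSD is actually marked down.
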
>
> I stopped all the ceph services gracefully on the first machine. But,
> just to get this straight: what if the first machine really suffered a
> catastrophic failure? My expectation was that the second machine would
> just keep running and serving files. This is why we are using a cluster
> in the first place... Or is this expectation already wrong?
>
> When I stop the services on node1, I get this:
>
> # ceph pg stat
> 2016-09-29 11:51:09.514814 7fcba012f700  0 -- :/1939885874 >>
> 136.243.82.227:6789/0 pipe(0x7fcb9c05a730 sd=3 :0 s=1 pgs=0 cs=0 l=1
> c=0x7fcb9c05c3f0).fault
> v41732: 264 pgs: 264 active+clean; 18514 MB data, 144 GB used, 3546 GB
> / 3690 GB avail; 1494 B/s rd, 0 op/s
>
You could also try:
    ceph osd down <osd id>

which marks the OSD as down immediately, instead of waiting for it to time
out. With only 2 OSDs there may be no consensus about whether it is down, so
the timeout takes a long time. The mons aren't the ones reaching consensus
here; it's the OSDs that report failures to the mons.
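
To find the right id and watch what happens afterwards, something along these
lines should do it (osd 0 is only a guess for the OSD on the stopped node;
ceph osd tree shows the real ids and their hosts):

    ceph osd tree      # list the OSDs with their host and up/down status
    ceph osd down 0    # mark the stopped node's OSD down right away
    ceph -w            # watch the PGs settle; they should go
                       # active+undersized+degraded and keep serving I/O

How long the automatic detection takes is controlled by the heartbeat and
reporting settings (osd_heartbeat_grace, mon_osd_min_down_reporters,
mon_osd_report_timeout -- names as documented around Jewel, so double-check
against your version); I'd look at those before tuning anything.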

> So, my question still is: Is there a way to (preferably) automatically
> avoid such a situation? Or at least manually tell the second node to
> keep on working and forget about those files?
>
> BR,
> Ranjan 

