[ceph-users] Re: [CEPH] Ceph multi nodes failed
Hello. I get it. I will run a test and let you know. Thank you very much.

Nguyen Huu Khoi

On Fri, Nov 24, 2023 at 5:01 PM Janne Johansson wrote:
> If you need to keep being able to write to the cluster with two hosts
> down, you need EC to be X+3 with min_size = X+1; this way it will still
> allow writes when two hosts are down.

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: [CEPH] Ceph multi nodes failed
On Fri, 24 Nov 2023 at 08:53, Nguyễn Hữu Khôi wrote:
>
> Hello.
> I have 10 nodes. My goal is to ensure that I won't lose data if 2 nodes
> fail.

You are mixing terms here. There is a difference between "the cluster
stops" and "losing data".

If you have EC 8+2 and min_size 9, then when you stop two hosts, Ceph
stops allowing writes precisely so that you do not lose data, keeping the
data protected until you can get one or both hosts back up and into the
cluster. If you need to keep being able to write to the cluster with two
hosts down, you need EC to be X+3 with min_size = X+1; this way it will
still allow writes when two hosts are down.

--
May the most significant bit of your life be positive.
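A minimal sketch of the X+3 layout described above, on 10 hosts (so k=7, m=3, one chunk per host). The profile and pool names are hypothetical, and this assumes a recent Ceph release where EC pools default min_size to k+1:

```shell
# EC profile that survives 2 down hosts while staying writable:
# k=7 data chunks + m=3 coding chunks = 10 chunks, one per host.
ceph osd erasure-code-profile set ec-7-3 \
    k=7 m=3 crush-failure-domain=host

# Create a pool with that profile; min_size should default to k+1 = 8,
# so writes continue with 10 - 2 = 8 chunks still reachable.
ceph osd pool create mypool erasure ec-7-3
ceph osd pool get mypool min_size
```

With min_size = 8, two down hosts leave exactly 8 chunks per PG available, so IO continues; a third failure pauses writes instead of risking the data.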
[ceph-users] Re: [CEPH] Ceph multi nodes failed
Hello. I have 10 nodes. My goal is to ensure that I won't lose data if 2
nodes fail.

Nguyen Huu Khoi

On Fri, Nov 24, 2023 at 2:47 PM Etienne Menguy wrote:
> Hello,
>
> How many nodes do you have?
[ceph-users] Re: [CEPH] Ceph multi nodes failed
Hello,

How many nodes do you have?

> -----Original Message-----
> From: Nguyễn Hữu Khôi
> Sent: Friday, November 24, 2023 07:42
> To: ceph-users@ceph.io
> Subject: [ceph-users] [CEPH] Ceph multi nodes failed
>
> I am using EC 8+2 (10 OSD nodes), and when I shut down 2 nodes my
> cluster crashes; it cannot write anymore.
[ceph-users] Re: [CEPH] Ceph multi nodes failed
Hello. I am reading it now. Thank you for the information.

Nguyen Huu Khoi

On Fri, Nov 24, 2023 at 1:56 PM Eugen Block wrote:
> Basically, with EC pools you usually have a min_size of k + 1 to
> prevent data loss.
[ceph-users] Re: [CEPH] Ceph multi nodes failed
Hi,

basically, with EC pools you usually have a min_size of k + 1 to prevent
data loss. There was a thread about that just a few days ago on this
list. So in your case your min_size is probably 9, which makes IO pause
when two chunks become unavailable. If your crush failure domain is host
(it seems to be) and you have "only" 10 hosts, I'd recommend adding a
host if possible, so the cluster can fully recover while one host is
down. Otherwise the PGs stay degraded until the host comes back.

So your cluster can handle only one down host, e.g. for maintenance. If
another host goes down (disk, network, whatever) you hit the min_size
limit. Temporarily, you can set min_size = k, but you should not risk
anything and should increase it back to k + 1 after a successful
recovery. It's not possible to change the EC profile of an existing
pool; you'd have to create a new pool and copy the data.

Check out the EC docs [1] for more details.

Regards,
Eugen

[1]
https://docs.ceph.com/en/quincy/rados/operations/erasure-code/?highlight=k%2B1#erasure-coded-pool-recovery

Quoting Nguyễn Hữu Khôi:

> Hello guys.
>
> I see many docs and threads talking about OSD failures. I have a
> question: how many nodes in a cluster can fail?
>
> I am using EC 8+2 (10 OSD nodes), and when I shut down 2 nodes my
> cluster crashes; it cannot write anymore.
>
> Thank you. Regards
>
> Nguyen Huu Khoi
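The emergency steps above can be sketched as follows, assuming a hypothetical EC pool named `ecpool` with k=8, m=2 (so min_size is normally 9). Only lower min_size while you actively need writes during recovery:

```shell
# Inspect the pool's current min_size (likely k+1 = 9 for an 8+2 pool)
ceph osd pool get ecpool min_size

# Temporarily allow IO with only k = 8 chunks available.
# This trades safety for availability: any further failure risks data loss.
ceph osd pool set ecpool min_size 8

# ...watch recovery progress with `ceph status` or `ceph pg stat`...

# Restore the safe value once recovery has completed
ceph osd pool set ecpool min_size 9
```

Since the EC profile itself cannot be changed on an existing pool, moving to a wider profile (e.g. k+3 coding chunks) means creating a new pool with the new profile and migrating the data into it.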