[ceph-users] Re: [CEPH] Ceph multi nodes failed

2023-11-24 Thread Nguyễn Hữu Khôi
Hello.

I get it. I will do a test and let you know.

Thank you very much.

Nguyen Huu Khoi


On Fri, Nov 24, 2023 at 5:01 PM Janne Johansson  wrote:

> On Fri, 24 Nov 2023 at 08:53, Nguyễn Hữu Khôi <nguyenhuukho...@gmail.com> wrote:
> >
> > Hello.
> > I have 10 nodes. My goal is to ensure that I won't lose data if 2 nodes
> > fail.
>
> Now you are mixing terms here.
>
> There is a difference between "cluster stops" and "losing data".
>
> If you have EC 8+2 and min_size 9, then when you stop two hosts, Ceph
> stops allowing writes precisely so that you do not lose data, keeping
> the data protected until you can get one or both hosts back into the
> cluster. If you need to keep being able to write to the cluster with
> two hosts down, you need the EC profile to be X+3 with min_size =
> X+1; that way it will still allow writes when two hosts are down.
>
> --
> May the most significant bit of your life be positive.
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: [CEPH] Ceph multi nodes failed

2023-11-24 Thread Janne Johansson
On Fri, 24 Nov 2023 at 08:53, Nguyễn Hữu Khôi wrote:
>
> Hello.
> I have 10 nodes. My goal is to ensure that I won't lose data if 2 nodes
> fail.

Now you are mixing terms here.

There is a difference between "cluster stops" and "losing data".

If you have EC 8+2 and min_size 9, then when you stop two hosts, Ceph
stops allowing writes precisely so that you do not lose data, keeping
the data protected until you can get one or both hosts back into the
cluster. If you need to keep being able to write to the cluster with
two hosts down, you need the EC profile to be X+3 with min_size =
X+1; that way it will still allow writes when two hosts are down.
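
As a rough sketch of what an X+3 layout could look like on 10 hosts
(7+3 here; the profile name, pool name and pg_num are only
placeholders, adjust them to your cluster):

  ceph osd erasure-code-profile set ec73 k=7 m=3 crush-failure-domain=host
  ceph osd pool create mypool 128 128 erasure ec73
  ceph osd pool set mypool min_size 8    # k + 1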

-- 
May the most significant bit of your life be positive.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: [CEPH] Ceph multi nodes failed

2023-11-23 Thread Nguyễn Hữu Khôi
Hello.
I have 10 nodes. My goal is to ensure that I won't lose data if 2 nodes
fail.
Nguyen Huu Khoi


On Fri, Nov 24, 2023 at 2:47 PM Etienne Menguy 
wrote:

> Hello,
>
> How many nodes do you have?
>
> > -Original Message-
> > From: Nguyễn Hữu Khôi 
> > Sent: Friday, 24 November 2023 07:42
> > To: ceph-users@ceph.io
> > Subject: [ceph-users] [CEPH] Ceph multi nodes failed
> >
> > Hello guys.
> >
> > I see many docs and threads talking about OSD failures. I have a
> > question: how many nodes in a cluster can fail?
> >
> > I am using EC 8+2 (10 OSD nodes), and when I shut down 2 nodes my
> > cluster crashes; it cannot write anymore.
> >
> > Thank you. Regards
> >
> > Nguyen Huu Khoi
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: [CEPH] Ceph multi nodes failed

2023-11-23 Thread Etienne Menguy
Hello,

How many nodes do you have? 

> -Original Message-
> From: Nguyễn Hữu Khôi 
> Sent: Friday, 24 November 2023 07:42
> To: ceph-users@ceph.io
> Subject: [ceph-users] [CEPH] Ceph multi nodes failed
> 
> Hello guys.
> 
> I see many docs and threads talking about OSD failures. I have a
> question: how many nodes in a cluster can fail?
> 
> I am using EC 8+2 (10 OSD nodes), and when I shut down 2 nodes my
> cluster crashes; it cannot write anymore.
> 
> Thank you. Regards
> 
> Nguyen Huu Khoi
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: [CEPH] Ceph multi nodes failed

2023-11-23 Thread Nguyễn Hữu Khôi
Hello.
I am reading it.
Thank you for the information.
Nguyen Huu Khoi


On Fri, Nov 24, 2023 at 1:56 PM Eugen Block  wrote:

> Hi,
>
> basically, with EC pools you usually have a min_size of k + 1 to
> prevent data loss. There was a thread about that just a few days ago
> on this list. So in your case your min_size is probably 9, which makes
> IO pause if two chunks become unavailable. If your CRUSH failure
> domain is host (it seems like it is) and you have "only" 10 hosts, I'd
> recommend adding a host if possible, so that you can fully recover
> while one host is down. Otherwise the PGs stay degraded until the
> host comes back.
> So in your case your cluster can handle only one down host, e.g. for
> maintenance. If another host goes down (disk, network, whatever) you
> hit the min_size limit. Temporarily you can set min_size = k, but you
> should not take any unnecessary risk and should increase it back to
> k + 1 after successful recovery. It's not possible to change the EC
> profile of a pool; you'd have to create a new pool and copy the data.
>
> Check out the EC docs [1] for some more details.
>
> Regards,
> Eugen
>
> [1]
>
> https://docs.ceph.com/en/quincy/rados/operations/erasure-code/?highlight=k%2B1#erasure-coded-pool-recovery
>
> Quoting Nguyễn Hữu Khôi:
>
> > Hello guys.
> >
> > I see many docs and threads talking about OSD failures. I have a
> > question: how many nodes in a cluster can fail?
> >
> > I am using EC 8+2 (10 OSD nodes), and when I shut down 2 nodes my
> > cluster crashes; it cannot write anymore.
> >
> > Thank you. Regards
> >
> > Nguyen Huu Khoi
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: [CEPH] Ceph multi nodes failed

2023-11-23 Thread Eugen Block

Hi,

basically, with EC pools you usually have a min_size of k + 1 to
prevent data loss. There was a thread about that just a few days ago
on this list. So in your case your min_size is probably 9, which makes
IO pause if two chunks become unavailable. If your CRUSH failure
domain is host (it seems like it is) and you have "only" 10 hosts, I'd
recommend adding a host if possible, so that you can fully recover
while one host is down. Otherwise the PGs stay degraded until the
host comes back.
So in your case your cluster can handle only one down host, e.g. for
maintenance. If another host goes down (disk, network, whatever) you
hit the min_size limit. Temporarily you can set min_size = k, but you
should not take any unnecessary risk and should increase it back to
k + 1 after successful recovery. It's not possible to change the EC
profile of a pool; you'd have to create a new pool and copy the data.
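
A minimal sketch of what that looks like on the CLI (the pool name
"mypool" is only a placeholder for your EC pool):

  ceph osd pool get mypool erasure_code_profile
  ceph osd pool get mypool min_size
  # only temporarily, to let IO continue while you recover:
  ceph osd pool set mypool min_size 8
  # and back to k + 1 once the cluster is healthy again:
  ceph osd pool set mypool min_size 9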


Check out the EC docs [1] for some more details.

Regards,
Eugen

[1]  
https://docs.ceph.com/en/quincy/rados/operations/erasure-code/?highlight=k%2B1#erasure-coded-pool-recovery


Quoting Nguyễn Hữu Khôi:


Hello guys.

I see many docs and threads talking about OSD failures. I have a
question: how many nodes in a cluster can fail?

I am using EC 8+2 (10 OSD nodes), and when I shut down 2 nodes my
cluster crashes; it cannot write anymore.

Thank you. Regards

Nguyen Huu Khoi



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io