[ceph-users] Re: [CEPH] Ceph multi nodes failed

2023-11-23 Thread Nguyễn Hữu Khôi
Hello.
I have 10 nodes. My goal is to ensure that I won't lose data if 2 nodes
fail.
Nguyen Huu Khoi


On Fri, Nov 24, 2023 at 2:47 PM Etienne Menguy 
wrote:

> Hello,
>
> How many nodes do you have?
>
> > -Original Message-
> > From: Nguyễn Hữu Khôi 
> > Sent: Friday, 24 November 2023 07:42
> > To: ceph-users@ceph.io
> > Subject: [ceph-users] [CEPH] Ceph multi nodes failed
> >
> > Hello guys.
> >
> > I see many docs and threads talking about OSD failures. I have a question:
> > how many nodes in a cluster can fail?
> >
> > I am using EC 8+2 (10 OSD nodes), and when I shut down 2 nodes my
> > cluster crashes; it cannot write anymore.
> >
> > Thank you. Regards
> >
> > Nguyen Huu Khoi
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: [CEPH] Ceph multi nodes failed

2023-11-23 Thread Etienne Menguy
Hello,

How many nodes do you have? 

> -Original Message-
> From: Nguyễn Hữu Khôi 
> Sent: Friday, 24 November 2023 07:42
> To: ceph-users@ceph.io
> Subject: [ceph-users] [CEPH] Ceph multi nodes failed
> 
> Hello guys.
> 
> I see many docs and threads talking about OSD failures. I have a question:
> how many nodes in a cluster can fail?
> 
> I am using EC 8+2 (10 OSD nodes), and when I shut down 2 nodes my
> cluster crashes; it cannot write anymore.
> 
> Thank you. Regards
> 
> Nguyen Huu Khoi
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: [CEPH] Ceph multi nodes failed

2023-11-23 Thread Nguyễn Hữu Khôi
Hello.
I am reading.
Thank you for the information.
Nguyen Huu Khoi


On Fri, Nov 24, 2023 at 1:56 PM Eugen Block  wrote:

> Hi,
>
> basically, with EC pools you usually have a min_size of k + 1 to
> prevent data loss. There was a thread about that just a few days ago
> on this list. So in your case your min_size is probably 9, which makes
> IO pause if two chunks become unavailable. If your crush failure
> domain is host (it seems it is) and you have "only" 10 hosts, I'd
> recommend adding a host if possible, so the cluster can fully recover
> while one host is down. Otherwise the PGs stay degraded until the
> host comes back.
> So in your case your cluster can handle only one down host, e.g. for
> maintenance. If another host goes down (disk, network, whatever) you
> hit the min_size limit. Temporarily, you can set min_size = k, but you
> should not take any chances and should increase it back to k + 1 after
> a successful recovery. It's not possible to change the EC profile of a
> pool; you'd have to create a new pool and copy the data.
>
> Check out the EC docs [1] for more details.
>
> Regards,
> Eugen
>
> [1]
>
> https://docs.ceph.com/en/quincy/rados/operations/erasure-code/?highlight=k%2B1#erasure-coded-pool-recovery
>
> Quoting Nguyễn Hữu Khôi :
>
> > Hello guys.
> >
> > I see many docs and threads talking about OSD failures. I have a question:
> > how many nodes in a cluster can fail?
> >
> > I am using EC 8+2 (10 OSD nodes), and when I shut down 2 nodes my
> > cluster crashes; it cannot write anymore.
> >
> > Thank you. Regards
> >
> > Nguyen Huu Khoi
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: CephFS - MDS removed from map - filesystem keeps to be stopped

2023-11-23 Thread Eugen Block

Hi,

I don't have an idea yet why that happens, but could you increase the  
debug level to see why it stops? What is the current ceph status?
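
For example, something along these lines should bump the MDS logging (just a
sketch, assuming the central config database is in use; adjust the daemon
filter and levels to taste):

# ceph config set mds debug_mds 20
# ceph config set mds debug_ms 1

and, for the overall state:

# ceph -s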


Thanks,
Eugen

Quoting Denis Polom :


Hi

running Ceph Pacific 16.2.13.

We had a full CephFS filesystem, and after adding new hardware we tried to
start it, but our MDS daemons are pushed to standby and removed from the
MDS map.


The filesystem was broken, so we repaired it with:

# ceph fs fail cephfs

# cephfs-journal-tool --rank=cephfs:0 event recover_dentries summary

# cephfs-journal-tool --rank=cephfs:0 journal reset

Then I started the ceph-mds service

and marked the rank as repaired.

After some time the MDS switched to standby. The log is below.

I would appreciate any help to resolve this situation. Thank you.

from log:

2023-11-22T14:11:49.212+0100 7f5dc155e700  1 mds.0.9604  
handle_mds_map i am now mds.0.9604
2023-11-22T14:11:49.212+0100 7f5dc155e700  1 mds.0.9604  
handle_mds_map state change up:rejoin --> up:active
2023-11-22T14:11:49.212+0100 7f5dc155e700  1 mds.0.9604  
recovery_done -- successful recovery!

2023-11-22T14:11:49.212+0100 7f5dc155e700  1 mds.0.9604 active_start
2023-11-22T14:11:49.216+0100 7f5dc155e700  1 mds.0.9604 cluster recovered.
2023-11-22T14:11:49.216+0100 7f5dc3d63700  0 --1-  
[v2:10.245.4.103:6800/1548097835,v1:10.245.4.103:6801/1548097835] >>  
v1:10.245.8.127:0/2123529386 conn(0x55a60627a800 0x55a606e5b000  
:6801 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).
handle_connect_message_2 accept peer reset, then tried to connect to  
us, replacing
2023-11-22T14:11:49.216+0100 7f5dc4564700  0 --1-  
[v2:10.245.4.103:6800/1548097835,v1:10.245.4.103:6801/1548097835] >>  
v1:10.245.6.88:0/1899426587 conn(0x55a60627ac00 0x55a6070d :6801  
s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).h
andle_connect_message_2 accept peer reset, then tried to connect to  
us, replacing
2023-11-22T14:11:49.216+0100 7f5dc4564700  0 --1-  
[v2:10.245.4.103:6800/1548097835,v1:10.245.4.103:6801/1548097835] >>  
v1:10.245.4.216:0/2058542052 conn(0x55a6070c9800 0x55a6070d1800  
:6801 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).
handle_connect_message_2 accept peer reset, then tried to connect to  
us, replacing
2023-11-22T14:11:49.216+0100 7f5dc3d63700  0 --1-  
[v2:10.245.4.103:6800/1548097835,v1:10.245.4.103:6801/1548097835] >>  
v1:10.245.4.220:0/1549374180 conn(0x55a60708d000 0x55a6070d0800  
:6801 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).
handle_connect_message_2 accept peer reset, then tried to connect to  
us, replacing
2023-11-22T14:11:49.216+0100 7f5dc4d65700  0 --1-  
[v2:10.245.4.103:6800/1548097835,v1:10.245.4.103:6801/1548097835] >>  
v1:10.245.8.180:0/270666178 conn(0x55a60703a000 0x55a6070cf800 :6801  
s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).h
andle_connect_message_2 accept peer reset, then tried to connect to  
us, replacing
2023-11-22T14:11:49.216+0100 7f5dc4d65700  0 --1-  
[v2:10.245.4.103:6800/1548097835,v1:10.245.4.103:6801/1548097835] >>  
v1:10.245.8.178:0/3673271488 conn(0x55a6070c9400 0x55a6070d1000  
:6801 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).
handle_connect_message_2 accept peer reset, then tried to connect to  
us, replacing
2023-11-22T14:11:49.216+0100 7f5dc4d65700  0 --1-  
[v2:10.245.4.103:6800/1548097835,v1:10.245.4.103:6801/1548097835] >>  
v1:10.245.4.167:0/2667964940 conn(0x55a6070c9c00 0x55a607112000  
:6801 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).
handle_connect_message_2 accept peer reset, then tried to connect to  
us, replacing
2023-11-22T14:11:49.216+0100 7f5dc3d63700  0 --1-  
[v2:10.245.4.103:6800/1548097835,v1:10.245.4.103:6801/1548097835] >>  
v1:10.245.6.70:0/3181830075 conn(0x55a607116000 0x55a607112800 :6801  
s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).h
andle_connect_message_2 accept peer reset, then tried to connect to  
us, replacing
2023-11-22T14:11:49.216+0100 7f5dc4564700  0 --1-  
[v2:10.245.4.103:6800/1548097835,v1:10.245.4.103:6801/1548097835] >>  
v1:10.245.6.72:0/3744737352 conn(0x55a60627a800 0x55a606e5b000 :6801  
s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0).h
andle_connect_message_2 accept peer reset, then tried to connect to  
us, replacing
2023-11-22T14:11:49.216+0100 7f5dc3d63700  0 --1-  
[v2:10.245.4.103:6800/1548097835,v1:10.245.4.103:6801/1548097835] >>  
v1:10.244.18.140:0/1607447464 conn(0x55a60627ac00 0x55a6070d  
:6801 s=ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=0)
.handle_connect_message_2 accept peer reset, then tried to connect  
to us, replacing
2023-11-22T14:11:49.220+0100 7f5dc155e700  1 mds.mds1 Updating MDS  
map to version 9608 from mon.1
2023-11-22T14:11:49.220+0100 7f5dc155e700  1 mds.0.9604  
handle_mds_map i am now mds.0.9604
2023-11-22T14:11:49.220+0100 7f5dc155e700  1 mds.0.9604  
handle_mds_map state change up:active --> up:stopping
2023-11-22T14:11:52.412+0100 7f5dc3562700  1 mds.mds1 asok_command:  
client ls {prefix=client ls} (starting...)
2023-11-22T14:11:57.412+0100 7f5dc3562700  1 mds.mds1 

[ceph-users] Re: [CEPH] Ceph multi nodes failed

2023-11-23 Thread Eugen Block

Hi,

basically, with EC pools you usually have a min_size of k + 1 to
prevent data loss. There was a thread about that just a few days ago
on this list. So in your case your min_size is probably 9, which makes
IO pause if two chunks become unavailable. If your crush failure
domain is host (it seems it is) and you have "only" 10 hosts, I'd
recommend adding a host if possible, so the cluster can fully recover
while one host is down. Otherwise the PGs stay degraded until the
host comes back.
So in your case your cluster can handle only one down host, e.g. for
maintenance. If another host goes down (disk, network, whatever) you
hit the min_size limit. Temporarily, you can set min_size = k, but you
should not take any chances and should increase it back to k + 1 after
a successful recovery. It's not possible to change the EC profile of a
pool; you'd have to create a new pool and copy the data.
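
For illustration, checking and temporarily lowering min_size looks something
like this ("<pool>" is a placeholder; 8 is k for an 8+2 profile, and it should
go back to 9, i.e. k + 1, right after a successful recovery):

# ceph osd pool get <pool> min_size
# ceph osd pool set <pool> min_size 8
# ceph osd pool set <pool> min_size 9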


Check out the EC docs [1] for more details.

Regards,
Eugen

[1]  
https://docs.ceph.com/en/quincy/rados/operations/erasure-code/?highlight=k%2B1#erasure-coded-pool-recovery


Quoting Nguyễn Hữu Khôi :


Hello guys.

I see many docs and threads talking about OSD failures. I have a question:
how many nodes in a cluster can fail?

I am using EC 8+2 (10 OSD nodes), and when I shut down 2 nodes my
cluster crashes; it cannot write anymore.

Thank you. Regards

Nguyen Huu Khoi
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] [CEPH] Ceph multi nodes failed

2023-11-23 Thread Nguyễn Hữu Khôi
Hello guys.

I see many docs and threads talking about OSD failures. I have a question:
how many nodes in a cluster can fail?

I am using EC 8+2 (10 OSD nodes), and when I shut down 2 nodes my
cluster crashes; it cannot write anymore.

Thank you. Regards

Nguyen Huu Khoi
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Object size

2023-11-23 Thread Miroslav Svoboda

Hi,

please, is it better to reduce the default object size from 4 MB to a
smaller value for an RBD image that will hold a lot of small mail and
webhosting files?
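
If it helps to make the question concrete: as far as I understand, the object
size can only be chosen at image creation time, e.g. something like (pool and
image names and sizes are just placeholders):

# rbd create --size 500G --object-size 1M rbd/mailstore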


Thanks

Svoboda Miroslav
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Rook-Ceph OSD Deployment Error

2023-11-23 Thread P Wagner-Beccard
Hi Mailing-Lister's,

I am reaching out for assistance regarding a deployment issue I am facing
with Ceph on a 4-node RKE2 cluster. We are attempting to deploy Ceph via
the rook helm chart, but we are encountering an issue that seems
related to a known bug (https://tracker.ceph.com/issues/61597).

During the OSD preparation phase, the deployment consistently fails with an
IndexError: list index out of range. The logs indicate the problem occurs
when configuring new disks, specifically when using /dev/dm-3 as a metadata
device. It's important to note that /dev/dm-3 is an LVM volume on top of an
mdadm RAID, which might or might not be contributing to this issue. (I swear,
this setup has worked before.)

Here is a snippet of the error from the deployment logs:
> 2023-11-23 23:11:30.196913 D | exec: IndexError: list index out of range
> 2023-11-23 23:11:30.236962 C | rookcmd: failed to configure devices:
failed to initialize osd: failed ceph-volume report: exit status 1
https://paste.openstack.org/show/bileqRFKbolrBlTqszmC/

We have attempted different configurations, including specifying devices
explicitly and using the useAllDevices: true option with a specified
metadata device (/dev/dm-3 or the /dev/pv_md0/lv_md0 path). However, the
issue persists across multiple configurations.

The tested configurations are as follows:

Explicit device specification:

```yaml
nodes:
  - name: "ceph01.maas"
    devices:
      - name: /dev/dm-1
      - name: /dev/dm-2
      - name: "sdb"
        config:
          metadataDevice: "/dev/dm-3"
      - name: "sdc"
        config:
          metadataDevice: "/dev/dm-3"
```

General device specification with metadata device:
```yaml
storage:
  useAllNodes: true
  useAllDevices: true
  config:
    metadataDevice: /dev/dm-3
```

I would greatly appreciate any insights or recommendations on how to
proceed or work around this issue.
Is there a halfway decent way to apply the fix or maybe a workaround that
we can apply to successfully deploy Ceph in our environment?

Kind regards,
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: CLT Meeting minutes 2023-11-23

2023-11-23 Thread Igor Fedotov

[1] was closed by mistake. Reopened.


On 11/23/2023 7:18 PM, Konstantin Shalygin wrote:

Hi,


On Nov 23, 2023, at 16:10, Nizamudeen A  wrote:

RCs for reef, quincy and pacific
  for next week when there is more time to discuss


Just a little noise: is pacific ready? 16.2.15 should be the last release
(at least that was the last plan), but [1] is still not merged. Why the
ticket is now closed, I don't know.


Also, many users report OOM issues with the 16.2.14 release; that patch
should also be merged to main first [2].




Thanks,
k

[1] https://tracker.ceph.com/issues/62815
[2] https://tracker.ceph.com/issues/59580


--
Igor Fedotov
Ceph Lead Developer

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH, Freseniusstr. 31h, 81247 Munich
CEO: Martin Verges - VAT-ID: DE310638492
Com. register: Amtsgericht Munich HRB 231263
Web: https://croit.io | YouTube: https://goo.gl/PGE1Bx
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: CLT Meeting minutes 2023-11-23

2023-11-23 Thread Konstantin Shalygin
Hi,

> On Nov 23, 2023, at 16:10, Nizamudeen A  wrote:
> 
> RCs for reef, quincy and pacific
>   for next week when there is more time to discuss

Just a little noise: is pacific ready? 16.2.15 should be the last release (at least
that was the last plan), but [1] is still not merged. Why the ticket is now closed,
I don't know.

Also, many users report OOM issues with the 16.2.14 release; that patch
should also be merged to main first [2].



Thanks,
k

[1] https://tracker.ceph.com/issues/62815
[2] https://tracker.ceph.com/issues/59580
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Erasure vs replica

2023-11-23 Thread Janne Johansson
> Now, 25 years later, a lot of people recommend using replicas, so if I buy X TB
> I'm only going to have X/3 TB usable (vs raidz2 where I lose 2 disks out of 9-12
> disks).

As seen from other answers, it changes the performance and space
usage you get, but there are other factors too. Replica = 3
(or larger) will be just as fast when one (or more) of the replicas is
missing, but EC needs to do math on the data from one or more of the
coding (parity) shards before it can return data to the client, so the
two also degrade differently.
It is never as easy as "one size fits all"; the tradeoffs have
multiple dimensions.

-- 
May the most significant bit of your life be positive.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: cephadm vs ceph.conf

2023-11-23 Thread Zakhar Kirpichenko
Hi,

Please note that there are cases where the use of ceph.conf inside a
container is justified. For example, I was unable to set the monitors'
mon_rocksdb_options by any means other than providing them in the monitor's
own ceph.conf within the container; all other attempts to pass this setting
were ignored by the monitor.
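
For reference, the kind of stanza I mean in the monitor's own ceph.conf looks
like this (the values below are placeholders, not a recommendation):

```
[mon]
mon_rocksdb_options = write_buffer_size=33554432,compression=kLZ4Compression,level_compaction_dynamic_level_bytes=true
```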

/Z

On Thu, 23 Nov 2023 at 16:53, Albert Shih  wrote:

> On 23/11/2023 at 15:35:25+0100, Michel Jouvin wrote:
> Hi,
>
> >
> > You should never edit any file in the containers, cephadm takes care of it.
> > Most of the parameters described in the doc you mentioned are better managed
> > with the "ceph config" command in the Ceph configuration database. If you want
> > to run the ceph command on a Ceph machine outside a container, you can add
>
> OK. Of course I won't touch anything inside any container; I just checked
> the overlay to see whether the container uses this file.
>
> It's just that I see, in a lot of places in the documentation, some configuration
> to put in /etc/ceph/ceph.conf.
>
> > the label _admin to your host in "ceph orch host" so that cephadm takes care
> > of maintaining your /etc/ceph/ceph.conf (outside the container).
>
> OK. I'm indeed using ceph orch & co.
>
> Thanks.
>
> Regards.
>
> JAS
> --
> Albert SHIH 嶺 
> Observatoire de Paris
> France
> Heure locale/Local time:
> jeu. 23 nov. 2023 15:48:36 CET
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: cephadm vs ceph.conf

2023-11-23 Thread Albert Shih
On 23/11/2023 at 15:35:25+0100, Michel Jouvin wrote:
Hi, 

> 
> You should never edit any file in the containers, cephadm takes care of it.
> Most of the parameters described in the doc you mentioned are better managed
> with the "ceph config" command in the Ceph configuration database. If you want
> to run the ceph command on a Ceph machine outside a container, you can add

OK. Of course I won't touch anything inside any container; I just checked
the overlay to see whether the container uses this file.

It's just that I see, in a lot of places in the documentation, some configuration
to put in /etc/ceph/ceph.conf.

> the label _admin to your host in "ceph orch host" so that cephadm takes care
> of maintaining your /etc/ceph/ceph.conf (outside the container).

OK. I'm indeed using ceph orch & co.

Thanks. 

Regards.

JAS
-- 
Albert SHIH 嶺 
Observatoire de Paris
France
Heure locale/Local time:
jeu. 23 nov. 2023 15:48:36 CET
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: cephadm vs ceph.conf

2023-11-23 Thread Anthony D'Atri

>>> 
>>> Should I modify the ceph.conf (vi/emacs) directly ?
>> 
>> vi is never the answer.
> 
> WTF ? You break my dream ;-) ;-)

Let line editors die.

>> 
> You're right. 
> 
> Currently I'm testing 
> 
>  17.2.7 quincy.
> 
> So in my daily life, how would I know whether I should use ceph config or
> modify /etc/ceph/ceph.conf with vi/emacs/whatever?

Use central “ceph config” unless you have an explicit reason to use ceph.conf.  
Before we had central config, maintaining ceph.conf properly across clusters 
could be rather a pain.

There is a precedence order for option values, which can be specified on the
command line, in central config, in ceph.conf, or via injection. I don't recall
the exact order. If you stick with central config you mostly don't need to worry
unless you change your monitor nodes.
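
For example, purely to show the shape of the commands (option and value here
are arbitrary):

# ceph config set osd osd_max_backfills 2
# ceph config get osd osd_max_backfills
# ceph config dump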

> 
>> 
>>> And what about the future ?
>> 
>> We are all interested in the future, for that is where you and I are going 
>> to spend the rest of our lives.
> 
> Correct ;-)

https://www.youtube.com/watch?v=qsb74pW7goU

> 
> Sorry, my question was not very clear. My question was in fact about which
> way we are headed. But I'm guessing the answer is “ceph config” or something
> like that.
> 
> Thanks.
> 
> Regards
> 
> -- 
> Albert SHIH 嶺 
> Observatoire de Paris
> France
> Heure locale/Local time:
> jeu. 23 nov. 2023 15:37:04 CET

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: cephadm vs ceph.conf

2023-11-23 Thread Albert Shih
On 23/11/2023 at 09:32:49-0500, Anthony D'Atri wrote:
> 

Hi, 

Thanks for your answer. 

> 
> > to change something in the /etc/ceph/ceph.conf.
> 
> Central config was introduced with Mimic.  Since both central config and 
> ceph.conf work and are supported, explicitly mentioning both in the docs 
> every time is a lot of work (and awkward).  One day we’ll sort out an 
> effective means to generalize throughout the docs.
> 

OK. I perfectly understand the «nightmare» of maintaining up-to-date
documentation.

> > How is that taken into account by cephadm? I see the docker containers
> > have an overlay for /etc/ceph/ceph.conf.
> > 
> > Should I modify the ceph.conf (vi/emacs) directly ?
> 
> vi is never the answer.

WTF ? You break my dream ;-) ;-)

Ok. 
> 
> > Should I modify the ceph.conf (vi/emacs) directly and restart something ? 
> > Should I use some cephadm shell and not manually touch ceph.conf?
> 
> Depending on the option in question, you may be able to just use “ceph 
> config” to set it centrally.  When asking questions here, it helps a LOT if 
> you share the Ceph release you’re running and more context.
> 

You're right. 

Currently I'm testing 

  17.2.7 quincy.

So in my daily life, how would I know whether I should use ceph config or
modify /etc/ceph/ceph.conf with vi/emacs/whatever?

> 
> > And what about the future ?
> 
> We are all interested in the future, for that is where you and I are going to 
> spend the rest of our lives.

Correct ;-)

Sorry, my question was not very clear. My question was in fact about which
way we are headed. But I'm guessing the answer is “ceph config” or something
like that.

Thanks.

Regards

-- 
Albert SHIH 嶺 
Observatoire de Paris
France
Heure locale/Local time:
jeu. 23 nov. 2023 15:37:04 CET
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: cephadm vs ceph.conf

2023-11-23 Thread Michel Jouvin

Hi Albert,

You should never edit any file in the containers; cephadm takes care of
it. Most of the parameters described in the doc you mentioned are better
managed with the "ceph config" command in the Ceph configuration database.
If you want to run the ceph command on a Ceph machine outside a
container, you can add the label _admin to your host in "ceph orch host"
so that cephadm takes care of maintaining your /etc/ceph/ceph.conf (outside
the container).
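
Something like this, with the hostname being a placeholder:

# ceph orch host label add host1 _admin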


Cheers,

Michel

On 23/11/2023 at 15:28, Albert Shih wrote:

Hi everyone.

Still me with my newbie questions... sorry.

I'm using cephadm to deploy my ceph cluster, but when I search the
documentation «docs.ceph.com» I see in some places, like

   https://docs.ceph.com/en/latest/rados/configuration/pool-pg-config-ref/

to change something in the /etc/ceph/ceph.conf.

How is that taken into account by cephadm? I see the docker containers
have an overlay for /etc/ceph/ceph.conf.

Should I modify ceph.conf (vi/emacs) directly?
Should I modify ceph.conf (vi/emacs) directly and restart something?
Should I use some cephadm shell and not manually touch ceph.conf?

And what about the future ?

Regards.

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: cephadm vs ceph.conf

2023-11-23 Thread Anthony D'Atri


> to change something in the /etc/ceph/ceph.conf.

Central config was introduced with Mimic.  Since both central config and 
ceph.conf work and are supported, explicitly mentioning both in the docs every 
time is a lot of work (and awkward).  One day we’ll sort out an effective means 
to generalize throughout the docs.

> > How is that taken into account by cephadm? I see the docker containers
> > have an overlay for /etc/ceph/ceph.conf.
> 
> Should I modify the ceph.conf (vi/emacs) directly ?

vi is never the answer.

> > Should I modify ceph.conf (vi/emacs) directly and restart something?
> > Should I use some cephadm shell and not manually touch ceph.conf?

Depending on the option in question, you may be able to just use “ceph config” 
to set it centrally.  When asking questions here, it helps a LOT if you share 
the Ceph release you’re running and more context.


> And what about the future ?

We are all interested in the future, for that is where you and I are going to 
spend the rest of our lives.

> 
> Regards. 
> -- 
> Albert SHIH 嶺 
> France
> Heure locale/Local time:
> jeu. 23 nov. 2023 15:21:47 CET
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] cephadm vs ceph.conf

2023-11-23 Thread Albert Shih
Hi everyone. 

Still me with my newbie questions... sorry.

I'm using cephadm to deploy my ceph cluster, but when I search the
documentation «docs.ceph.com» I see in some places, like

  https://docs.ceph.com/en/latest/rados/configuration/pool-pg-config-ref/

to change something in the /etc/ceph/ceph.conf.

How is that taken into account by cephadm? I see the docker containers
have an overlay for /etc/ceph/ceph.conf.

Should I modify ceph.conf (vi/emacs) directly?
Should I modify ceph.conf (vi/emacs) directly and restart something?
Should I use some cephadm shell and not manually touch ceph.conf?

And what about the future ? 

Regards. 
-- 
Albert SHIH 嶺 
France
Heure locale/Local time:
jeu. 23 nov. 2023 15:21:47 CET
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Erasure vs replica

2023-11-23 Thread Anthony D'Atri
Yes, lots of people are using EC.

Which is more “reliable” depends on what you need.  If you need to survive 4 
failures, there are scenarios where RF=3 won’t do it for you.

You could in such a case use an EC 4,4 profile, 8,4, etc.  

It’s a tradeoff between write speed and raw::usable ratio efficiency.  Which do 
you need more?

Depending on your data, EC may increase space amp significantly.

With RF=2 I have seen overlapping failures in my career.  The larger the 
deployment, the greater the chance of overlapping failures.  In a distributed 
system especially, RF=2 or EC with m=1 are bad ideas if your data is valuable.
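
If you do go EC, a profile and pool are only a couple of commands, e.g. (k/m,
names, PG counts, and the host failure domain here are purely illustrative):

# ceph osd erasure-code-profile set ec-8-4 k=8 m=4 crush-failure-domain=host
# ceph osd pool create mypool 128 128 erasure ec-8-4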

> On Nov 23, 2023, at 8:58 AM, Albert Shih  wrote:
> 
> Hi everyone,
> 
> I would just like to know your opinion about the reliability of erasure
> coding.
> 
> Of course I understand that if we want the «best of the best of the best»
> ;-) I can choose the replica method.
> 
> I have heard in many places that “replica” is more reliable, “replica” is
> more efficient, etc.
> 
> Yes... well, I have been using RAID (5, 6, LVM, raidz1, raidz2, etc.) for 25
> years and I never lost data, except once when a firmware bug in some x card
> crashed the RAID volume.
> 
> Now, 25 years later, a lot of people recommend using replicas, so if I buy X TB
> I'm only going to have X/3 TB usable (vs raidz2 where I lose 2 disks out of 9-12
> disks).
> 
> So my question is: does anyone use erasure coding at large scale for critical
> data (same level as raidz1/raid5 or raidz2/raid6)?
> 
> Regards
> -- 
> Albert SHIH 嶺 
> Observatoire de Paris
> France
> Heure locale/Local time:
> jeu. 23 nov. 2023 14:51:28 CET
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Erasure vs replica

2023-11-23 Thread Nino Kotur
I use EC 4+2 on the backup site; the production site is running replica 3.

We run 8 servers on the backup side and 12 on the production side.

The number of OSDs per server is 16 on all of them.


Production has LACP-bonded 25G networking for the public and cluster networks;

the backup site has just 10G networking with no redundancy.






On Thu, Nov 23, 2023 at 2:58 PM Albert Shih  wrote:

> Hi everyone,
>
> I would just like to know your opinion about the reliability of erasure
> coding.
>
> Of course I understand that if we want the «best of the best of the best»
> ;-) I can choose the replica method.
>
> I have heard in many places that “replica” is more reliable, “replica” is
> more efficient, etc.
>
> Yes... well, I have been using RAID (5, 6, LVM, raidz1, raidz2, etc.) for 25
> years and I never lost data, except once when a firmware bug in some x card
> crashed the RAID volume.
>
> Now, 25 years later, a lot of people recommend using replicas, so if I buy X TB
> I'm only going to have X/3 TB usable (vs raidz2 where I lose 2 disks out of 9-12
> disks).
>
> So my question is: does anyone use erasure coding at large scale for critical
> data (same level as raidz1/raid5 or raidz2/raid6)?
>
> Regards
> --
> Albert SHIH 嶺 
> Observatoire de Paris
> France
> Heure locale/Local time:
> jeu. 23 nov. 2023 14:51:28 CET
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Erasure vs replica

2023-11-23 Thread Albert Shih
Hi everyone,

I would just like to know your opinion about the reliability of erasure
coding.

Of course I understand that if we want the «best of the best of the best»
;-) I can choose the replica method.

I have heard in many places that “replica” is more reliable, “replica” is
more efficient, etc.

Yes... well, I have been using RAID (5, 6, LVM, raidz1, raidz2, etc.) for 25
years and I never lost data, except once when a firmware bug in some x card
crashed the RAID volume.

Now, 25 years later, a lot of people recommend using replicas, so if I buy X TB
I'm only going to have X/3 TB usable (vs raidz2 where I lose 2 disks out of 9-12
disks).
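
Just to put rough numbers on it (my own back-of-the-envelope, ignoring any
overhead and nearfull margins):

  replica 3 : usable = raw * 1/3   (about 33 %)
  EC 4+2    : usable = raw * 4/6   (about 67 %)
  EC 8+2    : usable = raw * 8/10  (80 %)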

So my question is: does anyone use erasure coding at large scale for critical
data (same level as raidz1/raid5 or raidz2/raid6)?

Regards
-- 
Albert SHIH 嶺 
Observatoire de Paris
France
Heure locale/Local time:
jeu. 23 nov. 2023 14:51:28 CET
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] CLT Meeting minutes 2023-11-23

2023-11-23 Thread Nizamudeen A
Hello,

etherpad history lost
   need a way to recover from DB or find another way to back things up

discuss the quincy/dashboard-v3 backports? was tabled from 11/1
   https://github.com/ceph/ceph/pull/54252
   agreement is to not backport breaking features to stable branches.

18.2.1
   LRC upgrade affected by https://tracker.ceph.com/issues/62570#note-4
https://ceph-storage.slack.com/archives/C1HFJ4VTN/p1700575544548809
  Figure out the reproducer and add tests

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ce0d5bd3a6c176f9a3bf867624a07119dd4d0878
is the trigger on the kernel client side?
Clients with that patch should work with the server-side code that
broke older ones
Suggestion to introduce a matrix of pre-built "older" kernels into the fs
suite

gibba vs LRC upgrades
LRC shouldn't be updated often, it should act more like a production
cluster

RCs for reef, quincy and pacific
   for next week when there is more time to discuss

Regards,
-- 

Nizamudeen A

Software Engineer

Red Hat 

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io