[ceph-users] RFI: Prometheus, Etc, Services - Optimum Number To Run

2024-01-19 Thread duluxoz
Hi All, In regards to the monitoring services on a Ceph Cluster (i.e. Prometheus, Grafana, Alertmanager, Loki, Node-Exporter, Promtail, etc.), how many instances should/can we run for fault-tolerance purposes? I can't seem to recall that advice being in the doco anywhere (but of course, I
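
For reference, with a cephadm-managed cluster the instance count of each monitoring service can be pinned via placement specs; a minimal sketch (the counts below are only examples, not an official recommendation):

    # run two Prometheus and two Alertmanager instances for redundancy
    ceph orch apply prometheus --placement="count:2"
    ceph orch apply alertmanager --placement="count:2"
    ceph orch apply grafana --placement="count:1"
    # node-exporter normally runs on every host
    ceph orch apply node-exporter --placement="*"
    # check what is currently deployed
    ceph orch ls prometheus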

[ceph-users] Re: rbd map snapshot, mount lv, node crash

2024-01-19 Thread Ilya Dryomov
On Fri, Jan 19, 2024 at 2:38 PM Marc wrote: > > Am I doing something weird when I do on a ceph node (nautilus, el7): > > rbd snap ls vps-test -p rbd > rbd map vps-test@vps-test.snap1 -p rbd > > mount -o ro /dev/mapper/VGnew-LVnew /mnt/disk <--- reset/reboot ceph node Hi Marc, It's not clear

[ceph-users] Re: OSD read latency grows over time

2024-01-19 Thread Mark Nelson
Hi Roman, The fact that changing the pg_num for the index pool drops the latency back down might be a clue. Do you have a lot of deletes happening on this cluster? If you have a lot of deletes and long pauses between writes, you could be accumulating tombstones that you have to keep
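
If accumulated RocksDB tombstones on the index-pool OSDs are the suspect, one quick test is a manual compaction; a sketch (osd.12 is just an example ID, and compaction adds load while it runs):

    # trigger a RocksDB compaction on one index-pool OSD
    ceph tell osd.12 compact
    # on that OSD's host, inspect RocksDB perf counters before/after
    ceph daemon osd.12 perf dump | grep -i rocksdb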

[ceph-users] Re: OSD read latency grows over time

2024-01-19 Thread Roman Pashin
Hi Stefan, > Do you make use of a separate db partition as well? And if so, where is it stored? No, only the WAL partition is on a separate NVMe partition. Not sure if ceph-ansible could install Ceph with a db partition on a separate device on v17.6.2. > Do you only see latency increase in reads? And not
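
A quick way to confirm where each OSD keeps its WAL/DB (a sketch; osd.0 and the field names are what the current tooling reports, adjust to your deployment):

    # per-host view of block, block.db and block.wal devices
    ceph-volume lvm list
    # or query the cluster for a single OSD's metadata
    ceph osd metadata 0 | grep -E 'bluefs_(db|wal)_devices|"devices"'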

[ceph-users] Re: Performance impact of Heterogeneous environment

2024-01-19 Thread Mark Nelson
On 1/18/24 03:40, Frank Schilder wrote: For the multi- vs. single-OSD per flash drive decision, the following test might be useful: We found dramatic improvements using multiple OSDs per flash drive with Octopus *if* the bottleneck is the kv_sync_thread. Apparently, each OSD has only one and
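
For context, with cephadm the multi-OSD-per-flash-drive layout can be expressed in an OSD service spec (a sketch, assuming non-rotational data devices; the value 2 is only an example), applied with "ceph orch apply -i osd_spec.yml":

    service_type: osd
    service_id: nvme_two_osds_per_device
    placement:
      host_pattern: '*'
    spec:
      data_devices:
        rotational: 0
      osds_per_device: 2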

[ceph-users] rbd map snapshot, mount lv, node crash

2024-01-19 Thread Marc
Am I doing something weird when I do on a ceph node (nautilus, el7):
rbd snap ls vps-test -p rbd
rbd map vps-test@vps-test.snap1 -p rbd
mount -o ro /dev/mapper/VGnew-LVnew /mnt/disk <--- reset/reboot ceph node
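
For anyone trying to reproduce this more safely, a sketch of the same sequence with an explicitly read-only mapping and separate VG activation (VG/LV names taken from the post; this says nothing about the cause of the reset, and the node will see duplicate PVs if the live image's VG is also active there):

    rbd snap ls rbd/vps-test
    rbd map --read-only rbd/vps-test@vps-test.snap1
    # the snapshot carries an LVM PV; activate its VG before mounting
    vgchange -ay VGnew
    mount -o ro /dev/VGnew/LVnew /mnt/disk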

[ceph-users] Re: OSD read latency grows over time

2024-01-19 Thread Roman Pashin
Hi Eugen, > How is the data growth in your cluster? Is the pool size rather stable or is it constantly growing? Pool size is fairly constant with a tiny up trend. Its growth doesn't correlate with the increase of OSD read latency. I've combined pool usage with OSD read latency on one graph to
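
For anyone wanting to plot the same two series together, a PromQL sketch (metric names as exposed by the ceph-mgr prometheus module; the index pool name is only an example):

    # average OSD read latency
    rate(ceph_osd_op_r_latency_sum[5m]) / rate(ceph_osd_op_r_latency_count[5m])
    # bytes stored in the index pool
    ceph_pool_stored * on (pool_id) group_left(name) ceph_pool_metadata{name="default.rgw.buckets.index"}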

[ceph-users] Re: OSD read latency grows over time

2024-01-19 Thread Stefan Kooman
On 16-01-2024 11:22, Roman Pashin wrote: Hello Ceph users, we see a strange issue on a recent Ceph installation, v17.6.2. We store data on an HDD pool; the index pool is on SSD. Each OSD stores its WAL on an NVMe partition. Do you make use of a separate db partition as well? And if so, where is it

[ceph-users] Re: Cephadm orchestrator and special label _admin in 17.2.7

2024-01-19 Thread Eugen Block
Oh that does sound strange indeed. I don't have a good idea right now, hopefully someone from the dev team can shed some light on this. Zitat von Robert Sander : Hi, more strange behaviour: When I issue "ceph mgr fail" a backup MGR takes over and updates all config files on all hosts

[ceph-users] Re: OSD read latency grows over time

2024-01-19 Thread Eugen Block
Hi, I checked two production clusters which don't use RGW too heavily, both on Pacific though. There's no latency increase visible there. How is the data growth in your cluster? Is the pool size rather stable or is it constantly growing? Thanks, Eugen Zitat von Roman Pashin : Hello

[ceph-users] Re: Cephadm orchestrator and special label _admin in 17.2.7

2024-01-19 Thread Robert Sander
Hi, more strange behaviour: When I issue "ceph mgr fail" a backup MGR takes over and updates all config files on all hosts, including /etc/ceph/ceph.conf. At first I thought that this was the solution, but when I now remove the _admin label and add it again, the new MGR also does not update
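
For reference, the pieces involved here (a sketch; host1 is an example, and the config option is the switch cephadm uses to decide whether it manages /etc/ceph/ceph.conf at all):

    # mark a host as admin so cephadm distributes ceph.conf and the admin keyring to it
    ceph orch host label add host1 _admin
    # check whether config-file management is enabled in the first place
    ceph config get mgr mgr/cephadm/manage_etc_ceph_ceph_conf
    # failing over the manager, which re-triggered the distribution in this case
    ceph mgr fail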

[ceph-users] Re: Keyring location for ceph-crash?

2024-01-19 Thread Jan Kasprzak
Hi Eugen, thanks for verifying this. I have created a tracker issue: https://tracker.ceph.com/issues/64102 -Yenya Eugen Block wrote: : Hi, : : I checked the behaviour on Octopus, Pacific and Quincy, I can : confirm. I don't have the time to dig deeper right now, but I'd : suggest to
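
For anyone hitting the same thing, a sketch of creating a key that ceph-crash can use (the entity name and path are the conventional ones, not necessarily the only ones the daemon searches; see the tracker issue for the details):

    ceph auth get-or-create client.crash mon 'profile crash' mgr 'profile crash' \
        -o /etc/ceph/ceph.client.crash.keyring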

[ceph-users] Degraded PGs on EC pool when marking an OSD out

2024-01-19 Thread Hector Martin
I'm having a bit of a weird issue with cluster rebalances with a new EC pool. I have a 3-machine cluster, each machine with 4 HDD OSDs (+1 SSD). Until now I've been using an erasure coded k=5 m=3 pool for most of my data. I've recently started to migrate to a k=5 m=4 pool, so I can configure the
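
For context, a sketch of how such a profile and pool are typically created (names, pg counts and the failure domain are only examples; with 3 hosts and k+m=9 the interesting part is the CRUSH rule that spreads shards across hosts, which isn't shown here):

    ceph osd erasure-code-profile set ec_k5m4 k=5 m=4 crush-failure-domain=osd
    ceph osd pool create ecpool_k5m4 32 32 erasure ec_k5m4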

[ceph-users] Re: Keyring location for ceph-crash?

2024-01-19 Thread Eugen Block
Hi, I checked the behaviour on Octopus, Pacific and Quincy, I can confirm. I don't have the time to dig deeper right now, but I'd suggest to open a tracker issue. Thanks, Eugen Zitat von Jan Kasprzak : Hello, Ceph users, what is the correct location of keyring for ceph-crash? I tried