[ceph-users] Re: Trouble getting cephadm to deploy iSCSI gateway

2022-05-17 Thread Wesley Dillingham
Well, I don't use either the dashboard or the cephadm/containerized deployment, but I do use ceph-iscsi. The fact that your two gateways are not "up" might indicate that they haven't been added to the target IQN yet. Once you can get into gwcli and create an IQN and associate your gateways with it, I
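
For reference, a minimal gwcli sketch of that step. The IQN is illustrative, the IPs are the ones from Erik's cluster description, and the gateway names are placeholders that must match the hosts' real hostnames:

    # gwcli
    /> cd /iscsi-targets
    /iscsi-targets> create iqn.2003-01.com.example:ceph-igw
    /iscsi-targets> cd iqn.2003-01.com.example:ceph-igw/gateways
    .../gateways> create node1 192.168.122.3
    .../gateways> create node2 192.168.122.4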

[ceph-users] osd_disk_thread_ioprio_class deprecated?

2022-05-17 Thread Richard Bade
Hi Everyone, I've been going through our config trying to remove settings that are no longer relevant or which are now the default setting. The osd_disk_thread_ioprio_class and osd_disk_thread_ioprio_priority settings come up a few times in the mailing list but no longer appear in the ceph
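
As a hedged sketch of that kind of cleanup (these commands exist on Nautilus and later; the option names are the ones from this thread):

    # ask the cluster whether the option is still recognized at all
    ceph config help osd_disk_thread_ioprio_class
    # drop it from the mon config database if it was set there
    ceph config rm osd osd_disk_thread_ioprio_class
    ceph config rm osd osd_disk_thread_ioprio_priority
    # options set only in ceph.conf just need their lines deleted there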

[ceph-users] Options for RADOS client-side write latency monitoring

2022-05-17 Thread jules
Greetings, all. I'm attempting to introduce client-side RADOS write latency monitoring on a (rook) Ceph cluster. The use case is a mixture of containers, serving file and database workloads (although my question may apply more broadly). The aim here is to measure the average write latency as
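
One quick client-side data point, though not a full monitoring solution, is rados bench, which reports average and maximum write latency as seen by the client; the pool name below is just an example:

    # 30 seconds of 4 KiB writes from this client, 16 concurrent ops
    rados bench -p testpool 30 write -t 16 -b 4096 --no-cleanup
    # remove the benchmark objects afterwards
    rados -p testpool cleanup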

[ceph-users] Re: DM-Cache for spinning OSDs

2022-05-17 Thread Richard Bade
Hey Felix, I run bcache pretty much in the way you're describing, but we have smaller spinning disks (4TB). We mostly share a 1TB NVMe between six OSDs, with 33GB DB/WAL per OSD and the rest shared as bcache cache. The performance is definitely improved over not running a cache. We run this mostly for
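
For readers who haven't set up bcache before, a rough sketch of how one backing HDD plus a shared NVMe cache set is usually assembled (device names are examples only):

    # one NVMe partition as the shared cache set, one backing HDD per OSD
    make-bcache -C /dev/nvme0n1p7
    make-bcache -B /dev/sdb
    # attach the backing device to the cache set
    # (cset UUID comes from: bcache-super-show /dev/nvme0n1p7)
    echo <cset-uuid> > /sys/block/bcache0/bcache/attach
    # the resulting /dev/bcache0 is then handed to ceph-volume as the data device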

[ceph-users] Re: Migration Nautilus to Pacific: Very high latencies (EC profile)

2022-05-17 Thread David Orman
We don't have any that wouldn't have the problem. That said, we've already got a PR out for the 16.2.8 issue we encountered, so I would expect a relatively quick update assuming no issues are found during testing. On Tue, May 17, 2022 at 1:21 PM Wesley Dillingham wrote: > What was the largest

[ceph-users] Re: Migration Nautilus to Pacific: Very high latencies (EC profile)

2022-05-17 Thread Wesley Dillingham
What was the largest cluster that you upgraded that didn't exhibit the new issue in 16.2.8? Thanks. Respectfully, *Wes Dillingham* w...@wesdillingham.com LinkedIn On Tue, May 17, 2022 at 10:24 AM David Orman wrote: > We had an issue with our

[ceph-users] Re: v16.2.8 Pacific released

2022-05-17 Thread Neha Ojha
Hi Cory, Thanks for identifying the bug and creating a PR to fix it. We'll do a retrospective on this issue to catch and avoid such regressions in the future. At the moment, we will go ahead with a minimal 16.2.9 release for this issue. Thanks, Neha On Tue, May 17, 2022 at 5:03 AM Cory Snyder

[ceph-users] Trouble getting cephadm to deploy iSCSI gateway

2022-05-17 Thread Erik Andersen
I am attempting to set up a 3-node Ceph cluster using Ubuntu Server 22.04 LTS and the cephadm deployment tool. Three times I've succeeded in setting up Ceph itself, getting the cluster healthy, and the OSDs all set up. The nodes (all monitors) are at 192.168.122.3, 192.168.122.4, and 192.168.122.5.
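
For context, a hedged sketch of the iscsi service spec that cephadm normally expects; the pool name, credentials, and hostnames below are placeholders, not taken from Erik's setup:

    cat > iscsi.yaml <<EOF
    service_type: iscsi
    service_id: igw
    placement:
      hosts:
        - node1
        - node2
    spec:
      pool: iscsi-pool
      api_user: admin
      api_password: secret
    EOF
    ceph orch apply -i iscsi.yaml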

[ceph-users] Re: Stretch cluster questions

2022-05-17 Thread Frank Schilder
Hi Gregory, thanks for the clarification. > I'm not quite clear where the confusion is coming from here ... It's because sometimes an important statement needs to be repeated in a way that emphasizes what it is really about: >> Is this because the following: "the OSDs will only take PGs active

[ceph-users] Re: Migration Nautilus to Pacific: Very high latencies (EC profile)

2022-05-17 Thread David Orman
We had an issue with our original fix in 45963, which was resolved in https://github.com/ceph/ceph/pull/46096. It includes the fix as well as handling for upgraded clusters. This is in the 16.2.8 release. I'm not sure if it will resolve your problem (or help mitigate it), but it would be worth
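
If the cluster in question is cephadm-managed (an assumption, not stated in the thread), the upgrade itself is typically:

    ceph orch upgrade start --ceph-version 16.2.8
    # follow progress with
    ceph orch upgrade status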

[ceph-users] Re: Reasonable MDS rejoin time?

2022-05-17 Thread Felix Lee
Hi, Dan, > In our experience this can take ~10 minutes on the most active clusters. Many thanks, this information is quite helpful for us. > this ML how to remove the "openfiles" objects to get out of that Yes, I read that mail thread as well. In fact, it was indeed my next move after setting

[ceph-users] Re: v16.2.8 Pacific released

2022-05-17 Thread Cory Snyder
Yep, sorry about that. Thanks for the correction, Dan! On Tue, May 17, 2022 at 7:44 AM Dan van der Ster wrote: > On Tue, May 17, 2022 at 1:14 PM Cory Snyder wrote: > > > > Hi all, > > > > Unfortunately, we experienced some issues with the upgrade to 16.2.8 > > on one of our larger clusters.

[ceph-users] Re: v16.2.8 Pacific released

2022-05-17 Thread Dan van der Ster
On Tue, May 17, 2022 at 1:14 PM Cory Snyder wrote: > > Hi all, > > Unfortunately, we experienced some issues with the upgrade to 16.2.8 > on one of our larger clusters. Within a few hours of the upgrade, all > 5 of our managers had become unavailable. We found that they were all > deadlocked due

[ceph-users] Re: Reasonable MDS rejoin time?

2022-05-17 Thread Dan van der Ster
Hi Felix, "rejoin" took a while in the past because the MDS needs to reload all inodes for all the open directories at boot time. In our experience this can take ~10 minutes on the most active clusters. In your case, I wonder if the MDS was going OOM in a loop while recovering? This was happening

[ceph-users] Best practices in regards to OSDs?

2022-05-17 Thread Angelo Höngens
I'm a Ceph newbie in the planning phase for a first Ceph cluster (7 OSD nodes, each with separate boot disks, one NVMe and 12x 16TB spinners; intending to run RBD only with 4:2 EC, storing a lot of data, but with low IOPS requirements). I really want encryption-at-rest. I guess I'll be going with Pacific.
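
For the 4:2 EC and encryption-at-rest parts, a rough sketch of the usual commands; profile, pool, and spec names are illustrative, and the drive-group spec assumes a cephadm deployment:

    # erasure-code profile with k=4, m=2, one chunk per host
    ceph osd erasure-code-profile set ec42 k=4 m=2 crush-failure-domain=host
    ceph osd pool create rbd_data 128 128 erasure ec42
    ceph osd pool set rbd_data allow_ec_overwrites true   # required for RBD on EC
    # RBD images then use --data-pool rbd_data plus a small replicated pool for metadata

    # OSDs encrypted at rest via a drive-group spec
    cat > osd-spec.yaml <<EOF
    service_type: osd
    service_id: encrypted_hdd
    placement:
      host_pattern: '*'
    spec:
      data_devices:
        rotational: 1
      db_devices:
        rotational: 0
      encrypted: true
    EOF
    ceph orch apply -i osd-spec.yaml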

[ceph-users] Re: v16.2.8 Pacific released

2022-05-17 Thread Cory Snyder
Hi all, Unfortunately, we experienced some issues with the upgrade to 16.2.8 on one of our larger clusters. Within a few hours of the upgrade, all 5 of our managers had become unavailable. We found that they were all deadlocked due to (what appears to be) a regression with GIL and mutex handling.

[ceph-users] Re: bunch of " received unsolicited reservation grant from osd" messages in log

2022-05-17 Thread Denis Polom
Hi, is it still not backported to the latest 16.2.8? I don't see it in the release notes. On 12/19/21 11:05, Ronen Friedman wrote: On Sat, Dec 18, 2021 at 7:06 PM Ronen Friedman wrote: Hi all, This was indeed a bug, which I've already fixed in 'master'. I'll look for the backporting status

[ceph-users] Re: DM-Cache for spinning OSDs

2022-05-17 Thread Burkhard Linke
Hi, On 5/17/22 08:51, Stolte, Felix wrote: Hey guys, I have three servers with 12x 12TB SATA HDDs and 1x 3.4TB NVMe. I am thinking of putting DB/WAL on the NVMe as well as a 5GB DM-Cache for each spinning disk. Is anyone running something like this in a production environment? We have

[ceph-users] Re: v16.2.8 Pacific released

2022-05-17 Thread Ernesto Puerta
Thanks for reporting this issue, Jozef. I just filed a tracker for fixing it (https://tracker.ceph.com/issues/55686). This issue (harmless, as you mentioned) must have been there already, since no changes were made to the iSCSI dashboard code for 16.2.8. Kind Regards, Ernesto On Tue, May 17, 2022 at

[ceph-users] Re: Reasonable MDS rejoin time?

2022-05-17 Thread Felix Lee
Yes, we do plan to upgrade Ceph in the near future for sure. In any case, I used a brutal way (kinda) to kick rejoin to active by setting "mds_wipe_sessions = true" on all MDSs. Still, the entire MDS recovery process leaves us blind when trying to estimate the service downtime. So, I am wondering if there is any
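
For completeness, the workaround described above amounts to something like this in ceph.conf on the MDS hosts (a last-resort setting that discards client sessions; remove it again once the MDS goes active):

    [mds]
        mds_wipe_sessions = true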

[ceph-users] Re: Migration Nautilus to Pacific: Very high latencies (EC profile)

2022-05-17 Thread BEAUDICHON Hubert (Acoss)
Hi Josh, I'm working with Stéphane and I'm the "ceph admin" (big words ^^) in our team. So, yes, as part of the upgrade we've done the offline repair to split the omap by pool. The quick fix is, as far as I know, still disabled in the default settings. As for the I/O and CPU load, between Nautilus
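
For readers following along, the offline repair and the related on-mount quick-fix option look roughly like this (OSD id and paths are examples, assuming a non-containerized deployment):

    # offline per-pool omap repair, run with the OSD stopped
    systemctl stop ceph-osd@12
    ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-12
    systemctl start ceph-osd@12
    # the automatic variant at OSD start-up is controlled by
    ceph config get osd bluestore_fsck_quick_fix_on_mount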

[ceph-users] Re: Reasonable MDS rejoin time?

2022-05-17 Thread Jos Collin
I suggest you upgrade the cluster to the latest release [1], as Nautilus has reached EOL. [1] https://docs.ceph.com/en/latest/releases/ On 16/05/22 13:29, Felix Lee wrote: Hi, Jos, Many thanks for your reply. And sorry, I forgot to mention the version, which is 14.2.22. Here is the log:

[ceph-users] Re: S3 and RBD backup

2022-05-17 Thread Janne Johansson
On Mon, 16 May 2022 at 13:41, Sanjeev Jha wrote: > Could someone please let me know how to take S3 and RBD backups from the Ceph side > and whether it's possible to take backups from the client/user side? > Which tool should I use for the backup? Backing data up, or replicating it, is a choice between a lot of
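
As a rough illustration of the two sides being contrasted here (tools and names are examples, not a recommendation):

    # RBD: point-in-time copy of an image to a file
    rbd snap create pool/image@backup-2022-05-17
    rbd export pool/image@backup-2022-05-17 /backup/image-2022-05-17.img
    # S3: sync a bucket elsewhere with any S3-capable tool, e.g. rclone
    rclone sync ceph-s3:mybucket /backup/mybucket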

[ceph-users] DM-Cache for spinning OSDs

2022-05-17 Thread Stolte, Felix
Hey guys, I have three servers with 12x 12TB SATA HDDs and 1x 3.4TB NVMe. I am thinking of putting DB/WAL on the NVMe as well as a 5GB DM-Cache for each spinning disk. Is anyone running something like this in a production environment? Best regards, Felix
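
Not an endorsement either way, but for reference the dm-cache layout being considered is usually built with lvmcache along these lines (VG/LV and device names are placeholders):

    # backing HDD and an NVMe slice in the same VG
    vgcreate vg_sdb /dev/sdb /dev/nvme0n1p2
    lvcreate -n osd_sdb -l 100%PVS vg_sdb /dev/sdb
    lvcreate --type cache-pool -L 5G -n cpool_sdb vg_sdb /dev/nvme0n1p2
    lvconvert --type cache --cachepool vg_sdb/cpool_sdb vg_sdb/osd_sdb
    # vg_sdb/osd_sdb is then handed to ceph-volume as the OSD data device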