[ceph-users] Re: Ceph Quincy and liburing.so.2 on Rocky Linux 9
But Rocky Linux 9 is the continuation of what CentOS would have been on el9. AFAIK Ceph is being developed on the elX distributions and not the 'trial' Stream versions, no?

> In most cases the 'Alternative' distros like Alma or Rocky have outdated
> versions of packages compared with CentOS Stream 8 or CentOS Stream 9.
> For example, the golang package on c8s is at version 1.20, while on Alma
> it is still 1.19.
>
> You can try to use c8s/c9s, or try to contribute to your distro to
> resolve the dependency issues.
>
>> I've been digging and I can't see that this has come up anywhere.
>>
>> I'm trying to update a client from 17.2.3-2 to 17.2.6-4 and I'm
>> getting the error
>>
>> Error:
>>  Problem: cannot install the best update candidate for package
>>  ceph-base-2:17.2.3-2.el9s.x86_64
>>  - nothing provides liburing.so.2()(64bit) needed by
>>  ceph-base-2:17.2.6-4.el9s.x86_64
>>  - nothing provides liburing.so.2(LIBURING_2.0)(64bit) needed by
>>  ceph-base-2:17.2.6-4.el9s.x86_64
>> (try to add '--skip-broken' to skip uninstallable packages or '--nobest'
>> to use not only best candidate packages)
>>
>> Did Ceph Quincy switch to requiring liburing 2? Rocky 9 only provides
>> 0.7-7. CentOS Stream seems to have 1.0.7-3 (at least back to when I set
>> up that repo on Foreman; I don't remember if I'm keeping it up-to-date).
>>
>> Can I/should I just do --nobest when updating? I could probably build it
>> from a source RPM from another RH-based distro, but I'd rather keep it
>> clean with the same distro.

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: Ceph Quincy and liburing.so.2 on Rocky Linux 9
You are right.

CentOS Stream is alpha
Fedora is beta
RHEL is stable

Alma/Rocky/Oracle are based on RHEL

Venlig hilsen - Mit freundlichen Grüßen - Kind Regards,
Jens Galsgaard
Gitservice.dk
Mob: +45 28864340

-Original message-
From: Marc
Sent: Friday, 4 August 2023 09.04
To: Konstantin Shalygin ; dobr...@gmu.edu
Cc: ceph-users@ceph.io
Subject: [ceph-users] Re: Ceph Quincy and liburing.so.2 on Rocky Linux 9

> But Rocky Linux 9 is the continuation of what CentOS would have been on
> el9. AFAIK Ceph is being developed on the elX distributions and not the
> 'trial' Stream versions, no?
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: [EXTERN] Re: Ceph Quincy and liburing.so.2 on Rocky Linux 9
I thought so too, but now I'm a bit confused. We are planning to set up a new Ceph cluster and initially opted for an el9 system, which is supposed to be stable. Should we rather use a Stream 'trial' version?

Dietmar

On 8/4/23 09:04, Marc wrote:
> But Rocky Linux 9 is the continuation of what CentOS would have been on
> el9. AFAIK Ceph is being developed on the elX distributions and not the
> 'trial' Stream versions, no?

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: Ceph Quincy and liburing.so.2 on Rocky Linux 9
Konstantin Shalygin wrote:
> Hi,
>
> In most cases the 'Alternative' distros like Alma or Rocky have outdated
> versions of packages compared with CentOS Stream 8 or CentOS Stream 9.
> For example, the golang package on c8s is at version 1.20, while on Alma
> it is still 1.19.
>
> You can try to use c8s/c9s, or try to contribute to your distro to
> resolve the dependency issues.
>
> k

By definition, the stable version of anything is going to have "outdated versions of packages," so that's not really what's going on here. You did, unintentionally, give me the clue I needed, though.

I accessed the Ceph repos from Rocky's Extras repo, which includes:

centos-release-ceph-pacific.noarch  1.0-2.el9  CEC_Rocky_Linux_9_Rocky_92_extras
centos-release-ceph-quincy.noarch   1.0-2.el9  CEC_Rocky_Linux_9_Rocky_92_extras
centos-release-cloud.noarch         1-1.el9    CEC_Rocky_Linux_9_Rocky_92_extras

Those packages point to 9-stream. (I do remember seeing "9s" in the repo names, but I didn't connect it with Stream, since I don't do Stream in production and, honestly, I don't have enough time at work to do Stream in test, so...)

From /etc/yum.repos.d/CentOS-Ceph-Quincy.repo:

metalink=https://mirrors.centos.org/metalink?repo=centos-storage-sig-ceph-quincy-9-stream&arch=$basearch

Which is why I'm getting different dependencies. THAT I can take to the Rocky folks to get sorted. I can see where that would cause confusion, as it did in my case.

When I originally installed Ceph, I was using RHEL, not Rocky, and I didn't use (or have?) the Extras repo. I copied the repo over and edited it to point to Ceph Reef EL9, which installed fine -- and confused me further, but makes sense now since it wasn't for Stream.

I'll roll my own repo files and not use the centos-release-ceph-* packages from Extras. Hopefully this saves someone else a bit of grief later!

Thanks!

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
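For anyone wanting to do the same, a minimal sketch of a hand-rolled repo file pointing at the upstream (non-Stream) Ceph packages; the release/path shown is for Quincy on el9 and should be verified against download.ceph.com for your release:

```shell
# Hand-rolled repo file instead of the centos-release-ceph-* packages
# from Extras (paths shown for Quincy/el9; verify for your release):
cat > /etc/yum.repos.d/ceph.repo <<'EOF'
[ceph]
name=Ceph packages for $basearch
baseurl=https://download.ceph.com/rpm-quincy/el9/$basearch
enabled=1
gpgcheck=1
gpgkey=https://download.ceph.com/keys/release.asc
EOF
```

After that, a `dnf clean all && dnf update ceph-base` should resolve against the non-Stream builds.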
[ceph-users] Re: What's the max of snap ID?
I'm no programmer, but if I understand [1] correctly it's an unsigned long long:

int ImageCtx::snap_set(uint64_t in_snap_id) {

which means the max snap_id should be:

2^64 - 1 = 18446744073709551615

Not sure if you can get your cluster to reach that limit, but I also don't know what would happen if you actually did reach it. I also might be misunderstanding, so maybe someone with more knowledge can confirm or correct me.

[1] https://github.com/ceph/ceph/blob/main/src/librbd/ImageCtx.cc#L328

Quoting Tony Liu:
> Hi,
>
> There is a snap ID for each snapshot. How is this ID allocated, sequentially?
> Did some tests; it seems this ID is per pool, starting from 4 and always
> going up. Is that correct?
> What's the max of this ID?
> What's going to happen when the ID reaches the max -- going back to start
> from 4 again?
>
> Thanks!
> Tony

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: What's the max of snap ID?
2^64 bytes = 18446.744073709551616 PB

Assuming that a snapshot requires storing any data at all, which it must, nobody has a Ceph cluster that could store that much snapshot metadata, even for empty snapshots.

On Fri, Aug 4, 2023 at 7:05 AM Eugen Block wrote:
> I'm no programmer but if I understand [1] correctly it's an unsigned
> long long:
>
> int ImageCtx::snap_set(uint64_t in_snap_id) {
>
> which means the max snap_id should be this:
>
> 2^64 = 18446744073709551616
>
> [1] https://github.com/ceph/ceph/blob/main/src/librbd/ImageCtx.cc#L328

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: snapshot timestamp
On Fri, Aug 4, 2023 at 7:49 AM Tony Liu wrote: > > Hi, > > We know snapshot is on a point of time. Is this point of time tracked > internally by > some sort of sequence number, or the timestamp showed by "snap ls", or > something else? Hi Tony, The timestamp in "rbd snap ls" output is the snapshot creation timestamp. > > I noticed that when "deep cp", the timestamps of all snapshot are changed to > copy-time. Correct -- exactly the same as the image creation timestamp (visible in "rbd info" output). > Say I create a snapshot at 1PM and make a copy at 3PM, the timestamp of > snapshot in > the copy is 3PM. If I rollback the copy to this snapshot, I'd assume it will > actually bring me > back to the state of 1PM. Is that correct? Correct. > > If the above is true, I won't be able to rely on timestamp to track snapshots. > > Say I create a snapshot every hour and make a backup by copy at the end of > the day. > Then the original image is damaged and backup is used to restore the work. On > this > backup image, how do I know which snapshot was on 1PM, which was on 2PM, etc.? > Any advices to track snapshots properly in such case? I would suggest embedding that info along with any additional metadata needed in the snapshot name. Thanks, Ilya ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
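A small sketch of Ilya's suggestion: encode the creation time in the snapshot name itself, so it survives `deep cp` even though the `rbd snap ls` timestamp doesn't. The naming scheme and pool/image names here are just examples, not anything Ceph mandates:

```shell
# Generate a snapshot name that carries its own UTC creation time
# (hypothetical naming scheme; adjust to taste):
SNAP_NAME="snap-$(date -u +%Y%m%dT%H%M%SZ)"
echo "${SNAP_NAME}"   # e.g. snap-20230804T130000Z

# Create the snapshot under that name, e.g.:
#   rbd snap create mypool/myimage@"${SNAP_NAME}"
# After a deep copy, the wall-clock time is recoverable from the name
# rather than from the (copy-time) timestamp shown by "rbd snap ls".
```

Parsing the hour back out of the name is then trivial, which addresses the "which snapshot was 1PM?" question on the backup image.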
[ceph-users] Re: Ceph Quincy and liburing.so.2 on Rocky Linux 9
That’s a major misinterpretation of how it actually is in reality. Sorry, just had to state that; obviously not the proper mailing list to discuss it on.

Best regards
Tobias

> On 4 Aug 2023, at 09:25, Jens Galsgaard wrote:
>
> You are right.
>
> CentOS Stream is alpha
> Fedora is beta
> RHEL is stable
>
> Alma/Rocky/Oracle are based on RHEL

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: question about OSD onode hits ratio
Check to see what your osd_memory_target is set to. The default 4GB is generally a decent starting point, but if you have a large active data set you might benefit from increasing the amount of memory available to the OSDs. They'll generally prefer giving it to the onode cache first if it's hot.

*Note: In some container-based deployments the osd_memory_target might be getting set automatically based on the container limit (and possibly based on the memory available in the node).

Mark

On 8/2/23 11:25 PM, Ben wrote:
> Hi,
>
> We have a cluster running for a while. From the Grafana Ceph dashboard,
> I saw an OSD onode hits ratio of 92% when the cluster was just up and
> running. After a couple of months, it now says 70%. This is not a good
> trend, I think. Just wondering what should be done to stop this trend.
>
> Many thanks,
> Ben

--
Best Regards,
Mark Nelson
Head of R&D (USA)
Clyso GmbH
p: +49 89 21552391 12
a: Loristraße 8 | 80335 München | Germany
w: https://clyso.com | e: mark.nel...@clyso.com
We are hiring: https://www.clyso.com/jobs/

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
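A sketch of the checks Mark describes, using the `ceph config` interface (the 8 GiB value is just an example, and `osd.0` is a placeholder daemon name):

```shell
# Cluster-wide default for the option (in bytes):
ceph config get osd osd_memory_target
# Effective value on one specific OSD, which will reflect any
# per-daemon or container-derived override:
ceph config show osd.0 osd_memory_target
# Raise the default, e.g. to 8 GiB, if the hosts have memory to spare:
ceph config set osd osd_memory_target 8589934592
```

If `config show` on a daemon differs from `config get`, that is a hint an override (e.g. from the container limit) is in play.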
[ceph-users] Nautilus: Taking out OSDs that are 'Failure Pending'
Hello. It's been a while.

I have a Nautilus cluster with 72 x 12TB HDD OSDs (BlueStore) and mostly EC 8+2 pools/PGs. It's been working great - some nodes went nearly 900 days without a reboot.

As of yesterday I found that I have 3 OSDs with a SMART status of 'Pending Failure'. New drives are ordered and will be here next week. There is a procedure in the documentation for replacing an OSD, but I can't do that directly until I receive the drives.

My inclination is to mark these 3 OSDs 'OUT' before they crash completely, but I want to confirm my understanding of Ceph's response to this. Mainly, given my EC pools (or replicated pools for that matter), if I mark all 3 OSDs out all at once, will I risk data loss? If I have it right, marking an OSD out will simply cause Ceph to move all of the PG shards from that OSD to other OSDs, so no major risk of data loss. However, if it would be better to do them one per day or something, I'd rather be safe.

I also assume that I should wait for the rebalance to complete before I initiate the replacement procedure.

Your thoughts? Thanks.

-Dave

--
Dave Hall
Binghamton University
kdh...@binghamton.edu

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: [EXTERNAL] Upgrading nautilus / centos7 to octopus / ubuntu 20.04. - Suggestions and hints? - Thanks
Hi,

Thanks for all the suggestions. Right now, it is the step-by-step approach that works: going to bionic/nautilus, and from there like Josh noted. We encountered a problem which I'll post separately.

Best,
Götz

> Am 03.08.2023 um 15:44 schrieb Beaman, Joshua :
>
> We went through this exercise, though our starting point was ubuntu 16.04 /
> nautilus. We reduced our double builds as follows:
>
> 1. Rebuild each monitor host on 18.04/bionic and rejoin, still on nautilus
> 2. Upgrade all mons, mgrs (and rgws, optionally) to pacific
> 3. Convert each mon, mgr, rgw to cephadm and enable orchestrator
> 4. Rebuild each mon, mgr, rgw on 20.04/focal and rejoin the pacific cluster
> 5. Drain and rebuild each osd host on focal and pacific
>
> This has the advantage of only having to drain and rebuild the OSD hosts
> once. Double building the control cluster hosts isn't so bad, and
> orchestrator makes all of the ceph parts easy once it's enabled.
>
> The biggest challenge we ran into was: https://tracker.ceph.com/issues/51652
> because we still had a lot of filestore osds. It's frustrating, but we
> managed to get through it without much client interruption on a dozen prod
> clusters, most of which were 38 osd hosts and 912 total osds each. One thing
> which helped was, before beginning the osd host builds, to set all of the
> old osds' primary-affinity to something <1. This way, when the new pacific
> (or octopus) osds join the cluster, they will automatically be favored for
> primary on their pgs. If a heartbeat timeout storm starts to get out of
> control, start by setting nodown and noout. The flapping osds are the worst.
> Then figure out which osds are the culprit and restart them.
>
> Hopefully your nautilus osds are all bluestore and you won't have this
> problem. We put up with it, because the filestore to bluestore conversion
> was one of the most important parts of this upgrade for us.
>
> Best of luck, whatever route you take.
>
> Regards,
> Josh Beaman

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
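The primary-affinity trick Josh mentions can be sketched like this (the OSD ids and the 0.5 value are placeholders, not recommendations from the thread):

```shell
# De-prefer the old OSDs for primary selection before the rebuilds, e.g.:
for id in 0 1 2; do                      # replace with your old osd ids
  ceph osd primary-affinity "osd.${id}" 0.5
done

# If a heartbeat-timeout storm starts, stop the flapping first:
ceph osd set nodown
ceph osd set noout
# ...then restart the culprit OSDs and unset the flags afterwards.
```

Remember to `ceph osd unset nodown` / `ceph osd unset noout` once the cluster settles, or recovery behavior stays altered.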
[ceph-users] Re: [EXTERNAL] Nautilus: Taking out OSDs that are 'Failure Pending'
Marking them OUT first is the way to go. As long as the osds stay UP, they can and will participate in the recovery. How many you can mark out at one time will depend on how sensitive your client i/o is to background recovery, and all of the related tunings. If you have the hours/days to spare, it is definitely easier on the cluster to do them one at a time.

Thank you,
Josh Beaman

From: Dave Hall
Date: Friday, August 4, 2023 at 8:45 AM
To: ceph-users
Cc: anthony.datri
Subject: [EXTERNAL] [ceph-users] Nautilus: Taking out OSDs that are 'Failure Pending'

> Hello. It's been a while.
>
> [...] As of yesterday I found that I have 3 OSDs with a SMART status of
> 'Pending Failure'. [...] My inclination is to mark these 3 OSDs 'OUT'
> before they crash completely, but I want to confirm my understanding of
> Ceph's response to this. [...]

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
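The one-at-a-time approach Josh recommends looks roughly like this (osd.12 is a placeholder id):

```shell
# Mark one failing OSD out; it stays UP and helps backfill its own data:
ceph osd out osd.12

# Watch recovery and only move to the next OSD once all PGs are
# active+clean again:
ceph -s
ceph pg stat
```

Keep the daemon running (don't stop/down it) until backfilling finishes, so it can serve its shards during the move.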
[ceph-users] cephfs mount problem - client session lacks required features
Hi,

During the upgrade from centos7/nautilus to ubuntu 18/nautilus (still updating the MONs) I got a cephfs client who refuses, or is refused, to mount the ceph fs again.

The client says: mount error 13 = Permission denied

The ceph-mds log: lacks required features 0x1000 client supports 0x00ff

The mds/mon is still centos7/nautilus. The clients are centos7 as well.

Any ideas? Thanks for suggestions and hints.

Best,
Götz

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: Nautilus: Taking out OSDs that are 'Failure Pending' [EXT]
On Fri, Aug 04, 2023 at 09:44:57AM -0400, Dave Hall wrote:
> My inclination is to mark these 3 OSDs 'OUT' before they crash completely,
> but I want to confirm my understanding of Ceph's response to this. Mainly,
> given my EC pools (or replicated pools for that matter), if I mark all 3
> OSD out all at once will I risk data loss?

It depends on your crush map and failure domain layout. In the unlikeliest and unluckiest case, all those 3 OSDs are in different failure domains, and some data has 1 replica on each of those OSDs. In that situation, if you take them out simultaneously, you would lose data. If you're unsure, then do them one at a time and wait for the rebalance/backfill to complete before doing the next.

We arrange our OSDs so that the failure domain is the rack; losing an entire rack is safe (and we've had that happen), so we know it's safe to pull any number of OSDs in the same rack and we won't lose data.

Dave
--
** Dave Holland ** Systems Support -- Informatics Systems Group **
** d...@sanger.ac.uk ** Wellcome Sanger Institute, Hinxton, UK **

--
The Wellcome Sanger Institute is operated by Genome Research Limited, a charity registered in England with number 1021457 and a company registered in England with number 2742969, whose registered office is Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, CB10 1SA.

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: [EXTERNAL] cephfs mount problem - client session lacks required features
We did not have any cephfs or mds involved. But since you haven't even started a ceph upgrade in earnest, I have to wonder about your nautilus versions. Maybe you have a mismatch there?

I would definitely share the output of `ceph versions` and `ceph features`. If you're not 14.2.22 across the board, I would at least upgrade your mon, mgr, and mds services. Then check the release notes to see if there are any clues there.

Thank you,
Josh Beaman

From: Götz Reinicke
Date: Friday, August 4, 2023 at 9:02 AM
To: ceph-users@ceph.io
Subject: [EXTERNAL] [ceph-users] cephfs mount problem - client session lacks required features

> Hi,
>
> During the upgrade from centos7/nautilus to ubuntu 18/nautilus (still
> updating the MONs) I got a cephfs client who refuses, or is refused, to
> mount the ceph fs again.
>
> The client says: mount error 13 = Permission denied
>
> The ceph-mds log: lacks required features 0x1000 client supports
> 0x00ff
>
> The mds/mon is still centos7/nautilus. The clients are centos7 as well.
>
> Any ideas? Thanks for suggestions and hints. Best Götz

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: [EXTERNAL] cephfs mount problem - client session lacks required features - solved
Hi Josh,

Thanks for your feedback. We did a restart of the active MDS and the error/problem is gone.

Best,
Götz

> Am 04.08.2023 um 16:19 schrieb Beaman, Joshua :
>
> We did not have any cephfs or mds involved. But since you haven't even
> started a ceph upgrade in earnest, I have to wonder about your nautilus
> versions. Maybe you have a mismatch there?
>
> I would definitely share the output of `ceph versions` and `ceph features`.
> If you're not 14.2.22 across the board, I would at least upgrade your mon,
> mgr, and mds services. Then check the release notes to see if there are
> any clues there.
>
> Thank you,
> Josh Beaman

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: [External Email] Re: Nautilus: Taking out OSDs that are 'Failure Pending' [EXT]
Dave,

Actually, my failure domain is OSD, since I so far only have 9 OSD nodes but EC 8+2. However, the drives are still functioning, except that one has failed multiple times in the last few days, requiring a node power-cycle to recover. I will certainly mark that one out immediately.

The other two pending failures are behaving more politely, so I am assuming that the cluster could copy the data elsewhere as part of the rebalance. I'm also concerned about the rebalance process moving data onto these drives with pending failures.

Since I'm EC 8+2, perhaps it is safe to mark two out simultaneously?

Thanks.
-Dave

--
Dave Hall
Binghamton University
kdh...@binghamton.edu

On Fri, Aug 4, 2023 at 10:16 AM Dave Holland wrote:
> It depends on your crush map and failure domain layout. In the
> unlikeliest and unluckiest case, all those 3 OSDs are in different
> failure domains, and some data has 1 replica on each of those OSDs. In
> that situation, if you take them out simultaneously, you would lose
> data. If you're unsure, then do them one at a time and wait for the
> rebalance/backfill to complete before doing the next.
>
> We arrange our OSDs so that the failure domain is the rack; losing an
> entire rack is safe (and we've had that happen), so we know it's safe
> to pull any number of OSDs in the same rack and we won't lose data.
>
> Dave

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: What's the max of snap ID?
Thank you Eugen and Nathan! uint64 is big enough, no concerns any more.

Tony

From: Nathan Fish
Sent: August 4, 2023 04:19 AM
To: Eugen Block
Cc: ceph-users@ceph.io
Subject: [ceph-users] Re: What's the max of snap ID?

> 2^64 bytes = 18446.744073709551616 PB
>
> Assuming that a snapshot requires storing any data at all, which it must,
> nobody has a Ceph cluster that could store that much snapshot metadata,
> even for empty snapshots.

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: snapshot timestamp
Thank you Ilya for the confirmation!

Tony

From: Ilya Dryomov
Sent: August 4, 2023 04:51 AM
To: Tony Liu
Cc: d...@ceph.io; ceph-users@ceph.io
Subject: Re: [ceph-users] snapshot timestamp

> The timestamp in "rbd snap ls" output is the snapshot creation timestamp.
> [...]
> I would suggest embedding that info along with any additional metadata
> needed in the snapshot name.
>
> Thanks,
> Ilya

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: [External Email] Re: Nautilus: Taking out OSDs that are 'Failure Pending' [EXT]
On Fri, Aug 4, 2023 at 11:33 AM Dave Hall wrote:
> Actually, my failure domain is OSD, since I so far only have 9 OSD nodes
> but EC 8+2. [...] Since I'm EC 8+2, perhaps it is safe to mark two out
> simultaneously?

Dave,

You should be able to mark out two OSDs simultaneously without worry, as long as you have enough space, etc. When you mark an OSD out, it still participates in the cluster as long as the OSD remains up, and it is able to aid in the backfilling process. Thus, you'll also want to avoid stopping/downing the OSDs until backfilling completes.

Following that logic: if you stop both OSDs before backfilling completes, you will put yourself in a bad spot. If all PGs are active+clean, you may both a) out the two OSDs and b) stop/down *only the one* imminently failing OSD (leaving the second OSD being drained still up) and things should also be fine... but you will be vulnerable to blocked ops/unavailable data if _subsequent_ OSDs fail unexpectedly, including the second OSD being out'd, depending upon your CRUSH map and cluster status.

Note that if your intent is to purge the OSD after it is drained, I believe you should do a `ceph osd crush reweight osd.X 0` and not a `ceph osd out osd.X` or `ceph osd reweight osd.X 0`, as it should result in slightly less net data movement.

Cheers,
Tyler

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
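Tyler's reweight-then-purge variant, as a sketch (osd.12 is a placeholder id; run each step only after the previous one has fully settled):

```shell
# Drain with slightly less net data movement when the OSD will be
# purged afterwards:
ceph osd crush reweight osd.12 0

# Wait for backfill to finish (all PGs active+clean), keeping the
# daemon up the whole time, then remove it:
ceph osd out osd.12
ceph osd purge osd.12 --yes-i-really-mean-it
```

The crush-reweight drain avoids the second reshuffle that can happen when an out'd OSD is later removed from the CRUSH map.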
[ceph-users] snaptrim number of objects
Hey guys,

I'm trying to figure out what's happening to my backup cluster, which often grinds to a halt when cephfs automatically removes snapshots. Almost all OSDs go to 100% CPU, ceph complains about slow ops, and CephFS stops doing client i/o.

I'm graphing the cumulative value of snaptrimq_len, and that slowly decreases over time. One night it takes an hour, but on other days, like today, my cluster has been down for almost 20 hours and I think we're halfway. The funny thing is that in both cases the snaptrimq_len value initially goes to the same value, around 3000, and then slowly decreases, but my guess is that the number of objects that need to be trimmed varies hugely every day.

Is there a way to show the size of cephfs snapshots, or get the number of objects or bytes that need snaptrimming? Perhaps I can graph that and see where the differences are. That won't explain why my cluster bogs down, but at least it gives some visibility.

Running 17.2.6 everywhere, by the way.

Angelo.

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
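A sketch of how the queue length could be pulled for graphing, plus the usual throttling knobs; the JSON field layout is what recent releases report, so verify it against your 17.2.6 output, and the tuning values are examples rather than recommendations:

```shell
# Sum snaptrimq_len across all PGs (requires jq; check that your
# version nests the stats under .pg_stats as assumed here):
ceph pg dump pgs -f json 2>/dev/null | jq '[.pg_stats[].snaptrimq_len] | add'

# If trimming starves client i/o, slow it down (example values):
ceph config set osd osd_snap_trim_sleep 2.0
ceph config set osd osd_pg_max_concurrent_snap_trims 1
```

Graphing that sum per pool would show whether the bad days really queue more objects or just trim the same queue more slowly.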