[ceph-users] Re: ceph error connecting to the cluster
Hi, your message showed up in my inbox today, but apparently it's almost a week old. Have you resolved your issue? If not, you'll need to provide more details, for example your Ceph version, the current cluster status, and which keyrings are present on the host you're trying to execute the command from.

Quoting arimbidh...@gmail.com: Hello, I was trying to create an OSD, but when I run the ceph command the output looks like this:

root@pod-deyyaa-ceph1:~# sudo ceph -s
2024-02-02T16:01:23.627+0700 7fc762f37640 0 monclient(hunting): authenticate timed out after 300
[errno 110] RADOS timed out (error connecting to the cluster)

Can anyone give me a hint or help me solve this?
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
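For reference, a minimal sketch of the checks asked for above (version, keyrings, monitor reachability) might look like the following; the monitor IP is a placeholder, and the paths assume a default non-containerized /etc/ceph layout:

```
# Hedged sketch: gather the details Eugen asks for and fail fast instead of
# waiting out the 300 s authenticate timeout.
ceph --version                      # client version on this host
ls -l /etc/ceph/                    # is ceph.conf plus a client/admin keyring present?
grep mon_host /etc/ceph/ceph.conf   # which monitors the client will try to reach
nc -zv 192.168.0.10 3300            # placeholder mon IP: msgr2 port reachable?
nc -zv 192.168.0.10 6789            # legacy msgr1 port reachable?
ceph -s --connect-timeout 10        # return an error quickly if monitors are unreachable
```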
[ceph-users] Re: Performance issues with writing files to Ceph via S3 API
Hello Anthony,

Sorry for the late reply. My thought process behind it was that maybe there's some kind of indexing that Ceph does under the hood, and perhaps the bucket structure could influence that. But if you say that's not the case, then I was on the wrong path. Sorry for the delay, but I also wanted to gather info.

> How many millions?

About 75 million.

> How big are they?

They vary from ~500 KB to a couple of megabytes, say 5 MB. I wouldn't be able to tell you whether most files are closer to 5 MB or to 500 KB, but if that's important I can try to figure it out.

> Are you writing them to a single bucket?

Yes. All these files are in a single bucket.

> How is the index pool configured? On what media?
> Same with the bucket pool.

I wouldn't be able to answer that, unfortunately.

> Which Ceph release?

Pacific (https://docs.ceph.com/en/pacific/).

> Sharding config?
> Are you mixing in bucket list operations?

We don't use list operations on this bucket, but the Ceph infrastructure is shared across multiple companies and we are aware that there are others using list operations *on other buckets*. I can also say that list operations on this bucket are, IIRC, failing (to the point where we don't have an exact metric of how many objects are in the bucket). The provider has a Prometheus exporter which currently fails to export the metrics in production.

> Do you have the ability to utilize more than one bucket? If you can limit the number of objects in a bucket that might help.

Technically it should be possible, but I'd assume that Ceph can abstract this complexity from the bucket user so that we don't have to care about it. If we do it, I would see it as a workaround more than a real solution.

> If your application keeps track of object names you might try indexless buckets.

I didn't know that possibility existed.

I don't know how Ceph works under the hood, but assuming that all files are ultimately written to the same folder on disk, could that be a problem? In the past I have struggled with a Linux filesystem getting too slow due to too many files written to the same folder.

Thanks for the help already!

Best regards,
*Renann Prado*

On Sat, Feb 3, 2024 at 7:13 PM Anthony D'Atri wrote:
> The slashes don't mean much if anything to Ceph. Buckets are not hierarchical filesystems.
>
> You speak of millions of files. How many millions?
>
> How big are they? Very small objects stress any object system. Very large objects may be multi-part uploads that stage to slow media or otherwise add overhead.
>
> Are you writing them to a single bucket?
>
> How is the index pool configured? On what media?
> Same with the bucket pool.
>
> Which Ceph release? Sharding config?
> Are you mixing in bucket list operations?
>
> It could be that you have an older release or a cluster set up on an older release that doesn't effectively auto-reshard the bucket index. If the index pool is set up poorly - slow media, too few OSDs, too few PGs - that may contribute.
>
> In some circumstances pre-sharding might help.
>
> Do you have the ability to utilize more than one bucket? If you can limit the number of objects in a bucket that might help.
>
> If your application keeps track of object names you might try indexless buckets.
>
> > On Feb 3, 2024, at 12:57 PM, Renann Prado wrote:
> >
> > Hello,
> >
> > I have an issue at my company where we have an underperforming Ceph instance.
> > The issue that we have is that sometimes writing files to Ceph via the S3 API (our only option) takes up to 40 s, which is too long for us. We are a bit limited in what we can do to investigate why it's performing so badly, because we have a service provider in between, so getting to the bottom of this really is not that easy.
> >
> > That being said, the way we use the S3 API (again, Ceph under the hood) is by writing all files (multiple millions) to the root, so we don't use *any* folder-like structure, e.g. we write */* instead of */this/that/*.
> >
> > The question is:
> >
> > Does anybody know whether Ceph has performance gains when you create a folder structure vs when you don't? Looking at Ceph's documentation I could not find such information.
> >
> > Best regards,
> >
> > *Renann Prado*
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: pacific 16.2.15 QE validation status
Hi, Is this PR: https://github.com/ceph/ceph/pull/54918 included as well? You definitely want to build the Ubuntu / debian packages with the proper CMAKE_CXX_FLAGS. The performance impact on RocksDB is _HUGE_. Thanks, Gr. Stefan P.s. Kudos to Mark Nelson for figuring it out / testing. ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
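For readers who haven't followed the PR: the gist, as far as the thread explains it, is that the bundled RocksDB was being compiled without proper optimization flags in those packages. A rough illustration of the kind of flags involved (this is not the actual contents of the PR or of debian/rules) might look like:

```
# Illustration only, not the real packaging change: the point is that the
# RocksDB sub-build must receive an optimized build type / CXX flags
# explicitly, otherwise it compiles without -O2.
cmake -DCMAKE_BUILD_TYPE=RelWithDebInfo -DCMAKE_CXX_FLAGS="-O2 -g" ..
```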
[ceph-users] Re: Performance issues with writing files to Ceph via S3 API
> On Feb 8, 2024, at 07:05, Renann Prado wrote: > > Hello Anthony, > > Sorry for the late reply. > My thought process behind it was that maybe there's some kind of indexing > that Ceph does under the hood, and perhaps the bucket structure could > influence that. Absolutely, that's why I asked the questions. > But if you say it's not the case, then I was on the wrong path. > > Sorry for the daley, but I also wanted to gather info. > >> How many millions? > > About 75 millions. In a single bucket??? > >> How big are they? > > They vary from ~500kb to a couple of megabytes, say 5mb. I wouldn't be able > to tell you if most files are closer to 5mb or to 500kb though, but if > that's important I can try to figure it out. No that's fine. Ceph, and many other object storage systems, have a harder time with small objects. If they're a lot smaller you can end up with wasted space. But at 500KB, metadata operations rival just storing the data, so they can be a bottleneck and a hotspot. > >> Are you writing them to a single bucket? > > Yes. All these files are in a single bucket. yikes. Any chance you could refactor the application to use smaller buckets? > >> How is the index pool configured? On what media? >> Same with the bucket pool. > > I wouldn't be able to answer that unfortunately. > >> Which Ceph release? > > Pacific (https://docs.ceph.com/en/pacific/). > >> Sharding config? >> Are you mixing in bucket list operations ? > > We don't use list operations on this bucket, but the Ceph infrastructure is > shared across multiple companies and we are aware that there are others > using list operations *on other buckets*. But also, I can say that list > operations in this bucket IIRC are failing (to a point where we don't have > the exact metric of how many objects are in the bucket). Could be a timeout, I think the list API call only returns up to 1000 objects and for a larger bucket one has to iterate. > The provider has a > prometheus exporter which fails to expert the metrics in production > currently. > >> Do you have the ability to utilize more than one bucket? If you can limit > the number of objects in a bucket that might help. > > Technically it should be possible, but I'd assume that Ceph can abstract > this complexity for the bucket user so that we don't have to care for that. > If we do it, I would see it as a workaround more than a real solution. I don't recall the succession of changes to bucket sharding. With your Pacific release it could be that auto-resharding isn't enabled or isn't functioning. I suspect that bucket sharding is the heart of the issue. > >> If your application keeps track of object names you might try indexless > buckets. > > I didn't know there was this possibility. > > I don't know how Ceph works under the hood, but assuming that all files are > ultimately written to the same folder in disk, could that be a problem? It doesn't work that way. Ceph has an abstracted foundation layer called RADOS, and the data isn't stored on disk as traditional files. > I have faced in the past struggle with linux file system getting too slow > due to too many files written to the same folder. It could be a similar but not identical issue. When a Ceph cluster runs RGW to provide object storage, it has a dedicated pool that stores bucket indexes. For any scale at all this must be placed on fast storage (SSDs) across enough separate drives and with enough placement groups. Each bucket's index is broken into "shards". 
With older releases that sharding was manual -- for very large buckets one would have to manually reshard the index, or pre-shard it in advance for the eventual size of the bucket. Recent releases have a feature that does this automatically, if it's enabled. My command of these dynamics is limited, so others on the list may be able to chime in with refinements. > > Thanks for the help already! > > Best regards, > *Renann Prado* > > > On Sat, Feb 3, 2024 at 7:13 PM Anthony D'Atri > wrote: > >> The slashes don’t mean much if anything to Ceph. Buckets are not >> hierarchical filesystems. >> >> You speak of millions of files. How many millions? >> >> How big are they? Very small objects stress any object system. Very >> large objects may be multi part uploads that stage to slow media or >> otherwise add overhead. >> >> Are you writing them to a single bucket? >> >> How is the index pool configured? On what media? >> Same with the bucket pool. >> >> Which Ceph release? Sharding config? >> Are you mixing in bucket list operations ? >> >> It could be that you have an older release or a cluster set up on an older >> release that doesn’t effectively auto-reshard the bucket index. If the >> index pool is set up poorly - slow media, too few OSDs, too few PGs - that >> may contribute. >> >> In some circumstances pre-sharding might help. >> >> Do you have the ability to utilize more than one bucket? If you can li
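For reference, a hedged sketch of how the sharding state discussed above can be inspected on a Pacific cluster; the bucket name is a placeholder and exact output fields vary by release:

```
# Hedged sketch; "mybucket" is a placeholder.
radosgw-admin bucket stats --bucket=mybucket          # look for num_shards / num_objects
radosgw-admin bucket limit check                      # flags buckets exceeding objects-per-shard limits
ceph config get client.rgw rgw_dynamic_resharding     # is dynamic resharding enabled?
# Manual reshard if dynamic resharding isn't doing the job; the shard count
# below is illustrative, not a recommendation:
radosgw-admin reshard add --bucket=mybucket --num-shards=1024
radosgw-admin reshard process
```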
[ceph-users] Re: Adding a new monitor fails
Hi,

you're always welcome to report a documentation issue on tracker.ceph.com, you don't need to clean them up by yourself. :-) There is a major restructuring in progress, but they will probably never be perfect anyway.

> There are definitely some warts in there, as the monitor count was 1
> but there were 2 monitors listed running.

I don't know your mon history, but I assume that you've had more than one mon (before converting to cephadm?). Then you might have updated the mon specs via the command line, containing "count:1". But the mgr refuses to remove the second mon because it would break quorum. That's why you had 2/1 running; this is reproducible in my test cluster. Adding more mons also failed because of the count:1 spec. You could have just overwritten it in the CLI as well, without a yaml spec file (omit the count spec):

ceph orch apply mon --placement="host1,host2,host3"

Regards,
Eugen

Quoting Tim Holloway:

Ah, yes. Much better.

There are definitely some warts in there, as the monitor count was 1 but there were 2 monitors listed running.

I've mostly avoided docs that reference ceph config files and yaml configs because the online docs are (as I've whined before) not always trustworthy and often contain anachronisms. Were I sufficiently knowledgeable, I'd offer to clean them up, but if that were the case, I wouldn't have to come crying here.

All happy now, though.

Tim

On Tue, 2024-02-06 at 19:22 +, Eugen Block wrote:

Yeah, you have the „count:1" in there, that's why your manually added daemons are rejected. Try my suggestion with a mon.yaml.

Quoting Tim Holloway:

> ceph orch ls
> NAME                               PORTS        RUNNING  REFRESHED  AGE  PLACEMENT
> alertmanager                       ?:9093,9094      1/1  3m ago     8M   count:1
> crash                                               5/5  3m ago     8M   *
> grafana                            ?:3000           1/1  3m ago     8M   count:1
> mds.ceefs                                           2/2  3m ago     4M   count:2
> mds.fs_name                                         3/3  3m ago     8M   count:3
> mgr                                                 3/3  3m ago     4M   www6.mousetech.com;www2.mousetech.com;www7.mousetech.com
> mon                                                 2/1  3m ago     4M   www6.mousetech.com;www2.mousetech.com;www7.mousetech.com;count:1
> nfs.foo                            ?:2049           1/1  3m ago     4M   www7.mousetech.com
> node-exporter                      ?:9100           5/5  3m ago     8M   *
> osd                                                   6  3m ago     -
> osd.dashboard-admin-1686941775231                     0  -          7M   *
> prometheus                         ?:9095           1/1  3m ago     8M   count:1
> rgw.mousetech                      ?:80             2/2  3m ago     3M   www7.mousetech.com;www2.mousetech.com
>
> Note that the dell02 monitor doesn't show here although the "ceph orch daemon add" returns success initially. And actually the www6 monitor is not running nor does it list on the dashboard or "ceph orch ps". The www6 machine is still somewhat messed up because it was the initial launch machine for Octopus.
>
> On Tue, 2024-02-06 at 17:22 +, Eugen Block wrote:
> > So the orchestrator is working and you have a working ceph cluster?
> > Can you share the output of:
> > ceph orch ls mon
> >
> > If the orchestrator expects only one mon and you deploy another manually via daemon add it can be removed. Try using a mon.yaml file instead which contains the designated mon hosts and then run
> > ceph orch apply -i mon.yaml
> >
> > Quoting Tim Holloway:
> >
> > > I just jacked in a completely new, clean server and I've been trying to get a Ceph (Pacific) monitor running on it.
> > > > > > The "ceph orch daemon add" appears to install all/most of > > > what's > > > necessary, but when the monitor starts, it shuts down > > > immediately, > > > and > > > in the manner of Ceph containers immediately erases itself and > > > the > > > container log, so it's not possible to see what its problem is. > > > > > > I looked at manual installation, but the docs appear to be > > > oriented > > > towards old-style non-container implementation and don't > > > account > > > for > > > the newer /var/lib/ceph/*fsid*/ approach. > > > > > > Any tips? > > > > > > Last few lines in the system journal are like this: > > > > > > Feb 06 11:09:58 dell02.mousetech.com ceph-278fcd86-0861-11ee- > > > a7df- > > > 9c5c8e86cf8f-mon-dell02[1357545]: debug 2024-02- > > > 06T16:09:58.938+ > > > 7f26810ae700 4 rocksdb: (Original Log Time 2024/02/06- > > > 16:09:58.938432) > > > [compaction/compaction_job.cc:760] [default] compacted to: base > > > level 6 > > > level multiplier 10.00 max bytes base 268435456 files[0 0 0 0 0 > > > 0 > > > 2] > > > max score 0.00, MB/sec: 35
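For reference, a minimal mon spec of the kind Eugen describes might look like the sketch below (hostnames are placeholders for the three intended mon hosts); it is applied with `ceph orch apply -i mon.yaml`:

```yaml
# Hedged sketch of a mon.yaml without a "count" field.
service_type: mon
placement:
  hosts:
    - host1
    - host2
    - host3
```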
[ceph-users] Re: pacific 16.2.15 QE validation status
Thanks, I've created https://tracker.ceph.com/issues/64360 to track these backports to pacific/quincy/reef.

On Thu, Feb 8, 2024 at 7:50 AM Stefan Kooman wrote:
> Hi,
>
> Is this PR: https://github.com/ceph/ceph/pull/54918 included as well?
>
> You definitely want to build the Ubuntu / Debian packages with the proper CMAKE_CXX_FLAGS. The performance impact on RocksDB is _HUGE_.
>
> Thanks,
>
> Gr. Stefan
>
> P.S. Kudos to Mark Nelson for figuring it out / testing.
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: Adding a new monitor fails
Thanks, I'll have to see if I come up with a suitable issue on documentation. My biggest issue isn't a specific item (well, except for Octopus telling me to use the not-included ceph-deploy command in lots of places). It's more a case of needing attention paid to anachronisms in general. That, and more attention could be paid to the distinction between container-based and OS-native Ceph components. So in short, not single issues, but more of a need for attention to the overall details to ensure that features described for a specific release actually apply TO that release. Grunt work, but it can save a lot on service calls.

I migrated to Ceph from Gluster because Gluster is apparently going unsupported at the end of this year. I moved to Gluster from DRBD because I wanted triple redundancy on the data. While Ceph is really kind of overkill for my small R&D farm, it has proven to be about the most solid network distributed filesystem I've worked with: no split brains, no outright corruption, no data outages. Despite all the atrocities I committed in setting it up, it has never failed at its primary duty of delivering data service.

I started off with Octopus, and that has been the root of a lot of my problems. Octopus introduced cephadm as a primary management tool, I believe, but the documentation still referenced ceph-deploy. And cephadm suffered from a bug that meant that if even one service was down, scheduled work would not be done, so to repair anything I needed an already-repaired system. Migrating to Pacific cleared that up, so a lot of what I'm doing now is getting the lint out. I'm now staying consistently healthy between a proper monitor configuration and having removed direct ceph mounts on the desktops.

I very much appreciate all the help and insights you've provided. It's nice to have laid my problems to rest.

Tim

On Thu, 2024-02-08 at 14:41 +, Eugen Block wrote:
> Hi,
>
> you're always welcome to report a documentation issue on tracker.ceph.com, you don't need to clean them up by yourself. :-)
> There is a major restructuring in progress, but they will probably never be perfect anyway.
>
> > There are definitely some warts in there, as the monitor count was 1
> > but there were 2 monitors listed running.
>
> I don't know your mon history, but I assume that you've had more than one mon (before converting to cephadm?). Then you might have updated the mon specs via command line, containing "count:1". But the mgr refuses to remove the second mon because it would break quorum. That's why you had 2/1 running, this is reproducible in my test cluster. Adding more mons also failed because of the count:1 spec. You could have just overwritten it in the cli as well without a yaml spec file (omit the count spec):
>
> ceph orch apply mon --placement="host1,host2,host3"
>
> Regards,
> Eugen
>
> Quoting Tim Holloway:
>
> > Ah, yes. Much better.
> >
> > There are definitely some warts in there, as the monitor count was 1
> > but there were 2 monitors listed running.
> >
> > I've mostly avoided docs that reference ceph config files and yaml configs because the online docs are (as I've whined before) not always trustworthy and often contain anachronisms. Were I sufficiently knowledgeable, I'd offer to clean them up, but if that were the case, I wouldn't have to come crying here.
> >
> > All happy now, though.
> > > > Tim > > > > > > On Tue, 2024-02-06 at 19:22 +, Eugen Block wrote: > > > Yeah, you have the „count:1“ in there, that’s why your manually > > > added > > > daemons are rejected. Try my suggestion with a mon.yaml. > > > > > > Zitat von Tim Holloway : > > > > > > > ceph orch ls > > > > NAME PORTS RUNNING > > > > REFRESHED > > > > AGE > > > > PLACEMENT > > > > alertmanager ?:9093,9094 1/1 3m > > > > ago > > > > 8M > > > > count:1 > > > > crash 5/5 3m > > > > ago > > > > 8M > > > > * > > > > grafana ?:3000 1/1 3m > > > > ago > > > > 8M > > > > count:1 > > > > mds.ceefs 2/2 3m > > > > ago > > > > 4M > > > > count:2 > > > > mds.fs_name 3/3 3m > > > > ago > > > > 8M > > > > count:3 > > > > mgr 3/3 3m > > > > ago > > > > 4M > > > > www6.mousetech.com;www2.mousetech.com;www7.mousetech.com > > > > mon 2/1 3m > > > > ago > > > > 4M > > > > www6.mousetech.com;www2.mousetech.com;www7.mousetech.com;count: > > > > 1 > > > > nfs.foo ?:2049 1/1 3m > > > > ago > > > > 4M > > > > www7.mousetech.com > > > > node-exporter ?:9100 5/5 3m > > > > ago > > > > 8M > > > > * > > > > osd
[ceph-users] PSA: Long Standing Debian/Ubuntu build performance issue (fixed, backports in progress)
Hi Folks, Recently we discovered a flaw in how the upstream Ubuntu and Debian builds of Ceph compile RocksDB. It causes a variety of performance issues including slower than expected write performance, 3X longer compaction times, and significantly higher than expected CPU utilization when RocksDB is heavily utilized. The issue has now been fixed in main. Igor Fedotov, however, observed during the performance meeting today that there were no backports for the fix in place. He also rightly pointed out that it would be helpful to make an announcement about the issue given the severity for the affected users. I wanted to give a bit more background and make sure people are aware and understand what's going on. 1) Who's affected? Anyone running an upstream Ubuntu/Debian build of Ceph from the last several years. External builds from Canonical and Gentoo suffered from this issue as well, but were fixed independently. 2) How can you check? There's no easy way to tell at the moment. We are investigating if running "strings" on the OSD executable may provide a clue. For now, assume that if you are using our Debian/Ubuntu builds in a non-container configuration you are affected. Proxmox for instance was affected prior to adopting the fix. 3) Are Cephadm deployments affected? Not as far as we know. Ceph container builds are compiled slightly differently from stand-alone Debian builds. They do not appear to suffer from the bug. 4) What versions of Ceph will get the fix? Casey Bodley kindly offered to backport the fix to both Reef and Quincy. He also verified that the fix builds properly with Pacific. We now have 3 separate backport PRs for the releases here: https://github.com/ceph/ceph/pull/55500 https://github.com/ceph/ceph/pull/55501 https://github.com/ceph/ceph/pull/55502 Please feel free to reply if you have any questions! Thanks, Mark -- Best Regards, Mark Nelson Head of Research and Development Clyso GmbH p: +49 89 21552391 12 | a: Minnesota, USA w: https://clyso.com | e: mark.nel...@clyso.com We are hiring: https://www.clyso.com/jobs/ ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
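As a rough way to apply point 2 above, the sketch below checks whether a host is running the upstream Debian/Ubuntu packages (potentially affected) or containerized daemons (not affected, per point 3); the commands are illustrative and the repository name is an assumption about a typical upstream install:

```
# Hedged sketch: distinguish package-based from containerized OSDs on a host.
dpkg -l ceph-osd                        # is ceph-osd installed as a .deb at all?
apt-cache policy ceph-osd               # does the package come from download.ceph.com (upstream build)?
cephadm ls 2>/dev/null | grep -c osd    # any cephadm-managed (containerized) OSDs on this host?
```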
[ceph-users] What is the proper way to setup Rados Gateway (RGW) under Ceph?
I have set up a 'reef' Ceph cluster using Cephadm and Ansible in a VMware ESXi 7 / Ubuntu 22.04 lab environment per the how-to guide provided here: https://computingforgeeks.com/install-ceph-storage-cluster-on-ubuntu-linux-servers/. The installation steps were fairly easy and I was able to get the environment up and running in about 15 minutes under VMware ESXi 7. I have buckets and pools already set up.

However, the ceph.io site is confusing on how to set up the RADOS Gateway (radosgw) with multi-site -- https://docs.ceph.com/en/latest/radosgw/multisite/. Is a copy of HAProxy also needed for handling the front-end load balancing, or is it implied that Ceph sets it up?

Command-line scripting I was planning on using for setting up the RGW:

```
radosgw-admin realm create --rgw-realm=sandbox --default
radosgw-admin zonegroup create --rgw-zonegroup=sandbox --master --default
radosgw-admin zone create --rgw-zonegroup=sandbox --rgw-zone=sandbox --master --default
radosgw-admin period update --rgw-realm=sandbox --commit
ceph orch apply rgw sandbox --realm=sandbox --zone=sandbox --placement="2 ceph-mon1 ceph-mon2" --port=8000
```

What other steps are needed to get the RGW up and running so that it can be presented to something like Veeam for doing performance and I/O testing? -- Michael

This message and its attachments are from Data Dimensions and are intended only for the use of the individual or entity to which it is addressed, and may contain information that is privileged, confidential, and exempt from disclosure under applicable law. If the reader of this message is not the intended recipient, or the employee or agent responsible for delivering the message to the intended recipient, you are hereby notified that any dissemination, distribution, or copying of this communication is strictly prohibited. If you have received this communication in error, please notify the sender immediately and permanently delete the original email and destroy any copies or printouts of this email as well as any attachments.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
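On the load-balancing part of the question: cephadm can deploy the frontend itself as an "ingress" service (HAProxy plus keepalived) in front of an RGW service, so a separately hand-rolled HAProxy isn't strictly required. A hedged sketch, where the virtual IP, ports, and count are placeholders for this lab network:

```yaml
# Hedged sketch of a cephadm ingress spec fronting the rgw.sandbox service
# created above; adjust virtual_ip and ports to the actual environment.
service_type: ingress
service_id: rgw.sandbox
placement:
  count: 2
spec:
  backend_service: rgw.sandbox
  virtual_ip: 192.168.2.200/24
  frontend_port: 8080
  monitor_port: 1967
```

Applied with something like `ceph orch apply -i ingress.yaml`; a client such as Veeam would then point at the virtual IP rather than an individual gateway.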
[ceph-users] Re: PSA: Long Standing Debian/Ubuntu build performance issue (fixed, backports in progress)
Holy! I have no questions just wanted to say thanks for emailing this, as much as it does suck to know that's been an issue I really appreciate you sharing the information about this on here. We've got a fair share of ubuntu clusters so if there's a way to validate I would love to know, but it also seems like it's pretty much guaranteed to have the issue so maybe no need for that hahahaha. If there's anything we can provide that would be of assistance let me know and I can see what we can do too! Thanks to everyone involved that's doing the hard work to get this resolved! Regards, Bailey > -Original Message- > From: Mark Nelson > Sent: February 8, 2024 2:05 PM > To: ceph-users@ceph.io; d...@ceph.io > Subject: [ceph-users] PSA: Long Standing Debian/Ubuntu build performance > issue (fixed, backports in progress) > > Hi Folks, > > Recently we discovered a flaw in how the upstream Ubuntu and Debian > builds of Ceph compile RocksDB. It causes a variety of performance issues > including slower than expected write performance, 3X longer compaction > times, and significantly higher than expected CPU utilization when RocksDB is > heavily utilized. The issue has now been fixed in main. > Igor Fedotov, however, observed during the performance meeting today > that there were no backports for the fix in place. He also rightly pointed out > that it would be helpful to make an announcement about the issue given the > severity for the affected users. I wanted to give a bit more background and > make sure people are aware and understand what's going on. > > 1) Who's affected? > > Anyone running an upstream Ubuntu/Debian build of Ceph from the last > several years. External builds from Canonical and Gentoo suffered from this > issue as well, but were fixed independently. > > 2) How can you check? > > There's no easy way to tell at the moment. We are investigating if running > "strings" on the OSD executable may provide a clue. For now, assume that if > you are using our Debian/Ubuntu builds in a non-container configuration you > are affected. Proxmox for instance was affected prior to adopting the fix. > > 3) Are Cephadm deployments affected? > > Not as far as we know. Ceph container builds are compiled slightly > differently from stand-alone Debian builds. They do not appear to suffer > from the bug. > > 4) What versions of Ceph will get the fix? > > Casey Bodley kindly offered to backport the fix to both Reef and Quincy. > He also verified that the fix builds properly with Pacific. We now have 3 > separate backport PRs for the releases here: > > https://github.com/ceph/ceph/pull/55500 > https://github.com/ceph/ceph/pull/55501 > https://github.com/ceph/ceph/pull/55502 > > > Please feel free to reply if you have any questions! > > Thanks, > Mark > > -- > Best Regards, > Mark Nelson > Head of Research and Development > > Clyso GmbH > p: +49 89 21552391 12 | a: Minnesota, USA > w: https://clyso.com | e: mark.nel...@clyso.com > > We are hiring: https://www.clyso.com/jobs/ > ___ > ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email > to ceph-users-le...@ceph.io ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Ceph Storage || Deploy/Install/Bootstrap a Ceph Cluster || Cephadm Orchestrator CLI method
Hi Guys,

I am a newbie trying to install a Ceph storage cluster, following https://docs.ceph.com/en/latest/cephadm/install/#cephadm-deploying-new-cluster

=
OS - Ubuntu 22.04.3 LTS (Jammy Jellyfish)
4-node cluster - mon1, mgr1, 2 OSD nodes
The mon1 node can SSH to all nodes via root and the sudo-enabled ceph-user, and ceph-user to ceph-user on the other nodes.
Basic requirements are in place: podman, python3, systemd, ntp, lvm.
===

cephadm bootstrap --mon-ip 192.168.2.125 - after running this I am getting the following error:

ceph-user@mon1:~$ sudo cephadm bootstrap --mon-ip 192.168.2.125
Creating directory /etc/ceph for ceph.conf
Verifying podman|docker is present...
Verifying lvm2 is present...
Verifying time synchronization is in place...
Unit chrony.service is enabled and running
Repeating the final host check...
podman (/usr/bin/podman) version 3.4.4 is present
systemctl is present
lvcreate is present
Unit chrony.service is enabled and running
Host looks OK
Cluster fsid: 90813682-c656-11ee-9ca3-0800274ff361
Verifying IP 192.168.2.125 port 3300 ...
Verifying IP 192.168.2.125 port 6789 ...
Mon IP `192.168.2.125` is in CIDR network `192.168.2.0/24`
Mon IP `192.168.2.125` is in CIDR network `192.168.2.0/24`
Internal network (--cluster-network) has not been provided, OSD replication will default to the public_network
Pulling container image quay.io/ceph/ceph:v17...
Ceph version: ceph version 17.2.7 (b12291d110049b2f35e32e0de30d70e9a4c060d2) quincy (stable)
Extracting ceph user uid/gid from container image...
Creating initial keys...
Creating initial monmap...
Creating mon...
Waiting for mon to start...
Waiting for mon...
mon is available
Assimilating anything we can from ceph.conf...
Generating new minimal ceph.conf...
Restarting the monitor...
Setting mon public_network to 192.168.2.0/24
Wrote config to /etc/ceph/ceph.conf
Wrote keyring to /etc/ceph/ceph.client.admin.keyring
Creating mgr...
Verifying port 9283 ...
Waiting for mgr to start...
Waiting for mgr...
mgr not available, waiting (1/15)...
mgr not available, waiting (2/15)...
mgr not available, waiting (3/15)...
mgr not available, waiting (4/15)...
mgr not available, waiting (5/15)...
mgr not available, waiting (6/15)...
mgr not available, waiting (7/15)...
mgr is available
Enabling cephadm module...
Waiting for the mgr to restart...
Waiting for mgr epoch 5...
mgr epoch 5 is available
Setting orchestrator backend to cephadm...
Generating ssh key...
Wrote public SSH key to /etc/ceph/ceph.pub
Adding key to root@localhost authorized_keys...
Adding host mon1...
Deploying mon service with default placement...
Deploying mgr service with default placement...
Deploying crash service with default placement...
Deploying ceph-exporter service with default placement...
Non-zero exit code 22 from /usr/bin/podman run --rm --ipc=host --stop-signal=SIGTERM --net=host --entrypoint /usr/bin/ceph --init -e CONTAINER_IMAGE=quay.io/ceph/ceph:v17 -e NODE_NAME=mon1 -e CEPH_USE_RANDOM_NONCE=1 -v /var/log/ceph/90813682-c656-11ee-9ca3-0800274ff361:/var/log/ceph:z -v /tmp/ceph-tmpnjonhex7:/etc/ceph/ceph.client.admin.keyring:z -v /tmp/ceph-tmp3gil6lbb:/etc/ceph/ceph.conf:z quay.io/ceph/ceph:v17 orch apply ceph-exporter
/usr/bin/ceph: stderr Error EINVAL: Usage:
/usr/bin/ceph: stderr ceph orch apply -i [--dry-run]
/usr/bin/ceph: stderr ceph orch apply [--placement=] [--unmanaged]
/usr/bin/ceph: stderr
Traceback (most recent call last):
File "/usr/sbin/cephadm", line 9653, in main()
File "/usr/sbin/cephadm", line 9641, in main r = ctx.func(ctx)
File "/usr/sbin/cephadm", line 2205, in _default_image return func(ctx)
File "/usr/sbin/cephadm", line 5774, in command_bootstrap prepare_ssh(ctx, cli, wait_for_mgr_restart)
File "/usr/sbin/cephadm", line 5275, in prepare_ssh cli(['orch', 'apply', t])
File "/usr/sbin/cephadm", line 5708, in cli return CephContainer(
File "/usr/sbin/cephadm", line 4144, in run out, _, _ = call_throws(self.ctx, self.run_cmd(),
File "/usr/sbin/cephadm", line 1853, in call_throws raise RuntimeError('Failed command: %s' % ' '.join(command))
RuntimeError: Failed command: /usr/bin/podman run --rm --ipc=host --stop-signal=SIGTERM --net=host --entrypoint /usr/bin/ceph --init -e CONTAINER_IMAGE=quay.io/ceph/ceph:v17 -e NODE_NAME=mon1 -e CEPH_USE_RANDOM_NONCE=1 -v /var/log/ceph/90813682-c656-11ee-9ca3-0800274ff361:/var/log/ceph:z -v /tmp/ceph-tmpnjonhex7:/etc/ceph/ceph.client.admin.keyring:z -v /tmp/ceph-tmp3gil6lbb:/etc/ceph/ceph.conf:z quay.io/ceph/ceph:v17 orch apply ceph-exporter

What am I doing wrong or missing? Please help.

Many Thanks
AS
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
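One possibility sometimes raised for this kind of failure (an assumption, not a confirmed diagnosis) is a mismatch between the standalone cephadm script and the v17 container image it pulls: the bootstrap step `ceph orch apply ceph-exporter` is rejected by the CLI/mgr inside the image. A hedged first step is to compare versions and, if they differ, pin an image from the same release as the script; the tag below is only an example:

```
# Hedged sketch: check that the cephadm script and the container image come
# from the same Ceph release; v18.2.1 is an illustrative tag, not advice.
cephadm version
sudo cephadm bootstrap --image quay.io/ceph/ceph:v18.2.1 --mon-ip 192.168.2.125
```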
[ceph-users] Does it impact write performance when SSD applies into block.wal (not block.db)
Hi everyone,

I saw that BlueStore can put block.db and block.wal on separate devices. In my case, I'd like to use a hybrid layout (SSD + HDD) to improve small-write performance, but I don't have enough SSD capacity to cover both block.db and block.wal, so I think it may still help performance even if the SSD is used only for block.wal. As I understand it, block.wal sizing is related to the RocksDB cache parameters, so it might not need much SSD.

1. If I use the SSD only for block.wal, does it improve write performance for small data?
2. Should I create one SSD LV per OSD for block.wal?
3. How much SSD do I need for block.wal relative to the HDD capacity (if I have 100 TB)?
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: Does it impact write performance when SSD applies into block.wal (not block.db)
> Hi everyone, > > I saw the bluestore can separate block.db, block.wal. > In my case, I'd like to apply hybrid device which uses SSD, HDD to improve > the small data write performance. > but I don't have enough SSD to cover block.db and block.wal. > so I think it can impact performance even though SSD applies into just > block.wal. > I just know that block.wal depends on rocksdb cache size as parameters. SSD > might not need too much. > > 1. > When I use SSD just into block.wal, > Does it impact the write performance of the small data? I *think* by default only writes that are smaller than the min_alloc_size the OSD was created with will be staged in the WAL. In recent releases that defaults to 4KB. > 3. > How much SSD do I need for block.wal relative to HDD(if I have 100TB)? cf. https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/SBNRW5R22IE3OVOR57DRL2ULFTWXLAGQ/ The WAL size is I believe constant, 1GB. Be careful that you don’t share your SSD devices with too many HDDs. In the Filestore days conventional wisdom was to not share a SAS/SATA SSD across more than 4-5 HDD OSDs; an NVMe SSD perhaps as high as 10. If you exceed this ratio you may end up slower than with pure HDD OSDs. Naturally the best solution is to not use HDDs at all ;) ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
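If you do go with a WAL-only SSD layout, a hedged sketch of how the OSDs might be created with ceph-volume follows; device names are placeholders, and in batch mode ceph-volume carves one WAL LV per HDD out of the shared SSD:

```
# Hedged sketch; /dev/sd[b-e] are HDDs, /dev/nvme0n1 is the shared SSD.
ceph-volume lvm batch /dev/sdb /dev/sdc /dev/sdd /dev/sde --wal-devices /dev/nvme0n1
# Or a single OSD with an explicitly pre-created WAL LV (vg/lv name is a placeholder):
ceph-volume lvm create --data /dev/sdb --block.wal ceph-wal/wal-sdb
```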
[ceph-users] RGW Index pool(separated SSD) tuning factor
Hi everyone,

I have confirmed that write performance improves significantly even when only the RGW index pool is placed on SSD. I understand that roughly ~200 bytes per object are created in the index pool; when I checked my index pool size, it worked out to around 300-400 bytes per object. For a setup like mine, with the index pool on dedicated SSD devices, I'm guessing there are tuning factors (something like a block size) that could reduce the index pool size.

Are there any tuning factors or recommendations for an index pool on dedicated SSDs?
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
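For reference, a hedged sketch of how a dedicated-SSD index pool is commonly arranged (a device-class CRUSH rule plus enough PGs); pool and rule names assume default zone naming and are otherwise placeholders. Note that bucket index entries are stored as OMAP (RocksDB) keys rather than regular data blocks, so there isn't really a block-size knob that shrinks them; shard count and PG count tend to matter more:

```
# Hedged sketch; names are illustrative, and pg_num may already be managed
# by the autoscaler in your cluster.
ceph osd crush rule create-replicated ssd-only default host ssd
ceph osd pool set default.rgw.buckets.index crush_rule ssd-only
ceph osd pool set default.rgw.buckets.index pg_num 128
ceph osd pool set default.rgw.buckets.index pgp_num 128
```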