[ceph-users] Re: Mon DB compaction MON_DISK_BIG
Okay, thank you very much. From: Anthony D'Atri Sent: Tuesday, October 20, 2020 9:32 AM To: Szabo, Istvan (Agoda) Cc: ceph-users@ceph.io Subject: Re: [ceph-users] Re: Mon DB compaction MON_DISK_BIG Email received from outside the company. If in doubt don't click links nor open attachments! > > Hi, > > Yeah, sequentially, and waited for each to finish, and it looks like it is still doing > something in the background because now it is 9.5GB even though it says > compaction is done. > I think the ceph tell compact initiated a harder pass, so I'm not sure how far it will go > down, but it looks promising. When I sent the email it was 13, now 9.5. Online compaction isn’t as fast as offline compaction. If you set mon_compact_on_start = true in ceph.conf the mons will compact more efficiently before joining the quorum. This means of course that they’ll take longer to start up and become active. Arguably this should > 1 OSD has been down a long time, but that one I want to remove from the cluster > soon; all PGs are active+clean. There’s an issue with at least some versions of Luminous where having down/out OSDs confounds compaction. If you don’t end up soon with the mon DB size you expect, try removing or replacing that OSD and I’ll bet you have better results. — aad > > mon stat same, yes. > > Now, as I finished this email, it is 8.7 GB. > > I hope I didn't break anything and it will delete everything. > > Thank you > > From: Anthony D'Atri > Sent: Tuesday, October 20, 2020 9:13 AM > To: ceph-users@ceph.io > Cc: Szabo, Istvan (Agoda) > Subject: Re: [ceph-users] Mon DB compaction MON_DISK_BIG > > Email received from outside the company. If in doubt don't click links nor open attachments! > > > I hope you restarted those mons sequentially, waiting between each for the quorum to return. > > Is there any recovery or pg autoscaling going on? > > Are all OSDs up/in, i.e. are the three numbers returned by `ceph osd stat` the same?
> > — aad > >> On Oct 19, 2020, at 7:05 PM, Szabo, Istvan (Agoda) >> wrote: >> >> Hi, >> >> >> I've received a warning this morning: >> >> >> HEALTH_WARN mons monserver-2c01,monserver-2c02,monserver-2c03 are using a >> lot of disk space >> MON_DISK_BIG mons monserver-2c01,monserver-2c02,monserver-2c03 are using a >> lot of disk space >> mon.monserver-2c01 is 15.3GiB >= mon_data_size_warn (15GiB) >> mon.monserver-2c02 is 15.3GiB >= mon_data_size_warn (15GiB) >> mon.monserver-2c03 is 15.3GiB >= mon_data_size_warn (15GiB) >> >> It hit the 15GB limit, so I restarted all 3 mons, which triggered compaction. >> >> I've also run this command: >> >> ceph tell mon.`hostname -s` compact on the first node, but it went down >> only to 13GB. >> >> >> du -sch /var/lib/ceph/mon/ceph-monserver-2c01/store.db/ >> 13G /var/lib/ceph/mon/ceph-monserver-2c01/store.db/ >> 13G total >> >> >> Anything else I can do to reduce it? >> >> >> Luminous 12.2.8 is the version. >> >> >> Thank you in advance. >> >> >> >> This message is confidential and is for the sole use of the intended >> recipient(s). It may also be privileged or otherwise protected by copyright >> or other legal rules. If you have received it by mistake please let us know >> by reply email and delete it from your system. It is prohibited to copy this >> message or disclose its content to anyone. Any confidentiality or privilege >> is not waived or lost by any mistaken delivery or unauthorized disclosure of >> the message. All messages sent to and from Agoda may be monitored to ensure >> compliance with company policies, to protect the company's interests and to >> remove potential malware. Electronic messages may be intercepted, amended, >> lost or deleted, or contain viruses.
>> ___ >> ceph-users mailing list -- ceph-users@ceph.io >> To unsubscribe send an email to ceph-users-le...@ceph.io > > ___ > ceph-users mailing list -- ceph-users@ceph.io > To unsubscribe send an email to ceph-users-le...@ceph.io ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
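Anthony's mon_compact_on_start suggestion from the thread above amounts to a small ceph.conf change; a minimal sketch (the `[mon]` section placement is the usual convention — verify against your own config before restarting mons one at a time):

```ini
# ceph.conf fragment: compact each mon's store on startup, before it
# rejoins the quorum. Startup gets slower, but compaction is more
# thorough than the online `ceph tell mon.<id> compact` path.
[mon]
mon_compact_on_start = true
```

As discussed in the thread, restart the mons sequentially and wait for quorum to return between restarts.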
[ceph-users] Re: Mon DB compaction MON_DISK_BIG
Hi, Yeah, sequentially, and waited for each to finish, and it looks like it is still doing something in the background because now it is 9.5GB even though it says compaction is done. I think the ceph tell compact initiated a harder pass, so I'm not sure how far it will go down, but it looks promising. When I sent the email it was 13, now 9.5. 1 OSD has been down a long time, but that one I want to remove from the cluster soon; all PGs are active+clean. mon stat same, yes. Now, as I finished this email, it is 8.7 GB. I hope I didn't break anything and it will delete everything. Thank you From: Anthony D'Atri Sent: Tuesday, October 20, 2020 9:13 AM To: ceph-users@ceph.io Cc: Szabo, Istvan (Agoda) Subject: Re: [ceph-users] Mon DB compaction MON_DISK_BIG Email received from outside the company. If in doubt don't click links nor open attachments! I hope you restarted those mons sequentially, waiting between each for the quorum to return. Is there any recovery or pg autoscaling going on? Are all OSDs up/in, i.e. are the three numbers returned by `ceph osd stat` the same? — aad > On Oct 19, 2020, at 7:05 PM, Szabo, Istvan (Agoda) > wrote: > > Hi, > > > I've received a warning this morning: > > > HEALTH_WARN mons monserver-2c01,monserver-2c02,monserver-2c03 are using a lot > of disk space > MON_DISK_BIG mons monserver-2c01,monserver-2c02,monserver-2c03 are using a > lot of disk space > mon.monserver-2c01 is 15.3GiB >= mon_data_size_warn (15GiB) > mon.monserver-2c02 is 15.3GiB >= mon_data_size_warn (15GiB) > mon.monserver-2c03 is 15.3GiB >= mon_data_size_warn (15GiB) > > It hit the 15GB limit, so I restarted all 3 mons, which triggered compaction. > > I've also run this command: > > ceph tell mon.`hostname -s` compact on the first node, but it went down only > to 13GB. > > > du -sch /var/lib/ceph/mon/ceph-monserver-2c01/store.db/ > 13G /var/lib/ceph/mon/ceph-monserver-2c01/store.db/ > 13G total > > > Anything else I can do to reduce it? > > > Luminous 12.2.8 is the version. > > > Thank you in advance.
> > > > This message is confidential and is for the sole use of the intended > recipient(s). It may also be privileged or otherwise protected by copyright > or other legal rules. If you have received it by mistake please let us know > by reply email and delete it from your system. It is prohibited to copy this > message or disclose its content to anyone. Any confidentiality or privilege > is not waived or lost by any mistaken delivery or unauthorized disclosure of > the message. All messages sent to and from Agoda may be monitored to ensure > compliance with company policies, to protect the company's interests and to > remove potential malware. Electronic messages may be intercepted, amended, > lost or deleted, or contain viruses. > ___ > ceph-users mailing list -- ceph-users@ceph.io > To unsubscribe send an email to ceph-users-le...@ceph.io ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Mon DB compaction MON_DISK_BIG
Hi, I've received a warning this morning: HEALTH_WARN mons monserver-2c01,monserver-2c02,monserver-2c03 are using a lot of disk space MON_DISK_BIG mons monserver-2c01,monserver-2c02,monserver-2c03 are using a lot of disk space mon.monserver-2c01 is 15.3GiB >= mon_data_size_warn (15GiB) mon.monserver-2c02 is 15.3GiB >= mon_data_size_warn (15GiB) mon.monserver-2c03 is 15.3GiB >= mon_data_size_warn (15GiB) It hit the 15GB limit, so I restarted all 3 mons, which triggered compaction. I've also run this command: ceph tell mon.`hostname -s` compact on the first node, but it went down only to 13GB. du -sch /var/lib/ceph/mon/ceph-monserver-2c01/store.db/ 13G /var/lib/ceph/mon/ceph-monserver-2c01/store.db/ 13G total Anything else I can do to reduce it? Luminous 12.2.8 is the version. Thank you in advance. This message is confidential and is for the sole use of the intended recipient(s). It may also be privileged or otherwise protected by copyright or other legal rules. If you have received it by mistake please let us know by reply email and delete it from your system. It is prohibited to copy this message or disclose its content to anyone. Any confidentiality or privilege is not waived or lost by any mistaken delivery or unauthorized disclosure of the message. All messages sent to and from Agoda may be monitored to ensure compliance with company policies, to protect the company's interests and to remove potential malware. Electronic messages may be intercepted, amended, lost or deleted, or contain viruses. ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: Recommended settings for PostgreSQL
Another path that we have been investigating is to use some NVMe on the database machine as a cache (bcache, cachefs, etc). Several TB of U.2 drives in a striped LVM should enhance performance for 'hot' data and cover for the issues of storing a large DB in Ceph. Note that we haven't tried this yet, but there are at least some discussions for MySQL. -Dave Dave Hall Binghamton University On 10/19/2020 10:49 PM, Brian Topping wrote: Another option is to let PostgreSQL do the replication with local storage. There are great reasons for Ceph, but databases optimize for this kind of thing extremely well. With replication in hand, run snapshots to RADOS buckets for long term storage. On Oct 17, 2020, at 7:28 AM, Gencer W. Genç wrote: Hi, I have an existing few RBDs. I would like to create a new RBD Image for PostgreSQL. Do you have any suggestions for such use cases? For example; Currently defaults are: Object size (4MB) and Stripe Unit (None) Features: Deep flatten + Layering + Exclusive Lock + Object Map + FastDiff Should I use as is or should I use 16KB of object size and different sets of features for PostgreSQL? Thanks, Gencer. ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: Recommended settings for PostgreSQL
Another option is to let PostgreSQL do the replication with local storage. There are great reasons for Ceph, but databases optimize for this kind of thing extremely well. With replication in hand, run snapshots to RADOS buckets for long term storage. > On Oct 17, 2020, at 7:28 AM, Gencer W. Genç wrote: > > Hi, > > I have an existing few RBDs. I would like to create a new RBD Image for > PostgreSQL. Do you have any suggestions for such use cases? For example; > > Currently defaults are: > > Object size (4MB) and Stripe Unit (None) > Features: Deep flatten + Layering + Exclusive Lock + Object Map + FastDiff > > Should I use as is or should I use 16KB of object size and different sets of > features for PostgreSQL? > > Thanks, > Gencer. > ___ > ceph-users mailing list -- ceph-users@ceph.io > To unsubscribe send an email to ceph-users-le...@ceph.io ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
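For anyone wanting to experiment with a smaller object size as asked above, image creation might look like the following sketch. The pool/image names and the 100G size are placeholders, and 16K is taken from the question rather than being a recommendation; `--object-size` is available on reasonably recent rbd clients:

```shell
# Illustrative only: a PostgreSQL data image with 16 KiB objects
rbd create rbd/pgdata --size 100G --object-size 16K \
    --image-feature layering,exclusive-lock
rbd info rbd/pgdata    # verify the object size actually took effect
```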
[ceph-users] Re: Recommended settings for PostgreSQL
Marc Roos wrote: > > In the past I have seen some good results (benchmark & > > latencies) for MySQL and PostgreSQL. However, I've always used > > 4MB object size. Maybe I can get much better > > performance on a smaller object size. Haven't tried, actually. > > Did you tune mysql / postgres for this setup? Did you have a default > ceph rbd setup? Yes, I had to tune some settings on PostgreSQL. Especially: synchronous_commit = off I have default RBD settings. Do you have any recommendation? Thanks, Gencer. ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
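The synchronous_commit tuning mentioned above lives in postgresql.conf; a minimal illustrative fragment follows. The values are examples for an RBD-backed data directory, not universal recommendations:

```ini
# postgresql.conf fragment (illustrative)
synchronous_commit = off   # don't wait for WAL flush per commit; can lose the
                           # last few transactions on crash, but not corrupt data
full_page_writes = on      # keep enabled: protects against torn pages on crash
wal_compression = on       # fewer WAL bytes pushed over the Ceph network
```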
[ceph-users] Re: Mon DB compaction MON_DISK_BIG
> > Hi, > > Yeah, sequentially, and waited for each to finish, and it looks like it is still doing > something in the background because now it is 9.5GB even though it says > compaction is done. > I think the ceph tell compact initiated a harder pass, so I'm not sure how far it will go > down, but it looks promising. When I sent the email it was 13, now 9.5. Online compaction isn’t as fast as offline compaction. If you set mon_compact_on_start = true in ceph.conf the mons will compact more efficiently before joining the quorum. This means of course that they’ll take longer to start up and become active. Arguably this should > 1 OSD has been down a long time, but that one I want to remove from the cluster > soon; all PGs are active+clean. There’s an issue with at least some versions of Luminous where having down/out OSDs confounds compaction. If you don’t end up soon with the mon DB size you expect, try removing or replacing that OSD and I’ll bet you have better results. — aad > > mon stat same, yes. > > Now, as I finished this email, it is 8.7 GB. > > I hope I didn't break anything and it will delete everything. > > Thank you > > From: Anthony D'Atri > Sent: Tuesday, October 20, 2020 9:13 AM > To: ceph-users@ceph.io > Cc: Szabo, Istvan (Agoda) > Subject: Re: [ceph-users] Mon DB compaction MON_DISK_BIG > > Email received from outside the company. If in doubt don't click links nor open attachments! > > > I hope you restarted those mons sequentially, waiting between each for the quorum to return. > > Is there any recovery or pg autoscaling going on? > > Are all OSDs up/in, i.e. are the three numbers returned by `ceph osd stat` the same?
> > — aad > >> On Oct 19, 2020, at 7:05 PM, Szabo, Istvan (Agoda) >> wrote: >> >> Hi, >> >> >> I've received a warning today morning: >> >> >> HEALTH_WARN mons monserver-2c01,monserver-2c02,monserver-2c03 are using a >> lot of disk space >> MON_DISK_BIG mons monserver-2c01,monserver-2c02,monserver-2c03 are using a >> lot of disk space >> mon.monserver-2c01 is 15.3GiB >= mon_data_size_warn (15GiB) >> mon.monserver-2c02 is 15.3GiB >= mon_data_size_warn (15GiB) >> mon.monserver-2c03 is 15.3GiB >= mon_data_size_warn (15GiB) >> >> It hits the 15GB so I've restarted all the 3 mons, it triggered compaction. >> >> I've also ran this command: >> >> ceph tell mon.`hostname -s` compact on the first node, but it wents down >> only to 13GB. >> >> >> du -sch /var/lib/ceph/mon/ceph-monserver-2c01/store.db/ >> 13G /var/lib/ceph/mon/ceph-monserver-2c01/store.db/ >> 13G total >> >> >> Anything else I can do to reduce it? >> >> >> Luminous 12.2.8 is the version. >> >> >> Thank you in advance. >> >> >> >> This message is confidential and is for the sole use of the intended >> recipient(s). It may also be privileged or otherwise protected by copyright >> or other legal rules. If you have received it by mistake please let us know >> by reply email and delete it from your system. It is prohibited to copy this >> message or disclose its content to anyone. Any confidentiality or privilege >> is not waived or lost by any mistaken delivery or unauthorized disclosure of >> the message. All messages sent to and from Agoda may be monitored to ensure >> compliance with company policies, to protect the company's interests and to >> remove potential malware. Electronic messages may be intercepted, amended, >> lost or deleted, or contain viruses. 
>> ___ >> ceph-users mailing list -- ceph-users@ceph.io >> To unsubscribe send an email to ceph-users-le...@ceph.io > > ___ > ceph-users mailing list -- ceph-users@ceph.io > To unsubscribe send an email to ceph-users-le...@ceph.io ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: Mon DB compaction MON_DISK_BIG
I hope you restarted those mons sequentially, waiting between each for the quorum to return. Is there any recovery or pg autoscaling going on? Are all OSDs up/in, i.e. are the three numbers returned by `ceph osd stat` the same? — aad > On Oct 19, 2020, at 7:05 PM, Szabo, Istvan (Agoda) > wrote: > > Hi, > > > I've received a warning this morning: > > > HEALTH_WARN mons monserver-2c01,monserver-2c02,monserver-2c03 are using a lot > of disk space > MON_DISK_BIG mons monserver-2c01,monserver-2c02,monserver-2c03 are using a > lot of disk space > mon.monserver-2c01 is 15.3GiB >= mon_data_size_warn (15GiB) > mon.monserver-2c02 is 15.3GiB >= mon_data_size_warn (15GiB) > mon.monserver-2c03 is 15.3GiB >= mon_data_size_warn (15GiB) > > It hit the 15GB limit, so I restarted all 3 mons, which triggered compaction. > > I've also run this command: > > ceph tell mon.`hostname -s` compact on the first node, but it went down only > to 13GB. > > > du -sch /var/lib/ceph/mon/ceph-monserver-2c01/store.db/ > 13G /var/lib/ceph/mon/ceph-monserver-2c01/store.db/ > 13G total > > > Anything else I can do to reduce it? > > > Luminous 12.2.8 is the version. > > > Thank you in advance. > > > > This message is confidential and is for the sole use of the intended > recipient(s). It may also be privileged or otherwise protected by copyright > or other legal rules. If you have received it by mistake please let us know > by reply email and delete it from your system. It is prohibited to copy this > message or disclose its content to anyone. Any confidentiality or privilege > is not waived or lost by any mistaken delivery or unauthorized disclosure of > the message. All messages sent to and from Agoda may be monitored to ensure > compliance with company policies, to protect the company's interests and to > remove potential malware. Electronic messages may be intercepted, amended, > lost or deleted, or contain viruses.
> ___ > ceph-users mailing list -- ceph-users@ceph.io > To unsubscribe send an email to ceph-users-le...@ceph.io ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Bucket notification is working strange
Hi everyone, I asked the same question on Stack Overflow, but will repeat it here. I configured bucket notification using the bucket owner's creds, and when the owner performs actions I can see new events at the configured endpoint (Kafka, actually). However, when I perform actions in the bucket with another user's creds, I do not see events in the configured notification topic. Is it expected behavior that each user has to configure their own topic (and is that even possible if a user is not a system user at all)? Or have I missed something? Thank you. https://stackoverflow.com/questions/64384060/enable-bucket-notifications-for-all-users-in-ceph-octopus ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
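For context, RGW bucket notifications are configured through the standard S3 API, so they are attached per bucket by whoever has access to call it. A hedged boto3-style sketch follows; the endpoint, bucket name, and topic ARN are placeholders, and whether events then fire for *other* users' writes is exactly the open question above:

```python
def notification_config(topic_arn):
    """Build an S3 NotificationConfiguration pointing at a pre-created topic."""
    return {
        "TopicConfigurations": [{
            "Id": "kafka-notif",
            "TopicArn": topic_arn,
            "Events": ["s3:ObjectCreated:*", "s3:ObjectRemoved:*"],
        }]
    }

# Applying it needs a reachable RGW endpoint (shown commented for illustration):
# import boto3
# s3 = boto3.client("s3", endpoint_url="http://rgw.example.com:8000",
#                   aws_access_key_id="...", aws_secret_access_key="...")
# s3.put_bucket_notification_configuration(
#     Bucket="mybucket",
#     NotificationConfiguration=notification_config("arn:aws:sns:default::mytopic"))
```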
[ceph-users] Re: OSD host count affecting available pool size?
Hi, I'm not sure I understand what your interpretation is. If you have 30 OSDs each with 1TB you'll end up with 30TB available (raw) space, no matter if those OSDs are spread across 3 or 10 hosts. The crush rules you define determine how many replicas are going to be distributed across your OSDs. Having a default replicated rule with "size = 3" will result in 10TB usable space (given your example); for simplification I'm not taking RocksDB sizes etc. into account. Is Ceph limiting the max available pool size because all of my OSDs are being hosted on just three nodes? If I had 30 OSDs running across ten nodes instead, a node failure would result in just three OSDs dropping out instead of ten. So the answer would be no, the pool size is defined by available OSDs (of a device class) divided by the replica count. What you gain from having more servers with fewer OSDs is a higher failure resiliency, and you're more flexible in terms of data placement (e.g. use erasure coding to save space). The load during the recovery of one failed node with 10 OSDs is much higher than having to recover only 3 OSDs on one node; the clients probably wouldn't even notice. Having more nodes improves performance if you have many clients since there are more OSD nodes to talk to; Ceph scales out quite well. If this doesn't answer your question, could you please clarify? Regards, Eugen Zitat von Dallas Jones : Ah, this sentence in the docs I've overlooked before: When you deploy OSDs they are automatically placed within the CRUSH map under a host node named with the hostname for the host they are running on. This, combined with the default CRUSH failure domain, ensures that replicas or erasure code shards are separated across hosts and a single host failure will not affect availability. I think this means what I thought it would mean - having the OSDs concentrated onto fewer hosts is limiting the volume size...
On Mon, Oct 19, 2020 at 9:08 AM Dallas Jones wrote: Hi, Ceph brain trust: I'm still trying to wrap my head around some capacity planning for Ceph, and I can't find a definitive answer to this question in the docs (at least one that penetrates my mental haze)... Does the OSD host count affect the total available pool size? My cluster consists of three 12-bay Dell PowerEdge machines running reflashed PERCs to make each SAS drive individually addressable. Each node is running 10 OSDs. Is Ceph limiting the max available pool size because all of my OSDs are being hosted on just three nodes? If I had 30 OSDs running across ten nodes instead, a node failure would result in just three OSDs dropping out instead of ten. Is there any rationale to this thinking, or am I trying to manufacture a solution to a problem I still don't understand? Thanks, Dallas ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
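Eugen's arithmetic above can be sketched as a quick back-of-the-envelope helper (ignoring RocksDB overhead and nearfull headroom, as he notes):

```python
def usable_tb(num_osds, tb_per_osd, replicas):
    """Usable capacity is raw space divided by replica count; host count doesn't enter."""
    return num_osds * tb_per_osd / replicas

# 30 x 1 TB OSDs with size=3: same usable space whether on 3 hosts or 10
print(usable_tb(30, 1.0, 3))  # 10.0
```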
[ceph-users] Ceph Octopus
Hi, I have installed a Ceph Octopus cluster using cephadm with a single network. Now I want to add a second network and configure it as the cluster address. How do I configure Ceph to use the second network as the cluster network? Amudhan ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
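One hedged sketch of the change being asked about, assuming a recent Octopus where settings live in the central config database (the subnet is a placeholder, and OSDs only bind the new back-side address after a restart):

```shell
# Illustrative only: declare a second subnet as the cluster (replication) network
ceph config set global cluster_network 192.168.10.0/24
ceph config get osd cluster_network      # confirm the value was stored
# restart the OSDs afterwards so they bind the new address (e.g. via cephadm/orch)
```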
[ceph-users] Re: OSD host count affecting available pool size?
Ah, this sentence in the docs I've overlooked before: When you deploy OSDs they are automatically placed within the CRUSH map under a host node named with the hostname for the host they are running on. This, combined with the default CRUSH failure domain, ensures that replicas or erasure code shards are separated across hosts and a single host failure will not affect availability. I think this means what I thought it would mean - having the OSDs concentrated onto fewer hosts is limiting the volume size... On Mon, Oct 19, 2020 at 9:08 AM Dallas Jones wrote: > Hi, Ceph brain trust: > > I'm still trying to wrap my head around some capacity planning for Ceph, > and I can't find a definitive answer to this question in the docs (at least > one that penetrates my mental haze)... > > Does the OSD host count affect the total available pool size? My cluster > consists of three 12-bay Dell PowerEdge machines running reflashed PERCs to > make each SAS drive individually addressable. Each node is running 10 OSDs. > > Is Ceph limiting the max available pool size because all of my OSDs are > being hosted on just three nodes? If I had 30 OSDs running across ten nodes > instead, a node failure would result in just three OSDs dropping out > instead of ten. > > Is there any rationale to this thinking, or am I trying to manufacture a > solution to a problem I still don't understand? > > Thanks, > > Dallas > ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] OSD host count affecting available pool size?
Hi, Ceph brain trust: I'm still trying to wrap my head around some capacity planning for Ceph, and I can't find a definitive answer to this question in the docs (at least one that penetrates my mental haze)... Does the OSD host count affect the total available pool size? My cluster consists of three 12-bay Dell PowerEdge machines running reflashed PERCs to make each SAS drive individually addressable. Each node is running 10 OSDs. Is Ceph limiting the max available pool size because all of my OSDs are being hosted on just three nodes? If I had 30 OSDs running across ten nodes instead, a node failure would result in just three OSDs dropping out instead of ten. Is there any rationale to this thinking, or am I trying to manufacture a solution to a problem I still don't understand? Thanks, Dallas ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: multiple OSD crash, unfound objects
Hi Frank, I'll give both of these a try and let you know what happens. Thanks again for your help, --Mike On 10/16/20 12:35 PM, Frank Schilder wrote: Dear Michael, this is a bit of a nut. I can't see anything obvious. I have two hypotheses that you might consider testing. 1) Problem with 1 incomplete PG. In the shadow hierarchy for your cluster I can see quite a lot of nodes like { "id": -135, "name": "node229~hdd", "type_id": 1, "type_name": "host", "weight": 0, "alg": "straw2", "hash": "rjenkins1", "items": [] }, I would have expected that hosts without a device of a certain device class are *excluded* completely from a tree instead of having weight 0. I'm wondering if this could lead to the crush algorithm failing in the way described here: https://docs.ceph.com/en/latest/rados/troubleshooting/troubleshooting-pg/#crush-gives-up-too-soon . This might be a long shot, but could you export your crush map and play with the tunables as described under this link to see if more tries lead to a valid mapping? Note that testing this is harmless and does not change anything on the cluster. The hypothesis here is that buckets with weight 0 are not excluded from drawing a priori, but a posteriori. If there are too many draws of an empty bucket, a mapping fails. Allowing more tries should then lead to success. We should at least rule out this possibility. 2) About the incomplete PG. I'm wondering if the problem is that the pool has exactly 1 PG. I don't have a test pool with Nautilus and cannot try this out. Can you create a test pool with pg_num=pgp_num=1 and see if the PG gets an OSD mapping? If not, can you then increase pg_num and pgp_num to, say, 10 and see if this has any effect? I'm wondering here if there needs to be a minimum number >1 of PGs in a pool. Again, this is more about ruling out a possibility than expecting success. As an extension to this test, you could increase pg_num and pgp_num of the pool device_health_metrics to see if this has any effect.
The crush rules and crush tree look OK to me. I can't really see why the missing OSDs are not assigned to the two PGs 1.0 and 7.39d. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Frank Schilder Sent: 16 October 2020 15:41:29 To: Michael Thomas; ceph-users@ceph.io Subject: [ceph-users] Re: multiple OSD crash, unfound objects Dear Michael, Please mark OSD 41 as "in" again and wait for some slow ops to show up. I forgot. "wait for some slow ops to show up" ... and then what? Could you please go to the host of the affected OSD and look at the output of "ceph daemon osd.ID ops" or "ceph daemon osd.ID dump_historic_slow_ops" and check what type of operations get stuck? I'm wondering if its administrative, like peering attempts. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Frank Schilder Sent: 16 October 2020 15:09:20 To: Michael Thomas; ceph-users@ceph.io Subject: Re: [ceph-users] Re: multiple OSD crash, unfound objects Dear Michael, thanks for this initial work. I will need to look through the files you posted in more detail. In the meantime: Please mark OSD 41 as "in" again and wait for some slow ops to show up. As far as I can see, marking it "out" might have cleared hanging slow ops (there were 1000 before), but they then started piling up again. From the OSD log it looks like an operation that is sent to/from PG 1.0, which doesn't respond because it is inactive. Hence, getting PG 1.0 active should resolve this issue (later). Its a bit strange that I see slow ops for OSD 41 in the latest health detail (https://pastebin.com/3G3ij9ui). Was the OSD still out when this health report was created? I think we might have misunderstood my question 6. My question was whether or not each host bucket corresponds to a physical host and vice versa, that is, each physical host has exactly 1 host bucket. 
I'm asking because it is possible to have multiple host buckets assigned to a single physical host and this has implications on how to manage things. Coming back to PG 1.0 (the only PG in pool device_health_metrics as far as I can see), the problem is that is has no OSDs assigned. I need to look a bit longer at the data you uploaded to find out why. I can't see anything obvious. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Michael Thomas Sent: 16 October 2020 02:08:01 To: Frank Schilder; ceph-users@ceph.io Subject: Re: [ceph-users] Re: multiple OSD crash, unfound objects On 10/14/20 3:49 PM, Frank Schilder wrote: Hi Michael, it doesn't look too bad. All degraded objects are due to the undersized PG. If this is an EC pool with m>
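The tunables test from point 1 of Frank's message can be run entirely offline with crushtool; nothing here modifies the cluster. The rule number, replica count, input range, and tries value below are placeholders to adapt:

```shell
# Export the live crush map and look for inputs that fail to map
ceph osd getcrushmap -o crush.bin
crushtool -i crush.bin --test --show-bad-mappings --rule 1 --num-rep 3 \
    --min-x 0 --max-x 1023
# Raise choose_total_tries in a copy and re-test: fewer or no bad mappings
# would support the weight-0-bucket hypothesis
crushtool -i crush.bin --set-choose-total-tries 100 -o crush-tries.bin
crushtool -i crush-tries.bin --test --show-bad-mappings --rule 1 --num-rep 3
```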
[ceph-users] Re: multiple OSD crash, unfound objects
I left osd.41 out over the weekend, and put it back in this morning. After the recovery finished, here are the results of the ops queries: ceph daemon osd.41 ops: https://pastebin.com/keYBMVbH ceph daemon osd.41 dump_historic_slow_ops https://pastebin.com/axbZNh7M Yes, the OSD was still out when the previous health report was created. As for the host buckets, each host bucket corresponds to a physical host. --Mike On 10/16/20 8:41 AM, Frank Schilder wrote: Dear Michael, Please mark OSD 41 as "in" again and wait for some slow ops to show up. I forgot. "wait for some slow ops to show up" ... and then what? Could you please go to the host of the affected OSD and look at the output of "ceph daemon osd.ID ops" or "ceph daemon osd.ID dump_historic_slow_ops" and check what type of operations get stuck? I'm wondering if its administrative, like peering attempts. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Frank Schilder Sent: 16 October 2020 15:09:20 To: Michael Thomas; ceph-users@ceph.io Subject: Re: [ceph-users] Re: multiple OSD crash, unfound objects Dear Michael, thanks for this initial work. I will need to look through the files you posted in more detail. In the meantime: Please mark OSD 41 as "in" again and wait for some slow ops to show up. As far as I can see, marking it "out" might have cleared hanging slow ops (there were 1000 before), but they then started piling up again. From the OSD log it looks like an operation that is sent to/from PG 1.0, which doesn't respond because it is inactive. Hence, getting PG 1.0 active should resolve this issue (later). Its a bit strange that I see slow ops for OSD 41 in the latest health detail (https://pastebin.com/3G3ij9ui). Was the OSD still out when this health report was created? I think we might have misunderstood my question 6. My question was whether or not each host bucket corresponds to a physical host and vice versa, that is, each physical host has exactly 1 host bucket. 
I'm asking because it is possible to have multiple host buckets assigned to a single physical host and this has implications on how to manage things. Coming back to PG 1.0 (the only PG in pool device_health_metrics as far as I can see), the problem is that is has no OSDs assigned. I need to look a bit longer at the data you uploaded to find out why. I can't see anything obvious. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Michael Thomas Sent: 16 October 2020 02:08:01 To: Frank Schilder; ceph-users@ceph.io Subject: Re: [ceph-users] Re: multiple OSD crash, unfound objects On 10/14/20 3:49 PM, Frank Schilder wrote: Hi Michael, it doesn't look too bad. All degraded objects are due to the undersized PG. If this is an EC pool with m>=2, data is currently not in danger. I see a few loose ends to pick up, let's hope this is something simple. For any of the below, before attempting the next step, please wait until all induced recovery IO has completed before continuing. 1) Could you please paste the output of the following commands to pastebin (bash syntax): ceph osd pool get device_health_metrics all https://pastebin.com/6D83mjsV ceph osd pool get fs.data.archive.frames all https://pastebin.com/7XAaQcpC ceph pg dump |& grep -i -e PG_STAT -e "^7.39d" https://pastebin.com/tBLaq63Q ceph osd crush rule ls https://pastebin.com/6f5B778G ceph osd erasure-code-profile ls https://pastebin.com/uhAaMH1c ceph osd crush dump # this is a big one, please be careful with copy-paste (see point 3 below) https://pastebin.com/u92D23jV 2) I don't see any IO reported (neither user nor recovery). Could you please confirm that the command outputs were taken during a zero-IO period? That's correct, there was no activity at this time. Access to the cephfs filesystem is very bursty, varying from completely idle to multiple GB/s (read). 3) Something is wrong with osd.41. Can you check its health status with smartctl? 
If it is reported healthy, give it one more clean restart. If the slow ops do not disappear, it could be a disk failure that is not detected by health monitoring. You could set it to "out" and see if the cluster recovers to a healthy state (modulo the currently degraded objects) with no slow ops. If so, I would replace the disk.

smartctl reports no problems. osd.41 (and osd.0) was one of the original OSDs used for the device_health_metrics pool. Early on, before I knew better, I had removed this OSD (and osd.0) from the cluster, and the OSD ids got recycled when new disks were later added. This is when the slow ops on osd.0 and osd.41 started getting reported. On advice from another user on ceph-users, I updated my crush map to remap the device_health_metrics pool to a different set of OSDs (and the slow ops persisted). osd.0 usual
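Frank's suggestion above — look at "ceph daemon osd.ID ops" and check what type of operations get stuck — can be scripted rather than eyeballed. A minimal sketch, assuming the usual JSON shape of the admin-socket output (the sample ops below are made up for illustration, not taken from the pastebin dumps, and field names vary somewhat between releases):

```python
import json
from collections import Counter

# Illustrative stand-in for the output of `ceph daemon osd.41 ops`
# (in real use: read it from `ceph daemon osd.41 ops` via subprocess or a file).
sample = json.loads("""
{
  "ops": [
    {"description": "osd_op(client.1234.0:1 1.0 ...)",
     "age": 3605.2,
     "type_data": {"flag_point": "delayed"}},
    {"description": "osd_pg_create(e500 1.0:100)",
     "age": 7210.9,
     "type_data": {"flag_point": "started"}}
  ],
  "num_ops": 2
}
""")

def classify(op):
    """Bucket an op by the leading token of its description:
    osd_op is client I/O, osd_pg_create and friends are
    administrative/peering traffic."""
    return op["description"].split("(", 1)[0]

counts = Counter(classify(op) for op in sample["ops"])
for kind, n in sorted(counts.items()):
    print(kind, n)
```

If the counts are dominated by administrative message types rather than osd_op, that would support the peering-attempt theory rather than stuck client I/O.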
[ceph-users] Re: Recommended settings for PostgreSQL
Yes, I had to tune some settings on PostgreSQL, especially synchronous_commit = off. I have default RBD settings. Do you have any recommendation?

Thanks,
Gencer.

On 19.10.2020 12:49:51, Marc Roos wrote:

> In the past I have seen some good results (benchmarks & latencies) for MySQL and PostgreSQL. However, I've always used the 4MB object size. Maybe I can get much better performance with a smaller object size; haven't tried, actually. Did you tune mysql/postgres for this setup? Did you have a default ceph rbd setup?

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
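For context, the tuning mentioned above usually amounts to a handful of postgresql.conf parameters; a minimal sketch, with illustrative values rather than recommendations from this thread:

```
# postgresql.conf -- illustrative values for an RBD-backed data directory
synchronous_commit = off        # as mentioned above; trades commit durability for latency
full_page_writes = on           # keep on: no torn-write guarantee from the block layer
effective_io_concurrency = 16   # networked block storage can service many concurrent reads
wal_compression = on            # fewer bytes per WAL flush over the network
```

On the RBD side, object size is fixed at image creation (e.g. rbd create --object-size ...), so comparing object sizes as discussed in the quoted message means benchmarking against a freshly created image.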
[ceph-users] Re: Ceph OIDC Integration
Dear Pritha,

Thanks a lot for your feedback, and apologies for missing your comment about the backporting. Would you have a rough estimate on the next Octopus release, by any chance?

On another note on the same subject: would you be able to give us some feedback on how the users will be created in Ceph? (For example, when we used LDAP, an LDAP user used to be created in Ceph for "mapping"; will it be the same in this case?) If we have multiple tenants (unique usernames/"emails" in KeyCloak), how will the introspection URLs be defined for the different tenants?

Thanks in advance
[ceph-users] RGW with HAProxy
Hi,

When I use haproxy in keep-alive mode in front of the RGWs, haproxy gives many responses like this! Is there any problem with keep-alive mode in RGW? Using Nautilus 14.2.9 with the beast frontend.
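For reference, a keep-alive haproxy setup in front of RGW is usually along these lines; a minimal sketch where the backend names, addresses, ports, and timeout values are placeholders, not taken from the poster's configuration:

```
# haproxy.cfg sketch -- names/addresses/ports are placeholders
defaults
    mode http
    timeout connect 5s
    timeout client  30s
    timeout server  30s
    # keep idle client connections open for a while, but not forever
    timeout http-keep-alive 10s

frontend rgw_front
    bind *:80
    default_backend rgw_back

backend rgw_back
    balance roundrobin
    option http-keep-alive
    # reuse idle server-side connections where safe
    http-reuse safe
    server rgw1 192.0.2.11:7480 check
    server rgw2 192.0.2.12:7480 check
```

A mismatch between haproxy's http-keep-alive timeout and the idle timeout on the RGW side is a common source of odd responses in keep-alive setups, so aligning the two is a reasonable first thing to check.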
[ceph-users] Re: Recommended settings for PostgreSQL
> In the past I have seen some good results (benchmarks & latencies) for MySQL and PostgreSQL. However, I've always used the 4MB object size. Maybe I can get much better performance with a smaller object size; haven't tried, actually.

Did you tune mysql/postgres for this setup? Did you have a default ceph rbd setup?