[ceph-users] NFS - HA and Ingress completion note?
NFS - HA and Ingress: [ https://docs.ceph.com/en/latest/mgr/nfs/#ingress ]

Referring to Note #2, is NFS high-availability functionality considered complete (and stable)?
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Duplicate help statements in Prometheus metrics in 16.2.13
Dear all,

after the update to Ceph 16.2.13 the Prometheus exporter is wrongly exporting multiple metric HELP & TYPE lines for ceph_pg_objects_repaired:

[mon1] /root # curl -sS http://localhost:9283/metrics
# HELP ceph_pg_objects_repaired Number of objects repaired in a pool Count
# TYPE ceph_pg_objects_repaired counter
ceph_pg_objects_repaired{poolid="34"} 0.0
# HELP ceph_pg_objects_repaired Number of objects repaired in a pool Count
# TYPE ceph_pg_objects_repaired counter
ceph_pg_objects_repaired{poolid="33"} 0.0
# HELP ceph_pg_objects_repaired Number of objects repaired in a pool Count
# TYPE ceph_pg_objects_repaired counter
ceph_pg_objects_repaired{poolid="32"} 0.0
[...]

This annoys our exporter_exporter service so much that it rejects the export of Ceph metrics. Is this a known issue? Will this be fixed in the next update?

Cheers,
Andreas
--
| Andreas Haupt   | E-Mail: andreas.ha...@desy.de
| DESY Zeuthen    | WWW:    http://www.zeuthen.desy.de/~ahaupt
| Platanenallee 6 | Phone:  +49/33762/7-7359
| D-15738 Zeuthen | Fax:    +49/33762/7-7216
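Until this is fixed upstream, one possible workaround is to strip the repeated HELP/TYPE lines before they reach the scraper. This is only a sketch: the `dedup_headers` name is made up, and where to hook it in (exporter_exporter or a small proxy of your own) is left open. It relies on the fact that, as in the output above, all samples of the family are already contiguous, so dropping the repeats yields valid exposition-format output:

```shell
# Workaround sketch: keep only the first "# HELP"/"# TYPE" line per metric
# family ($2 is HELP/TYPE, $3 is the metric name). Prometheus expects at
# most one HELP/TYPE pair per family.
dedup_headers() {
  awk '/^# (HELP|TYPE) / { if (seen[$2" "$3]++) next } { print }'
}

# Demo on a duplicated snippet like the one above:
printf '%s\n' \
  '# HELP ceph_pg_objects_repaired Number of objects repaired in a pool Count' \
  '# TYPE ceph_pg_objects_repaired counter' \
  'ceph_pg_objects_repaired{poolid="34"} 0.0' \
  '# HELP ceph_pg_objects_repaired Number of objects repaired in a pool Count' \
  '# TYPE ceph_pg_objects_repaired counter' \
  'ceph_pg_objects_repaired{poolid="33"} 0.0' \
| dedup_headers
```

Against the live exporter this would be `curl -sS http://localhost:9283/metrics | dedup_headers`.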
[ceph-users] RBD snapshot mirror syncs all snapshots
Hello,

I set up two-way snapshot-based RBD mirroring between two Ceph clusters. After enabling mirroring for an image that already had regular snapshots (taken independently of RBD mirror) on the source cluster, the image and all of its snapshots were synced to the destination cluster.

Is there a way to avoid having all snapshots synced? We only need the latest version of the image on the destination cluster, and the snapshots add around 200% disk space overhead on average.

Best regards,
Andreas
[ceph-users] Re: Ceph rbd clients surrender exclusive lock in critical situation
Hi Frank,

one thing that might be relevant here: if you disable transparent lock transitions, you cannot create snapshots of images that are in use in such a way. This may or may not matter in your case; I'm just mentioning it because I was surprised by it myself.

Best regards,
Andreas

On 19.01.23 12:50, Frank Schilder wrote:
> Hi Ilya,
>
> thanks for the info, it did help. I agree, it's the orchestration layer's responsibility to handle things right. I have a case open already with support and it looks like there is indeed a bug on that side.
>
> I was mainly after a way that Ceph librbd clients could offer a safety net in case such bugs occur. It's a bit like the four-eyes principle: having an orchestration layer do things right is good, but having a second instance confirming the same thing is much better. A bug in one layer will not cause a catastrophe, because the second layer catches it.
>
> I'm not sure the rbd lock capabilities are sufficiently powerful to provide a command-line interface to that. The flag RBD_LOCK_MODE_EXCLUSIVE seems the only way, and if qemu is not using it, there seems not a lot one can do in scripts.
>
> Thanks for your help and best regards,
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
[ceph-users] Re: Cannot create snapshots if RBD image is mapped with -oexclusive
Hello,

in case anyone finds this post while trying to answer the same question, I believe the answer is here:

https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/DBJRYTMQURANFFWSS4QDCKD5KULJQ46X/

As far as I understand it: creating a snapshot requires acquiring the exclusive lock, and with "-oexclusive" the RBD client is not going to release it. So this is not a bug.

Best regards,
Andreas

On 30.11.22 12:58, Andreas Teuchert wrote:
> Hello,
>
> creating snapshots of RBD images that are mapped with -oexclusive seems not to be possible:
>
> # rbd map -oexclusive rbd.blu1/andreasspielt-share11
> /dev/rbd7
> # rbd snap create rbd.blu1/andreasspielt-share11@ateuchert_test01
> Creating snap: 0% complete...failed.
> rbd: failed to create snapshot: (30) Read-only file system
>
> # rbd unmap rbd.blu1/andreasspielt-share11
> # rbd map rbd.blu1/andreasspielt-share11
> /dev/rbd7
> # rbd snap create rbd.blu1/andreasspielt-share11@ateuchert_test01
> Creating snap: 100% complete...done.
>
> I was surprised by this behavior and the documentation does not seem to mention it. Is this on purpose or a bug?
>
> Ceph version is 17.2.5, the RBD client is Ubuntu 22.04 with kernel 5.15.0-52-generic.
>
> Best regards,
> Andreas
[ceph-users] Cannot create snapshots if RBD image is mapped with -oexclusive
Hello,

creating snapshots of RBD images that are mapped with -oexclusive seems not to be possible:

# rbd map -oexclusive rbd.blu1/andreasspielt-share11
/dev/rbd7
# rbd snap create rbd.blu1/andreasspielt-share11@ateuchert_test01
Creating snap: 0% complete...failed.
rbd: failed to create snapshot: (30) Read-only file system

# rbd unmap rbd.blu1/andreasspielt-share11
# rbd map rbd.blu1/andreasspielt-share11
/dev/rbd7
# rbd snap create rbd.blu1/andreasspielt-share11@ateuchert_test01
Creating snap: 100% complete...done.

I was surprised by this behavior and the documentation does not seem to mention it. Is this on purpose or a bug?

Ceph version is 17.2.5, the RBD client is Ubuntu 22.04 with kernel 5.15.0-52-generic.

Best regards,
Andreas
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: MGR failures and pg autoscaler
Hi Giuseppe,

On Tue, 2022-10-25 at 07:54 +, Lo Re Giuseppe wrote:
> In the mgr logs I see:
> debug 2022-10-20T23:09:03.859+ 7fba5f300700 0 [pg_autoscaler ERROR root] pool 2 has overlapping roots: {-60, -1}

This is unrelated; I asked the same question some days ago:

https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/OZTOVT2TXEA23NI2TPTWD3WU2AZM6YSH/

Starting with Pacific the autoscaler is unable to deal with mixed pools spread over different storage device classes. Although this is documented, I'd call it a regression - the same kind of setup still worked with the autoscaler in Octopus.

You will find the overlapping roots by listing the device-class-based shadow entries:

ceph osd crush tree --show-shadow

Regarding your problem, you need to look for further errors. The last time an mgr module failed here it was due to some missing Python modules ... Anything suspicious in the output of "ceph crash ls"?

Cheers,
Andreas
[ceph-users] Autoscaler stopped working after upgrade Octopus -> Pacific
Dear all,

we just upgraded our cluster from Octopus to Pacific (16.2.10). This introduced an error in the autoscaler:

2022-10-11T14:47:40.421+0200 7f3ec2d03700 0 [pg_autoscaler ERROR root] pool 17 has overlapping roots: {-4, -1}
2022-10-11T14:47:40.423+0200 7f3ec2d03700 0 [pg_autoscaler ERROR root] pool 22 has overlapping roots: {-4, -1}
2022-10-11T14:47:40.423+0200 7f3ec2d03700 0 [pg_autoscaler ERROR root] pool 23 has overlapping roots: {-4, -1}
2022-10-11T14:47:40.427+0200 7f3ec2d03700 0 [pg_autoscaler ERROR root] pool 27 has overlapping roots: {-6, -4, -1}
2022-10-11T14:47:40.428+0200 7f3ec2d03700 0 [pg_autoscaler ERROR root] pool 28 has overlapping roots: {-6, -4, -1}

The autoscaler status is empty:

[cephmon1] /root # ceph osd pool autoscale-status
[cephmon1] /root #

On https://forum.proxmox.com/threads/ceph-overlapping-roots.104199/ I found something similar:

---
I assume that you have at least one pool that still has the "replicated_rule" assigned, which does not make a distinction between the device class of the OSDs. This is why you see this error. The autoscaler cannot decide how many PGs the pools need. Make sure that all pools are assigned a rule that limits them to a device class and the errors should stop.
---

Indeed, we have a mixed cluster (hdd + ssd) with some pools spanning hdd-only, some ssd-only and some both (ec & replicated) which don't care about the storage device class (e.g. via the default "replicated_rule"):

[cephmon1] /root # ceph osd crush rule ls
replicated_rule
ssd_only_replicated_rule
hdd_only_replicated_rule
default.rgw.buckets.data.ec42
test.ec42
[cephmon1] /root #

That worked flawlessly until Octopus. Any idea how to make the autoscaler work again with that kind of setup? Do I really have to restrict every pool to a single device class in Pacific in order to get a functional autoscaler?

Thanks,
Andreas
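For what it's worth, the usual way out of the overlapping-roots situation is to give every pool a device-class-qualified rule. A sketch of the commands involved (rule names and the pool placeholder are examples; moving a pool to a new rule triggers data movement, so check the impact first):

```shell
# Create replicated rules restricted to one device class each:
ceph osd crush rule create-replicated replicated_hdd default host hdd
ceph osd crush rule create-replicated replicated_ssd default host ssd

# Move each pool that still uses the class-agnostic default rule:
ceph osd pool set <pool> crush_rule replicated_hdd

# Re-check the shadow roots and whether the autoscaler reports again:
ceph osd crush tree --show-shadow
ceph osd pool autoscale-status
```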
[ceph-users] tcmu-runner not in EPEL-8
Dear all,

does anyone know by chance why tcmu-runner is not available in EPEL-8? Fedora maintains an SRPM for e.g. Rawhide & 36:

https://kojipkgs.fedoraproject.org//packages/tcmu-runner/1.5.4/4.fc36/src/tcmu-runner-1.5.4-4.fc36.src.rpm

This one builds flawlessly under mock for EL8, so compiling it on our own is actually no problem. But it would be much more convenient to have it in EPEL-8, as probably no one will run productive iSCSI gateways under Fedora ;-)

Cheers,
Andreas
[ceph-users] How to troubleshoot monitor node
Hi all,

I've set up a 6-node Ceph cluster to learn how Ceph works and what I can do with it. I'm new to Ceph, so if the answer to one of my questions is RTFM, please point me to the right place.

My problem is this: the cluster consists of 3 mons and 3 OSD hosts. Even though the dashboard shows all green, mon01 has a problem: the ceph command hangs and never comes back:

root@mon01:~# ceph --version
ceph version 15.2.14 (cd3bb7e87a2f62c1b862ff3fd8b1eec13391a5be) octopus (stable)
root@mon01:~# ceph -s
^CCluster connection aborted

To see what happens I tried this:

root@mon01:~# ceph -s --debug-ms=1
2022-01-10T15:51:30.434+0100 7f4a2cd7e700 1 Processor -- start
2022-01-10T15:51:30.434+0100 7f4a2cd7e700 1 -- start start
2022-01-10T15:51:30.434+0100 7f4a2cd7e700 1 --2- >> [v2:192.168.14.48:3300/0,v1:192.168.14.48:6789/0] conn(0x7f4a28066a30 0x7f4a28066e40 unknown :-1 s=NONE pgs=0 cs=0 l=0 rev1=0 rx=0 tx=0).connect
2022-01-10T15:51:30.434+0100 7f4a2cd7e700 1 -- --> [v2:192.168.14.48:3300/0,v1:192.168.14.48:6789/0] -- mon_getmap magic: 0 v1 -- 0x7f4a28067330 con 0x7f4a28066a30
2022-01-10T15:51:30.434+0100 7f4a2659c700 1 -- >> [v2:192.168.14.48:3300/0,v1:192.168.14.48:6789/0] conn(0x7f4a28066a30 msgr2=0x7f4a28066e40 unknown :-1 s=STATE_CONNECTING_RE l=0).process reconnect failed to v2:192.168.14.48:3300/0
...

Indeed, both ports are closed:

root@mon01:~# nc -z 192.168.14.48 6789; echo $?
1
root@mon01:~# nc -z 192.168.14.48 3300; echo $?
1

In /var/log/ceph/cephadm.log I cannot see any useful information about what might be going wrong. I'm not aware of anything I could have done to trigger this error, and I wonder what I could do next to repair this monitor node. Any hint is appreciated.

--
Andre Tann
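Since both mon ports are closed, the mon daemon is most likely not running at all. Some first checks that might narrow this down, assuming a cephadm-deployed Octopus mon (the fsid and the daemon name mon.mon01 are placeholders for your cluster's values):

```shell
# Is anything listening on the mon ports at all?
ss -tlnp | grep -E ':3300|:6789'

# What does cephadm think is deployed on this host, and is the unit running?
cephadm ls
systemctl status 'ceph-<fsid>@mon.mon01.service'

# The mon's own journald log usually says why it refuses to start:
cephadm logs --name mon.mon01
```

The mon's own log (rather than cephadm.log, which only records orchestration actions) is where a corrupt store, a bad monmap entry, or a full disk would show up.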
[ceph-users] Huge headaches with NFS and ingress HA failover
Hi,

we recently set up a new Pacific cluster with cephadm and deployed NFS on two hosts and ingress on two other hosts (ceph orch apply for nfs and ingress, as on the docs page). So far so good. ESXi with NFS 4.1 connects, but the way ingress works confuses me.

It pins clients statically to one NFS daemon based on their IP addresses. (I know NFS won't like it if a client switches servers all the time, because of reservations.) Three of our ESXi servers seem to connect to host1, the 4th one to the other host. This leads to a problem in ESXi where it doesn't recognize the datastore as the same one as on the other hosts. I can't find out how exactly ESXi calculates that, but there must be different information coming from these NFS daemons - nfs-ganesha doesn't behave exactly the same on both hosts.

Besides that, I wanted to do some failover tests before the cluster goes live. I stopped one NFS server, but ingress (haproxy) doesn't seem to care. On the haproxy stats page both backends are listed with "no check", so no failover happens for the NFS clients: haproxy does not fail over to the other host, datastores are disconnected, and it is impossible to connect new ones.

How is ingress supposed to detect a failed NFS server, and how do I make the ganesha instances identical to each other?

Bonus question: why can't keepalived just manage nfs-ganesha on two hosts instead of haproxy? That would eliminate an extra network hop.

Hope someone has a few insights. I've already spent way too much time on this to switch to some other solution.

Best regards,
Andreas
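For reference, "no check" on the stats page means the server lines in haproxy.cfg lack the `check` keyword, so haproxy considers every backend unconditionally up. A health-checked TCP backend might look roughly like the following sketch (backend name, server names, and addresses are made up; note that cephadm generates haproxy.cfg itself, so a manual edit is only useful for testing and may be overwritten on redeploy):

```
backend nfs-backend
    mode tcp
    balance source              # pins each client IP to one NFS server
    server nfs1 192.0.2.11:2049 check
    server nfs2 192.0.2.12:2049 check
```

With `check` present, haproxy probes the TCP port periodically and marks a dead ganesha down, so new connections go to the surviving backend.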
[ceph-users] Nautilus CentOS-7 rpm dependencies
Dear all,

ceph-mgr-dashboard-15.2.13-0.el7.noarch contains three rpm dependencies that cannot be resolved here (not part of CentOS & EPEL 7):

python3-cherrypy
python3-routes
python3-jwt

Does anybody know where they are expected to come from?

Thanks,
Andreas
[ceph-users] Re: mon db growing. over 500Gb
Hello,

I also observed an excessively growing mon DB during recovery. Luckily we were able to solve it by extending the mon DB disk. Without having had the chance to re-check: the options nobackfill and norecover might cause that behavior. It feels like the mon holds data that cannot be flushed to an OSD.

rgds,
j.

On 11.03.21 10:47, Marc wrote:
> From what I have read here in the past, a growing monitor db is related to not having PGs in 'active+clean' state
>
>> -----Original Message-----
>> From: ricardo.re.azev...@gmail.com
>> Sent: 11 March 2021 00:59
>> To: ceph-users@ceph.io
>> Subject: [ceph-users] mon db growing. over 500Gb
>>
>> Hi all,
>>
>> I have a fairly pressing issue. I had a monitor fall out of quorum because it ran out of disk space during rebalancing from switching to upmap. I noticed all my monitor store.db directories started taking up nearly all disk space, so I set noout, nobackfill and norecover and shut down all the monitor daemons. Each store.db was at:
>>
>> mon.a 89GB (the one that first dropped out)
>> mon.b 400GB
>> mon.c 400GB
>>
>> I tried setting mon_compact_on_start. This brought mon.a down to 1GB. Cool. However, when I tried it on the other monitors it increased the db size by ~1GB/10s, so I shut them down again.
>>
>> Any idea what is going on? Or how can I shrink the db back down?

--
Andreas John
net-lab GmbH | Frankfurter Str. 99 | 63067 Offenbach
Geschaeftsfuehrer: Andreas John | AG Offenbach, HRB40832
Tel: +49 69 8570033-1 | Fax: -2 | http://www.net-lab.net
Facebook: https://www.facebook.com/netlabdotnet
Twitter: https://twitter.com/netlabdotnet
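A few commands that can help while chasing this (the mon id and the store path are placeholders; `compact` triggers an online RocksDB compaction without restarting the daemon, unlike mon_compact_on_start):

```shell
# Trigger compaction on a running mon:
ceph tell mon.<id> compact

# Watch the store size and the untrimmed osdmap range; a large gap between
# first and last committed means the mon is still holding old osdmaps:
du -sh /var/lib/ceph/mon/*/store.db
ceph report | grep -E 'osdmap_(first|last)_committed'
```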
[ceph-users] Re: Best practices for OSD on bcache
Hello,

we clearly understood that. But in Ceph we have the concept of the "OSD journal on a very fast separate disk". I just asked what, in theory, the advantage of caching on bcache/NVMe should be vs. journal/NVMe. I would not expect any performance advantage for bcache (if the journal is reasonably sized). I might be totally wrong, though.

If you do it just because you don't want to re-create (or modify) the OSDs, it's not worth the effort IMHO.

rgds,
derjohn

On 02.03.21 10:48, Norman.Kern wrote:
> On 2021/3/2 上午5:09, Andreas John wrote:
>> Hallo,
>>
>> do you expect that to be better (faster) than having the OSD's journal on a different disk (ssd, nvme)?
> No, I created the OSD storage devices using bcache devices.
[ceph-users] Best practices for OSD on bcache
Hallo,

do you expect that to be better (faster) than having the OSD's journal on a different disk (ssd, nvme)?

rgds,
derjohn

On 01.03.21 05:37, Norman.Kern wrote:
> Hi, guys
>
> I am testing Ceph on bcache devices and I found the performance is not as good as expected. Does anyone have any best practices for it? Thanks.
[ceph-users] Re: 10G stackable LACP switches
Hello,

this is not an answer to the question directly, but you could consider the following to double bandwidth:

* Run each Ceph node with two NICs, each with its own IP, e.g. one node has 192.0.2.10/24 and 192.0.2.11/24.
* In ceph.conf you bind 50% of the OSDs to each of those IPs:

[osd.XY]
...
public_addr = ...
cluster_addr = 192.0.2.x

* With equally distributed traffic (enough OSDs) this should nearly double your bandwidth.
* To get redundancy, you could extend that config using keepalived/VRRP to switch the IP of a failing NIC/switch over to the other one.

I am well aware that we also have Linux bonding with mode slb, but in my experience that didn't work very well with COTS switches, maybe due to ARP learning issues. (We ended up buying Juniper QFX-5100 with MLAG support.)

Best Regards,
Andreas

P.S. I didn't try out the setup from above yet. If anyone did already or will do, I would be happy about feedback.

On 16.02.21 16:56, Mario Giammarco wrote:
> Il giorno lun 15 feb 2021 alle ore 15:16 mj ha scritto:
>>
>> On 2/15/21 1:38 PM, Eneko Lacunza wrote:
>>> Do you really need MLAG (the 2x10G bandwidth)? If not, just use 2 simple switches (Mikrotik for example) and in Proxmox use an active-passive bond, with the default interface on all nodes pointing to the same switch.
>>
>> Since we are now on SSD OSDs only, and our aim is to be able to add more OSD nodes, yes: I think we should aim for more than 10G bandwidth. So go for 40G. LACP will not give you a magical 2x10 bandwidth doubling.
>
> BTW: I am using Mikrotik 10G switches and they are great value.
> BTW2: if you use Proxmox you do not need LACP; you can use Linux round-robin support, which has the same performance as LACP and does not require switch support.
>
>> Thanks!
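Spelled out per OSD, the binding sketched above might look like this in ceph.conf (addresses are from the 192.0.2.0/24 documentation range used above; as the P.S. says, this layout is untested):

```
[osd.0]
public_addr  = 192.0.2.10
cluster_addr = 192.0.2.10

[osd.1]
public_addr  = 192.0.2.11
cluster_addr = 192.0.2.11
```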
[ceph-users] Re: How to reset an OSD
Hello,

I suspect there was unwritten data in RAM which didn't make it to the disk. This shouldn't happen; that's why the journal is in place. If you have size=2 in your pool, there is one copy on the other host.

To delete the OSD you could probably do:

ceph osd crush remove osd.x
ceph osd rm osd.x
ceph auth del osd.x

maybe followed by "wipefs -a /dev/sdxxx" or

dd if=/dev/zero of=/dev/sdxxx count=1 bs=1M

Then you should be able to deploy the disk again with the tool that you used originally. The disk should be "fresh".

rgds,
derjohn

On 13.01.21 15:45, Pfannes, Fabian wrote:
> failed: (22) Invalid argument
[ceph-users] Re: Proxmox+Ceph Benchmark 2020
Hello Alwin,

do you know if it makes a difference to disable "all green computing" in the BIOS vs. setting the governor to "performance" in the OS? If not, I think I will have to use some service cycles to set up our Proxmox-Ceph nodes correctly.

Best Regards,
Andreas

On 14.10.20 08:39, Alwin Antreich wrote:
> On Tue, Oct 13, 2020 at 11:19:33AM -0500, Mark Nelson wrote:
>> Thanks for the link Alwin!
>>
>> On Intel platforms disabling C/P state transitions can have a really big impact on IOPS (on RHEL for instance using the network or performance latency tuned profile). It would be very interesting to know if AMD EPYC platforms see similar benefits. I don't have any in house, but if you happen to have a chance it would be an interesting addendum to your report.
>
> Thanks for the suggestion. I indeed did a run before disabling the C/P states in the BIOS. But unfortunately I didn't keep the results. :/
>
> As far as I remember though, there was a visible improvement after disabling them.
>
> I will have a look, once I have some time to do some more benchmarks.
>
> --
> Cheers,
> Alwin
[ceph-users] Re: Ceph test cluster, how to estimate performance.
Hello Daniel,

yes, the Samsung "Pro" SSD series isn't too much "pro", especially when it comes to write IOPS. I would tend to say get some Intel S4510 if you can afford it. If you can't, you can still try to activate overprovisioning on the SSD; I would tend to reserve 10-30% of the SSD for wear leveling (writing).

First check the number of sectors with

hdparm -N /dev/sdX

then set a permanent HPA (host protected area) on the disk. The "p" and the absence of a space after it are important:

hdparm -Np${SECTORS} --yes-i-know-what-i-am-doing /dev/sdX

Wait a little (!), power cycle, and re-check the disk with hdparm -N /dev/sdX. My Samsung 850 Pros are a little reluctant to accept the setting, but after some tries or a little waiting the change becomes permanent. At least the Samsung 850 Pro stopped dying suddenly with that setting; without it, the SSD occasionally disconnected from the bus and reappeared after a power cycle. I suspect it ran out of wear-leveling reserve or something similar.

HTH,
derjohn

On 13.10.20 08:41, Martin Verges wrote:
> Hello Daniel,
>
> just throw away your crappy Samsung SSD 860 Pro. It won't work in an acceptable way.
>
> See https://docs.google.com/spreadsheets/d/1E9-eXjzsKboiCCX-0u0r5fAjjufLKayaut_FOPxYZjc/edit?usp=sharing for a performance indication of individual disks.
>
> --
> Martin Verges
> Managing director
>
> Mobile: +49 174 9335695
> E-Mail: martin.ver...@croit.io
> Chat: https://t.me/MartinVerges
>
> croit GmbH, Freseniusstr. 31h, 81247 Munich
> CEO: Martin Verges - VAT-ID: DE310638492
> Com. register: Amtsgericht Munich HRB 231263
>
> Web: https://croit.io
> YouTube: https://goo.gl/PGE1Bx
>
> Am Di., 13. Okt. 2020 um 07:31 Uhr schrieb Daniel Mezentsev:
>> Hi Ceph users,
>>
>> I'm working on a Common Lisp client utilizing the rados library. I got some results, but don't know how to estimate whether I am getting correct performance. I'm running a test cluster from a laptop - 2 OSDs - VMs, RAM 4 GB, 4 vCPUs each; monitors and mgr are running from the same VM(s). As for storage, I have a Samsung SSD 860 Pro, 512 GB. The disk is split into 2 logical volumes (LVMs), and those volumes are attached to the VMs. I know that I can't expect too much from that layout, I just want to know if I'm getting adequate numbers. I'm doing read/write operations on very small objects - up to 1 KB. For async writes I'm getting ~7.5-8.0 kIOPS. Synchronous reads - pretty much the same, 7.5-8.0 kIOPS. Async read is segfaulting, I don't know why. The disk itself is capable of delivering well above 50 kIOPS. The difference is an order of magnitude. Any info is more than welcome.
>>
>> Daniel Mezentsev, founder
>> (+1) 604 313 8592.
>> Soleks Data Group.
>> Shaping the clouds.
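The sector count to pass to `hdparm -Np` can be derived from the `hdparm -N` total with plain integer arithmetic. A small sketch; the total below is an assumed example value for a 512 GB drive, substitute your drive's real count:

```shell
# Visible sectors for a permanent HPA that reserves RESERVE_PCT of the
# drive for wear leveling. TOTAL comes from `hdparm -N /dev/sdX`.
TOTAL=1000215216
RESERVE_PCT=20
SECTORS=$(( TOTAL * (100 - RESERVE_PCT) / 100 ))
echo "$SECTORS"   # prints 800172172
# ...then: hdparm -Np${SECTORS} --yes-i-know-what-i-am-doing /dev/sdX
```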
[ceph-users] Re: multiple OSD crash, unfound objects
mary: idle >>> 2020-09-17T15:02:32.945-0500 7f39b5215700 0 log_channel(cluster) log >>> [INF] : scrub complete with tag '1405e5c7-3ecf-4754-918e-129e9d101f7a' >>> 2020-09-17T15:02:32.945-0500 7f39b5215700 0 log_channel(cluster) log >>> [INF] : scrub completed for path: /frames/postO3/hoft >>> 2020-09-17T15:02:32.945-0500 7f39b5215700 0 log_channel(cluster) log >>> [INF] : scrub summary: idle >>> >>> >>> After the scrub completed, access to the file (ls or rm) continue to >>> hang. The MDS reports slow reads: >>> >>> 2020-09-17T15:11:05.654-0500 7f39b9a1e700 0 log_channel(cluster) log >>> [WRN] : slow request 481.867381 seconds old, received at >>> 2020-09-17T15:03:03.788058-0500: client_request(client.451432:11309 >>> getattr pAsLsXsFs #0x105b1c0 2020-09-17T15:03:03.787602-0500 >>> caller_uid=0, caller_gid=0{}) currently dispatched >>> >>> Does anyone have any suggestions on how else to clean up from a >>> permanently lost object? >>> >>> --Mike >>> >>> On 9/16/20 2:03 AM, Frank Schilder wrote: >>>> Sounds similar to this one: https://tracker.ceph.com/issues/46847 >>>> >>>> If you have or can reconstruct the crush map from before adding the >>>> OSDs, you might be able to discover everything with the temporary >>>> reversal of the crush map method. >>>> >>>> Not sure if there is another method, i never got a reply to my >>>> question in the tracker. >>>> >>>> Best regards, >>>> = >>>> Frank Schilder >>>> AIT Risø Campus >>>> Bygning 109, rum S14 >>>> >>>> >>>> From: Michael Thomas >>>> Sent: 16 September 2020 01:27:19 >>>> To: ceph-users@ceph.io >>>> Subject: [ceph-users] multiple OSD crash, unfound objects >>>> >>>> Over the weekend I had multiple OSD servers in my Octopus cluster >>>> (15.2.4) crash and reboot at nearly the same time. The OSDs are >>>> part of >>>> an erasure coded pool. At the time the cluster had been busy with a >>>> long-running (~week) remapping of a large number of PGs after I >>>> incrementally added more OSDs to the cluster. 
After bringing all >>>> of the >>>> OSDs back up, I have 25 unfound objects and 75 degraded objects. >>>> There >>>> are other problems reported, but I'm primarily concerned with these >>>> unfound/degraded objects. >>>> >>>> The pool with the missing objects is a cephfs pool. The files >>>> stored in >>>> the pool are backed up on tape, so I can easily restore individual >>>> files >>>> as needed (though I would not want to restore the entire filesystem). >>>> >>>> I tried following the guide at >>>> https://docs.ceph.com/docs/octopus/rados/troubleshooting/troubleshooting-pg/#unfound-objects. >>>> >>>> I found a number of OSDs that are still 'not queried'. >>>> Restarting a >>>> sampling of these OSDs changed the state from 'not queried' to >>>> 'already >>>> probed', but that did not recover any of the unfound or degraded >>>> objects. >>>> >>>> I have also tried 'ceph pg deep-scrub' on the affected PGs, but never >>>> saw them get scrubbed. I also tried doing a 'ceph pg >>>> force-recovery' on >>>> the affected PGs, but only one seems to have been tagged accordingly >>>> (see ceph -s output below). >>>> >>>> The guide also says "Sometimes it simply takes some time for the >>>> cluster >>>> to query possible locations." I'm not sure how long "some time" might >>>> take, but it hasn't changed after several hours. >>>> >>>> My questions are: >>>> >>>> * Is there a way to force the cluster to query the possible locations >>>> sooner? >>>> >>>> * Is it possible to identify the files in cephfs that are affected, so >>>> that I could delete only the affected files and restore them from >>>> backup >>>> tapes? 
>>>> >>>> --Mike >>>> >>>> ceph -s: >>>> >>>> cluster: >>>> id: 066f558c-6789-4a93-aaf1-5af1ba01a3ad >>>> health: HEALTH_ERR >>>> 1 clients failing to respond to capability release >>>> 1 MDSs report slow requests >>>> 25/78520351 objects unfound (0.000%) >>>> 2 nearfull osd(s) >>>> Reduced data availability: 1 pg inactive >>>> Possible data damage: 9 pgs recovery_unfound >>>> Degraded data redundancy: 75/626645098 objects >>>> degraded >>>> (0.000%), 9 pgs degraded >>>> 1013 pgs not deep-scrubbed in time >>>> 1013 pgs not scrubbed in time >>>> 2 pool(s) nearfull >>>> 1 daemons have recently crashed >>>> 4 slow ops, oldest one blocked for 77939 sec, daemons >>>> [osd.0,osd.41] have slow ops. >>>> >>>> services: >>>> mon: 4 daemons, quorum ceph1,ceph2,ceph3,ceph4 (age 9d) >>>> mgr: ceph3(active, since 11d), standbys: ceph2, ceph4, ceph1 >>>> mds: archive:1 {0=ceph4=up:active} 3 up:standby >>>> osd: 121 osds: 121 up (since 6m), 121 in (since 101m); 4 >>>> remapped pgs >>>> >>>> task status: >>>> scrub status: >>>> mds.ceph4: idle >>>> >>>> data: >>>> pools: 9 pools, 2433 pgs >>>> objects: 78.52M objects, 298 TiB >>>> usage: 412 TiB used, 545 TiB / 956 TiB avail >>>> pgs: 0.041% pgs unknown >>>> 75/626645098 objects degraded (0.000%) >>>> 135224/626645098 objects misplaced (0.022%) >>>> 25/78520351 objects unfound (0.000%) >>>> 2421 active+clean >>>> 5 active+recovery_unfound+degraded >>>> 3 active+recovery_unfound+degraded+remapped >>>> 2 active+clean+scrubbing+deep >>>> 1 unknown >>>> 1 active+forced_recovery+recovery_unfound+degraded >>>> >>>> progress: >>>> PG autoscaler decreasing pool 7 PGs from 1024 to 512 (5d) >>>> [] >>>> ___ >>>> ceph-users mailing list -- ceph-users@ceph.io >>>> To unsubscribe send an email to ceph-users-le...@ceph.io >>>> >>> >> ___ >> ceph-users mailing list -- ceph-users@ceph.io >> To unsubscribe send an email to ceph-users-le...@ceph.io >> > ___ > ceph-users mailing list -- ceph-users@ceph.io > To unsubscribe send an email to 
ceph-users-le...@ceph.io -- Andreas John net-lab GmbH | Frankfurter Str. 99 | 63067 Offenbach Geschaeftsfuehrer: Andreas John | AG Offenbach, HRB40832 Tel: +49 69 8570033-1 | Fax: -2 | http://www.net-lab.net Facebook: https://www.facebook.com/netlabdotnet Twitter: https://twitter.com/netlabdotnet ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
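To answer the second question in the thread above for future readers: for a CephFS data pool the RADOS object name encodes the file's inode number, so an unfound object can be mapped back to a single file and restored from tape. A sketch with a hypothetical PG id and object name (both placeholders, not taken from this cluster):

```shell
# Hypothetical PG id and object name -- substitute the ones that
# "ceph health detail" and "ceph pg <pgid> list_unfound" report.
pgid="7.1f"
obj="10000a3bf2d.00000004"
if command -v ceph >/dev/null; then
  ceph pg "$pgid" list_unfound
fi

# The part before the dot is the file's inode number in hex;
# convert it and search a mounted CephFS by inode number.
ino_dec=$(printf '%d' "0x${obj%%.*}")
echo "inode: $ino_dec"
# find /mnt/cephfs -inum "$ino_dec"   # then restore just that file from backup
```

The `find -inum` step is commented out because it only makes sense on a host with the filesystem mounted.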
[ceph-users] Re: Massive Mon DB Size with noout on 14.2.11
Hello *, thanks for taking care. I read "works as designed, be sure to have disk space for the mon available". It sounds a little odd that the growth from 50MB to ~15GB + compaction space happens within a couple of seconds, when two OSDs rejoin the cluster. Does it matter if I have cephfs in use? Usually I would expect to have MDS load, but does it also cause load on the mon with many files? My OSD map seems to have low absolute numbers: ceph report | grep osdmap | grep committed report 777999536 "osdmap_first_committed": 1276, "osdmap_last_committed": 1781, If I get new disks (partitions) for the mons, is there a size recommendation? Is there a rule of thumb? BTW: Do I still need a filesystem for the partition of the mon DB? Best Regards, derjohn On 02.10.20 16:25, Dan van der Ster wrote: > The important metric is the difference between these two values: > > # ceph report | grep osdmap | grep committed > report 3324953770 > "osdmap_first_committed": 3441952, > "osdmap_last_committed": 3442452, > > The mon stores osdmaps on disk, and trims the older versions whenever > the PGs are clean. Trimming brings the osdmap_first_committed to be > closer to osdmap_last_committed. > In a cluster with no PGs backfilling or recovering, the mon should > trim that difference to be within 500-750 epochs. > > If there are any PGs backfilling or recovering, then the mon will not > trim beyond the osdmap epoch when the pools were clean. > > So if you are accumulating gigabytes of data in the mon dir, it > suggests that you have unclean PGs/Pools. > > Cheers, dan > > > > > On Fri, Oct 2, 2020 at 4:14 PM Marc Roos wrote: >> >> Does this also count if your cluster is not healthy because of errors >> like '2 pool(s) have no replicas configured' >> I sometimes use these pools for testing, they are empty. 
>> >> >> >> >> -Original Message- >> Cc: ceph-users >> Subject: [ceph-users] Re: Massive Mon DB Size with noout on 14.2.11 >> >> As long as the cluster is not healthy, the OSD will require much more >> space, depending on the cluster size and other factors. Yes this is >> somewhat normal. >> >> ___ >> ceph-users mailing list -- ceph-users@ceph.io >> To unsubscribe send an email to ceph-users-le...@ceph.io > ___ > ceph-users mailing list -- ceph-users@ceph.io > To unsubscribe send an email to ceph-users-le...@ceph.io > -- Andreas John net-lab GmbH | Frankfurter Str. 99 | 63067 Offenbach Geschaeftsfuehrer: Andreas John | AG Offenbach, HRB40832 Tel: +49 69 8570033-1 | Fax: -2 | http://www.net-lab.net Facebook: https://www.facebook.com/netlabdotnet Twitter: https://twitter.com/netlabdotnet ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
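Dan's rule of thumb above can be checked directly; a sketch using the epoch numbers quoted in this thread (the live query is guarded and assumes the ceph CLI and jq are installed):

```shell
# The two committed epochs reported earlier in this thread; a healthy
# mon should keep the difference within roughly 500-750 entries.
first=1276
last=1781
diff=$((last - first))
echo "osdmap epochs kept by the mon: $diff"

# Live check against a running cluster:
if command -v ceph >/dev/null && command -v jq >/dev/null; then
  ceph report 2>/dev/null | jq '.osdmap_last_committed - .osdmap_first_committed'
fi
```

A difference that keeps growing into the thousands is the signature of unclean PGs blocking osdmap trimming.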
[ceph-users] Massive Mon DB Size with noout on 14.2.11
Hello, we observed massive and sudden growth of the mon db size on disk, from 50MB to 20GB+ (GB!) and thus reaching 100% disk usage on the mountpoint. As far as we can see, it happens if we set "noout" for a node reboot: After the node and the OSDs come back it looks like the mon db size increased drastically. We have 14.2.11, 10 OSD @ 2TB and cephfs in use. Is this a known issue? Should we avoid noout? TIA, derjohn -- Andreas John net-lab GmbH | Frankfurter Str. 99 | 63067 Offenbach Geschaeftsfuehrer: Andreas John | AG Offenbach, HRB40832 Tel: +49 69 8570033-1 | Fax: -2 | http://www.net-lab.net Facebook: https://www.facebook.com/netlabdotnet Twitter: https://twitter.com/netlabdotnet ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Doing minor version update of Ceph cluster with ceph-ansible and rolling-update playbook
I want to update my mimic cluster to the latest minor version using the rolling-update script of ceph-ansible. The cluster was rolled out with that setup. So as long as ceph_stable_release stays on the currently installed version (mimic), the rolling-update script will only do a minor update. Is this assumption correct? The documentation (https://docs.ceph.com/projects/ceph-ansible/en/latest/day-2/upgrade.html) is short on this. Thanks! - Andreas ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
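For reference, a guarded sketch of that flow; the inventory name, playbook path and the RUN_UPGRADE guard are assumptions matching the stock ceph-ansible layout, not requirements:

```shell
# Confirm the pinned release before running the playbook; if the file
# is absent, print the expected setting instead.
line=$(grep -E '^ceph_stable_release:' group_vars/all.yml 2>/dev/null \
  || echo "ceph_stable_release: mimic")
echo "$line"

# The rolling update itself; only run when explicitly requested.
if [ -n "${RUN_UPGRADE:-}" ]; then
  ansible-playbook -i hosts infrastructure-playbooks/rolling_update.yml \
      -e ireallymeanit=yes
fi
```

With ceph_stable_release left at the installed release, the playbook restarts daemons onto whatever point release the configured repository currently serves.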
[ceph-users] Re: Remove separate WAL device from OSD
Hello, isn't ceph-osd -i osdnum... --flush-journal and then removing the journal enough? On 22.09.20 21:09, Michael Fladischer wrote: > Hi, > > Is it possible to remove an existing WAL device from an OSD? I saw > that ceph-bluestore-tool has a command bluefs-bdev-migrate, but it's > not clear to me if this can only move a WAL device or if it can be > used to remove it ... > > Regards, > Michael > ___ > ceph-users mailing list -- ceph-users@ceph.io > To unsubscribe send an email to ceph-users-le...@ceph.io > -- Andreas John net-lab GmbH | Frankfurter Str. 99 | 63067 Offenbach Geschaeftsfuehrer: Andreas John | AG Offenbach, HRB40832 Tel: +49 69 8570033-1 | Fax: -2 | http://www.net-lab.net Facebook: https://www.facebook.com/netlabdotnet Twitter: https://twitter.com/netlabdotnet ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
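Note that --flush-journal applies to FileStore journals; for a BlueStore WAL the bluefs-bdev-migrate command from the quoted mail is the relevant tool. A guarded sketch, assuming the WAL symlink lives at block.wal inside the OSD directory, with osd.7 as a placeholder:

```shell
OSD=7                                   # placeholder OSD id
OSD_DIR="/var/lib/ceph/osd/ceph-$OSD"
echo "would migrate: $OSD_DIR/block.wal -> $OSD_DIR/block"

# Only act when the tool is present; the OSD must be stopped first.
if command -v ceph-bluestore-tool >/dev/null; then
  systemctl stop "ceph-osd@$OSD"
  ceph-bluestore-tool bluefs-bdev-migrate --path "$OSD_DIR" \
      --devs-source "$OSD_DIR/block.wal" --dev-target "$OSD_DIR/block"
  systemctl start "ceph-osd@$OSD"
fi
```

After the migration the block.wal symlink (and its LV/partition) is no longer referenced and can be removed.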
[ceph-users] Re: Unknown PGs after osd move
On 22.09.20 22:09, Nico Schottelius wrote: [...] > All nodes are connected with 2x 10 Gbit/s bonded/LACP, so I'd expect at > least a couple of hundred MB/s network bandwidth per OSD. > > On one server I just restarted the OSDs and now the read performance > dropped down to 1-4 MB/s per OSD with being about 90% busy. > > Since nautilus we observed much longer starting times of OSDs and I > wonder if the osd does some kind of fsck these days and delays the > peering process because of that? > > The disks in question are 3.5"/10TB/6 Gbit/s SATA disks connected to an > H800 controller - so generally speaking I do not see a reasonable > bottleneck here. Yes, I should! I saw in your mail: 1.) 1532 slow requests are blocked > 32 sec 789 slow ops, oldest one blocked for 1949 sec, daemons [osd.12,osd.14,osd.2,osd.20,osd.23,osd.25,osd.3,osd.33,osd.35,osd.50]... have slow ops. A request that is blocked for > 32 sec is odd! Same goes for 1949 sec. In my experience, they will never finish. Sometimes they go away with osd restarts. Are those OSDs the ones you relocated? 2.) client: 91 MiB/s rd, 28 MiB/s wr, 1.76k op/s rd, 686 op/s wr recovery: 67 MiB/s, 17 objects/s 67 MB/sec is slower than a single rotational disk can deliver. Even 67 + 91 MB/s is not much, especially not for an 85 OSD @ 10G cluster. The ~2500 IOPS client I/O will translate to 7500 "net" IOPS with pool size 3, maybe that is the limit. But I guess you already know that. But before tuning, you should probably listen to Frank's advice about the placements (see other post). As soon as the unknown OSDs come back, the speed will probably go up due to parallelism. rgds, j. ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: Unknown PGs after osd move
Hey Nico, maybe you "pinned" the IP of the OSDs in question in ceph.conf to the IP of the old chassis? Good Luck, derjohn P.S. < 100 MB/sec is terrible performance for recovery with 85 OSDs. Are those rotational disks on a 1 GBit/s network? You could set "ceph osd set nodeep-scrub" to prevent too much reading from the platters and get better recovery performance. ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: Unknown PGs after osd move
Hello, On 22.09.20 20:45, Nico Schottelius wrote: > Hello, > > after having moved 4 ssds to another host (+ the ceph tell hanging issue > - see previous mail), we ran into 241 unknown pgs: You mean that you re-seated the OSDs into another chassis/host? Is the crush map aware of that? I never tried that myself, but don't you need to crush move it? ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
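If the move was physical only, CRUSH may still map the OSDs under the old host bucket. A sketch with placeholder names (note that with the default osd_crush_update_on_start=true a restarted OSD normally re-homes itself):

```shell
osd="osd.12"                    # placeholder: one of the moved OSDs
dest="host=newhost"             # placeholder: the new chassis' CRUSH bucket
cmd="ceph osd crush move $osd $dest"
echo "$cmd"

# Only touch a real cluster when the CLI is available.
if command -v ceph >/dev/null; then
  ceph osd find 12              # shows where CRUSH currently places the OSD
  ceph osd crush move "$osd" "$dest"
fi
```

Moving the OSD to its actual host triggers the data movement CRUSH expects, after which the unknown PGs should peer again.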
[ceph-users] Re: Mount CEPH-FS on multiple hosts with concurrent access to the same data objects?
Hello, https://docs.ceph.com/en/latest/rados/operations/erasure-code/ but you could probably intervene manually, if you want an erasure coded pool. rgds, j. On 22.09.20 14:55, René Bartsch wrote: > On Tuesday, 22.09.2020 at 14:43 +0200, Andreas John wrote: >> Hello, >> >> yes, it does. It even comes with a GUI to manage Ceph and its own >> basic-setup tool. No EC support. > What do you mean by EC? > > Regards, > > Renne > > ___ > ceph-users mailing list -- ceph-users@ceph.io > To unsubscribe send an email to ceph-users-le...@ceph.io > -- Andreas John net-lab GmbH | Frankfurter Str. 99 | 63067 Offenbach Geschaeftsfuehrer: Andreas John | AG Offenbach, HRB40832 Tel: +49 69 8570033-1 | Fax: -2 | http://www.net-lab.net Facebook: https://www.facebook.com/netlabdotnet Twitter: https://twitter.com/netlabdotnet ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: Mount CEPH-FS on multiple hosts with concurrent access to the same data objects?
Hello, yes, it does. It even comes with a GUI to manage Ceph and its own basic-setup tool. No EC support. The only issue is with the backup stuff, which uses "vzdump" under the hood and can cause high load. The reason is not really known yet, but some suspect that small block sizes cause large readahead in ceph. Use eve4pve-barc instead. rgds j. On 22.09.20 14:31, René Bartsch wrote: > On Tuesday, 22.09.2020 at 08:50 +0200, Robert Sander wrote: > >> Do you know that Proxmox is able to store VM images as RBD directly >> in a >> Ceph cluster? > Does Proxmox support snapshots, backups and thin provisioning with > RBD-VM images? > > Regards, > > Renne > ___ > ceph-users mailing list -- ceph-users@ceph.io > To unsubscribe send an email to ceph-users-le...@ceph.io > -- Andreas John net-lab GmbH | Frankfurter Str. 99 | 63067 Offenbach Geschaeftsfuehrer: Andreas John | AG Offenbach, HRB40832 Tel: +49 69 8570033-1 | Fax: -2 | http://www.net-lab.net Facebook: https://www.facebook.com/netlabdotnet Twitter: https://twitter.com/netlabdotnet ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Many scrub errors after update to 14.2.10
Hi *, after updating our CEPH cluster from 14.2.9 to 14.2.10 it accumulates scrub errors on multiple osds: [cephmon1] /root # ceph health detail HEALTH_ERR 6 scrub errors; Possible data damage: 6 pgs inconsistent OSD_SCRUB_ERRORS 6 scrub errors PG_DAMAGED Possible data damage: 6 pgs inconsistent pg 3.69 is active+clean+inconsistent, acting [59,65,61] pg 3.73 is active+clean+inconsistent, acting [73,88,25] pg 12.29 is active+clean+inconsistent, acting [55,92,42] pg 12.38 is active+clean+inconsistent, acting [150,42,13] pg 12.46 is active+clean+inconsistent, acting [55,18,84] pg 12.75 is active+clean+inconsistent, acting [55,155,49] They all can easily get repaired (ceph pg repair $pg) - but I wonder what could be the source of the problem. The cluster started with Luminous some years ago, was updated to Mimic, then Nautilus. Never seen this before! OSDs are a mixture of HDD/SSD, both are affected. All on Bluestore. Any idea? Was there maybe a code change between 14.2.9 & 14.2.10 that could explain this? Errors in syslog look like this: Aug 5 19:21:21 krake08 ceph-osd: 2020-08-05 19:21:21.831 7fb6b2b9d700 -1 log_channel(cluster) log [ERR] : 12.38 scrub : stat mismatch, got 74/74 objects, 20/20 clones, 74/74 dirty, 0/0 omap, 0/0 pinned, 0/0 hit_set_archive, 0/0 whiteouts, 182904850/172877842 bytes, 0/0 manifest objects, 0/0 hit_set_archive bytes. Aug 5 19:21:21 krake08 ceph-osd: 2020-08-05 19:21:21.831 7fb6b2b9d700 -1 log_channel(cluster) log [ERR] : 12.38 scrub 1 errors Aug 6 08:28:44 krake08 ceph-osd: 2020-08-06 08:28:44.477 7fb6b2b9d700 -1 log_channel(cluster) log [ERR] : 12.38 repair : stat mismatch, got 76/76 objects, 22/22 clones, 76/76 dirty, 0/0 omap, 0/0 pinned, 0/0 hit_set_archive, 0/0 whiteouts, 183166994/173139986 bytes, 0/0 manifest objects, 0/0 hit_set_archive bytes. 
Aug 6 08:28:44 krake08 ceph-osd: 2020-08-06 08:28:44.477 7fb6b2b9d700 -1 log_channel(cluster) log [ERR] : 12.38 repair 1 errors, 1 fixed Thanks in advance, Andreas -- | Andreas Haupt| E-Mail: andreas.ha...@desy.de | DESY Zeuthen| WWW:http://www-zeuthen.desy.de/~ahaupt | Platanenallee 6 | Phone: +49/33762/7-7359 | D-15738 Zeuthen | Fax:+49/33762/7-7216 smime.p7s Description: S/MIME cryptographic signature ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
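The per-PG repairs can be scripted instead of typed one by one; a sketch that parses the health output shown above (it clears the HEALTH_ERR but does not explain the underlying stat mismatches):

```shell
# Extract PG ids flagged inconsistent from "ceph health detail" output,
# e.g. "pg 3.69 is active+clean+inconsistent, acting [59,65,61]".
list_inconsistent() { awk '/active\+clean\+inconsistent/ {print $2}'; }

# Issue a repair for each one; guarded so the sketch is safe to run
# on a machine without the ceph CLI.
if command -v ceph >/dev/null; then
  ceph health detail | list_inconsistent | while read -r pg; do
    ceph pg repair "$pg"
  done
fi
```

If the errors keep coming back after repair, comparing scrub logs across the acting OSDs is the next step before blaming the 14.2.10 code change.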
[ceph-users] Nautilus to Octopus Upgrade mds without downtime
Hello, if I understand correctly: if we upgrade a running Nautilus cluster to Octopus, we will have downtime during the MDS update. Is this correct? Mit freundlichen Grüßen / Kind regards Andreas Schiefer Leiter Systemadministration / Head of systemadministration --- HOME OF LOYALTY CRM- & Customer Loyalty Solution by UW Service Gesellschaft für Direktwerbung und Marketingberatung mbH Alter Deutzer Postweg 221 51107 Koeln (Rath/Heumar) Deutschland Telefon : +49 221 98696 0 Telefax : +49 221 98696 5222 i...@uw-service.de www.hooloy.de Amtsgericht Koeln HRB 24 768 UST-ID: DE 164 191 706 Geschäftsführer: Ralf Heim --- ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: missing amqp-exchange on bucket-notification with AMQP endpoint
Dear Yuval! The message format you tried to use is the standard one (the one being emitted from boto3, or any other AWS SDK [1]). It passes the arguments using 'x-www-form-urlencoded'. For example: Thank you for your clarification! I've previously tried it as an x-www-form-urlencoded body as well, but I have failed. That it was then working using the non-standard parameters has led me down the wrong road... But I have to admit that I'm still failing to create a topic the S3 way. I've tried it with curl, but also with Postman. Even if I use your example body, Ceph keeps telling me (at least) method-not-allowed. Is this maybe because I'm using an AWS Sig v4 to authenticate? This is the request I'm sending out:

POST / HTTP/1.1
Content-Type: application/x-www-form-urlencoded; charset=utf-8
Accept-Encoding: identity
Date: Tue, 23 Apr 2020 05:00:35 GMT
X-Amz-Content-Sha256: e8d828552b412fde2cd686b0a984509bc485693a02e8c53ab84cf36d1dbb961a
Host: s3.example.com
X-Amz-Date: 20200423T050035Z
Authorization: AWS4-HMAC-SHA256 Credential=DNQXT3I8Z5MWDJ1A8YMP/20200423/de/s3/aws4_request, SignedHeaders=accept-encoding;content-type;date;host;x-amz-content-sha256;x-amz-date, Signature=fa65844ba997fe11e65be87a18f160afe1ea459892316d6060bbc663daf6eace
User-Agent: PostmanRuntime/7.24.1
Accept: */*
Connection: keep-alive
Content-Length: 303

Name=ajmmvc-1_topic_1& Attributes.entry.2.key=amqp-exchange& Attributes.entry.1.key=amqp-ack-level& Attributes.entry.2.value=amqp.direct& Version=2010-03-31& Attributes.entry.3.value=amqp%3A%2F%2F127.0.0.1%3A7001& Attributes.entry.1.value=none& Action=CreateTopic& Attributes.entry.3.key=push-endpoint

This is the response that comes back:

HTTP/1.1 405 Method Not Allowed
Content-Length: 200
x-amz-request-id: tx1-005ea12159-6e47a-s3-datacenter
Accept-Ranges: bytes
Content-Type: application/xml
Date: Thu, 23 Apr 2020 05:02:17 GMT

<?xml version="1.0" encoding="UTF-8"?><Error><Code>MethodNotAllowed</Code><RequestId>tx1-005ea12159-6e47a-s3-datacenter</RequestId><HostId>6e47a-s3-datacenter-de</HostId></Error>

This is what radosgw is seeing 
at the same time 2020-04-23T07:02:17.745+0200 7f5aab2af700 20 final domain/bucket subdomain= domain=s3.example.com in_hosted_domain=1 in_hosted_domain_s3website=0 s->info.domain=s3.example.com s->info.request_uri=/ 2020-04-23T07:02:17.745+0200 7f5aab2af700 10 meta>> HTTP_X_AMZ_CONTENT_SHA256 2020-04-23T07:02:17.745+0200 7f5aab2af700 10 meta>> HTTP_X_AMZ_DATE 2020-04-23T07:02:17.745+0200 7f5aab2af700 10 x>> x-amz-content-sha256:e8d828552b412fde2cd686b0a984509bc485693a02e8c53ab84cf36d1dbb961a 2020-04-23T07:02:17.745+0200 7f5aab2af700 10 x>> x-amz-date:20200423T050035Z 2020-04-23T07:02:17.745+0200 7f5aab2af700 20 req 1 0s get_handler handler=26RGWHandler_REST_Service_S3 2020-04-23T07:02:17.745+0200 7f5aab2af700 10 handler=26RGWHandler_REST_Service_S3 2020-04-23T07:02:17.745+0200 7f5aab2af700 2 req 1 0s getting op 4 2020-04-23T07:02:17.745+0200 7f5aab2af700 10 Content of POST: Name=ajmmvc-1_topic_1& Attributes.entry.2.key=amqp-exchange& Attributes.entry.1.key=amqp-ack-level& Attributes.entry.2.value=amqp.direct& Version=2010-03-31& Attributes.entry.3.value=amqp%3A%2F%2F127.0.0.1%3A7001& Attributes.entry.1.value=none& Action=CreateTopic& Attributes.entry.3.key=push-endpoint 2020-04-23T07:02:17.745+0200 7f5aab2af700 10 Content of POST: Name=ajmmvc-1_topic_1& Attributes.entry.2.key=amqp-exchange& Attributes.entry.1.key=amqp-ack-level& Attributes.entry.2.value=amqp.direct& Version=2010-03-31& Attributes.entry.3.value=amqp%3A%2F%2F127.0.0.1%3A7001& Attributes.entry.1.value=none& Action=CreateTopic& Attributes.entry.3.key=push-endpoint 2020-04-23T07:02:17.745+0200 7f5aab2af700 10 Content of POST: Name=ajmmvc-1_topic_1& Attributes.entry.2.key=amqp-exchange& Attributes.entry.1.key=amqp-ack-level& Attributes.entry.2.value=amqp.direct& Version=2010-03-31& Attributes.entry.3.value=amqp%3A%2F%2F127.0.0.1%3A7001& Attributes.entry.1.value=none& Action=CreateTopic& Attributes.entry.3.key=push-endpoint 2020-04-23T07:02:17.745+0200 7f5aab2af700 1 handler->ERRORHANDLER: err_no=-2003 
new_err_no=-2003 2020-04-23T07:02:17.745+0200 7f5aab2af700 2 req 1 0s http status=405 2020-04-23T07:02:17.745+0200 7f5aab2af700 1 == req done req=0x7f5aab2a6d50 op status=0 http_status=405 latency=0s == Best Regards, Andreas ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
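For anyone replaying this, the form body from the failing request can be rebuilt and sent with curl's built-in SigV4 signing (curl >= 7.75); endpoint, credentials and broker URL below are placeholders, and the echo makes the body inspectable without touching a cluster:

```shell
# Broker URL percent-encoded by hand (":" -> %3A, "/" -> %2F) to stay
# dependency-free; the key/value scheme mirrors the POST in the thread.
endpoint="amqp%3A%2F%2F127.0.0.1%3A7001"
body="Action=CreateTopic&Name=testtopic&Version=2010-03-31"
body="$body&Attributes.entry.1.key=amqp-ack-level&Attributes.entry.1.value=none"
body="$body&Attributes.entry.2.key=amqp-exchange&Attributes.entry.2.value=amqp.direct"
body="$body&Attributes.entry.3.key=push-endpoint&Attributes.entry.3.value=$endpoint"
echo "$body"

# POST it with SigV4; only runs when RGW_ENDPOINT is explicitly set.
if [ -n "${RGW_ENDPOINT:-}" ]; then
  curl -sS "$RGW_ENDPOINT" --user "ACCESS_KEY:SECRET_KEY" \
       --aws-sigv4 "aws:amz:de:s3" \
       -H 'Content-Type: application/x-www-form-urlencoded' --data "$body"
fi
```

Comparing this body byte-for-byte with what Postman actually sends is a quick way to rule out an encoding difference before suspecting the signature.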
[ceph-users] Re: missing amqp-exchange on bucket-notification with AMQP endpoint
I've tried to debug this a bit. amqp://rabbitmquser:rabbitmqp...@rabbitmq.example.com:5672 Attributes.entry.1.key=amqp-exchange&Attributes.entry.1.value=amqp.direct&push-endpoint=amqp://rabbitmquser:rabbitmqp...@rabbitmq.example.com:5672 testtopic For the above I was using the following request to create the topic - similar to what is described here [1]: https://ceph.example.com/?Action=CreateTopic&Name=testtopic&Attributes.entry.1.key=amqp-exchange&Attributes.entry.1.value=amqp.direct&push-endpoint=amqp://rabbitmquser:rabbitmqp...@rabbitmq.example.com:5672 (of course endpoint then URL-encoded) It seems to me that RGWHTTPArgs::parse() is not translating the "Attributes.entry.1..." strings into keys & values in its map. These are the keys & values that can now be found in the map: Found name: Attributes.entry.1.key Found value: amqp-exchange Found name: Attributes.entry.1.value Found value: amqp.direct Found name: push-endpoint Found value: amqp://rabbitmquser:rabbitmqp...@rabbitmq.example.com:5672 If I simply change the request to: https://ceph.example.com/?Action=CreateTopic&Name=testtopic&amqp-exchange=amqp.direct&push-endpoint=amqp://rabbitmquser:rabbitmqp...@rabbitmq.example.com:5672/foobar -> et voilà, the entries in the map are correct Found name: amqp-exchange Found value: amqp.direct Found name: push-endpoint Found value: amqp://rabbitmquser:rabbitmqp...@rabbitmq.example.com:5672 And then the bucket-notification works like it should. But I don't think the documentation is wrong, or is it? 
Cheers, Andreas

[1] https://docs.ceph.com/docs/master/radosgw/notifications/#create-a-topic
[2]
Index: ceph-15.2.1/src/rgw/rgw_common.cc
===================================================================
--- ceph-15.2.1.orig/src/rgw/rgw_common.cc
+++ ceph-15.2.1/src/rgw/rgw_common.cc
@@ -810,6 +810,8 @@ int RGWHTTPArgs::parse()
     string& name = nv.get_name();
     string& val = nv.get_val();
+    cout << "Found name: " << name << std::endl;
+    cout << "Found value: " << val << std::endl;
     append(name, val);
   }
___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] missing amqp-exchange on bucket-notification with AMQP endpoint
Hello List, I'm trying to create a (S3-)bucket-notification into RabbitMQ via AMQP - on Ceph v15.2.1 octopus, using the official .deb packages on Debian Buster. I've created the following topic (directly via S3, not via pubsub REST API): https://sns.amazonaws.com/doc/2010-03-31/";> testuser testtopic amqp://rabbitmquser:rabbitmqp...@rabbitmq.example.com:5672 Attributes.entry.1.key=amqp-exchange&Attributes.entry.1.value=amqp.direct&push-endpoint=amqp://rabbitmquser:rabbitmqp...@rabbitmq.example.com:5672 testtopic arn:aws:sns:de::testtopic ... Then I've created the following bucket-notification notify-psapp arn:aws:sns:de::testtopic s3:ObjectCreated:* s3:ObjectRemoved:* When I upload a file into the bucket, the event itself seems to get fired, but radosgw keeps telling me that amqp-exchange is not set 2020-04-20T12:24:29.935+0200 7ff01c5d3700 1 == starting new request req=0x7ff01c5cad50 = 2020-04-20T12:24:30.019+0200 7ff01c5d3700 1 ERROR: failed to create push endpoint: amqp://rabbitmquser:rabbitmqp...@rabbitmq.example.com:5672 due to: pubsub endpoint configuration error: AMQP: missing amqp-exchange But it's there in the EndpointArgs, right? Or am I missing it somewhere else? Best Regards, Andreas ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: RGW do not show up in 'ceph status'
Sorry for the noise - problem was introduced by a missing iptables rule :-( On Fri, 2020-02-21 at 09:04 +0100, Andreas Haupt wrote: > Dear all, > > we recently added two additional RGWs to our CEPH cluster (version > 14.2.7). They work flawlessly, however they do not show up in 'ceph > status': > > [cephmon1] /root # ceph -s | grep -A 6 services > services: > mon: 3 daemons, quorum cephmon1,cephmon2,cephmon3 (age 14h) > mgr: cephmon1(active, since 14h), standbys: cephmon2, cephmon3 > mds: cephfs:1 {0=cephmon1=up:active} 2 up:standby > osd: 168 osds: 168 up (since 2w), 168 in (since 6w) > rgw: 1 daemon active (ceph-s3) > > As you can see, only the first, old RGW (ceph-s3) is listed. Is there > any place where the RGWs need to get "announced"? Any idea, how to > debug this? > > Thanks, > Andreas > ___ > ceph-users mailing list -- ceph-users@ceph.io > To unsubscribe send an email to ceph-users-le...@ceph.io -- | Andreas Haupt| E-Mail: andreas.ha...@desy.de | DESY Zeuthen| WWW:http://www-zeuthen.desy.de/~ahaupt | Platanenallee 6 | Phone: +49/33762/7-7359 | D-15738 Zeuthen | Fax:+49/33762/7-7216 ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: RGW do not show up in 'ceph status'
On Fri, 2020-02-21 at 15:19 +0700, Konstantin Shalygin wrote: > On 2/21/20 3:04 PM, Andreas Haupt wrote: > > As you can see, only the first, old RGW (ceph-s3) is listed. Is there > > any place where the RGWs need to get "announced"? Any idea, how to > > debug this? > > Did you try to restart the active mgr? Yes, multiple times, it did not change anything. Cheers, Andreas -- | Andreas Haupt| E-Mail: andreas.ha...@desy.de | DESY Zeuthen| WWW:http://www-zeuthen.desy.de/~ahaupt | Platanenallee 6 | Phone: +49/33762/7-7359 | D-15738 Zeuthen | Fax:+49/33762/7-7216 ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] RGW do not show up in 'ceph status'
Dear all, we recently added two additional RGWs to our CEPH cluster (version 14.2.7). They work flawlessly, however they do not show up in 'ceph status': [cephmon1] /root # ceph -s | grep -A 6 services services: mon: 3 daemons, quorum cephmon1,cephmon2,cephmon3 (age 14h) mgr: cephmon1(active, since 14h), standbys: cephmon2, cephmon3 mds: cephfs:1 {0=cephmon1=up:active} 2 up:standby osd: 168 osds: 168 up (since 2w), 168 in (since 6w) rgw: 1 daemon active (ceph-s3) As you can see, only the first, old RGW (ceph-s3) is listed. Is there any place where the RGWs need to get "announced"? Any idea, how to debug this? Thanks, Andreas -- | Andreas Haupt| E-Mail: andreas.ha...@desy.de | DESY Zeuthen| WWW:http://www-zeuthen.desy.de/~ahaupt | Platanenallee 6 | Phone: +49/33762/7-7359 | D-15738 Zeuthen | Fax:+49/33762/7-7216 ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: osd is immidietly down and uses CPU full.
tive+recovery_wait+undersized+degraded+remapped > 1 active+recovery_wait+degraded+remapped > recovery io 239 MB/s, 187 objects/s > client io 575 kB/s wr, 0 op/s rd, 37 op/s wr > > ceph osd tree > -- > > [root@ceph01 ceph]# ceph osd tree > ID WEIGHT TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY > -1 108.19864 root default > -2 19.09381 host ceph01 > 0 2.72769 osd.0 up 1.0 1.0 > 1 2.72769 osd.1 down 0 1.0 <-- now > down > 2 2.72769 osd.2 up 1.0 1.0 > 5 2.72769 osd.5 up 1.0 1.0 > 6 2.72768 osd.6 up 1.0 1.0 > 3 2.72768 osd.3 up 1.0 1.0 > 4 2.72769 osd.4 up 1.0 1.0 > -3 19.09383 host ceph02 > 8 2.72769 osd.8 up 1.0 1.0 > 9 2.72769 osd.9 up 1.0 1.0 > 10 2.72769 osd.10 up 1.0 1.0 > 12 2.72769 osd.12 up 1.0 1.0 > 11 2.72769 osd.11 up 1.0 1.0 > 7 2.72768 osd.7 up 1.0 1.0 > 13 2.72769 osd.13 up 1.0 1.0 > -4 16.36626 host ceph03 > 14 2.72769 osd.14 up 1.0 1.0 > 16 2.72769 osd.16 up 1.0 1.0 > 17 2.72769 osd.17 up 1.0 1.0 > 19 2.72769 osd.19 up 1.0 1.0 > 15 1.81850 osd.15 up 1.0 1.0 > 18 1.81850 osd.18 up 1.0 1.0 > 20 1.81850 osd.20 up 1.0 1.0 > -5 15.45706 host ceph04 > 23 2.72769 osd.23 up 1.0 1.0 > 24 2.72769 osd.24 up 1.0 1.0 > 27 2.72769 osd.27 down 0 1.0 <-- > more then 3month ago > 21 1.81850 osd.21 up 1.0 1.0 > 22 1.81850 osd.22 up 1.0 1.0 > 25 1.81850 osd.25 up 1.0 1.0 > 26 1.81850 osd.26 up 1.0 1.0 > -6 19.09384 host ceph05 > 28 2.72769 osd.28 up 1.0 1.0 > 29 2.72769 osd.29 up 1.0 1.0 > 30 2.72769 osd.30 up 1.0 1.0 > 31 2.72769 osd.31 down 0 1.0 <-- > more then 3month ago > 32 2.72769 osd.32 up 1.0 1.0 > 34 2.72769 osd.34 up 1.0 1.0 > 33 2.72769 osd.33 down 0 1.0 <-- > more then 3month ago > -7 19.09384 host ceph06 > 35 2.72769 osd.35 up 1.0 1.0 > 36 2.72769 osd.36 up 1.0 1.0 > 37 2.72769 osd.37 up 1.0 1.0 > 39 2.72769 osd.39 up 1.0 1.0 > 40 2.72769 osd.40 up 1.0 1.0 > 41 2.72769 osd.41 up 1.0 1.0 > 38 2.72769 osd.38 down 0 1.0 <-- > more then 3month ago > > > -- > > On 2020/02/02 11:20, 西宮 牧人 wrote: >> Servers: 6 (include 7osds) total 42osdsl >> OS: Centos7 >> Ceph: 
10.2.5 >> >> Hi, everyone >> >> The cluster is used for VM image storage and object storage. >> And I have a bucket which has more than 20 million objects. >> >> Now, I have a problem that cluster blocks operation. >> >> Suddenly cluster blocked operations, then VMs can't read disk. >> After a few hours, osd.1 was down. >> >> There is no disk fail messages in dmesg. >> And no error is in smartctl -a /dev/sde. >> >> I tried to wake up osd.1, but osd.1 is down soon. >> Just after re-waking up osd.1, VM can access to the disk. >> But osd.1 always uses 100% CPU, then cluster marked osd.1 down and >> the osd was dead by suicide timeout. >> >> I found that the osdmap epoch of osd.1 is different from other one. >> So I think osd.1 was dead. >> >> >> Question. >> (1) Why does the epoch of osd.1 differ from other osds ones ? >> >> I checked all osds oldest_map and newest_map by ~ceph daemon osd.X >> status~ >> All osd's ecpoch are same number except osd.1 >> >> (2) Why does osd.1 use CPU full? >> >> After the cluster marked osd.1 down, osd.1 keeps up busy. >> When I execute "ceph tell osd.1 injectargs --debug-ms 5/1", osd.1 >> doesn't answer. >> >> >> Thank you. > -- Andreas John net-lab GmbH | Frankfurter Str. 99 | 63067 Offenbach Geschaeftsfuehrer: Andreas John | AG Offenbach, HRB40832 Tel: +49 69 8570033-1 | Fax: -2 | http://www.net-lab.net Facebook: https://www.facebook.com/netlabdotnet Twitter: https://twitter.com/netlabdotnet ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: Getting rid of trim_object Snap .... not in clones
Hello, answering myself in case someone else stumbles upon this thread in the future. I was able to remove the unexpected snap, here is the recipe:

How to remove the unexpected snapshots:

1.) Stop the OSD and flush its journal:
ceph-osd -i 14 --flush-journal
... flushed journal /var/lib/ceph/osd/ceph-14/journal for object store /var/lib/ceph/osd/ceph-14

2.) List the object in question:
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-14 --journal-path /dev/disk/by-partuuid/212e9db1-943b-45f9-9d83-cffaeb777db7 --op list rbd_data.59cb9c679e2a9e3.3096
[wait ... it might take minutes]
["7.374",{"oid":"rbd_data.59cb9c679e2a9e3.3096","key":"","snapid":171076,"hash":2728045428,"max":0,"pool":7,"namespace":""}]
["7.374",{"oid":"rbd_data.59cb9c679e2a9e3.3096","key":"","snapid":171797,"hash":2728045428,"max":0,"pool":7,"namespace":""}]
["7.374",{"oid":"rbd_data.59cb9c679e2a9e3.3096","key":"","snapid":-2,"hash":2728045428,"max":0,"pool":7,"namespace":""}]

3.) Remove the snap from the object:
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-14 --journal-path /dev/disk/by-partuuid/212e9db1-943b-45f9-9d83-cffaeb777db7 ["7.374",{"oid":"rbd_data.59cb9c679e2a9e3.3096","key":"","snapid":171076,"hash":2728045428,"max":0,"pool":7,"namespace":""}] remove
[wait ... it might take minutes]
remove 7/a29aab74/rbd_data.59cb9c679e2a9e3.3096/29c44

4.) Start the OSD again.

5.) Do this for all OSDs on which the snap exists. If it still exists on one of the other OSDs, it will be synced back before repair starts and thus cause harm again.

6.) ceph pg repair 7.374

Happy again and in need of sleep, derjohn ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: Getting rid of trim_object Snap .... not in clones
Update: When repairing the PG I get a different error: osd.14 80.69.45.76:6813/4059849 27 : cluster [INF] 7.374 repair starts osd.14 80.69.45.76:6813/4059849 28 : cluster [ERR] 7.374 recorded data digest 0xebbbfb83 != on disk 0x43d61c5d on 7/a29aab74/rbd_data.59cb9c679e2a9e3.3096/29c44 osd.14 80.69.45.76:6813/4059849 29 : cluster [ERR] repair 7.374 7/a29aab74/rbd_data.59cb9c679e2a9e3.3096/29c44 is an unexpected clone osd.14 80.69.45.76:6813/4059849 30 : cluster [ERR] 7.374 repair stat mismatch, got 2110/2111 objects, 131/132 clones, 2110/2111 dirty, 0/0 omap, 0/0 hit_set_archive, 0/0 whiteouts, 8304141312/8304264192 bytes,0/0 hit_set_archive bytes. osd.14 80.69.45.76:6813/4059849 31 : cluster [ERR] 7.374 repair 3 errors, 1 fixed osd.14 80.69.45.76:6813/4059849 32 : cluster [INF] 7.374 deep-scrub starts osd.14 80.69.45.76:6813/4059849 33 : cluster [ERR] deep-scrub 7.374 7/a29aab74/rbd_data.59cb9c679e2a9e3.3096/29c44 is an unexpected clone osd.14 80.69.45.76:6813/4059849 34 : cluster [ERR] 7.374 deep-scrub 1 errors Sorry for being so noisy on the list, but maybe someone can now recognize what to do and give me a hint. rgds., j On 01.02.20 10:20, Andreas John wrote: > Hello, > > for those stumbling upon a similar issue: I was able to mitigate the > issue, by setting > > > === 8< === > > [osd.14] > osd_pg_max_concurrent_snap_trims = 0 > > = > > > in ceph.conf. You don't need to restart the osd, osd crash + > systemd will do it for you :) > > Now the osd in question does no trimming anymore and thus stays up. > > Now I let the deep-scrubber run and keep my fingers crossed that it will clean up the > mess. > > > In case I need to clean up manually, could anyone give a hint how to > find the rbd with that snap? The log says: > > > 7faf8f716700 -1 log_channel(cluster) log [ERR] : trim_object Snap 29c44 > not in clones > > > 1.) What is the 7faf8f716700 at the beginning of the log? Is it a daemon > id? > > 2.) 
About the Snap "ID" 29c44: In the filesystem I see > > ...ceph-14/current/7.374_head/DIR_4/DIR_7/DIR_B/DIR_A/rbd\udata.59cb9c679e2a9e3.3096__29c44_A29AAB74__7 > > Do I read it correctly that in PG 7.374 there is with rbd prefix > 59cb9c679e2a9e3 an object that ends with ..3096, which has a snap ID > 29c44 ... ? What does the part A29AAB74__7 ? > > I was nit able to find in docs how the directory / filename is structured. > > > Best Regrads, > > j. > > > > On 31.01.20 16:04, Andreas John wrote: >> Hello, >> >> in my cluster one after the other OSD dies until I recognized that it >> was simply an "abort" in the daemon caused probably by >> >> 2020-01-31 15:54:42.535930 7faf8f716700 -1 log_channel(cluster) log >> [ERR] : trim_object Snap 29c44 not in clones >> >> >> Close to this msg I get a stracktrace: >> >> >> ceph version 0.94.10 (b1e0532418e4631af01acbc0cedd426f1905f4af) >> 1: /usr/bin/ceph-osd() [0xb35f7d] >> 2: (()+0x11390) [0x7f0fec74b390] >> 3: (gsignal()+0x38) [0x7f0feab43428] >> 4: (abort()+0x16a) [0x7f0feab4502a] >> 5: (__gnu_cxx::__verbose_terminate_handler()+0x16d) [0x7f0feb48684d] >> 6: (()+0x8d6b6) [0x7f0feb4846b6] >> 7: (()+0x8d701) [0x7f0feb484701] >> 8: (()+0x8d919) [0x7f0feb484919] >> 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char >> const*)+0x27e) [0xc3776e] >> 10: (ReplicatedPG::eval_repop(ReplicatedPG::RepGather*)+0x10dd) [0x868cfd] >> 11: (ReplicatedPG::repop_all_committed(ReplicatedPG::RepGather*)+0x80) >> [0x8690e0] >> 12: (Context::complete(int)+0x9) [0x6c8799] >> 13: (void ReplicatedBackend::sub_op_modify_reply> 113>(std::tr1::shared_ptr)+0x21b) [0xa5ae0b] >> 14: >> (ReplicatedBackend::handle_message(std::tr1::shared_ptr)+0x15b) >> [0xa53edb] >> 15: (ReplicatedPG::do_request(std::tr1::shared_ptr&, >> ThreadPool::TPHandle&)+0x1cb) [0x84c78b] >> 16: (OSD::dequeue_op(boost::intrusive_ptr, >> std::tr1::shared_ptr, ThreadPool::TPHandle&)+0x3ef) [0x6966ff] >> 17: (OSD::ShardedOpWQ::_process(unsigned int, >> 
ceph::heartbeat_handle_d*)+0x4e4) [0x696e14] >> 18: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x71e) >> [0xc264fe] >> 19: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0xc29950] >> 20: (()+0x76ba) [0x7f0fec7416ba] >> 21: (clone()+0x6d) [0x7f0feac1541d] >> NOTE: a copy of the executable, or `objdump -rdS ` is >> needed to interpret this. >> >> >> Yes, I know it's still hammer, I want to upgrade soon, but I want to >> resolve that issue first. If I lose that PG, I don't worry. >&
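Regarding the file-name question quoted above: the numbers in `...3096__29c44_A29AAB74__7` line up with the `--op list` JSON elsewhere in this thread, which suggests the suffix encodes the snap ID in hex, the object hash in hex, and the pool ID. A quick sanity check (the interpretation is an inference from the data in the thread, not from documentation):

```python
# Suffix parts of rbd\udata.59cb9c679e2a9e3.3096__29c44_A29AAB74__7,
# compared against the ceph-objectstore-tool --op list JSON above.
snapid_hex, hash_hex, pool = "29c44", "A29AAB74", 7

print(int(snapid_hex, 16))  # 171076     -> matches "snapid":171076
print(int(hash_hex, 16))    # 2728045428 -> matches "hash":2728045428
print(pool)                 # 7          -> matches "pool":7
```

So the unexpected clone `7/a29aab74/rbd_data.59cb9c679e2a9e3.3096/29c44` reads as pool 7, hash a29aab74, object name, snap 0x29c44 = 171076.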
[ceph-users] Getting rid of trim_object Snap .... not in clones
Hello,

for those stumbling upon a similar issue: I was able to mitigate the issue by setting

=== 8< ===
[osd.14]
osd_pg_max_concurrent_snap_trims = 0
===

in ceph.conf. You don't need to restart the osd; an osd crash + systemd will do it for you :)

Now the osd in question does no trimming anymore and thus stays up.

Now I let the deep-scrubber run and keep my fingers crossed that it will clean up the mess.

In case I need to clean up manually, could anyone give a hint how to find the rbd with that snap? The log says:

7faf8f716700 -1 log_channel(cluster) log [ERR] : trim_object Snap 29c44 not in clones

1.) What is the 7faf8f716700 at the beginning of the log line? Is it a daemon id?

2.) About the Snap "ID" 29c44: In the filesystem I see

...ceph-14/current/7.374_head/DIR_4/DIR_7/DIR_B/DIR_A/rbd\udata.59cb9c679e2a9e3.3096__29c44_A29AAB74__7

Do I read it correctly that in PG 7.374 there is an object with rbd prefix 59cb9c679e2a9e3 ending in ...3096, which has a snap ID 29c44? What does the part A29AAB74__7 mean?

I was not able to find in the docs how the directory/file name is structured.

Best Regards,

j.
On 31.01.20 16:04, Andreas John wrote:
> Hello,
>
> in my cluster one OSD after the other dies, until I recognized that it
> was simply an "abort" in the daemon, probably caused by
>
> 2020-01-31 15:54:42.535930 7faf8f716700 -1 log_channel(cluster) log
> [ERR] : trim_object Snap 29c44 not in clones
>
> Close to this msg I get a stack trace:
>
> ceph version 0.94.10 (b1e0532418e4631af01acbc0cedd426f1905f4af)
> 1: /usr/bin/ceph-osd() [0xb35f7d]
> 2: (()+0x11390) [0x7f0fec74b390]
> 3: (gsignal()+0x38) [0x7f0feab43428]
> 4: (abort()+0x16a) [0x7f0feab4502a]
> 5: (__gnu_cxx::__verbose_terminate_handler()+0x16d) [0x7f0feb48684d]
> 6: (()+0x8d6b6) [0x7f0feb4846b6]
> 7: (()+0x8d701) [0x7f0feb484701]
> 8: (()+0x8d919) [0x7f0feb484919]
> 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x27e) [0xc3776e]
> 10: (ReplicatedPG::eval_repop(ReplicatedPG::RepGather*)+0x10dd) [0x868cfd]
> 11: (ReplicatedPG::repop_all_committed(ReplicatedPG::RepGather*)+0x80) [0x8690e0]
> 12: (Context::complete(int)+0x9) [0x6c8799]
> 13: (void ReplicatedBackend::sub_op_modify_reply(std::tr1::shared_ptr)+0x21b) [0xa5ae0b]
> 14: (ReplicatedBackend::handle_message(std::tr1::shared_ptr)+0x15b) [0xa53edb]
> 15: (ReplicatedPG::do_request(std::tr1::shared_ptr&, ThreadPool::TPHandle&)+0x1cb) [0x84c78b]
> 16: (OSD::dequeue_op(boost::intrusive_ptr, std::tr1::shared_ptr, ThreadPool::TPHandle&)+0x3ef) [0x6966ff]
> 17: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x4e4) [0x696e14]
> 18: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x71e) [0xc264fe]
> 19: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0xc29950]
> 20: (()+0x76ba) [0x7f0fec7416ba]
> 21: (clone()+0x6d) [0x7f0feac1541d]
> NOTE: a copy of the executable, or `objdump -rdS ` is needed to interpret this.
>
> Yes, I know it's still hammer; I want to upgrade soon, but I want to
> resolve that issue first. If I lose that PG, I don't worry.
> So: What is the best approach? Can I use something like
> ceph-objectstore-tool ... remove-clone-metadata ? I assume 29c44 is my
> object, but what's the clone id?
>
> Best regards,
>
> derjohn

--
Andreas John
net-lab GmbH | Frankfurter Str. 99 | 63067 Offenbach
Geschaeftsfuehrer: Andreas John | AG Offenbach, HRB40832
Tel: +49 69 8570033-1 | Fax: -2 | http://www.net-lab.net

Facebook: https://www.facebook.com/netlabdotnet
Twitter: https://twitter.com/netlabdotnet
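The mitigation described in this message is just an extra INI-style stanza in ceph.conf. A minimal sketch of applying it programmatically (hypothetical ceph.conf contents, using Python's standard configparser and assuming ceph.conf's usual INI layout):

```python
import configparser
import io

# Start from a (hypothetical) minimal ceph.conf and add the per-OSD
# workaround: osd_pg_max_concurrent_snap_trims = 0 stops snap trimming
# on osd.14, which keeps the crashing OSD up, as described above.
conf = configparser.ConfigParser()
conf.read_string("""
[global]
fsid = 00000000-0000-0000-0000-000000000000
""")

if not conf.has_section("osd.14"):
    conf.add_section("osd.14")
conf.set("osd.14", "osd_pg_max_concurrent_snap_trims", "0")

buf = io.StringIO()
conf.write(buf)
print(buf.getvalue())
```

Note this only disables the trimming that triggers the abort; the underlying inconsistency still has to be repaired, as the rest of the thread shows.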
[ceph-users] Getting rid of trim_object Snap .... not in clones
Hello,

in my cluster one OSD after the other dies, until I recognized that it was simply an "abort" in the daemon, probably caused by

2020-01-31 15:54:42.535930 7faf8f716700 -1 log_channel(cluster) log [ERR] : trim_object Snap 29c44 not in clones

Close to this msg I get a stack trace:

ceph version 0.94.10 (b1e0532418e4631af01acbc0cedd426f1905f4af)
1: /usr/bin/ceph-osd() [0xb35f7d]
2: (()+0x11390) [0x7f0fec74b390]
3: (gsignal()+0x38) [0x7f0feab43428]
4: (abort()+0x16a) [0x7f0feab4502a]
5: (__gnu_cxx::__verbose_terminate_handler()+0x16d) [0x7f0feb48684d]
6: (()+0x8d6b6) [0x7f0feb4846b6]
7: (()+0x8d701) [0x7f0feb484701]
8: (()+0x8d919) [0x7f0feb484919]
9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x27e) [0xc3776e]
10: (ReplicatedPG::eval_repop(ReplicatedPG::RepGather*)+0x10dd) [0x868cfd]
11: (ReplicatedPG::repop_all_committed(ReplicatedPG::RepGather*)+0x80) [0x8690e0]
12: (Context::complete(int)+0x9) [0x6c8799]
13: (void ReplicatedBackend::sub_op_modify_reply(std::tr1::shared_ptr)+0x21b) [0xa5ae0b]
14: (ReplicatedBackend::handle_message(std::tr1::shared_ptr)+0x15b) [0xa53edb]
15: (ReplicatedPG::do_request(std::tr1::shared_ptr&, ThreadPool::TPHandle&)+0x1cb) [0x84c78b]
16: (OSD::dequeue_op(boost::intrusive_ptr, std::tr1::shared_ptr, ThreadPool::TPHandle&)+0x3ef) [0x6966ff]
17: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x4e4) [0x696e14]
18: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x71e) [0xc264fe]
19: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0xc29950]
20: (()+0x76ba) [0x7f0fec7416ba]
21: (clone()+0x6d) [0x7f0feac1541d]
NOTE: a copy of the executable, or `objdump -rdS ` is needed to interpret this.

Yes, I know it's still hammer; I want to upgrade soon, but I want to resolve that issue first. If I lose that PG, I don't worry.

So: What is the best approach? Can I use something like ceph-objectstore-tool ... remove-clone-metadata ? I assume 29c44 is my object, but what's the clone id?
Best regards,

derjohn