[ceph-users] Re: Ceph Quincy and liburing.so.2 on Rocky Linux 9
Hi,

In most cases, 'alternative' distros like Alma or Rocky carry older package versions than CentOS Stream 8 or CentOS Stream 9. For example, the golang package is at version 1.20 on c8s while Alma is still at 1.19. You can try using c8s/c9s, or try to contribute to your distro to resolve the dependency issue.

k
Sent from my iPhone

> On 4 Aug 2023, at 02:05, dobr...@gmu.edu wrote:
>
> I've been digging and I can't see that this has come up anywhere.
>
> I'm trying to update a client from Quincy 17.2.3-2 to 17.2.6-4 and I'm getting the error
>
> Error:
>  Problem: cannot install the best update candidate for package ceph-base-2:17.2.3-2.el9s.x86_64
>   - nothing provides liburing.so.2()(64bit) needed by ceph-base-2:17.2.6-4.el9s.x86_64
>   - nothing provides liburing.so.2(LIBURING_2.0)(64bit) needed by ceph-base-2:17.2.6-4.el9s.x86_64
> (try to add '--skip-broken' to skip uninstallable packages or '--nobest' to use not only best candidate packages)
>
> Did Ceph Quincy switch to requiring liburing 2? Rocky 9 only provides 0.7-7. CentOS Stream seems to have 1.0.7-3 (at least back to when I set up that repo on Foreman; I don't remember if I'm keeping it up-to-date).
>
> Can I/should I just do --nobest when updating? I could probably build it from a source RPM from another RH-based distro, but I'd rather keep it clean with the same distro.
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
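A rough sketch of how to check the options discussed above before falling back to --nobest (the source RPM name is illustrative, not from this thread):

# does any enabled repository provide the missing soname?
dnf repoquery --whatprovides 'liburing.so.2()(64bit)'
# --nobest lets the transaction proceed, but it will simply keep ceph-base 17.2.3 instead of pulling in 17.2.6
dnf update --nobest
# alternatively, rebuild a liburing 2.x source RPM locally and install the result
rpmbuild --rebuild liburing-2.0-1.el9.src.rpm
dnf install ~/rpmbuild/RPMS/x86_64/liburing-2*.rpm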
[ceph-users] Re: MDS nodes blocklisted
Hi Nathan, On Mon, Jul 31, 2023 at 4:34 PM Nathan Harper wrote: > > Hi, > > We're having sporadic problems with a CephFS filesystem where MDSs end up > on the OSD blocklist. We're still digging around looking for a cause > (Ceph related or other infrastructure cause). The monitors can blocklist the MDS in various cases. Monitor logs would be a good place to start looking as to why this is happening (frequently). > > The cluster isn't massive (68 OSDs spread over 34 hosts), each host is a > VM, with MGR/MON/MDS on non-OSD hosts. > > Running Ceph 16.2.10 > > Any suggestions for debugging this further? > ___ > ceph-users mailing list -- ceph-users@ceph.io > To unsubscribe send an email to ceph-users-le...@ceph.io > -- Cheers, Venky ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
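A rough sketch of where to start looking (log path and grep patterns are illustrative; adjust to your deployment):

# which client/MDS addresses are currently blocklisted, and until when
ceph osd blocklist ls
# the monitors log why they evicted/replaced an MDS, e.g. missed beacons
grep -iE 'blocklist|no beacon|replacing' /var/log/ceph/ceph-mon.*.log
# the grace period the mons allow before declaring an MDS laggy
ceph config get mon mds_beacon_grace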
[ceph-users] Re: cephfs snapshot mirror peer_bootstrap import hung
Hi Anantha, On Fri, Aug 4, 2023 at 2:27 AM Adiga, Anantha wrote: > > Hi > > Could you please provide guidance on how to diagnose this issue: > > In this case, there are two Ceph clusters: cluster A, 4 nodes and cluster B, > 3 node, in different locations. Both are already running RGW multi-site, A > is master. > > Cephfs snapshot mirroring is being configured on the clusters. Cluster A is > the primary, cluster B is the peer. Cephfs snapshot mirroring is being > configured. The bootstrap import step on the primary node hangs. > > On the target cluster : > --- > "version": "16.2.5", > "release": "pacific", > "release_type": "stable" > > root@cr21meg16ba0101:/# ceph fs snapshot mirror peer_bootstrap create cephfs > client.mirror_remote flex2-site > {"token": > "eyJmc2lkIjogImE2ZjUyNTk4LWU1Y2QtNGEwOC04NDIyLTdiNmZkYjFkNWRiZSIsICJmaWxlc3lzdGVtIjogImNlcGhmcyIsICJ1c2VyIjogImNsaWVudC5taXJyb3JfcmVtb3RlIiwgInNpdGVfbmFtZSI6ICJmbGV4Mi1zaXRlIiwgImtleSI6ICJBUUNmd01sa005MHBMQkFBd1h0dnBwOGowNEl2Qzh0cXBBRzliQT09IiwgIm1vbl9ob3N0IjogIlt2MjoxNzIuMTguNTUuNzE6MzMwMC8wLHYxOjE3Mi4xOC41NS43MTo2Nzg5LzBdIFt2MjoxNzIuMTguNTUuNzM6MzMwMC8wLHYxOjE3Mi4xOC41NS43Mzo2Nzg5LzBdIn0="} Seems fine uptil here. > root@cr21meg16ba0101:/var/run/ceph# > > On the source cluster: > > "version": "17.2.6", > "release": "quincy", > "release_type": "stable" > > root@fl31ca104ja0201:/# ceph -s > cluster: > id: d0a3b6e0-d2c3-11ed-be05-a7a3a1d7a87e > health: HEALTH_OK > > services: > mon: 3 daemons, quorum > fl31ca104ja0202,fl31ca104ja0203,fl31ca104ja0201 (age 111m) > mgr: fl31ca104ja0201.nwpqlh(active, since 11h), standbys: > fl31ca104ja0203, fl31ca104ja0202 > mds: 1/1 daemons up, 2 standby > osd: 44 osds: 44 up (since 111m), 44 in (since 4w) > cephfs-mirror: 1 daemon active (1 hosts) > rgw: 3 daemons active (3 hosts, 1 zones) > > data: > volumes: 1/1 healthy > pools: 25 pools, 769 pgs > objects: 614.40k objects, 1.9 TiB > usage: 2.8 TiB used, 292 TiB / 295 TiB avail > pgs: 769 active+clean > > root@fl31ca104ja0302:/# ceph mgr module enable mirroring > module 'mirroring' is already enabled > root@fl31ca104ja0302:/# ceph fs snapshot mirror peer_bootstrap import cephfs > eyJmc2lkIjogImE2ZjUyNTk4LWU1Y2QtNGEwOC04NDIyLTdiNmZkYjFkNWRiZSIsICJmaWxlc3lzdGVtIjogImNlcGhmcyIsICJ1c2VyIjogImNsaWVudC5taXJyb3JfcmVtb3RlIiwgInNpdGVfbmFtZSI6ICJmbGV4Mi1zaXRlIiwgImtleSI6ICJBUUNmd01sa005MHBMQkFBd1h0dnBwOGowNEl2Qzh0cXBBRzliQT09IiwgIm1vbl9ob3N0IjogIlt2MjoxNzIuMTguNTUuNzE6MzMwMC8wLHYxOjE3Mi4xOC41NS43MTo2Nzg5LzBdIFt2MjoxNzIuMTguNTUuNzM6MzMwMC8wLHYxOjE3Mi4xOC41NS43Mzo2Nzg5LzBdIn0= Going by your description, I'm guessing this is the command that hangs? If that's the case, set `debug_mgr=20`, repeat the token import step and share the ceph-mgr log. 
Also note that you can check the mirror daemon status as detailed in https://docs.ceph.com/en/latest/dev/cephfs-mirroring/#mirror-daemon-status > > > root@fl31ca104ja0302:/var/run/ceph# ceph --admin-daemon > /var/run/ceph/ceph-client.cephfs-mirror.fl31ca104ja0302.sypagt.7.94083135960976.asok > status > { > "metadata": { > "ceph_sha1": "d7ff0d10654d2280e08f1ab989c7cdf3064446a5", > "ceph_version": "ceph version 17.2.6 > (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable)", > "entity_id": "cephfs-mirror.fl31ca104ja0302.sypagt", > "hostname": "fl31ca104ja0302", > "pid": "7", > "root": "/" > }, > "dentry_count": 0, > "dentry_pinned_count": 0, > "id": 5194553, > "inst": { > "name": { > "type": "client", > "num": 5194553 > }, > "addr": { > "type": "v1", > "addr": "10.45.129.5:0", > "nonce": 2497002034 > } > }, > "addr": { > "type": "v1", > "addr": "10.45.129.5:0", > "nonce": 2497002034 > }, > "inst_str": "client.5194553 10.45.129.5:0/2497002034", > "addr_str": "10.45.129.5:0/2497002034", > "inode_count": 1, > "mds_epoch": 118, > "osd_epoch": 6266, > "osd_epoch_barrier": 0, > "blocklisted": false, > "fs_name": "cephfs" > } > > root@fl31ca104ja0302:/home/general# docker logs > ceph-d0a3b6e0-d2c3-11ed-be05-a7a3a1d7a87e-cephfs-mirror-fl31ca104ja0302-sypagt > --tail 10 > debug 2023-08-03T05:24:27.413+ 7f8eb6fc0280 0 ceph version 17.2.6 > (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable), process > cephfs-mirror, pid 7 > debug 2023-08-03T05:24:27.413+ 7f8eb6fc0280 0 pidfile_write: ignore > empty --pid-file > debug 2023-08-03T05:24:27.445+ 7f8eb6fc0280 1 mgrc > service_daemon_register cephfs-mirror.5184622 metadata > {arch=x86_64,ceph_release=quincy,ceph_version=ceph version 17.2.6 > (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy > (stable),ceph_version_short=17.2.6,
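A sketch of the debug step suggested above, assuming the containerized deployment shown in this thread (container name and token are placeholders):

# on the primary cluster: raise mgr debug logging, repeat the import, then capture the active mgr log
ceph config set mgr debug_mgr 20
ceph fs snapshot mirror peer_bootstrap import cephfs <token>
# grab the active mgr's log, e.g. from its container
docker logs ceph-<fsid>-mgr-<host>-<id> --tail 500
# remove the override when done
ceph config rm mgr debug_mgr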
[ceph-users] snapshot timestamp
Hi, We know a snapshot represents a point in time. Is this point in time tracked internally by some sort of sequence number, by the timestamp shown by "snap ls", or by something else? I noticed that with "deep cp", the timestamps of all snapshots are changed to the copy time. Say I create a snapshot at 1PM and make a copy at 3PM; the timestamp of the snapshot in the copy is 3PM. If I roll back the copy to this snapshot, I'd assume it will actually bring me back to the state at 1PM. Is that correct? If the above is true, I won't be able to rely on timestamps to track snapshots. Say I create a snapshot every hour and make a backup by copy at the end of the day. Then the original image is damaged and the backup is used to restore the work. On this backup image, how do I know which snapshot was taken at 1PM, which at 2PM, etc.? Any advice on how to track snapshots properly in such a case? I can definitely build something else to help with this, but I'd like to know how much Ceph can support it. Thanks! Tony ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
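One way to make the original point in time survive a copy (not from this thread; pool/image names are illustrative) is to encode it in the snapshot name, since "rbd deep cp" copies snapshots by name even though it rewrites their timestamps:

rbd snap create rbd/myimage@hourly-$(date -u +%Y%m%dT%H%MZ)
rbd deep cp rbd/myimage rbd/myimage-backup
# the backup's snapshots keep their names, so the encoded time is still visible
rbd snap ls rbd/myimage-backup
rbd snap rollback rbd/myimage-backup@hourly-20230804T1300Z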
[ceph-users] What's the max of snap ID?
Hi, There is a snap ID for each snapshot. How is this ID allocated? Sequentially? I did some tests, and it seems this ID is per pool, starting from 4 and always going up. Is that correct? What is the max of this ID? What happens when the ID reaches the max, does it wrap around and start from 4 again? Thanks! Tony ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
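For reference, a quick sketch of where those IDs can be inspected (pool/image names are illustrative):

# each snapshot entry in the JSON output carries its numeric "id"
rbd snap ls --format json rbd/myimage
# per-pool snapshot sequence counters appear in the osd map's pool dump
ceph osd dump --format json-pretty | grep -E '"pool_name"|"snap_seq"'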
[ceph-users] Re: cephfs snapshot mirror peer_bootstrap import hung
Attached log file -Original Message- From: Adiga, Anantha Sent: Thursday, August 3, 2023 5:50 PM To: ceph-users@ceph.io Subject: [ceph-users] Re: cephfs snapshot mirror peer_bootstrap import hung Adding additional info: The cluster A and B both have the same name: ceph and each has a single filesystem with the same name cephfs. Is that the issue ? Tried using peer_add command and it is hanging as well: root@fl31ca104ja0201:/#ls /etc/ceph/ cr_ceph.conf client.mirror_remote.keying ceph.client.admin.keyring ceph.conf (remote cluster) root@cr21meg16ba0101:/etc/ceph# ls /etc/ceph ceph.client.admin.keyring ceph.conf ceph.mon.keyring root@fl31ca104ja0201:/# ceph fs snapshot mirror peer_add cephfs client.mirror_remote@cr_ceph cephfs v2:172.18.55.71:3300,v1:172.18.55.71:6789],[v2:172.18.55.72:3300,v1:172.18.55.72:6789],[v2:172.18.55.73:3300,v1:172.18.55.73:6789 AQCfwMlkM90pLBAAwXtvpp8j04IvC8tqpAG9bA== Hi Could you please provide guidance on how to diagnose this issue: In this case, there are two Ceph clusters: cluster A, 4 nodes and cluster B, 3 node, in different locations. Both are already running RGW multi-site, A is master. Cephfs snapshot mirroring is being configured on the clusters. Cluster A is the primary, cluster B is the peer. Cephfs snapshot mirroring is being configured. The bootstrap import step on the primary node hangs. On the target cluster : --- "version": "16.2.5", "release": "pacific", "release_type": "stable" root@cr21meg16ba0101:/# ceph fs snapshot mirror peer_bootstrap create cephfs client.mirror_remote flex2-site {"token": "eyJmc2lkIjogImE2ZjUyNTk4LWU1Y2QtNGEwOC04NDIyLTdiNmZkYjFkNWRiZSIsICJmaWxlc3lzdGVtIjogImNlcGhmcyIsICJ1c2VyIjogImNsaWVudC5taXJyb3JfcmVtb3RlIiwgInNpdGVfbmFtZSI6ICJmbGV4Mi1zaXRlIiwgImtleSI6ICJBUUNmd01sa005MHBMQkFBd1h0dnBwOGowNEl2Qzh0cXBBRzliQT09IiwgIm1vbl9ob3N0IjogIlt2MjoxNzIuMTguNTUuNzE6MzMwMC8wLHYxOjE3Mi4xOC41NS43MTo2Nzg5LzBdIFt2MjoxNzIuMTguNTUuNzM6MzMwMC8wLHYxOjE3Mi4xOC41NS43Mzo2Nzg5LzBdIn0="} root@cr21meg16ba0101:/var/run/ceph# On the source cluster: "version": "17.2.6", "release": "quincy", "release_type": "stable" root@fl31ca104ja0201:/# ceph -s cluster: id: d0a3b6e0-d2c3-11ed-be05-a7a3a1d7a87e health: HEALTH_OK services: mon: 3 daemons, quorum fl31ca104ja0202,fl31ca104ja0203,fl31ca104ja0201 (age 111m) mgr: fl31ca104ja0201.nwpqlh(active, since 11h), standbys: fl31ca104ja0203, fl31ca104ja0202 mds: 1/1 daemons up, 2 standby osd: 44 osds: 44 up (since 111m), 44 in (since 4w) cephfs-mirror: 1 daemon active (1 hosts) rgw: 3 daemons active (3 hosts, 1 zones) data: volumes: 1/1 healthy pools: 25 pools, 769 pgs objects: 614.40k objects, 1.9 TiB usage: 2.8 TiB used, 292 TiB / 295 TiB avail pgs: 769 active+clean root@fl31ca104ja0302:/# ceph mgr module enable mirroring module 'mirroring' is already enabled root@fl31ca104ja0302:/# ceph fs snapshot mirror peer_bootstrap import cephfs eyJmc2lkIjogImE2ZjUyNTk4LWU1Y2QtNGEwOC04NDIyLTdiNmZkYjFkNWRiZSIsICJmaWxlc3lzdGVtIjogImNlcGhmcyIsICJ1c2VyIjogImNsaWVudC5taXJyb3JfcmVtb3RlIiwgInNpdGVfbmFtZSI6ICJmbGV4Mi1zaXRlIiwgImtleSI6ICJBUUNmd01sa005MHBMQkFBd1h0dnBwOGowNEl2Qzh0cXBBRzliQT09IiwgIm1vbl9ob3N0IjogIlt2MjoxNzIuMTguNTUuNzE6MzMwMC8wLHYxOjE3Mi4xOC41NS43MTo2Nzg5LzBdIFt2MjoxNzIuMTguNTUuNzM6MzMwMC8wLHYxOjE3Mi4xOC41NS43Mzo2Nzg5LzBdIn0= root@fl31ca104ja0201:/# ceph fs snapshot mirror daemon status [{"daemon_id": 5300887, "filesystems": [{"filesystem_id": 1, "name": "cephfs", "directory_count": 0, "peers": []}]}] root@fl31ca104ja0302:/var/run/ceph# ceph --admin-daemon 
/var/run/ceph/ceph-client.cephfs-mirror.fl31ca104ja0302.sypagt.7.94083135960976.asok status { "metadata": { "ceph_sha1": "d7ff0d10654d2280e08f1ab989c7cdf3064446a5", "ceph_version": "ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable)", "entity_id": "cephfs-mirror.fl31ca104ja0302.sypagt", "hostname": "fl31ca104ja0302", "pid": "7", "root": "/" }, "dentry_count": 0, "dentry_pinned_count": 0, "id": 5194553, "inst": { "name": { "type": "client", "num": 5194553 }, "addr": { "type": "v1", "addr": "10.45.129.5:0", "nonce": 2497002034 } }, "addr": { "type": "v1", "addr": "10.45.129.5:0", "nonce": 2497002034 }, "inst_str": "client.5194553 10.45.129.5:0/2497002034", "addr_str": "10.45.129.5:0/2497002034", "inode_count": 1, "mds_epoch": 118, "osd_epoch": 6266, "osd_epoch_barrier": 0, "blocklisted": false, "fs_name": "cephfs" } root@fl31ca104ja0302:/home/general# docker logs ceph-d0a3b6e0-d2c3-11ed-be05-a7a3a1d7a87e-cephfs-mirror-fl31ca104ja0302-sypagt --tail 10 debug 2023-08-03T05:24:27.4
[ceph-users] Re: cephfs snapshot mirror peer_bootstrap import hung
Adding additional info: The cluster A and B both have the same name: ceph and each has a single filesystem with the same name cephfs. Is that the issue ? Tried using peer_add command and it is hanging as well: root@fl31ca104ja0201:/#ls /etc/ceph/ cr_ceph.conf client.mirror_remote.keying ceph.client.admin.keyring ceph.conf (remote cluster) root@cr21meg16ba0101:/etc/ceph# ls /etc/ceph ceph.client.admin.keyring ceph.conf ceph.mon.keyring root@fl31ca104ja0201:/# ceph fs snapshot mirror peer_add cephfs client.mirror_remote@cr_ceph cephfs v2:172.18.55.71:3300,v1:172.18.55.71:6789],[v2:172.18.55.72:3300,v1:172.18.55.72:6789],[v2:172.18.55.73:3300,v1:172.18.55.73:6789 AQCfwMlkM90pLBAAwXtvpp8j04IvC8tqpAG9bA== Hi Could you please provide guidance on how to diagnose this issue: In this case, there are two Ceph clusters: cluster A, 4 nodes and cluster B, 3 node, in different locations. Both are already running RGW multi-site, A is master. Cephfs snapshot mirroring is being configured on the clusters. Cluster A is the primary, cluster B is the peer. Cephfs snapshot mirroring is being configured. The bootstrap import step on the primary node hangs. On the target cluster : --- "version": "16.2.5", "release": "pacific", "release_type": "stable" root@cr21meg16ba0101:/# ceph fs snapshot mirror peer_bootstrap create cephfs client.mirror_remote flex2-site {"token": "eyJmc2lkIjogImE2ZjUyNTk4LWU1Y2QtNGEwOC04NDIyLTdiNmZkYjFkNWRiZSIsICJmaWxlc3lzdGVtIjogImNlcGhmcyIsICJ1c2VyIjogImNsaWVudC5taXJyb3JfcmVtb3RlIiwgInNpdGVfbmFtZSI6ICJmbGV4Mi1zaXRlIiwgImtleSI6ICJBUUNmd01sa005MHBMQkFBd1h0dnBwOGowNEl2Qzh0cXBBRzliQT09IiwgIm1vbl9ob3N0IjogIlt2MjoxNzIuMTguNTUuNzE6MzMwMC8wLHYxOjE3Mi4xOC41NS43MTo2Nzg5LzBdIFt2MjoxNzIuMTguNTUuNzM6MzMwMC8wLHYxOjE3Mi4xOC41NS43Mzo2Nzg5LzBdIn0="} root@cr21meg16ba0101:/var/run/ceph# On the source cluster: "version": "17.2.6", "release": "quincy", "release_type": "stable" root@fl31ca104ja0201:/# ceph -s cluster: id: d0a3b6e0-d2c3-11ed-be05-a7a3a1d7a87e health: HEALTH_OK services: mon: 3 daemons, quorum fl31ca104ja0202,fl31ca104ja0203,fl31ca104ja0201 (age 111m) mgr: fl31ca104ja0201.nwpqlh(active, since 11h), standbys: fl31ca104ja0203, fl31ca104ja0202 mds: 1/1 daemons up, 2 standby osd: 44 osds: 44 up (since 111m), 44 in (since 4w) cephfs-mirror: 1 daemon active (1 hosts) rgw: 3 daemons active (3 hosts, 1 zones) data: volumes: 1/1 healthy pools: 25 pools, 769 pgs objects: 614.40k objects, 1.9 TiB usage: 2.8 TiB used, 292 TiB / 295 TiB avail pgs: 769 active+clean root@fl31ca104ja0302:/# ceph mgr module enable mirroring module 'mirroring' is already enabled root@fl31ca104ja0302:/# ceph fs snapshot mirror peer_bootstrap import cephfs eyJmc2lkIjogImE2ZjUyNTk4LWU1Y2QtNGEwOC04NDIyLTdiNmZkYjFkNWRiZSIsICJmaWxlc3lzdGVtIjogImNlcGhmcyIsICJ1c2VyIjogImNsaWVudC5taXJyb3JfcmVtb3RlIiwgInNpdGVfbmFtZSI6ICJmbGV4Mi1zaXRlIiwgImtleSI6ICJBUUNmd01sa005MHBMQkFBd1h0dnBwOGowNEl2Qzh0cXBBRzliQT09IiwgIm1vbl9ob3N0IjogIlt2MjoxNzIuMTguNTUuNzE6MzMwMC8wLHYxOjE3Mi4xOC41NS43MTo2Nzg5LzBdIFt2MjoxNzIuMTguNTUuNzM6MzMwMC8wLHYxOjE3Mi4xOC41NS43Mzo2Nzg5LzBdIn0= root@fl31ca104ja0201:/# ceph fs snapshot mirror daemon status [{"daemon_id": 5300887, "filesystems": [{"filesystem_id": 1, "name": "cephfs", "directory_count": 0, "peers": []}]}] root@fl31ca104ja0302:/var/run/ceph# ceph --admin-daemon /var/run/ceph/ceph-client.cephfs-mirror.fl31ca104ja0302.sypagt.7.94083135960976.asok status { "metadata": { "ceph_sha1": "d7ff0d10654d2280e08f1ab989c7cdf3064446a5", "ceph_version": "ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable)", 
"entity_id": "cephfs-mirror.fl31ca104ja0302.sypagt", "hostname": "fl31ca104ja0302", "pid": "7", "root": "/" }, "dentry_count": 0, "dentry_pinned_count": 0, "id": 5194553, "inst": { "name": { "type": "client", "num": 5194553 }, "addr": { "type": "v1", "addr": "10.45.129.5:0", "nonce": 2497002034 } }, "addr": { "type": "v1", "addr": "10.45.129.5:0", "nonce": 2497002034 }, "inst_str": "client.5194553 10.45.129.5:0/2497002034", "addr_str": "10.45.129.5:0/2497002034", "inode_count": 1, "mds_epoch": 118, "osd_epoch": 6266, "osd_epoch_barrier": 0, "blocklisted": false, "fs_name": "cephfs" } root@fl31ca104ja0302:/home/general# docker logs ceph-d0a3b6e0-d2c3-11ed-be05-a7a3a1d7a87e-cephfs-mirror-fl31ca104ja0302-sypagt --tail 10 debug 2023-08-03T05:24:27.413+ 7f8eb6fc0280 0 ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable), process cephfs-mirror, pid 7 debug 2023-08-03T05:24:27.413+ 7f8eb6fc0280 0 pidfile_write: ignore
[ceph-users] Ceph Quincy and liburing.so.2 on Rocky Linux 9
I've been digging and I can't see that this has come up anywhere.

I'm trying to update a client from Quincy 17.2.3-2 to 17.2.6-4 and I'm getting the error

Error:
 Problem: cannot install the best update candidate for package ceph-base-2:17.2.3-2.el9s.x86_64
  - nothing provides liburing.so.2()(64bit) needed by ceph-base-2:17.2.6-4.el9s.x86_64
  - nothing provides liburing.so.2(LIBURING_2.0)(64bit) needed by ceph-base-2:17.2.6-4.el9s.x86_64
(try to add '--skip-broken' to skip uninstallable packages or '--nobest' to use not only best candidate packages)

Did Ceph Quincy switch to requiring liburing 2? Rocky 9 only provides 0.7-7. CentOS Stream seems to have 1.0.7-3 (at least back to when I set up that repo on Foreman; I don't remember if I'm keeping it up-to-date).

Can I/should I just do --nobest when updating? I could probably build it from a source RPM from another RH-based distro, but I'd rather keep it clean with the same distro.
___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: cephfs snapshot mirror peer_bootstrap import hung
Tried using peer_add command and it is hanging as well: root@fl31ca104ja0201:/# ceph fs snapshot mirror peer_add cephfs client.mirror_remote@cr_ceph cephfs v2:172.18.55.71:3300,v1:172.18.55.71:6789],[v2:172.18.55.72:3300,v1:172.18.55.72:6789],[v2:172.18.55.73:3300,v1:172.18.55.73:6789 AQCfwMlkM90pLBAAwXtvpp8j04IvC8tqpAG9bA== -Original Message- From: Adiga, Anantha Sent: Thursday, August 3, 2023 2:31 PM To: ceph-users@ceph.io Subject: [ceph-users] Re: cephfs snapshot mirror peer_bootstrap import hung Hi Could you please provide guidance on how to diagnose this issue: In this case, there are two Ceph clusters: cluster A, 4 nodes and cluster B, 3 node, in different locations. Both are already running RGW multi-site, A is master. Cephfs snapshot mirroring is being configured on the clusters. Cluster A is the primary, cluster B is the peer. Cephfs snapshot mirroring is being configured. The bootstrap import step on the primary node hangs. On the target cluster : --- "version": "16.2.5", "release": "pacific", "release_type": "stable" root@cr21meg16ba0101:/# ceph fs snapshot mirror peer_bootstrap create cephfs client.mirror_remote flex2-site {"token": "eyJmc2lkIjogImE2ZjUyNTk4LWU1Y2QtNGEwOC04NDIyLTdiNmZkYjFkNWRiZSIsICJmaWxlc3lzdGVtIjogImNlcGhmcyIsICJ1c2VyIjogImNsaWVudC5taXJyb3JfcmVtb3RlIiwgInNpdGVfbmFtZSI6ICJmbGV4Mi1zaXRlIiwgImtleSI6ICJBUUNmd01sa005MHBMQkFBd1h0dnBwOGowNEl2Qzh0cXBBRzliQT09IiwgIm1vbl9ob3N0IjogIlt2MjoxNzIuMTguNTUuNzE6MzMwMC8wLHYxOjE3Mi4xOC41NS43MTo2Nzg5LzBdIFt2MjoxNzIuMTguNTUuNzM6MzMwMC8wLHYxOjE3Mi4xOC41NS43Mzo2Nzg5LzBdIn0="} root@cr21meg16ba0101:/var/run/ceph# On the source cluster: "version": "17.2.6", "release": "quincy", "release_type": "stable" root@fl31ca104ja0201:/# ceph -s cluster: id: d0a3b6e0-d2c3-11ed-be05-a7a3a1d7a87e health: HEALTH_OK services: mon: 3 daemons, quorum fl31ca104ja0202,fl31ca104ja0203,fl31ca104ja0201 (age 111m) mgr: fl31ca104ja0201.nwpqlh(active, since 11h), standbys: fl31ca104ja0203, fl31ca104ja0202 mds: 1/1 daemons up, 2 standby osd: 44 osds: 44 up (since 111m), 44 in (since 4w) cephfs-mirror: 1 daemon active (1 hosts) rgw: 3 daemons active (3 hosts, 1 zones) data: volumes: 1/1 healthy pools: 25 pools, 769 pgs objects: 614.40k objects, 1.9 TiB usage: 2.8 TiB used, 292 TiB / 295 TiB avail pgs: 769 active+clean root@fl31ca104ja0302:/# ceph mgr module enable mirroring module 'mirroring' is already enabled root@fl31ca104ja0302:/# ceph fs snapshot mirror peer_bootstrap import cephfs eyJmc2lkIjogImE2ZjUyNTk4LWU1Y2QtNGEwOC04NDIyLTdiNmZkYjFkNWRiZSIsICJmaWxlc3lzdGVtIjogImNlcGhmcyIsICJ1c2VyIjogImNsaWVudC5taXJyb3JfcmVtb3RlIiwgInNpdGVfbmFtZSI6ICJmbGV4Mi1zaXRlIiwgImtleSI6ICJBUUNmd01sa005MHBMQkFBd1h0dnBwOGowNEl2Qzh0cXBBRzliQT09IiwgIm1vbl9ob3N0IjogIlt2MjoxNzIuMTguNTUuNzE6MzMwMC8wLHYxOjE3Mi4xOC41NS43MTo2Nzg5LzBdIFt2MjoxNzIuMTguNTUuNzM6MzMwMC8wLHYxOjE3Mi4xOC41NS43Mzo2Nzg5LzBdIn0= root@fl31ca104ja0201:/# ceph fs snapshot mirror daemon status [{"daemon_id": 5300887, "filesystems": [{"filesystem_id": 1, "name": "cephfs", "directory_count": 0, "peers": []}]}] root@fl31ca104ja0302:/var/run/ceph# ceph --admin-daemon /var/run/ceph/ceph-client.cephfs-mirror.fl31ca104ja0302.sypagt.7.94083135960976.asok status { "metadata": { "ceph_sha1": "d7ff0d10654d2280e08f1ab989c7cdf3064446a5", "ceph_version": "ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable)", "entity_id": "cephfs-mirror.fl31ca104ja0302.sypagt", "hostname": "fl31ca104ja0302", "pid": "7", "root": "/" }, "dentry_count": 0, "dentry_pinned_count": 0, "id": 5194553, "inst": { "name": { "type": 
"client", "num": 5194553 }, "addr": { "type": "v1", "addr": "10.45.129.5:0", "nonce": 2497002034 } }, "addr": { "type": "v1", "addr": "10.45.129.5:0", "nonce": 2497002034 }, "inst_str": "client.5194553 10.45.129.5:0/2497002034", "addr_str": "10.45.129.5:0/2497002034", "inode_count": 1, "mds_epoch": 118, "osd_epoch": 6266, "osd_epoch_barrier": 0, "blocklisted": false, "fs_name": "cephfs" } root@fl31ca104ja0302:/home/general# docker logs ceph-d0a3b6e0-d2c3-11ed-be05-a7a3a1d7a87e-cephfs-mirror-fl31ca104ja0302-sypagt --tail 10 debug 2023-08-03T05:24:27.413+ 7f8eb6fc0280 0 ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable), process cephfs-mirror, pid 7 debug 2023-08-03T05:24:27.413+ 7f8eb6fc0280 0 pidfile_write: ignore empty --pid-file debug 2023-08-03T05:24:27.445+ 7f8eb6fc0280 1 mgrc service_daemon_register cephfs-mirror.5184622 metadata {arch=x86_64,ceph_release=quincy,ceph_version=ceph version 17.2.6 (d7ff0d1065
[ceph-users] Re: unbalanced OSDs
Take a look at https://github.com/TheJJ/ceph-balancer We switched to it after lot of attempts to make internal balancer work as expected and now we have ~even OSD utilization across cluster: # ./placementoptimizer.py -v balance --ensure-optimal-moves --ensure-variance-decrease [2023-08-03 23:33:27,954] gathering cluster state via ceph api... [2023-08-03 23:33:36,081] running pg balancer [2023-08-03 23:33:36,088] current OSD fill rate per crushclasses: [2023-08-03 23:33:36,089] ssd: average=49.86%, median=50.27%, without_placement_constraints=53.01% [2023-08-03 23:33:36,090] cluster variance for crushclasses: [2023-08-03 23:33:36,090] ssd: 4.163 [2023-08-03 23:33:36,090] min osd.14 44.698% [2023-08-03 23:33:36,090] max osd.22 51.897% [2023-08-03 23:33:36,101] in descending full-order, couldn't empty osd.22, so we're done. if you want to try more often, set --max-full-move-attempts=$nr, this may unlock more balancing possibilities. [2023-08-03 23:33:36,101] [2023-08-03 23:33:36,101] generated 0 remaps. [2023-08-03 23:33:36,101] total movement size: 0.0B. [2023-08-03 23:33:36,102] [2023-08-03 23:33:36,102] old cluster variance per crushclass: [2023-08-03 23:33:36,102] ssd: 4.163 [2023-08-03 23:33:36,102] old min osd.14 44.698% [2023-08-03 23:33:36,102] old max osd.22 51.897% [2023-08-03 23:33:36,102] [2023-08-03 23:33:36,103] new min osd.14 44.698% [2023-08-03 23:33:36,103] new max osd.22 51.897% [2023-08-03 23:33:36,103] new cluster variance: [2023-08-03 23:33:36,103] ssd: 4.163 [2023-08-03 23:33:36,103] On 03.08.2023 16:38, Spiros Papageorgiou wrote: On 03-Aug-23 12:11 PM, Eugen Block wrote: ceph balancer status I changed the PGs and it started rebalancing (and turned autoscaler off) , so now it will not report status: It reports: "optimize_result": "Too many objects (0.088184 > 0.05) are misplaced; try again later" Lets wait a few hours to see what happens... Thanx! Sp ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: cephfs snapshot mirror peer_bootstrap import hung
Hi Could you please provide guidance on how to diagnose this issue: In this case, there are two Ceph clusters: cluster A, 4 nodes and cluster B, 3 node, in different locations. Both are already running RGW multi-site, A is master. Cephfs snapshot mirroring is being configured on the clusters. Cluster A is the primary, cluster B is the peer. Cephfs snapshot mirroring is being configured. The bootstrap import step on the primary node hangs. On the target cluster : --- "version": "16.2.5", "release": "pacific", "release_type": "stable" root@cr21meg16ba0101:/# ceph fs snapshot mirror peer_bootstrap create cephfs client.mirror_remote flex2-site {"token": "eyJmc2lkIjogImE2ZjUyNTk4LWU1Y2QtNGEwOC04NDIyLTdiNmZkYjFkNWRiZSIsICJmaWxlc3lzdGVtIjogImNlcGhmcyIsICJ1c2VyIjogImNsaWVudC5taXJyb3JfcmVtb3RlIiwgInNpdGVfbmFtZSI6ICJmbGV4Mi1zaXRlIiwgImtleSI6ICJBUUNmd01sa005MHBMQkFBd1h0dnBwOGowNEl2Qzh0cXBBRzliQT09IiwgIm1vbl9ob3N0IjogIlt2MjoxNzIuMTguNTUuNzE6MzMwMC8wLHYxOjE3Mi4xOC41NS43MTo2Nzg5LzBdIFt2MjoxNzIuMTguNTUuNzM6MzMwMC8wLHYxOjE3Mi4xOC41NS43Mzo2Nzg5LzBdIn0="} root@cr21meg16ba0101:/var/run/ceph# On the source cluster: "version": "17.2.6", "release": "quincy", "release_type": "stable" root@fl31ca104ja0201:/# ceph -s cluster: id: d0a3b6e0-d2c3-11ed-be05-a7a3a1d7a87e health: HEALTH_OK services: mon: 3 daemons, quorum fl31ca104ja0202,fl31ca104ja0203,fl31ca104ja0201 (age 111m) mgr: fl31ca104ja0201.nwpqlh(active, since 11h), standbys: fl31ca104ja0203, fl31ca104ja0202 mds: 1/1 daemons up, 2 standby osd: 44 osds: 44 up (since 111m), 44 in (since 4w) cephfs-mirror: 1 daemon active (1 hosts) rgw: 3 daemons active (3 hosts, 1 zones) data: volumes: 1/1 healthy pools: 25 pools, 769 pgs objects: 614.40k objects, 1.9 TiB usage: 2.8 TiB used, 292 TiB / 295 TiB avail pgs: 769 active+clean root@fl31ca104ja0302:/# ceph mgr module enable mirroring module 'mirroring' is already enabled root@fl31ca104ja0302:/# ceph fs snapshot mirror peer_bootstrap import cephfs eyJmc2lkIjogImE2ZjUyNTk4LWU1Y2QtNGEwOC04NDIyLTdiNmZkYjFkNWRiZSIsICJmaWxlc3lzdGVtIjogImNlcGhmcyIsICJ1c2VyIjogImNsaWVudC5taXJyb3JfcmVtb3RlIiwgInNpdGVfbmFtZSI6ICJmbGV4Mi1zaXRlIiwgImtleSI6ICJBUUNmd01sa005MHBMQkFBd1h0dnBwOGowNEl2Qzh0cXBBRzliQT09IiwgIm1vbl9ob3N0IjogIlt2MjoxNzIuMTguNTUuNzE6MzMwMC8wLHYxOjE3Mi4xOC41NS43MTo2Nzg5LzBdIFt2MjoxNzIuMTguNTUuNzM6MzMwMC8wLHYxOjE3Mi4xOC41NS43Mzo2Nzg5LzBdIn0= root@fl31ca104ja0201:/# ceph fs snapshot mirror daemon status [{"daemon_id": 5300887, "filesystems": [{"filesystem_id": 1, "name": "cephfs", "directory_count": 0, "peers": []}]}] root@fl31ca104ja0302:/var/run/ceph# ceph --admin-daemon /var/run/ceph/ceph-client.cephfs-mirror.fl31ca104ja0302.sypagt.7.94083135960976.asok status { "metadata": { "ceph_sha1": "d7ff0d10654d2280e08f1ab989c7cdf3064446a5", "ceph_version": "ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable)", "entity_id": "cephfs-mirror.fl31ca104ja0302.sypagt", "hostname": "fl31ca104ja0302", "pid": "7", "root": "/" }, "dentry_count": 0, "dentry_pinned_count": 0, "id": 5194553, "inst": { "name": { "type": "client", "num": 5194553 }, "addr": { "type": "v1", "addr": "10.45.129.5:0", "nonce": 2497002034 } }, "addr": { "type": "v1", "addr": "10.45.129.5:0", "nonce": 2497002034 }, "inst_str": "client.5194553 10.45.129.5:0/2497002034", "addr_str": "10.45.129.5:0/2497002034", "inode_count": 1, "mds_epoch": 118, "osd_epoch": 6266, "osd_epoch_barrier": 0, "blocklisted": false, "fs_name": "cephfs" } root@fl31ca104ja0302:/home/general# docker logs 
ceph-d0a3b6e0-d2c3-11ed-be05-a7a3a1d7a87e-cephfs-mirror-fl31ca104ja0302-sypagt --tail 10 debug 2023-08-03T05:24:27.413+ 7f8eb6fc0280 0 ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable), process cephfs-mirror, pid 7 debug 2023-08-03T05:24:27.413+ 7f8eb6fc0280 0 pidfile_write: ignore empty --pid-file debug 2023-08-03T05:24:27.445+ 7f8eb6fc0280 1 mgrc service_daemon_register cephfs-mirror.5184622 metadata {arch=x86_64,ceph_release=quincy,ceph_version=ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable),ceph_version_short=17.2.6,container_hostname=fl31ca104ja0302,container_image=quay.io/ceph/ceph@sha256:af79fedafc42237b7612fe2d18a9c64ca62a0b38ab362e614ad671efa4a0547e,cpu=Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz,distro=centos,distro_description=CentOS Stream 8,distro_version=8,hostname=fl31ca104ja0302,id=fl31ca104ja0302.sypagt,instance_id=5184622,kernel_description=#82-Ub untu SMP Tue Jun 6 23:10:23 UTC 2023,kernel_version=5.15.0-75-generic,mem_swap_kb=8388604,mem_tot
[ceph-users] cephfs snapshot mirror peer_bootstrap import hung
Hi Could you please provide guidance on how to diagnose this issue: In this case, there are two Ceph clusters: cluster A, 4 nodes and cluster B, 3 node, in different locations. Both are already running RGW multi-site, A is master. Cephfs snapshot mirroring is being configured on the clusters. Cluster A is the primary, cluster B is the peer. Cephfs snapshot mirroring is being configured. The bootstrap import step on the primary node hangs. On the target cluster : --- "version": "16.2.5", "release": "pacific", "release_type": "stable" root@cr21meg16ba0101:/# ceph fs snapshot mirror peer_bootstrap create cephfs client.mirror_remote flex2-site {"token": "eyJmc2lkIjogImE2ZjUyNTk4LWU1Y2QtNGEwOC04NDIyLTdiNmZkYjFkNWRiZSIsICJmaWxlc3lzdGVtIjogImNlcGhmcyIsICJ1c2VyIjogImNsaWVudC5taXJyb3JfcmVtb3RlIiwgInNpdGVfbmFtZSI6ICJmbGV4Mi1zaXRlIiwgImtleSI6ICJBUUNmd01sa005MHBMQkFBd1h0dnBwOGowNEl2Qzh0cXBBRzliQT09IiwgIm1vbl9ob3N0IjogIlt2MjoxNzIuMTguNTUuNzE6MzMwMC8wLHYxOjE3Mi4xOC41NS43MTo2Nzg5LzBdIFt2MjoxNzIuMTguNTUuNzM6MzMwMC8wLHYxOjE3Mi4xOC41NS43Mzo2Nzg5LzBdIn0="} root@cr21meg16ba0101:/var/run/ceph# On the source cluster: "version": "17.2.6", "release": "quincy", "release_type": "stable" root@fl31ca104ja0201:/# ceph -s cluster: id: d0a3b6e0-d2c3-11ed-be05-a7a3a1d7a87e health: HEALTH_OK services: mon: 3 daemons, quorum fl31ca104ja0202,fl31ca104ja0203,fl31ca104ja0201 (age 111m) mgr: fl31ca104ja0201.nwpqlh(active, since 11h), standbys: fl31ca104ja0203, fl31ca104ja0202 mds: 1/1 daemons up, 2 standby osd: 44 osds: 44 up (since 111m), 44 in (since 4w) cephfs-mirror: 1 daemon active (1 hosts) rgw: 3 daemons active (3 hosts, 1 zones) data: volumes: 1/1 healthy pools: 25 pools, 769 pgs objects: 614.40k objects, 1.9 TiB usage: 2.8 TiB used, 292 TiB / 295 TiB avail pgs: 769 active+clean root@fl31ca104ja0302:/# ceph mgr module enable mirroring module 'mirroring' is already enabled root@fl31ca104ja0302:/# ceph fs snapshot mirror peer_bootstrap import cephfs eyJmc2lkIjogImE2ZjUyNTk4LWU1Y2QtNGEwOC04NDIyLTdiNmZkYjFkNWRiZSIsICJmaWxlc3lzdGVtIjogImNlcGhmcyIsICJ1c2VyIjogImNsaWVudC5taXJyb3JfcmVtb3RlIiwgInNpdGVfbmFtZSI6ICJmbGV4Mi1zaXRlIiwgImtleSI6ICJBUUNmd01sa005MHBMQkFBd1h0dnBwOGowNEl2Qzh0cXBBRzliQT09IiwgIm1vbl9ob3N0IjogIlt2MjoxNzIuMTguNTUuNzE6MzMwMC8wLHYxOjE3Mi4xOC41NS43MTo2Nzg5LzBdIFt2MjoxNzIuMTguNTUuNzM6MzMwMC8wLHYxOjE3Mi4xOC41NS43Mzo2Nzg5LzBdIn0= root@fl31ca104ja0302:/var/run/ceph# ceph --admin-daemon /var/run/ceph/ceph-client.cephfs-mirror.fl31ca104ja0302.sypagt.7.94083135960976.asok status { "metadata": { "ceph_sha1": "d7ff0d10654d2280e08f1ab989c7cdf3064446a5", "ceph_version": "ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable)", "entity_id": "cephfs-mirror.fl31ca104ja0302.sypagt", "hostname": "fl31ca104ja0302", "pid": "7", "root": "/" }, "dentry_count": 0, "dentry_pinned_count": 0, "id": 5194553, "inst": { "name": { "type": "client", "num": 5194553 }, "addr": { "type": "v1", "addr": "10.45.129.5:0", "nonce": 2497002034 } }, "addr": { "type": "v1", "addr": "10.45.129.5:0", "nonce": 2497002034 }, "inst_str": "client.5194553 10.45.129.5:0/2497002034", "addr_str": "10.45.129.5:0/2497002034", "inode_count": 1, "mds_epoch": 118, "osd_epoch": 6266, "osd_epoch_barrier": 0, "blocklisted": false, "fs_name": "cephfs" } root@fl31ca104ja0302:/home/general# docker logs ceph-d0a3b6e0-d2c3-11ed-be05-a7a3a1d7a87e-cephfs-mirror-fl31ca104ja0302-sypagt --tail 10 debug 2023-08-03T05:24:27.413+ 7f8eb6fc0280 0 ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable), process cephfs-mirror, pid 7 debug 
2023-08-03T05:24:27.413+ 7f8eb6fc0280 0 pidfile_write: ignore empty --pid-file debug 2023-08-03T05:24:27.445+ 7f8eb6fc0280 1 mgrc service_daemon_register cephfs-mirror.5184622 metadata {arch=x86_64,ceph_release=quincy,ceph_version=ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable),ceph_version_short=17.2.6,container_hostname=fl31ca104ja0302,container_image=quay.io/ceph/ceph@sha256:af79fedafc42237b7612fe2d18a9c64ca62a0b38ab362e614ad671efa4a0547e,cpu=Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz,distro=centos,distro_description=CentOS Stream 8,distro_version=8,hostname=fl31ca104ja0302,id=fl31ca104ja0302.sypagt,instance_id=5184622,kernel_description=#82-Ubuntu SMP Tue Jun 6 23:10:23 UTC 2023,kernel_version=5.15.0-75-generic,mem_swap_kb=8388604,mem_total_kb=527946928,os=Linux} debug 2023-08-03T05:27:10.419+ 7f8ea1b2c700 0 client.5194553 ms_handle_reset on v2:10.45.128.141:3300/0 debug 2023-08-03T05:50:10.917+ 7f8ea1b2c700
[ceph-users] ceph-csi-cephfs - InvalidArgument desc = provided secret is empty
I’m attempting to set up the CephFS CSI on K3s managed by Rancher against an external CephFS using the Helm chart. I’m using all default values on the Helm chart except for cephConf and secret. I’ve verified that the configmap ceph-config gets created with the values from Helm, and I’ve verified that the secret csi-cephfs-secret also gets created with the same values, as seen below. Any attempt to create a PVC results in the following error. The only posts I’ve found are about expansion, and I am not trying to expand a CephFS volume, just create one.

I0803 19:23:39.715036 1 event.go:298] Event(v1.ObjectReference{Kind:"PersistentVolumeClaim", Namespace:"coder", Name:"test", UID:"9c7e51b6-0321-48e1-9950-444f786c14fb", APIVersion:"v1", ResourceVersion:"4523108", FieldPath:""}): type: 'Warning' reason: 'ProvisioningFailed' failed to provision volume with StorageClass "cephfs": rpc error: code = InvalidArgument desc = provided secret is empty

cephConfConfigMapName: ceph-config cephconf: | [global] fsid = 9b98ccd8-450e-4172-af70-512e4e77bc36 mon_host = [v2:10.0.5.11:3300/0,v1:10.0.5.11:6789/0] [v2:10.0.5.12:3300/0,v1:10.0.5.12:6789/0] [v2:10.0.5.13:3300/0,v1:10.0.5.13:6789/0] commonLabels: {} configMapName: ceph-csi-config csiConfig: null driverName: cephfs.csi.ceph.com externallyManagedConfigmap: false kubeletDir: /var/lib/kubelet logLevel: 5 nodeplugin: affinity: {} fusemountoptions: '' httpMetrics: containerPort: 8081 enabled: true service: annotations: {} clusterIP: '' enabled: true externalIPs: null loadBalancerIP: '' loadBalancerSourceRanges: null servicePort: 8080 type: ClusterIP imagePullSecrets: null kernelmountoptions: '' name: nodeplugin nodeSelector: {} plugin: image: pullPolicy: IfNotPresent repository: quay.io/cephcsi/cephcsi tag: v3.9.0 resources: {} priorityClassName: system-node-critical profiling: enabled: false registrar: image: pullPolicy: IfNotPresent repository: registry.k8s.io/sig-storage/csi-node-driver-registrar tag: v2.8.0 resources: {} tolerations: null updateStrategy: RollingUpdate pluginSocketFile: csi.sock provisioner: affinity: {} enableHostNetwork: false httpMetrics: containerPort: 8081 enabled: true service: annotations: {} clusterIP: '' enabled: true externalIPs: null loadBalancerIP: '' loadBalancerSourceRanges: null servicePort: 8080 type: ClusterIP imagePullSecrets: null name: provisioner nodeSelector: {} priorityClassName: system-cluster-critical profiling: enabled: false provisioner: extraArgs: null image: pullPolicy: IfNotPresent repository: registry.k8s.io/sig-storage/csi-provisioner tag: v3.5.0 resources: {} replicaCount: 3 resizer: enabled: true extraArgs: null image: pullPolicy: IfNotPresent repository: registry.k8s.io/sig-storage/csi-resizer tag: v1.8.0 name: resizer resources: {} setmetadata: true snapshotter: extraArgs: null image: pullPolicy: IfNotPresent repository: registry.k8s.io/sig-storage/csi-snapshotter tag: v6.2.2 resources: {} strategy: rollingUpdate: maxUnavailable: 50% type: RollingUpdate timeout: 60s tolerations: null provisionerSocketFile: csi-provisioner.sock rbac: create: true secret: adminID: adminKey: create: true name: csi-cephfs-secret selinuxMount: true serviceAccounts: nodeplugin: create: true name: null provisioner: create: true name: null sidecarLogLevel: 1 storageClass: allowVolumeExpansion: true annotations: {} clusterID: controllerExpandSecret: csi-cephfs-secret controllerExpandSecretNamespace: '' create: false fsName: myfs fuseMountOptions: '' kernelMountOptions: '' mountOptions: null mounter: '' name: csi-cephfs-sc
nodeStageSecret: csi-cephfs-secret nodeStageSecretNamespace: '' pool: '' provisionerSecret: csi-cephfs-secret provisionerSecretNamespace: '' reclaimPolicy: Delete volumeNamePrefix: '' global: cattle: clusterId: c-m-xschvkd5 clusterName: dev-cluster rkePathPrefix: '' rkeWindowsPathPrefix: '' systemProjectId: p-g6rqs url: https://rancher.example.com ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
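In the values shown above, secret.adminID/adminKey, csiConfig and storageClass.clusterID are all empty, which is consistent with the "provided secret is empty" error. A hedged sketch of filling them in, whether applied through Rancher's chart options or a plain helm upgrade (release/chart/namespace names and the key are placeholders, the fsid and monitor addresses are taken from the posted cephconf):

cat > csi-overrides.yaml <<'EOF'
csiConfig:
  - clusterID: 9b98ccd8-450e-4172-af70-512e4e77bc36
    monitors:
      - "10.0.5.11:6789"
      - "10.0.5.12:6789"
      - "10.0.5.13:6789"
secret:
  adminID: admin
  adminKey: "AQD...replace-with-the-real-client-key...=="
storageClass:
  clusterID: 9b98ccd8-450e-4172-af70-512e4e77bc36
EOF
helm upgrade --reuse-values -f csi-overrides.yaml \
  ceph-csi-cephfs ceph-csi/ceph-csi-cephfs -n <namespace>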
[ceph-users] Backfill Performance for
I am in the process of expanding our cluster capacity by ~50% and have noticed some unexpected behavior during the backfill and recovery process that I'd like to understand, and see if there is a better configuration that will yield a faster and smoother backfill.

Pool Information:
OSDs: 243 spinning HDDs
PGs: 1024 (yes, this is low for our number of disks)

I inherited the cluster and it has the following settings, which seem to have been done in an attempt to get the cluster to recover quickly:
osd_max_backfills: 6 (default is 1)
osd_recovery_sleep_hdd: 0.0 (default is 0.1)
osd_recovery_max_active_hdd: 9

When watching the PGs recover I am noticing a few things:
- All PGs seem to be backfilling at the same time, which seems to be in violation of osd_max_backfills. I understand that there should be 6 readers and 6 writers at a time, but I'm seeing a given OSD participate in more than 6 PG backfills. Is an OSD only considered as backfilling if it is not present in both the UP and ACTING groups (e.g. it will have its data altered)?
- Some PGs are recovering at a much slower rate than others (some as little as kilobytes per second) despite the disks all being of a similar speed. Is there some way to dig into why that may be?
- In general, the recovery is happening very slowly (between 1 and 5 objects per second per PG). Is it possible the settings above are too aggressive and are causing performance degradation due to disk thrashing?
- Currently, all misplaced PGs are backfilling. If I were to change some of the settings above (specifically `osd_max_backfills`), would that essentially pause backfilling PGs, or will those backfills have to end and then start over when the wait is done?
- Given that all PGs are backfilling simultaneously, there is no way to prioritize one PG over another (we have some disks with very high usage that we're trying to reduce). Would reducing those max backfills allow for proper prioritization of PGs with force-backfill?
- We have had some OSDs restart during the process and their misplaced object count is now zero, but they are incrementing their recovering objects bytes. Is that expected, and is there a way to estimate when that will complete?

Thanks for the help! -Jonathan
___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
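For reference, a hedged sketch of inspecting and dialing back the knobs mentioned above at runtime (OSD id, values and PG id are illustrative; my understanding is that lowering osd_max_backfills only throttles new backfill reservations rather than restarting in-flight ones):

# what a given OSD is actually running with
ceph config show osd.0 | grep -E 'osd_max_backfills|osd_recovery_sleep_hdd|osd_recovery_max_active_hdd'
# dial the settings back cluster-wide
ceph config set osd osd_max_backfills 2
ceph config set osd osd_recovery_sleep_hdd 0.1
ceph config set osd osd_recovery_max_active_hdd 3
# see which PGs are backfilling right now, and push an urgent one to the front of the queue
ceph pg dump pgs_brief | grep backfill
ceph pg force-backfill <pgid>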
[ceph-users] Re: Luminous Bluestore issues and RGW Multi-site Recovery
Hi, Can you show `smartctl -a` for this device? Does this drive show input/output errors in dmesg when you try to run ceph-osd? k Sent from my iPhone > On 2 Aug 2023, at 21:44, Greg O'Neill wrote: > > Syslog says the drive is not in write-protect mode, however SMART says life > remaining is at 1%. ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
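For completeness, a minimal sketch of the two checks mentioned above (sdX is a placeholder for the actual device):

# full SMART dump; the attributes include wearout / remaining-life counters
smartctl -a /dev/sdX
# did the kernel log I/O errors for that device?
dmesg -T | grep -i sdX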
[ceph-users] Re: unbalanced OSDs
On 03-Aug-23 12:11 PM, Eugen Block wrote: > ceph balancer status I changed the PGs and it started rebalancing (and turned the autoscaler off), so now it will not report a status. It reports: "optimize_result": "Too many objects (0.088184 > 0.05) are misplaced; try again later" Let's wait a few hours to see what happens... Thanx! Sp ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: [EXTERNAL] Upgrading nautilus / centos7 to octopus / ubuntu 20.04. - Suggestions and hints?
We went through this exercise, though our starting point was ubuntu 16.04 / nautilus. We reduced our double builds as follows:
1. Rebuild each monitor host on 18.04/bionic and rejoin, still on nautilus
2. Upgrade all mons, mgrs (and rgws optionally) to pacific
3. Convert each mon, mgr, rgw to cephadm and enable orchestrator
4. Rebuild each mon, mgr, rgw on 20.04/focal and rejoin the pacific cluster
5. Drain and rebuild each osd host on focal and pacific

This has the advantage of only having to drain and rebuild the OSD hosts once. Double building the control cluster hosts isn’t so bad, and orchestrator makes all of the ceph parts easy once it’s enabled. The biggest challenge we ran into was https://tracker.ceph.com/issues/51652 because we still had a lot of filestore osds. It’s frustrating, but we managed to get through it without much client interruption on a dozen prod clusters, most of which were 38 osd hosts and 912 total osds each.

One thing which helped was, before beginning the osd host builds, to set all of the old osds' primary-affinity to something <1. This way, when the new pacific (or octopus) osds join the cluster, they will automatically be favored for primary on their pgs. If a heartbeat timeout storm starts to get out of control, start by setting nodown and noout; the flapping osds are the worst. Then figure out which osds are the culprit and restart them. Hopefully your nautilus osds are all bluestore and you won’t have this problem. We put up with it, because the filestore to bluestore conversion was one of the most important parts of this upgrade for us.

Best of luck, whatever route you take. Regards, Josh Beaman

From: Götz Reinicke Date: Tuesday, August 1, 2023 at 1:01 PM To: ceph-users@ceph.io Subject: [EXTERNAL] [ceph-users] Upgrading nautilus / centos7 to octopus / ubuntu 20.04. - Suggestions and hints?

Hi, As I’ve read and thought a lot about this migration, since it is a bigger project, I was wondering if anyone has done it already and might share some notes or playbooks, because in everything I read there were some parts that were missing or hard for me to understand. I have some different approaches in mind, so maybe you have some suggestions or hints.
a) Upgrade nautilus on centos 7, with the few missing features like dashboard and prometheus. After that, migrate one node after another to ubuntu 20.04 with octopus, and then upgrade ceph to the recent stable version.
b) Migrate one node after another to ubuntu 18.04 with nautilus, then upgrade to octopus and after that to ubuntu 20.04. or
c) Upgrade one node after another to ubuntu 20.04 with octopus and join it to the cluster, until all nodes are upgraded.
As a test I tried c) with a mon node, but adding it to the cluster fails with some failed state, still probing for the other mons. (I don't have the right log at hand right now.) So my questions are: a) What would be the best (most stable) migration path, and b) is it in general possible to add a new octopus mon (not an upgraded one) to a nautilus cluster, where the other mons are still on nautilus? I hope my thoughts and questions are understandable :) Thanks for any hint and suggestion. Best . Götz
___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
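A sketch of the primary-affinity and flag handling described above (the affinity value is illustrative; pick whatever you want the old osds weighted at):

# before the osd-host rebuilds: make the old osds less likely to be primaries,
# so new osds take over primaries as they join
for id in $(ceph osd ls); do ceph osd primary-affinity "$id" 0.5; done
# if flapping osds start a heartbeat storm during the rebuilds:
ceph osd set nodown
ceph osd set noout
# ... restart the offending osds, then clear the flags
ceph osd unset nodown
ceph osd unset noout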
[ceph-users] Re: unbalanced OSDs
Turn off the autoscaler and increase pg_num to 512 or so (power of 2). The recommendation is to have between 100 and 150 PGs per OSD (incl. replicas). And then let the balancer handle the rest. What is the current balancer status (ceph balancer status)? Zitat von Spiros Papageorgiou : Hi all, I have a ceph cluster with 3 nodes. ceph version is 16.2.9. There are 7 SSD OSDs on each server and one pool that resides on these OSDs. My OSDs are terribly unbalanced: ID CLASS WEIGHT REWEIGHT SIZE RAW USE DATA OMAP META AVAIL %USE VAR PGS STATUS TYPE NAME -9 28.42200 - 28 TiB 9.3 TiB 9.2 TiB 161 MiB 26 GiB 19 TiB 32.56 1.09 - root ssddisks -2 9.47400 - 9.5 TiB 3.4 TiB 3.4 TiB 66 MiB 9.2 GiB 6.1 TiB 35.52 1.19 - host px1-ssd 0 ssd 1.74599 0.85004 1.7 TiB 810 GiB 807 GiB 3.2 MiB 2.3 GiB 978 GiB 45.28 1.51 26 up osd.0 5 ssd 0.82999 0.85004 850 GiB 581 GiB 580 GiB 22 MiB 912 MiB 269 GiB 68.38 2.29 19 up osd.5 6 ssd 0.82999 1.0 850 GiB 8.2 GiB 7.8 GiB 9.5 MiB 435 MiB 842 GiB 0.97 0.03 4 up osd.6 7 ssd 0.82999 1.0 850 GiB 294 GiB 293 GiB 26 MiB 591 MiB 556 GiB 34.60 1.16 11 up osd.7 16 ssd 1.74599 0.85004 1.7 TiB 872 GiB 869 GiB 3.1 MiB 2.3 GiB 916 GiB 48.75 1.63 27 up osd.16 23 ssd 1.74599 1.0 1.7 TiB 438 GiB 436 GiB 1.5 MiB 1.7 GiB 1.3 TiB 24.48 0.82 14 up osd.23 24 ssd 1.74599 1.0 1.7 TiB 444 GiB 443 GiB 1.6 MiB 1.0 GiB 1.3 TiB 24.81 0.83 17 up osd.24 -6 9.47400 - 9.5 TiB 2.9 TiB 2.9 TiB 46 MiB 8.1 GiB 6.6 TiB 30.39 1.02 - host px2-ssd 12 ssd 0.82999 1.0 850 GiB 154 GiB 154 GiB 21 MiB 368 MiB 696 GiB 18.16 0.61 9 up osd.12 13 ssd 0.82999 1.0 850 GiB 144 GiB 143 GiB 527 KiB 469 MiB 706 GiB 16.92 0.57 4 up osd.13 14 ssd 0.82999 1.0 850 GiB 149 GiB 149 GiB 16 MiB 299 MiB 700 GiB 17.58 0.59 7 up osd.14 29 ssd 1.74599 1.0 1.7 TiB 449 GiB 448 GiB 1.6 MiB 1.4 GiB 1.3 TiB 25.11 0.84 20 up osd.29 30 ssd 1.74599 0.85004 1.7 TiB 885 GiB 882 GiB 3.1 MiB 2.3 GiB 903 GiB 49.48 1.65 31 up osd.30 31 ssd 1.74599 1.0 1.7 TiB 728 GiB 727 GiB 2.6 MiB 1.8 GiB 1.0 TiB 40.74 1.36 22 up osd.31 32 ssd 1.74599 1.0 1.7 TiB 438 GiB 437 GiB 1.6 MiB 1.4 GiB 1.3 TiB 24.51 0.82 15 up osd.32 -4 9.47400 - 9.5 TiB 3.0 TiB 3.0 TiB 49 MiB 8.7 GiB 6.5 TiB 31.78 1.06 - host px3-ssd 19 ssd 0.82999 1.0 850 GiB 293 GiB 292 GiB 14 MiB 500 MiB 557 GiB 34.47 1.15 9 up osd.19 20 ssd 0.82999 1.0 850 GiB 290 GiB 290 GiB 10 MiB 482 MiB 560 GiB 34.15 1.14 10 up osd.20 21 ssd 0.82999 1.0 850 GiB 148 GiB 147 GiB 16 MiB 428 MiB 702 GiB 17.36 0.58 5 up osd.21 25 ssd 1.74599 1.0 1.7 TiB 446 GiB 445 GiB 1.8 MiB 1.6 GiB 1.3 TiB 24.96 0.83 19 up osd.25 26 ssd 1.74599 1.0 1.7 TiB 739 GiB 737 GiB 2.6 MiB 2.0 GiB 1.0 TiB 41.33 1.38 29 up osd.26 27 ssd 1.74599 1.0 1.7 TiB 725 GiB 723 GiB 2.6 MiB 2.1 GiB 1.0 TiB 40.55 1.36 21 up osd.27 28 ssd 1.74599 1.0 1.7 TiB 442 GiB 440 GiB 1.6 MiB 1.7 GiB 1.3 TiB 24.72 0.83 17 up osd.28 I have done a "ceph osd reweight-by-utilization" and "ceph osd set-require-min-compat-client luminous". The pool has 32 PGs which were set by autoscale_mode, which is on. Why are my OSDs, so unbalanced? I have osd.5 with 68.3% and osd.6 with 0.97% Also when the reweight-by-utilization, osd.5 utilization actually increased... What am i missing here? Sp ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
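As a sketch of the steps above (pool name is a placeholder; upmap mode assumes min-compat-client is already luminous, which the original post says it is):

ceph osd pool set <ssd-pool> pg_autoscale_mode off
ceph osd pool set <ssd-pool> pg_num 512
# once the PG split and backfill settle, let the upmap balancer even out the placement
ceph balancer mode upmap
ceph balancer on
ceph balancer status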
[ceph-users] unbalanced OSDs
Hi all, I have a ceph cluster with 3 nodes. ceph version is 16.2.9. There are 7 SSD OSDs on each server and one pool that resides on these OSDs. My OSDs are terribly unbalanced: ID CLASS WEIGHT REWEIGHT SIZE RAW USE DATA OMAP META AVAIL %USE VAR PGS STATUS TYPE NAME -9 28.42200 - 28 TiB 9.3 TiB 9.2 TiB 161 MiB 26 GiB 19 TiB 32.56 1.09 - root ssddisks -2 9.47400 - 9.5 TiB 3.4 TiB 3.4 TiB 66 MiB 9.2 GiB 6.1 TiB 35.52 1.19 - host px1-ssd 0 ssd 1.74599 0.85004 1.7 TiB 810 GiB 807 GiB 3.2 MiB 2.3 GiB 978 GiB 45.28 1.51 26 up osd.0 5 ssd 0.82999 0.85004 850 GiB 581 GiB 580 GiB 22 MiB 912 MiB 269 GiB 68.38 2.29 19 up osd.5 6 ssd 0.82999 1.0 850 GiB 8.2 GiB 7.8 GiB 9.5 MiB 435 MiB 842 GiB 0.97 0.03 4 up osd.6 7 ssd 0.82999 1.0 850 GiB 294 GiB 293 GiB 26 MiB 591 MiB 556 GiB 34.60 1.16 11 up osd.7 16 ssd 1.74599 0.85004 1.7 TiB 872 GiB 869 GiB 3.1 MiB 2.3 GiB 916 GiB 48.75 1.63 27 up osd.16 23 ssd 1.74599 1.0 1.7 TiB 438 GiB 436 GiB 1.5 MiB 1.7 GiB 1.3 TiB 24.48 0.82 14 up osd.23 24 ssd 1.74599 1.0 1.7 TiB 444 GiB 443 GiB 1.6 MiB 1.0 GiB 1.3 TiB 24.81 0.83 17 up osd.24 -6 9.47400 - 9.5 TiB 2.9 TiB 2.9 TiB 46 MiB 8.1 GiB 6.6 TiB 30.39 1.02 - host px2-ssd 12 ssd 0.82999 1.0 850 GiB 154 GiB 154 GiB 21 MiB 368 MiB 696 GiB 18.16 0.61 9 up osd.12 13 ssd 0.82999 1.0 850 GiB 144 GiB 143 GiB 527 KiB 469 MiB 706 GiB 16.92 0.57 4 up osd.13 14 ssd 0.82999 1.0 850 GiB 149 GiB 149 GiB 16 MiB 299 MiB 700 GiB 17.58 0.59 7 up osd.14 29 ssd 1.74599 1.0 1.7 TiB 449 GiB 448 GiB 1.6 MiB 1.4 GiB 1.3 TiB 25.11 0.84 20 up osd.29 30 ssd 1.74599 0.85004 1.7 TiB 885 GiB 882 GiB 3.1 MiB 2.3 GiB 903 GiB 49.48 1.65 31 up osd.30 31 ssd 1.74599 1.0 1.7 TiB 728 GiB 727 GiB 2.6 MiB 1.8 GiB 1.0 TiB 40.74 1.36 22 up osd.31 32 ssd 1.74599 1.0 1.7 TiB 438 GiB 437 GiB 1.6 MiB 1.4 GiB 1.3 TiB 24.51 0.82 15 up osd.32 -4 9.47400 - 9.5 TiB 3.0 TiB 3.0 TiB 49 MiB 8.7 GiB 6.5 TiB 31.78 1.06 - host px3-ssd 19 ssd 0.82999 1.0 850 GiB 293 GiB 292 GiB 14 MiB 500 MiB 557 GiB 34.47 1.15 9 up osd.19 20 ssd 0.82999 1.0 850 GiB 290 GiB 290 GiB 10 MiB 482 MiB 560 GiB 34.15 1.14 10 up osd.20 21 ssd 0.82999 1.0 850 GiB 148 GiB 147 GiB 16 MiB 428 MiB 702 GiB 17.36 0.58 5 up osd.21 25 ssd 1.74599 1.0 1.7 TiB 446 GiB 445 GiB 1.8 MiB 1.6 GiB 1.3 TiB 24.96 0.83 19 up osd.25 26 ssd 1.74599 1.0 1.7 TiB 739 GiB 737 GiB 2.6 MiB 2.0 GiB 1.0 TiB 41.33 1.38 29 up osd.26 27 ssd 1.74599 1.0 1.7 TiB 725 GiB 723 GiB 2.6 MiB 2.1 GiB 1.0 TiB 40.55 1.36 21 up osd.27 28 ssd 1.74599 1.0 1.7 TiB 442 GiB 440 GiB 1.6 MiB 1.7 GiB 1.3 TiB 24.72 0.83 17 up osd.28 I have done a "ceph osd reweight-by-utilization" and "ceph osd set-require-min-compat-client luminous". The pool has 32 PGs which were set by autoscale_mode, which is on. Why are my OSDs, so unbalanced? I have osd.5 with 68.3% and osd.6 with 0.97% Also when the reweight-by-utilization, osd.5 utilization actually increased... What am i missing here? Sp ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: ceph-volume lvm migrate error
Check out the ownership of the newly created DB device, according to your output it belongs to the root user. In the osd.log you probably should see something related to "permission denied". If you change it to ceph:ceph the OSD might start properly. Zitat von Roland Giesler : Ouch, I got exited too quickly! On 2023/08/02 21:27, Roland Giesler wrote: # systemctl start ceph-osd@14 And, viola!, it did it. # ls -la /var/lib/ceph/osd/ceph-14/block* lrwxrwxrwx 1 ceph ceph 50 Dec 25 2022 /var/lib/ceph/osd/ceph-14/block -> /dev/mapper/0GVWr9-dQ65-LHcx-y6fD-z7fI-10A9-gVWZkY lrwxrwxrwx 1 root root 10 Aug 2 21:17 /var/lib/ceph/osd/ceph-14/block.db -> /dev/dm-20 It crashed! # systemctl status ceph-osd@14 ● ceph-osd@14.service - Ceph object storage daemon osd.14 Loaded: loaded (/lib/systemd/system/ceph-osd@.service; enabled-runtime; vendor preset: enabled) Drop-In: /usr/lib/systemd/system/ceph-osd@.service.d └─ceph-after-pve-cluster.conf Active: failed (Result: exit-code) since Wed 2023-08-02 21:18:54 SAST; 10min ago Process: 520652 ExecStartPre=/usr/libexec/ceph/ceph-osd-prestart.sh --cluster ${CLUSTER} --id 14 (code=exited, status=0/SUCCESS) Process: 520660 ExecStart=/usr/bin/ceph-osd -f --cluster ${CLUSTER} --id 14 --setuser ceph --setgroup ceph (code=exited, status=1/FAILURE) Main PID: 520660 (code=exited, status=1/FAILURE) CPU: 90ms Aug 02 21:18:54 FT1-NodeC systemd[1]: ceph-osd@14.service: Scheduled restart job, restart counter is at 3. Aug 02 21:18:54 FT1-NodeC systemd[1]: Stopped Ceph object storage daemon osd.14. Aug 02 21:18:54 FT1-NodeC systemd[1]: ceph-osd@14.service: Start request repeated too quickly. Aug 02 21:18:54 FT1-NodeC systemd[1]: ceph-osd@14.service: Failed with result 'exit-code'. Aug 02 21:18:54 FT1-NodeC systemd[1]: Failed to start Ceph object storage daemon osd.14. Aug 02 21:28:49 FT1-NodeC systemd[1]: ceph-osd@14.service: Start request repeated too quickly. Aug 02 21:28:49 FT1-NodeC systemd[1]: ceph-osd@14.service: Failed with result 'exit-code'. Aug 02 21:28:49 FT1-NodeC systemd[1]: Failed to start Ceph object storage daemon osd.14. ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
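A minimal sketch of that fix, using the paths from the listing above (note the /dev/dm-20 mapping may differ after a reboot):

chown -h ceph:ceph /var/lib/ceph/osd/ceph-14/block.db
chown ceph:ceph /dev/dm-20
# clear the "start request repeated too quickly" state before retrying
systemctl reset-failed ceph-osd@14
systemctl start ceph-osd@14
# the osd log should no longer show a permission error on the DB device
tail -n 50 /var/log/ceph/ceph-osd.14.log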
[ceph-users] Re: ref v18.2.0 QE Validation status
Am 03/08/2023 um 00:30 schrieb Yuri Weinstein: > 1. bookworm distro build support > We will not build bookworm until Debian bug > https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1030129 is resolved FYI, there's also a bug in Debian's GCC 12, which is used by default in Debian Bookworm, that causes issues with the gf-complete erasure coding library and older AMD CPU's generating illegal instructions which then kills e.g. the ceph-mon https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1012935 Their workaround is to compile gf-complete explicitly with -O1: https://salsa.debian.org/openstack-team/third-party/gf-complete/-/commit/7751c075f868bf95873c6739d0d942f2a668c58f While we (Proxmox) saw it for Ceph Quincy and didn't yet confirm it for upcoming Ceph Reef, it's quite likely still there as the compiler here seems to be at fault (and gf-complete code didn't change since quincy FWICT). - Thomas ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: mgr services frequently crash on nodes 2,3,4
This mgr assert failure is fixed at https://github.com/ceph/ceph/pull/46688 You can upgrade to 16.2.13 to get the fix. Eugen Block 于2023年8月3日周四 14:57写道: > Can you query those config options yourself? > > storage01:~ # ceph config get mgr mgr/dashboard/standby_behaviour > storage01:~ # ceph config get mgr mgr/dashboard/AUDIT_API_ENABLED > > I'm not sure if those are responsible for the crash though. > > Zitat von "Adiga, Anantha" : > > > Hi, > > > > Mgr service crash frequently on nodes 2 3 and 4 with the same > > condition after the 4th node was added. > > > > root@zp3110b001a0104:/# ceph crash stat > > 19 crashes recorded > > 16 older than 1 days old: > > 2023-07-29T03:35:32.006309Z_7b622c2b-a2fc-425a-acb8-dc1673b4c189 > > 2023-07-29T03:35:32.055174Z_a2ee1e23-5f41-4dbe-86ff-643fbf870dc9 > > 2023-07-29T14:34:13.752432Z_39b6a0d9-1bc3-4481-9a14-c92fea6c2710 > > 2023-07-30T03:02:57.510867Z_df595e04-0ac2-4e3d-93be-a7225348ea19 > > 2023-07-30T06:20:09.322530Z_0c2485f8-281c-4440-8b08-89b08a669de4 > > 2023-07-30T10:16:46.798405Z_79082f37-ee08-4a2b-84d1-d96c4026f321 > > 2023-07-30T10:16:46.843441Z_788391d6-3278-48c4-a95b-1934ee3265c1 > > 2023-07-31T02:26:55.903966Z_416a1e94-a8e1-4057-a683-a907faf400a1 > > 2023-07-31T04:40:10.216044Z_bef9d811-4e92-45cd-bcd7-3282962c8dfe > > 2023-07-31T08:44:20.893344Z_037688ae-266f-4879-932c-2239f4679fd6 > > 2023-07-31T09:22:12.527968Z_f136c93b-7156-4176-a734-66a5a62513a4 > > 2023-07-31T15:22:08.417988Z_b80c6255-5eb3-41dd-b0b1-8bc5b070094f > > 2023-07-31T23:05:16.589501Z_20ed8ef9-a478-49de-a371-08ea7a9937e5 > > 2023-08-01T01:26:01.911387Z_670f9e3c-7fbe-497f-9f0b-abeaefd8f2b3 > > 2023-08-01T01:51:39.759874Z_ff8206e4-34aa-44fe-82ac-7339e6714bb7 > > 2023-08-01T01:56:21.955706Z_98c86cdd-45ec-47dc-8f0c-2e5e09731db8 > > 7 older than 3 days old: > > 2023-07-29T03:35:32.006309Z_7b622c2b-a2fc-425a-acb8-dc1673b4c189 > > 2023-07-29T03:35:32.055174Z_a2ee1e23-5f41-4dbe-86ff-643fbf870dc9 > > 2023-07-29T14:34:13.752432Z_39b6a0d9-1bc3-4481-9a14-c92fea6c2710 > > 2023-07-30T03:02:57.510867Z_df595e04-0ac2-4e3d-93be-a7225348ea19 > > 2023-07-30T06:20:09.322530Z_0c2485f8-281c-4440-8b08-89b08a669de4 > > 2023-07-30T10:16:46.798405Z_79082f37-ee08-4a2b-84d1-d96c4026f321 > > 2023-07-30T10:16:46.843441Z_788391d6-3278-48c4-a95b-1934ee3265c1 > > > > root@zp3110b001a0104 > :/var/lib/ceph/8dbfcd81-fee3-49d2-ac0c-e988c8be7178/crash/posted/2023-07-31T08:44:20.893344Z_037688ae-266f-4879-932c-2239f4679fd6# root@zp3110b001a0104:/var/lib/ceph/8dbfcd81-fee3-49d2-ac0c-e988c8be7178/crash/posted/2023-07-31T08:44:20.893344Z_037688ae-266f-4879-932c-2239f4679fd6#> > cat > > meta > > { > > "crash_id": > > "2023-07-31T08:44:20.893344Z_037688ae-266f-4879-932c-2239f4679fd6", > > "timestamp": "2023-07-31T08:44:20.893344Z", > > "process_name": "ceph-mgr", > > "entity_name": "mgr.zp3110b001a0104.tmbkzq", > > "ceph_version": "16.2.5", > > "utsname_hostname": "zp3110b001a0104", > > "utsname_sysname": "Linux", > > "utsname_release": "5.4.0-153-generic", > > "utsname_version": "#170-Ubuntu SMP Fri Jun 16 13:43:31 UTC 2023", > > "utsname_machine": "x86_64", > > "os_name": "CentOS Linux", > > "os_id": "centos", > > "os_version_id": "8", > > "os_version": "8", > > "assert_condition": "pending_service_map.epoch > service_map.epoch", > > "assert_func": "DaemonServer::got_service_map():: > ServiceMap&)>", > > "assert_file": > > > 
"/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.5/rpm/el8/BUILD/ceph-16.2.5/src/mgr/DaemonServer.cc", > > "assert_line": 2932, > > "assert_thread_name": "ms_dispatch", > > "assert_msg": > > > "/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.5/rpm/el8/BUILD/ceph-16.2.5/src/mgr/DaemonServer.cc: > In function 'DaemonServer::got_service_map()::' > thread 7f127440a700 time > 2023-07-31T08:44:20.887150+\n/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZE/gigantic/release/16.2.5/rpm/el8/BUILD/ceph-16.2.5/src/mgr/DaemonServer.cc: > 2932: FAILED ceph_assert(pending_service_map.epoch > > > service_map.epoch)\n", > > "backtrace": [ > > "/lib64/libpthread.so.0(+0x12b20) [0x7f127c611b20]", > > "gsignal()", > > "abort()", > > "(ceph::__ceph_assert_fail(char const*, char const*, int, > > char const*)+0x1a9) [0x7f127da26b75]", > > "/usr/lib64/ceph/libceph-common.so.2(+0x276d3e) > [0x7f127da26d3e]", > > "(DaemonServer::got_service_map()+0xb2d) [0x5625aee23a4d]", > > > > "(Mgr::handle_service_map(boost::intrusive_ptr)+0x1b6) > > [0x5625aee527c6]", > > "(Mgr::ms_dispatch2(boost::intrusive_ptr > > const&)+0x894) [0x5625aee55424]", > > "(MgrStandby::ms_dispatch2(boost::intrusive_ptr > > const&)+0xb0)