[ceph-users] Re: Building new cluster had a couple of questions
On 2023-12-22 03:28, Robert Sander wrote: Hi, On 22.12.23 11:41, Albert Shih wrote: for n in 1-100: take the OSDs on server n offline; uninstall docker on server n; install podman on server n; redeploy on server n; end. Yep, that's basically the procedure. But first try it on a test cluster. Regards For reference, this was also discussed about two years ago: https://www.spinics.net/lists/ceph-users/msg70108.html Worked for me. // Johan ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
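For anyone planning the same migration, here is a rough per-host sketch of that loop, assuming a cephadm-managed cluster on Debian/Ubuntu with the orchestrator reachable from another host; the package name and the daemon names below are placeholders, not taken from the thread:

# ceph osd set noout
# systemctl stop ceph.target                # stop the Ceph containers on this host
# apt-get remove --purge docker.io          # or whichever docker packages are installed
# apt-get install podman
# ceph orch daemon redeploy osd.12          # redeploy each daemon on this host so its unit files call podman instead of docker
# ceph osd unset noout                      # once the host's OSDs are back up and in

Check "ceph health" and wait for recovery to finish before moving on to the next host.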
[ceph-users] Re: Misplaced objects greater than 100%
I think this is resolved, and you're right about the 0-weight of the root bucket being strange. I had created the rack buckets with # ceph osd crush add-bucket rack-0 rack whereas I should have used something like # ceph osd crush add-bucket rack-0 rack root=default There's a bit in the documentation (https://docs.ceph.com/en/quincy/rados/operations/crush-map) that says "Not all keys need to be specified" (in a different context, I admit). I might have saved a second or two by omitting "root=default" and maybe half a minute by not checking the CRUSH map carefully afterwards. It was not worth it. // J On 2023-04-05 12:01, c...@elchaka.de wrote: I guess this is related to your crush rules. Unfortunately I don't know much about creating the rules... But someone could give more insights if you also provide a crush rule dump. Your "-1 0 root default" is a bit strange. On 1 April 2023 01:01:39 MESZ, Johan Hattne wrote: Here goes: # ceph -s cluster: id: e1327a10-8b8c-11ed-88b9-3cecef0e3946 health: HEALTH_OK services: mon: 5 daemons, quorum bcgonen-a,bcgonen-b,bcgonen-c,bcgonen-r0h0,bcgonen-r0h1 (age 16h) mgr: bcgonen-b.furndm(active, since 8d), standbys: bcgonen-a.qmmqxj mds: 1/1 daemons up, 2 standby osd: 36 osds: 36 up (since 16h), 36 in (since 3d); 1041 remapped pgs data: volumes: 1/1 healthy pools: 3 pools, 1041 pgs objects: 5.42M objects, 6.5 TiB usage: 19 TiB used, 428 TiB / 447 TiB avail pgs: 27087125/16252275 objects misplaced (166.667%) 1039 active+clean+remapped 2 active+clean+remapped+scrubbing+deep # ceph osd tree ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF -14 149.02008 rack rack-1 -7 149.02008 host bcgonen-r1h0 20 hdd 14.55269 osd.20 up 1.0 1.0 21 hdd 14.55269 osd.21 up 1.0 1.0 22 hdd 14.55269 osd.22 up 1.0 1.0 23 hdd 14.55269 osd.23 up 1.0 1.0 24 hdd 14.55269 osd.24 up 1.0 1.0 25 hdd 14.55269 osd.25 up 1.0 1.0 26 hdd 14.55269 osd.26 up 1.0 1.0 27 hdd 14.55269 osd.27 up 1.0 1.0 28 hdd 14.55269 osd.28 up 1.0 1.0 29 hdd 14.55269 osd.29 up 1.0 1.0 34 ssd 1.74660 osd.34 up 1.0 1.0 35 ssd 1.74660 osd.35 up 1.0 1.0 -13 298.04016 rack rack-0 -3 149.02008 host bcgonen-r0h0 0 hdd 14.55269 osd.0 up 1.0 1.0 1 hdd 14.55269 osd.1 up 1.0 1.0 2 hdd 14.55269 osd.2 up 1.0 1.0 3 hdd 14.55269 osd.3 up 1.0 1.0 4 hdd 14.55269 osd.4 up 1.0 1.0 5 hdd 14.55269 osd.5 up 1.0 1.0 6 hdd 14.55269 osd.6 up 1.0 1.0 7 hdd 14.55269 osd.7 up 1.0 1.0 8 hdd 14.55269 osd.8 up 1.0 1.0 9 hdd 14.55269 osd.9 up 1.0 1.0 30 ssd 1.74660 osd.30 up 1.0 1.0 31 ssd 1.74660 osd.31 up 1.0 1.0 -5 149.02008 host bcgonen-r0h1 10 hdd 14.55269 osd.10 up 1.0 1.0 11 hdd 14.55269 osd.11 up 1.0 1.0 12 hdd 14.55269 osd.12 up 1.0 1.0 13 hdd 14.55269 osd.13 up 1.0 1.0 14 hdd 14.55269 osd.14 up 1.0 1.0 15 hdd 14.55269 osd.15 up 1.0 1.0 16 hdd 14.55269 osd.16 up 1.0 1.0 17 hdd 14.55269 osd.17 up 1.0 1.0 18 hdd 14.55269 osd.18 up 1.0 1.0 19 hdd 14.55269 osd.19 up 1.0 1.0 32 ssd 1.74660 osd.32 up 1.0 1.0 33 ssd 1.74660 osd.33 up 1.0 1.0 -1 0 root default # ceph osd pool ls detail pool 1 '.mgr' replicated size 3 min_
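For what it's worth, a quick check right after creating a bucket would have caught the problem; a minimal sketch (the rack name is just the one from this thread):

# ceph osd crush add-bucket rack-0 rack root=default
# ceph osd tree | head            # rack-0 should now appear nested under "root default", not next to it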
[ceph-users] Re: Misplaced objects greater than 100%
Thanks Mehmet; I took a closer look at what I sent you and the problem appears to be in the CRUSH map. At some point since anything was last rebooted, I created rack buckets and moved the OSD nodes in under them: # ceph osd crush add-bucket rack-0 rack # ceph osd crush add-bucket rack-1 rack # ceph osd crush move bcgonen-r0h0 rack=rack-0 # ceph osd crush move bcgonen-r0h1 rack=rack-0 # ceph osd crush move bcgonen-r1h0 rack=rack-1 All seemed fine at the time; it was not until bcgonen-r1h0 was rebooted that stuff got weird. But as per "ceph osd tree" output, those rack buckets were sitting next to the default root as opposed to under it. Now that's fixed, and the cluster is backfilling remapped PGs. // J On 2023-03-31 16:01, Johan Hattne wrote: Here goes: # ceph -s cluster: id: e1327a10-8b8c-11ed-88b9-3cecef0e3946 health: HEALTH_OK services: mon: 5 daemons, quorum bcgonen-a,bcgonen-b,bcgonen-c,bcgonen-r0h0,bcgonen-r0h1 (age 16h) mgr: bcgonen-b.furndm(active, since 8d), standbys: bcgonen-a.qmmqxj mds: 1/1 daemons up, 2 standby osd: 36 osds: 36 up (since 16h), 36 in (since 3d); 1041 remapped pgs data: volumes: 1/1 healthy pools: 3 pools, 1041 pgs objects: 5.42M objects, 6.5 TiB usage: 19 TiB used, 428 TiB / 447 TiB avail pgs: 27087125/16252275 objects misplaced (166.667%) 1039 active+clean+remapped 2 active+clean+remapped+scrubbing+deep # ceph osd tree ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF -14 149.02008 rack rack-1 -7 149.02008 host bcgonen-r1h0 20 hdd 14.55269 osd.20 up 1.0 1.0 21 hdd 14.55269 osd.21 up 1.0 1.0 22 hdd 14.55269 osd.22 up 1.0 1.0 23 hdd 14.55269 osd.23 up 1.0 1.0 24 hdd 14.55269 osd.24 up 1.0 1.0 25 hdd 14.55269 osd.25 up 1.0 1.0 26 hdd 14.55269 osd.26 up 1.0 1.0 27 hdd 14.55269 osd.27 up 1.0 1.0 28 hdd 14.55269 osd.28 up 1.0 1.0 29 hdd 14.55269 osd.29 up 1.0 1.0 34 ssd 1.74660 osd.34 up 1.0 1.0 35 ssd 1.74660 osd.35 up 1.0 1.0 -13 298.04016 rack rack-0 -3 149.02008 host bcgonen-r0h0 0 hdd 14.55269 osd.0 up 1.0 1.0 1 hdd 14.55269 osd.1 up 1.0 1.0 2 hdd 14.55269 osd.2 up 1.0 1.0 3 hdd 14.55269 osd.3 up 1.0 1.0 4 hdd 14.55269 osd.4 up 1.0 1.0 5 hdd 14.55269 osd.5 up 1.0 1.0 6 hdd 14.55269 osd.6 up 1.0 1.0 7 hdd 14.55269 osd.7 up 1.0 1.0 8 hdd 14.55269 osd.8 up 1.0 1.0 9 hdd 14.55269 osd.9 up 1.0 1.0 30 ssd 1.74660 osd.30 up 1.0 1.0 31 ssd 1.74660 osd.31 up 1.0 1.0 -5 149.02008 host bcgonen-r0h1 10 hdd 14.55269 osd.10 up 1.0 1.0 11 hdd 14.55269 osd.11 up 1.0 1.0 12 hdd 14.55269 osd.12 up 1.0 1.0 13 hdd 14.55269 osd.13 up 1.0 1.0 14 hdd 14.55269 osd.14 up 1.0 1.0 15 hdd 14.55269 osd.15 up 1.0 1.0 16 hdd 14.55269 osd.16 up 1.0 1.0 17 hdd 14.55269 osd.17 up 1.0 1.0 18 hdd 14.55269 osd.18 up 1.0 1.0 19 hdd 14.55269 osd.19 up 1.0 1.0 32 ssd 1.74660 osd.32 up 1.0 1.0 33 ssd 1.74660 osd.33 up 1.0 1.0 -1 0 root default # ceph osd pool ls detail pool 1 '.mgr' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 31 flags hashpspool stripe_width 0 pg_num_max 32 pg_num_min 1 application mgr pool 2 'cephfs.cephfs.meta' replicated size 3 min_size 2 crush_rule 2 object_hash rjenkins pg_num 16 pgp_num 16 autoscale_mode on last_change 9833 lfor 0/0/584 flags hashpspool stripe_width 0 pg_autoscale_bias 4 pg_num_min 16 recovery_priority 5 application cephfs pool 3 &
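In case someone else ends up with rack buckets dangling next to the root: the fix is presumably just to move them (and hence the hosts under them) back under the default root, along these lines; this is my reading of what was done, not literally quoted from the thread:

# ceph osd crush move rack-0 root=default
# ceph osd crush move rack-1 root=default
# ceph osd tree                   # both racks should now be nested under "root default"; backfill of the remapped PGs follows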
[ceph-users] Re: Misplaced objects greater than 100%
Here goes: # ceph -s cluster: id: e1327a10-8b8c-11ed-88b9-3cecef0e3946 health: HEALTH_OK services: mon: 5 daemons, quorum bcgonen-a,bcgonen-b,bcgonen-c,bcgonen-r0h0,bcgonen-r0h1 (age 16h) mgr: bcgonen-b.furndm(active, since 8d), standbys: bcgonen-a.qmmqxj mds: 1/1 daemons up, 2 standby osd: 36 osds: 36 up (since 16h), 36 in (since 3d); 1041 remapped pgs data: volumes: 1/1 healthy pools: 3 pools, 1041 pgs objects: 5.42M objects, 6.5 TiB usage: 19 TiB used, 428 TiB / 447 TiB avail pgs: 27087125/16252275 objects misplaced (166.667%) 1039 active+clean+remapped 2 active+clean+remapped+scrubbing+deep # ceph osd tree ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF -14 149.02008 rack rack-1 -7 149.02008 host bcgonen-r1h0 20 hdd 14.55269 osd.20 up 1.0 1.0 21 hdd 14.55269 osd.21 up 1.0 1.0 22 hdd 14.55269 osd.22 up 1.0 1.0 23 hdd 14.55269 osd.23 up 1.0 1.0 24 hdd 14.55269 osd.24 up 1.0 1.0 25 hdd 14.55269 osd.25 up 1.0 1.0 26 hdd 14.55269 osd.26 up 1.0 1.0 27 hdd 14.55269 osd.27 up 1.0 1.0 28 hdd 14.55269 osd.28 up 1.0 1.0 29 hdd 14.55269 osd.29 up 1.0 1.0 34 ssd 1.74660 osd.34 up 1.0 1.0 35 ssd 1.74660 osd.35 up 1.0 1.0 -13 298.04016 rack rack-0 -3 149.02008 host bcgonen-r0h0 0 hdd 14.55269 osd.0 up 1.0 1.0 1 hdd 14.55269 osd.1 up 1.0 1.0 2 hdd 14.55269 osd.2 up 1.0 1.0 3 hdd 14.55269 osd.3 up 1.0 1.0 4 hdd 14.55269 osd.4 up 1.0 1.0 5 hdd 14.55269 osd.5 up 1.0 1.0 6 hdd 14.55269 osd.6 up 1.0 1.0 7 hdd 14.55269 osd.7 up 1.0 1.0 8 hdd 14.55269 osd.8 up 1.0 1.0 9 hdd 14.55269 osd.9 up 1.0 1.0 30 ssd 1.74660 osd.30 up 1.0 1.0 31 ssd 1.74660 osd.31 up 1.0 1.0 -5 149.02008 host bcgonen-r0h1 10 hdd 14.55269 osd.10 up 1.0 1.0 11 hdd 14.55269 osd.11 up 1.0 1.0 12 hdd 14.55269 osd.12 up 1.0 1.0 13 hdd 14.55269 osd.13 up 1.0 1.0 14 hdd 14.55269 osd.14 up 1.0 1.0 15 hdd 14.55269 osd.15 up 1.0 1.0 16 hdd 14.55269 osd.16 up 1.0 1.0 17 hdd 14.55269 osd.17 up 1.0 1.0 18 hdd 14.55269 osd.18 up 1.0 1.0 19 hdd 14.55269 osd.19 up 1.0 1.0 32 ssd 1.74660 osd.32 up 1.0 1.0 33 ssd 1.74660 osd.33 up 1.0 1.0 -1 0 root default # ceph osd pool ls detail pool 1 '.mgr' replicated size 3 min_size 2 crush_rule 0 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 31 flags hashpspool stripe_width 0 pg_num_max 32 pg_num_min 1 application mgr pool 2 'cephfs.cephfs.meta' replicated size 3 min_size 2 crush_rule 2 object_hash rjenkins pg_num 16 pgp_num 16 autoscale_mode on last_change 9833 lfor 0/0/584 flags hashpspool stripe_width 0 pg_autoscale_bias 4 pg_num_min 16 recovery_priority 5 application cephfs pool 3 'cephfs.cephfs.data' replicated size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 1024 pgp_num 1024 autoscale_mode on last_change 7630 lfor 0/1831/6544 flags hashpspool,bulk stripe_width 0 application cephfs crush_rules 1 and 2 are just used to assign the data and meta pool to HDD and SSD, respectively (failure domain: host). // J On 2023-03-31 15:37, c...@elchaka.de wrote: Need to know some more about your cluster... Ceph -s Ceph osd df tree Replica or ec? ... Perhaps this can give us some insight Mehmet On 31 March 2023 18:08:38 MESZ, Johan Hattne wrote: Dear all; Up until a few hours ago, I had a seemingly normally-behaving cluster (Quincy, 17.2.5) with 36 OSDs, evenly distributed across 3 of its 6 nodes. The cluster is only used for CephFS and the only non-standard configuration I can t
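For completeness, device-class rules like the crush_rules 1 and 2 mentioned above (data on HDD, metadata on SSD, failure domain host) are typically created along these lines; the rule names here are made up for illustration:

# ceph osd crush rule create-replicated replicated_hdd default host hdd
# ceph osd crush rule create-replicated replicated_ssd default host ssd
# ceph osd crush rule dump                                  # shows the take/chooseleaf steps for each rule
# ceph osd pool set cephfs.cephfs.data crush_rule replicated_hdd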
[ceph-users] Misplaced objects greater than 100%
Dear all; Up until a few hours ago, I had a seemingly normally-behaving cluster (Quincy, 17.2.5) with 36 OSDs, evenly distributed across 3 of its 6 nodes. The cluster is only used for CephFS and the only non-standard configuration I can think of is that I had 2 active MDSs, but only 1 standby. I had also doubled mds_cache_memory_limit to 8 GB (all OSD hosts have 256 GB of RAM) at some point in the past. Then I rebooted one of the OSD nodes. The rebooted node held one of the active MDSs. Now the node is back up: ceph -s says the cluster is healthy, but all PGs are in an active+clean+remapped state and 166.67% of the objects are misplaced (dashboard: -66.66% healthy). The data pool is a threefold replica with 5.4M objects; the number of misplaced objects is reported as 27087410/16252446. The denominator in the ratio makes sense to me (16.2M / 3 = 5.4M), but the numerator does not. I also note that the ratio is *exactly* 5 / 3. The filesystem is still mounted and appears to be usable, but df reports it as 100% full; I suspect it would say 167% but that is capped somewhere. Any ideas about what is going on? Any suggestions for recovery? // Best wishes; Johan ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
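(A quick arithmetic check of the reported numbers, nothing more: the denominator is exactly three copies of the 5.4M objects, and the numerator is exactly five copies of them, hence the 5/3 ratio:

# python3 -c 'print(16252446 // 3, 5 * (16252446 // 3))'
# prints: 5417482 27087410

Why CRUSH counts five misplaced copies per object here is a separate question; as the follow-ups above show, it turned out to be related to the rack buckets sitting outside the default root.)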
[ceph-users] Re: OSD failed to load OSD map for epoch
OK, thanks! This is the same package as in the Octopus images, so I would expect Pacific to fail just as spectacularly. What's the best way to have this fixed? New issue on the Ceph tracker? I understand the Ceph images use CentOS packages, so should they be poked as well? // Best wishes; Johan On 2021-07-27 23:48, Eugen Block wrote: Alright, it's great that you could fix it! In my one-node test cluster (Pacific) I see this smartctl version: [ceph: root@pacific /]# rpm -q smartmontools smartmontools-7.1-1.el8.x86_64 Quoting Johan Hattne: Thanks a lot, Eugen! I had not found those threads, but I did eventually recover; details below. And yes, this is a toy size-2 cluster with two OSDs, but I suspect I would have seen the same problem on a more reasonable setup since this whole mess was caused by Octopus's smartmontools not playing nice with the NVMes. Just as in the previous thread Eugen provided, I got an OSD map from the monitors: # ceph osd getmap 4372 > /tmp/osd_map_4372 copied it to the OSD hosts and imported it: # CEPH_ARGS="--bluestore-ignore-data-csum" ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0/ --op set-osdmap --file /tmp/osd_map_4372 Given the initial cause of the error, I removed the WAL devices: # ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-0 --devs-source /var/lib/ceph/osd/ceph-0/block.wal --dev-target /var/lib/ceph/osd/ceph-0/block --command bluefs-bdev-migrate # ceph-volume lvm zap /var/lib/ceph/osd/ceph-0/block.wal Here I got bitten by what looks like #49554, so # lvchange --deltag "ceph.wal_device=/dev/ceph-wal/wal-0" --deltag "ceph.wal_uuid=G7Z5qA-OaJQ-Spvs-X4ec-0SvX-vT2C-C0Dbpe" /dev/ceph-block-0/block-0 And analogously for osd1. After restarting the OSDs, deep scrubbing, and a bit of manual repair, the cluster is healthy again. The reason for the crash turns out to be a known problem with smartmontools <7.2 and the Micron 2200 NVMes that were used to back the WAL (https://www.smartmontools.org/ticket/1404). Unfortunately, the Octopus image ships with smartmontools 7.1, which will crash the kernel on e.g. "smartctl -a /dev/nvme0". Before switching to Octopus containers, I was using smartmontools from Debian backports, which does not have this problem. Does Pacific have newer smartmontools? // Best wishes; Johan On 2021-07-27 06:35, Eugen Block wrote: Hi, did you read this thread [1] reporting a similar issue? It refers to a solution described in [2] but the OP in [1] recreated all OSDs, so it's not clear what the root cause was. Can you start the OSD with more verbose (debug) output and share that? Does your cluster really have only two OSDs? Are you running it with size 2 pools? [1] https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/EUFDKK3HEA5DPTUVJ5LBNQSWAKZH5ZM7/ [2] http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-August/036592.html Quoting Johan Hattne: Dear all; We have a 3-node cluster that has two OSDs on separate nodes, each with wal on NVMe. It's been running fine for quite some time, albeit under very light load. This week, we moved from package-based Octopus to container-based ditto (15.2.13, all on Debian stable). Within a few hours of that change, both OSDs crashed and dmesg filled up with stuff like: DMAR: DRHD: handling fault status reg 2 DMAR: [DMA Read] Request device [06:00.0] PASID fault addr ffbc [fault reason 06] PTE Read access is not set where 06:00.0 is the NVMe with the wal. This happened at the same time on *both* OSD nodes, but I'll worry about why this happened later. 
I would first like to see if I can get the cluster back up. From cephadm shell, I activate OSD 1 and try to start it (I did create a minimal /etc/ceph/ceph.conf with global "fsid" and "mon host" for that purpose): # ceph-volume lvm activate 1 cce125b2-2597-4be9-bd17-23eb059d2778 --no-systemd # ceph-osd -d --cluster ceph --id 1 This gives "osd.1 0 OSD::init() : unable to read osd superblock", and the subsequent output indicates that this is due to checksum errors. So ignore checksum mismatches and try again: # CEPH_ARGS="--bluestore-ignore-data-csum" ceph-osd -d --cluster ceph --id 1 which results in "osd.1 0 failed to load OSD map for epoch 4372, got 0 bytes". The monitors are at 4378, as per: # ceph osd stat 2 osds: 0 up (since 47h), 1 in (since 47h); epoch: e4378 Is there any way to get past this? For instance, could I coax the OSDs into epoch 4378? This is the first time I deal with a Ceph disaster, so there may be all kinds of obvious things I'm missing. // Best wishes; Johan ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io ___
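(An aside, not taken from the thread itself: until an image ships smartmontools >= 7.2, a possible stopgap is to check what the running image actually provides and to stop the mgr from scraping SMART data on the affected devices at all. Both commands below are my suggestion, so verify against your release's documentation:

# cephadm shell -- smartctl --version     # version shipped in the container image
# ceph device monitoring off              # stop periodic smartctl scraping by the mgr's devicehealth module

Re-enable with "ceph device monitoring on" once a fixed image is in place.)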
[ceph-users] Re: OSD failed to load OSD map for epoch
Thanks a lot, Eugen! I had not found those threads, but I did eventually recover; details below. And yes, this is a toy size-2 cluster with two OSDs, but I suspect I would have seen the same problem on a more reasonable setup since this whole mess was caused by Octopus's smartmontools not playing nice with the NVMes. Just as in the previous thread Eugen provided, I got an OSD map from the monitors: # ceph osd getmap 4372 > /tmp/osd_map_4372 copied it to the OSD hosts and imported it: # CEPH_ARGS="--bluestore-ignore-data-csum" ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0/ --op set-osdmap --file /tmp/osd_map_4372 Given the initial cause of the error, I removed the WAL devices: # ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-0 --devs-source /var/lib/ceph/osd/ceph-0/block.wal --dev-target /var/lib/ceph/osd/ceph-0/block --command bluefs-bdev-migrate # ceph-volume lvm zap /var/lib/ceph/osd/ceph-0/block.wal Here I got bitten by what looks like #49554, so # lvchange --deltag "ceph.wal_device=/dev/ceph-wal/wal-0" --deltag "ceph.wal_uuid=G7Z5qA-OaJQ-Spvs-X4ec-0SvX-vT2C-C0Dbpe" /dev/ceph-block-0/block-0 And analogously for osd1. After restarting the OSDs, deep scrubbing, and a bit of manual repair, the cluster is healthy again. The reason for the crash turns out to be a known problem with smartmontools <7.2 and the Micron 2200 NVMes that were used to back the WAL (https://www.smartmontools.org/ticket/1404). Unfortunately, the Octopus image ships with smartmontools 7.1, which will crash the kernel on e.g. "smartctl -a /dev/nvme0". Before switching to Octopus containers, I was using smartmontools from Debian backports, which does not have this problem. Does Pacific have newer smartmontools? // Best wishes; Johan On 2021-07-27 06:35, Eugen Block wrote: Hi, did you read this thread [1] reporting a similar issue? It refers to a solution described in [2] but the OP in [1] recreated all OSDs, so it's not clear what the root cause was. Can you start the OSD with more verbose (debug) output and share that? Does your cluster really have only two OSDs? Are you running it with size 2 pools? [1] https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/EUFDKK3HEA5DPTUVJ5LBNQSWAKZH5ZM7/ [2] http://lists.ceph.com/pipermail/ceph-users-ceph.com/2019-August/036592.html Quoting Johan Hattne: Dear all; We have a 3-node cluster that has two OSDs on separate nodes, each with wal on NVMe. It's been running fine for quite some time, albeit under very light load. This week, we moved from package-based Octopus to container-based ditto (15.2.13, all on Debian stable). Within a few hours of that change, both OSDs crashed and dmesg filled up with stuff like: DMAR: DRHD: handling fault status reg 2 DMAR: [DMA Read] Request device [06:00.0] PASID fault addr ffbc [fault reason 06] PTE Read access is not set where 06:00.0 is the NVMe with the wal. This happened at the same time on *both* OSD nodes, but I'll worry about why this happened later. I would first like to see if I can get the cluster back up. From cephadm shell, I activate OSD 1 and try to start it (I did create a minimal /etc/ceph/ceph.conf with global "fsid" and "mon host" for that purpose): # ceph-volume lvm activate 1 cce125b2-2597-4be9-bd17-23eb059d2778 --no-systemd # ceph-osd -d --cluster ceph --id 1 This gives "osd.1 0 OSD::init() : unable to read osd superblock", and the subsequent output indicates that this is due to checksum errors. 
So ignore checksum mismatches and try again: # CEPH_ARGS="--bluestore-ignore-data-csum" ceph-osd -d --cluster ceph --id 1 which results in "osd.1 0 failed to load OSD map for epoch 4372, got 0 bytes". The monitors are at 4378, as per: # ceph osd stat 2 osds: 0 up (since 47h), 1 in (since 47h); epoch: e4378 Is there any way to get past this? For instance, could I coax the OSDs into epoch 4378? This is the first time I deal with a Ceph disaster, so there may be all kinds of obvious things I'm missing. // Best wishes; Johan ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
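(For anyone repeating the WAL removal described above, a quick way to confirm that the stale LVM tags from #49554 are really gone and that BlueFS no longer references the WAL; the VG and OSD paths are the ones from this thread:

# lvs -o lv_name,lv_tags ceph-block-0                               # ceph.wal_device / ceph.wal_uuid should no longer be listed
# ceph-bluestore-tool show-label --path /var/lib/ceph/osd/ceph-0   # output should no longer mention block.wal
)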
[ceph-users] OSD failed to load OSD map for epoch
Dear all; We have a 3-node cluster that has two OSDs on separate nodes, each with wal on NVMe. It's been running fine for quite some time, albeit under very light load. This week, we moved from package-based Octopus to container-based ditto (15.2.13, all on Debian stable). Within a few hours of that change, both OSDs crashed and dmesg filled up with stuff like: DMAR: DRHD: handling fault status reg 2 DMAR: [DMA Read] Request device [06:00.0] PASID fault addr ffbc [fault reason 06] PTE Read access is not set where 06:00.0 is the NVMe with the wal. This happened at the same time on *both* OSD nodes, but I'll worry about why this happened later. I would first like to see if I can get the cluster back up. From cephadm shell, I activate OSD 1 and try to start it (I did create a minimal /etc/ceph/ceph.conf with global "fsid" and "mon host" for that purpose): # ceph-volume lvm activate 1 cce125b2-2597-4be9-bd17-23eb059d2778 --no-systemd # ceph-osd -d --cluster ceph --id 1 This gives "osd.1 0 OSD::init() : unable to read osd superblock", and the subsequent output indicates that this is due to checksum errors. So ignore checksum mismatches and try again: # CEPH_ARGS="--bluestore-ignore-data-csum" ceph-osd -d --cluster ceph --id 1 which results in "osd.1 0 failed to load OSD map for epoch 4372, got 0 bytes". The monitors are at 4378, as per: # ceph osd stat 2 osds: 0 up (since 47h), 1 in (since 47h); epoch: e4378 Is there any way to get past this? For instance, could I coax the OSDs into epoch 4378? This is the first time I deal with a Ceph disaster, so there may be all kinds of obvious things I'm missing. // Best wishes; Johan ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
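(As the replies earlier in this archive show, the answer turned out to be yes: the missing maps can be fetched from the monitors with "ceph osd getmap" and imported into the stopped OSD with ceph-objectstore-tool. A rough sketch of doing that for the whole missing range, generalizing the single-epoch command used in the follow-up; treat the epoch range and the OSD id as placeholders:

# for e in $(seq 4372 4378); do ceph osd getmap $e > /tmp/osd_map_$e; done
# for e in $(seq 4372 4378); do CEPH_ARGS="--bluestore-ignore-data-csum" ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-1/ --op set-osdmap --file /tmp/osd_map_$e; done

Run this with the OSD process stopped, then try starting the OSD again.)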