I run a small single-node Ceph cluster (the plan was to scale out when I could
get more equipment) for home file storage, deployed by cephadm. It was running
bare-metal, and I attempted a physical-to-virtual migration to a Proxmox VM.
Since then, all of my PGs show as "unknown". Initially after a boot the OSDs
appear to be up, but after a while they go down; I assume some sort of timeout
in the OSD start process. The systemd units (and podman containers) are still
running and appear to be happy, and nothing in their logs jumps out at me. I'm
relatively new to Ceph, so I don't really know where to go from here. Can
anyone provide any guidance?
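In case it helps, this is roughly how I've been watching the OSDs drop after a
boot (the `osd.3` unit name matches the systemctl listing further down):
```
# Watch the up/down OSD counts change over time
watch -n 10 ceph osd stat

# Follow one of the affected OSDs while it goes from up to down
journalctl -f -u ceph-768819b0-a83f-11ee-81d6-74563c5bfc7b@osd.3.service
```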
**ceph -s**
```
  cluster:
    id:     768819b0-a83f-11ee-81d6-74563c5bfc7b
    health: HEALTH_WARN
            Reduced data availability: 545 pgs inactive
            139 pgs not deep-scrubbed in time
            17 slow ops, oldest one blocked for 1668 sec, mon.fileserver has slow ops

  services:
    mon: 1 daemons, quorum fileserver (age 28m)
    mgr: fileserver.rgtdvr(active, since 28m), standbys: fileserver.gikddq
    osd: 17 osds: 5 up (since 116m), 5 in (since 10m)

  data:
    pools:   3 pools, 545 pgs
    objects: 1.97M objects, 7.5 TiB
    usage:   7.7 TiB used, 1.4 TiB / 9.1 TiB avail
    pgs:     100.000% pgs unknown
             545 unknown
```
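My (possibly wrong, I'm new to this) understanding is that PGs report as
"unknown" when the mgr has no current stats from the OSDs, so one thing I'm
considering is forcing a failover to the standby mgr to rule out stale mgr
state:
```
# Fail the active mgr (fileserver.rgtdvr); the standby (fileserver.gikddq)
# should take over
ceph mgr fail fileserver.rgtdvr
```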
**ceph osd df**
```
ID  CLASS  WEIGHT   REWEIGHT  SIZE     RAW USE  DATA     OMAP    META     AVAIL    %USE   VAR   PGS  STATUS
 0  hdd    1.81940         0      0 B      0 B      0 B     0 B      0 B      0 B       0     0     0  down
 1  hdd    3.63869         0      0 B      0 B      0 B     0 B      0 B      0 B       0     0     0  down
 3  hdd    1.81940         0      0 B      0 B      0 B     0 B      0 B      0 B       0     0   112  down
 4  hdd    1.81940         0      0 B      0 B      0 B     0 B      0 B      0 B       0     0   117  down
 5  hdd    3.63869         0      0 B      0 B      0 B     0 B      0 B      0 B       0     0     0  down
 6  hdd    3.63869         0      0 B      0 B      0 B     0 B      0 B      0 B       0     0     0  down
 7  hdd    1.81940         0      0 B      0 B      0 B     0 B      0 B      0 B       0     0     0  down
 8  hdd    1.81940         0      0 B      0 B      0 B     0 B      0 B      0 B       0     0   106  down
20  hdd    1.81940         0      0 B      0 B      0 B     0 B      0 B      0 B       0     0   115  down
21  hdd    1.81940         0      0 B      0 B      0 B     0 B      0 B      0 B       0     0    94  down
22  hdd    1.81940         0      0 B      0 B      0 B     0 B      0 B      0 B       0     0    98  down
23  hdd    1.81940         0      0 B      0 B      0 B     0 B      0 B      0 B       0     0   109  down
24  hdd    1.81940   1.00000  1.8 TiB  1.6 TiB  1.6 TiB   4 KiB  3.0 GiB  186 GiB   90.00  1.06   117  up
25  hdd    1.81940   1.00000  1.8 TiB  1.6 TiB  1.6 TiB  10 KiB  2.8 GiB  220 GiB   88.18  1.04   114  up
26  hdd    1.81940   1.00000  1.8 TiB  1.5 TiB  1.5 TiB   9 KiB  2.8 GiB  297 GiB   84.07  0.99   109  up
27  hdd    1.81940   1.00000  1.8 TiB  1.4 TiB  1.4 TiB   7 KiB  2.5 GiB  474 GiB   74.58  0.88    98  up
28  hdd    1.81940   1.00000  1.8 TiB  1.6 TiB  1.6 TiB  10 KiB  3.0 GiB  206 GiB   88.93  1.04   115  up
                       TOTAL  9.1 TiB  7.7 TiB  7.7 TiB  42 KiB   14 GiB  1.4 TiB   85.15
MIN/MAX VAR: 0.88/1.06  STDDEV: 5.65
```
**ceph pg stat**
```
545 pgs: 545 unknown; 7.5 TiB data, 7.7 TiB used, 1.4 TiB / 9.1 TiB avail
```
**systemctl | grep ceph-768819b0-a83f-11ee-81d6-74563c5bfc7b**
```
ceph-768819b0-a83f-11ee-81d6-74563c5bfc7b@alertmanager.fileserver.service   loaded active running  Ceph alertmanager.fileserver for 768819b0-a83f-11ee-81d6-74563c5bfc7b
ceph-768819b0-a83f-11ee-81d6-74563c5bfc7b@ceph-exporter.fileserver.service  loaded active running  Ceph ceph-exporter.fileserver for 768819b0-a83f-11ee-81d6-74563c5bfc7b
ceph-768819b0-a83f-11ee-81d6-74563c5bfc7b@crash.fileserver.service          loaded active running  Ceph crash.fileserver for 768819b0-a83f-11ee-81d6-74563c5bfc7b
ceph-768819b0-a83f-11ee-81d6-74563c5bfc7b@grafana.fileserver.service        loaded active running  Ceph grafana.fileserver for 768819b0-a83f-11ee-81d6-74563c5bfc7b
ceph-768819b0-a83f-11ee-81d6-74563c5bfc7b@mgr.fileserver.gikddq.service     loaded active running  Ceph mgr.fileserver.gikddq for 768819b0-a83f-11ee-81d6-74563c5bfc7b
ceph-768819b0-a83f-11ee-81d6-74563c5bfc7b@mgr.fileserver.rgtdvr.service     loaded active running  Ceph mgr.fileserver.rgtdvr for 768819b0-a83f-11ee-81d6-74563c5bfc7b
ceph-768819b0-a83f-11ee-81d6-74563c5bfc7b@mon.fileserver.service            loaded active running  Ceph mon.fileserver for 768819b0-a83f-11ee-81d6-74563c5bfc7b
ceph-768819b0-a83f-11ee-81d6-74563c5bfc7b@osd.0.service                     loaded active running  Ceph osd.0 for 768819b0-a83f-11ee-81d6-74563c5bfc7b
ceph-768819b0-a83f-11ee-81d6-74563c5bfc7b@osd.1.service                     loaded active running  Ceph osd.1 for 768819b0-a83f-11ee-81d6-74563c5bfc7b
ceph-768819b0-a83f-11ee-81d6-74563c5bfc7b@osd.20.service                    loaded active running  Ceph osd.20 for 768819b0-a83f-11ee-81d6-74563c5bfc7b
ceph-768819b0-a83f-11ee-81d6-74563c5bfc7b@osd.21.service                    loaded active running  Ceph osd.21 for 768819b0-a83f-11ee-81d6-74563c5bfc7b
ceph-768819b0-a83f-11ee-81d6-74563c5bfc7b@osd.22.service                    loaded active running  Ceph osd.22 for 768819b0-a83f-11ee-81d6-74563c5bfc7b
ceph-768819b0-a83f-11ee-81d6-74563c5bfc7b@osd.23.service                    loaded active running  Ceph osd.23 for 768819b0-a83f-11ee-81d6-74563c5bfc7b
ceph-768819b0-a83f-11ee-81d6-74563c5bfc7b@osd.24.service                    loaded active running  Ceph osd.24 for 768819b0-a83f-11ee-81d6-74563c5bfc7b
ceph-768819b0-a83f-11ee-81d6-74563c5bfc7b@osd.25.service                    loaded active running  Ceph osd.25 for 768819b0-a83f-11ee-81d6-74563c5bfc7b
ceph-768819b0-a83f-11ee-81d6-74563c5bfc7b@osd.26.service                    loaded active running  Ceph osd.26 for 768819b0-a83f-11ee-81d6-74563c5bfc7b
ceph-768819b0-a83f-11ee-81d6-74563c5bfc7b@osd.27.service                    loaded active running  Ceph osd.27 for 768819b0-a83f-11ee-81d6-74563c5bfc7b
ceph-768819b0-a83f-11ee-81d6-74563c5bfc7b@osd.28.service                    loaded active running  Ceph osd.28 for 768819b0-a83f-11ee-81d6-74563c5bfc7b
ceph-768819b0-a83f-11ee-81d6-74563c5bfc7b@osd.3.service                     loaded active running  Ceph osd.3 for 768819b0-a83f-11ee-81d6-74563c5bfc7b
ceph-768819b0-a83f-11ee-81d6-74563c5bfc7b@osd.4.service                     loaded active running  Ceph osd.4 for 768819b0-a83f-11ee-81d6-74563c5bfc7b
ceph-768819b0-a83f-11ee-81d6-74563c5bfc7b@osd.5.service                     loaded active running  Ceph osd.5 for 768819b0-a83f-11ee-81d6-74563c5bfc7b
ceph-768819b0-a83f-11ee-81d6-74563c5bfc7b@osd.6.service                     loaded active running  Ceph osd.6 for 768819b0-a83f-11ee-81d6-74563c5bfc7b
ceph-768819b0-a83f-11ee-81d6-74563c5bfc7b@osd.7.service                     loaded active running  Ceph osd.7 for 768819b0-a83f-11ee-81d6-74563c5bfc7b
ceph-768819b0-a83f-11ee-81d6-74563c5bfc7b@osd.8.service                     loaded active running  Ceph osd.8 for 768819b0-a83f-11ee-81d6-74563c5bfc7b
ceph-768819b0-a83f-11ee-81d6-74563c5bfc7b@prometheus.fileserver.service     loaded active running  Ceph prometheus.fileserver for 768819b0-a83f-11ee-81d6-74563c5bfc7b
system-ceph\x2d768819b0\x2da83f\x2d11ee\x2d81d6\x2d74563c5bfc7b.slice       loaded active active   Slice /system/ceph-768819b0-a83f-11ee-81d6-74563c5bfc7b
ceph-768819b0-a83f-11ee-81d6-74563c5bfc7b.target                            loaded active active   Ceph cluster 768819b0-a83f-11ee-81d6-74563c5bfc7b
```
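The matching podman containers all show as running too; since cephadm embeds
the cluster fsid in the container names, I've been checking them with:
```
# cephadm container names include the cluster fsid
podman ps | grep 768819b0-a83f-11ee-81d6-74563c5bfc7b
```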
The logs for `mon` and `osd.3` can be found here:
https://gitlab.com/-/snippets/4793143
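If longer excerpts would help, I can pull the full journal for any of the
daemons with cephadm, e.g.:
```
# Full daemon logs via cephadm (wraps journalctl for the daemon's unit)
cephadm logs --name osd.3
```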