[ceph-users] Re: unable to deploy ceph -- failed to read label for XXX No such file or directory

2023-04-17 Thread Radoslav Bodó
When adding OSDs, the first host gets its OSDs created as expected, but 
while creating OSDs on the second host the output gets weird: even when 
each device is added separately, the output shows that `ceph orch` tries 
to create multiple OSDs at once


```
root@test1:~# for xxx in j k l m; do ceph orch daemon add osd test2:/dev/xvd$xxx; done

Created osd(s) 0,1,2,3 on host 'test2'
Created osd(s) 0,1 on host 'test2'
Created osd(s) 2,3 on host 'test2'
Created osd(s) 1 on host 'test2'
```


Solved.

It turns out that the test cluster had a wrong mapping of its data 
devices: all three nodes were given the same set of disks, and when the 
VMs booted in parallel the XEN platform did not prevent the same backing 
volumes from being attached to multiple DomUs at once.
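
For anyone hitting the same thing: a quick way to confirm that two DomUs see the same backing disk is to compare the LVM PV UUIDs that ceph-volume left behind, or to write a marker on one node and read it back on another. A minimal sketch, assuming the data disks are still /dev/xvdj..m as above:

```
# non-destructive: if the same PV UUIDs show up on test2 and test3,
# both VMs are using the same backing devices
pvs -o pv_name,pv_uuid,vg_name /dev/xvd[jklm]

# destructive cross-check (test cluster only): write a random marker
# to the start of one data disk on test2 ...
dd if=/dev/urandom of=/dev/xvdj bs=1M count=1 oflag=direct

# ... then on both test2 and test3: matching checksums = same disk
dd if=/dev/xvdj bs=1M count=1 iflag=direct 2>/dev/null | md5sum
```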



bodik


[ceph-users] unable to deploy ceph -- failed to read label for XXX No such file or directory

2023-04-16 Thread Radoslav Bodó

hello,

during basic experimentation I'm running into a weird situation when 
adding OSDs to a test cluster. The test cluster consists of 3x XEN DomU 
Debian Bookworm (test1-3), each with 4x CPU, 8GB RAM, xvda root, xvdb 
swap, and 4x 20GB data disks xvdj,k,l,m (LVM volumes in Dom0, propagated 
via the xen phy device), cleaned with `wipefs -a`
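
The wipe step was roughly the following (a sketch; destructive, run only against the data disks on each node):

```
# on each of test1, test2, test3: remove any existing signatures
# from the four data disks
for dev in /dev/xvd{j,k,l,m}; do
  wipefs -a "$dev"
done
```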


```
apt-get install cephadm ceph-common
cephadm bootstrap --mon-ip 10.0.0.101
ceph orch host add test2
ceph orch host add test3
```

when adding OSDs, the first host gets its OSDs created as expected, but 
while creating OSDs on the second host the output gets weird: even when 
each device is added separately, the output shows that `ceph orch` tries 
to create multiple OSDs at once


```
root@test1:~# for xxx in j k l m; do ceph orch daemon add osd test2:/dev/xvd$xxx; done

Created osd(s) 0,1,2,3 on host 'test2'
Created osd(s) 0,1 on host 'test2'
Created osd(s) 2,3 on host 'test2'
Created osd(s) 1 on host 'test2'
```
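
For reference, how the orchestrator and the cluster see the result can be checked with the standard commands (a sketch):

```
# devices and OSD daemons as seen by the orchestrator
ceph orch device ls
ceph orch ps test2

# placement of the OSDs in the CRUSH tree
ceph osd tree
```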

the syslog on the test2 node shows errors:


```
2023-04-16T20:57:02.528456+00:00 test2 bash[10426]: cephadm 
2023-04-16T20:57:01.389951+ mgr.test1.ucudzp (mgr.14206) 1691 : 
cephadm [INF] Found duplicate OSDs: osd.0 in status running on test1, 
osd.0 in status error on test2


2023-04-16T20:57:02.528748+00:00 test2 bash[10426]: cephadm
2023-04-16T20:57:01.391346+ mgr.test1.ucudzp (mgr.14206) 1692 : 
cephadm [INF] Removing daemon osd.0 from test2 -- ports []

2023-04-16T20:57:02.528943+00:00 test2 bash[10426]: cluster
2023-04-16T20:57:02.350564+ mon.test1 (mon.0) 743 : cluster [WRN] 
Health check failed: 2 failed cephadm daemon(s) (CEPHADM_FAILED_DAEMON)


2023-04-16T20:57:17.972962+00:00 test2 bash[20098]:  stderr: failed to 
read label for 
/dev/ceph-48f3646c-7070-4a37-b9a4-ed0a4a983965/osd-block-11a0dc2b-f8e1-4694-813f-2309ab6a5c1d: 
(2) No such file or directory
2023-04-16T20:57:17.973064+00:00 test2 bash[20098]:  stderr: 
2023-04-16T20:57:17.962+ 7fad2451c540 -1 
bluestore(/dev/ceph-48f3646c-7070-4a37-b9a4-ed0a4a983965/osd-block-11a0dc2b-f8e1-4694-813f-2309ab6a5c1d) 
_read_bdev_label failed to open 
/dev/ceph-48f3646c-7070-4a37-b9a4-ed0a4a983965/osd-block-11a0dc2b-f8e1-4694-813f-2309ab6a5c1d: 
(2) No such file or directory
2023-04-16T20:57:17.973181+00:00 test2 bash[20098]: --> Failed to 
activate via lvm: command returned non-zero exit status: 1
2023-04-16T20:57:17.973278+00:00 test2 bash[20098]: --> Failed to 
activate via simple: 'Namespace' object has no attribute 'json_config'
2023-04-16T20:57:17.973368+00:00 test2 bash[20098]: --> Failed to 
activate any OSD(s)

```
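
To cross-check what is actually on the disks of test2, listing the LVM volumes that ceph-volume knows about should help (a sketch, using the stock cephadm wrapper):

```
# on test2: LVM volumes as known to ceph-volume (run inside the cephadm container)
cephadm shell -- ceph-volume lvm list

# raw view of the block devices and LVM state on the host
lsblk -o NAME,SIZE,TYPE,FSTYPE,MOUNTPOINT
lvs -o lv_name,vg_name,lv_path,devices
```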

the ceph and cephadm binaries are installed from Debian Bookworm:

```
ii  ceph-common  16.2.11+ds-2  amd64  common utilities to mount and interact with a ceph storage cluster
ii  cephadm      16.2.11+ds-2  amd64  utility to bootstrap ceph daemons with systemd and containers

```

a transcript of the management session can be found at https://pastebin.com/raw/FiX7DMHS


none of the symptoms I googled helped me understand why this situation 
is happening, nor how to troubleshoot or debug it. I realize the nodes 
are very low on RAM for this experiment, but the behavior does not 
really look like an OOM issue.
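
One thing that should give more detail is raising the cephadm log level and watching it while re-adding a device (a sketch based on the cephadm troubleshooting documentation):

```
# log cephadm events to the cluster log at debug level
ceph config set mgr mgr/cephadm/log_to_cluster_level debug

# follow the cephadm channel live while reproducing the problem
ceph -W cephadm --watch-debug

# reset afterwards
ceph config set mgr mgr/cephadm/log_to_cluster_level info
```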


any idea would be appreciated

thanks
bodik