Thanks for the tip. I’ve just been using ‘docker exec -it <container id> /bin/bash’ to get into the containers, but those commands sound useful. I think I’ll install cephadm on all nodes just for this.
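A minimal sketch of the cephadm equivalents of that docker exec workflow — it only echoes the commands rather than running them (they need a live cluster), using the daemon name from this thread as a placeholder:

```shell
#!/bin/sh
# Sketch only: echoes the cephadm equivalents of
# `docker exec -it <container id> /bin/bash`. The daemon name below is the
# one from this thread; substitute one from your own `cephadm ls` output.
DAEMON="iscsi.cxcto-c240-j27-04.lgqtxo"

# cephadm looks up the right container itself, so no container id is needed:
echo "cephadm enter --name ${DAEMON}"   # interactive shell in that daemon's container
echo "cephadm logs --name ${DAEMON}"    # journald logs for that daemon
```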
Thanks again,
-Paul

> On Sep 8, 2021, at 10:11 AM, Eugen Block <ebl...@nde.ag> wrote:
>
> Okay, I'm glad it worked!
>
>> At first I tried cephadm rm-daemon on the bootstrap node that I usually do
>> all management from and it indicated that it could not remove the daemon:
>>
>> [root@cxcto-c240-j27-01 ~]# cephadm rm-daemon --name
>> iscsi.cxcto-c240-j27-04.lgqtxo --fsid 4a29e724-c4a6-11eb-b14a-5c838f8013a5
>> ERROR: Daemon not found: iscsi.cxcto-c240-j27-04.lgqtxo. See `cephadm ls`
>>
>> When I would do ‘cephadm ls’ I only saw services running locally on that
>> server, not the whole cluster. I’m not sure if this is expected or not.
>
> As far as I can tell this is expected, yes. I only have a lab environment
> with containers (we're still hesitating to upgrade to Octopus), but all of
> the virtual nodes have cephadm installed; I thought that was a requirement,
> though I may be wrong. It definitely helps with debugging: for example, with
> 'cephadm enter --name <daemon>' you get a shell inside that container, and
> with 'cephadm logs --name <daemon>' you can inspect that daemon's logs.
>
>
> Zitat von "Paul Giralt (pgiralt)" <pgir...@cisco.com>:
>
>> Thanks Eugen.
>>
>> At first I tried cephadm rm-daemon on the bootstrap node that I usually do
>> all management from and it indicated that it could not remove the daemon:
>>
>> [root@cxcto-c240-j27-01 ~]# cephadm rm-daemon --name
>> iscsi.cxcto-c240-j27-04.lgqtxo --fsid 4a29e724-c4a6-11eb-b14a-5c838f8013a5
>> ERROR: Daemon not found: iscsi.cxcto-c240-j27-04.lgqtxo. See `cephadm ls`
>>
>> When I would do ‘cephadm ls’ I only saw services running locally on that
>> server, not the whole cluster. I’m not sure if this is expected or not. I
>> installed cephadm on the cxcto-c240-j27-04 server, issued the command
>> there, and it worked. It looks like when I did this, the containers on the
>> other two servers that were not supposed to be running the iSCSI gateway
>> were suddenly removed, and everything appeared to be back to normal.
>> I then added one server back to the yaml file and applied it on the
>> original bootstrap node, and it got deployed properly, so it appears that
>> everything is working again. Somehow deleting that daemon on the 04 server
>> fixed everything.
>>
>> Still not exactly sure why that fixed it, but at least it’s working again.
>> Thanks for the suggestion.
>>
>> -Paul
>>
>>
>>> On Sep 8, 2021, at 4:12 AM, Eugen Block <ebl...@nde.ag> wrote:
>>>
>>> If you only configured 1 iSCSI gateway but you see 3 running, have you
>>> tried to destroy them with 'cephadm rm-daemon --name ...'? On the active
>>> MGR host, run 'journalctl -f' and you'll see plenty of information; it
>>> should also include information about the iSCSI deployment. Or run
>>> 'cephadm logs --name <iscsi-gw>'.
>>>
>>>
>>> Zitat von "Paul Giralt (pgiralt)" <pgir...@cisco.com>:
>>>
>>>> This was working until recently and now seems to have stopped working.
>>>> Running Pacific 16.2.5. When I modify the deployment YAML file for my
>>>> iSCSI gateways, the services are not being added or removed as
>>>> requested. It’s as if the state is “stuck”.
>>>>
>>>> At one point I had 4 iSCSI gateways: 02, 03, 04 and 05. Through some
>>>> back and forth of deploying and undeploying, I ended up in a state where
>>>> the services are running on servers 02, 03, and 05 no matter what I tell
>>>> cephadm to do. For example, right now I have the following
>>>> configuration:
>>>>
>>>> service_type: iscsi
>>>> service_id: iscsi
>>>> placement:
>>>>   hosts:
>>>>   - cxcto-c240-j27-03.cisco.com
>>>> spec:
>>>>   pool: iscsi-config
>>>> … removed the rest of this file ….
>>>>
>>>> However, ceph orch ls shows this:
>>>>
>>>> [root@cxcto-c240-j27-01 ~]# ceph orch ls
>>>> NAME                               PORTS        RUNNING  REFRESHED  AGE  PLACEMENT
>>>> alertmanager                       ?:9093,9094      1/1  9m ago     3M   count:1
>>>> crash                                             15/15  10m ago    3M   *
>>>> grafana                            ?:3000           1/1  9m ago     3M   count:1
>>>> iscsi.iscsi                                         3/1  10m ago    11m  cxcto-c240-j27-03.cisco.com
>>>> mgr                                                 2/2  9m ago     3M   count:2
>>>> mon                                                 5/5  9m ago     12d  cxcto-c240-j27-01.cisco.com;cxcto-c240-j27-06.cisco.com;cxcto-c240-j27-08.cisco.com;cxcto-c240-j27-10.cisco.com;cxcto-c240-j27-12.cisco.com
>>>> node-exporter                      ?:9100          15/15  10m ago    3M   *
>>>> osd.dashboard-admin-1622750977792                   0/15  -          3M   *
>>>> osd.dashboard-admin-1622751032319                326/341  10m ago    3M   *
>>>> prometheus                         ?:9095           1/1  9m ago     3M   count:1
>>>>
>>>> Notice that it shows 3/1 because the service is still running on 3
>>>> servers even though I’ve told it to only run on one. If I configure all
>>>> 4 servers and apply (ceph orch apply), I end up with 3/4 because server
>>>> 04 never deploys. It’s as if something is “stuck”.
>>>>
>>>> Any ideas where to look, or log files that might help figure out what’s
>>>> happening?
>>>>
>>>> -Paul
>>>>
>>>> _______________________________________________
>>>> ceph-users mailing list -- ceph-users@ceph.io
>>>> To unsubscribe send an email to ceph-users-le...@ceph.io
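Since cephadm ls only reports the local host's daemons, the rm-daemon fix described above has to be run on the host that actually owns the stuck daemon (04 in this thread), not on the bootstrap node. A sketch that just prints that sequence, using the daemon name and fsid from the thread:

```shell
#!/bin/sh
# Sketch only: prints the commands to run on the host that owns the daemon
# (cxcto-c240-j27-04 here). Daemon name and fsid are the ones from this thread.
DAEMON="iscsi.cxcto-c240-j27-04.lgqtxo"
FSID="4a29e724-c4a6-11eb-b14a-5c838f8013a5"

# cephadm ls is host-local, so first confirm the daemon exists on this host:
echo "cephadm ls | grep ${DAEMON}"
# then remove it:
echo "cephadm rm-daemon --name ${DAEMON} --fsid ${FSID}"
```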
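For reference, re-applying the placement is done from the bootstrap node with ceph orch apply -i. This sketch writes only the spec fields quoted in the thread (the rest of the original file was elided in the mail and is not reconstructed here); the filename iscsi-spec.yaml is a hypothetical choice:

```shell
#!/bin/sh
# Writes the spec fields quoted in this thread to a hypothetical file name;
# the remainder of the original file was elided and is not reconstructed here.
cat > iscsi-spec.yaml <<'EOF'
service_type: iscsi
service_id: iscsi
placement:
  hosts:
  - cxcto-c240-j27-03.cisco.com
spec:
  pool: iscsi-config
EOF

# On the bootstrap node this would then be applied with:
echo "ceph orch apply -i iscsi-spec.yaml"
```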
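The 3/1 in that output means three daemons are running against a requested placement of one. A small sketch that flags such mismatches automatically, using the ceph orch ls output from this thread as a saved sample (it assumes the whitespace-separated layout shown and matches the n/m RUNNING column by pattern; the mon placement list is shortened in the sample):

```shell
#!/bin/sh
# Flags services whose RUNNING count exceeds the requested placement (the 3/1
# symptom), using the `ceph orch ls` output from this thread as sample input.
cat > orch-ls.txt <<'EOF'
alertmanager ?:9093,9094 1/1 9m ago 3M count:1
crash 15/15 10m ago 3M *
grafana ?:3000 1/1 9m ago 3M count:1
iscsi.iscsi 3/1 10m ago 11m cxcto-c240-j27-03.cisco.com
mgr 2/2 9m ago 3M count:2
mon 5/5 9m ago 12d cxcto-c240-j27-01.cisco.com;cxcto-c240-j27-06.cisco.com
node-exporter ?:9100 15/15 10m ago 3M *
osd.dashboard-admin-1622750977792 0/15 - 3M *
osd.dashboard-admin-1622751032319 326/341 10m ago 3M *
prometheus ?:9095 1/1 9m ago 3M count:1
EOF

# The first field matching n/m is the RUNNING column; report it when n > m.
OVER=$(awk '{ for (i = 2; i <= NF; i++)
                if ($i ~ /^[0-9]+\/[0-9]+$/) { split($i, c, "/");
                  if (c[1] + 0 > c[2] + 0) print $1, $i; break } }' orch-ls.txt)
echo "${OVER}"
```

On this sample only iscsi.iscsi is flagged, matching the stuck 3/1 service in the thread.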