Thanks for the tip. I’ve just been using ‘docker exec -it <container id> 
/bin/bash’ to get into the containers, but those commands sound useful. I think 
I’ll install cephadm on all nodes just for this. 
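
For reference, a rough side-by-side of what I’ve been doing vs. the cephadm
equivalent (the daemon name is just the iscsi one from earlier in this thread):

docker exec -it <container id> /bin/bash               # what I’ve been using so far
cephadm enter --name iscsi.cxcto-c240-j27-04.lgqtxo    # cephadm shell into the same container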

Thanks again, 
-Paul


> On Sep 8, 2021, at 10:11 AM, Eugen Block <ebl...@nde.ag> wrote:
> 
> Okay, I'm glad it worked!
> 
> 
>> At first I tried cephadm rm-daemon on the bootstrap node that I usually do
>> all management from, and it indicated that it could not remove the daemon:
>> 
>> [root@cxcto-c240-j27-01 ~]# cephadm rm-daemon --name 
>> iscsi.cxcto-c240-j27-04.lgqtxo --fsid 4a29e724-c4a6-11eb-b14a-5c838f8013a5
>> ERROR: Daemon not found: iscsi.cxcto-c240-j27-04.lgqtxo. See `cephadm ls`
>> 
>> When I ran ‘cephadm ls’, I only saw services running locally on that
>> server, not the whole cluster. I’m not sure if this is expected or not.
> 
> As far as I can tell this is expected, yes. I only have a lab environment
> with containers (we're still hesitant to upgrade to Octopus), but all
> virtual nodes have cephadm installed; I thought that was a requirement,
> though I may be wrong. It definitely helps with debugging: with
> 'cephadm enter --name <daemon>' you get a shell inside that container, and
> with 'cephadm logs --name <daemon>' you can inspect that daemon's logs.
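> 
> A minimal example, run on the host where the daemon actually lives (the
> daemon name below is just the iscsi one from this thread):
> 
> cephadm ls                                            # only lists this host's daemons
> cephadm logs --name iscsi.cxcto-c240-j27-04.lgqtxo    # journal output for that daemon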
> 
> 
> Zitat von "Paul Giralt (pgiralt)" <pgir...@cisco.com>:
> 
>> Thanks Eugen.
>> 
>> At first I tried cephadm rm-daemon on the bootstrap node that I usually do
>> all management from, and it indicated that it could not remove the daemon:
>> 
>> [root@cxcto-c240-j27-01 ~]# cephadm rm-daemon --name 
>> iscsi.cxcto-c240-j27-04.lgqtxo --fsid 4a29e724-c4a6-11eb-b14a-5c838f8013a5
>> ERROR: Daemon not found: iscsi.cxcto-c240-j27-04.lgqtxo. See `cephadm ls`
>> 
>> When I ran ‘cephadm ls’, I only saw services running locally on that
>> server, not the whole cluster. I’m not sure if this is expected or not. I
>> installed cephadm on the cxcto-c240-j27-04 server, issued the command there,
>> and it worked. Once I did that, the containers on the other two servers that
>> were not supposed to be running the iscsi gateway were removed as well, and
>> everything appeared to be back to normal. I then added one server back to the
>> yaml file, applied it on the original bootstrap node, and it got deployed
>> properly, so everything appears to be working again. Somehow deleting that
>> daemon on the 04 server unstuck things.
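>> 
>> Roughly, the sequence that unstuck it (the fsid and daemon name are from my
>> cluster, and ‘iscsi.yaml’ is just what I happen to call the spec file):
>> 
>> # on the host that actually runs the stray daemon (not the bootstrap node):
>> cephadm ls | grep iscsi     # find the exact daemon name
>> cephadm rm-daemon --name iscsi.cxcto-c240-j27-04.lgqtxo --fsid 4a29e724-c4a6-11eb-b14a-5c838f8013a5
>> 
>> # back on the bootstrap node, re-apply the placement spec:
>> ceph orch apply -i iscsi.yaml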
>> 
>> Still not exactly sure why that fixed it, but at least it’s working again. 
>> Thanks for the suggestion.
>> 
>> -Paul
>> 
>> 
>>> On Sep 8, 2021, at 4:12 AM, Eugen Block <ebl...@nde.ag> wrote:
>>> 
>>> If you only configured 1 iscsi gw but you see 3 running, have you tried to
>>> destroy the extra ones with 'cephadm rm-daemon --name ...'? On the active MGR
>>> host, run 'journalctl -f' and you'll see plenty of information; it should also
>>> include details about the iscsi deployment. Or run 'cephadm logs --name
>>> <iscsi-gw>'.
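>>> 
>>> Something along these lines (the grep filter is just a convenience, and the
>>> daemon name is whatever 'cephadm ls' reports on that host):
>>> 
>>> ceph mgr stat                        # shows which MGR is currently active
>>> journalctl -f | grep -i cephadm      # on that host, follow orchestrator messages
>>> cephadm logs --name <iscsi-gw>       # per-daemon journal on the gateway host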
>>> 
>>> 
>>> Zitat von "Paul Giralt (pgiralt)" <pgir...@cisco.com>:
>>> 
>>>> This was working until recently and now seems to have stopped working. 
>>>> Running Pacific 16.2.5. When I modify the deployment YAML file for my 
>>>> iscsi gateways, the services are not being added or removed as requested. 
>>>> It’s as if the state is “stuck”.
>>>> 
>>>> At one point I had 4 iSCSI gateways: 02, 03, 04 and 05. Through some back 
>>>> and forth of deploying and undeploying, I ended up in a state where the 
>>>> services are running on servers 02, 03, and 05 no matter what I tell 
>>>> cephadm to do. For example, right now I have the following configuration:
>>>> 
>>>> service_type: iscsi
>>>> service_id: iscsi
>>>> placement:
>>>>   hosts:
>>>>     - cxcto-c240-j27-03.cisco.com
>>>> spec:
>>>>   pool: iscsi-config
>>>> … removed the rest of this file ….
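>>>> 
>>>> I apply the spec roughly like this (the filename is just what I happen to
>>>> call it):
>>>> 
>>>> ceph orch apply -i iscsi.yaml     # re-apply after editing the placement list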
>>>> 
>>>> However ceph orch ls shows this:
>>>> 
>>>> [root@cxcto-c240-j27-01 ~]# ceph orch ls
>>>> NAME                               PORTS        RUNNING  REFRESHED  AGE  PLACEMENT
>>>> alertmanager                       ?:9093,9094      1/1  9m ago     3M   count:1
>>>> crash                                             15/15  10m ago    3M   *
>>>> grafana                            ?:3000           1/1  9m ago     3M   count:1
>>>> iscsi.iscsi                                         3/1  10m ago    11m  cxcto-c240-j27-03.cisco.com
>>>> mgr                                                 2/2  9m ago     3M   count:2
>>>> mon                                                 5/5  9m ago     12d  cxcto-c240-j27-01.cisco.com;cxcto-c240-j27-06.cisco.com;cxcto-c240-j27-08.cisco.com;cxcto-c240-j27-10.cisco.com;cxcto-c240-j27-12.cisco.com
>>>> node-exporter                      ?:9100         15/15  10m ago    3M   *
>>>> osd.dashboard-admin-1622750977792                  0/15  -          3M   *
>>>> osd.dashboard-admin-1622751032319               326/341  10m ago    3M   *
>>>> prometheus                         ?:9095           1/1  9m ago     3M   count:1
>>>> 
>>>> Notice it shows 3/1 because the service is still running on 3 servers even 
>>>> though I’ve told it to only run on one. If I configure all 4 servers and 
>>>> apply (ceph orch apply) then I end up with 3/4 because server 04 never 
>>>> deploys. It’s as if something is “stuck”.
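>>>> 
>>>> One way to see exactly which hosts the stray gateways are on (just grepping
>>>> the daemon list, nothing fancy):
>>>> 
>>>> ceph orch ps | grep iscsi     # lists each iscsi daemon and the host it runs on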
>>>> 
>>>> Any ideas where to look / log files that might help figure out what’s 
>>>> happening?
>>>> 
>>>> -Paul
>>>> 

_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
