On 7/8/21 5:06 PM, Bryan Stillwell wrote:
> I upgraded one of my clusters to v16.2.5 today and now I'm seeing these 
> messages from 'ceph -W cephadm':
>
> 2021-07-08T22:01:55.356953+0000 mgr.excalibur.kuumco [ERR] Failed to apply 
> alertmanager spec AlertManagerSpec({'placement': PlacementSpec(count=1), 
> 'service_type': 'alertmanager', 'service_id': None, 'unmanaged': False, 
> 'preview_only': False, 'networks': [], 'config': None, 'user_data': {}, 
> 'port': None}): name alertmanager.aladdin already in use
> Traceback (most recent call last):
>   File "/usr/share/ceph/mgr/cephadm/serve.py", line 582, in 
> _apply_all_services
>     if self._apply_service(spec):
>   File "/usr/share/ceph/mgr/cephadm/serve.py", line 743, in _apply_service
>     rank_generation=slot.rank_generation,
>   File "/usr/share/ceph/mgr/cephadm/module.py", line 613, in get_unique_name
>     f'name {daemon_type}.{name} already in use')
> orchestrator._interface.OrchestratorValidationError: name 
> alertmanager.aladdin already in use
> 2021-07-08T22:01:55.372569+0000 mgr.excalibur.kuumco [ERR] Failed to apply 
> node-exporter spec MonitoringSpec({'placement': 
> PlacementSpec(host_pattern='*'), 'service_type': 'node-exporter', 
> 'service_id': None, 'unmanaged': False, 'preview_only': False, 'networks': 
> [], 'config': None, 'port': None}): name node-exporter.aladdin already in use
> Traceback (most recent call last):
>   File "/usr/share/ceph/mgr/cephadm/serve.py", line 582, in 
> _apply_all_services
>     if self._apply_service(spec):
>   File "/usr/share/ceph/mgr/cephadm/serve.py", line 743, in _apply_service
>     rank_generation=slot.rank_generation,
>   File "/usr/share/ceph/mgr/cephadm/module.py", line 613, in get_unique_name
>     f'name {daemon_type}.{name} already in use')
> orchestrator._interface.OrchestratorValidationError: name 
> node-exporter.aladdin already in use
>
> Also my 'ceph -s' output keeps getting longer and longer (currently 517 
> lines) with messages like these:
>
>     Updating node-exporter deployment (+6 -6 -> 13) (0s)
>       [............................]
>     Updating alertmanager deployment (+1 -1 -> 1) (0s)
>       [............................]
>
> What's the best way to go about fixing this?  I've tried using 'ceph orch 
> daemon redeploy alertmanager.aladdin' and the same for node-exporter, but it 
> doesn't seem to help.


Workaround (caution: temporarily disruptive), assuming this is the only
problem still being reported once the upgrade has otherwise completed:

1.  ceph orch rm node-exporter  

Wait 30+ seconds.

2.  Stop all managers.

3.  Start all managers.

4.  ceph orch apply node-exporter '*'
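
For anyone who wants to script this, the four steps above can be sketched
roughly as follows. This is only an illustration, not a tested procedure:
the function names (run, pause_for_manual_step, redeploy_node_exporter) and
the DRY_RUN variable are made up for this example, and only the two
'ceph orch' commands come from the steps above. Stopping and starting the
managers is left as a manual pause, because how that is done depends on the
deployment. By default the script only prints what it would do.

```shell
#!/bin/sh
# Sketch of the workaround steps. DRY_RUN=1 (the default here) only prints
# the commands; set DRY_RUN=0 to actually execute them.
# DRY_RUN and all function names are hypothetical, chosen for this example.
DRY_RUN="${DRY_RUN:-1}"

# Print a command in dry-run mode, otherwise execute it unchanged.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Steps 2 and 3 stay manual: this only prints a reminder and, outside
# dry-run mode, waits for the operator to press Enter.
pause_for_manual_step() {
    echo "manual step: $1"
    if [ "$DRY_RUN" != "1" ]; then
        printf 'Press Enter when done... '
        read -r _reply
    fi
}

redeploy_node_exporter() {
    run ceph orch rm node-exporter               # step 1
    run sleep 30                                 # wait 30+ seconds
    pause_for_manual_step "stop all managers"    # step 2
    pause_for_manual_step "start all managers"   # step 3
    run ceph orch apply node-exporter '*'        # step 4
}
```

After sourcing the file, calling redeploy_node_exporter with DRY_RUN left at
its default prints the five actions in order without touching the cluster;
running it with DRY_RUN=0 executes the ceph commands and pauses at the two
manager steps.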


>
> Thanks,
> Bryan
> _______________________________________________
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io