Hey all,

We recently upgraded to Ceph Octopus (15.2.14) and also run Zabbix
5.0.15. We have had Ceph/Zabbix monitoring in place for a long time.
After the Octopus upgrade I installed the latest version of the Ceph
template in Zabbix
(https://github.com/ceph/ceph/blob/master/src/pybind/mgr/zabbix/zabbix_template.xml).

Zabbix is successfully getting metrics for all the items in the items
list of my 'ceph' Zabbix host. The ceph Zabbix host is configured by
fsid so that any of my 3 ceph-mgr daemons can send data to it via uuid.

Here's the ceph zabbix config:

{
    "discovery_interval": 100,
    "identifier": "4a158d27-f750-41d5-9e7f-26ce4c9d2d45",
    "interval": 60,
    "log_level": "",
    "log_to_cluster": false,
    "log_to_cluster_level": "info",
    "log_to_file": false,
    "zabbix_host": "172.25.4.20",
    "zabbix_port": 10051,
    "zabbix_sender": "/usr/bin/zabbix_sender"
}
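
For anyone who wants to test the same path outside the mgr, here is a
minimal sketch that turns the config above into a one-off zabbix_sender
command line. It only builds and prints the command, it does not run
it. The item key 'ceph.num_osd' and the value 1 are placeholders I
picked for illustration; substitute a key that actually exists in your
imported template, and keep in mind that a real send would store that
value in Zabbix.

```python
import shlex

# Values copied from the 'ceph zabbix config-show' output above.
cfg = {
    "identifier": "4a158d27-f750-41d5-9e7f-26ce4c9d2d45",
    "zabbix_host": "172.25.4.20",
    "zabbix_port": 10051,
    "zabbix_sender": "/usr/bin/zabbix_sender",
}


def build_sender_cmd(cfg, key, value):
    """Return an argv list for one manual zabbix_sender test send."""
    return [
        cfg["zabbix_sender"],
        "-z", cfg["zabbix_host"],       # Zabbix server address
        "-p", str(cfg["zabbix_port"]),  # trapper port
        "-s", cfg["identifier"],        # host name as registered in Zabbix
        "-k", key,                      # item key (placeholder, an assumption)
        "-o", str(value),               # value to send (placeholder)
        "-vv",                          # verbose output
    ]


# 'ceph.num_osd' is an assumed key; check it against your template.
cmd = build_sender_cmd(cfg, "ceph.num_osd", 1)
print(shlex.join(cmd))
```

Running the printed command as the same user on each ceph-mgr node
should show whether the problem is in zabbix_sender itself (network,
host name mismatch, item key) or somewhere in the mgr module.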

But for some reason when I run 'ceph zabbix send' or 'ceph zabbix
discovery' I get the following errors:

# ceph zabbix send
Failed to send data to Zabbix
# ceph zabbix discovery
Failed to send discovery data to Zabbix

And the ceph logs are constantly logging zabbix errors:
# ceph log last
2021-10-19T17:40:00.005371-0400 mon.controller1 (mon.0) 682609 :
cluster [INF] overall HEALTH_OK
2021-10-19T17:40:04.347459-0400 mon.controller1 (mon.0) 682611 :
cluster [WRN] Health check failed: Failed to send data to Zabbix
(MGR_ZABBIX_SEND_FAILED)
2021-10-19T17:40:05.352579-0400 mon.controller1 (mon.0) 682612 :
cluster [INF] Health check cleared: MGR_ZABBIX_SEND_FAILED (was:
Failed to send data to Zabbix)
2021-10-19T17:40:05.352611-0400 mon.controller1 (mon.0) 682613 :
cluster [INF] Cluster is now healthy
2021-10-19T17:41:06.196293-0400 mon.controller1 (mon.0) 682647 :
cluster [WRN] Health check failed: Failed to send data to Zabbix
(MGR_ZABBIX_SEND_FAILED)
2021-10-19T17:41:07.260666-0400 mon.controller1 (mon.0) 682649 :
cluster [INF] Health check cleared: MGR_ZABBIX_SEND_FAILED (was:
Failed to send data to Zabbix)
2021-10-19T17:41:07.260689-0400 mon.controller1 (mon.0) 682650 :
cluster [INF] Cluster is now healthy

I've tried setting debug_mgr and debug_mon to 20/20 to look for
additional detail but I didn't see much more other than:

2021-10-19T17:15:27.042-0400 7f2c6c50d700 7
mon.controller1@0(leader).log v30689480 update_from_paxos applying
incremental log 30689480 2021-10-19T17:15:26.604054-0400
mon.controller3 (mon.2) 42876 : audit [DBG] from='mgr.490501944
172.25.12.17:0/3421653' entity='mgr.controller1' cmd=[{"prefix":
"config-key get", "key": "mgr/zabbix/zabbix_host"}]: dispatch
"MGR_ZABBIX_SEND_FAILED": {
"message": "Failed to send data to Zabbix",
"message": "/usr/bin/zabbix_sender exited non-zero: b''"
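
The b'' in that message looks like an empty captured output stream, so
the mgr log alone does not say why zabbix_sender failed. A minimal
sketch of a wrapper that runs a command by hand and keeps the exit
code, stdout and stderr separately is below. The child process here is
a stand-in Python one-liner so the sketch runs anywhere; the function
name run_and_capture is my own and not part of Ceph or Zabbix.

```python
import subprocess
import sys


def run_and_capture(argv, stdin_text=""):
    """Run argv, feed stdin_text on stdin, return (exit_code, stdout, stderr)."""
    proc = subprocess.run(
        argv,
        input=stdin_text,
        capture_output=True,  # keep stdout and stderr separately
        text=True,            # decode bytes to str
        timeout=30,
    )
    return proc.returncode, proc.stdout, proc.stderr


# Stand-in child: writes to both streams and exits non-zero.
demo = [
    sys.executable, "-c",
    "import sys; print('to stdout'); "
    "print('to stderr', file=sys.stderr); sys.exit(2)",
]
code, out, err = run_and_capture(demo)
print(code, repr(out), repr(err))
```

To use it for real, replace demo with the zabbix_sender command line
for your server (for example with -z, -p, -s and -vv, plus -i - if you
want to feed "<host> <key> <value>" lines on stdin). As far as I
understand the zabbix_sender man page, exit status 1 means sending
failed and 2 means the data was sent but at least one value was not
processed by the server, so the exit code itself may already narrow
things down.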


If anyone has any tips for troubleshooting that would be greatly appreciated!