First of all, thank you for your support. Andrei: sure, I can reach the machines through IPMI. Here is a short "log":
# From ld1 trying to contact ld1
[root@ld1 ~]# ipmitool -I lanplus -H 192.168.254.250 -U root -P XXXXXX sdr elist all
SEL | 72h | ns | 7.1 | No Reading
Intrusion | 73h | ok | 7.1 |
iDRAC8 | 00h | ok | 7.1 | Dynamic MC @ 20h
...

# From ld1 trying to contact ld2
ipmitool -I lanplus -H 192.168.254.251 -U root -P XXXXXX sdr elist all
SEL | 72h | ns | 7.1 | No Reading
Intrusion | 73h | ok | 7.1 |
iDRAC7 | 00h | ok | 7.1 | Dynamic MC @ 20h
.......

# From ld2 trying to contact ld1
[root@ld2 ~]# ipmitool -I lanplus -H 192.168.254.250 -U root -P XXXXX sdr elist all
SEL | 72h | ns | 7.1 | No Reading
Intrusion | 73h | ok | 7.1 |
iDRAC8 | 00h | ok | 7.1 | Dynamic MC @ 20h
System Board | 00h | ns | 7.1 | Logical FRU @00h
.....

# From ld2 trying to contact ld2
[root@ld2 ~]# ipmitool -I lanplus -H 192.168.254.251 -U root -P XXXX sdr elist all
SEL | 72h | ns | 7.1 | No Reading
Intrusion | 73h | ok | 7.1 |
iDRAC7 | 00h | ok | 7.1 | Dynamic MC @ 20h
System Board | 00h | ns | 7.1 | Logical FRU @00h
........

Jan: actually, the cluster uses /etc/hosts to resolve names:

172.16.77.10 ld1.mydomain.it ld1
172.16.77.11 ld2.mydomain.it ld2

Furthermore, I'm using IP addresses for the IPMI interfaces in the configuration:

[root@ld1 ~]# pcs stonith show fence-node1
 Resource: fence-node1 (class=stonith type=fence_ipmilan)
  Attributes: ipaddr=192.168.254.250 lanplus=1 login=root passwd=XXXXX pcmk_host_check=static-list pcmk_host_list=ld1.mydomain.it
  Operations: monitor interval=60s (fence-node1-monitor-interval-60s)

Any idea? How can I reset the state of the cluster without downtime? Is "pcs resource cleanup" enough?

Thank you,
Marco

On Wed, 4 Sep 2019 at 10:29, Jan Pokorný <jpoko...@redhat.com> wrote:

> On 03/09/19 20:15 +0300, Andrei Borzenkov wrote:
> > 03.09.2019 11:09, Marco Marino wrote:
> >> Hi, I have a problem with fencing on a two node cluster. It seems that
> >> randomly the cluster cannot complete monitor operation for fence devices.
> >> In log I see:
> >> crmd[8206]: error: Result of monitor operation for fence-node2 on
> >> ld2.mydomain.it: Timed Out
> >
> > Can you actually access IP addresses of your IPMI ports?
>
> [
> Tangentially, interesting aspect beyond that and applicable for any
> non-IP cross-host referential needs, which I haven't seen mentioned
> anywhere so far, is the risk of DNS resolution (when /etc/hosts will
> come short) getting to troubles (stale records, port blocked, DNS
> server overload [DNSSEC, etc.], IPv4/IPv6 parallel records that the SW
> cannot handle gracefully, etc.). In any case, just a single DNS
> server would apparently be an undesired SPOF, and would be unfortunate
> when unable to fence a node because of that.
>
> I think the most robust approach is to use IP addresses whenever
> possible, and unambiguous records in /etc/hosts when practical.
> ]
>
> >> As attachment there is
> >> - /var/log/messages for node1 (only the important part)
> >> - /var/log/messages for node2 (only the important part) <-- Problem starts here
> >> - pcs status
> >> - pcs stonith show (for both fence devices)
> >>
> >> I think it could be a timeout problem, so how can I see timeout value for
> >> monitor operation in stonith devices?
> >> Please, someone can help me with this problem?
> >> Furthermore, how can I fix the state of fence devices without downtime?
>
> --
> Jan (Poki)
> _______________________________________________
> Manage your subscription:
> https://lists.clusterlabs.org/mailman/listinfo/users
>
> ClusterLabs home: https://www.clusterlabs.org/
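Jan's advice to keep "unambiguous records in /etc/hosts" can be checked mechanically. The following is a minimal sketch, assuming Python 3.9+ and the usual hosts(5) layout (address first, then names, "#" starting a comment). The helper `ambiguous_host_names` is hypothetical and not part of pcs or Pacemaker; it simply reports every name that maps to more than one address, which also catches the IPv4/IPv6 parallel records Jan mentions.

```python
from collections import defaultdict


def ambiguous_host_names(hosts_text: str) -> dict[str, list[str]]:
    """Hypothetical helper: return names mapped to more than one address.

    Assumes /etc/hosts-style input: an address, then one or more names,
    with '#' starting a comment that runs to the end of the line.
    """
    addresses_by_name: dict[str, list[str]] = defaultdict(list)
    for raw_line in hosts_text.splitlines():
        # Drop any trailing comment and surrounding whitespace.
        line = raw_line.split("#", 1)[0].strip()
        if not line:
            continue
        address, *names = line.split()
        for name in names:
            # Host names are case-insensitive, so compare them lowercased.
            key = name.lower()
            if address not in addresses_by_name[key]:
                addresses_by_name[key].append(address)
    # Keep only the names that resolve to several addresses.
    return {name: addrs for name, addrs in addresses_by_name.items() if len(addrs) > 1}


if __name__ == "__main__":
    with open("/etc/hosts", encoding="utf-8") as hosts_file:
        for name, addrs in ambiguous_host_names(hosts_file.read()).items():
            print(f"{name}: {', '.join(addrs)}")
```

With the two records Marco posted (172.16.77.10 for ld1 and 172.16.77.11 for ld2) nothing is reported; a leftover stale line such as "172.16.77.20 ld1.mydomain.it" would be flagged. The sketch only inspects the file, so DNS fallback behaviour still depends on the resolver configuration of each node.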