Hi, are there any updates on this?
Thank you
On Wed, 4 Sep 2019, 10:46 Marco Marino wrote:
> First of all, thank you for your support.
> Andrei: sure, I can reach the machines through IPMI.
> Here is a short "log" (see also the fence-agent check sketched right
> after it):
>
> #From ld1 trying to contact ld1
> [root@ld1 ~]# ipmitool -I lanplus -H 192.168.254.250 -U root -P XX sdr elist all
> SEL              | 72h | ns  | 7.1 | No Reading
> Intrusion        | 73h | ok  | 7.1 |
> iDRAC8           | 00h | ok  | 7.1 | Dynamic MC @ 20h
> ...
>
> #From ld1 trying to contact ld2
> ipmitool -I lanplus -H 192.168.254.251 -U root -P XX sdr elist all
> SEL              | 72h | ns  | 7.1 | No Reading
> Intrusion        | 73h | ok  | 7.1 |
> iDRAC7           | 00h | ok  | 7.1 | Dynamic MC @ 20h
> ...
>
>
> #From ld2 trying to contact ld1:
> [root@ld2 ~]# ipmitool -I lanplus -H 192.168.254.250 -U root -P X sdr elist all
> SEL              | 72h | ns  | 7.1 | No Reading
> Intrusion        | 73h | ok  | 7.1 |
> iDRAC8           | 00h | ok  | 7.1 | Dynamic MC @ 20h
> System Board     | 00h | ns  | 7.1 | Logical FRU @00h
> ...
>
> #From ld2 trying to contact ld2
> [root@ld2 ~]# ipmitool -I lanplus -H 192.168.254.251 -U root -P X sdr elist all
> SEL              | 72h | ns  | 7.1 | No Reading
> Intrusion        | 73h | ok  | 7.1 |
> iDRAC7           | 00h | ok  | 7.1 | Dynamic MC @ 20h
> System Board     | 00h | ns  | 7.1 | Logical FRU @00h
>
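>
> For completeness, the same reachability can also be checked through
> the fence agent itself, which is closer to what the cluster's monitor
> operation actually runs (a sketch, assuming fence_ipmilan from
> fence-agents is installed and the same redacted credentials apply):
>
> #query each iDRAC through the fence agent, as the cluster would
> fence_ipmilan --lanplus --ip 192.168.254.250 --username root --password XX --action monitor
> fence_ipmilan --lanplus --ip 192.168.254.251 --username root --password XX --action monitor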
>
> Jan: Actually the cluster uses /etc/hosts in order to resolve names:
> 172.16.77.10   ld1.mydomain.it   ld1
> 172.16.77.11   ld2.mydomain.it   ld2
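>
> (To double-check what the resolver actually returns on each node,
> assuming the usual glibc tooling is available:
> getent hosts ld1.mydomain.it
> getent hosts ld2.mydomain.it
> should print exactly the entries above.)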
>
> Furthermore, I'm using IP addresses for the IPMI interfaces in the
> configuration:
> [root@ld1 ~]# pcs stonith show fence-node1
> Resource: fence-node1 (class=stonith type=fence_ipmilan)
> Attributes: ipaddr=192.168.254.250 lanplus=1 login=root passwd=X
> pcmk_host_check=static-list pcmk_host_list=ld1.mydomain.it
> Operations: monitor interval=60s (fence-node1-monitor-interval-60s)
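>
> (If the monitor timeout turns out to be too short, a sketch for
> raising it, assuming pcs and with example values only:
> #give the monitor operation itself more time
> pcs resource update fence-node1 op monitor interval=60s timeout=60s
> #and/or set the device-level alternate timeout for monitor actions
> pcs stonith update fence-node1 pcmk_monitor_timeout=60s
> )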
>
>
> Any ideas?
> How can I reset the state of the cluster without downtime? Is "pcs
> resource cleanup" enough?
> Thank you,
> Marco
>
>
> On Wed, 4 Sep 2019 at 10:29, Jan Pokorný wrote:
>
>> On 03/09/19 20:15 +0300, Andrei Borzenkov wrote:
>> > On 03.09.2019 11:09, Marco Marino wrote:
>> >> Hi, I have a problem with fencing on a two-node cluster. It seems
>> >> that the cluster randomly cannot complete the monitor operation for
>> >> the fence devices.
>> >> In the logs I see:
>> >> crmd[8206]: error: Result of monitor operation for fence-node2 on
>> >> ld2.mydomain.it: Timed Out
>> >
>> > Can you actually access IP addresses of your IPMI ports?
>>
>> [
>> Tangentially, an interesting aspect beyond that, applicable to any
>> non-IP cross-host reference and not mentioned anywhere so far, is the
>> risk of DNS resolution (wherever /etc/hosts falls short) running into
>> trouble: stale records, a blocked port, an overloaded DNS server
>> (DNSSEC, etc.), or parallel IPv4/IPv6 records that the software cannot
>> handle gracefully. In any case, a single DNS server would be an
>> undesirable SPOF, and it would be unfortunate to be unable to fence
>> a node because of that.
>>
>> I think the most robust approach is to use IP addresses whenever
>> possible, and unambiguous records in /etc/hosts when practical.
>> ]
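>>
>> (For instance, pinning the BMC addresses locally might look roughly
>> like this /etc/hosts fragment on both nodes; the host names here are
>> made up purely for illustration:
>> 192.168.254.250   ld1-idrac
>> 192.168.254.251   ld2-idrac
>> )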
>>
>> >> Attached are:
>> >> - /var/log/messages for node1 (only the important part)
>> >> - /var/log/messages for node2 (only the important part) <-- the
>> >>   problem starts here
>> >> - pcs status
>> >> - pcs stonith show (for both fence devices)
>> >>
>> >> I think it could be a timeout problem, so how can I see the timeout
>> >> value for the monitor operation on stonith devices?
>> >> Please, can someone help me with this problem?
>> >> Furthermore, how can I fix the state of the fence devices without
>> >> downtime?
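>> >>
>> >> (A sketch for the "see the timeout" part, assuming pcs: any
>> >> explicitly configured operation timeout shows up in the full
>> >> configuration, and if none is set, pacemaker falls back to the
>> >> operation defaults or its built-in default:
>> >> pcs config | grep -A 6 fence-node
>> >> pcs resource op defaults
>> >> )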
>>
>> --
>> Jan (Poki)
>
>
_______________________________________________
Manage your subscription:
https://lists.clusterlabs.org/mailman/listinfo/users
ClusterLabs home: https://www.clusterlabs.org/