Re: [Linux-HA] how to check HBA with heartbeat

Cristina Bulfon Tue, 14 Apr 2009 01:56:56 -0700

Ciao,

thanks for the answer ... Dejan has already pointed out the problem with the IP to me. That IP is the alias IP for the AFS server, and I was also using it with IPaddr2 because at the beginning, while I was configuring AFS, I had a problem with network communication and I thought of redirecting the traffic to that IP. I've solved that problem, but I forgot to delete the entry in the haresources file

because that configuration worked fine with V1...


Anyway, I corrected the haresources file as follows

afsitfs3.roma1.infn.it \
        drbddisk::afs_fs Filesystem::/dev/drbd1::/vicepa/::xfs \
        drbddisk::afs_sw Filesystem::/dev/drbd2::/usr/afs::ext3 \
        141.108.26.31 afs

and created the cib.xml. I no longer get the error, but AFS starts and stops continuously.

cristina

On Apr 14, 2009, at 10:38 AM, Andrew Beekhof wrote:

On Fri, Apr 10, 2009 at 12:25, Cristina Bulfon
<cristina.bul...@roma1.infn.it> wrote:

Dejan,
I've followed your advice and moved to V2. First, the software has been
updated to version 2.1.4.
I then modified only the following files

- ha.cf, added the line
        crm yes
- cib.xml has been produced using the python script and my haresources
       afsitfs3.roma1.infn.it IPaddr2::141.108.26.31/24/eth0:0
       afsitfs3.roma1.infn.it drbddisk::afs_fs Filesystem::/dev/drbd1::/vicepa::xfs
       afsitfs3.roma1.infn.it drbddisk::afs_sw Filesystem::/dev/drbd2::/usr/afs::ext3
       afsitfs3.roma1.infn.it 141.108.26.31 afs
With this kind of configuration I get a lot of errors and the AFS resource
doesn't work


Looks to me like the ip address is the one that doesn't work.  Did you
actually read the output you pasted below?

You might want to double check the nic and netmask attributes, they're
probably swapped around.
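
A quick way to spot swapped values is to check that each attribute looks like what it claims to be. The following is only a minimal sketch in Python, not part of Heartbeat or of the conversion script: the helper names are hypothetical, the attribute names ip, cidr_netmask and nic are assumed to match the IPaddr2 agent's parameters, and the "ip/netmask/nic" field order is assumed from the haresources line quoted above. It only handles IPv4 prefix lengths.

```python
import ipaddress
import re


def split_ip_spec(spec):
    """Hypothetical helper: split an 'ip/netmask/nic' spec into named fields.

    The field order is an assumption taken from the quoted haresources
    entry 'IPaddr2::141.108.26.31/24/eth0:0'.
    """
    parts = spec.split("/")
    fields = {"ip": parts[0]}
    if len(parts) > 1:
        fields["cidr_netmask"] = parts[1]
    if len(parts) > 2:
        fields["nic"] = parts[2]
    return fields


def check_ip_attributes(attrs):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []

    ip = attrs.get("ip", "")
    try:
        ipaddress.ip_address(ip)
    except ValueError:
        problems.append("ip is not a valid address: %r" % ip)

    netmask = attrs.get("cidr_netmask")
    if netmask is not None:
        # An IPv4 prefix length must be a plain integer between 0 and 32.
        if not re.fullmatch(r"[0-9]{1,2}", netmask) or int(netmask) > 32:
            problems.append("cidr_netmask is not a prefix length: %r" % netmask)

    nic = attrs.get("nic")
    if nic is not None:
        # Interface names such as eth0 or eth0:0 start with a letter.
        if not re.fullmatch(r"[A-Za-z][A-Za-z0-9_.:-]*", nic):
            problems.append("nic does not look like an interface: %r" % nic)

    return problems


good = split_ip_spec("141.108.26.31/24/eth0:0")
swapped = {"ip": "141.108.26.31", "cidr_netmask": "eth0:0", "nic": "24"}

print(check_ip_attributes(good))
print(check_ip_attributes(swapped))
```

With the values from the haresources line the check reports nothing, while the swapped dictionary is flagged on both cidr_netmask and nic. Comparing the nvpair values in the generated cib.xml by hand gives the same answer.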


- crm_verify -L  -x /var/lib/heartbeat/crm/cib.xml

crm_verify[30489]: 2009/04/10_12:20:01 ERROR: unpack_rsc_op: Hard error: IPaddr2_1_monitor_0 failed with rc=2.
crm_verify[30489]: 2009/04/10_12:20:01 ERROR: unpack_rsc_op: Preventing IPaddr2_1 from re-starting on afsitfs4.roma1.infn.it
crm_verify[30489]: 2009/04/10_12:20:01 ERROR: unpack_rsc_op: Hard error: IPaddr2_1_monitor_0 failed with rc=2.
crm_verify[30489]: 2009/04/10_12:20:01 ERROR: unpack_rsc_op: Preventing IPaddr2_1 from re-starting on afsitfs3.roma1.infn.it

I've attached both cib.xml, ha-log and ha.cf

Thanks for helping me

cristina








On Apr 8, 2009, at 5:50 PM, Cristina Bulfon wrote:

Dejan,

thanks so much for the explanation :-)

c.

On Apr 8, 2009, at 5:46 PM, Dejan Muhamedagic wrote:

Ciao,

On Wed, Apr 08, 2009 at 04:17:45PM +0200, Cristina Bulfon wrote:

Ciao Dejan,

thanks for the answer.
Do you mean that I have to use heartbeat V2 plus CRM, and that there is a way
to check the HBA without using hbaping?


Unlike Heartbeat v1, CRM/v2 can monitor resources. I suppose that
in your case, a failing HBA would cause drbd or Filesystem
monitor action to fail, which would result in either a failover
or restart, depending on the configuration.

Thanks,

Dejan

Just to be sure I have understood correctly. I am a newbie on heartbeat V2

thanks

cristina





On Mar 31, 2009, at 2:00 PM, Dejan Muhamedagic wrote:

Ciao,

On Tue, Mar 31, 2009 at 01:48:47PM +0200, Cristina Bulfon wrote:


Ciao,

in our heartbeat cluster we have simulated an HBA failure by unplugging the fiber from the HBA on the primary node. The resources didn't switch to the secondary node, and the log file on the primary node
reported the following messages:

Feb 19 14:33:33 afsitfs3 kernel: qla2xxx 0000:0a:01.0: LOOP DOWN detected (2 e678 16ed).
Feb 19 14:33:38 afsitfs3 kernel: qla2xxx 0000:0a:01.1: LOOP DOWN detected (2 8633 16fc).
Feb 19 14:33:46 afsitfs3 kernel: qla2x00: FAILOVER device 2 from 200500a0b832d169 -> 200400a0b832d16a - LUN 10, reason=0x2
Feb 19 14:33:46 afsitfs3 kernel: qla2x00: FROM HBA 0 to HBA 1
Feb 19 14:33:52 afsitfs3 kernel: qla2x00: FAILOVER device 2 from 200400a0b832d16a -> 200500a0b832d16a - LUN 10, reason=0x2
Feb 19 14:33:52 afsitfs3 kernel: qla2x00: FROM HBA 1 to HBA 1
Feb 19 14:33:55 afsitfs3 kernel: qla2x00: FAILOVER device 2 from 200500a0b832d16a -> 200400a0b832d169 - LUN 10, reason=0x2
Feb 19 14:33:55 afsitfs3 kernel: qla2x00: FROM HBA 1 to HBA 0
Feb 19 14:33:58 afsitfs3 kernel: qla2x00: FAILOVER device 2 from 200400a0b832d169 -> 200500a0b832d169 - LUN 10, reason=0x2
Feb 19 14:33:58 afsitfs3 kernel: qla2x00: FROM HBA 0 to HBA 0
Feb 19 14:34:01 afsitfs3 kernel: qla2x00: FAILOVER device 2 from 200500a0b832d169 -> 200400a0b832d16a - LUN 10, reason=0x2

I more or less expected this kind of message, but I do not understand why
the secondary node doesn't take control of the resources.

In ha.cf there is nothing related to the HBA, and the haresources file is

afsitfs3.roma1.infn.it  IPaddr2::Y.Y.Y.Y/24/eth0:0
afsitfs3.roma1.infn.it  drbddisk::r0 Filesystem::/dev/drbd1::/vicepa::xfs
afsitfs3.roma1.infn.it  drbddisk::r1 Filesystem::/dev/drbd2::/usr/afs::ext3
afsitfs3.roma1.infn.it         Y.Y.Y.Y   afs


There's no resource monitoring with v1. For that you have to go
with v2/Pacemaker (aka CRM).
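
To make the idea of a monitor-style check concrete, here is a minimal sketch in Python that scans kernel log lines like the ones quoted above for qla2xxx "LOOP DOWN detected" events and counts them per PCI device. It is not an existing Heartbeat or Pacemaker feature: the function name and the regular expression are assumptions based only on the message format shown in this thread, and a real setup would still need a resource agent whose monitor action acts on the result.

```python
import re
from collections import Counter

# Assumed message format, taken from the log excerpt in this thread:
#   "... kernel: qla2xxx 0000:0a:01.0: LOOP DOWN detected (2 e678 16ed)."
LOOP_DOWN = re.compile(r"kernel: qla2xxx (\S+): LOOP DOWN detected")


def count_loop_down(lines):
    """Count 'LOOP DOWN detected' events per PCI device address."""
    counts = Counter()
    for line in lines:
        match = LOOP_DOWN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


sample = [
    "Feb 19 14:33:33 afsitfs3 kernel: qla2xxx 0000:0a:01.0: LOOP DOWN detected (2 e678 16ed).",
    "Feb 19 14:33:38 afsitfs3 kernel: qla2xxx 0000:0a:01.1: LOOP DOWN detected (2 8633 16fc).",
    "Feb 19 14:33:46 afsitfs3 kernel: qla2x00: FROM HBA 0 to HBA 1",
]

for device, hits in sorted(count_loop_down(sample).items()):
    print(device, hits)
```

In this sample both ports of the adapter report a loop down, which is the situation where driver-level failover between HBA 0 and HBA 1 cannot help and the resources would have to move to the other node.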

I also tried to use hbaping by compiling hbaapi_src_2.2, but without
success: I ran into problems during compilation, and I didn't understand
whether I have to use libHBAAPI.so from hbaapi or from the HBA vendor.


That could work with ipfail, perhaps.

Thanks,

Dejan

Our FC controller is

QLogic PCI to Fibre Channel Host Adapter for QLA2342: Firmware version 3.03.25 IPX, Driver version 8.02.14.01-fo


Thanks in advance

cristina



_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


