Hi Dejan,

On 20/12/2007, at 9:54 PM, Dejan Muhamedagic wrote:

Hi,

On Thu, Dec 20, 2007 at 08:15:12PM +0900, Trent Lloyd wrote:
Hi Dejan,

On 20/12/2007, at 7:50 PM, Dejan Muhamedagic wrote:

Hi,

On Thu, Dec 20, 2007 at 05:48:10PM +0900, Trent Lloyd wrote:
Hi All,

I have recently setup a 2-node iSCSI fail-over array backed onto
shared SAS MD3000 storage.

How is this thing connected: is it iSCSI or SAS?

Sorry, that wasn't clear - the nodes running heartbeat are connected
to the storage via SAS, and they then serve it up via iSCSI.

OK.



I have everything (including RDAC) working fine on my Debian Etch
nodes. However, I am curious whether it is possible to get heartbeat
to demote itself if it loses access to the disks. I am not sure if I
am missing something, but it seems that if the disks start failing on
a node there is no mechanism to cause a failover.

The kernel should take care of that. If the computer hangs or
crashes, there won't be heartbeat and, after a successful fencing
operation (you do have a stonith device, right?), a failover will
occur. You can also configure a watchdog. Or did I misunderstand
your question?
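
For example (just a sketch), a software watchdog is one line in
ha.cf, assuming the softdog kernel module is loaded:

watchdog /dev/watchdog

On Debian you would typically add softdog to /etc/modules so it gets
loaded at boot.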

I would expect that if a single disk array disappears, the machine
shouldn't hang; only processes that were depending on it would hang.
That disk array does not contain the root filesystem or anything like
that, only the data partition.

I guess that depends on the kind of error. At any rate, the processes
which run on top of this disk will fail in some way. If you have them
in heartbeat as resources and define a monitor operation, then you
should be OK.
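
As a rough sketch of that last bit (ids and timings invented, and the
resource parameters left out), a monitor operation in the CIB would
look something like:

<primitive id="fs_data" class="ocf" provider="heartbeat"
           type="Filesystem">
  <operations>
    <op id="fs_data_monitor" name="monitor" interval="30s"
        timeout="60s"/>
  </operations>
</primitive>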


Is there anything to do this currently? I can't see anything. I
figure it would be possible to write a plugin to monitor the
dm-multipath stuff - is this a reasonable approach?

It's been a long time since I used that. How can one monitor
dm-multipath? Isn't it fault tolerant?

It is, but I'm talking about a situation where for some reason both
paths are lost. I know this seems kind of paranoid, but it just
seemed like a reasonable thing to do.

Example output:
filer2:~# multipath -ll
mpath0 (360019b9000b6b68e00001c2a46e8e656) dm-0 DELL    ,MD3000
[size=1.9T][features=0][hwhandler=1 rdac]
\_ round-robin 0 [prio=3][enabled]
 \_ 2:0:0:0  sdd 8:48  [active][ready]
\_ round-robin 0 [prio=0][enabled]
 \_ 1:0:0:0  sdb 8:16  [active][ghost]

So we could parse that output, or use whatever API makes the same
calls, to make sure that mpath0 has at least one active working path.
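
Something as simple as this would probably do as a first cut (just a
sketch, with mpath0 hard-coded as in the output above):

if multipath -ll mpath0 | grep -q '\[active\]\[ready\]'; then
    echo "mpath0: at least one active path"
else
    echo "mpath0: no active paths"
fi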

Yes, it would be possible to do a monitor-only resource agent,
which would otherwise behave like a dummy resource (see Dummy :)
I just wonder how different that output can look and which
information is important. A more elegant way would be to
implement a ping-like monitor as a Heartbeat plugin. There are
already hbaping (for f/c) and ping (for IP).
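
An untested sketch of what such an agent could look like (the device
parameter name and the way the output is parsed are only guesses and
would need checking against real multipath versions; meta-data etc.
omitted):

#!/bin/sh
# Monitor-only OCF-style resource agent sketch for dm-multipath.
# "Running" here simply means the map still has at least one path
# flagged [active][ready].
DEVICE=${OCF_RESKEY_device:-mpath0}

path_ok() {
    multipath -ll "$DEVICE" 2>/dev/null | grep -q '\[active\]\[ready\]'
}

case "$1" in
start)   path_ok && exit 0 || exit 1 ;;  # OCF_SUCCESS / OCF_ERR_GENERIC
stop)    exit 0 ;;                       # nothing to stop
monitor) path_ok && exit 0 || exit 7 ;;  # OCF_SUCCESS / OCF_NOT_RUNNING
*)       echo "usage: $0 {start|stop|monitor}" >&2; exit 3 ;;
esac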

It's worth noting I am using heartbeat in v1 mode rather than CRM mode.

But I think I can still do something like this. I will look into it.
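
Since v1 mode has no resource-level monitoring, I suppose the
simplest thing for now is an external check (from cron, mon or
similar) that hands the resources over when all paths are gone -
untested sketch, the device name and hb_standby path are guesses:

#!/bin/sh
# If mpath0 has no [active][ready] path left, give up the local
# resources so the other node takes over.
DEVICE=mpath0

if ! multipath -ll "$DEVICE" 2>/dev/null | grep -q '\[active\]\[ready\]'; then
    logger -t multipath-check "$DEVICE: no active paths, going standby"
    /usr/share/heartbeat/hb_standby
fi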

Thanks,
Trent
_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
