[Pacemaker] Re: FW: time pressure - software raid cluster, raid1 resource agent, help needed

2011-03-07 Thread Patrik Rapposch
Hi,

thanks for the answer. I tested this now; the problem is that mdadm hangs 
completely when we simulate the failure of one storage. (We already tried 
two ways: 1. removing the mapping; 2. removing one path and then disabling 
the remaining path via the port on the SAN switch, which is nearly the 
same as a total failure of the storage.)

So I can't get the output of mdadm, because it hangs.

I think it must be a problem with mdadm. This is my mdadm.conf:

DEVICE /dev/mapper/3600a0b800050c94e07874d2e0028_part1 
/dev/mapper/3600a0b8000511f5414b14d2df1b1_part1 
/dev/mapper/3600a0b800050c94e07874d2e0028_part2 
/dev/mapper/3600a0b8000511f5414b14d2df1b1_part2 
/dev/mapper/3600a0b800050c94e07874d2e0028_part3 
/dev/mapper/3600a0b8000511f5414b14d2df1b1_part3
ARRAY /dev/md0 metadata=0.90 UUID=c411c076:bb28916f:d50a93ef:800dd1f0
ARRAY /dev/md1 metadata=0.90 UUID=522279fa:f3cdbe3a:d50a93ef:800dd1f0
ARRAY /dev/md2 metadata=0.90 UUID=01e07d7d:5305e46c:d50a93ef:800dd1f0
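
(When mdadm itself blocks on the device, the kernel's own view of the arrays can usually still be read, since it does not touch the member disks. A minimal sketch, assuming the arrays are md0-md2 as configured above:)

    # /proc/mdstat is served from kernel memory, so it normally
    # answers even while I/O to the member devices is still queued
    cat /proc/mdstat
    # per-array state and degraded flag, also without device I/O
    cat /sys/block/md0/md/array_state
    cat /sys/block/md0/md/degraded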

kr Patrik


Mit freundlichen Grüßen / Best Regards

Patrik Rapposch, BSc
System Administration

KNAPP Systemintegration GmbH
Waltenbachstraße 9
8700 Leoben, Austria 
Phone: +43 3842 805-915
Fax: +43 3842 805-500
patrik.rappo...@knapp.com 
www.KNAPP.com 

Commercial register number: FN 138870x
Commercial register court: Leoben




Holger Teutsch <holger.teut...@web.de>
06.03.2011 19:56
Please reply to: The Pacemaker cluster resource manager <pacemaker@oss.clusterlabs.org>
To: The Pacemaker cluster resource manager <pacemaker@oss.clusterlabs.org>
Subject: Re: [Pacemaker] FW: time pressure - software raid cluster, raid1 resource agent, help needed

On Sun, 2011-03-06 at 12:40 +0100, patrik.rappo...@knapp.com wrote:
Hi,
I assume the basic problem is in your RAID configuration.

If you unmap one box, the devices should not be in status FAILED but
degraded.

So what is the exit status of

mdadm --detail --test /dev/md0

after unmapping?
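
(For reference, the exit codes documented for mdadm --detail --test can be checked like this; a small shell sketch:)

    mdadm --detail --test /dev/md0
    case $? in
      0) echo "array is functioning normally" ;;
      1) echo "array is degraded (expected after unmapping one box)" ;;
      2) echo "array is dead / has multiple failed devices" ;;
      4) echo "error while getting information about the device" ;;
    esac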

Furthermore, I would start with one isolated group containing the
RAID, LVM, and FS to keep it simple (see the sketch below).

Regards
Holger
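
(A minimal crm shell sketch of the isolated group suggested above. The resource names, volume group, and mount point are illustrative only, not taken from the attached cib.xml:)

    primitive p_raid1 ocf:heartbeat:Raid1 \
        params raidconf="/etc/mdadm.conf" raiddev="/dev/md0" \
        op monitor interval="30s"
    primitive p_lvm ocf:heartbeat:LVM \
        params volgrpname="vg_data" \
        op monitor interval="30s"
    primitive p_fs ocf:heartbeat:Filesystem \
        params device="/dev/vg_data/lv_data" directory="/srv/data" fstype="ext3" \
        op monitor interval="30s"
    group g_storage p_raid1 p_lvm p_fs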

 Hi,

 does anyone have an idea on this? I only have the servers until Friday
 next week, so to my regret I am under time pressure :(

 Like I already wrote, I would appreciate and test any ideas you have.
 Also, if someone has already built clusters with lvm-mirror, I would be
 happy to get a CIB or some configuration examples.

 Thank you very much in advance.

 kr Patrik

 patrik.rappo...@knapp.com
 03.03.2011 15:11
 Please reply to: The Pacemaker cluster resource manager
 To: pacemaker@oss.clusterlabs.org
 Subject: [Pacemaker] software raid cluster, raid1 resource agent, help needed
 
 
 Good day,

 I have a 2-node active/passive cluster which is connected to two IBM
 4700 storages. I configured 3 RAIDs, and I use the Raid1 resource
 agent to manage the RAID1s in the cluster.
 When I disable the mapping of one storage to simulate its failure,
 the Raid1 resources change to the state FAILED, the second node then
 takes over the resources, and it is able to start the RAID devices.

 So I am confused why the active node can't keep the Raid1 resources
 while the former passive node can take them over and start them
 correctly.

 I would really appreciate your advice, or maybe someone already has an
 example configuration for Raid1 with two storages.

 Thank you very much in advance. Attached you can find my cib.xml.

 kr Patrik
 
 
 

Re: [Pacemaker] Re: FW: time pressure - software raid cluster, raid1 resource agent, help needed

2011-03-07 Thread Holger Teutsch
Hi,
SAN drivers often have large timeouts configured, so are you patient
enough?
At least this demonstrates that the problem is currently not in the
cluster...
- holger
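
(One way to see how long the SCSI layer waits before a path is reported as failed; a sketch, where sdb stands in for one of the multipath member devices:)

    # per-device SCSI command timeout in seconds (often 30 or more;
    # the SAN/multipath stack may queue on top of that)
    cat /sys/block/sdb/device/timeout
    # shorten it temporarily for failover testing
    echo 10 > /sys/block/sdb/device/timeout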
On Mon, 2011-03-07 at 11:04 +0100, patrik.rappo...@knapp.com wrote:
 [full quote of the message above snipped]
Re: [Pacemaker] Re: FW: time pressure - software raid cluster, raid1 resource agent, help needed

2011-03-07 Thread Florian Haas
On 03/07/2011 05:04 AM, patrik.rappo...@knapp.com wrote:
 [...]
 
 So I can't get the output of mdadm, because it hangs.

This is usually a consequence of configuring multipathd to queue I/O
indefinitely when all paths fail. Run multipath -ll and look for
queue_if_no_path; a sketch of the usual fix follows.
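
(If queue_if_no_path shows up there, a hedged multipath.conf sketch of the usual remedy; the setting is standard, but check it against your multipath-tools version:)

    # /etc/multipath.conf -- fail I/O instead of queueing forever
    defaults {
        # return I/O errors once all paths are gone, so md can mark
        # the mirror leg as failed instead of hanging
        no_path_retry fail
    }
    # reload the configuration afterwards, e.g.:
    #   multipathd -k"reconfigure"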

Florian



___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker