[Pacemaker] Re: FW: time pressure - software raid cluster, raid1 resource agent, help needed
Hi, thanks for the answer. I tested this now; the problem is that mdadm hangs completely when we simulate the failure of one storage. (We already tried two ways: 1. removing the mapping; 2. removing one path and then disabling the remaining path through the port on the SAN switch, which is nearly the same as a total failure of the storage.) So I can't get the output of mdadm, because it hangs. I think it must be a problem with mdadm.

This is my mdadm.conf:

    DEVICE /dev/mapper/3600a0b800050c94e07874d2e0028_part1 /dev/mapper/3600a0b8000511f5414b14d2df1b1_part1 /dev/mapper/3600a0b800050c94e07874d2e0028_part2 /dev/mapper/3600a0b8000511f5414b14d2df1b1_part2 /dev/mapper/3600a0b800050c94e07874d2e0028_part3 /dev/mapper/3600a0b8000511f5414b14d2df1b1_part3
    ARRAY /dev/md0 metadata=0.90 UUID=c411c076:bb28916f:d50a93ef:800dd1f0
    ARRAY /dev/md1 metadata=0.90 UUID=522279fa:f3cdbe3a:d50a93ef:800dd1f0
    ARRAY /dev/md2 metadata=0.90 UUID=01e07d7d:5305e46c:d50a93ef:800dd1f0

kr Patrik

Best Regards
Patrik Rapposch, BSc
System Administration
KNAPP Systemintegration GmbH
Waltenbachstraße 9, 8700 Leoben, Austria
Phone: +43 3842 805-915, Fax: +43 3842 805-500
patrik.rappo...@knapp.com, www.KNAPP.com

Holger Teutsch holger.teut...@web.de, 06.03.2011 19:56
Reply to: The Pacemaker cluster resource manager pacemaker@oss.clusterlabs.org
To: The Pacemaker cluster resource manager pacemaker@oss.clusterlabs.org
Subject: Re: [Pacemaker] FW: time pressure - software raid cluster, raid1 resource agent, help needed

On Sun, 2011-03-06 at 12:40 +0100, patrik.rappo...@knapp.com wrote:

Hi, I assume the basic problem is in your RAID configuration. If you unmap one box, the devices should not be in status FAIL but degraded. So what is the exit status of mdadm --detail --test /dev/md0 after unmapping? Furthermore, I would start with one isolated group containing the RAID, LVM, and FS to keep it simple. Regards, Holger

Hi, does anyone have an idea about this? I only have the servers until next week Friday, so to my regret I am under time pressure :( As I already wrote, I would appreciate and test any idea of yours. Also, if someone has already built clusters with lvm-mirror, I would be happy to get a cib or some configuration examples. Thank you very much in advance. kr Patrik

patrik.rappo...@knapp.com, 03.03.2011 15:11
Reply to: The Pacemaker cluster resource manager
To: pacemaker@oss.clusterlabs.org
Subject: [Pacemaker] software raid cluster, raid1 resource agent, help needed

Good day, I have a 2-node active/passive cluster which is connected to two IBM 4700 storages. I configured 3 RAIDs, and I use the Raid1 resource agent for managing the RAID1s in the cluster.
When I now disable the mapping of one storage to simulate its failure, the Raid1 resources change to the state FAILED, and the second node then takes over the resources and is able to start the raid devices. So I am confused why the active node can't keep the Raid1 resources while the former passive node can take them over and start them correctly. I would really appreciate your advice, or maybe someone already has an example configuration for RAID1 with two storages. Thank you very much in advance. Attached you can find my cib.xml.

kr Patrik
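Two sketches to make the suggestions in this thread concrete. First, Holger's exit-status check: mdadm --detail --test reports array health through its exit code (per the mdadm man page: 0 = functioning normally, 1 = at least one failed device, 2 = multiple failed devices/unusable, 4 = error getting device information). A minimal loop over the three arrays from the mdadm.conf above could look like this (an untested sketch):

    # Report each array's health via mdadm's documented exit codes:
    #   0 = functioning normally
    #   1 = at least one failed device (degraded)
    #   2 = multiple failed devices, array unusable
    #   4 = error while getting information about the device
    for md in /dev/md0 /dev/md1 /dev/md2; do
        mdadm --detail --test "$md" > /dev/null 2>&1
        echo "$md: exit status $?"
    done

If unmapping one storage box leaves each array at exit status 1 (degraded) instead of hanging, the RAID layer is behaving as Holger describes and the cluster should not need to fail over. Second, Holger's "one isolated group" suggestion as a crm-shell sketch; the resource names, volume group, filesystem type, and mount point here are invented for illustration, only /dev/md0 and /etc/mdadm.conf come from the thread:

    # Illustrative minimal group: RAID -> LVM -> filesystem start in
    # that order on one node and stop in reverse order.
    primitive p_raid1 ocf:heartbeat:Raid1 \
            params raidconf="/etc/mdadm.conf" raiddev="/dev/md0" \
            op monitor interval="30s"
    primitive p_lvm ocf:heartbeat:LVM \
            params volgrpname="vg_test"
    primitive p_fs ocf:heartbeat:Filesystem \
            params device="/dev/vg_test/lv_test" directory="/mnt/test" fstype="ext3"
    group g_storage p_raid1 p_lvm p_fs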
Re: [Pacemaker] Re: FW: time pressure - software raid cluster, raid1 resource agent, help needed
Hi, SAN drivers often have large timeouts configured, so: are you patient enough? At least this demonstrates that the problem is currently not in the cluster... - holger

On Mon, 2011-03-07 at 11:04 +0100, patrik.rappo...@knapp.com wrote:
> [full quoted message trimmed; see above]
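A note on those timeouts: on Linux the per-device SCSI command timer is exposed in sysfs, so you can see how long the kernel will wait before erroring out I/O on a dead path. An illustrative check (device names depend on the system, and multipath queueing, see the next message, can stretch the effective wait far beyond this):

    # Print the SCSI command timeout, in seconds, for each SCSI disk;
    # I/O on a failed path is not errored out before this timer expires
    # (retries and multipath queueing can extend the wait considerably).
    for t in /sys/block/sd*/device/timeout; do
        printf '%s: %ss\n' "$t" "$(cat "$t")"
    done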
Re: [Pacemaker] Re: FW: time pressure - software raid cluster, raid1 resource agent, help needed
On 03/07/2011 05:04 AM, patrik.rappo...@knapp.com wrote:
> the problem is that mdadm hangs completely when we simulate the failure of one storage. [...] So I can't get the output of mdadm, because it hangs.

Usually a consequence of configuring multipathd to queue I/O indefinitely when all paths fail. Run multipath -ll and look for queue_if_no_path.

Florian
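To bound that queueing, the usual knob is no_path_retry in /etc/multipath.conf; an illustrative fragment (values are examples, not taken from the thread):

    # /etc/multipath.conf -- illustrative fragment.
    # "no_path_retry fail" errors out I/O as soon as all paths are lost;
    # a number N instead queues I/O for N polling intervals, then fails it.
    # Either setting replaces indefinite queue_if_no_path queueing, which
    # is what leaves mdadm blocked forever when a whole storage box dies.
    defaults {
            no_path_retry fail
    }

With I/O failing promptly, md can mark the lost mirror leg faulty and keep running degraded, so the Raid1 resource has a chance to stay on the active node instead of hanging its monitor operation. After changing the setting, have multipathd reconfigure its maps (e.g. via the reconfigure command in the multipathd interactive shell).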