Dear All,
I have configured a 128-node OSCAR cluster at my company. Since all nodes are kept in the same room, some of them fail often. I have 10 nodes dedicated to failure recovery. I have two questions about this setup.
 
1. Is there any open-source cluster management software that keeps track of the hardware details of the systems in the cluster, so that when one node fails, a spare node (dedicated to failure recovery) is woken up automatically and the failed node is shut down?
 
2. Can a heartbeat connection be established in OSCAR itself, or is HA-OSCAR required? I want to have 10 nodes dedicated to failure recovery. These 10 nodes do not take part in the normal operation of the cluster. When a node fails, one of these nodes takes over from the failed node, shuts down the latter, and joins the cluster through the cluster management software.
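
The behaviour asked about in question 2 can be sketched roughly as follows. This is only an illustration in Python, not OSCAR or HA-OSCAR code: the class FailoverManager, its method names, and the timeout value are all hypothetical, and the actual power-off and boot steps are left as a comment because they depend on the hardware.

```python
import time
from dataclasses import dataclass, field


@dataclass
class FailoverManager:
    """Tracks heartbeats and swaps failed nodes for spares (illustrative only)."""
    timeout: float   # seconds without a heartbeat before a node counts as failed
    spares: list     # names of idle spare nodes, e.g. the 10 recovery nodes
    last_seen: dict = field(default_factory=dict)  # active node -> last heartbeat time
    replaced: dict = field(default_factory=dict)   # failed node -> spare that took over

    def heartbeat(self, node, now=None):
        """Record a heartbeat from an active node."""
        self.last_seen[node] = time.time() if now is None else now

    def check(self, now=None):
        """Replace every node whose heartbeat is overdue; return the swaps made."""
        now = time.time() if now is None else now
        swaps = []
        for node, seen in list(self.last_seen.items()):
            if now - seen <= self.timeout:
                continue  # node is still alive
            if not self.spares:
                break  # spare pool exhausted; remaining failures need an operator
            spare = self.spares.pop(0)
            # A real system would power off `node` and boot `spare` at this point.
            del self.last_seen[node]
            self.last_seen[spare] = now
            self.replaced[node] = spare
            swaps.append((node, spare))
        return swaps
```

A monitoring host would call heartbeat() whenever a node reports in and call check() periodically; each returned (failed, spare) pair is one replacement to carry out.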
 
Regards,
Rajiv
