Lars,

Thank you very much for the answer on the "standby" issue. It turns out that was only the tip of my real issue. Now I have both nodes coming online, and ha1 starts fine with all the resources running.
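Just to make sure I have the mechanics straight, I believe the lifetimes you describe map to these crm shell commands (a sketch using my node name; please correct me if the syntax is off):

    crm node standby ha1.iohost.com reboot    # standby until the cluster stack on ha1 restarts
    crm node standby ha1.iohost.com forever   # standby until explicitly cleared (the default)
    crm node online ha1.iohost.com            # clear standby and bring the node back online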
With them both online, if I issue:

    crm node standby ha1.iohost.com

then I see IP takeover on ha2, but the other resources never start. It remains:

Node ha1.iohost.com (b159178d-c19b-4473-aa8e-13e487b65e33): standby
Online: [ ha2.iohost.com ]

 Resource Group: WebServices
     ip1        (ocf::heartbeat:IPaddr2):       Started ha2.iohost.com
     ip1arp     (ocf::heartbeat:SendArp):       Started ha2.iohost.com
     fs_webfs   (ocf::heartbeat:Filesystem):    Stopped
     fs_mysql   (ocf::heartbeat:Filesystem):    Stopped
     apache2    (lsb:httpd):    Stopped
     mysql      (ocf::heartbeat:mysql):         Stopped
 Master/Slave Set: ms_drbd_mysql
     Slaves: [ ha2.iohost.com ]
     Stopped: [ drbd_mysql:0 ]
 Master/Slave Set: ms_drbd_webfs
     Slaves: [ ha2.iohost.com ]
     Stopped: [ drbd_webfs:0 ]

Looking in the recent log, I see this:

May 20 12:46:42 ha2.iohost.com pengine: [3117]: info: native_color: Resource fs_webfs cannot run anywhere

I am not sure why it cannot promote the other resources on ha2; I checked DRBD before putting ha1 on standby and it was UpToDate. Here are the surrounding log entries; the only thing I changed in the config is standby="off" on both nodes:

May 20 12:47:06 ha2.iohost.com pengine: [3117]: notice: group_print: Resource Group: WebServices
May 20 12:47:06 ha2.iohost.com pengine: [3117]: notice: native_print: ip1 (ocf::heartbeat:IPaddr2): Started ha2.iohost.com
May 20 12:47:06 ha2.iohost.com pengine: [3117]: notice: native_print: ip1arp (ocf::heartbeat:SendArp): Started ha2.iohost.com
May 20 12:47:06 ha2.iohost.com pengine: [3117]: notice: native_print: fs_webfs (ocf::heartbeat:Filesystem): Stopped
May 20 12:47:06 ha2.iohost.com pengine: [3117]: notice: native_print: fs_mysql (ocf::heartbeat:Filesystem): Stopped
May 20 12:47:06 ha2.iohost.com pengine: [3117]: notice: native_print: apache2 (lsb:httpd): Stopped
May 20 12:47:06 ha2.iohost.com pengine: [3117]: notice: native_print: mysql (ocf::heartbeat:mysql): Stopped
May 20 12:47:06 ha2.iohost.com pengine: [3117]: notice: clone_print: Master/Slave Set: ms_drbd_mysql
May 20 12:47:06 ha2.iohost.com pengine: [3117]: notice: short_print: Slaves: [ ha2.iohost.com ]
May 20 12:47:06 ha2.iohost.com pengine: [3117]: notice: short_print: Stopped: [ drbd_mysql:0 ]
May 20 12:47:06 ha2.iohost.com pengine: [3117]: notice: clone_print: Master/Slave Set: ms_drbd_webfs
May 20 12:47:06 ha2.iohost.com pengine: [3117]: notice: short_print: Slaves: [ ha1.iohost.com ha2.iohost.com ]
May 20 12:47:06 ha2.iohost.com pengine: [3117]: info: rsc_merge_weights: ip1arp: Breaking dependency loop at ip1
May 20 12:47:06 ha2.iohost.com pengine: [3117]: info: rsc_merge_weights: ip1: Breaking dependency loop at ip1arp
May 20 12:47:06 ha2.iohost.com pengine: [3117]: info: native_color: Resource drbd_webfs:0 cannot run anywhere
May 20 12:47:06 ha2.iohost.com pengine: [3117]: info: master_color: ms_drbd_webfs: Promoted 0 instances of a possible 1 to master
May 20 12:47:06 ha2.iohost.com pengine: [3117]: info: rsc_merge_weights: fs_webfs: Rolling back scores from fs_mysql
May 20 12:47:06 ha2.iohost.com pengine: [3117]: info: native_color: Resource fs_webfs cannot run anywhere
May 20 12:47:06 ha2.iohost.com pengine: [3117]: info: native_color: Resource drbd_mysql:0 cannot run anywhere
May 20 12:47:06 ha2.iohost.com pengine: [3117]: info: master_color: ms_drbd_mysql: Promoted 0 instances of a possible 1 to master
May 20 12:47:06 ha2.iohost.com pengine: [3117]: info: master_color: ms_drbd_mysql: Promoted 0 instances of a possible 1 to master
May 20 12:47:06 ha2.iohost.com pengine: [3117]: info: rsc_merge_weights: fs_mysql: Rolling back scores from apache2
May 20 12:47:06 ha2.iohost.com pengine: [3117]: info: native_color: Resource fs_mysql cannot run anywhere
May 20 12:47:06 ha2.iohost.com pengine: [3117]: info: master_color: ms_drbd_mysql: Promoted 0 instances of a possible 1 to master
May 20 12:47:06 ha2.iohost.com pengine: [3117]: info: master_color: ms_drbd_webfs: Promoted 0 instances of a possible 1 to master
May 20 12:47:06 ha2.iohost.com pengine: [3117]: info: rsc_merge_weights: apache2: Rolling back scores from mysql
May 20 12:47:06 ha2.iohost.com pengine: [3117]: info: native_color: Resource apache2 cannot run anywhere
May 20 12:47:06 ha2.iohost.com pengine: [3117]: info: native_color: Resource mysql cannot run anywhere
May 20 12:47:06 ha2.iohost.com pengine: [3117]: info: master_color: ms_drbd_mysql: Promoted 0 instances of a possible 1 to master
May 20 12:47:06 ha2.iohost.com pengine: [3117]: info: master_color: ms_drbd_webfs: Promoted 0 instances of a possible 1 to master
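In case the constraints are the problem: the pattern I understand a group-on-DRBD-master stack like this needs is roughly the following (a sketch in crm shell syntax using the resource names from the status output above; it is not copied from my actual config, so the constraint IDs are made up):

    # the group may only run where the DRBD masters are, and only after promotion
    colocation c_webservices_on_drbd_webfs inf: WebServices ms_drbd_webfs:Master
    colocation c_webservices_on_drbd_mysql inf: WebServices ms_drbd_mysql:Master
    order o_drbd_webfs_before_group inf: ms_drbd_webfs:promote WebServices:start
    order o_drbd_mysql_before_group inf: ms_drbd_mysql:promote WebServices:start
    # optional: prefer ha1 while it is online, instead of using standby
    location l_prefer_ha1 WebServices 100: ha1.iohost.com

If my live constraints deviate from that shape, I suppose that could explain why nothing can be promoted on ha2.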
Regards,
Randy

On 5/19/2011 11:19 PM, Lars Ellenberg wrote:
> On Thu, May 19, 2011 at 03:46:37PM -0700, Randy Katz wrote:
>> To clarify, I was not seeking a quick response. I just noticed the
>> threads I searched
>> were NEVER answered, with the problem that I reported. That being said
>> and about standby:
>>
>> Why does my node come up as standby and not as online?
> Because you put it there.
>
> The standby setting (as a few others) can take a "lifetime",
> and usually that defaults to "forever", though you can explicitly
> specify an "until reboot", which actually means until restart of the
> cluster system on that node.
>
>> Is there a setting in my conf file that affects that?
>> Or another issue, is it configuration, please advise.
>>
>> Thanks,
>> Randy
>>
>> PS - Here are some threads where it seems they were never answered, one
>> going back 3 years ago:
>>
>> http://www.mail-archive.com/linux-ha@lists.linux-ha.org/msg09886.html
>> http://www.mail-archive.com/pacemaker@oss.clusterlabs.org/msg07663.html
>> http://lists.community.tummy.com/pipermail/linux-ha/2008-August/034310.html
> Then they probably have been solved off list, via IRC or support,
> or by the original user finally having a facepalm experience.
>
> Besides, yes, it happens that threads go unanswered, most of the time
> because the question was badly asked ("does not work. why?"), and those
> that could figure it out have been distracted by more important things,
> or decided that, at that time, trying to figure it out was too time
> consuming.
>
> That's life.
>
> If it happens to you, do a friendly bump,
> and/or try to ask a smarter version of the question ;-)
>
> Most of the time, the answer is in the logs, and the config.
>
> But please break down the issue to a minimal configuration,
> and post that minimal config plus logs of one "incident".
> Don't post your 2 MB xml config, plus a 2G log,
> and expect people to dig through that for fun.
>
> BTW, none of the quoted threads has anything to do with your experience,
> afaiks.
>
>> On 5/19/2011 3:16 AM, Lars Ellenberg wrote:
>>> On Wed, May 18, 2011 at 09:55:00AM -0700, Randy Katz wrote:
>>>> ps - I searched a lot online and I see this issue coming up,
>>> I doubt that _this_ issue comes up that often ;-)
>>>
>>>> and then after about 3-4 emails they request the resources and
>>>> constraints and then there is never an answer to the thread, why?!
>>> Hey, it's not even a day since you provided the config.
>>> People have day jobs.
>>> People get _paid_ to do support on these kinds of things,
>>> so they probably first deal with requests by paying customers.
>>>
>>> If you need SLAs, you may need to check out a support contract.
>>>
>>> Otherwise you need to be patient.
>>>
>>>
>>> From what I read, you probably just have misunderstood some concepts.
>>> "Standby" is not what I think you think it is ;-)
>>>
>>> "Standby" is NOT for deciding where resources will be placed.
>>>
>>> "Standby" is for manually switching a node into a mode where it WILL NOT
>>> run any resources. And it WILL NOT leave that state by itself.
>>> It is not supposed to.
>>>
>>> You switch a node into standby if you want to do maintenance on that
>>> node, do major software, system or hardware upgrades, or otherwise
>>> expect that it won't be useful to run resources there.
>>>
>>> It won't even run DRBD secondaries.
>>> It will run nothing there.
>>>
>>>
>>> If you want automatic failover, DO NOT put your nodes in standby.
>>> Because, if you do, they can not take over resources.
>>>
>>> You have to have your nodes online for any kind of failover to happen.
>>>
>>> If you want to have a "preferred" location for your resources,
>>> use location constraints.
>>>
>>>
>>> Does that help?

_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems