Lars,

Thank you very much for the answer on the "standby" issue. It turns out that was only the tip of my real issue. Now I have both nodes coming online, and ha1 starts fine with all the resources running.
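Just to make sure I have the mechanics straight, I believe the lifetimes you describe map to these crm shell commands (a sketch using my node name; please correct me if the syntax is off):

    crm node standby ha1.iohost.com reboot    # standby until the cluster stack on ha1 restarts
    crm node standby ha1.iohost.com forever   # standby until explicitly cleared (the default)
    crm node online ha1.iohost.com            # clear standby and bring the node back online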
With them both online, if I issue:

    crm node standby ha1.iohost.com

then I see IP takeover on ha2, but the other resources never start. It remains:

Node ha1.iohost.com (b159178d-c19b-4473-aa8e-13e487b65e33): standby
Online: [ ha2.iohost.com ]

 Resource Group: WebServices
     ip1        (ocf::heartbeat:IPaddr2):       Started ha2.iohost.com
     ip1arp     (ocf::heartbeat:SendArp):       Started ha2.iohost.com
     fs_webfs   (ocf::heartbeat:Filesystem):    Stopped
     fs_mysql   (ocf::heartbeat:Filesystem):    Stopped
     apache2    (lsb:httpd):    Stopped
     mysql      (ocf::heartbeat:mysql):         Stopped
 Master/Slave Set: ms_drbd_mysql
     Slaves: [ ha2.iohost.com ]
     Stopped: [ drbd_mysql:0 ]
 Master/Slave Set: ms_drbd_webfs
     Slaves: [ ha2.iohost.com ]
     Stopped: [ drbd_webfs:0 ]

Looking in the recent log, I see this:

May 20 12:46:42 ha2.iohost.com pengine: [3117]: info: native_color: Resource fs_webfs cannot run anywhere

I am not sure why it cannot promote the other resources on ha2; I checked DRBD before putting ha1 on standby and it was UpToDate. Here are the surrounding log entries; the only thing I changed in the config is standby="off" on both nodes:

May 20 12:47:06 ha2.iohost.com pengine: [3117]: notice: group_print: Resource Group: WebServices
May 20 12:47:06 ha2.iohost.com pengine: [3117]: notice: native_print: ip1 (ocf::heartbeat:IPaddr2): Started ha2.iohost.com
May 20 12:47:06 ha2.iohost.com pengine: [3117]: notice: native_print: ip1arp (ocf::heartbeat:SendArp): Started ha2.iohost.com
May 20 12:47:06 ha2.iohost.com pengine: [3117]: notice: native_print: fs_webfs (ocf::heartbeat:Filesystem): Stopped
May 20 12:47:06 ha2.iohost.com pengine: [3117]: notice: native_print: fs_mysql (ocf::heartbeat:Filesystem): Stopped
May 20 12:47:06 ha2.iohost.com pengine: [3117]: notice: native_print: apache2 (lsb:httpd): Stopped
May 20 12:47:06 ha2.iohost.com pengine: [3117]: notice: native_print: mysql (ocf::heartbeat:mysql): Stopped
May 20 12:47:06 ha2.iohost.com pengine: [3117]: notice: clone_print: Master/Slave Set: ms_drbd_mysql
May 20 12:47:06 ha2.iohost.com pengine: [3117]: notice: short_print: Slaves: [ ha2.iohost.com ]
May 20 12:47:06 ha2.iohost.com pengine: [3117]: notice: short_print: Stopped: [ drbd_mysql:0 ]
May 20 12:47:06 ha2.iohost.com pengine: [3117]: notice: clone_print: Master/Slave Set: ms_drbd_webfs
May 20 12:47:06 ha2.iohost.com pengine: [3117]: notice: short_print: Slaves: [ ha1.iohost.com ha2.iohost.com ]
May 20 12:47:06 ha2.iohost.com pengine: [3117]: info: rsc_merge_weights: ip1arp: Breaking dependency loop at ip1
May 20 12:47:06 ha2.iohost.com pengine: [3117]: info: rsc_merge_weights: ip1: Breaking dependency loop at ip1arp
May 20 12:47:06 ha2.iohost.com pengine: [3117]: info: native_color: Resource drbd_webfs:0 cannot run anywhere
May 20 12:47:06 ha2.iohost.com pengine: [3117]: info: master_color: ms_drbd_webfs: Promoted 0 instances of a possible 1 to master
May 20 12:47:06 ha2.iohost.com pengine: [3117]: info: rsc_merge_weights: fs_webfs: Rolling back scores from fs_mysql
May 20 12:47:06 ha2.iohost.com pengine: [3117]: info: native_color: Resource fs_webfs cannot run anywhere
May 20 12:47:06 ha2.iohost.com pengine: [3117]: info: native_color: Resource drbd_mysql:0 cannot run anywhere
May 20 12:47:06 ha2.iohost.com pengine: [3117]: info: master_color: ms_drbd_mysql: Promoted 0 instances of a possible 1 to master
May 20 12:47:06 ha2.iohost.com pengine: [3117]: info: master_color: ms_drbd_mysql: Promoted 0 instances of a possible 1 to master
May 20 12:47:06 ha2.iohost.com pengine: [3117]: info: rsc_merge_weights: fs_mysql: Rolling back scores from apache2
May 20 12:47:06 ha2.iohost.com pengine: [3117]: info: native_color: Resource fs_mysql cannot run anywhere
May 20 12:47:06 ha2.iohost.com pengine: [3117]: info: master_color: ms_drbd_mysql: Promoted 0 instances of a possible 1 to master
May 20 12:47:06 ha2.iohost.com pengine: [3117]: info: master_color: ms_drbd_webfs: Promoted 0 instances of a possible 1 to master
May 20 12:47:06 ha2.iohost.com pengine: [3117]: info: rsc_merge_weights: apache2: Rolling back scores from mysql
May 20 12:47:06 ha2.iohost.com pengine: [3117]: info: native_color: Resource apache2 cannot run anywhere
May 20 12:47:06 ha2.iohost.com pengine: [3117]: info: native_color: Resource mysql cannot run anywhere
May 20 12:47:06 ha2.iohost.com pengine: [3117]: info: master_color: ms_drbd_mysql: Promoted 0 instances of a possible 1 to master
May 20 12:47:06 ha2.iohost.com pengine: [3117]: info: master_color: ms_drbd_webfs: Promoted 0 instances of a possible 1 to master
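In case the constraints are the problem: the pattern I understand a group-on-DRBD-master stack like this needs is roughly the following (a sketch in crm shell syntax using the resource names from the status output above; it is not copied from my actual config, so the constraint IDs are made up):

    # the group may only run where the DRBD masters are, and only after promotion
    colocation c_webservices_on_drbd_webfs inf: WebServices ms_drbd_webfs:Master
    colocation c_webservices_on_drbd_mysql inf: WebServices ms_drbd_mysql:Master
    order o_drbd_webfs_before_group inf: ms_drbd_webfs:promote WebServices:start
    order o_drbd_mysql_before_group inf: ms_drbd_mysql:promote WebServices:start
    # optional: prefer ha1 while it is online, instead of using standby
    location l_prefer_ha1 WebServices 100: ha1.iohost.com

If my live constraints deviate from that shape, I suppose that could explain why nothing can be promoted on ha2.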
Regards,
Randy

On 5/19/2011 11:19 PM, Lars Ellenberg wrote:
> On Thu, May 19, 2011 at 03:46:37PM -0700, Randy Katz wrote:
>> To clarify, I was not seeking a quick response. I just noticed the
>> threads I searched
>> were NEVER answered, with the problem that I reported. That being said
>> and about standby:
>>
>> Why does my node come up as standby and not as online?
> Because you put it there.
>
> The standby setting (as a few others) can take a "lifetime",
> and usually that defaults to "forever", though you can explicitly
> specify an "until reboot", which actually means until restart of the
> cluster system on that node.
>
>> Is there a setting in my conf file that affects that?
>> Or another issue, is it configuration, please advise.
>>
>> Thanks,
>> Randy
>>
>> PS - Here are some threads where it seems they were never answered, one
>> going back 3 years ago:
>>
>> http://www.mail-archive.com/linux-ha@lists.linux-ha.org/msg09886.html
>> http://www.mail-archive.com/pacemaker@oss.clusterlabs.org/msg07663.html
>> http://lists.community.tummy.com/pipermail/linux-ha/2008-August/034310.html
> Then they probably have been solved off list, via IRC or support,
> or by the original user finally having a facepalm experience.
>
> Besides, yes, it happens that threads go unanswered, most of the time
> because the question was badly asked ("does not work. why?"), and those
> that could figure it out have been distracted by more important things,
> or decided that, at that time, trying to figure it out was too time
> consuming.
>
> That's life.
>
> If it happens to you, do a friendly bump,
> and/or try to ask a smarter version of the question ;-)
>
> Most of the time, the answer is in the logs, and the config.
>
> But please break down the issue to a minimal configuration,
> and post that minimal config plus logs of one "incident".
> Don't post your 2 MB xml config, plus a 2G log,
> and expect people to dig through that for fun.
>
> BTW, none of the quoted threads has anything to do with your experience,
> afaiks.
>
>> On 5/19/2011 3:16 AM, Lars Ellenberg wrote:
>>> On Wed, May 18, 2011 at 09:55:00AM -0700, Randy Katz wrote:
>>>> ps - I searched a lot online and I see this issue coming up,
>>> I doubt that _this_ issue comes up that often ;-)
>>>
>>>> and then after about 3-4 emails they request the resources and
>>>> constraints and then there is never an answer to the thread, why?!
>>> Hey, it's not even a day since you provided the config.
>>> People have day jobs.
>>> People get _paid_ to do support on these kinds of things,
>>> so they probably first deal with requests by paying customers.
>>>
>>> If you need SLAs, you may need to check out a support contract.
>>>
>>> Otherwise you need to be patient.
>>>
>>>
>>> From what I read, you probably just have misunderstood some concepts.
>>> "Standby" is not what I think you think it is ;-)
>>>
>>> "Standby" is NOT for deciding where resources will be placed.
>>>
>>> "Standby" is for manually switching a node into a mode where it WILL NOT
>>> run any resources. And it WILL NOT leave that state by itself.
>>> It is not supposed to.
>>>
>>> You switch a node into standby if you want to do maintenance on that
>>> node, do major software, system or hardware upgrades, or otherwise
>>> expect that it won't be useful to run resources there.
>>>
>>> It won't even run DRBD secondaries.
>>> It will run nothing there.
>>>
>>>
>>> If you want automatic failover, DO NOT put your nodes in standby.
>>> Because, if you do, they can not take over resources.
>>>
>>> You have to have your nodes online for any kind of failover to happen.
>>>
>>> If you want to have a "preferred" location for your resources,
>>> use location constraints.
>>>
>>>
>>> Does that help?

_______________________________________________
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems