Re: [Pacemaker] primary does not run alone

2011-10-11 Thread H . Nakai
Hi, Lars, everybody

(2011/10/11 17:24), Lars Ellenberg wrote:
> DRBD has fencing policies (fencing resource-and-stonith, for example),
> which, if configured, cause it to call fencing handlers
> (handler { fence-peer }) when appropriate.
> 
> There are various fence-peer handlers.
> One is the "drbd-peer-outdater",
> which needs dopd, which at this point depends on the heartbeat
> communication layer.
> 
Yes, but one problem is that heartbeat or crm does not get the status
of drbd correctly.
These are the versions on my system, which may be old:
drbd83-8.3.8-1.el5
heartbeat-3.0.5-1.1.el5
pacemaker-1.0.11-1.2.el5
resource-agents-3.9.2-1.1.el5
centos5.6

I checked some variables
in the /usr/lib/ocf/resource.d/linbit/drbd script
while a shutdown was in progress.
In drbd_status() or maybe_outdate_self(),
drbd recognizes both roles (local and remote) correctly:
$DRBD_ROLE_LOCAL and $DRBD_ROLE_REMOTE show the roles
"Secondary" or "Unknown".
But $OCF_RESKEY_CRM_meta_notify_master_uname or
$OCF_RESKEY_CRM_meta_notify_promote_uname still shows
the hostname of the node that was primary.
So the script writes "outdate" to the local node.
I do not understand why the $OCF_RESKEY... variables are needed.
I think it is enough to check only the $DRBD_ROLE... variables.
Are the $OCF_RESKEY... variables ignored in the newer version?
Or is the current behavior correct?
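
To make the discrepancy concrete, here is a minimal Python sketch. It is not the agent's code (the agent is a shell script); the function names, the hostnames serverA/serverB, and both decision rules are assumptions made only for illustration. It contrasts a decision based on the notify environment with one based on the role DRBD itself reports for the peer.

```python
def other_master_from_env(env, me):
    """Return a master/promote hostname other than `me` found in the
    notify environment, or None. Hypothetical helper, not agent code."""
    names = (
        env.get("OCF_RESKEY_CRM_meta_notify_master_uname", "").split()
        + env.get("OCF_RESKEY_CRM_meta_notify_promote_uname", "").split()
    )
    for name in names:
        if name.lower() != me.lower():
            return name
    return None


def outdate_by_env(env, me):
    # Assumed rule: outdate if the environment names another node as master.
    return other_master_from_env(env, me) is not None


def outdate_by_role(remote_role):
    # Alternative rule from the question above: trust only DRBD's own view.
    return remote_role == "Primary"


# serverB stops while the environment still names serverA, the former primary.
env = {"OCF_RESKEY_CRM_meta_notify_master_uname": "serverA"}
print(outdate_by_env(env, "serverB"))  # decision from the notify environment
print(outdate_by_role("Unknown"))      # decision from DRBD's reported peer role
```

With a stale hostname in the environment the two rules disagree, which is the situation described above.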

Thanks,

Nickey

___
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker


Re: [Pacemaker] primary does not run alone

2011-10-11 Thread Lars Ellenberg
On Tue, Oct 11, 2011 at 09:09:52AM +0900, H.Nakai wrote:
> Hi, Andreas, Lars, and everybody
> 
> I will try a newer version.
> 
> But what I want is the following.

DRBD has fencing policies (fencing resource-and-stonith, for example),
which, if configured, cause it to call fencing handlers
(handler { fence-peer }) when appropriate.

There are various fence-peer handlers.
One is the "drbd-peer-outdater",
which needs dopd, which at this point depends on the heartbeat
communication layer.

Then there is the crm-fence-peer.sh script,
which works by setting a pacemaker location constraint instead of
actually setting the peer outdated.

See if that works like you think it should.

> Primary
>   demote
>   wait 5-10 seconds
>   check whether Secondary is promoted,
>     still secondary, or disconnected
>   if Secondary is promoted and still primary,
>     set local "outdate"
>     (This means only Primary is being shut down)
>   if Secondary is still secondary or disconnected,
>     do not set local "outdate"
>     (This means both Primary and Secondary are being shut down)
>   disconnect
>   shutdown
> Secondary
>   check Primary
>   if Primary is primary, set local "outdate"
>   if Primary is demoted (secondary), do not set "outdate"
>   disconnect
>   shutdown
> 
> (2011/10/08 7:14), Lars Ellenberg wrote:
> > On Fri, Oct 07, 2011 at 11:29:57PM +0200, Andreas Kurz wrote:
> >> Hello,
> >> 
> >> On 10/07/2011 04:51 AM, H.Nakai wrote:
> >> > Hi, I'm from Japan, in trouble.
> >> > In the case below, the server which was primary
> >> > sometimes does not run drbd/heartbeat.
> >> > 
> >> > Server A (primary) and Server B (secondary) are running.
> >> > Shut down A and immediately shut down B.
> >> > Switch on only A; it does not run drbd/heartbeat.
> >> > 
> >> > It may happen when one server was broken.
> >> > 
> >> > I'm using,
> >> > drbd83-8.3.8-1.el5
> >> > heartbeat-3.0.5-1.1.el5
> >> > pacemaker-1.0.11-1.2.el5
> >> > resource-agents-3.9.2-1.1.el5
> >> > centos5.6
> >> > Servers are using two LANs(eth0, eth1) and not using serial cable.
> >> > 
> >> > I checked /usr/lib/ocf/resource.d/linbit/drbd
> >> > and inserted some debug code.
> >> > In drbd_stop(), the while loop breaks only when the
> >> > state is "Unconfigured", and then maybe_outdate_self() is called.
> >> > But sometimes $OCF_RESKEY_CRM_meta_notify_master_uname or
> >> > $OCF_RESKEY_CRM_meta_notify_promote_uname is not null,
> >> > so maybe_outdate_self() goes on to set "outdate".
> >> > It also always shows the warning messages below, yet the "outdated" flag is set.
> >> > "State change failed: Disk state is lower than outdated"
> >> > " state = { cs:StandAlone ro:Secondary/Unknown ds:Diskless/DUnknown r--- }"
> >> > "wanted = { cs:StandAlone ro:Secondary/Unknown ds:Outdated/DUnknown r--- }"
> > 
> > those are expected and harmless, even though I admit they are annoying.
> > 
> >> > I do not want the outdated flag to be set when both nodes are shut down.
> >> > I want to know which program sets the $OCF_RESKEY_CRM_* variables,
> >> > under what conditions,
> >> > and when they are set.
> >> 
> >> You need a newer OCF resource agent, at least the one from DRBD 8.3.9.
> >> A new parameter, "stop_outdates_secondary" (defaults to true), was
> >> introduced. Set it to false to change the behavior of your setup,
> >> but be warned: this increases the chance of coming up with old (outdated)
> >> data.
> > 
> > BTW, that default has changed to false,
> > because of a bug in some version of pacemaker,
> > which got the environment for stop operations wrong.
> > pacemaker 1.0.11 is ok again, iirc.
> > 
> > Anyways, if you simply go to DRBD 8.3.11, you should be good.
> > If you want only the agent script, grab it there:
> > http://git.drbd.org/drbd-8.3.git/?a=blob_plain;f=scripts/drbd.ocf
> > 
> 
> Thanks,
> 
> Nickey
> 

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.



Re: [Pacemaker] primary does not run alone

2011-10-10 Thread H . Nakai
Hi, Andreas, Lars, and everybody

I will try a newer version.

But what I want is the following.

Primary
  demote
  wait 5-10 seconds
  check whether Secondary is promoted,
    still secondary, or disconnected
  if Secondary is promoted and still primary,
    set local "outdate"
    (This means only Primary is being shut down)
  if Secondary is still secondary or disconnected,
    do not set local "outdate"
    (This means both Primary and Secondary are being shut down)
  disconnect
  shutdown
Secondary
  check Primary
  if Primary is primary, set local "outdate"
  if Primary is demoted (secondary), do not set "outdate"
  disconnect
  shutdown
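
The procedure above can be written down as a small Python sketch. This is only a model of the proposal, not code from the resource agent; the function name stop_plan and the step names are made up for illustration, and the roles are the strings DRBD reports ("Primary", "Secondary", "Unknown" for a disconnected peer).

```python
def stop_plan(local_role, peer_role_at_check):
    """Return the ordered stop steps for one node under the proposal above.

    local_role         -- this node's role when the stop begins
    peer_role_at_check -- the peer's role at the moment of the check
                          (for a former Primary: after demote and the wait)
    """
    steps = []
    if local_role == "Primary":
        # Give the peer a chance to be promoted before looking at it.
        steps += ["demote", "wait 5-10 seconds"]
    if peer_role_at_check == "Primary":
        # The peer keeps running as primary, so the local data will go stale.
        steps.append("outdate")
    steps += ["disconnect", "shutdown"]
    return steps


# Only the Primary is shut down: the Secondary gets promoted in the meantime.
print(stop_plan("Primary", "Primary"))
# Both nodes are shut down: the peer stays secondary or disconnects.
print(stop_plan("Primary", "Unknown"))
# The Secondary is shut down while the Primary keeps running.
print(stop_plan("Secondary", "Primary"))
```

Written this way, both halves of the proposal reduce to the same test: outdate only if the peer is Primary at the time of the check. The former Primary differs only in demoting and waiting first.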

(2011/10/08 7:14), Lars Ellenberg wrote:
> On Fri, Oct 07, 2011 at 11:29:57PM +0200, Andreas Kurz wrote:
>> Hello,
>> 
>> On 10/07/2011 04:51 AM, H.Nakai wrote:
>> > Hi, I'm from Japan, in trouble.
>> > In the case below, the server which was primary
>> > sometimes does not run drbd/heartbeat.
>> > 
>> > Server A (primary) and Server B (secondary) are running.
>> > Shut down A and immediately shut down B.
>> > Switch on only A; it does not run drbd/heartbeat.
>> > 
>> > It may happen when one server was broken.
>> > 
>> > I'm using,
>> > drbd83-8.3.8-1.el5
>> > heartbeat-3.0.5-1.1.el5
>> > pacemaker-1.0.11-1.2.el5
>> > resource-agents-3.9.2-1.1.el5
>> > centos5.6
>> > Servers are using two LANs(eth0, eth1) and not using serial cable.
>> > 
>> > I checked /usr/lib/ocf/resource.d/linbit/drbd
>> > and inserted some debug code.
>> > In drbd_stop(), the while loop breaks only when the
>> > state is "Unconfigured", and then maybe_outdate_self() is called.
>> > But sometimes $OCF_RESKEY_CRM_meta_notify_master_uname or
>> > $OCF_RESKEY_CRM_meta_notify_promote_uname is not null,
>> > so maybe_outdate_self() goes on to set "outdate".
>> > It also always shows the warning messages below, yet the "outdated" flag is set.
>> > "State change failed: Disk state is lower than outdated"
>> > " state = { cs:StandAlone ro:Secondary/Unknown ds:Diskless/DUnknown r--- }"
>> > "wanted = { cs:StandAlone ro:Secondary/Unknown ds:Outdated/DUnknown r--- }"
> 
> those are expected and harmless, even though I admit they are annoying.
> 
>> > I do not want the outdated flag to be set when both nodes are shut down.
>> > I want to know which program sets the $OCF_RESKEY_CRM_* variables,
>> > under what conditions,
>> > and when they are set.
>> 
>> You need a newer OCF resource agent, at least the one from DRBD 8.3.9.
>> A new parameter, "stop_outdates_secondary" (defaults to true), was
>> introduced. Set it to false to change the behavior of your setup,
>> but be warned: this increases the chance of coming up with old (outdated)
>> data.
> 
> BTW, that default has changed to false,
> because of a bug in some version of pacemaker,
> which got the environment for stop operations wrong.
> pacemaker 1.0.11 is ok again, iirc.
> 
> Anyways, if you simply go to DRBD 8.3.11, you should be good.
> If you want only the agent script, grab it there:
> http://git.drbd.org/drbd-8.3.git/?a=blob_plain;f=scripts/drbd.ocf
> 

Thanks,

Nickey



Re: [Pacemaker] primary does not run alone

2011-10-07 Thread Lars Ellenberg
On Fri, Oct 07, 2011 at 11:29:57PM +0200, Andreas Kurz wrote:
> Hello,
> 
> On 10/07/2011 04:51 AM, H.Nakai wrote:
> > Hi, I'm from Japan, in trouble.
> > In the case below, the server which was primary
> > sometimes does not run drbd/heartbeat.
> > 
> > Server A (primary) and Server B (secondary) are running.
> > Shut down A and immediately shut down B.
> > Switch on only A; it does not run drbd/heartbeat.
> > 
> > It may happen when one server was broken.
> > 
> > I'm using,
> > drbd83-8.3.8-1.el5
> > heartbeat-3.0.5-1.1.el5
> > pacemaker-1.0.11-1.2.el5
> > resource-agents-3.9.2-1.1.el5
> > centos5.6
> > Servers are using two LANs(eth0, eth1) and not using serial cable.
> > 
> > I checked /usr/lib/ocf/resource.d/linbit/drbd
> > and inserted some debug code.
> > In drbd_stop(), the while loop breaks only when the
> > state is "Unconfigured", and then maybe_outdate_self() is called.
> > But sometimes $OCF_RESKEY_CRM_meta_notify_master_uname or
> > $OCF_RESKEY_CRM_meta_notify_promote_uname is not null,
> > so maybe_outdate_self() goes on to set "outdate".
> > It also always shows the warning messages below, yet the "outdated" flag is set.
> > "State change failed: Disk state is lower than outdated"
> > " state = { cs:StandAlone ro:Secondary/Unknown ds:Diskless/DUnknown r--- }"
> > "wanted = { cs:StandAlone ro:Secondary/Unknown ds:Outdated/DUnknown r--- }"

those are expected and harmless, even though I admit they are annoying.
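
A minimal Python sketch of how to read the two quoted state lines. The ordering of disk states in DISK_STATES is written from memory of DRBD 8.3 and should be treated as an assumption; the helper names are made up, and the comparison is a simplification, not the kernel's actual check.

```python
# Assumed ordering of DRBD disk states, lowest first (from memory of DRBD 8.3).
DISK_STATES = [
    "Diskless", "Attaching", "Failed", "Negotiating", "Inconsistent",
    "Outdated", "DUnknown", "Consistent", "UpToDate",
]


def parse_state(line):
    """Split a '{ cs:... ro:a/b ds:a/b r--- }' line into (local, peer) pairs."""
    inner = line[line.index("{") + 1 : line.rindex("}")]
    fields = {}
    for token in inner.split():
        key, sep, value = token.partition(":")
        if not sep:
            continue  # flag field such as "r---" carries no key
        local, _, peer = value.partition("/")
        fields[key] = (local, peer)
    return fields


def is_lower(disk_state, other):
    """True if disk_state ranks below other in the assumed ordering."""
    return DISK_STATES.index(disk_state) < DISK_STATES.index(other)


state = parse_state(
    " state = { cs:StandAlone ro:Secondary/Unknown ds:Diskless/DUnknown r--- }")
wanted = parse_state(
    "wanted = { cs:StandAlone ro:Secondary/Unknown ds:Outdated/DUnknown r--- }")

# The local disk is already Diskless when Outdated is requested.
print(state["ds"][0], wanted["ds"][0], is_lower(state["ds"][0], wanted["ds"][0]))
```

Under that assumed ordering, the local disk state at the time of the request (Diskless) ranks below the requested one (Outdated), which matches the wording "Disk state is lower than outdated".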

> > I do not want the outdated flag to be set when both nodes are shut down.
> > I want to know which program sets the $OCF_RESKEY_CRM_* variables,
> > under what conditions,
> > and when they are set.
> 
> You need a newer OCF resource agent, at least the one from DRBD 8.3.9.
> A new parameter, "stop_outdates_secondary" (defaults to true), was
> introduced. Set it to false to change the behavior of your setup,
> but be warned: this increases the chance of coming up with old (outdated)
> data.

BTW, that default has changed to false,
because of a bug in some version of pacemaker,
which got the environment for stop operations wrong.
pacemaker 1.0.11 is ok again, iirc.

Anyways, if you simply go to DRBD 8.3.11, you should be good.
If you want only the agent script, grab it there:
http://git.drbd.org/drbd-8.3.git/?a=blob_plain;f=scripts/drbd.ocf

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com



Re: [Pacemaker] primary does not run alone

2011-10-07 Thread Andreas Kurz
Hello,

On 10/07/2011 04:51 AM, H.Nakai wrote:
> Hi, I'm from Japan, in trouble.
> In the case below, the server which was primary
> sometimes does not run drbd/heartbeat.
> 
> Server A (primary) and Server B (secondary) are running.
> Shut down A and immediately shut down B.
> Switch on only A; it does not run drbd/heartbeat.
> 
> It may happen when one server was broken.
> 
> I'm using,
> drbd83-8.3.8-1.el5
> heartbeat-3.0.5-1.1.el5
> pacemaker-1.0.11-1.2.el5
> resource-agents-3.9.2-1.1.el5
> centos5.6
> Servers are using two LANs(eth0, eth1) and not using serial cable.
> 
> I checked /usr/lib/ocf/resource.d/linbit/drbd
> and inserted some debug code.
> In drbd_stop(), the while loop breaks only when the
> state is "Unconfigured", and then maybe_outdate_self() is called.
> But sometimes $OCF_RESKEY_CRM_meta_notify_master_uname or
> $OCF_RESKEY_CRM_meta_notify_promote_uname is not null,
> so maybe_outdate_self() goes on to set "outdate".
> It also always shows the warning messages below, yet the "outdated" flag is set.
> "State change failed: Disk state is lower than outdated"
> " state = { cs:StandAlone ro:Secondary/Unknown ds:Diskless/DUnknown r--- }"
> "wanted = { cs:StandAlone ro:Secondary/Unknown ds:Outdated/DUnknown r--- }"
> 
> I do not want the outdated flag to be set when both nodes are shut down.
> I want to know which program sets the $OCF_RESKEY_CRM_* variables,
> under what conditions,
> and when they are set.

You need a newer OCF resource agent, at least the one from DRBD 8.3.9.
A new parameter, "stop_outdates_secondary" (defaults to true), was
introduced. Set it to false to change the behavior of your setup,
but be warned: this increases the chance of coming up with old (outdated)
data.

Regards,
Andreas

-- 
Need help with Pacemaker?
http://www.hastexo.com/now

> 
> Thanks,
> 
> Nickey
> 

