Re: [Linux-ha-dev] [PATCH][crmsh] deal with the case-insensitive hostname

2013-05-29 Thread Junko IKEDA
Hi Dejan,

Sorry for no reply.
I tried this, and it works well!
http://hg.savannah.gnu.org/hgweb/crmsh/rev/1ebbf036c6d9

Many thanks for your review.

Thanks,
Junko


2013/5/29 Dejan Muhamedagic 

> On Tue, Apr 23, 2013 at 04:44:19PM +0200, Dejan Muhamedagic wrote:
> > Hi Junko-san,
> >
> > Can you try the attached patch, instead of this one?
>
> Any news? Was the patch any good?
>
> Cheers,
>
> Dejan
>
> > Cheers,
> >
> > Dejan
> >
> > On Wed, Apr 10, 2013 at 06:13:45PM +0900, Junko IKEDA wrote:
> > > Hi,
> > > I set upper-case hostnames (GUEST03/GUEST04) and run Pacemaker 1.1.9 +
> > > Corosync 2.3.0.
> > >
> > > [root@GUEST04 ~]# crm_mon -1
> > > Last updated: Wed Apr 10 15:12:48 2013
> > > Last change: Wed Apr 10 14:02:36 2013 via crmd on GUEST04
> > > Stack: corosync
> > > Current DC: GUEST04 (3232242817) - partition with quorum
> > > Version: 1.1.9-e8caee8
> > > 2 Nodes configured, unknown expected votes
> > > 1 Resources configured.
> > >
> > >
> > > Online: [ GUEST03 GUEST04 ]
> > >
> > >  dummy  (ocf::pacemaker:Dummy): Started GUEST03
> > >
> > >
> > > For example, call the crm shell with a lower-case hostname:
> > >
> > > [root@GUEST04 ~]# crm node standby guest03
> > > ERROR: bad lifetime: guest03
> > >
> > > "crm node standby GUEST03" certainly works,
> > > so the crm shell just doesn't take hostname case into account.
> > > It's better to accept both upper- and lower-case.
> > >
> > > "node standby", "node delete", "resource migrate(move)"  get hit with
> this
> > > issue.
> > > Please see the attached.
> > >
> > > Thanks,
> > > Junko
> >
> >
> > > _______
> > > Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
> > > http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
> > > Home Page: http://linux-ha.org/
> >
>
> > # HG changeset patch
> > # User Dejan Muhamedagic 
> > # Date 1366728211 -7200
> > # Node ID cd4d36b347c17b06b76f3386c041947a03c708bb
> > # Parent  4a47465b1fe1f48123080b4336f0b4516d9264f6
> > Medium: node: ignore case when looking up nodes (thanks to Junko Ikeda)
> >
> > diff -r 4a47465b1fe1 -r cd4d36b347c1 modules/ui.py.in
> > --- a/modules/ui.py.inTue Apr 23 11:23:10 2013 +0200
> > +++ b/modules/ui.py.inTue Apr 23 16:43:31 2013 +0200
> > @@ -924,7 +924,7 @@ class RscMgmt(UserInterface):
> >  lifetime = None
> >  opt_l = fetch_opts(argl, ["force"])
> >  if len(argl) == 1:
> > -if not argl[0] in listnodes():
> > +if not is_node(argl[0]):
> >  lifetime = argl[0]
> >  else:
> >  node = argl[0]
> > @@ -1186,7 +1186,7 @@ class NodeMgmt(UserInterface):
> >  if not args:
> >  node = vars.this_node
> >  if len(args) == 1:
> > -if not args[0] in listnodes():
> > +if not is_node(args[0]):
> >  node = vars.this_node
> >  lifetime = args[0]
> >  else:
> > @@ -1249,7 +1249,7 @@ class NodeMgmt(UserInterface):
> >  'usage: delete '
> >  if not is_name_sane(node):
> >  return False
> > -if not node in listnodes():
> > +if not is_node(node):
> >  common_err("node %s not found in the CIB" % node)
> >  return False
> >  rc = True
> > diff -r 4a47465b1fe1 -r cd4d36b347c1 modules/xmlutil.py
> > --- a/modules/xmlutil.py  Tue Apr 23 11:23:10 2013 +0200
> > +++ b/modules/xmlutil.py  Tue Apr 23 16:43:31 2013 +0200
> > @@ -159,6 +159,15 @@ def mk_rsc_type(n):
> >  if ra_provider:
> >  s2 = "%s:"%ra_provider
> >  return ''.join((s1,s2,ra_type))
> > +def is_node(s):
> > +'''
> > +Check if s is in a list of our nodes (ignore case).
> > +This is not fast, perhaps should be cached.
> > +'''
> > +for n in listnodes():
> > +if n.lower() == s.lower():
> > +return True
> > +return False
> >  def listnodes():
> >  nodes_elem = cibdump2elem("nodes")
> >  if nodes_elem is None:
>
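Dejan's comment in the patch notes that the linear scan in is_node() "is not fast, perhaps should be cached." A minimal sketch of that caching idea (hypothetical, not part of the applied patch; listnodes() is stubbed here, whereas the real one queries the CIB in modules/xmlutil.py):

```python
# Hypothetical caching sketch for is_node(); the real listnodes()
# queries the CIB -- stubbed here for illustration only.
_node_cache = None

def listnodes():
    return ["GUEST03", "GUEST04"]  # stand-in for the CIB query

def is_node(s):
    """Check if s names one of our nodes, ignoring case (cached)."""
    global _node_cache
    if _node_cache is None:
        _node_cache = {n.lower() for n in listnodes()}
    return s.lower() in _node_cache
```

With this, repeated lookups such as `is_node("guest03")` avoid re-reading the node list, at the cost of having to invalidate the cache when nodes change.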
>
>


[Linux-ha-dev] [PATCH][crmsh] deal with the case-insensitive hostname

2013-04-10 Thread Junko IKEDA
Hi,
I set upper-case hostnames (GUEST03/GUEST04) and run Pacemaker 1.1.9 +
Corosync 2.3.0.

[root@GUEST04 ~]# crm_mon -1
Last updated: Wed Apr 10 15:12:48 2013
Last change: Wed Apr 10 14:02:36 2013 via crmd on GUEST04
Stack: corosync
Current DC: GUEST04 (3232242817) - partition with quorum
Version: 1.1.9-e8caee8
2 Nodes configured, unknown expected votes
1 Resources configured.


Online: [ GUEST03 GUEST04 ]

 dummy  (ocf::pacemaker:Dummy): Started GUEST03


For example, call the crm shell with a lower-case hostname:

[root@GUEST04 ~]# crm node standby guest03
ERROR: bad lifetime: guest03

"crm node standby GUEST03" certainly works,
so the crm shell just doesn't take hostname case into account.
It's better to accept both upper- and lower-case.

"node standby", "node delete", "resource migrate(move)"  get hit with this
issue.
Please see the attached.

Thanks,
Junko


ignorecase.patch
Description: Binary data


Re: [Linux-ha-dev] [PATCH] handle idmapd using nfsserver RA

2012-05-30 Thread Junko IKEDA
Hi,

My previous patch had a spelling error; this revises it just a bit.

Thanks,
Junko

2012/5/30 Junko IKEDA :
> Hi,
>
> I am trying to set up an NFSv4 server using the nfsserver RA,
> and am adding some handling for rpc.idmapd.
> http://linux.die.net/man/8/rpc.idmapd
>
> Please see the attached.
>
> The /etc/init.d/nfs script included with RHEL 6.2 starts idmapd
> during its start process.
>
> # /etc/init.d/nfs start
> Starting NFS services:                                     [  OK  ]
> Starting NFS quotas:                                       [  OK  ]
> Starting NFS daemon:                                       [  OK  ]
> Starting NFS mountd:                                       [  OK  ]
> Starting RPC idmapd :                                       [  OK  ]
>
> But the nfs init script does not stop idmapd, so I need this patch for now.
> How does this behave on other distributions?
>
> # /etc/init.d/nfs stop
> Shutting down NFS mountd:                                  [  OK  ]
> Shutting down NFS daemon:                                  [  OK  ]
> Shutting down NFS quotas:                                  [  OK  ]
> Shutting down NFS services:                                [  OK  ]
>
> # /etc/init.d/rpcidmapd status
> rpc.idmapd (pid 17450) is running...
>
> Thanks,
> Junko IKEDA
>
> NTT DATA INTELLILINK CORPORATION


nfsserver.patch
Description: Binary data


[Linux-ha-dev] [PATCH] handle idmapd using nfsserver RA

2012-05-30 Thread Junko IKEDA
Hi,

I am trying to set up an NFSv4 server using the nfsserver RA,
and am adding some handling for rpc.idmapd.
http://linux.die.net/man/8/rpc.idmapd

Please see the attached.

The /etc/init.d/nfs script included with RHEL 6.2 starts idmapd
during its start process.

# /etc/init.d/nfs start
Starting NFS services: [  OK  ]
Starting NFS quotas:   [  OK  ]
Starting NFS daemon:   [  OK  ]
Starting NFS mountd:   [  OK  ]
Starting RPC idmapd :   [  OK  ]

But the nfs init script does not stop idmapd, so I need this patch for now.
How does this behave on other distributions?

# /etc/init.d/nfs stop
Shutting down NFS mountd:  [  OK  ]
Shutting down NFS daemon:  [  OK  ]
Shutting down NFS quotas:  [  OK  ]
Shutting down NFS services:[  OK  ]

# /etc/init.d/rpcidmapd status
rpc.idmapd (pid 17450) is running...
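The extra stop handling the patch adds can be sketched roughly as follows (a hypothetical outline in Python, not the actual shell patch; the command strings follow the RHEL 6.2 init scripts shown above):

```python
def nfsserver_stop(run):
    """Stop NFS and also rpc.idmapd, which /etc/init.d/nfs leaves running.

    `run` is a callable that executes a shell command; command names
    follow the RHEL 6.2 init scripts mentioned in the mail.
    """
    run("/etc/init.d/nfs stop")
    # /etc/init.d/nfs does not stop idmapd, so stop it explicitly
    run("/etc/init.d/rpcidmapd stop")
```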

Thanks,
Junko IKEDA

NTT DATA INTELLILINK CORPORATION


nfsserver.patch
Description: Binary data


Re: [Linux-ha-dev] [PATCH] nfsserver RA : add check statement to start function

2012-05-16 Thread Junko IKEDA
Hi,

Thank you for your quick response!

> This one seems to be missing. Or is it covered now by the monitor
> test?

nfsserver_start() can now return $OCF_SUCCESS if it detects that the nfs
server is already started.
The "ocf_log debug" call, which complains about the missing argument, will not be
reached anymore because the RA has already exited.

Thanks,
Junko


[Linux-ha-dev] [PATCH] nfsserver RA : add check statement to start function

2012-05-15 Thread Junko IKEDA
Hi,

These are some small patches for nfsserver RA.

(1) nfsserver-validate-all.patch

nfsserver_validate() is called at lines 254 and 263,
https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/nfsserver#L231

it means each operation (start/monitor/stop) calls nfsserver_validate() twice.
It's not harmful, just a little annoying.

I used mysql RA as a reference and modified the above.
https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/mysql#L1059

(2) nfsserver-check-start.patch

If the nfs service is already started before the RA's start,
https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/nfsserver#L172

fn=`mktemp`
${OCF_RESKEY_nfs_init_script} start > $fn 2>&1
rc=$?
ocf_log debug `cat $fn`
rm -f $fn

${OCF_RESKEY_nfs_init_script} ("/etc/init.d/nfs" in our case)
may not print anything to stdout/stderr,
so "ocf_log debug" complains:
Not enough arguments [1] to ocf_log.

I added a check statement for this.
Please see the attached.
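The check can be sketched as: only forward the captured init-script output to the logger when it is non-empty. This is a hypothetical Python rendering of the guard (the real patch is in shell):

```python
def log_init_script_output(output, log):
    """Forward captured init-script output to the logger only when it
    is non-empty, avoiding ocf_log's "Not enough arguments" error."""
    if output.strip():
        log("debug", output)
```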

Regards,
Junko IKEDA

NTT DATA INTELLILINK CORPORATION


nfsserver-validate-all.patch
Description: Binary data


nfsserver-check-start.patch
Description: Binary data


Re: [Linux-ha-dev] [PATCH] Filesystem RA: remove a status file only when OCF_CHECK_LEVEL is set as 20

2012-05-13 Thread Junko IKEDA
Hi,

Is my case hard to understand?
By "multipath" I mean Fibre Channel multipathing; there are two cables for redundancy.

Thanks,
Junko

2012/5/9 Junko IKEDA :
> Hi,
>
> In my case, the umount succeeds when the Fibre Channel is disconnected,
> so it seemed that handling the status file caused a longer failover,
> as Dejan said.
> If the umount fails, it will run into a timeout and might trigger a STONITH
> action; that case also makes sense (though I couldn't observe it).
>
> I tried the following setup;
>
> (1) timeout : multipath > RA
> multipath timeout = 120s
> Filesystem RA stop timeout = 60s
>
> (2) timeout : multipath < RA
> multipath timeout = 60s
> Filesystem RA stop timeout = 120s
>
> In case (1), Filesystem_stop() fails. The hanging FC causes the stop timeout.
>
> In case (2), Filesystem_stop() succeeds.
> The filesystem is hanging, but lines 758 and 759 succeed (rc=0).
> The status file is still inaccessible, so in fact it remains on the
> filesystem.
>
>> > 758 if [ -f "$STATUSFILE" ]; then
>> > 759 rm -f ${STATUSFILE}
>> > 760 if [ $? -ne 0 ]; then
>
> so line 761 might not be called as expected.
>
>> > 761 ocf_log warn "Failed to remove status file ${STATUSFILE}."
>
>
> By the way, my concern is the unexpected stop timeout and the longer
> failover time;
> if OCF_CHECK_LEVEL is set to 20, it would be better to try to remove the
> status file just in case.
> That can handle case (2) if the user wants to recover from it with
> STONITH.
>
>
> Thanks,
> Junko
>
> 2012/5/8 Dejan Muhamedagic :
>> Hi Lars,
>>
>> On Tue, May 08, 2012 at 01:35:16PM +0200, Lars Marowsky-Bree wrote:
>>> On 2012-05-08T12:08:27, Dejan Muhamedagic  wrote:
>>>
>>> > > In the default (without OCF_CHECK_LEVEL), it's enough to try to unmount
>>> > > the file system, isn't it?
>>> > > https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/Filesystem#L774
>>> >
>>> > I don't see a need to remove the STATUSFILE at all, as that may
>>> > (and as you observed it) prevent the filesystem from stopping.
>>> > Perhaps to skip it altogether? If nobody objects let's just
>>> > remove this code:
>>> >
>>> >  758         if [ -f "$STATUSFILE" ]; then
>>> >  759             rm -f ${STATUSFILE}
>>> >  760             if [ $? -ne 0 ]; then
>>> >  761                 ocf_log warn "Failed to remove status file 
>>> > ${STATUSFILE}."
>>> >  762             fi
>>> >  763         fi
>>>
>>> That would mean you can no longer differentiate between a "crash" and a
>>> clean unmount.
>>
>> One could take a look at the logs. I guess that a crash would
>> otherwise be noticeable as well :)
>>
>>> A hanging FC/SAN is likely to be unable to flush any other dirty buffers
>>> too, as well, so the umount may not necessarily succeed w/o errors. I
>>> think it's unreasonable to expect that the node will survive such a
>>> scenario w/o recovery.
>>
>> True. However, in case of network attached storage or other
>> transient errors it may lead to an unnecessary timeout followed
>> by fencing, i.e. the chance for a longer failover time is higher.
>> Just leaving a file around may not justify the risk.
>>
>> Junko-san, what was your experience?
>>
>> Cheers,
>>
>> Dejan
>>
>>> Regards,
>>>     Lars
>>>
>>> --
>>> Architect Storage/HA
>>> SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, 
>>> HRB 21284 (AG Nürnberg)
>>> "Experience is the name everyone gives to their mistakes." -- Oscar Wilde
>>>


Re: [Linux-ha-dev] [PATCH] Filesystem RA: remove a status file only when OCF_CHECK_LEVEL is set as 20

2012-05-08 Thread Junko IKEDA
Hi,

In my case, the umount succeeds when the Fibre Channel is disconnected,
so it seemed that handling the status file caused a longer failover,
as Dejan said.
If the umount fails, it will run into a timeout and might trigger a STONITH
action; that case also makes sense (though I couldn't observe it).

I tried the following setup;

(1) timeout : multipath > RA
multipath timeout = 120s
Filesystem RA stop timeout = 60s

(2) timeout : multipath < RA
multipath timeout = 60s
Filesystem RA stop timeout = 120s

In case (1), Filesystem_stop() fails. The hanging FC causes the stop timeout.

In case (2), Filesystem_stop() succeeds.
The filesystem is hanging, but lines 758 and 759 succeed (rc=0).
The status file is still inaccessible, so in fact it remains on the
filesystem.

> > 758 if [ -f "$STATUSFILE" ]; then
> > 759 rm -f ${STATUSFILE}
> > 760 if [ $? -ne 0 ]; then

so line 761 might not be called as expected.

> > 761 ocf_log warn "Failed to remove status file ${STATUSFILE}."


By the way, my concern is the unexpected stop timeout and the longer
failover time;
if OCF_CHECK_LEVEL is set to 20, it would be better to try to remove the
status file just in case.
That can handle case (2) if the user wants to recover from it with STONITH.


Thanks,
Junko

2012/5/8 Dejan Muhamedagic :
> Hi Lars,
>
> On Tue, May 08, 2012 at 01:35:16PM +0200, Lars Marowsky-Bree wrote:
>> On 2012-05-08T12:08:27, Dejan Muhamedagic  wrote:
>>
>> > > In the default (without OCF_CHECK_LEVEL), it's enough to try to unmount
>> > > the file system, isn't it?
>> > > https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/Filesystem#L774
>> >
>> > I don't see a need to remove the STATUSFILE at all, as that may
>> > (and as you observed it) prevent the filesystem from stopping.
>> > Perhaps to skip it altogether? If nobody objects let's just
>> > remove this code:
>> >
>> >  758         if [ -f "$STATUSFILE" ]; then
>> >  759             rm -f ${STATUSFILE}
>> >  760             if [ $? -ne 0 ]; then
>> >  761                 ocf_log warn "Failed to remove status file 
>> > ${STATUSFILE}."
>> >  762             fi
>> >  763         fi
>>
>> That would mean you can no longer differentiate between a "crash" and a
>> clean unmount.
>
> One could take a look at the logs. I guess that a crash would
> otherwise be noticeable as well :)
>
>> A hanging FC/SAN is likely to be unable to flush any other dirty buffers
>> too, as well, so the umount may not necessarily succeed w/o errors. I
>> think it's unreasonable to expect that the node will survive such a
>> scenario w/o recovery.
>
> True. However, in case of network attached storage or other
> transient errors it may lead to an unnecessary timeout followed
> by fencing, i.e. the chance for a longer failover time is higher.
> Just leaving a file around may not justify the risk.
>
> Junko-san, what was your experience?
>
> Cheers,
>
> Dejan
>
>> Regards,
>>     Lars
>>
>> --
>> Architect Storage/HA
>> SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, 
>> HRB 21284 (AG Nürnberg)
>> "Experience is the name everyone gives to their mistakes." -- Oscar Wilde
>>


[Linux-ha-dev] [PATCH] Filesystem RA: remove a status file only when OCF_CHECK_LEVEL is set as 20

2012-05-08 Thread Junko IKEDA
Hi,

This is a small patch for Filesystem RA.

When we mount a shared storage without the OCF_CHECK_LEVEL parameter,
Filesystem_stop() can cause an unexpected timeout.

For example;
(1) mount the shared storage without OCF_CHECK_LEVEL
(2) disconnect Fibre Channels
(3) service heartbeat stop

When Filesystem_stop() is called, it tries to remove the STATUSFILE on
the shared storage.
STATUSFILE is only created when OCF_CHECK_LEVEL is set to 20; when the
storage is disconnected, the RA cannot access it and times out.
https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/Filesystem#L756

In the default (without OCF_CHECK_LEVEL), it's enough to try to unmount
the file system, isn't it?
https://github.com/ClusterLabs/resource-agents/blob/master/heartbeat/Filesystem#L774
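The proposed stop logic can be sketched as follows (a hypothetical Python rendering; the actual RA is a shell script, and the callables here are stand-ins for `rm -f ${STATUSFILE}` and the unmount):

```python
OCF_SUCCESS, OCF_ERR_GENERIC = 0, 1  # standard OCF return codes

def filesystem_stop(check_level, remove_status_file, umount):
    """Only touch the status file when OCF_CHECK_LEVEL is 20, since
    that is the only mode in which it was created; otherwise go
    straight to the unmount and avoid hanging on dead storage."""
    if check_level == 20:
        remove_status_file()  # may log a warning on failure
    return OCF_SUCCESS if umount() else OCF_ERR_GENERIC
```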

Regards,
Junko IKEDA

NTT DATA INTELLILINK CORPORATION


Filesystem.patch
Description: Binary data


Re: [Linux-ha-dev] [PATCH] named RA: support IPv6

2012-01-16 Thread Junko IKEDA
Hi,

Thank you for pointing that out!

Regards,
Junko IKEDA

2012/1/17 Dejan Muhamedagic :
> On Mon, Jan 16, 2012 at 03:10:14PM +0100, Dejan Muhamedagic wrote:
>> On Sat, Jan 14, 2012 at 12:32:20PM +0100, Lars Ellenberg wrote:
>> > On Mon, Jan 09, 2012 at 05:50:14PM +0100, Dejan Muhamedagic wrote:
>> > > Hi Serge,
>> > >
>> > > On Mon, Jan 09, 2012 at 09:11:43AM -0700, Serge Dubrouski wrote:
>> > > > I did a couple of weeks ago :-)
>> > >
>> > > Hmm, me completely missed it. Sorry about that. Will apply the
>> > > patch. Many thanks to Junko for the contribution.
>> >
>> > Hm. I apparently missed this, too.
>> >
>> > -    if [ $? -ne 0 ] || ! echo $output | grep -q '.* has address 
>> > '"$OCF_RESKEY_monitor_response"
>> > +    if [ $? -ne 0 ] || ! echo $output | egrep -q '.* has |IPv6 address 
>> > '"$OCF_RESKEY_monitor_response"
>> >
>> > Not good.
>> >
>> > Should be
>> > +    if [ $? -ne 0 ] || ! echo $output | grep -q '.* \(has\|IPv6\) address 
>> > '"$OCF_RESKEY_monitor_response"
>> >
>> > Why?
>> > Because otherwise, as long as the resonse contains " has ", it
>> > would match, and $OCF_RESKEY_monitor_response would be ignored.
>>
>> Right.
>>
>> > And, using egrep (or grep -E) would also change how
>> > $OCF_RESKEY_monitor_response would be interpreted,
>> > so could in theory break existing configurations,
>> > if they use grep special chars.
>> > If you consider this as unlikely, do
>>
>> I guess it is.
>>
>> > +    if [ $? -ne 0 ] || ! echo $output | grep -q -E '.* (has|IPv6) address 
>> > '"$OCF_RESKEY_monitor_response"
>>
>> But quoting Junko's example:
>>
>> orange.kame.net has address 203.178.141.194
>> orange.kame.net has IPv6 address 2001:200:dff:fff1:216:3eff:feb1:44d7
>>
>> it should be:
>>
>> +    if [ $? -ne 0 ] || ! echo $output | grep -q -E '.* has (IPv6 )? address 
>> '"$OCF_RESKEY_monitor_response"
>>
>> But I guess that it would be safe to do this as well (and reduce
>> probability of regression):
>>
>> +    if [ $? -ne 0 ] || ! echo $output | grep -q '.* has .*address 
>> '"$OCF_RESKEY_monitor_response"
>
> This is what I applied today. Please speak up if there are any
> objections.
>
> Thanks,
>
> Dejan
>
>> Cheers,
>>
>> Dejan
>>
>> P.S. And many thanks for taking a closer look!
>>
>> > > Thanks,
>> > >
>> > > Dejan
>> > >
>> > > >  On Jan 9, 2012 8:00 AM, "Dejan Muhamedagic"  wrote:
>> > > >
>> > > > > Hi Junko-san,
>> > > > >
>> > > > > On Tue, Dec 13, 2011 at 04:32:07PM +0900, Junko IKEDA wrote:
>> > > > > > Hi Serge,
>> > > > > >
>> > > > > > We are now investigating the support status of ocf RAs,
>> > > > > > and this is the issue for named.
>> > > > > >
>> > > > > > Here is the example output of host command;
>> > > > > >
>> > > > > > # host www.kame.net
>> > > > > > www.kame.net is an alias for orange.kame.net.
>> > > > > > orange.kame.net has address 203.178.141.194
>> > > > > > orange.kame.net has IPv6 address 
>> > > > > > 2001:200:dff:fff1:216:3eff:feb1:44d7
>> > > > > >
>> > > > > > named_monitor() searches its named server with
>> > > > > $OCF_RESKEY_monitor_response.
>> > > > > > I'm not familiar with named's behavior,
>> > > > > > is it possible to set IPv6 to $OCF_RESKEY_monitor_response?
>> > > > > > If $OCF_RESKEY_monitor_response has IPv6 address,
>> > > > > > the following syntax can not hit the result, right?
>> > > > >
>> > > > > The patch looks OK to me. Serge, can you also ack please?
>> > > > >
>> > > > > Cheers,
>> > > > >
>> > > > > Dejan
>> > > > >
>> > > > > > named_monitor()
>> > > > > >
>> > > > > > output=`$OCF_RESKEY_host $OCF_RESKEY_monitor_request
>> > > > > $OCF_RE

[Linux-ha-dev] [PATCH] named RA: support IPv6

2011-12-12 Thread Junko IKEDA
Hi Serge,

We are now investigating the support status of ocf RAs,
and this is the issue for named.

Here is the example output of host command;

# host www.kame.net
www.kame.net is an alias for orange.kame.net.
orange.kame.net has address 203.178.141.194
orange.kame.net has IPv6 address 2001:200:dff:fff1:216:3eff:feb1:44d7

named_monitor() searches its named server with $OCF_RESKEY_monitor_response.
I'm not familiar with named's behavior,
is it possible to set IPv6 to $OCF_RESKEY_monitor_response?
If $OCF_RESKEY_monitor_response has IPv6 address,
the following syntax can not hit the result, right?

named_monitor()

output=`$OCF_RESKEY_host $OCF_RESKEY_monitor_request $OCF_RESKEY_monitor_ip`
if [ $? -ne 0 ] || ! echo $output | grep -q '.* has address
'"$OCF_RESKEY_monitor_response"
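The relaxed pattern adopted in the follow-up discussion, '.* has .*address ', matches both the IPv4 and the IPv6 lines of the host output. Roughly, in Python terms (a sketch; the RA itself uses grep, and unlike grep this sketch escapes the expected response literally):

```python
import re

def matches(output, expected_response):
    """Mimic the relaxed grep: ".* has .*address " covers both
    "has address" (IPv4) and "has IPv6 address" lines."""
    pattern = r".* has .*address " + re.escape(expected_response)
    return re.search(pattern, output) is not None
```

So both `orange.kame.net has address 203.178.141.194` and the `has IPv6 address` line match their respective expected responses.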

Would you please give me some advice?

Regards,
Junko IKEDA

NTT DATA INTELLILINK CORPORATION


named_ipv6.patch
Description: Binary data


Re: [Linux-ha-dev] [PATCH] add the new parameter for replication network in mysql RA

2011-11-21 Thread Junko IKEDA
Hi Raoul,

Thank you for your comments!

> this method should leave the slave be if the master did not change
> since the last sync. consider:
>   crm node standby node02; crm node online node02
>
> the slave should pick up where it left using mysql's own way of saving
> the last replication information to master.info.
>
> so this must somehow be adapted too right?

Yes, I got the same situation. :(

> here the ra tries to restore the master information from the cib.
> (this information is put there via unset_master(), see below )
>
> i think this part has to be moved up so that the outlined issues can be
> handled.
>
> moreover, setting the variable master_host might influence how the
> script works outside of the method (master_host is a global variable,
> if i'm not mistaken)

Yes, due to my carelessness, master_host is a global variable, so we
should change its name.

> consider this code:
>> unset_master(){
> in our case, this would save node01-mysqlrepl into the cib, right?

I see, I didn't consider information about CIB.

> this is the first shot at trying out the proposed patches.
> maybe it is sufficient to
>
> a) restructure set_master() a little bit and
> b) to modify get_slave_info() to strip the replication suffix from
>   the running config?

You are right.
I will revise the patch.

Thanks,
Junko


Re: [Linux-ha-dev] [PATCH] prevent Slave promotion in mysql RA

2011-11-14 Thread Junko IKEDA
Hi Marek, Florian,

Thank you for your comments!

>> Did you set "evict_outdated_slaves"?

No,

> If set to false (the default), then the slave will be allowed to stay in
> the cluster, but its master preference will be pushed down so it's not
> promoted, and this seems to be Ikeda-san's preferred behavior. The
> caveat which I mentioned in my other email in this thread applies here,
> though.

Yes, this is what I expected.
I want to re-start the replication after re-connecting the network.

By the way, I have changed that attribute score from "0" to
"-INFINITY" with Pacemaker 1.0,
but there are still some problems... :(
I will gather the logs and post them here soon.

Thanks,
Junko


Re: [Linux-ha-dev] [PATCH] add the new parameter for replication network in mysql RA

2011-11-14 Thread Junko IKEDA
Hi Raoul,

Sure, thanks!

Regards,
Junko

2011/11/14 Raoul Bhatia [IPAX] :
> hello junko-san!
>
> i propose the following documentation update to clarify the parameter's
> usage.
>
>> 
>> 
>> A hostname suffix that will be added when setting the MySQL replication
>> master. This enables the use of a separate replication subnet/link.
>>
>> For example, suppose the following configuration:
>>
>> # uname -n
>> node01
>>
>> # cat /etc/hosts
>> 192.168.100.101 node01
>> 192.168.100.102 node02
>>
>> 192.168.200.101 node01-mysqlrep
>> 192.168.200.102 node02-mysqlrep
>>
>> Normally, the replication will be done via the subnet 192.168.100.x.
>>
>> Setting "replication_hostname_suffix=-mysqlrep" will move all
>> replication related traffic to the subnet 192.168.200.x.
>> 
>> MySQL replication master hostname suffix
>> 
>> 
>
> would you agree with that?
>
> thanks,
> raoul
> --
> 
> DI (FH) Raoul Bhatia M.Sc.          email.          r.bha...@ipax.at
> Technischer Leiter
>
> IPAX - Aloy Bhatia Hava OG          web.          http://www.ipax.at
> Barawitzkagasse 10/2/2/11           email.            off...@ipax.at
> 1190 Wien                           tel.               +43 1 3670030
> FN 277995t HG Wien                  fax.            +43 1 3670030 15
> 
>


Re: [Linux-ha-dev] [PATCH] add the new parameter for replication network in mysql RA

2011-11-14 Thread Junko IKEDA
Hi,

Sorry, again.
My previous patch was wrong,
so I have attached a new one.

Thanks,
Junko

2011/11/11 Junko IKEDA :
> Hi,
>
> The current mysql RA sets the hostname (= uname -n) for its replication network,
> but I have the following restriction.
>
> # uname -n
> node01
>
> # cat /etc/hosts
> 192.168.100.101 node01 # maintenance LAN for node01
> 192.168.100.102 node02 # maintenance LAN for node02
>
> 192.168.200.101 node01-mysqlrep # replication LAN for node01
> 192.168.200.102 node02-mysqlrep # replication LAN for node02
>
> This means the RA will use the 192.168.100.x network for replication,
> but I want it to use 192.168.200.x,
> so I tried to add a new parameter to the mysql RA.
> Please see the attached.
>
> Or is there a good way to work this out without the above patch?
>
> Regards,
> Junko IKEDA
>
> NTT DATA INTELLILINK CORPORATION
>


mysql-replication_hostname_suffix.patch
Description: Binary data


[Linux-ha-dev] [PATCH] add the new parameter for replication network in mysql RA

2011-11-11 Thread Junko IKEDA
Hi,

The current mysql RA sets the hostname (= uname -n) for its replication network,
but I have the following restriction.

# uname -n
node01

# cat /etc/hosts
192.168.100.101 node01 # maintenance LAN for node01
192.168.100.102 node02 # maintenance LAN for node02

192.168.200.101 node01-mysqlrep # replication LAN for node01
192.168.200.102 node02-mysqlrep # replication LAN for node02

This means the RA will use the 192.168.100.x network for replication,
but I want it to use 192.168.200.x,
so I tried to add a new parameter to the mysql RA.
Please see the attached.
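The idea of the new parameter, a suffix appended to `uname -n` so replication traffic resolves via the dedicated subnet, can be sketched as (hypothetical; the parameter name follows the attached patch's filename, mysql-replication_hostname_suffix.patch):

```python
def replication_host(uname_n, replication_hostname_suffix=""):
    """Build the hostname used for replication by appending the
    suffix, so it resolves via the dedicated replication subnet
    (e.g. node01 -> node01-mysqlrep -> 192.168.200.101)."""
    return uname_n + replication_hostname_suffix
```

With `replication_hostname_suffix=-mysqlrep`, replication-related traffic moves from the 192.168.100.x subnet to 192.168.200.x.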

Or is there a good way to work this out without the above patch?

Regards,
Junko IKEDA

NTT DATA INTELLILINK CORPORATION


mysql-replication_hostname_suffix.patch
Description: Binary data


Re: [Linux-ha-dev] [PATCH] change the monitor log level of mysql RA

2011-11-11 Thread Junko IKEDA
Hi,

Sorry, I missed the following pull request.
https://github.com/raoulbhatia/resource-agents/commit/8a87f1b6d410d63e04b5ed29ea20d1abe61bc59c

My post was a duplicate.
Other pull requests are also pending; please check them :)
https://github.com/ClusterLabs/resource-agents/pull/28

Thanks,
Junko

2011/11/11 Junko IKEDA :
> Hi,
>
> I'm now trying to run MySQL replication setting with Pacemaker.
> When I set "op monitor OCF_CHECK_LEVEL=10",
> mysql RA writes down the following log messages every monitor interval.
>
> * Master
> Nov 11 14:19:51 dl380g5c mysql[18766]: INFO: COUNT(*) 3
> Nov 11 14:19:51 dl380g5c mysql[18766]: INFO: MySQL monitor succeeded (master)
> Nov 11 14:20:01 dl380g5c mysql[18804]: INFO: COUNT(*) 3
> Nov 11 14:20:01 dl380g5c mysql[18804]: INFO: MySQL monitor succeeded (master)
> Nov 11 14:20:11 dl380g5c mysql[18842]: INFO: COUNT(*) 3
> Nov 11 14:20:11 dl380g5c mysql[18842]: INFO: MySQL monitor succeeded (master)
> Nov 11 14:20:21 dl380g5c mysql[18880]: INFO: COUNT(*) 3
>
> * Slave
> Nov 11 14:21:55 dl380g5d mysql[28357]: INFO: MySQL instance running as
> a replication slave
> Nov 11 14:21:55 dl380g5d mysql[28357]: INFO: COUNT(*) 3
> Nov 11 14:21:55 dl380g5d mysql[28357]: INFO: MySQL monitor succeeded
>
> It's just a little noisy.
> I think there is no problem if we change the log level of these messages
> from "info" to "debug".
> Please see attached.
>
> Regards,
> Junko IKEDA
>
> NTT DATA INTELLILINK CORPORATION
>


[Linux-ha-dev] [PATCH] prevent Slave promotion in mysql RA

2011-11-11 Thread Junko IKEDA
Hi,

I am running a MySQL replication setup in a 2-node Master/Slave configuration.
If the Slave's status (secs_behind) is larger than the Master's parameter
(max_slave_lag), the Slave's data is outdated, right?
check_slave() in the mysql RA runs "crm_master -v 0" in this
situation to mark the Slave as "outdated",
but if the Master is shut down in this state,
the Slave can still be promoted despite its old data.
(Is this correct?)
It seems that "crm_master -v -INFINITY" is effective in preventing Slave promotion.

check_slave() {
    # Checks slave status



    elif ocf_is_ms; then
        # Even if we're not set to evict lagging slaves, we can
        # still use the seconds behind master value to set our
        # master preference.
        local master_pref
        master_pref=$((${OCF_RESKEY_max_slave_lag}-${secs_behind}))
        if [ $master_pref -lt 0 ]; then
            # Sanitize a below-zero preference to just zero
            master_pref=0
        fi
        $CRM_MASTER -v $master_pref
    fi
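For illustration, here is a rough, hypothetical sketch of the proposal (not the RA's actual code; the CRM_MASTER definition is an assumption): once the lag exceeds max_slave_lag, set the master preference to -INFINITY instead of merely lowering it to 0.

```shell
# Hypothetical helper, not the mysql RA itself: pick the crm_master
# arguments for a slave based on its replication lag.
CRM_MASTER="crm_master -l reboot"

master_pref_cmd() {
    secs_behind=$1
    max_slave_lag=$2
    if [ "$secs_behind" -gt "$max_slave_lag" ]; then
        # Proposed change: -INFINITY forbids promotion outright,
        # whereas "-v 0" only makes this node the least preferred.
        echo "$CRM_MASTER -v -INFINITY"
    else
        echo "$CRM_MASTER -v $((max_slave_lag - secs_behind))"
    fi
}
```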


I'm not very familiar with the replication behavior,
so please advise me on how to handle this.

Regards,
Junko IKEDA

NTT DATA INTELLILINK CORPORATION


mysql-preference.patch
Description: Binary data


[Linux-ha-dev] [PATCH] change the monitor log level of mysql RA

2011-11-11 Thread Junko IKEDA
Hi,

I'm now trying to run a MySQL replication setup with Pacemaker.
When I set "op monitor OCF_CHECK_LEVEL=10",
the mysql RA writes the following log messages at every monitor interval.

* Master
Nov 11 14:19:51 dl380g5c mysql[18766]: INFO: COUNT(*) 3
Nov 11 14:19:51 dl380g5c mysql[18766]: INFO: MySQL monitor succeeded (master)
Nov 11 14:20:01 dl380g5c mysql[18804]: INFO: COUNT(*) 3
Nov 11 14:20:01 dl380g5c mysql[18804]: INFO: MySQL monitor succeeded (master)
Nov 11 14:20:11 dl380g5c mysql[18842]: INFO: COUNT(*) 3
Nov 11 14:20:11 dl380g5c mysql[18842]: INFO: MySQL monitor succeeded (master)
Nov 11 14:20:21 dl380g5c mysql[18880]: INFO: COUNT(*) 3

* Slave
Nov 11 14:21:55 dl380g5d mysql[28357]: INFO: MySQL instance running as
a replication slave
Nov 11 14:21:55 dl380g5d mysql[28357]: INFO: COUNT(*) 3
Nov 11 14:21:55 dl380g5d mysql[28357]: INFO: MySQL monitor succeeded

It's just a little noisy.
I think there is no problem if we change the log level of these messages from "info" to "debug".
Please see attached.
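The change itself is tiny; as an illustrative sketch only (ocf_log is stubbed here, since this is not the RA verbatim):

```shell
# Stub of ocf_log for illustration; the real one comes from ocf-shellfuncs.
ocf_log() { level=$1; shift; echo "$level: $*"; }

log_monitor_result() {
    rows=$1
    # Before the patch these two calls used "info":
    ocf_log debug "COUNT(*) $rows"
    ocf_log debug "MySQL monitor succeeded"
}
```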

Regards,
Junko IKEDA

NTT DATA INTELLILINK CORPORATION


mysql-log.patch
Description: Binary data


Re: [Linux-ha-dev] [PATCH] specify the full path for ipmi command

2011-09-26 Thread Junko IKEDA
Hi Dejan,

Many thanks!
Can I get it from http://hg.linux-ha.org/glue/ ?

Regards,
Junko

2011/9/22 Dejan Muhamedagic :
> Hi Junko-san,
>
> On Wed, Aug 17, 2011 at 10:22:40AM +0900, Junko IKEDA wrote:
>> Hi Dejan,
>>
>> Thank you for your reply!
>> I attached the revised patch.
>
> Just applied your patch. Sorry for the delay, this post got
> somehow lost.
>
> Cheers,
>
> Dejan
>
>> >> http://www.gossamer-threads.com/lists/linuxha/pacemaker/74350
>> >
>> > I don't see the connection between the two.
>>
>> I am trying to use "/tmp/ipmitool" command for some tests,
>> and add its path for root.
>> so $PATH for root is here;
>>
>> # echo $PATH
>> /tmp:/usr/kerberos/sbin:/usr/kerberos/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin
>>
>> # which ipmitool
>> /tmp/ipmitool
>>
>> but external/ipmi could not search "/tmp/ipmitool".
>> It seems that external/ipmi set $PATH as
>> "/usr/share/cluster-glue:/sbin:/usr/sbin:/bin:/usr/bin".
>>
>> I tought that it might be simple to specify its full path at crm configure,
>> so I made the above patch.
>>
>> Thanks,
>> Junko
>
>
>> ___
>> Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
>> http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
>> Home Page: http://linux-ha.org/
>
> ___
> Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
> http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
> Home Page: http://linux-ha.org/
>


Re: [Linux-ha-dev] [PATCH] specify the full path for ipmi command

2011-08-16 Thread Junko IKEDA
Hi Dejan,

Thank you for your reply!
I attached the revised patch.

>> http://www.gossamer-threads.com/lists/linuxha/pacemaker/74350
>
> I don't see the connection between the two.

I am trying to use the "/tmp/ipmitool" command for some tests,
and added its path for root,
so root's $PATH is:

# echo $PATH
/tmp:/usr/kerberos/sbin:/usr/kerberos/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin

# which ipmitool
/tmp/ipmitool

but external/ipmi could not find "/tmp/ipmitool".
It seems that external/ipmi sets $PATH to
"/usr/share/cluster-glue:/sbin:/usr/sbin:/bin:/usr/bin".

I thought it might be simpler to specify the full path in crm configure,
so I made the above patch.
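One way to express the idea of the patch (sketch only; the `ipmitool` variable name is an assumption, not necessarily what the plugin parameter is called):

```shell
# Resolve the ipmitool binary: prefer an explicitly configured full
# path, otherwise fall back to whatever PATH lookup finds.
ipmitool_cmd() {
    if [ -n "$ipmitool" ] && [ -x "$ipmitool" ]; then
        echo "$ipmitool"
    else
        command -v ipmitool || echo ipmitool
    fi
}
```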

Thanks,
Junko


ipmi.patch
Description: Binary data


[Linux-ha-dev] [PATCH] specify the full path for ipmi command

2011-08-15 Thread Junko IKEDA
Hi,

I attached a small patch for external/ipmi to specify the full path of
"ipmitool".
There might be a much better way, for example, referring to the user's $PATH,
but I encountered the following issue:
http://www.gossamer-threads.com/lists/linuxha/pacemaker/74350

Would you please give me some advice?

Best Regards,
Junko IKEDA

NTT DATA INTELLILINK CORPORATION


ipmi.patch
Description: Binary data


Re: [Linux-ha-dev] [PATCH]add sfex_init man to .spec

2011-06-22 Thread Junko IKEDA
Hi,

Fabio has posted this man page.
The sfex_init command has only a few options,
so it is enough for now.

Thanks,
Junko

2011/6/22 Dejan Muhamedagic :
> Hi again,
>
> On Wed, Jun 22, 2011 at 02:10:21PM +0900, Junko IKEDA wrote:
>> Hi,
>>
>> The latest resource-agent has man page for sfex_init,
>
> The man page seems to be somewhat short. Did something go wrong
> with it?
>
> Cheers,
>
> Dejan
>
>> and I add it to .spec.
>> Please see the attached patch.
>>
>> Best Regard,
>> Junko IKEDA
>>
>> NTT DATA INTELLILINK CORPORATION
>
>
>> ___
>> Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
>> http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
>> Home Page: http://linux-ha.org/
>
> ___
> Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
> http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
> Home Page: http://linux-ha.org/
>


Re: [Linux-ha-dev] [PATCH]modify description for ethmonitor RA

2011-06-22 Thread Junko IKEDA
Hi Dejan,

Thank you for your quick reply!

I am trying the latest one (commit 20b2773d013d600486c0)
and encountered another error, like this:

# git clone http://github.com/ClusterLabs/resource-agents/
# cd resource-agents/

# ./autogen.sh
# ./configure --prefix=/usr --localstatedir=/var --sysconfdir=/etc
--with-ras-set=linux-ha
# make rpm

rpmbuild --define "_sourcedir
/root/Desktop/work/20110622/resource-agents" --define "_specdir
/root/Desktop/work/20110622/resource-agents" --define "_builddir
/root/Desktop/work/20110622/resource-agents" --define "_srcrpmdir
/root/Desktop/work/20110622/resource-agents" --define "_rpmdir
/root/Desktop/work/20110622/resource-agents" -ba resource-agents.spec
error: File 
/root/Desktop/work/20110622/resource-agents/resource-agents-3.1.9.9-20b27.tar.bz2:
No such file or directory
make: *** [rpm] Error 1


# grep -R "3.1.9" .
Binary file 
./.git/objects/pack/pack-d1760dd14f85ad98b7060f9a0fb09b8d5defda42.pack
matches
./.git/packed-refs:400de371394ded696629bb3d0380d0c0fb0ef0aa
refs/tags/agents-1.0.4-rc
./resource-agents.spec:Version: 3.1.9
./resource-agents.spec:* Wed Jun 22 2011 Autotools generated version
 - 3.1.9-1-9.20b27.


Where does 3.1.9 come from?
The %version% macro in the .spec should get converted to 3.9.1.
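As a minimal sketch (file layout assumed), one could extract the Version tag from the generated spec the same way rpmbuild would, to compare it against the tarball name:

```shell
# Extract the first Version: tag from an RPM spec file.
spec_version() {
    sed -n 's/^Version:[[:space:]]*//p' "$1" | head -n1
}
```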

Thanks,
Junko

2011/6/22 Dejan Muhamedagic :
> Hi Junko-san,
>
> On Wed, Jun 22, 2011 at 02:12:53PM +0900, Junko IKEDA wrote:
>> Hi,
>>
>> I posted this into the General List,
>> I'm sorry to bother you.
>>
>>  I got the latest resource-agents 3.9.1(commit
>>  b8c0487f68978a1857e0eb1a137f61880cb435e6),
>>  and found that ethmonitor RA had something sensitive description.
>
> Yes, entirely my fault.
>
>>  Please see the attached patch.
>
> Many thanks!
>
> Dejan
>
>>
>>  # git clone http://github.com/ClusterLabs/resource-agents/
>>  # cd resource-agents/
>>  # ./autogen.sh
>>  # ./configure --prefix=/usr --localstatedir=/var --sysconfdir=/etc
>>  --with-ras-set=linux-ha
>>  # make
>>
>>  /usr/bin/xsltproc --novalid \
>>         --stringparam package resource-agents \
>>         --stringparam version 3.9.1.7-b8c0 \
>>         --output ocf_heartbeat_ethmonitor.xml \
>>         ./ra2refentry.xsl metadata-ethmonitor.xml
>>  metadata-ethmonitor.xml:41: parser error : Opening and ending tag
>>  mismatch: interface_name line 40 and longdesc
>>  
>>            ^
>>  metadata-ethmonitor.xml:44: parser error : Opening and ending tag
>>  mismatch: longdesc line 39 and parameter
>>  
>>             ^
>>  metadata-ethmonitor.xml:104: parser error : expected '>'
>>  
>>            ^
>>  metadata-ethmonitor.xml:113: parser error : Opening and ending tag
>>  mismatch: parameters line 29 and resource-agent
>>  
>>                  ^
>>  metadata-ethmonitor.xml:114: parser error : Premature end of data in
>>  tag resource-agent line 3
>>
>>  ^
>>  unable to parse metadata-ethmonitor.xml
>>  gmake[2]: *** [ocf_heartbeat_ethmonitor.xml] Error 6
>>  rm metadata-LVM.xml metadata-Pure-FTPd.xml metadata-SAPDatabase.xml
>>  metadata-ManageRAID.xml metadata-CTDB.xml metadata-Xen.xml
>>  metadata-SAPInstance.xml metadata-Xinetd.xml metadata-Route.xml
>>  metadata-ICP.xml metadata-MailTo.xml metadata-SysInfo.xml
>>  metadata-Squid.xml metadata-IPaddr.xml metadata-Delay.xml
>>  metadata-SendArp.xml metadata-VirtualDomain.xml metadata-db2.xml
>>  metadata-AoEtarget.xml metadata-Stateful.xml metadata-ethmonitor.xml
>>  metadata-Dummy.xml metadata-ServeRAID.xml metadata-Evmsd.xml
>>  metadata-eDir88.xml metadata-VIPArip.xml metadata-IPsrcaddr.xml
>>  metadata-WinPopup.xml metadata-ClusterMon.xml metadata-Filesystem.xml
>>  metadata-SphinxSearchDaemon.xml metadata-WAS6.xml metadata-apache.xml
>>  metadata-AudibleAlarm.xml metadata-conntrackd.xml
>>  metadata-LinuxSCSI.xml metadata-EvmsSCC.xml metadata-IPaddr2.xml
>>  metadata-Raid1.xml metadata-WAS.xml metadata-drbd.xml
>>  metadata-ManageVE.xml metadata-anything.xml
>>  gmake[2]: Leaving directory 
>> `/root/Desktop/work/20110622/resource-agents/doc'
>>  gmake[1]: *** [all-recursive] Error 1
>>  gmake[1]: Leaving directory `/root/Desktop/work/20110622/resource-agents'
>>  make: *** [all] Error 2
>>
>>
>>  Best Regards,
>>  Junko IKEDA
>>
>>  NTT DATA INTELLILINK CORPORATION
>
>
>> ___
>> Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
>> http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
>> Home Page: http://linux-ha.org/
>
> ___
> Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
> http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
> Home Page: http://linux-ha.org/
>


[Linux-ha-dev] [PATCH]modify description for ethmonitor RA

2011-06-21 Thread Junko IKEDA
Hi,

I posted this to the General List;
I'm sorry to bother you.

 I got the latest resource-agents 3.9.1 (commit
 b8c0487f68978a1857e0eb1a137f61880cb435e6),
 and found that the ethmonitor RA had a malformed description in its
 metadata; XML parsing fails as shown below.
 Please see the attached patch.

 # git clone http://github.com/ClusterLabs/resource-agents/
 # cd resource-agents/
 # ./autogen.sh
 # ./configure --prefix=/usr --localstatedir=/var --sysconfdir=/etc
 --with-ras-set=linux-ha
 # make

 /usr/bin/xsltproc --novalid \
--stringparam package resource-agents \
--stringparam version 3.9.1.7-b8c0 \
--output ocf_heartbeat_ethmonitor.xml \
./ra2refentry.xsl metadata-ethmonitor.xml
 metadata-ethmonitor.xml:41: parser error : Opening and ending tag
 mismatch: interface_name line 40 and longdesc
 
   ^
 metadata-ethmonitor.xml:44: parser error : Opening and ending tag
 mismatch: longdesc line 39 and parameter
 
^
 metadata-ethmonitor.xml:104: parser error : expected '>'
 
   ^
 metadata-ethmonitor.xml:113: parser error : Opening and ending tag
 mismatch: parameters line 29 and resource-agent
 
 ^
 metadata-ethmonitor.xml:114: parser error : Premature end of data in
 tag resource-agent line 3

 ^
 unable to parse metadata-ethmonitor.xml
 gmake[2]: *** [ocf_heartbeat_ethmonitor.xml] Error 6
 rm metadata-LVM.xml metadata-Pure-FTPd.xml metadata-SAPDatabase.xml
 metadata-ManageRAID.xml metadata-CTDB.xml metadata-Xen.xml
 metadata-SAPInstance.xml metadata-Xinetd.xml metadata-Route.xml
 metadata-ICP.xml metadata-MailTo.xml metadata-SysInfo.xml
 metadata-Squid.xml metadata-IPaddr.xml metadata-Delay.xml
 metadata-SendArp.xml metadata-VirtualDomain.xml metadata-db2.xml
 metadata-AoEtarget.xml metadata-Stateful.xml metadata-ethmonitor.xml
 metadata-Dummy.xml metadata-ServeRAID.xml metadata-Evmsd.xml
 metadata-eDir88.xml metadata-VIPArip.xml metadata-IPsrcaddr.xml
 metadata-WinPopup.xml metadata-ClusterMon.xml metadata-Filesystem.xml
 metadata-SphinxSearchDaemon.xml metadata-WAS6.xml metadata-apache.xml
 metadata-AudibleAlarm.xml metadata-conntrackd.xml
 metadata-LinuxSCSI.xml metadata-EvmsSCC.xml metadata-IPaddr2.xml
 metadata-Raid1.xml metadata-WAS.xml metadata-drbd.xml
 metadata-ManageVE.xml metadata-anything.xml
 gmake[2]: Leaving directory `/root/Desktop/work/20110622/resource-agents/doc'
 gmake[1]: *** [all-recursive] Error 1
 gmake[1]: Leaving directory `/root/Desktop/work/20110622/resource-agents'
 make: *** [all] Error 2


 Best Regards,
 Junko IKEDA

 NTT DATA INTELLILINK CORPORATION


ethmonitor.patch
Description: Binary data


[Linux-ha-dev] [PATCH]add sfex_init man to .spec

2011-06-21 Thread Junko IKEDA
Hi,

The latest resource-agents has a man page for sfex_init,
and I added it to the .spec.
Please see the attached patch.

Best Regard,
Junko IKEDA

NTT DATA INTELLILINK CORPORATION


sfex_init.patch
Description: Binary data


Re: [Linux-ha-dev] Translate crm_cli.txt to Japanese

2011-04-27 Thread Junko IKEDA
Hi,

> May I suggest that you go with the devel version, because
> crm_cli.txt was converted to crm.8.txt. There are not many
> textual changes, just some obsolete parts removed.

OK, I got "crm.8.txt" from devel.

The directory structures for Pacemaker 1.0, 1.1, and devel are each just a bit
different.
Does 1.0 keep its doc directory structure for now?
If so, it seems that just creating the HTML file is not so difficult when
asciidoc is available.

Thanks,
Junko


crm_cli.patch
Description: Binary data


[Linux-ha-dev] execute permission for exportfs RA

2010-04-22 Thread Junko IKEDA
Hi,

I tried to compile the latest agents package from the Mercurial repository,
but the new exportfs RA complained with something like this:

# hg clone http://hg.linux-ha.org/agents/
# cd agents
# ./autogen.sh
# ./configure --localstatedir=/var --disable-fatal-warnings
# make

/bin/sh: ../heartbeat/exportfs: Permission denied

Should the exportfs RA have execute permission?
Or is there a compile option to avoid this error?
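If it is a permissions problem, a quick check of the working tree would confirm it (sketch; the likely fix is simply restoring the executable bit on heartbeat/exportfs):

```shell
# Report whether a file carries the executable bit.
is_executable() {
    if [ -x "$1" ]; then echo yes; else echo no; fi
}
```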

Thanks,
Junko


[Linux-ha-dev] Fwd: Re: [PATCH] recovering from the online backup failure

2009-11-16 Thread Junko IKEDA

Hi,

I have done some tests with this patch,
and got the desired results.
I don't think this patch affects the current usage.

Serge,
Thank you for your review!

Thanks,
Junko


--- Forwarded message ---
From: "Serge Dubrouski" 
To: "Junko IKEDA" 
Cc:
Subject: Re: [PATCH] recovering from the online backup failure
Date: Mon, 16 Nov 2009 09:59:38 +0900

Dejan -

Please apply this patch so it won't be lost.

Thanks.

On Wed, Nov 11, 2009 at 6:44 AM, Serge Dubrouski 
wrote:

Hello, Junko -

The patch is absolutely all right. Thanks for it.

Serge.

2009/11/10 Junko IKEDA :

Hi,

This issue is related to this thread:
http://archives.postgresql.org/pgsql-hackers/2009-11/msg00024.php

PostgreSQL might become able to remove its lock file automatically in the
future, but for now the cluster software should take care of it.
It seems that the oracle RA handles this situation by running the ora_cleanup
function, which removes the lock file.

Serge,
What do you think of this patch?

Thanks,
Junko

On Tue, 10 Nov 2009 17:02:03 +0900, Junko IKEDA  


wrote:


Hi,

If a failure happens during the online backup of PostgreSQL,
pgsql cannot handle the failover,
because "backup_label", a file used by the Postgres backup process,
remains on the shared disk.
pgsql cannot start the DB if this file remains.
Please see the attached.

Thanks,
Junko







--
Serge Dubrouski.



pgsql.patch
Description: Binary data


Re: [Linux-ha-dev] [PATCH] recovering from the online backup failure

2009-11-10 Thread Junko IKEDA
Hi,

This issue is related to this thread:
http://archives.postgresql.org/pgsql-hackers/2009-11/msg00024.php

PostgreSQL might become able to remove its lock file automatically in the
future, but for now the cluster software should take care of it.
It seems that the oracle RA handles this situation by running the ora_cleanup
function, which removes the lock file.

Serge,
What do you think of this patch?

Thanks,
Junko

On Tue, 10 Nov 2009 17:02:03 +0900, Junko IKEDA   
wrote:

> Hi,
>
> If some failures happen during the online backup of PostgreSQL,
> pgsql can not handle the fail over,
> because "backup_label", this is a file for a backup process of Postgres,  
> remains on the shared disk.
> pgsql can not start DB if this file remains.
> Please see the attached.
>
> Thanks,
> Junko



Re: [Linux-ha-dev] route del in IPaddr RA

2009-11-10 Thread Junko IKEDA

Hi Dejan,


Won't this part be too verbose?


At first I thought so too,
but the ping command doesn't log any message to ha-log when it succeeds.


@@ -717,7 +719,7 @@ ip_monitor() {

     PINGARGS="`pingargs $OCF_RESKEY_ip`"
     for j in 1 2 3 4 5 6 7 8 9 10; do
-        if $PING $PINGARGS >/dev/null 2>&1 ; then
+        if $PING $PINGARGS ; then


By the way, this is an alternative.

@@ -717,11 +719,13 @@ ip_monitor() {

     PINGARGS="`pingargs $OCF_RESKEY_ip`"
     for j in 1 2 3 4 5 6 7 8 9 10; do
-        if $PING $PINGARGS >/dev/null 2>&1 ; then
-            return $OCF_SUCCESS
-        fi
+        MSG=`$PING $PINGARGS 2>&1`
+        if [ $? = 0 ]; then
+            return $OCF_SUCCESS
+        fi
     done
-
+
+    ocf_log err "$MSG"
     return $OCF_ERR_GENERIC
 }
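For clarity, the alternative's control flow as a self-contained sketch (ocf_log is stubbed and the probe command parameterized; this is not the RA verbatim): retry the probe, and log the captured stderr only once, when every attempt has failed.

```shell
# Stub of ocf_log; the real RA gets it from ocf-shellfuncs.
ocf_log() { echo "$1: $2"; }

# Retry a probe a few times; if every attempt fails, log the last
# captured output once and report failure.
probe_with_logging() {
    for j in 1 2 3; do
        MSG=$("$@" 2>&1) && return 0
    done
    ocf_log err "$MSG"
    return 1
}
```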

Thanks,
Junko



On Mon, 09 Nov 2009 18:13:29 +0900, Junko IKEDA
 wrote:

>Hi,
>
>I wonder why IPaddr RA needs to run "route del" before it deletes the
>target interface.
>Does the old version of IPaddr contain "route add"?
>
>If "route del" fails, RA will be able to return $OCF_SUCCESS,
>but I feel a little strange when I see the error message from route
>command like this.
>
>lrmd[2576]: 2009/11/09_17:24:08 info: RA output:
>(prmIpPostgreSQLDB:stop:stderr) SIOCDELRT: No such process
>
>Please see the attached file. (IPaddr-1.patch)
>
>It might be possible to delete that line if LVS configuration
>doesn't also
>need it.
>See IPaddr-2.patch. Is it overkill?
>
>Thanks,
>Junko






IPaddr.patch
Description: Binary data


[Linux-ha-dev] [PATCH] recovering from the online backup failure

2009-11-10 Thread Junko IKEDA

Hi,

If a failure happens during the online backup of PostgreSQL,
pgsql cannot handle the failover,
because "backup_label", a file used by the Postgres backup process,
remains on the shared disk.

pgsql cannot start the DB if this file remains.
Please see the attached.
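A sketch of the kind of cleanup involved (paths and behavior assumed; the actual patch may differ): move the stale backup_label aside so PostgreSQL can start again.

```shell
# Move a stale backup_label aside after a failed online backup
# (sketch; real cleanup may need more safety checks).
cleanup_backup_label() {
    pgdata=$1
    if [ -f "$pgdata/backup_label" ]; then
        mv "$pgdata/backup_label" "$pgdata/backup_label.old"
        echo "stale backup_label moved aside"
    fi
}
```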

Thanks,
Junko


pgsql.patch
Description: Binary data


Re: [Linux-ha-dev] route del in IPaddr RA

2009-11-09 Thread Junko IKEDA

Hi,

By the way, this is a really trivial thing,
but I have some requests about the logging messages of IPaddr.
Please see the modified attachment.

Thanks,
Junko

On Mon, 09 Nov 2009 18:13:29 +0900, Junko IKEDA   
wrote:



Hi,

I wonder why IPaddr RA needs to run "route del" before it deletes the
target interface.
Does the old version of IPaddr contain "route add"?

If "route del" fails, RA will be able to return $OCF_SUCCESS,
but I feel a little strange when I see the error message from route
command like this.

lrmd[2576]: 2009/11/09_17:24:08 info: RA output:
(prmIpPostgreSQLDB:stop:stderr) SIOCDELRT: No such process

Please see the attached file. (IPaddr-1.patch)

It might be possible to delete that line if LVS configuration doesn't  
also

need it.
See IPaddr-2.patch. Is it overkill?

Thanks,
Junko


IPaddr.patch
Description: Binary data


[Linux-ha-dev] route del in IPaddr RA

2009-11-09 Thread Junko IKEDA

Hi,

I wonder why the IPaddr RA needs to run "route del" before it deletes the
target interface.
Does the old version of IPaddr contain a "route add"?

Even if "route del" fails, the RA can still return $OCF_SUCCESS,
but it feels a little strange to see an error message from the route
command like this:

lrmd[2576]: 2009/11/09_17:24:08 info: RA output:
(prmIpPostgreSQLDB:stop:stderr) SIOCDELRT: No such process

Please see the attached file. (IPaddr-1.patch)

It might be possible to delete that line if the LVS configuration doesn't
also need it.
See IPaddr-2.patch. Is it overkill?

Thanks,
Junko

IPaddr-1.patch
Description: Binary data


IPaddr-2.patch
Description: Binary data


RE: [Linux-ha-dev] xm dump-core from xen0

2009-03-18 Thread Junko IKEDA
Hi Dejan, Serge,

Thank you for following up!

Regards,
Junko


> -Original Message-
> From: linux-ha-dev-boun...@lists.linux-ha.org
> [mailto:linux-ha-dev-boun...@lists.linux-ha.org] On Behalf Of Dejan
> Muhamedagic
> Sent: Thursday, March 19, 2009 4:56 AM
> To: High-Availability Linux Development List
> Subject: Re: [Linux-ha-dev] xm dump-core from xen0
> 
> Hi,
> 
> On Mon, Mar 16, 2009 at 07:22:02PM +0900, Junko IKEDA wrote:
> > Hi,
> >
> > I run the new xen0 on domU now,
> > and need an additional feature for a dump destination.
> >
> > I have RHEL5.2 x86_64 and xen 3.1.
> > This would dump the core file into "/var/lib/xen/dump" as default.
> >
> > Ex.)
> > # ls -lh /var/lib/xen/dump/
> > -rw--- 1 root root 1.1G  3月 16 15:21
> 2009-0316-1520.59-dom-d2.11.core
> > -rw--- 1 root root 1.1G  3月 16 15:21
> 2009-0316-1521.42-dom-d2.12.core
> > -rw--- 1 root root 1.1G  3月 16 15:42
> 2009-0316-1542.23-dom-d2.13.core
> > -rw--- 1 root root 1.1G  3月 16 15:43
> 2009-0316-1543.07-dom-d2.14.core
> >
> > It might be helpful if we can specify the dump destination from cib.xml.
> > Ex.)
> > # grep dump_dir cib.xml
> >   
> >   
> >
> > # ls -lh /var/log/dump
> > -rw--- 1 root root 1.1G  3月 16 18:51
2009-0316-1851.02-dom-d2.core
> > -rw--- 1 root root 1.1G  3月 16 18:51
2009-0316-1851.43-dom-d2.core
> >
> > Please see the attached.
> 
> Applied.
> 
> Cheers,
> 
> Dejan
> ___
> Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
> http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
> Home Page: http://linux-ha.org/



RE: [Linux-ha-dev] xm dump-core from xen0

2009-03-16 Thread Junko IKEDA
Hi,

I am running the new xen0 plugin on domU now,
and need an additional feature for the dump destination.

I have RHEL5.2 x86_64 and Xen 3.1.
By default, this dumps the core file into "/var/lib/xen/dump".

Ex.)
# ls -lh /var/lib/xen/dump/
-rw--- 1 root root 1.1G  3月 16 15:21 2009-0316-1520.59-dom-d2.11.core
-rw--- 1 root root 1.1G  3月 16 15:21 2009-0316-1521.42-dom-d2.12.core
-rw--- 1 root root 1.1G  3月 16 15:42 2009-0316-1542.23-dom-d2.13.core
-rw--- 1 root root 1.1G  3月 16 15:43 2009-0316-1543.07-dom-d2.14.core

It might be helpful if we could specify the dump destination from cib.xml.
Ex.)
# grep dump_dir cib.xml
  
  

# ls -lh /var/log/dump
-rw--- 1 root root 1.1G  3月 16 18:51 2009-0316-1851.02-dom-d2.core
-rw--- 1 root root 1.1G  3月 16 18:51 2009-0316-1851.43-dom-d2.core

Please see the attached.
NOTICE: the attached cib.xml is for Heartbeat 2.1.4.
It won't work with Pacemaker.
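As a sketch of the requested behavior (the dump_dir parameter name follows the cib.xml example above; commands are echoed for illustration rather than executed):

```shell
# Dump the domain's core into a configurable directory before
# destroying it; dump_dir falls back to Xen's default dump path.
fence_with_dump() {
    domain=$1
    dump_dir=${dump_dir:-/var/lib/xen/dump}
    echo "xm dump-core $domain $dump_dir/$domain.core"
    echo "xm destroy $domain"
}
```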

Best Regards,
Junko Ikeda

NTT DATA INTELLILINK CORPORATION


xen0.patch
Description: Binary data


RE: [Linux-ha-dev] xm dump-core from xen0

2009-03-06 Thread Junko IKEDA
I ran the attached cib.xml.
It seems that this configuration works well (but I need to do more tests).
If there are any strange elements, please let me know.

Thanks,
Junko
 
 
> Sorry for all of my mistakes...
> I have a wrong /etc/hosts.
> It works well for now.
> 
> By the way, Could I config this plugin on two Dom0 and two DomU?
> 
> ex.) domU-1 on Dom0-1, and domU-2 o Dom0-2
> 
> Thanks,
> Junko
> 
> 
> 
> > Hi,
> >
> > My operation is here;
> >
> > # ssh x3650g
> >
> > # export dom0="x3650g"
> >
> > # export hostlist="dom-d1:/etc/xen/dom-d1 dom-d2:/etc/xen/dom-d2"
> >
> > # /usr/lib64/stonith/plugins/external/xen0 on dom-d1
> >
> > # echo $?
> >
> > 0
> >
> > dom-d1 was created well.
> >
> > # /usr/lib64/stonith/plugins/external/xen0 reset dom-d1
> >
> > # echo $?
> >
> > # 1
> >
> > dom-d1 was destroyed but return code was not zero,
> > so Heartbeat handled that this stonith operation failed.
> >
> > When I edit the plugin like this, it work well.
> >
> > --- xen0.org2009-03-06 17:22:44.0 +0900
> > +++ xen02009-03-06 17:22:52.0 +0900
> > @@ -140,7 +140,7 @@ reset)
> >  exit 0
> >  fi
> >
> > -exit 1
> > +exit 0
> >  ;;
> >  status)
> >  CheckHostList
> >
> >
> > Is this some timing issue?
> >
> > Thanks,
> > Junko
> >
> >
> > ___
> > Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
> > http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
> > Home Page: http://linux-ha.org/
> 
> ___
> Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
> http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
> Home Page: http://linux-ha.org/


RE: [Linux-ha-dev] xm dump-core from xen0

2009-03-06 Thread Junko IKEDA


Sorry for all of my mistakes...
I had a wrong /etc/hosts.
It works well now.

By the way, could I configure this plugin with two Dom0s and two DomUs?

e.g., domU-1 on Dom0-1, and domU-2 on Dom0-2

Thanks,
Junko



> Hi,
> 
> My operation is here;
> 
> # ssh x3650g
> 
> # export dom0="x3650g"
> 
> # export hostlist="dom-d1:/etc/xen/dom-d1 dom-d2:/etc/xen/dom-d2"
> 
> # /usr/lib64/stonith/plugins/external/xen0 on dom-d1
> 
> # echo $?
> 
> 0
> 
> dom-d1 was created well.
> 
> # /usr/lib64/stonith/plugins/external/xen0 reset dom-d1
> 
> # echo $?
> 
> # 1
> 
> dom-d1 was destroyed but return code was not zero,
> so Heartbeat handled that this stonith operation failed.
> 
> When I edit the plugin like this, it work well.
> 
> --- xen0.org2009-03-06 17:22:44.0 +0900
> +++ xen02009-03-06 17:22:52.0 +0900
> @@ -140,7 +140,7 @@ reset)
>  exit 0
>  fi
> 
> -exit 1
> +exit 0
>  ;;
>  status)
>  CheckHostList
> 
> 
> Is this some timing issue?
> 
> Thanks,
> Junko
> 
> 
> ___
> Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
> http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
> Home Page: http://linux-ha.org/



RE: [Linux-ha-dev] xm dump-core from xen0

2009-03-06 Thread Junko IKEDA
Hi,

My procedure is as follows:

# ssh x3650g

# export dom0="x3650g"

# export hostlist="dom-d1:/etc/xen/dom-d1 dom-d2:/etc/xen/dom-d2"
 
# /usr/lib64/stonith/plugins/external/xen0 on dom-d1

# echo $?

0

dom-d1 was created well.

# /usr/lib64/stonith/plugins/external/xen0 reset dom-d1

# echo $?

# 1

dom-d1 was destroyed, but the return code was not zero,
so Heartbeat treated this stonith operation as failed.

When I edit the plugin like this, it works well.

--- xen0.org2009-03-06 17:22:44.0 +0900
+++ xen02009-03-06 17:22:52.0 +0900
@@ -140,7 +140,7 @@ reset)
 exit 0
 fi

-exit 1
+exit 0
 ;;
 status)
 CheckHostList


Is this some timing issue?
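To make the expected semantics concrete, a parameterized sketch (not the plugin's actual code): the reset path should report success as soon as the domain is confirmed gone, and fall through to failure only if it is still present.

```shell
# Sketch of the exit-code question; the xm commands are stand-ins
# passed as parameters for illustration.
reset_domain() {
    destroy_cmd=$1   # stand-in for "xm destroy <domain>"
    check_cmd=$2     # stand-in for "xm list <domain>"
    $destroy_cmd
    if ! $check_cmd >/dev/null 2>&1; then
        return 0     # domain no longer listed: fencing succeeded
    fi
    return 1         # domain still present: report failure
}
```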

Thanks,
Junko




RE: [Linux-ha-dev] xm dump-core from xen0

2009-03-04 Thread Junko IKEDA
> In fact I use it in DomUs only. In my test home cluster I have one
> Dom0 that runs several DomUs that act like cluster nodes and run xen0
> to fence each other when needed.

Really?
I got it mixed up...
but my plan is to run xen0 on some DomUs, so that would be great!

Thanks,
Junko

> 
> On Wed, Mar 4, 2009 at 6:45 PM, Junko IKEDA 
wrote:
> > Hi,
> >
> >> Attached is a patch that adds that functionality.
> >
> > Many thanks!
> > I'll give it a try.
> >
> > By the way, xen0 plugin should run on domain-0, right?
> > Is it possible to run it on domain-U?
> >
> > Thanks,
> > Junko
> >
> >> On Tue, Mar 3, 2009 at 11:24 PM, Serge Dubrouski 
> > wrote:
> >> > That shouldn't be a big deal. I can add one more config parameter
like
> >> > "run_dump", then if it's set the script will call xm dump-core before
> >> > destroying xunU.
> >> >
> >> > On Tue, Mar 3, 2009 at 10:38 PM, Junko IKEDA

> > wrote:
> >> >> Hi Serge,
> >> >>
> >> >> I'm trying to manage xen domain-U with xen0 plugin.
> >> >> There are two "xm" command, like "xm destroy" and "xm create" in
xen0,
> >> >> How do you think to add "xm dump-core" into it?
> >> >> If possible, I want to get the dump of domain-U when some fence
events
> >> >> happen.
> >> >>
> >> >> Best Regards,
> >> >> Junko Ikeda
> >> >>
> >> >>
> >> >> ___
> >> >> Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
> >> >> http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
> >> >> Home Page: http://linux-ha.org/
> >> >>
> >> >
> >> >
> >> >
> >> > --
> >> > Serge Dubrouski.
> >> >
> >>
> >>
> >>
> >> --
> >> Serge Dubrouski.
> >
> > ___
> > Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
> > http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
> > Home Page: http://linux-ha.org/
> >
> 
> 
> 
> --
> Serge Dubrouski.
> ___
> Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
> http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
> Home Page: http://linux-ha.org/



RE: [Linux-ha-dev] xm dump-core from xen0

2009-03-04 Thread Junko IKEDA
Hi,

> Attached is a patch that adds that functionality.

Many thanks!
I'll give it a try.

By the way, the xen0 plugin should run on domain-0, right?
Is it possible to run it on domain-U?

Thanks,
Junko
 
> On Tue, Mar 3, 2009 at 11:24 PM, Serge Dubrouski 
wrote:
> > That shouldn't be a big deal. I can add one more config parameter like
> > "run_dump", then if it's set the script will call xm dump-core before
> > destroying xunU.
> >
> > On Tue, Mar 3, 2009 at 10:38 PM, Junko IKEDA 
wrote:
> >> Hi Serge,
> >>
> >> I'm trying to manage xen domain-U with xen0 plugin.
> >> There are two "xm" command, like "xm destroy" and "xm create" in xen0,
> >> How do you think to add "xm dump-core" into it?
> >> If possible, I want to get the dump of domain-U when some fence events
> >> happen.
> >>
> >> Best Regards,
> >> Junko Ikeda
> >>
> >>
> >> ___
> >> Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
> >> http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
> >> Home Page: http://linux-ha.org/
> >>
> >
> >
> >
> > --
> > Serge Dubrouski.
> >
> 
> 
> 
> --
> Serge Dubrouski.



[Linux-ha-dev] xm dump-core from xen0

2009-03-03 Thread Junko IKEDA
Hi Serge,

I'm trying to manage a Xen domain-U with the xen0 plugin.
There are two "xm" commands in xen0, "xm destroy" and "xm create".
What do you think about adding "xm dump-core" to it as well?
If possible, I want to get a dump of domain-U when a fence event
happens.
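The requested behavior amounts to emitting one extra command before the destroy. A minimal sketch (hypothetical: the real xen0 plugin's code, option names, and dump path differ; here the commands are only constructed, where a real plugin would hand them to system()):

```c
#include <stdio.h>
#include <string.h>

/* Build the optional "xm dump-core" command and the "xm destroy"
 * command a fence path would run.  "run_dump" and the dump directory
 * are illustrative, not the xen0 plugin's actual option names. */
static void build_fence_cmds(const char *domu, int run_dump,
                             char *dump_cmd, size_t dump_len,
                             char *destroy_cmd, size_t destroy_len)
{
    dump_cmd[0] = '\0';
    if (run_dump) {
        /* Capture the guest's memory image before fencing it. */
        snprintf(dump_cmd, dump_len,
                 "xm dump-core %s /var/lib/xen/dump/%s.core", domu, domu);
    }
    /* Fence the guest regardless of whether a dump was requested. */
    snprintf(destroy_cmd, destroy_len, "xm destroy %s", domu);
}
```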

Best Regards,
Junko Ikeda




RE: [Linux-ha-dev] SFEX resource agent for heartbeat

2008-10-16 Thread Junko IKEDA
Hi,

See also this page, please.
http://www.linux-ha.org/sfex/

Thanks,
Junko

> -Original Message-
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Xinwei Hu
> Sent: Thursday, October 16, 2008 6:55 PM
> To: High-Availability Linux Development List
> Subject: Re: [Linux-ha-dev] SFEX resource agent for heartbeat
> 
> 2008/10/16 Raoul Bhatia [IPAX] <[EMAIL PROTECTED]>:
> > hi,
> >
> > do you mind me asking what the purpose of sfex is?
> 
> sfex implements a advisory protocol over shared disk. It helps to
> prevent concurrent accessing to the shared storage even when the
> split-site happens.
> 
> > cheers,
> > raoul
> >
> > Xinwei Hu wrote:
> >> Hi all,
> >>
> >>   Attached is a rewritten version of sfex. It can be applied to tip of
> >> heartbeat.
> >>   Here's some explanation about the changes and design notes on it.
> >>
> >>   . The fundamental algorithm of sfex is kept untouched, except that
> >> it's a background deamon now.
> >>   . The memory allocation in original version is correct but
> >> confusing. So all calls malloc are replaced with one call to
> >> posix_memalign, and daemon avoid to allocate extra memory after
> >> initialized itself.
> >>   . The on disk meta-data works well, but again, very confusing. All
> >> integers are converted to/from strings before/after saving/loading
> >> to/from disk. I add new structs to represent the on disk formats. It
> >> helps to remove all confusing offset.
> >>   . sfex_daemon will be installed into /usr/lib/heartbeat/ and
> >> sfex_init into /usr/sbin
> >>   . sfex_* use syslog & stderr
> >>
> >>   . sfex implements an exclusive mode to help to control the shared
> >> disk. It's a cooperative and advisory protocol only. I think it's not
> >> possible to implement a shared-mode based on advisory protocol only.
> >> The idea of SBD can be borrowed in as a workaround, but it's more
> >> reasonable to do that as a STONITH plugin then a resource agent.
> >>
> >>   Junko helps to review the code again, but any bugs you found should
> >> be my fault.
> >>
> >>   Please kindly review the patch and give your comments.
> >>
> >>   Thanks.
> >>
> >>
> >>
> 
> >>
> >> ___
> >> Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
> >> http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
> >> Home Page: http://linux-ha.org/
> >
> >
> > --
> > 
> > DI (FH) Raoul Bhatia M.Sc.  email.  [EMAIL PROTECTED]
> > Technischer Leiter
> >
> > IPAX - Aloy Bhatia Hava OEG web.  http://www.ipax.at
> > Barawitzkagasse 10/2/2/11   email.[EMAIL PROTECTED]
> > 1190 Wien   tel.   +43 1 3670030
> > FN 277995t HG Wien  fax.+43 1 3670030 15
> > 
> > ___
> > Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
> > http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
> > Home Page: http://linux-ha.org/
> >
> ___
> Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
> http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
> Home Page: http://linux-ha.org/



[Linux-ha-dev] uninstall Heartbeat 2.1.4-1 on RedHat

2008-08-19 Thread Junko IKEDA
Hi,

Heartbeat 2.1.4-1 works well,
but it seems that the uninstall process has a problem.

I grabbed the RPMs from here.
http://download.opensuse.org/repositories/server:/ha-clustering:/lha-2.1/RHE
L_5/x86_64/

There is no problem installing (rpm -ihv),
but uninstalling (rpm -e) fails like this:

# rpm -e heartbeat
/var/tmp/rpm-tmp.41797: line 1: fg: no job control
error: %postun(heartbeat-2.1.4-1.1.x86_64) scriptlet failed, exit status 1

I found a similar problem reported on this list.
http://www.gossamer-threads.com/lists/linuxha/users/41357

Would it be hard to replace the %run_ldconfig macro with /sbin/ldconfig
on Red Hat?
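For reference, the portable idiom (a sketch only; the actual heartbeat.spec may structure its scriptlets differently) is to call ldconfig directly instead of going through a distribution-specific macro:

```spec
# Portable alternative to SUSE's %run_ldconfig macro:
%post -p /sbin/ldconfig
%postun -p /sbin/ldconfig
```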

Best Regards,
Junko Ikeda

NTT DATA INTELLILINK CORPORATION




RE: [Linux-ha-dev] crm_mon doesn't exit immediately

2008-08-11 Thread Junko IKEDA
> > If there's no objection I would like to push this patch into
> > the lha-2.1 repository, but any problem on that?
> 
> sure
> 
> >
> > It seems that the latest pacemaker also presents the same behavior
> > so I think the both needs to be fixed as well.
> 
> I thought it was fine?

Sorry, that might have been my misunderstanding.
Reading the source, it seems the latest Pacemaker behaves the same way
(it doesn't exit on Ctrl+C).

Thanks,
Junko

> 
> >> I found that crm_mon which is included in Pacemaker-dev(2f2343008186)
can
> be
> >> quitted by Ctrl + C.



RE: [Linux-ha-dev] crm_mon doesn't exit immediately

2008-07-29 Thread Junko IKEDA
Hi,

I found that the crm_mon included in Pacemaker-dev (2f2343008186) can be
quit with Ctrl+C.
If backporting from Pacemaker to Heartbeat 2.1.4 is better than applying
the patch, we don't mind how this is fixed.

Thanks,
Junko


> Can somebody handle this issue?
> She said that, she couldn't quit crm_mon command with Ctrl+C.
> I usually use crm_mon with -i option, so I couldn't notice this behavior,
> but it sure is that crm_mon running with no option wouldn't be stopped by
> SIGINT.
> It's odd, right?
> I think almost all people would expect that Ctrl + C can stop this
command.
> See attached her patch.
> 
> Thanks,
> Junko
> 
> 
> > I noticed that crm_mon doesn't exit immediately
> > when it receive SIGINT in mainloop.
> > It seems that SIGINT only kills sleep() function...
> > (Is this caused by something in G_main_add_SignalHandler()?
> >  Or anything else?)
> >
> > So, I modified it to exit wait function
> > when it is interrupted by a signal.
> > This patch is for Heartbeat STABLE 2.1 (aae8d51d84ec).
> > I hope it isn't too late for Heartbeat2.1.4...
> >
> >
> > Regards,
> > Satomi Taniguchi



RE: [Linux-ha-dev] crm_mon doesn't exit immediately

2008-07-28 Thread Junko IKEDA
Hi,

Can somebody handle this issue?
She said she couldn't quit the crm_mon command with Ctrl+C.
I usually run crm_mon with the -i option, so I hadn't noticed this
behavior, but crm_mon run with no options is indeed not stopped by
SIGINT.
That's odd, right?
I think almost everyone would expect Ctrl+C to stop this command.
Please see her attached patch.

Thanks,
Junko


> I noticed that crm_mon doesn't exit immediately
> when it receive SIGINT in mainloop.
> It seems that SIGINT only kills sleep() function...
> (Is this caused by something in G_main_add_SignalHandler()?
>  Or anything else?)
> 
> So, I modified it to exit wait function
> when it is interrupted by a signal.
> This patch is for Heartbeat STABLE 2.1 (aae8d51d84ec).
> I hope it isn't too late for Heartbeat2.1.4...
> 
> 
> Regards,
> Satomi Taniguchi


interrupted_by_a_signal.patch
Description: Binary data


RE: [Linux-ha-dev][RFC]heartbeat-2.1.4---Masterresource'sdemoteoperationgoesintoaninfinite loop

2008-04-21 Thread Junko IKEDA
> > > Btw. You do realize that setting ordered=false for the master resource
> >  > also means that the group's actions wont be ordered either don't you?
> >
> >  You mean, there's a possibility that slave resource will start/stop
before
> >  master's action complete if I don't set ordered=true, right?
> 
> No.  I mean that the members of the group would be able to stop/start
> in parallel.

I see.
I have to take care of that when I put a group under master/slave.
The default for master/slave is ordered=false,
while a group resource is usually expected to be ordered=true.

Thanks,
Junko



RE: [Linux-ha-dev][RFC]heartbeat-2.1.4---Masterresource'sdemoteoperationgoesinto aninfinite loop

2008-04-20 Thread Junko IKEDA
> Btw. You do realize that setting ordered=false for the master resource
> also means that the group's actions wont be ordered either don't you?

You mean there's a possibility that the slave resource will start/stop
before the master's action completes if I don't set ordered=true, right?
I will take care of that next time.

Thanks,
Junko

> 
> 2008/4/18 Junko IKEDA <[EMAIL PROTECTED]>:
> > > Fixed by:
> >  >http://hg.clusterlabs.org/pacemaker/stable-0.6/rev/4817a7094683
> >
> >  It works well with group-master/slave, too.
> >  Many thanks!
> >  Please merge it into Heartbeat 2.1.4.
> >
> >  Thanks,
> >  Junko
> >
> > ___
> >  Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
> >  http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
> >  Home Page: http://linux-ha.org/
> >
> >
> ___
> Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
> http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
> Home Page: http://linux-ha.org/



RE: [Linux-ha-dev] [RFC] heartbeat-2.1.4 --- build onRHEL5.1

2008-04-17 Thread Junko IKEDA
> any ideas as to why the current code doesn't work for you?

I failed to build the RPM on openSUSE 10.1 too...

This might be a latent problem in Heartbeat 2.1.3.
See the attached configure-213.log.

The configure summary clearly says the CIM providers and TSA plugin will
not be built:

  Build CRM= "yes"
  Build LRM= "yes"
  Build Ldirectord = "yes"
  Build IPMILan Plugin = "no"
  Build CIM providers  = "no"
  Build TSA plugin = "no"
  Build dopd plugin= "yes"
  Enable times kludge  = "yes"

Despite this, the cim and tsa_plugin directories are listed in SUBDIRS:

list='debian pkg port replace include lib heartbeat membership telecom
resources lrm crm fencing logd snmp_subagent tools doc cts mgmt cim
ldirectord config tsa_plugin contrib'; for subdir in $list; do \

Does this mean the AM_CONDITIONAL in configure.in is ignored?
In 2.1.3, the Makefiles under the cim and tsa_plugin directories were
created in all circumstances, but 2.1.4 doesn't create them,
so the build fails, maybe.

configure.in(2.1.4)
--
if test "x${enable_cim_provider}" = "xyes"; then
AC_CONFIG_FILES(\
cim/Makefile\
cim/mof/Makefile\
cim/mof/register_providers.sh   \
cim/mof/unregister_providers.sh \
)
fi

if test "x${enable_tsa_plugin}" = "xyes"; then
AC_CONFIG_FILES(\
tsa_plugin/Makefile \
tsa_plugin/testrun.sh   \
tsa_plugin/linuxha-adapter  \
)
fi
--

Setting the previous patch aside: would there be any trouble in removing
the test condition in configure.in, so that the plugins can always be
built?
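Concretely, removing the conditional around AC_CONFIG_FILES might look roughly like this (a sketch against the configure.in excerpt above; whether the plugins are actually built can still be governed by the AM_CONDITIONALs):

```
dnl Always generate these Makefiles, regardless of --enable-* flags;
dnl the AM_CONDITIONALs still decide whether the plugins are built.
AC_CONFIG_FILES(\
cim/Makefile\
tsa_plugin/Makefile\
)
```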


Thanks,
Junko

 
> Does the patched version work to actually _build_ the plugins?
> For example, you add CIM_PROVIDER_DIR as a substitution pattern, but
> it's not referenced as a substitution anywhere?
> 
> And your patch seems to remove all references to the conditionals; you
> might as well not define them in the configure file then.



build-2.patch
Description: Binary data


RE: [Linux-ha-dev] [RFC] heartbeat-2.1.4 --- build onRHEL5.1

2008-04-16 Thread Junko IKEDA
> Hi,
> 
> I keep failing to build lha-2.1 on RHEL5.1 for now.
> It seems that "--enable-cim-provider=no" and "--enable-tsa-plugin=no" are
> ineffective for ConfigureMe.
> We don't need CIM providers or TSA plugin, so have a try to make patch
about
> it.
> Please check the attached.

Sorry for the noise.
The last attachment was broken, so please check this one.

Thanks,
Junko



build.patch
Description: Binary data


[Linux-ha-dev] [RFC] heartbeat-2.1.4 --- build onRHEL5.1

2008-04-16 Thread Junko IKEDA
Hi,

I keep failing to build lha-2.1 on RHEL 5.1.
It seems that "--enable-cim-provider=no" and "--enable-tsa-plugin=no"
have no effect in ConfigureMe.
We don't need the CIM providers or the TSA plugin, so I took a stab at a
patch for this.
Please check the attached.

Thanks,
Junko


build.patch
Description: Binary data


RE: [Linux-ha-dev] [RFC] heartbeat-2.1.4 --- Master resource'sdemoteoperation goes into an infinite loop

2008-04-16 Thread Junko IKEDA
> > >  I am running some tests around Master/Slave with
> > >  http://hg.linux-ha.org/lha-2.1/.
> > >
> > >  Scenario is;
> > >  (1) start heartbeat on two nodes
> > >  (2) confirm the resources state (Master/Slave)
> > >  (3) remove the state file of the master resource
> > >  # rm -f /var/run/heartbeat/rsctmp/Stateful-stateful-2\:0.state
> > >
> > >  Heartbeat could detect the failure, and the resource's state would be
> > >  shifted to demote, and stop.
> > >  Demote operation is sure to be called, but it will goes into an
> infinite
> > >  loop.
> >
> > because the demote action keeps failing (rc=7 instead of rc=0)
> >
> > is this a test you ran on previous versions?
> 
> This test relates to these issues.
> http://developerbugs.linux-foundation.org/show_bug.cgi?id=1822
> http://hg.clusterlabs.org/pacemaker/dev/rev/7edc6bc1557b
> 
> It seems that the fix is included to pacemaker/dev not stable...
> bummage.

This patch is also needed.
http://hg.clusterlabs.org/pacemaker/dev/rev/7bc83e8f3911

Thanks,
Junko



RE: [Linux-ha-dev] [RFC] heartbeat-2.1.4 --- Master resource's demoteoperation goes into an infinite loop

2008-04-16 Thread Junko IKEDA
> >  I am running some tests around Master/Slave with
> >  http://hg.linux-ha.org/lha-2.1/.
> >
> >  Scenario is;
> >  (1) start heartbeat on two nodes
> >  (2) confirm the resources state (Master/Slave)
> >  (3) remove the state file of the master resource
> >  # rm -f /var/run/heartbeat/rsctmp/Stateful-stateful-2\:0.state
> >
> >  Heartbeat could detect the failure, and the resource's state would be
> >  shifted to demote, and stop.
> >  Demote operation is sure to be called, but it will goes into an
infinite
> >  loop.
> 
> because the demote action keeps failing (rc=7 instead of rc=0)
> 
> is this a test you ran on previous versions?

This test relates to these issues.
http://developerbugs.linux-foundation.org/show_bug.cgi?id=1822
http://hg.clusterlabs.org/pacemaker/dev/rev/7edc6bc1557b

It seems that the fix is included in pacemaker/dev, not stable...
bummage.

Thanks,
Junko


> 
> >  See the attached hb_report.
> >  Is there something wrong with cib.xml ?
> >  This is similar case to what Yamauchi-san posted.
> >
> >  Best Regards,
> >  Junko Ikeda
> >
> >  NTT DATA INTELLILINK CORPORATION
> >
> > ___
> >  Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
> >  http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
> >  Home Page: http://linux-ha.org/
> >
> >
> ___
> Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
> http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
> Home Page: http://linux-ha.org/



RE: [Linux-ha-dev] [RFC] heartbeat-2.1.4

2008-04-15 Thread Junko IKEDA
Hi again,

Another request;
Would it be possible to include the following patch in release 2.1.4?
http://hg.linux-ha.org/dev/rev/6307bb091d02

It will help with the problems posted in Bugzilla 1814,
for all platforms, not only ppc.
http://developerbugs.linux-foundation.org/show_bug.cgi?id=1814

Thanks,
Junko

> > So, that said, I've pushed my proposed code to
> > http://hg.linux-ha.org/lha-2.1/. It, for reasons outlined above, likely
> > doesn't build yet (because the in-tree packaging is broken), but I
> > wanted to share the scope of changes with you.
> 
> There are some fixes about failcount in pacemaker/stable-0.6 recently.
> It seems that some of them slip out of 2.1.4 repository.
> (http://hg.linux-ha.org/lha-2.1/)
> 
> It might be difficult to include them because they are fixed as the
> pacemaker's code (like transitioner/events.c),
> but it would be helpful if you release them as Heartbeat 2.1.4.
> 
> Changest are here;
> http://hg.clusterlabs.org/pacemaker/stable-0.6/rev/8334d7b6d2e4
> http://hg.clusterlabs.org/pacemaker/stable-0.6/rev/4622081ce2fc
> http://hg.clusterlabs.org/pacemaker/stable-0.6/rev/97d9fc0dcbd5
> http://hg.clusterlabs.org/pacemaker/stable-0.6/rev/3c6074b76b99
> 
> Thanks,
> Junko
> 
> > As a further point of reference, I'm attaching the SLES changes section
> > to this mail. (bnc# refers to bugzilla.novell.com.)
> >
> >
> > Let me emphasize strongly that I really don't want to step on anyone's
> > toes, or rush the new governance board, but only fill the current void
> > until that is actually operational and has settled down, as I suggest
> > our users need it.
> 
> 
> ___
> Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
> http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
> Home Page: http://linux-ha.org/



RE: [Linux-ha-dev] [RFC] heartbeat-2.1.4

2008-04-14 Thread Junko IKEDA
Hi,

> So, that said, I've pushed my proposed code to
> http://hg.linux-ha.org/lha-2.1/. It, for reasons outlined above, likely
> doesn't build yet (because the in-tree packaging is broken), but I
> wanted to share the scope of changes with you.

There are some recent failcount fixes in pacemaker/stable-0.6.
It seems that some of them slipped out of the 2.1.4 repository.
(http://hg.linux-ha.org/lha-2.1/)

It might be difficult to include them because they were fixed in the
Pacemaker code (like transitioner/events.c),
but it would be helpful if you released them with Heartbeat 2.1.4.

The changesets are here:
http://hg.clusterlabs.org/pacemaker/stable-0.6/rev/8334d7b6d2e4
http://hg.clusterlabs.org/pacemaker/stable-0.6/rev/4622081ce2fc
http://hg.clusterlabs.org/pacemaker/stable-0.6/rev/97d9fc0dcbd5
http://hg.clusterlabs.org/pacemaker/stable-0.6/rev/3c6074b76b99

Thanks,
Junko

> As a further point of reference, I'm attaching the SLES changes section
> to this mail. (bnc# refers to bugzilla.novell.com.)
> 
> 
> Let me emphasize strongly that I really don't want to step on anyone's
> toes, or rush the new governance board, but only fill the current void
> until that is actually operational and has settled down, as I suggest
> our users need it.




RE: AW: [Linux-ha-dev] Call for testers: 2.1.3

2007-12-16 Thread Junko IKEDA
Hi,

> There was two bugs in the configure stuff:
>   1) It got the package name for pegasus wrong for Red Hat
>   2) It didn't work if you had pegasus installed but didn't
>   enable the CIM provider.

I tried this branch and it worked well.
http://hg.linux-ha.org/dev/rev/3914fa415bd0

Thanks a lot!
By the way, why were these RPMs renamed?

heartbeat-pils-2.1.2-2.x86_64.rpm -> pils-2.1.3-1.x86_64.rpm
heartbeat-stonith-2.1.2-2.x86_64.rpm -> stonith-2.1.3-1.x86_64.rpm

Thanks,
Junko



RE: [Linux-ha-dev] Re: Shared disk file Exclusiveness control programfor HB2

2007-09-02 Thread Junko IKEDA
Hello,

> NAKAHIRA Kazutomo wrote:
> > We wrote a Shared Disk File EXclusiveness Control Program, called
> > "SF-EX" for short, could prevent a destruction of data on
> > shared disk file system due to Split-Brain.
> >
> > This program consists of CUI commands written in the C and RA,
> > the former is used for managing and monitoring shared disk status
> > and the latter is the same as other common RAs.

> This program would be even more useful if it were available as a quorum
> module.  Have you thought about making it a quorum module?

We tried quorumd and noticed that we would have to rewrite principal
parts of SF-EX.
SF-EX, running as an RA, checks the lock status regularly,
and the quorum server module also makes its rounds periodically,
but it seems that quorumd_getquorum() starts up without any apparent
connection to a quorum server.
Perhaps we should implement the exclusion algorithm right there;
it might be related to a CCM problem, I'm not sure.

If it is possible to keep this functionality as an RA,
could you consider including SF-EX in the next release as a first
step?

Thanks,
Junko IKEDA

NTT DATA INTELLILINK



RE: [Linux-ha-dev] Shared disk file Exclusiveness controlprogramforHB2

2007-08-14 Thread Junko IKEDA
Hi,

We are planning to turn sf-ex into a quorum plugin,
but it might take a while because we aren't really familiar with it.
Your continued support would be greatly appreciated.

As for the RA, it is last come, last served: the last node that manages
to update the reserve status wins the right to run resources. Wouldn't
that work at all?

Thanks,
Junko Ikeda
NTT DATA INTELLILINK

> I believe that the point he was trying to make is that it _needs_ the
> complexity of the logic to be always correct even in the split-brain
> case - and I agree.
> 
> If this logic fails and both sides think they have exclusive access in a
> split-brain case, then a filesystem on disk may be destroyed.  This is a
> _very_ bad consequence - much worse than a crash.  It doesn't matter if
> it is relatively unlikely, because the consequence is so terrible.  With
> hundreds of thousands of clusters running Heartbeat, even unlikely
> events eventually happen.
>   http://linux-ha.org/BadThingsWillHappen
> 
> You should be able to run hundreds of thousands or millions of tests
> where both sides are trying to get the lock at the same time, and be
> able to verify that only one side got the lock - in every single case.
> 
> Please don't be discouraged.  Horms started a similar effort a few years
> ago, but he wasn't able to spend enough time with it to get it right.
> 
> What you're doing is a valuable thing to do, and we all understand very
> well that it's difficult.
> 
> When I first entered this discussion, I mentioned lockless
> synchronization algorithms as being good things to study.  In this case,
> we are trying to create a lock, but I suspect the lockless methods would
> be a good way to synchronize the creation of a lock (even though this
> sounds odd).
> 
> --
> Alan Robertson <[EMAIL PROTECTED]>
> 
> "Openness is the foundation and preservative of friendship...  Let me
> claim from you at all times your undisguised opinions." - William
> Wilberforce
> ___
> Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
> http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
> Home Page: http://linux-ha.org/



RE: [Linux-ha-dev] Shared disk file Exclusiveness controlprogramforHB2

2007-08-13 Thread Junko IKEDA
> OK. I think you are mis-understanding the problem.
> 
> When the communication between Node A & B is fine, you don't need any
> kind of lock. Heartbeat itself can ensure the resource runs on one
selected
> node, and on one node only.

sfex_lock() just checks the status that shows which node succeeded in
taking the lock.
It does not keep retrying the lock over and over again.

> sfex_lock is valuable when the communication between A & B is broken.
> But when the communication IS broken, you can't assume sfex_lock will run
> in order any more.

If the interconnect LAN goes down, split-brain follows.
The lock status is reserved for Node A at that moment,
but because of the split-brain, Node B is also trying to update the
status in order to take the lock.
While Node A is checking the status, Node B might update it;
Node A, whose status was overwritten, then releases the lock.
sfex_lock() doesn't contain any logic more complex than that.

Thanks,
Junko



RE: [Linux-ha-dev] Shared disk file Exclusiveness controlprogramforHB2

2007-08-13 Thread Junko IKEDA
> > If Node B updates the lock status _at just the right moment_,
> > sfex_update() detects that the other node is trying to update its
status,
> > and it will be terminated with exit(2).
> This time window is enough to destroy all data if you are bad luck ;-(

Node B is only updating its lock status, not the data.
Node B can take the lock and access the shared disk only after Node A
has released both the lock and its data.

> > > This statement is wrong according to your code.
> > > Especially, your check-and-reserve is not an atomic CAS operation.
> >
> > By the way, the lock status stores on the partition, (not using file
system)
> > so, as a communication media, it can keep read-write operation
atomicity.
> > All nodes' action, like read (check) or write (reserve) the status won't
> > bump against each other.
> > inconsequent remark?
> Yes, but still, the CAS operation is not atomic unless we do some tricks
like
> scsi reservation.

Well... I'm not sure whether the following point is essential,
but this is a last-come-wins scheme (the inverse of first-come-wins).
Once a node detects that its status has been updated by the other node,
the contest ends there;
basically, there is no second bite at the cherry to get the lock.

Thanks,
Junko



RE: [Linux-ha-dev] Shared disk file Exclusiveness controlprogramforHB2

2007-08-12 Thread Junko IKEDA
> > Assume we have 2 nodes.
> > 1. Node A & B reach step 3) in the same time.
> > 2. sfex_lock on Node B is scheduled out due to some other reasons.
> > 3. sfex_lock on Node A goes through step 3 to 6, and Node A holds 
> > the lock now.

Node A certainly holds the lock at this moment.
sfex_lock() returns 0, and the RA starts monitoring on Node A.
During the monitor operation, sfex_update() runs, checking and updating
Node A's status.

If Node B updates the lock status _at just the right moment_,
sfex_update() detects that the other node is trying to update the
status, and it terminates with exit(2).

> > 4. sfex_lock on Node B is scheduled back, and goes through step 3 to 
> > 6 also.

The RA monitor on Node A will also be stopped.
Node B can get the lock in a situation like this.

> This statement is wrong according to your code.
> Especially, your check-and-reserve is not an atomic CAS operation.

By the way, the lock status is stored on a raw partition (no file
system), so as a communication medium it keeps read and write operations
atomic.
The nodes' actions, such as reading (checking) or writing (reserving)
the status, won't collide with each other.
Or is that beside the point?

Thanks,
Junko
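The monitor-side behavior described above can be modeled with a toy in-memory lock block (the real sfex_daemon reads and writes a shared-disk partition; the field names here are illustrative): on each monitor tick the holder re-reads the block, gives up if another node overwrote it, and otherwise renews its lease.

```c
/* Toy model of the sfex monitor check.  The real sfex_update exits
 * with status 2 when it finds its reservation overwritten; here we
 * just return 2. */
struct lock_block {
    int  holder;   /* node id currently recorded in the lock block */
    long expiry;   /* lease expiry time */
};

/* Returns 0 if the lease was renewed, 2 if another node took over. */
static int sfex_update_tick(struct lock_block *blk, int node,
                            long now, long lock_timeout)
{
    if (blk->holder != node)
        return 2;                      /* overwritten: release the lock */
    blk->expiry = now + lock_timeout;  /* renew the lease */
    return 0;
}
```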



RE: [Linux-ha-dev] Shared disk file Exclusiveness controlprogramforHB2

2007-08-12 Thread Junko IKEDA
> 2007/8/10, Junko IKEDA <[EMAIL PROTECTED]>:
> > Hi,
> >
> > sfex_lock.c would work like this;
> > 1) read lock data
> > 2) check current lock status
> > 3) reserve lock
> > 4) detect the collision of lock
> > 5) extension of lock
> > 6) lock acquisition completion
> >
> > in "reserve lock" phase, each node writes its status on the disk,
> > and the later node is going to reserve the lock.
> > the former one gives up and the race will be end here.
>
> Assume we have 2 nodes.
> 1. Node A & B reach step 3) in the same time.
> 2. sfex_lock on Node B is scheduled out due to some other reasons.
> 3. sfex_lock on Node A goes through step 3 to 6, and Node A holds the lock
now.
> 4. sfex_lock on Node B is scheduled back, and goes through step 3 to 6
also.

Node B would not be able to get the lock in this case, because Node A
already holds it.
Node B checks the lock status before reserving the lock,
and notices that Node A has it.
Under that condition, Node B gives up on locking.

> 5. Now both A & B have the lock.
>
> > During collision_timeout waiting, the cord around "detect the collision
of
> > lock" has responsibility to prevent the race.
> > in other cases, "check current lock status" would prevent it.
> >
> > as a precondition to ensure the control exclusively,
> > lock_timeout should be longer enough than collision_timeout.
> >
> > Does that answer you?
> >
> > Thanks,
> > Junko
> >
> >
> > > -Original Message-
> > > From: [EMAIL PROTECTED]
> > > [mailto:[EMAIL PROTECTED] On Behalf Of Xinwei
Hu
> > > Sent: Thursday, August 09, 2007 8:58 PM
> > > To: High-Availability Linux Development List
> > > Subject: Re: [Linux-ha-dev] Shared disk file Exclusiveness
> > > controlprogramforHB2
> > >
> > > 2007/8/9, Junko IKEDA <[EMAIL PROTECTED]>:
> > > > Hi,
> > > >
> > > > sorry, my previous answer was off the mark...
> > > > When 2 nodes reach there at the same time,
> > > > node A notices that the other node want to lock too, so give up lock
> > itself.
> > >
> > > I only see that you sleep for a period of collision_timeout. This will
> > > not prevent the race condition from happening.
> > > Am I missing anything else?
> > >
> > > > node B is ready to lock.
> > > >
> > > > Thanks,
> > > > Junko
> > > >
> > > >
> > > > > -Original Message-
> > > > > From: [EMAIL PROTECTED]
> > > > > [mailto:[EMAIL PROTECTED] On Behalf Of Junko IKEDA
> > > > > Sent: Thursday, August 09, 2007 4:30 PM
> > > > > To: 'High-Availability Linux Development List'
> > > > > Subject: RE: [Linux-ha-dev] Shared disk file Exclusiveness control
> > > > > program for HB2
> > > > >
> > > > > Hi,
> > > > >
> > > > > You know, that could be true...
> > > > > but if it's called from the RA, 2 nodes wouldn't reach that part at
> > > > > the same time, right?
> > > > > Only one node will be able to reach there, according to the score rule.
> > > > >
> > > > > Thanks,
> > > > > Junko
> > > > >
> > > > > > -Original Message-
> > > > > > From: [EMAIL PROTECTED]
> > > > > > [mailto:[EMAIL PROTECTED] On Behalf Of Xinwei Hu
> > > > > > Sent: Thursday, August 09, 2007 3:00 PM
> > > > > > To: High-Availability Linux Development List
> > > > > > Subject: Re: [Linux-ha-dev] Shared disk file Exclusiveness
> > > > > > control program for HB2
> > > > > >
> > > > > > Hi,
> > > > > >
> > > > > >   There are races in sfex_lock.c.
> > > > > >
> > > > > >   When 2 nodes reach sfex_lock.c:265 at the same time:
> > > > > >
> > > > > >   node A: reserve lock -> wait for collision_timeout -> hold lock
> > > > > >   node B: sleep        -> sleep                       -> reserve lock
> > > > > >
> > > > On Wednesday, 08 August 2007 12:00, NAKAHIRA Kazutomo wrote:
> > > > > > > Hello, all.
> > > > > > >
> &

RE: [Linux-ha-dev] Shared disk file Exclusiveness control program for HB2

2007-08-10 Thread Junko IKEDA
Hi,

sfex_lock.c would work like this;
1) read lock data
2) check current lock status
3) reserve lock
4) detect the collision of lock
5) extension of lock
6) lock acquisition completion

in the "reserve lock" phase, each node writes its status to the disk,
and the later node gets to reserve the lock;
the former one gives up, and the race ends here.

During the collision_timeout wait, the code around "detect the collision of
lock" is responsible for preventing the race;
in other cases, "check current lock status" prevents it.

As a precondition for exclusive control,
lock_timeout should be sufficiently longer than collision_timeout.

Does that answer your question?

Thanks,
Junko


> -Original Message-
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Xinwei Hu
> Sent: Thursday, August 09, 2007 8:58 PM
> To: High-Availability Linux Development List
> Subject: Re: [Linux-ha-dev] Shared disk file Exclusiveness
> control program for HB2
>
> 2007/8/9, Junko IKEDA <[EMAIL PROTECTED]>:
> > Hi,
> >
> > sorry, my previous answer was off the mark...
> > When 2 nodes reach there at the same time,
> > node A notices that the other node wants the lock too, so it gives up the
> > lock itself.
>
> I only see that you sleep for a period of collision_timeout. This will not
> prevent the race condition from happening.
> Am I missing anything else?
>
> > node B is ready to lock.
> >
> > Thanks,
> > Junko
> >
> >
> > > -Original Message-
> > > From: [EMAIL PROTECTED]
> > > [mailto:[EMAIL PROTECTED] On Behalf Of Junko IKEDA
> > > Sent: Thursday, August 09, 2007 4:30 PM
> > > To: 'High-Availability Linux Development List'
> > > Subject: RE: [Linux-ha-dev] Shared disk file Exclusiveness control
> > > program for HB2
> > >
> > > Hi,
> > >
> > > You know, that could be true...
> > > but if it's called from the RA, 2 nodes wouldn't reach that part at the
> > > same time, right?
> > > Only one node will be able to reach there, according to the score rule.
> > >
> > > Thanks,
> > > Junko
> > >
> > > > -Original Message-
> > > > From: [EMAIL PROTECTED]
> > > > [mailto:[EMAIL PROTECTED] On Behalf Of Xinwei Hu
> > > > Sent: Thursday, August 09, 2007 3:00 PM
> > > > To: High-Availability Linux Development List
> > > > Subject: Re: [Linux-ha-dev] Shared disk file Exclusiveness control
> > > > program for HB2
> > > >
> > > > Hi,
> > > >
> > > >   There are races in sfex_lock.c.
> > > >
> > > >   When 2 nodes reach sfex_lock.c:265 at the same time:
> > > >
> > > >   node A: reserve lock -> wait for collision_timeout -> hold lock
> > > >   node B: sleep        -> sleep                       -> reserve lock
> > > >
> > > > On Wednesday, 08 August 2007 12:00, NAKAHIRA Kazutomo wrote:
> > > > > Hello, all.
> > > > >
> > > > > We wrote a Shared Disk File EXclusiveness Control Program, called
> > > > > "SF-EX" for short, which can prevent destruction of data on a
> > > > > shared disk file system due to Split-Brain.
> > > > >
> > > > > This program consists of CUI commands written in C and an RA;
> > > > > the former are used for managing and monitoring shared disk status,
> > > > > and the latter works like other common RAs.
> > > > >
> > > > > We tested this program on IBM and HP platforms, and we confirmed
> > > > > all functions worked well.
> > > > >
> > > > > Our test environment is listed below:
> > > > >  Software:
> > > > >   OS :RHEL4 ES Update5(kernel2.6.9-55.ELsmp)
> > > > >   HB2:heartbeat-2.1.2-2
> > > > >
> > > > >  Hardware:
> > > > >   IBM platform:
> > > > >Server : System x3650
> > > > >Shared disk: DS 4700 (FC)
> > > > >
> > > > >   HP platform:
> > > > >Server : DL380G5
> > > > >Shared disk: MSA500G2 (SCSI)
> > > > >
> > > > > Installation and configuration are described in the README.
> > > > > If you are interested in this program, please try it and
> > > > > let me know your comments.
> > > > >
> > > > > Your suggestions on how to improve are really appreciated.
> > > > >
> > > > > Best regards.
> > > > ___
> > > > Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
> > > > http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
> > > > Home Page: http://linux-ha.org/
> > >
> >
> >



RE: [Linux-ha-dev] Shared disk file Exclusiveness control program for HB2

2007-08-09 Thread Junko IKEDA
Hi,

Sorry, my previous answer was off the mark...
When 2 nodes reach there at the same time,
node A notices that the other node wants the lock too, so it gives up the
lock itself.
Node B is then free to take the lock.

Thanks,
Junko


> -Original Message-
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Junko IKEDA
> Sent: Thursday, August 09, 2007 4:30 PM
> To: 'High-Availability Linux Development List'
> Subject: RE: [Linux-ha-dev] Shared disk file Exclusiveness control
> program for HB2
>
> Hi,
>
> You know, that could be true...
> but if it's called from the RA, 2 nodes wouldn't reach that part at the same
> time, right?
> Only one node will be able to reach there, according to the score rule.
>
> Thanks,
> Junko
>
> > -Original Message-
> > From: [EMAIL PROTECTED]
> > [mailto:[EMAIL PROTECTED] On Behalf Of Xinwei Hu
> > Sent: Thursday, August 09, 2007 3:00 PM
> > To: High-Availability Linux Development List
> > Subject: Re: [Linux-ha-dev] Shared disk file Exclusiveness control
> > program for HB2
> >
> > Hi,
> >
> >   There are races in sfex_lock.c.
> >
> >   When 2 nodes reach sfex_lock.c:265 at the same time:
> >
> >   node A: reserve lock -> wait for collision_timeout -> hold lock
> >   node B: sleep        -> sleep                       -> reserve lock
> >
> > On Wednesday, 08 August 2007 12:00, NAKAHIRA Kazutomo wrote:
> > > Hello, all.
> > >
> > > We wrote a Shared Disk File EXclusiveness Control Program, called
> > > "SF-EX" for short, which can prevent destruction of data on a
> > > shared disk file system due to Split-Brain.
> > >
> > > This program consists of CUI commands written in C and an RA;
> > > the former are used for managing and monitoring shared disk status,
> > > and the latter works like other common RAs.
> > >
> > > We tested this program on IBM and HP platforms, and we confirmed
> > > all functions worked well.
> > >
> > > Our test environment is listed below:
> > >  Software:
> > >   OS :RHEL4 ES Update5(kernel2.6.9-55.ELsmp)
> > >   HB2:heartbeat-2.1.2-2
> > >
> > >  Hardware:
> > >   IBM platform:
> > >Server : System x3650
> > >Shared disk: DS 4700 (FC)
> > >
> > >   HP platform:
> > >Server : DL380G5
> > >Shared disk: MSA500G2 (SCSI)
> > >
> > > Installation and configuration are described in the README.
> > > If you are interested in this program, please try it and
> > > let me know your comments.
> > >
> > > Your suggestions on how to improve are really appreciated.
> > >
> > > Best regards.
>



RE: [Linux-ha-dev] Shared disk file Exclusiveness control program for HB2

2007-08-09 Thread Junko IKEDA
Hi,

You know, that could be true...
but if it's called from the RA, 2 nodes wouldn't reach that part at the same
time, right?
Only one node will be able to reach there, according to the score rule.

Thanks,
Junko

> -Original Message-
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Xinwei Hu
> Sent: Thursday, August 09, 2007 3:00 PM
> To: High-Availability Linux Development List
> Subject: Re: [Linux-ha-dev] Shared disk file Exclusiveness control program
> for HB2
>
> Hi,
>
>   There are races in sfex_lock.c.
>
>   When 2 nodes reach sfex_lock.c:265 at the same time:
>
>   node A: reserve lock -> wait for collision_timeout -> hold lock
>   node B: sleep        -> sleep                       -> reserve lock
>
> On Wednesday, 08 August 2007 12:00, NAKAHIRA Kazutomo wrote:
> > Hello, all.
> >
> > We wrote a Shared Disk File EXclusiveness Control Program, called
> > "SF-EX" for short, which can prevent destruction of data on a
> > shared disk file system due to Split-Brain.
> >
> > This program consists of CUI commands written in C and an RA;
> > the former are used for managing and monitoring shared disk status,
> > and the latter works like other common RAs.
> >
> > We tested this program on IBM and HP platforms, and we confirmed
> > all functions worked well.
> >
> > Our test environment is listed below:
> >  Software:
> >   OS :RHEL4 ES Update5(kernel2.6.9-55.ELsmp)
> >   HB2:heartbeat-2.1.2-2
> >
> >  Hardware:
> >   IBM platform:
> >Server : System x3650
> >Shared disk: DS 4700 (FC)
> >
> >   HP platform:
> >Server : DL380G5
> >Shared disk: MSA500G2 (SCSI)
> >
> > Installation and configuration are described in the README.
> > If you are interested in this program, please try it and
> > let me know your comments.
> >
> > Your suggestions on how to improve are really appreciated.
> >
> > Best regards.



RE: [Linux-ha-dev] Re: Shared disk file Exclusiveness control program for HB2

2007-08-08 Thread Junko IKEDA
Hi Alan,

Thank you for your comment.
This program isn't designed as a quorum module for now.
It was developed as an RA; that is, it provides an independent function
without affecting Heartbeat's internal behavior.

Since we are not familiar with quorum modules,
what are the requirements for one?
We have no problem with improving it, but we don't know exactly how we
should proceed.

Best Regards,
Junko Ikeda

NTT DATA INTELLILINK CORPORATION


> -Original Message-
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Alan Robertson
> Sent: Wednesday, August 08, 2007 1:12 PM
> To: NAKAHIRA Kazutomo
> Cc: linux-ha-dev@lists.linux-ha.org
> Subject: [Linux-ha-dev] Re: Shared disk file Exclusiveness control program
> for HB2
> 
> NAKAHIRA Kazutomo wrote:
> > Hello, all.
> >
> > We wrote a Shared Disk File EXclusiveness Control Program, called
> > "SF-EX" for short, which can prevent destruction of data on a
> > shared disk file system due to Split-Brain.
> >
> > This program consists of CUI commands written in C and an RA;
> > the former are used for managing and monitoring shared disk status,
> > and the latter works like other common RAs.
> >
> > We tested this program on IBM and HP platforms, and we confirmed
> > all functions worked well.
> >
> > Our test environment is listed below:
> >  Software:
> >   OS :RHEL4 ES Update5(kernel2.6.9-55.ELsmp)
> >   HB2:heartbeat-2.1.2-2
> >
> >  Hardware:
> >   IBM platform:
> >Server : System x3650
> >Shared disk: DS 4700 (FC)
> >
> >   HP platform:
> >Server : DL380G5
> >Shared disk: MSA500G2 (SCSI)
> >
> > Installation and configuration are described in the README.
> > If you are interested in this program, please try it and
> > let me know your comments.
> >
> > Your suggestions on how to improve are really appreciated.
> 
> This program would be even more useful if it were available as a quorum
> module.  Have you thought about making it a quorum module?
> 
>   Thanks!
> 
> 
> --
> Alan Robertson <[EMAIL PROTECTED]>
> 
> "Openness is the foundation and preservative of friendship...  Let me
> claim from you at all times your undisguised opinions." - William
> Wilberforce



[Linux-ha-dev] re-start action when a resource parameter is changed

2007-06-14 Thread Junko IKEDA
Hi,

I'm trying to change some parameters for a resource with "crm_resource",
online.
It's easy to change them, but I found that the resource is stopped first and
then started again every time.
Is it possible to skip this restart action when the resource
wouldn't move to another node?
For instance, changing "resource_stickiness" from '0' to 'INFINITY':
the resource could keep running on the current node, right?

It seems that this depends on the last "else" section in
~/crm/pengine/native.c.
What will happen if this section is removed?
Is the "RSC_ROLE_STOPPED" value needed elsewhere?

~/crm/pengine/native.c
---
...
    } else {
        stop = stop_action(rsc, current, TRUE);
        start = start_action(rsc, next, TRUE);
        stop->optional = start->optional;

        if(start->runnable == FALSE) {
            rsc->next_role = RSC_ROLE_STOPPED;

        } else if(start->optional) {
            crm_notice("Leave resource %s\t(%s)",
                       rsc->id, next->details->uname);

        } else {
            crm_notice("Restart resource %s\t(%s)",
                       rsc->id, next->details->uname);
        }
    }
---

Best Regards,
Junko Ikeda

NTT DATA INTELLILINK CORPORATION
Open Source Solutions Business Unit
Open Source Business Division

Toyosu Center Building Annex, 3-3-9, Toyosu, Koto-ku, Tokyo 135-0061, Japan
TEL : +81-3-3534-4810 FAX : +81-3-3534-4814 mailto:[EMAIL PROTECTED]
http://www.intellilink.co.jp/



RE: [Linux-ha-dev] transition graphs during fail-over process

2007-04-19 Thread Junko IKEDA
> -Original Message-
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Andrew Beekhof
> Sent: Thursday, April 19, 2007 4:29 PM
> To: High-Availability Linux Development List
> Subject: Re: [Linux-ha-dev] transition graphs during fail-over process
> 
> On 4/19/07, Junko IKEDA <[EMAIL PROTECTED]> wrote:
> > Hi,
> >
> > This is not a serious problem, but I just took notice of it, so please
> > let me know whether this is common behavior for Heartbeat or not, if you
> > know anything about it.
> >
> > There are two nodes, a virtual IP (IPaddr) is running on one of them.
> > If the IPaddr is taken away, fail-over process is sure to succeed.
> > What I notice is;
> > Heartbeat first starts IPaddr on the node on which it has already died
> > (which would fail), and next does it on the stand-by node.
> > Why does Heartbeat try to (re)start the resource on the failed node
> > again?
> 
> presumably because you told us to run there if we can

I agree, some might want to do that.
I feel that it's a waste of time for failover...
In their opinion, if the first trial (restarting the resource on the failed
node) succeeds,
we won't see a failover, right?
Isn't there a possibility that a resource tries to restart again and again
on the same node?

> > I don't understand why "pe-input3.bz2" was needed.
> 
> it wasn't used?
> if so, it's likely some event happened which invalidated the transition
> before it could be started

Yes, this graph wasn't carried out, and the next one was run.

Thanks a lot for your quick response!

Junko Ikeda



RE: [Linux-ha-dev] Split-Brain that use the latest development version

2007-04-17 Thread Junko IKEDA
> -Original Message-
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Lars
> Marowsky-Bree
> Sent: Tuesday, April 17, 2007 11:13 PM
> To: High-Availability Linux Development List
> Subject: Re: [Linux-ha-dev] Split-Brain that use the latest development
> version
> 
> On 2007-04-17T19:16:52, Junko IKEDA <[EMAIL PROTECTED]> wrote:
> 
> > Hi,
> >
> > I am trying to figure out Split-Brain still,
> > and using the latest development version this time.
> >
> > While Heartbeat2 is running on two nodes, if I disconnect the
> > interconnect LAN, Split-Brain occurs.
> 
> Yes. This is a bug in your setup. You need STONITH.


I see.
Is there no choice but to set up STONITH?

> 
> > After confirmation of Split-Brain, the LAN would be connected again.
> > With Heartbeat 2.0.8-1, I could check that Split-Brain could be resolved
> > after recovering the LAN trouble, but with the development version,
there
> > remain Split-Brain.
> > Is this the same phenomenon which Andrew pointed out?
> 
> Possibly. Please file a bugzilla entry and attach debug 1 logs.


I posted it just now: Bug #1546.

> 
> 
> Sincerely,
> Lars Marowsky-Brée
> 
> --
> Teamlead Kernel, SuSE Labs, Research and Development
> SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)
> "Experience is the name everyone gives to their mistakes." -- Oscar Wilde
> 
