Re: [users] multiple-node simultaneous failure handling

Nivrutti Kale Sat, 20 Dec 2014 03:35:44 -0800

Hi Ted,

What is the transport you are using?
If you are using TCP, you need to adjust the system's tcp_retries2 parameter.
By default, *tcp_retries2=15*.


Add *net.ipv4.tcp_retries2=3* to /etc/sysctl.conf to persist the change
across reboots (3 works for me; you can try other values).
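
For a rough sense of why this value matters, here is a minimal Python sketch
that estimates how long TCP keeps retransmitting before it gives up on a
silent peer. It assumes the Linux bounds of 200 ms (minimum RTO) and 120 s
(maximum RTO) and a timeout that doubles on every retry. The function name
and its parameters are only illustrative. The real timeout depends on the
measured round-trip time, so treat these numbers as lower bounds.

```python
def retransmission_timeout(retries, rto_min=0.2, rto_max=120.0):
    """Approximate seconds before TCP gives up on an unacknowledged segment.

    Assumes the RTO starts at rto_min, doubles after every timeout, and is
    capped at rto_max (assumed Linux values: 200 ms and 120 s).
    """
    total = 0.0
    rto = rto_min
    # One wait for the initial send plus one per retransmission.
    for _ in range(retries + 1):
        total += min(rto, rto_max)
        rto *= 2
    return total


for retries in (15, 3):
    print(f"tcp_retries2={retries}: ~{retransmission_timeout(retries):.1f} s")
# tcp_retries2=15: ~924.6 s
# tcp_retries2=3: ~3.0 s
```

Under these assumptions, the default of 15 means a dead connection may only
be reported after roughly 15 minutes, while 3 brings that down to a few
seconds.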

Let me know if this helps.

Thanks,
Nivrutti

On Fri, Dec 19, 2014 at 10:57 AM, Nagendra Kumar <[email protected]>
wrote:

> Hi Ted,
>
> I was guessing as much. Please share snapshots of the syslog and saflog
> from the nodes.
>
>
>
> Thanks
>
> -Nagu
>
>
>
> *From:* Yao Cheng LIANG [mailto:[email protected]]
> *Sent:* 19 December 2014 10:47
> *To:* Nagendra Kumar; Nivrutti Kale
> *Cc:* piyush jaiswal; [email protected]; Yao Cheng LIANG
>
> *Subject:* RE: [users] multiple-node simultaneous failure handling
>
>
>
> Dear Nagu,
>
>
>
> Thanks. This is different from the OpenSAF “lock” operation. It is an
> operation similar to “reboot”.
>
>
>
> Ted
>
>
>
> *From:* Nagendra Kumar [mailto:[email protected]
> <[email protected]>]
> *Sent:* Friday, December 19, 2014 1:21 PM
> *To:* Yao Cheng LIANG; Nivrutti Kale
> *Cc:* piyush jaiswal; [email protected]
> *Subject:* RE: [users] multiple-node simultaneous failure handling
>
>
>
> Hi Ted,
>
> Can you please clarify how you locked the node, or what you mean by
> locking “Physical node 1”? In OpenSAF, you can lock nodes such as sc-1,
> sc-2, pl-3, etc., one at a time.
>
>
>
> Thanks
>
> -Nagu
>
>
>
> *From:* Yao Cheng LIANG [mailto:[email protected] <[email protected]>]
> *Sent:* 18 December 2014 19:54
> *To:* Nivrutti Kale; Nagendra Kumar
> *Cc:* piyush jaiswal; [email protected]; Yao Cheng LIANG
> *Subject:* Re: [users] multiple-node simultaneous failure handling
>
>
>
> Dear all,
>
>
>
> Today I did more tests in a virtualized environment by “locking” one of
> the “compute” nodes, where one “active” controller and one “active” payload
> reside. The “lock” operation “terminates” all the virtual machines
> running on that physical node. I expected the “active” role to
> switch to the VMs running on another compute node, with which I have
> configured a “1+1” protection relationship.
>
>
>
> What surprised me is that the “controller” VM switched over very quickly,
> but the payload VM did not switch over (the “standby” VM stayed in “standby”
> although the “active” VM had been terminated). I captured packets on the
> now-“active” controller and noticed that it has not received any packets
> from 211.7 (the former “active” payload, which has been terminated for a
> long time), but it keeps sending ARP packets asking “who has 211.7”.
>
>
>
> Please see attached file for packet I have captured.
>
>
>
> Note:          physical node 1           physical node 2
>
> before lock:   sc-1 (211.2) - active     sc-2 (211.3) - standby
>                pl-3 (211.7) - active     pl-4 (211.7) - standby
>
> after lock:    sc-1 terminated           sc-2 (211.3) became active
>                pl-3 terminated           pl-4 (211.7) kept “standby”
>
>
>
> The packets were captured on 211.3
>
>
>
> Thanks.
>
>
>
> Ted
>
>
>
> *From:* Nivrutti Kale <[email protected]>
> *Sent:* Wednesday, December 10, 2014 2:16 PM
> *To:* Nagendra Kumar <[email protected]>
> *Cc:* Yao Cheng LIANG <[email protected]>, piyush jaiswal
> <[email protected]>, [email protected]
>
>
>
> Hi Ted,
>
>
>
> I am using OpenSAF in a "virtualized environment" and have not seen any
> issues with it so far.
>
>
>
> Regarding multiple fail-over, we tested fail-over of a blade hosting 6
> VMs (1 active controller and 5 payloads). OpenSAF works like a charm here.
>
>
>
> First the controller fails over, then the new active controller receives
> the notifications for the other payload nodes, so multiple fail-over in the
> correct sequence works very well with OpenSAF. I am using OpenSAF 4.2.0 with
> TCP as the OpenSAF transport.
>
>
>
> Thanks,
>
> Nivrutti
>
>
>
> On Wed, Dec 10, 2014 at 11:23 AM, Nagendra Kumar <[email protected]>
> wrote:
>
> Hi Ted,
>
> >> In my case, all these VMs works as payload.
> Then you should have no problem.
> >> Have you tested how many these concurrent failures OpenSAF can support?
> >> I am using 4.4.0.
> OpenSAF can handle any number of concurrent failures.
>
> I haven't joined OP-NFV.
> If the "virtualized environment" is the only requirement, then OpenSAF can
> run without any problems. But I guess there may be more requirements than
> that.
> We are working on cloud requirements for OpenSAF, and a few tickets have
> been raised.
>
> Thanks
> -Nagu
>
> > -----Original Message-----
> > From: Yao Cheng LIANG [mailto:[email protected]]
>
> > Sent: 10 December 2014 08:54
> > To: Nagendra Kumar; piyush jaiswal; [email protected]
> > Cc: Yao Cheng LIANG
> > Subject: RE: [users] multiple-node simultaneous failure handling
> >
> > Dear Nagu,
> >
> > Thanks for the clarification. In my case, all these VMs work as payloads.
> > Have you tested how many of these concurrent failures OpenSAF can support?
> > I am using 4.4.0.
> >
> > By the way, I am working on an HA proposal in OP-NFV. Have you joined the
> > same work-force, and is there any issue applying OpenSAF to these
> > virtualized environments?
> >
> > Thanks.
> >
> > Ted
> >
> > -----Original Message-----
> > From: Nagendra Kumar [mailto:[email protected]]
> > Sent: Tuesday, December 09, 2014 8:53 PM
> > To: Yao Cheng LIANG; piyush jaiswal; [email protected]
> > Subject: RE: [users] multiple-node simultaneous failure handling
> >
> > Hi Yao,
> > If one controller remains available on a separate node, then the given
> > scenario will work fine.
> >
> > In detail:
> > 1. If Node 1 and Node 2 are controllers and Node 1 reboots, the scenario
> >    works fine.
> > 2. If Node 1 and Node 2 are payloads (of course, there is one controller
> >    in the cluster at Node X), then the scenario works fine.
> > 3. If Node 1 is a payload and Node 2 is a controller and Node 1 reboots,
> >    then the scenario works fine.
> > 4. If Node 1 is a controller and Node 2 is a payload and Node 1 reboots
> >    (and there is another controller in the cluster), then the scenario
> >    works fine.
> > 5. If Node 1 is a controller and Node 2 is a payload and Node 1 reboots
> >    (and there is no other controller in the cluster), then the scenario
> >    will not work, as an OpenSAF cluster requires one controller.
> >
> > Thanks
> > -Nagu
> >
> > > -----Original Message-----
> > > From: Yao Cheng LIANG [mailto:[email protected]]
> > > Sent: 09 December 2014 17:37
> > > To: piyush jaiswal; [email protected]
> > > Subject: [users] multiple-node simultaneous failure handling
> > >
> > > Dear all,
> > >
> > > I am now applying OpenSAF to a cloud environment. I have two physical
> > > nodes, and on each node there are a few virtual machines. Please see the
> > > table below:
> > >
> > > vm name   on physical node    1+1 protected by vm   on physical node
> > > ----------------------------------------------------------------------
> > > vm1       physical node 1     vm2                   physical node 2
> > > vm3       physical node 1     vm4                   physical node 2
> > > vm5       physical node 1     vm6                   physical node 2
> > > vm7       physical node 1     vm8                   physical node 2
> > >
> > > So app1 on vm1 is protected by the same app on vm2, app3 on vm3 is
> > > protected by the same app on vm4, and so on.
> > >
> > > My question is: when I reboot physical node 1, can OpenSAF handle the
> > > simultaneous failure of vm1/3/5/7 and fail over to vm2/4/6/8?
> > >
> > > Thanks.
> > >
> > > Ted
> > > ----------------------------------------------------------------------
> > > -------- Download BIRT iHub F-Type - The Free Enterprise-Grade BIRT
> > > Server from Actuate! Instantly Supercharge Your Business Reports and
> > > Dashboards with Interactivity, Sharing, Native Excel Exports, App
> > > Integration & more Get technology previously reserved for
> > > billion-dollar corporations, FREE
> > > http://pubads.g.doubleclick.net/gampad/clk?id=164703151&iu=/4140/ostg.
> > > clk
> > > trk
> > > _______________________________________________
> > > Opensaf-users mailing list
> > > [email protected]
> > > https://lists.sourceforge.net/lists/listinfo/opensaf-users
>
>
>
>
>
