Hi Ted,

What transport are you using? If you are using TCP, you need to adjust the tcp_retries2 kernel parameter on the system. By default, *tcp_retries2=15*.
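For a rough idea of why this value matters, here is a small sketch of the "hypothetical timeout" described in the Linux kernel's ip-sysctl documentation. It assumes the usual Linux constants (minimum RTO of 200 ms, maximum RTO of 120 s) and plain exponential backoff; the function name is only illustrative, and the real timeout on a live connection depends on the measured round-trip time, so treat the numbers as a lower bound rather than an exact figure.

```python
def hypothetical_timeout(retries, rto_min=0.2, rto_max=120.0):
    """Approximate lower bound, in seconds, before TCP gives up on a
    connection whose retransmissions are never acknowledged.

    Assumption: backoff starts at rto_min, doubles on every retransmission,
    and is capped at rto_max. The connection is dropped at the (retries+1)th
    RTO, so retries + 1 intervals are summed.
    """
    total = 0.0
    rto = rto_min
    for _ in range(retries + 1):
        total += rto
        rto = min(rto * 2, rto_max)
    return total


if __name__ == "__main__":
    for n in (15, 3):
        print(f"tcp_retries2={n}: about {hypothetical_timeout(n):.1f} s")
```

Under these assumptions the default of 15 comes out at roughly 924.6 seconds, while 3 comes out at roughly 3 seconds, which is why a dead peer is noticed so much sooner with the lower value. To try a value before persisting it, `sysctl -w net.ipv4.tcp_retries2=3` (as root) changes it at runtime.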
Add *net.ipv4.tcp_retries2=3* to /etc/sysctl.conf to persist the change across reboots (3 works for me; you can try other values).

Let me know if this helps.

Thanks,
Nivrutti

On Fri, Dec 19, 2014 at 10:57 AM, Nagendra Kumar <[email protected]> wrote:

> Hi Ted,
>
> I was kind of guessing that. Please share the snaps of the syslog and saflog of the nodes.
>
> Thanks
> -Nagu
>
> *From:* Yao Cheng LIANG [mailto:[email protected]]
> *Sent:* 19 December 2014 10:47
> *To:* Nagendra Kumar; Nivrutti Kale
> *Cc:* piyush jaiswal; [email protected]; Yao Cheng LIANG
> *Subject:* RE: [users] multiple-node simultaneous failure handling
>
> Dear Nagu,
>
> Thanks. This is different from the OpenSAF “lock” operation. It is an operation similar to “reboot”.
>
> Ted
>
> *From:* Nagendra Kumar [mailto:[email protected] <[email protected]>]
> *Sent:* Friday, December 19, 2014 1:21 PM
> *To:* Yao Cheng LIANG; Nivrutti Kale
> *Cc:* piyush jaiswal; [email protected]
> *Subject:* RE: [users] multiple-node simultaneous failure handling
>
> Hi Ted,
>
> Can you please clarify how you locked, or what you mean by locking, “Physical node 1”? In OpenSAF, you can lock a node like sc-1, sc-2, pl-3, etc. one at a time.
>
> Thanks
> -Nagu
>
> *From:* Yao Cheng LIANG [mailto:[email protected] <[email protected]>]
> *Sent:* 18 December 2014 19:54
> *To:* Nivrutti Kale; Nagendra Kumar
> *Cc:* piyush jaiswal; [email protected]; Yao Cheng LIANG
> *Subject:* Re: [users] multiple-node simultaneous failure handling
>
> Dear all,
>
> Today I did more tests in a virtualized environment, by “locking” one of the “compute” nodes where one “active” controller and one “active” payload reside. The “lock” operation “terminates” all the virtual machines running on that physical node. I had expected that the “active” role would switch to another VM running on another compute node, with which I have configured a “1+1” protection relationship.
>
> But what surprised me is that the “controller” VM switched very quickly, but the payload VM did not switch (the “standby” VM stayed in “standby” although the “active” VM had been terminated). I captured the packets on the now “active” controller, and noticed that it has not received packets from 211.7 (the former “active” payload, which has been terminated for a long time), but keeps sending ARP packets asking “who has 211.7”.
>
> Please see the attached file for the packets I have captured.
>
> Note:
>
>               physical node 1          physical node 2
> before lock:  sc-1 (211.2) - active    sc-2 (211.3) - standby
>               pl-3 (211.7) - active    pl-4 (211.7) - standby
>
> after lock:   sc-1 terminated          sc-2 (211.3) became active
>               pl-3 terminated          pl-4 (211.7) kept “standby”
>
> The packets were captured on 211.3
>
> Thanks.
>
> Ted
>
> Sent from Windows Mail
>
> *From:* Nivrutti Kale <[email protected]>
> *Sent:* Wednesday, December 10, 2014 2:16 PM
> *To:* Nagendra Kumar <[email protected]>
> *Cc:* Yao Cheng LIANG <[email protected]>, piyush jaiswal <[email protected]>, [email protected]
>
> Hi Ted,
>
> I am using OpenSAF in a "virtualized environment" and I have not seen any issues with OpenSAF so far.
>
> Regarding multiple fail-overs, we tested fail-over of a blade on which 6 VMs (1 active controller and 5 payloads) were placed. OpenSAF works like a charm here.
>
> First the controller is failed over, then the notifications for the other payload nodes are received by the new active controller, so multiple fail-overs in the correct sequence work very well with OpenSAF. I am using OpenSAF 4.2.0 with TCP as the OpenSAF transport.
>
> Thanks,
> Nivrutti
>
> On Wed, Dec 10, 2014 at 11:23 AM, Nagendra Kumar <[email protected]> wrote:
>
> Hi Ted,
>
> >> In my case, all these VMs works as payload.
> Then you should have no problem.
> >> Have you tested how many these concurrent failures OpenSAF can support? I am using 4.4.0.
> OpenSAF can handle any number of concurrent failures.
>
> I haven't joined OP-NFV.
> If the "virtualized environment" is the only requirement, then OpenSAF can run without any problems. But I guess there may be more requirements than that.
> We are working on Cloud requirements for OpenSAF and a few tickets have been raised.
>
> Thanks
> -Nagu
>
> > -----Original Message-----
> > From: Yao Cheng LIANG [mailto:[email protected]]
> > Sent: 10 December 2014 08:54
> > To: Nagendra Kumar; piyush jaiswal; [email protected]
> > Cc: Yao Cheng LIANG
> > Subject: RE: [users] multiple-node simultaneous failure handling
> >
> > Dear Nagu,
> >
> > Thanks for clarification. In my case, all these VMs works as payload. Have you tested how many these concurrent failures OpenSAF can support? I am using 4.4.0.
> >
> > By the way, I am working in OP-NFV on an HA proposal. Have you joined the same work-force, and is there any issue applying OpenSAF to these virtualized environments?
> >
> > Thanks.
> >
> > Ted
> >
> > -----Original Message-----
> > From: Nagendra Kumar [mailto:[email protected]]
> > Sent: Tuesday, December 09, 2014 8:53 PM
> > To: Yao Cheng LIANG; piyush jaiswal; [email protected]
> > Subject: RE: [users] multiple-node simultaneous failure handling
> >
> > Hi Yao,
> > If one controller remains available at a separate node then the given scenario will work fine.
> >
> > In detail:
> > 1. If Node 1 and Node 2 are controllers and Node 1 reboots, the scenario works fine.
> > 2. If Node 1 and Node 2 are payloads (of course, there is one controller in the cluster at Node X), then the scenario works fine.
> > 3. If Node 1 is payload and Node 2 is controller and Node 1 reboots, then the scenario works fine.
> > 4. If Node 1 is controller and Node 2 is payload and Node 1 reboots (and there is another controller in the cluster), then the scenario works fine.
> > 5. If Node 1 is controller and Node 2 is payload and Node 1 reboots (and there is no other controller in the cluster), then the scenario will not work, as an OpenSAF cluster requires one controller.
> >
> > Thanks
> > -Nagu
> >
> > > -----Original Message-----
> > > From: Yao Cheng LIANG [mailto:[email protected]]
> > > Sent: 09 December 2014 17:37
> > > To: piyush jaiswal; [email protected]
> > > Subject: [users] multiple-node simultaneous failure handling
> > >
> > > Dear all,
> > >
> > > I am now applying OpenSAF to a cloud environment. I have two physical nodes, and on each node there are a few virtual machines. Please see the diagram below:
> > >
> > > vm name   on physical node   1+1 protected by vm   on physical node
> > > -----------------------------------------------------------------
> > > vm1       physical node 1    vm2                   physical node 2
> > > vm3       physical node 1    vm4                   physical node 2
> > > vm5       physical node 1    vm6                   physical node 2
> > > vm7       physical node 1    vm8                   physical node 2
> > >
> > > so app1 on vm1 is protected by the same app on vm2, app3 on vm3 is protected by the same app on vm4, ..
> > >
> > > My question is: when I reboot physical node 1, can OpenSAF handle the simultaneous failure of vm1/3/5/7 and fail over to vm2/4/6/8?
> > >
> > > Thanks.
> > >
> > > Ted
> > > _______________________________________________
> > > Opensaf-users mailing list
> > > [email protected]
> > > https://lists.sourceforge.net/lists/listinfo/opensaf-users
