Hi,

There is one way to improve the detection time: change the
"net.ipv4.tcp_retries2" value to 3.
The default value of "net.ipv4.tcp_retries2" is 15.
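
To give a rough feel for what this change means, here is a small sketch of
the retransmission timeout for a few tcp_retries2 values. It is only a model
under stated assumptions that do not come from this thread: the retransmission
timeout (RTO) starts at the Linux minimum of 200 ms, doubles after each
unacknowledged retransmission, and is capped at 120 s. The real RTO depends on
the measured round-trip time, so treat the numbers as a lower bound. The
function name is made up for illustration.

```python
# Rough model of how long Linux keeps retransmitting before it gives up.
# Assumptions (not from the thread): RTO starts at 200 ms, doubles on every
# unacknowledged retransmission, and is capped at 120 s.

RTO_MIN_S = 0.2
RTO_MAX_S = 120.0


def retransmission_timeout(retries2):
    """Approximate seconds until an unacknowledged segment is abandoned."""
    total = 0.0
    rto = RTO_MIN_S
    # The initial transmission plus `retries2` retransmissions each wait one RTO.
    for _ in range(retries2 + 1):
        total += min(rto, RTO_MAX_S)
        rto *= 2
    return total


if __name__ == "__main__":
    for retries2 in (3, 8, 15):
        seconds = retransmission_timeout(retries2)
        print(f"tcp_retries2={retries2:2d} -> about {seconds:.1f} s")
```

Under these assumptions, the value 3 gives up after about 3 seconds, while the
default of 15 keeps retrying for roughly 15 minutes. Note that the setting is
system-wide, so it affects every TCP connection on the node.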

Thanks,
Nivrutti

-----Original Message-----
From: Mathivanan Naickan Palanivelu [mailto:[email protected]] 
Sent: Thursday, September 15, 2016 6:38 PM
To: Shu Wang <[email protected]>; [email protected]
Subject: Re: [users] how long it takes to detect node sudden power

Hi,

You could try the fix in this ticket 
https://urldefense.proofpoint.com/v2/url?u=https-3A__sourceforge.net_p_opensaf_tickets_2014_&d=DQICAg&c=IL_XqQWOjubgfqINi2jTzg&r=8oj2Tn7_JuMy90N67rXExkWsx29-JTWbXUkT3IIi99w&m=DetywC0rOBBSwA5PRfrcpfRXAyGliPduaCiI-fnO-gw&s=gSGrK2pteB9mnPgovHNo3qsOXF0w9s77wt4nUXOHt4o&e=
and see whether your scenario is the same. The patch is in
https://urldefense.proofpoint.com/v2/url?u=https-3A__sourceforge.net_p_opensaf_staging_ci_b30d5e33e50c7eea8cc1730cbe0a0dde572621f0_&d=DQICAg&c=IL_XqQWOjubgfqINi2jTzg&r=8oj2Tn7_JuMy90N67rXExkWsx29-JTWbXUkT3IIi99w&m=DetywC0rOBBSwA5PRfrcpfRXAyGliPduaCiI-fnO-gw&s=UTa3tlpHkkLFWQGUlegcxS3Y6JFlHiW2Yfx1bCbKcTM&e=
 

Thanks,
Mathi.


> -----Original Message-----
> From: Shu Wang [mailto:[email protected]]
> Sent: Saturday, June 20, 2015 1:50 AM
> To: [email protected]
> Subject: Re: [users] how long it takes to detect node sudden power
> 
> We have a similar scenario. One of our payload nodes rebooted, and it took
> anywhere from a few seconds to a few minutes for the other nodes to detect
> the node loss. Because the master controller took a few minutes to detect
> and react to the loss, this caused serious problems and many service units
> went bad. Is there any way to improve the detection time?
> 
> Thank you!
> 
> Shu Wang | Senior Analyst | +1(407)708-5117 or x3917| 
> www.NetCracker.com Proven Partner to Communications Service Providers
> 
> -----Original Message-----
> Message: 3
> Date: Tue, 14 Apr 2015 09:58:51 +0000
> From: Yao Cheng LIANG <[email protected]>
> Subject: Re: [users] how long it takes to detect node sudden power
>         loss
> To: 'A V Mahesh' <[email protected]>, Mathivanan Naickan
>         Palanivelu      <[email protected]>
> Cc: "[email protected]"
>         <[email protected]>
> Message-ID: <285F6C4AD3FBC04EBAE1D68203EA87F20B037F25@asdag1>
> Content-Type: text/plain; charset="windows-1255"
> 
> Let me give more info about my setup:
> 
> 
> 1. I have two nodes, both running as controllers.
>
> 2. Besides the OpenSAF services, I have another service unit with three
> components in it.
>
> 3. These components use the Checkpoint service for data synchronization.
> 
> 
> 
> My dtmd.conf is as below:
> 
> ?
> 
> DTM_INI_DIS_TIMEOUT_SECS=5
> DTM_TCP_KEEPIDLE_TIME=2
> DTM_TCP_KEEPALIVE_INTVL=1
> DTM_TCP_KEEPALIVE_PROBES=2
> 
> I read the code and found that it uses TCP keepalive to detect failure
> of the peer node. However, keepalive probes are not sent until the link
> has been idle for some time, and I think that is the issue. Suppose the
> "standby" node is sending something to the "active" node and the "active"
> node reboots at that moment. The "standby" node will keep retransmitting
> until it reaches the maximum number of retries. During this period the
> link is not idle, so the keepalive mechanism does not start. This may
> cause the "standby" node to take a long time to detect the failure of the
> "active" node.
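
For the idle-link case described above, the detection time can be worked out
from the keepalive settings. The sketch below assumes standard TCP keepalive
behaviour and assumes that the DTM_TCP_* settings map directly onto the idle
time, probe interval and probe count; the function names are made up for
illustration and are not part of OpenSAF.

```python
# Worst-case detection time on an IDLE link under standard TCP keepalive
# semantics. Assumption: the DTM_TCP_* settings correspond to the keepalive
# idle time, probe interval and probe count, all in seconds or counts.

DTMD_CONF = """
DTM_INI_DIS_TIMEOUT_SECS=5
DTM_TCP_KEEPIDLE_TIME=2
DTM_TCP_KEEPALIVE_INTVL=1
DTM_TCP_KEEPALIVE_PROBES=2
"""


def parse_conf(text):
    """Collect KEY=integer lines, skipping blanks, comments and other values."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        if value.strip().isdigit():
            settings[key.strip()] = int(value)
    return settings


def keepalive_detection_seconds(settings):
    """Idle time before the first probe, plus one interval per unanswered probe."""
    return (settings["DTM_TCP_KEEPIDLE_TIME"]
            + settings["DTM_TCP_KEEPALIVE_INTVL"]
            * settings["DTM_TCP_KEEPALIVE_PROBES"])


if __name__ == "__main__":
    conf = parse_conf(DTMD_CONF)
    print("idle-link detection time:", keepalive_detection_seconds(conf), "s")
```

With the values above this gives 2 + 1*2 = 4 seconds, but only when nothing
is in flight. If unacknowledged data is pending, the retransmission limit
governs detection instead.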
> 
> Thanks.
> 
> 
> 
> Ted
> 
> 
> 
> 
> 
> From: A V Mahesh [mailto:[email protected]]
> Sent: Monday, April 13, 2015 10:06 PM
> To: Yao Cheng LIANG; Mathivanan Naickan Palanivelu
> Cc: [email protected]
> Subject: Re: [users] how long it takes to detect node sudden power 
> loss
> 
> Hi,
> 
> Un-comment the below line to enable trace of osafdtm in 
> /etc/opensaf/dtmd.conf
> 
> #args="--tracemask=0xffffffff"   ------>  args="--tracemask=0xffffffff"
> 
> Then run `export MDS_LOG_LEVEL=5` on both node consoles before
> `/etc/init.d/opensafd restart` to get debug MDS logs.
> 
> 
> -AVM
> 
> On 4/13/2015 11:52 AM, Yao Cheng LIANG wrote:
> Dear AVM,
> 
> Thanks. But I need to add 'args="--loglevel=info"' to dtmd.conf so
> that /var/log/opensaf/osafdtm and /var/log/opensaf/mds.log can be seen, right?
> 
> Ted
> 
> From: A V Mahesh [mailto:[email protected]]
> Sent: Monday, April 13, 2015 1:03 PM
> To: Yao Cheng LIANG; Mathivanan Naickan Palanivelu
> Cc: [email protected]
> Subject: Re: [users] how long it takes to detect node sudden power 
> loss
> 
> Hi Ted,
> 
> On 4/10/2015 3:54 PM, Yao Cheng LIANG wrote:
> I rebooted the "standby" node 30 times and found that twice it took
> 1~2 minutes for the "active" node to detect it
>
> Can you please share the following data from both nodes for the cases
> where the "active" node took 1~2 minutes to detect the standby?
> 
> 1) #/var/log/opensaf/osafdtm
> 2) #/var/log/opensaf/mds.log
> 3) #/var/log/messages ( syslog )
> 
> 4) #top    (output at the time of detection)
> 5) /etc/opensaf/dtmd.conf
> 
> -AVM
> 
> On 4/10/2015 3:54 PM, Yao Cheng LIANG wrote:
> I did some tests recently. I have two controllers, and I rebooted one
> to see how long the second takes to detect the failure of its peer. I
> rebooted the "standby" node 30 times and found that twice it took 1~2
> minutes for the "active" node to detect it. Could anyone tell me
> the reason and the solution?
> 
> Thanks.
> 
> Ted
> 
> 
> From: Mathivanan Naickan Palanivelu<mailto:[email protected]>
> Sent: Thursday, April 9, 2015 7:39 PM
> To: Yao Cheng LIANG<mailto:[email protected]>
> Cc: [email protected], 'A V Mahesh' <[email protected]>
> 
> I think that, since these are TCP keepalive configuration values, the
> connection loss would be detected immediately in cases of abrupt
> power shutdown or cable unplug.
> 
> Thanks,
> Mathi.
> 
> ----- [email protected]<mailto:[email protected]> wrote:
> 
> > Is there any approach to hasten this detection, because 4 seconds is 
> > too long for some use cases?
> >
> > Br,
> >
> > Ted
> >
> > -----Original Message-----
> > From: A V Mahesh [mailto:[email protected]]
> > Sent: Monday, March 30, 2015 12:29 PM
> > To: [email protected]
> > Subject: Re: [users] how long it takes to detect node sudden power 
> > loss
> >
> > Hi,
> >
> >  >>Does that mean it needs 2 + 2*1 = 4s before the peer can detect
> > the node connection loss if I suddenly unplug power supply of one node?
> > Yes. When the connection goes down (cable disconnected or power supply
> > unplugged), the loss of the connection is detected in 4 seconds.
> >
> >   -AVM
> >
> > On 3/29/2015 7:11 PM, Yao Cheng LIANG wrote:
> > > Dear all,
> > >
> > > When using TCP, the underlying dtms uses TCP keepalive to detect
> > > connection loss. If my dtmd.conf is as below:
> > >
> > > DTM_TCP_KEEPIDLE_TIME=2
> > >
> > > DTM_TCP_KEEPALIVE_INTVL=1
> > >
> > > DTM_TCP_KEEPALIVE_PROBES=2
> > >
> > > Does that mean it takes 2 + 2*1 = 4s before the peer can detect the
> > > node connection loss if I suddenly unplug the power supply of one node?
> > >
> > > Thanks.
> > >
> > > Ted
> > >
> > >
> 
------------------------------------------------------------------------------
_______________________________________________
Opensaf-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/opensaf-users
