Hi Shu Wang,

It seems you are using the TCP transport. Please provide the following data.


On 3/20/2015 1:25 AM, Shu Wang wrote:
> Then nodes lost contact for 10 seconds and rejoined

Can you please share your observations on why the faulty node did not 
reboot?
Have you customized any of the /etc/opensaf/dtmd.conf configuration?
Please share /var/log/messages from SC-1 and SC-2.

On 3/20/2015 1:25 AM, Shu Wang wrote:
> For example, the following message was seen from /var/log/messages:
> NO Lost contact with 'appbox'

In the default configuration the `node_name` should be SC-1, SC-2, PL-3, 
etc.
Have you customized imm.xml with your own `node_name` values?


-AVM

On 3/20/2015 1:25 AM, Shu Wang wrote:
> We have a scenario when nodes lost contact for 10 seconds and rejoined, some 
> service units ended up in Terminating state.
>
> For example, the following message was seen from /var/log/messages:
> NO Lost contact with 'appbox'
>
> We saw some service units on the same box disabled. Then we performed lock 
> and lock-in on the disabled service unit:
> amf-adm lock safSu=amfSU2.1,safSg=amfSG2,safApp=myApp
> amf-adm lock-in safSu=amfSU2.1,safSg=amfSG2,safApp=myApp
>
> Then we tried the following commands:
> amf-adm repaired safSu=amfSU2.1,safSg=amfSG2,safApp=myApp
> amf-adm unlock-in safSu=amfSU2.1,safSg=amfSG2,safApp=myApp
>
> For either repaired or unlock-in, we got the following error:
> error - command timed out (alarm)
>
> SU state stayed as:
> safSu=amfSU2.1,safSg=amfSG2,safApp=myApp
>           saAmfSUAdminState=LOCKED-INSTANTIATION(3)
>           saAmfSUOperState=ENABLED(1)
>           saAmfSUPresenceState=TERMINATING(4)
>           saAmfSUReadinessState=OUT-OF-SERVICE(1)
>
> Eventually we had to stop the node and restart the node to bring things back 
> to normal.
>
> Why did the disabled service unit get stuck in the TERMINATING state? What 
> makes a service unit get stuck in the TERMINATING state?
> If a node is lost for a little while, what effect does the lost contact 
> have on the cluster?
> How can we repair the damage caused by losing the node?
>
> Thanks!
>
> Shu Wang | Senior Analyst | +1(407)708-5117 or x3917| www.NetCracker.com
> Proven Partner to Communications Service Providers
> _______________________________________________
> Opensaf-users mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/opensaf-users

