Re: [users] opensaf 4.5.0 osafimmnd causes opensafd.service start failure on payload node

2016-10-10 Thread Jeremy Matthews
I have since opened ports 6700, 6800, and 6900 in the firewall of each node/VM, 
but there is still no apparent communication between SC-1 and SC-2.
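For reference, the firewall change was roughly as follows; this is just a sketch 
assuming firewalld and TCP ports, and the exact tool depends on the distribution:

firewall-cmd --permanent --add-port=6700/tcp --add-port=6800/tcp --add-port=6900/tcp
firewall-cmd --reload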


Re: [users] opensaf 4.5.0 osafimmnd causes opensafd.service start failure on payload node

2016-10-10 Thread Jeremy Matthews
Hello again,

Sorry for the delay. Yes, there does not appear to be any communication between at 
least SC-1 and SC-2. On SC-1, I ran a tcpdump filtered on the other controller's 
host address, and nothing appeared.

I did check whether the firewall was preventing this, and I have since opened 
ports 20 to 23 on each node and restarted opensafd.service on each node. I have 
also set “export MDS_TRANSPORT=TCP” in nid.conf on each node. However, I still 
have the same result: the OpenSAF processes started on SC-1 and SC-2 but failed on PL-3.
Should there be at least an OpenSAF heartbeat between SC-1 and SC-2?
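For reference, this is roughly how I check basic reachability between the nodes; 
the interface name and port number below are placeholders only (the real port is 
whatever osafdtmd actually listens on), and the hostname is SC-2's from nodes.cfg:

# /etc/opensaf/nid.conf on every node (as mentioned above):
export MDS_TRANSPORT=TCP

# from SC-1 toward SC-2; eth0 and 6800 are examples only:
ping -c 3 linux-vzbw.site
nc -zv linux-vzbw.site 6800
tcpdump -n -i eth0 host linux-vzbw.site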

Just to list the steps I used to set up this cluster, this is what I did (a sketch 
of the resulting per-node settings follows the list):

1.   On SC-1, installed as a controller:

a.   cd /usr/share/opensaf/immxml

b.   ./immxml-clustersize -s 2 -p 1

c.   I edited the third column in nodes.cfg to the actual hostnames of the 
nodes (VMs):
SC SC-1 linux-h8o1.site
SC SC-2 linux-vzbw.site
PL PL-3 linux-9qkx.site

d.   ./immxml-configure   // this created imm.xml.20161006_0900

e.   cp imm.xml.20161006_0900 /etc/opensaf/imm.xml

f.   In /etc/opensaf/dtmd.conf, changed DTM_NODE_IP to SC-1’s IP address.

g.   Added the nodes’ hostnames mapped to their IP addresses in /etc/hosts.

2.   On SC-2, installed as a controller:

a.   Transferred imm.xml from SC-1 to /etc/opensaf on SC-2.

b.   Changed DTM_NODE_IP to SC-2’s IP address.

c.   Changed slot_id to 2.

d.   Added the nodes’ hostnames mapped to their IP addresses in /etc/hosts.

3.   On PL-3, installed as a payload:

a.   Transferred imm.xml from SC-1 to /etc/opensaf on PL-3. I don’t think 
that I needed to do this and have since removed imm.xml from PL-3.

b.   Changed DTM_NODE_IP to PL-3’s IP address.

c.   Changed slot_id to 3.

d.   Added the nodes’ hostnames mapped to their IP addresses in /etc/hosts.

4.   Beginning with SC-1, then SC-2, and lastly PL-3, I entered “systemctl 
start opensafd.service”. Again, it started on the controllers but not on the 
payload.
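As a recap of the dtmd.conf, slot_id and /etc/hosts changes above, each node ends 
up with roughly the following; the IP addresses are placeholders only, since the 
real addresses are not listed here, and the slot_id location is assumed to be the 
usual /etc/opensaf file:

# /etc/hosts (same entries on every node; example IPs)
192.0.2.11   linux-h8o1.site   # SC-1
192.0.2.12   linux-vzbw.site   # SC-2
192.0.2.13   linux-9qkx.site   # PL-3

# /etc/opensaf/dtmd.conf on each node (example IPs again)
DTM_NODE_IP=192.0.2.11   # SC-1; 192.0.2.12 on SC-2, 192.0.2.13 on PL-3

# slot_id per node (assumed to be the /etc/opensaf/slot_id file)
# 1 on SC-1, 2 on SC-2, 3 on PL-3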

Have I missed anything in this setup?

Thanks,

Jeremy

From: Neelakanta Reddy [mailto:reddy.neelaka...@oracle.com]
Sent: Friday, October 7, 2016 8:07 AM
To: Jeremy Matthews ; 
opensaf-users@lists.sourceforge.net
Subject: Re: [users] opensaf 4.5.0 osafimmnd causes opensafd.service start 
failure on payload node


Hi,

There seems to be a TRANSPORT problem between SC-1 and SC-2/PL-3.
Neither SC-2 nor PL-3 joined SC-1.

Please check that the TRANSPORT (TCP/TIPC) is working correctly between the nodes.

Thanks,
Neel.
On 2016/10/07 11:36 AM, Jeremy Matthews wrote:

Attached. For SC-1 and PL-3, they include /var/log/messages and the 
/var/log/opensaf contents.

For SC-2, I accidentally wrote over /var/log/messages; it just has the 
/var/log/opensaf contents.

Thank you,

Jeremy

From: Neelakanta Reddy [mailto:reddy.neelaka...@oracle.com]
Sent: Thursday, October 6, 2016 9:51 PM
To: Jeremy Matthews ; opensaf-users@lists.sourceforge.net
Subject: Re: [users] opensaf 4.5.0 osafimmnd causes opensafd.service start failure on payload node

Hi,

Share the syslog of all the nodes (SC-1, SC-2, PL-3).

/Neel.

On 2016/10/06 09:04 PM, Jeremy Matthews wrote:

Hi,

I've seen this issue for a payload node in another post, where it was attributed 
to a configuration error and resolved by a reboot (?).
I have rebooted my payload node, just in case, but to no effect.

These are the logs in /var/log/messages when issuing the "systemctl start 
opensafd.service" command:

Oct 6 09:38:35 linux-9qkx opensafd: Starting OpenSAF Services
Oct 6 09:38:35 linux-9qkx osafdtmd[2987]: Started
Oct 6 09:38:35 linux-9qkx osafimmnd[2999]: Started
Oct 6 09:40:05 linux-9qkx systemd[1]: opensafd.service operation timed out. Terminating.
Oct 6 09:40:05 linux-9qkx osafimmnd[2999]: MDTM:socket_recv() = 0, conn lost with dh server, exiting library err :Success
Oct 6 09:40:05 linux-9qkx systemd[1]: Unit opensafd.service entered failed state.

I had enabled tracing in immnd.conf, which produced these entries in 
/var/log/opensaf/osafimmnd:

Oct 6 9:38:35.142143 osafimmnd [2999:immnd_main.c:0113] >> immnd_initialize
Oct 6 9:38:35.142188 osafimmnd [2999:osaf_secutil.c:0193] >> osaf_auth_server_create
Oct 6 9:38:35.142260 osafimmnd [2999:osaf_secutil.c:0215] << osaf_auth_server_create
Oct 6 9:38:35.142270 osafimmnd [2999:ncs_main_pub.c:0223] TR NCS:PROCESS_ID=2999
Oct 6 9:38:35.142273 osafimmnd [2999:sysf_def.c:0090] TR INITIALIZING LEAP ENVIRONMENT
Oct 6 9:38:35.142962 osafimmnd [2999:sysf_def.c:0123] TR DONE INITIALIZING LEAP 

[users] OpenSAF release 5.0.1 can not promote SC after enable "headless cluster" feature

2016-10-10 Thread Jianfeng Dong
Hi,

For several years we have used OpenSAF (4.5.2 now) to provide the HA service in our 
product (including 2 SCs and several payload cards), but our customer keeps 
requiring that a payload card should NOT be rebooted even if both SCs reload 
or hang.

We learned that the new release 5.0.0 provides this feature (i.e. the 
"headless cluster"), so we installed 5.0.0 into our product and enabled the 
"headless" feature by setting "IMMSV_SC_ABSENCE_ALLOWED" to 900 seconds, as 
sketched below. After installation it worked fine: the system with the new OpenSAF 
release starts successfully, all SC and payload cards come "UP", and the payload 
cards do NOT reboot immediately after we reload both SCs.
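For reference, the only change we made for this was in the IMM node director 
configuration; the file path and export form below are what we assume to be the 
defaults:

# /etc/opensaf/immnd.conf
export IMMSV_SC_ABSENCE_ALLOWED=900   # seconds the cluster may keep running without any SC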

However, we hit a problem: after a reboot, neither of the two SCs can be promoted 
to controller until the "headless" payload reboots when the 'IMMSV_SC_ABSENCE_ALLOWED' 
timeout expires after 900 seconds. It seems the OpenSAF modules on both SCs just 
wait and do nothing; once the payload reboots due to the timeout, OpenSAF on the 
SCs continues to run, and the whole system finally recovers.

We thought ticket #1828 might have resolved this issue, so we tried again with 
release 5.0.1 but got the same result.

Could you please tell us why, in our case, OpenSAF on both SCs could not run 
until the payload card (in "headless" status) rebooted due to the timeout?
Besides 'IMMSV_SC_ABSENCE_ALLOWED', is there any other variable or parameter that 
needs to be set or modified to enable the 'headless cluster' feature? Are we 
missing anything?
The attachments are the syslogs of an SC and a payload card from when this problem 
happened; we hope the log files can help find the root cause.

Any comments are much appreciated, thanks!

Regards,
Jianfeng Dong



SC.log
Description: SC.log


payload.log
Description: payload.log
Opensaf-users mailing list
Opensaf-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-users