Hi Kang-sen,

You had either a network or a disk problem.
The same problem occurred several times before NID killed IMMND.
First start:
Jan  4 12:28:48 BHA-IND-MUM-MALAD-CAE-8 opensafd: Starting OpenSAF Services
....
Jan  4 12:28:54 BHA-IND-MUM-MALAD-CAE-8 osafimmnd[53251]: Started
....
Jan  4 12:33:24 BHA-IND-MUM-MALAD-CAE-8 opensafd: Stopping OpenSAF Services     
<----- stopping OpenSAF was not initiated by OpenSAF ??????

Second try:
Jan  4 12:34:06 BHA-IND-MUM-MALAD-CAE-8 opensafd: Starting OpenSAF Services
....
Jan  4 12:34:12 BHA-IND-MUM-MALAD-CAE-8 osafimmnd[57640]: Started
....
Jan  4 12:36:02 BHA-IND-MUM-MALAD-CAE-8 opensafd: Stopping OpenSAF Services


... and so on....
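
To see how long each of these attempts lasted, the Starting/Stopping pairs can be measured with a small script. This is only a sketch: the sample lines are copied from the two attempts above, while the helper names (parse_time, cycles) and the year 2017 are my assumptions, since classic syslog timestamps carry no year.

```python
from datetime import datetime

# Sample lines copied from the two attempts quoted above.
LOG = """\
Jan  4 12:28:48 BHA-IND-MUM-MALAD-CAE-8 opensafd: Starting OpenSAF Services
Jan  4 12:33:24 BHA-IND-MUM-MALAD-CAE-8 opensafd: Stopping OpenSAF Services
Jan  4 12:34:06 BHA-IND-MUM-MALAD-CAE-8 opensafd: Starting OpenSAF Services
Jan  4 12:36:02 BHA-IND-MUM-MALAD-CAE-8 opensafd: Stopping OpenSAF Services
"""

def parse_time(line, year=2017):
    # Hypothetical helper: the year is assumed because syslog omits it.
    stamp = " ".join(line.split()[:3])
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")

def cycles(lines):
    # Pair each "Starting" entry with the next "Stopping" entry.
    start = None
    result = []
    for line in lines:
        if "Starting OpenSAF Services" in line:
            start = parse_time(line)
        elif "Stopping OpenSAF Services" in line and start is not None:
            result.append((start, parse_time(line) - start))
            start = None
    return result

for began, lasted in cycles(LOG.splitlines()):
    print(began.time(), int(lasted.total_seconds()), "s")
# 12:28:48 276 s
# 12:34:06 116 s
```

On a real system the same function could be fed the lines of the syslog file instead of the embedded sample.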

In the last attempt, NID killed IMMND because it had not started within 
8 minutes, which is the default IMMND timeout in NID (check 
/etc/opensaf/nodeinit.conf.payload).

Jan  4 12:44:20 BHA-IND-MUM-MALAD-CAE-8 osafimmnd[65304]: Started
....
Jan  4 12:52:20 BHA-IND-MUM-MALAD-CAE-8 opensafd[65278]: ER Timed-out for response from IMMND          <------- timeout after 8 minutes
Jan  4 12:52:20 BHA-IND-MUM-MALAD-CAE-8 opensafd[65278]: ER 
Jan  4 12:52:20 BHA-IND-MUM-MALAD-CAE-8 opensafd[65278]: ER Going for recovery
Jan  4 12:52:20 BHA-IND-MUM-MALAD-CAE-8 opensafd[65278]: ER Trying To RESPAWN /usr/lib/opensaf/clc-cli/osaf-immnd attempt #1
Jan  4 12:52:20 BHA-IND-MUM-MALAD-CAE-8 opensafd[65278]: ER Sending SIGKILL to IMMND, pid=65297
Jan  4 12:52:20 BHA-IND-MUM-MALAD-CAE-8 osafimmnd[65304]: exiting for shutdown
Jan  4 12:52:35 BHA-IND-MUM-MALAD-CAE-8 osafimmnd[1977]: Started

There are no syslog entries after osafimmnd started, so I assume that you had a 
network issue that left IMMND waiting to finish its initialization.
A disk problem is another possibility, but it's unlikely since syslog entries 
were still being written to the disk.

Thanks,
Zoran

-----Original Message-----
From: Kang-Sen Lu [mailto:[email protected]] 
Sent: den 5 januari 2017 14:23
To: Zoran Milinkovic <[email protected]>; 
[email protected]
Subject: RE: [users] question about payload blade osafimmnd startup problem

Hi, Zoran:

Thanks for your reply. I am sending you the syslog from "Starting Opensaf 
Service", up to the time we gave up.

It contains the logs from all OpenSAF components.

However, we didn't turn on tracing for immnd, so there is no trace log to 
provide. Unfortunately, this problem is not reproducible. After some time, the 
problem goes away.

Kang-sen

-----Original Message-----
From: Zoran Milinkovic [mailto:[email protected]] 
Sent: Thursday, January 05, 2017 3:50 AM
To: Kang-Sen Lu <[email protected]>; [email protected]
Subject: RE: [users] question about payload blade osafimmnd startup problem

Hi Kang-sen,

The error indicates that IMMND did not start within the allowed time, so NID 
killed it.

Please share the logs from before the error so we can see what exactly happened.
If it's an IMMND problem, traces will help more in analyzing it.

Thanks,
Zoran

-----Original Message-----
From: Kang-Sen Lu [mailto:[email protected]]
Sent: den 4 januari 2017 18:17
To: [email protected]
Subject: [users] question about payload blade osafimmnd startup problem

We are running OpenSAF 4.4.0 on an HP chassis. One payload blade (slot 8) has 
an OpenSAF startup problem.

Here is the relevant part of the syslog:

Jan  4 12:44:20 BHA-IND-MUM-MALAD-CAE-8 kernel: [660527.561238] tipc: Activated (version 2.0.1.2)
Jan  4 12:44:20 BHA-IND-MUM-MALAD-CAE-8 kernel: [660527.561327] NET: Registered protocol family 30
Jan  4 12:44:20 BHA-IND-MUM-MALAD-CAE-8 kernel: [660527.561444] tipc: Started in single node mode
Jan  4 12:44:20 BHA-IND-MUM-MALAD-CAE-8 kernel: [660527.563034] tipc: Started in network mode
Jan  4 12:44:20 BHA-IND-MUM-MALAD-CAE-8 kernel: [660527.563037] tipc: Own node address <1.1.129>, network identity 1234
Jan  4 12:44:20 BHA-IND-MUM-MALAD-CAE-8 kernel: [660527.565229] tipc: Enabled bearer <eth:bond0>, discovery domain <1.1.0>, priority 10
Jan  4 12:44:20 BHA-IND-MUM-MALAD-CAE-8 osafimmnd[65304]: Started
Jan  4 12:44:20 BHA-IND-MUM-MALAD-CAE-8 kernel: [660528.240310] IPMI Watchdog: response: Error d5 on cmd 22

Jan  4 12:52:20 BHA-IND-MUM-MALAD-CAE-8 opensafd[65278]: ER Timed-out for response from IMMND
Jan  4 12:52:20 BHA-IND-MUM-MALAD-CAE-8 opensafd[65278]: ER
Jan  4 12:52:20 BHA-IND-MUM-MALAD-CAE-8 opensafd[65278]: ER Going for recovery
Jan  4 12:52:20 BHA-IND-MUM-MALAD-CAE-8 opensafd[65278]: ER Trying To RESPAWN /usr/lib/opensaf/clc-cli/osaf-immnd attempt #1
Jan  4 12:52:20 BHA-IND-MUM-MALAD-CAE-8 opensafd[65278]: ER Sending SIGKILL to IMMND, pid=65297
Jan  4 12:52:20 BHA-IND-MUM-MALAD-CAE-8 osafimmnd[65304]: exiting for shutdown

Can anybody suggest how to find out what the problem is? Other payload blades 
did not have the same problem.

Thanks.

Kang-sen

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most engaging tech 
sites, SlashDot.org! http://sdm.link/slashdot 
_______________________________________________
Opensaf-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/opensaf-users
