[tickets] [opensaf:tickets] #2158 AMF: IMMND dies at Opensaf start up phase causes AMFD heartbeat timeout

2016-11-01 Thread Minh Hon Chau



---

** [tickets:#2158] AMF: IMMND dies at Opensaf start up phase causes AMFD heartbeat timeout**

**Status:** unassigned
**Milestone:** 5.0.2
**Created:** Wed Nov 02, 2016 05:20 AM UTC by Minh Hon Chau
**Last Updated:** Wed Nov 02, 2016 05:20 AM UTC
**Owner:** nobody
**Attachments:**

- [osafamfnd_sc2](https://sourceforge.net/p/opensaf/tickets/2158/attachment/osafamfnd_sc2) (264.2 kB; application/octet-stream)


If IMMND dies during the OpenSAF startup phase, it is not restarted by AMF. The issue has been observed in the following situation:
- Restart the cluster.
- While the active controller is starting up, a critical component dies, which causes a node failfast:
Oct 25 12:51:21 SC-1 osafamfnd[7642]: ER safComp=ABC,safSu=1,safSg=2N,safApp=ABC Faulted due to:csiSetcallbackTimeout Recovery is:nodeFailfast
Oct 25 12:51:21 SC-1 osafamfnd[7642]: Rebooting OpenSAF NodeId = 131343 EE Name = , Reason: Component faulted: recovery is node failfast, OwnNodeId = 131343, SupervisionTime = 60
- In the meantime, the standby controller is requested to become active:
Oct 25 12:51:27 SC-2 tipclog[16221]: Lost link <1.1.2:eth0-1.1.1:eth0> on network plane A
Oct 25 12:51:27 SC-2 osafclmna[4336]: NO Starting to promote this node to a system controller
Oct 25 12:51:27 SC-2 osafrded[4387]: NO Requesting ACTIVE role
- IMMND also dies a bit later:
Oct 25 12:51:29 SC-2 osafimmnd[4536]: ER MESSAGE:44816 OUT OF ORDER my highest processed:44814 - exiting
Oct 25 12:51:29 SC-2 osafamfnd[7414]: NO saClmDispatch BAD_HANDLE
- Other services fail to initialize the services they depend on, since IMMND is dead:
Oct 25 12:51:39 SC-2 osafamfd[7400]: WA saClmInitialize_4 returned 5
Oct 25 12:51:39 SC-2 osafamfd[7400]: WA saNtfInitialize returned 5
Oct 25 12:51:39 SC-2 osafntfimcnd[7501]: WA ntfimcn_ntf_init saNtfInitialize( returned SA_AIS_ERR_TIMEOUT (5)
Oct 25 12:51:39 SC-2 osafclmd[7386]: WA saImmOiImplementerSet returned 9
Oct 25 12:51:39 SC-2 osafntfd[7372]: WA saLogInitialize returns try again, retries...
Oct 25 12:51:39 SC-2 osaflogd[7358]: WA saImmOiImplementerSet returned SA_AIS_ERR_BAD_HANDLE (9)
Oct 25 12:51:39 SC-2 osafamfnd[7414]: WA saClmInitialize_4 returned 5

Oct 25 12:51:49 SC-2 osafamfd[7400]: WA saClmInitialize_4 returned 5
Oct 25 12:51:50 SC-2 osafamfd[7400]: WA saNtfInitialize returned 5
Oct 25 12:51:50 SC-2 osafamfnd[7414]: WA saClmInitialize_4 returned 5

Oct 25 12:52:00 SC-2 osafamfd[7400]: WA saClmInitialize_4 returned 5
Oct 25 12:52:00 SC-2 osafamfd[7400]: WA saNtfInitialize returned 5
Oct 25 12:52:00 SC-2 osafamfnd[7414]: WA saClmInitialize_4 returned 5

Oct 25 12:52:20 SC-2 osafamfnd[7414]: WA saClmInitialize_4 returned 5
Oct 25 12:52:20 SC-2 osafamfd[7400]: WA saNtfInitialize returned 5
Oct 25 12:52:20 SC-2 osafimmd[4489]: NO Extended intro from node 2210f

- Finally, the AMFD heartbeat times out:
Oct 25 12:53:57 SC-2 osafntfimcnd[7501]: WA ntfimcn_ntf_init saNtfInitialize( returned SA_AIS_ERR_TIMEOUT (5)
Oct 25 12:54:01 SC-2 osafamfnd[7414]: WA saClmInitialize_4 returned 5
Oct 25 12:54:01 SC-2 osafamfd[7400]: WA saNtfInitialize returned 5
Oct 25 12:54:01 SC-2 osafamfd[7400]: WA saClmInitialize_4 returned 5
Oct 25 12:54:07 SC-2 osafntfimcnd[7501]: WA ntfimcn_ntf_init saNtfInitialize( returned SA_AIS_ERR_TIMEOUT (5)
Oct 25 12:54:11 SC-2 osafamfnd[7414]: WA saClmInitialize_4 returned 5
Oct 25 12:54:11 SC-2 osafamfd[7400]: WA saClmInitialize_4 returned 5
Oct 25 12:54:11 SC-2 osafamfd[7400]: WA saNtfInitialize returned 5
Oct 25 12:54:15 SC-2 osafamfnd[7414]: ER AMF director heart beat timeout, generating core for amfd

The AMFND trace on SC-2 shows that AMFND did not receive su_pres from AMFD. Therefore AMFND could not instantiate the middleware components (including IMMND), so it was not aware of IMMND's death and could not restart it. The problem here is slightly different from #1828, which happened on a newly promoted SC (with the roamingSC feature) where AMFND had IMMND registered.
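To make the failure mode concrete, here is a deliberately simplified C++ model; it is not AMFND code, and the class `NodeDirectorModel` and its methods are hypothetical. It assumes, as the trace suggests, that AMFND only supervises components it has instantiated in response to su_pres, so the death of a not-yet-registered IMMND goes unnoticed.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical model of AMFND component supervision (not OpenSAF code).
class NodeDirectorModel {
 public:
  // su_pres from AMFD: AMFND instantiates the SU's components and from
  // then on supervises them.
  void on_su_pres(const std::string& comp_name) { supervised_.insert(comp_name); }

  // Returns true if AMFND would notice the death and start recovery
  // (e.g. a component restart); false if the component is unknown to it.
  bool on_process_death(const std::string& comp_name) const {
    return supervised_.count(comp_name) != 0;
  }

 private:
  std::set<std::string> supervised_;
};
```

In the scenario above, su_pres never arrives, so `on_process_death()` for IMMND would return false and no restart would be triggered.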



---

Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is 
subscribed to https://sourceforge.net/p/opensaf/tickets/

To unsubscribe from further messages, a project admin can change settings at 
https://sourceforge.net/p/opensaf/admin/tickets/options.  Or, if this is a 
mailing list, you can unsubscribe from the mailing list.--
Developer Access Program for Intel Xeon Phi Processors
Access to Intel Xeon Phi processor-based developer platforms.
With one year of Intel Parallel Studio XE.
Training and support from Colfax.
Order your platform today. http://sdm.link/xeonphi___
Opensaf-tickets mailing list
Opensaf-tickets@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets


[tickets] [opensaf:tickets] #2150 amfd: standby amfd crash while decoding node info during cold sync period

2016-11-01 Thread Long HB Nguyen
- Description has changed:

Diff:



--- old
+++ new
@@ -2,7 +2,7 @@
   When standby cold sync occurs, there may have a chance that 
   node creation information is missed in standby node. Active node sending
   node information to standby node (checkpointing) will lead to a standby amfd crash.
-  One way to get over this situation is to re-read node info when the node info is null.
+  One way to get over this situation is to create node when the node is null.
   
 - Reproduction:
 1) Start a cluster (e.g. 5 nodes).



- **status**: accepted --> review



---

** [tickets:#2150] amfd: standby amfd crash while decoding node info during cold sync period**

**Status:** review
**Milestone:** 5.0.2
**Created:** Mon Oct 31, 2016 03:47 AM UTC by Long HB Nguyen
**Last Updated:** Mon Oct 31, 2016 04:17 AM UTC
**Owner:** Long HB Nguyen
**Attachments:**

- [logs.zip](https://sourceforge.net/p/opensaf/tickets/2150/attachment/logs.zip) (537.0 kB; application/x-zip-compressed)


- Description:
  When a standby cold sync occurs, node creation information may be missed on
  the standby node. The active node then sending node information to the
  standby node (checkpointing) leads to a standby amfd crash.
  One way to get around this situation is to create the node when it is null
  (i.e. not found on the standby).
  
- Reproduction:
1) Start a cluster (e.g. 5 nodes).
2) On Standby controller, add a sleep (e.g. 5 seconds) to main.cc:
3) Reboot standby controller.
4) Use the scale_opensaf script in the python/samples directory to add a node (e.g. PL-6) while the standby is rebooting.
5) Observe a core dump on the standby node.
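A minimal sketch of the proposed fix, assuming a node database keyed by node name; `AvdNode`, `node_db` and `decode_node_update` are hypothetical stand-ins for the real amfd checkpoint decode path. When the standby receives an update for a node that cold sync has not yet created, it creates the node instead of dereferencing a null pointer.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical, simplified stand-in for amfd's node object.
struct AvdNode {
  std::string name;
  uint32_t node_id = 0;
};

// Hypothetical node database on the standby, keyed by node name (DN).
std::map<std::string, std::unique_ptr<AvdNode>> node_db;

// Applies a checkpointed node update on the standby. If cold sync has not
// yet delivered the node (lookup fails), create it instead of crashing.
AvdNode* decode_node_update(const std::string& name, uint32_t node_id) {
  auto it = node_db.find(name);
  if (it == node_db.end()) {
    auto node = std::make_unique<AvdNode>();
    node->name = name;
    it = node_db.emplace(name, std::move(node)).first;
  }
  it->second->node_id = node_id;
  return it->second.get();
}
```

Because the lookup happens before the insert, a later cold-sync entry for the same node updates the existing object rather than creating a second one.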


[tickets] [opensaf:tickets] #2157 mds: Decouple Node ID from TIPC address

2016-11-01 Thread Anders Widell



---

** [tickets:#2157] mds: Decouple Node ID from TIPC address**

**Status:** unassigned
**Milestone:** future
**Created:** Tue Nov 01, 2016 03:10 PM UTC by Anders Widell
**Last Updated:** Tue Nov 01, 2016 03:10 PM UTC
**Owner:** nobody


Currently there is a connection between the OpenSAF Node Id and the TIPC 
address when using the TIPC transport. However, there is no connection between 
the OpenSAF Node Id and the IP address when using the TCP transport. We should 
investigate if this connection could be removed in the TIPC case.


[tickets] [opensaf:tickets] #2155 base: Use timerfd and ppoll libc functions when available

2016-11-01 Thread Anders Widell



---

** [tickets:#2155] base: Use timerfd and ppoll libc functions when available**

**Status:** unassigned
**Milestone:** 5.2.FC
**Created:** Tue Nov 01, 2016 01:45 PM UTC by Anders Widell
**Last Updated:** Tue Nov 01, 2016 01:45 PM UTC
**Owner:** nobody


The timerfd and ppoll functions are not included in the current version of LSB, 
and for this reason OpenSAF contains drop-in replacement functions. These 
replacement functions should first check whether the underlying operating system 
actually supports timerfd and ppoll; if it does, the OS functions should be 
used instead of the replacement functions.
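One way to sketch the runtime check, assuming the replacement is passed in as a function pointer (`osaf_timerfd_create_sketch` and the `replacement` parameter are hypothetical names, not OpenSAF's): call the OS `timerfd_create()` first and fall back only when the kernel reports `ENOSYS`. This covers kernel support only; whether libc provides the symbol at all would typically be decided at build time (e.g. with an autoconf `AC_CHECK_FUNCS` test). The same pattern would apply to `ppoll()`.

```cpp
#include <cassert>
#include <cerrno>
#include <ctime>
#include <sys/timerfd.h>

// Hypothetical signature for OpenSAF's drop-in replacement.
using TimerfdCreateFn = int (*)(clockid_t clock_id, int flags);

// Prefer the operating system's timerfd_create(). Only if the kernel does
// not implement the system call (ENOSYS) is the replacement used; any other
// error (e.g. EINVAL for bad flags) is returned to the caller unchanged.
int osaf_timerfd_create_sketch(clockid_t clock_id, int flags,
                               TimerfdCreateFn replacement) {
  int fd = timerfd_create(clock_id, flags);
  if (fd >= 0 || errno != ENOSYS) return fd;
  return replacement(clock_id, flags);
}
```

A production version would probably cache the ENOSYS result so that the failing system call is not retried on every call.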


[tickets] [opensaf:tickets] #2156 build: Remove the dtlog directory

2016-11-01 Thread Anders Widell



---

** [tickets:#2156] build: Remove the dtlog directory**

**Status:** unassigned
**Milestone:** 5.2.FC
**Created:** Tue Nov 01, 2016 01:47 PM UTC by Anders Widell
**Last Updated:** Tue Nov 01, 2016 01:47 PM UTC
**Owner:** nobody


The directory /var/log/opensaf/dtlog is not used and should therefore not be 
created during installation.


[tickets] [opensaf:tickets] #1918 AMF: Informative logging

2016-11-01 Thread Nagendra Kumar
- **status**: unassigned --> review
- **assigned_to**: Nagendra Kumar



---

** [tickets:#1918] AMF: Informative logging**

**Status:** review
**Milestone:** 5.2.FC
**Created:** Fri Jul 15, 2016 11:30 AM UTC by Minh Hon Chau
**Last Updated:** Mon Aug 29, 2016 08:09 PM UTC
**Owner:** Nagendra Kumar


Some error/warning logging in AMF currently does not give enough information 
about unexpected situations.
For example:
1- LOG_ER("%s: invalid node state %u", __FUNCTION__, node->node_state);
-> it should tell which node (name/id) is in the invalid state
2- LOG_ER("Wrong sg fsm state %u", su->sg_of_su->sg_fsm_state);
-> it should tell which SG is in the wrong FSM state
3- LOG_ER("Invalid node_name. Check node_id");
-> it should give the node name from the message that amfd cannot find
4- LOG_ER("Internal error, could not send message to avnd");
-> it should give at least the node id of the avnd that the message could not be sent to
5- LOG_ER("%s: no susis", __FUNCTION__);
-> it should tell which SU has no SUSIs
6- LOG_ER("avnd_di_msg_send FAILED");
-> it's helpful to know which message failed to be sent, so we can tell which 
message is missing at amfd
...

Informative logging could help debugging on running systems where the fault 
sometimes cannot be reproduced (so no trace file can be collected from a later 
occurrence), and in some cases it lets us identify the fault straight away, 
without asking for traces.

This ticket will go through amfd/amfnd file by file and add more information 
in error/warning cases. It started in 5.1 FC and could be continued in later 
releases. Some rules:
1. The log must name the object on which the error happens
2. The log must give the error code when a return code check fails
3. When sending a message fails, log the message type and the object that the 
message carries (if any)
4. ... 
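As an illustration of rules 1 to 3, the helpers below build the text that would be handed to LOG_ER. The function names and message formats are hypothetical; only the idea (name the object, include the message type and the return code) comes from the ticket.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Rule 1: name the object the error happened on (node name and id),
// not only the offending state value.
std::string invalid_node_state_msg(const char* func, const std::string& node_name,
                                   uint32_t node_id, unsigned node_state) {
  char buf[512];
  std::snprintf(buf, sizeof(buf),
                "%s: invalid node state %u for node '%s' (node_id 0x%x)", func,
                node_state, node_name.c_str(), static_cast<unsigned>(node_id));
  return buf;
}

// Rules 2 and 3: when sending fails, log the message type, the object the
// message carries and the return code.
std::string msg_send_failed_msg(unsigned msg_type, const std::string& object,
                                unsigned rc) {
  char buf[512];
  std::snprintf(buf, sizeof(buf),
                "avnd_di_msg_send FAILED: msg type %u, object '%s', rc %u",
                msg_type, object.c_str(), rc);
  return buf;
}
```

The result would then be logged with something like `LOG_ER("%s", msg.c_str())`.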


[tickets] [opensaf:tickets] #2151 osaf: system in not in correct state during Act controller comming up

2016-11-01 Thread Nagendra Kumar
Hi Srikanth, thanks for your comments. That means that when we create a 
headless situation, it should be fault-based and not done with 'opensafd stop'. 
Then this situation will definitely not arise.

@Others: Any suggestions/comments?


---

** [tickets:#2151] osaf: system in not in correct state during Act controller comming up**

**Status:** unassigned
**Milestone:** 5.2.FC
**Created:** Mon Oct 31, 2016 10:54 AM UTC by Nagendra Kumar
**Last Updated:** Tue Nov 01, 2016 10:10 AM UTC
**Owner:** nobody


Steps to reproduce:
1. Start two controllers (SC-1 active, SC-2 standby) and two payloads. Configure 
50 components on SC-2 and unlock them. Add a 1 second delay to each component's 
stop script.
2. Stop SC-1 and after that, stop SC-2.
3. While SC-2 is going down, start SC-1.

Observed behaviour:
Since stopping all the components takes time during 'opensafd stop' on SC-2, 
Amfnd has not exited yet, but all middleware component assignments have been 
stopped. Only Amfnd and Amfd are alive, with a few more components left to stop.
Meanwhile SC-1 has come up as far as Amfd, and since there are now two active 
Amfds, the SC-2 Amfd exits with "Duplicate ACTIVE detected, exiting".
By this time, the service states, including Amfd's, are bad, as the services 
cannot tell whether this is a headless state or a failover. This is 
understandable, since the system is halfway between headless and failover.


Expected behaviour:
In my view, FMS should stop and not proceed if the peer is going down, i.e. FMS 
on SC-1 should figure out that the peer system is going down, and should allow 
SC-1 to proceed only when all services are down, i.e. when it gets node down 
(maybe cb->immd_down && cb->immnd_down && cb->amfnd_down && cb->amfd_down && 
cb->fm_down).
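The condition proposed above can be expressed as a small predicate. This is a sketch only: `FmCbSketch` is a hypothetical subset of the FM control block that reuses the flag names from the ticket, and the real structure and event handling in FMS may differ.

```cpp
#include <cassert>

// Hypothetical subset of the FM control block, using the flag names
// suggested in the ticket; the real layout may differ.
struct FmCbSketch {
  bool immd_down = false;
  bool immnd_down = false;
  bool amfnd_down = false;
  bool amfd_down = false;
  bool fm_down = false;
};

// FMS on the starting controller should proceed only once every
// supervised service on the peer has been reported down.
bool peer_node_fully_down(const FmCbSketch& cb) {
  return cb.immd_down && cb.immnd_down && cb.amfnd_down && cb.amfd_down &&
         cb.fm_down;
}
```

In the reported scenario, Amfnd and Amfd on SC-2 are still alive, so the predicate would stay false and SC-1 would wait instead of taking the active role.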





[tickets] [opensaf:tickets] #2151 osaf: system in not in correct state during Act controller comming up

2016-11-01 Thread Srikanth R
There are three issues with the scenario raised in this ticket.

1) As per the comments on ticket #2094, "/etc/init.d/opensafd stop" is not the 
proper way to bring down OpenSAF. To bring down a faulty node, it is suggested 
to perform a CLM lock on the node and then invoke the reboot command manually.

2) I cannot think of any real use case for a concurrent 'opensafd stop' on one 
controller and 'opensafd start' on another controller.

In a fault scenario, reboot -f is called, so none of the runlevel services are 
invoked during the node recovery process. Therefore a simultaneous 'opensafd 
stop' on SC-1 and 'opensafd start' on SC-2 is not possible in a production 
environment.
   
3) Deploying such a large number of components on a controller is not 
recommended, as a failure or fault of user components can impact middleware 
(OpenSAF) functionality on the entire cluster.


---

** [tickets:#2151] osaf: system in not in correct state during Act controller comming up**

**Status:** unassigned
**Milestone:** 5.2.FC
**Created:** Mon Oct 31, 2016 10:54 AM UTC by Nagendra Kumar
**Last Updated:** Tue Nov 01, 2016 06:59 AM UTC
**Owner:** nobody


[tickets] [opensaf:tickets] #2154 smf: non backward compatible name change for ExecCtrlCopy

2016-11-01 Thread Rafael
- **status**: assigned --> invalid



---

** [tickets:#2154] smf: non backward compatible name change for ExecCtrlCopy**

**Status:** invalid
**Milestone:** 5.1.1
**Created:** Mon Oct 31, 2016 03:18 PM UTC by Rafael
**Last Updated:** Mon Oct 31, 2016 03:34 PM UTC
**Owner:** Rafael


There was a name change introduced in the patch for ticket [#2114], which 
renamed the object openSafSmfExecControl_copy to 
openSafSmfExecControl=SmfHdlCopy. This change causes upgrades between these 
two versions to fail.


[tickets] [opensaf:tickets] #2078 amfd: remove db_template.h

2016-11-01 Thread Nagendra Kumar
- **status**: unassigned --> review
- **assigned_to**: Nagendra Kumar



---

** [tickets:#2078] amfd: remove db_template.h**

**Status:** review
**Milestone:** 5.2.FC
**Created:** Wed Sep 28, 2016 07:24 AM UTC by Gary Lee
**Last Updated:** Wed Sep 28, 2016 07:24 AM UTC
**Owner:** Nagendra Kumar


amfd/db_template.h can be removed, since there is already 
osaf/libs/common/amf/include/amf_db_template.h.



[tickets] [opensaf:tickets] #2052 immtools: SC/PL field in nodes.cfg is not used

2016-11-01 Thread Srikanth R
Zoran,

Node reboot recovery should be used when the system cannot recover from the 
observed fault. For a fault like an amfd crash, a node reboot is appropriate. 
But in the current scenario, the same configuration exists after the reboot, 
and the node will go for reboot again, as opensafd is enabled in the runlevel 
by default.

If the system has the same environment after the reboot, rebooting doesn't help 
the user or the system recover from a misconfiguration, or even from a fault.

My expectation is that the node shouldn't go for reboot, and opensafd should 
either keep running in a suspended state or even be stopped. This issue is 
mainly hit by newcomers. Rebooting a node because of a misconfiguration when 
starting OpenSAF doesn't look good.


---

** [tickets:#2052] immtools: SC/PL field in nodes.cfg is not used**

**Status:** unassigned
**Milestone:** 5.0.2
**Created:** Tue Sep 20, 2016 09:41 AM UTC by Ritu Raj
**Last Updated:** Tue Nov 01, 2016 07:26 AM UTC
**Owner:** nobody


# Environment details
OS : Suse 64bit
Changeset : 7997 (5.1.FC)

# Summary
Controller is able to join with an invalid node_name

# Steps followed & Observed behaviour
1. Mistakenly configured the controller's node_name as PL-3; the remaining 
configuration files are properly installed and updated, apart from 
/etc/opensaf/node_name.
2. Bring up OpenSAF; OpenSAF is still able to come up with the misconfigured node_name.

Opensaf status:
fos1:/opt/goahead/tetware/opensaffire/suites/avsv/api/suites # /etc/init.d/opensafd status
safSISU=safSu=PL-3\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed1,safApp=OpenSAF
saAmfSISUHAState=ACTIVE(1)

# Expected
OpenSAF should come up only with SC-1 / SC-2 as controllers, since imm.xml was generated with:
 ./immxml-clustersize -s 2 -p 2
 ./immxml-configure




[tickets] [opensaf:tickets] Re: #2052 immtools: SC/PL field in nodes.cfg is not used

2016-11-01 Thread Zoran Milinkovic
Hi Srikanth,

The immxml tool is used for creating the first, basic IMM XML database for 
starting OpenSAF.
As I remember, the immxml tools use the first column (SC/PL) to select the SC 
or PL template when creating the imm.xml file.

From my point of view, if a node is misconfigured, a node reboot is a 
reasonable recovery action.

You have written that the node should not reboot when the node 
misconfiguration is detected.
What do you expect to happen with OpenSAF on the affected node? Should it stop, 
or continue working as a payload?

BR,
Zoran

-----Original Message-----
From: Srikanth R [mailto:rwp...@users.sf.net] 
Sent: den 1 november 2016 08:26
To: [opensaf:tickets] <2...@tickets.opensaf.p.re.sf.net>
Subject: [opensaf:tickets] #2052 immtools: SC/PL field in nodes.cfg is not used

I think the discussion got sidetracked by the usage of the PL string in nodes.cfg.

On the first node in the OpenSAF cluster, the following information is filled 
in the OpenSAF configuration files:


cat /usr/share/opensaf/immxml/nodes.cfg 
SC node-1 node-1
SC node-2 node-2
PL node-3 node-3
PL node-4 node-4
PL node-5 node-5
PL node-6 node-6

cat /etc/opensaf/slot_id
1

cat /etc/opensaf/node_name
node-3
cat /etc/opensaf/node_type
controller


-> opensafd starts successfully, but with the following output:
safSISU=safSu=node-3\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed1,safApp=OpenSAF
saAmfSISUHAState=ACTIVE(1)


-> After a time gap of 5 minutes, the node rebooted with the following output:

Nov  1 12:31:22 CONTROLLER-1 osaffmd[3945]: Rebooting OpenSAF NodeId = 0 EE Name = No EE Mapped, Reason: Activation timer supervision expired: no ACTIVE assignment received within the time limit, OwnNodeId = 131343, SupervisionTime = 60
Nov  1 12:31:22 CONTROLLER-1 opensaf_reboot: Rebooting local node; timeout=60


Observed behavior:

If the user mistakenly populates node_name with a payload's node_name and 
starts the opensafd script, the user is not informed about the 
misconfiguration. The node reboots continuously, as opensafd is enabled in the 
runlevel by default during RPM installation.

Expected behavior:

Either FMS, IMM or AMF should detect that the node_name used when bringing up 
the node is intended for a payload, not for a controller. More importantly, the 
node should not go for reboot.
   


---

** [tickets:#2052] immtools: SC/PL field in nodes.cfg is not used**

**Status:** unassigned
**Milestone:** 5.0.2
**Created:** Tue Sep 20, 2016 09:41 AM UTC by Ritu Raj
**Last Updated:** Tue Sep 20, 2016 05:49 PM UTC
**Owner:** nobody


[tickets] [opensaf:tickets] #2052 immtools: SC/PL field in nodes.cfg is not used

2016-11-01 Thread Srikanth R
I think the discussion got sidetracked by the usage of the PL string in nodes.cfg.

On the first node in the OpenSAF cluster, the following information is filled 
in the OpenSAF configuration files:


cat /usr/share/opensaf/immxml/nodes.cfg 
SC node-1 node-1
SC node-2 node-2
PL node-3 node-3
PL node-4 node-4
PL node-5 node-5
PL node-6 node-6

cat /etc/opensaf/slot_id
1

cat /etc/opensaf/node_name
node-3
cat /etc/opensaf/node_type
controller


-> opensafd starts successfully, but with the following output:
safSISU=safSu=node-3\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed1,safApp=OpenSAF
saAmfSISUHAState=ACTIVE(1)


-> After a time gap of 5 minutes, the node rebooted with the following output:

Nov  1 12:31:22 CONTROLLER-1 osaffmd[3945]: Rebooting OpenSAF NodeId = 0 EE Name = No EE Mapped, Reason: Activation timer supervision expired: no ACTIVE assignment received within the time limit, OwnNodeId = 131343, SupervisionTime = 60
Nov  1 12:31:22 CONTROLLER-1 opensaf_reboot: Rebooting local node; timeout=60


Observed behavior:

If the user mistakenly populates node_name with a payload's node_name and 
starts the opensafd script, the user is not informed about the 
misconfiguration. The node reboots continuously, as opensafd is enabled in the 
runlevel by default during RPM installation.

Expected behavior:

Either FMS, IMM or AMF should detect that the node_name used when bringing up 
the node is intended for a payload, not for a controller. More importantly, the 
node should not go for reboot.
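A possible consistency check, sketched under assumptions: each nodes.cfg line has the form `<SC|PL> <node_name> <host_name>` as in the listing above, and /etc/opensaf/node_type contains `controller` or `payload`. The function names are hypothetical and nothing here reflects where FMS, IMM or AMF would actually perform the check; the point is that a controller whose node_name is listed as PL could be rejected with a clear message instead of being rebooted repeatedly.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Returns the role column ("SC" or "PL") that nodes.cfg gives for node_name,
// or an empty string if node_name is not listed. Each line is assumed to be
// "<role> <node_name> <host_name>"; extra columns are ignored.
std::string role_in_nodes_cfg(const std::string& cfg, const std::string& node_name) {
  std::istringstream lines(cfg);
  std::string line;
  while (std::getline(lines, line)) {
    std::istringstream fields(line);
    std::string role, name;
    if ((fields >> role >> name) && name == node_name) return role;
  }
  return "";
}

// True if node_type ("controller" or "payload", as read from
// /etc/opensaf/node_type) agrees with the role nodes.cfg assigns to node_name.
bool node_type_matches(const std::string& cfg, const std::string& node_name,
                       const std::string& node_type) {
  const std::string role = role_in_nodes_cfg(cfg, node_name);
  if (role.empty()) return false;  // unknown node: treat as misconfigured
  return (node_type == "controller") == (role == "SC");
}
```

With the configuration from the report (node_name node-3, node_type controller), `node_type_matches()` returns false, which a startup check could log before stopping instead of rebooting.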
   


---

** [tickets:#2052] immtools: SC/PL field in nodes.cfg is not used**

**Status:** unassigned
**Milestone:** 5.0.2
**Created:** Tue Sep 20, 2016 09:41 AM UTC by Ritu Raj
**Last Updated:** Tue Sep 20, 2016 05:49 PM UTC
**Owner:** nobody


# Environment details
OS : Suse 64bit
Changeset : 7997 ( 5.1.FC)

# Summary
Controller able to join with invalid node_name

# Steps followed & Observed behaviour
1. Mistakenly configured controller node_name with PL-3 and the remaining 
configuration files are properly installed and updated apart from 
/etc/opensaf/node_name.
2. Bringup OpenSAF, OpneSAF still able to comeup with misconfigured node_name

Opensaf status:
fos1:/opt/goahead/tetware/opensaffire/suites/avsv/api/suites # 
/etc/init.d/opensafd status
safSISU=safSu=PL-3\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed1,safApp=OpenSAF
saAmfSISUHAState=ACTIVE(1)

# Expected
A controller should come up only as SC-1 or SC-2, since the IMM XML was 
generated with:
 ./immxml-clustersize -s 2 -p 2
 ./immxml-configure




---



[tickets] [opensaf:tickets] #1571 AMF: Use std::maps instead of Patricia trees

2016-11-01 Thread Long HB Nguyen
- **Milestone**: 5.2.FC --> never



---

** [tickets:#1571] AMF: Use std::maps instead of Patricia trees**

**Status:** invalid
**Milestone:** never
**Created:** Wed Oct 28, 2015 02:39 AM UTC by Long HB Nguyen
**Last Updated:** Tue Aug 30, 2016 03:28 AM UTC
**Owner:** Long HB Nguyen


Use std::map instead of Patricia trees; see also [#1520].
This enhancement has been included in [#1642].
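A rough sketch of what the std::map replacement looks like in general: an ordered container keyed by DN string, with insert, find, erase and in-order access. This is not the code from [#1642]. `AmfObject` and `AmfDb` are hypothetical names chosen for illustration.

```cpp
#include <cassert>   // used by the usage checks
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for an AMF configuration object (e.g. an SU).
struct AmfObject {
  std::string name;  // DN, e.g. "safSu=SU1,safSg=2N,safApp=ABC"
  int admin_state = 0;
};

// Ordered DN -> object registry; std::map handles allocation, balancing and
// ordering that a hand-managed Patricia tree would otherwise need.
class AmfDb {
 public:
  // Returns false if the DN already exists (the new object is discarded).
  bool Insert(std::unique_ptr<AmfObject> obj) {
    const std::string key = obj->name;
    return db_.emplace(key, std::move(obj)).second;
  }
  AmfObject* Find(const std::string& name) const {
    auto it = db_.find(name);
    return it == db_.end() ? nullptr : it->second.get();
  }
  bool Erase(const std::string& name) { return db_.erase(name) == 1; }
  std::size_t Size() const { return db_.size(); }
  // Keys are kept sorted, so begin() yields the lexicographically first DN.
  std::string FirstName() const {
    return db_.empty() ? "" : db_.begin()->first;
  }

 private:
  std::map<std::string, std::unique_ptr<AmfObject>> db_;
};
```

Holding objects through `std::unique_ptr` makes the map own them, so erasing an entry also frees the object.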


---



[tickets] [opensaf:tickets] #1558 amf: use nullptr instead of NULL macros

2016-11-01 Thread Long HB Nguyen
- **Milestone**: 5.2.FC --> never



---

** [tickets:#1558] amf: use nullptr instead of NULL macros**

**Status:** invalid
**Milestone:** never
**Created:** Fri Oct 23, 2015 04:28 AM UTC by Long HB Nguyen
**Last Updated:** Tue Aug 30, 2016 03:25 AM UTC
**Owner:** Long HB Nguyen


Use nullptr instead of the NULL macro. This is part of ticket [#1520].
This ticket was replaced by [#1547] and [#1551].
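A short illustration of why nullptr is preferred over NULL in C++. `Describe` and `IsNullptrType` are hypothetical functions. NULL is an implementation-defined null pointer constant (for example `0`, `0L`, or GCC's `__null`), so a call such as `Describe(NULL)` may select the integer overload or fail to compile as ambiguous. nullptr has type `std::nullptr_t` and converts only to pointer types.

```cpp
#include <cassert>   // used by the usage checks
#include <cstddef>
#include <string>
#include <type_traits>

// Two hypothetical overloads of the kind that make NULL error-prone.
std::string Describe(int) { return "int overload"; }
std::string Describe(const char*) { return "pointer overload"; }

// nullptr never converts to int, so this always picks the pointer overload.
std::string CallWithNullptr() { return Describe(nullptr); }

// Reports whether the deduced argument type is std::nullptr_t.
template <typename T>
bool IsNullptrType(T) {
  return std::is_same<T, std::nullptr_t>::value;
}
```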


---



[tickets] [opensaf:tickets] #2151 osaf: system is not in correct state during Act controller coming up

2016-11-01 Thread Nagendra Kumar
- **Type**: defect --> discussion



---

** [tickets:#2151] osaf: system is not in correct state during Act controller 
coming up**

**Status:** unassigned
**Milestone:** 5.2.FC
**Created:** Mon Oct 31, 2016 10:54 AM UTC by Nagendra Kumar
**Last Updated:** Mon Oct 31, 2016 10:56 AM UTC
**Owner:** nobody


Steps to reproduce:
1. Start two controllers (SC-1 active, SC-2 standby) and two payloads. Configure 
50 components on SC-2 and unlock them. Add a 1-second delay to each component's 
stop script.
2. Stop SC-1, and then stop SC-2.
3. While SC-2 is going down, start SC-1.

Observed behaviour:
Because the components take time to stop during 'opensafd stop' on SC-2, Amfnd 
has not exited yet. However, the assignments of all middleware components have 
been stopped; only Amfnd and Amfd are still alive, with a few components left 
to stop.
Meanwhile, SC-1 has come up as far as Amfd. Since there are now two active 
Amfds, the Amfd on SC-2 exits with "Duplicate ACTIVE detected, exiting".
Up to this point, the service states, including Amfd's, are bad because the 
services cannot tell whether this is a headless state or a failover. That is 
understandable, since the system is halfway between headless and failover.


Expected behaviour
In my view, FMS should not proceed while the peer is going down, i.e. FMS on 
SC-1 should detect that the peer system is going down. It should let SC-1 
proceed only once all services on the peer are down, i.e. when it gets node 
down (perhaps cb->immd_down && cb->immnd_down && cb->amfnd_down && 
cb->amfd_down && cb->fm_down).
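A rough sketch of the proposed check, assuming a control block with the boolean fields named in the expression above. `FmCb`, `PeerState` and `ClassifyPeer` are hypothetical names, and the real FMS control block may differ. The sketch also treats "no service reported down" as "peer up", which is a simplification: a peer that was never present would need separate handling.

```cpp
#include <cassert>   // used by the usage checks
#include <cstddef>

// Hypothetical subset of the FM control block, using the flag names from the
// ticket (cb->immd_down, cb->immnd_down, ...).
struct FmCb {
  bool immd_down = false;
  bool immnd_down = false;
  bool amfnd_down = false;
  bool amfd_down = false;
  bool fm_down = false;
};

enum class PeerState { kUp, kGoingDown, kDown };

// kDown only when every monitored peer service is down; kGoingDown while
// some services have gone and others (e.g. Amfnd/Amfd) are still alive.
PeerState ClassifyPeer(const FmCb& cb) {
  const bool flags[] = {cb.immd_down, cb.immnd_down, cb.amfnd_down,
                        cb.amfd_down, cb.fm_down};
  const std::size_t total = sizeof(flags) / sizeof(flags[0]);
  std::size_t down = 0;
  for (bool f : flags) down += f ? 1 : 0;
  if (down == 0) return PeerState::kUp;
  if (down == total) return PeerState::kDown;
  return PeerState::kGoingDown;
}
```

In the scenario above, FMS on SC-1 could hold off its activation while `ClassifyPeer` returns `kGoingDown`, and continue once it returns `kDown`.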





---
