from:"Anders Widell"

Re: [users] what is the new SA FORUM website?

2020-03-04 Thread Anders Widell via Opensaf-users

Hi!

The specifications implemented by OpenSAF are available at 
https://opensaf.sourceforge.io/documentation.html

Since the saforum page is no longer working, we are going to upload the rest of 
the specifications there as well.

/ Anders Widell

-----Original Message-----
From: Carroll, James R  
Sent: March 04, 2020 22:19
To: opensaf-users@lists.sourceforge.net
Subject: [users] what is the new SA FORUM website?

Hi All,

I just downloaded what I believe is the latest AMF Programmer's Reference 
document from the following website:
https://opensaf.sourceforge.io/documentation.html

It still has the following reference:

The above documents are available at http://www.saforum.org/
This statement is included in all the programmer's reference manuals; I am just 
showing this one example.

The website http://www.saforum.org/ is no longer affiliated with the
Service Availability Forum. It is some type of shopping advice forum.
Can someone confirm the official replacement URL for http://www.saforum.org/?
Also, can someone confirm when the documentation will be updated with the 
correct URL references?

Thanks.

Jim


___
Opensaf-users mailing list
Opensaf-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-users



[users] Announcement of the OpenSAF 5.18.04 release

2018-04-20 Thread Anders Widell

The OpenSAF community is pleased to announce the availability of the 
OpenSAF 5.18.04 release. The source code for OpenSAF 5.18.04 and the 
corresponding documentation can be downloaded using the following links:


http://sourceforge.net/projects/opensaf/files/releases/opensaf-5.18.04.tar.gz/download
http://sourceforge.net/projects/opensaf/files/docs/opensaf-documentation-5.18.04.tar.gz/download

For a complete list of new features in this release, please refer to the 
NEWS at the wiki:


https://sourceforge.net/p/opensaf/wiki/NEWS-5.18.04/

See the ChangeLog for a full list of changes in this release:

https://sourceforge.net/p/opensaf/wiki/ChangeLog-5.18.04/

Note that starting from the August 2017 release, we are using a new 
version numbering scheme for OpenSAF. The components in the OpenSAF 
version number 5.18.04 represent the major release (5), followed by the 
year (18) and month (04) when the release was made. This change was made 
as a step towards introducing continuous delivery in the OpenSAF project.
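The numbering scheme described above can be decoded mechanically. Below is a minimal Python sketch, assuming every release string in the new scheme has exactly three dot-separated fields with a two-digit year (taken to be in the 2000s) and a two-digit month; `parse_opensaf_version` is a hypothetical helper, not part of OpenSAF.

```python
from typing import NamedTuple


class OpenSafVersion(NamedTuple):
    major: int
    year: int
    month: int


def parse_opensaf_version(version: str) -> OpenSafVersion:
    """Split a release string such as "5.18.04" into major, year and month.

    Assumes the MAJOR.YY.MM scheme used since August 2017; older numbers
    such as "5.1.0" or "4.5.2" are rejected because their fields are not
    two digits wide.
    """
    parts = version.split(".")
    if len(parts) != 3 or len(parts[1]) != 2 or len(parts[2]) != 2:
        raise ValueError(f"not a MAJOR.YY.MM release string: {version!r}")
    major, yy, mm = (int(p) for p in parts)
    if not 1 <= mm <= 12:
        raise ValueError(f"month out of range in {version!r}")
    return OpenSafVersion(major, 2000 + yy, mm)


print(parse_opensaf_version("5.18.04"))  # OpenSafVersion(major=5, year=2018, month=4)
```

The width check is a deliberate choice: without it, a pre-2017 number like 4.5.2 would silently parse as "May 2005".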


Thank you for your continued interest in OpenSAF and to everyone who has 
contributed to this release.


regards,
Anders Widell



Re: [users] Payload card reboot due to a short time network break

2018-04-09 Thread Anders Widell

The only way to be sure if it is appropriate is to test under realistic 
conditions. I agree that it makes sense to increase it so that it is 
larger than the TIPC link tolerance. It should be noted that the IMM 
agent always communicates directly with the IMM node director running on 
the same node, and for this communication I don't think the TIPC link 
tolerance is relevant (you will immediately detect if the IMM node 
director process goes away). However, the IMM node director may in turn 
have to communicate with IMM processes running on other nodes in the 
cluster in order to fulfill your request, and for that communication the 
TIPC link tolerance comes into play. If it needs to communicate in 
several hops it may even make sense to have a time-out which is several 
times the TIPC link tolerance (compare with the default values for these 
time-outs: link tolerance=1.5 seconds and IMMA time-out=10 seconds).
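To make the comparison in the last sentence concrete, the Python sketch below expresses an IMM agent time-out as a multiple of the TIPC link tolerance. The helper names and the "one full link tolerance per hop" budget are illustrative assumptions, not OpenSAF settings; only the default values (1.5 s and 10 s) and the values proposed in this thread come from the text.

```python
def timeout_ratio(imma_timeout_s: float, link_tolerance_s: float) -> float:
    """How many TIPC link tolerances fit into the IMM agent time-out."""
    if link_tolerance_s <= 0:
        raise ValueError("link tolerance must be positive")
    return imma_timeout_s / link_tolerance_s


def covers_hops(imma_timeout_s: float, link_tolerance_s: float, hops: int) -> bool:
    """True if the time-out outlasts a full link-tolerance stall on every hop.

    Rule of thumb only: assumes each inter-node hop may stall for up to one
    link tolerance before TIPC declares the link lost.
    """
    return imma_timeout_s > hops * link_tolerance_s


if __name__ == "__main__":
    # Defaults quoted above, then the values proposed in this thread.
    for tolerance, timeout in [(1.5, 10.0), (12.0, 20.0), (15.0, 20.0)]:
        print(f"tolerance={tolerance}s timeout={timeout}s "
              f"ratio={timeout_ratio(timeout, tolerance):.2f} "
              f"two hops covered={covers_hops(timeout, tolerance, 2)}")
```

With the defaults the time-out is about 6.7 link tolerances; with a 15 s tolerance and a 20 s time-out it is only about 1.3, which under this rule of thumb leaves room for a single inter-node hop but not two.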


regards,

Anders Widell


On 04/09/2018 10:19 AM, Jianfeng Dong wrote:


Hi Anders,

Now we want to increase the TIPC link tolerance from the current 10 seconds to 12 
or 15 seconds, so we also need to increase the OpenSAF parameter 
‘IMMA_SYNCR_TIMEOUT’ from the current 12 seconds to a larger value (20 
maybe). Do you think 20 seconds is appropriate for the parameter?


Thanks.

Regards,

Jianfeng

*From:*Jianfeng Dong
*Sent:* Tuesday, March 13, 2018 5:38 PM
*To:* Anders Widell <anders.wid...@ericsson.com>; Mathi N P 
<mathi.np@gmail.com>

*Cc:* opensaf-users@lists.sourceforge.net
*Subject:* RE: [users] Payload card reboot due to a short time network 
break


Anders,

As you can see in those logs, we had set the TIPC link tolerance to 10 
seconds; I’m just not sure what value is appropriate, especially for this case.


I think I can at least try running TIPC on the raw 
Ethernet interfaces instead.


Thanks for your comment on the CLM design idea; I understand it 
definitely would not be easy to make such a change.


Thanks,

Jianfeng

*From:*Anders Widell [mailto:anders.wid...@ericsson.com]
*Sent:* Monday, March 12, 2018 7:52 PM
*To:* Mathi N P <mathi.np@gmail.com 
<mailto:mathi.np@gmail.com>>; Jianfeng Dong <jd...@juniper.net 
<mailto:jd...@juniper.net>>
*Cc:* opensaf-users@lists.sourceforge.net 
<mailto:opensaf-users@lists.sourceforge.net>
*Subject:* Re: [users] Payload card reboot due to a short time network 
break


We also tried running TIPC on a bonded interface but ended up having 
to change it since it never worked well. When you have two redundant 
Ethernet interfaces, TIPC will tolerate failures in one of them 
seamlessly without losing connectivity. But when you run TIPC on a 
bonded interface it doesn't work, as you can see in your case. I guess 
the reason is that you have two separate mechanisms on top of each 
other, trying to achieve the same thing. One possible workaround is to 
increase the TIPC link tolerance.


When we lose connectivity with a node in the cluster, we are expecting 
that it happened because the other node went down (rebooted or 
permanently died). We don't expect to re-establish connectivity with 
the same node unless it has rebooted in between. It would be possible 
to introduce a grace time to allow a node to stay in the CLM cluster 
for a while after the connectivity with it has been lost, and allow it 
to continue as a cluster member if connectivity is re-established 
before this grace time has expired. However, this is not so easy and 
it is much easier to increase the TIPC link tolerance and let TIPC 
handle this for us.


regards,

Anders Widell

On 03/09/2018 12:42 PM, Mathi N P wrote:

This is an interesting case (and 'rare' :-))

2018-02-16T17:56:41.172791+00:00 scm2 osafamfd[3312]: WA Sending
node reboot order to
node:safAmfNode=PLD0114,safAmfCluster=myAmfCluster, due to late
node_up_msg after cluster startup timeout
2018-02-16T17:56:11 to 2018-02-16T17:56:41 except an error, then
it got the reboot command from SC and thus it rebooted itself.

Given that the node has not 'instantiated' completely and a reboot
order can be treated as a 'failed start up', based on the current
AMF state,

AMF can make a decision by reading the
'saamfnodefailfastoninstantiationfailure' (or perhaps
'saamfnodeautorepair') attribute to reboot or not and report a
node instantiation failure (back to the rc script and other
associated events for that state).

Thanks,

Mathi.

On Fri, Mar 9, 2018 at 10:42 AM, Jianfeng Dong <jd...@juniper.net
<mailto:jd...@juniper.net>> wrote:

Thanks Anders, much appreciated.

And yes, in PLD we run TIPC on a bonded interface which
comprises two Ethernet interfaces.
I'm wondering why a bonded interface can't provide protection
similar to what TIPC provides; is it because TIPC is more robust,
or something else? I'm not sure if it is right to change the
low-level design at this t

Re: [users] [devel] Performance benchmark for OpenSAF message services

2018-03-22 Thread Anders Widell


Hi!

I don't have detailed knowledge about the implementation of any of the 
services you mention, but I will try to answer as well as I can anyhow. 
See my comments below, marked AndersW>


regards,

Anders Widell


On 03/21/2018 09:28 PM, Feng Xie wrote:

Hi,

We are evaluating OpenSAF as an open-source middleware solution. Our 
target environment is a single-node embedded system for now and may be extended 
to a cluster later. Performance of OpenSAF services, especially messages, 
notification and event distribution, is important for us. Can somebody give 
a high-level overview of the communication mechanism in these 
services? More specifically,

   1.  Are OpenSAF daemons involved in passing every message from a sender to the receivers? 
Or do the daemons only act as "name servers" when setting up the connections 
between senders and receivers?


AndersW> I think the messages always pass through the OpenSAF service 
daemons, though there could be some exceptions that I am not aware of. 
CKPT uses shared memory for checkpoint data, so not everything is sent 
in messages.



   2.  It seems that OpenSAF allows both sender and receiver to be in the same 
process while using these services. If so, is the communication still carried 
by TCP/TIPC? Are there any ways to bypass TCP or TIPC, e.g., using shared 
memory, to improve the performance?


AndersW> When you configure OpenSAF with TCP transport, it will use UNIX 
domain sockets for node-local communication. I wouldn't expect it to 
affect performance very much though, as both TCP and TIPC should perform 
well for node-local communication. Note that I am talking about the 
node-local communication between the application process and the OpenSAF 
daemon here. I don't think there is any short-cut where you can send 
messages directly from one handle to another within the same application 
process. CKPT data could be an exception though, as mentioned earlier.



   3.  Are there any benchmark data on any reference/real systems? Is the 
overhead of OpenSAF services a real/valid concern for a single-node environment?


AndersW> I am not aware of any publicly available benchmark data for 
OpenSAF.
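Since no public benchmark data exists, one rough way to estimate the cost of the node-local hop between an application and a daemon is to time round trips over a UNIX domain socket on the target hardware. The Python sketch below measures a plain `socket.socketpair()` echo between two threads of one process. It is a hypothetical baseline only: it does not exercise OpenSAF's MDS layer, its message encoding, or any daemon-side processing, and `mean_round_trip_us` is an illustrative name.

```python
import socket
import threading
import time


def _recv_exact(sock: socket.socket, size: int) -> bytes:
    """Read exactly `size` bytes; stream sockets may return partial reads."""
    data = b""
    while len(data) < size:
        chunk = sock.recv(size - len(data))
        if not chunk:
            raise ConnectionError("peer closed the socket")
        data += chunk
    return data


def _echo(sock: socket.socket, count: int, size: int) -> None:
    """Echo back `count` messages of `size` bytes each."""
    for _ in range(count):
        sock.sendall(_recv_exact(sock, size))


def mean_round_trip_us(count: int = 10_000, size: int = 64) -> float:
    """Average request/response time in microseconds over an AF_UNIX pair."""
    client, server = socket.socketpair()  # AF_UNIX by default on POSIX
    worker = threading.Thread(target=_echo, args=(server, count, size))
    worker.start()
    payload = b"x" * size
    start = time.perf_counter()
    for _ in range(count):
        client.sendall(payload)
        _recv_exact(client, size)
    elapsed = time.perf_counter() - start
    worker.join()
    client.close()
    server.close()
    return elapsed / count * 1e6


if __name__ == "__main__":
    print(f"mean round trip: {mean_round_trip_us():.1f} us")
```

A figure from such a loop is at best a lower bound: per the answers above, a real service call also passes through an OpenSAF daemon, and in a cluster it may add a TIPC or TCP hop to another node.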



Thanks a lot!

From: Feng Xie
Sent: Wednesday, March 21, 2018 3:31 PM
To: ajo...@rbbn.com; opensaf-users@lists.sourceforge.net; 
opensaf-de...@lists.sourceforge.net
Cc: Feng Xie <feng_x...@jabil.com>
Subject: RE: Errors in running OpenSAF message sample

Hi Alex,

Thanks a lot for the help. I am trying to run OpenSAF in a single-node 
environment.

It turned out that the failure of the message sample program was caused by nodes.cfg. Initially, I 
created a two-node configuration with "immxml-clustersize -s 2", assuming that 
two controllers are required. After re-creating nodes.cfg with one controller using 
"immxml-clustersize -s 1", the sample program works.

 Although the root cause may be identified, I still don't know the 
reason/mechanism behind the failure. In my opinion, a two-controller 
configuration file should be allowed. The topology may have two controllers but 
my app just uses one.

Feng

From: Alex Jones <ajo...@rbbn.com<mailto:ajo...@rbbn.com>>
Sent: Friday, March 16, 2018 1:52 PM
To: Feng Xie <feng_x...@jabil.com<mailto:feng_x...@jabil.com>>
Cc: 
opensaf-users@lists.sourceforge.net<mailto:opensaf-users@lists.sourceforge.net>
Subject: Re: Opensaf-users Digest, Vol 58, Issue 10

Hi Feng,

 How many nodes are you running? Just 1 controller?

 Look in /var/log/messages for output from msgd and msgnd. These implement 
the MSG subsystem. Maybe there is something wrong with how they started up.

Alex

From: Feng Xie
Sent: Thursday, March 15, 2018 4:25 PM
To: Anders Widell <anders.wid...@ericsson.com<mailto:anders.wid...@ericsson.com>>; 
opensaf-users@lists.sourceforge.net<mailto:opensaf-users@lists.sourceforge.net>; 
opensaf-de...@lists.sourceforge.net<mailto:opensaf-de...@lists.sourceforge.net>
Cc: Feng Xie <feng_x...@jabil.com<mailto:feng_x...@jabil.com>>
Subject: Errors in running OpenSAF message sample

Hi,

Thanks a lot for Anders' suggestion. After downgrading from Python 3 to Python 
2.7, the errors in running immxml-configure were resolved!

I was able to start the OpenSAF-related demos and compile the sample programs. The 
first OpenSAF sample program I ran is msg_demo under 
~/local/share/opensaf/samples/mqsv. The sample was run in a single-node Ubuntu 
VM. However, I encountered error code 2 from saMsgInitialize. I checked the 
source code in ~/src/msg/agent/mqa_api.cc; it seems that error code 2 maps to 
SA_AIS_ERR_LIBRARY. There are many occurrences of it. I am stuck at this step. 
Any help will be highly appreciated!
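When reading numeric return codes such as the `2` from `saMsgInitialize()` here, or the `error 9` in the CLM log lines elsewhere in this archive, it helps to have the SA Forum `SaAisErrorT` values at hand. Below is a small Python lookup table for the first fourteen codes, transcribed from the AIS `saAis.h` enumeration (verify it against the header shipped with your OpenSAF version); `ais_error_name` is a hypothetical helper.

```python
# Subset of SaAisErrorT, as defined in the SA Forum AIS header saAis.h.
SA_AIS_ERRORS = {
    1: "SA_AIS_OK",
    2: "SA_AIS_ERR_LIBRARY",
    3: "SA_AIS_ERR_VERSION",
    4: "SA_AIS_ERR_INIT",
    5: "SA_AIS_ERR_TIMEOUT",
    6: "SA_AIS_ERR_TRY_AGAIN",
    7: "SA_AIS_ERR_INVALID_PARAM",
    8: "SA_AIS_ERR_NO_MEMORY",
    9: "SA_AIS_ERR_BAD_HANDLE",
    10: "SA_AIS_ERR_BUSY",
    11: "SA_AIS_ERR_ACCESS",
    12: "SA_AIS_ERR_NOT_EXIST",
    13: "SA_AIS_ERR_NAME_TOO_LONG",
    14: "SA_AIS_ERR_EXIST",
}


def ais_error_name(code: int) -> str:
    """Map a numeric SaAisErrorT value to its symbolic name."""
    return SA_AIS_ERRORS.get(code, f"<unlisted SaAisErrorT {code}>")


print(ais_error_name(2))  # SA_AIS_ERR_LIBRARY
print(ais_error_name(9))  # SA_AIS_ERR_BAD_HANDLE
```

SA_AIS_ERR_LIBRARY is a generic "something went wrong inside the library" code, so on its own it says little about the cause. In this thread the cause turned out to be the node configuration rather than the sample code.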

In addition, I have two additional questions:

   1.  Is there a timer service provided by OpenSAF?
   2.  How do I turn on internal tracing when running the OpenSAF samples?

Re: [users] Procedure to Update OpenSAF Binary(/ies) on a Live Running System

2018-03-13 Thread Anders Widell


Hi!

I suppose the standard procedure for upgrades is a rolling upgrade over 
nodes using SMF. I.e. upgrade the nodes in the cluster one at a time. 
For each node, do something like this:


1. If it is a system controller node, make sure that it is not the 
active controller. If it is, perform an si-swap to make it standby.


2. Lock the node.

3. Stop the OpenSAF service

4. Upgrade the OpenSAF binaries (recommended to use RPM)

5. Reboot the node (recommended, but not strictly necessary)

6. Start the OpenSAF service (if not already started due to reboot)

7. Unlock the node.
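The seven steps above can be written down as a per-node dry-run plan. The Python sketch below only builds and prints the command list; it executes nothing. It assumes the `amf-adm` administration tool shipped with OpenSAF (operations `si-swap`, `lock`, `unlock`), the init script used elsewhere on this list, and RPM packaging; the SI DN `safSi=SC-2N,safApp=OpenSAF`, the node DNs, and the RPM file name are placeholders to be checked against your own IMM model and packages.

```python
from typing import List, Sequence

# Assumed DN of the system-controller 2N SI in a default OpenSAF model;
# check your own IMM model before relying on it.
SC_2N_SI_DN = "safSi=SC-2N,safApp=OpenSAF"


def node_upgrade_plan(node_dn: str, rpm_files: Sequence[str],
                      is_active_controller: bool,
                      reboot: bool = True) -> List[str]:
    """Return, in order, the shell commands for upgrading one node.

    Dry run only: nothing is executed. Mirrors steps 1-7 above.
    """
    plan: List[str] = []
    if is_active_controller:
        plan.append(f"amf-adm si-swap {SC_2N_SI_DN}")  # 1. make it standby
    plan.append(f"amf-adm lock {node_dn}")             # 2. lock the node
    plan.append("/etc/init.d/opensafd stop")           # 3. stop OpenSAF
    plan.append("rpm -Uvh " + " ".join(rpm_files))     # 4. upgrade binaries
    if reboot:
        plan.append("reboot")                          # 5. recommended
        # 6. assumes opensafd is started automatically at boot
    else:
        plan.append("/etc/init.d/opensafd start")      # 6. start OpenSAF
    plan.append(f"amf-adm unlock {node_dn}")           # 7. unlock the node
    return plan


if __name__ == "__main__":
    nodes = [
        # (hypothetical node DN, currently the active controller?)
        ("safAmfNode=SC-2,safAmfCluster=myAmfCluster", False),
        ("safAmfNode=SC-1,safAmfCluster=myAmfCluster", True),
        ("safAmfNode=PL-3,safAmfCluster=myAmfCluster", False),
    ]
    for dn, active in nodes:  # one node at a time: a rolling upgrade
        print(f"# {dn}")
        for command in node_upgrade_plan(dn, ["opensaf-*.rpm"], active):
            print(command)
```

Two caveats. First, which controller is active has to be checked just before each node's turn, because step 1 changes it. Second, when the node is rebooted, the final unlock has to be issued from another node once the upgraded node has rejoined the cluster.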

regards,

Anders Widell


On 03/13/2018 06:01 AM, Saurabh Pandey wrote:

Can anyone check and let us know if there is any standard procedure available 
for patching?

Regards,
Saurabh

From: Kapil Gokhale
Sent: Monday, March 12, 2018 4:38 PM
To: opensaf-users@lists.sourceforge.net
Cc: Nivrutti Kale <nivrutti.k...@mavenir.com>; Saurabh Pandey 
<saurabh.pan...@mavenir.com>
Subject: Procedure to Update OpenSAF Binary(/ies) on a Live Running System

Hi All

Query:
How to update OpenSAF binary/binaries in a live running system with minimal 
hindrance (downtime)?
What steps should be followed for such an update to be foolproof?

Scenario:
We have OpenSAF 4.5.0 deployed at a customer site. An issue related to Ticket 
[#1475] has been observed.
This ticket has been fixed in 4.5.2; however, we do not want to upgrade the system to 
4.5.2, and instead want to apply the patch for Ticket [#1475].
Since the patched binaries ultimately have to be installed at the customer site, we need a 
foolproof procedure that also guarantees minimal downtime.

A general answer (irrespective of specific Ticket #1475) is welcome too.

Regards
Kapil Gokhale







Re: [users] Payload card reboot due to a short time network break

2018-03-12 Thread Anders Widell

We also tried running TIPC on a bonded interface but ended up having to 
change it since it never worked well. When you have two redundant 
Ethernet interfaces, TIPC will tolerate failures in one of them 
seamlessly without losing connectivity. But when you run TIPC on a 
bonded interface it doesn't work, as you can see in your case. I guess 
the reason is that you have two separate mechanisms on top of each 
other, trying to achieve the same thing. One possible workaround is to 
increase the TIPC link tolerance.

When we lose connectivity with a node in the cluster, we are expecting 
that it happened because the other node went down (rebooted or 
permanently died). We don't expect to re-establish connectivity with the 
same node unless it has rebooted in between. It would be possible to 
introduce a grace time to allow a node to stay in the CLM cluster for a 
while after the connectivity with it has been lost, and allow it to 
continue as a cluster member if connectivity is re-established before 
this grace time has expired. However, this is not so easy and it is much 
easier to increase the TIPC link tolerance and let TIPC handle this for us.
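The grace-time idea described above can be sketched as a small state machine. The Python below is a hypothetical illustration only: OpenSAF CLM does not implement it, the class and method names are invented for this sketch, and time is passed in explicitly as seconds so the logic can be checked without real timers.

```python
from typing import Dict, List, Set


class GraceMembership:
    """Toy cluster membership that tolerates short connectivity losses."""

    def __init__(self, grace_s: float) -> None:
        self.grace_s = grace_s
        self.members: Set[str] = set()
        self._lost_at: Dict[str, float] = {}  # node -> time connectivity was lost

    def join(self, node: str) -> None:
        self.members.add(node)

    def connectivity_lost(self, node: str, now: float) -> None:
        # Start the grace timer instead of evicting the node immediately.
        if node in self.members:
            self._lost_at.setdefault(node, now)

    def connectivity_restored(self, node: str, now: float) -> bool:
        """Return True if the node may continue as a cluster member."""
        lost = self._lost_at.pop(node, None)
        if lost is not None and now - lost > self.grace_s:
            self.members.discard(node)  # grace time expired before reconnect
        return node in self.members

    def expire(self, now: float) -> List[str]:
        """Evict every node whose grace time has elapsed; return them."""
        evicted = [n for n, t in self._lost_at.items() if now - t > self.grace_s]
        for node in evicted:
            del self._lost_at[node]
            self.members.discard(node)
        return evicted
```

With a few seconds of grace, a TIPC outage of about one second, like the one in the PLD0114 logs in this thread, would not remove the node. The cost is that a node which really has died is declared gone `grace_s` seconds later, which is the same trade-off as raising the TIPC link tolerance, only implemented one layer higher.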

regards,

Anders Widell

On 03/09/2018 12:42 PM, Mathi N P wrote:

This is an interesting case (and 'rare' :-))

2018-02-16T17:56:41.172791+00:00 scm2 osafamfd[3312]: WA Sending node 
reboot order to node:safAmfNode=PLD0114,safAmfCluster=myAmfCluster, 
due to late node_up_msg after cluster startup timeout
2018-02-16T17:56:11 to 2018-02-16T17:56:41 except an error, then it 
got the reboot command from SC and thus it rebooted itself.

Given that the node has not 'instantiated' completely and a reboot 
order can be treated as a 'failed start up', based on the current AMF 
state,
AMF can make a decision by reading the 
'saamfnodefailfastoninstantiationfailure' (or perhaps 
'saamfnodeautorepair') attribute to reboot or not and report a node 
instantiation failure (back to the rc script and other associated 
events for that state).
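The proposal above amounts to a small decision rule. The log line quoted above shows osafamfd sending a reboot order for a late node_up; the Python sketch below shows the suggested alternative. It is a hypothetical illustration, not existing AMF code: the function name and action strings are invented, and the boolean argument stands in for the node's saAmfNodeFailfastOnInstantiationFailure setting (the proposal also mentions saAmfNodeAutoRepair as a possible alternative source for the same decision).

```python
from typing import List


def late_node_up_actions(failfast_on_instantiation_failure: bool) -> List[str]:
    """Actions for a node whose node_up arrives after the cluster startup timeout.

    Hypothetical sketch: treat the late node_up as a failed node
    instantiation, always report it, and reboot the node only if its
    configuration asks for fail-fast behaviour.
    """
    actions = ["report-node-instantiation-failure"]
    if failfast_on_instantiation_failure:
        actions.append("reboot-node")
    return actions
```

With the attribute left false, a node that recovers from a short network glitch would be reported rather than rebooted, which is the behaviour requested in this thread.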

Thanks,
Mathi.

On Fri, Mar 9, 2018 at 10:42 AM, Jianfeng Dong <jd...@juniper.net 
<mailto:jd...@juniper.net>> wrote:

Thanks Anders, much appreciated.

And yes, in PLD we run TIPC on a bonded interface which comprises
two Ethernet interfaces.
I'm wondering why a bonded interface can't provide protection
similar to what TIPC provides; is it because TIPC is more robust, or
something else? I'm not sure if it is right to change the
low-level design at this point for our product; I will talk
with my colleagues about this change and look for more details in the TIPC manual.

Regarding the OpenSAF part, do you think it is possible for the
SC not to force a reboot of the PLD in this case? After all, the
connection recovered quickly.

Regards,
Jianfeng

-----Original Message-----
From: Anders Widell [mailto:anders.wid...@ericsson.com
<mailto:anders.wid...@ericsson.com>]
Sent: Thursday, March 8, 2018 8:38 PM
To: Jianfeng Dong <jd...@juniper.net <mailto:jd...@juniper.net>>;
opensaf-users@lists.sourceforge.net
<mailto:opensaf-users@lists.sourceforge.net>
Subject: Re: [users] Payload card reboot due to a short time
network break

Hi!

Are you running TIPC on a bonded interface? I wouldn't recommend this.
Instead, you should run TIPC on the raw Ethernet interfaces and
let TIPC handle the link fail-over in case of a failure in one of
them. TIPC should be able to do this without ever losing the
connectivity between the nodes.

regards,

Anders Widell

On 03/08/2018 10:43 AM, Jianfeng Dong wrote:
> Hi,
>
> Several days ago we got a payload card reboot issue in customer
field, a PLD lost connection with SC for a little while(about 10
seconds), then SC forced the PLD to reboot even though the PLD was
going into “SC Absent mode”.
>
> System summary:
> our product is a system with 2 SC boards and at most 14 PLD
cards, running OpenSAF 5.1.0 with the feature “SC Absent Mode”
enabled, and SC connect with PLD via Ethernet and TIPC.
>
> Issue course:
> 1. PLD’s internal network went down for a hardware/driver
problem, but it recovered quickly in 2 seconds.
>
> 2018-02-16T17:55:58.343287+00:00 pld0114 kernel: bonding: bond0:
link
> status definitely down for interface eth0, disabling it
> 2018-02-16T17:56:00.743201+00:00 pld0114 kernel: bonding: bond0:
link status up for interface eth0, enabling it in 6 ms.
>
> 2. 10 seconds later TIPC still broke even though the network got
recovered.
>
> 2018-02-16T17:56:10.050386+00:00 pld0114 kernel: tipc: Resetting
link
> <1.1.14:bond0-1.1.16:eth2>, peer not responding
> 2018-02-16T17:56:10.050428+00:00 pld0114 kernel: tipc: Lost link
> <1.1.14:bond0-1.1.16:eth2> on network plane A

Re: [users] Payload card reboot due to a short time network break

2018-03-08 Thread Anders Widell


Hi!

Are you running TIPC on a bonded interface? I wouldn't recommend this. 
Instead, you should run TIPC on the raw Ethernet interfaces and let TIPC 
handle the link fail-over in case of a failure in one of them. TIPC 
should be able to do this without ever losing the connectivity between 
the nodes.


regards,

Anders Widell


On 03/08/2018 10:43 AM, Jianfeng Dong wrote:

Hi,

Several days ago we hit a payload card reboot issue at a customer site: a PLD 
lost its connection to the SC for a short while (about 10 seconds), and then the SC forced 
the PLD to reboot even though the PLD was going into “SC Absent mode”.

System summary:
Our product is a system with 2 SC boards and up to 14 PLD cards, running 
OpenSAF 5.1.0 with the “SC Absent Mode” feature enabled; the SCs connect to the 
PLDs via Ethernet and TIPC.

Issue course:
1. The PLD’s internal network went down due to a hardware/driver problem, but it 
recovered quickly, within about 2 seconds.

2018-02-16T17:55:58.343287+00:00 pld0114 kernel: bonding: bond0: link status definitely down for interface eth0, disabling it
2018-02-16T17:56:00.743201+00:00 pld0114 kernel: bonding: bond0: link status up for interface eth0, enabling it in 6 ms.

2. About 10 seconds later, the TIPC link still broke even though the network had recovered.

2018-02-16T17:56:10.050386+00:00 pld0114 kernel: tipc: Resetting link <1.1.14:bond0-1.1.16:eth2>, peer not responding
2018-02-16T17:56:10.050428+00:00 pld0114 kernel: tipc: Lost link <1.1.14:bond0-1.1.16:eth2> on network plane A
2018-02-16T17:56:10.050440+00:00 pld0114 kernel: tipc: Lost contact with <1.1.16>

3. The SC detected that the PLD had left the cluster.

2018-02-16T17:56:10.050704+00:00 scm2 osafimmd[3095]: NO MDS event from svc_id 25 (change:4, dest:296935520731140)
2018-02-16T17:56:10.052770+00:00 scm2 osafclmd[3302]: NO Node 69135 went down. Not sending track callback for agents on that node
2018-02-16T17:56:10.054411+00:00 scm2 osafimmnd[3106]: NO Global discard node received for nodeId:10e0f pid:3516
2018-02-16T17:56:10.054505+00:00 scm2 osafimmnd[3106]: NO Implementer disconnected 15 <0, 10e0f(down)> (MsgQueueService69135)
2018-02-16T17:56:10.055158+00:00 scm2 osafamfd[3312]: NO Node 'PLD0114' left the cluster

4. About one second later, the TIPC link was re-established as well.

2018-02-16T17:56:11.054553+00:00 pld0114 kernel: tipc: Established link <1.1.14:bond0-1.1.16:eth2> on network plane A

5. However, the PLD was still affected by the network issue and was trying to go 
into ‘SC Absent Mode’.

2018-02-16T17:56:11.057260+00:00 pld0114 osafamfnd[3626]: NO AVD NEW_ACTIVE, adest:1
2018-02-16T17:56:11.057407+00:00 pld0114 osafamfnd[3626]: NO Sending node up due to NCSMDS_NEW_ACTIVE
2018-02-16T17:56:11.057684+00:00 pld0114 osafamfnd[3626]: NO 19 SISU states sent
2018-02-16T17:56:11.057715+00:00 pld0114 osafamfnd[3626]: NO 22 SU states sent
2018-02-16T17:56:11.057775+00:00 pld0114 osafimmnd[3516]: NO Sleep done registering IMMND with MDS
2018-02-16T17:56:11.058243+00:00 pld0114 osafmsgnd[3665]: ER saClmDispatch Failed with error 9
2018-02-16T17:56:11.058283+00:00 pld0114 osafckptnd[3697]: NO Bad CLM handle. Reinitializing.
2018-02-16T17:56:11.059054+00:00 pld0114 osafimmnd[3516]: NO SUCCESS IN REGISTERING IMMND WITH MDS
2018-02-16T17:56:11.059116+00:00 pld0114 osafimmnd[3516]: NO Re-introduce-me highestProcessed:26209 highestReceived:26209
2018-02-16T17:56:11.059699+00:00 pld0114 osafimmnd[3516]: NO IMMD service is UP ... ScAbsenseAllowed?:31536 introduced?:2
2018-02-16T17:56:11.059932+00:00 pld0114 osafimmnd[3516]: NO MDS: mds_register_callback: dest 10e0fb03c0010 already exist
2018-02-16T17:56:11.060297+00:00 pld0114 osafimmnd[3516]: NO Re-introduce-me highestProcessed:26209 highestReceived:26209
2018-02-16T17:56:11.062053+00:00 pld0114 osafamfnd[3626]: NO 25 CSICOMP states synced
2018-02-16T17:56:11.062102+00:00 pld0114 osafamfnd[3626]: NO 28 SU states sent
2018-02-16T17:56:11.064418+00:00 pld0114 osafimmnd[3516]: ER MESSAGE:26438 OUT OF ORDER my highest processed:26209 - exiting
2018-02-16T17:56:11.160121+00:00 pld0114 osafckptnd[3697]: NO CLM selection object was updated. (12)
2018-02-16T17:56:11.166764+00:00 pld0114 osafamfnd[3626]: NO saClmDispatch BAD_HANDLE
2018-02-16T17:56:11.167030+00:00 pld0114 osafamfnd[3626]: NO 'safSu=PLD0114,safSg=NoRed,safApp=OpenSAF' component restart probation timer started (timeout: 600 ns)
2018-02-16T17:56:11.167102+00:00 pld0114 osafamfnd[3626]: NO Restarting a component of 'safSu=PLD0114,safSg=NoRed,safApp=OpenSAF' (comp restart count: 1)
2018-02-16T17:56:11.167135+00:00 pld0114 osafamfnd[3626]: NO 'safComp=IMMND,safSu=PLD0114,safSg=NoRed,safApp=OpenSAF' faulted due to 'avaDown' : Recovery is 'componentRestart'

6. The SC received messages from the PLD, and then it forced the PLD to reboot (due to 
the node sync timeout?).

2018-02-16T17:56:11.058121+00:00 scm2 osafimmd[3095]: NO MDS event from svc_id 25 (change:3, dest:296935520731140)
2018-02-16T17:56:11.058515
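The timestamps in the log excerpts above make it easy to quantify the sequence of events. Below is a small Python sketch that computes each event's offset from the first bonding message. The event labels are shorthand added for this sketch; the reboot-order timestamp is taken from the scm2 osafamfd line quoted elsewhere in this thread ("Sending node reboot order ... due to late node_up_msg after cluster startup timeout").

```python
from datetime import datetime
from typing import Iterator, List, Tuple

# (label, syslog timestamp) pairs copied from the PLD0114 and scm2 logs.
EVENTS: List[Tuple[str, str]] = [
    ("bond0: eth0 down", "2018-02-16T17:55:58.343287+00:00"),
    ("bond0: eth0 up", "2018-02-16T17:56:00.743201+00:00"),
    ("tipc: link reset", "2018-02-16T17:56:10.050386+00:00"),
    ("amfd: PLD0114 left cluster", "2018-02-16T17:56:10.055158+00:00"),
    ("tipc: link re-established", "2018-02-16T17:56:11.054553+00:00"),
    ("amfd: reboot order sent", "2018-02-16T17:56:41.172791+00:00"),
]


def timeline(events: List[Tuple[str, str]]) -> Iterator[Tuple[str, float]]:
    """Yield (label, seconds elapsed since the first event)."""
    t0 = datetime.fromisoformat(events[0][1])
    for label, stamp in events:
        yield label, (datetime.fromisoformat(stamp) - t0).total_seconds()


if __name__ == "__main__":
    for label, offset in timeline(EVENTS):
        print(f"{offset:9.3f} s  {label}")
```

The numbers show that the eth0 outage lasted about 2.4 s, while TIPC reset the link about 11.7 s after the outage began, roughly 9.3 s after bonding had already reported the interface up again. The reboot order followed about 30 s after the TIPC link was re-established.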

Re: [users] Errors in running immxml-configure with OpenSaf 5.3 in a Ubuntu VM

2018-03-08 Thread Anders Widell

Just a wild guess: maybe your Linux distribution installs Python version 
3 as the default? We haven't fully converted our Python code to be 
compatible with version 3 yet (it ought to be done, though). Could you 
run 'python --version' to check your default version? If it is version 
3, you probably need to install version 2 and use it to run the OpenSAF 
Python code.
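The TypeError that immxml-merge reports in the follow-up message below is a typical Python 2 to 3 difference. In Python 3, `xml.dom.minidom`'s `toxml(encoding)` returns `bytes`, and `bytes.replace()` rejects `str` arguments. Below is a minimal sketch of the usual fix, decoding before any text substitution. It assumes the caller only needs text at that point; it is not the patch that was eventually applied to immxml-merge, and `serialize_element` is a hypothetical name.

```python
from __future__ import print_function  # harmless on Python 3; allows end= on 2.7

from xml.dom import minidom


def serialize_element(element, encoding="utf-8"):
    """Serialize a DOM element to text on both Python 2.7 and Python 3.

    With an encoding argument, toxml() returns bytes on Python 3 (a byte
    string on 2.7), so decode it before doing str-based edits.
    """
    raw = element.toxml(encoding)
    text = raw.decode(encoding)
    return text + "\n"


if __name__ == "__main__":
    elem = minidom.parseString("<root><child/></root>").documentElement
    raw = elem.toxml("utf-8")
    try:
        raw.replace("/>", ">")  # str arguments on a bytes object
    except TypeError as err:
        print("Python 3 fails here:", err)
    print(serialize_element(elem), end="")
```

Running the OpenSAF scripts with Python 2, as suggested above, avoids the problem without touching the code.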


OpenSAF supports running in virtual machines.

regards,

Anders Widell

On 03/07/2018 09:59 PM, Feng Xie wrote:


Hi Dheeraj, Gary and Anders,

   Thanks a lot for the quick response! With your help, I was able 
to solve the original issue of locating the shared libraries. The 
current issue I am facing now is with immxml-configure, which fails 
in File "./immxml-merge", line 370, in save_result, at the statement 
self.imm_content_element.toxml(encoding).replace("/>", ">") + "\n"). 
As a result, IMMD can’t be started. My debug procedure and detailed 
error messages are attached at the end for your reference. I am stuck 
at this step. Any help will be highly appreciated!


   Let me provide some background information about my current 
effort so that you can better understand my situation:


 1. I am trying to use opensaf for core middleware services like
messaging (IPC), log, timer, etc. The target environment is a
single-node multicore (ARM) Linux embedded system. If opensaf
works in a single-node environment, we may extend to a cluster
environment in the future.

 2. I created a virtual machine using VMware on an x86 Dell PC with
Ubuntu Linux and installed the latest OpenSAF tarball
(opensaf-5.18.02.tar.gz) in the Ubuntu Linux VM (Linux version
4.4.0-116-generic (buildd@lgw01-amd64-021) (gcc version 5.4.0
20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.9)));

 3. As part of proof of concept, my goal is to start opensaf related
daemons in such single-node VM environment and run several sample
programs.

 4. The installation/configuration procedure I follow is exactly as
Anders suggested:
https://sourceforge.net/p/opensaf/wiki/OpenSAF%20as%20an%20application/.
I did not try any extra advanced configuration yet. Thus, I have
NOT changed nid.conf. The underlying transport is still TCP. The
OPENSAF_GROUP/USER are still “opensaf”. The node_type is
“controller” and the node_name is “feng-opensaf-2”.

   By the way, from the above website, I saw a statement like 
“Note no sudo or virtualization!”. Does it mean that OpenSAF can’t be 
run in a VM environment? Thanks again for your help!


Feng

P.S.

Procedure and error messages:

fxie@feng-opensaf-2:~/local/share/opensaf/immxml$ sudo ./immxml-configure

error: immxml-merge SC templates failed. Aborting script! exitCode: 1

fxie@feng-opensaf-2:~/local/share/opensaf/immxml$

I was able to locate a log file under /tmp/immxml_configure.LFwfeK:

root@feng-opensaf-2:/tmp/immxml_configure.LFwfeK# ls

immxml-configure.log intermediatefiles  nodes  templatedir

root@feng-opensaf-2:/tmp/immxml_configure.LFwfeK# vi immxml-configure.log

/* content from the above log file is highlighted in red below */

encoding in first source xml document: utf-8
Traceback (most recent call last):
  File "./immxml-merge", line 611, in 
    main(sys.argv[1:])
  File "./immxml-merge", line 603, in main
    merged_doc.save_result()
  File "./immxml-merge", line 370, in save_result
    self.imm_content_element.toxml(encoding).replace("/>", ">") + "\n")
TypeError: a bytes-like object is required, not 'str'

 I used “immxml-clustersize -s 1” to generate the following 
nodes.cfg:


SC SC-1 SC-1

and I replaced the third column with “feng-opensaf-2”, which I got 
from “hostname -s”. In fact, I also tried “immxml-clustersize -s 2” 
and replaced the third column with “feng-opensaf-2”. I got the same error.


If I started running opensafd, it would be stuck. Using “journalctl 
-xe” revealed that immd could not be started:


Mar 07 14:57:40 feng-opensaf-2 osafimmd[71940]: WA IMMND coordinator at 7f01 apparently crashed => electing new coord
Mar 07 14:57:40 feng-opensaf-2 osafimmd[71940]: ER Failed to find candidate for new IMMND coordinator (ScAbsenceAllowed:0 RulingEpoch:1
Mar 07 14:57:40 feng-opensaf-2 osafimmd[71940]: ER Active IMMD has to restart the IMMSv. All IMMNDs will restart
Mar 07 14:57:40 feng-opensaf-2 osafimmd[71940]: NO Cluster failed to load => IMMDs will not exit.
Mar 07 14:57:40 feng-opensaf-2 osafclmna[71906]: exiting for shutdown
Mar 07 14:57:40 feng-opensaf-2 osaffmd[71928]: exiting for shutdown
Mar 07 14:57:40 feng-opensaf-2 osafimmd[71940]: exiting for shutdown
Mar 07 14:57:41 feng-opensaf-2 osafrded[71917]: exiting for shutdown
Mar 07 14:57:41 feng-opensaf-2 osaftransportd[71899]: exiting for shutdown
Mar 07 14:57:41 feng-opensaf-2 opensafd[72383]: Starting OpenSAF failed

*From:*Dheeroj Ram <

Re: [users] Errors in running OpenSaf 5.3 in a Ubuntu VM

2018-03-06 Thread Anders Widell

To quickly get started with OpenSAF, you can also try the instructions 
on this wiki page, which explains how to install and run a single-node 
instance of OpenSAF in your own home directory:


https://sourceforge.net/p/opensaf/wiki/OpenSAF%20as%20an%20application/

There is also another wiki page explaining how to build and start a 
virtualized multi-node cluster using User-mode Linux:


https://sourceforge.net/p/opensaf/wiki/OpenSAF%20quick-start%20guide%20%28simulated%20cluster%29/

regards,

Anders Widell


On 03/06/2018 01:01 AM, Gary Lee wrote:

Hi

Perhaps you just need to run ldconfig.
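Some background on why ldconfig can help: the dynamic loader resolves sonames such as libopensaf_core.so.0 through LD_LIBRARY_PATH, the cache that ldconfig rebuilds (/etc/ld.so.cache), and the default system directories. A variable exported from .bashrc is only seen by shells that read .bashrc; a service started by systemd or an init script does not inherit it. As a minimal sketch, the C helper below asks the loader whether a soname can be resolved in the current environment. The name `library_resolvable` is hypothetical, and on glibc older than 2.34 the program must be linked with -ldl.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: asks the dynamic loader whether the library with
 * the given soname (e.g. "libopensaf_core.so.0") can be found and loaded
 * in the current environment. The search covers LD_LIBRARY_PATH, the
 * cache rebuilt by ldconfig (/etc/ld.so.cache) and the default system
 * directories. */
static bool library_resolvable(const char *soname) {
  assert(soname != NULL);
  void *handle = dlopen(soname, RTLD_NOW | RTLD_LOCAL);
  if (handle == NULL) {
    /* dlerror() gives wording similar to the journal entry, e.g.
     * "...: cannot open shared object file: No such file or directory". */
    fprintf(stderr, "%s\n", dlerror());
    return false;
  }
  dlclose(handle);
  return true;
}
```

Calling `library_resolvable("libopensaf_core.so.0")` from a program started the same way as opensafd, before and after running ldconfig as root, would show whether the loader cache was the missing piece.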

Gary

On 06/03/18 08:19, Feng Xie wrote:

Hi,

  I am new to OpenSAF. I encountered an error when running the 
latest OpenSAF software in an Ubuntu Linux VM. I would appreciate it 
if somebody could provide some hints on this specific issue, and some 
references on running OpenSAF in general. Thanks a lot in advance!



   1.  Procedures and errors encountered

  *   Download the latest OpenSAF package opensaf-5.18.02.tar.gz 
in an Ubuntu Linux VM (Linux version 4.4.0-116-generic 
(buildd@lgw01-amd64-021) (gcc version 5.4.0 20160609 (Ubuntu 
5.4.0-6ubuntu1~16.04.9));
  *   Use "./configure -enable-tipc", "make" and "make install" 
to install OpenSAF;
  *   Use "/etc/init.d/opensafd start";
  *   Errors (output from "journalctl -xe"):


Mar 05 12:46:33 feng-opensaf-2 opensafd[57351]: Starting OpenSAF Services(5.18.02 - 2ed303919c3f0f36859028f47caf1498e882f45a) (Using TCP)
Mar 05 12:46:33 feng-opensaf-2 opensafd[57330]: Starting OpenSAF Services (Using TCP):/usr/local/lib/opensaf/opensafd: error while loading shared libraries: libopensaf_core.so.0: cannot open shared object file: No such file or directory
Mar 05 12:46:33 feng-opensaf-2 opensafd[57330]:  *
Mar 05 12:46:33 feng-opensaf-2 opensafd[57675]: Starting OpenSAF failed
Mar 05 12:46:33 feng-opensaf-2 systemd[1]: opensafd.service: Control process exited, code=exited status=127
Mar 05 12:46:33 feng-opensaf-2 systemd[1]: Failed to start OpenSAF daemon.

-- Subject: Unit opensafd.service has failed


   2.  I checked /usr/local/lib with "ls -l /usr/local/lib" and found 
libopensaf_core.so.0:

-rwxr-xr-x 1 root root  974 Mar  2 16:29 libopensaf_core.la
lrwxrwxrwx 1 root root   24 Mar  2 16:29 libopensaf_core.so -> libopensaf_core.so.0.2.0
lrwxrwxrwx 1 root root   24 Mar  2 16:29 libopensaf_core.so.0 -> libopensaf_core.so.0.2.0
-rwxr-xr-x 1 root root  2816096 Mar  2 16:29 libopensaf_core.so.0.2.0





   3.  Then I added "/usr/local/lib" to my $LD_LIBRARY_PATH in the 
.bashrc file and ran "source ./bashrc":


root@opensaf-2:/home/xyz# echo $LD_LIBRARY_PATH
/usr/local/lib



   4.  I reran OpenSAF with "/etc/init.d/opensafd start" and saw the 
same error.



Feng
-- 


Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
___
Opensaf-users mailing list
Opensaf-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-users









[users] Announcement of the OpenSAF 5.18.02 release

2018-02-02 Thread Anders Widell

The OpenSAF community is pleased to announce the availability of the 
OpenSAF 5.18.02 release. The source code for OpenSAF 5.18.02 and the 
corresponding documentation can be downloaded using the following links:


http://sourceforge.net/projects/opensaf/files/releases/opensaf-5.18.02.tar.gz/download
http://sourceforge.net/projects/opensaf/files/docs/opensaf-documentation-5.18.02.tar.gz/download

For a complete list of new features in this release, please refer to the 
NEWS at the wiki:


https://sourceforge.net/p/opensaf/wiki/NEWS-5.18.02/

See the ChangeLog for a full list of changes in this release:

https://sourceforge.net/p/opensaf/wiki/ChangeLog-5.18.02/

Note that starting from the August 2017 release, we are using a new 
version numbering scheme for OpenSAF. The components in the OpenSAF 
version number 5.18.02 represent the major release (5), followed by the 
year (18) and month (02) when the release was made. This change was made 
as a step towards introducing continuous delivery in the OpenSAF project.


Thank you for your continued interest in OpenSAF and to everyone who has 
contributed to this release.


regards,
Anders Widell



Re: [users] Documentation for trace mask and log levels

2018-01-03 Thread Anders Widell


See replies inline, marked AndersW>

regards,

Anders Widell


On 12/13/2017 09:34 PM, Carroll, James R wrote:

Hi All,

We are using OpenSAF 5.2, and have some questions on the usage of the trace 
mask, and the log levels.

   1.  First off, is there any documentation that explains their usage in 
detail?  Can they be used in parallel, are they mutually exclusive, etc.  Are 
there recommended usages and settings for both?
Trace mask and log level are two separate things and can be used 
independently of each other. I think the recommended setting for normal 
operation would be to have trace disabled and log level set to notice.



   2.  Looking at a sample config file, such as amfd.conf, there is a line that states 
"uncomment to enable trace".  The trace mask is all ones (0x).  What 
are my options for the trace mask, is it only ON and OFF, or can I gradually increase 
levels, say from 0x, to 0x, to 0x, 0X), and other values 
in-between?  What do the different nibbles of the mask represent?
It is possible to enable subsets of the log messages by setting only 
some of the bits in the bitmask. However, the categories are not used 
very consistently, so this is of limited use. Perhaps the only ones used 
consistently are TRACE_ENTER and TRACE_LEAVE, which are logged at the 
start and end of a function, respectively. The trace levels are listed 
in logtrace.h:


enum logtrace_categories {
  CAT_LOG = 0,
  CAT_TRACE,
  CAT_TRACE1,
  CAT_TRACE2,
  CAT_TRACE3,
  CAT_TRACE4,
  CAT_TRACE5,
  CAT_TRACE6,
  CAT_TRACE7,
  CAT_TRACE8,
  CAT_TRACE_ENTER,
  CAT_TRACE_LEAVE,
  CAT_MAX
};

The corresponding bits are two to the power of the trace level.
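As a worked example of that rule, the sketch below reproduces the enum and derives the mask that would enable only the TRACE_ENTER/TRACE_LEAVE messages. It assumes the trace mask given in a service's .conf file is interpreted with exactly this bit layout; the `CATEGORY_BIT` macro and the two helper functions are illustrative names, not taken from logtrace.h.

```c
#include <assert.h>
#include <stdbool.h>

/* Reproduced from the logtrace.h excerpt above. */
enum logtrace_categories {
  CAT_LOG = 0,
  CAT_TRACE,
  CAT_TRACE1,
  CAT_TRACE2,
  CAT_TRACE3,
  CAT_TRACE4,
  CAT_TRACE5,
  CAT_TRACE6,
  CAT_TRACE7,
  CAT_TRACE8,
  CAT_TRACE_ENTER,
  CAT_TRACE_LEAVE,
  CAT_MAX
};

/* Illustrative macro (not from logtrace.h): bit for a category = 2^category. */
#define CATEGORY_BIT(cat) (1u << (unsigned)(cat))

/* True if `mask` enables messages of category `cat`. */
static bool category_enabled(unsigned mask, enum logtrace_categories cat) {
  assert(cat < CAT_MAX);
  return (mask & CATEGORY_BIT(cat)) != 0;
}

/* Mask enabling only function entry/exit tracing:
 * 2^10 | 2^11 = 0x400 | 0x800 = 0xC00. */
static unsigned enter_leave_mask(void) {
  return CATEGORY_BIT(CAT_TRACE_ENTER) | CATEGORY_BIT(CAT_TRACE_LEAVE);
}
```

Under the same assumption, an all-ones mask enables every category, because it has all of the low CAT_MAX bits (0xFFF) set.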


   3.  Looking at the same sample config file, there is a line that states 
"uncomment to enable info level level logging".  But perusing the code shows 
that there are at least 3 log levels (and maybe more?), LOG_INFO, LOG_DEBUG, LOG_NOTICE.  
  What do the different log levels do, and are they hierarchical, so that DEBUG implies 
INFO+DEBUG?
The log level is similar to the levels in syslog, so yes, they are 
cumulative: debug implies info and notice.
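To illustrate what "cumulative" means, here is syslog's own convention in C, which the reply says OpenSAF's levels resemble: priorities are small integers where a lower number is more severe, and a threshold admits everything at or above that severity. This shows syslog semantics only, not OpenSAF's implementation; `level_enabled` and `set_syslog_threshold` are hypothetical helpers, while `setlogmask`, `LOG_UPTO` and `LOG_MASK` come from the standard syslog.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <syslog.h>

/* Returns true if a message with priority `msg_prio` passes a log-level
 * threshold. In syslog, a lower number means more severe
 * (LOG_NOTICE = 5, LOG_INFO = 6, LOG_DEBUG = 7), so a "debug" threshold
 * also admits info and notice messages. */
static bool level_enabled(int threshold, int msg_prio) {
  assert(threshold >= LOG_EMERG && threshold <= LOG_DEBUG);
  return msg_prio <= threshold;
}

/* The same cumulative filter applied to this process's syslog() calls:
 * LOG_UPTO(p) sets the mask bits for every priority from LOG_EMERG up to
 * and including p. */
static void set_syslog_threshold(int threshold) {
  setlogmask(LOG_UPTO(threshold));
}
```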



Thanks.

Jim











Re: [users] Recommendations on How to handle "SA_AIS_ERR_TIMEOUT"

2017-12-12 Thread Anders Widell

In many cases, I think you can retry the operation if you get 
SA_AIS_ERR_TIMEOUT. But you must take into account the possibility that 
it already succeeded the first time, and think about what the 
consequences would be if you perform the same operation a second time. 
In your particular example with LOCK_INSTANTIATE, I don't think there is 
any problem. I am guessing that the only thing that would happen is that 
you get back an error code saying that the entity is already locked. 
Your code must then be prepared to get this error code and treat it as 
normal after a retry.


regards,

Anders Widell


On 12/05/2017 06:16 PM, Carroll, James R wrote:

Hi All,

We are using OpenSAF 5.2, and are seeing cases where we get the error code 
SA_AIS_ERR_TIMEOUT. According to the spec: "An implementation-dependent timeout 
occurred before the call could complete. It is unspecified whether the call 
succeeded or whether it did not."

From a design viewpoint, what are we supposed to do when something "might have 
succeeded"? Are there any practical recommendations or suggestions on how to handle 
this condition?

In our particular case, we are trying to do a node LOCK_INSTANTIATE operation.  
 There are 2 scenarios to consider:

   1.  If I assume the LOCK_I Failed, I can send the command again.  However, 
if the command actually succeeded, then sending the command again will result 
in an illegal state transition.  How do I determine if it succeeded or failed?
   2.  Is there any type of hierarchical association of the notifications? For 
example, if issuing the node LOCK_I command were to generate 100 notifications, 
can I assume that as long as I see the 
ADMINISTRATIVE_STATE_CHANGE_NOTIFICATION, that reports the node is LOCK_I, I 
can disregard the other 99 notifications (SERVICE_INSTANCE_UNASSIGNED, etc).  
In other words, does the receipt of the highest level notification imply that 
all the other lower ones have occurred?

Thanks.

Jim







[users] Announcement of the OpenSAF 5.17.11.1 release

2017-11-28 Thread Anders Widell

The OpenSAF community is pleased to announce the availability of the 
OpenSAF 5.17.11.1 release. This is a maintenance release that only 
contains bug-fixes and no new features. The source code for OpenSAF 
5.17.11.1 and the corresponding documentation can be downloaded using 
the following links:


http://sourceforge.net/projects/opensaf/files/releases/opensaf-5.17.11.1.tar.gz/download
http://sourceforge.net/projects/opensaf/files/docs/opensaf-documentation-5.17.11.1.tar.gz/download

See the ChangeLog for a full list of changes since the previous feature 
release (5.17.11):


https://sourceforge.net/p/opensaf/wiki/ChangeLog-5.17.11.1/

Note that starting from the August 2017 release, we are using a new 
version numbering scheme for OpenSAF. The components in the OpenSAF 
version number 5.17.11.1 represent the major release (5), followed by 
the year (17) and month (11) when the release was made. The last number 
(1) indicates that this is a bug-fix release which doesn't contain any 
new features.
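The numbering scheme is easy to decode mechanically. Below is a minimal C sketch, using a hypothetical `parse_osaf_version` helper, that splits a version string into major release, year, month and an optional bug-fix number. Versions from before the August 2017 change, such as 5.2, do not follow this scheme and are rejected.

```c
#include <assert.h>
#include <stdio.h>

/* Components of a version number in the post-August-2017 scheme. */
struct osaf_version {
  int major;  /* major release, e.g. 5 */
  int year;   /* two-digit release year, e.g. 17 */
  int month;  /* release month, 1-12 */
  int bugfix; /* 0 for a feature release, N for bug-fix release N */
};

/* Hypothetical helper: parses "MAJOR.YY.MM" or "MAJOR.YY.MM.N".
 * Returns 0 on success, -1 if the string does not follow the scheme. */
static int parse_osaf_version(const char *s, struct osaf_version *v) {
  assert(s != NULL && v != NULL);
  int used = 0;
  /* %d (not %i) so that a zero-padded month such as "02" is decimal. */
  if (sscanf(s, "%d.%d.%d%n", &v->major, &v->year, &v->month, &used) != 3)
    return -1;
  if (v->month < 1 || v->month > 12) return -1;
  v->bugfix = 0;
  const char *rest = s + used;
  if (*rest == '\0') return 0; /* feature release */
  int used2 = 0;
  if (sscanf(rest, ".%d%n", &v->bugfix, &used2) != 1 || rest[used2] != '\0')
    return -1;
  return 0;
}
```

For example, "5.18.02" yields major 5, year 18, month 2 and bug-fix 0, while "5.17.11.1" yields month 11 and bug-fix 1.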


Thank you for your continued interest in OpenSAF and to everyone who has 
contributed to this release.


regards,
Anders Widell



Re: [users] opensaf failed to start on Standby with error :Peer rde@a0a0f has standby state => possible fail over, waiting...

2017-11-23 Thread Anders Widell

It looks like the node with TIPC address 1.1.10 is a system controller. 
Is this expected? Please check that you have configured at most two 
nodes as system controllers (i.e. check the IMM configuration as well as 
the file /etc/opensaf/node_type). I don't remember in what version we 
implemented support for more than two system controllers, but I don't 
think it was in such an old version of OpenSAF.


regards,
Anders Widell

On 11/23/2017 09:12 AM, Ashish Kumar34 wrote:

Hi All,

I am new to OpenSAF. I am using OpenSAF 4.6.2 on Wind River Linux 4.3.

I am having difficulty starting OpenSAF on the standby controller.
I get the following logs in /var/log/messages:

Nov 22 17:14:00 localhost opensafd: Starting OpenSAF Services (Using TIPC)
Nov 22 17:14:00 localhost kernel: TIPC: Activated (version 1.7.7 compiled Sep  2 2016 22:58:00)
Nov 22 17:14:00 localhost kernel: NET: Registered protocol family 30
Nov 22 17:14:00 localhost kernel: TIPC: Started in single node mode
Nov 22 17:14:00 localhost kernel: TIPC: Started in single node mode
Nov 22 17:14:00 localhost kernel: TIPC: Started in network mode
Nov 22 17:14:00 localhost kernel: TIPC: Own node address <1.1.1>, network identity 1244
Nov 22 17:14:00 localhost kernel: TIPC: Enabled bearer , discovery domain <1.1.0>, priority 10
Nov 22 17:14:00 localhost kernel: TIPC: Established link <1.1.1:base-1.1.2:base> on network plane A
Nov 22 17:14:00 localhost kernel: TIPC: Established link <1.1.1:base-1.1.10:base> on network plane A
Nov 22 17:14:00 localhost kernel: TIPC: Established link <1.1.1:base-1.1.5:base> on network plane A
Nov 22 17:14:00 localhost kernel: TIPC: Established link <1.1.1:base-1.1.5:base> on network plane A
Nov 22 17:14:00 localhost osafrded[4966]: Started
Nov 22 17:14:00 localhost /etc/wrs-lsb/lsb_start_daemon: osafrded startup - OK
Nov 22 17:14:00 localhost osafrded[4966]: NO Peer rde@a0a0f has standby state => possible fail over, waiting...
Nov 22 17:14:00 localhost /etc/wrs-lsb/lsb_log_message:  - OK
Nov 22 17:14:01 localhost osafrded[4966]: NO Peer rde@a0a0f has standby state => possible fail over, waiting...
Nov 22 17:14:02 localhost osafrded[4966]: NO Peer rde@a0a0f has standby state => possible fail over, waiting...
Nov 22 17:14:03 localhost osafrded[4966]: NO Peer rde@a0a0f has standby state => possible fail over, waiting...
Nov 22 17:14:04 localhost osafrded[4966]: NO Peer rde@a0a0f has standby state => possible fail over, waiting...
Nov 22 17:14:05 localhost osafrded[4966]: NO Peer rde@a0a0f has standby state => possible fail over, waiting...
Nov 22 17:14:06 localhost osafrded[4966]: NO Peer rde@a0a0f has standby state => possible fail over, waiting...
Nov 22 17:14:07 localhost osafrded[4966]: NO Peer rde@a0a0f has standby state => possible fail over, waiting...
Nov 22 17:14:08 localhost osafrded[4966]: NO Peer rde@a0a0f has standby state => possible fail over, waiting...
Nov 22 17:14:09 localhost osafrded[4966]: NO Peer rde@a0a0f has standby state => possible fail over, waiting...
Nov 22 17:14:10 localhost osafrded[4966]: NO Peer rde@a0a0f has standby state => possible fail over, waiting...
Nov 22 17:14:11 localhost osafrded[4966]: NO Peer rde@a0a0f has standby state => possible fail over, waiting...


However, OpenSAF is working fine on the active controller.

Below is the opensafd status output from the active controller:

atcafs-n10s2:~# /etc/init.d/opensafd status
safSISU=safSu=n10s2\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed1,safApp=OpenSAF
 saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=n10s5\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed10,safApp=OpenSAF
 saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=SU-n10s5\,safSg=HenbGw-SG\,safApp=HenbGwApp_PL_n10s5,safSi=HenbGw,safApp=HenbGwApp_PL_n10s5
 saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=n10s10\,safSg=NoRed\,safApp=OpenSAF,safSi=NoRed2,safApp=OpenSAF
 saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=SU-n10s10\,safSg=HenbGw-SG\,safApp=HenbGwApp_PL_n10s10,safSi=HenbGw,safApp=HenbGwApp_PL_n10s10
 saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=SU-n10s2\,safSg=HenbGw-SG\,safApp=HenbGwApp,safSi=HenbGw,safApp=HenbGwApp
 saAmfSISUHAState=ACTIVE(1)
safSISU=safSu=n10s2\,safSg=2N\,safApp=OpenSAF,safSi=SC-2N,safApp=OpenSAF
 saAmfSISUHAState=ACTIVE(1)
atcafs-n10s2:~#

Please help. I am struggling to fix the problem.


Thanks in advance.

Regards,
Ashish




Disclaimer:  This message and the information contained herein is proprietary and 
confidential and subject to the Tech Mahindra policy statement, you may review the policy 
at http://www.techmahindra.com/Disclaimer.html 
<http://www.techmahindra.com/Disclaimer.html> externally 
http://tim.techmahindra.com/tim/disclaimer.html 

Re: [users] opensaf failed to start with error :MDS:MDTM:TCP DBSRsock unable to connect err :No such file or directory

2017-11-20 Thread Anders Widell


I assume you wish to use TIPC? Please check the following:

1) That you have passed the --enable-tipc option to ./configure when 
building OpenSAF.
2) That you have the following configuration in /etc/opensaf/nid.conf: 
export MDS_TRANSPORT=TIPC
3) That you don't have any stale /var/lib/opensaf/osaf_dtm_intra_server 
socket file (remove it if you find it).
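Check 3 can be done programmatically. A common way to tell a stale UNIX-domain socket file from a live one is to try connecting to it: if the file exists but nothing is listening, connect() fails with ECONNREFUSED. The sketch below assumes osaf_dtm_intra_server is a SOCK_STREAM UNIX-domain socket, which is an assumption about DTM's implementation; `check_unix_socket` and the status names are hypothetical. Note that "No such file or directory" in the log above is ENOENT from connect(), which corresponds to the SOCK_ABSENT case: no socket file existed at that path at all.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/un.h>
#include <unistd.h>

enum sock_status { SOCK_ABSENT, SOCK_LIVE, SOCK_STALE, SOCK_NOT_A_SOCKET, SOCK_ERROR };

/* Classifies a UNIX-domain stream socket path. If the file exists but no
 * process is listening on it, connect() fails with ECONNREFUSED: the file
 * is stale, left behind by a process that exited without removing it. */
static enum sock_status check_unix_socket(const char *path) {
  assert(path != NULL);
  struct stat st;
  if (stat(path, &st) != 0)
    return errno == ENOENT ? SOCK_ABSENT : SOCK_ERROR;
  if (!S_ISSOCK(st.st_mode)) return SOCK_NOT_A_SOCKET;

  struct sockaddr_un addr;
  memset(&addr, 0, sizeof addr);
  if (strlen(path) >= sizeof addr.sun_path) return SOCK_ERROR;
  addr.sun_family = AF_UNIX;
  strcpy(addr.sun_path, path);

  int fd = socket(AF_UNIX, SOCK_STREAM, 0);
  if (fd < 0) return SOCK_ERROR;
  enum sock_status status = SOCK_LIVE;
  if (connect(fd, (struct sockaddr *)&addr, sizeof addr) != 0)
    /* Any other error (e.g. a socket of a different type) is reported as
     * SOCK_ERROR, so a live socket is never misclassified as stale. */
    status = (errno == ECONNREFUSED) ? SOCK_STALE : SOCK_ERROR;
  close(fd);
  return status;
}
```

With OpenSAF stopped, a caller could run `check_unix_socket("/var/lib/opensaf/osaf_dtm_intra_server")` and call `unlink()` on the path only when the result is SOCK_STALE.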


regards,
Anders Widell

On 11/20/2017 03:31 PM, Ashish Kumar34 wrote:

Hi All,

I am new to OpenSAF. I built OpenSAF (opensaf-4.6.2) on Wind River Linux 4.3.
When I try to start it, I get the following errors in /var/log/messages:
Feb  9 01:53:14 localhost opensafd: Starting OpenSAF Services (Using TIPC)
Feb  9 01:53:14 localhost kernel: TIPC: Activated (version 1.7.7 compiled Sep  2 2016 22:58:00)
Feb  9 01:53:14 localhost kernel: NET: Registered protocol family 30
Feb  9 01:53:14 localhost kernel: TIPC: Started in single node mode
Feb  9 01:53:14 localhost kernel: TIPC: Started in network mode
Feb  9 01:53:14 localhost kernel: TIPC: Own node address <1.1.2>, network identity 1246
Feb  9 01:53:14 localhost kernel: TIPC: Enabled bearer , discovery domain <1.1.0>, priority 10
Feb  9 01:53:14 localhost osafrded[9881]: Started
Feb  9 01:53:14 localhost osafrded[9881]: MDS:MDTM:TCP DBSRsock unable to connect err :No such file or directory
Feb  9 01:53:14 localhost osafrded[9881]: ER ncs_core_agents_startup FAILED
Feb  9 01:53:14 localhost osafrded[9881]: Exiting...
Feb  9 01:53:14 localhost opensafd[9850]: ER Failed
DESC:RDE
Feb  9 01:53:14 localhost opensafd[9850]: ER Going for recovery
Feb  9 01:53:14 localhost opensafd[9850]: ER Trying To RESPAWN /usr/lib64/opensaf/clc-cli/osaf-rded attempt #1
Feb  9 01:53:14 localhost opensafd[9850]: ER Sending SIGKILL to RDE, pid=9870
Feb  9 01:53:14 localhost /etc/wrs-lsb/lsb_start_daemon: osafrded startup - OK
Feb  9 01:53:14 localhost /etc/wrs-lsb/lsb_killproc: osafrded shutdown - FAILED
Feb  9 01:53:14 localhost /etc/wrs-lsb/lsb_log_message:  - OK


I need your help, as I am struggling to find a solution to this problem.

Note: when compiled for a 32-bit target, OpenSAF starts and runs fine, 
but when built for 64-bit, the above errors occur.
Thanks in advance.

Thanks,
Ashish









[users] Announcement of the OpenSAF 5.17.11 release

2017-11-03 Thread Anders Widell

The OpenSAF community is pleased to announce the availability of the 
OpenSAF 5.17.11 release. The source code for OpenSAF 5.17.11 and the 
corresponding documentation can be downloaded using the following links:


http://sourceforge.net/projects/opensaf/files/releases/opensaf-5.17.11.tar.gz/download
http://sourceforge.net/projects/opensaf/files/docs/opensaf-documentation-5.17.11.tar.gz/download

For a complete list of new features in this release, please refer to the 
NEWS at the wiki:


https://sourceforge.net/p/opensaf/wiki/NEWS-5.17.11/

See the ChangeLog for a full list of changes in this release:

https://sourceforge.net/p/opensaf/wiki/ChangeLog-5.17.11/

Note that starting from the August 2017 release, we are using a new 
version numbering scheme for OpenSAF. The components in the OpenSAF 
version number 5.17.11 represent the major release (5), followed by the 
year (17) and month (11) when the release was made. This change was made 
as a step towards introducing continuous delivery in the OpenSAF project.


Thank you for your continued interest in OpenSAF and to everyone who has 
contributed to this release.


regards,
Anders Widell




Re: [users] notification vs model updates

2017-10-02 Thread Anders Widell


Hi!

This is expected, given the current OpenSAF design. The reason 
is that IMM and NTF use different communication protocols; IMM uses a 
form of group communication based on extended virtual synchrony, whereas 
NTF uses simple point-to-point communication.


To avoid this problem, you can use the IMM applier interface instead of 
NTF notifications.


regards,

Anders Widell


On 09/22/2017 03:24 PM, Carroll, James R wrote:

Hi,

We are using OpenSAF 5.2, and have observed the following behaviors:

1) We receive a Notification for an update

2) When we query the IMM for the updated information, the model has not been 
updated

3) If we wait, and query the IMM again, the model now has the correct 
information.

This happens infrequently, but has been observed a number of times.
Can you clarify this behavior?  Is this normal or expected?

Thanks.

Jim






Re: [users] Open-SAF 5.2 - question on 32-bit/64bit compatibility

2017-08-29 Thread Anders Widell

I have never heard of anyone trying this before. In theory it could 
work if you install both 32-bit and 64-bit versions of the OpenSAF 
libraries on the same node. However, I wouldn't be surprised if you 
encounter some problems when you do this.


regards,

Anders Widell


On 08/28/2017 10:51 PM, Carroll, James R wrote:

Hi,

We are running OpenSAF 5.2. We have been running it with 32-bit-only and 
64-bit-only applications.
A potential need was recently identified: we may need to support both 
32-bit and 64-bit applications on a single node.
Has this type of scenario existed previously?  Is there any history here?

If not, does anyone know if a 32-bit application can run on a 64-bit OpenSAF 
node?

Thanks.

Jim







Re: [users] MAX_PROCESSES for TCP errors

2017-08-18 Thread Anders Widell

FYI, ticket [#2240] has now been pushed to the develop branch and will 
be included in the October release. I wrote a new ticket [#2561] for 
implementing a similar change in the intra-node poll loop.


regards,

Anders Widell


On 07/12/2017 01:26 PM, Anders Widell wrote:
I discovered a similar limit for inter-node connections, effectively 
limiting the maximum cluster size to 100 nodes (or so):


https://sourceforge.net/p/opensaf/tickets/2240/

I made some progress refactoring DTM to use epoll instead of poll, so 
that the 100-node limit could be removed without sacrificing 
performance. This refactoring focused on the inter-node communication, 
but we could do the same also for intra-node communication.
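For readers unfamiliar with the change described above, here is a minimal Linux-only sketch of the epoll pattern (epoll_create1, epoll_ctl, epoll_wait) as opposed to passing a growing descriptor array to poll() on every iteration. It only illustrates the system calls; it is not DTM's code, and `watch_all`, `wait_ready` and MAX_EVENTS are hypothetical names.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

#define MAX_EVENTS 16

/* Creates an epoll instance watching every descriptor in `fds` for input.
 * Registration happens once; the kernel keeps the interest set, so later
 * waits do not re-submit the whole descriptor array as poll() does.
 * Returns the epoll descriptor, or -1 on error. */
static int watch_all(const int *fds, int nfds) {
  assert(fds != NULL && nfds > 0);
  int ep = epoll_create1(0);
  if (ep < 0) return -1;
  for (int i = 0; i < nfds; ++i) {
    struct epoll_event ev = {.events = EPOLLIN, .data.fd = fds[i]};
    if (epoll_ctl(ep, EPOLL_CTL_ADD, fds[i], &ev) != 0) {
      close(ep);
      return -1;
    }
  }
  return ep;
}

/* One event-loop iteration: waits up to `timeout_ms` (0 = do not block)
 * and copies ready descriptors into `ready_fds`, which must have room for
 * MAX_EVENTS entries. Returns the number of ready descriptors, or -1. */
static int wait_ready(int ep, int *ready_fds, int timeout_ms) {
  struct epoll_event events[MAX_EVENTS];
  int n = epoll_wait(ep, events, MAX_EVENTS, timeout_ms);
  for (int i = 0; i < n; ++i) ready_fds[i] = events[i].data.fd;
  return n;
}
```

In a real event loop the epoll instance is created once and `wait_ready` is called repeatedly; the work per wait then depends on how many descriptors are ready rather than how many are watched.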


regards,

Anders Widell


On 07/11/2017 11:16 PM, Carroll, James R wrote:

All,

I am using opensaf 5.0, configured to use TCP.  We recently 
encountered an issue with the DTM_INTRANODE_MAX_PROCESSES=100.
We were unaware of this seemingly random restriction, and we just 
happened to be starting a large number of processes, easily exceeding 
the default value of 100.  It took quite some time to figure out that 
we had exceeded this limit. Setting the value to a larger value 
resolved the issue.


The error messages generated by exceeding this number were not very 
helpful, often just indicating MDS messaging errors.
It would have been much more helpful to have a message of the form 
"MAX Processes exceeded, cannot start remaining processes".
Instead, the obscure error messages led us down a path where we 
thought processes were getting starved out, etc.


Is there any type of open ticket regarding this limitation? Can one 
be added?


Thanks.

Jim















Re: [users] Issue with applications started as root user

2017-07-25 Thread Anders Widell


See my comments below, marked AndersW>

regards,

Anders Widell


On 07/25/2017 02:08 PM, William R Elliott wrote:

Hi Anders,
Thanks for the response. I just want to make sure that I completely 
understand what you are saying: there's no way to make OpenSAF start an 
application as a different user? That is, OpenSAF will always start an application 
as root, and the developer must change the application code to run as another 
user?


AndersW> As far as I know, there is currently no support in OpenSAF for 
specifying what user ID and/or group ID a component shall be started with.



The reason I'm asking is that we have an instantiation script that actually starts our 
applications and I was hoping that by using the "su" command to change to the 
correct user and group in that script, this would solve my problem.
AndersW> Yes, you should be able to use something like "su -c 
your_application user_id" to launch your application from the CLC 
script. Be aware that the su command probably does a lot more than just 
setting the group ID and user ID, though.


thanks
-Original Message-
From: Anders Widell [mailto:anders.wid...@ericsson.com]
Sent: Monday, July 24, 2017 8:56 AM
To: William R Elliott; opensaf-users@lists.sourceforge.net
Subject: Re: [users] Issue with applications started as root user

I think I recall that this behaviour was changed so that applications can 
choose for themselves what user ID and group ID to run with.
OPENSAF_USER and OPENSAF_GROUP specify what user ID and group ID the OpenSAF 
processes themselves shall run with, which may be different from the IDs the 
applications shall run with.

So the application will be started as root:root and must call setgid() and 
setuid() to change its user-id and group-id.
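A component started as root:root can switch identity itself along the lines described above. The sketch below is a common POSIX privilege-drop pattern, not OpenSAF code; `drop_privileges` is a hypothetical helper, and how the application chooses the target user (for example from its own configuration) is up to the application. It would typically be called early in the component's main().

```c
#define _DEFAULT_SOURCE /* for initgroups() on glibc */
#include <assert.h>
#include <grp.h>
#include <pwd.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: permanently switches a process started as root to
 * `username`'s user ID, primary group and supplementary groups.
 * Returns 0 on success, -1 on error. */
static int drop_privileges(const char *username) {
  assert(username != NULL);
  struct passwd *pw = getpwnam(username);
  if (pw == NULL) return -1; /* unknown user */
  /* Copy the IDs: pw points into static storage that later calls may reuse. */
  uid_t uid = pw->pw_uid;
  gid_t gid = pw->pw_gid;
  /* Groups first: once setuid() gives up root, these calls would fail. */
  if (initgroups(username, gid) != 0) return -1;
  if (setgid(gid) != 0) return -1;
  if (setuid(uid) != 0) return -1;
  /* Sanity check: a non-root process must not be able to regain root. */
  if (uid != 0 && setuid(0) == 0) return -1;
  return 0;
}
```

The ordering matters for the reason given in the comment: setgid() and initgroups() require root privileges, so calling setuid() first would leave the process running with root's groups.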

regards,

Anders Widell


On 07/20/2017 11:50 PM, William R Elliott wrote:

Hi All,
I have recently upgraded from OpenSAF version 4.4.0 to 5.1.0. In 4.4.0, when I 
set the OPENSAF_GROUP and OPENSAF_USER variables in the nid.conf file and 
unlocked a service unit, the applications in each component were started as the 
OPENSAF_USER, which is what I needed. However, in 5.1.0 the applications are now 
being started as the root user instead of the OPENSAF_USER in nid.conf.

I’ve read the config README file, as well as other README files, but I don’t 
see any references concerning this problem, or what has changed in 5.1.0 that 
would exhibit this kind of behavior.  I’ve read through the opensaf documents 
and I still have not found anything concerning this scenario.

I have verified the following:

1)  OPENSAF_USER and OPENSAF_GROUP variables are set correctly in nid.conf 
file

2)  The user and group are set correctly on the instantiation scripts

3)  opensaf was not built with: CPPFLAGS=-DRUNASROOT

I’ve even tried changing the amfnd main.cc file main function to directly call 
daemonize instead of daemonize_as_user to ensure osafamfnd started as the 
OPENSAF_USER, but for some reason osafamfnd hung and the opensaf services did 
not come up.

I could be missing something simple here, but I can’t think what else to try.  
I would appreciate any help with this problem.

Thanks





The information transmitted herein is intended only for the person or entity to 
which it is addressed and may contain confidential, proprietary and/or 
privileged material. Any review, retransmission, dissemination or other use of, 
or taking of any action in reliance upon, this information by persons or 
entities other than the intended recipient is prohibited. If you received this 
in error, please contact the sender and delete the material from any computer.










Re: [users] MAX_PROCESSES for TCP errors

2017-07-12 Thread Anders Widell

I discovered a similar limit for inter-node connections, effectively 
limiting the maximum cluster size to 100 nodes (or so):


https://sourceforge.net/p/opensaf/tickets/2240/

I made some progress refactoring DTM to use epoll instead of poll, so 
that the 100-node limit could be removed without sacrificing 
performance. This refactoring focused on the inter-node communication, 
but we could do the same also for intra-node communication.


regards,

Anders Widell


On 07/11/2017 11:16 PM, Carroll, James R wrote:

All,

I am using opensaf 5.0, configured to use TCP.  We recently encountered an 
issue with the DTM_INTRANODE_MAX_PROCESSES=100.
We were unaware of this seemingly random restriction, and we just happened to 
be starting a large number of processes, easily exceeding the default value of 
100.  It took quite some time to figure out that we had exceeded this limit. 
Setting the value to a larger value resolved the issue.

The error messages generated by exceeding this number were not very helpful, 
often just indicating MDS messaging errors.
It would have been much more helpful to have a message of the form "MAX Processes 
exceeded, cannot start remaining processes".
Instead, the obscure error messages led us down a path where we thought 
processes were getting starved out, etc,

Is there any type of open ticket regarding this limitation? Can one be added?

Thanks.

Jim
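
The kind of explicit error requested above can be sketched as a capacity check that names the limit when it is reached, instead of letting the failure surface later as a generic messaging error. This is a hypothetical illustration, not the DTM implementation: `proc_table`, `proc_table_init` and `proc_table_register` are invented names, and only the DTM_INTRANODE_MAX_PROCESSES parameter name comes from the thread.

```c
#include <stdio.h>

#define PROC_TABLE_CAPACITY 1024 /* hypothetical hard upper bound */

/* Hypothetical registry of local processes with a configurable limit,
 * similar in spirit to DTM_INTRANODE_MAX_PROCESSES. */
struct proc_table {
  int max;   /* configured limit */
  int count; /* processes currently registered */
  int pids[PROC_TABLE_CAPACITY];
};

/* Initialize the table, clamping the configured limit to the capacity. */
void proc_table_init(struct proc_table *t, int max) {
  if (max < 0) max = 0;
  if (max > PROC_TABLE_CAPACITY) max = PROC_TABLE_CAPACITY;
  t->max = max;
  t->count = 0;
}

/* Register a process. Returns 0 on success, or -1 when the limit is
 * reached, after logging a message that names the limit to raise. */
int proc_table_register(struct proc_table *t, int pid) {
  if (t->count >= t->max) {
    fprintf(stderr,
            "MAX processes (%d) exceeded, cannot register pid %d; "
            "consider raising DTM_INTRANODE_MAX_PROCESSES\n",
            t->max, pid);
    return -1;
  }
  t->pids[t->count++] = pid;
  return 0;
}
```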

Re: [users] OpenSAF 5.0, clarification of amf_objects.xml, preferred SU values

2017-05-22 Thread Anders Widell

I think the saAmfSGNumPrefAssignedSUs is unused for the 2N redundancy 
model, so this configuration attribute is ignored (AMF experts: correct 
me if I am wrong here). The saAmfSGNumPrefInserviceSUs specifies how 
many SUs you wish to have instantiated, see page 122 of the AMF 
specification http://opensaf.sourceforge.net/SAI-AIS-AMF-B.04.01.AL.pdf

I don't know how the value 10 was chosen for saAmfSGNumPrefInserviceSUs. 
This value was chosen before OpenSAF had support for more than two 
system controller nodes, so it probably should have been set to 2. Now 
OpenSAF supports any number of system controller nodes and you may wish 
to increase the value so that it is equal to or greater than the number 
of system controller nodes in your cluster.

regards,
Anders Widell

On 05/19/2017 06:58 PM, Carroll, James R wrote:
> Hi,
>
> I am using OpenSAF 5.0, and had a question on the following default values in 
> the file amf_objects.xml.
>
> There are 2 related attributes: saAmfSGNumPrefAssignedSUs, and 
> saAmfSGNumPrefInserviceSUs.
> Both have a value of 10.
>
> Can anyone clarify where this value 10 comes from?  Or what happens if these 
> are increased to larger numbers, such as 50 or 100?  And do they both need to 
> be the same value?  I cannot find any documentation that clarifies where 
> these values originated, and how they should be modified.
>
> Thanks.
>
> Jim
>

[users] OpenSAF 5.2 RC2 is available

2017-04-03 Thread Anders Widell

Hi!

The second release candidate for the upcoming OpenSAF 5.2 is now 
available for download using this link:

http://sourceforge.net/projects/opensaf/files/testing/opensaf-5.2.RC2.tar.gz/download

Please download and test this release candidate, so that we can fix any 
remaining bugs and regressions before the final OpenSAF 5.2 release. If 
you write a defect ticket on this release candidate, specify 5.2.RC2 in 
the Version field when writing the ticket.

For a full list of enhancements implemented in OpenSAF 5.2, see the 
following page:

https://sourceforge.net/p/opensaf/tickets/search/?q=status%3A%28accepted+review+fixed%29+AND+_milestone%3A%285.2.FC+5.2.RC1+5.2.RC2+5.2.0%29+AND+_type%3Aenhancement=100

regards,
Anders Widell

Re: [users] Code analysis question

2017-03-09 Thread Anders Widell

We don't yet have any firm requirement for running these tools, but we 
are planning to revise the development process after the OpenSAF 5.2 
release. The new process will be a step towards continuous delivery, and 
the tools you mention will be run somewhere in the deployment pipeline.

regards,
Anders Widell

On 03/08/2017 04:45 PM, Colucci, Marc A wrote:
> All
>  I have a question about the OpenSAF software development 
> process. Before code is accepted into a final distribution, is the 
> developer required to run CppCheck, SpellCheck and cpplint and produce a 
> report? If so, is that report available online somewhere?
>
> Also, I see that there are Valgrind environment parameters in the ".conf" 
> files. Does that mean that Valgrind is run before every major release 
> (5.0, 5.1, etc.)?
>
> Thanks
> Marc

Re: [users] Both controllers Active

2016-12-02 Thread Anders Widell

The "spare SC" feature is actually about making it possible to configure 
more than two system controllers, so it is not primarily about improving 
split-brain detection. However, we implemented some improvements in the 
split-brain detection and recovery as part of this feature for the 
simple reason that the risk for split-brain is higher when you have more 
system controller nodes.

The split-brain recovery mechanism is very simple in OpenSAF 5.0, but I 
believe it will detect and recover from the situation you describe. If 
it is not enough, you can also try the remote fencing mechanism that was 
added in OpenSAF 5.1. There are plans to improve this further in future 
versions of OpenSAF, by using quorum and arbitration mechanisms.

regards,

Anders Widell


On 12/01/2016 09:18 PM, Nivrutti Kale wrote:
>
> Hi Anders,
>
> I have gone through the “Spare SC” feature documentation. I am not 
> able to understand how the Spare SC feature will help in this case.
>
> In my case, there is a connectivity loss between 2 physical servers 
> which results in 2 OpenSAF clusters.
>
> Now when the connectivity is restored, there is no way to restore the 
> original state.
>
> Thanks,
>
> Nivrutti
>
> *From:*Anders Widell [mailto:anders.wid...@ericsson.com]
> *Sent:* Thursday, December 01, 2016 6:02 PM
> *To:* Nivrutti Kale <nk...@brocade.com>; 
> opensaf-users@lists.sourceforge.net
> *Subject:* Re: [users] Both controllers Active
>
> No, the "spare SC" feature was introduced in OpenSAF 5.0 and doesn't 
> exist in earlier versions of OpenSAF.
>
> regards,
>
> Anders Widell
>
> On 12/01/2016 01:24 PM, Nivrutti Kale wrote:
>
> Thanks for the response Anders.
>
> We are using opensaf 4.5.
>
> Is this feature not available in 4.5?
>
> Thanks,
>
> Nivrutti
>
> *From:* Anders Widell [mailto:anders.wid...@ericsson.com]
> *Sent:* Thursday, December 01, 2016 5:48 PM
> *To:* Nivrutti Kale <nk...@brocade.com>
> <mailto:nk...@brocade.com>; opensaf-users@lists.sourceforge.net
> <mailto:opensaf-users@lists.sourceforge.net>
> *Subject:* Re: [users] Both controllers Active
>
> Which version of OpenSAF are you using? A simple split-brain
> detection and recovery mechanism was added as part of the "spare
> SC" feature in OpenSAF 5.0.
>
> regards,
>
> Anders Widell
>
> On 12/01/2016 09:49 AM, Nivrutti Kale wrote:
>
> Hi All,
>
>
> In my OpenSAF cluster there are 2 controllers and 8 payloads (10 VMs) 
> on two physical servers. Each physical server hosts 1 controller and 
> 4 payloads.
>
> There is a connectivity loss between the 2 physical servers, which results 
> in 2 different OpenSAF clusters with 2 Active controllers.
>
> After some time, connectivity between the 2 physical servers is restored, 
> but there are still 2 OpenSAF clusters with 2 Active controllers.
>
> Is there any parameter in OpenSAF which will auto-recover the state 
> of the OpenSAF cluster to normal, i.e. 1 Active controller and 1 Standby 
> controller with 8 payloads in a single OpenSAF cluster?
>
> Thanks,
>
> Nivrutti Kale
>
> Manager, Software Engineering, Mobile Networking R
>
> Brocade
>
> Mumbai, India
>
> M. +91 9503023209
>

Re: [users] Both controllers Active

2016-12-01 Thread Anders Widell

No, the "spare SC" feature was introduced in OpenSAF 5.0 and doesn't 
exist in earlier versions of OpenSAF.

regards,

Anders Widell


On 12/01/2016 01:24 PM, Nivrutti Kale wrote:
>
> Thanks for the response Anders.
>
> We are using opensaf 4.5.
>
> Is this feature not available in 4.5?
>
> Thanks,
>
> Nivrutti
>
> *From:*Anders Widell [mailto:anders.wid...@ericsson.com]
> *Sent:* Thursday, December 01, 2016 5:48 PM
> *To:* Nivrutti Kale <nk...@brocade.com>; 
> opensaf-users@lists.sourceforge.net
> *Subject:* Re: [users] Both controllers Active
>
> Which version of OpenSAF are you using? A simple split-brain detection 
> and recovery mechanism was added as part of the "spare SC" feature in 
> OpenSAF 5.0.
>
> regards,
>
> Anders Widell
>
> On 12/01/2016 09:49 AM, Nivrutti Kale wrote:
>
> Hi All,
>
>
> In my OpenSAF cluster there are 2 controllers and 8 payloads (10 VMs) on 
> two physical servers. Each physical server hosts 1 controller and 
> 4 payloads.
>
> There is a connectivity loss between the 2 physical servers, which results 
> in 2 different OpenSAF clusters with 2 Active controllers.
>
> After some time, connectivity between the 2 physical servers is restored, 
> but there are still 2 OpenSAF clusters with 2 Active controllers.
>
> Is there any parameter in OpenSAF which will auto-recover the state of 
> the OpenSAF cluster to normal, i.e. 1 Active controller and 1 Standby 
> controller with 8 payloads in a single OpenSAF cluster?
>
> Thanks,
>
> Nivrutti Kale
>
> Manager, Software Engineering, Mobile Networking R
>
> Brocade
>
> Mumbai, India
>
> M. +91 9503023209
>

Re: [users] Both controllers Active

2016-12-01 Thread Anders Widell

Which version of OpenSAF are you using? A simple split-brain detection 
and recovery mechanism was added as part of the "spare SC" feature in 
OpenSAF 5.0.

regards,

Anders Widell


On 12/01/2016 09:49 AM, Nivrutti Kale wrote:
> Hi All,
>
> In my OpenSAF cluster there are 2 controllers and 8 payloads (10 VMs) on two 
> physical servers. Each physical server hosts 1 controller and 4 payloads.
> There is a connectivity loss between the 2 physical servers, which results in 
> 2 different OpenSAF clusters with 2 Active controllers.
>
> After some time, connectivity between the 2 physical servers is restored, but 
> there are still 2 OpenSAF clusters with 2 Active controllers.
> Is there any parameter in OpenSAF which will auto-recover the state of the 
> OpenSAF cluster to normal, i.e. 1 Active controller and 1 Standby controller 
> with 8 payloads in a single OpenSAF cluster?
>
> Thanks,
> Nivrutti Kale
> Manager, Software Engineering, Mobile Networking R
> Brocade
> Mumbai, India
> M. +91 9503023209

Re: [users] OpenSAF release 5.0.1 can not promote SC after enable "headless cluster" feature

2016-10-11 Thread Anders Widell

I can send you a patch within the next few days and let you try it out.

regards,

Anders Widell


On 10/11/2016 11:36 AM, Jianfeng Dong wrote:
> Do you have a clear plan to remove this requirement?
> If we can't change node_id due to our architecture, when could we get a 
> release without this limitation to upgrade to? After all, our products have 
> been deployed to many customers, so we have to think about upgrade and 
> compatibility issues.
>
> Thanks,
> Jianfeng
>
> -Original Message-
> From: Anders Widell [mailto:anders.wid...@ericsson.com]
> Sent: Tuesday, October 11, 2016 4:10 PM
> To: Jianfeng Dong <jd...@juniper.net>; Neelakanta Reddy 
> <reddy.neelaka...@oracle.com>; opensaf-users@lists.sourceforge.net
> Subject: Re: [users] OpenSAF release 5.0.1 can not promote SC after enable 
> "headless cluster" feature
>
> Yes, this is required with the current implementation. It might be possible 
> to remove this requirement - I will think about how it can be done.
>
> regards,
>
> Anders Widell
>
>
> On 10/11/2016 09:06 AM, Jianfeng Dong wrote:
>> Is it obligatory that the controller has a lower slot_id than the payload if 
>> we want to enable the "headless" feature?
>> If it is obligatory, it seems to be a big change to our architecture, but I will 
>> at least give it a try.
>>
>> Thanks,
>> Jianfeng
>>
>> -Original Message-
>> From: Anders Widell [mailto:anders.wid...@ericsson.com]
>> Sent: Tuesday, October 11, 2016 2:30 PM
>> To: Jianfeng Dong <jd...@juniper.net>; Neelakanta Reddy
>> <reddy.neelaka...@oracle.com>; opensaf-users@lists.sourceforge.net
>> Subject: Re: [users] OpenSAF release 5.0.1 can not promote SC after
>> enable "headless cluster" feature
>>
>> There is a one-to-one mapping between /etc/opensaf/slot_id and the node_id. 
>> Simply make sure that all your system controller nodes have lower slot_id 
>> than any of your payloads. This file is read when the node is booted. You 
>> should be able to do an in-service renumbering of your nodes - just be 
>> careful so that you never have two nodes with the same node_id at the same 
>> time.
>>
>> Yes, the assumption is there in 5.1.0 as well.
>>
>> regards,
>>
>> Anders Widell
>>
>>
>> On 10/11/2016 04:29 AM, Jianfeng Dong wrote:
>>> Yes, in our product the payload's node_id is lower than the SC's. Could you please 
>>> tell us how to configure it?
>>>
>>> And, does this assumption exist in OpenSAF 5.1.0 as well?
>>>
>>> Thanks,
>>> Jianfeng
>>>
>>> -Original Message-
>>> From: Anders Widell [mailto:anders.wid...@ericsson.com]
>>> Sent: Tuesday, October 11, 2016 12:55 AM
>>> To: Jianfeng Dong <jd...@juniper.net>; Neelakanta Reddy
>>> <reddy.neelaka...@oracle.com>; opensaf-users@lists.sourceforge.net
>>> Subject: Re: [users] OpenSAF release 5.0.1 can not promote SC after
>>> enable "headless cluster" feature
>>>
>>> There is a (probably not so well documented :-) assumption that the system 
>>> controllers are configured with a lower node_id than the payloads. From 
>>> what I can see in the logs you sent, I think it looks like you have 
>>> configured the payload with a lower node_id than the system controllers.
>>>
>>> By the way, the headless feature has been improved in OpenSAF 5.1.0 so I 
>>> would suggest that you upgrade to that version if possible.
>>>
>>> regards,
>>>
>>> Anders Widell
>>>
>>>
>>> On 10/10/2016 06:04 PM, Jianfeng Dong wrote:
>>>> I tried with sufficient drive space but got the same result: neither of the 
>>>> two SCs can be promoted to be controller until the payload reboots.
>>>>
>>>> I also checked the network link between the SC and the payload; they can ping each 
>>>> other when this issue happens. I also suspect the problem is caused by the 
>>>> IMMD/IMMND link among those nodes, but don't know how to prove it.
>>>>
>>>> From: Neelakanta Reddy [mailto:reddy.neelaka...@oracle.com]
>>>> Sent: Monday, October 10, 2016 8:39 PM
>>>> To: Jianfeng Dong <jd...@juniper.net>;
>>>> opensaf-users@lists.sourceforge.net
>>>> Subject: Re: [users] OpenSAF release 5.0.1 can not promote SC after
>>>> enable "headless cluster" feature
>>>>
>>>> Hi,
>>>>
>>>> Once after the "Headless" if any of the controller

Re: [users] OpenSAF release 5.0.1 can not promote SC after enable "headless cluster" feature

2016-10-11 Thread Anders Widell

Yes, this is required with the current implementation. It might be 
possible to remove this requirement - I will think about how it can be done.
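
The requirement discussed in this thread is that every system controller must have a lower slot_id than any payload (the slot_id in /etc/opensaf/slot_id maps one-to-one to the node_id), and that no two nodes may share a node_id at the same time. A minimal pre-deployment check of that rule might look like this; the `planned_node` struct and `slot_ids_valid` function are hypothetical and not part of OpenSAF.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of a planned node. In a real cluster the value
 * would come from /etc/opensaf/slot_id on that node. */
struct planned_node {
  unsigned slot_id;
  bool is_controller; /* true for a system controller, false for a payload */
};

/* Returns true if no two nodes share a slot_id and every system controller
 * has a lower slot_id than every payload. */
bool slot_ids_valid(const struct planned_node *nodes, size_t n) {
  for (size_t i = 0; i < n; i++) {
    for (size_t j = 0; j < n; j++) {
      if (i == j) continue;
      if (nodes[i].slot_id == nodes[j].slot_id)
        return false; /* duplicate slot_id, hence duplicate node_id */
      if (nodes[i].is_controller && !nodes[j].is_controller &&
          nodes[i].slot_id > nodes[j].slot_id)
        return false; /* a payload would be numbered before a controller */
    }
  }
  return true;
}
```

The same check can be run on each intermediate step of an in-service renumbering to confirm that no two nodes ever share a slot_id.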

regards,

Anders Widell


On 10/11/2016 09:06 AM, Jianfeng Dong wrote:
> Is it obligatory that the controller has a lower slot_id than the payload if 
> we want to enable the "headless" feature?
> If it is obligatory, it seems to be a big change to our architecture, but I will 
> at least give it a try.
>
> Thanks,
> Jianfeng
>
> -Original Message-
> From: Anders Widell [mailto:anders.wid...@ericsson.com]
> Sent: Tuesday, October 11, 2016 2:30 PM
> To: Jianfeng Dong <jd...@juniper.net>; Neelakanta Reddy 
> <reddy.neelaka...@oracle.com>; opensaf-users@lists.sourceforge.net
> Subject: Re: [users] OpenSAF release 5.0.1 can not promote SC after enable 
> "headless cluster" feature
>
> There is a one-to-one mapping between /etc/opensaf/slot_id and the node_id. 
> Simply make sure that all your system controller nodes have lower slot_id 
> than any of your payloads. This file is read when the node is booted. You 
> should be able to do an in-service renumbering of your nodes - just be 
> careful so that you never have two nodes with the same node_id at the same 
> time.
>
> Yes, the assumption is there in 5.1.0 as well.
>
> regards,
>
> Anders Widell
>
>
> On 10/11/2016 04:29 AM, Jianfeng Dong wrote:
>> Yes, in our product the payload's node_id is lower than the SC's. Could you please 
>> tell us how to configure it?
>>
>> And, does this assumption exist in OpenSAF 5.1.0 as well?
>>
>> Thanks,
>> Jianfeng
>>
>> -Original Message-
>> From: Anders Widell [mailto:anders.wid...@ericsson.com]
>> Sent: Tuesday, October 11, 2016 12:55 AM
>> To: Jianfeng Dong <jd...@juniper.net>; Neelakanta Reddy
>> <reddy.neelaka...@oracle.com>; opensaf-users@lists.sourceforge.net
>> Subject: Re: [users] OpenSAF release 5.0.1 can not promote SC after
>> enable "headless cluster" feature
>>
>> There is a (probably not so well documented :-) assumption that the system 
>> controllers are configured with a lower node_id than the payloads. From what 
>> I can see in the logs you sent, I think it looks like you have configured 
>> the payload with a lower node_id than the system controllers.
>>
>> By the way, the headless feature has been improved in OpenSAF 5.1.0 so I 
>> would suggest that you upgrade to that version if possible.
>>
>> regards,
>>
>> Anders Widell
>>
>>
>> On 10/10/2016 06:04 PM, Jianfeng Dong wrote:
>>> I tried with sufficient drive space but got the same result: neither of the two 
>>> SCs can be promoted to be controller until the payload reboots.
>>>
>>> I also checked the network link between the SC and the payload; they can ping each 
>>> other when this issue happens. I also suspect the problem is caused by the 
>>> IMMD/IMMND link among those nodes, but don't know how to prove it.
>>>
>>> From: Neelakanta Reddy [mailto:reddy.neelaka...@oracle.com]
>>> Sent: Monday, October 10, 2016 8:39 PM
>>> To: Jianfeng Dong <jd...@juniper.net>;
>>> opensaf-users@lists.sourceforge.net
>>> Subject: Re: [users] OpenSAF release 5.0.1 can not promote SC after
>>> enable "headless cluster" feature
>>>
>>> Hi,
>>>
>>> After the "headless" period, once any of the controllers has started, the IMMND 
>>> on the payload will send the intro message to the IMMD.
>>> It looks like this did not happen; the following is the log from the payload:
>>>
>>> 2016-10-10T11:09:18.507851+08:00 pld0101 osafimmnd[3141]: message repeated 2 times: [ logtrace: write failed, No space left on device]
>>> 2016-10-10T11:09:18.507883+08:00 pld0101 osafimmnd[3141]: NO Re-introduce-me highestProcessed:23839 highestReceived:23839
>>> 2016-10-10T11:09:18.508011+08:00 pld0101 osafimmnd[3141]: logtrace: write failed, No space left on device
>>> 2016-10-10T11:09:18.508129+08:00 pld0101 osafimmnd[3141]: logtrace: write failed, No space left on device
>>> 2016-10-10T11:09:18.508501+08:00 pld0101 osafimmnd[3141]: WA MDS Send Failed to service:IMMD rc:2
>>>
>>>
>>> Retry again with sufficient space on the payload.
>>>
>>> /Neel.
>>>
>>> On 2016/10/10 03:59 PM, Jianfeng Dong wrote:
>>>
>>> Hi,
>>>
>>>
>>>
>>> For several years we use OpenSAF(4.5.2 now) to provide HA service in our 
>>> product(including

[users] Announcement of OpenSAF 5.1.0 (GA) and 5.0.1, 4.7.2 (maintenance) releases

2016-09-27 Thread Anders Widell

Hello,

The OpenSAF community is pleased to announce the General Availability of 
the OpenSAF 5.1.0 enhancements release. The source code for OpenSAF 
5.1.0 and the corresponding documentation can be downloaded using the 
following links:

http://sourceforge.net/projects/opensaf/files/releases/opensaf-5.1.0.tar.gz/download
http://sourceforge.net/projects/opensaf/files/docs/opensaf-documentation-5.1.0.tar.gz/download

For a complete list of new features in this release, please refer to the 
NEWS at the wiki:

https://sourceforge.net/p/opensaf/wiki/NEWS-5.1.0/

We have also released the maintenance (patch) releases OpenSAF 5.0.1 and 
OpenSAF 4.7.2.

- The source code for OpenSAF 5.0.1 and the corresponding documentation 
can be downloaded using the following links:

http://sourceforge.net/projects/opensaf/files/releases/opensaf-5.0.1.tar.gz/download
http://sourceforge.net/projects/opensaf/files/docs/opensaf-documentation-5.0.0.tar.gz/download

See the ChangeLog for a full list of changes in this release:

https://sourceforge.net/p/opensaf/wiki/ChangeLog-5.0.1/

- The source code for OpenSAF 4.7.2 and the corresponding documentation 
can be downloaded using the following links:

http://sourceforge.net/projects/opensaf/files/releases/opensaf-4.7.2.tar.gz/download
http://sourceforge.net/projects/opensaf/files/docs/opensaf-documentation-4.7.0.tar.gz/download

See the ChangeLog for a full list of changes in this release:

https://sourceforge.net/p/opensaf/wiki/ChangeLog-4.7.2/

Thank you for your continued interest in OpenSAF and to everyone who has 
contributed to this release.

regards,
Anders Widell

Re: [users] how long it takes to detect node sudden power

2016-09-15 Thread Anders Widell

Yes, we were experimenting with the tcp_retries2 option, but the solution 
we ended up with was to use the TCP_USER_TIMEOUT socket option.
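
For reference, TCP_USER_TIMEOUT is a Linux per-socket option (available since kernel 2.6.37) that limits how long transmitted data may remain unacknowledged before the kernel aborts the connection. Unlike net.ipv4.tcp_retries2, it does not change a system-wide setting. A minimal sketch of setting and reading it; the helper names and the example timeout are illustrative, not the code or values OpenSAF uses.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

#ifndef TCP_USER_TIMEOUT
#define TCP_USER_TIMEOUT 18 /* Linux option number, for older libc headers */
#endif

/* Abort the connection if sent data stays unacknowledged for longer than
 * timeout_ms milliseconds (0 means use the system default behaviour).
 * Returns 0 on success, -1 on error with errno set by setsockopt. */
int set_tcp_user_timeout(int fd, unsigned int timeout_ms) {
  return setsockopt(fd, IPPROTO_TCP, TCP_USER_TIMEOUT, &timeout_ms,
                    sizeof(timeout_ms));
}

/* Read the current TCP_USER_TIMEOUT of a socket into *timeout_ms. */
int get_tcp_user_timeout(int fd, unsigned int *timeout_ms) {
  socklen_t len = sizeof(*timeout_ms);
  return getsockopt(fd, IPPROTO_TCP, TCP_USER_TIMEOUT, timeout_ms, &len);
}
```

Setting it right after creating the socket, before connect(), means it also covers the case where the peer disappears while data is still in flight, which is exactly when keepalive probing does not start.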

regards,

Anders Widell


On 09/15/2016 03:13 PM, Nivrutti Kale wrote:
> Hi,
>
> There is one way to improve the detection time: you can change the 
> net.ipv4.tcp_retries2 value to 3.
> The default value of net.ipv4.tcp_retries2 is 15.
>
> Thanks,
> Nivrutti
>
> -Original Message-
> From: Mathivanan Naickan Palanivelu [mailto:mathi.naic...@oracle.com]
> Sent: Thursday, September 15, 2016 6:38 PM
> To: Shu Wang <shu.w...@netcracker.com>; opensaf-users@lists.sourceforge.net
> Subject: Re: [users] how long it takes to detect node sudden power
>
> Hi,
>
> You could try the fix in ticket https://sourceforge.net/p/opensaf/tickets/2014/
> and see if the scenario is the same. The patch is in
> https://sourceforge.net/p/opensaf/staging/ci/b30d5e33e50c7eea8cc1730cbe0a0dde572621f0/
>
> Thanks,
> Mathi.
>
>
>> -Original Message-
>> From: Shu Wang [mailto:shu.w...@netcracker.com]
>> Sent: Saturday, June 20, 2015 1:50 AM
>> To: opensaf-users@lists.sourceforge.net
>> Subject: Re: [users] how long it takes to detect node sudden power
>>
>> We have a similar scenario. One of our payload nodes rebooted, and it took
>> from a few seconds to a few minutes for the other nodes to detect the node
>> loss. Since it took the master controller a few minutes to detect the
>> node loss and react to it, this caused serious problems and
>> many service units went bad. Is there any way to improve the detection time?
>>
>> Thank you!
>>
>> Shu Wang | Senior Analyst | +1(407)708-5117 or x3917|
>> www.NetCracker.com Proven Partner to Communications Service Providers
>>
>> -Original Message-
>> Date: Tue, 14 Apr 2015 09:58:51 +
>> From: Yao Cheng LIANG <ycli...@astri.org>
>> Subject: Re: [users] how long it takes to detect node sudden power
>>  loss
>> To: 'A V Mahesh' <mahesh.va...@oracle.com>, Mathivanan Naickan
>>  Palanivelu  <mathi.naic...@oracle.com>
>> Cc: "opensaf-users@lists.sourceforge.net"
>>  <opensaf-users@lists.sourceforge.net>
>>
>> Let me give more info about my setup:
>>
>> 1. I have two nodes, running as controllers.
>> 2. Besides the OpenSAF services, I have another service unit with three
>>    components in it.
>> 3. These components use the Checkpoint service for data synchronization.
>>
>> My dtmd.conf is as below:
>>
>> DTM_INI_DIS_TIMEOUT_SECS=5
>> DTM_TCP_KEEPIDLE_TIME=2
>> DTM_TCP_KEEPALIVE_INTVL=1
>> DTM_TCP_KEEPALIVE_PROBES=2
>>
>> I read the code and found that it uses TCP keepalive to detect failure
>> of the peer node. However, keepalive packets are not sent until some time
>> after the link becomes idle, and I think this is the issue. Suppose the
>> "standby" node is sending something to the "active" node when the "active"
>> node is rebooted. The "standby" node will keep retransmitting until it
>> reaches the maximum number of retries. During this period the link is not
>> idle, so the keepalive mechanism does not start to work. This may cause the
>> "standby" node to take a long time to detect the failure of the "active"
>> node.
>>
>> Thanks.
>>
>> Ted
>>
>> From: A V Mahesh [mailto:mahesh.va...@oracle.com]
>> Sent: Monday, April 13, 2015 10:06 PM
>> To: Yao Cheng LIANG; Mathivanan Naickan Palanivelu
>> Cc: opensaf-users@lists.sourceforge.net
>> Subject: Re: [users] how long it takes to detect node sudden power
>> loss
>>
>> Hi,
>>
>> Un-comment the below line to enable trace of osafdtm in
>> /etc/opensaf/dtmd.conf
>>
>> #args="--tracemask=0x"   -->  args="--tracemask=0x"
>
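
Ted's analysis matches how Linux TCP keepalive behaves: probes are only sent on an idle connection, so while unacknowledged data is outstanding, detection is governed by the retransmission limits (net.ipv4.tcp_retries2 or TCP_USER_TIMEOUT) instead. For the idle case, the dtmd.conf values above give a worst-case detection time of roughly 2 + 1 * 2 = 4 seconds. The sketch below applies such values to a socket and computes that bound. It assumes the DTM_TCP_KEEP* parameters correspond to the standard Linux SO_KEEPALIVE, TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT options; the `keepalive_cfg` struct and helper names are hypothetical.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical mirror of the keepalive settings in dtmd.conf. */
struct keepalive_cfg {
  int idle_s;  /* DTM_TCP_KEEPIDLE_TIME: idle seconds before the first probe */
  int intvl_s; /* DTM_TCP_KEEPALIVE_INTVL: seconds between probes */
  int probes;  /* DTM_TCP_KEEPALIVE_PROBES: unanswered probes before abort */
};

/* Enable keepalive on fd with the given timing. Returns 0, or -1 on error. */
int apply_keepalive(int fd, const struct keepalive_cfg *c) {
  int on = 1;
  if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) != 0)
    return -1;
  if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &c->idle_s,
                 sizeof(c->idle_s)) != 0)
    return -1;
  if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &c->intvl_s,
                 sizeof(c->intvl_s)) != 0)
    return -1;
  if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &c->probes,
                 sizeof(c->probes)) != 0)
    return -1;
  return 0;
}

/* Approximate worst-case time to detect a dead peer on an IDLE connection:
 * the idle period plus one interval per unanswered probe. This bound does
 * not apply while unacknowledged data is outstanding. */
int keepalive_detect_secs(const struct keepalive_cfg *c) {
  return c->idle_s + c->intvl_s * c->probes;
}
```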

[users] OpenSAF 5.1 RC1 is available

2016-09-13 Thread Anders Widell

Hi!

A release candidate for the upcoming OpenSAF 5.1 is now available for 
download using this link:

http://sourceforge.net/projects/opensaf/files/testing/opensaf-5.1.RC1.tar.gz/download

Please download and test this release candidate, so that we can fix any 
remaining bugs and regressions before the final OpenSAF 5.1 release. If 
you write a defect ticket on this release candidate, specify 5.1.RC1 in 
the Version field when writing the ticket.

For a full list of enhancements implemented in OpenSAF 5.1, see the 
following page:

https://sourceforge.net/p/opensaf/tickets/search/?q=status%3A%28accepted+review+fixed%29+AND+_milestone%3A%285.1.FC+5.1.RC1+5.1.RC2+5.1.0%29+AND+_type%3Aenhancement=100

regards,
Anders Widell

[users] Announcement of OpenSAF 4.7.0 (GA) and 4.6.1, 4.5.2 (maintenance) releases

2015-11-01 Thread Anders Widell

Hello,

The OpenSAF community is pleased to announce the General Availability of 
the OpenSAF 4.7.0 enhancements release. The source code for OpenSAF 
4.7.0 and the corresponding documentation can be downloaded using the 
following links:

http://sourceforge.net/projects/opensaf/files/releases/opensaf-4.7.0.tar.gz/download
http://sourceforge.net/projects/opensaf/files/docs/opensaf-documentation-4.7.0.tar.gz/download

For a complete list of new features in this release, please refer to the 
NEWS at the wiki:

https://sourceforge.net/p/opensaf/wiki/NEWS-4.7.0/

We have also released the maintenance (patch) releases OpenSAF 4.6.1 and 
OpenSAF 4.5.2.

- The source code for OpenSAF 4.6.1 and the corresponding documentation 
can be downloaded using the following links:

http://sourceforge.net/projects/opensaf/files/releases/opensaf-4.6.1.tar.gz/download
http://sourceforge.net/projects/opensaf/files/docs/opensaf-documentation-4.6.1.tar.gz/download

See the ChangeLog for a full list of changes in this release:

https://sourceforge.net/p/opensaf/wiki/ChangeLog-4.6.1/

- The source code for OpenSAF 4.5.2 and the corresponding documentation 
can be downloaded using the following links:

http://sourceforge.net/projects/opensaf/files/releases/opensaf-4.5.2.tar.gz/download
http://sourceforge.net/projects/opensaf/files/docs/opensaf-documentation-4.5.2.tar.gz/download

See the ChangeLog for a full list of changes in this release:

https://sourceforge.net/p/opensaf/wiki/ChangeLog-4.5.2/

Thank you for your continued interest in OpenSAF and to everyone who has 
contributed to this release.

regards,
Anders Widell

[users] OpenSAF 4.7 RC1 is available

2015-10-23 Thread Anders Widell

Hi!

A release candidate for the upcoming OpenSAF 4.7 is now available for 
download using this link:

http://sourceforge.net/projects/opensaf/files/testing/opensaf-4.7.RC1.tar.gz/download

Please download and test this release candidate, so that we can fix any 
remaining bugs and regressions before the final OpenSAF 4.7 release. If 
you write a defect ticket on this release candidate, specify 4.7.RC1 in 
the Version field when writing the ticket.

For a full list of enhancements implemented in OpenSAF 4.7, see the 
following page:

https://sourceforge.net/p/opensaf/tickets/search/?q=status%3A%28accepted+review+fixed%29+AND+_milestone%3A%284.7.FC+4.7.RC1+4.7.RC2+4.7.0%29+AND+_type%3Aenhancement=100

regards,
Anders Widell



Re: [users] Avoid rebooting payload modules after losing system controller

2015-10-14 Thread Anders Widell

In this kind of solution the system controller(s) would not be part of 
the cluster (not the same cluster anyway) - the system controller 
functionality would be a service provided externally (e.g. by the 
cloud). So loss of connectivity with the system controller service would 
not in itself result in a change of the cluster membership.

/ Anders Widell

On 10/14/2015 12:44 PM, Mathivanan Naickan Palanivelu wrote:
> In a clustered environment, all nodes need to be in the same consistent view
> from a membership perspective.
> So, loss of a leader will indeed result in state changes to the nodes
> following the leader. Therefore there cannot be independent lifecycles
> for some nodes and other nodes.
> B.T.W. the restart mentioned below meant a 'transparent restart
> of OpenSAF' without affecting applications, and for this the applications'
> CLC-CLI scripts need to handle it.
>
> While we could be inclined to support (unique!) use-cases such as this,
> it is the solution (headless fork) that would be under the lens! ;-)
> Let's discuss this!
>
> Cheers,
> Mathi.
>
>
> - anders.wid...@ericsson.com wrote:
>
>> Yes, this is yet another approach. But it is also another use-case for
>> the headless feature. When we have moved the system controllers out of
>> the cluster (into the cloud infrastructure), I would expect controllers
>> and payloads to have independent life cycles. You have servers (i.e.
>> system controllers), and clients (payloads). They can be installed and
>> upgraded separately from each other, and I wouldn't expect a restart of
>> the servers to cause all the clients to restart as well, in the same way
>> as I don't expect my web browser to restart just because the web
>> server has crashed.
>>
>> / Anders Widell
>>
>> On 10/13/2015 03:54 PM, Mathivanan Naickan Palanivelu wrote:
>>> I don't think this is a case of cattle! Even in those scenarios, the
>>> "controller" software of the cloud management stacks is itself 'placed'
>>> on physical nodes in appropriate redundancy models and not inside those
>>> cattle VMs!
>>>
>>> I think the case here is about avoiding rebooting the node!
>>> This can be achieved by setting the NOACTIVE timer to a longer value
>>> until OpenSAF on the controller comes back up.
>>> Upon detecting that the controllers are up, some entity on the local
>>> node restarts OpenSAF (/etc/init.d/opensafd restart).
>>> And ensure the CLC-CLI scripts of the applications differentiate a
>>> usual restart from this spoof-restart!
>>> Mathi.
>>>
>>>> -Original Message-
>>>> From: Anders Widell [mailto:anders.wid...@ericsson.com]
>>>> Sent: Tuesday, October 13, 2015 5:36 PM
>>>> To: Anders Björnerstedt; Tony Hart
>>>> Cc: opensaf-users@lists.sourceforge.net
>>>> Subject: Re: [users] Avoid rebooting payload modules after losing system
>>>> controller
>>>>
>>>> Yes, I agree that the best fit for this feature is an application using
>>>> either the NWay-Active or the No-Redundancy models, and where you view
>>>> the system more as a collection of nodes rather than as a cluster. This
>>>> kind of architecture is quite common when you write applications for
>>>> cloud. The redundancy models are suitable for scaling, and the
>>>> architecture fits into the "cattle" philosophy which is common in cloud.
>>>> Such an application can tolerate any number of node failures, and the
>>>> remaining nodes would still be able to continue functioning and provide
>>>> their service. However, if we put the OpenSAF middleware on the nodes it
>>>> becomes the weakest link, since OpenSAF will reboot all the nodes just
>>>> because the two controller nodes fail. What a pity on a system with one
>>>> hundred nodes!
>>>>
>>>> / Anders Widell
>>>>
>>>> On 10/13/2015 01:19 PM, Anders Björnerstedt wrote:
>>>>> On 10/13/2015 12:27 PM, Anders Widell wrote:
>>>>>> The possibility to have more than two system controllers (one active
>>>>>> + several standby and/or spare controller nodes) is also something
>>>>>> that has been investigated. For scalability reasons, we probably
>>>>>> can't turn all nodes into standby controllers in a large cluster -
>>>>>> but it may be feasible to ha
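The approach Mathi suggests in this thread (a long NOACTIVE timer on the payloads, plus some local entity that restarts OpenSAF once the controllers are back) could look roughly like the following POSIX shell sketch. It is not part of OpenSAF: the controller address, the ping-based liveness check and the poll interval are hypothetical choices, and only `/etc/init.d/opensafd restart` comes from the thread.

```shell
#!/bin/sh
# Hypothetical payload-side watcher for the "spoof restart" idea: let the
# node ride out the controller outage and restart OpenSAF locally only
# once a controller is reachable again.
# ASSUMPTIONS: SC_ADDR, the ping-based liveness test and POLL_SECONDS are
# illustrative choices, not OpenSAF settings.

SC_ADDR=${SC_ADDR:-192.0.2.1}     # example controller address
POLL_SECONDS=${POLL_SECONDS:-5}

# Succeeds if the controller answers a single ping (-W 1 is the Linux
# iputils per-reply timeout in seconds).
controller_up() {
    ping -c 1 -W 1 "$SC_ADDR" >/dev/null 2>&1
}

# Given the previous and current controller state ("up" or "down"),
# print "restart" only on a down -> up transition, otherwise "none".
next_action() {
    if [ "$1" = "down" ] && [ "$2" = "up" ]; then
        echo restart
    else
        echo none
    fi
}

# Poll forever; restart opensafd each time a controller reappears.
watch_controllers() {
    prev=up
    while :; do
        if controller_up; then now=up; else now=down; fi
        if [ "$(next_action "$prev" "$now")" = restart ]; then
            /etc/init.d/opensafd restart
        fi
        prev=$now
        sleep "$POLL_SECONDS"
    done
}
```

The decision logic is kept in `next_action` so it can be checked without a network; `watch_controllers` would typically be started in the background on each payload. It only helps if the NOACTIVE timer is long enough that the node is not rebooted first, and the applications' CLC-CLI scripts still have to tell this restart apart from an ordinary one.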

Re: [users] Avoid rebooting payload modules after losing system controller

2015-10-13 Thread Anders Widell

The possibility to have more than two system controllers (one active + 
several standby and/or spare controller nodes) is also something that 
has been investigated. For scalability reasons, we probably can't turn 
all nodes into standby controllers in a large cluster - but it may be 
feasible to have a system with one or several standby controllers and 
the rest of the nodes are spares that are ready to take an active or 
standby assignment when needed.

However, the "headless" feature will still be needed in some systems 
where you need dedicated controller node(s).

/ Anders Widell

On 10/13/2015 12:07 PM, Tony Hart wrote:
> Understood.  The assumption is that this is temporary but we allow the 
> payloads to continue to run (with reduced osaf functionality) until a 
> replacement controller is found.  At that point they can reboot to get the 
> system back into sync.
>
> Or allow more than 2 controllers in the system so we can have one or more 
> usually-payload cards be controllers to reduce the probability of 
> no-controllers to an acceptable level.
>
>
>> On Oct 12, 2015, at 11:05 AM, Anders Björnerstedt 
>> <anders.bjornerst...@ericsson.com> wrote:
>>
>> The headless state is also vulnerable to split-brain scenarios.
>> That is, network partitions and joins can occur and will not be detected as
>> such and thus not handled properly (isolated) when they occur.
>> Basically you cannot be sure you have a continuously coherent cluster
>> while in the headless state.
>>
>> On paper you may get a very resilient system in the sense that it "stays up"
>> and replies to ping etc.
>> But typically a customer wants not just availability but reliable behavior 
>> also.
>>
>> /AndersBj
>>
>>
>> -Original Message-
>> From: Anders Björnerstedt [mailto:anders.bjornerst...@ericsson.com]
>> Sent: den 12 oktober 2015 16:42
>> To: Anders Widell; Tony Hart; opensaf-users@lists.sourceforge.net
>> Subject: Re: [users] Avoid rebooting payload modules after losing system 
>> controller
>>
>> Note that this headless variant is a very questionable feature. This is for
>> the reasons explained earlier, i.e. you *will* get a reduction in service
>> availability.
>> It was never accepted into OpenSAF for that reason.
>>
>> On top of that the unreliability will typically not be explicit/handled.
>> That is, the operator will probably not even know what is working and what is
>> not during the SC absence since the alarm/notification function is gone. No
>> OpenSAF director services are executing.
>>
>> It is truly a headless system, i.e. a zombie system and thus not working at 
>> full monitoring and availability functionality.
>> It begs the question of what OpenSAF and SAF is there for in the first place.
>>
>> The SCs don’t have to run any special software and don’t have to have any 
>> special hardware.
>> They do need file system access, at least for a cluster restart, but not 
>> necessarily to handle single SC failure.
>> The headless variant, when headless, is also in that
>> not-able-to-cluster-restart state, but with even less functionality.
>>
>> An SC can of course run other (non OpenSAF specific) software.  And the two 
>> SCs don’t necessarily have to be symmetric in terms of software.
>>
>> Providing file system access via NFS is typically a non-issue. They have
>> three nodes. Ergo they should be able to assign two of them the role of SC
>> in the OpenSAF domain.
>>
>> /AndersBj
>>
>> -Original Message-
>> From: Anders Widell [mailto:anders.wid...@ericsson.com]
>> Sent: den 12 oktober 2015 16:08
>> To: Tony Hart; opensaf-users@lists.sourceforge.net
>> Subject: Re: [users] Avoid rebooting payload modules after losing system 
>> controller
>>
>> We have actually implemented something very similar to what you are talking 
>> about. With this feature, the payloads can survive without a cluster restart 
>> even if both system controllers restart (or the single system controller, in 
>> your case). If you want to try it out, you can clone this Mercurial 
>> repository:
>>
>> https://sourceforge.net/u/anders-w/opensaf-headless/
>>
>> To enable the feature, set the variable IMMSV_SC_ABSENCE_ALLOWED in 
>> immd.conf to the amount of seconds you wish the payloads to wait for the 
>> system controllers to come back. Note: we have only implemented this feature 
>> for the "core" OpenSAF services (plus CKPT), so you need to disable the 
>> optional services.
>>
>> / Anders Wid

Re: [users] Avoid rebooting payload modules after losing system controller

2015-10-13 Thread Anders Widell

Yes, I agree that the best fit for this feature is an application using 
either the NWay-Active or the No-Redundancy models, and where you view 
the system more as a collection of nodes rather than as a cluster. This 
kind of architecture is quite common when you write applications for 
cloud. The redundancy models are suitable for scaling, and the 
architecture fits into the "cattle" philosophy which is common in cloud. 
Such an application can tolerate any number of node failures, and the 
remaining nodes would still be able to continue functioning and provide 
their service. However, if we put the OpenSAF middleware on the nodes it 
becomes the weakest link, since OpenSAF will reboot all the nodes just 
because the two controller nodes fail. What a pity on a system with one 
hundred nodes!

/ Anders Widell

On 10/13/2015 01:19 PM, Anders Björnerstedt wrote:
>
>
> On 10/13/2015 12:27 PM, Anders Widell wrote:
>> The possibility to have more than two system controllers (one active 
>> + several standby and/or spare controller nodes) is also something 
>> that has been investigated. For scalability reasons, we probably 
>> can't turn all nodes into standby controllers in a large cluster - 
>> but it may be feasible to have a system with one or several standby 
>> controllers and the rest of the nodes are spares that are ready to 
>> take an active or standby assignment when needed.
>>
>> However, the "headless" feature will still be needed in some systems 
>> where you need dedicated controller node(s).
>
> That sounds as if some deployments have a special requirement that can 
> only be supported by the headless feature.
> But you also have to say that the headless feature places 
> anti-requirements on the deployments/applications that are to use it.
>
> For example not needing cluster coherence among the payloads.
>
> If the payloads only run independent application instances where each 
> instance is implemented at one processor
> or at least does not communicate in any state-sensitive way with peer 
> processes at other payloads; and no such
> instance is unique or if it is unique it is still expendable (non 
> critical to the service), then it could work.
>
> It is important that the deployments that end up thinking they need the
> headless feature also understand what they lose
> with the headless feature and that this loss is acceptable for that 
> deployment.
>
> So headless is not a fancy feature needed by some exclusive and picky 
> subset of applications.
> It is a relaxation that drops all requirements on distributed 
> consistency and may be acceptable to some
> applications with weaker demands so they can accept the anti 
> requirements.
>
> Besides requiring "dedicated" controller nodes, the deployment must of 
> course NOT require any *availability*
> of those dedicated controller nodes, i.e. not have any requirements on 
> service availability in general.
>
> It may work for some "dumb" applications that are stateless, or state
> stable (frozen in state), or have no requirements on
> availability of state. In other words some applications that really
> don't need SAF.
>
> They may still want to use SAF as a way of managing and monitoring the 
> system when it happens to be healthy,
> but can live with  long periods of not being able to manage or monitor 
> that system, which can then be degrading
> in any way that is possible.
>
>
> /AndersBJ
>
>
>
>>
>> / Anders Widell
>>
>> On 10/13/2015 12:07 PM, Tony Hart wrote:
>>> Understood.  The assumption is that this is temporary but we allow 
>>> the payloads to continue to run (with reduced osaf functionality) 
>>> until a replacement controller is found.  At that point they can 
>>> reboot to get the system back into sync.
>>>
>>> Or allow more than 2 controllers in the system so we can have one or 
>>> more usually-payload cards be controllers to reduce the probability 
>>> of no-controllers to an acceptable level.
>>>
>>>
>>>> On Oct 12, 2015, at 11:05 AM, Anders Björnerstedt 
>>>> <anders.bjornerst...@ericsson.com> wrote:
>>>>
>>>> The headless state is also vulnerable to split-brain scenarios.
>>>> That is network partitions and joins can occur and will not be 
>>>> detected as such and thus not handled properly (isolated) when they 
>>>> occur.
>>>> Basically you cannot be sure you have a continuously coherent 
>>>> cluster while in the headless state.
>>>>
>>>> On paper you may get a very resilient system in the sense that It 
>>

Re: [users] Avoid rebooting payload modules after losing system controller

2015-10-13 Thread Anders Widell

Agreed, this would also improve availability. So in the end, when we 
have all this, I suspect that the headless feature will not be needed on 
most systems. But we are not there yet, and even when we are there will 
still be those systems which require dedicated controller nodes.

/ Anders Widell

On 10/13/2015 03:44 PM, Anders Björnerstedt wrote:
> Yes and here we have a third approach for addressing the SC failure issue 
> (besides headless and "roaming" SC) there is the approach of
> drastically increasing the MTBF for SCs by:
>
> a) Making director processes re-startable, i.e. the failure or termination of 
> a director process should not have to pull down the whole SC.
>
> b) While keeping the concept of active and standby director process one can 
> drop the concept of active and standby SC. That is the
> different active (or standby) directors do not necessarily have to run at the 
> same SC.  The only real requirement is that active and
> standby directors for any given service are not located on the same SC.
>
> When a director terminates or crashes, instead of pulling down the SC, you 
> would fail-over only that service.
> This would of course also open up for si-swap for individual service 
> directors.
>
> It would of course require a fair  bit of work. But I expect it would be less 
> work than the work already put into the headless prototype
> and the end result would be much cleaner both conceptually and in the code. 
> Almost no-one *really* understands how the headless
> prototype works and its side effects.
> For example: are  runtime attributes readable or not during SC absence ? If 
> they are readable, what does the value reflect ?
> If not readable, why? Or more importantly for how long?
>
>   Restartable directors would be easier to understand for users and there 
> would be much less to understand since the difference in terms of 
> functionality
> would be much smaller. Total availability would improve instead of being 
> degraded.
>
> /AndersBj
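Point (b) boils down to a per-service placement rule. As a rough illustration, assuming a hypothetical plain-text table with one `service active-SC standby-SC` line per director (the format and the example names are made up for this sketch), the rule can be checked like this:

```shell
#!/bin/sh
# Sketch: check the option (b) placement rule, i.e. no service may have
# its active and standby directors on the same SC.
# ASSUMPTION: a hypothetical input format, one "service active standby"
# line per director pair, read from standard input.

check_placement() {
    status=0
    while read -r svc active standby; do
        [ -z "$svc" ] && continue           # skip blank lines
        if [ "$active" = "$standby" ]; then
            echo "violation: $svc has active and standby on $active"
            status=1
        fi
    done
    return "$status"
}
```

For example, `printf 'IMM SC-1 SC-2\nAMF SC-2 SC-2\n' | check_placement` prints one violation for AMF and returns a non-zero status, while a table where every pair is split across SC-1 and SC-2 passes silently.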
>
> -Original Message-
> From: Anders Widell
> Sent: den 13 oktober 2015 14:06
> To: Anders Björnerstedt; Tony Hart
> Cc: opensaf-users@lists.sourceforge.net
> Subject: Re: [users] Avoid rebooting payload modules after losing system 
> controller
>
> Yes, I agree that the best fit for this feature is an application using 
> either the NWay-Active or the No-Redundancy models, and where you view the 
> system more as a collection of nodes rather than as a cluster. This kind of 
> architecture is quite common when you write applications for cloud. The 
> redundancy models are suitable for scaling, and the architecture fits into 
> the "cattle" philosophy which is common in cloud.
> Such an application can tolerate any number of node failures, and the 
> remaining nodes would still be able to continue functioning and provide their 
> service. However, if we put the OpenSAF middleware on the nodes it becomes 
> the weakest link, since OpenSAF will reboot all the nodes just because the 
> two controller nodes fail. What a pity on a system with one hundred nodes!
>
> / Anders Widell
>
> On 10/13/2015 01:19 PM, Anders Björnerstedt wrote:
>>
>> On 10/13/2015 12:27 PM, Anders Widell wrote:
>>> The possibility to have more than two system controllers (one active
>>> + several standby and/or spare controller nodes) is also something
>>> that has been investigated. For scalability reasons, we probably
>>> can't turn all nodes into standby controllers in a large cluster -
>>> but it may be feasible to have a system with one or several standby
>>> controllers and the rest of the nodes are spares that are ready to
>>> take an active or standby assignment when needed.
>>>
>>> However, the "headless" feature will still be needed in some systems
>>> where you need dedicated controller node(s).
>> That sounds as if some deployments have a special requirement that can
>> only be supported by the headless feature.
>> But you also have to say that the headless feature places
>> anti-requirements on the deployments/applications that are to use it.
>>
>> For example not needing cluster coherence among the payloads.
>>
>> If the payloads only run independent application instances where each
>> instance is implemented at one processor or at least does not
>> communicate in any state-sensitive way with peer processes at other
>> payloads; and no such instance is unique or if it is unique it is
>> still expendable (non critical to the service), then it could work.
>>
>> It is important that the deployments that end up thinking they need the
>> headless feature also und

Re: [users] Avoid rebooting payload modules after losing system controller

2015-10-12 Thread Anders Widell

We have actually implemented something very similar to what you are 
talking about. With this feature, the payloads can survive without a 
cluster restart even if both system controllers restart (or the single 
system controller, in your case). If you want to try it out, you can 
clone this Mercurial repository:

https://sourceforge.net/u/anders-w/opensaf-headless/

To enable the feature, set the variable IMMSV_SC_ABSENCE_ALLOWED in 
immd.conf to the amount of seconds you wish the payloads to wait for the 
system controllers to come back. Note: we have only implemented this 
feature for the "core" OpenSAF services (plus CKPT), so you need to 
disable the optional services.
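A minimal sketch of the setting, assuming immd.conf sits in /etc/opensaf next to fmd.conf and uses the same shell `export VAR=value` syntax as the other OpenSAF .conf files; 900 seconds is an arbitrary example value, not a recommendation:

```shell
# /etc/opensaf/immd.conf (fragment)
# ASSUMPTIONS: file location and "export VAR=value" syntax as described
# above. Number of seconds the payloads wait for a system controller to
# come back; 900 is only an example.
export IMMSV_SC_ABSENCE_ALLOWED=900
```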

/ Anders Widell

On 10/11/2015 02:30 PM, Tony Hart wrote:
> We have been using opensaf in our product for a couple of years now.  One of 
> the issues we have is the fact that payload cards reboot when the system 
> controllers are lost.  Although our payload card hardware will continue to 
> perform its functions whilst the software is down (which is desirable) the 
> functions that the software performs are obviously not performed (which is 
> not desirable).
>
> Why would we lose both controllers, surely that is a rare circumstance?  Not 
> if you only have one controller to begin with.  Removing the second 
> controller is a significant cost saving for us so we want to support a 
> product that only has one controller.  The most significant impediment to 
> that is the loss of payload software functions when the system controller 
> fails.
>
> I’m looking for suggestions from this email list as to what could be done for 
> this issue.
>
> One suggestion, that would work for us, is if we could convince the payload 
> card to only reboot when the controller reappears after a loss rather than 
> when the loss initially occurs.  Is that possible?
>
> Another possibility is if we could support more than 2 controllers, for 
> example if we could support 4 (one active and 3 standbys) that would also 
> provide a solution for us (our current payloads would instead become 
> controllers).  I know that this is not currently possible with opensaf.
>
> thanks for any suggestions,
> —
> tony




Re: [users] Avoid rebooting payload modules after losing system controller

2015-10-12 Thread Anders Widell

The feature has been controversial so we currently maintain it as a 
fork, but it may get merged in the future.

You currently need to disable the following OpenSAF services:

--disable-ais-msg --disable-ais-evt --disable-ais-lck --disable-ais-plm
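These are configure options. A rough build sketch for the fork, assuming you are at the top of the checked-out tree and that it follows the usual `./bootstrap.sh`, `./configure`, `make` sequence of an OpenSAF source checkout (the `build_headless` helper name is made up):

```shell
#!/bin/sh
# Sketch: build the headless fork with the optional services disabled.
# ASSUMPTION: run from the top of a checked-out source tree that uses the
# ./bootstrap.sh, ./configure, make sequence.

# The configure options listed in the message above.
HEADLESS_FLAGS="--disable-ais-msg --disable-ais-evt --disable-ais-lck --disable-ais-plm"

build_headless() {
    # $HEADLESS_FLAGS is left unquoted on purpose so it splits into four
    # separate options; any extra configure options can be passed as "$@".
    ./bootstrap.sh &&
    ./configure $HEADLESS_FLAGS "$@" &&
    make
}
```

Extra configure options pass straight through, e.g. `build_headless --prefix=/usr`.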

The fork that I published is on the default branch, so it's basically 
OpenSAF 4.7.

regards,
Anders Widell

On 10/12/2015 04:19 PM, Tony Hart wrote:
> Hi Anders,
>
> Thank you I’ll definitely try this out.
>
> A couple of questions,
>
> What is the status of this feature, is it scheduled to be included in a 
> release?
> Do you have a list of services that need to be disabled?
> What OSAF version is this based on? 5.4?
>
> thanks
> —
> tony
>
>
>> On Oct 12, 2015, at 10:07 AM, Anders Widell <anders.wid...@ericsson.com> 
>> wrote:
>>
>> We have actually implemented something very similar to what you are talking 
>> about. With this feature, the payloads can survive without a cluster restart 
>> even if both system controllers restart (or the single system controller, in 
>> your case). If you want to try it out, you can clone this Mercurial 
>> repository:
>>
>> https://sourceforge.net/u/anders-w/opensaf-headless/
>>
>> To enable the feature, set the variable IMMSV_SC_ABSENCE_ALLOWED in 
>> immd.conf to the amount of seconds you wish the payloads to wait for the 
>> system controllers to come back. Note: we have only implemented this feature 
>> for the "core" OpenSAF services (plus CKPT), so you need to disable the 
>> optional services.
>>
>> / Anders Widell
>>
>> On 10/11/2015 02:30 PM, Tony Hart wrote:
>>> We have been using opensaf in our product for a couple of years now.  One 
>>> of the issues we have is the fact that payload cards reboot when the system 
>>> controllers are lost.  Although our payload card hardware will continue to 
>>> perform its functions whilst the software is down (which is desirable) the 
>>> functions that the software performs are obviously not performed (which is 
>>> not desirable).
>>>
>>> Why would we lose both controllers, surely that is a rare circumstance?  
>>> Not if you only have one controller to begin with.  Removing the second 
>>> controller is a significant cost saving for us so we want to support a 
>>> product that only has one controller.  The most significant impediment to 
>>> that is the loss of payload software functions when the system controller 
>>> fails.
>>>
>>> I’m looking for suggestions from this email list as to what could be done 
>>> for this issue.
>>>
>>> One suggestion, that would work for us, is if we could convince the payload 
>>> card to only reboot when the controller reappears after a loss rather than 
>>> when the loss initially occurs.  Is that possible?
>>>
>>> Another possibility is if we could support more than 2 controllers, for 
>>> example if we could support 4 (one active and 3 standbys) that would also 
>>> provide a solution for us (our current payloads would instead become 
>>> controllers).  I know that this is not currently possible with opensaf.
>>>
>>> thanks for any suggestions,
>>> —
>>> tony
>>




Re: [users] How to correct a split-brain situation

2015-09-24 Thread Anders Widell

Also, I must point out the importance of having a redundant network 
connection between the nodes; otherwise it will be a single point of 
failure. Is your network duplicated?

/ Anders Widell

On 09/24/2015 12:21 PM, Mathivanan Naickan Palanivelu wrote:
> Hi,
>
> Note that FMS_PROMOTE_ACTIVE_TIMER and opensaf_reboot scripts are two 
> platform adaptation attributes in
> OpenSAF w.r.t failover and fencing. An OpenSAF user can customize these in 
> their deployments.
>
> Upon receiving connection loss indication with the active controller,
> the STANDBY controller starts this promote active timer (see 
> FMS_PROMOTE_ACTIVE_TIMER in /etc/opensaf/fmd.conf).
> This timer acts as a tolerance mechanism to handle or differentiate temporary 
> link-flaps and false-positives
> in your network.
> Upon expiry of this timer, the STANDBY invokes opensaf_reboot script (with 
> the intention to reboot
> the ACTIVE node) and subsequently promotes itself to ACTIVE.
>
> The opensaf_reboot script is an integration point for the OpenSAF user. So, 
> during failover
> when this opensaf_reboot script is invoked the node information (node_id, PLM 
> ee name) of the
> peer ACTIVE node is passed as input to this script.
> Inside this script, the user can modify so as to invoke 'commands' that will 
> perform remote reboots
> of the old ACTIVE node.
> The 'commands' here could be an IPMI command or any STONITH agent/command.
>
> Cheers,
> Mathi.
>
> - shu.w...@netcracker.com wrote:
>
>> When a system gets into split-brain scenario, both controllers assume
>> active role. How does a payload node distinguish which controller it
>> is associated to? Is there a way that we find out which payload nodes
>> connect to which controller?
>>
>> Our cluster needs to provide service 24x7.  So restarting the cluster
>> is not possible when this situation occurs.  What is the best way to
>> correct a split-brain situation? If we stop and restart one of the
>> controller nodes to allow it to rejoin the other controller, should we
>> also restart the payload nodes associated to that controller? Those
>> payload nodes should be stopped before stopping their associated
>> controller node, correct?
>>
>> Shu Wang
>>
>>
>>
>>
>
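A sketch of the kind of customization Mathi describes, for a hypothetical opensaf_reboot variant that fences the old ACTIVE through its BMC. `ipmitool` and its `chassis power reset` command are real, but the assumption that the peer's node ID arrives as `$1`, the node-ID-to-BMC table, the addresses and the IPMI user are all illustrative:

```shell
#!/bin/sh
# Hypothetical fencing logic for a customized opensaf_reboot script.
# ASSUMPTIONS: the peer's node ID arrives as $1 (the PLM EE name, passed
# alongside it, is not used here); the node-ID-to-BMC table and the IPMI
# user are examples only.

# Map an OpenSAF node ID to the address of that node's BMC.
bmc_for_node() {
    case "$1" in
        0x2010f) echo "10.0.0.101" ;;   # example: SC-1
        0x2020f) echo "10.0.0.102" ;;   # example: SC-2
        *)       return 1 ;;            # unknown node: do not fence
    esac
}

# Print the IPMI power-reset command for the given node. -E makes
# ipmitool read the password from the IPMI_PASSWORD environment variable.
fence_cmd() {
    bmc=$(bmc_for_node "$1") || return 1
    echo "ipmitool -I lanplus -H $bmc -U ${IPMI_USER:-admin} -E chassis power reset"
}

# Entry point: only act when called with a node ID.
if [ "$#" -ge 1 ]; then
    fence_cmd "$1"
fi
```

The sketch only prints the command (a dry run); a real hook would execute it and check its exit status. Refusing to fence an unknown node ID keeps a typo in the table from resetting the wrong machine.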



Re: [users] Error compiling 4.6.0 version

2015-07-08 Thread Anders Widell

I just tried to build opensaf-4.6.0 rpms on Fedora 21, and it worked 
fine. Could you try building on Fedora 21 to see if you still get the 
problem? Fedora 17 is very old and in fact it reached end-of-life two 
years ago, which means it doesn't get security updates any longer. I 
strongly recommend that you upgrade to a version of Fedora that is still 
receiving security updates.

By the way, have you done any customization to the OpenSAF source code 
or rpm spec file? The thing that caught my attention is the name zebha 
in the printouts below. Maybe you need to update your customizations to 
work with the latest OpenSAF version? If you have done any 
customization, please first try to build with a clean unmodified version 
of OpenSAF (i.e. the tarball you can download from sourceforge).

/ Anders Widell

On 07/08/2015 05:54 AM, Girish Nagaraj wrote:
 Hi Mathi,

 Provides: libSaPlm.so.0
 Requires(interp): /sbin/ldconfig /sbin/ldconfig
 Requires(rpmlib): rpmlib(CompressedFileNames) = 3.0.4-1 rpmlib(FileDigests)
 = 4.6.0-1 rpmlib(PayloadFilesHavePrefix) = 4.0-1
 Requires(post): /sbin/ldconfig /sbin/ldconfig
 Requires(postun): /sbin/ldconfig /sbin/ldconfig
 Requires: libc.so.6 libc.so.6(GLIBC_2.0) libc.so.6(GLIBC_2.1.3)
 libc.so.6(GLIBC_2.3.4) libc.so.6(GLIBC_2.4) libdl.so.2 libopensaf_core.so.0
 libpthread.so.0 libpthread.so.0(GLIBC_2.0) librt.so.1 rtld(GNU_HASH)
 Obsoletes: zebha-plm-libs  1.1.0-1.fc17
 Processing files: zebha-plm-server-1.1.0-1.fc17.i686
 Provides: config(zebha-plm-server) = 1.1.0-1.fc17
 Requires(rpmlib): rpmlib(CompressedFileNames) = 3.0.4-1 rpmlib(FileDigests)
 = 4.6.0-1 rpmlib(PayloadFilesHavePrefix) = 4.0-1
 Requires(post): /sbin/ldconfig
 Requires(postun): /sbin/ldconfig
 Requires: /bin/sh libSaAmf.so.0 libSaImmOi.so.0
 libSaImmOi.so.0(OPENSAF_IMM_A.02.01) libSaImmOm.so.0
 libSaImmOm.so.0(OPENSAF_IMM_A.02.01) libSaNtf.so.0 libc.so.6
 libc.so.6(GLIBC_2.0) libc.so.6(GLIBC_2.1) libc.so.6(GLIBC_2.3.4)
 libc.so.6(GLIBC_2.4) libdl.so.2 libdl.so.2(GLIBC_2.0) libdl.so.2(GLIBC_2.1)
 libgcc_s.so.1 libgcc_s.so.1(GCC_3.0) libgcc_s.so.1(GCC_3.3.1)
 libglib-2.0.so.0 libopenhpi.so.3 libopensaf_core.so.0 libplmc_utils.so.0
 libpthread.so.0 libpthread.so.0(GLIBC_2.0) libpthread.so.0(GLIBC_2.1)
 libpthread.so.0(GLIBC_2.3.2) librda.so.0 librt.so.1 librt.so.1(GLIBC_2.2)
 rtld(GNU_HASH)
 Processing files: zebha-plm-coordinator-1.1.0-1.fc17.i686
 error: File not found:
 /home/girish/projects/HMSSDEV/src/opensaf-4.6.0/rpms/BUILDROOT/zebha-1.1.0-1.fc17.i386/etc/init.d/plmcboot
 error: File not found:
 /home/girish/projects/HMSSDEV/src/opensaf-4.6.0/rpms/BUILDROOT/zebha-1.1.0-1.fc17.i386/etc/init.d/plmcd


 RPM build errors:
  File not found:
 /home/girish/projects/HMSSDEV/src/opensaf-4.6.0/rpms/BUILDROOT/zebha-1.1.0-1.fc17.i386/etc/init.d/plmcboot
  File not found:
 /home/girish/projects/HMSSDEV/src/opensaf-4.6.0/rpms/BUILDROOT/zebha-1.1.0-1.fc17.i386/etc/init.d/plmcd
 make: *** [rpm] Error 1
 #/home/girish/projects/HMSSDEV/src/opensaf-4.6.0find ./ -name plmcboot
 ./rpms/BUILD/zebha-1.1.0/contrib/plmc/scripts/plmcboot
 ./contrib/plmc/scripts/plmcboot


 Issue is still seen, Fedora 17, we use Linux kernel 3.4.44

 Regards,
 Girish



 -Original Message-
 From: Mathivanan Naickan Palanivelu [mailto:mathi.naic...@oracle.com]
 Sent: Friday, July 03, 2015 8:07 PM
 To: girish.nagar...@ipinfusion.com
 Cc: opensaf-users@lists.sourceforge.net
 Subject: Re: [users] Error compiling 4.6.0 version

 Are you still facing this problem?

 I am interested in pursuing any distro-specific autotools problem that might
 need attention!

 Mathi.


 - mathi.naic...@oracle.com wrote:

 Which distro is this?

 Do a 'make distclean' followed by ./bootstrap.sh. Are these
 files (plmcboot and plmcd) created?
 i.e. $ find ./ -name plmcboot

 -Mathi.
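A minimal sketch of the check suggested above, assuming it is run from the top of the opensaf-4.6.0 source tree after `make distclean` and `./bootstrap.sh`. `find_plmc_scripts` is a hypothetical helper, not part of OpenSAF; pointing it at the `rpms/BUILDROOT` directory instead of the source tree shows whether the scripts were installed where rpmbuild expects them.

```shell
#!/bin/sh
# Sketch only: run after a clean regeneration, e.g.
#   make distclean && ./bootstrap.sh
# find_plmc_scripts DIR (hypothetical helper) prints every copy of the
# PLMc init scripts found under DIR, or "<name>: not found" if none.
find_plmc_scripts() {
    dir=${1:-.}
    for name in plmcboot plmcd; do
        hits=$(find "$dir" -name "$name" -print)
        if [ -n "$hits" ]; then
            printf '%s\n' "$hits"
        else
            echo "$name: not found"
        fi
    done
}
```

For example, `find_plmc_scripts ./` in the source tree, then `find_plmc_scripts rpms/BUILDROOT` to compare.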

 - girish.nagar...@ipinfusion.com wrote:

 Hi,



 Could someone please look into this issue?



 Regards,

 Girish



 *From:* Girish Nagaraj [mailto:girish.nagar...@ipinfusion.com]
 *Sent:* Tuesday, June 23, 2015 2:27 PM
 *To:* 'opensaf-users@lists.sourceforge.net'
 *Subject:* Error compiling 4.6.0 version



 Hi,



 I tried to build the RPMs for the OpenSAF 4.6.0 GA release and ran into
 this error while preparing the RPMs:



 Requires: /bin/sh libSaAmf.so.0 libSaImmOi.so.0
 libSaImmOi.so.0(OPENSAF_IMM_A.02.01) libSaImmOm.so.0
 libSaImmOm.so.0(OPENSAF_IMM_A.02.01) libSaNtf.so.0 libc.so.6
 libc.so.6(GLIBC_2.0) libc.so.6(GLIBC_2.1) libc.so.6(GLIBC_2.3.4)
 libc.so.6(GLIBC_2.4) libdl.so.2 libdl.so.2(GLIBC_2.0)
 libdl.so.2(GLIBC_2.1)
 libgcc_s.so.1 libgcc_s.so.1(GCC_3.0) libgcc_s.so.1(GCC_3.3.1)
 libglib-2.0.so.0 libopenhpi.so.3 libopensaf_core.so.0
 libplmc_utils.so.0
 libpthread.so.0 libpthread.so.0(GLIBC_2.0)
 libpthread.so.0(GLIBC_2.1)
 libpthread.so.0(GLIBC_2.3.2) librda.so.0 librt.so.1
 librt.so.1(GLIBC_2.2)
 rtld(GNU_HASH)

 Processing files: zebha-plm-coordinator-1.1.0-1.fc17.i686

 error: File not found:

 /home/girish/projects/HMSSDEV/src/opensaf-4.6.0/rpms/BUILDROOT

Re: [users] Change in process ownership in 4.5?

2014-12-29 Thread Anders Widell

Well, the only way at the moment is if you put back the code that was 
removed in ticket [#1138]. :-)

It seems that this change was not backwards compatible so we will have 
to look at how to handle it.

regards,
Anders Widell

On 12/26/2014 06:36 PM, Tony Hart wrote:
 We were trying out OpenSAF 4.5 in our system and noticed some failures. It
 seems that our OpenSAF-started processes are now running as root rather than
 with the uid/gid from the executable. This change in behavior may be due
 to patch #1138.
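To confirm the symptom described above, a small Linux-only sketch can print the effective uid/gid of a running process next to the owner of its executable. The helper name `compare_proc_owner` is hypothetical; it reads the standard procfs entries `/proc/PID/exe` and `/proc/PID/status`, and `stat -c` assumes GNU coreutils or busybox `stat`.

```shell
#!/bin/sh
# compare_proc_owner PID (hypothetical helper, Linux /proc assumed)
# Prints the effective uid:gid of the process and the owner uid:gid of
# its executable, so a process that runs as root can be spotted.
compare_proc_owner() {
    pid=$1
    exe=$(readlink "/proc/$pid/exe") || return 1
    # "Uid:"/"Gid:" lines list real, effective, saved and fs ids; $3 = effective
    proc_uid=$(awk '/^Uid:/ {print $3}' "/proc/$pid/status")
    proc_gid=$(awk '/^Gid:/ {print $3}' "/proc/$pid/status")
    exe_uid=$(stat -c '%u' "$exe")
    exe_gid=$(stat -c '%g' "$exe")
    echo "pid=$pid exe=$exe proc=$proc_uid:$proc_gid file=$exe_uid:$exe_gid"
}
```

For example, `compare_proc_owner "$(pgrep -o my_component)"`, where `my_component` is a placeholder for one of your own OpenSAF-started processes. If `proc=` shows `0:0` while `file=` shows another owner, the process is running as root rather than with the executable's uid/gid.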

 Is there a way to revert to the original behavior?

 thanks
 —
 tony
 --
 Dive into the World of Parallel Programming! The Go Parallel Website,
 sponsored by Intel and developed in partnership with Slashdot Media, is your
 hub for all things parallel software development, from weekly thought
 leadership blogs to news, videos, case studies, tutorials and more. Take a
 look and join the conversation now. http://goparallel.sourceforge.net
 ___
 Opensaf-users mailing list
 Opensaf-users@lists.sourceforge.net
 https://lists.sourceforge.net/lists/listinfo/opensaf-users




[users] Announcement of OpenSAF 4.5.0 (GA) and 4.4.1, 4.3.3 (maintenance) releases

2014-10-07 Thread Anders Widell

Hello,

The OpenSAF community is pleased to announce the General Availability of the 
OpenSAF 4.5.0 enhancements release. The source code for OpenSAF 4.5.0 and the 
corresponding documentation can be downloaded using the following links:

http://sourceforge.net/projects/opensaf/files/releases/opensaf-4.5.0.tar.gz/download
http://sourceforge.net/projects/opensaf/files/docs/opensaf-documentation-4.5.0.tar.gz/download

For a complete list of new features in this release, please refer to the NEWS
page on the wiki:

https://sourceforge.net/p/opensaf/wiki/NEWS-4.5.0/

We have also released the maintenance (patch) releases OpenSAF 4.4.1 and 
OpenSAF 4.3.3.

- The source code for OpenSAF 4.4.1 and the corresponding documentation can be 
downloaded using the following links:

http://sourceforge.net/projects/opensaf/files/releases/opensaf-4.4.1.tar.gz/download
http://sourceforge.net/projects/opensaf/files/docs/opensaf-documentation-4.4.1.tar.gz/download

See the ChangeLog for a full list of changes in this release:

https://sourceforge.net/p/opensaf/wiki/ChangeLog-4.4.1/

- The source code for OpenSAF 4.3.3 and the corresponding documentation can be 
downloaded using the following links:

http://sourceforge.net/projects/opensaf/files/releases/opensaf-4.3.3.tar.gz/download
http://sourceforge.net/projects/opensaf/files/docs/opensaf-documentation-4.3.3.tar.gz/download

See the ChangeLog for a full list of changes in this release:

https://sourceforge.net/p/opensaf/wiki/ChangeLog-4.3.3/

Thank you for your continued interest in OpenSAF and to everyone who has 
contributed to this release.

regards,
Anders Widell



[users] OpenSAF 4.5 RC1 is available

2014-09-15 Thread Anders Widell

Hi!

A release candidate for the upcoming OpenSAF 4.5 is now available for 
download using this link:

http://sourceforge.net/projects/opensaf/files/testing/opensaf-4.5.RC1.tar.gz/download

Please download and test this release candidate, so that we can fix any 
remaining bugs and regressions before the final OpenSAF 4.5 release. If 
you write a defect ticket on this release candidate, specify 4.5.RC1 in 
the Version field when writing the ticket.

For a full list of enhancements implemented in OpenSAF 4.5, see the 
following page:

https://sourceforge.net/p/opensaf/tickets/search/?q=status%3Afixed+AND+_milestone%3A%284.5.FC+4.5.0%29+AND+_type%3Aenhancement&limit=50

regards,
Anders Widell



Re: [users] opensafd status not showing properly on payload

2014-05-22 Thread Anders Widell

Looks like pidofproc cannot find your process. Maybe something is wrong
with the pid file? Try the following commands to check whether the pid
matches your process:

cat /var/run/opensaf/osafamfnd.pid
pgrep -l osafamfnd
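The two commands above can be combined into one check. This is a minimal sketch, assuming a Linux `/proc` file system; `check_pidfile` is a hypothetical helper, not part of OpenSAF. Note that `/proc/PID/comm` holds at most 15 characters, which is enough for `osafamfnd`.

```shell
#!/bin/sh
# check_pidfile PIDFILE NAME (hypothetical helper, Linux /proc assumed)
# Returns 0 only if PIDFILE is readable, holds a numeric pid of a live
# process, and that process's command name (/proc/PID/comm) equals NAME.
check_pidfile() {
    pidfile=$1
    name=$2
    if [ ! -r "$pidfile" ]; then
        echo "missing pid file: $pidfile"
        return 1
    fi
    pid=$(head -n 1 "$pidfile")
    case "$pid" in
        ''|*[!0-9]*) echo "no valid pid in $pidfile"; return 1 ;;
    esac
    if [ ! -d "/proc/$pid" ]; then
        echo "no process with pid $pid"
        return 1
    fi
    comm=$(cat "/proc/$pid/comm")
    if [ "$comm" != "$name" ]; then
        echo "pid $pid is '$comm', expected '$name'"
        return 1
    fi
    echo "ok: $name is running as pid $pid"
}
```

For example: `check_pidfile /var/run/opensaf/osafamfnd.pid osafamfnd`.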

/ Anders W

On 05/22/2014 09:48 AM, Biswajit Panigrahi wrote:
Hi All,
Can anyone help figure out what the root cause could be?

Regards,
Biswajit

Connect with us:

www.techmahindra.com

-Original Message-
From: Biswajit Panigrahi [mailto:bp00106...@techmahindra.com]
Sent: Thursday, May 22, 2014 11:59 AM
To: opensaf-users@lists.sourceforge.net
Subject: [users] opensafd status not showing properly on payload

Hi All,
I am using opensaf-4.2.0. I have configured it to start at boot. After
starting OpenSAF on the payload and rebooting the payload, once it comes
back up, /etc/init.d/opensafd status on the payload reports "The OpenSAF HA
Framework is not running", even though all the OpenSAF processes are up and
running fine. Even /etc/init.d/opensafd status on the controller shows that
the payload has rejoined the cluster.

Since it has already been delivered to the customer, we can't use a different
version of OpenSAF, so we are looking for your help ASAP.
I have attached a document with the steps to reproduce the issue.

I have attached the /var/log/messages log file from the controller. Since the
payload boots via PXE (from the LAN), its log from before the reboot is not
available; the log captured after the payload rebooted is attached.

Note: n12s1 is my controller and n12s3 is my payload.

Regards,
Biswajit Panigrahi
Department : Product Engg.
9/7 Hosur Road, Bangalore-560029, INDIA
* Office: +918040243000|extn:3418
Mobile: +91 9916674401
Email: biswajit.panigr...@techmahindra.com


Disclaimer: This message and the information contained herein is proprietary
and confidential and subject to the Tech Mahindra policy statement, you may
review the policy at http://www.techmahindra.com/Disclaimer.html externally
http://tim.techmahindra.com/tim/disclaimer.html internally within
TechMahindra.


Re: [users] opensafd status not showing properly on payload

2014-05-22 Thread Anders Widell

pidofproc is often implemented as a shell function that in turn depends
on other commands in /bin, /sbin or /usr/sbin. Its implementation differs
between Linux distributions, so you need to check
/lib/lsb/init-functions to find out how it is implemented on your
system. Then check that all the required commands are available at the
time OpenSAF is started. Also check that /var/run/opensaf is available.

I would guess your problem is caused by starting OpenSAF before all
required file systems have been mounted. So you may wish to try starting
OpenSAF later during the boot process.
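A sketch of the checks described above, meant to be run from the same point in the boot sequence where opensafd is started. The default command list (`pidof`, `head`, `cat`) is only an assumption about what a typical pidofproc uses; read `/lib/lsb/init-functions` (or `/etc/init.d/functions`) on the target and pass the real list as arguments. `check_boot_prereqs` and the `PIDDIR` variable are hypothetical.

```shell
#!/bin/sh
# check_boot_prereqs [CMD...] (hypothetical helper)
# Reports whether each CMD is on PATH and whether the pid directory
# (PIDDIR, default /var/run/opensaf) exists and is writable.
# The default command list below is an assumption, not taken from any
# particular distribution's pidofproc.
check_boot_prereqs() {
    piddir=${PIDDIR:-/var/run/opensaf}
    status=0
    [ $# -gt 0 ] || set -- pidof head cat
    for cmd in "$@"; do
        if command -v "$cmd" >/dev/null 2>&1; then
            echo "found: $cmd"
        else
            echo "MISSING: $cmd"
            status=1
        fi
    done
    # -w is also false on a read-only file system, even for root
    if [ -d "$piddir" ] && [ -w "$piddir" ]; then
        echo "ok: $piddir is writable"
    else
        echo "MISSING or read-only: $piddir"
        status=1
    fi
    return $status
}
```

Its output could be appended to a log on a file system that is known to be mounted at that stage, then compared with a run made after a manual OpenSAF restart.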

/ Anders Widell

On 05/22/2014 11:47 AM, Biswajit Panigrahi wrote:
Hi Anders,
Thanks a lot for your reply.
The problem occurs only during boot-up.
Once the system is up, if I restart OpenSAF, it works fine.
After boot-up there is no content in /var/run/opensaf on the payload, but
when I manually restart OpenSAF after the reboot, all the required .pid
files are present.
I suspect the pidofproc function in /etc/init.d/functions, but only during
boot.
Do you know whether it has any dependencies?
Since I built the customized file system from scratch, there is a chance
that some service needs to run during boot before /etc/init.d/functions
is called.

Regards,
Biswajit

-Original Message-
From: Anders Widell [mailto:anders.wid...@ericsson.com]
Sent: Thursday, May 22, 2014 2:34 PM
To: Biswajit Panigrahi; opensaf-users@lists.sourceforge.net
Subject: Re: [users] opensafd status not showing properly on payload


Re: [users] opensaf_reboot - reboot -f (does it sync the file system)

2014-04-01 Thread Anders Widell

The reboot command in sysvinit does sync, unless you specify the -n option:

OPTIONS
-n Don't sync before reboot or halt. Note that the kernel and
storage drivers may still sync.

On Ubuntu, the reboot command is provided by upstart rather than by 
sysvinit, and the upstart version of reboot does not seem to have the -n 
option. So potentially it may behave differently. Where does your reboot 
command come from? sysvinit or upstart? Or maybe busybox?
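One way to answer the question above on a given system is to resolve the `reboot` path and ask the package manager which package owns it. A sketch; the helper name `reboot_provider` is hypothetical, and `readlink -f` assumes GNU coreutils or busybox.

```shell
#!/bin/sh
# reboot_provider [PATH] (hypothetical helper, default /sbin/reboot)
# A symlink that resolves to busybox is reported directly; otherwise the
# owning package is looked up with rpm -qf or dpkg -S. If both package
# managers are installed, rpm is tried first, which may be wrong on a
# Debian/Ubuntu system that also has rpm installed.
reboot_provider() {
    target=$(readlink -f "${1:-/sbin/reboot}") || return 1
    case "$target" in
        */busybox) echo "busybox ($target)" ;;
        *)
            if command -v rpm >/dev/null 2>&1; then
                rpm -qf "$target"
            elif command -v dpkg >/dev/null 2>&1; then
                dpkg -S "$target"
            else
                echo "unknown provider: $target"
            fi
            ;;
    esac
}
```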

regards,
Anders Widell

2014-04-01 16:36, Tony Hart wrote:
 The opensaf_reboot script shuts down the system using ‘reboot -f’, and the
 comment says that this command will do a file system sync. Is this correct?
 Research on the web suggests that ‘reboot -f’ DOES NOT do a file system sync.

 Can anyone confirm?

 # Stop some important opensaf processes to prevent bad things from happening
 $icmd pkill -STOP osafamfwd
 $icmd pkill -STOP osafamfnd
 $icmd pkill -STOP osafamfd
 $icmd pkill -STOP osaffmd

 # Reboot (not shutdown) system WITH file system sync
 $icmd /sbin/reboot -f
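If you would rather not depend on which reboot implementation is installed, one option is to call `sync` explicitly before `reboot -f`. The sketch below wraps the quoted steps in a hypothetical function, `stop_and_reboot`; `$icmd` is the command prefix from the quoted snippet, whose value is set elsewhere in opensaf_reboot and is not shown here. Setting `icmd=echo` prints the commands instead of running them, which gives a dry run.

```shell
#!/bin/sh
# stop_and_reboot (hypothetical variant of the quoted opensaf_reboot steps)
# $icmd is assumed to be set by the surrounding script; with icmd=echo
# the commands are only printed.
stop_and_reboot() {
    # Stop some important opensaf processes to prevent bad things from happening
    for p in osafamfwd osafamfnd osafamfd osaffmd; do
        $icmd pkill -STOP "$p"
    done
    # Flush file system buffers explicitly, whatever reboot does on -f
    $icmd sync
    $icmd /sbin/reboot -f
}
```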


 thanks
 —
 tony

[users] OpenSAF 4.2.4RC1 and 4.3.1RC1 Tagged and Available for Download

2013-07-26 Thread Anders Widell

OpenSAF 4.2.4RC1 and 4.3.1RC1 have been tagged, and the archives
http://sourceforge.net/projects/opensaf/files/testing/opensaf-4.2.4RC1.tar.gz/download
and
http://sourceforge.net/projects/opensaf/files/testing/opensaf-4.3.1RC1.tar.gz/download
are available for download.

Please test these release candidates and report any problems found. We 
are aiming to release OpenSAF 4.2.4 and 4.3.1 in the beginning of August.

regards,
Anders Widell


