[tickets] [opensaf:tickets] #2006 NTFSv: Cluster rebooted with ntfd crashed on both controllers

2016-09-13 Thread Vu Minh Nguyen
- **status**: review --> fixed
- **assigned_to**: Vu Minh Nguyen -->  nobody 
- **Comment**:

changeset:   8057:8081a9ddd2fc
tag:         tip
parent:      8055:fee502a9845c
user:        Vu Minh Nguyen
date:        Tue Sep 13 13:15:21 2016 +0700
summary:     ntf: cluster rebooted with ntfd crashed on both controllers [#2006]

changeset:   8056:280d00e0eba1
branch:      opensaf-5.1.x
parent:      8054:9e774234274a
user:        Vu Minh Nguyen
date:        Tue Sep 13 13:15:21 2016 +0700
summary:     ntf: cluster rebooted with ntfd crashed on both controllers [#2006]




---

** [tickets:#2006] NTFSv: Cluster rebooted with ntfd crashed on both 
controllers**

**Status:** fixed
**Milestone:** 5.1.RC1
**Created:** Wed Sep 07, 2016 06:42 AM UTC by Chani Srivastava
**Last Updated:** Thu Sep 08, 2016 11:55 AM UTC
**Owner:** nobody
**Attachments:**

- 
[NtfCrash.zip](https://sourceforge.net/p/opensaf/tickets/2006/attachment/NtfCrash.zip)
 (165.9 kB; application/zip)


OS : Suse PPC 64bit 
Changeset : 7997  ( 5.1.FC)
Setup : 4 nodes ( 2 controllers and 2 payloads with headless feature disabled & 
no PBE )

Ntfd traces and syslog for both controllers attached
*  Ntf Application is running on system
*  Will update ticket with core dump 

Note: The timings on system are not synced. After every reboot node timings are 
modified

BT:
0  0x0fffa0848100 in .raise () from /lib64/libc.so.6
1  0x0fffa0849d10 in .abort () from /lib64/libc.so.6
2  0x0fffa0e34234 in osaf_abort (i_cause=7) at osaf_utility.c:27
3  0x1001a2f8 in NtfLogger::logNotification (this=0x100ba768, notif=
std::tr1::shared_ptr (count 2, weak 0) 0x100b88f0) at NtfLogger.cc:247
4  0x10019e60 in NtfLogger::checkQueueAndLog (this=0x100ba768,
newNotif=std::tr1::shared_ptr (count 2, weak 0) 0x100b88f0) at 
NtfLogger.cc:181
5  0x10019a74 in NtfLogger::log (this=0x100ba768, 
notif=std::tr1::shared_ptr (count 2, weak 0) 0x100b88f0,
isLocal=true) at NtfLogger.cc:137
6  0x1002b528 in NtfAdmin::processNotification (this=0x100ba760, 
clientId=62, notificationType=SA_NTF_TYPE_ALARM,
sendNotInfo=0x100b8800, mdsCtxt=0x100bb3dc, notificationId=47) at 
NtfAdmin.cc:203
7  0x1002b938 in NtfAdmin::notificationReceived (this=0x100ba760, 
clientId=62, notificationType=SA_NTF_TYPE_ALARM,
sendNotInfo=0x100b8800, mdsCtxt=0x100bb3dc) at NtfAdmin.cc:257
8  0x1002ec20 in notificationReceived (clientId=62, 
notificationType=SA_NTF_TYPE_ALARM, sendNotInfo=0x100b8800,
mdsCtxt=0x100bb3dc) at NtfAdmin.cc:1012
9  0x10006410 in proc_send_not_msg (cb=0x10073190 <_ntfs_cb>, 
evt=0x100bb3d0) at ntfs_evt.c:447
10 0x10006b28 in process_api_evt (evt=0x100bb3d0) at ntfs_evt.c:628
11 0x10006c38 in ntfs_process_mbx (mbx=0x10073190 <_ntfs_cb>) at 
ntfs_evt.c:660
12 0x1000b6f4 in main (argc=2, argv=0xfffc056a7f8) at ntfs_main.c:399


Active Controller:
May 26 19:41:16 linux-pvra osafntfd[24205]: **osaf_abort(7) called from 
0x1001a2f8 with errno=11**
May 26 19:41:16 linux-pvra osafamfnd[24243]: NO 
'safComp=NTF,safSu=SC-1,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : 
Recovery is 'nodeFailfast'
May 26 19:41:16 linux-pvra osafamfnd[24243]: ER 
safComp=NTF,safSu=SC-1,safSg=2N,safApp=OpenSAF Faulted due to:avaDown Recovery 
is:nodeFailfast
May 26 19:41:16 linux-pvra osafamfnd[24243]: Rebooting OpenSAF NodeId = 131343 
EE Name = , Reason: Component faulted: recovery is node failfast, OwnNodeId = 
131343, SupervisionTime = 60
May 26 19:41:16 linux-pvra opensaf_reboot: Rebooting local node; timeout=60
Jun  2 14:11:28 linux-pvra syslog-ng[1639]: syslog-ng starting up; 
version='2.0.9'


Ntf Trace:
May 26 19:41:16.767426 osafntfd [24205:lga_api.c:1190] TR logBufSize > 
strlen(logBuf) + 1
May 26 19:41:16.767436 osafntfd [24205:lga_api.c:1320] << saLogWriteLogAsync
Jun  2 14:11:47.153831 osafntfd [2958:ntfs_main.c:0181] >> initialize
Jun  2 14:11:47.175099 osafntfd [2958:ncs_main_pub.c:0220] TR
NCS:PROCESS_ID=2958



---

Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is 
subscribed to https://sourceforge.net/p/opensaf/tickets/

To unsubscribe from further messages, a project admin can change settings at 
https://sourceforge.net/p/opensaf/admin/tickets/options.  Or, if this is a 
mailing list, you can unsubscribe from the mailing list.

--
___
Opensaf-tickets mailing list
Opensaf-tickets@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets


[tickets] [opensaf:tickets] #2006 NTFSv: Cluster rebooted with ntfd crashed on both controllers

2016-09-08 Thread Vu Minh Nguyen
- **status**: accepted --> review



---

** [tickets:#2006] NTFSv: Cluster rebooted with ntfd crashed on both 
controllers**

**Status:** review
**Milestone:** 5.1.RC1
**Created:** Wed Sep 07, 2016 06:42 AM UTC by Chani Srivastava
**Last Updated:** Thu Sep 08, 2016 06:57 AM UTC
**Owner:** Vu Minh Nguyen
**Attachments:**

- 
[NtfCrash.zip](https://sourceforge.net/p/opensaf/tickets/2006/attachment/NtfCrash.zip)
 (165.9 kB; application/zip)




[tickets] [opensaf:tickets] #2006 NTFSv: Cluster rebooted with ntfd crashed on both controllers

2016-09-08 Thread Vu Minh Nguyen
AIS states that `additionalText` and `lengthAdditionalText` must be consistent.
We need to add a check for this and return INVALID_PARAM if there is a mismatch.
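
A minimal sketch of such a check, not the committed fix. It assumes that `lengthAdditionalText` counts the terminating NUL byte, which should be confirmed against the AIS NTF specification. The function name `check_additional_text` and the `SKETCH_*` constants are hypothetical; the constants only mirror the numeric values of `SA_AIS_OK` and `SA_AIS_ERR_INVALID_PARAM` so the example builds without the OpenSAF headers.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for SA_AIS_OK and SA_AIS_ERR_INVALID_PARAM. */
enum { SKETCH_OK = 1, SKETCH_ERR_INVALID_PARAM = 7 };

/*
 * Returns SKETCH_OK when the declared length matches the text, otherwise
 * SKETCH_ERR_INVALID_PARAM. Assumption: the length includes the NUL byte.
 */
static int check_additional_text(const char *additionalText,
                                 uint16_t lengthAdditionalText)
{
    /* No text at all is only consistent with a zero length. */
    if (additionalText == NULL)
        return lengthAdditionalText == 0 ? SKETCH_OK
                                         : SKETCH_ERR_INVALID_PARAM;

    /* A non-NULL text needs room for at least the terminator. */
    if (lengthAdditionalText == 0)
        return SKETCH_ERR_INVALID_PARAM;

    /* Bounded scan: never look past the declared length. */
    const char *nul = memchr(additionalText, '\0', lengthAdditionalText);

    /* The first NUL must be the last byte of the declared length. */
    if (nul == NULL ||
        (size_t)(nul - additionalText) + 1 != lengthAdditionalText)
        return SKETCH_ERR_INVALID_PARAM;

    return SKETCH_OK;
}
```

Rejecting the notification at this point means a malformed request is returned to the client as an error instead of being passed on to the log service.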


---

** [tickets:#2006] NTFSv: Cluster rebooted with ntfd crashed on both 
controllers**

**Status:** accepted
**Milestone:** 5.1.RC1
**Created:** Wed Sep 07, 2016 06:42 AM UTC by Chani Srivastava
**Last Updated:** Wed Sep 07, 2016 08:22 AM UTC
**Owner:** Vu Minh Nguyen
**Attachments:**

- 
[NtfCrash.zip](https://sourceforge.net/p/opensaf/tickets/2006/attachment/NtfCrash.zip)
 (165.9 kB; application/zip)




[tickets] [opensaf:tickets] #2006 NTFSv: Cluster rebooted with ntfd crashed on both controllers

2016-09-07 Thread Vu Minh Nguyen
- **status**: unassigned --> accepted
- **assigned_to**: Vu Minh Nguyen



---

** [tickets:#2006] NTFSv: Cluster rebooted with ntfd crashed on both 
controllers**

**Status:** accepted
**Milestone:** 5.1.RC1
**Created:** Wed Sep 07, 2016 06:42 AM UTC by Chani Srivastava
**Last Updated:** Wed Sep 07, 2016 08:17 AM UTC
**Owner:** Vu Minh Nguyen
**Attachments:**

- 
[NtfCrash.zip](https://sourceforge.net/p/opensaf/tickets/2006/attachment/NtfCrash.zip)
 (165.9 kB; application/zip)




[tickets] [opensaf:tickets] #2006 NTFSv: Cluster rebooted with ntfd crashed on both controllers

2016-09-07 Thread Anders Widell
A bug in the test app shouldn't cause the NTF server to crash.


---

** [tickets:#2006] NTFSv: Cluster rebooted with ntfd crashed on both 
controllers**

**Status:** unassigned
**Milestone:** 5.1.RC1
**Created:** Wed Sep 07, 2016 06:42 AM UTC by Chani Srivastava
**Last Updated:** Wed Sep 07, 2016 07:24 AM UTC
**Owner:** nobody
**Attachments:**

- 
[NtfCrash.zip](https://sourceforge.net/p/opensaf/tickets/2006/attachment/NtfCrash.zip)
 (165.9 kB; application/zip)




[tickets] [opensaf:tickets] #2006 NTFSv: Cluster rebooted with ntfd crashed on both controllers

2016-09-07 Thread Vu Minh Nguyen
I see this trace: `logBufSize > strlen(logBuf) + 1`.

This happened because the log client (ntf) sent mismatched data in `SaLogBufferT`.
Please check whether your test app constructs the `SaLogBufferT` data
correctly. Make sure that `logBufSize` is less than or equal to
`strlen(logBuf) + 1`.
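
The condition can be written as a small predicate. This is only an illustrative sketch: `log_buffer_t` is a hypothetical stand-in that models just the two `SaLogBufferT` fields named above, and `log_buffer_is_consistent` is not an OpenSAF function. It assumes `logBuf` is NUL-terminated whenever it is non-NULL.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for SaLogBufferT (only the two fields discussed). */
typedef struct {
    size_t logBufSize;
    const char *logBuf;
} log_buffer_t;

/*
 * Returns true when logBufSize <= strlen(logBuf) + 1, i.e. the declared
 * size does not claim more bytes than the string plus its terminator.
 * Assumption: a non-NULL logBuf is NUL-terminated.
 */
static bool log_buffer_is_consistent(const log_buffer_t *buf)
{
    if (buf == NULL)
        return false;

    /* An absent buffer is only consistent with a zero size. */
    if (buf->logBuf == NULL)
        return buf->logBufSize == 0;

    return buf->logBufSize <= strlen(buf->logBuf) + 1;
}
```

A test app could call such a predicate before handing the buffer to the log API, so that a mismatch shows up in the client rather than in the server trace.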






---

** [tickets:#2006] NTFSv: Cluster rebooted with ntfd crashed on both 
controllers**

**Status:** unassigned
**Milestone:** 5.1.RC1
**Created:** Wed Sep 07, 2016 06:42 AM UTC by Chani Srivastava
**Last Updated:** Wed Sep 07, 2016 07:13 AM UTC
**Owner:** nobody
**Attachments:**

- 
[NtfCrash.zip](https://sourceforge.net/p/opensaf/tickets/2006/attachment/NtfCrash.zip)
 (165.9 kB; application/zip)




[tickets] [opensaf:tickets] #2006 NTFSv: Cluster rebooted with ntfd crashed on both controllers

2016-09-07 Thread Chani Srivastava
- Description has changed:

Diff:



--- old
+++ new
@@ -7,6 +7,28 @@
 *  Will update ticket with core dump 
 
 Note: The timings on system are not synced. After every reboot node timings 
are modified
+
+BT:
+0  0x0fffa0848100 in .raise () from /lib64/libc.so.6
+1  0x0fffa0849d10 in .abort () from /lib64/libc.so.6
+2  0x0fffa0e34234 in osaf_abort (i_cause=7) at osaf_utility.c:27
+3  0x1001a2f8 in NtfLogger::logNotification (this=0x100ba768, notif=
+std::tr1::shared_ptr (count 2, weak 0) 0x100b88f0) at NtfLogger.cc:247
+4  0x10019e60 in NtfLogger::checkQueueAndLog (this=0x100ba768,
+newNotif=std::tr1::shared_ptr (count 2, weak 0) 0x100b88f0) at 
NtfLogger.cc:181
+5  0x10019a74 in NtfLogger::log (this=0x100ba768, 
notif=std::tr1::shared_ptr (count 2, weak 0) 0x100b88f0,
+isLocal=true) at NtfLogger.cc:137
+6  0x1002b528 in NtfAdmin::processNotification (this=0x100ba760, 
clientId=62, notificationType=SA_NTF_TYPE_ALARM,
+sendNotInfo=0x100b8800, mdsCtxt=0x100bb3dc, notificationId=47) at 
NtfAdmin.cc:203
+7  0x1002b938 in NtfAdmin::notificationReceived (this=0x100ba760, 
clientId=62, notificationType=SA_NTF_TYPE_ALARM,
+sendNotInfo=0x100b8800, mdsCtxt=0x100bb3dc) at NtfAdmin.cc:257
+8  0x1002ec20 in notificationReceived (clientId=62, 
notificationType=SA_NTF_TYPE_ALARM, sendNotInfo=0x100b8800,
+mdsCtxt=0x100bb3dc) at NtfAdmin.cc:1012
+9  0x10006410 in proc_send_not_msg (cb=0x10073190 <_ntfs_cb>, 
evt=0x100bb3d0) at ntfs_evt.c:447
+10 0x10006b28 in process_api_evt (evt=0x100bb3d0) at ntfs_evt.c:628
+11 0x10006c38 in ntfs_process_mbx (mbx=0x10073190 <_ntfs_cb>) at 
ntfs_evt.c:660
+12 0x1000b6f4 in main (argc=2, argv=0xfffc056a7f8) at ntfs_main.c:399
+
 
 Active Controler:
 May 26 19:41:16 linux-pvra osafntfd[24205]: **osaf_abort(7) called from 
0x1001a2f8 with errno=11**






---

** [tickets:#2006] NTFSv: Cluster rebooted with ntfd crashed on both 
controllers**

**Status:** unassigned
**Milestone:** 5.1.RC1
**Created:** Wed Sep 07, 2016 06:42 AM UTC by Chani Srivastava
**Last Updated:** Wed Sep 07, 2016 06:42 AM UTC
**Owner:** nobody
**Attachments:**

- 
[NtfCrash.zip](https://sourceforge.net/p/opensaf/tickets/2006/attachment/NtfCrash.zip)
 (165.9 kB; application/zip)



[tickets] [opensaf:tickets] #2006 NTFSv: Cluster rebooted with ntfd crashed on both controllers

2016-09-07 Thread Chani Srivastava



---

** [tickets:#2006] NTFSv: Cluster rebooted with ntfd crashed on both 
controllers**

**Status:** unassigned
**Milestone:** 5.1.RC1
**Created:** Wed Sep 07, 2016 06:42 AM UTC by Chani Srivastava
**Last Updated:** Wed Sep 07, 2016 06:42 AM UTC
**Owner:** nobody
**Attachments:**

- 
[NtfCrash.zip](https://sourceforge.net/p/opensaf/tickets/2006/attachment/NtfCrash.zip)
 (165.9 kB; application/zip)

