[tickets] [opensaf:tickets] #2006 NTFSv: Cluster rebooted with ntfd crashed on both controllers
- **status**: review --> fixed
- **assigned_to**: Vu Minh Nguyen --> nobody
- **Comment**:

changeset: 8057:8081a9ddd2fc
tag: tip
parent: 8055:fee502a9845c
user: Vu Minh Nguyen
date: Tue Sep 13 13:15:21 2016 +0700
summary: ntf: cluster rebooted with ntfd crashed on both controllers [#2006]

changeset: 8056:280d00e0eba1
branch: opensaf-5.1.x
parent: 8054:9e774234274a
user: Vu Minh Nguyen
date: Tue Sep 13 13:15:21 2016 +0700
summary: ntf: cluster rebooted with ntfd crashed on both controllers [#2006]

---

** [tickets:#2006] NTFSv: Cluster rebooted with ntfd crashed on both controllers**

**Status:** fixed
**Milestone:** 5.1.RC1
**Created:** Wed Sep 07, 2016 06:42 AM UTC by Chani Srivastava
**Last Updated:** Thu Sep 08, 2016 11:55 AM UTC
**Owner:** nobody
**Attachments:**

- [NtfCrash.zip](https://sourceforge.net/p/opensaf/tickets/2006/attachment/NtfCrash.zip) (165.9 kB; application/zip)

OS : Suse PPC 64bit
Changeset : 7997 ( 5.1.FC)
Setup : 4 nodes ( 2 controllers and 2 payloads with headless feature disabled & no PBE )

Ntfd traces and syslog for both controllers attached

* Ntf Application is running on system
* Will update ticket with core dump

Note: The timings on system are not synced.
After every reboot node timings are modified

BT:
0 0x0fffa0848100 in .raise () from /lib64/libc.so.6
1 0x0fffa0849d10 in .abort () from /lib64/libc.so.6
2 0x0fffa0e34234 in osaf_abort (i_cause=7) at osaf_utility.c:27
3 0x1001a2f8 in NtfLogger::logNotification (this=0x100ba768, notif=std::tr1::shared_ptr (count 2, weak 0) 0x100b88f0) at NtfLogger.cc:247
4 0x10019e60 in NtfLogger::checkQueueAndLog (this=0x100ba768, newNotif=std::tr1::shared_ptr (count 2, weak 0) 0x100b88f0) at NtfLogger.cc:181
5 0x10019a74 in NtfLogger::log (this=0x100ba768, notif=std::tr1::shared_ptr (count 2, weak 0) 0x100b88f0, isLocal=true) at NtfLogger.cc:137
6 0x1002b528 in NtfAdmin::processNotification (this=0x100ba760, clientId=62, notificationType=SA_NTF_TYPE_ALARM, sendNotInfo=0x100b8800, mdsCtxt=0x100bb3dc, notificationId=47) at NtfAdmin.cc:203
7 0x1002b938 in NtfAdmin::notificationReceived (this=0x100ba760, clientId=62, notificationType=SA_NTF_TYPE_ALARM, sendNotInfo=0x100b8800, mdsCtxt=0x100bb3dc) at NtfAdmin.cc:257
8 0x1002ec20 in notificationReceived (clientId=62, notificationType=SA_NTF_TYPE_ALARM, sendNotInfo=0x100b8800, mdsCtxt=0x100bb3dc) at NtfAdmin.cc:1012
9 0x10006410 in proc_send_not_msg (cb=0x10073190 <_ntfs_cb>, evt=0x100bb3d0) at ntfs_evt.c:447
10 0x10006b28 in process_api_evt (evt=0x100bb3d0) at ntfs_evt.c:628
11 0x10006c38 in ntfs_process_mbx (mbx=0x10073190 <_ntfs_cb>) at ntfs_evt.c:660
12 0x1000b6f4 in main (argc=2, argv=0xfffc056a7f8) at ntfs_main.c:399

Active Controller:
May 26 19:41:16 linux-pvra osafntfd[24205]: **osaf_abort(7) called from 0x1001a2f8 with errno=11**
May 26 19:41:16 linux-pvra osafamfnd[24243]: NO 'safComp=NTF,safSu=SC-1,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : Recovery is 'nodeFailfast'
May 26 19:41:16 linux-pvra osafamfnd[24243]: ER safComp=NTF,safSu=SC-1,safSg=2N,safApp=OpenSAF Faulted due to:avaDown Recovery is:nodeFailfast
May 26 19:41:16 linux-pvra osafamfnd[24243]: Rebooting OpenSAF NodeId = 131343 EE Name = , Reason: Component faulted: recovery is node failfast, OwnNodeId = 131343, SupervisionTime = 60
May 26 19:41:16 linux-pvra opensaf_reboot: Rebooting local node; timeout=60
Jun 2 14:11:28 linux-pvra syslog-ng[1639]: syslog-ng starting up; version='2.0.9'

Ntf Trace:
May 26 19:41:16.767426 osafntfd [24205:lga_api.c:1190] TR logBufSize > strlen(logBuf) + 1
May 26 19:41:16.767436 osafntfd [24205:lga_api.c:1320] << saLogWriteLogAsync
Jun 2 14:11:47.153831 osafntfd [2958:ntfs_main.c:0181] >> initialize
Jun 2 14:11:47.175099 osafntfd [2958:ncs_main_pub.c:0220] TR NCS:PROCESS_ID=2958

---

Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is subscribed to https://sourceforge.net/p/opensaf/tickets/

To unsubscribe from further messages, a project admin can change settings at https://sourceforge.net/p/opensaf/admin/tickets/options. Or, if this is a mailing list, you can unsubscribe from the mailing list.

--
___
Opensaf-tickets mailing list
Opensaf-tickets@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets
[tickets] [opensaf:tickets] #2006 NTFSv: Cluster rebooted with ntfd crashed on both controllers
- **status**: accepted --> review

---

** [tickets:#2006] NTFSv: Cluster rebooted with ntfd crashed on both controllers**

**Status:** review
**Milestone:** 5.1.RC1
**Created:** Wed Sep 07, 2016 06:42 AM UTC by Chani Srivastava
**Last Updated:** Thu Sep 08, 2016 06:57 AM UTC
**Owner:** Vu Minh Nguyen
[tickets] [opensaf:tickets] #2006 NTFSv: Cluster rebooted with ntfd crashed on both controllers
AIS states that `additionalText` and `lengthAdditionalText` must be consistent. We need to add a check for this and return INVALID_PARAM if there is a mismatch.

---

** [tickets:#2006] NTFSv: Cluster rebooted with ntfd crashed on both controllers**

**Status:** accepted
**Milestone:** 5.1.RC1
**Created:** Wed Sep 07, 2016 06:42 AM UTC by Chani Srivastava
**Last Updated:** Wed Sep 07, 2016 08:22 AM UTC
**Owner:** Vu Minh Nguyen
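A minimal sketch of the kind of consistency check proposed in the comment above. It is not the actual OpenSAF fix. It assumes "consistent" means that `lengthAdditionalText` equals the string length plus the terminating NUL, which is only one possible reading of the AIS requirement. The names `check_additional_text` and `check_result_t` are hypothetical; the two enum values mirror `SA_AIS_OK` (1) and `SA_AIS_ERR_INVALID_PARAM` (7).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical result type; values mirror SA_AIS_OK and
 * SA_AIS_ERR_INVALID_PARAM from the AIS error enumeration. */
typedef enum {
    CHECK_OK = 1,
    CHECK_ERR_INVALID_PARAM = 7
} check_result_t;

/* Assumption: the declared length must equal strlen(text) + 1.
 * A declared length of zero means "no additional text" and is accepted. */
check_result_t check_additional_text(const char *text, uint16_t length)
{
    if (length == 0) {
        return CHECK_OK;
    }
    if (text == NULL) {
        /* A non-zero length with no text is a mismatch. */
        return CHECK_ERR_INVALID_PARAM;
    }
    /* Look for the terminating NUL within the declared length only. */
    const char *nul = memchr(text, '\0', length);
    if (nul == NULL || (size_t)(nul - text) + 1 != length) {
        return CHECK_ERR_INVALID_PARAM;
    }
    return CHECK_OK;
}
```

With a check like this on the client API side, a mismatched notification would be rejected before it reaches ntfd, instead of failing later when the notification is logged.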
[tickets] [opensaf:tickets] #2006 NTFSv: Cluster rebooted with ntfd crashed on both controllers
- **status**: unassigned --> accepted
- **assigned_to**: Vu Minh Nguyen

---

** [tickets:#2006] NTFSv: Cluster rebooted with ntfd crashed on both controllers**

**Status:** accepted
**Milestone:** 5.1.RC1
**Created:** Wed Sep 07, 2016 06:42 AM UTC by Chani Srivastava
**Last Updated:** Wed Sep 07, 2016 08:17 AM UTC
**Owner:** Vu Minh Nguyen
[tickets] [opensaf:tickets] #2006 NTFSv: Cluster rebooted with ntfd crashed on both controllers
A bug in the test app shouldn't cause the NTF server to crash.

---

** [tickets:#2006] NTFSv: Cluster rebooted with ntfd crashed on both controllers**

**Status:** unassigned
**Milestone:** 5.1.RC1
**Created:** Wed Sep 07, 2016 06:42 AM UTC by Chani Srivastava
**Last Updated:** Wed Sep 07, 2016 07:24 AM UTC
**Owner:** nobody
[tickets] [opensaf:tickets] #2006 NTFSv: Cluster rebooted with ntfd crashed on both controllers
I see this trace: `logBufSize > strlen(logBuf) + 1`. This happens because the log client (ntf) sent mismatched data in `SaLogBufferT`. Please check whether your test app constructs the `SaLogBufferT` data correctly. Make sure that `logBufSize` is less than or equal to `strlen(logBuf) + 1`.

---

** [tickets:#2006] NTFSv: Cluster rebooted with ntfd crashed on both controllers**

**Status:** unassigned
**Milestone:** 5.1.RC1
**Created:** Wed Sep 07, 2016 06:42 AM UTC by Chani Srivastava
**Last Updated:** Wed Sep 07, 2016 07:13 AM UTC
**Owner:** nobody
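The condition named in the comment above can be sketched as a small predicate. This is an illustration under stated assumptions, not the code in `lga_api.c`: `log_buffer_t` is a hypothetical stand-in for `SaLogBufferT` (a size plus a byte pointer), `log_buffer_is_consistent` is a made-up helper name, and the buffer is assumed to be NUL-terminated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for SaLogBufferT; the real type comes from
 * the SAF LOG service headers. */
typedef struct {
    size_t logBufSize;
    const unsigned char *logBuf;
} log_buffer_t;

/* Returns true when logBufSize <= strlen(logBuf) + 1.
 * Assumes logBuf, when present, is NUL-terminated. */
bool log_buffer_is_consistent(const log_buffer_t *buf)
{
    assert(buf != NULL); /* passing no buffer at all is a caller bug */
    if (buf->logBuf == NULL) {
        /* No payload: only a zero size is consistent. */
        return buf->logBufSize == 0;
    }
    return buf->logBufSize <= strlen((const char *)buf->logBuf) + 1;
}
```

A test application could run such a predicate on each buffer before handing it to the log API, so that a bad size is caught on the sending side.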
[tickets] [opensaf:tickets] #2006 NTFSv: Cluster rebooted with ntfd crashed on both controllers
- Description has changed:

Diff:

--- old
+++ new
@@ -7,6 +7,28 @@
 * Will update ticket with core dump
 Note: The timings on system are not synced.
 After every reboot node timings are modified
+
+BT:
+0 0x0fffa0848100 in .raise () from /lib64/libc.so.6
+1 0x0fffa0849d10 in .abort () from /lib64/libc.so.6
+2 0x0fffa0e34234 in osaf_abort (i_cause=7) at osaf_utility.c:27
+3 0x1001a2f8 in NtfLogger::logNotification (this=0x100ba768, notif=
+std::tr1::shared_ptr (count 2, weak 0) 0x100b88f0) at NtfLogger.cc:247
+4 0x10019e60 in NtfLogger::checkQueueAndLog (this=0x100ba768,
+newNotif=std::tr1::shared_ptr (count 2, weak 0) 0x100b88f0) at NtfLogger.cc:181
+5 0x10019a74 in NtfLogger::log (this=0x100ba768, notif=std::tr1::shared_ptr (count 2, weak 0) 0x100b88f0,
+isLocal=true) at NtfLogger.cc:137
+6 0x1002b528 in NtfAdmin::processNotification (this=0x100ba760, clientId=62, notificationType=SA_NTF_TYPE_ALARM,
+sendNotInfo=0x100b8800, mdsCtxt=0x100bb3dc, notificationId=47) at NtfAdmin.cc:203
+7 0x1002b938 in NtfAdmin::notificationReceived (this=0x100ba760, clientId=62, notificationType=SA_NTF_TYPE_ALARM,
+sendNotInfo=0x100b8800, mdsCtxt=0x100bb3dc) at NtfAdmin.cc:257
+8 0x1002ec20 in notificationReceived (clientId=62, notificationType=SA_NTF_TYPE_ALARM, sendNotInfo=0x100b8800,
+mdsCtxt=0x100bb3dc) at NtfAdmin.cc:1012
+9 0x10006410 in proc_send_not_msg (cb=0x10073190 <_ntfs_cb>, evt=0x100bb3d0) at ntfs_evt.c:447
+10 0x10006b28 in process_api_evt (evt=0x100bb3d0) at ntfs_evt.c:628
+11 0x10006c38 in ntfs_process_mbx (mbx=0x10073190 <_ntfs_cb>) at ntfs_evt.c:660
+12 0x1000b6f4 in main (argc=2, argv=0xfffc056a7f8) at ntfs_main.c:399
+
 Active Controller:
 May 26 19:41:16 linux-pvra osafntfd[24205]: **osaf_abort(7) called from 0x1001a2f8 with errno=11**

---

** [tickets:#2006] NTFSv: Cluster rebooted with ntfd crashed on both controllers**

**Status:** unassigned
**Milestone:** 5.1.RC1
**Created:** Wed Sep 07, 2016 06:42 AM UTC by Chani Srivastava
**Last Updated:** Wed Sep 07, 2016 06:42 AM UTC
**Owner:** nobody
[tickets] [opensaf:tickets] #2006 NTFSv: Cluster rebooted with ntfd crashed on both controllers
---

** [tickets:#2006] NTFSv: Cluster rebooted with ntfd crashed on both controllers**

**Status:** unassigned
**Milestone:** 5.1.RC1
**Created:** Wed Sep 07, 2016 06:42 AM UTC by Chani Srivastava
**Last Updated:** Wed Sep 07, 2016 06:42 AM UTC
**Owner:** nobody