[tickets] [opensaf:tickets] #536 LOG: Incorrect handling of "partial write" when writing log record to file
- **assigned_to**: elunlen

---

**[tickets:#536] LOG: Incorrect handling of "partial write" when writing log record to file**

**Status:** unassigned
**Created:** Thu Aug 08, 2013 10:59 AM UTC by elunlen
**Last Updated:** Thu Aug 15, 2013 01:54 PM UTC
**Owner:** elunlen

In the "partial write check", the number of bytes written is checked against params_in fixedLogRecordSize instead of the actual number of bytes in the buffer to write. The buffer should always contain the number of bytes that corresponds to the setting in params_in fixedLogRecordSize.

---

Sent from sourceforge.net because opensaf-tickets@lists.sourceforge.net is subscribed to https://sourceforge.net/p/opensaf/tickets/

To unsubscribe from further messages, a project admin can change settings at https://sourceforge.net/p/opensaf/admin/tickets/options. Or, if this is a mailing list, you can unsubscribe from the mailing list.

Opensaf-tickets mailing list
Opensaf-tickets@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opensaf-tickets
[tickets] [opensaf:tickets] #536 LOG: Incorrect handling of "partial write" when writing log record to file
This ticket shall be fixed for 4.2 and 4.3. For the devel branch (4.4), this problem is fixed by ticket [#9].

---

**[tickets:#536] LOG: Incorrect handling of "partial write" when writing log record to file**

**Status:** unassigned
**Created:** Thu Aug 08, 2013 10:59 AM UTC by elunlen
**Last Updated:** Thu Aug 08, 2013 10:59 AM UTC
**Owner:** nobody

In the "partial write check", the number of bytes written is checked against params_in fixedLogRecordSize instead of the actual number of bytes in the buffer to write. The buffer should always contain the number of bytes that corresponds to the setting in params_in fixedLogRecordSize.
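
The check described in this ticket can be illustrated with a small write loop. This is only a minimal sketch, assuming plain POSIX `write()`; the helper name `write_all` and its parameters are hypothetical and are not taken from the LOG service source. The point it shows is that the value returned by `write()` is compared against the number of bytes actually in the buffer (`len`), and that the remainder is retried after a partial write.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper, not taken from the LOG service source.
 * Writes exactly `len` bytes from `buf` to `fd`.
 * A partial write is detected by comparing the return value of write()
 * with the number of bytes still left in the buffer, not with a
 * configured record size.
 * Returns 0 on success, -1 on error (errno is set). */
static int write_all(int fd, const void *buf, size_t len)
{
	const char *p = buf;
	size_t left = len;

	assert(buf != NULL || len == 0);

	while (left > 0) {
		ssize_t n = write(fd, p, left);

		if (n < 0) {
			if (errno == EINTR)
				continue; /* interrupted before any data was written, retry */
			return -1;
		}
		if (n == 0) {
			/* Defensive: no progress was made, avoid spinning forever. */
			errno = EIO;
			return -1;
		}
		/* n may be smaller than left (partial write): advance and retry. */
		p += n;
		left -= (size_t)n;
	}
	return 0;
}
```

A caller would pass the length of the formatted record that is actually in the buffer, for example `write_all(fd, record, record_len)`, where `record` and `record_len` are again placeholder names.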
[tickets] [opensaf:tickets] #130 LOG: Logd crashed on active controller when saLogStreamLogFullHaltThreshold value is changed to invalid for configured application streams
- **status**: assigned --> unassigned
- **assigned_to**: carl johannesson --> elunlen

---

**[tickets:#130] LOG: Logd crashed on active controller when saLogStreamLogFullHaltThreshold value is changed to invalid for configured application streams**

**Status:** unassigned
**Created:** Mon May 13, 2013 09:52 AM UTC by elunlen
**Last Updated:** Mon May 13, 2013 09:59 AM UTC
**Owner:** elunlen

Logd crashed on the active controller when the value of saLogStreamLogFullHaltThreshold was modified to an invalid value for the configured application streams.

Steps to Reproduce:
===
1) Created application streams using immcfg.
2) Modified the saLogStreamLogFullHaltThreshold attribute from its default value to an invalid value through immcfg.

After the attribute value was changed, logd crashed on the active controller and the node rebooted.

==
Snippet from /var/log/messages:
===
Feb 25 12:39:06 SLES-64BIT-SLOT1 osafimmnd[2493]: NO Implementer connected: 15 (MsgQueueService131599) <0, 2020f>
Feb 25 12:39:35 SLES-64BIT-SLOT1 sshd[3047]: Accepted keyboard-interactive/pam for root from 192.168.56.1 port 33674 ssh2
Feb 25 12:41:13 SLES-64BIT-SLOT1 osafimmnd[2493]: NO Ccb 2 COMMITTED (immcfg_SLES-64BIT-SLOT4_2819)
Feb 25 12:41:13 SLES-64BIT-SLOT1 osafamfnd[2586]: NO 'safComp=LOG,safSu=SC-1,safSg=2N,safApp=OpenSAF' faulted due to 'avaDown' : Recovery is 'nodeFailfast'
Feb 25 12:41:13 SLES-64BIT-SLOT1 osafamfnd[2586]: ER safComp=LOG,safSu=SC-1,safSg=2N,safApp=OpenSAF Faulted due to:avaDown Recovery is:nodeFailfast
Feb 25 12:41:13 SLES-64BIT-SLOT1 osafamfnd[2586]: Rebooting OpenSAF NodeId = 131343 EE Name = , Reason: Component faulted: recovery is node failfast
Feb 25 12:41:13 SLES-64BIT-SLOT1 kernel: [ 291.046073] osaflogd[2530]: segfault at 0 ip 00410201 sp 7fffe31bdc80 error 4 in osaflogd[40+26000]
Feb 25 12:41:13 SLES-64BIT-SLOT1 osafimmnd[2493]: NO Implementer locally disconnected. Marking it as doomed 1 <4, 2010f> (safLogService)
Feb 25 12:41:13 SLES-64BIT-SLOT1 osafimmnd[2493]: NO Implementer disconnected 1 <4, 2010f> (safLogService)
Feb 25 12:41:13 SLES-64BIT-SLOT1 opensaf_reboot: Rebooting local node
===

Core file is not generated.

Migrated from devel.opensaf.org #3024
[tickets] [opensaf:tickets] #448 LOG: Incorrect validity check for saLogStreamFileName
A possible solution is to construct a path to the .cfg file using root path + relative path + file name + .cfg

Example: + + + <.cfg>

---

**[tickets:#448] LOG: Incorrect validity check for saLogStreamFileName**

**Status:** unassigned
**Created:** Mon Jun 10, 2013 12:01 PM UTC by elunlen
**Last Updated:** Thu Aug 15, 2013 01:16 PM UTC
**Owner:** elunlen

The validity check when changing saLogStreamFileName via the IMM adm interface is incorrect. It will always return information saying that the file name does not already exist. The check uses only the content of the saLogStreamFileName attribute, not a complete path to a file. See file lgs_imm.c, function check_attr_validity(..)
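
The path construction proposed in this comment can be sketched as follows. This is an illustration under stated assumptions, not the code in lgs_imm.c: the helper names `build_cfg_path` and `cfg_file_exists`, the parameter names, the `/` separator, and the buffer size `CFG_PATH_MAX` are all hypothetical. It joins root path, relative path, file name, and the `.cfg` suffix, then asks the file system whether that complete path exists.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

#define CFG_PATH_MAX 4096 /* illustrative buffer size, an assumption */

/* Hypothetical helper: builds "<root_path>/<rel_path>/<file_name>.cfg".
 * Returns false if the file name is empty or the result does not fit. */
static bool build_cfg_path(char *out, size_t out_size, const char *root_path,
			   const char *rel_path, const char *file_name)
{
	assert(out != NULL && root_path != NULL && rel_path != NULL &&
	       file_name != NULL);

	if (strlen(file_name) == 0)
		return false;

	int n = snprintf(out, out_size, "%s/%s/%s.cfg", root_path, rel_path,
			 file_name);
	/* snprintf returns the length it wanted to write; reject truncation. */
	return n >= 0 && (size_t)n < out_size;
}

/* Hypothetical helper: true if a .cfg file already exists at the complete
 * path. Checking only the bare file name would look in the wrong place. */
static bool cfg_file_exists(const char *root_path, const char *rel_path,
			    const char *file_name)
{
	char path[CFG_PATH_MAX];
	struct stat st;

	if (!build_cfg_path(path, sizeof(path), root_path, rel_path, file_name))
		return false;
	return stat(path, &st) == 0;
}
```

In this sketch a path that does not fit the buffer is reported as "does not exist"; a real validity check would more likely reject such a value outright.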
[tickets] [opensaf:tickets] #448 LOG: Incorrect validity check for saLogStreamFileName
- Description has changed:

Diff:

--- old
+++ new
@@ -1,2 +1,2 @@
-Validity check when changing saLogStreamFileName via IMM adm interace is incorrect. Will always return information saying that the file name does not already exists.
+Validity check when changing saLogStreamFileName via IMM adm interace is incorrect. Will always return information saying that the file name does not already exists.
 The content of attribute saLogStreamFileName is used for checking not a complete path to any file. See file lgs_imm.c function check_attr_validity(..)

---

**[tickets:#448] LOG: Incorrect validity check for saLogStreamFileName**

**Status:** unassigned
**Created:** Mon Jun 10, 2013 12:01 PM UTC by elunlen
**Last Updated:** Mon Jun 10, 2013 12:01 PM UTC
**Owner:** elunlen

The validity check when changing saLogStreamFileName via the IMM adm interface is incorrect. It will always return information saying that the file name does not already exist. The check uses only the content of the saLogStreamFileName attribute, not a complete path to a file. See file lgs_imm.c, function check_attr_validity(..)
[tickets] [opensaf:tickets] #516 Amfd: calling immutil_saImmOiImplementerClear in avd_mds_qsd_role_evh leads to amfnd sending SIGABRT to amfd
Hans N and I made some progress on this one.

Analysis:

Test case: cluster restart

* Due to the #540 bug the old active controller "hangs" in shutdown. The state here is probably that the RDE process is killed.
* The old standby controller is rebooted and comes up; RDE detects no peer and assumes the ACTIVE role.
* When the above happens, MDS informs the old active amfd process that it is now QUIESCED. From the MDS documentation: "MDS_CALLBACK_QUIESCED_ACK This callback informs the MDS Client that it is being moved to standby HA-state. The callback may be due to the result of the MDS Client switching to quiesced HA-state or due to a competing MDS-Client showing up in active HA-state." The last part is what happens here.
* This fools the old active amfd into thinking a controller switch-over is going on, and it calls saImmOiImplementerClear(). This call eventually times out and abort() is called. The reason it fails is that the local immnd process (and immd) has already been killed.

This is basically a controller split brain. Ideally, RDE should fence the peer controller before going active. At the very least it could add some more evidence to the decision. There are already artifacts for RDE.

The proposed small change in amfd is to check that a switch-over is pending in the avd_mds_qsd_role_evh() event handler. If not, just log and exit. amfnd will detect that and reboot the node. No core dump is generated. It should even be easy to reproduce.

---

**[tickets:#516] Amfd: calling immutil_saImmOiImplementerClear in avd_mds_qsd_role_evh leads to amfnd sending SIGABRT to amfd**

**Status:** assigned
**Created:** Tue Jul 23, 2013 02:39 PM UTC by hano
**Last Updated:** Wed Aug 14, 2013 11:42 AM UTC
**Owner:** Praveen

osafamfd is "supervised" by osafamfnd: osafamfd sends "heartbeats" to osafamfnd. If no "heartbeats" are received within one minute, osafamfnd sends an abort signal to osafamfd, which then aborts (produces a core dump and exits).

The reason osafamfd is not sending any "heartbeats" below is that osafamfd has received a role change message from MDS (Active to Quiesced) and calls immutil_saImmOiImplementerClear. IMM is not responding, so osafamfd waits without sending any "heartbeats" and is aborted by osafamfnd.

There are several cases with this behavior. amfd should not call immutil_saImmOiImplementerClear; it should instead call saImmOiImplementerClear and handle the return code and retry logic in the avd_main_proc poll loop, to avoid these core dumps and keep amf responsive.

---

Core was generated by `/usr/lib64/opensaf/osafamfd'.
Program terminated with signal 6, Aborted.
#0 0x7f08e45b6dfd in nanosleep () from /lib64/libc.so.6
(gdb) bt full
#0 0x7f08e45b6dfd in nanosleep () from /lib64/libc.so.6
No symbol table info available.
#1 0x7f08e45e2824 in usleep () from /lib64/libc.so.6
No symbol table info available.
#2 0x00407506 in immutil_saImmOiImplementerClear (immOiHandle=94489411855) at ../../../../../osaf/tools/safimm/src/immutil.c:1042
        rc =
        nTries = 54
#3 0x0043492a in avd_mds_qsd_role_evh (cb=0x69c980, evt=) at avd_role.c:573
        status =
        rc =
        __FUNCTION__ =
#4 0x0043341d in avd_process_event (cb_now=0x69c980, evt=0x7ff160) at avd_proc.c:591
        __FUNCTION__ =
#5 0x004336a1 in avd_main_proc () at avd_proc.c:507
        pollretval =
        cb = 0x69c980
        evt = 0x7ff160
        mbx_fd =
        error =
        polltmo = -1
#6 0x004096bd in main (argc=, argv=) at amfd_main.c:47
        error = 0
        node_id =
(gdb) quit
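
The suggestion to move the retry logic into the poll loop can be illustrated with a small, non-blocking retry step. This is only a sketch under assumptions, not amfd code: the result codes below are local stand-ins for the real SaAisErrorT values, and `pending_clear_step`, `struct pending_clear`, and `fake_clear` are hypothetical names. The idea is that each poll-loop iteration makes at most one attempt and returns immediately, so the loop stays free to send heartbeats between attempts instead of sleeping inside a retry helper.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-ins for the result of one saImmOiImplementerClear() call.
 * The real codes come from the SAF headers; these are assumptions. */
typedef enum { CLEAR_OK, CLEAR_TRY_AGAIN, CLEAR_FAILED } clear_rc_t;

/* One non-blocking attempt; in amfd this would wrap the real call. */
typedef clear_rc_t (*clear_fn_t)(void *ctx);

typedef enum { STEP_DONE, STEP_PENDING, STEP_GAVE_UP } step_t;

/* Retry state kept between poll-loop iterations (hypothetical). */
struct pending_clear {
	unsigned tries_left;
};

/* Makes at most one attempt and returns at once.
 * STEP_PENDING: call again on a later poll-loop iteration.
 * STEP_DONE: the implementer was cleared.
 * STEP_GAVE_UP: attempts exhausted or a hard error occurred. */
static step_t pending_clear_step(struct pending_clear *pc,
				 clear_fn_t clear_once, void *ctx)
{
	assert(pc != NULL && clear_once != NULL);

	if (pc->tries_left == 0)
		return STEP_GAVE_UP;
	pc->tries_left--;

	switch (clear_once(ctx)) {
	case CLEAR_OK:
		return STEP_DONE;
	case CLEAR_TRY_AGAIN:
		return pc->tries_left > 0 ? STEP_PENDING : STEP_GAVE_UP;
	default:
		return STEP_GAVE_UP;
	}
}

/* Demo stub only: reports "try again" while the counter in ctx is > 0. */
static clear_rc_t fake_clear(void *ctx)
{
	int *busy = ctx;

	if (*busy > 0) {
		(*busy)--;
		return CLEAR_TRY_AGAIN;
	}
	return CLEAR_OK;
}
```

With `fake_clear` and a counter of 2, the first two steps report `STEP_PENDING` and the third reports `STEP_DONE`. In a real poll loop, a pending state would be paired with a finite poll timeout so that the next attempt happens even when no other event arrives; how a give-up is handled (for example, log and exit) is left open here.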