Hi Andrew,

I agree that time(NULL) is not reliable and may produce unwanted results when the system time is changed. Monotonic time is more reliable. I'll create a ticket for changing time(NULL) to monotonic time.
If you are working on this issue and have a patch, you can send it for review.

BR,
Zoran

-----Original Message-----
From: Andrew Riley [mailto:[email protected]]
Sent: Wednesday, December 02, 2015 10:26 PM
To: [email protected]
Subject: [users] System time change triggers a timeout in immModel_cleanTheBasement, osafamfnd aborts

Hi,

Running OpenSAF 4.5.1, at system initialization we can have a race condition on our payload cards between OpenSAF startup and housekeeping functions that update the system time. If the system clock is updated at the wrong point, the time check loop in immModel_cleanTheBasement decides there was a timeout.

From our logs (the time jump is due to the clock set, not a gap in processing):

2015-11-06T14:44:07.488664-05:00 pld0107 osafclmna[3531]: Started
2015-11-06T14:44:07.492894-05:00 pld0107 osafclmna[3531]: NO safNode=pld0107,safCluster=myClmCluster Joined cluster, nodeid=1070f
2015-11-06T14:44:07.523654-05:00 pld0107 osafamfnd[3540]: Started
2015-11-18T15:14:40.889384-05:00 pld0107 osafimmnd[3431]: NO Clear 1 search result(s) for OM handle d0001070f. Search timeout 600sec
2015-11-18T15:14:40.893124-05:00 pld0107 logger: Synced time with server 100.100.0.1
2015-11-18T15:14:40.894797-05:00 pld0107 osafimmnd[3431]: ER Could not find search node for search-ID:14
2015-11-18T15:14:40.895280-05:00 pld0107 osafamfnd[3540]: saImmOmSearchNext FAILED, rc = 9
2015-11-18T15:14:40.896828-05:00 pld0107 logger: Set hardware clock
2015-11-18T15:14:40.915801-05:00 pld0107 osafimmnd[3431]: NO Implementer connected: 28 (MsgQueueService67343) <0, 10f0f>

I am working to change the startup sequence to avoid this jump, but immModel_cleanTheBasement is using time(NULL) to get the current time and comparing it against cl_node->mLastSearch. Since time() is ultimately based on CLOCK_REALTIME, it would seem there is always the possibility of a problem. Maybe that could change to use osaf_clock_gettime(CLOCK_MONOTONIC... instead?
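To illustrate the suggestion above, here is a minimal sketch of a timeout check driven by the monotonic clock. It is not the OpenSAF code: it calls POSIX clock_gettime() directly rather than osaf_clock_gettime(), and the helper names monotonic_seconds and search_timed_out are hypothetical. It assumes the last-search timestamp would also be recorded from the same monotonic clock.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <stdbool.h>
#include <time.h>

/* Hypothetical helper: whole seconds read from CLOCK_MONOTONIC.
 * This clock is not moved by setting the system time, unlike the
 * value returned by time(NULL). Returns (time_t)-1 on failure. */
static time_t monotonic_seconds(void)
{
    struct timespec ts;

    if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0) {
        return (time_t)-1;
    }
    return ts.tv_sec;
}

/* Hypothetical check: true when more than timeout_sec seconds have
 * elapsed between last_search and now. Both timestamps are assumed
 * to come from monotonic_seconds(), so a clock set cannot inflate
 * the difference. */
static bool search_timed_out(time_t last_search, time_t now,
                             time_t timeout_sec)
{
    return (now - last_search) > timeout_sec;
}
```

With wall-clock timestamps, the jump from 2015-11-06 to 2015-11-18 in the log above makes the difference far larger than the 600 second search timeout even though almost no real time passed. With both timestamps taken from the monotonic clock, the same clock set would leave the difference at the real elapsed time.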
The side-effect of the timeout is that osafamfnd aborts and neither restarts nor triggers any recovery action, so the system sits until our external watchdog kicks off a reboot. Is there a way to configure the system such that it cleans up and restarts?

Thanks,
Andy Riley

_______________________________________________
Opensaf-users mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/opensaf-users
