Hi Vu, fine, perhaps also changing the static bool svc_monitor_thread_running = false to std::atomic<bool>?/BR Hans
From: Vu Minh Nguyen <vu.m.ngu...@dektech.com.au> Sent: den 28 februari 2019 09:30 To: Hans Nordebäck <hans.nordeb...@ericsson.com>; Gary Lee <gary....@dektech.com.au> Cc: opensaf-devel@lists.sourceforge.net Subject: RE: [PATCH 1/1] nid: fix opensafd crashed during start-up [#3013] Thanks Hans. I will send the V2 for these updates. Regards, Vu From: Hans Nordebäck <hans.nordeb...@ericsson.com<mailto:hans.nordeb...@ericsson.com>> Sent: Thursday, February 28, 2019 2:16 PM To: Vu Minh Nguyen <vu.m.ngu...@dektech.com.au<mailto:vu.m.ngu...@dektech.com.au>>; Gary Lee <gary....@dektech.com.au<mailto:gary....@dektech.com.au>> Cc: opensaf-devel@lists.sourceforge.net<mailto:opensaf-devel@lists.sourceforge.net> Subject: Re: [PATCH 1/1] nid: fix opensafd crashed during start-up [#3013] Hi Vu, you can keep your patch for the ready state, but also change SOCK_STREAM to SOCK_DGRAM and change the read(svc_mon_thr_fd, nid_name, NID_MAXSNAME) in svc_monitor_thread to recv(svc_mon_thr_fd, nid_name, NID_MAXSNAME, MSG_DONTWAIT) and also handle EAGAIN and EWOULDBLOCK. Then only one nid_name per read/recv will be given instead of several nid_names as in the SOCK_STREAM case. /BR Hans On 2/28/19 05:30, Vu Minh Nguyen wrote: Hi Hans, Thanks for your comment. But I has a concern that the service-monitoring function may not fully work if a service is crashed before the svc_monitor_thread goes to ready state? Is it mandatory for monitoring thread to enter ready state before spawning SAF services? Regards, Vu -----Original Message----- From: Hans Nordebäck <hans.nordeb...@ericsson.com><mailto:hans.nordeb...@ericsson.com> Sent: Wednesday, February 27, 2019 8:23 PM To: Vu Minh Nguyen <vu.m.ngu...@dektech.com.au><mailto:vu.m.ngu...@dektech.com.au>; Gary Lee <gary....@dektech.com.au><mailto:gary....@dektech.com.au> Cc: opensaf-devel@lists.sourceforge.net<mailto:opensaf-devel@lists.sourceforge.net>; Vu Minh Nguyen <vu.m.ngu...@dektech.com.au><mailto:vu.m.ngu...@dektech.com.au> Subject: RE: [PATCH 1/1] nid: fix opensafd crashed during start-up [#3013] Hi Vu, I discussed a bit with Anders, likely it should work if the socketpair is changed to socketpair(AF_UNIX, SOCK_DGRAM .. from SOCK_STREAM. /BR Hans -----Original Message----- From: Hans Nordebäck Sent: den 27 februari 2019 11:55 To: 'Vu Minh Nguyen' <vu.m.ngu...@dektech.com.au><mailto:vu.m.ngu...@dektech.com.au>; Gary Lee <gary....@dektech.com.au><mailto:gary....@dektech.com.au> Cc: opensaf-devel@lists.sourceforge.net<mailto:opensaf-devel@lists.sourceforge.net>; Vu Minh Nguyen <vu.m.ngu...@dektech.com.au><mailto:vu.m.ngu...@dektech.com.au> Subject: RE: [PATCH 1/1] nid: fix opensafd crashed during start-up [#3013] Hi Vu, ack, code review only/Thanks HansN -----Original Message----- From: Vu Minh Nguyen <vu.m.ngu...@dektech.com.au><mailto:vu.m.ngu...@dektech.com.au> Sent: den 27 februari 2019 11:48 To: Hans Nordebäck <hans.nordeb...@ericsson.com><mailto:hans.nordeb...@ericsson.com>; Gary Lee <gary....@dektech.com.au><mailto:gary....@dektech.com.au> Cc: opensaf-devel@lists.sourceforge.net<mailto:opensaf-devel@lists.sourceforge.net>; Vu Minh Nguyen <vu.m.ngu...@dektech.com.au><mailto:vu.m.ngu...@dektech.com.au> Subject: [PATCH 1/1] nid: fix opensafd crashed during start-up [#3013] There is a dependency b/w svc_monitor_thread and spawn_services. The coredump happens when spawn_services is executed while the thread has not yet started. In this case, data is sent to the pipe but no one consumed it. Later on, reading data from the pipe, will get unexpected data and crash the program. This patch ensures the order: svc_monitor_thread must be in ready state before spawn_services() is executed. --- src/nid/nodeinit.cc | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/src/nid/nodeinit.cc b/src/nid/nodeinit.cc index 5f15916b4..b4945b05c 100644 --- a/src/nid/nodeinit.cc +++ b/src/nid/nodeinit.cc @@ -134,6 +134,7 @@ static int start_monitor_svc(const char *svc); /* Data declarations for service monitoring */ static int svc_mon_fd = -1; static int next_svc_fds_slot = 0; +static bool svc_monitor_thread_running = false; struct SAFServices { const std::string fifo_dir = PKGLOCALSTATEDIR; @@ -1517,6 +1518,7 @@ void *svc_monitor_thread(void *fd) { next_svc_fds_slot++; while (true) { + svc_monitor_thread_running = true; unsigned rc = osaf_poll(fds, next_svc_fds_slot, -1); if (rc > 0) { // check if any monitored service has exit @@ -1655,6 +1657,15 @@ int main(int argc, char *argv[]) { exit(EXIT_FAILURE); } + // Waiting until svc_monitor_thread is up and in ready state. + // If spawn_services runs before the thread is in ready state, // + receive side of the pipe s_pair will get unexpected data and // may + crash the process. + while (svc_monitor_thread_running == false) { + usleep(100); + } + + LOG_NO("svc_monitor_thread is up and in ready state"); if (parse_nodeinit_conf(sbuf) != NCSCC_RC_SUCCESS) { LOG_ER("Failed to parse file %s. Exiting", sbuf); exit(EXIT_FAILURE); -- 2.19.2 _______________________________________________ Opensaf-devel mailing list Opensaf-devel@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/opensaf-devel