Hi there,

Recently, after a server reboot, I found the following message in the monit log, repeated every cycle:
[CET Nov 6 03:56:07] error : 'sistd' failed to get process data

It relates to a process check that uses a pid file. Monit reported no problem when I ran the command monit summary. Unfortunately I can't investigate any further, because I was off shift and monit has since been restarted in order to clear the problem.

I have glanced over the source and found the following stack trace leading to the error message:

#0 check_process (s=0x702b30) at src/validate.c:1326
#1 0x000000000043c320 in validate () at src/validate.c:1292
#2 0x000000000041bd72 in do_default () at src/monit.c:586
#3 0x000000000041b37b in do_action (argc=4, args=0x7fffffffe368) at src/monit.c:414
#4 0x000000000041ad4d in main (argc=4, argv=0x7fffffffe368) at src/monit.c:173

As far as I can tell, on every cycle monit reads the statistics of every process from procfs and loads them into an array, ptree. It then reads the pid from the pidfile and checks that the process exists using the getpgid syscall. Next it searches the ptree array for that pid, in order to update the service data in the list used for the checks (of type ServiceT), and saves the statistics found in ptree to s->inf.process.

It seems that the error happens when getpgid returns no error but the pid is no longer in the ptree array. These steps are not atomic, so the pidfile content could change between the load from procfs and the update.

In any case, I have not been able to reproduce the error since. Maybe I am missing something. Do you know whether there is a workaround, or whether this kind of error is related to a bug?

The monit release on the server is 5.25.3.

Thanks,
Luca Cazzaniga
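
PS: to make the window I have in mind concrete, here is a minimal sketch in C. It is not monit's code: snapshot_t, pid_is_running, check_pid and simulate_stale_snapshot are hypothetical names of mine, and the snapshot is a plain array instead of data read from procfs. It only shows how a pid can pass a getpgid check and still be missing from a process list that was collected earlier in the same cycle.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for the process tree: a list of PIDs collected
 * once, at the start of a cycle. */
typedef struct {
    const pid_t *pids;
    size_t count;
} snapshot_t;

typedef enum { CHECK_OK, CHECK_NOT_RUNNING, CHECK_NO_DATA } check_result_t;

/* Liveness test: the PID counts as running if getpgid() succeeds, or if
 * it fails only because we lack permission to query it. */
static bool pid_is_running(pid_t pid) {
    errno = 0;
    return getpgid(pid) != -1 || errno == EPERM;
}

/* Linear search of the snapshot for a PID. */
static bool snapshot_contains(const snapshot_t *snap, pid_t pid) {
    assert(snap != NULL);
    for (size_t i = 0; i < snap->count; i++) {
        if (snap->pids[i] == pid)
            return true;
    }
    return false;
}

/* The two-step check: liveness first, then lookup in the older snapshot.
 * CHECK_NO_DATA corresponds to the "failed to get process data" case. */
static check_result_t check_pid(const snapshot_t *snap, pid_t pid) {
    if (!pid_is_running(pid))
        return CHECK_NOT_RUNNING;
    if (!snapshot_contains(snap, pid))
        return CHECK_NO_DATA;
    return CHECK_OK;
}

/* Take the snapshot first, then start a process and check its PID:
 * the process is alive but unknown to the snapshot. */
static check_result_t simulate_stale_snapshot(void) {
    pid_t self = getpid();
    snapshot_t snap = { &self, 1 };   /* taken before the child exists */

    pid_t child = fork();
    if (child == -1)
        return CHECK_NOT_RUNNING;     /* fork failed, nothing to check */
    if (child == 0) {
        pause();                      /* child just waits to be killed */
        _exit(0);
    }

    check_result_t result = check_pid(&snap, child);

    kill(child, SIGKILL);             /* clean up the child */
    waitpid(child, NULL, 0);
    return result;
}
```

Calling simulate_stale_snapshot() from a main function should return CHECK_NO_DATA, because the child is started after the snapshot is taken. Whether this is what actually happened on my server I cannot say, since a one-off race would not by itself explain the message repeating every cycle.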