On Monday 22 March 2010 17:59:26 Hugh Brown wrote:
> Kern Sibbald wrote:
> > Since we still have an open bug, please add this to the bug report.
>
> Hi Kern -- I'm unsure what to do; the bug has been marked closed, and
> I'm reluctant to reopen it just to attach files.

It is OK to re-open the bug report, if the files have something new.

> If it's a hardware problem, it's a hardware problem -- however, what
> confuses me is why the child process appears to be hanging at the
> place it is.

Unless I am missing something, it was always hanging because the child
did not exit. Did you determine that it was hanging on the closelog()?

> I'd like to ask what you think about Eric's suggestion re: the
> LOG_NOWAIT option for openlog() (instead of just calling syslog()
> directly).

The LOG_NOWAIT is not used on Linux, so adding it would make no
difference in your case.

> Near as I can tell, Bacula isn't using either syslog() or
> openlog(), except in the signal_handler() routine in libs/signal.c.
> Does that mean that closelog() should be avoided, or is this just
> something that all forked processes should do?

If there was a problem with closelog(), which I cannot entirely rule
out, then why have we not seen this problem before in the last 6 or 7
years that the code has been used?

If you think it is the closelog(), then try commenting it out. It is
not terribly serious if the log is not closed. The main reason it is
closed is for security reasons. If you have not sent any syslog
messages, then the closelog() should be like a noop.

If you run a strace on the SD, you should be able to find the exact
place where it hangs. That would give us a lot more information -- it
will also generate a lot of output.

If you find that commenting it out resolves the problem, then I would
definitely like to know, and we can come up with some more appropriate
solution or find some way to try to duplicate it here so that we can
see *exactly* why it hangs there.

> I did get these problems when I was using tapeinfo for the alert
> command, not just smartctl. As you say, that could indicate problems
> with the SCSI hardware.

At this point some sort of SCSI hardware problem is the highest
probability as I see it.
If you can show that it is closelog(), then I would re-evaluate that.

> Again, thanks for your time, and I hope I'm not wasting more of it
> with these questions.

Best regards,

Kern

_______________________________________________
Bacula-devel mailing list
https://lists.sourceforge.net/lists/listinfo/bacula-devel
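
For anyone who wants to check the closelog() theory outside of Bacula, here is a minimal sketch in C. It is not Bacula code: the function name run_child_with_closelog() is hypothetical, and the sketch only assumes the situation described above, namely a forked child that has sent no syslog messages and calls closelog() before exiting. If closelog() behaves like a no-op, the child exits and the parent gets status 0; if the child hung, the parent would stay blocked in waitpid(), which is the same symptom as a child that never exits.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <syslog.h>
#include <unistd.h>

/*
 * Hypothetical helper, not taken from Bacula: fork a child that calls
 * closelog() without having logged anything, then report how it ended.
 * Returns the child's exit status (0 expected), or -1 if fork()/waitpid()
 * failed or the child was terminated by a signal.
 */
int run_child_with_closelog(void)
{
    pid_t pid = fork();
    if (pid < 0) {
        perror("fork");
        return -1;
    }
    if (pid == 0) {
        /* Child: no openlog() or syslog() call has been made here. */
        closelog();
        _exit(0);   /* exit immediately, skipping the parent's atexit handlers */
    }

    /* Parent: blocks here for as long as the child has not exited. */
    int status = 0;
    if (waitpid(pid, &status, 0) != pid) {
        perror("waitpid");
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling run_child_with_closelog() from a small main() and checking for 0 only shows how closelog() behaves on that one machine; it says nothing about the SCSI side. Running the real SD under strace with child following enabled (strace -f) remains the way to see which call the hanging child is actually stuck in.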
