On Monday 22 March 2010 17:59:26 Hugh Brown wrote:
> Kern Sibbald wrote:
> > Since we still have an open bug, please add this to the bug report.
>
> Hi Kern -- I'm unsure what to do; the bug has been marked closed, and
> I'm reluctant to reopen it just to attach files.

It is OK to re-open the bug report, if the files have something new.

>
> If it's a hardware problem, it's a hardware problem -- however, what
> confuses me is why the child process appears to be hanging at the
> place it is.

Unless I am missing something, it was always hanging because the child did not 
exit.  Did you determine that it was hanging on the closelog()?

>
> I'd like to ask what you think about Eric's suggestion re: the
> LOG_NOWAIT option for openlog() (instead of just calling syslog()
> directly).  

The LOG_NOWAIT option has no effect on Linux, so adding it would make no
difference in your case.
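
For reference, a minimal sketch of what passing LOG_NOWAIT to openlog() looks like. The helper name open_log_nowait() and the choice of LOG_PID and LOG_DAEMON are assumptions for illustration only; this is not Bacula code.

```c
#include <syslog.h>

/* Hypothetical helper, not part of Bacula: opens the system log with
 * LOG_NOWAIT set, sends one message, and returns the option mask used. */
static int open_log_nowait(const char *ident)
{
    /* LOG_NOWAIT asks the library not to wait for a child process it
     * may have created while logging. glibc creates no such child, so
     * on Linux the flag is accepted but changes nothing. */
    int options = LOG_PID | LOG_NOWAIT;

    openlog(ident, options, LOG_DAEMON);
    syslog(LOG_INFO, "%s", "test message sent with LOG_NOWAIT set");
    return options;
}
```

Calling open_log_nowait("nowait-demo") from a small test program and comparing an strace of it with a run that omits the flag should show whether the flag changes anything on your system.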

> Near as I can tell, Bacula isn't using either syslog() or 
> openlog(), except in the signal_handler() routine in libs/signal.c.
> Does that mean that closelog() should be avoided, or is this just
> something that all forked processes should do?

If there was a problem with closelog(), which I cannot entirely rule out, then 
why have we not seen this problem before in the last 6 or 7 years that the 
code has been used?

If you think it is the closelog(), then try commenting it out.  It is not
terribly serious if the log is not closed.  The main reason it is closed is
for security reasons.  If you have not sent any syslog messages, then the
closelog() should be effectively a no-op.  If you run strace on the SD, you
should be able to find the exact place where it hangs.  That would give us a
lot more information, though it will also generate a lot of output.

If you find that commenting it out resolves the problem, then I would 
definitely like to know, and we can come up with some more appropriate 
solution or find some way to try to duplicate it here so that we can see 
*exactly* why it hangs there.
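
The experiment described above can be sketched outside Bacula as follows. This is only an illustration under assumptions: run_child() is a hypothetical stand-in for the forked child, not the SD's actual code, and the call_closelog flag plays the role of commenting the call in or out.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <syslog.h>
#include <unistd.h>

/* Hypothetical stand-in for the forked child, not Bacula's code.
 * Forks, optionally calls closelog() in the child, and returns the
 * child's exit status (or -1 on error or abnormal termination). */
static int run_child(int call_closelog)
{
    pid_t pid = fork();
    if (pid < 0) {
        return -1;                /* fork failed */
    }
    if (pid == 0) {
        /* Child: this is the call under suspicion. Pass 0 to skip it,
         * which mimics commenting it out. */
        if (call_closelog) {
            closelog();
        }
        _exit(0);
    }

    /* Parent: block until the child exits. If the child hangs, this
     * waitpid() never returns, which is the symptom being discussed. */
    int status = 0;
    if (waitpid(pid, &status, 0) != pid) {
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Running a small program that calls run_child(1) and run_child(0) under strace -f would show the last system call the child makes in each case, which is the information needed to decide whether closelog() is involved.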

>
> I did get these problems when I was using tapeinfo for the alert
> command, not just smartctl.  As you say, that could indicate problems
> with the SCSI hardware.

At this point some sort of SCSI hardware problem is the most probable cause as
I see it.  If you can show that it is closelog(), then I would re-evaluate
that.

>
> Again, thanks for your time, and I hope I'm not wasting more of it
> with these questions.
>

Best regards,

Kern

_______________________________________________
Bacula-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/bacula-devel
