Hello Frank,

This is indeed interesting, particularly because the TLS code in Bacula uses 
its own set of networking routines separate from the normal Bacula code -- 
many of the routines were derived from Bacula code, but they may or may not 
have been changed.  In any case, unless I am mistaken, the TLS code runs in
non-blocking mode, which creates a whole new set of problems -- normal Bacula
network code always runs in the default blocking mode (i.e. I/O requests
always wait for completion).
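
To make the blocking versus non-blocking difference concrete, here is a minimal C sketch. It assumes a POSIX system and uses plain sockets rather than the TLS layer. The helpers set_nonblocking and send_all are hypothetical names for illustration, not Bacula routines. In blocking mode, recv() and send() normally wait until they can make progress. Once O_NONBLOCK is set, recv() on an empty socket fails immediately with EAGAIN/EWOULDBLOCK, and send() may transfer only part of the buffer, so every caller has to loop and wait for the socket to become ready again.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Switch a descriptor to non-blocking mode; returns 0 on success, -1 on error. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Hypothetical helper: on a non-blocking socket, send() may write only part
 * of the buffer or fail with EAGAIN, so loop until everything is sent,
 * waiting in poll() whenever the socket's send buffer is full.
 * Returns the number of bytes sent, or -1 on a real error. */
static ssize_t send_all(int fd, const char *buf, size_t len)
{
    size_t done = 0;

    while (done < len) {
        ssize_t n = send(fd, buf + done, len - done, 0);
        if (n > 0) {
            done += (size_t)n;
        } else if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            /* Not an error in non-blocking mode: wait until writable. */
            struct pollfd pfd = { .fd = fd, .events = POLLOUT };
            if (poll(&pfd, 1, -1) < 0 && errno != EINTR)
                return -1;
        } else if (n < 0 && errno == EINTR) {
            continue;            /* interrupted by a signal; retry */
        } else {
            return -1;           /* genuine failure (or unexpected 0) */
        }
    }
    return (ssize_t)done;
}
```

With a socketpair where one end is non-blocking, a recv() on that end returns -1 with EAGAIN/EWOULDBLOCK right away instead of waiting; after send_all() writes to the other end, the same recv() returns the data. Blocking code never sees that EAGAIN case at all, which is one reason routines written for blocking sockets cannot simply be reused unchanged in non-blocking mode.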

I'll take a look at this today.

Regards,

Kern

On Friday 22 June 2007 23:27, Frank Sweetser wrote:
> Kern Sibbald wrote:
> > Hello,
> > 
> > I've been thinking about possible causes of "spurious" connection drops and
> > how to debug them.
> 
> I'm back again, and this time I have more evidence =)
> 
> Since last time, I've tried a number of other suggestions, including making
> sure all HP printers were turned off for the night, and I even tried running
> the backup through a different switch, all with no change.
> 
> I have, however, been gathering packet captures, and I think I've found a
> reproducible problem that I strongly suspect is related to the network
> dropouts that I and at least one other person have hit.  More specifically,
> whenever the TLS comm code is enabled, the TCP sockets always close out with
> RST packets instead of FIN packets, which indicates that the underlying OS
> believed there to be a problem with the socket.  Disabling TLS makes the
> problem go away.  In addition, disabling TLS has allowed the one system that
> reliably failed to back up during the nightly production runs to succeed.
> I've attached the tail end of a pair of Wireshark packet captures that
> demonstrate the problem, and can produce more if anyone would like to see
> them.
> 
> I've also included a patch to the regression tests that does a simple backup
> with the TLS comm code enabled, since there aren't any tests in the SVN tree
> that do so now.  I've just used this test to verify that the problem
> behavior exists in the current SVN head.
> 
> After some searching around on Google, the closest description of the
> observed symptoms that I can find is this paper:
> 
> http://cs.baylor.edu/~donahoo/practical/CSockets/TCPRST.pdf
> 
> That's about as far as I'm able to go with this problem, as with my level of
> C++, any attempts at non-trivial debugging tend to crash any program running
> within a 50' radius.
> 
> -- 
> Frank Sweetser fs at wpi.edu  |  For every problem, there is a solution that
> WPI Senior Network Engineer   |  is simple, elegant, and wrong. - HL Mencken
>     GPG fingerprint = 6174 1257 129E 0D21 D8D4  E8A3 8E39 29E3 E2E8 8CEC
> 
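
For anyone who wants to see the RST-versus-FIN distinction outside Bacula, here is a minimal C sketch. It assumes a POSIX system and a TCP connection over 127.0.0.1, and tcp_pair, close_conn and peer_recv are hypothetical helper names, not Bacula functions. It shows one well-known way an application produces an RST: setting SO_LINGER with a zero timeout before close(). Another well-known trigger is closing a socket that still has unread data in its receive buffer. The sketch does not claim to show what the Bacula TLS code actually does; it only shows what each kind of close looks like to the peer.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: build a connected TCP pair over 127.0.0.1 using a
 * kernel-chosen port. Stores both descriptors and returns 0 on success. */
static int tcp_pair(int *client, int *server)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    int lfd = socket(AF_INET, SOCK_STREAM, 0);

    if (lfd < 0)
        return -1;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                       /* let the kernel pick a port */
    if (bind(lfd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(lfd, 1) < 0 ||
        getsockname(lfd, (struct sockaddr *)&addr, &len) < 0) {
        close(lfd);
        return -1;
    }
    *client = socket(AF_INET, SOCK_STREAM, 0);
    if (*client < 0 ||
        connect(*client, (struct sockaddr *)&addr, sizeof addr) < 0) {
        if (*client >= 0)
            close(*client);
        close(lfd);
        return -1;
    }
    *server = accept(lfd, NULL, NULL);
    close(lfd);
    if (*server < 0) {
        close(*client);
        return -1;
    }
    return 0;
}

/* Close fd. If abortive is nonzero, first enable SO_LINGER with a zero
 * timeout, which makes close() reset the connection (RST) instead of
 * performing the normal FIN shutdown. */
static int close_conn(int fd, int abortive)
{
    if (abortive) {
        struct linger lg = { .l_onoff = 1, .l_linger = 0 };
        if (setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof lg) < 0) {
            close(fd);
            return -1;
        }
    }
    return close(fd);
}

/* What the peer sees on its next blocking recv(): 0 means an orderly
 * shutdown (FIN); -1 with errno == ECONNRESET means a reset (RST). */
static ssize_t peer_recv(int fd)
{
    char buf[16];
    return recv(fd, buf, sizeof buf, 0);
}
```

After a normal close the peer's recv() returns 0. After the abortive close it returns -1 with errno set to ECONNRESET. In a packet capture these appear as the FIN and RST endings Frank describes.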

_______________________________________________
Bacula-devel mailing list
https://lists.sourceforge.net/lists/listinfo/bacula-devel
