Hello Frank,

This is indeed interesting, particularly because the TLS code in Bacula
uses its own set of networking routines separate from the normal Bacula
code -- many of the routines were derived from Bacula code, but they may
or may not have been changed. In any case, unless I am mistaken, the TLS
code runs non-blocking, which creates a whole new set of problems --
normal Bacula network code always runs in default blocking mode (i.e. I/O
requests always wait for completion).
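To make the blocking versus non-blocking distinction concrete, here is a minimal sketch in Python (chosen only for brevity; it is not Bacula code, and the helper names `try_recv` and `demo_nonblocking_read` are made up for illustration). In blocking mode a read simply waits for data. In non-blocking mode the same read can fail immediately with EAGAIN/EWOULDBLOCK, so the caller has to wait for readiness and retry.

```python
import select
import socket


def try_recv(sock, nbytes):
    """Non-blocking read: return bytes, or None if no data is ready yet."""
    try:
        return sock.recv(nbytes)
    except BlockingIOError:  # EAGAIN / EWOULDBLOCK on POSIX
        return None


def demo_nonblocking_read():
    """Show that a non-blocking read fails fast until data has arrived."""
    left, right = socket.socketpair()
    try:
        right.setblocking(False)

        # Nothing has been sent yet: a blocking socket would wait here,
        # a non-blocking one reports "would block" straight away.
        early = try_recv(right, 16)

        left.sendall(b"hello")

        # Wait (up to 5 seconds) until the socket is readable, then retry.
        select.select([right], [], [], 5.0)
        late = try_recv(right, 16)
        return early, late
    finally:
        left.close()
        right.close()
```

Calling `demo_nonblocking_read()` returns `(None, b"hello")`: the first read finds nothing and returns at once, and the retry after `select` succeeds. Code written for blocking sockets that is reused on non-blocking ones without this retry step can mistake "not ready yet" for an error.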
I'll take a look at this today.

Regards,

Kern

On Friday 22 June 2007 23:27, Frank Sweetser wrote:
> Kern Sibbald wrote:
> > Hello,
> >
> > I've been thinking about possible causes of "spurious" connection drops and
> > how to debug them.
>
> I'm back again, and this time I have more evidence =)
>
> Since last time, I've tried a number of other suggestions, including making
> sure all HP printers were turned off for the night, and even tried running the
> backup through a different switch, all with no change.
>
> I have, however, been gathering packet captures, and I think I've found a
> reproducible problem that I strongly suspect is related to the network
> dropouts that I and at least one other person have hit. More specifically,
> whenever the TLS comm code is enabled, the TCP sockets always close out with
> RST packets, instead of FIN packets, which indicates that the underlying OS
> believed there to be a problem with the socket. Disabling TLS makes the
> problem go away. In addition, disabling TLS has allowed the one system that
> was reliably failing to back up during the nightly production runs to work.
> I've attached the tail end of a pair of Wireshark packet captures that
> demonstrate the problem, and can produce more if anyone would like to see them.
>
> I've also included a patch to the regression tests that does a simple backup
> with the TLS comm code enabled, since there aren't any tests that do so in the
> SVN tree now. I've used this test just now to verify that the problem
> behavior exists in the current SVN head.
>
> After some searching around on Google, the closest problem description that I
> can find to the observed symptoms is this paper:
>
> http://cs.baylor.edu/~donahoo/practical/CSockets/TCPRST.pdf
>
> That's about as far as I'm able to go with this problem, as with my level of
> C++, any attempts at non-trivial debugging tend to crash any program running
> within a 50' radius.
>
> --
> Frank Sweetser fs at wpi.edu  |  For every problem, there is a solution that
> WPI Senior Network Engineer   |  is simple, elegant, and wrong. - HL Mencken
>     GPG fingerprint = 6174 1257 129E 0D21  D8D4 E8A3 8E39 29E3  E2E8 8CEC

_______________________________________________
Bacula-devel mailing list
https://lists.sourceforge.net/lists/listinfo/bacula-devel
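For reference, one well-known way to end a TCP connection with RST instead of FIN is to close a socket while received data is still sitting unread in its receive queue (RFC 1122 recommends that a stack send a reset in that case). Whether the TLS code is doing this is only an assumption; the captures do not establish it. The sketch below is plain Python on loopback, not Bacula code, and the function name `close_outcome` and the payload are made up for illustration. It reports how the peer sees the close in each case.

```python
import select
import socket

PAYLOAD = b"unread"


def close_outcome(drain_first):
    """Return "FIN" or "RST", as observed by the client, when the
    server side closes with or without reading the pending data."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # any free loopback port
    listener.listen(1)
    client = socket.create_connection(listener.getsockname(), timeout=5)
    conn, _ = listener.accept()
    try:
        client.sendall(PAYLOAD)
        # Wait until the payload is queued on the server side.
        select.select([conn], [], [], 5.0)

        if drain_first:
            # Read everything the client sent before closing.
            received = b""
            while len(received) < len(PAYLOAD):
                chunk = conn.recv(1024)
                if not chunk:
                    break
                received += chunk

        conn.close()

        try:
            data = client.recv(1024)
        except ConnectionResetError:  # ECONNRESET: the peer sent RST
            return "RST"
        # An empty read means an orderly shutdown (FIN).
        return "FIN" if data == b"" else "DATA"
    finally:
        client.close()
        conn.close()
        listener.close()
```

On a Linux TCP stack, `close_outcome(True)` yields `"FIN"` and `close_outcome(False)` yields `"RST"`; other operating systems may differ in detail. If the TLS shutdown path closes the socket while the peer's final records are still unread, it would produce the same pattern in a packet capture.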
