Arnaud Brand wrote:
> On 08/02/10 23:18, James Carlson wrote:
>> Causes for RST include:
>>
>> - peer application is intentionally setting the linger time to zero
>> and issuing close(2), which results in TCP RST generation.
>>
> Might be possible, but I can't see why the receiving end would do that.

No idea, but a debugger on that side might be able to detect something.

>> - bugs in one or both peers (often related to TCP keepalive; key
>> signature of such a problem is an apparent two-hour time limit).
>>
> That could be it, but I doubt it since disconnections appeared anywhere
> randomly in the range 10 minutes to 13 hours.
> It should be noted that the node sending the RST keeps the connection
> open (netstat -a shows it's still established).
> To be honest that puzzles me.

That sounds horrible. There's no way a node that still has state for
the connection should be sending RST. Normal procedure is to generate
RST when you do _not_ have state for the connection or (if you're
intentionally aborting the connection) to discard the state at the same
time you send RST.

That points to either a bug in the peer's TCP/IP implementation or one
of the causes that you've dismissed (particularly either a duplicate IP
address or a firewall/NAT problem).

>> You (at least) have to analyze the packet sequences to determine what is
>> going wrong. Depending on the nature of the problem, it may also take
>> in-depth kernel debugging on one or both peers to locate the cause.
>>
> I relaunched another transfer and I'm tcpdumping both servers in the
> hope that I find something.
> In the meantime I've received a beta BIOS from Tyan which provides
> support for IKVM over tagged VLANs.
> Until now the Intel chips (on which the IKVM/IPMI card is piggy-backed)
> are working better than before.
> I can't tell if it's related or not, I'm crossing fingers.

That could EASILY be related. That was key information to include.

IPMI, as I recall, hijacks the node's Ethernet controller to provide
low-level node control service. From what I remember out of ARC
reviews, the architecture is pretty brutal^W"clever". I wouldn't be
surprised in the least if this is the problem.

I like the idea of remote management, but that's the sort of thing I'd
never enable on my systems ...
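
To illustrate the first cause listed in the thread (linger time set to
zero, then close(2)), here is a minimal Python sketch. It is not taken
from either machine under discussion: the function name
abortive_close_demo and the loopback setup are purely illustrative, and
the "ii" layout used for struct linger is an assumption that holds on
typical Unix-like systems. Note that the closing side throws away its
connection state at the moment it sends RST, which is why an
ESTABLISHED entry surviving on the RST sender looks so wrong.

```python
import socket
import struct


def abortive_close_demo():
    """Close one end of a loopback TCP connection with zero linger time.

    Returns "reset" if the peer sees a connection reset (RST),
    "eof" if it sees an orderly shutdown (FIN), or "data" otherwise.
    """
    # Listener on loopback; port 0 lets the OS choose a free port.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)

    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.connect(listener.getsockname())
    server_side, _ = listener.accept()

    # struct linger { int l_onoff; int l_linger; } = { 1, 0 }
    # (assumed layout: two C ints, as on common Unix-like systems).
    linger_zero = struct.pack("ii", 1, 0)
    server_side.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, linger_zero)

    # With linger enabled and a zero timeout, close() aborts the
    # connection instead of performing the normal FIN handshake.
    server_side.close()

    client.settimeout(5)
    try:
        data = client.recv(1)
        result = "eof" if data == b"" else "data"
    except ConnectionResetError:
        result = "reset"
    finally:
        client.close()
        listener.close()
    return result


if __name__ == "__main__":
    print(abortive_close_demo())
```

On a typical Unix-like stack the call is expected to return "reset"; a
return value of "eof" would mean the close went out as an ordinary FIN
instead.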

> Regarding kernel debugging I thought I would look for dtrace scripts, and
> found some, but nothing that seemed relevant in my case.
> As I'm a complete beginner (read: copy-paste) in dtrace I couldn't yet
> figure out how to write one myself.

Is the remote end that's generating RST also OpenSolaris?

--
James Carlson         42.703N   71.076W         <[email protected]>
_______________________________________________
networking-discuss mailing list
[email protected]
