Hi Gary,

I'd strongly suggest opening that support case with Sun.

To give you some additional information: your understanding of the FIN_WAIT_2 state is correct. After your end of the connection (the application) sends EOF to the socket, which effectively sends a FIN to the remote end, it goes to the FIN_WAIT_1 state. It then receives the ACK of that FIN and goes to the FIN_WAIT_2 state.

Now, if I'm not mistaken, if your end of the connection does not receive a FIN from the other end while it's in FIN_WAIT_2, the ndd parameter you mentioned determines how long we stay in that state before the connection gets flushed. This is 675 seconds (675000 ms) in your case. It is in fact a protocol violation, but it seems to be a good idea, since we must account for situations where the far end of the connection simply reboots and loses all its socket state.

The important thing about FIN_WAIT_2, though, is that it's a perfectly synchronised TCP state. The far end will transition into CLOSE_WAIT having received the FIN from you, but may still want to send some data.

In my opinion you should ask Sun to have a look at it. Here's what you should collect to get started:

- snoops from both ends of the connection:

    snoop -q -d <devicename> -o <outputfile>

- trusses of the processes on both ends responsible for the data transfer (the ones reading and writing to your sockets):

    truss -eflDda -fall -rall -vall -mall -o <output_file> -p <pid>

- explorers from both systems

Good luck.

Regards,
Daniel

On Sun, 13 Dec 2009 06:23:44 PST, Gary Mills <[email protected]> wrote:
> I have an anonymous FTP server where processes occasionally
> persist with one socket in the FIN_WAIT_2 state:
>
>    Local Address        Remote Address       Swind Send-Q Rwind Recv-Q State
>    -------------------- -------------------- ----- ------ ----- ------ -----------
>    130.179.16.34.7775   164.164.240.122.1814 59430      0 49640      0 FIN_WAIT_2
>
> It's always for the data connection, with the process sleeping
> in read() on that socket. I assume it's waiting for a FIN from
> the client. Shouldn't this state time out?
>
>    # ndd /dev/tcp tcp_fin_wait_2_flush_interval
>    675000
>
> It never does. Is the server supposed to take some action?
> All of the timeouts are set in the ftpaccess file. Is there a bug
> in the Solaris kernel? I can't find one that's documented.
>
> This is running under Solaris 10. I can open a support case,
> but I'd like to get a little more information first.

_______________________________________________
networking-discuss mailing list
[email protected]
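
As a rough illustration of the half-closed state described above, here is a minimal Python sketch run over loopback. It is not taken from the FTP server in question: the function names `peer` and `half_close_demo`, the loopback address, and the 5-second receive timeout are all assumptions made for the example. It shows one end sending its FIN with `shutdown()` and then still receiving data from the far end, which is sitting in CLOSE_WAIT.

```python
import socket
import threading


def peer(listener):
    """Hypothetical far end: read until the other side's FIN, then keep sending."""
    conn, _ = listener.accept()
    with conn:
        # recv() returning b"" means the FIN from the other side has arrived.
        while conn.recv(1024):
            pass
        # This end is now in CLOSE_WAIT; its sending direction is still open.
        conn.sendall(b"late data")
    # Leaving the with-block closes the socket, which sends this end's FIN.


def half_close_demo():
    """Half-close a loopback connection and return what the peer sends afterwards."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    worker = threading.Thread(target=peer, args=(listener,))
    worker.start()

    client = socket.create_connection(listener.getsockname())
    # Assumed application-level guard so recv() cannot block forever
    # if the far end never sends its FIN.
    client.settimeout(5.0)
    client.sendall(b"request")
    # Send our FIN: FIN_WAIT_1, then FIN_WAIT_2 once the peer ACKs it.
    client.shutdown(socket.SHUT_WR)

    chunks = []
    while True:
        data = client.recv(1024)  # still readable while in FIN_WAIT_2
        if not data:              # b"" means the peer's FIN arrived
            break
        chunks.append(data)

    client.close()
    worker.join()
    listener.close()
    return b"".join(chunks)
```

Calling `half_close_demo()` returns the bytes the peer sent after it had already seen the FIN. If the peer never closed its side, the `recv()` call would be where the process sits waiting, which is why the sketch sets a receive timeout on the socket.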
