Leland, I don't see how cpuplugd would have anything to do with sockets waiting to be closed by the application. It seems to me that the SYN_RECV buildup is a result of the CLOSE_WAIT connections. CLOSE_WAIT in abundance is normally linked to the application: WebSphere should notice the bad connection and issue a close through the OS. The obvious question is: was there maintenance applied to WebSphere recently?
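To illustrate the point about sockets waiting on the application, here is a minimal Python sketch (not WebSphere code; the function name is made up for illustration) of how a TCP socket ends up in CLOSE_WAIT: the peer closes, the local side sees end-of-stream, and the socket stays in CLOSE_WAIT until the local application itself calls close().

```python
import socket


def peer_closed_but_not_us():
    """Hypothetical demo: return what the server side reads after the client closes."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))      # port 0: let the kernel pick a free port
    listener.listen(1)

    client = socket.create_connection(listener.getsockname())
    server_side, _ = listener.accept()

    client.close()                        # the peer sends its FIN
    server_side.settimeout(5)
    data = server_side.recv(1024)         # b"" means end-of-stream: the peer has closed

    # From here until server_side.close() runs, the kernel holds this socket
    # in CLOSE_WAIT (on Linux it would show up in `netstat -nt`).
    still_open = server_side.fileno() != -1

    server_side.close()                   # the application-level close that ends CLOSE_WAIT
    listener.close()
    return data, still_open


if __name__ == "__main__":
    print(peer_closed_but_not_us())
```

If the application never reaches that final close(), for example because the process is hung or defunct, the connection stays in CLOSE_WAIT no matter what the remote end does.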
'Where ever you go - There you are!!'

Richard (Gaz) Gasiorowski
System z - Linux Product Manager
Portfolio Platform Services
CSC
3170 Fairview Park Dr., Falls Church, VA 22042
845-889-8533|Work|845-392-7889 Cell|rgasi...@csc.com|www.csc.com

This is a PRIVATE message. If you are not the intended recipient, please delete without copying and kindly advise us by e-mail of the mistake in delivery.
NOTE: Regardless of content, this e-mail shall not operate to bind CSC to any order or other contract unless pursuant to explicit written agreement or government initiative expressly permitting the use of e-mail for such purpose.

Leland Lucius <lluc...@homerow.net>
Sent by: Linux on 390 Port <LINUX-390@VM.MARIST.EDU>
04/30/2009 02:07 PM
Please respond to: Linux on 390 Port <LINUX-390@VM.MARIST.EDU>
To: LINUX-390@VM.MARIST.EDU
cc:
Subject: cpuplugd and hung connections/defunct processes?

Has anyone had any problems with connects hung in CLOSE_WAIT and defunct processes while using cpuplugd?

We have a script that runs nightly that shuts down a WebSphere instance and it's been running just fine for months. But, shortly after we turned on cpuplugd, the script started to intermittently "hang" or fail with a timeout, after which we get connections hung in a CLOSE_WAIT state and a defunct process (or two):

(sorry for the wrappage)

pzawap02:/var/log # netstat -ntp
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address         Foreign Address       State       PID/Program name
tcp    49144      0 172.16.205.190:11107  172.16.205.187:44023  CLOSE_WAIT  -
tcp    47944      0 172.16.205.190:11107  172.16.205.190:43666  CLOSE_WAIT  -
tcp    48744      0 172.16.205.190:11107  172.16.205.187:43117  CLOSE_WAIT  -
tcp    49144      0 172.16.205.190:11107  172.16.205.190:43019  CLOSE_WAIT  -
tcp    47944      0 172.16.205.190:11107  172.16.205.187:45013  CLOSE_WAIT  -
tcp    49136      0 172.16.205.190:11107  172.16.205.190:44773  CLOSE_WAIT  -
tcp    47944      0 172.16.205.190:11107  172.16.205.190:44789  CLOSE_WAIT  -

pzawap02:/var/log # ps -ef
medwas   23974     1  0 Apr27 ?        00:05:55 [java] <defunct>

After this happens, we seem to also be getting a buildup of connects in a SYN_RECV state, but that might just be our F5 trying to find out the state of the service.

The reason I suspect cpuplugd is that while the script was running, a CPU was brought online and taken offline soon after:

Apr 29 20:46:01 pzawap02 sudo: mfsched : TTY=unknown ; PWD=/home/mfsched ; USER=root ; COMMAND=/bin/su - medwas -c ./stopServer2.sh
Apr 29 20:46:01 pzawap02 su: (to medwas) root on none
Apr 29 20:46:19 pzawap02 kernel: cpu 1 phys_idx=1 vers=FF ident=0C930E machine=2094 unused=8000
Apr 29 20:46:29 pzawap02 kernel: Processor 1 spun down

Any ideas on how I could prove it WASN'T cpuplugd? I'd rather not have to turn it off...

Thanks much,
Leland

----------------------------------------------------------------------
For LINUX-390 subscribe / signoff / archive access instructions,
send email to lists...@vm.marist.edu with the message: INFO LINUX-390
or visit http://www.marist.edu/htbin/wlvindex?LINUX-390
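One way to gather evidence for or against cpuplugd is to check, night by night, whether a CPU hotplug event actually fell inside the shutdown script's run. The Python sketch below is only an illustration under some assumptions: the function names are hypothetical, the kernel message patterns are taken from the log excerpt quoted above, the 60-second window is arbitrary, and the year has to be supplied because classic syslog timestamps omit it. An overlap shows coincidence in time, not cause, but nights where the script hangs with no hotplug event nearby (or runs cleanly with one) would argue against cpuplugd.

```python
from datetime import datetime, timedelta


def parse_stamp(line, year):
    """Parse the leading 'Mon DD HH:MM:SS' syslog timestamp; the year is supplied by the caller."""
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")


def hotplug_near(lines, marker, year=2009, window=timedelta(seconds=60)):
    """Return kernel CPU online/offline lines logged within `window` after a line containing `marker`."""
    starts = [parse_stamp(l, year) for l in lines if marker in l]
    hits = []
    for l in lines:
        # Patterns assumed from the excerpt: 'phys_idx=' when a CPU comes
        # online, 'spun down' when it goes offline.
        if "kernel:" in l and ("phys_idx=" in l or "spun down" in l):
            t = parse_stamp(l, year)
            if any(timedelta(0) <= t - s <= window for s in starts):
                hits.append(l)
    return hits


if __name__ == "__main__":
    sample = [
        "Apr 29 20:46:01 pzawap02 sudo: mfsched : TTY=unknown ; PWD=/home/mfsched ; "
        "USER=root ; COMMAND=/bin/su - medwas -c ./stopServer2.sh",
        "Apr 29 20:46:19 pzawap02 kernel: cpu 1 phys_idx=1 vers=FF ident=0C930E "
        "machine=2094 unused=8000",
        "Apr 29 20:46:29 pzawap02 kernel: Processor 1 spun down",
    ]
    for hit in hotplug_near(sample, "stopServer2.sh"):
        print(hit)
```

Feeding it the full /var/log/messages for several weeks, together with a record of which nights the script hung, would give a simple table of "hung / not hung" against "hotplug during run / no hotplug".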