I have a server farm at my small ISP running NetBSD 7.0 and pf. All the servers seem to be rock solid except the email server, which has random lockups. The system is still running in the sense that it responds to pings, and if I am running screen I can still switch between the different screens, but none of them will run anything and even a simple carriage return will not display a new prompt.
It sounds like kern/50168 (Frequent lockups and panics with NetBSD 7/amd64, may be ipfilter-related) but I run pf, not ipf.

I have a little script that captures memory usage every minute and stores it in a log. It writes the time followed by MemTotal, MemFree, MemShared, SwapTotal, SwapFree, Cached and Buffers from /proc/meminfo. Here's what it looked like when it hung and was rebooted.

  Wed Mar 16 13:39:00 2016 31806   721 0 32787 32787 27565 25744
  Wed Mar 16 13:40:00 2016 31806   718 0 32787 32787 27568 25744
  Wed Mar 16 13:41:00 2016 31806   739 0 32787 32787 27549 25746
  Wed Mar 16 13:42:00 2016 31806   733 0 32787 32787 27555 25748
  Wed Mar 16 13:43:00 2016 31806   763 0 32787 32787 27528 25754
  Wed Mar 16 13:44:00 2016 31806   720 0 32787 32787 27568 25754
  Wed Mar 16 13:45:00 2016 31806   696 0 32787 32787 27591 25756
  Wed Mar 16 13:46:00 2016 31806   718 0 32787 32787 27569 25755
  Wed Mar 16 13:47:00 2016 31806   721 0 32787 32787 27566 25752
  Wed Mar 16 13:48:01 2016 31806   736 0 32787 32787 27552 25756
  Wed Mar 16 13:49:00 2016 31806   794 0 32787 32787 27497 25756
  Wed Mar 16 13:50:00 2016 31806   819 0 32787 32787 27471 25755
  Wed Mar 16 13:51:00 2016 31806   834 0 32787 32787 27457 25754
  Wed Mar 16 13:52:00 2016 31806   830 0 32787 32787 27461 25754
  Wed Mar 16 13:53:00 2016 31806   836 0 32787 32787 27456 25754
  Wed Mar 16 13:54:00 2016 31806   842 0 32787 32787 27450 25755
  Wed Mar 16 13:55:01 2016 31806   827 0 32787 32787 27465 25754
  Wed Mar 16 13:56:00 2016 31806    66 0 32787 32787 28227 26540
  Wed Mar 16 13:57:00 2016 31806    83 0 32787 32787 28207 26476
  Wed Mar 16 13:58:00 2016 31806    48 0 32787 32787 28243 26479
  Wed Mar 16 13:59:00 2016 31806    75 0 32787 32787 28215 26480
  Wed Mar 16 14:36:01 2016 31806 31067 0 32787 32787   475    98
  Wed Mar 16 14:37:00 2016 31806 30733 0 32787 32787   745   135
  Wed Mar 16 14:38:00 2016 31806 30644 0 32787 32787   821   163
  Wed Mar 16 14:39:00 2016 31806 30542 0 32787 32787   915   187
  Wed Mar 16 14:40:00 2016 31806 30450 0 32787 32787   993   211

I could turn off pf, but it could be weeks before a hang happens. I am considering rebooting on a regular basis (early Sunday morning is what I had in mind) to see if that makes it more reliable, but I have no indication that this is uptime related.

I also have a "top -osize" running in one of the screens. Since I can still switch screens, I am hoping that it might show me the culprit if it is a runaway process.

Can anyone suggest any other avenues to investigate?

-- 
D'Arcy J.M. Cain <da...@netbsd.org>
http://www.NetBSD.org/ IM:da...@vex.net