Re: Strange Swapping Issues(?)

2010-04-15 Thread Peter Jeremy
On 2010-Apr-14 06:44:58 +0200, Gabor PALI p...@freebsd.org wrote:
Apr 14 05:26:45 xxx kernel: swap_pager_getswapspace(16): failed
Apr 14 05:26:45 xxx kernel: pid 7388 (throwto003), uid 1001, was
killed: out of swap space
Apr 14 05:26:45 xxx kernel: swap_pager_getswapspace(16): failed

The out-of-swap handler will kill the largest process, so one of your
problems is probably throwto003.  I can't offer any suggestion as to
why the swap_pager_getswapspace() errors continued afterwards.
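
To check how many allocation failures were logged before and after the kill message, something like the following could be used. It is only a rough sketch: the sample lines are copied from the quoted log (with the wrapped "was killed" line rejoined), the regular expressions are guesses based on those lines alone, and the name summarize is made up for illustration.

```python
import re

# Sample lines copied from the quoted log; the "was killed" line, which the
# mail client wrapped, is rejoined here (an assumption about the raw log).
LOG = """\
Apr 14 05:26:45 xxx kernel: swap_pager_getswapspace(16): failed
Apr 14 05:26:45 xxx kernel: pid 7388 (throwto003), uid 1001, was killed: out of swap space
Apr 14 05:26:45 xxx kernel: swap_pager_getswapspace(16): failed
"""

# Patterns derived only from the lines shown in this thread.
FAILED = re.compile(r"swap_pager_getswapspace\((\d+)\): failed")
KILLED = re.compile(r"pid (\d+) \((\S+)\), uid (\d+), was killed: out of swap space")


def summarize(lines):
    """Return the first killed process and the failure counts around it."""
    before = after = 0
    victim = None
    for line in lines:
        match = KILLED.search(line)
        if match and victim is None:
            # Remember the first victim as (pid, command name).
            victim = (int(match.group(1)), match.group(2))
            continue
        if FAILED.search(line):
            if victim is None:
                before += 1
            else:
                after += 1
    return victim, before, after


victim, before, after = summarize(LOG.splitlines())
print(victim, before, after)
```

Since the timestamps only have one-second resolution, the sketch relies on line order rather than on the time field.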

-- 
Peter Jeremy




Re: Strange Swapping Issues(?)

2010-04-15 Thread Gabor PALI
Hello,

On 04/15/10 20:41, Peter Jeremy wrote:
 The out-of-swap handler will kill the largest process, so one of your
 problems is probably throwto003.  I can't offer any suggestion as
 to why the swap_pager_getswapspace() errors continued afterwards.

Okay, it was my fault.  After huge processes like throwto003 are
killed, swap usage falls below 3-4% within a second.  (I will need to
consult the author of that process.)

For your amusement, here is a log excerpt of the situation (it can be
easily reproduced on my system).

Columns are as follows:

- Time
- Swap Used (KB) [swapinfo]
- Swap Free (KB) [swapinfo]
- Number of Processes [ps]
- Active Virtual Pages (KB) [vmstat]
- Size of Free List (KB) [vmstat]


The messages start pounding the logs at 17:16:37 (kernel:
swap_pager_getswapspace(12): failed), and at 17:16:39 throwto003 is
finally killed (kernel: pid 117 (throwto003), uid 1001, was killed:
out of swap space).  There are no further failures after that.


Time     SwapUsed SwapFree Procs ActiveVM FreeList
17:15:30    81720  4112584   155  2033636    71464
17:15:31    81720  4112584   155  2777084     9732
17:15:41   439384  3754920   155  2953212    10104
17:15:42   493804  3700500   155  3121148    10152
17:15:44   730560  3463744   155  3280812    10208
17:15:47   817608  3376696   155  3459068     9980
17:15:49   988420  3205884   155  3625008    10268
17:15:51  1182748  3011556   155  3799036    10036
17:15:53  1326872  2867432   155  3963900     8280
17:15:54  1445332  2748972   155  3969020     9632
17:15:55  1457348  2736956   155  4093928    40304
17:15:57  1598944  2595360   155  4139004    10568
17:15:58  1646460  2547844   155  4276220     8360
17:15:59  1689776  2504528   155  4365308     8476
17:16:00  1882924  2311380   155  4453372    89048
17:16:04  2049676  2144628   155  4567036     9860
17:16:06  2219004  1975300   155  4733948    10264
17:16:08  2389028  1805276   155  4903932    10012
17:16:10  2558516  1635788   155  5047396    99916
17:16:12  2735864  1458440   157  5254428    10532
17:16:15  2917096  1277208   158  5475788   100944
17:16:17  3103632  1090672   159  5716856    98876
17:16:20  3231660   962644   159  5929908    40144
17:16:24  3498256   696048   159  6168504    68192
17:16:28  3675324   518980   159  6238136    10700
17:16:30  3861972   332332   159  6414172     8500
17:16:32  4032132   162172   157  6556588    10324
17:16:33  4054828   139476   155  6718460     7932
17:16:34  4176548    17756   155  6719536     9584
17:16:38  4193604      700   155  6799908    71096
17:16:39   100580  4093724   151  1132240  1640856
17:16:40    96336  4097968   151  1123620  1637372
17:16:41    96332  4097972   153  1189608  1613748
17:16:42    96328  4097976   156  1269164  1529760
17:16:43    96272  4098032   154  1147628  1602084
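
To get a feel for how fast the swap fills up, the samples can be fed to a few lines of Python. This is just a sketch under assumptions: only four rows of the log are included (time, swap used, swap free), the total of 4194304 KB is the 1K-blocks figure swapinfo reported earlier in this thread, and the function names are invented for the example.

```python
from datetime import datetime

# Size of the swap device in KB, as reported by swapinfo in this thread.
TOTAL_KB = 4194304

# A few rows from the log above: time, swap used (KB), swap free (KB).
ROWS = """\
17:15:30 81720 4112584
17:16:00 1882924 2311380
17:16:38 4193604 700
17:16:39 100580 4093724
"""


def parse(text):
    """Turn whitespace-separated rows into (time, used_kb, free_kb) tuples."""
    samples = []
    for line in text.splitlines():
        stamp, used, free = line.split()
        samples.append((datetime.strptime(stamp, "%H:%M:%S"), int(used), int(free)))
    return samples


def fill_rate_kb_per_s(first, last):
    """Average growth of used swap between two samples, in KB per second."""
    seconds = (last[0] - first[0]).total_seconds()
    return (last[1] - first[1]) / seconds


samples = parse(ROWS)

# Sanity check: used + free should add up to the whole device in every row.
consistent = all(used + free == TOTAL_KB for _, used, free in samples)

# Growth from the first sample to the last one before the process was killed.
rate = fill_rate_kb_per_s(samples[0], samples[2])
print(consistent, f"{rate:.0f} KB/s")
```

The used + free check is also a handy way to confirm that the columns were split correctly.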


Cheers,
:g

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to freebsd-stable-unsubscr...@freebsd.org


Re: Strange Swapping Issues(?)

2010-04-14 Thread Gabor PALI
Hi Jeremy,

On Wed, Apr 14, 2010 at 7:26 AM, Jeremy Chadwick
free...@jdc.parodius.com wrote:
 The swapinfo command you ran was not run at 05:26 in the morning.

It was run a few minutes after.  I accidentally got it live :)  Well,
I was expecting that because I have seen similar messages previously in
the logs.  I think it is unlikely that things suddenly fall below 3%
after the kernel has complained about the lack of swap space.  Please
correct me if I am wrong here.


 You should probably set up a small script, run via cronjob, that logs
 swapinfo -h output to a file somewhere (rotate it if you want via
 newsyslog.conf).

Great idea, will do it.


 You may have something running on the system that spirals out of
 control, such as a web board script being pounded to death, or something
 that's forking excessively.

It is called a parallel nightly build of the Glasgow Haskell Compiler :D
According to its official documentation, compilation and testing are
very intensive, indeed.  I am trying to launch the builders at
different times in order to distribute the load.


 I'd also recommend having the script output top -b -o res 100, which
 will give you the top 100 processes on the machine sorted by RSS
 [..] So I'm making the assumption RSS will be large.

We will see soon...


Thanks for the quick help!

:g


Strange Swapping Issues(?)

2010-04-13 Thread Gabor PALI
Hello there,

I am running FreeBSD/amd64 8-STABLE with a GENERIC kernel as of
February 17 on my box (quad core, 2 GB RAM), and recently I spotted some
interesting problems in my logs.  My machine runs two instances of a
client in two separate chroot environments in parallel, with 32-bit and
64-bit userlands respectively, doing nightly building and testing.
According to the logs it puts a nice load on my system, though it
still seems to be working fine.

Except one thing: it produces strange error messages on swap space
without an apparent reason (at least to me):

xxx# tail -f /var/log/messages
Apr 14 05:26:44 xxx kernel: swap_pager_getswapspace(4): failed
Apr 14 05:26:44 xxx kernel: swap_pager_getswapspace(16): failed
Apr 14 05:26:44 xxx kernel:
Apr 14 05:26:44 xxx kernel: swap_pager_getswapspace(3): failed
Apr 14 05:26:44 xxx kernel: swap_pager_getswapspace(16): failed
Apr 14 05:26:44 xxx kernel: swap_pager_getswapspace(3): failed
Apr 14 05:26:45 xxx kernel: swap_pager_getswapspace(16): failed
Apr 14 05:26:45 xxx kernel: swap_pager_getswapspace(12): failed
Apr 14 05:26:45 xxx kernel: swap_pager_getswapspace(2): failed
Apr 14 05:26:45 xxx kernel: swap_pager_getswapspace(16): failed
Apr 14 05:26:45 xxx kernel: swap_pager_getswapspace(16): failed
Apr 14 05:26:45 xxx kernel: pid 7388 (throwto003), uid 1001, was
killed: out of swap space
Apr 14 05:26:45 xxx kernel: swap_pager_getswapspace(16): failed
Apr 14 05:26:45 xxx kernel: swap_pager_getswapspace(8): failed
Apr 14 05:26:45 xxx kernel: swap_pager_getswapspace(16): failed
Apr 14 05:26:45 xxx kernel: swap_pager_getswapspace(16): failed
Apr 14 05:26:45 xxx kernel: swap_pager_getswapspace(8): failed
Apr 14 05:26:45 xxx kernel: swap_pager_getswapspace(16): failed
Apr 14 05:26:45 xxx kernel: swap_pager_getswapspace(12): failed
Apr 14 05:26:45 xxx kernel: swap_pager_getswapspace(9): failed
Apr 14 05:26:45 xxx kernel: swap_pager_getswapspace(16): failed
Apr 14 05:26:45 xxx kernel: swap_pager_getswapspace(9): failed
Apr 14 05:26:45 xxx kernel: swap_pager_getswapspace(3): failed
Apr 14 05:26:45 xxx kernel: swap_pager_getswapspace(16): failed
Apr 14 05:26:45 xxx kernel: swap_pager_getswapspace(9): failed
Apr 14 05:26:45 xxx kernel: swap_pager_getswapspace(3): failed
^C
xxx# swapinfo -h
Device          1K-blocks     Used    Avail Capacity
/dev/ad0s1b       4194304     112M     3.9G       3%


Do you have any ideas what might have happened?  Do I need to update
or configure something?

Thank you very much for your replies in advance.

Cheers,
:g


Re: Strange Swapping Issues(?)

2010-04-13 Thread Jeremy Chadwick
On Wed, Apr 14, 2010 at 06:44:58AM +0200, Gabor PALI wrote:
 Hello there,
 
 I am running a FreeBSD/amd64 8-STABLE with GENERIC kernel as of
 February 17 on my box (quad core, 2 GB RAM), and recently I spot some
 interesting problems in my logs.  My machine runs two instances of a
 client in two separate chroot environments in parallel with 32-bit and
 64-bit userlands respectively, doing a nightly building and testing.
 According to the logs it puts a nice load on my system, though it
 still seems to be working fine.
 
 Except one thing: it produces strange error messages on swap space
 without an apparent reason (at least to me):
 
 xxx# tail -f /var/log/messages
 Apr 14 05:26:44 xxx kernel: swap_pager_getswapspace(4): failed
 Apr 14 05:26:44 xxx kernel: swap_pager_getswapspace(16): failed
 Apr 14 05:26:44 xxx kernel:
 Apr 14 05:26:44 xxx kernel: swap_pager_getswapspace(3): failed
 Apr 14 05:26:44 xxx kernel: swap_pager_getswapspace(16): failed
 Apr 14 05:26:44 xxx kernel: swap_pager_getswapspace(3): failed
 Apr 14 05:26:45 xxx kernel: swap_pager_getswapspace(16): failed
 Apr 14 05:26:45 xxx kernel: swap_pager_getswapspace(12): failed
 Apr 14 05:26:45 xxx kernel: swap_pager_getswapspace(2): failed
 Apr 14 05:26:45 xxx kernel: swap_pager_getswapspace(16): failed
 Apr 14 05:26:45 xxx kernel: swap_pager_getswapspace(16): failed
 Apr 14 05:26:45 xxx kernel: pid 7388 (throwto003), uid 1001, was
 killed: out of swap space
 Apr 14 05:26:45 xxx kernel: swap_pager_getswapspace(16): failed
 Apr 14 05:26:45 xxx kernel: swap_pager_getswapspace(8): failed
 Apr 14 05:26:45 xxx kernel: swap_pager_getswapspace(16): failed
 Apr 14 05:26:45 xxx kernel: swap_pager_getswapspace(16): failed
 Apr 14 05:26:45 xxx kernel: swap_pager_getswapspace(8): failed
 Apr 14 05:26:45 xxx kernel: swap_pager_getswapspace(16): failed
 Apr 14 05:26:45 xxx kernel: swap_pager_getswapspace(12): failed
 Apr 14 05:26:45 xxx kernel: swap_pager_getswapspace(9): failed
 Apr 14 05:26:45 xxx kernel: swap_pager_getswapspace(16): failed
 Apr 14 05:26:45 xxx kernel: swap_pager_getswapspace(9): failed
 Apr 14 05:26:45 xxx kernel: swap_pager_getswapspace(3): failed
 Apr 14 05:26:45 xxx kernel: swap_pager_getswapspace(16): failed
 Apr 14 05:26:45 xxx kernel: swap_pager_getswapspace(9): failed
 Apr 14 05:26:45 xxx kernel: swap_pager_getswapspace(3): failed
 ^C
 xxx# swapinfo -h
 Device          1K-blocks     Used    Avail Capacity
 /dev/ad0s1b       4194304     112M     3.9G       3%

The swapinfo command you ran was not run at 05:26 in the morning.  You
should probably set up a small script, run via cronjob, that logs
swapinfo -h output to a file somewhere (rotate it if you want via
newsyslog.conf).

You may have something running on the system that spirals out of
control, such as a web board script being pounded to death, or something
that's forking excessively.

I'd also recommend having the script output top -b -o res 100, which
will give you the top 100 processes on the machine sorted by RSS
(non-shared) memory usage.  I don't know of a way to show the amount of
swap used by process N, or all processes, since it's transparently
handled by the VM.  So I'm making the assumption RSS will be large.
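
As a rough illustration of such a logging script, here is a minimal Python sketch. The two commands are the ones mentioned above; everything else (the function names, the entry layout, the 30-second timeout) is an assumption made for the example, not an established tool.

```python
import subprocess
import time

# The two commands suggested above; adjust to taste.
COMMANDS = [
    ["swapinfo", "-h"],
    ["top", "-b", "-o", "res", "100"],
]


def run(cmd):
    """Return the command's standard output, or a note if it could not be run."""
    try:
        done = subprocess.run(cmd, capture_output=True, text=True, timeout=30)
        return done.stdout
    except (OSError, subprocess.TimeoutExpired) as exc:
        return f"[could not run {' '.join(cmd)}: {exc}]\n"


def snapshot(commands, now=None):
    """Build one timestamped log entry from the output of each command."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    parts = [f"=== {stamp} ===\n"]
    for cmd in commands:
        parts.append(f"--- {' '.join(cmd)} ---\n")
        parts.append(run(cmd))
    return "".join(parts)


def append_snapshot(path, commands=COMMANDS):
    """Append one snapshot to the log file at `path` (location is up to you)."""
    with open(path, "a") as log:
        log.write(snapshot(commands))
```

Calling append_snapshot() with a log path of your choosing from a cron job, at whatever interval suits the machine, would leave a record that can be lined up against the kernel messages afterwards.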

-- 
| Jeremy Chadwick   j...@parodius.com |
| Parodius Networking   http://www.parodius.com/ |
| UNIX Systems Administrator  Mountain View, CA, USA |
| Making life hard for others since 1977.  PGP: 4BD6C0CB |
