Brian started a little loop on the igs21 OSS, inside a screen session, that dumps the
kernel's memory statistics to the console log every 20 minutes via sysrq-m:

# screen
# while true; do echo m > /proc/sysrq-trigger; sleep 1200; done
  (then detach from the screen session with ^a d, leaving the loop running)
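
As an aside, roughly the same record can be collected without going through sysrq
at all by sampling /proc/meminfo on the same interval.  This is only a sketch of an
equivalent loop, not what Brian actually ran; the log path is arbitrary, and
/proc/meminfo reports kB where the sysrq-m output below is in pages:

#!/bin/sh
# Append a timestamped one-line memory record to a log every 20 minutes.
# Note: /proc/meminfo values are in kB, not pages.
LOG=/tmp/meminfo.log     # arbitrary path
while true; do
    printf '%s ' "$(date '+%Y-%m-%d %H:%M:%S')" >> "$LOG"
    awk '/^(MemFree|Active|Inactive|Dirty|Writeback|Mapped|Slab|PageTables):/ {
             printf "%s%s ", $1, $2
         }
         END { print "" }' /proc/meminfo >> "$LOG"
    sleep 1200
done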

Brian's loop has been running since a reboot on January 3rd, 2007.  The log it
produces shows a rapid rise in the Active page count until it stabilizes after a
couple of hours:

2007-01-03 17:33:10 Active:60953 inactive:218430 dirty:0 writeback:0 unstable:0 free:99633 slab:58644 mapped:5598 pagetables:233
2007-01-03 17:54:50 Active:143831 inactive:198215 dirty:0 writeback:0 unstable:0 free:35719 slab:59937 mapped:5598 pagetables:233
2007-01-03 18:16:43 Active:216975 inactive:152993 dirty:0 writeback:0 unstable:0 free:6445 slab:61204 mapped:5548 pagetables:205
2007-01-03 18:36:44 Active:248424 inactive:121477 dirty:0 writeback:0 unstable:0 free:6261 slab:61472 mapped:5548 pagetables:205
2007-01-03 18:56:44 Active:266343 inactive:103574 dirty:0 writeback:0 unstable:0 free:6345 slab:61411 mapped:5548 pagetables:205
2007-01-03 19:16:44 Active:285333 inactive:84681 dirty:26 writeback:0 unstable:0 free:6649 slab:60978 mapped:5548 pagetables:205
2007-01-03 19:36:44 Active:296231 inactive:73783 dirty:28 writeback:0 unstable:0 free:6263 slab:61392 mapped:5548 pagetables:205
2007-01-03 19:56:44 Active:301777 inactive:67702 dirty:38 writeback:0 unstable:0 free:6103 slab:61889 mapped:5548 pagetables:205
2007-01-03 20:16:45 Active:306633 inactive:62612 dirty:38 writeback:0 unstable:0 free:6717 slab:61690 mapped:5551 pagetables:205
2007-01-03 20:36:45 Active:310025 inactive:59337 dirty:26 writeback:0 unstable:0 free:6843 slab:61431 mapped:5551 pagetables:205


and the inactive pages slowly drain away until you see numbers like the following:

2007-01-07 23:57:45 Active:340033 inactive:13701 dirty:0 writeback:0 unstable:0 free:13593 slab:70320 mapped:5621 pagetables:206
2007-01-08 00:17:46 Active:340068 inactive:13701 dirty:597 writeback:0 unstable:0 free:12575 slab:70980 mapped:5621 pagetables:206

At this point the OOM killer is liable to get invoked at any time.
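
For scale, assuming 4 KB pages (my assumption for these OSS nodes), those last counts
work out to roughly 1.3 GB active against about 50 MB inactive and 50 MB free, which
leaves reclaim almost nothing to work with:

#!/bin/sh
# Convert the page counts from the last sample above to MB.
# PAGE_KB=4 is an assumption; adjust it if the OSS uses a different page size.
PAGE_KB=4
for sample in Active:340068 inactive:13701 free:12575; do
    pages=${sample#*:}
    echo "$sample -> $(( pages * PAGE_KB / 1024 )) MB"
done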

We decided to try a stepped sequence of tests to see if we could get any insight
into where all the pages are disappearing to:
1)  Run for a few hours with Lustre off to see if active and inactive pages stay
stable.
2)  Start Lustre but leave the network interfaces down while we monitor memory use.
3)  Bring up the network interfaces but leave the file system idle and continue
to monitor.
4)  Start auto-testing again and continue to monitor - we'd like some way of
tracking exactly who (what kernel process) is responsible for the change in
memory use (see the sketch after this list).
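
For step 4, one possible way to narrow down where the pages are going (only a sketch
of one approach; the interval and output directory are arbitrary) is to snapshot
/proc/slabinfo alongside the global counters on each pass and diff consecutive
snapshots:

#!/bin/sh
# Snapshot /proc/slabinfo and the global memory counters every 20 minutes,
# so that growth can later be attributed to a particular slab cache.
OUT=/tmp/mem-monitor      # arbitrary location
mkdir -p "$OUT"
while true; do
    ts=$(date '+%Y%m%d-%H%M%S')
    cp /proc/slabinfo "$OUT/slabinfo.$ts"
    grep -E '^(MemFree|Active|Inactive|Slab):' /proc/meminfo > "$OUT/meminfo.$ts"
    sleep 1200
done

Diffing the num_objs column between two slabinfo snapshots at least tells us which
caches are growing; it won't name a process directly, but it usually points at the
responsible subsystem.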
