Please don't reply to lustre-devel. Instead, comment in Bugzilla by using the following link: https://bugzilla.lustre.org/show_bug.cgi?id=11176
Brian started a little loop on the igs21 OSS that did the following:

  # screen
  # while [ 1 ] ; do echo "m" > /proc/sysrq-trigger; sleep 1200; done
  # ^ad        (Ctrl-a d, to detach from the screen session)

The loop ran from the time of a reboot on January 3rd, 2007. It shows a rapid rise in the Active pages value until it stabilizes after a couple of hours:

2007-01-03 17:33:10 Active:60953 inactive:218430 dirty:0 writeback:0 unstable:0 free:99633 slab:58644 mapped:5598 pagetables:233
2007-01-03 17:54:50 Active:143831 inactive:198215 dirty:0 writeback:0 unstable:0 free:35719 slab:59937 mapped:5598 pagetables:233
2007-01-03 18:16:43 Active:216975 inactive:152993 dirty:0 writeback:0 unstable:0 free:6445 slab:61204 mapped:5548 pagetables:205
2007-01-03 18:36:44 Active:248424 inactive:121477 dirty:0 writeback:0 unstable:0 free:6261 slab:61472 mapped:5548 pagetables:205
2007-01-03 18:56:44 Active:266343 inactive:103574 dirty:0 writeback:0 unstable:0 free:6345 slab:61411 mapped:5548 pagetables:205
2007-01-03 19:16:44 Active:285333 inactive:84681 dirty:26 writeback:0 unstable:0 free:6649 slab:60978 mapped:5548 pagetables:205
2007-01-03 19:36:44 Active:296231 inactive:73783 dirty:28 writeback:0 unstable:0 free:6263 slab:61392 mapped:5548 pagetables:205
2007-01-03 19:56:44 Active:301777 inactive:67702 dirty:38 writeback:0 unstable:0 free:6103 slab:61889 mapped:5548 pagetables:205
2007-01-03 20:16:45 Active:306633 inactive:62612 dirty:38 writeback:0 unstable:0 free:6717 slab:61690 mapped:5551 pagetables:205
2007-01-03 20:36:45 Active:310025 inactive:59337 dirty:26 writeback:0 unstable:0 free:6843 slab:61431 mapped:5551 pagetables:205

After that, the inactive pages slowly drain away until you see numbers like the following:

2007-01-07 23:57:45 Active:340033 inactive:13701 dirty:0 writeback:0 unstable:0 free:13593 slab:70320 mapped:5621 pagetables:206
2007-01-08 00:17:46 Active:340068 inactive:13701 dirty:597 writeback:0 unstable:0 free:12575 slab:70980 mapped:5621 pagetables:206

At this point the OOM killer is liable to get invoked at any time.

We decided to try a stepped sequence of tests to see if we could get any insight into where all the pages are disappearing to:

1) Run for a few hours with Lustre off to see if the active and inactive page counts stay stable.
2) Start Lustre but leave the network interfaces off while we monitor memory use.
3) Activate the network interfaces but leave the file system idle and continue to monitor.
4) Start auto-testing again and continue to monitor. We'd like some way of tracking exactly who (which kernel process) is responsible for the change in memory use; one possible monitoring sketch follows below.
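For step 4, one possibility (a minimal sketch only, not something we have run; the script name memwatch.sh, the interval, and the log path are placeholders) is to sample /proc/meminfo and the largest slab caches on the same 20-minute cadence as the sysrq loop, so that a jump in the Active count can be lined up with whichever caches grew at the same time:

  #!/bin/sh
  # memwatch.sh - hypothetical monitoring helper; run as root on the OSS.
  # Every INTERVAL seconds, append a timestamped snapshot of the page
  # counters plus the ten largest slab caches to LOG.

  INTERVAL=${1:-1200}               # same 20-minute cadence as the sysrq loop
  LOG=${2:-/var/log/memwatch.log}   # placeholder path

  while true; do
      {
          date '+%Y-%m-%d %H:%M:%S'
          # The same counters the sysrq "m" output reports, via /proc/meminfo
          grep -E '^(Active|Inactive|Dirty|Writeback|MemFree|Slab|Mapped|PageTables):' /proc/meminfo
          # Ten largest slab caches by active object count (skip the two header lines)
          tail -n +3 /proc/slabinfo | sort -rn -k2 | head -10
          echo
      } >> "$LOG"
      sleep "$INTERVAL"
  done

This will not name a specific kernel thread, but correlating the Active/inactive counters with the slab columns across steps 1-4 should at least narrow the growth down to a particular cache or subsystem.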
