Hi all,

The subject of this message could have been 'Memory Leak', but honestly that doesn't sound dramatic enough for my problem.
Warning: this turned out to be a longish email. The impatient can skip to the section marked SUMMARY at the bottom.

Last summer (2004) I installed Debian Woody on a P3 to use as a development platform for PHP and MySQL. Everything worked perfectly. No complaints.

Then, three weeks ago, a very odd problem started to plague me. At first I saw only the symptoms. I was editing PHP pages, and suddenly vim wouldn't start. It complained about being unable to load linked libraries. I checked my library file (it was fine). The next time I tried vim it worked fine. I was mystified. The symptom recurred.

I upgraded to version 3.1 (sarge) with apt-get upgrade. The process did not go smoothly, halting several times when various libraries could not be loaded, mostly during execution of a perl script. I muddled through the upgrade and rebooted the computer. From this point on I had no problems with loading linked libraries. The computer rebooted fine and all services worked as before.

I started KDE and the computer almost ground to a halt. top showed me that all 256 megs of RAM were used up. I quit KDE and unloaded X. No change. I brought down Apache, Apache-ssl, Exim, pop3ad, mysqld and smbd. No change. I actually ended up halting every process except for kernel processes, 5 gettys, 1 bash and 1 top. The memory was not freed. I unloaded every kernel module (mostly iptables and ethernet device drivers); the memory was not freed.

I rebooted the computer so that I could monitor the memory usage from a blank slate. This is the behaviour I noticed: every five seconds or so a chunk of memory, averaging between 4k and 16k, would be allocated and never freed. I watched long enough to assure myself of the pattern, then started killing off processes one by one. After each process exited, the memory was still being leaked. I continued until I was down to the bare processes listed above, and no modules. In less than two hours, all my memory was gone, save for 4 or 5k.
The leaking stopped there, not cutting into swap space.

At this point I thought one of three things had happened: there is a memory leak in the kernel (not likely; I was running 2.4.18 at this point), or there is a hidden process gobbling up all my memory, or there is a physical problem that is preventing memory from being deallocated. I briefly considered the possibility that top was reporting memory usage incorrectly, but a brief inspection of /proc (especially /proc/mem, more on this later) showed that this was not the case.

I began with a tool to show hidden processes, psreal. It found none. I downloaded kernel sources for 2.6.8, configured and built that kernel, figuring this would both A) show hidden processes missed by psreal and B) solve my unlikely problem of a kernel memory leak. Although the 2.6.8 kernel had no better luck holding onto the memory than the 2.4.18 kernel did, there was a change in behaviour: I was losing memory in 60k chunks now. This seems like an important clue, but I can't decipher it.

What I've Done in My Futile Attempts to Diagnose or Repair this Problem

*) To test the kernel's ability to deallocate memory, I wrote a program that looked something like this:

    #include <stdlib.h>

    int main(void)
    {
        char *p;
        int i = 1024;

        /* request ever larger blocks until malloc fails; nothing is freed */
        while ((p = malloc(i)) != NULL)
            i += 1024;
        return 0;
    }

There was a bit more to it than that, but I didn't write or access the malloc'd memory in any way. This showed expected behaviour, quickly gobbling up memory and then returning it when it exited or was killed.

*) I dumped /proc/mem into a file on a workstation and opened it up in a hex editor to examine all 256 megs of data. This is where it gets interesting, and it seems like a very important clue that I am incapable of deciphering: I am finding that some files from the root filesystem have been inserted into memory. A tarball from one user's directory appears twice in main memory. /etc/passwd appears 11 times in memory! The root partition is reiserfs. The boot partition, which is always mounted, is ext2.
There are no files from /boot in memory that I can find.

---------------------------
SUMMARY:

Files or chunks of files from the root (reiserfs) partition are being inserted into memory at a rate of 4-16k every 5 seconds (2.4.18) or 60k every 5 seconds (2.6.8). This memory is never freed. The insertion is not being caused by any user space program: it still occurs when the only programs running are kernel processes, getty, bash, and top. Memory is eaten up until about 5k is left, and then it stabilizes. Swap space is not used.

This behaviour occurred under both Debian Woody and Sarge. Sarge was tested with kernels 2.4.18 and 2.6.8.

Does anyone have any idea what could possibly be causing this? Even pointers to other references would be greatly appreciated.
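
P.S. For anyone who wants to reproduce the five-second pattern described above, here is a minimal sketch of how the samples could be logged instead of watched by eye in top. It is not the tool used for the observations in this message. It assumes /proc/meminfo contains "MemFree:", "Buffers:" and "Cached:" lines with values in kB, and the function names (meminfo_kb, read_meminfo, watch_meminfo) are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* Return the kB value of a "Key:   12345 kB" line in /proc/meminfo-style
 * text, or -1 if no line starts with exactly that key. */
long meminfo_kb(const char *text, const char *key)
{
    assert(text != NULL && key != NULL);
    size_t klen = strlen(key);
    const char *line = text;

    while (line != NULL && *line != '\0') {
        /* The key must start the line and be followed directly by ':'. */
        if (strncmp(line, key, klen) == 0 && line[klen] == ':') {
            long kb;
            if (sscanf(line + klen + 1, "%ld", &kb) == 1)
                return kb;
            return -1;
        }
        line = strchr(line, '\n');
        if (line != NULL)
            line++;            /* step past the newline to the next line */
    }
    return -1;
}

/* Read /proc/meminfo into buf; return 0 on success, -1 on failure. */
int read_meminfo(char *buf, size_t size)
{
    FILE *f = fopen("/proc/meminfo", "r");
    if (f == NULL)
        return -1;
    size_t n = fread(buf, 1, size - 1, f);
    fclose(f);
    buf[n] = '\0';
    return n > 0 ? 0 : -1;
}

/* Print one timestamped sample every `interval` seconds, `count` times.
 * A count of 0 or less means: keep going until interrupted. */
int watch_meminfo(unsigned interval, int count)
{
    char buf[8192];
    long prev_free = -1;

    for (int i = 0; count <= 0 || i < count; i++) {
        if (read_meminfo(buf, sizeof buf) != 0)
            return -1;
        long free_kb = meminfo_kb(buf, "MemFree");
        /* delta is the change in MemFree since the previous sample */
        printf("%ld MemFree=%ld Buffers=%ld Cached=%ld delta=%ld kB\n",
               (long)time(NULL), free_kb,
               meminfo_kb(buf, "Buffers"), meminfo_kb(buf, "Cached"),
               prev_free < 0 ? 0 : free_kb - prev_free);
        fflush(stdout);
        prev_free = free_kb;
        if (count <= 0 || i + 1 < count)
            sleep(interval);
    }
    return 0;
}
```

Calling watch_meminfo(5, 0) from a main() would print one line every five seconds until interrupted. Redirecting that output to a file gives a record of how MemFree, Buffers and Cached move relative to each other while processes are being killed off.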