Re: [uClinux-dev] Using page_alloc2() yields high kswapd runtime.
Sorry for the repost. It seems that when I tried to attach the kernel data as an attached file the text of my mail message got clipped. Many of you may have already seen this data, sorry for the repost, but I was concerned that perhaps not all of my original email made it through. Hopefully this one, sans attachments, will...

Dave
Re: [uClinux-dev] Using page_alloc2() yields high kswapd runtime.
Hi All,

This problem is close to my heart too. On the Blackfin systems we have been working on slobs, slabs and even (piece of) cake allocators inside the ever dynamic 2.6 kernel. We were, I think, close to a solution, but I for one am having trouble keeping any solution in line with kernel movement.

If you want to stream data buffers, I would take a close look at relayfs. I also have my own simpler close relation that I use in these circumstances. This will stream the data into already allocated I/O channels (really big FIFOs). This totally avoids any dynamic memory allocation problems.

Just another 10c worth.

Phil Wilshire
Re: [uClinux-dev] Using page_alloc2() yields high kswapd runtime.
David McCullough wrote:

If you boot te hsystem is a configuration that doesn't use much RAM and don't start and nasty big apps is the system idle (ie kswapd is behaving). If so what triggers it's rampage ?

Cheers, Davidm

Davidm,

This came out a bit garbled. I'm going to paraphrase and hope I got this right:

If you boot the system in a configuration that doesn't use much RAM and don't start big and nasty apps, is the system idle (i.e. is kswapd behaving)? If so, what triggers its rampage?

FYI, attached are various runtime scenarios I've sampled. In each case I've measured both (kernel running with page_alloc() vs page_alloc2()) during times at which there are no applications running and when there are. Also I've thrown in an additional measurement showing the state of the system after the main application (which is a server application) has been put to some use and is no longer idling.

During the CPU-intensive times when the application (mspscand) is running, top shows that it is only getting about 40% of the CPU with the remaining 50%+ going to kswapd.

With page_alloc2() configured back out, my old performance returns on my application. top shows the application getting CPU percentages in the high 80% range when busy. But I'm also seeing those old "ksize on unknown page type" errors again.

I should also state that I'm running on a ColdFire 5282 with 16MB of SDRAM, uclinux-2.4.32-uc0 (20060806 drop with mods), and using m68k-elf-tools-20030314 (gcc 2.95.3 and matching binutils).

HTH. I'll have to consider what to do next. For those of you who *have* tweaked in this area in the past, please share your tweaks. I'm not above some experimentation with this.

Thanks,
Dave

--
David Spain
SiCortex, Inc.
Three Clock Tower Place, Suite 210
Maynard, MA USA 01754
Email: [EMAIL PROTECTED]

-%-%-%-%-%-%-%-%-%-%-%

Session with page_alloc() pwr(2) memory allocator + all applications

/ ps
  PID PORT STAT  SIZE SHARED %CPU COMMAND
    1      S     142K     0K  0.5 /bin/init
    2      S       0K     0K  0.0 keventd
    3      R       0K     0K  0.0 ksoftirqd_CPU0
    4      S       0K     0K  0.0 kswapd
    5      S       0K     0K  0.0 bdflush
    6      S       0K     0K  0.0 kupdated
   24      S      29K     0K  0.0 dhcpcd -D -H -p -a eth0
   78      S      41K     0K  0.0 portmap
   85      S       0K     0K  0.0 rpciod
   92      S     198K     4K  0.0 msh /etc/rdate.msh 10.0.0.118
   95      S    1963K     0K  1.9 /bin/mspscand --execed
   96      S      70K     4K  0.0 sleep 300
   98 S0   R      37K     0K  7.6 /bin/sh
   99      S      19K     0K  0.2 /bin/inetd
  100      S      71K     4K  0.6 /bin/syslogd -n
  101      S      70K     4K  0.3 /bin/klogd -n

/ cat /proc/meminfo
        total:    used:    free:  shared: buffers:  cached:
Mem:  12742656  7114752  5627904        0   610304  1236992
Swap:        0        0        0
MemTotal:    12444 kB
MemFree:      5496 kB
MemShared:       0 kB
Buffers:       596 kB
Active:       1040 kB
Inactive:      764 kB
HighTotal:       0 kB
HighFree:        0 kB
LowTotal:    12444 kB
LowFree:      5496 kB
SwapTotal:       0 kB
SwapFree:        0 kB

Session with page_alloc() pwr(2) memory allocator + all applications
(After client application mspscand is no longer idle)

/ ps
  PID PORT STAT  SIZE SHARED %CPU COMMAND
    1      S     142K     0K  0.0 /bin/init
    2      S       0K     0K  0.0 keventd
    3      R       0K     0K  0.4 ksoftirqd_CPU0
    4      S       0K     0K  0.0 kswapd
    5      S       0K     0K  0.0 bdflush
    6      S       0K     0K  0.0 kupdated
   24      S      29K     0K  0.0 dhcpcd -D -H -p -a eth0
   78      S      41K     0K  0.0 portmap
   85      S       0K     0K  0.0 rpciod
   92      S     198K     4K  0.0 msh /etc/rdate.msh 10.0.0.118
   95      S    2281K     0K 18.2 /bin/mspscand --execed
   99      S      19K     0K  0.0 /bin/inetd
  100      S      71K     4K  0.0 /bin/syslogd -n
  101      S      70K     4K  0.0 /bin/klogd -n
  113      S      70K     4K  0.0 sleep 300
  115 S0   R      30K     0K  0.8 /bin/sh

/ cat /proc/meminfo
        total:    used:    free:  shared: buffers:  cached:
Mem:  12742656 12451840   290816        0   610304  6197248
Swap:        0        0        0
MemTotal:    12444 kB
MemFree:       284 kB
MemShared:       0 kB
Buffers:       596 kB
Active:       1064 kB
Inactive:     5584 kB
HighTotal:       0 kB
HighFree:        0 kB
LowTotal:    12444 kB
LowFree:       284 kB
SwapTotal:       0 kB
SwapFree:        0 kB

Session with page_alloc2() memory allocator + no applications

/ ps
  PID PORT STAT  SIZE SHARED %CPU
Re: [uClinux-dev] Using page_alloc2() yields high kswapd runtime.
Aristotelis Iordanidis wrote:

We've run into the same problem, working with an armnommu platform. We tracked down the root of the high CPU load to the function kswapd_balance_pgdat() in linux-2.4.x/mmnommu/vmscan.c. The problem occurs only when using the non-power-of-2 memory allocator (i.e. CONFIG_CONTIGUOUS_PAGE_ALLOC is defined). Anyway, all this seems to be caused by the following piece of code:

    #ifndef CONFIG_CONTIGUOUS_PAGE_ALLOC /* we always want the memory now !! */
        __set_current_state(TASK_INTERRUPTIBLE);
        schedule_timeout(HZ*5);
    #endif

As a workaround, we changed it as shown below, for our architecture (e.g. CONFIG_ARCH_MINE):

[Re-enable the time delay, but shorter, for CONTIGUOUS_PAGE_ALLOC].

That's the same as what we ended up with, for streaming HD video from disk. (The only difference is we settled on HZ/5 instead of HZ/10 in your case.)

It's unfortunate that it has to be tuned for a particular application and memory size: too little delay, and the CPU is high; too much, and the reclamation rate is not sufficient for a particular rate of file reading vs free RAM, due to the spikiness of the reclamation process.

This is where synchronous reclamation, as page_alloc.c does, would be better. It could be added to page_alloc2.c, but clearly everyone is busy doing something else :-)

-- Jamie

___
uClinux-dev mailing list
uClinux-dev@uclinux.org
http://mailman.uclinux.org/mailman/listinfo/uclinux-dev
This message was resent by uclinux-dev@uclinux.org
To unsubscribe see: http://mailman.uclinux.org/mailman/options/uclinux-dev
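The delay trade-off described in this message (too short burns CPU, too long starves allocations) can be illustrated with a toy userspace model. This is only a sketch under stated assumptions, not kernel code: `simulate()`, `struct sim_result` and all parameters are hypothetical names, one "tick" stands in for a scheduler tick, and the reclaimer is assumed to free every cache page each time it wakes.

```c
#include <assert.h>

/* Outcome of one simulated run. Hypothetical type, not a kernel structure. */
struct sim_result {
    int wakeups;        /* how often the background reclaimer ran */
    int failed_allocs;  /* pages requested while no free page was left */
};

/*
 * Toy model of a streaming workload against a periodic reclaimer.
 * Each tick the workload turns alloc_per_tick free pages into cache pages.
 * Every delay_ticks ticks the reclaimer wakes and (by assumption) drops all
 * cache pages, so free memory returns to total_pages.
 */
struct sim_result simulate(int total_pages, int alloc_per_tick,
                           int delay_ticks, int ticks)
{
    struct sim_result r = {0, 0};
    int free_pages = total_pages;

    assert(total_pages > 0 && alloc_per_tick >= 0);
    assert(delay_ticks > 0 && ticks >= 0);

    for (int t = 1; t <= ticks; t++) {
        /* Workload: take pages if available, otherwise record the shortfall. */
        if (free_pages >= alloc_per_tick) {
            free_pages -= alloc_per_tick;
        } else {
            r.failed_allocs += alloc_per_tick - free_pages;
            free_pages = 0;
        }

        /* Reclaimer: sleeps delay_ticks between runs, like a schedule_timeout(). */
        if (t % delay_ticks == 0) {
            r.wakeups++;
            free_pages = total_pages;
        }
    }
    return r;
}
```

With 1000 pages, 10 pages allocated per tick and 1000 ticks, a delay of 1 tick gives 1000 wakeups and no failures, a delay of 50 gives 20 wakeups and no failures, and a delay of 200 gives only 5 wakeups but 5000 failed page requests. The usable range depends on both the allocation rate and the memory size, which is the tuning problem described above.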
Re: [uClinux-dev] Using page_alloc2() yields high kswapd runtime.
I forgot to add the obligatory: uclinux-2.4.32-uc0 from the 20060803 drop. Compiled with the gcc 2.95.3 binutils.

Dave

--
David Spain
SiCortex, Inc.
Three Clock Tower Place, Suite 210
Maynard, MA USA 01754
Email: [EMAIL PROTECTED]
Re: [uClinux-dev] Using page_alloc2() yields high kswapd runtime.
David Spain wrote:

I forgot to add the obligatory: uclinux-2.4.32-uc0 from the 20060803 drop. Compiled with the gcc 2.95.3 binutils.

page_alloc2.c is better for reducing fragmentation and also being less sensitive to it, but it doesn't interact with kswapd's wakeup logic in quite the way it's supposed to, as far as I can tell. It seems to make kswapd work more often than necessary.

page_alloc.c is better for fast allocations, and for keeping more memory free when there is a steady stream of allocations (e.g. when streaming data from disk), but after many allocate/free cycles of large blocks (e.g. when running executables), memory becomes very fragmented.

(I was bitten by this in a different way: I'm using vendor-supplied uclinux kernels, and they are configured to use page_alloc2.c. The added CPU usage caused the vendor to tweak things to reduce it, and when streaming files from disk those tweaks caused kswapd to fail to respond quickly enough, causing out of memory failures... But the unpatched uclinux code used too much CPU. We found a compromise.)

-- Jamie
Re: [uClinux-dev] Using page_alloc2() yields high kswapd runtime.
On Wednesday 14 March 2007 7:35 pm, David McCullough wrote:

If you boot te hsystem is a configuration that doesn't use much RAM and don't start and nasty big apps is the system idle (ie kswapd is behaving). If so what triggers it's rampage ?

I'd noticed this too (page_alloc2() high CPU use) when I started development of my system. I am afraid I didn't know enough about it, figured I'm not that short on memory anyway and used the standard power-of-2 allocator. This was (is) 2.4.31-uc0 on MCF5282.

-A.
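For anyone weighing the same choice, the memory cost of the power-of-2 allocator can be estimated with a few lines of C. This is a rough sketch, assuming that each contiguous request is rounded up to a power-of-2 number of pages and that pages are 4 KiB; the helper names `pow2_pages()` and `pow2_waste_kib()` are made up for illustration and are not kernel functions.

```c
#include <assert.h>
#include <stddef.h>

/* Smallest power of 2 that is >= pages (the rounded-up block size in pages). */
size_t pow2_pages(size_t pages)
{
    size_t n = 1;

    /* Guard against zero and against overflow of the shift below. */
    assert(pages > 0);
    assert(pages <= ((size_t)1 << (sizeof(size_t) * 8 - 1)));

    while (n < pages)
        n <<= 1;
    return n;
}

/* KiB lost to rounding when a single request of size_kib is served
 * from a power-of-2 block made of page_kib-sized pages. */
size_t pow2_waste_kib(size_t size_kib, size_t page_kib)
{
    size_t pages;

    assert(size_kib > 0 && page_kib > 0);
    pages = (size_kib + page_kib - 1) / page_kib;   /* round up to whole pages */
    return pow2_pages(pages) * page_kib - size_kib;
}
```

Taking the 1963K size that ps reports for mspscand elsewhere in this thread as if it were one contiguous request, `pow2_waste_kib(1963, 4)` gives 85 KiB of slack (491 pages rounded to 512). At 2281K the same request would need 1024 pages, and `pow2_waste_kib(2281, 4)` gives 1815 KiB. Whether that matters depends on how short of memory the system really is.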
Re: [uClinux-dev] Using page_alloc2() yields high kswapd runtime.
Jivin Jamie Lokier lays it down ...

David McCullough wrote:

Feel free to send in some patches :-)

When they let me past the dark age of 2.4.26-uc0, maybe I will :-) I have a few ideas to combine the better fragmentation performance of page_alloc2.c with the speed of page_alloc.c (a hybrid of buddy and bitmap search), plus some fragmentation-reducing strategies using zones (nothing to do with uclinux) that were proposed for 2.6 kernels and did well in measurements. You know, when that copious free time rolls around :-)

I think everyone is waiting for that one :-)

Are you low on memory ? page_alloc2 gets pretty nasty about trying to clear the caches etc as often as possible to keep as much contiguous memory available at all times.

Rapidly allocating and freeing memory: it's streaming video from disk at rates of 1-2MB/s, on a device with 32MB total for Linux. Free memory oscillates, decreasing and then jumping up every 5 seconds (on the vendor-patched kernel). Straight uclinux keeps the free memory up more consistently, but at the cost of very high kswapd CPU while streaming.

That said, I have seen systems where kswapd CPU usage is not a problem, and obviously there are those where it is. I don't know the cause. 2 possibilities:

1) I haven't actively used a 2.4 kernel on a non-MMU system for some time and the page_alloc2 code may just be wrong due to a kernel update and bit rot.

2) The usage on these systems is triggering the behaviour.

If you boot te hsystem is a configuration that doesn't use much RAM and don't start and nasty big apps is the system idle (ie kswapd is behaving). If so what triggers it's rampage ?

I think it's the high rate of page allocation which triggers it. There shouldn't be a need to run kswapd constantly for file cache pages: it should be possible to reclaim cache pages rapidly during allocation, recycling them. I think that's where page_alloc2.c goes wrong.
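The difference between leaving reclaim entirely to a background thread and recycling cache pages at allocation time can be shown with a small counting model. It is a sketch only: `struct toy_mem`, `alloc_async_only()` and `alloc_sync_reclaim()` are hypothetical names, pages are just counters, and contiguity (the thing page_alloc2.c exists to protect) is deliberately ignored.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy memory state: counters only, no real pages. Hypothetical type. */
struct toy_mem {
    int free_pages;   /* pages available right now */
    int cache_pages;  /* clean file-cache pages that could be dropped */
};

/* Asynchronous-only policy: the allocator never reclaims by itself, so a
 * request fails if the background reclaimer has not caught up yet. */
bool alloc_async_only(struct toy_mem *m, int n)
{
    assert(n >= 0);
    if (m->free_pages < n)
        return false;
    m->free_pages -= n;
    return true;
}

/* Synchronous policy: on a shortfall, recycle just enough cache pages
 * immediately, and fail only if memory is genuinely exhausted. */
bool alloc_sync_reclaim(struct toy_mem *m, int n)
{
    assert(n >= 0);
    if (m->free_pages < n) {
        int shortfall = n - m->free_pages;

        if (shortfall > m->cache_pages)
            return false;              /* nothing left to reclaim */
        m->cache_pages -= shortfall;   /* evict only what is needed */
        m->free_pages += shortfall;
    }
    m->free_pages -= n;
    return true;
}
```

Starting from 2 free pages and 10 cache pages, a request for 5 pages fails under the asynchronous-only policy but succeeds under the synchronous one, leaving 7 cache pages still cached. In this model the synchronous policy also evicts less file data, because it takes only what each request needs.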
The heuristic interaction between page_alloc.c and kswapd is rather subtle and tricky, but the basic difference is that page_alloc.c doesn't maximise free memory all the time; instead, it keeps track of rapidly reclaimable memory.

Apart from the CPU difference, that means page_alloc2.c tends to fail allocations if it really does run out of memory while kswapd is catching up asynchronously. (And failed allocations result in execs crashing, ahem.) It's crashes due to memory shortage which prompted me to investigate; the CPU differences were a surprise.

A side effect of the high CPU of kswapd with page_alloc2.c in these situations is that allocation is noticeably slower. I noticed, to my great surprise, that rsync was able to fetch files over the network and write them to disk twice as fast with page_alloc.c (4MB/s instead of 2MB/s). For ages, I'd assumed it was the driver or hardware.

To summarise, I found these differences:

page_alloc.c:

Pro: Lower CPU usage of kswapd, especially when streaming files.

Pro: Doesn't fail allocations when there is lots of data in the filecache; reclaims cache pages when needed.

Pro: Keeps file data cached, if the pages are not required for something else.

Pro: Faster allocation, surprisingly faster sometimes.

Con: After long uptimes, with fork/execs causing large contiguous allocations, eventually memory will be too fragmented for fork/execs and the allocator is unable to recover. So after long uptimes, the system will fail to allow telnet logins, for example, but will still be functioning in other ways.

page_alloc2.c:

Con: Higher CPU usage of kswapd, especially when streaming files.

Con: Sometimes fails allocations when there is lots of data in the filecache which could be reclaimed.

Con: Evicts cached file data regularly. Even tiny files which are read very often from disk will do I/O periodically, instead of always reading from cache.

Con: Slower allocation, surprisingly so sometimes.

Pro: After long uptimes, with fork/execs causing large contiguous allocations, and simultaneous streaming file data, it manages to keep different types of allocation separate enough that fragmentation is not inevitable. Indefinitely long uptimes are realistically possible.

In the end, we stuck with page_alloc2.c because of that last point. Our systems either crash and burn (with watchdog recovery), or telnet still works :) But we like every performance characteristic of page_alloc.c more.

The CPU usage of kswapd was a problem, and the crashing when too much file data was cached (due to fast streaming) was a big problem, so we tuned kswapd to a sweet spot for this application, and did everything possible with XIP-in-RAM to free up memory. Currently we have 11MB free (out of 32MB) which