Re: [uClinux-dev] Using page_alloc2() yields high kswapd runtime.

2007-03-19 Thread David Spain

Sorry for the repost. It seems that when I tried to send the kernel
data as an attachment, the text of my mail message got clipped.

Many of you may have already seen this data, but I was concerned that
perhaps not all of my original email made it through.

Hopefully this one, sans attachments, will...

Dave


David McCullough wrote:

If you boot te hsystem is a configuration that doesn't use much RAM and
don't start and nasty big apps is the system idle (ie kswapd is
behaving).  If so what triggers it's rampage ?

Cheers,
Davidm



Davidm,

This came out a bit garbled. I'm going to paraphrase and hope I got this
right:

 If you boot the system in a configuration that doesn't use much RAM and
 don't start any big, nasty apps, is the system idle (i.e., is kswapd behaving)?
 If so, what triggers its rampage?

FYI,

Attached are several runtime scenarios I've sampled. In each case I've
measured both kernels (page_alloc() vs. page_alloc2()), with no applications
running and with them running.

Also I've thrown in an additional measurement showing the state of the system
after the main application (a server application) has been put to some use
and is no longer idling.

During the CPU-intensive periods when the application (mspscand) is running,
top shows that it is only getting about 40% of the CPU, with the remaining
50%+ going to kswapd. With page_alloc2() configured back out, my application's
old performance returns: top shows it getting CPU percentages in the high-80%
range when busy. But I'm also seeing those old ksize on unknown page type
errors again.

I should also state that I'm running on a ColdFire 5282 with 16MB of SDRAM,
uclinux-2.4.32-uc0 (20060806 drop with mods), and using m68k-elf-tools-20030314
(gcc 2.95.3 and matching binutils).

HTH. I'll have to consider what to do next. For those of you who *have* tweaked
in this area in the past, please share your tweaks. I'm not above some
experimentation with this.

Thanks,
Dave

--
David Spain
SiCortex, Inc.
Three Clock Tower Place, Suite 210
Maynard, MA USA 01754  Email: [EMAIL PROTECTED]



Session with page_alloc() pwr(2) memory allocator + all applications


/ ps
  PID PORT STAT  SIZE SHARED %CPU COMMAND
    1      S     142K     0K  0.5 /bin/init
    2      S       0K     0K  0.0 keventd
    3      R       0K     0K  0.0 ksoftirqd_CPU0
    4      S       0K     0K  0.0 kswapd
    5      S       0K     0K  0.0 bdflush
    6      S       0K     0K  0.0 kupdated
   24      S      29K     0K  0.0 dhcpcd -D -H -p -a eth0
   78      S      41K     0K  0.0 portmap
   85      S       0K     0K  0.0 rpciod
   92      S     198K     4K  0.0 msh /etc/rdate.msh 10.0.0.118
   95      S    1963K     0K  1.9 /bin/mspscand --execed
   96      S      70K     4K  0.0 sleep 300
   98 S0   R      37K     0K  7.6 /bin/sh
   99      S      19K     0K  0.2 /bin/inetd
  100      S      71K     4K  0.6 /bin/syslogd -n
  101      S      70K     4K  0.3 /bin/klogd -n

/ cat /proc/meminfo
        total:    used:    free:  shared: buffers:  cached:
Mem:  12742656  7114752  5627904        0   610304  1236992
Swap:        0        0        0
MemTotal:  12444 kB
MemFree:    5496 kB
MemShared:     0 kB
Buffers:     596 kB
Active:     1040 kB
Inactive:    764 kB
HighTotal:     0 kB
HighFree:      0 kB
LowTotal:  12444 kB
LowFree:    5496 kB
SwapTotal:     0 kB
SwapFree:      0 kB



Session with page_alloc() pwr(2) memory allocator + all applications

(After client application mspscand is no longer idle)

/ ps
  PID PORT STAT  SIZE SHARED %CPU COMMAND
    1      S     142K     0K  0.0 /bin/init
    2      S       0K     0K  0.0 keventd
    3      R       0K     0K  0.4 ksoftirqd_CPU0
    4      S       0K     0K  0.0 kswapd
    5      S       0K     0K  0.0 bdflush
    6      S       0K     0K  0.0 kupdated
   24      S      29K     0K  0.0 dhcpcd -D -H -p -a eth0
   78      S      41K     0K  0.0 portmap
   85      S       0K     0K  0.0 rpciod
   92      S     198K     4K  0.0 msh /etc/rdate.msh 10.0.0.118
   95      S    2281K     0K 18.2 /bin/mspscand --execed
   99      S      19K     0K  0.0 /bin/inetd
  100      S      71K     4K  0.0 /bin/syslogd -n
  101      S      70K     4K  0.0 /bin/klogd -n
  113      S      70K     4K  0.0 sleep 300
  115 S0   R      30K     0K  0.8 /bin/sh
/
/ cat /proc/meminfo
        total:    used:    free:  shared: buffers:  cached:
Mem:  12742656 12451840   290816        0   610304  6197248
Swap:        0        0        0
MemTotal:  12444 kB
MemFree:     284 kB
MemShared:     0 kB
Buffers:     596 kB
Active:     1064 kB
Inactive:   5584 kB
HighTotal:     0 kB
HighFree:      0 kB

Re: [uClinux-dev] Using page_alloc2() yields high kswapd runtime.

2007-03-15 Thread Phil Wilshire

Hi All,
This problem is close to my heart too.

On the Blackfin systems we have been working on slobs, slabs and even
(piece of) cake allocators inside the ever-dynamic 2.6 kernel.


We were, I think, close to a solution, but I for one am having trouble
keeping any solution in line with kernel movement.


If you want to stream data buffers, I would take a close look at relayfs.

I also have my own simpler close relative that I use in these
circumstances. It streams the data into already allocated I/O channels
(REALLY BIG FIFOs).

This totally avoids any dynamic memory allocation problems.
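A minimal sketch of the pre-allocated FIFO idea, assuming a single producer, a single consumer and no locking: the backing store is handed in once (for example a static array reserved at build time), so moving data through it never calls the page allocator. This is not relayfs and not Phil's code; `struct fifo`, `fifo_init()`, `fifo_write()` and `fifo_read()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical byte FIFO over caller-supplied storage; no malloc anywhere. */
struct fifo {
	unsigned char *buf;
	size_t size;   /* capacity in bytes */
	size_t head;   /* next write position */
	size_t tail;   /* next read position */
	size_t used;   /* bytes currently stored */
};

void fifo_init(struct fifo *f, unsigned char *storage, size_t size)
{
	assert(storage != NULL && size > 0);
	f->buf = storage;
	f->size = size;
	f->head = f->tail = f->used = 0;
}

/* Copy up to len bytes in; returns how many were accepted (0 when full). */
size_t fifo_write(struct fifo *f, const unsigned char *src, size_t len)
{
	size_t n = 0;

	while (n < len && f->used < f->size) {
		size_t chunk = f->size - f->head;      /* space before wrapping */
		if (chunk > f->size - f->used)
			chunk = f->size - f->used;     /* total free space */
		if (chunk > len - n)
			chunk = len - n;
		memcpy(f->buf + f->head, src + n, chunk);
		f->head = (f->head + chunk) % f->size;
		f->used += chunk;
		n += chunk;
	}
	return n;
}

/* Copy up to len bytes out; returns how many were delivered (0 when empty). */
size_t fifo_read(struct fifo *f, unsigned char *dst, size_t len)
{
	size_t n = 0;

	while (n < len && f->used > 0) {
		size_t chunk = f->size - f->tail;      /* data before wrapping */
		if (chunk > f->used)
			chunk = f->used;
		if (chunk > len - n)
			chunk = len - n;
		memcpy(dst + n, f->buf + f->tail, chunk);
		f->tail = (f->tail + chunk) % f->size;
		f->used -= chunk;
		n += chunk;
	}
	return n;
}
```

The cost of this design is that the FIFO's memory is held permanently, whether or not data is flowing; in exchange, the FIFO itself never allocates, so it cannot fail an allocation in the middle of a stream.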

Just another 10C worth.

Phil Wilshire





David McCullough wrote:

Jivin Jamie Lokier lays it down ...

David McCullough wrote:

Feel free to send in some patches :-)

When they let me past the dark age of 2.4.26-uc0, maybe I will :-)

I have a few ideas to combine the better fragmentation performance of
page_alloc2.c with the speed of page_alloc.c (a hybrid of buddy and
bitmap search), plus some fragmentation-reducing strategies using
zones (nothing to do with uclinux) that were proposed for 2.6 kernels
and did well in measurements.

You know, when that copious free time rolls around :-)


I think everyone is waiting for that one :-)


Are you low on memory? page_alloc2 gets pretty nasty about trying to
clear the caches etc. as often as possible to keep as much contiguous
memory available at all times.

Rapidly allocating and freeing memory: it's streaming video from disk
at rates of 1-2MB/s, on a device with 32MB total for Linux.  Free
memory oscillates, decreasing and then jumping up every 5 seconds (on
the vendor-patched kernel).  Straight uclinux keeps the free memory
up more consistently, but at the cost of very high kswapd CPU while
streaming.
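One way to watch the free-memory oscillation described above is to sample `MemFree` from /proc/meminfo every second or so. A minimal sketch, assuming the 2.4-style `Key: value kB` lines seen in the session dumps in this thread; `meminfo_kb()` and `read_memfree_kb()` are hypothetical helpers, not part of any existing tool.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return the number on a "Key:   N kB" line of /proc/meminfo-style text,
 * or -1 if the key is missing.  The key must be followed directly by ':',
 * so "Mem" will not match a "MemFree:" line. */
long meminfo_kb(const char *text, const char *key)
{
	size_t klen;
	const char *line = text;

	assert(text != NULL && key != NULL);
	klen = strlen(key);
	while (line != NULL && *line != '\0') {
		if (strncmp(line, key, klen) == 0 && line[klen] == ':') {
			long value;
			if (sscanf(line + klen + 1, "%ld", &value) == 1)
				return value;
			return -1;
		}
		line = strchr(line, '\n');
		if (line != NULL)
			line++;                 /* step past the newline */
	}
	return -1;
}

/* Read MemFree (kB) from the running kernel, or -1 on error. */
long read_memfree_kb(void)
{
	char buf[2048];
	size_t n;
	FILE *fp = fopen("/proc/meminfo", "r");

	if (fp == NULL)
		return -1;
	n = fread(buf, 1, sizeof buf - 1, fp);
	fclose(fp);
	buf[n] = '\0';
	return meminfo_kb(buf, "MemFree");
}
```

Calling `read_memfree_kb()` in a loop with `sleep(1)` between samples and printing the results would show whether free memory drifts down and then jumps back up in step with kswapd's passes.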


That said, I have seen systems where kswapd CPU usage is not a problem,
and obviously there are those where it is.  I don't know the cause.  Two
possibilities:

1) I haven't actively used a 2.4 kernel on a non-MMU system for some
   time and the page_alloc2 code may just be wrong due to a kernel
   update and bit rot.

2) The usage on these systems is triggering the behaviour.

If you boot te hsystem is a configuration that doesn't use much RAM and
don't start and nasty big apps is the system idle (ie kswapd is
behaving).  If so what triggers it's rampage ?

I think it's the high rate of page allocation which triggers it.

There shouldn't be a need to run kswapd constantly, for file cache
pages: it should be possible to reclaim cache pages rapidly during
allocation, recycling them.  I think that's where page_alloc2.c goes
wrong.  The heuristic interaction between page_alloc.c and kswapd is
rather subtle and tricky, but the basic difference is that
page_alloc.c doesn't maximise free memory all the time; instead, it
keeps track of rapidly reclaimable memory.
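The difference described here, recycling clean cache pages inside the allocation path versus leaving all reclaim to kswapd, can be illustrated with a deliberately simplified toy model. This is not page_alloc.c or page_alloc2.c: `struct toy_mem` and the `toy_*` functions are hypothetical, and the model ignores dirty pages, contiguity and watermarks entirely.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy page accounting; not the kernel's data structures. */
struct toy_mem {
	int free_pages;      /* immediately allocatable */
	int clean_cache;     /* clean file-cache pages, droppable without I/O */
	bool kswapd_kicked;  /* set when the async path asks kswapd for help */
};

/* Synchronous style: with the free list empty, recycle a clean cache
 * page on the spot instead of failing. */
bool toy_alloc_sync(struct toy_mem *m)
{
	if (m->free_pages == 0 && m->clean_cache > 0) {
		m->clean_cache--;    /* drop one clean cache page ... */
		m->free_pages++;     /* ... and reuse it right away */
	}
	if (m->free_pages == 0)
		return false;
	m->free_pages--;
	return true;
}

/* Asynchronous-only style: never reclaims inline, relies on kswapd,
 * which may not have run yet. */
bool toy_alloc_async(struct toy_mem *m)
{
	if (m->free_pages == 0) {
		m->kswapd_kicked = true;
		return false;
	}
	m->free_pages--;
	return true;
}

/* Stream `pages` pages of file data into the cache; returns how many
 * were cached before the first failed allocation. */
int toy_stream(struct toy_mem *m, int pages, bool sync)
{
	int done = 0;

	assert(pages >= 0);
	for (int i = 0; i < pages; i++) {
		bool ok = sync ? toy_alloc_sync(m) : toy_alloc_async(m);
		if (!ok)
			break;
		m->clean_cache++;    /* the page now holds streamed data */
		done++;
	}
	return done;
}
```

Starting from four free pages and streaming ten, the synchronous variant delivers all ten while free memory sits at zero (the cache stays rapidly reclaimable), whereas the asynchronous variant fails on the fifth page and can only flag kswapd. That is a caricature of the failed allocations described below, not a measurement.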

Apart from the CPU difference, that means page_alloc2.c tends to fail
allocations if it really does run out of memory while kswapd is
catching up asynchronously.  (And failed allocations result in execs
crashing, ahem).  It's crashes due to memory shortage which prompted
me to investigate; the CPU differences were a surprise.

A side effect of the high CPU of kswapd with page_alloc2.c in these
situations is that allocation is noticeably slower.  I noticed, to my
great surprise, that rsync was able to fetch files over the network
and write them to disk twice as fast with page_alloc.c.  (4MB/s
instead of 2MB/s).  For ages, I'd assumed it was the driver or hardware.

To summarise, I found these differences:

page_alloc.c:

 Pro: Lower CPU usage of kswapd, especially when streaming files.
 Pro: Doesn't fail allocations when lots of data in filecache;
  reclaims cache pages when needed.
 Pro: Keeps file data cached, if the pages are not required
  for something else.
 Pro: Faster allocation, surprisingly faster sometimes.
 Con: After long uptimes, with fork/execs causing large
  contiguous allocations, eventually memory will be too
  fragmented for fork/execs and the allocator is unable
  to recover.  So after long uptimes, the system will
  fail to allow telnet logins, for example, but will still
  be functioning in other ways.

page_alloc2.c:

 Con: Higher CPU usage of kswapd, especially when streaming files.
 Con: Fails allocations when lots of data in filecache which could
  be reclaimed, sometimes.
 Con: Evicts cached file data regularly.  Even tiny files which are
  read very often from disk will do I/O periodically, instead
  of always reading from cache.
 Con: Slower allocation, surprisingly so sometimes.
 Pro: After long uptimes, with fork/execs causing large contiguous
  allocations, and simultaneous streaming file data, it
  manages to keep different types of allocation separate
  enough that fragmentation is not inevitable.  Indefinitely
  long uptimes are realistically possible.

Re: [uClinux-dev] Using page_alloc2() yields high kswapd runtime.

2007-03-15 Thread David Spain



Session with page_alloc() pwr(2) memory allocator + all applications

(After client application mspscand is no longer idle)

/ ps
  PID PORT STAT  SIZE SHARED %CPU COMMAND
    1      S     142K     0K  0.0 /bin/init
    2      S       0K     0K  0.0 keventd
    3      R       0K     0K  0.4 ksoftirqd_CPU0
    4      S       0K     0K  0.0 kswapd
    5      S       0K     0K  0.0 bdflush
    6      S       0K     0K  0.0 kupdated
   24      S      29K     0K  0.0 dhcpcd -D -H -p -a eth0
   78      S      41K     0K  0.0 portmap
   85      S       0K     0K  0.0 rpciod
   92      S     198K     4K  0.0 msh /etc/rdate.msh 10.0.0.118
   95      S    2281K     0K 18.2 /bin/mspscand --execed
   99      S      19K     0K  0.0 /bin/inetd
  100      S      71K     4K  0.0 /bin/syslogd -n
  101      S      70K     4K  0.0 /bin/klogd -n
  113      S      70K     4K  0.0 sleep 300
  115 S0   R      30K     0K  0.8 /bin/sh
/
/ cat /proc/meminfo
        total:    used:    free:  shared: buffers:  cached:
Mem:  12742656 12451840   290816        0   610304  6197248
Swap:        0        0        0
MemTotal:  12444 kB
MemFree:     284 kB
MemShared:     0 kB
Buffers:     596 kB
Active:     1064 kB
Inactive:   5584 kB
HighTotal:     0 kB
HighFree:      0 kB
LowTotal:  12444 kB
LowFree:     284 kB
SwapTotal:     0 kB
SwapFree:      0 kB



Session with page_alloc2() memory allocator + no applications

/ ps
  PID PORT STAT  SIZE SHARED %CPU 

Re: [uClinux-dev] Using page_alloc2() yields high kswapd runtime.

2007-03-15 Thread Jamie Lokier
Aristotelis Iordanidis wrote:
 We've run into the same problem, working with an armnommu platform.
 We tracked down the root of the high CPU load to the function
 kswapd_balance_pgdat() in linux-2.4.x/mmnommu/vmscan.c.
 The problem occurs only when using the non-power-of-2 memory allocator 
 (i.e. CONFIG_CONTIGUOUS_PAGE_ALLOC is defined).
 Anyway, all this seems to be caused by the following piece of code:
 
 #ifndef CONFIG_CONTIGUOUS_PAGE_ALLOC /* we always want the memory now !! */
     __set_current_state(TASK_INTERRUPTIBLE);
     schedule_timeout(HZ*5);
 #endif
 
 As a workaround, we changed it as shown below, for our architecture 
 (e.g. CONFIG_ARCH_MINE):

 [Re-enable the time delay, but shorter, for CONTIGUOUS_PAGE_ALLOC].

That's the same as what we ended up with, for streaming HD video from
disk.  (The only difference is we settled on HZ/5 instead of HZ/10 in
your case.)

It's unfortunate that it has to be tuned for a particular application
and memory size: too little delay, and the CPU usage is high; too much, and
the reclamation rate is not sufficient for a particular rate of file
reading vs free RAM, due to the spikiness of the reclamation
process.  This is where synchronous reclamation, as page_alloc.c does,
would be better.  It could be added to page_alloc2.c, but clearly
everyone is busy doing something else :-)
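The tuning trade-off can be put into rough numbers. If kswapd sleeps d seconds between passes while file data arrives in the page cache at r kB/s, roughly r × d kB must be free (or reclaimable synchronously) to absorb one sleep interval, while the number of passes, and hence kswapd's CPU overhead, grows as 1/d. A minimal sketch, assuming HZ = 100 and ignoring everything else that allocates or frees memory; `headroom_kb()` and `passes_per_minute()` are hypothetical names.

```c
#include <assert.h>

/* kB that must be absorbed while kswapd sleeps for delay_jiffies:
 * stream rate (kB/s) times sleep length (delay_jiffies / hz seconds). */
long headroom_kb(long stream_kb_per_s, long delay_jiffies, long hz)
{
	assert(hz > 0 && delay_jiffies >= 0);
	return stream_kb_per_s * delay_jiffies / hz;
}

/* kswapd passes per minute if every pass is followed by a sleep of
 * delay_jiffies, ignoring the time each pass itself takes. */
long passes_per_minute(long delay_jiffies, long hz)
{
	assert(hz > 0 && delay_jiffies > 0);  /* zero delay: unbounded */
	return 60 * hz / delay_jiffies;
}
```

With HZ = 100 and a 2 MB/s (2048 kB/s) stream, the original HZ*5 sleep implies about 10 MB of headroom per interval, roughly a third of a 32 MB system, while HZ/5 needs only about 400 kB but wakes kswapd 300 times a minute instead of 12. With no delay at all, as in the CONFIG_CONTIGUOUS_PAGE_ALLOC path of the quoted code, the pass rate has no bound in this model, which is consistent with kswapd spinning.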

-- Jamie
___
uClinux-dev mailing list
uClinux-dev@uclinux.org
http://mailman.uclinux.org/mailman/listinfo/uclinux-dev
This message was resent by uclinux-dev@uclinux.org
To unsubscribe see:
http://mailman.uclinux.org/mailman/options/uclinux-dev


Re: [uClinux-dev] Using page_alloc2() yields high kswapd runtime.

2007-03-14 Thread David Spain

I forgot to add the obligatory:

uclinux-2.4.32-uc0 from the 20060803 drop.
Compiled with the gcc 2.95.3 binutils.

Dave



Re: [uClinux-dev] Using page_alloc2() yields high kswapd runtime.

2007-03-14 Thread Jamie Lokier
David Spain wrote:
 I forgot to add the obligatory:
 
 uclinux-2.4.32-uc0 from the 20060803 drop.
 Compiled with the gcc 2.95.3 binutils.

page_alloc2.c is better for reducing fragmentation and also being less
sensitive to it, but it doesn't interact with kswapd's wakeup logic in
quite the way it's supposed to, as far as I can tell.  It seems to
make it work more often than necessary.

page_alloc.c is better for fast allocations, and for keeping more
memory free when there is a steady stream of allocations (e.g. when
streaming data from disk), but after many allocate/free cycles of
large blocks (e.g. when running executables), the system becomes very
fragmented.
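One property of a power-of-two (buddy) allocator that makes large allocations expensive is that each request is rounded up to a whole block of 2^order pages. A minimal sketch of that rounding, assuming 4 KB pages; `pages_for_kb()`, `order_for_pages()` and `buddy_waste_pages()` are hypothetical helpers, not functions from page_alloc.c.

```c
#include <assert.h>

/* Pages needed to hold kb kilobytes with page_kb-kilobyte pages (rounded up). */
unsigned long pages_for_kb(unsigned long kb, unsigned long page_kb)
{
	assert(page_kb > 0);
	return (kb + page_kb - 1) / page_kb;
}

/* Smallest order such that a buddy block of (1 << order) pages fits npages. */
unsigned order_for_pages(unsigned long npages)
{
	unsigned order = 0;

	assert(npages > 0);
	while ((1UL << order) < npages)
		order++;
	return order;
}

/* Pages left unused inside the block when npages is rounded up to 2^order. */
unsigned long buddy_waste_pages(unsigned long npages)
{
	return (1UL << order_for_pages(npages)) - npages;
}
```

For example, if the 1963K mspscand process in the page_alloc() session earlier in this thread needed a single block of that size, 491 four-kilobyte pages would round up to order 9 (512 pages, 2048K), leaving 21 pages (84K) idle inside the block.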

(I was bitten by this in a different way: I'm using vendor-supplied
uclinux kernels, and they are configured to use page_alloc2.c.  The
added CPU usage caused the vendor to tweak things to reduce it, and
when streaming files from disk those tweaks caused kswapd to fail to
respond quickly enough, causing out-of-memory failures...  But the
unpatched uclinux code used too much CPU.  We found a compromise).

-- Jamie


Re: [uClinux-dev] Using page_alloc2() yields high kswapd runtime.

2007-03-14 Thread Andrew Kohlsmith
On Wednesday 14 March 2007 7:35 pm, David McCullough wrote:
 If you boot te hsystem is a configuration that doesn't use much RAM and
 don't start and nasty big apps is the system idle (ie kswapd is
 behaving).  If so what triggers it's rampage ?

I'd noticed this too (page_alloc2() high CPU use) when I started development
of my system.  I'm afraid I didn't know enough about it; I figured I'm not
that short on memory anyway, and used the standard power-of-2 allocator.
This was (is) 2.4.31-uc0 on MCF5282.

-A.


Re: [uClinux-dev] Using page_alloc2() yields high kswapd runtime.

2007-03-14 Thread David McCullough

Jivin Jamie Lokier lays it down ...
 David McCullough wrote:
  Feel free to send in some patches :-)
 
 When they let me past the dark age of 2.4.26-uc0, maybe I will :-)
 
 I have a few ideas to combine the better fragmentation performance of
 page_alloc2.c with the speed of page_alloc.c (a hybrid of buddy and
 bitmap search), plus some fragmentation-reducing strategies using
 zones (nothing to do with uclinux) that were proposed for 2.6 kernels
 and did well in measurements.
 
 You know, when that copious free time rolls around :-)

I think everyone is waiting for that one :-)

  Are you low on memory? page_alloc2 gets pretty nasty about trying to
  clear the caches etc. as often as possible to keep as much contiguous
  memory available at all times.
 
 Rapidly allocating and freeing memory: it's streaming video from disk
 at rates of 1-2MB/s, on a device with 32MB total for Linux.  Free
 memory oscillates, decreasing and then jumping up every 5 seconds (on
 the vendor-patched kernel).  Straight uclinux keeps the free memory
 up more consistently, but at the cost of very high kswapd CPU while
 streaming.
 
  That said, I have seen systems where kswapd CPU usage is not a problem,
  and obviously there are those where it is.  I don't know the cause.  Two
  possibilities:
  
  1) I haven't actively used a 2.4 kernel on a non-MMU system for some
 time and the page_alloc2 code may just be wrong due to a kernel
 update and bit rot.
  
  2) The usage on these systems is triggering the behaviour.
  
  If you boot te hsystem is a configuration that doesn't use much RAM and
  don't start and nasty big apps is the system idle (ie kswapd is
  behaving).  If so what triggers it's rampage ?
 
 I think it's the high rate of page allocation which triggers it.
 
 There shouldn't be a need to run kswapd constantly, for file cache
 pages: it should be possible to reclaim cache pages rapidly during
 allocation, recycling them.  I think that's where page_alloc2.c goes
 wrong.  The heuristic interaction between page_alloc.c and kswapd is
 rather subtle and tricky, but the basic difference is that
 page_alloc.c doesn't maximise free memory all the time; instead, it
 keeps track of rapidly reclaimable memory.
 
 Apart from the CPU difference, that means page_alloc2.c tends to fail
 allocations if it really does run out of memory while kswapd is
 catching up asynchronously.  (And failed allocations result in execs
 crashing, ahem).  It's crashes due to memory shortage which prompted
 me to investigate; the CPU differences were a surprise.
 
 A side effect of the high CPU of kswapd with page_alloc2.c in these
 situations is that allocation is noticeably slower.  I noticed, to my
 great surprise, that rsync was able to fetch files over the network
 and write them to disk twice as fast with page_alloc.c.  (4MB/s
 instead of 2MB/s).  For ages, I'd assumed it was the driver or hardware.
 
 To summarise, I found these differences:
 
 page_alloc.c:
 
  Pro: Lower CPU usage of kswapd, especially when streaming files.
  Pro: Doesn't fail allocations when lots of data in filecache;
   reclaims cache pages when needed.
  Pro: Keeps file data cached, if the pages are not required
   for something else.
  Pro: Faster allocation, surprisingly faster sometimes.
  Con: After long uptimes, with fork/execs causing large
   contiguous allocations, eventually memory will be too
   fragmented for fork/execs and the allocator is unable
   to recover.  So after long uptimes, the system will
   fail to allow telnet logins, for example, but will still
   be functioning in other ways.
 
 page_alloc2.c:
 
  Con: Higher CPU usage of kswapd, especially when streaming files.
  Con: Fails allocations when lots of data in filecache which could
   be reclaimed, sometimes.
  Con: Evicts cached file data regularly.  Even tiny files which are
   read very often from disk will do I/O periodically, instead
   of always reading from cache.
  Con: Slower allocation, surprisingly so sometimes.
  Pro: After long uptimes, with fork/execs causing large contiguous
   allocations, and simultaneous streaming file data, it
   manages to keep different types of allocation separate
   enough that fragmentation is not inevitable.  Indefinitely
   long uptimes are realistically possible.
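The fork/exec failure mode in the page_alloc.c list comes down to the difference between total free memory and the largest contiguous free run: on a no-MMU system an exec needs its segments in physically contiguous blocks, so plenty of scattered free pages does not help. A minimal sketch that measures both over a page map; the `bool` map and the function names are hypothetical, not the kernel's representation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* free_map[i] is true when page i is free (hypothetical page map). */
size_t largest_free_run(const bool *free_map, size_t npages)
{
	size_t best = 0, cur = 0;

	assert(free_map != NULL || npages == 0);
	for (size_t i = 0; i < npages; i++) {
		cur = free_map[i] ? cur + 1 : 0;   /* extend or reset the run */
		if (cur > best)
			best = cur;
	}
	return best;
}

size_t total_free(const bool *free_map, size_t npages)
{
	size_t n = 0;

	for (size_t i = 0; i < npages; i++)
		n += free_map[i];
	return n;
}

/* A contiguous request succeeds only if some single run is long enough. */
bool fits_contiguous(const bool *free_map, size_t npages, size_t need)
{
	return largest_free_run(free_map, npages) >= need;
}
```

For example, a 16-page map with every other page free and one with the first eight pages free both report eight free pages, but only the second can satisfy a two-page (or eight-page) contiguous request. The last Pro for page_alloc2.c above can be read as keeping that largest run from collapsing over long uptimes.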
 
 In the end, we stuck with page_alloc2.c because of that last point.
 Our systems either crash and burn (with watchdog recovery), or telnet
 still works :) But we like every performance characteristic of
 page_alloc.c more.
 
 The CPU usage of kswapd was a problem, and the crashing when too much
 file data was cached (due to fast streaming) was a big problem, so we
 tuned kswapd to a sweet spot for this application, and did everything
 possible with XIP-in-RAM to free up memory.  Currently we have 11MB
 free (out of 32MB) which