Simon Reber <s.re...@lcsys.ch> wrote:
> Well I have a script that loops through all running processes, etc.
> checked the used swap space and calculates the sum from all together.
> When running the script, it always tells me: "Swap usage overall: 0" -
> checking all processes on single base I only see 0
> So I really don't understand how swap can be in use when no process has
> a reference to it.
> Maybe a little background the XML database requires having swap
> available and it also seems to use it.

My low-level kernel internal knowledge (beyond basic tuning) is even
more dated than my JRE/VM internals (I really need to change that), so
I'd have to dive deeper into how various vm/proc information is
instrumented in the kernel before I answer further.  This could
include how pages are allocated and marked by the program itself.
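One thing worth checking first, though: per-process swap accounting is
only exposed on newer kernels (the VmSwap field in /proc/<pid>/status,
and the Swap field in /proc/<pid>/smaps, appeared around 2.6.34 if
memory serves).  RHEL 5 runs 2.6.18, where those fields don't exist,
which alone would explain a per-process sum of zero.  A minimal sketch
of that kind of summing script, for a kernel that does expose it:

```shell
#!/bin/sh
# Sum per-process swap usage from /proc/<pid>/status.
# Caveat: the VmSwap field only exists on newer kernels (~2.6.34+);
# on RHEL 5's 2.6.18 kernel it is absent, so every process reads as 0.
total=0
for f in /proc/[0-9]*/status; do
    # awk prints the kB value of the VmSwap line, or nothing if absent;
    # errors from processes that exited mid-loop are suppressed.
    kb=$(awk '/^VmSwap:/ { print $2 }' "$f" 2>/dev/null)
    [ -n "$kb" ] && total=$((total + kb))
done
echo "Swap usage overall: ${total} kB"
```

If that prints 0 on a box where /proc/meminfo shows SwapTotal minus
SwapFree well above zero, the kernel simply isn't exporting the
per-process numbers.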

> The problem is, when we stop the XML database (which is java) and the
> Oracle Database, swap isn't freed up.
> When I do a swapoff -a (when all applications are shutdown) it takes
> like 2 hours to free it up and have it available for maintenance

Then those really are pages that are virtually never utilized.  My
apologies, as I might have missed it, but are you seeing any
performance issues?  Is there a "wake-up" event that causes paging en
masse?

Things like sysstat's sa reporting, where you capture the disk
statistics, would be extremely helpful; or just run its vmstat for 24
hours and look for spikes in paging.  If you see reads, and a
reduction in swap usage, it could be the event where that data and/or
object is finally utilized.
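Concretely, something along these lines (standard sysstat/procps
invocations; adjust the intervals and log location to taste):

```shell
# 24 hours of vmstat at one-minute intervals (86400 s / 60 s = 1440
# samples); watch the si/so columns for paging spikes.
vmstat 60 1440 > /var/tmp/vmstat-24h.log &

# If sysstat's sa collection is already running, pull the history:
sar -B    # paging: pgpgin/s, pgpgout/s, faults
sar -W    # swapping: pswpin/s, pswpout/s
sar -d    # per-device disk statistics
```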

> I already raised a change today morning to bring swappiness down to 1 -
> but my concern is that it won't fix the problem (change not yet approved
> but hopefully soon).

Again, it's probably best to "define" the problem.  Is this affecting
performance at all?  That is the question.  Sysstat helps
tremendously.

Just looking back at your "snapshot," you are using only 8GiB for
resident objects/data, and another 56GiB is used for read caching.
Your buffers are less than 1GiB in the snapshot.  So your system is
clearly not swamped, at least at that point in time, and is using
nearly all of its memory for caching reads.

> Also, the server isn't very busy during normal operation (from I/O and
> Load perspective) - only if they load the XML database with new data,
> the load and I/O goes up.

Is that XML database largely in memory?  Or is it sizable, but
possibly accounting for much of your read cache?

> Anyhow, I'd like to share some more kernel settings with you - maybe you
> can see something we are doing wrong or at least where we can improve
> kernel.shmmax = 56968879104 (this value is set by puppet to 75% of the
> real available memory)
> kernel.shmall = 4294967296
> vm.max_reclaims_in_progress = 0
> vm.pagecache = 100
> vm.swap_token_timeout = 300     0
> vm.vfs_cache_pressure = 100
> vm.max_map_count = 65536
> vm.percpu_pagelist_fraction = 0
> vm.min_free_kbytes = 34511

Yep, only about 0.03GiB (roughly 34MiB) must remain free, so you have
plenty of headroom there.

> vm.lowmem_reserve_ratio = 256   256     32
> vm.swappiness = 60

Default swappiness is the root of all evil.  If you don't want to use
swap aggressively, you never leave this in the double digits.  If you
leave it at 60, expect swap to always be utilized.
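For reference, lowering it is a one-liner at runtime, plus a
sysctl.conf entry to survive reboots (assumes root):

```shell
# Runtime change, takes effect immediately:
sysctl -w vm.swappiness=1

# Persist across reboots (RHEL 5 reads /etc/sysctl.conf at boot):
echo 'vm.swappiness = 1' >> /etc/sysctl.conf
```

Note this only changes how eagerly the kernel swaps going forward; it
does not pull already-swapped pages back into RAM.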

Just because swap is utilized does _not_ mean you're out of physical
memory.  It's just the kernel, aggressively in this case, keeping
memory free for buffers that may need to be used immediately, etc.

In fact, considering your dirty background ratio is 10%, that's
7.2GiB, roughly the amount of free memory you have.  I.e., the
kernel is likely reserving that amount, and not using it for read
cache (stopping at 56GiB, with 8GiB of other resident usage), in case
your system receives a multi-GiB write in a fraction of a second.
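The arithmetic, just to make the accounting explicit (assuming ~72GiB
of total RAM, inferred from your snapshot figures):

```shell
# dirty_background_ratio is a percentage of memory; at 10% of ~72 GiB,
# background writeback starts once ~7.2 GiB of pages are dirty -- which
# roughly matches the memory the kernel appears to hold back from the
# read cache.
total_gib=72
reserve_gib=$(awk -v t="$total_gib" 'BEGIN { printf "%.1f", t * 0.10 }')
echo "background writeback threshold: ${reserve_gib} GiB"
# 8 (resident) + 56 (read cache) + ~7.2 (held free) ~= the whole box
```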

> vm.dirty_expire_centisecs = 2999
> vm.dirty_writeback_centisecs = 499
> vm.dirty_ratio = 40
> vm.dirty_background_ratio = 10
> vm.page-cluster = 3

These are all defaults, and defaults are rarely ideal for such
large-memory systems.  _However_, as I mentioned earlier, this likely
does not impact your symptoms at all.  At most, as noted above, the
dirty background ratio of 10% could be a factor in why the kernel is
leaving around 8GiB free, and not using more for read cache.

> vm.overcommit_ratio = 50


--
Bryan J Smith - Professional, Technical Annoyance
http://www.linkedin.com/in/bjsmith

_______________________________________________
rhelv5-list mailing list
rhelv5-list@redhat.com
https://www.redhat.com/mailman/listinfo/rhelv5-list
