Simon Reber <s.re...@lcsys.ch> wrote:

> Well I have a script that loops through all running processes, etc.,
> checked the used swap space, and calculates the sum from all together.
> When running the script, it always tells me: "Swap usage overall: 0" --
> checking all processes on an individual basis, I only see 0.
> So I really don't understand how swap can be in use when no process has
> a reference to it.
> Maybe a little background: the XML database requires having swap
> available, and it also seems to use it.
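For what it's worth, a minimal sketch of such a per-process sweep (the names and the awk approach here are mine, not your script's; it assumes the kernel exports a "Swap:" field in /proc/<pid>/smaps, which, as far as I recall, older kernels such as RHEL 5's 2.6.18 do not -- one possible reason a loop like this would always report 0):

```shell
#!/bin/sh
# Sum per-process swap usage by walking every /proc/<pid>/smaps and
# adding up the "Swap:" lines (values are in kB). Unreadable or
# vanished entries are silently skipped.
total=0
for pid in /proc/[0-9]*; do
    kb=$(awk '/^Swap:/ {sum += $2} END {print sum+0}' "$pid/smaps" 2>/dev/null)
    total=$((total + ${kb:-0}))
done
echo "Swap usage overall: ${total} kB"
```

On a kernel without that field, every awk pass sums nothing and the script prints 0 even while swap is in use -- which would match exactly what you're seeing.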
My low-level kernel-internals knowledge (beyond basic tuning) is even more dated than my JRE/VM internals (I really need to change that), so I'd have to dive deeper into how various vm/proc statistics are instrumented in the kernel before I answer further. This could include how pages are allocated and marked by the program itself.

> The problem is, when we stop the XML database (which is java) and the
> Oracle Database, swap isn't freed up.
> When I do a swapoff -a (when all applications are shut down), it takes
> like 2 hours to free it up and have it available for maintenance.

Then they really are pages that are virtually never utilized.

My apologies, as I might have missed it, but are you seeing any performance issues? Is there a "wake-up" event that causes paging en masse? Things like sysstat's sa reporting, where you capture the disk statistics, would be extremely helpful; or just run vmstat for 24 hours and look for spikes in paging. If you see reads, and a reduction in swap usage, that could be the event where that data and/or object is finally utilized.

> I already raised a change this morning to bring swappiness down to 1 --
> but my concern is that it won't fix the problem (change not yet approved,
> but hopefully soon).

Again, it's probably best to "define" the problem. Is this affecting performance at all? That is the question. Sysstat helps tremendously.

Just looking back at your "snapshot," you are using only 8GiB for resident objects/data, and another 56GiB is used for read caching. Your buffers are less than 1GiB in the snapshot. So your system is easily not swamped, at least at that point in time, and is using nearly all of your memory for caching reads.

> Also, the server isn't very busy during normal operation (from an I/O and
> load perspective) -- only if they load the XML database with new data do
> the load and I/O go up.

Is that XML database largely in memory? Or is it sizable, but possibly accounting for much of your read cache?
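To sketch the monitoring I'm suggesting (the interval/count values below are arbitrary examples, not prescriptions):

```shell
#!/bin/sh
# Commands one would run on the box to catch a paging "wake-up" event:
#   vmstat 60 1440 > vmstat-24h.log   # one sample a minute for 24 hours;
#                                     # watch the si/so columns for spikes
#   sar -B                            # sysstat: paging statistics
#   sar -W                            # sysstat: swap-in/swap-out rates
#   sar -d                            # sysstat: per-device disk statistics
# The raw counters behind those reports live in /proc/vmstat; these two
# are the cumulative pages swapped in/out since boot:
grep -E '^pswp(in|out) ' /proc/vmstat
```

A rising pswpin counter coinciding with swap usage dropping would be that deferred "data finally utilized" event I described.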
> Anyhow, I'd like to share some more kernel settings with you -- maybe you
> can see something we are doing wrong, or at least where we can improve:
>
> kernel.shmmax = 56968879104 (this value is set by puppet to 75% of the
> real available memory)
> kernel.shmall = 4294967296
> vm.max_reclaims_in_progress = 0
> vm.pagecache = 100
> vm.swap_token_timeout = 300 0
> vm.vfs_cache_pressure = 100
> vm.max_map_count = 65536
> vm.percpu_pagelist_fraction = 0
> vm.min_free_kbytes = 34511

Yep, only about 0.03GiB must remain free, so you literally have plenty of room.

> vm.lowmem_reserve_ratio = 256 256 32
> vm.swappiness = 60

Default swappiness is the root of all evil. If you don't want to use swap aggressively, you never leave this in the double digits. If you leave it at 60, expect swap to always be utilized.

Just because swap is utilized does _not_ mean you're out of physical memory. It's just the kernel, aggressively in this case, keeping memory free for buffers that may be needed immediately, etc. In fact, considering your dirty background ratio is 10%, that's 7.2GiB, and roughly the amount of free memory you have. I.e., the kernel is likely reserving that amount, and not using it for read cache (stopping at 56GiB, with 8GiB of other resident usage), in case your system receives a multi-GiB write burst in a fraction of a second.

> vm.dirty_expire_centisecs = 2999
> vm.dirty_writeback_centisecs = 499
> vm.dirty_ratio = 40
> vm.dirty_background_ratio = 10
> vm.page-cluster = 3

These are all defaults, and never ideal for such large-memory systems. _However_, as I mentioned earlier, this likely does not impact your symptoms at all. At most, as I mentioned above, the dirty background ratio of 10% could be a factor in why the kernel is leaving around 8GiB free and not using more for read cache.
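To make the arithmetic above concrete, here's a quick sketch you can run on any box (the sysctl commands are shown as comments since they need root; the value of 1 is the one from your pending change, not a universal recommendation):

```shell
#!/bin/sh
# Back-of-the-envelope check of the dirty-ratio math: with ~72 GiB of
# RAM, a dirty_background_ratio of 10 means background writeback kicks
# in once ~7.2 GiB of pages are dirty.
mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
ratio=$(cat /proc/sys/vm/dirty_background_ratio)
echo "background writeback starts at ~$((mem_kb * ratio / 100)) kB dirty"

# Current swappiness, and (as root) the runtime plus persistent change
# proposed in the thread:
cat /proc/sys/vm/swappiness
#   sysctl -w vm.swappiness=1
#   echo 'vm.swappiness = 1' >> /etc/sysctl.conf   # survives reboot
```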
> vm.overcommit_ratio = 50

--
Bryan J Smith - Professional, Technical Annoyance
http://www.linkedin.com/in/bjsmith

_______________________________________________
rhelv5-list mailing list
rhelv5-list@redhat.com
https://www.redhat.com/mailman/listinfo/rhelv5-list