Increasing overcommit_ratio to 95 solved the problem; the box is now using its memory as expected without needing to resort to swap.
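For anyone who finds this later, here is a rough sketch of why 95 makes the difference. It assumes 4 kB pages, vm.overcommit_kbytes = 0, and the CommitLimit formula from the kernel's overcommit accounting docs: RAM × overcommit_ratio / 100 + swap, minus any hugetlb pages. The helper name commit_limit_kb is made up for illustration.

```python
# Sketch: approximate CommitLimit under vm.overcommit_memory = 2 (strict accounting).
# Assumptions: 4 kB base pages, vm.overcommit_kbytes = 0 (when non-zero it replaces
# the ratio), and the kernel's integer arithmetic done in whole pages.
PAGE_KB = 4


def commit_limit_kb(mem_total_kb: int, overcommit_ratio: int,
                    swap_total_kb: int = 0, hugetlb_kb: int = 0) -> int:
    """Return the approximate CommitLimit in kB.

    (RAM pages - hugetlb pages) * ratio / 100 + swap pages, converted back to kB.
    """
    ram_pages = (mem_total_kb - hugetlb_kb) // PAGE_KB
    swap_pages = swap_total_kb // PAGE_KB
    allowed_pages = ram_pages * overcommit_ratio // 100 + swap_pages
    return allowed_pages * PAGE_KB


# MemTotal taken from the /proc/meminfo output quoted in this thread.
MEM_TOTAL_KB = 30827220

for ratio in (50, 95):
    limit = commit_limit_kb(MEM_TOTAL_KB, ratio)
    print(f"overcommit_ratio={ratio}: CommitLimit ~ {limit} kB")
```

With the MemTotal from the meminfo output quoted below, ratio 50 gives 15413608 kB, which matches the CommitLimit the box reported, while 95 gives 29285856 kB (roughly 28 GB). Adding swap, the other option Tomas mentions, shows up as the swap_total_kb term.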
On Tue, Oct 21, 2014 at 3:55 PM, Montana Low <montana...@gmail.com> wrote:
> I didn't realize that about overcommit_ratio. It was at 50, I've changed
> it to 95. I'll see if that clears up the problem moving forward.
>
> # cat /proc/meminfo
> MemTotal: 30827220 kB
> MemFree: 153524 kB
> MemAvailable: 17941864 kB
> Buffers: 6188 kB
> Cached: 24560208 kB
> SwapCached: 0 kB
> Active: 20971256 kB
> Inactive: 8538660 kB
> Active(anon): 12460680 kB
> Inactive(anon): 36612 kB
> Active(file): 8510576 kB
> Inactive(file): 8502048 kB
> Unevictable: 0 kB
> Mlocked: 0 kB
> SwapTotal: 0 kB
> SwapFree: 0 kB
> Dirty: 50088 kB
> Writeback: 160 kB
> AnonPages: 4943740 kB
> Mapped: 7571496 kB
> Shmem: 7553176 kB
> Slab: 886428 kB
> SReclaimable: 858936 kB
> SUnreclaim: 27492 kB
> KernelStack: 4208 kB
> PageTables: 188352 kB
> NFS_Unstable: 0 kB
> Bounce: 0 kB
> WritebackTmp: 0 kB
> CommitLimit: 15413608 kB
> Committed_AS: 14690544 kB
> VmallocTotal: 34359738367 kB
> VmallocUsed: 59012 kB
> VmallocChunk: 34359642367 kB
> HugePages_Total: 0
> HugePages_Free: 0
> HugePages_Rsvd: 0
> HugePages_Surp: 0
> Hugepagesize: 2048 kB
> DirectMap4k: 31465472 kB
> DirectMap2M: 0 kB
>
> # sysctl -a:
> vm.admin_reserve_kbytes = 8192
> vm.block_dump = 0
> vm.dirty_background_bytes = 0
> vm.dirty_background_ratio = 10
> vm.dirty_bytes = 0
> vm.dirty_expire_centisecs = 3000
> vm.dirty_ratio = 20
> vm.dirty_writeback_centisecs = 500
> vm.drop_caches = 0
> vm.extfrag_threshold = 500
> vm.hugepages_treat_as_movable = 0
> vm.hugetlb_shm_group = 0
> vm.laptop_mode = 0
> vm.legacy_va_layout = 0
> vm.lowmem_reserve_ratio = 256 256 32
> vm.max_map_count = 65530
> vm.min_free_kbytes = 22207
> vm.min_slab_ratio = 5
> vm.min_unmapped_ratio = 1
> vm.mmap_min_addr = 4096
> vm.nr_hugepages = 0
> vm.nr_hugepages_mempolicy = 0
> vm.nr_overcommit_hugepages = 0
> vm.nr_pdflush_threads = 0
> vm.numa_zonelist_order = default
> vm.oom_dump_tasks = 1
> vm.oom_kill_allocating_task = 0
> vm.overcommit_kbytes = 0
> vm.overcommit_memory = 2
> vm.overcommit_ratio = 50
> vm.page-cluster = 3
> vm.panic_on_oom = 0
> vm.percpu_pagelist_fraction = 0
> vm.scan_unevictable_pages = 0
> vm.stat_interval = 1
> vm.swappiness = 0
> vm.user_reserve_kbytes = 131072
> vm.vfs_cache_pressure = 100
> vm.zone_reclaim_mode = 0
>
> On Tue, Oct 21, 2014 at 3:46 PM, Tomas Vondra <t...@fuzzy.cz> wrote:
> > On 22 October 2014, 0:25, Montana Low wrote:
> > > I'm running postgres-9.3 on a 30GB EC2 Xen instance with Linux kernel
> > > 3.16.3.
> > > I receive numerous "Error: out of memory" messages in the log, which
> > > are aborting client requests, even though there appears to be 23GB
> > > available in the OS cache.
> > >
> > > There is no swap on the box. Postgres is behind pgbouncer to protect it
> > > from the 200 real clients; pgbouncer limits connections to 32, although
> > > there are rarely more than 20 active connections. Postgres
> > > max_connections is still set very high for historic reasons. There is
> > > also a 4GB Java process running on the box.
> > >
> > > relevant postgresql.conf:
> > >
> > > max_connections = 1000          # (change requires restart)
> > > shared_buffers = 7GB            # min 128kB
> > > work_mem = 40MB                 # min 64kB
> > > maintenance_work_mem = 1GB      # min 1MB
> > > effective_cache_size = 20GB
> > >
> > > sysctl.conf:
> > >
> > > vm.swappiness = 0
> > > vm.overcommit_memory = 2
> >
> > This means you have 'no overcommit', so the committable memory is
> > limited to swap plus overcommit_ratio percent of RAM. The default value
> > of overcommit_ratio is 50, and as you have no swap, that effectively
> > means only 50% of the RAM can be committed by processes.
> >
> > If you want to verify this, check /proc/meminfo and look at the lines
> > CommitLimit (the current limit) and Committed_AS (committed address
> > space).
> >
> > Once Committed_AS reaches the limit, it's game over: new allocations
> > fail with out-of-memory errors.
> >
> > There are different ways to fix this, or at least improve it:
> >
> > (1) increasing overcommit_ratio (clearly, 50% is way too low; something
> > like 90% might be more appropriate on 30GB RAM without swap)
> >
> > (2) adding swap (say, a small ephemeral drive, with swappiness=10 or
> > something like that)
> >
> > Tomas
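A minimal sketch of the check Tomas describes, assuming the usual "Key: value kB" layout of /proc/meminfo. parse_meminfo and commit_headroom_kb are hypothetical helpers written for this example, not part of any library; the headroom figure is only meaningful with vm.overcommit_memory = 2.

```python
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo-style text into {field: number}.

    Most values are in kB; a few fields (e.g. HugePages_Total) are plain counts.
    Lines without a colon or without a value are skipped.
    """
    fields: Dict[str, int] = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        parts = rest.split()
        if parts:
            fields[key.strip()] = int(parts[0])
    return fields


def commit_headroom_kb(fields: Dict[str, int]) -> int:
    """kB that can still be committed before strict accounting starts refusing allocations."""
    return fields["CommitLimit"] - fields["Committed_AS"]


# Values copied from the meminfo output posted earlier in this thread.
sample = """\
MemTotal:       30827220 kB
Cached:         24560208 kB
CommitLimit:    15413608 kB
Committed_AS:   14690544 kB
HugePages_Total:       0
"""

info = parse_meminfo(sample)
print(commit_headroom_kb(info))
```

On a live box you would pass it the contents of /proc/meminfo instead of the sample. With the numbers Montana posted, the headroom works out to 723064 kB (about 0.7 GB), even though roughly 24 GB was sitting in Cached, which fits the "out of memory despite free cache" symptom.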