On 19.03.2017 16:18, Michal Hocko wrote:
On Fri 17-03-17 21:08:31, Gerhard Wiesinger wrote:
On 17.03.2017 18:13, Michal Hocko wrote:
On Fri 17-03-17 17:37:48, Gerhard Wiesinger wrote:
[...]
Why does the kernel prefer to swapin/out and not use

a.) the free memory?
It will use all the free memory up to min watermark which is set up
based on min_free_kbytes.
Makes sense, how is /proc/sys/vm/min_free_kbytes default value calculated?
See init_per_zone_wmark_min

b.) the buffer/cache?
the memory reclaim is strongly biased towards page cache and we try to
avoid swapout as much as possible (see get_scan_count).
If I understand it correctly, swapping is preferred over dropping the
cache, right. Can this behaviour be changed to prefer dropping the
cache to some minimum amount?  Is this also configurable in a way?
No, we enforce swapping if the amount of free + file pages are below the
cumulative high watermark.

(As far as I remember e.g. kernel 2.4 dropped the caches well).

There is ~100M memory available but kernel swaps all the time ...

Any ideas?

Kernel: 4.9.14-200.fc25.x86_64

top - 17:33:43 up 28 min,  3 users,  load average: 3.58, 1.67, 0.89
Tasks: 145 total,   4 running, 141 sleeping,   0 stopped,   0 zombie
%Cpu(s): 19.1 us, 56.2 sy,  0.0 ni,  4.3 id, 13.4 wa, 2.0 hi,  0.3 si,  4.7
st
KiB Mem :   230076 total,    61508 free,   123472 used,    45096 buff/cache

procs -----------memory---------- ---swap-- -----io---- -system--
------cpu-----
  r  b   swpd   free   buff  cache   si   so    bi    bo in   cs us sy id wa st
  3  5 303916  60372    328  43864 27828  200 41420   236 6984 11138 11 47  6 
23 14
I am really surprised to see any reclaim at all. 26% of free memory
doesn't sound as if we should do a reclaim at all. Do you have an
unusual configuration of /proc/sys/vm/min_free_kbytes ? Or is there
anything running inside a memory cgroup with a small limit?
nothing special set regarding /proc/sys/vm/min_free_kbytes (default values),
detailed config below. Regarding cgroups, none of I know. How to check (I
guess nothing is set because cg* commands are not available)?
be careful because systemd started to use some controllers. You can
easily check cgroup mount points.

See below.


/proc/sys/vm/min_free_kbytes
45056
So at least 45M will be kept reserved for the system. Your data
indicated you had more memory. How does /proc/zoneinfo look like?
Btw. you seem to be using fc kernel, are there any patches applied on
top of Linus tree? Could you try to retest vanilla kernel?


System looks normally now, FYI (e.g. now permanent swapping)


free
total used free shared buff/cache available
Mem:         349076      154112       41560         184 153404      148716
Swap:       2064380      831844     1232536

cat /proc/zoneinfo

Node 0, zone      DMA
  per-node stats
      nr_inactive_anon 9543
      nr_active_anon 22105
      nr_inactive_file 9877
      nr_active_file 13416
      nr_unevictable 0
      nr_isolated_anon 0
      nr_isolated_file 0
      nr_pages_scanned 0
      workingset_refault 1926013
      workingset_activate 707166
      workingset_nodereclaim 187276
      nr_anon_pages 11429
      nr_mapped    6852
      nr_file_pages 46772
      nr_dirty     1
      nr_writeback 0
      nr_writeback_temp 0
      nr_shmem     46
      nr_shmem_hugepages 0
      nr_shmem_pmdmapped 0
      nr_anon_transparent_hugepages 0
      nr_unstable  0
      nr_vmscan_write 3319047
      nr_vmscan_immediate_reclaim 32363
      nr_dirtied   222115
      nr_written   3537529
  pages free     3110
        min      27
        low      33
        high     39
   node_scanned  0
        spanned  4095
        present  3998
        managed  3977
      nr_free_pages 3110
      nr_zone_inactive_anon 18
      nr_zone_active_anon 3
      nr_zone_inactive_file 51
      nr_zone_active_file 75
      nr_zone_unevictable 0
      nr_zone_write_pending 0
      nr_mlock     0
      nr_slab_reclaimable 214
      nr_slab_unreclaimable 289
      nr_page_table_pages 185
      nr_kernel_stack 16
      nr_bounce    0
      nr_zspages   0
      numa_hit     1214071
      numa_miss    0
      numa_foreign 0
      numa_interleave 0
      numa_local   1214071
      numa_other   0
      nr_free_cma  0
        protection: (0, 306, 306, 306, 306)
  pagesets
    cpu: 0
              count: 0
              high:  0
              batch: 1
  vm stats threshold: 4
    cpu: 1
              count: 0
              high:  0
              batch: 1
  vm stats threshold: 4
  node_unreclaimable:  0
  start_pfn:           1
  node_inactive_ratio: 0
Node 0, zone    DMA32
  pages free     7921
        min      546
        low      682
        high     818
   node_scanned  0
        spanned  94172
        present  94172
        managed  83292
      nr_free_pages 7921
      nr_zone_inactive_anon 9525
      nr_zone_active_anon 22102
      nr_zone_inactive_file 9826
      nr_zone_active_file 13341
      nr_zone_unevictable 0
      nr_zone_write_pending 1
      nr_mlock     0
      nr_slab_reclaimable 5829
      nr_slab_unreclaimable 8622
      nr_page_table_pages 2638
      nr_kernel_stack 2208
      nr_bounce    0
      nr_zspages   0
      numa_hit     23125334
      numa_miss    0
      numa_foreign 0
      numa_interleave 14307
      numa_local   23125334
      numa_other   0
      nr_free_cma  0
        protection: (0, 0, 0, 0, 0)
  pagesets
    cpu: 0
              count: 17
              high:  90
              batch: 15
  vm stats threshold: 12
    cpu: 1
              count: 55
              high:  90
              batch: 15
  vm stats threshold: 12
  node_unreclaimable:  0
  start_pfn:           4096
  node_inactive_ratio: 0

mount | grep cgroup
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,mode=755)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd) cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio) cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpu,cpuacct) cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset) cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls,net_prio) cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb) cgroup on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,pids) cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory) cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event) cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices) cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)

There are patches (see below), but as far as I saw nothing regarding the issues which happen.


BTW: Does it make sense to reduce lower limit for low mem VMs? e.g.

echo "10000" > /proc/sys/vm/min_free_kbytes


Thnx.

Ciao,

Gerhard

https://koji.fedoraproject.org/koji/buildinfo?buildID=870215

## Patches needed for building this package

# build tweak for build ID magic, even for -vanilla
Patch001: kbuild-AFTER_LINK.patch

## compile fixes

# ongoing complaint, full discussion delayed until ksummit/plumbers
Patch002: 0001-iio-Use-event-header-from-kernel-tree.patch

%if !%{nopatches}

# Git trees.

# Standalone patches

# a tempory patch for QCOM hardware enablement. Will be gone by end of 2016/F-26 GA
Patch420: qcom-QDF2432-tmp-errata.patch

# http://www.spinics.net/lists/arm-kernel/msg490981.html
Patch421: geekbox-v4-device-tree-support.patch

# http://www.spinics.net/lists/linux-tegra/msg26029.html
Patch422: usb-phy-tegra-Add-38.4MHz-clock-table-entry.patch

# Fix OMAP4 (pandaboard)
Patch423: arm-revert-mmc-omap_hsmmc-Use-dma_request_chan-for-reque.patch

# Not particularly happy we don't yet have a proper upstream resolution this is the right direction
# https://www.spinics.net/lists/arm-kernel/msg535191.html
Patch424: arm64-mm-Fix-memmap-to-be-initialized-for-the-entire-section.patch

# http://patchwork.ozlabs.org/patch/587554/
Patch425: ARM-tegra-usb-no-reset.patch

Patch426: AllWinner-net-emac.patch

# http://www.spinics.net/lists/devicetree/msg163238.html
Patch430: bcm2837-initial-support.patch

# http://www.spinics.net/lists/dri-devel/msg132235.html
Patch433: drm-vc4-Fix-OOPSes-from-trying-to-cache-a-partially-constructed-BO..patch

# bcm283x mmc for wifi http://www.spinics.net/lists/arm-kernel/msg567077.html
Patch434: bcm283x-mmc-bcm2835.patch

# Upstream fixes for i2c/serial/ethernet MAC addresses
Patch435: bcm283x-fixes.patch

# https://lists.freedesktop.org/archives/dri-devel/2017-February/133823.html
Patch436: vc4-fix-vblank-cursor-update-issue.patch

# http://www.spinics.net/lists/arm-kernel/msg552554.html
Patch438: arm-imx6-hummingboard2.patch

Patch460: lib-cpumask-Make-CPUMASK_OFFSTACK-usable-without-deb.patch

Patch466: input-kill-stupid-messages.patch

Patch467: die-floppy-die.patch

Patch468: no-pcspkr-modalias.patch

Patch470: silence-fbcon-logo.patch

Patch471: Kbuild-Add-an-option-to-enable-GCC-VTA.patch

Patch472: crash-driver.patch

Patch473: efi-lockdown.patch

Patch487: Add-EFI-signature-data-types.patch

Patch488: Add-an-EFI-signature-blob-parser-and-key-loader.patch

# This doesn't apply. It seems like it could be replaced by
# https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=5ac7eace2d00eab5ae0e9fdee63e38aee6001f7c
# which has an explicit line about blacklisting
Patch489: KEYS-Add-a-system-blacklist-keyring.patch

Patch490: MODSIGN-Import-certificates-from-UEFI-Secure-Boot.patch

Patch491: MODSIGN-Support-not-importing-certs-from-db.patch

Patch493: drm-i915-hush-check-crtc-state.patch

Patch494: disable-i8042-check-on-apple-mac.patch

Patch495: lis3-improve-handling-of-null-rate.patch

Patch497: scsi-sd_revalidate_disk-prevent-NULL-ptr-deref.patch

Patch498: criu-no-expert.patch

Patch499: ath9k-rx-dma-stop-check.patch

Patch500: xen-pciback-Don-t-disable-PCI_COMMAND-on-PCI-device-.patch

Patch501: Input-synaptics-pin-3-touches-when-the-firmware-repo.patch

Patch502: firmware-Drop-WARN-from-usermodehelper_read_trylock-.patch

# Patch503: drm-i915-turn-off-wc-mmaps.patch

Patch509: MODSIGN-Don-t-try-secure-boot-if-EFI-runtime-is-disa.patch

#CVE-2016-3134 rhbz 1317383 1317384
Patch665: netfilter-x_tables-deal-with-bogus-nextoffset-values.patch

# grabbed from mailing list
Patch667: v3-Revert-tty-serial-pl011-add-ttyAMA-for-matching-pl011-console.patch

# END OF PATCH DEFINITIONS

Reply via email to