Re: High vm scan rate and dropped keystrokes thru X?
On Tue, Jul 27, 2021 at 06:28:39PM +1200, Lloyd Parkes wrote:
> On 27/07/21 12:19 am, Paul Ripke wrote:
> > On Mon, Jul 26, 2021 at 05:53:19PM +1200, Lloyd Parkes wrote:
> > > That's 12GB of RAM in use and 86MB of RAM free. Sounds pretty awful to me.
> >
> > Sounds normal to me - I don't expect to see any free RAM unless I've just
> > - exited a large process
> > - deleted a large file with large cache footprint
> > - released a large chunk of RAM by other means (mmap, madvise, semctl, etc).
>
> I haven't run NetBSD on a desktop for a while now, but I still think 12GB is
> a lot of memory in use. Maybe I'll get a new MacBook when they start
> shipping 32GB Apple CPU ones and then put NetBSD on my current MacBook.

There's a bunch of junk running. 3 java processes for 3GiB, mongodb,
postgres, apache, firefox, prusa slicer, and it runs as the local network
router/proxy with all the usual junk running. I also run pkgsrc builds and
netbsd builds, and it handles all that fine.

> > A big chunk of it is in file cache, which is unsurprising when reading
> > thru a 400GiB file...
>
> Page activity lasts 20s and at 30MB/s that means you should have 600MB of
> file data active. Add 50% for inactive pages and that's still only 900MB.
> I'm willing to bet money that zstd only reads each block of data once
> (sequentially in fact) and so it doesn't need any file data cache at all.
> File metadata is a different matter, but that probably stays active and
> there won't be much of it.

Yes, it's just cache churn due to sequential read I/O. I can cat the file
thru zstd with the same effect. I can even cat the file to /dev/null with
the same issue. Yes, the file data cache is pure cost in this case.

> I suspect that your vm.filemax is set to more memory than you have available
> for the file cache and once that happens anonymous pages start to get
> swapped out. My experience is that while anonymous pages sound unimportant,
> they are in fact the most important pages to keep in RAM. Thinking about it,
> they are the irreplaceable bits of all our running software.
>
> Try setting vm.filemin=5 and vm.filemax=10. Really. I did it when processing
> vast amounts of files in CVS and it worked for me.

I would agree, except there's basically zero paging activity for the entire
duration. I tried this anyway, and there's no change in behaviour,
whatsoever.

> Out of curiosity, what are you doing with zstd? You mentioned backups. Is
> this dump or restore? dump implements its own file cache, which won't help
> with the memory burden.

I just do compressed dumps to an external drive. Doing the dump is fine,
but just reading it back leads to bad performance when the page daemon
goes nuts.

> "top -ores" will tell you what programs are using the most anonymous pages,
> which might help identify where all this memory pressure is coming from.

I know these, but there is no real memory pressure. It's just that normally
the page daemon scans and frees the same number of pages, but for some
reason, at some point, it starts scanning 1M+ pages without freeing any.

-- 
Paul Ripke
"Great minds discuss ideas, average minds discuss events, small minds
 discuss people."
-- Disputed: Often attributed to Eleanor Roosevelt. 1948.
Re: High vm scan rate and dropped keystrokes thru X?
On 27/07/21 12:19 am, Paul Ripke wrote:
> On Mon, Jul 26, 2021 at 05:53:19PM +1200, Lloyd Parkes wrote:
> > That's 12GB of RAM in use and 86MB of RAM free. Sounds pretty awful to me.
>
> Sounds normal to me - I don't expect to see any free RAM unless I've just
> - exited a large process
> - deleted a large file with large cache footprint
> - released a large chunk of RAM by other means (mmap, madvise, semctl, etc).

I haven't run NetBSD on a desktop for a while now, but I still think 12GB is
a lot of memory in use. Maybe I'll get a new MacBook when they start
shipping 32GB Apple CPU ones and then put NetBSD on my current MacBook.

> A big chunk of it is in file cache, which is unsurprising when reading
> thru a 400GiB file...

Page activity lasts 20s and at 30MB/s that means you should have 600MB of
file data active. Add 50% for inactive pages and that's still only 900MB.
I'm willing to bet money that zstd only reads each block of data once
(sequentially in fact) and so it doesn't need any file data cache at all.
File metadata is a different matter, but that probably stays active and
there won't be much of it.

I suspect that your vm.filemax is set to more memory than you have available
for the file cache and once that happens anonymous pages start to get
swapped out. My experience is that while anonymous pages sound unimportant,
they are in fact the most important pages to keep in RAM. Thinking about it,
they are the irreplaceable bits of all our running software.

Try setting vm.filemin=5 and vm.filemax=10. Really. I did it when processing
vast amounts of files in CVS and it worked for me.

Out of curiosity, what are you doing with zstd? You mentioned backups. Is
this dump or restore? dump implements its own file cache, which won't help
with the memory burden.

"top -ores" will tell you what programs are using the most anonymous pages,
which might help identify where all this memory pressure is coming from.

Cheers,
Lloyd
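The back-of-the-envelope estimate above is easy to write out. What follows is only a minimal Python sketch of that arithmetic, using the figures from the message (30MB/s read rate, pages staying active for 20s, 50% extra for inactive pages). The function and parameter names are made up for illustration and are not part of any NetBSD tool.

```python
def expected_file_cache_mb(read_rate_mb_s, active_seconds, inactive_fraction):
    """Rough working-set estimate for a purely sequential read.

    Assumes (as in the message) that a page counts as active for
    `active_seconds` after being read, and that inactive pages add
    `inactive_fraction` on top of the active set.
    """
    active_mb = read_rate_mb_s * active_seconds
    total_mb = active_mb + active_mb * inactive_fraction
    return active_mb, total_mb


if __name__ == "__main__":
    active, total = expected_file_cache_mb(30, 20, 0.5)
    print(f"active file data: {active} MB, with inactive pages: {total} MB")
```

For comparison, the top output elsewhere in this thread reports 2842M File, which is well above this estimate.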
Re: High vm scan rate and dropped keystrokes thru X?
On Mon, Jul 26, 2021 at 11:56:13PM +0900, Izumi Tsutsui wrote:
> > NetBSD 9.2, amd64, 16GiB RAM, quad core + hyperthreading.
> >
> > I've repeatedly noticed an issue where a large amount of disk reads can
> > result in lost keystrokes, jerky mouse behaviour and other weirdness.
>  :
> > "vmstat 1" during these events shows climbing runqueue, falling free
> > memory, high reclaim rate, very high scan rate, and 8 CPUs worth of
> > system time - and I hear the BIOS spinning up the CPU fan.
>
> What "vmstat -m" shows?
>
> if kmem-160 (or kmem-192) has a large number, maybe caused by
> radeondrmkms(4) leaks.
> https://mail-index.netbsd.org/netbsd-bugs/2021/07/12/msg072460.html

No, no radeon here. To be clear, I don't believe this is a leak. It's just
some intermittently poor behaviour during high cache churn.

ksh$ vmstat -m | sort -k 8nr | head
vcachepl    336 61562054 0 60809350 2819365 2720482 98883 258691 0 inf   0
buf2k      2048  2710812 0  2543840 1224274 1136951 87323 116413 1   1   0
ffsdino2    256 60933737 0 60181639 1907605 1823135 84470 193820 0 inf   0
ffsino      256 60879348 0 60127246 1906802 1823012 83790 193820 0 inf   0
anonpl       32 38306110 0 35856325   43412    3062 40350  42502 0 inf   0
ncache      192 12485490 0 11813042   39948     122 39826  39828 0 inf   0
mutex        64 59812492 0 58941710  200286  164419 35867  51528 0 inf   0
bufpl       296  2450252 0  2210437  136128  114245 21883  24008 0 inf 381
buf16k    16384  1046417 0   983052  143815  126570 17245  22064 1   1   0
kmem-2048  2048   186643 2   169723   35112   26265  8847  12270 0 inf   1

'systat vm' shows the system mostly stalled with high sys CPU%, doing page
scans:

18 users    Load 6.98 3.39 2.35    Tue Jul 27 09:21:44

Proc:r d s Csw Traps SysCal Intr Soft Fault    PAGING   SWAPPING
18 1230 1215 58 4060 978953 58                 in  out  in  out
                                               ops
82.7% Sy  0.5% Us  0.0% Ni  4.4% In  12.4% Id  pages
||||||||||| =>%%                                               forks
                                                               fkppw
Anon   8904872  54%   zero     8928   1362 Interrupts          fksvm
Exec    457148   2%   wired  450292    284 TLB shootdown       pwait
File   3286872  20%   inact 2338880    100 cpu0 timer          relck
Meta    137198   2%   bufs   234364      4 ioapic0 pin 18      rlkok
(kB)      real  swaponly   free        906 ioapic0 pin 16      noram
Active 9859004   1745392  10128            ioapic0 pin 23    3 ndcpy
Namei  Sys-cache  Proc-cache            68 msi1 vec 0        1 fltcp
Calls  hits %     hits %                                    22 zfod
  236   234 99                                                 cow
                                                          2048 fmin
Disks: seeks xfers bytes %busy                            2730 ftarg
  wd0           20  253K  14.3                                 itarg
  wd1           18  242K  13.1                                 flnan
  cd0                                                       29 pdfre
  cd1                                                  1180584 pdscn
  sd0            1   91K   0.2
raid0           21  273K  33.4

-- 
Paul Ripke
"Great minds discuss ideas, average minds discuss events, small minds
 discuss people."
-- Disputed: Often attributed to Eleanor Roosevelt. 1948.
Re: High vm scan rate and dropped keystrokes thru X?
> NetBSD 9.2, amd64, 16GiB RAM, quad core + hyperthreading.
>
> I've repeatedly noticed an issue where a large amount of disk reads can
> result in lost keystrokes, jerky mouse behaviour and other weirdness.
 :
> "vmstat 1" during these events shows climbing runqueue, falling free
> memory, high reclaim rate, very high scan rate, and 8 CPUs worth of
> system time - and I hear the BIOS spinning up the CPU fan.

What does "vmstat -m" show?

If kmem-160 (or kmem-192) has a large number, it may be caused by
radeondrmkms(4) leaks.
https://mail-index.netbsd.org/netbsd-bugs/2021/07/12/msg072460.html

---
Izumi Tsutsui
Re: High vm scan rate and dropped keystrokes thru X?
On Mon, Jul 26, 2021 at 05:53:19PM +1200, Lloyd Parkes wrote:
> It has been a very long time since I had to look at UVM stuff, but luckily
> past me posted to
> https://mail-index.netbsd.org/tech-repository/2010/02/01/msg000364.html.
> Well done past me.
>
> Copying from that post, I was using
> vm.anonmin = 10
> vm.filemin = 5
> vm.execmin = 5
> vm.anonmax = 90
> vm.filemax = 10
> vm.execmax = 30
>
> On 25/07/21 5:37 pm, Paul Ripke wrote:
> > NetBSD 9.2, amd64, 16GiB RAM, quad core + hyperthreading.
>
> Sounds normal enough.
>
> > procs   memory       page                           disks   faults           cpu
> >  r b      avm   fre  flt    re pi po    fr      sr  w0  w1   in     sy   cs us sy id
> >  0 2 12214336 86564 4043     0  0  0     0       0  66  66 2415   9142 4588  0  3 97
>
> That's 12GB of RAM in use and 86MB of RAM free. Sounds pretty awful to me.

Sounds normal to me - I don't expect to see any free RAM unless I've just
- exited a large process
- deleted a large file with large cache footprint
- released a large chunk of RAM by other means (mmap, madvise, semctl, etc).

> What does top or vmstat -s say about pages active/inactive and
> anonymous/cached file/cached executable pages. This might give you a hint
> about where all your memory has gone and what it is being used for.

A big chunk of it is in file cache, which is unsurprising when reading
thru a 400GiB file...

From top, around the time things go south - note that these firefox
processes aren't actually that busy, their percentages are normally <2%,
but the percentages spike up during periods of high scan rate. I'm pretty
sure this is just a monitoring artifact.

load averages: 3.08, 2.79, 1.87; up 41+10:40:47
130 processes: 1 runnable, 124 sleeping, 1 stopped, 4 on CPU
CPU states: 0.2% user, 0.0% nice, 36.8% system, 1.9% interrupt, 60.9% idle
Memory: 8545M Act, 3257M Inact, 441M Wired, 446M Exec, 2842M File, 8120K Free
Swap: 10G Total, 3074M Used, 7166M Free

  PID USERNAME PRI NICE   SIZE   RES STATE       TIME   WCPU    CPU COMMAND
    0 root     126    0     0K   41M CPU/7      25.7H 53.27% 53.27% [system]
 5629 stix      43    0  3512M  641M parked/4   58:59 29.69% 29.69% firefox
10881 stix      43    0  3806M  920M parked/5  304:50 14.55% 14.55% firefox
12200 stix      43    0  3276M  555M parked/2  921:49 12.99% 12.99% firefox
 6583 stix     223033M           6604K CPU/2       0:59 10.74% 10.74% zstd
19767 stix      41    0  3708M  906M CPU/6     274:46  6.05%  6.05% firefox
 1110 root      85    0   231M   29M select/1  313:02  4.49%  4.49% X
 4227 stix      85    0    60M   32M ttyraw/0   26.6H  4.05%  4.05% systat
28842 stix      85    0  2791M 1211M psem/5    700:48  1.17%  1.17% java
  981 stix      85    0    78M 4952K select/6   23:28  0.88%  0.88% xterm

Looking at 'vmstat -s' around the time of badness, I don't see anything
obvious standing out. Apart from apparently we have a bug causing several
counters to either be negative or up around int64_max...

       4096 bytes per page
          8 page colors
    4055762 pages managed
       2485 pages free
    2031945 pages active
     991997 pages inactive
          0 pages paging
     112918 pages wired
       1999 zero pages
          1 reserve pagedaemon pages
         40 reserve kernel pages
     118829 boot kernel pages
     873053 kernel pool pages
    2280040 anonymous pages
     742869 cached file pages
     114192 cached executable pages
       2048 minimum free pages
       2730 target free pages
    1351920 maximum wired pages
          1 swap devices
    2621439 swap pages
     786650 swap pages in use
    6124373 swap allocations
12709593653 total faults taken
11622482264 traps
 1587775504 device interrupts
10242360957 CPU context switches
 3331649724 software interrupts
51409703491 system calls
    6103141 pagein requests
    1446005 pageout requests
          0 pages swapped in
   11581719 pages swapped out
   19154795 forks total
    8291070 forks blocked parent
    8291070 forks shared address space w
Re: High vm scan rate and dropped keystrokes thru X?
It has been a very long time since I had to look at UVM stuff, but luckily
past me posted to
https://mail-index.netbsd.org/tech-repository/2010/02/01/msg000364.html.
Well done past me.

Copying from that post, I was using
vm.anonmin = 10
vm.filemin = 5
vm.execmin = 5
vm.anonmax = 90
vm.filemax = 10
vm.execmax = 30

On 25/07/21 5:37 pm, Paul Ripke wrote:
> NetBSD 9.2, amd64, 16GiB RAM, quad core + hyperthreading.

Sounds normal enough.

> procs   memory       page                           disks   faults           cpu
>  r b      avm   fre  flt    re pi po    fr      sr  w0  w1   in     sy   cs us sy id
>  0 2 12214336 86564 4043     0  0  0     0       0  66  66 2415   9142 4588  0  3 97

That's 12GB of RAM in use and 86MB of RAM free. Sounds pretty awful to me.

What does top or vmstat -s say about pages active/inactive and
anonymous/cached file/cached executable pages? This might give you a hint
about where all your memory has gone and what it is being used for.

Cheers,
Lloyd
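The "12GB in use, 86MB free" reading of the quoted vmstat row is a plain unit conversion. Here is a small Python sketch of it, assuming the avm and fre columns are reported in units of 1024 bytes (that is my reading of vmstat(1), so treat it as an assumption). The helper names are illustrative only. Note also that avm counts active virtual memory, which is not quite the same thing as RAM in use.

```python
KIB_PER_MIB = 1024
KIB_PER_GIB = 1024 * 1024


def kib_to_mib(kib):
    """Convert a KiB count to MiB."""
    return kib / KIB_PER_MIB


def kib_to_gib(kib):
    """Convert a KiB count to GiB."""
    return kib / KIB_PER_GIB


if __name__ == "__main__":
    # avm and fre from the first vmstat row quoted in the message.
    avm_kib, fre_kib = 12214336, 86564
    print(f"avm: {kib_to_gib(avm_kib):.2f} GiB")
    print(f"fre: {kib_to_mib(fre_kib):.1f} MiB")
```

In decimal units the same figures come to roughly 12.5GB and 88.6MB, so the rounded numbers in the message hold either way.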
High vm scan rate and dropped keystrokes thru X?
NetBSD 9.2, amd64, 16GiB RAM, quad core + hyperthreading.

I've repeatedly noticed an issue where a large amount of disk reads can
result in lost keystrokes, jerky mouse behaviour and other weirdness.

On this occasion, I was trying to "zstd -vt" a 400GiB backup archive from
an ffsv2 USB attached HDD. Normally, it hums along at 30MiB/s, and the
system is perfectly capable of other tasks. But occasionally (maybe even
once or twice a minute), the system partially wedges, drops keystrokes
(logged in via X), jerky mouse, and largely unresponsive.

"vmstat 1" during these events shows climbing runqueue, falling free
memory, high reclaim rate, very high scan rate, and 8 CPUs worth of
system time - and I hear the BIOS spinning up the CPU fan.

procs   memory       page                           disks   faults           cpu
 r b      avm   fre  flt    re pi po    fr      sr  w0  w1   in     sy   cs us sy id
 0 2 12214336 86564 4043     0  0  0     0       0  66  66 2415   9142 4588  0  3 97
 0 2 12246244 54100 4171     0  0  0     0       0   1   0 2040  16832 4405  1  1 98
 4 2 12277980 21652 4103     0  0  0     0       0  13   6 2075  13280 4222  1  3 96
 1 2 12264920 36772 3593   730  0  0 10950   44492   0   0 1934   9351 3595  1  3 96
 1 2 12297212  8880 4043     0  0  0  1080    1301   0   0 1994   8011 3396  0  1 99
 2 2 12275644 26016 3612  3942  0  0 11516   52959   7   7 2011   8851 3658  1  2 97
 1 2 12264288 37536 4370  2238  1  0 10020   44666   1   0 1975  11566 3643  1  3 96
 1 2 12296692 34392 4182     0  0  0  7347    7534  29  14 2333  16088 4706  1  2 97
 1 3 14096876 43680 1895 40019  0  0  6029  697167  10  10 1192   9816 2831  1 12 87
 3 2 12283292 18412 3240     0  0  0     0       0  54  54 2004   8961 3589  1  5 94
 3 2 12260528 42140 2571 11816  0  0 11055  233685   0   0 1559   7628 2929  0 11 88
 0 2 12292320 14028 4106     0  1  0   893     928   1   0 2134  11051 3810  1  1 98
 1 2 12277084 26908 2753  4339  0  0  8147  231620  19   8 1567  17264 3798  1 14 85
 9 1 12274492 28764 2441 22971  0  0  5247 1078197  10  13 1792  15713 4132  1 16 83
 3 1 16027172 11588 2250 18831  0  0   242 1078430 195  82 1665   7999 4199  1 17 82
 5 2 13761124 11968  161  6880  0  0   406 1078429  39  33  734   6303 1888  1 53 46
14 3 12292180 10920  276 17300  0  0    17 1078429  10 127  580   9782 1833  0 76 23
 0 3 13678700  8304  543 13567  0  0    21 1424324  30  13  713   6863 1786  1 43 56
 5 2 16608812  8212   44 17185  0  0    39 1811100   7   4  553   5491 1479  1 18 82
 7 1 15149532  8196   22  5954  0  0    11 1078667  35  33  605   4981 1571  0 62 38
 6 2 12453392  8060    9  9646  0  0     1 1078680 100 106  559   5284 1446  0 60 40
 3 2 13072560  8084   23  8158  0  0    13 1273217   1   1  433   6344 1389  0 79 21
 8 1 16595948  8060    9 15398  1  0     0 1959533   1   1  507   4450 1207  0 44 56
 3 3 15631436  7888   43  9199  0  0    47 1082027   4   2  339   4153 1121  0 61 39
 4 3 14373172  7940   10  9538  0  0    15 1078707   0   0  408   4972 1172  0 60 40
 1 6 12424060  7000  206  9925  0  0     8 1078463  20  24  628   7176 1881  0 47 53
 8 4 14632192  5888  413 13300  0  0    43 1662082  13   8  427   8161 1813  1 67 33
 8 2 16117916  6468  405 17990  0  0    53 1450183   3   1  523  10788 1787  0 69 31
 9 4 16260308  7500   31 10739  0  0    30 1201978  17  15  541   4683 1284  1 62 38
10 4 14928620  9044   27  7773  0  0    14 1078812   5   4  384   4760 1284  1 83 16
 7 3 13699356 10464   30  6783  0  0    17 1079299   3   2  392   5252 1109  0 82 18
 1 3 13023464  9600  651  9757  0  0    14 1260070  75  61  920   9891 2786  0 35 64
 0 6 16074608  9512   42 13714  0  0    32 1841906   2  11  463   5624 1379  0 37 63
13 1 16224928  9676   33 10763  0  0    88 1214395   8  12  512   4572 1360  0 32 67
28 3 14692152  9060   28  4717  0  0     0 1079161   1   0  524   3802  967  0 66 34
 3 9 12545240  8108   31 12217  0  0     5 1079165   4   2  637   6603 1341  0 44 55
12 3 13690532  7692   23 12431  0  0     8 1427075  48  48  571   4545 1474  0 55 45
10 3 15619384  7432   15  8081  0  0    10 1561463   4   2  400   4818 1289  0 66 34
25 2 16226876  7300    5 10344  0  0     0 1231134   0   0  425   4729 1140  0 98  2
14 2 16615924  7236    5  7563  0  0     1 1176532   6   6  452   4660 1232  0 99  1
38 2 15239700  7168    8  8346  0  0     0 1079291   8   7  625   4333 1127  0 99  1
 0 2 12275088 26032 3857     0  1  0 11140   11261  95  89 2127 131045 5501  3 40 58
 1 2 12265968 37412 3728     0  0  0 10115   10523  17  15 1770  16068 3847  1  4 95
 1 2 12259936 43424 3906     0  0  0  8966    9136   9   9 1864  12628 3535  1  2 97
 0 2 12292244 11120 4063     0  0  0     0       0   3   1 2087   9112 3538  1  2 97
 0 2 12287804 15964 4414     0  0  0  8524    8609   8   2 1956   8249 3440  1  3 96
 1 1 12282612 20756 3674     0  0  0  8529    8614  42  29 2126  11998 4235  1  4 95
 1 1 12265308 37156 3602     0  0  0 11299   11826   0   0 1831   7291 3287  0  3 97
 0 2 12280260 43680 3784     0  0  0  9141    9345   7   7 1941  11775 3904  0  2 97
 1 2 12290916 12452 3905     0  0  0     0       0   0   0 1933   5999 3066  0  2 98

I'm wondering if my tweaked vm sysctl's might be to blame?

vm.anonmin=30
vm.filemax=20

But they're not a huge departure from default
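One way to pick the bad intervals out of the "vmstat 1" output above is to compare the sr (pages scanned) and fr (pages freed) columns. In the healthy rows they are close to each other. In the stalled rows sr sits above a million while fr drops to almost nothing. The Python below is only a sketch of that comparison: the six (fr, sr) pairs are copied from rows of the output above, while the threshold of 100 scanned pages per freed page and all the names are hypothetical choices for illustration, not anything the kernel or vmstat defines.

```python
# (fr, sr) pairs copied from rows of the "vmstat 1" output:
# fr = pages freed, sr = pages scanned by the page daemon, per interval.
SAMPLES = [
    (10950, 44492),
    (1080, 1301),
    (5247, 1078197),
    (242, 1078430),
    (17, 1078429),
    (10115, 10523),
]

# Hypothetical threshold: flag an interval when more than this many
# pages were scanned for each page freed.
SCAN_PER_FREE_LIMIT = 100.0


def scan_per_free(fr, sr):
    """Pages scanned per page freed; infinite if pages were scanned but none freed."""
    if fr == 0:
        return float("inf") if sr > 0 else 0.0
    return sr / fr


def flag_stalls(samples, limit=SCAN_PER_FREE_LIMIT):
    """Return one boolean per sample: True where the scan/free ratio exceeds limit."""
    return [scan_per_free(fr, sr) > limit for fr, sr in samples]


if __name__ == "__main__":
    for (fr, sr), stalled in zip(SAMPLES, flag_stalls(SAMPLES)):
        mark = "STALL" if stalled else "ok"
        print(f"fr={fr:>6} sr={sr:>8} scanned/freed={scan_per_free(fr, sr):10.1f} {mark}")
```

Feeding it live data would mean splitting each vmstat line on whitespace and picking out the fr and sr fields, which this sketch leaves out.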