Well I've come up with a multi-pronged solution, after much experimentation, that keeps load in the single digits throughout the entire certmonger startup process.
First, I've learned more about zram swap, namely that the size specification is not the physical RAM used but the virtual swap size created. From observation, with FreeIPA running and certmonger's forked processes consuming memory, I've found a ~2.8:1 memory savings from the combination of page compression and duplicate pages not being stored multiple times. So while the swap usage peaked at ~1.3GB, the physical memory used by the swap was only ~462MB. This is important because it means I can use more zram swap and avoid using a swapfile on the SD card entirely. However, Fedora's default zram swap configuration doesn't let you configure a swap size larger than physical memory: it expects a factor X and allocates 1/X of memory to zram swap, and because the script uses bash integer arithmetic you can't specify a decimal (e.g. 0.5). So I copied the zram startup script to /opt and changed it to use a different config parameter that directly specifies the swap size, then used 'systemctl edit zram-swap.service' to override ExecStart so it runs the modified script, allowing me to allocate a 2GB zram swap.
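To make the arithmetic in the paragraph above concrete, here is a rough sketch. It is not Fedora's actual zramstart script: the variable names are made up for illustration, the 1 GiB RAM figure assumes a Pi 3, and the 1300/462 figures are just the approximate peak numbers quoted above. It shows why a factor-based size can never exceed physical RAM when only integer math is available, and where the ~2.8:1 figure comes from.

```shell
# Hypothetical illustration only; names are not taken from Fedora's zramstart.
mem_kb=1048576   # assumption: 1 GiB of physical RAM, as on a Pi 3

# Factor-based sizing: zram size = RAM / FACTOR, using shell integer math.
# The smallest usable factor is 1, so the result can never exceed mem_kb.
factor=2
size_by_factor_kb=$(( mem_kb / factor ))

# A fractional factor such as 0.5 is rejected by shell arithmetic.
# Run it in a subshell so the arithmetic error cannot abort this script.
if ( : $(( mem_kb / 0.5 )) ) 2>/dev/null; then
    fractional_ok=yes
else
    fractional_ok=no
fi

# Observed savings: ~1300 MB of swapped data held in ~462 MB of RAM.
# Computed in tenths because the shell has no floating point.
ratio_tenths=$(( 1300 * 10 / 462 ))

echo "size_by_factor_kb=$size_by_factor_kb"
echo "fractional_ok=$fractional_ok"
echo "ratio=$(( ratio_tenths / 10 )).$(( ratio_tenths % 10 )):1"
```

With factor=2 this yields a 512 MiB zram device on a 1 GiB board, which is why a parameter that sets the size directly is needed to get to 2GB.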
'systemctl edit zram-swap' :

    [Service]
    ExecStart=
    ExecStart=/opt/zramstart

Diff of /opt/zramstart:

    # diff -bB /usr/sbin/zramstart /opt/zramstart
    14a15
    > [ -z "$SIZE" ] || zram_size=$SIZE

Second, I used 'systemctl edit certmonger.service' to modify certmonger's service file to specify CPUQuota, to prevent it from clobbering all the other normal processes when it fork bombs:

    [Service]
    CPUQuota=20%

Third, I disabled the certmonger service so it doesn't auto-start at boot, and instead created a systemd timer, certmonger.timer, that starts it 5 minutes after boot so everything else can start up first before the system gets hammered:

    [Unit]
    Description=Run certmonger after boot settles down

    [Timer]
    OnBootSec=5min

    [Install]
    WantedBy=timers.target

All of these changes *should* survive any system updates as well, since no packaged systemd units or similar files were edited directly, so that's an added bonus: no need to remember to re-tweak things after an update.

With all of the above changes, I'm able to boot, the FreeIPA services all start as normal (except certmonger), then a few minutes later certmonger starts, and load never goes above 10, staying mostly around 5, until certmonger's forked processes all finally finish. It takes about an hour, but that's ~2x faster than letting it run with no CPUQuota, where load gets over 40 in short order and the system becomes mostly unresponsive (and that's even with the modified zram swap; without it, the system simply runs out of memory unless I add a swapfile, which kills performance even more). Even during certmonger startup, DNS/LDAP/etc. are responsive, and thus the Pi is usable for our purposes as a local replica, ensuring that offices that lack a full-fat FreeIPA installation on real server hardware won't become useless if their VPN connection to a site that does have a full installation goes down.
Ensuring local redundancy so that regular work can continue as normal if there's a network outage is the goal of using the Pi after all, and thus the desired result has been achieved after some tinkering.

On Thu, May 16, 2019 at 5:37 PM Jonathan Vaughn <[email protected]> wrote:

> The many certmonger processes exceed the available RAM (Pi 3 having 1GB)
> by a wide margin and cause heavy swapping as they all try to run at once,
> and the heavy swapping itself is the reason load gets so high. If it was
> one at a time they might still encounter some swapping (or might not, but
> it should be doable with just zram swap instead of needing physical swap,
> which would mean minimal load hit). I don't know if they wait on a lock at
> some point, but they're definitely all kicking off at nearly the same time
> and even if they end up pausing when they reach a certain point the system
> spends a long time swapping constantly trying to load all of the processes
> into memory at once.
>
> I haven't timed it but it takes at least double digit minutes for load to
> recover from 30+ to a "normal" load of less than 5 (at idle, with the other
> non-CA FreeIPA services running, and minimal activity, load is around 4 +-
> a bit).
>
> If we can't find a solution to tame certmonger's behavior I am considering
> just scheduling certmonger to run once a day or week or whatever at a
> preset time outside the normal operating hours for the office that the Pi
> happens to be located in, which would at least reduce the impact to just
> being very annoying.
>
> On Wed, May 15, 2019 at 9:00 PM Fraser Tweedale <[email protected]>
> wrote:
>
>> On Wed, May 15, 2019 at 05:15:38PM -0400, Rob Crittenden via
>> FreeIPA-users wrote:
>> > Jonathan Vaughn via FreeIPA-users wrote:
>> > > I previously had tested FreeIPA running on a Raspberry Pi 3B+ and as
>> > > long as I didn't run the Dogtag server on it performance seemed
>> > > acceptable for the purpose. These are only being used as local
>> > > DNS/LDAP/Krb5 replicas, everything also runs on both physical x86_64 and
>> > > VM x86_64 servers as well in more than one location.
>> >
>> > It is STRONGLY not recommended to run IPA in production on *Pi. If you
>> > have you and your wife on some local LAN then maybe.
>> >
>> > > However now that I'm trying to set up Pis for actual use (previously had
>> > > set up a test environment to validate using them) I'm running into major
>> > > performance issues once certmonger starts. Using a systemd timer to
>> > > delay start until everything else starts at least lets everything else
>> > > FreeIPA related start up and work, but once certmonger starts it still
>> > > hammers the system using tons of memory and causing lots of swapping.
>> > >
>> > > Is there any reason for it to spawn so many processes all at once,
>> > > versus doing them in a more serial fashion? And did something change in
>> > > FreeIPA/certmonger behavior in the last year that would cause such a
>> > > performance regression in memory limited scenarios? Previously I just
>> > > had zram swap and it was fine, now I have to replace that with actual
>> > > swap on storage.
>> >
>> > Hard to say since you include no version information.
>> >
>> > > Also, there's currently no certs needing renewal or anything on this
>> > > system, so why does it even spawn so many processes ?
>> > >
>> > > root 1699 1 0 03:55 ? 00:00:00 /usr/sbin/certmonger -S -p /var/run/certmonger.pid -n
>> > > root 1720 1699 0 03:55 ? 00:00:00 /usr/bin/python3 -E /usr/libexec/certmonger/ipa-server-guard /usr/libexec/certmonger/ipa-submit
>> > > root 1721 1699 0 03:55 ? 00:00:00 /usr/bin/python3 -E /usr/libexec/certmonger/ipa-server-guard /usr/libexec/certmonger/ipa-submit
>> > > root 1722 1699 0 03:55 ? 00:00:00 /usr/bin/python3 -E /usr/libexec/certmonger/ipa-server-guard /usr/libexec/certmonger/ipa-submit
>> > > root 1723 1699 0 03:55 ? 00:00:00 /usr/bin/python3 -E /usr/libexec/certmonger/ipa-server-guard /usr/libexec/certmonger/ipa-submit
>> > > root 1724 1699 0 03:55 ? 00:00:00 /usr/bin/python3 -E /usr/libexec/certmonger/ipa-server-guard /usr/libexec/certmonger/ipa-submit
>> > > root 1725 1699 0 03:55 ? 00:00:00 /usr/bin/python3 -E /usr/libexec/certmonger/ipa-server-guard /usr/libexec/certmonger/ipa-submit
>> > > root 1726 1699 0 03:55 ? 00:00:00 /usr/bin/python3 -E /usr/libexec/certmonger/ipa-server-guard /usr/libexec/certmonger/ipa-submit
>> > > root 1727 1699 0 03:55 ? 00:00:00 /usr/bin/python3 -E /usr/libexec/certmonger/ipa-server-guard /usr/libexec/certmonger/ipa-submit
>> > > root 1742 1699 0 03:55 ? 00:00:00 /usr/libexec/certmonger/dogtag-ipa-renew-agent-submit
>> > > root 1759 1699 0 03:56 ? 00:00:00 /usr/bin/python3 -E /usr/libexec/certmonger/dogtag-ipa-ca-renew-agent-submit
>> > > root 1761 1699 0 03:56 ? 00:00:00 /usr/bin/python3 -E /usr/libexec/certmonger/dogtag-ipa-ca-renew-agent-submit
>> > > root 1762 1699 0 03:56 ? 00:00:00 /usr/bin/python3 -E /usr/libexec/certmonger/dogtag-ipa-ca-renew-agent-submit
>> > > root 1763 1699 0 03:56 ? 00:00:00 /usr/bin/python3 -E /usr/libexec/certmonger/dogtag-ipa-ca-renew-agent-submit
>> > > root 1764 1699 0 03:56 ? 00:00:00 /usr/bin/python3 -E /usr/libexec/certmonger/dogtag-ipa-ca-renew-agent-submit
>> > > root 1765 1699 0 03:56 ? 00:00:00 /usr/bin/python3 -E /usr/libexec/certmonger/dogtag-ipa-ca-renew-agent-submit
>> > > root 1767 1699 0 03:56 ? 00:00:00 /usr/bin/python3 -E /usr/libexec/certmonger/dogtag-ipa-ca-renew-agent-submit
>> > > root 1768 1699 0 03:56 ? 00:00:00 /usr/bin/python3 -E /usr/libexec/certmonger/dogtag-ipa-ca-renew-agent-submit
>> > > root 1769 1699 0 03:56 ? 00:00:00 /usr/bin/python3 -E /usr/libexec/certmonger/dogtag-ipa-ca-renew-agent-submit --reuse-existing
>> > > root 1770 1699 0 03:56 ? 00:00:00 /usr/bin/python3 -E /usr/libexec/certmonger/dogtag-ipa-ca-renew-agent-submit --reuse-existing
>> > > root 1771 1699 0 03:56 ? 00:00:00 /usr/bin/python3 -E /usr/libexec/certmonger/dogtag-ipa-ca-renew-agent-submit --reuse-existing
>> > > root 1772 1699 0 03:56 ? 00:00:00 /usr/bin/python3 -E /usr/libexec/certmonger/dogtag-ipa-ca-renew-agent-submit --reuse-existing
>> > > root 1773 1699 0 03:56 ? 00:00:00 /usr/bin/python3 -E /usr/libexec/certmonger/dogtag-ipa-ca-renew-agent-submit --reuse-existing
>> > > root 1774 1699 0 03:56 ? 00:00:00 /usr/bin/python3 -E /usr/libexec/certmonger/dogtag-ipa-ca-renew-agent-submit --reuse-existing
>> > > root 1775 1699 0 03:56 ? 00:00:00 /usr/bin/python3 -E /usr/libexec/certmonger/dogtag-ipa-ca-renew-agent-submit --reuse-existing
>> > > root 1776 1699 0 03:57 ? 00:00:00 /usr/bin/python3 -E /usr/libexec/certmonger/dogtag-ipa-ca-renew-agent-submit --reuse-existing
>> > >
>> > > Eventually these complete and things settle down but it takes a very
>> > > long time, and without delaying certmonger until after the rest of
>> > > FreeIPA it can cause various IPA services to take so long that they die
>> > > and fail to start.
>> >
>> > On startup certmonger examines all the certs to see if, for example, the
>> > roots have changed. There are all the processes because there is one per
>> > tracked cert I assume. There is serialization in the IPA certmonger
>> > config (ipa-server-guard) so they go one at at time.
>> >
>> Do they busy-wait on the lock? Maybe that is why the load is so
>> high?
>>
>> I echo Rob's comments about Raspberry Pi. For sure there is room to
>> improve performance, but a future where FreeIPA runs well on such
>> low-spec machines... it is hard to imagine, and not something we're
>> aiming for.
>>
>> Thanks,
>> Fraser
>>
>>
>> > rob
>> > _______________________________________________
>> > FreeIPA-users mailing list -- [email protected]
>> > To unsubscribe send an email to [email protected]
>> > Fedora Code of Conduct: https://getfedora.org/code-of-conduct.html
>> > List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
>> > List Archives:
>> > https://lists.fedorahosted.org/archives/list/[email protected]
>> >
