Heho,
I am running a small setup where, recently, a user's border router VMs caused 
prolonged and consistent disk load on the virtualization nodes: low bandwidth 
(2-3mb/s) but high utilization (many IOPS) (more of a writeup at [1]).

After a bit of digging, we found that this was caused by rpki-client, mostly 
because /var/cache/rpki-client consists of 'lots of small files' (~230k) that 
are opened and closed one after another during validation, with atime updates 
probably doing the rest of the damage. This led to rpki-client running for 
~30-60 minutes, sometimes dying after exceeding its 3600-second runtime limit. 
That the problem became so pronounced may also be related to the recent RPKI 
blow-up caused by some experiments (I can't find a fitting link right now, 
though; I recall Cloudflare suffered from some DB bloat in their validators 
because of it).

I ultimately resorted to trying an mfs on /var/cache/rpki-client. This worked 
surprisingly well: it (naturally) removed all disk I/O and cut the rpki-client 
runtime from ~30 min to ~16 min (the CPUs aren't the freshest, so this is fine, 
I guess). The trade-off, of course, is a full sync after every reboot.
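
For anyone wanting to check whether their own cache is in the same 'lots of 
small files' territory, and roughly how big an mfs would have to be to hold it, 
here is a minimal Python sketch that walks the cache and counts files, 
directories and bytes. It is not part of rpki-client; the default path is the 
one above, and the script and its output format are just my own illustration.

```python
import os
import stat
import sys
from typing import Tuple


def cache_stats(root: str) -> Tuple[int, int, int]:
    """Return (regular files, directories, total bytes) under root.

    Symlinks are not followed (os.walk default) and are not counted.
    """
    files = dirs = total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        dirs += 1  # counts dirpath itself, including root
        for name in filenames:
            try:
                st = os.lstat(os.path.join(dirpath, name))
            except OSError:
                # rpki-client may replace or remove files while we walk
                continue
            if stat.S_ISREG(st.st_mode):
                files += 1
                total += st.st_size  # logical size, not blocks allocated
    return files, dirs, total


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "/var/cache/rpki-client"
    n_files, n_dirs, n_bytes = cache_stats(root)
    mean = n_bytes / n_files if n_files else 0
    print(f"{n_files} files in {n_dirs} directories, "
          f"{n_bytes / 2**20:.1f} MiB total, {mean:.0f} bytes/file on average")
```

Every file and directory needs its own inode, so files plus directories is a 
lower bound on the inodes the mfs has to provide. The byte total is the sum of 
logical file sizes and ignores fragment overhead, so the space actually used on 
the mfs will be somewhat higher.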

I reckon this is mostly an artifact of spinning disks being used for storage in 
the virtualization environment, but it makes me wonder whether it would make 
sense to mention this I/O pattern (and the mfs workaround) in the man page. I'd 
like to hear some opinions, though, before actually suggesting the change or 
typing up a fitting section.

With best regards,
Tobias

[1] https://doing-stupid-things.as59645.net/networking/bgp/nsfp/2022/07/31/making-it-ping-part-5.html