Hi there,

On Mon, 12 Sep 2022, Eric Tykwinski via clamav-users wrote:
I’ve been more and more moving things over to K8s from Docker ...
Could you explain that a bit more for me? My understanding was that Kubernetes and Docker were more than a little bit complementary. [1] Disclaimer: I've never actually used any of this new-fangled stuff [2] and I'm wondering if we might be able to help each other here.
just wondering if anyone is running a stateful set, IE I only want 1 server to run freshclam, but use the same defs for all other clamd
Maybe if I give you my understanding of how things hang together for clamd it will help.

As far as clamd is concerned, the signature database is read-only. It resides in a single directory. For the 'official' signatures that's at least three files - main, daily and bytecode - but if you have e.g. third-party signatures and/or your own Yara rules in the database, it can be many; there are 82 at the moment in our own database directory.

On startup the clamd daemon reads the whole thing and builds in memory a somewhat optimized representation of what it's found. The in-memory representation is, as far as the engine is concerned, itself also then read-only. It will consume on the order of a gigabyte, so it can take a while to build, during which time the engine can't scan anything. [3]

The freshclam utility is normally what changes files in the database directory, but third-party tools exist which also do that and you can even do it manually if you wish. I used to do that all the time for my Yara rules, but after years of pain I've given up on ClamAV's Yara implementation and now use a separate Yara engine with separate rules. It's much more efficient, easier to work with, and I haven't yet found anything in Yara 4.2.2 which behaves other than exactly as documented.
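To make the "single directory, at least three official files" picture concrete, here is a minimal Python sketch that looks at a database directory and separates the official databases from everything else. The function name is made up for illustration, and I'm assuming the official databases appear as main, daily and bytecode with a .cvd or .cld suffix; anything else is simply counted as "other".

```python
from pathlib import Path

# The three 'official' database names. Assumption: each appears as a .cvd
# file, or as a .cld file once incremental updates have been applied.
OFFICIAL_NAMES = ("main", "daily", "bytecode")
OFFICIAL_SUFFIXES = (".cvd", ".cld")


def summarize_database_dir(db_dir):
    """Return (official, others) for the files found in db_dir.

    official maps each official name to the filename found, or None.
    others is a sorted list of every other filename in the directory.
    """
    filenames = sorted(p.name for p in Path(db_dir).iterdir() if p.is_file())
    official = {name: None for name in OFFICIAL_NAMES}
    others = []
    for filename in filenames:
        path = Path(filename)
        if path.stem in official and path.suffix in OFFICIAL_SUFFIXES:
            official[path.stem] = filename
        else:
            others.append(filename)
    return official, others
```

You would call it as summarize_database_dir("/var/lib/clamav"), where that path is only a guess at a common default; the real location is whatever DatabaseDirectory says in your clamd.conf. This only reads the directory listing, so it is safe to run while clamd is scanning.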
I’m assuming I can just put Example in freshclam.conf, and send a clamdscan --reload to the service to hit them all?
After the database has been read, clamd does not read any of the files again until it's time to reload the whole thing. This can be because clamd itself detects some change in one of the files in the directory (there's an internal timeout specified in clamd's configuration file) or because an instruction is sent to clamd via the socket on which it is listening. You can command a 'RELOAD' using clamdscan or simply by sending the command to the socket, e.g. using 'telnet' or 'socat' from the command line after modifying signature files. After an update, freshclam sends the command itself if so configured. All the methods of causing a reload have exactly the same effect. They aren't mutually exclusive, but I don't know what might happen if you tried to use two of them at once. :)

I haven't thought about what advantage, if any, might result from using multiple clamd daemons running in containers as compared with running more threads on a single clamd server. My gut feel is that it would probably be more efficient just to run more threads. By now I guess that's a pretty well tested approach, and there have been issues with containers but I'm not well informed about them. The issues on github are probably the best place to look for that kind of thing.

Now I'm going into the realms of conjecture. Because you might have multiple clamd daemons running on a single host sharing resources with some containerization method, and because the in-memory representation of the signature database is AFAIK read-only, it stands to reason that you might carry the abstraction beyond containerization, sharing memory between all the daemons. I'd bet that would take serious coding if it were to be done explicitly, but you might be able to get it to work by accident (almost) in a container environment.
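Going back to the reload for a moment: sending the command to the socket by hand, as with 'telnet' or 'socat', can also be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are mine, I'm using the newline-delimited form of the command (an 'n' prefix and a trailing newline), and I'd expect clamd to answer with something like RELOADING, but check that against your own version.

```python
import socket


def send_clamd_command(sock, command="RELOAD"):
    """Send one newline-delimited command on a connected socket.

    Returns the first reply line as text, or whatever was received
    if the peer closes the connection early.
    """
    sock.sendall(b"n" + command.encode("ascii") + b"\n")
    reply = b""
    while not reply.endswith(b"\n"):
        chunk = sock.recv(4096)
        if not chunk:  # peer closed the connection
            break
        reply += chunk
    return reply.decode("ascii", "replace").strip()


def reload_clamd(socket_path, timeout=30.0):
    """Connect to clamd's local (Unix) socket and request a reload.

    socket_path is whatever LocalSocket is set to in clamd.conf.
    """
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        sock.connect(socket_path)
        return send_clamd_command(sock, "RELOAD")
```

For several clamd instances listening on TCP rather than a local socket, the same send_clamd_command could be handed a socket from socket.create_connection((host, port)) for each instance in turn, which also gives you a natural place to pause between reloads.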
The problem I see is that each time an instance of clamd reloads its database it will write all over its in-memory representation and mess up any optimizations of the copy-on-write variety that the OS has probably already done. But fundamentally, as far as the scanner is concerned, the ruleset is just a pointer to a structure.

[1] https://containerjournal.com/topics/container-ecosystems/kubernetes-vs-docker-a-primer/

[2] I use VMs. I feel I don't know nearly enough about containers to use them safely. [4]

[3] For this reason a recent improvement has been that the engine can have a second, entirely separate in-memory representation, which it can build while using the first for scanning. This means that while it is building the second it can use up to twice as much memory, but after the second is built, the first chunk of memory will be freed and returned to the OS. For that reason, I think you might *not* want all your clamds to reload at once.

[4] e.g. https://containerjournal.com/features/the-state-of-k8s-software-supply-chain-attacks/

HTH

A final question: You're putting quite a bit of work into this. Do you have a feel for the probability that clamd will find what you're asking it to look for?

--
73, Ged.