[Please ignore the previous message I sent on this topic. I accidentally pressed 'Send' before my message was complete.]
On 22/06/2019 19:52, cho...@jtan.com wrote:
> Lyndon Nerenberg writes:
>> We are looking forward to that. *However*, there is a lot to be
>> said for regularly re-installing your hosts from scratch. This
>> ensures your installer scripts don't rot as host system "features"
>> accrete over time. This is prone to happen when you Ansible- or
>
> Or as I like to put it: Reboot* often, to ensure that you can. Uptime is
> overrated.

In my experience, there are indeed benefits to rebooting production
servers on a scheduled maintenance basis. Here are three example
problems that it could help with:

1. If processes have been running for a long time, there is some chance
   that the system is suffering from memory fragmentation. This can make
   your server slower. I think it could also, or alternatively, trigger
   an OOM.

2. Untested changes could have been deployed since the last reboot. They
   might have unpredictable effects on the startup scripts.

3. The startup scripts might no longer work _at all_ if the server has
   been in continual operation for a long time, such as five years. This
   can happen due to the phenomenon known as "bit rot".

Some benefits of a regular, scheduled reboot cycle:

1. Rebooting will clear up memory fragmentation.

2. Rebooting will improve confidence that it is possible to reboot the
   server in a clean way and that the startup scripts still work. After
   the initial boot, the server will progress to its intended runtime
   state. ("Have you tried turning it off and then back on again?")
   Having this kind of confidence is particularly important when a
   server crashes or when you need to perform unscheduled maintenance to
   deploy an urgent hotfix.

Another thought literally just occurred to me. Regular _unscheduled_
reboots seem like a typical chaos engineering technique. I haven't
investigated chaos engineering closely but I'd be surprised if it isn't.

Andrew
--
OpenPGP key: EB28 0338 28B7 19DA DAB0 B193 D21D 996E 883B E5B9