> > On 02.09.2020 at 11:53, Vlastimil Babka <vba...@suse.cz> wrote:
> >
> > On 8/28/20 6:47 PM, Pavel Tatashin wrote:
> >> There appears to be another problem that is related to the
> >> cgroup_mutex -> mem_hotplug_lock deadlock described above.
> >>
> >> In the original deadlock that I described, the workaround is to
> >> replace crash dump from piping to Linux traditional save to files
> >> method. However, after trying this workaround, I still observed
> >> hardware watchdog resets during machine shutdown.
> >>
> >> The new problem occurs for the following reason: upon shutdown systemd
> >> calls a service that hot-removes memory, and if hot-removing fails for
> >
> > Why is that hotremove even needed if we're shutting down? Are there any
> > (virtualization?) platforms where it makes some difference over plain
> > shutdown/restart?
>
> If all it's doing is offlining random memory that sounds unnecessary and
> dangerous. Any pointers to this service so we can figure out what it's doing
> and why? (Arch? Hypervisor?)
Hi David,

This is how we are using it at Microsoft: there is a very large number of
small-memory machines (8G each) with tight downtime requirements (a reboot
must take under a second). There is also a large amount of state, about 2G,
that we need to carry across the reboot, because recreating it is very
expensive.

We have 2G of system memory reserved as pmem in the device tree, and we use
it to pass information across reboots. Once the information is no longer
needed, we hot-add that memory and use it at runtime. Before shutdown, we
hot-remove the 2G, save the program state on it, and reboot.

Pasha
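For readers unfamiliar with the shutdown step described above, here is a minimal sketch of its offlining half, assuming the 2G region is exposed as ordinary hotpluggable memory blocks under the standard memory-hotplug sysfs interface (/sys/devices/system/memory). The function names, the physical base address, and the fake-sysfs `root` parameter are hypothetical illustrations, not the actual service discussed in this thread. Offlining is the step that can fail (for example with EBUSY when pages cannot be migrated), which is why the sketch lets the error propagate instead of silently continuing.

```python
import os

# Standard memory-hotplug sysfs root; overridable for testing (hypothetical parameter).
SYSFS_MEMORY = "/sys/devices/system/memory"


def block_size_bytes(root=SYSFS_MEMORY):
    """Read the memory block size; the kernel reports it as a hex string."""
    with open(os.path.join(root, "block_size_bytes")) as f:
        return int(f.read().strip(), 16)


def blocks_for_range(start, size, block_size):
    """Return indices of the memory blocks covering [start, start + size)."""
    if size <= 0:
        return []
    first = start // block_size
    last = (start + size - 1) // block_size
    return list(range(first, last + 1))


def set_range_state(start, size, state, root=SYSFS_MEMORY):
    """Write 'online' or 'offline' to every memory block in the range.

    A failed write (e.g. EBUSY while offlining) raises OSError, so the
    caller can decide whether it is still safe to save state into the
    region and reboot. This is a sketch, not a hardened implementation.
    """
    bsize = block_size_bytes(root)
    for idx in blocks_for_range(start, size, bsize):
        path = os.path.join(root, f"memory{idx}", "state")
        with open(path, "w") as f:
            f.write(state)
```

As a hypothetical usage example, if the reserved 2G sat at physical address 6G on an 8G machine with 128M blocks, `set_range_state(6 << 30, 2 << 30, "offline")` would offline memory48 through memory63 before the state is written and the machine reboots.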