On Thu 23-02-17 14:31:24, Vitaly Kuznetsov wrote:
> Michal Hocko <mho...@kernel.org> writes:
>
> > On Wed 22-02-17 10:32:34, Vitaly Kuznetsov wrote:
> > [...]
> >> > There is a workaround in that a user could online the memory or have
> >> > a udev rule to online the memory by using the sysfs interface. The
> >> > sysfs interface to online memory goes through device_online(), which
> >> > should update the dev->offline flag. I'm not sure that having kernel
> >> > memory hotplug rely on userspace actions is the correct way to go.
> >>
> >> Using a udev rule for memory onlining is possible when you disable
> >> memhp_auto_online, but in some cases it doesn't work well. E.g. when we
> >> use memory hotplug to address memory pressure, the round trip through
> >> userspace is really slow and memory consuming; we may hit OOM before we
> >> manage to online the newly added memory.
> >
> > How does the in-kernel implementation prevent that?
>
> Onlining memory on hot-plug is much more reliable: e.g. if we were able
> to add it in add_memory_resource(), we'll also manage to online it.
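For context, the userspace onlining path discussed above is just a write to the standard Linux sysfs memory-hotplug interface; a udev rule performs essentially the same write. A hedged sketch (block number 42 is illustrative, and the write needs root plus an actually offline block):

```shell
#!/bin/sh
# Sketch of the userspace onlining path discussed in the thread.
# Block number 42 is illustrative; a udev rule would do the same write.
BLK=/sys/devices/system/memory/memory42/state
if [ -w "$BLK" ]; then
    # This write goes through device_online() in the kernel, which also
    # updates dev->offline.
    echo online > "$BLK"
    cat "$BLK"
else
    echo "memory block not present or not writable (need root and a hotplugged block)"
fi
```

The udev rule that systemd upstream refused to carry matches newly added memory blocks (`SUBSYSTEM=="memory"`, `ACTION=="add"`) and writes `online` to the same `state` attribute.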
How does that differ from initiating the onlining from userspace?

> With a udev rule we may end up adding many blocks and then (as udev is
> asynchronous) failing to online any of them.

Why would it fail?

> The in-kernel operation is synchronous.

Which doesn't mean anything, as the context is preemptible AFAICS.

> >> In addition to that, the systemd/udev folks have continuously refused
> >> to add this udev rule to udev, calling it stupid, as it actually is an
> >> unconditional and redundant ping-pong between the kernel and udev.
> >
> > This is a policy and as such it doesn't belong in the kernel. The whole
> > auto-enable in the kernel is just plain wrong IMHO and we shouldn't
> > have merged it.
>
> I disagree.
>
> First of all, it's not a policy, it is a default. We have many other
> defaults in the kernel. When I add a network card or storage, for
> example, I don't need to go anywhere and 'enable' it before I'm able to
> use it from userspace. Yet for memory (and CPUs) we, for some unknown
> reason, opted for something completely different. If someone is plugging
> new memory into a box, he probably wants to use it; I don't see much
> value in waiting for a special confirmation from him.

This was not my decision, so I can only guess, but to me it makes sense.
Both memory and CPUs can be physically present and offline, which is a
perfectly reasonable state, so the two-phase physical hotadd is simply
built on top of the physical vs. logical distinction. I completely
understand that some use cases would really like to online the whole
node as soon as it appears present. But an automatic in-kernel
implementation has its downsides - e.g. if the operation fails in the
middle, you will not know about it unless you check all the memblocks in
sysfs. That is really a poor interface.

> Second, this feature is optional. If you want to keep the old behavior,
> just don't enable it.

It just adds unnecessary configuration noise as well.

> Third, this solves real-world issues.
> With Hyper-V it is very easy to
> show udev failing under stress.

What is the reason for these failures? Do you have any link handy?

> No other solution to the issue was ever suggested.

You mean like using ballooning for memory overcommit, like other more
reasonable virtualization solutions do?
--
Michal Hocko
SUSE Labs
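[Editor's aside on the "check all the memblocks in sysfs" point above: that manual audit can be sketched as a small shell loop over the standard sysfs memory-hotplug layout. This is only an illustration of what a user is left doing if an automatic online fails midway, which is precisely the poor-interface complaint.]

```shell
#!/bin/sh
# Sketch: scan every memory block in sysfs and report any still offline,
# i.e. the manual check needed if an auto-online fails partway through.
MEMDIR=/sys/devices/system/memory
offline=0
for state in "$MEMDIR"/memory*/state; do
    [ -e "$state" ] || continue    # glob may not match (no hotplug support)
    if grep -q '^offline' "$state"; then
        echo "still offline: $(dirname "$state")"
        offline=$((offline + 1))
    fi
done
echo "offline blocks: $offline"
```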