[PATCH v2 0/8] mm/memory_hotplug: allow to specify a default online_type

2020-03-17 Thread David Hildenbrand
Distributions nowadays use udev rules ([1] [2]) to specify if and
how to online hotplugged memory. The rules seem to get more complex with
many special cases. Due to the various special cases,
CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE cannot be used. All memory hotplug
is handled via udev rules.

Every time we hotplug memory, the udev rule will come to the same
conclusion. Especially Hyper-V (but also soon virtio-mem) add a lot of
memory in separate memory blocks and wait for memory to get onlined by user
space before continuing to add more memory blocks (to not add memory faster
than it is getting onlined). This of course slows down the whole memory
hotplug process.

To make the job of distributions easier and to avoid udev rules that get
more and more complicated, let's extend the mechanism provided by
- /sys/devices/system/memory/auto_online_blocks
- "memhp_default_state=" on the kernel cmdline
to also accept "online_movable" as well as "online_kernel".
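
As a rough illustration of the interface described above, the following
sketch validates a policy and writes it to auto_online_blocks. The helper
name set_default_online_type and its optional second argument are made up
for this example; the accepted values are assumed to be "offline", "online",
"online_kernel" and "online_movable".

```shell
#!/bin/bash
# Sketch only: set_default_online_type is a hypothetical helper and is
# not part of this series.
#
# $1: online_type to use for memory hotplugged in the future
# $2: file to write to (defaults to the sysfs file named above)
set_default_online_type() {
  local policy="$1"
  local target="${2:-/sys/devices/system/memory/auto_online_blocks}"

  # Reject anything that is not one of the assumed valid values
  case "$policy" in
    offline|online|online_kernel|online_movable) ;;
    *)
      echo "unsupported online_type: $policy" >&2
      return 2
      ;;
  esac

  # Old kernels or missing permissions end up here
  if [ ! -w "$target" ]; then
    echo "cannot write $target" >&2
    return 1
  fi

  echo "$policy" > "$target"
}
```

Run as root, "set_default_online_type online_movable" would be the runtime
counterpart of booting with "memhp_default_state=online_movable".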

v1 -> v2:
- Tweaked some patch descriptions
- Added
-- "powernv/memtrace: always online added memory blocks"
-- "hv_balloon: don't check for memhp_auto_online manually"
-- "mm/memory_hotplug: unexport memhp_auto_online"
- "mm/memory_hotplug: convert memhp_auto_online to store an online_type"
-- No longer touches hv/memtrace code


=== Example /usr/libexec/config-memhotplug ===

#!/bin/bash

VIRT=$(systemd-detect-virt --vm)
ARCH=$(uname -p)

# Succeeds if the virtio_mem driver directory contains at least one symlink
sense_virtio_mem() {
  if [ -d "/sys/bus/virtio/drivers/virtio_mem/" ]; then
    DEVICES=$(find /sys/bus/virtio/drivers/virtio_mem/ -maxdepth 1 -type l | wc -l)
    if [ "$DEVICES" != "0" ]; then
      return 0
    fi
  fi
  return 1
}

if [ ! -e "/sys/devices/system/memory/auto_online_blocks" ]; then
  echo "Memory hotplug configuration support missing in the kernel"
  exit 1
fi

if grep -q "memhp_default_state=" /proc/cmdline; then
  echo "Memory hotplug configuration overridden in kernel cmdline (memhp_default_state=)"
  exit 1
fi

if [ "$VIRT" == "microsoft" ]; then
  echo "Detected Hyper-V on $ARCH"
  # Hyper-V wants all memory in ZONE_NORMAL
  ONLINE_TYPE="online_kernel"
elif sense_virtio_mem; then
  echo "Detected virtio-mem on $ARCH"
  # virtio-mem wants all memory in ZONE_NORMAL
  ONLINE_TYPE="online_kernel"
elif [ "$ARCH" == "s390x" ] || [ "$ARCH" == "s390" ]; then
  echo "Detected $ARCH"
  # standby memory should not be onlined automatically
  ONLINE_TYPE="offline"
elif [ "$ARCH" == "ppc64" ] || [ "$ARCH" == "ppc64le" ]; then
  echo "Detected $ARCH"
  # PPC64 onlines all hotplugged memory right from the kernel
  ONLINE_TYPE="offline"
elif [ "$VIRT" == "none" ]; then
  echo "Detected bare-metal on $ARCH"
  # Bare metal users expect hotplugged memory to be unpluggable. We assume
  # that ZONE imbalances on such enterprise servers cannot happen and that
  # this is properly documented
  ONLINE_TYPE="online_movable"
else
  # TODO: Hypervisors that want to unplug DIMMs and can guarantee that ZONE
  # imbalances won't happen
  echo "Detected $VIRT on $ARCH"
  # Usually, ballooning is used in virtual environments, so memory should go to
  # ZONE_NORMAL. However, sometimes "movable_node" is relevant.
  ONLINE_TYPE="online"
fi

echo "Selected online_type: $ONLINE_TYPE"

# Configure what to do with memory that will be hotplugged in the future
if ! echo "$ONLINE_TYPE" 2>/dev/null > /sys/devices/system/memory/auto_online_blocks; then
  echo "Memory hotplug cannot be configured (e.g., old kernel or missing permissions)"
  # A backup udev rule should handle old kernels if necessary
  exit 1
fi

# Process all already plugged blocks (e.g., DIMMs, but also Hyper-V or virtio-mem)
if [ "$ONLINE_TYPE" != "offline" ]; then
  for MEMORY in /sys/devices/system/memory/memory*; do
    STATE=$(cat "$MEMORY/state")
    if [ "$STATE" == "offline" ]; then
      echo "$ONLINE_TYPE" > "$MEMORY/state"
    fi
  done
fi
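
The cmdline check in the script only tests whether the parameter is present.
If the configured value is of interest as well, it could be extracted roughly
like this. cmdline_memhp_state is a hypothetical helper, and letting the last
occurrence win is an assumption about how repeated parameters are handled.

```shell
#!/bin/bash
# Sketch only: cmdline_memhp_state is a hypothetical helper.
#
# $1: a kernel command line string
# Prints the value of memhp_default_state= and returns 0 if the parameter
# is present; prints nothing and returns 1 otherwise.
cmdline_memhp_state() {
  local -a args
  local arg value="" found=1

  # Split on whitespace without triggering pathname expansion
  read -r -a args <<< "$1"

  for arg in "${args[@]}"; do
    case "$arg" in
      memhp_default_state=*)
        # Assumption: a later occurrence overrides an earlier one
        value="${arg#memhp_default_state=}"
        found=0
        ;;
    esac
  done

  if [ "$found" -eq 0 ]; then
    echo "$value"
  fi
  return "$found"
}
```

On a running system this would be called as
cmdline_memhp_state "$(cat /proc/cmdline)".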


=== Example /usr/lib/systemd/system/config-memhotplug.service ===

[Unit]
Description=Configure memory hotplug behavior
DefaultDependencies=no
Conflicts=shutdown.target
Before=sysinit.target shutdown.target
After=systemd-modules-load.service
ConditionPathExists=|/sys/devices/system/memory/auto_online_blocks

[Service]
ExecStart=/usr/libexec/config-memhotplug
Type=oneshot
TimeoutSec=0
RemainAfterExit=yes

[Install]
WantedBy=sysinit.target


=== Example modification to the 40-redhat.rules [2] ===

diff --git a/40-redhat.rules b/40-redhat.rules-new
index 2c690e5..168fd03 100644
--- a/40-redhat.rules
+++ b/40-redhat.rules-new
@@ -6,6 +6,9 @@ SUBSYSTEM=="cpu", ACTION=="add", TEST=="online", ATTR{online}=="0", ATTR{online}
 # Memory hotadd request
 SUBSYSTEM!="memory", GOTO="memory_hotplug_end"
 ACTION!="add", GOTO="memory_hotplug_end"
+# memory hotplug behavior configured
+PROGRAM=="grep online /sys/devices/system/memory/auto_online_blocks", GOTO="memory_hotplug_end"
+
 PROGRAM="/bin/uname -p", RESULT=="s390*", GOTO="memory_hotplug_end"

 ENV{.state}="online"

===
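
For readers less familiar with udev: a PROGRAM== key matches when the program
exits successfully, so the added rule skips the udev onlining whenever
auto_online_blocks contains the string "online". That covers "online",
"online_kernel" and "online_movable", but not "offline". The same check can
be sketched as a shell function; hotplug_policy_configured and its optional
file argument are hypothetical names used only for illustration.

```shell
#!/bin/bash
# Sketch only: hotplug_policy_configured is a hypothetical helper that
# mirrors the grep used in the udev rule above.
#
# $1: file to inspect (defaults to the sysfs file used by the rule)
# Returns 0 if the file contains "online", non-zero otherwise (including
# when the file does not exist).
hotplug_policy_configured() {
  local file="${1:-/sys/devices/system/memory/auto_online_blocks}"
  grep -q online "$file" 2>/dev/null
}
```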


[1] https://github.com/lnykryn/systemd-rhel/pull/281
[2] https://

Re: [PATCH v2 0/8] mm/memory_hotplug: allow to specify a default online_type

2020-03-18 Thread Baoquan He
On 03/17/20 at 11:49am, David Hildenbrand wrote:
> Distributions nowadays use udev rules ([1] [2]) to specify if and
> how to online hotplugged memory. The rules seem to get more complex with
> many special cases. Due to the various special cases,
> CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE cannot be used. All memory hotplug
> is handled via udev rules.
> 
> Every time we hotplug memory, the udev rule will come to the same
> conclusion. Especially Hyper-V (but also soon virtio-mem) add a lot of
> memory in separate memory blocks and wait for memory to get onlined by user
> space before continuing to add more memory blocks (to not add memory faster
> than it is getting onlined). This of course slows down the whole memory
> hotplug process.
> 
> To make the job of distributions easier and to avoid udev rules that get
> more and more complicated, let's extend the mechanism provided by
> - /sys/devices/system/memory/auto_online_blocks
> - "memhp_default_state=" on the kernel cmdline
> to be able to specify also "online_movable" as well as "online_kernel"

This patch series looks good, thanks. Since Andrew has already merged it into
-mm again, I won't bother adding my Reviewed-by.

Hi David, Vitaly

There are several things unclear to me.

So, are these improved interfaces meant to alleviate the burden of the
existing udev rules, or to replace them? As you know, we have been
using udev rules to interact between kernel and user space on bare metal,
and in guests that want to hot add/remove memory.

And then there is the OOM issue on Hyper-V when onlining pages after adding
a memory block. I am not a virt development expert; could this happen on a
bare metal system?

Thanks
Baoquan



Re: [PATCH v2 0/8] mm/memory_hotplug: allow to specify a default online_type

2020-03-18 Thread David Hildenbrand
On 18.03.20 14:05, Baoquan He wrote:
> On 03/17/20 at 11:49am, David Hildenbrand wrote:
>> Distributions nowadays use udev rules ([1] [2]) to specify if and
>> how to online hotplugged memory. The rules seem to get more complex with
>> many special cases. Due to the various special cases,
>> CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE cannot be used. All memory hotplug
>> is handled via udev rules.
>>
>> Every time we hotplug memory, the udev rule will come to the same
>> conclusion. Especially Hyper-V (but also soon virtio-mem) add a lot of
>> memory in separate memory blocks and wait for memory to get onlined by user
>> space before continuing to add more memory blocks (to not add memory faster
>> than it is getting onlined). This of course slows down the whole memory
>> hotplug process.
>>
>> To make the job of distributions easier and to avoid udev rules that get
>> more and more complicated, let's extend the mechanism provided by
>> - /sys/devices/system/memory/auto_online_blocks
>> - "memhp_default_state=" on the kernel cmdline
>> to be able to specify also "online_movable" as well as "online_kernel"
> 
> This patch series looks good, thanks. Since Andrew has merged it to -mm again,
> I won't add my Reviewed-by to bother. 
> 
> Hi David, Vitaly
> 
> There are several things unclear to me.
> 
> So, these improved interfaces are used to alleviate the burden of the 
> existing udev rules, or try to replace it? As you know, we have been

At least in RHEL, my plan is to replace them and use a udev rule as a
fallback on older kernels (see the example scripts in the cover letter). But
other distributions can handle it as they want.

> using udev rules to interact between kernel and user space on bare metal,
> and guests who want to hot add/remove.
>
> And also the OOM issue in hyperV when onlining pages after adding memory
> block. I am not a virt devel expert, could this happen on bare metal
> system?

Don't think it's relevant on bare metal. If you plug a big DIMM, all
memory blocks will be added first in one shot and then all memory blocks
will be onlined. So it doesn't matter "how fast" you online that memory.

In contrast, Hyper-V (and virtio-mem) add one memory block (or a limited
number of them) at a time and wait for them to get onlined.

-- 
Thanks,

David / dhildenb



Re: [PATCH v2 0/8] mm/memory_hotplug: allow to specify a default online_type

2020-03-18 Thread Michal Hocko
On Wed 18-03-20 21:05:17, Baoquan He wrote:
> On 03/17/20 at 11:49am, David Hildenbrand wrote:
> > Distributions nowadays use udev rules ([1] [2]) to specify if and
> > how to online hotplugged memory. The rules seem to get more complex with
> > many special cases. Due to the various special cases,
> > CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE cannot be used. All memory hotplug
> > is handled via udev rules.
> > 
> > Every time we hotplug memory, the udev rule will come to the same
> > conclusion. Especially Hyper-V (but also soon virtio-mem) add a lot of
> > memory in separate memory blocks and wait for memory to get onlined by user
> > space before continuing to add more memory blocks (to not add memory faster
> > than it is getting onlined). This of course slows down the whole memory
> > hotplug process.
> > 
> > To make the job of distributions easier and to avoid udev rules that get
> > more and more complicated, let's extend the mechanism provided by
> > - /sys/devices/system/memory/auto_online_blocks
> > - "memhp_default_state=" on the kernel cmdline
> > to be able to specify also "online_movable" as well as "online_kernel"
> 
> This patch series looks good, thanks. Since Andrew has merged it to -mm again,
> I won't add my Reviewed-by to bother. 

JFYI, Andrew usually adds R-b or A-b tags as they are posted.

-- 
Michal Hocko
SUSE Labs


Re: [PATCH v2 0/8] mm/memory_hotplug: allow to specify a default online_type

2020-03-18 Thread Vitaly Kuznetsov
Baoquan He writes:

> On 03/17/20 at 11:49am, David Hildenbrand wrote:
>> Distributions nowadays use udev rules ([1] [2]) to specify if and
>> how to online hotplugged memory. The rules seem to get more complex with
>> many special cases. Due to the various special cases,
>> CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE cannot be used. All memory hotplug
>> is handled via udev rules.
>> 
>> Every time we hotplug memory, the udev rule will come to the same
>> conclusion. Especially Hyper-V (but also soon virtio-mem) add a lot of
>> memory in separate memory blocks and wait for memory to get onlined by user
>> space before continuing to add more memory blocks (to not add memory faster
>> than it is getting onlined). This of course slows down the whole memory
>> hotplug process.
>> 
>> To make the job of distributions easier and to avoid udev rules that get
>> more and more complicated, let's extend the mechanism provided by
>> - /sys/devices/system/memory/auto_online_blocks
>> - "memhp_default_state=" on the kernel cmdline
>> to be able to specify also "online_movable" as well as "online_kernel"
>
> This patch series looks good, thanks. Since Andrew has merged it to -mm again,
> I won't add my Reviewed-by to bother. 
>
> Hi David, Vitaly
>
> There are several things unclear to me.
>
> So, these improved interfaces are used to alleviate the burden of the 
> existing udev rules, or try to replace it? As you know, we have been
> using udev rules to interact between kernel and user space on bare metal,
> and guests who want to hot add/remove.

With 'auto_online_blocks' interface you don't need the udev rule. David
is trying to make it more versatile.

>
> And also the OOM issue in hyperV when onlining pages after adding memory
> block. I am not a virt devel expert, could this happen on bare metal
> system?

Yes - in theory, very unlikely - in practice.

The root cause of the problem here is that adding more memory to the system
itself requires memory (page tables, memmaps, ...), so if your system is low
on memory and you're trying to hotplug A LOT you may run into OOM before
you're able to online anything. With bare metal it's usually not the
case: servers which are able to hotplug memory are usually booted with
enough memory, and memory hotplug is a manual action (you need to insert
DIMMs!). But if you boot your server with e.g. 4G, almost exhaust it
and then try to hotplug e.g. 256G ... well, OOM is almost guaranteed.
With virtual machines it's very common (e.g. with Hyper-V VMs) to boot
them with low memory and hotplug it (automatically, by some management
software) when needed, thus the problem is way more common.
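
To put rough numbers on the 4G/256G example: assuming 4 KiB base pages and a
64-byte struct page (typical for x86-64, but an assumption here and not
something stated in this thread), the memmap for a hotplugged range can be
estimated as follows. memmap_bytes is a hypothetical helper; page tables come
on top of this estimate.

```shell
#!/bin/bash
# Sketch only: memmap_bytes is a hypothetical helper.
#
# $1: amount of hotplugged memory in GiB
# $2: page size in bytes (assumed default: 4096)
# $3: size of one struct page in bytes (assumed default: 64)
# Prints the estimated memmap size in bytes.
memmap_bytes() {
  local hotplug_gib="$1"
  local page_size="${2:-4096}"
  local struct_page_size="${3:-64}"

  # number of pages in the range, times the per-page metadata size
  echo $(( hotplug_gib * 1024 * 1024 * 1024 / page_size * struct_page_size ))
}
```

With these assumptions, "memmap_bytes 256" prints 4294967296, i.e. 4 GiB of
memmap for 256 GiB of hotplugged memory, which is about everything a machine
booted with 4G has.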

-- 
Vitaly



Re: [PATCH v2 0/8] mm/memory_hotplug: allow to specify a default online_type

2020-03-18 Thread Baoquan He
On 03/18/20 at 02:58pm, Vitaly Kuznetsov wrote:
> Baoquan He writes:
> 
> > On 03/17/20 at 11:49am, David Hildenbrand wrote:
> >> Distributions nowadays use udev rules ([1] [2]) to specify if and
> >> how to online hotplugged memory. The rules seem to get more complex with
> >> many special cases. Due to the various special cases,
> >> CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE cannot be used. All memory hotplug
> >> is handled via udev rules.
> >> 
> >> Every time we hotplug memory, the udev rule will come to the same
> >> conclusion. Especially Hyper-V (but also soon virtio-mem) add a lot of
> >> memory in separate memory blocks and wait for memory to get onlined by user
> >> space before continuing to add more memory blocks (to not add memory faster
> >> than it is getting onlined). This of course slows down the whole memory
> >> hotplug process.
> >> 
> >> To make the job of distributions easier and to avoid udev rules that get
> >> more and more complicated, let's extend the mechanism provided by
> >> - /sys/devices/system/memory/auto_online_blocks
> >> - "memhp_default_state=" on the kernel cmdline
> >> to be able to specify also "online_movable" as well as "online_kernel"
> >
> > This patch series looks good, thanks. Since Andrew has merged it to -mm again,
> > I won't add my Reviewed-by to bother.
> >
> > Hi David, Vitaly
> >
> > There are several things unclear to me.
> >
> > So, these improved interfaces are used to alleviate the burden of the 
> > existing udev rules, or try to replace it? As you know, we have been
> > using udev rules to interact between kernel and user space on bare metal,
> > and guests who want to hot add/remove.
> 
> With 'auto_online_blocks' interface you don't need the udev rule. David
> is trying to make it more versatile.
> 
> >
> > And also the OOM issue in hyperV when onlining pages after adding memory
> > block. I am not a virt devel expert, could this happen on bare metal
> > system?
> 
> Yes - in theory, very unlikely - in practice.
> 
> The root cause of the problem here is adding more memory to the system
> requires memory (page tables, memmaps,..) so if your system is low on
> memory and you're trying to hotplug A LOT you may run into OOM before
> you're able to online anything. With bare metal it's usualy not the
> case: servers, which are able to hotplug memory, are usually booted with
> enough memory and memory hotplug is a manual action (you need to insert
> DIMMs!). But, if you boot your server with e.g. 4G, almost exhaust it
> and then try to hotplug e.g. 256G ... well, OOM is almost guaranteed.

Thanks for this detailed explanation.

I finally understand why this is a problem on Hyper-V. But with the current
mechanism, it can happen on any system if things are done like this.

Is there a reason Hyper-V needs to boot with little memory and then enlarge it
by a huge amount? Since it's a real case on Hyper-V, I guess there must
be a reason; I am just curious.

> With virtual machines it's very common (e.g. with Hyper-V VMs) to boot
> them with low memory and hotplug it (automatically, by some management
> software) when needed, thus the problem is way more common.



Re: [PATCH v2 0/8] mm/memory_hotplug: allow to specify a default online_type

2020-03-18 Thread Baoquan He
On 03/18/20 at 02:54pm, Michal Hocko wrote:
> On Wed 18-03-20 21:05:17, Baoquan He wrote:
> > On 03/17/20 at 11:49am, David Hildenbrand wrote:
> > > Distributions nowadays use udev rules ([1] [2]) to specify if and
> > > how to online hotplugged memory. The rules seem to get more complex with
> > > many special cases. Due to the various special cases,
> > > CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE cannot be used. All memory hotplug
> > > is handled via udev rules.
> > > 
> > > Every time we hotplug memory, the udev rule will come to the same
> > > conclusion. Especially Hyper-V (but also soon virtio-mem) add a lot of
> > > memory in separate memory blocks and wait for memory to get onlined by user
> > > space before continuing to add more memory blocks (to not add memory faster
> > > than it is getting onlined). This of course slows down the whole memory
> > > hotplug process.
> > > 
> > > To make the job of distributions easier and to avoid udev rules that get
> > > more and more complicated, let's extend the mechanism provided by
> > > - /sys/devices/system/memory/auto_online_blocks
> > > - "memhp_default_state=" on the kernel cmdline
> > > to be able to specify also "online_movable" as well as "online_kernel"
> > 
> > > This patch series looks good, thanks. Since Andrew has merged it to -mm again,
> > > I won't add my Reviewed-by to bother.
> 
> JFYI, Andrew usually adds R-b or A-b tags as they are posted.

Got it, thanks for letting me know.



Re: [PATCH v2 0/8] mm/memory_hotplug: allow to specify a default online_type

2020-03-18 Thread Baoquan He
On 03/18/20 at 02:50pm, David Hildenbrand wrote:
> On 18.03.20 14:05, Baoquan He wrote:
> > On 03/17/20 at 11:49am, David Hildenbrand wrote:
> >> Distributions nowadays use udev rules ([1] [2]) to specify if and
> >> how to online hotplugged memory. The rules seem to get more complex with
> >> many special cases. Due to the various special cases,
> >> CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE cannot be used. All memory hotplug
> >> is handled via udev rules.
> >>
> >> Every time we hotplug memory, the udev rule will come to the same
> >> conclusion. Especially Hyper-V (but also soon virtio-mem) add a lot of
> >> memory in separate memory blocks and wait for memory to get onlined by user
> >> space before continuing to add more memory blocks (to not add memory faster
> >> than it is getting onlined). This of course slows down the whole memory
> >> hotplug process.
> >>
> >> To make the job of distributions easier and to avoid udev rules that get
> >> more and more complicated, let's extend the mechanism provided by
> >> - /sys/devices/system/memory/auto_online_blocks
> >> - "memhp_default_state=" on the kernel cmdline
> >> to be able to specify also "online_movable" as well as "online_kernel"
> > 
> > > This patch series looks good, thanks. Since Andrew has merged it to -mm again,
> > > I won't add my Reviewed-by to bother.
> > 
> > Hi David, Vitaly
> > 
> > There are several things unclear to me.
> > 
> > So, these improved interfaces are used to alleviate the burden of the 
> > existing udev rules, or try to replace it? As you know, we have been
> 
> At least in RHEL, my plan is to replace it / use a udev rules as a
> fallback on older kernels (see the example scripts below). But other

Ok, got it. I didn't notice that the script and the systemd service are part
of your plan; I thought you were demonstrating the current status. Thanks.

> distribution can handle it as they want.
> 
> > using udev rules to interact between kernel and user space on bare metal,
> > and guests who want to hot add/remove.
> >
> > And also the OOM issue in hyperV when onlining pages after adding memory
> > block. I am not a virt devel expert, could this happen on bare metal
> > system?
> 
> Don't think it's relevant on bare metal. If you plug a big DIMM, all
> memory blocks will be added first in one shot and then all memory blocks
> will be onlined. So it doesn't matter "how fast" you online that memory.
> 
> In contrast, Hyper-V (and virtio-mem) add one (or a limited number of)
> memory block at a time and wait for them to get onlined.
> 
> -- 
> Thanks,
> 
> David / dhildenb



Re: [PATCH v2 0/8] mm/memory_hotplug: allow to specify a default online_type

2020-03-18 Thread Vitaly Kuznetsov
Baoquan He writes:

> Is there a reason hyperV need boot with small memory, then enlarge it
> with huge memory? Since it's a real case in hyperV, I guess there must
> be reason, I am just curious.
>

It doesn't really *need* to but this can be utilized in e.g. 'hot
standby' schemes I believe. Also, it may be enough if the administrator
is just trying to e.g. double the size of RAM but the VM is already
under memory pressure. I wouldn't say that these cases are common but
afair bugs like 'I tried adding more memory to my VM and it just OOMed'
were reported in the past.

-- 
Vitaly