Hi,

I'm experiencing a consistent failure of the hv_balloon driver to
respond to burst memory demand on Alpine Linux, with VS Code Remote
devcontainers as a representative workload.

The issue has been thoroughly analyzed using PSI monitoring and kernel
configuration verification. The root cause is the absence of burst
demand support in the driver architecture, compounded by the 1-second
polling loop latency.

A detailed analysis with measurement data and proposed improvements
follows below.
--

Bug Report / Request for Enhancement: hv_balloon Dynamic Memory
Hot-Add Fails Under Burst Demand Workloads

SUMMARY

The Linux hv_balloon driver for Hyper-V Dynamic Memory is documented
and architecturally designed to manage guest memory in both
directions: increasing memory via hot-add when the guest needs more,
and decreasing it via balloon inflation when the guest needs less. The
original patch comment introducing the driver explicitly states this
dual purpose, and the driver's state machine contains distinct states
for DM_BALLOON_UP, DM_BALLOON_DOWN and DM_HOT_ADD.

In practice, only the downward direction works reliably. This was
demonstrated in a controlled test running Alpine Linux v3.23 (kernel
6.18.20-lts) as a Hyper-V guest with Dynamic Memory configured
(Startup RAM 1024 MB, Maximum 16384 MB). Under a representative burst
demand workload using VS Code Remote SSH with devcontainer startup,
the guest experienced 97% PSI memory stall, 176,000+ swap pages
written, and near-OOM conditions over a sustained period exceeding 150
seconds. During this entire period, MemTotal never increased by a
single kilobyte and dmesg showed zero hot-add activity. The upward
direction failed completely.

The root cause is the complete absence of burst demand support in the
driver architecture, compounded by a fixed-interval 1-second polling
loop between guest and host, sequential hot-add protocol semantics,
and a kernel default configuration (MHP_DEFAULT_ONLINE_TYPE_OFFLINE)
that leaves hot-added memory sections offline even when they do
arrive. Collectively these mean the driver cannot respond to burst
memory demand fast enough to be useful.

Proposed resolution: PSI-triggered hot-add requests independent of the
1-second polling loop, documentation of required
auto_online_blocks=online configuration, and improved diagnostics when
hot-add is not initiated despite high memory pressure.


1. VERIFIED PURPOSE AND DESIGN INTENT

The original patch introducing hv_balloon into the Linux kernel states
explicitly:

"Windows hosts dynamically manage the guest memory allocation via a
combination memory hot add and ballooning. Memory hot add is used to
grow the guest memory up to the maximum memory that can be allocated
to the guest. Ballooning is used to both shrink as well as expand up
to the max memory."

Source: K.Y. Srinivasan, [PATCH 2/2] Drivers: hv: Add Hyper-V balloon
driver, lkml.iu.edu, 2012.

The driver's state machine in the current kernel source at
drivers/hv/hv_balloon.c confirms this with explicit states
DM_BALLOON_UP, DM_BALLOON_DOWN and DM_HOT_ADD. Source:
github.com/torvalds/linux/blob/master/drivers/hv/hv_balloon.c
(verified April 2026).

The upward direction is therefore not an optional or aspirational
feature. It is the primary stated purpose of the hot-add component of
the driver.


2. SYSTEM CONFIGURATION

Host: Windows Server 2022 with Hyper-V, Dynamic Memory enabled

Guest OS: Alpine Linux v3.23, kernel 6.18.20-lts (x86_64)

Kernel config relevant to this report:
- CONFIG_MEMORY_HOTPLUG=y
- CONFIG_MHP_DEFAULT_ONLINE_TYPE_OFFLINE=y (Alpine default)
- CONFIG_PSI=y
- CONFIG_PSI_DEFAULT_DISABLED=y

Hyper-V Dynamic Memory settings:
- Startup RAM: 1024 MB
- Minimum RAM: 512 MB
- Maximum RAM: 16384 MB

auto_online_blocks: Set to online manually after discovering the
default was offline

PSI: Enabled via psi=1 kernel parameter after discovering
CONFIG_PSI_DEFAULT_DISABLED=y

Driver module parameters verified on test system:
- /sys/module/hv_balloon/parameters/hot_add = Y (hot-add enabled)
- /sys/module/hv_balloon/parameters/pressure_report_delay = 0 (no startup delay)


3. USE CASE: VS CODE REMOTE SSH WITH DEVCONTAINER

This use case is representative of a class of developer workloads built
on containerized development environments, which are increasingly
common on Linux VMs hosted on Hyper-V.

Workload profile:
VS Code Remote SSH connects to the Alpine guest and starts a Home
Assistant Add-on development container (ha-dev). The VS Code server
process (node) expands from zero to approximately 240 MB RSS within 10
seconds of connection. Multiple node processes spawn in rapid
succession as extensions and language servers load.

Expected behavior:
Hyper-V Dynamic Memory detects guest memory pressure, initiates
hot-add to expand MemTotal beyond the startup value, guest makes the
new memory available via auto_online_blocks, workload proceeds
normally.

Observed behavior:
MemTotal remained at 921764 kB (startup RAM) throughout the entire
session. dmesg showed no hot-add activity whatsoever. The system
responded by swapping aggressively and reaching PSI avg10 values of
97% before becoming unresponsive.


4. MEASUREMENT DATA

All measurements were collected using a custom shell script sampling
/proc/pressure/memory, /proc/vmstat and /proc/meminfo at 1-2 second
intervals, with per-process RSS from /proc/PID/status.
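
The sampling approach described above could look roughly like the
following Python sketch. The original measurements used a shell script
that is not reproduced here, so the function names (parse_psi,
parse_counters, take_sample) and the selection of fields are
assumptions for illustration. Only the /proc file formats are taken
from the kernel.

```python
from pathlib import Path


def parse_psi(text):
    """Parse /proc/pressure/<resource> text into {'some': {...}, 'full': {...}}."""
    result = {}
    for line in text.strip().splitlines():
        kind, *fields = line.split()
        values = dict(field.split("=", 1) for field in fields)
        result[kind] = {
            "avg10": float(values["avg10"]),
            "avg60": float(values["avg60"]),
            "avg300": float(values["avg300"]),
            "total_us": int(values["total"]),
        }
    return result


def parse_counters(text, wanted):
    """Pick integer fields from /proc/meminfo or /proc/vmstat style text."""
    found = {}
    for line in text.splitlines():
        parts = line.replace(":", " ").split()
        if parts and parts[0] in wanted:
            found[parts[0]] = int(parts[1])
    return found


def take_sample(proc=Path("/proc")):
    """Read one sample; requires a kernel with PSI enabled (psi=1 here)."""
    psi = parse_psi((proc / "pressure" / "memory").read_text())
    mem = parse_counters((proc / "meminfo").read_text(), {"MemTotal"})
    vm = parse_counters((proc / "vmstat").read_text(), {"pswpout"})
    return {
        "full_avg10": psi["full"]["avg10"],
        "full_total_us": psi["full"]["total_us"],
        "memtotal_kb": mem["MemTotal"],
        "pswpout": vm["pswpout"],
    }
```

Calling take_sample() once per second and subtracting consecutive
full_total_us and pswpout values yields per-interval stall time and
swap-out counts comparable to the columns in the timeline.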

Timeline of VS Code startup of remote containers (elapsed time from connection):

Time    Event                               PSI delta/s    Swap pages out  MemTotal
+0s     Baseline, docker running            0 us           0               921764 kB
+48s    VS Code server appears              0 us           0               921764 kB
+50s    node 107 MB RSS                     152,837 us     3,748           921764 kB
+55s    4 node processes, 348 MB RSS total  941,966 us     22,945          921764 kB
+92s    PSI avg10 48%                       6,357,506 us   159,934         921764 kB
+106s   PSI avg10 83%                       13,446,517 us  160,262         921764 kB
+179s   PSI avg10 97%                       20,093,449 us  165,208         921764 kB

Key observations:
- MemTotal never changed from startup value
- No hot-add lines appeared in dmesg at any point during or after the session
- PSI cumulative stall since boot at session end: 804,383,983
microseconds (804 seconds of accumulated memory stall)


5. ROOT CAUSE ANALYSIS

The fundamental design gap in hv_balloon is the complete absence of
burst demand support. The driver was designed around a polling-based,
fixed-interval model and has no mechanism to detect or respond to
rapid memory transitions. All other issues described below are
consequences or amplifications of this core architectural limitation.

Issue 1: No burst demand support in the driver architecture

The driver has no concept of burst demand, a rapid transition from low
memory pressure to near-OOM within seconds. There is no fast path, no
threshold trigger, and no priority escalation mechanism. The entire
communication model between guest and host is based on periodic status
reporting, which by design introduces latency that is structurally
incompatible with burst workloads. A guest can transition from 0% to
97% PSI memory stall and write 160,000 swap pages before the driver
has sent more than a handful of status messages to the host.

Modern workloads such as containerized development environments,
Kubernetes pod scheduling, JVM heap initialization, and Node.js
extension loading routinely demand hundreds of megabytes of memory
within a 5-10 second window. The driver architecture predates this
workload class entirely.

Issue 2: 1-second fixed-interval polling loop is too slow for burst workloads

The hv_balloon thread reports memory pressure to the host once per
second via post_status(). Source:
elixir.bootlin.com/linux/v6.14.6/source/drivers/hv/hv_balloon.c#L1381
(verified via Medium article by Shlomi Boutnaru, May 2025).

VS Code expanded from 0 to 240 MB RSS in under 10 seconds. By the time
the host received sufficient pressure signals to consider a hot-add
response, the guest had already exhausted available memory and entered
heavy swap. The polling cadence has no mechanism to accelerate or
escalate regardless of how severe or rapid the memory pressure
becomes.

Issue 3: pressure_report_delay not a factor in this case

The hv_balloon module parameter pressure_report_delay defaults to 30
seconds per the original 2013 patch. Source: K.Y. Srinivasan, [PATCH
1/2] Drivers: hv: balloon: Add a parameter to delay pressure
reporting, lkml.indiana.edu, 2013. On the test system this parameter
was verified to be 0
(/sys/module/hv_balloon/parameters/pressure_report_delay = 0), meaning
pressure reporting was not delayed. This eliminates
pressure_report_delay as a contributing factor in this specific case
and points to Issues 1 and 2 as the main contributors to the observed
failure.

Note: on systems where pressure_report_delay retains its default value
of 30, the failure window would be significantly wider, as the host
would receive no pressure data at all during the first 30 seconds
after driver load.

Issue 4: Sequential hot-add protocol prevents parallel responses

Per the Hyper-V Dynamic Memory protocol specification, the host must
not send a new hot-add request until the guest has responded to the
previous one. Source: quoted in QEMU developer discussion,
mail-archive.com, September 2020. Combined with the 128 MB minimum
DIMM size for Linux hot-add (source: patchew.org, verified search
result), each expansion step is large, slow and serialized.

Issue 5: MHP_DEFAULT_ONLINE_TYPE_OFFLINE leaves hot-added memory unusable

Alpine Linux ships with CONFIG_MHP_DEFAULT_ONLINE_TYPE_OFFLINE=y.
Hot-added memory sections are registered in sysfs but remain in
offline state until explicitly brought online. Without udev (Alpine
uses mdev) there is no automatic mechanism to online new sections. The
auto_online_blocks sysfs interface defaults to offline and must be
manually set to online.

This issue was identified and resolved in this specific environment by
running echo online > /sys/devices/system/memory/auto_online_blocks
and making the setting persistent via /etc/local.d/memory-hotplug.start.
However, even with this fix applied, hot-add was never triggered by the
host during the VS Code session. This confirms that Issues 1-4 are the
primary blockers and Issue 5 is a prerequisite that was already
satisfied.
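
To check whether a guest is affected by Issue 5, the onlining policy
and the state of each memory block can be read from sysfs. The
following Python sketch is only an illustration: the function name
hotplug_status is hypothetical, and it assumes the standard
/sys/devices/system/memory layout with one memoryN directory per block.

```python
from pathlib import Path

MEMORY_SYSFS = Path("/sys/devices/system/memory")


def hotplug_status(root=MEMORY_SYSFS):
    """Return (auto_online_blocks policy, names of memory blocks still offline)."""
    policy = (root / "auto_online_blocks").read_text().strip()
    offline = [
        block.name
        for block in root.glob("memory[0-9]*")
        if (block / "state").read_text().strip() == "offline"
    ]
    # Sort numerically so that memory8 comes before memory10.
    offline.sort(key=lambda name: int(name[len("memory"):]))
    return policy, offline
```

A policy of "offline" combined with a non-empty list after a hot-add
event would indicate that memory arrived but was never onlined. In the
test described in this report the list stayed empty because no hot-add
took place at all.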


6. COMPARISON WITH KNOWN SIMILAR REPORTS AND RECENT PATCHES

This failure mode is not new. A Kubernetes/minikube issue from 2017
describes an identical pattern: memory demand increases, Hyper-V
Manager shows warning status, assigned memory never increases, OOM
killer activates. Source: github.com/kubernetes/minikube/issues/1403.
The issue was closed as stale without resolution. The present report
provides significantly more detailed measurement data and kernel
configuration context than the prior report.

The driver is actively maintained. Two recent patches are relevant as context:

- A March 2024 patch by Michael Kelley fixes hot-add failures on
systems with memblock sizes larger than 128 MB, where add_memory()
would fail with error -22. Source: lore.kernel.org/lkml, March 2024.
This is a separate correctness fix and does not address burst demand.
- A January 2025 patch accepted into hyperv-next fixes an issue where
the balloon driver's global page-onlining callback blocked hot-add of
memory from GPU and vPCI device drivers. Source:
mail-archive.com/linux-hyperv, January 2025. Again a separate
correctness fix, but both patches confirm the driver is under active
development and that the maintainers are responsive to bug reports.

No open patches or RFC discussions on [email protected]
addressing burst demand or PSI integration in hv_balloon were
identified as of April 2026.


7. PROPOSED IMPROVEMENTS

RFE 1: PSI-triggered hot-add requests to handle burst demand

The driver's current architecture is built around fixed-interval
polling: it reports memory pressure to the host once per second via
post_status() and waits for the host to initiate a hot-add sequence.
This design has no mechanism to accelerate or escalate outside the
polling cadence regardless of how severe or rapid the memory pressure
becomes.

Modern workloads have fundamentally different memory characteristics.
Containers, container runtimes (Docker, containerd, podman), JVM-based
systems, Node.js applications, Kubernetes pods, and development
environments such as VS Code devcontainers routinely exhibit burst
demand patterns: a guest transitions from low memory pressure to
near-OOM within seconds as processes spawn, images are pulled, or
runtimes initialize their heaps. Kubernetes is a particularly
illustrative case - the entire value proposition of dynamic memory
allocation in a Kubernetes node depends on the hypervisor being able
to supply memory fast enough to honor pod scheduling decisions. When a
scheduler assigns a new pod to a node, it expects memory to be
available within seconds, not after a multi-second feedback loop that
may itself be preceded by a 30-second pressure_report_delay. This
pattern is not an edge case - it is the normal startup behavior of a
significant proportion of workloads running on Linux VMs today.

The VS Code devcontainer use case presented in this report is
representative but not exceptional. Any workload that combines a
container runtime with a language server, a build system, a database
startup sequence, or a Kubernetes pod scheduling event will exhibit
similar burst demand characteristics. The 1-second fixed-interval
polling loop is structurally incapable of protecting against this
class of memory event regardless of host configuration.

The infrastructure to solve this already exists in the Linux kernel.
PSI has been available since kernel 4.20, and PSI threshold triggers
via poll() on /proc/pressure/memory since kernel 5.2. PSI is already
used by systemd-oomd and Facebook's oomd to react to memory pressure.
The driver should leverage the trigger mechanism to send an immediate
out-of-band hot-add request to the host when burst demand is detected,
specifically when memory.full exceeds a configurable threshold such as
10% over a 500ms window (50 ms of full stall). This would allow the
host to begin the hot-add sequence at the onset of burst demand rather
than after the guest has already entered heavy swap.

This improvement requires no protocol-level changes if the existing
hot-add request message is used as the signal. A protocol extension
adding an explicit "burst demand" flag to the status message would be
preferable, allowing the host to prioritize the response and bypass
any queuing of normal pressure-based adjustments.
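
The PSI threshold trigger proposed in RFE 1 can be demonstrated from
userspace. The Python sketch below is not driver code and makes no
claim about how an in-kernel implementation would look. It only shows
the documented userspace interface: write a trigger string to
/proc/pressure/memory and poll() for POLLPRI. The function names and
the 50 ms / 500 ms values (10% of a 500 ms window) are assumptions
chosen to match the example threshold in the text.

```python
import os
import select

PSI_MEMORY = "/proc/pressure/memory"


def trigger_spec(kind, stall_us, window_us):
    """Build a PSI trigger string: '<some|full> <stall us> <window us>'."""
    if kind not in ("some", "full"):
        raise ValueError("kind must be 'some' or 'full'")
    # The kernel accepts tracking windows from 500 ms to 10 s.
    if not 500_000 <= window_us <= 10_000_000:
        raise ValueError("window must be between 500 ms and 10 s")
    if not 0 < stall_us <= window_us:
        raise ValueError("stall threshold must fit inside the window")
    return f"{kind} {stall_us} {window_us}"


def wait_for_pressure(spec, timeout_ms=None, path=PSI_MEMORY):
    """Block until the trigger fires; return False if the timeout expires."""
    fd = os.open(path, os.O_RDWR | os.O_NONBLOCK)
    try:
        os.write(fd, spec.encode() + b"\0")  # registers the trigger on this fd
        poller = select.poll()
        poller.register(fd, select.POLLPRI)
        for _, event in poller.poll(timeout_ms):
            if event & select.POLLERR:
                raise OSError("PSI trigger source is gone")
            if event & select.POLLPRI:
                return True
        return False
    finally:
        os.close(fd)
```

For example, wait_for_pressure(trigger_spec("full", 50_000, 500_000))
returns as soon as tasks have been fully stalled on memory for 50 ms
within any 500 ms window. Registering a trigger typically requires
root, and PSI must be enabled (psi=1 on this Alpine kernel).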

RFE 2: Document auto_online_blocks requirement

The kernel documentation and Hyper-V guest integration documentation
should explicitly state that auto_online_blocks must be set to online
(or the kernel compiled with MHP_DEFAULT_ONLINE_TYPE_ONLINE_AUTO) for
Dynamic Memory hot-add to function on distributions that do not use
udev with a memory hotplug rule. Currently this is undocumented and
discoverable only by reading kernel source or community bug reports.

RFE 3: Diagnostics when hot-add is not triggered

When PSI memory.full avg10 exceeds a significant threshold (e.g. 10%)
without a hot-add request being sent or received, the driver should
emit a pr_warn to dmesg. Currently the guest has no visibility into
whether the host is aware of pressure, whether a hot-add request was
sent, or whether it failed. This makes the failure mode completely
silent from the guest's perspective.
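
Until such a diagnostic exists in the driver, the same condition can be
approximated from userspace. The Python sketch below is a hypothetical
analogue of the proposed pr_warn, not driver code: the function names
and the rule "full avg10 above the threshold while MemTotal has not
grown" are assumptions based on the description above.

```python
def full_avg10(psi_text):
    """Extract the 'full' avg10 percentage from /proc/pressure/memory text."""
    for line in psi_text.splitlines():
        if line.startswith("full "):
            for field in line.split()[1:]:
                key, _, value = field.partition("=")
                if key == "avg10":
                    return float(value)
    raise ValueError("no 'full' line with avg10 found")


def hot_add_missing(psi_text, memtotal_start_kb, memtotal_now_kb,
                    threshold_pct=10.0):
    """True when pressure is high but MemTotal has not grown since start."""
    under_pressure = full_avg10(psi_text) > threshold_pct
    no_growth = memtotal_now_kb <= memtotal_start_kb
    return under_pressure and no_growth
```

A monitoring loop could log a warning whenever hot_add_missing()
returns True, which would at least make the failure visible in the
guest's logs.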

RFE 4: Consider proactive pressure signaling

The driver could be extended to send an out-of-band high-priority
pressure signal to the host when PSI crosses a critical threshold,
rather than waiting for the next 1-second polling cycle. This would
require a protocol-level change and coordination with the Hyper-V host
implementation but would address the fundamental latency issue.


8. WHAT IS UNKNOWN

Why the host never sent a hot-add request despite the guest reaching
near-OOM conditions is unknown. The host-side decision algorithm for
when to initiate hot-add is proprietary and not publicly documented.
It is possible the host's memory pressure thresholds were not met
because guest-side pressure reporting was too slow to accumulate
sufficient signal. It is also possible the host's algorithm is simply
not designed for burst workloads of this type.

The Hyper-V Dynamic Memory Buffer setting (configurable between 5% and
200%, default 20%) controls how much headroom the host maintains above
current demand. Whether increasing this value would provide sufficient
buffer to absorb burst demand without requiring hot-add at all is
unknown without testing. It would not address the architectural
limitation but could serve as a partial operational mitigation.