On 19/12/2025 at 17:15, Arnd Bergmann wrote:
From: Arnd Bergmann <[email protected]>

The most common 32-bit architectures (x86, arm, powerpc) all use the
default virtual memory layout that was already in place for i386 systems
in the 1990s, using exactly 3GiB of user TASK_SIZE, with the upper 1GiB
of addresses split between (at most 896MiB) lowmem and vmalloc.
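As a rough illustration of the layout described above, the following sketch
works out the sizes for the classic split. It is only arithmetic on the
figures quoted in the text (3GiB TASK_SIZE, at most 896MiB lowmem); the
function and key names are made up for this example, and the remainder is
lumped together as "vmalloc etc." even though real kernels also place fixmap,
pkmap and guard gaps there.

```python
MiB = 1 << 20
GiB = 1 << 30


def classic_layout(task_size=3 * GiB, lowmem_max=896 * MiB):
    """Return the sizes of the traditional 32-bit 3G/1G layout.

    Hypothetical helper: the names are illustrative, not kernel symbols.
    """
    kernel_space = 4 * GiB - task_size  # what is left of the 4GiB address space
    return {
        "page_offset": hex(task_size),  # kernel mappings start where user space ends
        "kernel_space_mib": kernel_space // MiB,
        "lowmem_max_mib": lowmem_max // MiB,
        # everything in kernel space that is not the linear lowmem mapping
        "vmalloc_etc_mib": (kernel_space - lowmem_max) // MiB,
    }


layout = classic_layout()
for key, value in layout.items():
    print(f"{key:18} {value}")
```

With the defaults this leaves 128MiB of kernel address space beyond lowmem,
which is why RAM above 896MiB needs highmem under the traditional split.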

Linux-2.3 introduced CONFIG_HIGHMEM for large x86 server machines that
had 4GiB of RAM or more, with the VMSPLIT_3G/2G/1G options added in
v2.6.16 for machines that had one or two gigabytes of memory but wanted
to avoid the overhead from managing highmem. Over time, similar options
appeared on other 32-bit architectures.

Twenty years later, it makes sense to reconsider the default settings,
as the tradeoffs have changed a bit:

  - Configurations with more than 2GiB have become extremely rare,
    as any users with large memory have moved on to 64-bit systems.
    There were only ever a few laptop models in this category: Apple
    Powerbook G4 (2005), Macbook (2006), IBM Thinkpad X60 (2006), Arm
    Chromebooks based on Exynos 5800 (2014), Tegra K1 (2014) and RK3288
    (2015), and manufacturer support for all of these ended in 2020
    or (much) earlier.
    Embedded systems with more than 2GiB use additional SoCs of a
    similar vintage: Intel Atom Z5xx (2008), Freescale QorIQ (2008),
    Marvell Armada XP (2010), Freescale i.MX6Q (2011), LSI Axxia (2013),
    TI Keystone2 (2014), Renesas RZ/G1M (2015). Most boards based on
    these have stopped receiving kernel upgrades. Newer 32-bit chips
    only support smaller memory configurations, though in particular the
    i.MX6Q and Keystone2 families have expected support cycles past 2035.
    While 32-bit server installations used to support even larger memory,
    none of those seem to still be used in production on any architecture.

  - While general-purpose distributions for 32-bit targets were common,
    it was rather risky to change the CONFIG_VMSPLIT setting because
    there is always a possibility of running into device driver bugs or
    applications that need a large virtual memory size. Presumably
    a lot of these issues have been resolved now, so most setups should
    be fine using a custom vmsplit instead of highmem now.

  - As fewer users test highmem, the expectation is that it will
    increasingly break in the future, so getting users to change the
    vmsplit means that even if there is a bug to fix initially,
    it improves the situation in the long run.

  - Highmem will ultimately need to be removed, at least for the page
    cache and most other code using it today. In a previous discussion, I
    had suggested doing this as early as 2029, but based on the discussions
    since ELC, the plan is now to leave highmem-enabled page cache as an
    option until at least 2029, at which point remaining users will have
    the choice between no longer updating kernels or using a combination of
    a custom vmsplit and zram/zswap. Changing the defaults now should both
    speed up the highmem deprecation and make it less painful for users.

  - The most VM space intensive applications tend to be web browsers,
    specifically Chrome/ChromeOS and Firefox. Both have now stopped
    providing binary updates, but Firefox can still be built from source.
    Testing various combinations on Debian/armhf, I found that Firefox 140
    can still show complex websites with VMSPLIT_2G_OPT with and without
    HIGHMEM, though it failed for me both with the small address space
    of VMSPLIT_1G and the small lowmem of VMSPLIT_3G_OPT when HIGHMEM
    is disabled.
    This is likely to get worse with future versions, so embedded users
    may still be forced to migrate to specialized browsers like WPE Webkit
    when HIGHMEM pagecache is finally removed.

Based on the above observations and the discussion at the kernel summit,
change the defaults to the most appropriate values: use 1GiB of lowmem on
non-highmem configurations, and either 2GiB or 1.75GiB of lowmem on highmem
builds, depending on what is available on the architecture.  As ARM_LPAE
and X86_PAE builds both require a gigabyte-aligned vmsplit, those get
to use VMSPLIT_2G. The result is that the majority of previous highmem
users now only need lowmem. For platform-specific defconfig files that
are known to only support up to 1GiB of RAM, also drop the CONFIG_HIGHMEM
line as a simplification.
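To see why most former highmem users end up lowmem-only, here is a small
sketch that splits a given amount of RAM into lowmem and highmem for the
lowmem limits mentioned above (896MiB traditionally, then 1GiB, 1.75GiB and
2GiB). The limits are treated as nominal values taken from the text; the
exact usable lowmem on a real system depends on the architecture and on how
much address space is reserved for vmalloc, so take the numbers as
approximate. The helper name is hypothetical.

```python
def split_ram(ram_mib, lowmem_limit_mib):
    """Split RAM into (lowmem, highmem) in MiB for a nominal lowmem limit."""
    lowmem = min(ram_mib, lowmem_limit_mib)
    return lowmem, ram_mib - lowmem


# Nominal lowmem limits quoted in the commit message, in MiB.
LIMITS_MIB = (896, 1024, 1792, 2048)

for ram_mib in (1024, 2048, 4096):
    for limit_mib in LIMITS_MIB:
        lowmem, highmem = split_ram(ram_mib, limit_mib)
        note = "no highmem needed" if highmem == 0 else f"{highmem} MiB highmem"
        print(f"RAM {ram_mib:4} MiB, limit {limit_mib:4} MiB: "
              f"lowmem {lowmem:4} MiB, {note}")
```

Under these assumptions a 1GiB machine no longer needs highmem with a 1GiB
lowmem limit, while a 4GiB machine still has half of its RAM in highmem even
with the largest setting.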

On PowerPC and Microblaze, the options have somewhat different names but
should have the same effect. MIPS and Xtensa cannot support more
than 512MB of lowmem but are limited to small DDR2 memory in most
implementations, with MT7621 being a notable exception. ARC and C-Sky
could support a configurable vmsplit in theory, but it's not clear
if anyone still cares.
SPARC is currently limited to 192MB of lowmem and should get patched
to behave either like arm/x86 or powerpc/microblaze to support 2GiB
of lowmem.

There are likely going to be regressions from the changed defaults,
in particular when hitting previously hidden device driver bugs
that fail to set the correct DMA mask, or from applications that
need a large virtual address space.
Ideally the in-kernel problems should all be fixable, but the previous
behavior is still selectable as a fallback with CONFIG_EXPERT=y.

Cc: Russell King <[email protected]>
Cc: [email protected]
Cc: Thomas Gleixner <[email protected]>
Cc: Ingo Molnar <[email protected]>
Cc: Borislav Petkov <[email protected]>
Cc: Dave Hansen <[email protected]>
Cc: [email protected]
Cc: "H. Peter Anvin" <[email protected]>
Cc: Madhavan Srinivasan <[email protected]>
Cc: Michael Ellerman <[email protected]>
Cc: Nicholas Piggin <[email protected]>
Cc: Christophe Leroy (CS GROUP) <[email protected]>
Cc: [email protected]
Cc: Michal Simek <[email protected]>
Cc: Andrew Morton <[email protected]>
Cc: David Hildenbrand <[email protected]>
Cc: Lorenzo Stoakes <[email protected]>
Cc: Liam R. Howlett <[email protected]>
Cc: Vlastimil Babka <[email protected]>
Cc: Mike Rapoport <[email protected]>
Cc: Suren Baghdasaryan <[email protected]>
Cc: Michal Hocko <[email protected]>
Cc: Matthew Wilcox <[email protected]>
Cc: [email protected]
Cc: Richard Weinberger <[email protected]>
Cc: Linus Walleij <[email protected]>
Cc: Nishanth Menon <[email protected]>
Cc: Andreas Larsson <[email protected]>
Cc: Lucas Stach <[email protected]>
Signed-off-by: Arnd Bergmann <[email protected]>
---
  arch/arm/Kconfig                            |  5 ++++-
  arch/arm/configs/aspeed_g5_defconfig        |  1 -
  arch/arm/configs/dove_defconfig             |  2 --
  arch/arm/configs/mv78xx0_defconfig          |  2 --
  arch/arm/configs/u8500_defconfig            |  1 -
  arch/arm/configs/vt8500_v6_v7_defconfig     |  3 ---
  arch/arm/mach-omap2/Kconfig                 |  1 -
  arch/microblaze/Kconfig                     |  9 ++++++---
  arch/microblaze/configs/mmu_defconfig       |  1 -
  arch/powerpc/Kconfig                        | 17 +++++++++++------
  arch/powerpc/configs/44x/akebono_defconfig  |  1 -
  arch/powerpc/configs/85xx/ksi8560_defconfig |  1 -
  arch/powerpc/configs/85xx/stx_gp3_defconfig |  1 -

Reviewed-by: Christophe Leroy (CS GROUP) <[email protected]>

Be aware that it will likely trivially conflict with https://lore.kernel.org/linuxppc-dev/6a2575420770d075cd090b5a316730a2ffafdee4.1766574657.git.chle...@kernel.org/

Another point is that it will increase overall memory usage when people enable KASAN, since KASAN reserves 1/8 of lowmem for its shadow memory. I think we need to look at the impact on available virtual memory, because 1/8 of 2G is 256M, which is the size of the last segment, shared by KASAN shadow memory and vmalloc.
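
To put numbers on the KASAN concern, a quick sketch of the arithmetic. It
assumes the shadow area is exactly 1/8 of lowmem and that shadow and vmalloc
are the only users of a single 256M segment, which is a simplification of
the real layout; the constant and function names are invented for this
example.

```python
SEGMENT_MIB = 256  # size of the last segment, as stated above


def kasan_shadow_mib(lowmem_mib):
    """Shadow memory needed to cover lowmem at a 1/8 ratio."""
    return lowmem_mib // 8


def vmalloc_left_mib(lowmem_mib):
    """Address space left for vmalloc once the shadow is placed in the segment."""
    return SEGMENT_MIB - kasan_shadow_mib(lowmem_mib)


for lowmem_mib in (768, 1024, 1792, 2048):
    print(f"lowmem {lowmem_mib:4} MiB: shadow {kasan_shadow_mib(lowmem_mib):3} MiB, "
          f"vmalloc space left {vmalloc_left_mib(lowmem_mib):3} MiB")
```

Under these assumptions, 2G of lowmem would need the whole segment for
shadow memory and leave nothing for vmalloc.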

Christophe

