On Wed, 1 Jul 2026 17:05:45 +0800 "Li Zhe" <[email protected]> wrote:

> memmap_init_zone_device() can take a noticeable amount of time when large
> pmem namespaces are bound or rebound, because it initializes nearly
> identical struct page descriptors one PFN at a time. This series reduces
> that ZONE_DEVICE memmap initialization overhead by reusing prepared
> struct page templates and, on x86, using memcpy_nt() for the template
> copy path.
>
> The main target is large fsdax/devdax pmem configurations, where the
> cost of initializing the memmap shows up directly in nd_pmem/dax_pmem
> bind and rebind latency.
>
> Patches 1-3 are preparatory cleanups and helper extraction. Patches 4-5
> add the template-copy fast path for head pages and compound tails.
> Patches 6-8 introduce memcpy_nt()/memcpy_nt_drain(), extend the x86
> fixed-size memcpy_flushcache() inline cases used by that helper, and
> switch the template-copy path over to memcpy_nt().
>
> The fast path remains disabled when the page_ref_set tracepoint is
> active, and sanitized builds stay on the slow path so their instrumented
> stores are preserved. Architectures without a specialized memcpy_nt()
> backend continue to fall back to memcpy().
>
> Tested in a VM with a 100 GB fsdax namespace device configured with
> map=dev and a 100 GB devdax namespace (align=2097152) on Intel Ice Lake
> server.

Thanks for persisting with this. Review is still thin :(

I see that Mike, Boris and Alistair have commented on previous versions.
As did Balbir, who wasn't cc'ed on this (fixed).

I'll add it to mm.git for testing exposure (because I'm still a sucker
for speedups), but more review is needed, please.

