On 3/30/26 16:13, Kiryl Shutsemau wrote:
> On Thu, Mar 26, 2026 at 07:08:47PM -0700, Usama Arif wrote:
>> zap_pmd_range() splits a huge PMD when the zap range doesn't cover the
>> full PMD (partial unmap). If the split fails, the PMD stays huge.
>> Falling through to zap_pte_range() would dereference the huge PMD entry
>> as a PTE page table pointer.
>>
>> Skip the range covered by the PMD on split failure instead.
>
> Ughh... This is hacky as hell.
>
>> The skip is safe across all call paths into zap_pmd_range():
>>
>> - exit_mmap() and OOM reaper: the zap range covers entire VMAs, so
>>   every PMD is fully covered (next - addr == HPAGE_PMD_SIZE). The
>>   zap_huge_pmd() branch handles these without splitting. The split
>>   failure path is unreachable.
>>
>> - munmap / mmap overlay: vma_adjust_trans_huge() (called from
>>   __split_vma) splits any PMD straddling the VMA boundary before the
>>   VMA is split. If that PMD split fails, __split_vma() returns
>>   -ENOMEM and the munmap is aborted before reaching zap_pmd_range().
>>   The split failure path is unreachable.
>>
>> - MADV_DONTNEED: advisory hint, the kernel is allowed to ignore it.
>>   The pages remain valid and accessible. A subsequent access returns
>>   existing data without faulting.
>
> Em, no. MADV_DONTNEED users expect memory to be zeroed after the
> "advise" is complete. At very least you need to zero the skipped range.

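For reference, a minimal userspace sketch of the contract Kiryl describes, assuming a private anonymous mapping on Linux: once madvise(MADV_DONTNEED) returns, the next access faults in fresh zero-filled pages. dontneed_zeroes() is a hypothetical helper written only for illustration; it checks the userspace-visible semantics and does not exercise the PMD split-failure path itself.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper (illustration only): map npages of private anonymous
 * memory, dirty it, zap it with MADV_DONTNEED and check that every byte
 * reads back as zero.
 * Returns 0 if the whole range reads as zero, 1 if a non-zero byte
 * survived, and -1 if mmap() or madvise() failed.
 */
static int dontneed_zeroes(size_t npages)
{
	size_t len = npages * (size_t)sysconf(_SC_PAGESIZE);
	unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
				MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	int ret = 0;

	if (p == MAP_FAILED)
		return -1;

	/* Fault the pages in with non-zero data. */
	memset(p, 0xaa, len);

	if (madvise(p, len, MADV_DONTNEED)) {
		munmap(p, len);
		return -1;
	}

	/* The next access must see fresh zero-filled pages, not old data. */
	for (size_t i = 0; i < len; i++) {
		if (p[i] != 0) {
			ret = 1;
			break;
		}
	}

	munmap(p, len);
	return ret;
}
```

A caller expects dontneed_zeroes(4) == 0. A range that zap_pmd_range() silently skipped would keep its old 0xaa contents and break exactly this expectation.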
Fully agreed. This definitely needs more thought :)

-- 
Cheers,

David
