On 11.06.2025 10:23, David Hildenbrand wrote: > On 11.06.25 10:03, Marek Szyprowski wrote: >> On 11.06.2025 04:38, Alistair Popple wrote: >>> On Tue, Jun 10, 2025 at 06:18:09PM +0200, Marek Szyprowski wrote: >>>> On 04.06.2025 05:21, Alistair Popple wrote: >>>>> The PFN_MAP flag is no longer used for anything, so remove it. >>>>> The PFN_SG_CHAIN and PFN_SG_LAST flags never appear to have been >>>>> used so also remove them. The last user of PFN_SPECIAL was removed >>>>> by 653d7825c149 ("dcssblk: mark DAX broken, remove FS_DAX_LIMITED >>>>> support"). >>>>> >>>>> Signed-off-by: Alistair Popple <apop...@nvidia.com> >>>>> Acked-by: David Hildenbrand <da...@redhat.com> >>>>> Reviewed-by: Christoph Hellwig <h...@lst.de> >>>>> Reviewed-by: Jason Gunthorpe <j...@nvidia.com> >>>>> Cc: gerald.schae...@linux.ibm.com >>>>> Cc: dan.j.willi...@intel.com >>>>> Cc: j...@ziepe.ca >>>>> Cc: wi...@infradead.org >>>>> Cc: da...@redhat.com >>>>> Cc: linux-kernel@vger.kernel.org >>>>> Cc: nvd...@lists.linux.dev >>>>> Cc: jhubb...@nvidia.com >>>>> Cc: h...@lst.de >>>>> Cc: zhang.l...@gmail.com >>>>> Cc: de...@rivosinc.com >>>>> Cc: bj...@kernel.org >>>>> Cc: balb...@nvidia.com >>>>> Cc: lorenzo.stoa...@oracle.com >>>>> Cc: j...@groves.net >>>>> >>>>> --- >>>>> >>>>> Splitting this off from the rest of my series[1] as a separate >>>>> clean-up >>>>> for consideration for the v6.16 merge window as suggested by >>>>> Christoph. >>>>> >>>>> [1] - >>>>> https://lore.kernel.org/linux-mm/cover.541c2702181b7461b84f1a6967a3f0e823023fcc.1748500293.git-series.apop...@nvidia.com/ >>>>> --- >>>>> include/linux/pfn_t.h | 31 >>>>> +++---------------------------- >>>>> mm/memory.c | 2 -- >>>>> tools/testing/nvdimm/test/iomap.c | 4 ---- >>>>> 3 files changed, 3 insertions(+), 34 deletions(-) >>>> This patch landed in today's linux-next as commit 28be5676b4a3 ("mm: >>>> remove PFN_MAP, PFN_SPECIAL, PFN_SG_CHAIN and PFN_SG_LAST"). In my >>>> tests >>>> I've noticed that it breaks operation of all RISC-V 64bit boards on my >>>> test farm (VisionFive2, BananaPiF3 as well as QEMU's Virt machine). >>>> I've >>>> isolated the changes responsible for this issue, see the inline >>>> comments >>>> in the patch below. Here is an example of the issues observed in the >>>> logs from those machines: >>> Thanks for the report. I'm really confused by this because this >>> change should >>> just be removal of dead code - nothing sets any of the removed PFN_* >>> flags >>> AFAICT. >>> >>> I don't have access to any RISC-V hardwdare but you say this >>> reproduces under >>> qemu - what do you run on the system to cause the error? Is it just >>> a simple >>> boot and load a module or are you running selftests or something else? >> >> It fails a simple boot test. Here is a detailed instruction how to >> reproduce this issue with the random Debian rootfs image found on the >> internet (tested on Ubuntu 22.04, with next-20250610 >> kernel source): > > riscv is one of the archs where pte_mkdevmap() will *not* set the pte > as special. (I > raised this recently in the original series, it's all a big mess) > > So, before this change here, pfn_t_devmap() would have returned > "false" if only > PFN_DEV was set, now it would return "true" if only PFN_DEV is set. > > Consequently, in insert_pfn() we would have done a pte_mkspecial(), > now we do a > pte_mkdevmap() -- again, which does not imply "special" on riscv. > > riscv selects CONFIG_ARCH_HAS_PTE_SPECIAL, so if !pte_special(), it's > considered as > normal. > > Would the following fix your issue? > > > diff --git a/mm/memory.c b/mm/memory.c > index 8eba595056fe3..0e972c3493692 100644 > --- a/mm/memory.c > +++ b/mm/memory.c > @@ -589,6 +589,10 @@ struct page *vm_normal_page(struct vm_area_struct > *vma, unsigned long addr, > { > unsigned long pfn = pte_pfn(pte); > > + /* TODO: remove this crap and set pte_special() instead. */ > + if (pte_devmap(pte)) > + return NULL; > + > if (IS_ENABLED(CONFIG_ARCH_HAS_PTE_SPECIAL)) { > if (likely(!pte_special(pte))) > goto check_pfn; > @@ -598,16 +602,6 @@ struct page *vm_normal_page(struct vm_area_struct > *vma, unsigned long addr, > return NULL; > if (is_zero_pfn(pfn)) > return NULL; > - if (pte_devmap(pte)) > - /* > - * NOTE: New users of ZONE_DEVICE will not set > pte_devmap() > - * and will have refcounts incremented on their struct > pages > - * when they are inserted into PTEs, thus they are > safe to > - * return here. Legacy ZONE_DEVICE pages that set > pte_devmap() > - * do not have refcounts. Example of legacy > ZONE_DEVICE is > - * MEMORY_DEVICE_FS_DAX type in pmem or virtio_fs > drivers. > - */ > - return NULL; > > print_bad_pte(vma, addr, pte, NULL); > return NULL; > > > But, I would have thought the later patches in Alistairs series would > sort that out > (where we remove pte_devmap() ... ) > The above change fixes the issues observed on RISCV boards.
Best regards -- Marek Szyprowski, PhD Samsung R&D Institute Poland