On 07/30/2019 10:33 PM, Matthew Wilcox wrote:
> On Mon, Jul 29, 2019 at 02:02:52PM +0530, Anshuman Khandual wrote:
>> On 07/27/2019 01:24 AM, Matthew Wilcox wrote:
>>> On Fri, Jul 26, 2019 at 10:17:11AM +0530, Anshuman Khandual wrote:
>>>>> But 'page' isn't necessarily PMD-aligned. I don't think we can rely on
>>>>> architectures doing the right thing if asked to make a PMD for a randomly
>>>>> aligned page.
>>>>>
>>>>> How about finding the physical address of something like kernel_init(),
>>>>
>>>> Physical address corresponding to the symbol in the kernel text segment ?
>>>
>>> Yes. We need the address of something that's definitely memory.
>>> The stack might be in vmalloc space. We can't allocate memory from the
>>> allocator that's PUD-aligned. This seems like a reasonable approximation
>>> to something that might work.
>>
>> Okay sure. What is about vmalloc space being PUD aligned and how that is
>> problematic here ? Could you please give some details. Just being curious.
>
> Those were two different sentences.
>
> We can't use the address of something on the stack, because we don't
> know whether the stack is in vmalloc space or in the direct map.

Okay, because the kernel stack might be in vmalloc() space.

>
> We can't use the address of something we've allocated from the page
> allocator, because the page allocator can't give us PUD-aligned memory.

Because this test will be executed early during boot, alloc_contig_range()
makes sense for this purpose. Something like alloc_gigantic_page(), which,
other than getting the order from huge_page_order(h), is sort of a generic
allocator. Shall we make the core part of that function a generic allocator
for broader usage in the kernel, for cases like this one where the page
allocator is not sufficient because PUD-sized and PUD-aligned memory is
required ? In case a PUD-aligned memory block cannot be allocated,
pud_basic_tests() must be skipped and a PMD-aligned memory block should be
used instead as a fallback for the other tests.
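To make the fallback idea concrete, here is a rough userspace sketch of the
decision only, not of the allocation itself. It assumes x86-64 style values
with 4K base pages (PMD covers 2MiB, PUD covers 1GiB); other platforms will
differ. All sketch_* names are hypothetical and exist only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed x86-64 values with 4K base pages; other platforms differ. */
#define SKETCH_PMD_SHIFT 21
#define SKETCH_PUD_SHIFT 30
#define SKETCH_PMD_SIZE  (UINT64_C(1) << SKETCH_PMD_SHIFT)
#define SKETCH_PUD_SIZE  (UINT64_C(1) << SKETCH_PUD_SHIFT)

/* True when addr sits on a boundary of the given power-of-two size. */
static bool sketch_is_aligned(uint64_t addr, uint64_t size)
{
	return (addr & (size - 1)) == 0;
}

/* Hypothetical summary of which test groups a memory block can serve. */
struct sketch_plan {
	bool run_pmd_tests;
	bool run_pud_tests;
};

/*
 * Decide which tests may run on a block starting at physical address
 * paddr and spanning len bytes. PUD tests need a PUD-sized, PUD-aligned
 * block; otherwise fall back to PMD tests if the block allows it.
 */
static struct sketch_plan sketch_plan_tests(uint64_t paddr, uint64_t len)
{
	struct sketch_plan plan;

	plan.run_pmd_tests = sketch_is_aligned(paddr, SKETCH_PMD_SIZE) &&
			     len >= SKETCH_PMD_SIZE;
	plan.run_pud_tests = sketch_is_aligned(paddr, SKETCH_PUD_SIZE) &&
			     len >= SKETCH_PUD_SIZE;
	return plan;
}
```

A block that is only PMD-aligned would then run everything except
pud_basic_tests().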
>
>>> I think that's a mistake. As Russell said, the ARM p*d manipulation
>>> functions expect to operate on tables, not on individual entries
>>> constructed on the stack.
>>
>> Hmm. I assume that it will take care of dual 32 bit entry updates on arm
>> platform through various helper functions as Russel had mentioned earlier.
>> After we create page table with p?d_alloc() functions and pick an entry at
>> each page table level.
>
> Right.
>
>>> So I think the right thing to do here is allocate an mm, then do the
>>> pgd_alloc / p4d_alloc / pud_alloc / pmd_alloc / pte_alloc() steps giving
>>> you real page tables that you can manipulate.
>>>
>>> Then destroy them, of course. And don't access through them.
>>
>> mm_alloc() seems like a comprehensive helper to allocate and initialize a
>> mm_struct. But could we use mm_init() with 'current' in the driver context
>> or we need to create a dummy task_struct for this purpose. Some initial
>> tests show that p?d_alloc() and p?d_free() at each level with a fixed
>> virtual address gives p?d_t entries required at various page table level
>> to test upon.
>
> I think it's wise to start a new mm. I'm not sure exactly what calls
> to make to get one going.
>
>>>>>> +#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
>>>>>> +static void pud_basic_tests(void)
>>>>>
>>>>> Is this the right ifdef?
>>>>
>>>> IIUC THP at PUD is where the pud_t entries are directly operated upon
>>>> and the corresponding accessors are present only when
>>>> HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD is enabled. Am I missing something
>>>> here ?
>>>
>>> Maybe I am. I thought we could end up operating on PUDs for kernel
>>> mappings, even without transparent hugepages turned on.
>>
>> In generic MM ? IIUC except ioremap mapping all other PUD handling for
>> kernel virtual range is platform specific. All the helpers used in the
>> function pud_basic_tests() are part of THP and used in mm/huge_memory.c
>
> But what about hugetlbfs?
> And vmalloc can also use larger pages these days.
> I don't think these tests should be conditional on transparent hugepages.

The current proposal restricts itself to very basic operations at each page
table level for now. I have subsequent patches which add helpers specific to
various MM features, with respect to SPECIAL, DEVMAP, HugeTLB entries etc.
We can also explore platform specific helpers for ioremap and vmalloc. But
that is for subsequent patches, and the scope of the current proposal is
limited.

The THP (or PUD THP) config wrappers are here because the helpers used in the
current proposal are present only when THP (or PUD THP) is enabled and are
absent otherwise. Without these wrappers we will have build failures. Hence
these wrappers are necessary.
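As an illustration of the wrapper pattern, here is a minimal sketch that
builds in userspace. Only the config symbol and the pud_basic_tests() name
come from the patch; the function body, the sketch_pud_tests_run counter and
the sketch_run_tests() caller are hypothetical stand-ins, and the real test
body is not reproduced here.

```c
/* Hypothetical counter, only there to make the selected branch visible. */
static int sketch_pud_tests_run;

/*
 * In the kernel the symbol comes from Kconfig. In this sketch, building
 * with -DCONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD selects the first branch.
 */
#ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
static void pud_basic_tests(void)
{
	/* The PUD helpers, which exist only in this config, would be used here. */
	sketch_pud_tests_run++;
}
#else
/* Empty stub: nothing references the absent helpers, so the build succeeds. */
static void pud_basic_tests(void)
{
}
#endif

/* Hypothetical caller: it needs no #ifdef of its own. */
static void sketch_run_tests(void)
{
	pud_basic_tests();
}
```

With the stub in place the caller stays the same for both configurations,
and only the wrapped function body depends on the config option.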