On Fri, Nov 21, 2014 at 10:22 AM, Linus Torvalds <torva...@linux-foundation.org> wrote: > > (d) the non-PAE case would look something like this: > > static noinline int vmalloc_fault(unsigned long address) > { > unsigned index; > pgd_t *pgd_dst, pgd_entry; > > /* Make sure we are in vmalloc area: */ > if (!(address >= VMALLOC_START && address < VMALLOC_END)) > return -1;
Side note: I think this is just unnecessary confusion, and generates big constants for no good reason. The thing is, the kernel PGD's should always be in sync. In fact, at PGD allocation time, we just do clone_pgd_range(.. KERNEL_PGD_BOUNDARY, KERNEL_PGD_PTRS); and it might actually be better to structure this to be that exact same thing. So instead of checking the address, we could just do index = pgd_index(address); if (index < KERNEL_PGD_BOUNDARY) return -1; which actually matches our initialization sequence much better anyway. And avoids those random big constants. Also, it turns out that this: if (pgd_present(pgd_dst[index])) generates a crazy big constant because of bad compiler issues (the "pgd_present()" thing only checks the low bit, but it does so on pgd_flags(), which does "native_pgd_val(pgd) & PTE_FLAGS_MASK", so you have an insane extra "and" with the constant 0xffffc00000000fff, just to then "and" it again with "1". It doesn't do that with the first pgd_present() check, oddly enough. WTF, gcc? Anyway, even more importantly, because of the whole issue with nesting page tables, it's probably best to actually avoid all the "pgd_present()" etc helpers, because those might be hardcoded to 1 etc. So avoid the whole issue by just accessign the raw data. Simplify, simplify, simplify. The actual code generation for this all should be maybe 20 instructions. Here's the simplified end result. Again, this is TOTALLY UNTESTED. I compiled it and verified that the code generation looks like what I'd have expected, but that's literally it. static noinline int vmalloc_fault(unsigned long address) { pgd_t *pgd_dst; pgdval_t pgd_entry; unsigned index = pgd_index(address); if (index < KERNEL_PGD_BOUNDARY) return -1; pgd_entry = init_mm.pgd[index].pgd; if (!pgd_entry) return -1; pgd_dst = __va(PAGE_MASK & read_cr3()); pgd_dst += index; if (pgd_dst->pgd) return -1; ACCESS_ONCE(pgd_dst->pgd) = pgd_entry; return 0; } NOKPROBE_SYMBOL(vmalloc_fault); Hmm? Does anybody see anything fundamentally wrong with this? Linus -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/