Hi James,
On 02/13/2019 11:52 PM, James Morse wrote:
Hi guys,
On 13/02/2019 11:15, Dave Young wrote:
On 02/12/19 at 11:03pm, Kazuhito Hagio wrote:
On 2/12/2019 2:59 PM, Bhupesh Sharma wrote:
BTW, in the makedumpfile enablement patch thread for ARMv8.2 LVA
(which I sent out for 52-bit User space VA enablement) (see [0]), Kazu
mentioned that the changes look necessary.
[0]. http://lists.infradead.org/pipermail/kexec/2019-February/022431.html
The increased 'PTRS_PER_PGD' value for such cases needs to be then
calculated as is done by the underlying kernel
Aha! Nothing to do with which-bits-are-pfn in the tables...
You need to know if the top level PGD is 512 bytes or bigger. As we use a
kmem-cache the adjacent data could be someone else's page tables.
Is this really a problem though? You can't pull the user-space pgd pointers out
of nowhere, you must have walked some task_struct and mm_struct to find them.
In which case you would have the VMAs on hand to tell you if it's in the mapped
user range.
It would be good to avoid putting something arch-specific in here if we can at
all help it.
(see
'arch/arm64/include/asm/pgtable-hwdef.h' for details):
#define PTRS_PER_PGD (1 << (MAX_USER_VA_BITS - PGDIR_SHIFT))
Yes, this is the reason why makedumpfile needs the MAX_USER_VA_BITS.
It is used for pgd_index() also in makedumpfile to walk page tables.
/* to find an entry in a page-table-directory */
#define pgd_index(addr) (((addr) >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1))
Dave mentioned that the crash tool does not need it, but crash also has
to walk the page tables.
If this is really necessary it would be good to describe what will
happen without the patch, eg. some user visible error from an actual test etc.
Yes please, it would really help if there was a specific example we could
discuss.
Sure. Here are two reported use-cases/regressions which I have been
able to reproduce. Note that I tested both of them on a CPU which does
not support ARMv8.2-LPA/LVA and on the ARMv8 FVP model (which supports
the ARMv8.2 extensions).
Environment:
------------
Latest Upstream kernel: sha-id: 1f947a7a011fcceb14cb912f5481a53b18f1879a
("Merge branch 'akpm' (patches from Andrew)")
Latest makedumpfile code: (git://git.code.sf.net/p/makedumpfile/code ,
branch: devel)
crash-utility code: (https://github.com/crash-utility/crash.git, sha-id:
e082c372c7f1a782b058ec359dfbbbee0f0b6aad)
Note that Dave A. has since fixed crash-utility by using a hardcoded
value of 'MAX_PHYSMEM_BITS' (via sha id:
ac5a7889d31bb37aa0687110ecea08837f8a66a8) and determining 'vabits_user'
value via vmlinux (via sha id: 8618ddd817621c40c1f44f0ab6df7c7805234416)
(1). Regression Case 1 (ARMv8.2-LPA enabled kernel):
- Upstream makedumpfile and crash-utility (with sha-id
e082c372c7f1a782b058ec359dfbbbee0f0b6aad) are broken on the following
kind of platform:
a. Upstream Kernel 5.0.0-rc6+ with the following kernel configuration:
CONFIG_ARM64_64K_PAGES=y
# CONFIG_ARM64_VA_BITS_42 is not set
CONFIG_ARM64_VA_BITS_48=y
# CONFIG_ARM64_USER_VA_BITS_52 is not set
CONFIG_ARM64_VA_BITS=48
# CONFIG_ARM64_PA_BITS_48 is not set
CONFIG_ARM64_PA_BITS_52=y
CONFIG_ARM64_PA_BITS=52
b. Both on CPUs which don't support ARMv8.2 LPA extension and on ARMv8
FVP model with ARMv8.2 LPA extensions.
- Error message from makedumpfile:
$ makedumpfile -f --mem-usage /proc/kcore -D
max_mapnr : a00000
kimage_voffset : fffeffff80000000
max_physmem_bits : 30
section_size_bits: 1e
vaddr_to_paddr_arm64: pgda=90d70000, pudv.pgd=ffffff80ffffffd0
vaddr_to_paddr_arm64: puda=90d70000, pudv.pgd=9fffff0003
vaddr_to_paddr_arm64: pmda=9fffff0000, pmdv.pud=9ffffe0003
vaddr_to_paddr_arm64: paddr=911a962c
va_bits : 48
page_offset : ffff800000000000
vaddr_to_paddr_arm64: pgda=90d70000, pudv.pgd=41a474
vaddr_to_paddr_arm64: puda=90d70000, pudv.pgd=9fffff0003
vaddr_to_paddr_arm64: pmda=9fffff0000, pmdv.pud=9ffffe0003
vaddr_to_paddr_arm64: paddr=911a1320
num of NODEs : 1
Memory type : SPARSEMEM
vaddr_to_paddr_arm64: pgda=90d70100, pudv.pgd=a657470206461
vaddr_to_paddr_arm64: puda=90d70100, pudv.pgd=9ffffd0003
vaddr_to_paddr_arm64: pmda=9ffffd27d8, pmdv.pud=9ffeee0003
vaddr_to_paddr_arm64: paddr=9ffeedc600
vaddr_to_paddr_arm64: pgda=90d70100, pudv.pgd=ffff97d98224
vaddr_to_paddr_arm64: puda=90d70100, pudv.pgd=9ffffd0003
vaddr_to_paddr_arm64: pmda=9ffffd27d8, pmdv.pud=9ffeee0003
vaddr_to_paddr_arm64: paddr=9ffeedc600
get_mm_sparsemem: Can't get the address of mem_section.
c. Root Cause Analysis -
- After the PA_BITS changes in arm64 kernel we set:
#define MAX_PHYSMEM_BITS CONFIG_ARM64_PA_BITS
- For SPARSEMEM, this value is used to calculate the number of bits
required to hold a section number:
#define SECTIONS_SHIFT (MAX_PHYSMEM_BITS - SECTION_SIZE_BITS)
#define NR_MEM_SECTIONS (1UL << SECTIONS_SHIFT)
- User-space tools use a similar mechanism to determine the SPARSEMEM
type (extreme or not) using the 'NR_MEM_SECTIONS' value (an example from
makedumpfile code):
int
is_sparsemem_extreme(void)
{
	if ((ARRAY_LENGTH(mem_section)
	     == divideup(NR_MEM_SECTIONS(), _SECTIONS_PER_ROOT_EXTREME()))
	    || (ARRAY_LENGTH(mem_section) == NOT_FOUND_STRUCTURE))
		return TRUE;
	else
		return FALSE;
}
- Since MAX_PHYSMEM_BITS is 48 for the normal case and 52 for the
extended PA address space, a tool that still assumes 48 incorrectly
calculates the memory type as SPARSEMEM rather than SPARSEMEM_EX in the
above case.
- Exporting the correct 'MAX_PHYSMEM_BITS' via vmcoreinfo for the
52-bit PA case fixes the above-mentioned issue:
$ makedumpfile -f --mem-usage /proc/kcore -D
<..snip..>
NUMBER(MAX_PHYSMEM_BITS)=52
<..snip..>
max_mapnr : a00000
kimage_voffset : fffeffff80000000
max_physmem_bits : 30
section_size_bits: 1e
vaddr_to_paddr_arm64: pgda=90d70000, pudv.pgd=ffffff80ffffffd0
vaddr_to_paddr_arm64: puda=90d70000, pudv.pgd=9fffff0003
vaddr_to_paddr_arm64: pmda=9fffff0000, pmdv.pud=9ffffe0003
vaddr_to_paddr_arm64: paddr=911a962c
va_bits : 48
page_offset : ffff800000000000
vaddr_to_paddr_arm64: pgda=90d70000, pudv.pgd=41a474
vaddr_to_paddr_arm64: puda=90d70000, pudv.pgd=9fffff0003
vaddr_to_paddr_arm64: pmda=9fffff0000, pmdv.pud=9ffffe0003
vaddr_to_paddr_arm64: paddr=911a1320
num of NODEs : 1
Memory type : SPARSEMEM_EX
<..snip..>
TYPE            PAGES     EXCLUDABLE  DESCRIPTION
----------------------------------------------------------------------
ZERO            2626      yes         Pages filled with zero
NON_PRI_CACHE   569       yes         Cache pages without private flag
PRI_CACHE       5446      yes         Cache pages with private flag
USER            3213      yes         User process pages
FREE            2048971   yes         Free pages
KERN_DATA       19034     no          Dumpable kernel data
page size: 65536
Total pages on system: 2079859
Total size on system: 136305639424 Byte
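
To make the root cause of case 1 concrete, here is a rough sketch of the
arithmetic behind the SPARSEMEM_EXTREME check. It is not makedumpfile
code. The helper names are made up for illustration, section_size_bits
is taken as 30 (the 1e printed in the log above, read as hex), and the
16-byte size of struct mem_section is only an assumption; the real size
depends on the kernel configuration.

```c
#include <stdint.h>

/* Assumed values for the configuration above: 64K pages and
 * section_size_bits = 0x1e (30) as printed in the makedumpfile log.
 * MEM_SECTION_SIZE is an assumption; the real sizeof(struct mem_section)
 * depends on the kernel configuration. */
#define PAGE_SIZE_64K      65536ULL
#define SECTION_SIZE_BITS  30
#define MEM_SECTION_SIZE   16ULL

/* Round-up division, in the spirit of makedumpfile's divideup(). */
static inline uint64_t divide_up(uint64_t x, uint64_t y)
{
	return (x + y - 1) / y;
}

/* NR_MEM_SECTIONS = 1UL << (MAX_PHYSMEM_BITS - SECTION_SIZE_BITS) */
static inline uint64_t nr_mem_sections(unsigned int max_physmem_bits)
{
	return 1ULL << (max_physmem_bits - SECTION_SIZE_BITS);
}

/* Number of root entries the SPARSEMEM_EXTREME comparison expects. */
static inline uint64_t nr_section_roots_extreme(unsigned int max_physmem_bits)
{
	uint64_t sections_per_root = PAGE_SIZE_64K / MEM_SECTION_SIZE;

	return divide_up(nr_mem_sections(max_physmem_bits), sections_per_root);
}
```

Under these assumptions nr_section_roots_extreme(48) is 64 while
nr_section_roots_extreme(52) is 1024, so a tool that hardcodes 48 would
compare the mem_section array length against the wrong value whenever
the kernel was built with CONFIG_ARM64_PA_BITS=52.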
(2). Regression Case 2 (ARMv8.2-LPA + LVA [52-bit user-space VA] enabled
kernel):
- Upstream makedumpfile and crash-utility (with sha-id
e082c372c7f1a782b058ec359dfbbbee0f0b6aad) are broken on the following
kind of platform:
a. Upstream Kernel 5.0.0-rc6+ with the following kernel configuration:
CONFIG_ARM64_64K_PAGES=y
# CONFIG_ARM64_VA_BITS_42 is not set
# CONFIG_ARM64_VA_BITS_48 is not set
CONFIG_ARM64_USER_VA_BITS_52=y
CONFIG_ARM64_VA_BITS=48
# CONFIG_ARM64_PA_BITS_48 is not set
CONFIG_ARM64_PA_BITS_52=y
CONFIG_ARM64_PA_BITS=52
b. Both on CPUs which don't support ARMv8.2 extensions and on ARMv8 FVP
model with ARMv8.2 extensions.
- Error message from makedumpfile:
$ makedumpfile -f --mem-usage /proc/kcore -D
max_mapnr : a00000
kimage_voffset : fffeffff78000000
max_physmem_bits : 30
section_size_bits: 1e
vaddr_to_paddr_arm64: pgda=90f30000, pudv.pgd=ffffff80ffffffd0
vaddr_to_paddr_arm64: puda=90f30000, pudv.pgd=0
readpage_elf: Attempt to read non-existent page at 0x0.
readmem: type_addr: 1, addr:0, size:8
vaddr_to_paddr_arm64: Can't read pmd
readmem: Can't convert a virtual address(ffff0000093c576c) to physical
address.
readmem: type_addr: 0, addr:ffff0000093c576c, size:390
check_release: Can't get the address of system_utsname.
c. Root Cause Analysis -
- After the 52-bit user-space VA_BIT changes in arm64 kernel we set:
#define PTRS_PER_PGD (1 << (MAX_USER_VA_BITS - PGDIR_SHIFT))
- User-space tools like makedumpfile and crash use the 'PTRS_PER_PGD'
value to calculate the 'pgd_index()' of a vaddr:
#define pgd_index(vaddr) (((vaddr) >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1))
- Since the kernel now defines 'MAX_USER_VA_BITS' as:
#ifdef CONFIG_ARM64_USER_VA_BITS_52
#define MAX_USER_VA_BITS 52
#else
#define MAX_USER_VA_BITS VA_BITS
#endif
user-space also needs this value in order to calculate 'PTRS_PER_PGD',
and hence 'pgd_index()', correctly.
- Exporting the correct 'MAX_USER_VA_BITS' via vmcoreinfo for the above
case fixes the above-mentioned issue:
$ makedumpfile -f --mem-usage /proc/kcore -D
<..snip..>
max_mapnr : a00000
pa_bits : 52
va_bits : 48 (vmcoreinfo)
max_user_va_bits : 52 (vmcoreinfo)
kimage_voffset : fffeffff78000000
max_physmem_bits : 52
section_size_bits: 30
vaddr_to_paddr_arm64: pgda=90f31e00, pudv.pgd=ffffff80ffffffd0
vaddr_to_paddr_arm64: puda=90f31e00, pudv.pgd=9fffff0803
vaddr_to_paddr_arm64: pmda=9fffff0000, pmdv.pud=9ffffe0803
vaddr_to_paddr_arm64: paddr=913c576c
page_offset : ffff800000000000
vaddr_to_paddr_arm64: pgda=90f31e00, pudv.pgd=16e28e8bed294900
vaddr_to_paddr_arm64: puda=90f31e00, pudv.pgd=9fffff0803
vaddr_to_paddr_arm64: pmda=9fffff0000, pmdv.pud=9ffffe0803
vaddr_to_paddr_arm64: paddr=913bd2f8
num of NODEs : 1
Memory type : SPARSEMEM_EX
<..snip..>
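
The difference between the failing and the fixed run of case 2 can be
reproduced with a small standalone sketch of the pgd_index()
calculation. This is only an illustration, not the makedumpfile
implementation: the function names are hypothetical, and PGDIR_SHIFT =
42 is my assumption for 64K pages with three translation levels.

```c
#include <stdint.h>

/* Assumption: 64K pages with three translation levels, which gives
 * PGDIR_SHIFT = 42 on arm64. */
#define PGDIR_SHIFT 42

/* PTRS_PER_PGD = 1 << (MAX_USER_VA_BITS - PGDIR_SHIFT) */
static inline uint64_t ptrs_per_pgd(unsigned int max_user_va_bits)
{
	return 1ULL << (max_user_va_bits - PGDIR_SHIFT);
}

/* Index of the PGD entry that maps vaddr. */
static inline uint64_t pgd_index(uint64_t vaddr, unsigned int max_user_va_bits)
{
	return (vaddr >> PGDIR_SHIFT) & (ptrs_per_pgd(max_user_va_bits) - 1);
}

/* Byte offset of that entry from the PGD base (8-byte descriptors). */
static inline uint64_t pgd_entry_offset(uint64_t vaddr,
					unsigned int max_user_va_bits)
{
	return pgd_index(vaddr, max_user_va_bits) * sizeof(uint64_t);
}
```

For the address ffff0000093c576c from the failing log, the sketch gives
index 0 when 48 bits are assumed (entry at the table base, consistent
with pgda=90f30000) and index 960, i.e. offset 0x1e00, when 52 bits are
used (consistent with pgda=90f31e00 in the fixed run).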
Other important notes
---------------------
1. I have quoted only one makedumpfile use-case failure above (i.e.
calculating --mem-usage on the primary kernel). Other use-cases like
creating a dumpfile using /proc/vmcore or post-processing a vmcore are
also broken similarly and get fixed when a kernel which exports
'MAX_USER_VA_BITS' and 'MAX_PHYSMEM_BITS' to vmcoreinfo is used along
with a modified user-space which can read this information from the
vmcoreinfo.
2. I was also going through some of the suggestions on earlier threads
about the PTE calculations for the 52-bit LPA case and discussed them
with some engineers at partner arm64 SoC vendors.
The suggestion to convert a page table entry to a physical address
without awareness of 52-bit PA (assuming a 64k page size) can be
risky.
With 64k pages and older non-52-bit kernels, the current checks make it
look as if bits [15:12] are always zero, so that we could move those
zeros to bits [51:48] (they would not affect the overall PA) to generate
the 52-bit PA. However, this can cause IMPLEMENTATION SPECIFIC
issues on different platforms while generating a PA and IPA.
Let's see what the ARMv8 architecture reference manual says about
bits [15:12] for a 64KB page size:
"Bits [15:12] of each valid translation table descriptor hold Bits
[51:48] of the output address, or of the address of the translation
table to be used for the initial lookup at the next level of
translation. If the implementation does not support 52-bit physical
addresses, then it is IMPLEMENTATION DEFINED whether non-zero values for
these bits generate an Address size fault. In this case, not generating
an Address Size Fault is deprecated."
As per the vendors, we should not assume that hardware which does not
support 52-bit physical addresses will generate an Address size fault
for non-zero values of bits [15:12], so always extending them to bits
[51:48] can produce a PA which might cause UNDEFINED behavior on some
SoCs.
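
As a rough sketch of what a 52-bit aware conversion could look like for
64K pages: the helper below moves bits [15:12] of a descriptor to bits
[51:48] only when the kernel is known to use a 52-bit PA. The function
name and the pa_bits parameter are hypothetical (pa_bits would have to
come from something like vmcoreinfo), and the masks are modeled on my
reading of the kernel's PTE_ADDR_LOW/PTE_ADDR_HIGH definitions rather
than copied from any user-space tool.

```c
#include <stdint.h>

/* 64K pages: bits [47:16] of a descriptor hold PA[47:16]. */
#define PTE_ADDR_LOW   0x0000ffffffff0000ULL
/* With 52-bit PA, bits [15:12] of a descriptor hold PA[51:48]. */
#define PTE_ADDR_HIGH  0x000000000000f000ULL

/* Hypothetical helper: pa_bits is assumed to be exported by the kernel
 * (48 or 52); it is not guessed from the descriptor contents. */
static inline uint64_t pte_to_phys(uint64_t pte, unsigned int pa_bits)
{
	uint64_t pa = pte & PTE_ADDR_LOW;

	if (pa_bits == 52)
		pa |= (pte & PTE_ADDR_HIGH) << 36; /* [15:12] -> [51:48] */

	return pa;
}
```

With pa_bits = 48 the sketch ignores bits [15:12] entirely, which is the
point of the argument above: whether those bits are part of the output
address is a property of the kernel configuration that the tool has to
be told about.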
Hope the above text clarifies the problem and what I am trying to fix
via this patch. Please let me know if something is missing here.
Thanks,
Bhupesh
_______________________________________________
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec