Andrew Donnellan <a...@linux.ibm.com> writes:

> On 6/3/20 6:30 pm, Daniel Axtens wrote:
>> kcov instrumentation is collected via the __sanitizer_cov_trace_pc hook in
>> kernel/kcov.c. The compiler inserts these hooks into every basic block
>> unless kcov is disabled for that file.
>>
>> We then have a deep call-chain:
>>  - __sanitizer_cov_trace_pc calls check_kcov_mode()
>>  - check_kcov_mode() (kernel/kcov.c) calls in_task()
>>  - in_task() (include/linux/preempt.h) calls preempt_count().
>>  - preempt_count() (include/asm-generic/preempt.h) calls
>>    current_thread_info()
>>  - because powerpc has THREAD_INFO_IN_TASK, current_thread_info()
>>    (include/linux/thread_info.h) is defined to 'current'
>>  - current (arch/powerpc/include/asm/current.h) is defined to
>>    get_current().
>>  - get_current (same file) loads an offset of r13.
>>  - arch/powerpc/include/asm/paca.h makes r13 a register variable
>>    called local_paca - it is the PACA for the current CPU, so
>>    this has the effect of loading the current task from PACA.
>>  - get_current returns the current task from PACA,
>>  - current_thread_info returns the task cast to a thread_info
>>  - preempt_count dereferences the thread_info to load preempt_count
>>  - that value is used by in_task and so on up the chain
>>
>> The problem is:
>>
>>  - kcov instrumentation is enabled for arch/powerpc/kernel/dt_cpu_ftrs.c
>>
>>  - even if it were not, dt_cpu_ftrs_init calls generic dt parsing code
>>    which should definitely have instrumentation enabled.
>>
>>  - setup_64.c calls dt_cpu_ftrs_init before it sets up a PACA.
>>
>>  - If we don't set up a paca, r13 will contain unpredictable data.
>>
>>  - In a zImage compiled with kcov and KASAN, we see r13 containing a value
>>    that leads to dereferencing invalid memory (something like
>>    912a72603d420015).
>>
>>  - Weirdly, the same kernel as a vmlinux loaded directly by qemu does not
>>    crash. Investigating with gdb, it seems that in the vmlinux boot case,
>>    r13 is near enough to zero that we just happen to be able to read that
>>    part of memory (we're operating with translation off at this point) and
>>    the current pointer also happens to land in readable memory and
>>    everything just works.
>>
>>  - PACA setup refers to CPU features - setup_paca() looks at
>>    early_cpu_has_feature(CPU_FTR_HVMODE)
>>
>> There's no generic kill switch for kcov (as far as I can tell), and we
>> don't want to have to turn off instrumentation in the generic dt parsing
>> code (which lives outside arch/powerpc/) just because we don't have a real
>> paca or task yet.
>>
>> So:
>>  - change the test when setting up a PACA to consider the actual value of
>>    the MSR rather than the CPU feature.
>>
>>  - move the PACA setup to before the cpu feature parsing.
>>
>> Translations get switched on once we leave early_setup, so I think we'd
>> already catch any other cases where the PACA or task aren't set up.
>>
>> Boot tested on a P9 guest and host.
>>
>> Fixes: fb0b0a73b223 ("powerpc: Enable kcov")
>> Cc: Andrew Donnellan <a...@linux.ibm.com>
>> Suggested-by: Michael Ellerman <m...@ellerman.id.au>
>> Signed-off-by: Daniel Axtens <d...@axtens.net>
>>
>> ---
>>
>> Regarding moving the comment about printk()-safety:
>> I am about 75% sure that the thing that makes printk() safe is the PACA,
>> not the CPU features. That's what commit 24d9649574fb ("[POWERPC] Document
>> when printk is useable") seems to indicate, but as someone wise recently
>> told me, "bootstrapping is hard", so I may be totally wrong.
>>
>> v3: Update comment, thanks Christophe Leroy.
>>     Remove a comment in dt_cpu_ftrs.c that is no longer accurate - thanks
>>     Andrew. I think we want to retain all the code still, but I'm open to
>>     being told otherwise.
>
> Thanks for doing that.
>
> This patch and the justification don't seem obviously wrong, and is
> snowpatch-clean.
>
> Reviewed-by: Andrew Donnellan <a...@linux.ibm.com>
>
> (Is it worth cc'ing this to stable in case there are other situations we
> haven't foreseen where we hit the unpredictable r13 data? Few people use
> kcov...)

I did briefly consider it but didn't believe it reached the stable criteria:

 | It must fix a real bug that bothers people (not a, “This could be a
 | problem...” type thing).

On reflection it's a real bug (boot hang), it bothers me, and presumably
also you due to the syzkaller interaction, and I am led to believe we are
both people, so I guess I'll do a v3 with cc: stable.

Thanks!

Regards,
Daniel

>
> --
> Andrew Donnellan              OzLabs, ADL Canberra
> a...@linux.ibm.com             IBM Australia Limited
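
For anyone trying to follow the call chain in the quoted patch description,
here is a rough user-space model of it in C. It is only a sketch: every
name prefixed model_ or MODEL_ is made up for illustration and is not
kernel code, the "not in task" mask is a placeholder rather than the
kernel's real preempt bits, and the MSR_HV bit position (60) is my
assumption about the kernel's definition. The model returns -1 where the
real kernel would simply dereference whatever r13 happens to hold.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the task. With THREAD_INFO_IN_TASK the
 * thread_info sits at the start of the task, so the model keeps
 * preempt_count directly in it. */
struct model_task {
	int preempt_count;
};

/* Hypothetical stand-in for the per-CPU PACA. */
struct model_paca {
	struct model_task *current_task;
};

/* Stand-in for r13 (local_paca). NULL models "no PACA set up yet";
 * on real hardware r13 just contains unpredictable data. */
struct model_paca *model_r13 = NULL;

/* Placeholder values, not the kernel's preempt_count layout. */
#define MODEL_HARDIRQ_OFFSET (1 << 16)
#define MODEL_NONTASK_MASK   0x1f0100

/* Assumption: the hypervisor bit of the MSR is bit 60. */
#define MODEL_MSR_HV (1ULL << 60)

/* Models setting up the PACA: after this, r13 points at valid data. */
void model_setup_paca(struct model_paca *paca)
{
	assert(paca != NULL && paca->current_task != NULL);
	model_r13 = paca;
}

/* Models get_current(): load the current task through r13. */
struct model_task *model_get_current(void)
{
	return model_r13 ? model_r13->current_task : NULL;
}

/* Models in_task() on top of preempt_count().
 * Returns 1 in task context, 0 otherwise, and -1 when there is no
 * PACA (a guard the real call chain does not have). */
int model_in_task(void)
{
	struct model_task *cur = model_get_current();

	if (cur == NULL)
		return -1;
	return !(cur->preempt_count & MODEL_NONTASK_MASK);
}

/* Models the instrumentation hook the compiler inserts into every
 * basic block; it ends up in model_in_task() via the chain above. */
int model_trace_pc(void)
{
	return model_in_task();
}

/* Models the proposed test: look at the MSR value itself rather than
 * at a CPU feature that has not been parsed yet. */
int model_msr_is_hv(uint64_t msr)
{
	return (msr & MODEL_MSR_HV) != 0;
}
```

In this model, calling model_trace_pc() before model_setup_paca() returns
-1, which corresponds to instrumented dt parsing code running before
setup_64.c has set up a PACA. Calling model_setup_paca() first, as the
patch does by moving PACA setup ahead of the cpu feature parsing, makes
the same call return a meaningful answer.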