Andrew Donnellan <a...@linux.ibm.com> writes:

> On 6/3/20 6:30 pm, Daniel Axtens wrote:
>> kcov instrumentation is collected by the __sanitizer_cov_trace_pc hook in
>> kernel/kcov.c. The compiler inserts these hooks into every basic block
>> unless kcov is disabled for that file.
>> 
>> We then have a deep call-chain:
>>   - __sanitizer_cov_trace_pc calls check_kcov_mode()
>>   - check_kcov_mode() (kernel/kcov.c) calls in_task()
>>   - in_task() (include/linux/preempt.h) calls preempt_count().
>>   - preempt_count() (include/asm-generic/preempt.h) calls
>>       current_thread_info()
>>   - because powerpc has THREAD_INFO_IN_TASK, current_thread_info()
>>       (include/linux/thread_info.h) is defined to 'current'
>>   - current (arch/powerpc/include/asm/current.h) is defined to
>>       get_current().
>>   - get_current (same file) loads an offset of r13.
>>   - arch/powerpc/include/asm/paca.h makes r13 a register variable
>>       called local_paca - it is the PACA for the current CPU, so
>>       this has the effect of loading the current task from PACA.
>>   - get_current returns the current task from PACA,
>>   - current_thread_info returns the task cast to a thread_info
>>   - preempt_count dereferences the thread_info to load preempt_count
>>   - that value is used by in_task and so on up the chain
>> 
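
For readers following the chain above, here is a minimal user-space sketch of it in C. It is not the kernel code: the struct layouts, the `_model` names, the `current_task` field and the simplified `in_task()` test are all assumptions made for illustration, and a plain pointer stands in for register r13. The only point it tries to show is that the coverage hook ends up dereferencing whatever r13 holds, so the PACA has to be installed before any instrumented code runs.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-ins only; the real kernel structures are much larger. */
struct thread_info { int preempt_count; };
/* thread_info is the first member, which is what makes the cast below valid. */
struct task_struct { struct thread_info thread_info; };
/* Hypothetical field name; models the PACA slot holding the current task. */
struct paca_struct { struct task_struct *current_task; };

static struct task_struct init_task_model;
static struct paca_struct boot_paca_model = { .current_task = &init_task_model };

/* Models r13. In this sketch it starts out NULL rather than as random data. */
static struct paca_struct *local_paca;

static struct task_struct *get_current(void)
{
    /* The real code has no such check; it simply loads from r13. */
    assert(local_paca != NULL);
    return local_paca->current_task;
}

static struct thread_info *current_thread_info(void)
{
    return (struct thread_info *)get_current();
}

static int preempt_count(void)
{
    return current_thread_info()->preempt_count;
}

/* Simplified: the kernel masks specific bits of preempt_count instead. */
static int in_task(void)
{
    return preempt_count() == 0;
}

static int check_kcov_mode(void)
{
    return in_task();
}

static unsigned long hits;

/* Stand-in for the hook the compiler inserts into every basic block. */
static void sanitizer_cov_trace_pc_model(void)
{
    if (check_kcov_mode())
        hits++;
}

/* Safe ordering: install the PACA first, then run instrumented code. */
static unsigned long boot_with_paca_first(void)
{
    local_paca = &boot_paca_model;
    hits = 0;
    sanitizer_cov_trace_pc_model();
    return hits;
}
```

Calling sanitizer_cov_trace_pc_model() before local_paca is assigned trips the assert in this model; in the real boot path there is no assert, only a load through whatever r13 happens to contain.
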
>> The problem is:
>> 
>>   - kcov instrumentation is enabled for arch/powerpc/kernel/dt_cpu_ftrs.c
>> 
>>   - even if it were not, dt_cpu_ftrs_init calls generic dt parsing code
>>     which should definitely have instrumentation enabled.
>> 
>>   - setup_64.c calls dt_cpu_ftrs_init before it sets up a PACA.
>> 
>>   - If we don't set up a PACA, r13 will contain unpredictable data.
>> 
>>   - In a zImage compiled with kcov and KASAN, we see r13 containing a value
>>     that leads to dereferencing invalid memory (something like
>>     912a72603d420015).
>> 
>>   - Weirdly, the same kernel as a vmlinux loaded directly by qemu does not
>>     crash. Investigating with gdb, it seems that in the vmlinux boot case,
>>     r13 is near enough to zero that we just happen to be able to read that
>>     part of memory (we're operating with translation off at this point) and
>>     the current pointer also happens to land in readable memory and
>>     everything just works.
>> 
>>   - PACA setup refers to CPU features - setup_paca() looks at
>>     early_cpu_has_feature(CPU_FTR_HVMODE)
>> 
>> There's no generic kill switch for kcov (as far as I can tell), and we
>> don't want to have to turn off instrumentation in the generic dt parsing
>> code (which lives outside arch/powerpc/) just because we don't have a real
>> PACA or task yet.
>> 
>> So:
>>   - change the test when setting up a PACA to consider the actual value of
>>     the MSR rather than the CPU feature.
>> 
>>   - move the PACA setup to before the cpu feature parsing.
>> 
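
The two changes listed above depend on each other, which a small user-space model may make clearer. This is a sketch under assumptions, not the patch itself: the MSR value is passed in as a parameter because user space cannot read it, `MODEL_FTR_HVMODE`, `parsed_features` and the `_model` functions are hypothetical names, and `MSR_HV` is taken to be bit 60 as in the kernel's `MSR_HV_LG` definition. It shows that a test based on the MSR gives a usable answer before feature parsing has run, while a feature-based test does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MSR_HV_LG 60                    /* assumed to match the kernel's value */
#define MSR_HV    (1ULL << MSR_HV_LG)

#define MODEL_FTR_HVMODE (1ULL << 0)    /* hypothetical feature bit */

/* Stays zero until feature parsing has run. */
static uint64_t parsed_features;

/* Feature-based test: only meaningful after parsing. */
static bool hv_mode_from_features(void)
{
    return (parsed_features & MODEL_FTR_HVMODE) != 0;
}

/* MSR-based test: usable at any point, since it reads hardware state. */
static bool hv_mode_from_msr(uint64_t msr)
{
    return (msr & MSR_HV) != 0;
}

/* Stand-in for device-tree CPU feature parsing. */
static void parse_cpu_features_model(uint64_t msr)
{
    parsed_features = (msr & MSR_HV) ? MODEL_FTR_HVMODE : 0;
}

/* New ordering: decide about the PACA from the MSR, then parse features. */
static bool early_setup_model(uint64_t msr)
{
    bool set_hpaca = hv_mode_from_msr(msr);   /* PACA setup comes first */

    parse_cpu_features_model(msr);            /* feature parsing comes second */

    /* Once parsing is done, both tests should agree. */
    assert(set_hpaca == hv_mode_from_features());
    return set_hpaca;
}
```

With this ordering the PACA decision no longer needs the parsed features, so instrumented parsing code can run with a valid r13.
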
>> Translations get switched on once we leave early_setup, so I think we'd
>> already catch any other cases where the PACA or task aren't set up.
>> 
>> Boot tested on a P9 guest and host.
>> 
>> Fixes: fb0b0a73b223 ("powerpc: Enable kcov")
>> Cc: Andrew Donnellan <a...@linux.ibm.com>
>> Suggested-by: Michael Ellerman <m...@ellerman.id.au>
>> Signed-off-by: Daniel Axtens <d...@axtens.net>
>> 
>> ---
>> 
>> Regarding moving the comment about printk()-safety:
>> I am about 75% sure that the thing that makes printk() safe is the PACA,
>> not the CPU features. That's what commit 24d9649574fb ("[POWERPC] Document
>> when printk is useable") seems to indicate, but as someone wise recently
>> told me, "bootstrapping is hard", so I may be totally wrong.
>> 
>> v3: Update comment, thanks Christophe Leroy.
>>      Remove a comment in dt_cpu_ftrs.c that is no longer accurate - thanks
>>        Andrew. I think we want to retain all the code still, but I'm open to
>>        being told otherwise.
>
> Thanks for doing that.
>
> This patch and the justification don't seem obviously wrong, and the
> patch is snowpatch-clean.
>
> Reviewed-by: Andrew Donnellan <a...@linux.ibm.com>
>
> (Is it worth cc'ing this to stable in case there are other situations we 
> haven't foreseen where we hit the unpredictable r13 data? Few people use 
> kcov...)

I did briefly consider it but didn't believe it reached the stable
criteria:

| It must fix a real bug that bothers people (not a, “This could be a
| problem...” type thing).

On reflection it's a real bug (boot hang), it bothers me, and presumably
also you due to the syzkaller interaction, and I am led to believe we
are both people, so I guess I'll do a v3 with cc: stable. Thanks!

Regards,
Daniel

>
> -- 
> Andrew Donnellan              OzLabs, ADL Canberra
> a...@linux.ibm.com             IBM Australia Limited
