On Fri, Feb 15, 2019 at 11:47:35PM -0500, Qian Cai wrote:
> Page table walkers trigger soft lockups below with KASAN_SW_TAGS outline
> mode on a large ThunderX2 system, because there is too much overhead to
> call check_memory_region() for every memory access where it needs to
> dereference every byte of the corresponding KASAN shadow address for the
> correct tag.
> 
> [   76.531328] watchdog: BUG: soft lockup - CPU#65 stuck for 23s! [swapper/0:1]
> [   76.538372] Modules linked in:
> [   76.541433] CPU: 65 PID: 1 Comm: swapper/0 Not tainted 5.0.0-rc6+ #62
> [   76.557697] pstate: 60400009 (nZCv daif +PAN -UAO)
> [   76.562491] pc : check_memory_region+0x64/0x94
> [   76.566934] lr : __hwasan_load8_noabort+0x20/0x2c
> [   76.571633] sp : 7eff808ba0247ca0
> [   76.574943] x29: 7eff808ba0247cc0 x28: ffff068cef720000
> [   76.580256] x27: ffff080000000000 x26: 0060000000000793
> [   76.585568] x25: ffff068d00000000 x24: ffff800003537b98
> [   76.590880] x23: 7eff808ba0247e08 x22: 0000000000000000
> [   76.596192] x21: 7eff808ba0247e08 x20: 0000000000000008
> [   76.601503] x19: ffff1000100a8d64 x18: 0000000000000000
> [   76.606814] x17: 0000000001000100 x16: 0000000000000000
> [   76.612125] x15: ffff100013805578 x14: ffff100014085000
> [   76.617437] x13: 0000000030373a2e x12: 00f0000000000793
> [   76.622749] x11: ffff808ba0247e0f x10: ffff0808ba0247e0
> [   76.628060] x9 : ffff0808ba0247e0 x8 : 000000000000007e
> [   76.633371] x7 : 0000000000000000 x6 : 0000000000000002
> [   76.638682] x5 : 0000000000000000 x4 : 00e0000000000793
> [   76.643994] x3 : ffff1000100a8d64 x2 : 0000000000000000
> [   76.649305] x1 : 0000000000000008 x0 : 7eff808ba0247e08
> [   76.654617] Call trace:
> [   76.657066]  check_memory_region+0x64/0x94
> [   76.661162]  __hwasan_load8_noabort+0x20/0x2c
> [   76.665519]  note_page+0x84/0x708
> [   76.668833]  walk_pgd+0x174/0x258
> [   76.672147]  ptdump_check_wx+0x90/0xfc
> [   76.675894]  mark_rodata_ro+0x38/0x44
> [   76.679557]  kernel_init+0x48/0x180
> [   76.683045]  ret_from_fork+0x10/0x18
> 
> Signed-off-by: Qian Cai <c...@lca.pw>
> ---
>  arch/arm64/mm/Makefile | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/arch/arm64/mm/Makefile b/arch/arm64/mm/Makefile
> index 849c1df3d214..4b9a7a50faaf 100644
> --- a/arch/arm64/mm/Makefile
> +++ b/arch/arm64/mm/Makefile
> @@ -12,3 +12,9 @@ KASAN_SANITIZE_physaddr.o   += n
>  
>  obj-$(CONFIG_KASAN)          += kasan_init.o
>  KASAN_SANITIZE_kasan_init.o  := n
> +
> +ifdef CONFIG_KASAN_SW_TAGS
> +ifdef CONFIG_KASAN_OUTLINE
> +KASAN_SANITIZE_dump.o                := n
> +endif
> +endif

I really don't think this is the right way to go about this. Either the
machine eventually makes progress, in which case perhaps the default soft
watchdog timeout should increase when KASAN is enabled, or the machine locks
up, which is a bug.

With your proposal, it will be very difficult to justify ever re-enabling
KASAN on this file and therefore it just chips away at the code coverage
because of an issue that doesn't appear to be well understood.

So please consider this a "NAK" from me based on the current information
that we have.

Will
