Re: 82fef0ad811f "x86/mm: unencrypted non-blocking DMA allocations use coherent pools" was Re: next-0519 on thinkpad x60: sound related? window manager crash

2020-06-07 Thread Alex Xu (Hello71)
Excerpts from David Rientjes's message of June 7, 2020 8:57 pm:
> Thanks for trying it out, Alex.  Would you mind sending your .config and 
> command line?  I assume either mem_encrypt=on or 
> CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT is enabled.
> 
> Could you also give this a try?
> 
> diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
> --- a/kernel/dma/direct.c
> +++ b/kernel/dma/direct.c
> @@ -99,10 +99,11 @@ static inline bool dma_should_alloc_from_pool(struct device *dev, gfp_t gfp,
>  static inline bool dma_should_free_from_pool(struct device *dev,
>                                               unsigned long attrs)
>  {
> -       if (IS_ENABLED(CONFIG_DMA_COHERENT_POOL))
> +       if (!IS_ENABLED(CONFIG_DMA_COHERENT_POOL))
> +               return false;
> +       if (force_dma_unencrypted(dev))
>                 return true;
> -       if ((attrs & DMA_ATTR_NO_KERNEL_MAPPING) &&
> -           !force_dma_unencrypted(dev))
> +       if (attrs & DMA_ATTR_NO_KERNEL_MAPPING)
>                 return false;
>         if (IS_ENABLED(CONFIG_DMA_DIRECT_REMAP))
>                 return true;
> 

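For anyone following along, here is a rough userspace model of the check as it reads with the patch above applied. This is only a sketch: struct dma_model and its fields are hypothetical stand-ins for the kernel's config options, force_dma_unencrypted(dev) and the attrs flag, and the final "return false" is an assumption, since the quoted hunk ends before the function's last line.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for kernel state; not the kernel's own types. */
struct dma_model {
    bool coherent_pool;     /* models IS_ENABLED(CONFIG_DMA_COHERENT_POOL) */
    bool direct_remap;      /* models IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) */
    bool force_unencrypted; /* models force_dma_unencrypted(dev) */
    bool no_kernel_mapping; /* models attrs & DMA_ATTR_NO_KERNEL_MAPPING */
};

/* Mirrors the order of the checks in the patched hunk. */
static bool should_free_from_pool(const struct dma_model *m)
{
    if (!m->coherent_pool)
        return false;           /* no pool built in, nothing to free from */
    if (m->force_unencrypted)
        return true;            /* unencrypted DMA always came from the pool */
    if (m->no_kernel_mapping)
        return false;
    if (m->direct_remap)
        return true;
    return false;               /* assumed tail; not visible in the hunk */
}
```

In this model, a kernel with the pool built in but with force_dma_unencrypted() false and CONFIG_DMA_DIRECT_REMAP off returns false, so frees would not go through the pool path.
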
This patch doesn't work for me either. It has since occurred to me that 
while I do have CONFIG_AMD_MEM_ENCRYPT=y, I have 
CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT=n, because it was broken with 
amdgpu (unfortunately a downgrade from radeon in this respect). Tried it 
again just now and it looks like it's now able to enable KMS, but all it 
displays is serious-looking errors.

Sorry for not mentioning that earlier. I'll send you my .config and 
command line off-list.

Thanks,
Alex.


Re: 82fef0ad811f "x86/mm: unencrypted non-blocking DMA allocations use coherent pools" was Re: next-0519 on thinkpad x60: sound related? window manager crash

2020-06-07 Thread David Rientjes
On Sun, 7 Jun 2020, Alex Xu (Hello71) wrote:

> > On Sun, 7 Jun 2020, Pavel Machek wrote:
> > 
> >> > I have a similar issue, caused between aaa2faab4ed8 and b170290c2836.
> >> > 
> >> > [   20.263098] BUG: unable to handle page fault for address: 
> >> > b2b582cc2000
> >> > [   20.263104] #PF: supervisor write access in kernel mode
> >> > [   20.263105] #PF: error_code(0x000b) - reserved bit violation
> >> > [   20.263107] PGD 3fd03b067 P4D 3fd03b067 PUD 3fd03c067 PMD 3f8822067 
> >> > PTE 8000273942ab2163
> >> > [   20.263113] Oops: 000b [#1] PREEMPT SMP
> >> > [   20.263117] CPU: 3 PID: 691 Comm: mpv Not tainted 
> >> > 5.7.0-11262-gb170290c2836 #1
> >> > [   20.263119] Hardware name: To Be Filled By O.E.M. To Be Filled By 
> >> > O.E.M./B450 Pro4, BIOS P4.10 03/05/2020
> >> > [   20.263125] RIP: 0010:__memset+0x24/0x30
> >> > [   20.263128] Code: cc cc cc cc cc cc 0f 1f 44 00 00 49 89 f9 48 89 d1 
> >> > 83 e2 07 48 c1 e9 03 40 0f b6 f6 48 b8 01 01 01 01 01 01 01 01 48 0f af 
> >> > c6  48 ab 89 d1 f3 aa 4c 89 c8 c3 90 49 89 f9 40 88 f0 48 89 d1 f3
> >> > [   20.263131] RSP: 0018:b2b583d07e10 EFLAGS: 00010216
> >> > [   20.263133] RAX:  RBX: 8b8000102c00 RCX: 
> >> > 4000
> >> > [   20.263134] RDX:  RSI:  RDI: 
> >> > b2b582cc2000
> >> > [   20.263136] RBP: 8b8000101000 R08:  R09: 
> >> > b2b582cc2000
> >> > [   20.263137] R10: 5356 R11: 8b8000102c18 R12: 
> >> > 
> >> > [   20.263139] R13:  R14: 8b8039944200 R15: 
> >> > 9794daa0
> >> > [   20.263141] FS:  7f41aa4b4200() GS:8b803ecc() 
> >> > knlGS:
> >> > [   20.263143] CS:  0010 DS:  ES:  CR0: 80050033
> >> > [   20.263144] CR2: b2b582cc2000 CR3: 0003b6731000 CR4: 
> >> > 003406e0
> >> > [   20.263146] Call Trace:
> >> > [   20.263151]  ? snd_pcm_hw_params+0x3f3/0x47a
> >> > [   20.263154]  ? snd_pcm_common_ioctl+0xf2/0xf73
> >> > [   20.263158]  ? snd_pcm_ioctl+0x1e/0x29
> >> > [   20.263161]  ? ksys_ioctl+0x77/0x91
> >> > [   20.263163]  ? __x64_sys_ioctl+0x11/0x14
> >> > [   20.263166]  ? do_syscall_64+0x3d/0xf5
> >> > [   20.263170]  ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
> >> > [   20.263173] Modules linked in: uvcvideo videobuf2_vmalloc 
> >> > videobuf2_memops videobuf2_v4l2 videodev snd_usb_audio videobuf2_common 
> >> > snd_hwdep snd_usbmidi_lib input_leds snd_rawmidi led_class
> >> > [   20.263182] CR2: b2b582cc2000
> >> > [   20.263184] ---[ end trace c6b47a774b91f0a0 ]---
> >> > [   20.263187] RIP: 0010:__memset+0x24/0x30
> >> > [   20.263190] Code: cc cc cc cc cc cc 0f 1f 44 00 00 49 89 f9 48 89 d1 
> >> > 83 e2 07 48 c1 e9 03 40 0f b6 f6 48 b8 01 01 01 01 01 01 01 01 48 0f af 
> >> > c6  48 ab 89 d1 f3 aa 4c 89 c8 c3 90 49 89 f9 40 88 f0 48 89 d1 f3
> >> > [   20.263192] RSP: 0018:b2b583d07e10 EFLAGS: 00010216
> >> > [   20.263193] RAX:  RBX: 8b8000102c00 RCX: 
> >> > 4000
> >> > [   20.263195] RDX:  RSI:  RDI: 
> >> > b2b582cc2000
> >> > [   20.263196] RBP: 8b8000101000 R08:  R09: 
> >> > b2b582cc2000
> >> > [   20.263197] R10: 5356 R11: 8b8000102c18 R12: 
> >> > 
> >> > [   20.263199] R13:  R14: 8b8039944200 R15: 
> >> > 9794daa0
> >> > [   20.263201] FS:  7f41aa4b4200() GS:8b803ecc() 
> >> > knlGS:
> >> > [   20.263202] CS:  0010 DS:  ES:  CR0: 80050033
> >> > [   20.263204] CR2: b2b582cc2000 CR3: 0003b6731000 CR4: 
> >> > 003406e0
> >> > 
> >> > I bisected this to 82fef0ad811f "x86/mm: unencrypted non-blocking DMA 
> >> > allocations use coherent pools". Reverting 1ee18de92927 resolves the 
> >> > issue.
> >> > 
> >> > Looks like Thinkpad X60 doesn't have VT-d, but could still be DMA 
> >> > related.
> >> 
> >> Note that newer -next releases seem to behave okay for me. The commit
> >> pointed out by bisection is really simple:
> >> 
> >> AFAIK you could verify it is responsible by turning off
> >> CONFIG_AMD_MEM_ENCRYPT on latest kernel...
> >> 
> >> Best regards,
> >>Pavel
> >> 
> >> index 1d6104ea8af0..2bf819d3 100644
> >> --- a/arch/x86/Kconfig
> >> +++ b/arch/x86/Kconfig
> >> @@ -1520,6 +1520,7 @@ config X86_CPA_STATISTICS
> >>  config AMD_MEM_ENCRYPT
> >>         bool "AMD Secure Memory Encryption (SME) support"
> >>         depends on X86_64 && CPU_SUP_AMD
> >> +       select DMA_COHERENT_POOL
> >>         select DYNAMIC_PHYSICAL_MASK
> >>         select ARCH_USE_MEMREMAP_PROT
> >>         select ARCH_HAS_FORCE_DMA_UNENCRYPTED
> > 
> > Thanks for the report!
> > 
> > Besides CONFIG_AMD_MEM_ENCRYPT, do you have CONFIG_DMA_DIRECT_REMAP 
> > enabled?  If so, it may be caused by the virtual address passed to the 
> >

Re: 82fef0ad811f "x86/mm: unencrypted non-blocking DMA allocations use coherent pools" was Re: next-0519 on thinkpad x60: sound related? window manager crash

2020-06-07 Thread Alex Xu (Hello71)
Excerpts from David Rientjes's message of June 7, 2020 3:41 pm:
> On Sun, 7 Jun 2020, Pavel Machek wrote:
> 
>> > I have a similar issue, caused between aaa2faab4ed8 and b170290c2836.
>> > 
>> > [   20.263098] BUG: unable to handle page fault for address: 
>> > b2b582cc2000
>> > [   20.263104] #PF: supervisor write access in kernel mode
>> > [   20.263105] #PF: error_code(0x000b) - reserved bit violation
>> > [   20.263107] PGD 3fd03b067 P4D 3fd03b067 PUD 3fd03c067 PMD 3f8822067 PTE 
>> > 8000273942ab2163
>> > [   20.263113] Oops: 000b [#1] PREEMPT SMP
>> > [   20.263117] CPU: 3 PID: 691 Comm: mpv Not tainted 
>> > 5.7.0-11262-gb170290c2836 #1
>> > [   20.263119] Hardware name: To Be Filled By O.E.M. To Be Filled By 
>> > O.E.M./B450 Pro4, BIOS P4.10 03/05/2020
>> > [   20.263125] RIP: 0010:__memset+0x24/0x30
>> > [   20.263128] Code: cc cc cc cc cc cc 0f 1f 44 00 00 49 89 f9 48 89 d1 83 
>> > e2 07 48 c1 e9 03 40 0f b6 f6 48 b8 01 01 01 01 01 01 01 01 48 0f af c6 
>> >  48 ab 89 d1 f3 aa 4c 89 c8 c3 90 49 89 f9 40 88 f0 48 89 d1 f3
>> > [   20.263131] RSP: 0018:b2b583d07e10 EFLAGS: 00010216
>> > [   20.263133] RAX:  RBX: 8b8000102c00 RCX: 
>> > 4000
>> > [   20.263134] RDX:  RSI:  RDI: 
>> > b2b582cc2000
>> > [   20.263136] RBP: 8b8000101000 R08:  R09: 
>> > b2b582cc2000
>> > [   20.263137] R10: 5356 R11: 8b8000102c18 R12: 
>> > 
>> > [   20.263139] R13:  R14: 8b8039944200 R15: 
>> > 9794daa0
>> > [   20.263141] FS:  7f41aa4b4200() GS:8b803ecc() 
>> > knlGS:
>> > [   20.263143] CS:  0010 DS:  ES:  CR0: 80050033
>> > [   20.263144] CR2: b2b582cc2000 CR3: 0003b6731000 CR4: 
>> > 003406e0
>> > [   20.263146] Call Trace:
>> > [   20.263151]  ? snd_pcm_hw_params+0x3f3/0x47a
>> > [   20.263154]  ? snd_pcm_common_ioctl+0xf2/0xf73
>> > [   20.263158]  ? snd_pcm_ioctl+0x1e/0x29
>> > [   20.263161]  ? ksys_ioctl+0x77/0x91
>> > [   20.263163]  ? __x64_sys_ioctl+0x11/0x14
>> > [   20.263166]  ? do_syscall_64+0x3d/0xf5
>> > [   20.263170]  ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
>> > [   20.263173] Modules linked in: uvcvideo videobuf2_vmalloc 
>> > videobuf2_memops videobuf2_v4l2 videodev snd_usb_audio videobuf2_common 
>> > snd_hwdep snd_usbmidi_lib input_leds snd_rawmidi led_class
>> > [   20.263182] CR2: b2b582cc2000
>> > [   20.263184] ---[ end trace c6b47a774b91f0a0 ]---
>> > [   20.263187] RIP: 0010:__memset+0x24/0x30
>> > [   20.263190] Code: cc cc cc cc cc cc 0f 1f 44 00 00 49 89 f9 48 89 d1 83 
>> > e2 07 48 c1 e9 03 40 0f b6 f6 48 b8 01 01 01 01 01 01 01 01 48 0f af c6 
>> >  48 ab 89 d1 f3 aa 4c 89 c8 c3 90 49 89 f9 40 88 f0 48 89 d1 f3
>> > [   20.263192] RSP: 0018:b2b583d07e10 EFLAGS: 00010216
>> > [   20.263193] RAX:  RBX: 8b8000102c00 RCX: 
>> > 4000
>> > [   20.263195] RDX:  RSI:  RDI: 
>> > b2b582cc2000
>> > [   20.263196] RBP: 8b8000101000 R08:  R09: 
>> > b2b582cc2000
>> > [   20.263197] R10: 5356 R11: 8b8000102c18 R12: 
>> > 
>> > [   20.263199] R13:  R14: 8b8039944200 R15: 
>> > 9794daa0
>> > [   20.263201] FS:  7f41aa4b4200() GS:8b803ecc() 
>> > knlGS:
>> > [   20.263202] CS:  0010 DS:  ES:  CR0: 80050033
>> > [   20.263204] CR2: b2b582cc2000 CR3: 0003b6731000 CR4: 
>> > 003406e0
>> > 
>> > I bisected this to 82fef0ad811f "x86/mm: unencrypted non-blocking DMA 
>> > allocations use coherent pools". Reverting 1ee18de92927 resolves the 
>> > issue.
>> > 
>> > Looks like Thinkpad X60 doesn't have VT-d, but could still be DMA 
>> > related.
>> 
>> Note that newer -next releases seem to behave okay for me. The commit
>> pointed out by bisection is really simple:
>> 
>> AFAIK you could verify it is responsible by turning off
>> CONFIG_AMD_MEM_ENCRYPT on latest kernel...
>> 
>> Best regards,
>>  Pavel
>> 
>> index 1d6104ea8af0..2bf819d3 100644
>> --- a/arch/x86/Kconfig
>> +++ b/arch/x86/Kconfig
>> @@ -1520,6 +1520,7 @@ config X86_CPA_STATISTICS
>>  config AMD_MEM_ENCRYPT
>>         bool "AMD Secure Memory Encryption (SME) support"
>>         depends on X86_64 && CPU_SUP_AMD
>> +       select DMA_COHERENT_POOL
>>         select DYNAMIC_PHYSICAL_MASK
>>         select ARCH_USE_MEMREMAP_PROT
>>         select ARCH_HAS_FORCE_DMA_UNENCRYPTED
> 
> Thanks for the report!
> 
> Besides CONFIG_AMD_MEM_ENCRYPT, do you have CONFIG_DMA_DIRECT_REMAP 
> enabled?  If so, it may be caused by the virtual address passed to the 
> set_memory_{decrypted,encrypted}() functions.
> 
> And I assume you are enabling SME by using mem_encrypt=on on the kernel 
> command line or CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT 

Re: 82fef0ad811f "x86/mm: unencrypted non-blocking DMA allocations use coherent pools" was Re: next-0519 on thinkpad x60: sound related? window manager crash

2020-06-07 Thread David Rientjes
On Sun, 7 Jun 2020, Pavel Machek wrote:

> > I have a similar issue, caused between aaa2faab4ed8 and b170290c2836.
> > 
> > [   20.263098] BUG: unable to handle page fault for address: 
> > b2b582cc2000
> > [   20.263104] #PF: supervisor write access in kernel mode
> > [   20.263105] #PF: error_code(0x000b) - reserved bit violation
> > [   20.263107] PGD 3fd03b067 P4D 3fd03b067 PUD 3fd03c067 PMD 3f8822067 PTE 
> > 8000273942ab2163
> > [   20.263113] Oops: 000b [#1] PREEMPT SMP
> > [   20.263117] CPU: 3 PID: 691 Comm: mpv Not tainted 
> > 5.7.0-11262-gb170290c2836 #1
> > [   20.263119] Hardware name: To Be Filled By O.E.M. To Be Filled By 
> > O.E.M./B450 Pro4, BIOS P4.10 03/05/2020
> > [   20.263125] RIP: 0010:__memset+0x24/0x30
> > [   20.263128] Code: cc cc cc cc cc cc 0f 1f 44 00 00 49 89 f9 48 89 d1 83 
> > e2 07 48 c1 e9 03 40 0f b6 f6 48 b8 01 01 01 01 01 01 01 01 48 0f af c6 
> >  48 ab 89 d1 f3 aa 4c 89 c8 c3 90 49 89 f9 40 88 f0 48 89 d1 f3
> > [   20.263131] RSP: 0018:b2b583d07e10 EFLAGS: 00010216
> > [   20.263133] RAX:  RBX: 8b8000102c00 RCX: 
> > 4000
> > [   20.263134] RDX:  RSI:  RDI: 
> > b2b582cc2000
> > [   20.263136] RBP: 8b8000101000 R08:  R09: 
> > b2b582cc2000
> > [   20.263137] R10: 5356 R11: 8b8000102c18 R12: 
> > 
> > [   20.263139] R13:  R14: 8b8039944200 R15: 
> > 9794daa0
> > [   20.263141] FS:  7f41aa4b4200() GS:8b803ecc() 
> > knlGS:
> > [   20.263143] CS:  0010 DS:  ES:  CR0: 80050033
> > [   20.263144] CR2: b2b582cc2000 CR3: 0003b6731000 CR4: 
> > 003406e0
> > [   20.263146] Call Trace:
> > [   20.263151]  ? snd_pcm_hw_params+0x3f3/0x47a
> > [   20.263154]  ? snd_pcm_common_ioctl+0xf2/0xf73
> > [   20.263158]  ? snd_pcm_ioctl+0x1e/0x29
> > [   20.263161]  ? ksys_ioctl+0x77/0x91
> > [   20.263163]  ? __x64_sys_ioctl+0x11/0x14
> > [   20.263166]  ? do_syscall_64+0x3d/0xf5
> > [   20.263170]  ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > [   20.263173] Modules linked in: uvcvideo videobuf2_vmalloc 
> > videobuf2_memops videobuf2_v4l2 videodev snd_usb_audio videobuf2_common 
> > snd_hwdep snd_usbmidi_lib input_leds snd_rawmidi led_class
> > [   20.263182] CR2: b2b582cc2000
> > [   20.263184] ---[ end trace c6b47a774b91f0a0 ]---
> > [   20.263187] RIP: 0010:__memset+0x24/0x30
> > [   20.263190] Code: cc cc cc cc cc cc 0f 1f 44 00 00 49 89 f9 48 89 d1 83 
> > e2 07 48 c1 e9 03 40 0f b6 f6 48 b8 01 01 01 01 01 01 01 01 48 0f af c6 
> >  48 ab 89 d1 f3 aa 4c 89 c8 c3 90 49 89 f9 40 88 f0 48 89 d1 f3
> > [   20.263192] RSP: 0018:b2b583d07e10 EFLAGS: 00010216
> > [   20.263193] RAX:  RBX: 8b8000102c00 RCX: 
> > 4000
> > [   20.263195] RDX:  RSI:  RDI: 
> > b2b582cc2000
> > [   20.263196] RBP: 8b8000101000 R08:  R09: 
> > b2b582cc2000
> > [   20.263197] R10: 5356 R11: 8b8000102c18 R12: 
> > 
> > [   20.263199] R13:  R14: 8b8039944200 R15: 
> > 9794daa0
> > [   20.263201] FS:  7f41aa4b4200() GS:8b803ecc() 
> > knlGS:
> > [   20.263202] CS:  0010 DS:  ES:  CR0: 80050033
> > [   20.263204] CR2: b2b582cc2000 CR3: 0003b6731000 CR4: 
> > 003406e0
> > 
> > I bisected this to 82fef0ad811f "x86/mm: unencrypted non-blocking DMA 
> > allocations use coherent pools". Reverting 1ee18de92927 resolves the 
> > issue.
> > 
> > Looks like Thinkpad X60 doesn't have VT-d, but could still be DMA 
> > related.
> 
> Note that newer -next releases seem to behave okay for me. The commit
> pointed out by bisection is really simple:
> 
> AFAIK you could verify it is responsible by turning off
> CONFIG_AMD_MEM_ENCRYPT on latest kernel...
> 
> Best regards,
>   Pavel
> 
> index 1d6104ea8af0..2bf819d3 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -1520,6 +1520,7 @@ config X86_CPA_STATISTICS
>  config AMD_MEM_ENCRYPT
>         bool "AMD Secure Memory Encryption (SME) support"
>         depends on X86_64 && CPU_SUP_AMD
> +       select DMA_COHERENT_POOL
>         select DYNAMIC_PHYSICAL_MASK
>         select ARCH_USE_MEMREMAP_PROT
>         select ARCH_HAS_FORCE_DMA_UNENCRYPTED

Thanks for the report!

Besides CONFIG_AMD_MEM_ENCRYPT, do you have CONFIG_DMA_DIRECT_REMAP 
enabled?  If so, it may be caused by the virtual address passed to the 
set_memory_{decrypted,encrypted}() functions.

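As a side note on reading the oops quoted above: it reports error_code(0x000b), and the bits can be decoded by hand. The sketch below is a small userspace helper, not kernel code. The PF_* names and pf_describe() are made up for illustration; only the bit positions come from the x86 page-fault error code layout.

```c
#include <stdio.h>

/* x86 page-fault error code bits; names here are local to this sketch. */
enum {
    PF_PROT  = 1u << 0, /* set: page was present; clear: not present */
    PF_WRITE = 1u << 1, /* set: write access */
    PF_USER  = 1u << 2, /* set: fault taken in user mode */
    PF_RSVD  = 1u << 3, /* set: reserved bit found in a paging entry */
    PF_INSTR = 1u << 4, /* set: instruction fetch */
};

/* Format a one-line description of the error code into buf and return it. */
static const char *pf_describe(unsigned int code, char *buf, size_t len)
{
    const char *mode = (code & PF_USER) ? "user" : "supervisor";
    const char *kind = (code & PF_INSTR) ? "fetch"
                     : (code & PF_WRITE) ? "write" : "read";
    const char *pres = (code & PF_PROT) ? "page present" : "page not present";
    const char *rsvd = (code & PF_RSVD) ? ", reserved bit set" : "";

    snprintf(buf, len, "%s %s, %s%s", mode, kind, pres, rsvd);
    return buf;
}
```

Passing 0x000b gives "supervisor write, page present, reserved bit set", which matches the two #PF lines in the report: the faulting write hit a mapping that exists but whose page-table entry has a bit set that the CPU treats as reserved.
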
And I assume you are enabling SME by using mem_encrypt=on on the kernel 
command line or CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT is enabled.

We likely need an atomic pool for devices that support DMA to addresses in 
sme_me_mask as well.  I can test this tomorrow, but wanted to get it out 
early to see if