Re: next-0519 on thinkpad x60: sound related? window manager crash
On Sun, 14 Jun 2020 14:07:48 +0200, Alex Xu (Hello71) wrote: > > Excerpts from Takashi Iwai's message of June 14, 2020 5:54 am: > > On Sat, 13 Jun 2020 18:25:22 +0200, > > Alex Xu (Hello71) wrote: > >> > >> Excerpts from Takashi Iwai's message of June 11, 2020 1:11 pm: > >> > Thanks, so something still missing in the mmap handling, I guess. > >> > > >> > I've worked on two different branches for potential fixes of your > >> > problems. Could you test topic/dma-fix and topic/dma-fix2 branches? > >> > git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound.git > >> > Just pull one of them onto Linus' git HEAD. > >> > > >> > I guess we'll go with David's new patch, but still it's interesting > >> > whether my changes do anything good actually. > >> > > >> > > >> > Takashi > >> > > >> > >> On torvalds 623f6dc593, topic/dma-fix causes sound to be output as > >> alternating half-second bursts of noise and a few seconds of silence. > >> topic/dma-fix2 appears to work properly. > > > > OK, thanks for the feedback! Just to make sure, you're using > > PulseAudio, right? > > If so, it was still something wrong about mmap, and the secondary > > method (the fallback to the continuous page) looks like a safer > > approach in the end. > > > > I suppose that David's fix will be merged sooner or later. Meanwhile > > I'll work on the change in the sound driver side to make things a bit > > more robust. They don't conflict and both good applicable. > > > > > > thanks, > > > > Takashi > > > > Ah, no, I think that wasn't clear. I use ALSA directly with mostly > default configuration, except an asym sets separate default playback and > record devices. 
> > asound.conf: > > defaults.pcm.card 1 > defaults.ctl.card 1 > > pcm.!default { > type asym > playback.pcm > { > type plug > slave.pcm "dmix" > } > capture.pcm > { > type plug > slave.pcm { > type dsnoop > ipc_key 6793 > slave.pcm "hw:U0x46d0x81d" > } > } > } > > I think I wasn't able to set defaults.pcm.dmix.card and > defaults.pcm.dsnoop.card for some reason, not sure why. I can try that, > but I don't think it will affect this mmap issue. The dmix is an implementation exclusively with mmap, so yes, it's still about the mmap. This also shows the same problem. thanks, Takashi
Re: next-0519 on thinkpad x60: sound related? window manager crash
Excerpts from Takashi Iwai's message of June 14, 2020 5:54 am:
> On Sat, 13 Jun 2020 18:25:22 +0200,
> Alex Xu (Hello71) wrote:
>>
>> Excerpts from Takashi Iwai's message of June 11, 2020 1:11 pm:
>> > Thanks, so something still missing in the mmap handling, I guess.
>> >
>> > I've worked on two different branches for potential fixes of your
>> > problems. Could you test topic/dma-fix and topic/dma-fix2 branches?
>> > git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound.git
>> > Just pull one of them onto Linus' git HEAD.
>> >
>> > I guess we'll go with David's new patch, but still it's interesting
>> > whether my changes do anything good actually.
>> >
>> >
>> > Takashi
>> >
>>
>> On torvalds 623f6dc593, topic/dma-fix causes sound to be output as
>> alternating half-second bursts of noise and a few seconds of silence.
>> topic/dma-fix2 appears to work properly.
>
> OK, thanks for the feedback! Just to make sure, you're using
> PulseAudio, right?
> If so, it was still something wrong about mmap, and the secondary
> method (the fallback to the continuous page) looks like a safer
> approach in the end.
>
> I suppose that David's fix will be merged sooner or later. Meanwhile
> I'll work on the change in the sound driver side to make things a bit
> more robust. They don't conflict and both good applicable.
>
>
> thanks,
>
> Takashi
>

Ah, no, I think that wasn't clear. I use ALSA directly with a mostly
default configuration, except that an asym sets separate default
playback and record devices.

asound.conf:

defaults.pcm.card 1
defaults.ctl.card 1

pcm.!default {
    type asym
    playback.pcm
    {
        type plug
        slave.pcm "dmix"
    }
    capture.pcm
    {
        type plug
        slave.pcm {
            type dsnoop
            ipc_key 6793
            slave.pcm "hw:U0x46d0x81d"
        }
    }
}

I think I wasn't able to set defaults.pcm.dmix.card and
defaults.pcm.dsnoop.card for some reason, not sure why. I can try that,
but I don't think it will affect this mmap issue.

Thanks,
Alex.
Re: next-0519 on thinkpad x60: sound related? window manager crash
On Sat, 13 Jun 2020 18:25:22 +0200,
Alex Xu (Hello71) wrote:
>
> Excerpts from Takashi Iwai's message of June 11, 2020 1:11 pm:
> > Thanks, so something still missing in the mmap handling, I guess.
> >
> > I've worked on two different branches for potential fixes of your
> > problems. Could you test topic/dma-fix and topic/dma-fix2 branches?
> > git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound.git
> > Just pull one of them onto Linus' git HEAD.
> >
> > I guess we'll go with David's new patch, but still it's interesting
> > whether my changes do anything good actually.
> >
> >
> > Takashi
> >
>
> On torvalds 623f6dc593, topic/dma-fix causes sound to be output as
> alternating half-second bursts of noise and a few seconds of silence.
> topic/dma-fix2 appears to work properly.

OK, thanks for the feedback! Just to make sure, you're using
PulseAudio, right?
If so, something was still wrong with the mmap handling, and the
secondary method (the fallback to the continuous page) looks like the
safer approach in the end.

I suppose that David's fix will be merged sooner or later. Meanwhile
I'll work on the change on the sound driver side to make things a bit
more robust. The two don't conflict, and both can be applied.


thanks,

Takashi
Re: next-0519 on thinkpad x60: sound related? window manager crash
Excerpts from Takashi Iwai's message of June 11, 2020 1:11 pm: > Thanks, so something still missing in the mmap handling, I guess. > > I've worked on two different branches for potential fixes of your > problems. Could you test topic/dma-fix and topic/dma-fix2 branches? > git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound.git > Just pull one of them onto Linus' git HEAD. > > I guess we'll go with David's new patch, but still it's interesting > whether my changes do anything good actually. > > > Takashi > On torvalds 623f6dc593, topic/dma-fix causes sound to be output as alternating half-second bursts of noise and a few seconds of silence. topic/dma-fix2 appears to work properly. Thanks, Alex.
Re: next-0519 on thinkpad x60: sound related? window manager crash
On Thu, 11 Jun 2020 16:51:55 +0200, Alex Xu (Hello71) wrote: > > Excerpts from Takashi Iwai's message of June 9, 2020 11:12 am: > > On Tue, 09 Jun 2020 13:47:33 +0200, > > Christoph Hellwig wrote: > >> > >> Alex, can you try this patch? > > > > Also could you check whether just papering over the memset() call > > alone avoids the crash like below? For PulseAudio and dmix/dsnoop, > > it's the only code path that accesses the vmapped buffer, I believe. > > > > If this works more or less, I'll cook a more comprehensive fix. > > > > > > thanks, > > > > Takashi > > > > --- a/sound/core/pcm_native.c > > +++ b/sound/core/pcm_native.c > > @@ -754,9 +754,11 @@ static int snd_pcm_hw_params(struct snd_pcm_substream > > *substream, > > while (runtime->boundary * 2 <= LONG_MAX - runtime->buffer_size) > > runtime->boundary *= 2; > > > > +#if 0 > > /* clear the buffer for avoiding possible kernel info leaks */ > > if (runtime->dma_area && !substream->ops->copy_user) > > memset(runtime->dma_area, 0, runtime->dma_bytes); > > +#endif > > > > snd_pcm_timer_resolution_change(substream); > > snd_pcm_set_state(substream, SNDRV_PCM_STATE_SETUP); > > > > Sorry, this patch doesn't work for me with SME off using abfbb29297c2. > David's newest submitted patch works for me, which I already replied to > separately. Thanks, so something still missing in the mmap handling, I guess. I've worked on two different branches for potential fixes of your problems. Could you test topic/dma-fix and topic/dma-fix2 branches? git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound.git Just pull one of them onto Linus' git HEAD. I guess we'll go with David's new patch, but still it's interesting whether my changes do anything good actually. Takashi
Re: next-0519 on thinkpad x60: sound related? window manager crash
Excerpts from Takashi Iwai's message of June 9, 2020 11:12 am: > On Tue, 09 Jun 2020 13:47:33 +0200, > Christoph Hellwig wrote: >> >> Alex, can you try this patch? > > Also could you check whether just papering over the memset() call > alone avoids the crash like below? For PulseAudio and dmix/dsnoop, > it's the only code path that accesses the vmapped buffer, I believe. > > If this works more or less, I'll cook a more comprehensive fix. > > > thanks, > > Takashi > > --- a/sound/core/pcm_native.c > +++ b/sound/core/pcm_native.c > @@ -754,9 +754,11 @@ static int snd_pcm_hw_params(struct snd_pcm_substream > *substream, > while (runtime->boundary * 2 <= LONG_MAX - runtime->buffer_size) > runtime->boundary *= 2; > > +#if 0 > /* clear the buffer for avoiding possible kernel info leaks */ > if (runtime->dma_area && !substream->ops->copy_user) > memset(runtime->dma_area, 0, runtime->dma_bytes); > +#endif > > snd_pcm_timer_resolution_change(substream); > snd_pcm_set_state(substream, SNDRV_PCM_STATE_SETUP); > Sorry, this patch doesn't work for me with SME off using abfbb29297c2. David's newest submitted patch works for me, which I already replied to separately. Thanks, Alex.
Re: next-0519 on thinkpad x60: sound related? window manager crash
Excerpts from Christoph Hellwig's message of June 9, 2020 7:47 am: > Alex, can you try this patch? > > diff --git a/sound/core/Kconfig b/sound/core/Kconfig > index d4554f376160a9..10b06e575a7fc5 100644 > --- a/sound/core/Kconfig > +++ b/sound/core/Kconfig > @@ -192,6 +192,6 @@ config SND_VMASTER > > config SND_DMA_SGBUF > def_bool y > - depends on X86 > + depends on BROKEN > > source "sound/core/seq/Kconfig" > Sorry, this patch doesn't work for me with SME off using abfbb29297c2. David's newest submitted patch works for me, which I already replied to separately. Thanks, Alex.
Re: next-0519 on thinkpad x60: sound related? window manager crash
On Tue, Jun 09, 2020 at 10:26:45PM -0700, David Rientjes wrote:
> If this option should not implicitly be set for DMA_COHERENT_POOL, then I
> assume we need yet another Kconfig option since DMA_REMAP selected it
> before and DMA_COHERENT_POOL selects DMA_REMAP :)

Yes, but what do we actually need DMA_REMAP for, just for the coherent
pool? We shouldn't really remap anything for AMD-SEV. Sorry for not
noticing this earlier.
Re: next-0519 on thinkpad x60: sound related? window manager crash
On Tue, 9 Jun 2020, Christoph Hellwig wrote:

> > Working theory is that CONFIG_DMA_NONCOHERENT_MMAP getting set is causing
> > the error_code in the page fault path. Debugging with Alex off-thread we
> > found that dma_{alloc,free}_from_pool() are not getting called from the
> > new code in dma_direct_{alloc,free}_pages() and he has not enabled
> > mem_encrypt.
>
> While DMA_COHERENT_POOL absolutely should not select DMA_NONCOHERENT_MMAP
> (and you should send your patch either way), I don't think it is going
> to make a difference here, as DMA_NONCOHERENT_MMAP just means we
> allows mmaps even for non-coherent devices, and we do not support
> non-coherent devices on x86.
>

We haven't heard yet whether the disabling of DMA_NONCOHERENT_MMAP fixes
Aaron's BUG(), and the patch included some other debugging hints that will
be printed out in case it didn't, but I'll share what we figured out:

In 5.7, his config didn't have DMA_DIRECT_REMAP or DMA_REMAP (it did have
GENERIC_ALLOCATOR already). AMD_MEM_ENCRYPT is set.

In Linus HEAD, AMD_MEM_ENCRYPT now selects DMA_COHERENT_POOL so it sets
the two aforementioned options.

We also figured out that dma_should_alloc_from_pool() is always false up
until the BUG(). So what else changed? Only the selection of DMA_REMAP
and DMA_NONCOHERENT_MMAP.

The comment in the Kconfig about setting "an uncached bit in the
pagetables" led me to believe it may be related to the splat he's seeing
(reserved bit violation). So I suggested dropping DMA_NONCOHERENT_MMAP
from his Kconfig for testing purposes.

If this option should not implicitly be set for DMA_COHERENT_POOL, then I
assume we need yet another Kconfig option since DMA_REMAP selected it
before and DMA_COHERENT_POOL selects DMA_REMAP :)

So do we want a DMA_REMAP_BUT_NO_DMA_NONCOHERENT_MMAP? Decouple DMA_REMAP
from DMA_NONCOHERENT_MMAP and select the latter wherever the former was
set (but not DMA_COHERENT_POOL)? Something else?
Re: next-0519 on thinkpad x60: sound related? window manager crash
On Tue, 09 Jun 2020 13:47:33 +0200,
Christoph Hellwig wrote:
>
> Alex, can you try this patch?

Also, could you check whether just papering over the memset() call, as
below, avoids the crash? For PulseAudio and dmix/dsnoop, it's the only
code path that accesses the vmapped buffer, I believe.

If this works more or less, I'll cook a more comprehensive fix.


thanks,

Takashi

--- a/sound/core/pcm_native.c
+++ b/sound/core/pcm_native.c
@@ -754,9 +754,11 @@ static int snd_pcm_hw_params(struct snd_pcm_substream *substream,
 	while (runtime->boundary * 2 <= LONG_MAX - runtime->buffer_size)
 		runtime->boundary *= 2;
 
+#if 0
 	/* clear the buffer for avoiding possible kernel info leaks */
 	if (runtime->dma_area && !substream->ops->copy_user)
 		memset(runtime->dma_area, 0, runtime->dma_bytes);
+#endif
 
 	snd_pcm_timer_resolution_change(substream);
 	snd_pcm_set_state(substream, SNDRV_PCM_STATE_SETUP);
Re: next-0519 on thinkpad x60: sound related? window manager crash
On 09. 06. 20 at 13:49, Christoph Hellwig wrote:
> On Tue, Jun 09, 2020 at 01:45:34PM +0200, Takashi Iwai wrote:
>> Yes, for the sound stuff, something below should make things working.
>> But it means that we'll lose the SG-buffer allocation and the
>> allocation of large buffers might fail on some machines.
>
> We crossed lines there. In general due to better memory compaction and
> CMA we have better chances to get larger contiguous allocations these
> days, so this might not be too much of an issue in practice.

But turning off the SG DMA scheme seems like a step back. Would it be
possible to fix this kind of memory mapping?

Jaroslav

--
Jaroslav Kysela
Linux Sound Maintainer; ALSA Project; Red Hat, Inc.
Re: next-0519 on thinkpad x60: sound related? window manager crash
On Tue, Jun 09, 2020 at 01:45:34PM +0200, Takashi Iwai wrote: > Yes, for the sound stuff, something below should make things working. > But it means that we'll lose the SG-buffer allocation and the > allocation of large buffers might fail on some machines. We crossed lines there. In general due to better memory compaction and CMA we have better chances to get larger contiguous allocations these days, so this might not be too much of an issue in practice.
Re: next-0519 on thinkpad x60: sound related? window manager crash
Alex, can you try this patch?

diff --git a/sound/core/Kconfig b/sound/core/Kconfig
index d4554f376160a9..10b06e575a7fc5 100644
--- a/sound/core/Kconfig
+++ b/sound/core/Kconfig
@@ -192,6 +192,6 @@ config SND_VMASTER
 
 config SND_DMA_SGBUF
 	def_bool y
-	depends on X86
+	depends on BROKEN
 
 source "sound/core/seq/Kconfig"
Re: next-0519 on thinkpad x60: sound related? window manager crash
On Tue, 09 Jun 2020 13:40:59 +0200, Christoph Hellwig wrote: > > On Tue, Jun 09, 2020 at 01:38:46PM +0200, Takashi Iwai wrote: > > On Tue, 09 Jun 2020 13:31:23 +0200, > > Christoph Hellwig wrote: > > > > > > On Tue, Jun 09, 2020 at 11:31:20AM +0200, Takashi Iwai wrote: > > > > > > How would be a proper way to get the virtually mapped SG-buffer > > > > > > pages > > > > > > with coherent memory? (Also allowing user-space mmap, too) > > > > > > > > > > dma_mmap_coherent / dma_mmap_attrs for userspace. We don't really > > > > > have a good way for kernel space mappings. > > > > > > > > And that's the missing piece right now... :-< > > > > > > Can you point me to the relevant places (allocation and vmap mostly) > > > so that I can take a look at how to fix this mess? > > > > Found in sound/core/sgbuf.c. It's specific to x86. > > So it looks like we could just turn off CONFIG_SND_DMA_SGBUF and > be done with it? After all this works on other architectures > just fine.. Yes, for the sound stuff, something below should make things working. But it means that we'll lose the SG-buffer allocation and the allocation of large buffers might fail on some machines. Takashi --- a/sound/core/Kconfig +++ b/sound/core/Kconfig @@ -192,6 +192,6 @@ config SND_VMASTER config SND_DMA_SGBUF def_bool y - depends on X86 + depends on X86 && BROKEN source "sound/core/seq/Kconfig"
Re: next-0519 on thinkpad x60: sound related? window manager crash
On Tue, Jun 09, 2020 at 01:38:46PM +0200, Takashi Iwai wrote: > On Tue, 09 Jun 2020 13:31:23 +0200, > Christoph Hellwig wrote: > > > > On Tue, Jun 09, 2020 at 11:31:20AM +0200, Takashi Iwai wrote: > > > > > How would be a proper way to get the virtually mapped SG-buffer pages > > > > > with coherent memory? (Also allowing user-space mmap, too) > > > > > > > > dma_mmap_coherent / dma_mmap_attrs for userspace. We don't really > > > > have a good way for kernel space mappings. > > > > > > And that's the missing piece right now... :-< > > > > Can you point me to the relevant places (allocation and vmap mostly) > > so that I can take a look at how to fix this mess? > > Found in sound/core/sgbuf.c. It's specific to x86. So it looks like we could just turn off CONFIG_SND_DMA_SGBUF and be done with it? After all this works on other architectures just fine..
Re: next-0519 on thinkpad x60: sound related? window manager crash
On Tue, 09 Jun 2020 13:31:23 +0200, Christoph Hellwig wrote: > > On Tue, Jun 09, 2020 at 11:31:20AM +0200, Takashi Iwai wrote: > > > > How would be a proper way to get the virtually mapped SG-buffer pages > > > > with coherent memory? (Also allowing user-space mmap, too) > > > > > > dma_mmap_coherent / dma_mmap_attrs for userspace. We don't really > > > have a good way for kernel space mappings. > > > > And that's the missing piece right now... :-< > > Can you point me to the relevant places (allocation and vmap mostly) > so that I can take a look at how to fix this mess? Found in sound/core/sgbuf.c. It's specific to x86. Also, for V4L, drivers/media/v4l2-core/videobuf-dma-sg.c. thanks, Takashi
Re: next-0519 on thinkpad x60: sound related? window manager crash
On Tue, Jun 09, 2020 at 11:31:20AM +0200, Takashi Iwai wrote: > > > How would be a proper way to get the virtually mapped SG-buffer pages > > > with coherent memory? (Also allowing user-space mmap, too) > > > > dma_mmap_coherent / dma_mmap_attrs for userspace. We don't really > > have a good way for kernel space mappings. > > And that's the missing piece right now... :-< Can you point me to the relevant places (allocation and vmap mostly) so that I can take a look at how to fix this mess?
Re: next-0519 on thinkpad x60: sound related? window manager crash
On Tue, 09 Jun 2020 11:31:20 +0200, Takashi Iwai wrote: > > On Tue, 09 Jun 2020 11:17:27 +0200, > Christoph Hellwig wrote: > > > > On Tue, Jun 09, 2020 at 11:09:14AM +0200, Takashi Iwai wrote: > > > On Tue, 09 Jun 2020 10:43:05 +0200, > > > Christoph Hellwig wrote: > > > > > > > > On Tue, Jun 09, 2020 at 10:05:26AM +0200, Takashi Iwai wrote: > > > > > > >From the disassembly it seems like a vmalloc allocation is NULL, > > > > > > >which > > > > > > seems really weird as this patch shouldn't make a difference for > > > > > > them, > > > > > > and I also only see a single places that allocates the field, and > > > > > > that > > > > > > checks for an allocation failure. But the sound code is a little > > > > > > hard to unwind sometimes. > > > > > > > > > > It's not clear which sound device being affected, but if it's > > > > > HD-audio on x86, runtime->dma_area points to a vmapped buffer from > > > > > SG-pages allocated by dma_alloc_coherent(). > > > > > > > > > > OTOH, if it's a USB-audio, runtime->dma_area is a buffer by > > > > > vmalloc(). > > > > > > > > Err, you can't just vmap a buffer returned from dma_alloc_coherent, > > > > dma_alloc_coherent returns values are opaque and can't be used > > > > for virt_to_page. Whatever that code did has already been broken > > > > per the DMA API contract and on many architectures and just happend > > > > to work on x86 by accident. > > > > > > Hmm, that's bad. > > > > > > How would be a proper way to get the virtually mapped SG-buffer pages > > > with coherent memory? (Also allowing user-space mmap, too) > > > > dma_mmap_coherent / dma_mmap_attrs for userspace. We don't really > > have a good way for kernel space mappings. > > And that's the missing piece right now... :-< BTW, this kind of usage is not specific to sound, but also V4L also does vmap() over SG pages from dma_alloc_coherent(). It seems done only on selected devices, though. Takashi
Re: next-0519 on thinkpad x60: sound related? window manager crash
On Tue, 09 Jun 2020 11:17:27 +0200, Christoph Hellwig wrote: > > On Tue, Jun 09, 2020 at 11:09:14AM +0200, Takashi Iwai wrote: > > On Tue, 09 Jun 2020 10:43:05 +0200, > > Christoph Hellwig wrote: > > > > > > On Tue, Jun 09, 2020 at 10:05:26AM +0200, Takashi Iwai wrote: > > > > > >From the disassembly it seems like a vmalloc allocation is NULL, > > > > > >which > > > > > seems really weird as this patch shouldn't make a difference for them, > > > > > and I also only see a single places that allocates the field, and that > > > > > checks for an allocation failure. But the sound code is a little > > > > > hard to unwind sometimes. > > > > > > > > It's not clear which sound device being affected, but if it's > > > > HD-audio on x86, runtime->dma_area points to a vmapped buffer from > > > > SG-pages allocated by dma_alloc_coherent(). > > > > > > > > OTOH, if it's a USB-audio, runtime->dma_area is a buffer by > > > > vmalloc(). > > > > > > Err, you can't just vmap a buffer returned from dma_alloc_coherent, > > > dma_alloc_coherent returns values are opaque and can't be used > > > for virt_to_page. Whatever that code did has already been broken > > > per the DMA API contract and on many architectures and just happend > > > to work on x86 by accident. > > > > Hmm, that's bad. > > > > How would be a proper way to get the virtually mapped SG-buffer pages > > with coherent memory? (Also allowing user-space mmap, too) > > dma_mmap_coherent / dma_mmap_attrs for userspace. We don't really > have a good way for kernel space mappings. And that's the missing piece right now... :-< Takashi
Re: next-0519 on thinkpad x60: sound related? window manager crash
On Tue, Jun 09, 2020 at 11:09:14AM +0200, Takashi Iwai wrote: > On Tue, 09 Jun 2020 10:43:05 +0200, > Christoph Hellwig wrote: > > > > On Tue, Jun 09, 2020 at 10:05:26AM +0200, Takashi Iwai wrote: > > > > >From the disassembly it seems like a vmalloc allocation is NULL, which > > > > seems really weird as this patch shouldn't make a difference for them, > > > > and I also only see a single places that allocates the field, and that > > > > checks for an allocation failure. But the sound code is a little > > > > hard to unwind sometimes. > > > > > > It's not clear which sound device being affected, but if it's > > > HD-audio on x86, runtime->dma_area points to a vmapped buffer from > > > SG-pages allocated by dma_alloc_coherent(). > > > > > > OTOH, if it's a USB-audio, runtime->dma_area is a buffer by > > > vmalloc(). > > > > Err, you can't just vmap a buffer returned from dma_alloc_coherent, > > dma_alloc_coherent returns values are opaque and can't be used > > for virt_to_page. Whatever that code did has already been broken > > per the DMA API contract and on many architectures and just happend > > to work on x86 by accident. > > Hmm, that's bad. > > How would be a proper way to get the virtually mapped SG-buffer pages > with coherent memory? (Also allowing user-space mmap, too) dma_mmap_coherent / dma_mmap_attrs for userspace. We don't really have a good way for kernel space mappings.
Re: next-0519 on thinkpad x60: sound related? window manager crash
On Tue, 09 Jun 2020 10:43:05 +0200, Christoph Hellwig wrote: > > On Tue, Jun 09, 2020 at 10:05:26AM +0200, Takashi Iwai wrote: > > > >From the disassembly it seems like a vmalloc allocation is NULL, which > > > seems really weird as this patch shouldn't make a difference for them, > > > and I also only see a single places that allocates the field, and that > > > checks for an allocation failure. But the sound code is a little > > > hard to unwind sometimes. > > > > It's not clear which sound device being affected, but if it's > > HD-audio on x86, runtime->dma_area points to a vmapped buffer from > > SG-pages allocated by dma_alloc_coherent(). > > > > OTOH, if it's a USB-audio, runtime->dma_area is a buffer by > > vmalloc(). > > Err, you can't just vmap a buffer returned from dma_alloc_coherent, > dma_alloc_coherent returns values are opaque and can't be used > for virt_to_page. Whatever that code did has already been broken > per the DMA API contract and on many architectures and just happend > to work on x86 by accident. Hmm, that's bad. How would be a proper way to get the virtually mapped SG-buffer pages with coherent memory? (Also allowing user-space mmap, too) thanks, Takashi
Re: next-0519 on thinkpad x60: sound related? window manager crash
On Tue, Jun 09, 2020 at 10:05:26AM +0200, Takashi Iwai wrote:
> > From the disassembly it seems like a vmalloc allocation is NULL, which
> > seems really weird as this patch shouldn't make a difference for them,
> > and I also only see a single places that allocates the field, and that
> > checks for an allocation failure. But the sound code is a little
> > hard to unwind sometimes.
>
> It's not clear which sound device being affected, but if it's
> HD-audio on x86, runtime->dma_area points to a vmapped buffer from
> SG-pages allocated by dma_alloc_coherent().
>
> OTOH, if it's a USB-audio, runtime->dma_area is a buffer by
> vmalloc().

Err, you can't just vmap a buffer returned from dma_alloc_coherent;
dma_alloc_coherent return values are opaque and can't be used
for virt_to_page. Whatever that code did has already been broken
per the DMA API contract and on many architectures, and just happened
to work on x86 by accident.
Re: next-0519 on thinkpad x60: sound related? window manager crash
On Tue, 09 Jun 2020 07:43:06 +0200, Christoph Hellwig wrote: > > On Mon, Jun 08, 2020 at 07:31:47PM -0700, David Rientjes wrote: > > On Mon, 8 Jun 2020, Alex Xu (Hello71) wrote: > > > > > Excerpts from Christoph Hellwig's message of June 8, 2020 2:19 am: > > > > Can you do a listing using gdb where this happens? > > > > > > > > gdb vmlinux > > > > > > > > l *(snd_pcm_hw_params+0x3f3) > > > > > > > > ? > > > > > > > > > > (gdb) l *(snd_pcm_hw_params+0x3f3) > > > 0x817efc85 is in snd_pcm_hw_params > > > (.../linux/sound/core/pcm_native.c:749). > > > 744 while (runtime->boundary * 2 <= LONG_MAX - > > > runtime->buffer_size) > > > 745 runtime->boundary *= 2; > > > 746 > > > 747 /* clear the buffer for avoiding possible kernel info > > > leaks */ > > > 748 if (runtime->dma_area && !substream->ops->copy_user) > > > 749 memset(runtime->dma_area, 0, runtime->dma_bytes); > > > 750 > > > 751 snd_pcm_timer_resolution_change(substream); > > > 752 snd_pcm_set_state(substream, SNDRV_PCM_STATE_SETUP); > > > 753 > > > > > > > Working theory is that CONFIG_DMA_NONCOHERENT_MMAP getting set is causing > > the error_code in the page fault path. Debugging with Alex off-thread we > > found that dma_{alloc,free}_from_pool() are not getting called from the > > new code in dma_direct_{alloc,free}_pages() and he has not enabled > > mem_encrypt. > > While DMA_COHERENT_POOL absolutely should not select DMA_NONCOHERENT_MMAP > (and you should send your patch either way), I don't think it is going > to make a difference here, as DMA_NONCOHERENT_MMAP just means we > allows mmaps even for non-coherent devices, and we do not support > non-coherent devices on x86. > > >From the disassembly it seems like a vmalloc allocation is NULL, which > seems really weird as this patch shouldn't make a difference for them, > and I also only see a single places that allocates the field, and that > checks for an allocation failure. But the sound code is a little > hard to unwind sometimes. 
It's not clear which sound device being affected, but if it's HD-audio on x86, runtime->dma_area points to a vmapped buffer from SG-pages allocated by dma_alloc_coherent(). OTOH, if it's a USB-audio, runtime->dma_area is a buffer by vmalloc(). Takashi
Re: next-0519 on thinkpad x60: sound related? window manager crash
On Mon, Jun 08, 2020 at 07:31:47PM -0700, David Rientjes wrote:
> On Mon, 8 Jun 2020, Alex Xu (Hello71) wrote:
>
> > Excerpts from Christoph Hellwig's message of June 8, 2020 2:19 am:
> > > Can you do a listing using gdb where this happens?
> > >
> > > gdb vmlinux
> > >
> > > l *(snd_pcm_hw_params+0x3f3)
> > >
> > > ?
> > >
> >
> > (gdb) l *(snd_pcm_hw_params+0x3f3)
> > 0x817efc85 is in snd_pcm_hw_params (.../linux/sound/core/pcm_native.c:749).
> > 744		while (runtime->boundary * 2 <= LONG_MAX - runtime->buffer_size)
> > 745			runtime->boundary *= 2;
> > 746
> > 747		/* clear the buffer for avoiding possible kernel info leaks */
> > 748		if (runtime->dma_area && !substream->ops->copy_user)
> > 749			memset(runtime->dma_area, 0, runtime->dma_bytes);
> > 750
> > 751		snd_pcm_timer_resolution_change(substream);
> > 752		snd_pcm_set_state(substream, SNDRV_PCM_STATE_SETUP);
> > 753
> >
>
> Working theory is that CONFIG_DMA_NONCOHERENT_MMAP getting set is causing
> the error_code in the page fault path. Debugging with Alex off-thread we
> found that dma_{alloc,free}_from_pool() are not getting called from the
> new code in dma_direct_{alloc,free}_pages() and he has not enabled
> mem_encrypt.

While DMA_COHERENT_POOL absolutely should not select DMA_NONCOHERENT_MMAP
(and you should send your patch either way), I don't think it is going
to make a difference here, as DMA_NONCOHERENT_MMAP just means we
allow mmaps even for non-coherent devices, and we do not support
non-coherent devices on x86.

From the disassembly it seems like a vmalloc allocation is NULL, which
seems really weird as this patch shouldn't make a difference for them,
and I also only see a single place that allocates the field, and that
checks for an allocation failure. But the sound code is a little
hard to unwind sometimes.
Re: next-0519 on thinkpad x60: sound related? window manager crash
On Mon, 8 Jun 2020, Alex Xu (Hello71) wrote: > Excerpts from Christoph Hellwig's message of June 8, 2020 2:19 am: > > Can you do a listing using gdb where this happens? > > > > gdb vmlinux > > > > l *(snd_pcm_hw_params+0x3f3) > > > > ? > > > > (gdb) l *(snd_pcm_hw_params+0x3f3) > 0x817efc85 is in snd_pcm_hw_params > (.../linux/sound/core/pcm_native.c:749). > 744 while (runtime->boundary * 2 <= LONG_MAX - > runtime->buffer_size) > 745 runtime->boundary *= 2; > 746 > 747 /* clear the buffer for avoiding possible kernel info leaks */ > 748 if (runtime->dma_area && !substream->ops->copy_user) > 749 memset(runtime->dma_area, 0, runtime->dma_bytes); > 750 > 751 snd_pcm_timer_resolution_change(substream); > 752 snd_pcm_set_state(substream, SNDRV_PCM_STATE_SETUP); > 753 > Working theory is that CONFIG_DMA_NONCOHERENT_MMAP getting set is causing the error_code in the page fault path. Debugging with Alex off-thread we found that dma_{alloc,free}_from_pool() are not getting called from the new code in dma_direct_{alloc,free}_pages() and he has not enabled mem_encrypt. So the issue is related to setting CONFIG_DMA_COHERENT_POOL, and not anything else related to AMD SME. He has a patch to try out, but I wanted to update the thread in case there are other ideas to try other than selecting CONFIG_DMA_NONCOHERENT_MMAP only when CONFIG_DMA_REMAP is set (and not CONFIG_DMA_COHERENT_POOL).
Re: next-0519 on thinkpad x60: sound related? window manager crash
Excerpts from Christoph Hellwig's message of June 8, 2020 2:19 am:
> Can you do a listing using gdb where this happens?
>
> gdb vmlinux
>
> l *(snd_pcm_hw_params+0x3f3)
>
> ?
>

(gdb) l *(snd_pcm_hw_params+0x3f3)
0x817efc85 is in snd_pcm_hw_params (.../linux/sound/core/pcm_native.c:749).
744     while (runtime->boundary * 2 <= LONG_MAX - runtime->buffer_size)
745             runtime->boundary *= 2;
746
747     /* clear the buffer for avoiding possible kernel info leaks */
748     if (runtime->dma_area && !substream->ops->copy_user)
749             memset(runtime->dma_area, 0, runtime->dma_bytes);
750
751     snd_pcm_timer_resolution_change(substream);
752     snd_pcm_set_state(substream, SNDRV_PCM_STATE_SETUP);
753
Re: next-0519 on thinkpad x60: sound related? window manager crash
Can you do a listing using gdb where this happens? gdb vmlinux l *(snd_pcm_hw_params+0x3f3) ? On Sun, Jun 07, 2020 at 11:58:21AM -0400, Alex Xu (Hello71) wrote: > I have a similar issue, caused between aaa2faab4ed8 and b170290c2836. > > [ 20.263098] BUG: unable to handle page fault for address: b2b582cc2000 > [ 20.263104] #PF: supervisor write access in kernel mode > [ 20.263105] #PF: error_code(0x000b) - reserved bit violation > [ 20.263107] PGD 3fd03b067 P4D 3fd03b067 PUD 3fd03c067 PMD 3f8822067 PTE > 8000273942ab2163 > [ 20.263113] Oops: 000b [#1] PREEMPT SMP > [ 20.263117] CPU: 3 PID: 691 Comm: mpv Not tainted > 5.7.0-11262-gb170290c2836 #1 > [ 20.263119] Hardware name: To Be Filled By O.E.M. To Be Filled By > O.E.M./B450 Pro4, BIOS P4.10 03/05/2020 > [ 20.263125] RIP: 0010:__memset+0x24/0x30 > [ 20.263128] Code: cc cc cc cc cc cc 0f 1f 44 00 00 49 89 f9 48 89 d1 83 e2 > 07 48 c1 e9 03 40 0f b6 f6 48 b8 01 01 01 01 01 01 01 01 48 0f af c6 48 > ab 89 d1 f3 aa 4c 89 c8 c3 90 49 89 f9 40 88 f0 48 89 d1 f3 > [ 20.263131] RSP: 0018:b2b583d07e10 EFLAGS: 00010216 > [ 20.263133] RAX: RBX: 8b8000102c00 RCX: > 4000 > [ 20.263134] RDX: RSI: RDI: > b2b582cc2000 > [ 20.263136] RBP: 8b8000101000 R08: R09: > b2b582cc2000 > [ 20.263137] R10: 5356 R11: 8b8000102c18 R12: > > [ 20.263139] R13: R14: 8b8039944200 R15: > 9794daa0 > [ 20.263141] FS: 7f41aa4b4200() GS:8b803ecc() > knlGS: > [ 20.263143] CS: 0010 DS: ES: CR0: 80050033 > [ 20.263144] CR2: b2b582cc2000 CR3: 0003b6731000 CR4: > 003406e0 > [ 20.263146] Call Trace: > [ 20.263151] ? snd_pcm_hw_params+0x3f3/0x47a > [ 20.263154] ? snd_pcm_common_ioctl+0xf2/0xf73 > [ 20.263158] ? snd_pcm_ioctl+0x1e/0x29 > [ 20.263161] ? ksys_ioctl+0x77/0x91 > [ 20.263163] ? __x64_sys_ioctl+0x11/0x14 > [ 20.263166] ? do_syscall_64+0x3d/0xf5 > [ 20.263170] ? 
entry_SYSCALL_64_after_hwframe+0x44/0xa9 > [ 20.263173] Modules linked in: uvcvideo videobuf2_vmalloc videobuf2_memops > videobuf2_v4l2 videodev snd_usb_audio videobuf2_common snd_hwdep > snd_usbmidi_lib input_leds snd_rawmidi led_class > [ 20.263182] CR2: b2b582cc2000 > [ 20.263184] ---[ end trace c6b47a774b91f0a0 ]--- > [ 20.263187] RIP: 0010:__memset+0x24/0x30 > [ 20.263190] Code: cc cc cc cc cc cc 0f 1f 44 00 00 49 89 f9 48 89 d1 83 e2 > 07 48 c1 e9 03 40 0f b6 f6 48 b8 01 01 01 01 01 01 01 01 48 0f af c6 48 > ab 89 d1 f3 aa 4c 89 c8 c3 90 49 89 f9 40 88 f0 48 89 d1 f3 > [ 20.263192] RSP: 0018:b2b583d07e10 EFLAGS: 00010216 > [ 20.263193] RAX: RBX: 8b8000102c00 RCX: > 4000 > [ 20.263195] RDX: RSI: RDI: > b2b582cc2000 > [ 20.263196] RBP: 8b8000101000 R08: R09: > b2b582cc2000 > [ 20.263197] R10: 5356 R11: 8b8000102c18 R12: > > [ 20.263199] R13: R14: 8b8039944200 R15: > 9794daa0 > [ 20.263201] FS: 7f41aa4b4200() GS:8b803ecc() > knlGS: > [ 20.263202] CS: 0010 DS: ES: CR0: 80050033 > [ 20.263204] CR2: b2b582cc2000 CR3: 0003b6731000 CR4: > 003406e0 > > I bisected this to 82fef0ad811f "x86/mm: unencrypted non-blocking DMA > allocations use coherent pools". Reverting 1ee18de92927 resolves the > issue. > > Looks like Thinkpad X60 doesn't have VT-d, but could still be DMA > related. ---end quoted text---
Re: 82fef0ad811f "x86/mm: unencrypted non-blocking DMA allocations use coherent pools" was Re: next-0519 on thinkpad x60: sound related? window manager crash
Excerpts from David Rientjes's message of June 7, 2020 8:57 pm:
> Thanks for trying it out, Alex. Would you mind sending your .config and
> command line? I assume either mem_encrypt=on or
> CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT is enabled.
>
> Could you also give this a try?
>
> diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
> --- a/kernel/dma/direct.c
> +++ b/kernel/dma/direct.c
> @@ -99,10 +99,11 @@ static inline bool dma_should_alloc_from_pool(struct device *dev, gfp_t gfp,
>  static inline bool dma_should_free_from_pool(struct device *dev,
>                                               unsigned long attrs)
>  {
> -        if (IS_ENABLED(CONFIG_DMA_COHERENT_POOL))
> +        if (!IS_ENABLED(CONFIG_DMA_COHERENT_POOL))
> +                return false;
> +        if (force_dma_unencrypted(dev))
>                  return true;
> -        if ((attrs & DMA_ATTR_NO_KERNEL_MAPPING) &&
> -            !force_dma_unencrypted(dev))
> +        if (attrs & DMA_ATTR_NO_KERNEL_MAPPING)
>                  return false;
>          if (IS_ENABLED(CONFIG_DMA_DIRECT_REMAP))
>                  return true;
>

This patch doesn't work for me either. It has since occurred to me that
while I do have CONFIG_AMD_MEM_ENCRYPT=y, I have
CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT=n, because it was broken with
amdgpu (unfortunately a downgrade from radeon in this respect). Tried it
again just now and it looks like it's now able to enable KMS, but all it
displays is serious-looking errors.

Sorry for not mentioning that earlier. I'll send you my .config and
command line off-list.

Thanks,
Alex.
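For readers following the hunk quoted above, the change in branch order can be modelled in user space. This is only a sketch under stated assumptions: `struct pool_inputs`, `enum pool_verdict` and the two functions are hypothetical stand-ins, not kernel API, and each boolean merely models the kernel expression named in its comment. The quoted hunk does not show the end of the function, so the sketch returns `NOT_SHOWN` rather than guessing what happens after the last visible branch.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the inputs the kernel function looks at. */
struct pool_inputs {
    bool coherent_pool;     /* models IS_ENABLED(CONFIG_DMA_COHERENT_POOL) */
    bool force_unencrypted; /* models force_dma_unencrypted(dev) */
    bool no_kernel_mapping; /* models attrs & DMA_ATTR_NO_KERNEL_MAPPING */
    bool direct_remap;      /* models IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) */
};

enum pool_verdict {
    NOT_FROM_POOL, /* a visible branch returned false */
    FROM_POOL,     /* a visible branch returned true */
    NOT_SHOWN      /* control reaches code outside the quoted hunk */
};

/* Branch order on the '-' side of the quoted hunk. */
enum pool_verdict free_from_pool_before(struct pool_inputs in)
{
    if (in.coherent_pool)
        return FROM_POOL;
    if (in.no_kernel_mapping && !in.force_unencrypted)
        return NOT_FROM_POOL;
    if (in.direct_remap)
        return FROM_POOL;
    return NOT_SHOWN;
}

/* Branch order on the '+' side of the quoted hunk. */
enum pool_verdict free_from_pool_after(struct pool_inputs in)
{
    if (!in.coherent_pool)
        return NOT_FROM_POOL;
    if (in.force_unencrypted)
        return FROM_POOL;
    if (in.no_kernel_mapping)
        return NOT_FROM_POOL;
    if (in.direct_remap)
        return FROM_POOL;
    return NOT_SHOWN;
}
```

With `coherent_pool = true` and `force_unencrypted = false` (the combination Alex describes: CONFIG_AMD_MEM_ENCRYPT=y but encryption not active), and assuming for the example that the other two inputs are false, the pre-patch order answers `FROM_POOL` for every free, while the patched order no longer does and falls through to the part of the function the hunk does not show.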
Re: 82fef0ad811f "x86/mm: unencrypted non-blocking DMA allocations use coherent pools" was Re: next-0519 on thinkpad x60: sound related? window manager crash
On Sun, 7 Jun 2020, Alex Xu (Hello71) wrote: > > On Sun, 7 Jun 2020, Pavel Machek wrote: > > > >> > I have a similar issue, caused between aaa2faab4ed8 and b170290c2836. > >> > > >> > [ 20.263098] BUG: unable to handle page fault for address: > >> > b2b582cc2000 > >> > [ 20.263104] #PF: supervisor write access in kernel mode > >> > [ 20.263105] #PF: error_code(0x000b) - reserved bit violation > >> > [ 20.263107] PGD 3fd03b067 P4D 3fd03b067 PUD 3fd03c067 PMD 3f8822067 > >> > PTE 8000273942ab2163 > >> > [ 20.263113] Oops: 000b [#1] PREEMPT SMP > >> > [ 20.263117] CPU: 3 PID: 691 Comm: mpv Not tainted > >> > 5.7.0-11262-gb170290c2836 #1 > >> > [ 20.263119] Hardware name: To Be Filled By O.E.M. To Be Filled By > >> > O.E.M./B450 Pro4, BIOS P4.10 03/05/2020 > >> > [ 20.263125] RIP: 0010:__memset+0x24/0x30 > >> > [ 20.263128] Code: cc cc cc cc cc cc 0f 1f 44 00 00 49 89 f9 48 89 d1 > >> > 83 e2 07 48 c1 e9 03 40 0f b6 f6 48 b8 01 01 01 01 01 01 01 01 48 0f af > >> > c6 48 ab 89 d1 f3 aa 4c 89 c8 c3 90 49 89 f9 40 88 f0 48 89 d1 f3 > >> > [ 20.263131] RSP: 0018:b2b583d07e10 EFLAGS: 00010216 > >> > [ 20.263133] RAX: RBX: 8b8000102c00 RCX: > >> > 4000 > >> > [ 20.263134] RDX: RSI: RDI: > >> > b2b582cc2000 > >> > [ 20.263136] RBP: 8b8000101000 R08: R09: > >> > b2b582cc2000 > >> > [ 20.263137] R10: 5356 R11: 8b8000102c18 R12: > >> > > >> > [ 20.263139] R13: R14: 8b8039944200 R15: > >> > 9794daa0 > >> > [ 20.263141] FS: 7f41aa4b4200() GS:8b803ecc() > >> > knlGS: > >> > [ 20.263143] CS: 0010 DS: ES: CR0: 80050033 > >> > [ 20.263144] CR2: b2b582cc2000 CR3: 0003b6731000 CR4: > >> > 003406e0 > >> > [ 20.263146] Call Trace: > >> > [ 20.263151] ? snd_pcm_hw_params+0x3f3/0x47a > >> > [ 20.263154] ? snd_pcm_common_ioctl+0xf2/0xf73 > >> > [ 20.263158] ? snd_pcm_ioctl+0x1e/0x29 > >> > [ 20.263161] ? ksys_ioctl+0x77/0x91 > >> > [ 20.263163] ? __x64_sys_ioctl+0x11/0x14 > >> > [ 20.263166] ? do_syscall_64+0x3d/0xf5 > >> > [ 20.263170] ? 
entry_SYSCALL_64_after_hwframe+0x44/0xa9 > >> > [ 20.263173] Modules linked in: uvcvideo videobuf2_vmalloc > >> > videobuf2_memops videobuf2_v4l2 videodev snd_usb_audio videobuf2_common > >> > snd_hwdep snd_usbmidi_lib input_leds snd_rawmidi led_class > >> > [ 20.263182] CR2: b2b582cc2000 > >> > [ 20.263184] ---[ end trace c6b47a774b91f0a0 ]--- > >> > [ 20.263187] RIP: 0010:__memset+0x24/0x30 > >> > [ 20.263190] Code: cc cc cc cc cc cc 0f 1f 44 00 00 49 89 f9 48 89 d1 > >> > 83 e2 07 48 c1 e9 03 40 0f b6 f6 48 b8 01 01 01 01 01 01 01 01 48 0f af > >> > c6 48 ab 89 d1 f3 aa 4c 89 c8 c3 90 49 89 f9 40 88 f0 48 89 d1 f3 > >> > [ 20.263192] RSP: 0018:b2b583d07e10 EFLAGS: 00010216 > >> > [ 20.263193] RAX: RBX: 8b8000102c00 RCX: > >> > 4000 > >> > [ 20.263195] RDX: RSI: RDI: > >> > b2b582cc2000 > >> > [ 20.263196] RBP: 8b8000101000 R08: R09: > >> > b2b582cc2000 > >> > [ 20.263197] R10: 5356 R11: 8b8000102c18 R12: > >> > > >> > [ 20.263199] R13: R14: 8b8039944200 R15: > >> > 9794daa0 > >> > [ 20.263201] FS: 7f41aa4b4200() GS:8b803ecc() > >> > knlGS: > >> > [ 20.263202] CS: 0010 DS: ES: CR0: 80050033 > >> > [ 20.263204] CR2: b2b582cc2000 CR3: 0003b6731000 CR4: > >> > 003406e0 > >> > > >> > I bisected this to 82fef0ad811f "x86/mm: unencrypted non-blocking DMA > >> > allocations use coherent pools". Reverting 1ee18de92927 resolves the > >> > issue. > >> > > >> > Looks like Thinkpad X60 doesn't have VT-d, but could still be DMA > >> > related. > >> > >> Note that newer -next releases seem to behave okay for me. The commit > >> pointed out by siection is really simple: > >> > >> AFAIK you could verify it is responsible by turning off > >> CONFIG_AMD_MEM_ENCRYPT on latest kernel... 
> >> > >> Best regards, > >>Pavel > >> > >> index 1d6104ea8af0..2bf819d3 100644 > >> --- a/arch/x86/Kconfig > >> +++ b/arch/x86/Kconfig > >> @@ -1520,6 +1520,7 @@ config X86_CPA_STATISTICS > >> config AMD_MEM_ENCRYPT > >> bool "AMD Secure Memory Encryption (SME) support" > >> depends on X86_64 && CPU_SUP_AMD > >> + select DMA_COHERENT_POOL > >> select DYNAMIC_PHYSICAL_MASK > >> select ARCH_USE_MEMREMAP_PROT > >> select ARCH_HAS_FORCE_DMA_UNENCRYPTED > > > > Thanks for the report! > > > > Besides CONFIG_AMD_MEM_ENCRYPT, do you have CONFIG_DMA_DIRECT_REMAP > > enabled? If so, it may be caused by the virtual address passed to the > >
Re: 82fef0ad811f "x86/mm: unencrypted non-blocking DMA allocations use coherent pools" was Re: next-0519 on thinkpad x60: sound related? window manager crash
Excerpts from David Rientjes's message of June 7, 2020 3:41 pm: > On Sun, 7 Jun 2020, Pavel Machek wrote: > >> > I have a similar issue, caused between aaa2faab4ed8 and b170290c2836. >> > >> > [ 20.263098] BUG: unable to handle page fault for address: >> > b2b582cc2000 >> > [ 20.263104] #PF: supervisor write access in kernel mode >> > [ 20.263105] #PF: error_code(0x000b) - reserved bit violation >> > [ 20.263107] PGD 3fd03b067 P4D 3fd03b067 PUD 3fd03c067 PMD 3f8822067 PTE >> > 8000273942ab2163 >> > [ 20.263113] Oops: 000b [#1] PREEMPT SMP >> > [ 20.263117] CPU: 3 PID: 691 Comm: mpv Not tainted >> > 5.7.0-11262-gb170290c2836 #1 >> > [ 20.263119] Hardware name: To Be Filled By O.E.M. To Be Filled By >> > O.E.M./B450 Pro4, BIOS P4.10 03/05/2020 >> > [ 20.263125] RIP: 0010:__memset+0x24/0x30 >> > [ 20.263128] Code: cc cc cc cc cc cc 0f 1f 44 00 00 49 89 f9 48 89 d1 83 >> > e2 07 48 c1 e9 03 40 0f b6 f6 48 b8 01 01 01 01 01 01 01 01 48 0f af c6 >> > 48 ab 89 d1 f3 aa 4c 89 c8 c3 90 49 89 f9 40 88 f0 48 89 d1 f3 >> > [ 20.263131] RSP: 0018:b2b583d07e10 EFLAGS: 00010216 >> > [ 20.263133] RAX: RBX: 8b8000102c00 RCX: >> > 4000 >> > [ 20.263134] RDX: RSI: RDI: >> > b2b582cc2000 >> > [ 20.263136] RBP: 8b8000101000 R08: R09: >> > b2b582cc2000 >> > [ 20.263137] R10: 5356 R11: 8b8000102c18 R12: >> > >> > [ 20.263139] R13: R14: 8b8039944200 R15: >> > 9794daa0 >> > [ 20.263141] FS: 7f41aa4b4200() GS:8b803ecc() >> > knlGS: >> > [ 20.263143] CS: 0010 DS: ES: CR0: 80050033 >> > [ 20.263144] CR2: b2b582cc2000 CR3: 0003b6731000 CR4: >> > 003406e0 >> > [ 20.263146] Call Trace: >> > [ 20.263151] ? snd_pcm_hw_params+0x3f3/0x47a >> > [ 20.263154] ? snd_pcm_common_ioctl+0xf2/0xf73 >> > [ 20.263158] ? snd_pcm_ioctl+0x1e/0x29 >> > [ 20.263161] ? ksys_ioctl+0x77/0x91 >> > [ 20.263163] ? __x64_sys_ioctl+0x11/0x14 >> > [ 20.263166] ? do_syscall_64+0x3d/0xf5 >> > [ 20.263170] ? 
entry_SYSCALL_64_after_hwframe+0x44/0xa9 >> > [ 20.263173] Modules linked in: uvcvideo videobuf2_vmalloc >> > videobuf2_memops videobuf2_v4l2 videodev snd_usb_audio videobuf2_common >> > snd_hwdep snd_usbmidi_lib input_leds snd_rawmidi led_class >> > [ 20.263182] CR2: b2b582cc2000 >> > [ 20.263184] ---[ end trace c6b47a774b91f0a0 ]--- >> > [ 20.263187] RIP: 0010:__memset+0x24/0x30 >> > [ 20.263190] Code: cc cc cc cc cc cc 0f 1f 44 00 00 49 89 f9 48 89 d1 83 >> > e2 07 48 c1 e9 03 40 0f b6 f6 48 b8 01 01 01 01 01 01 01 01 48 0f af c6 >> > 48 ab 89 d1 f3 aa 4c 89 c8 c3 90 49 89 f9 40 88 f0 48 89 d1 f3 >> > [ 20.263192] RSP: 0018:b2b583d07e10 EFLAGS: 00010216 >> > [ 20.263193] RAX: RBX: 8b8000102c00 RCX: >> > 4000 >> > [ 20.263195] RDX: RSI: RDI: >> > b2b582cc2000 >> > [ 20.263196] RBP: 8b8000101000 R08: R09: >> > b2b582cc2000 >> > [ 20.263197] R10: 5356 R11: 8b8000102c18 R12: >> > >> > [ 20.263199] R13: R14: 8b8039944200 R15: >> > 9794daa0 >> > [ 20.263201] FS: 7f41aa4b4200() GS:8b803ecc() >> > knlGS: >> > [ 20.263202] CS: 0010 DS: ES: CR0: 80050033 >> > [ 20.263204] CR2: b2b582cc2000 CR3: 0003b6731000 CR4: >> > 003406e0 >> > >> > I bisected this to 82fef0ad811f "x86/mm: unencrypted non-blocking DMA >> > allocations use coherent pools". Reverting 1ee18de92927 resolves the >> > issue. >> > >> > Looks like Thinkpad X60 doesn't have VT-d, but could still be DMA >> > related. >> >> Note that newer -next releases seem to behave okay for me. The commit >> pointed out by siection is really simple: >> >> AFAIK you could verify it is responsible by turning off >> CONFIG_AMD_MEM_ENCRYPT on latest kernel... 
>> >> Best regards, >> Pavel >> >> index 1d6104ea8af0..2bf819d3 100644 >> --- a/arch/x86/Kconfig >> +++ b/arch/x86/Kconfig >> @@ -1520,6 +1520,7 @@ config X86_CPA_STATISTICS >> config AMD_MEM_ENCRYPT >> bool "AMD Secure Memory Encryption (SME) support" >> depends on X86_64 && CPU_SUP_AMD >> + select DMA_COHERENT_POOL >> select DYNAMIC_PHYSICAL_MASK >> select ARCH_USE_MEMREMAP_PROT >> select ARCH_HAS_FORCE_DMA_UNENCRYPTED > > Thanks for the report! > > Besides CONFIG_AMD_MEM_ENCRYPT, do you have CONFIG_DMA_DIRECT_REMAP > enabled? If so, it may be caused by the virtual address passed to the > set_memory_{decrypted,encrypted}() functions. > > And I assume you are enabling SME by using mem_encrypt=on on the kernel > command line or CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT
Re: 82fef0ad811f "x86/mm: unencrypted non-blocking DMA allocations use coherent pools" was Re: next-0519 on thinkpad x60: sound related? window manager crash
On Sun, 7 Jun 2020, Pavel Machek wrote: > > I have a similar issue, caused between aaa2faab4ed8 and b170290c2836. > > > > [ 20.263098] BUG: unable to handle page fault for address: > > b2b582cc2000 > > [ 20.263104] #PF: supervisor write access in kernel mode > > [ 20.263105] #PF: error_code(0x000b) - reserved bit violation > > [ 20.263107] PGD 3fd03b067 P4D 3fd03b067 PUD 3fd03c067 PMD 3f8822067 PTE > > 8000273942ab2163 > > [ 20.263113] Oops: 000b [#1] PREEMPT SMP > > [ 20.263117] CPU: 3 PID: 691 Comm: mpv Not tainted > > 5.7.0-11262-gb170290c2836 #1 > > [ 20.263119] Hardware name: To Be Filled By O.E.M. To Be Filled By > > O.E.M./B450 Pro4, BIOS P4.10 03/05/2020 > > [ 20.263125] RIP: 0010:__memset+0x24/0x30 > > [ 20.263128] Code: cc cc cc cc cc cc 0f 1f 44 00 00 49 89 f9 48 89 d1 83 > > e2 07 48 c1 e9 03 40 0f b6 f6 48 b8 01 01 01 01 01 01 01 01 48 0f af c6 > > 48 ab 89 d1 f3 aa 4c 89 c8 c3 90 49 89 f9 40 88 f0 48 89 d1 f3 > > [ 20.263131] RSP: 0018:b2b583d07e10 EFLAGS: 00010216 > > [ 20.263133] RAX: RBX: 8b8000102c00 RCX: > > 4000 > > [ 20.263134] RDX: RSI: RDI: > > b2b582cc2000 > > [ 20.263136] RBP: 8b8000101000 R08: R09: > > b2b582cc2000 > > [ 20.263137] R10: 5356 R11: 8b8000102c18 R12: > > > > [ 20.263139] R13: R14: 8b8039944200 R15: > > 9794daa0 > > [ 20.263141] FS: 7f41aa4b4200() GS:8b803ecc() > > knlGS: > > [ 20.263143] CS: 0010 DS: ES: CR0: 80050033 > > [ 20.263144] CR2: b2b582cc2000 CR3: 0003b6731000 CR4: > > 003406e0 > > [ 20.263146] Call Trace: > > [ 20.263151] ? snd_pcm_hw_params+0x3f3/0x47a > > [ 20.263154] ? snd_pcm_common_ioctl+0xf2/0xf73 > > [ 20.263158] ? snd_pcm_ioctl+0x1e/0x29 > > [ 20.263161] ? ksys_ioctl+0x77/0x91 > > [ 20.263163] ? __x64_sys_ioctl+0x11/0x14 > > [ 20.263166] ? do_syscall_64+0x3d/0xf5 > > [ 20.263170] ? 
entry_SYSCALL_64_after_hwframe+0x44/0xa9 > > [ 20.263173] Modules linked in: uvcvideo videobuf2_vmalloc > > videobuf2_memops videobuf2_v4l2 videodev snd_usb_audio videobuf2_common > > snd_hwdep snd_usbmidi_lib input_leds snd_rawmidi led_class > > [ 20.263182] CR2: b2b582cc2000 > > [ 20.263184] ---[ end trace c6b47a774b91f0a0 ]--- > > [ 20.263187] RIP: 0010:__memset+0x24/0x30 > > [ 20.263190] Code: cc cc cc cc cc cc 0f 1f 44 00 00 49 89 f9 48 89 d1 83 > > e2 07 48 c1 e9 03 40 0f b6 f6 48 b8 01 01 01 01 01 01 01 01 48 0f af c6 > > 48 ab 89 d1 f3 aa 4c 89 c8 c3 90 49 89 f9 40 88 f0 48 89 d1 f3 > > [ 20.263192] RSP: 0018:b2b583d07e10 EFLAGS: 00010216 > > [ 20.263193] RAX: RBX: 8b8000102c00 RCX: > > 4000 > > [ 20.263195] RDX: RSI: RDI: > > b2b582cc2000 > > [ 20.263196] RBP: 8b8000101000 R08: R09: > > b2b582cc2000 > > [ 20.263197] R10: 5356 R11: 8b8000102c18 R12: > > > > [ 20.263199] R13: R14: 8b8039944200 R15: > > 9794daa0 > > [ 20.263201] FS: 7f41aa4b4200() GS:8b803ecc() > > knlGS: > > [ 20.263202] CS: 0010 DS: ES: CR0: 80050033 > > [ 20.263204] CR2: b2b582cc2000 CR3: 0003b6731000 CR4: > > 003406e0 > > > > I bisected this to 82fef0ad811f "x86/mm: unencrypted non-blocking DMA > > allocations use coherent pools". Reverting 1ee18de92927 resolves the > > issue. > > > > Looks like Thinkpad X60 doesn't have VT-d, but could still be DMA > > related. > > Note that newer -next releases seem to behave okay for me. The commit > pointed out by siection is really simple: > > AFAIK you could verify it is responsible by turning off > CONFIG_AMD_MEM_ENCRYPT on latest kernel... 
> > Best regards, > Pavel > > index 1d6104ea8af0..2bf819d3 100644 > --- a/arch/x86/Kconfig > +++ b/arch/x86/Kconfig > @@ -1520,6 +1520,7 @@ config X86_CPA_STATISTICS > config AMD_MEM_ENCRYPT > bool "AMD Secure Memory Encryption (SME) support" > depends on X86_64 && CPU_SUP_AMD > + select DMA_COHERENT_POOL > select DYNAMIC_PHYSICAL_MASK > select ARCH_USE_MEMREMAP_PROT > select ARCH_HAS_FORCE_DMA_UNENCRYPTED Thanks for the report! Besides CONFIG_AMD_MEM_ENCRYPT, do you have CONFIG_DMA_DIRECT_REMAP enabled? If so, it may be caused by the virtual address passed to the set_memory_{decrypted,encrypted}() functions. And I assume you are enabling SME by using mem_encrypt=on on the kernel command line or CONFIG_AMD_MEM_ENCRYPT_ACTIVE_BY_DEFAULT is enabled. We likely need an atomic pool for devices that support DMA to addresses in sme_me_mask as well. I can test this tomorrow, but wanted to get it out early to see if
82fef0ad811f "x86/mm: unencrypted non-blocking DMA allocations use coherent pools" was Re: next-0519 on thinkpad x60: sound related? window manager crash
Hi! > I have a similar issue, caused between aaa2faab4ed8 and b170290c2836. > > [ 20.263098] BUG: unable to handle page fault for address: b2b582cc2000 > [ 20.263104] #PF: supervisor write access in kernel mode > [ 20.263105] #PF: error_code(0x000b) - reserved bit violation > [ 20.263107] PGD 3fd03b067 P4D 3fd03b067 PUD 3fd03c067 PMD 3f8822067 PTE > 8000273942ab2163 > [ 20.263113] Oops: 000b [#1] PREEMPT SMP > [ 20.263117] CPU: 3 PID: 691 Comm: mpv Not tainted > 5.7.0-11262-gb170290c2836 #1 > [ 20.263119] Hardware name: To Be Filled By O.E.M. To Be Filled By > O.E.M./B450 Pro4, BIOS P4.10 03/05/2020 > [ 20.263125] RIP: 0010:__memset+0x24/0x30 > [ 20.263128] Code: cc cc cc cc cc cc 0f 1f 44 00 00 49 89 f9 48 89 d1 83 e2 > 07 48 c1 e9 03 40 0f b6 f6 48 b8 01 01 01 01 01 01 01 01 48 0f af c6 48 > ab 89 d1 f3 aa 4c 89 c8 c3 90 49 89 f9 40 88 f0 48 89 d1 f3 > [ 20.263131] RSP: 0018:b2b583d07e10 EFLAGS: 00010216 > [ 20.263133] RAX: RBX: 8b8000102c00 RCX: > 4000 > [ 20.263134] RDX: RSI: RDI: > b2b582cc2000 > [ 20.263136] RBP: 8b8000101000 R08: R09: > b2b582cc2000 > [ 20.263137] R10: 5356 R11: 8b8000102c18 R12: > > [ 20.263139] R13: R14: 8b8039944200 R15: > 9794daa0 > [ 20.263141] FS: 7f41aa4b4200() GS:8b803ecc() > knlGS: > [ 20.263143] CS: 0010 DS: ES: CR0: 80050033 > [ 20.263144] CR2: b2b582cc2000 CR3: 0003b6731000 CR4: > 003406e0 > [ 20.263146] Call Trace: > [ 20.263151] ? snd_pcm_hw_params+0x3f3/0x47a > [ 20.263154] ? snd_pcm_common_ioctl+0xf2/0xf73 > [ 20.263158] ? snd_pcm_ioctl+0x1e/0x29 > [ 20.263161] ? ksys_ioctl+0x77/0x91 > [ 20.263163] ? __x64_sys_ioctl+0x11/0x14 > [ 20.263166] ? do_syscall_64+0x3d/0xf5 > [ 20.263170] ? 
entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [ 20.263173] Modules linked in: uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videodev snd_usb_audio videobuf2_common snd_hwdep snd_usbmidi_lib input_leds snd_rawmidi led_class
> [ 20.263182] CR2: b2b582cc2000
> [ 20.263184] ---[ end trace c6b47a774b91f0a0 ]---
> [ 20.263187] RIP: 0010:__memset+0x24/0x30
> [ 20.263190] Code: cc cc cc cc cc cc 0f 1f 44 00 00 49 89 f9 48 89 d1 83 e2 07 48 c1 e9 03 40 0f b6 f6 48 b8 01 01 01 01 01 01 01 01 48 0f af c6 48 ab 89 d1 f3 aa 4c 89 c8 c3 90 49 89 f9 40 88 f0 48 89 d1 f3
> [ 20.263192] RSP: 0018:b2b583d07e10 EFLAGS: 00010216
> [ 20.263193] RAX: RBX: 8b8000102c00 RCX: 4000
> [ 20.263195] RDX: RSI: RDI: b2b582cc2000
> [ 20.263196] RBP: 8b8000101000 R08: R09: b2b582cc2000
> [ 20.263197] R10: 5356 R11: 8b8000102c18 R12:
> [ 20.263199] R13: R14: 8b8039944200 R15: 9794daa0
> [ 20.263201] FS: 7f41aa4b4200() GS:8b803ecc() knlGS:
> [ 20.263202] CS: 0010 DS: ES: CR0: 80050033
> [ 20.263204] CR2: b2b582cc2000 CR3: 0003b6731000 CR4: 003406e0
>
> I bisected this to 82fef0ad811f "x86/mm: unencrypted non-blocking DMA
> allocations use coherent pools". Reverting 1ee18de92927 resolves the
> issue.
>
> Looks like Thinkpad X60 doesn't have VT-d, but could still be DMA
> related.

Note that newer -next releases seem to behave okay for me. The commit
pointed out by bisection is really simple:

AFAIK you could verify it is responsible by turning off
CONFIG_AMD_MEM_ENCRYPT on the latest kernel...
Best regards,
Pavel

index 1d6104ea8af0..2bf819d3 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1520,6 +1520,7 @@ config X86_CPA_STATISTICS
 config AMD_MEM_ENCRYPT
        bool "AMD Secure Memory Encryption (SME) support"
        depends on X86_64 && CPU_SUP_AMD
+       select DMA_COHERENT_POOL
        select DYNAMIC_PHYSICAL_MASK
        select ARCH_USE_MEMREMAP_PROT
        select ARCH_HAS_FORCE_DMA_UNENCRYPTED

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Re: next-0519 on thinkpad x60: sound related? window manager crash
I have a similar issue, caused between aaa2faab4ed8 and b170290c2836. [ 20.263098] BUG: unable to handle page fault for address: b2b582cc2000 [ 20.263104] #PF: supervisor write access in kernel mode [ 20.263105] #PF: error_code(0x000b) - reserved bit violation [ 20.263107] PGD 3fd03b067 P4D 3fd03b067 PUD 3fd03c067 PMD 3f8822067 PTE 8000273942ab2163 [ 20.263113] Oops: 000b [#1] PREEMPT SMP [ 20.263117] CPU: 3 PID: 691 Comm: mpv Not tainted 5.7.0-11262-gb170290c2836 #1 [ 20.263119] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./B450 Pro4, BIOS P4.10 03/05/2020 [ 20.263125] RIP: 0010:__memset+0x24/0x30 [ 20.263128] Code: cc cc cc cc cc cc 0f 1f 44 00 00 49 89 f9 48 89 d1 83 e2 07 48 c1 e9 03 40 0f b6 f6 48 b8 01 01 01 01 01 01 01 01 48 0f af c6 48 ab 89 d1 f3 aa 4c 89 c8 c3 90 49 89 f9 40 88 f0 48 89 d1 f3 [ 20.263131] RSP: 0018:b2b583d07e10 EFLAGS: 00010216 [ 20.263133] RAX: RBX: 8b8000102c00 RCX: 4000 [ 20.263134] RDX: RSI: RDI: b2b582cc2000 [ 20.263136] RBP: 8b8000101000 R08: R09: b2b582cc2000 [ 20.263137] R10: 5356 R11: 8b8000102c18 R12: [ 20.263139] R13: R14: 8b8039944200 R15: 9794daa0 [ 20.263141] FS: 7f41aa4b4200() GS:8b803ecc() knlGS: [ 20.263143] CS: 0010 DS: ES: CR0: 80050033 [ 20.263144] CR2: b2b582cc2000 CR3: 0003b6731000 CR4: 003406e0 [ 20.263146] Call Trace: [ 20.263151] ? snd_pcm_hw_params+0x3f3/0x47a [ 20.263154] ? snd_pcm_common_ioctl+0xf2/0xf73 [ 20.263158] ? snd_pcm_ioctl+0x1e/0x29 [ 20.263161] ? ksys_ioctl+0x77/0x91 [ 20.263163] ? __x64_sys_ioctl+0x11/0x14 [ 20.263166] ? do_syscall_64+0x3d/0xf5 [ 20.263170] ? 
entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 20.263173] Modules linked in: uvcvideo videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videodev snd_usb_audio videobuf2_common snd_hwdep snd_usbmidi_lib input_leds snd_rawmidi led_class [ 20.263182] CR2: b2b582cc2000 [ 20.263184] ---[ end trace c6b47a774b91f0a0 ]--- [ 20.263187] RIP: 0010:__memset+0x24/0x30 [ 20.263190] Code: cc cc cc cc cc cc 0f 1f 44 00 00 49 89 f9 48 89 d1 83 e2 07 48 c1 e9 03 40 0f b6 f6 48 b8 01 01 01 01 01 01 01 01 48 0f af c6 48 ab 89 d1 f3 aa 4c 89 c8 c3 90 49 89 f9 40 88 f0 48 89 d1 f3 [ 20.263192] RSP: 0018:b2b583d07e10 EFLAGS: 00010216 [ 20.263193] RAX: RBX: 8b8000102c00 RCX: 4000 [ 20.263195] RDX: RSI: RDI: b2b582cc2000 [ 20.263196] RBP: 8b8000101000 R08: R09: b2b582cc2000 [ 20.263197] R10: 5356 R11: 8b8000102c18 R12: [ 20.263199] R13: R14: 8b8039944200 R15: 9794daa0 [ 20.263201] FS: 7f41aa4b4200() GS:8b803ecc() knlGS: [ 20.263202] CS: 0010 DS: ES: CR0: 80050033 [ 20.263204] CR2: b2b582cc2000 CR3: 0003b6731000 CR4: 003406e0 I bisected this to 82fef0ad811f "x86/mm: unencrypted non-blocking DMA allocations use coherent pools". Reverting 1ee18de92927 resolves the issue. Looks like Thinkpad X60 doesn't have VT-d, but could still be DMA related.
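The `#PF: error_code(...)` value printed in the oopses in this thread is a bit field defined by the x86 architecture, so it can be decoded by hand. A minimal user-space sketch follows; the names `pf_info`, `decode_pf_error_code` and `print_pf_error_code` are made up for illustration and are not kernel API. Only the bit positions are architectural.

```c
#include <stdbool.h>
#include <stdio.h>

/* Architectural x86 page-fault error code bits. */
#define PF_PRESENT (1u << 0) /* set: fault on a present page; clear: page not present */
#define PF_WRITE   (1u << 1) /* set: write access; clear: read access */
#define PF_USER    (1u << 2) /* set: access from user mode; clear: supervisor */
#define PF_RSVD    (1u << 3) /* set: reserved bit found set in a paging-structure entry */
#define PF_INSTR   (1u << 4) /* set: fault on an instruction fetch */

/* Hypothetical helper type, for illustration only. */
struct pf_info {
    bool present;
    bool write;
    bool user;
    bool reserved_bit;
    bool instr_fetch;
};

/* Split an error code into its individual flags. */
struct pf_info decode_pf_error_code(unsigned int code)
{
    struct pf_info info = {
        .present      = (code & PF_PRESENT) != 0,
        .write        = (code & PF_WRITE) != 0,
        .user         = (code & PF_USER) != 0,
        .reserved_bit = (code & PF_RSVD) != 0,
        .instr_fetch  = (code & PF_INSTR) != 0,
    };
    return info;
}

/* Print a one-line human-readable summary of an error code. */
void print_pf_error_code(unsigned int code)
{
    struct pf_info i = decode_pf_error_code(code);

    printf("0x%04x: %s %s access, %s%s%s\n", code,
           i.user ? "user" : "supervisor",
           i.write ? "write" : "read",
           i.present ? "page present" : "page not present",
           i.reserved_bit ? ", reserved bit set in page tables" : "",
           i.instr_fetch ? ", instruction fetch" : "");
}
```

Decoding 0x000b from the report above yields a supervisor write to a present page with a reserved bit set, which agrees with the "reserved bit violation" line; decoding 0x0002 from the X60 oops yields a supervisor write to a not-present page.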
Re: next-0519 on thinkpad x60: sound related? window manager crash
Hi!

> My window manager stopped responding. I was able to recover machine
> using sysrq-k.
>
> I started writing nice report, when session failed second time. And
> then third time on next attempt.
>
> Any ideas?
>
> I'll send this out before this locks up...

Today it crashed again, with a similar oops in the log.

My records say:

fb57b1fabcb2 (HEAD, tag: next-20200519, origin/master, origin/HEAD) HEAD@{0}: checkout: moving from bdecf38f228bcca73b31ada98b5b7ba1215eb9c9 to next-20200519
bdecf38f228b (tag: next-20200515) HEAD@{1}: checkout: moving from 30e2206e11ce27ae910cc0dab21472429e400a87 to next-20200515

So it is quite possible that 0515 worked okay for a few days. Hmm. Perhaps
I'll try going to 0516 and see if it is stable?

Best regards,
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Re: next-0519 on thinkpad x60: sound related? window manager crash
On Wed, 20 May 2020 13:39:06 +0200,
Pavel Machek wrote:
>
> On Wed 2020-05-20 13:37:02, Takashi Iwai wrote:
> > On Wed, 20 May 2020 13:11:37 +0200,
> > Pavel Machek wrote:
> > >
> > > Hi!
> > >
> > > My window manager stopped responding. I was able to recover machine
> > > using sysrq-k.
> > >
> > > I started writing nice report, when session failed second time. And
> > > then third time on next attempt.
> > >
> > > Any ideas?
> >
> > Do you know when the regression started?
> > There have been significant code changes regarding the sound buffer
> > management, and it's merged in 5.6-rc1. Other than that, I have no
> > idea yet.
>
> It is first time I seen this. I may have missed the oops in the logs,
> but I would not miss marco dying.
>
> So... AFAICT this was not there in -next20200505 or so.

Ah so it's so new. Then I don't think it's from the sound driver code
change; there haven't been many changes in the core part that may lead
to such an error.


Takashi

> Best regard,
> Pavel
>
>
> > > [ 3730.016148] perf: interrupt took too long (3135 > 3133), lowering kernel.perf_event_max_sample_rate to 63750
> > > [ 4274.984810] BUG: unable to handle page fault for address: f860
> > > [ 4274.984821] #PF: supervisor write access in kernel mode
> > > [ 4274.984827] #PF: error_code(0x0002) - not-present page
> > > [ 4274.984833] *pdpt = 2c0b2001 *pde =
> > > [ 4274.984843] Oops: 0002 [#1] PREEMPT SMP PTI
> > > [ 4274.984853] CPU: 1 PID: 3351 Comm: marco Not tainted 5.7.0-rc6-next-20200519+ #115
> > > [ 4274.984859] Hardware name: LENOVO 17097HU/17097HU, BIOS 7BETD8WW (2.19 ) 03/31/2011
> > > [ 4274.984871] EIP: memset+0xb/0x20
> > > [ 4274.984878] Code: f9 01 72 0b 8a 0e 88 0f 8d b4 26 00 00 00 00 8b 45 f0 83 c4 04 5b 5e 5f 5d c3 8d 74 26 00 90 55 89 e5 57 89 c7 53 89 c3 89 d0 aa 89 d8 5b 5f 5d c3 cc cc cc cc cc cc cc cc cc cc cc cc cc 89
> > > [ 4274.984885] EAX: EBX: f85fe000 ECX: 0001e000 EDX:
> > > [ 4274.984892] ESI: ed158400 EDI: f860 EBP:
edcc9e6c ESP: edcc9e64 > > > [ 4274.984898] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: > > > 00210246 > > > [ 4274.984905] CR0: 80050033 CR2: f860 CR3: 2c114000 CR4: 06b0 > > > [ 4274.984910] Call Trace: > > > [ 4274.984923] snd_pcm_hw_params+0x38d/0x400 > > > [ 4274.984930] snd_pcm_ioctl+0x187/0xe80 > > > [ 4274.984940] ? __fget_files+0x86/0xc0 > > > [ 4274.984947] ? __fget_light+0x6b/0x80 > > > [ 4274.984954] ? snd_pcm_status_user64+0x90/0x90 > > > [ 4274.984962] ksys_ioctl+0x1cd/0x880 > > > [ 4274.984971] ? ksys_mmap_pgoff+0x81/0xc0 > > > [ 4274.984978] ? fput+0xd/0x10 > > > [ 4274.984984] ? ksys_mmap_pgoff+0x8d/0xc0 > > > [ 4274.984991] __ia32_sys_ioctl+0x10/0x12 > > > [ 4274.985000] do_int80_syscall_32+0x3c/0x100 > > > [ 4274.985010] entry_INT80_32+0x116/0x116 > > > [ 4274.985016] EIP: 0xb7f17092 > > > [ 4274.985023] Code: 00 00 00 e9 90 ff ff ff ff a3 24 00 00 00 68 30 00 > > > 00 00 e9 80 ff ff ff ff a3 e8 ff ff ff 66 90 00 00 00 00 00 00 00 00 cd > > > 80 8d b4 26 00 00 00 00 8d b6 00 00 00 00 8b 1c 24 c3 8d b4 26 00 > > > [ 4274.985030] EAX: ffda EBX: 0011 ECX: c25c4111 EDX: bf8d5280 > > > [ 4274.985036] ESI: 08250880 EDI: bf8d5280 EBP: 082a4150 ESP: bf8d50a4 > > > [ 4274.985042] DS: 007b ES: 007b FS: GS: 0033 SS: 007b EFLAGS: > > > 00200292 > > > [ 4274.985051] ? 
nmi+0xcc/0x2bc > > > [ 4274.985055] Modules linked in: > > > [ 4274.985063] CR2: f860 > > > [ 4274.985072] ---[ end trace 61b0852711d6de1d ]--- > > > [ 4274.985079] EIP: memset+0xb/0x20 > > > [ 4274.985086] Code: f9 01 72 0b 8a 0e 88 0f 8d b4 26 00 00 00 00 8b 45 > > > f0 83 c4 04 5b 5e 5f 5d c3 8d 74 26 00 90 55 89 e5 57 89 c7 53 89 c3 89 > > > d0 aa 89 d8 5b 5f 5d c3 cc cc cc cc cc cc cc cc cc cc cc cc cc 89 > > > [ 4274.985092] EAX: EBX: f85fe000 ECX: 0001e000 EDX: > > > [ 4274.985099] ESI: ed158400 EDI: f860 EBP: edcc9e6c ESP: edcc9e64 > > > [ 4274.985105] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: > > > 00210246 > > > [ 4274.985112] CR0: 80050033 CR2: f860 CR3: 2c114000 CR4: 06b0 > > > [ 4337.396551] sysrq: SAK > > > [ 4337.397010] tty tty7: SAK: killed process 2963 (Xorg): by session > > > [ 4337.397282] tty tty7: SAK: killed process 2963 (Xorg): by controlling > > > tty > > > [ 4337.397621] tty tty7: SAK: killed process 3484 (console-kit-dae): by > > > fd#9 > > > [ 4337.397934] tty tty7: SAK: killed process 3485 (console-kit-dae): by > > > fd#9 > > > [ 4337.397940] tty tty7: SAK: killed process 3486 (console-kit-dae): by > > > fd#9 > > > [ 4337.397945] tty tty7: SAK: killed process 3487 (console-kit-dae): by > > > fd#9 > > > [ 4337.397951] tty tty7: SAK: killed process 3488 (console-kit-dae): by > > > fd#9 > > > [ 4337.397956] tty tty7: SAK: killed process 3489 (cons
Re: next-0519 on thinkpad x60: sound related? window manager crash
On Wed 2020-05-20 13:37:02, Takashi Iwai wrote:
> On Wed, 20 May 2020 13:11:37 +0200,
> Pavel Machek wrote:
> >
> > Hi!
> >
> > My window manager stopped responding. I was able to recover machine
> > using sysrq-k.
> >
> > I started writing nice report, when session failed second time. And
> > then third time on next attempt.
> >
> > Any ideas?
>
> Do you know when the regression started?
> There have been significant code changes regarding the sound buffer
> management, and it's merged in 5.6-rc1. Other than that, I have no
> idea yet.

It is the first time I have seen this. I may have missed the oops in the
logs, but I would not miss marco dying.

So... AFAICT this was not there in -next20200505 or so.

Best regards,
Pavel

> > [ 3730.016148] perf: interrupt took too long (3135 > 3133), lowering kernel.perf_event_max_sample_rate to 63750
> > [ 4274.984810] BUG: unable to handle page fault for address: f860
> > [ 4274.984821] #PF: supervisor write access in kernel mode
> > [ 4274.984827] #PF: error_code(0x0002) - not-present page
> > [ 4274.984833] *pdpt = 2c0b2001 *pde =
> > [ 4274.984843] Oops: 0002 [#1] PREEMPT SMP PTI
> > [ 4274.984853] CPU: 1 PID: 3351 Comm: marco Not tainted 5.7.0-rc6-next-20200519+ #115
> > [ 4274.984859] Hardware name: LENOVO 17097HU/17097HU, BIOS 7BETD8WW (2.19 ) 03/31/2011
> > [ 4274.984871] EIP: memset+0xb/0x20
> > [ 4274.984878] Code: f9 01 72 0b 8a 0e 88 0f 8d b4 26 00 00 00 00 8b 45 f0 83 c4 04 5b 5e 5f 5d c3 8d 74 26 00 90 55 89 e5 57 89 c7 53 89 c3 89 d0 aa 89 d8 5b 5f 5d c3 cc cc cc cc cc cc cc cc cc cc cc cc cc 89
> > [ 4274.984885] EAX: EBX: f85fe000 ECX: 0001e000 EDX:
> > [ 4274.984892] ESI: ed158400 EDI: f860 EBP: edcc9e6c ESP: edcc9e64
> > [ 4274.984898] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00210246
> > [ 4274.984905] CR0: 80050033 CR2: f860 CR3: 2c114000 CR4: 06b0
> > [ 4274.984910] Call Trace:
> > [ 4274.984923] snd_pcm_hw_params+0x38d/0x400
> > [ 4274.984930] snd_pcm_ioctl+0x187/0xe80
> > [
4274.984940] ? __fget_files+0x86/0xc0 > > [ 4274.984947] ? __fget_light+0x6b/0x80 > > [ 4274.984954] ? snd_pcm_status_user64+0x90/0x90 > > [ 4274.984962] ksys_ioctl+0x1cd/0x880 > > [ 4274.984971] ? ksys_mmap_pgoff+0x81/0xc0 > > [ 4274.984978] ? fput+0xd/0x10 > > [ 4274.984984] ? ksys_mmap_pgoff+0x8d/0xc0 > > [ 4274.984991] __ia32_sys_ioctl+0x10/0x12 > > [ 4274.985000] do_int80_syscall_32+0x3c/0x100 > > [ 4274.985010] entry_INT80_32+0x116/0x116 > > [ 4274.985016] EIP: 0xb7f17092 > > [ 4274.985023] Code: 00 00 00 e9 90 ff ff ff ff a3 24 00 00 00 68 30 00 00 > > 00 e9 80 ff ff ff ff a3 e8 ff ff ff 66 90 00 00 00 00 00 00 00 00 cd 80 > > 8d b4 26 00 00 00 00 8d b6 00 00 00 00 8b 1c 24 c3 8d b4 26 00 > > [ 4274.985030] EAX: ffda EBX: 0011 ECX: c25c4111 EDX: bf8d5280 > > [ 4274.985036] ESI: 08250880 EDI: bf8d5280 EBP: 082a4150 ESP: bf8d50a4 > > [ 4274.985042] DS: 007b ES: 007b FS: GS: 0033 SS: 007b EFLAGS: 00200292 > > [ 4274.985051] ? nmi+0xcc/0x2bc > > [ 4274.985055] Modules linked in: > > [ 4274.985063] CR2: f860 > > [ 4274.985072] ---[ end trace 61b0852711d6de1d ]--- > > [ 4274.985079] EIP: memset+0xb/0x20 > > [ 4274.985086] Code: f9 01 72 0b 8a 0e 88 0f 8d b4 26 00 00 00 00 8b 45 f0 > > 83 c4 04 5b 5e 5f 5d c3 8d 74 26 00 90 55 89 e5 57 89 c7 53 89 c3 89 d0 > > aa 89 d8 5b 5f 5d c3 cc cc cc cc cc cc cc cc cc cc cc cc cc 89 > > [ 4274.985092] EAX: EBX: f85fe000 ECX: 0001e000 EDX: > > [ 4274.985099] ESI: ed158400 EDI: f860 EBP: edcc9e6c ESP: edcc9e64 > > [ 4274.985105] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00210246 > > [ 4274.985112] CR0: 80050033 CR2: f860 CR3: 2c114000 CR4: 06b0 > > [ 4337.396551] sysrq: SAK > > [ 4337.397010] tty tty7: SAK: killed process 2963 (Xorg): by session > > [ 4337.397282] tty tty7: SAK: killed process 2963 (Xorg): by controlling tty > > [ 4337.397621] tty tty7: SAK: killed process 3484 (console-kit-dae): by fd#9 > > [ 4337.397934] tty tty7: SAK: killed process 3485 (console-kit-dae): by fd#9 > > [ 4337.397940] tty tty7: SAK: 
killed process 3486 (console-kit-dae): by fd#9 > > [ 4337.397945] tty tty7: SAK: killed process 3487 (console-kit-dae): by fd#9 > > [ 4337.397951] tty tty7: SAK: killed process 3488 (console-kit-dae): by fd#9 > > [ 4337.397956] tty tty7: SAK: killed process 3489 (console-kit-dae): by fd#9 > > [ 4337.397961] tty tty7: SAK: killed process 3490 (console-kit-dae): by fd#9 > > [ 4337.397967] tty tty7: SAK: killed process 3491 (console-kit-dae): by fd#9 > > [ 4337.397972] tty tty7: SAK: killed process 3492 (console-kit-dae): by fd#9 > > [ 4337.397978] tty tty7: SAK: killed process 3493 (console-kit-dae): by fd#9 > > [ 4337.397983] tty tty7: SAK: killed process 3494 (console-kit-dae): by fd#9 > > [ 4337.397989] tty tty7: SAK: ki
Re: next-0519 on thinkpad x60: sound related? window manager crash
On Wed, 20 May 2020 13:11:37 +0200,
Pavel Machek wrote:
>
> Hi!
>
> My window manager stopped responding. I was able to recover the machine
> using sysrq-k.
>
> I started writing a nice report when the session failed a second time,
> and then a third time on the next attempt.
>
> Any ideas?

Do you know when the regression started?
There have been significant code changes regarding the sound buffer
management, and they were merged in 5.6-rc1. Other than that, I have no
idea yet.


Takashi

>
> I'll send this out before this locks up...
>
> Best regards,
>         Pavel
>
> [ 2801.147411] sdhci-pci :15:00.2: Will use DMA mode even though HW
> doesn't fully claim to support it.
> [ 2801.187449] sdhci-pci :15:00.2: Will use DMA mode even though HW
> doesn't fully claim to support it.
> [ 2801.192260] usb 1-2: new high-speed USB device number 5 using ehci-pci
> [ 2801.240241] sdhci-pci :15:00.2: Will use DMA mode even though HW
> doesn't fully claim to support it.
> [ 2801.300663] sdhci-pci :15:00.2: Will use DMA mode even though HW
> doesn't fully claim to support it.
> [ 2801.352181] usb 1-2: New USB device found, idVendor=0525, idProduct=a4a1,
> bcdDevice= 5.07
> [ 2801.352192] usb 1-2: New USB device strings: Mfr=1, Product=2,
> SerialNumber=0
> [ 2801.352200] usb 1-2: Product: Ethernet Gadget
> [ 2801.352207] usb 1-2: Manufacturer: Linux 5.7.0-rc4-00046-g6d7c0f75a522
> with musb-hdrc
> [ 2801.419872] e1000e :02:00.0 eth1: NIC Link is Down
> [ 2801.428760] cdc_ether 1-2:1.0 usb0: register 'cdc_ether' at
> usb-:00:1d.7-2, CDC Ethernet Device, 72:ed:12:23:c9:c2
> [ 2804.020289] wlan0: authenticate with 5c:f4:ab:10:d2:bb
> [ 2804.020451] wlan0: send auth to 5c:f4:ab:10:d2:bb (try 1/3)
> [ 2804.022385] wlan0: authenticated
> [ 2804.024243] wlan0: associate with 5c:f4:ab:10:d2:bb (try 1/3)
> [ 2804.026985] wlan0: RX AssocResp from 5c:f4:ab:10:d2:bb (capab=0x411
> status=0 aid=2)
> [ 2804.028961] wlan0: associated
> [ 2874.520955] perf: interrupt took too long (2507 > 2500), lowering
> kernel.perf_event_max_sample_rate to 79750
> [ 3730.016148] perf: interrupt took too long (3135 > 3133), lowering
> kernel.perf_event_max_sample_rate to 63750
> [ 4274.984810] BUG: unable to handle page fault for address: f860
> [ 4274.984821] #PF: supervisor write access in kernel mode
> [ 4274.984827] #PF: error_code(0x0002) - not-present page
> [ 4274.984833] *pdpt = 2c0b2001 *pde =
> [ 4274.984843] Oops: 0002 [#1] PREEMPT SMP PTI
> [ 4274.984853] CPU: 1 PID: 3351 Comm: marco Not tainted
> 5.7.0-rc6-next-20200519+ #115
> [ 4274.984859] Hardware name: LENOVO 17097HU/17097HU, BIOS 7BETD8WW (2.19 )
> 03/31/2011
> [ 4274.984871] EIP: memset+0xb/0x20
> [ 4274.984878] Code: f9 01 72 0b 8a 0e 88 0f 8d b4 26 00 00 00 00 8b 45 f0 83
> c4 04 5b 5e 5f 5d c3 8d 74 26 00 90 55 89 e5 57 89 c7 53 89 c3 89 d0 aa
> 89 d8 5b 5f 5d c3 cc cc cc cc cc cc cc cc cc cc cc cc cc 89
> [ 4274.984885] EAX: EBX: f85fe000 ECX: 0001e000 EDX:
> [ 4274.984892] ESI: ed158400 EDI: f860 EBP: edcc9e6c ESP: edcc9e64
> [ 4274.984898] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00210246
> [ 4274.984905] CR0: 80050033 CR2: f860 CR3: 2c114000 CR4: 06b0
> [ 4274.984910] Call Trace:
> [ 4274.984923] snd_pcm_hw_params+0x38d/0x400
> [ 4274.984930] snd_pcm_ioctl+0x187/0xe80
> [ 4274.984940] ? __fget_files+0x86/0xc0
> [ 4274.984947] ? __fget_light+0x6b/0x80
> [ 4274.984954] ? snd_pcm_status_user64+0x90/0x90
> [ 4274.984962] ksys_ioctl+0x1cd/0x880
> [ 4274.984971] ? ksys_mmap_pgoff+0x81/0xc0
> [ 4274.984978] ? fput+0xd/0x10
> [ 4274.984984] ? ksys_mmap_pgoff+0x8d/0xc0
> [ 4274.984991] __ia32_sys_ioctl+0x10/0x12
> [ 4274.985000] do_int80_syscall_32+0x3c/0x100
> [ 4274.985010] entry_INT80_32+0x116/0x116
> [ 4274.985016] EIP: 0xb7f17092
> [ 4274.985023] Code: 00 00 00 e9 90 ff ff ff ff a3 24 00 00 00 68 30 00 00 00
> e9 80 ff ff ff ff a3 e8 ff ff ff 66 90 00 00 00 00 00 00 00 00 cd 80 8d
> b4 26 00 00 00 00 8d b6 00 00 00 00 8b 1c 24 c3 8d b4 26 00
> [ 4274.985030] EAX: ffda EBX: 0011 ECX: c25c4111 EDX: bf8d5280
> [ 4274.985036] ESI: 08250880 EDI: bf8d5280 EBP: 082a4150 ESP: bf8d50a4
> [ 4274.985042] DS: 007b ES: 007b FS: GS: 0033 SS: 007b EFLAGS: 00200292
> [ 4274.985051] ? nmi+0xcc/0x2bc
> [ 4274.985055] Modules linked in:
> [ 4274.985063] CR2: f860
> [ 4274.985072] ---[ end trace 61b0852711d6de1d ]---
> [ 4274.985079] EIP: memset+0xb/0x20
> [ 4274.985086] Code: f9 01 72 0b 8a 0e 88 0f 8d b4 26 00 00 00 00 8b 45 f0 83
> c4 04 5b 5e 5f 5d c3 8d 74 26 00 90 55 89 e5 57 89 c7 53 89 c3 89 d0 aa
> 89 d8 5b 5f 5d c3 cc cc cc cc cc cc cc cc cc cc cc cc cc 89
> [ 4274.985092] EAX: EBX: f85fe000 ECX: 0001e000 EDX:
> [ 4274.985099] ESI: ed158400 EDI: f860 EBP: edcc9e6c ESP: edcc9e64
> [ 4274.985105] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00210246
> [ 4274.985112
next-0519 on thinkpad x60: sound related? window manager crash
Hi!

My window manager stopped responding. I was able to recover the machine
using sysrq-k.

I started writing a nice report when the session failed a second time,
and then a third time on the next attempt.

Any ideas?

I'll send this out before this locks up...

Best regards,
        Pavel

[ 2801.147411] sdhci-pci :15:00.2: Will use DMA mode even though HW doesn't fully claim to support it.
[ 2801.187449] sdhci-pci :15:00.2: Will use DMA mode even though HW doesn't fully claim to support it.
[ 2801.192260] usb 1-2: new high-speed USB device number 5 using ehci-pci
[ 2801.240241] sdhci-pci :15:00.2: Will use DMA mode even though HW doesn't fully claim to support it.
[ 2801.300663] sdhci-pci :15:00.2: Will use DMA mode even though HW doesn't fully claim to support it.
[ 2801.352181] usb 1-2: New USB device found, idVendor=0525, idProduct=a4a1, bcdDevice= 5.07
[ 2801.352192] usb 1-2: New USB device strings: Mfr=1, Product=2, SerialNumber=0
[ 2801.352200] usb 1-2: Product: Ethernet Gadget
[ 2801.352207] usb 1-2: Manufacturer: Linux 5.7.0-rc4-00046-g6d7c0f75a522 with musb-hdrc
[ 2801.419872] e1000e :02:00.0 eth1: NIC Link is Down
[ 2801.428760] cdc_ether 1-2:1.0 usb0: register 'cdc_ether' at usb-:00:1d.7-2, CDC Ethernet Device, 72:ed:12:23:c9:c2
[ 2804.020289] wlan0: authenticate with 5c:f4:ab:10:d2:bb
[ 2804.020451] wlan0: send auth to 5c:f4:ab:10:d2:bb (try 1/3)
[ 2804.022385] wlan0: authenticated
[ 2804.024243] wlan0: associate with 5c:f4:ab:10:d2:bb (try 1/3)
[ 2804.026985] wlan0: RX AssocResp from 5c:f4:ab:10:d2:bb (capab=0x411 status=0 aid=2)
[ 2804.028961] wlan0: associated
[ 2874.520955] perf: interrupt took too long (2507 > 2500), lowering kernel.perf_event_max_sample_rate to 79750
[ 3730.016148] perf: interrupt took too long (3135 > 3133), lowering kernel.perf_event_max_sample_rate to 63750
[ 4274.984810] BUG: unable to handle page fault for address: f860
[ 4274.984821] #PF: supervisor write access in kernel mode
[ 4274.984827] #PF: error_code(0x0002) - not-present page
[ 4274.984833] *pdpt = 2c0b2001 *pde =
[ 4274.984843] Oops: 0002 [#1] PREEMPT SMP PTI
[ 4274.984853] CPU: 1 PID: 3351 Comm: marco Not tainted 5.7.0-rc6-next-20200519+ #115
[ 4274.984859] Hardware name: LENOVO 17097HU/17097HU, BIOS 7BETD8WW (2.19 ) 03/31/2011
[ 4274.984871] EIP: memset+0xb/0x20
[ 4274.984878] Code: f9 01 72 0b 8a 0e 88 0f 8d b4 26 00 00 00 00 8b 45 f0 83 c4 04 5b 5e 5f 5d c3 8d 74 26 00 90 55 89 e5 57 89 c7 53 89 c3 89 d0 aa 89 d8 5b 5f 5d c3 cc cc cc cc cc cc cc cc cc cc cc cc cc 89
[ 4274.984885] EAX: EBX: f85fe000 ECX: 0001e000 EDX:
[ 4274.984892] ESI: ed158400 EDI: f860 EBP: edcc9e6c ESP: edcc9e64
[ 4274.984898] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00210246
[ 4274.984905] CR0: 80050033 CR2: f860 CR3: 2c114000 CR4: 06b0
[ 4274.984910] Call Trace:
[ 4274.984923] snd_pcm_hw_params+0x38d/0x400
[ 4274.984930] snd_pcm_ioctl+0x187/0xe80
[ 4274.984940] ? __fget_files+0x86/0xc0
[ 4274.984947] ? __fget_light+0x6b/0x80
[ 4274.984954] ? snd_pcm_status_user64+0x90/0x90
[ 4274.984962] ksys_ioctl+0x1cd/0x880
[ 4274.984971] ? ksys_mmap_pgoff+0x81/0xc0
[ 4274.984978] ? fput+0xd/0x10
[ 4274.984984] ? ksys_mmap_pgoff+0x8d/0xc0
[ 4274.984991] __ia32_sys_ioctl+0x10/0x12
[ 4274.985000] do_int80_syscall_32+0x3c/0x100
[ 4274.985010] entry_INT80_32+0x116/0x116
[ 4274.985016] EIP: 0xb7f17092
[ 4274.985023] Code: 00 00 00 e9 90 ff ff ff ff a3 24 00 00 00 68 30 00 00 00 e9 80 ff ff ff ff a3 e8 ff ff ff 66 90 00 00 00 00 00 00 00 00 cd 80 8d b4 26 00 00 00 00 8d b6 00 00 00 00 8b 1c 24 c3 8d b4 26 00
[ 4274.985030] EAX: ffda EBX: 0011 ECX: c25c4111 EDX: bf8d5280
[ 4274.985036] ESI: 08250880 EDI: bf8d5280 EBP: 082a4150 ESP: bf8d50a4
[ 4274.985042] DS: 007b ES: 007b FS: GS: 0033 SS: 007b EFLAGS: 00200292
[ 4274.985051] ? nmi+0xcc/0x2bc
[ 4274.985055] Modules linked in:
[ 4274.985063] CR2: f860
[ 4274.985072] ---[ end trace 61b0852711d6de1d ]---
[ 4274.985079] EIP: memset+0xb/0x20
[ 4274.985086] Code: f9 01 72 0b 8a 0e 88 0f 8d b4 26 00 00 00 00 8b 45 f0 83 c4 04 5b 5e 5f 5d c3 8d 74 26 00 90 55 89 e5 57 89 c7 53 89 c3 89 d0 aa 89 d8 5b 5f 5d c3 cc cc cc cc cc cc cc cc cc cc cc cc cc 89
[ 4274.985092] EAX: EBX: f85fe000 ECX: 0001e000 EDX:
[ 4274.985099] ESI: ed158400 EDI: f860 EBP: edcc9e6c ESP: edcc9e64
[ 4274.985105] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 EFLAGS: 00210246
[ 4274.985112] CR0: 80050033 CR2: f860 CR3: 2c114000 CR4: 06b0
[ 4337.396551] sysrq: SAK
[ 4337.397010] tty tty7: SAK: killed process 2963 (Xorg): by session
[ 4337.397282] tty tty7: SAK: killed process 2963 (Xorg): by controlling tty
[ 4337.397621] tty tty7: SAK: killed process 3484 (console-kit-dae): by fd#9
[ 4337.397934] tty tty7: SAK: killed process 3485 (console-kit-dae): by fd#9
[ 4337.397940] tty tty7: SAK: killed process 3486 (console-ki