Re: amdgpu/TTM oopses since merging swiotlb_dma_ops into the dma_direct code
On Mon, 14 Jan 2019 at 19:10, Christoph Hellwig wrote:
>
> On Thu, Jan 10, 2019 at 06:52:26PM +0100, Sibren Vasse wrote:
> > On Thu, 10 Jan 2019 at 15:48, Christoph Hellwig wrote:
> > >
> > > On Thu, Jan 10, 2019 at 03:00:31PM +0100, Christian König wrote:
> > > >> From the trace it looks like we hit the case where swiotlb tries
> > > >> to copy back data from a bounce buffer, but hits a dangling or NULL
> > > >> pointer. So a couple questions for the submitter:
> > > >>
> > > >> - does the system have more than 4GB memory and thus use swiotlb?
> > > >>   (check /proc/meminfo, and if something SWIOTLB appears in dmesg)
> > > >> - does the device this happens on have a DMA mask smaller than
> > > >>   the available memory, that is should swiotlb be used here to start
> > > >>   with?
> > > >
> > > > Rather unlikely. The device is an AMD GPU, so we can address memory up
> > > > to 1TB.
> > >
> > > So we probably somehow got a false positive.
> > >
> > > For now I'd like the reporter to confirm that the dma_direct_unmap_page+0x92
> > > backtrace really is in the swiotlb code (I can't think of anything else,
> > > but I'd rather be sure).
> > I'm not sure what you want me to confirm. Could you elaborate?
>
> Please open the vmlinux file for which this happened in gdb,
> then send the output from this command
>
> l *(dma_direct_unmap_page+0x92)
>
> to this thread.

My call trace contained:
Jan 10 16:34:51 kernel: dma_direct_unmap_page+0x7a/0x80

(gdb) list *(dma_direct_unmap_page+0x7a)
0x810fa28a is in dma_direct_unmap_page (kernel/dma/direct.c:291).
286                     size_t size, enum dma_data_direction dir, unsigned long attrs)
287     {
288             phys_addr_t phys = dma_to_phys(dev, addr);
289
290             if (!(attrs & DMA_ATTR_SKIP_CPU_SYNC))
291                     dma_direct_sync_single_for_cpu(dev, addr, size, dir);
292
293             if (unlikely(is_swiotlb_buffer(phys)))
294                     swiotlb_tbl_unmap_single(dev, phys, size, dir, attrs);
295     }
___
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel
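
For readers unfamiliar with the notation: a backtrace entry such as
dma_direct_unmap_page+0x7a/0x80 reads as "offset 0x7a into a function that is
0x80 bytes long", and the symbol plus offset is what gdb's list command takes
above. A minimal sketch of splitting such an entry, assuming that
symbol+0xOFF/0xSIZE form; parse_trace_entry is a hypothetical helper written
for illustration, not a kernel or gdb interface:

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not a kernel API): split a backtrace entry of the
 * form "symbol+0xOFF/0xSIZE" into symbol name, offset and function size.
 * Returns 0 on success, -1 if the entry does not have that form.
 */
static int parse_trace_entry(const char *entry, char *sym, size_t symlen,
                             unsigned long *off, unsigned long *size)
{
    const char *plus = strrchr(entry, '+');
    size_t n;

    if (!plus)
        return -1;

    n = (size_t)(plus - entry);
    if (n == 0 || n >= symlen)
        return -1;

    /* Copy the symbol name that precedes the '+'. */
    memcpy(sym, entry, n);
    sym[n] = '\0';

    /* %lx accepts an optional 0x prefix on each number. */
    if (sscanf(plus + 1, "%lx/%lx", off, size) != 2)
        return -1;

    return 0;
}
```

Fed the entry from the call trace above, this yields the symbol
dma_direct_unmap_page, offset 0x7a and size 0x80, i.e. the address lies six
bytes before the end of the function.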
Re: amdgpu/TTM oopses since merging swiotlb_dma_ops into the dma_direct code
On Mon, 14 Jan 2019 at 19:13, Christoph Hellwig wrote:
>
> Hmm, I wonder if we are not actually using swiotlb in the end,
> can you check if your dmesg contains this line or not?
>
> PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
This line does not appear in my dmesg.

>
> If not I guess we found a bug in swiotlb exit vs is_swiotlb_buffer,
> and you can try this patch:
>
> diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
> index d6361776dc5c..1fb6fd68b9c7 100644
> --- a/kernel/dma/swiotlb.c
> +++ b/kernel/dma/swiotlb.c
> @@ -378,6 +378,8 @@ void __init swiotlb_exit(void)
>                 memblock_free_late(io_tlb_start,
>                                    PAGE_ALIGN(io_tlb_nslabs << IO_TLB_SHIFT));
>         }
> +       io_tlb_start = 0;
> +       io_tlb_end = 0;
>         io_tlb_nslabs = 0;
>         max_segment = 0;
>  }
With the patch applied to v5.0-rc2 I can no longer reproduce the issue.
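
To illustrate why clearing the two bounds matters, here is a minimal userspace
sketch of the suspected failure mode. It reuses the range test from
is_swiotlb_buffer() as quoted elsewhere in this thread; the globals are
stand-ins for the kernel variables of the same name, and model_swiotlb_exit()
is a hypothetical model of the exit path, not the real function:

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t phys_addr_t;   /* userspace stand-in for the kernel type */

/* Stand-ins for the kernel globals of the same name. */
static phys_addr_t io_tlb_start, io_tlb_end;

/* Same range test as is_swiotlb_buffer() quoted in this thread. */
static bool is_swiotlb_buffer(phys_addr_t paddr)
{
    return paddr >= io_tlb_start && paddr < io_tlb_end;
}

/*
 * Hypothetical model of swiotlb_exit(): the bounce buffer memory is given
 * back either way, but the bounds are only cleared when clear_bounds is
 * true, which corresponds to the two added lines in the patch above.
 */
static void model_swiotlb_exit(bool clear_bounds)
{
    if (clear_bounds) {
        io_tlb_start = 0;
        io_tlb_end = 0;
    }
}
```

With stale bounds left behind, any ordinary page that later lands in the freed
range still tests as a bounce buffer, so the unmap path would take the swiotlb
branch without swiotlb being in use. With both bounds at zero the test can
never be true.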
Re: amdgpu/TTM oopses since merging swiotlb_dma_ops into the dma_direct code
On Thu, 10 Jan 2019 at 18:06, Konrad Rzeszutek Wilk wrote:
>
> On Thu, Jan 10, 2019 at 04:26:43PM +0100, Sibren Vasse wrote:
> > On Thu, 10 Jan 2019 at 14:57, Christoph Hellwig wrote:
> > >
> > > On Thu, Jan 10, 2019 at 10:59:02AM +0100, Michel Dänzer wrote:
> > > >
> > > > Hi Christoph,
> > > >
> > > >
> > > > https://bugs.freedesktop.org/109234 (please ignore comments #6-#9) was
> > > > bisected to your commit 55897af63091 "dma-direct: merge swiotlb_dma_ops
> > > > into the dma_direct code". Any ideas?
> > >
> > > From the trace it looks like we hit the case where swiotlb tries
> > > to copy back data from a bounce buffer, but hits a dangling or NULL
> > > pointer. So a couple questions for the submitter:
> > My apologies if I misunderstand something, this subject matter is new to me.
> >
> > >
> > > - does the system have more than 4GB memory and thus use swiotlb?
> > My system has 8GB memory. The other report on the bug tracker had 16GB.
> >
> > >   (check /proc/meminfo, and if something SWIOTLB appears in dmesg)
> > /proc/meminfo: https://ptpb.pw/4rxI
> > Can I grep dmesg for a string?
>
> Can you attach the 'dmesg'?

Dmesg attached.

dmesg
Description: Binary data
Re: amdgpu/TTM oopses since merging swiotlb_dma_ops into the dma_direct code
On Thu, 10 Jan 2019 at 14:57, Christoph Hellwig wrote:
>
> On Thu, Jan 10, 2019 at 10:59:02AM +0100, Michel Dänzer wrote:
> >
> > Hi Christoph,
> >
> >
> > https://bugs.freedesktop.org/109234 (please ignore comments #6-#9) was
> > bisected to your commit 55897af63091 "dma-direct: merge swiotlb_dma_ops
> > into the dma_direct code". Any ideas?
>
> From the trace it looks like we hit the case where swiotlb tries
> to copy back data from a bounce buffer, but hits a dangling or NULL
> pointer. So a couple questions for the submitter:
My apologies if I misunderstand something, this subject matter is new to me.

>
> - does the system have more than 4GB memory and thus use swiotlb?
My system has 8GB memory. The other report on the bug tracker had 16GB.

>   (check /proc/meminfo, and if something SWIOTLB appears in dmesg)
/proc/meminfo: https://ptpb.pw/4rxI
Can I grep dmesg for a string?

> - does the device this happens on have a DMA mask smaller than
>   the available memory, that is should swiotlb be used here to start
>   with?
It's a MSI Radeon RX 570 Gaming X 4GB. The other report was a RX 580.
lshw output: https://ptpb.pw/6s0H

Regards,
Sibren
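
The DMA mask question can be made concrete with a small sketch. It assumes a
40-bit mask for the GPU, which matches the 1TB figure quoted elsewhere in this
thread, and it treats "8GB of memory" as a highest physical address of
8 GiB - 1, which ignores holes in the physical address map. dma_bit_mask()
mirrors the idea behind the kernel's DMA_BIT_MASK macro; needs_bounce() is a
hypothetical helper for illustration, not a kernel function:

```c
#include <stdbool.h>
#include <stdint.h>

/* Mask with the low `bits` bits set (same idea as DMA_BIT_MASK). */
static uint64_t dma_bit_mask(unsigned int bits)
{
    return bits >= 64 ? UINT64_MAX : (UINT64_C(1) << bits) - 1;
}

/*
 * Hypothetical helper: true if the highest physical address in the system
 * lies outside what the device can address, in which case some buffers
 * would have to be bounced through low memory.
 */
static bool needs_bounce(uint64_t dma_mask, uint64_t highest_addr)
{
    return highest_addr > dma_mask;
}
```

Under those assumptions a device limited to 32 bits would need bouncing on an
8GB system, while a 40-bit device would not, which is why swiotlb should not
be involved for this GPU in the first place.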
Re: amdgpu/TTM oopses since merging swiotlb_dma_ops into the dma_direct code
On Thu, 10 Jan 2019 at 15:48, Christoph Hellwig wrote:
>
> On Thu, Jan 10, 2019 at 03:00:31PM +0100, Christian König wrote:
> >> From the trace it looks like we hit the case where swiotlb tries
> >> to copy back data from a bounce buffer, but hits a dangling or NULL
> >> pointer. So a couple questions for the submitter:
> >>
> >> - does the system have more than 4GB memory and thus use swiotlb?
> >>   (check /proc/meminfo, and if something SWIOTLB appears in dmesg)
> >> - does the device this happens on have a DMA mask smaller than
> >>   the available memory, that is should swiotlb be used here to start
> >>   with?
> >
> > Rather unlikely. The device is an AMD GPU, so we can address memory up to
> > 1TB.
>
> So we probably somehow got a false positive.
>
> For now I'd like the reporter to confirm that the dma_direct_unmap_page+0x92
> backtrace really is in the swiotlb code (I can't think of anything else,
> but I'd rather be sure).
I'm not sure what you want me to confirm. Could you elaborate?

>
> Second it would be great to print what the contents of io_tlb_start
> and io_tlb_end are, e.g. by doing a printk_once in is_swiotlb_buffer,
> maybe that gives a clue why we are hitting the swiotlb code here.

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index 7c007ed7505f..042246dbae00 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -69,6 +69,7 @@ extern phys_addr_t io_tlb_start, io_tlb_end;

 static inline bool is_swiotlb_buffer(phys_addr_t paddr)
 {
+printk_once(KERN_INFO "io_tlb_start: %llu, io_tlb_end: %llu", io_tlb_start, io_tlb_end);
        return paddr >= io_tlb_start && paddr < io_tlb_end;
 }

Result on boot:
[ 11.405558] io_tlb_start: 3782983680, io_tlb_end: 3850092544

Regards,
Sibren
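
The two decimal values are easier to judge as a size and a position. A minimal
sketch of that arithmetic, using only the numbers printed above; the helper
names are hypothetical and exist just for this illustration:

```c
#include <stdbool.h>
#include <stdint.h>

/* Values printed by the printk_once above. */
#define IO_TLB_START UINT64_C(3782983680)
#define IO_TLB_END   UINT64_C(3850092544)

/* Size of the [start, end) range in MiB. */
static uint64_t range_size_mib(uint64_t start, uint64_t end)
{
    return (end - start) >> 20;
}

/* True if the range ends at or below the 4 GiB boundary. */
static bool ends_below_4g(uint64_t end)
{
    return end <= (UINT64_C(1) << 32);
}
```

For the reported values the range is 64 MiB long and sits entirely below
4 GiB, so both bounds are non-zero and describe a plausible bounce buffer
region.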