On 7/14/19 10:06 AM, Nikolai Zhubr wrote:
> Hi all,
> 
> After reading some (apparently contradictory) revisions of the DMA API references 
> in Documentation/DMA-*.txt, some (equally contradictory) discussions thereof, and 
> even digging through the in-tree drivers in search of a good enlightening 
> example, I still have to ask for advice.
> 
> I'm crafting a tiny driver (or rather, a kernel-mode helper) for a very 
> special PCIe device. And actually it does work already, but performs 
> differently on different kernels. I'm targeting x86 (i686) only (although 
> preferably the driver should stay platform-neutral) and I need to support 
> kernels 4.9+. Due to how the device is designed and used, very little has to 
> be done in kernel space. The device has large internal memory, which 
> accumulates some measurement data, and it is capable of transferring it to 
> the host using DMA (with at least 32-bit address space available). Arranging 
> memory for DMA is pretty much the only thing that userspace cannot 
> reasonably do, so this needs to be in the driver. So my currently attempted 
> layout is as follows:
> 
> 1. In the (kernel-mode) driver, allocate a large contiguous block of physical 
> memory to do DMA into. It will later be reused several times. This block does 
> not need a kernel-mode virtual address because it will never be accessed from 
> the driver directly. The block size is typically 128M and I use CMA=256M. 
> Currently I use dma_alloc_coherent(), but I'm not convinced it really needs 
> to be strictly coherent memory, for performance reasons; see below. Also, 
> AFAICS on x86 dma_alloc_coherent() always creates a kernel address mapping 
> anyway, so maybe I'd better simply kmalloc() with a subsequent 
> dma_map_single()?
> 
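
For what it's worth, here is a minimal sketch of how step 1 could look with
dma_alloc_coherent(); the function and variable names are only illustrative,
and the 32-bit mask is an assumption based on the "at least 32-bit" remark
above. Note that kmalloc() cannot hand out anything close to 128M of
physically contiguous memory, so a streaming-mapping variant would have to
allocate from CMA/alloc_pages() rather than kmalloc():

#include <linux/dma-mapping.h>
#include <linux/pci.h>

#define MYDEV_BUF_SIZE  (128UL << 20)   /* 128M, as described above */

static void *mydev_cpu;         /* kernel virtual address (may go unused) */
static dma_addr_t mydev_dma;    /* bus address to program into the device */

static int mydev_alloc_buffer(struct pci_dev *pdev)
{
        int ret;

        /* the device can address at least 32 bits (per the mail above) */
        ret = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(32));
        if (ret)
                return ret;

        /* on x86 this does also set up a kernel mapping, wanted or not */
        mydev_cpu = dma_alloc_coherent(&pdev->dev, MYDEV_BUF_SIZE,
                                       &mydev_dma, GFP_KERNEL);
        if (!mydev_cpu)
                return -ENOMEM;

        /* released later with dma_free_coherent(&pdev->dev, ...) */
        return 0;
}
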
> 2. Upon DMA completion (from device to host), some sort of 
> barrier/synchronization might be necessary (to be safe WRT speculative loads, 
> cache, etc), like dma_cache_sync() or dma_sync_single_for_cpu(), however the 
> latter looks like a nop for x86 AFAICS, and the former is apparently 
> flush_write_buffers() which is not very involved either (asm lock; nop) and 
> does not look useful for my case. Currently I do not use any, and it seems 
> to work OK, maybe by pure luck. So, is it really this trivially simple on 
> x86, or am I just missing something horribly big here?
> 
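
If the buffer stays a dma_alloc_coherent() one, no dma_sync_*() calls are
needed at all; those belong to streaming mappings (dma_map_single() and
friends). Just as a sketch of what the usual ownership handoff would look
like if the buffer were switched to a streaming mapping (all names below are
hypothetical):

#include <linux/dma-mapping.h>
#include <linux/interrupt.h>

/* hypothetical per-device context */
struct mydev {
        struct device *dev;
        dma_addr_t buf_dma;
        size_t buf_len;
};

static void mydev_start_transfer(struct mydev *md)
{
        /* hand the buffer to the device before kicking off the DMA engine */
        dma_sync_single_for_device(md->dev, md->buf_dma, md->buf_len,
                                   DMA_FROM_DEVICE);
        /* ... program address/length into device registers, start transfer ... */
}

static irqreturn_t mydev_dma_done_irq(int irq, void *data)
{
        struct mydev *md = data;

        /*
         * Take the buffer back before the CPU (or userspace) reads it; nearly
         * free on x86, but required for swiotlb bouncing and non-coherent
         * architectures.
         */
        dma_sync_single_for_cpu(md->dev, md->buf_dma, md->buf_len,
                                DMA_FROM_DEVICE);
        return IRQ_HANDLED;
}
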
> 3. mmap this buffer for userspace. Reading from it should be as fast as 
> possible, so AFAICS this block should be cacheable (and prefetchable and 
> whatever else helps performance), at least from the userspace context. It is 
> not quite clear whether such properties depend on the block allocation method 
> (step 1 above) or only on the remapping attributes. Currently, for mmap I 
> employ dma_mmap_coherent(), but it also seems possible to use 
> remap_pfn_range(), possibly changing vm_page_prot somewhat. I've already 
> found that e.g. pgprot_noncached hurts performance quite a lot, but without 
> it, is some DMA barrier (step 2 above) still necessary?
> 
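
In case a concrete shape helps, a minimal .mmap handler built around
dma_mmap_coherent() might look roughly like the sketch below; the buffer
variables are hypothetical and assumed to be filled in by the allocation from
step 1. The point is simply not to touch vma->vm_page_prot, so the userspace
mapping stays cacheable:

#include <linux/dma-mapping.h>
#include <linux/fs.h>
#include <linux/mm.h>
#include <linux/module.h>

/* hypothetical: set up by the allocation code from step 1 */
static struct device *mydev_dev;
static void *mydev_cpu;
static dma_addr_t mydev_dma;
static size_t mydev_len;

static int mydev_mmap(struct file *filp, struct vm_area_struct *vma)
{
        size_t want = vma->vm_end - vma->vm_start;

        if (want > mydev_len)
                return -EINVAL;

        /*
         * Leave vma->vm_page_prot alone: the coherent buffer is cacheable on
         * x86, so userspace reads stay fast. Forcing pgprot_noncached() here
         * is exactly what makes them slow.
         */
        return dma_mmap_coherent(mydev_dev, vma, mydev_cpu, mydev_dma, want);
}

static const struct file_operations mydev_fops = {
        .owner = THIS_MODULE,
        .mmap  = mydev_mmap,
};
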
> Any hints greatly appreciated,
> 
> Regards,
> Nikolai

Hi,

I suggest that you try some mailing list(s) besides linux-kernel.
The MAINTAINERS file has these possibilities:

dmaeng...@vger.kernel.org
io...@lists.linux-foundation.org

or just try linux...@vger.kernel.org

-- 
~Randy
