Hi all, Here's to free software! (and good karma?)
A M Lokesh mailed me directly with a follow-up question about this old thread; I thought it would be interesting to post my reply to the list.

On 2013-01-22, at 10:23, Lokesh M wrote:
>
> After reading through your below thread, I was wondering if you could please
> give me some feedback
>
> https://lkml.org/lkml/2011/11/18/462
>
> I am in a similar situation, where I need to write a driver to pass the Data
> from PCIe(FPGA) to my Linux machine (4MB would be enough - streaming). I
> havn't checked if my Server supports Vt-d, But was interested in the way
> your implementation was(UIO mapping).

I've more or less abandoned the VT-d way of doing it because I also need to support an Atom architecture. I was a bit glad to drop it, since the IOMMU translation tables and so on aren't free, and the code to set up and maintain the mappings was rather hard to follow. Giving it up also means losing protection against stray FPGA memory accesses, but it's not like I had the choice (Atom).

> I was looking to know more about the 2 buffer mapping you have for streaming
> data and how it is achieved, We have mapping for BAR0 for register access and
> I would like to implement similar buffer for Data as well. So Please let me
> know any details and point me to few documentation to implement the same.
> We have a bounce buffer mechanism (device ->Kernel ->User) but the speed is
> around 100MB which I need to improve.

On Nov 18, 2011, at 17:08, Greg KH wrote:
> On Fri, Nov 18, 2011 at 04:16:23PM -0500, Jean-Francois Dagenais wrote:
>>
>> I had thought about cutting out a chunk of ram from the kernel's boot
>> args, but had always feared cache/snooping errors. Not to mention I
>> had no idea how to "claim" or setup this memory once in my driver's probe
>> function. Maybe I would still be lucky and it would just work? mmmh...
>
> Yeah, don't do that, it might not work out well.
>
> greg k-h

Turns out, for me, this works very well! So here's the gist of what I do. Remember, I only need to support pure Core2 + Intel CPU/chipset architectures on very specific COM modules. This means the architecture takes care of invalidating the CPU cache lines when the PCIe device (an FPGA) bus-masters reads and writes to RAM (bus snooping).

The area I describe here is 128M (on the other system I used 256M successfully) and is strictly used for FPGA write / CPU read. As a note, the other area I use (only 1M), for CPU write / FPGA read, is still allocated with pci_alloc_consistent. The DMA address is collected through the 3rd argument of pci_alloc_consistent and handed to UIO as a UIO_MEM_PHYS type of memory map. FYI, I had previously succeeded in allocating 4M with pci_alloc_consistent, but only if done quite soon after boot; that was on a Core2 Duo arch.

I hook into the "memmap" kernel boot parameter to reserve a chunk of contiguous memory which I know falls inside a range the BIOS declares (through E820) as usable. This makes the kernel's memory management ignore this area.
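For illustration, the command-line entry looks something like this (the address here is made up; it has to fall inside a range your BIOS reports as usable RAM):

        memmap=128M$0x30000000

i.e. "reserve 128M starting at physical address 0x30000000". One gotcha: depending on the bootloader, the '$' may need escaping (GRUB 2, for example, treats it as the start of a variable in its config files), so it's worth checking what actually ends up in /proc/cmdline.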
I compile in a kernel module which looks like this:

#include <linux/init.h>
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/string.h>

void *uio_hud_memblock_addr;
EXPORT_SYMBOL(uio_hud_memblock_addr);
unsigned long long uio_hud_memblock_size;
EXPORT_SYMBOL(uio_hud_memblock_size);

/* taken from parse_memmap_opt in e820.c and modified: only the
 * "memmap=nn$ss" (reserve) form is recognized here */
static int __init uio_hud_memblock_setup(char *str)
{
        char *cur_p = str;
        u64 mem_size;

        if (!str)
                return -EINVAL;

        mem_size = memparse(str, &cur_p);
        if (cur_p == str)
                return -EINVAL;

        if (*cur_p == '$') {
                uio_hud_memblock_addr = (void *)(ulong)memparse(cur_p + 1, &cur_p);
                uio_hud_memblock_size = mem_size;
        } else {
                return -EINVAL;
        }

        return *cur_p == '\0' ? 0 : -EINVAL;
}
__setup("memmap=", uio_hud_memblock_setup);

static int __init uio_hud_memblock_init(void)
{
        /* PDEBUG is a local pr_debug-style macro */
        if (uio_hud_memblock_addr)
                PDEBUG("ram memblock at %p (size:%llu)\n",
                       uio_hud_memblock_addr, uio_hud_memblock_size);
        else
                PDEBUG("no memmap=nn$ss kernel parameter found\n");
        return 0;
}
early_initcall(uio_hud_memblock_init);

MODULE_AUTHOR("Jean-Francois Dagenais");
MODULE_DESCRIPTION("Built-in module to parse the memmap memblock reservation");
MODULE_LICENSE("GPL");

The parsed address and size (uio_hud_memblock_addr/size) are exported for my other, non-built-in module to discover. That module is the real PCI "driver": it simply takes this address and size and hands them to UIO as a memory map of type UIO_MEM_PHYS. That's pretty much it for the kernel side (aside from the trivial interrupt handling).

In userspace, I also have a UIO map for the FPGA's BAR0 registers, through which I tell the device where the two physical memory ranges are (begin and end addresses: one range for its read ops (1M), one for its write ops (128M), so 4 physical addresses). The device autonomously updates where it is going to write next (its "data write addr" register), wraps around when it reaches the end, and sends me an interrupt for each "data unit" it finishes. The interrupt is forwarded to userspace as described in the UIO docs, thanks to a small ISR in my kernel driver.

Userspace instructs the device through a "software read addr" register, which tells the FPGA the lowest address the software still needs (hasn't consumed yet), so the autonomous FPGA doesn't overwrite busy memory. As soon as I advance the soft read addr, the FPGA can fill that spot again. This way you squeeze about as much as you can out of the architecture, since the CPU is only burdened with consuming the data and updating a pointer.

Cheers!
/jfd
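P.S. Since Lokesh asked for something concrete, here is roughly the shape of the PCI side. This is an untested, from-memory sketch, not my actual driver: the uio_hud names, the vendor/device IDs and the map layout are made up, and interrupt handling and error unwinding are left out. (On recent kernels you'd use dma_alloc_coherent instead of pci_alloc_consistent.)

#include <linux/module.h>
#include <linux/pci.h>
#include <linux/uio_driver.h>

extern void *uio_hud_memblock_addr;             /* from the built-in parser above */
extern unsigned long long uio_hud_memblock_size;

#define HUD_CPU2FPGA_SIZE (1 * 1024 * 1024)     /* the small CPU-write/FPGA-read area */

static struct uio_info hud_uio = {
        .name = "uio_hud",
        .version = "0.1",
};

static int hud_probe(struct pci_dev *pdev, const struct pci_device_id *id)
{
        dma_addr_t cpu2fpga_dma;
        void *cpu2fpga;
        int ret;

        ret = pci_enable_device(pdev);
        if (ret)
                return ret;
        pci_set_master(pdev);

        /* map 0: BAR0, the FPGA's register window */
        hud_uio.mem[0].addr = pci_resource_start(pdev, 0);
        hud_uio.mem[0].size = pci_resource_len(pdev, 0);
        hud_uio.mem[0].memtype = UIO_MEM_PHYS;

        /* map 1: the big FPGA-write/CPU-read area reserved with memmap= */
        hud_uio.mem[1].addr = (phys_addr_t)(unsigned long)uio_hud_memblock_addr;
        hud_uio.mem[1].size = uio_hud_memblock_size;
        hud_uio.mem[1].memtype = UIO_MEM_PHYS;

        /* map 2: the 1M coherent area, DMA address from the 3rd argument */
        cpu2fpga = pci_alloc_consistent(pdev, HUD_CPU2FPGA_SIZE, &cpu2fpga_dma);
        if (!cpu2fpga)
                return -ENOMEM;
        hud_uio.mem[2].addr = cpu2fpga_dma;
        hud_uio.mem[2].size = HUD_CPU2FPGA_SIZE;
        hud_uio.mem[2].memtype = UIO_MEM_PHYS;

        /* the small ISR that feeds UIO (hud_uio.irq/.handler) is omitted here */
        return uio_register_device(&pdev->dev, &hud_uio);
}

static const struct pci_device_id hud_ids[] = {
        { PCI_DEVICE(0x1234, 0x5678) },         /* made-up vendor/device ID */
        { }
};

static struct pci_driver hud_driver = {
        .name     = "uio_hud",
        .id_table = hud_ids,
        .probe    = hud_probe,
};
module_pci_driver(hud_driver);
MODULE_LICENSE("GPL");

And the userspace consume loop is basically the following (again just a sketch; the register offsets are placeholders for whatever your FPGA exposes in BAR0, and error checking is omitted):

#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

#define DATA_AREA_SIZE   (128UL * 1024 * 1024)  /* size of the FPGA-write area */
#define DATA_WRITE_ADDR  (0x10 / 4)             /* placeholder register indices */
#define SOFT_READ_ADDR   (0x14 / 4)

int main(void)
{
        int fd = open("/dev/uio0", O_RDWR);
        long pg = sysconf(_SC_PAGESIZE);

        /* UIO convention: map N is selected by an mmap offset of N * page size;
         * this assumes BAR0 spans at least one page */
        volatile uint32_t *regs = mmap(NULL, pg, PROT_READ | PROT_WRITE,
                                       MAP_SHARED, fd, 0 * pg);   /* BAR0 */
        const uint8_t *data = mmap(NULL, DATA_AREA_SIZE, PROT_READ,
                                   MAP_SHARED, fd, 1 * pg);       /* FPGA-write area */
        uint32_t icount;

        for (;;) {
                /* blocks until the ISR signals a finished "data unit" */
                read(fd, &icount, sizeof(icount));

                /* where the FPGA will write next (whether this register holds a
                 * bus address or an offset into the area is up to your design) */
                uint32_t wr = regs[DATA_WRITE_ADDR];

                /* ... consume everything between our read pointer and wr,
                 * wrapping at DATA_AREA_SIZE, using 'data' ... */
                (void)data;

                /* tell the FPGA that memory below this point may be reused */
                regs[SOFT_READ_ADDR] = wr;
        }
}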