Re: RFC: prepare for struct scatterlist entries without page backing
Hi Christoph,

On Fri, Aug 14, 2015 at 12:35 AM, Christoph Hellwig wrote:
> On Thu, Aug 13, 2015 at 09:37:37AM +1000, Julian Calaby wrote:
>> I.e. ~90% of this patch set seems to be just mechanically dropping
>> BUG_ON()s and converting open coded stuff to use accessor functions
>> (which should be macros or get inlined, right?) - and the remaining
>> bit is not flushing if we don't have a physical page somewhere.
>
> Which is was 90%. By lines changed most actually is the diffs for
> the cache flushing.

I was talking in terms of changes made, not lines changed: by my
recollection, about a third of the patches didn't touch flush calls
and most of the lines changed looked like refactoring so that making
the flush call conditional would be easier.

I guess it smelled like you were doing lots of distinct changes in a
single patch and I got my numbers wrong.

>> Would it make sense to split this patch set into a few bits: one to
>> drop all the useless BUG_ON()s, one to convert all the open coded
>> stuff to accessor functions, then another to do the actual page-less
>> sg stuff?
>
> Without the ifs the BUG_ON() actually are useful to assert we
> never feed the sort of physical addresses we can't otherwise support,
> so I don't think that part is doable.

My point is that there's a couple of patches that only remove
BUG_ON()s, which implies that for that particular driver it doesn't
matter if there's a physical page or not, so therefore that code is
purely "documentation".

Thanks,

--
Julian Calaby
Email: julian.cal...@gmail.com
Profile: http://www.google.com/profiles/julian.calaby/
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: RFC: prepare for struct scatterlist entries without page backing
On 08/13/2015 05:40 PM, Christoph Hellwig wrote:
> On Wed, Aug 12, 2015 at 03:42:47PM +0300, Boaz Harrosh wrote:
>> The support I have suggested and submitted for zone-less sections.
>> (In my add_persistent_memory() patchset)
>>
>> Would work perfectly well and transparent for all such multimedia cases.
>> (All hacks removed). In fact I have loaded pmem (with-pages) on a VRAM
>> a few times and it is great easy fun. (I wanted to experiment with cached
>> memory over a pcie)
>
> And everyone agree that it was both buggy and incomplete.
>

What? No one ever said anything about bugs. This is the first I have
ever heard of it. I was always under the impression that no one had
even tried it out.

I've been smoking these page-full NVDIMMs for more than a year, with
RDMA to peers and swap-out to disks. So it is not that bad, I would say.

> Dan has done a respin of the page backed nvdimm work with most of
> these comments addressed.
>

I would love some comments. All I got so far is silence. (And I do not
like Dan's patches; comments will come next week.)

> I have to say I hate both pfn-based I/O [1] and page backed nvdimms with
> passion, so we're looking into the lesser evil with an open mind.
>
> [1] not the SGL part posted here, which I think is quite sane. The bio
> side is much worse, though.
>

What can I say. I like the page-backed NVDIMMs. And the long-term goal
for me is 2M pages.

I hope we can sit down one day soon and you can explain to me what's
evil about it. I would really, really like to understand.

Thanks though
Boaz
Re: RFC: prepare for struct scatterlist entries without page backing
On Wed, Aug 12, 2015 at 03:42:47PM +0300, Boaz Harrosh wrote:
> The support I have suggested and submitted for zone-less sections.
> (In my add_persistent_memory() patchset)
>
> Would work perfectly well and transparent for all such multimedia cases.
> (All hacks removed). In fact I have loaded pmem (with-pages) on a VRAM
> a few times and it is great easy fun. (I wanted to experiment with cached
> memory over a pcie)

And everyone agreed that it was both buggy and incomplete.

Dan has done a respin of the page backed nvdimm work with most of
these comments addressed.

I have to say I hate both pfn-based I/O [1] and page backed nvdimms with
passion, so we're looking into the lesser evil with an open mind.

[1] not the SGL part posted here, which I think is quite sane. The bio
side is much worse, though.
Re: RFC: prepare for struct scatterlist entries without page backing
On Thu, Aug 13, 2015 at 09:37:37AM +1000, Julian Calaby wrote:
> I.e. ~90% of this patch set seems to be just mechanically dropping
> BUG_ON()s and converting open coded stuff to use accessor functions
> (which should be macros or get inlined, right?) - and the remaining
> bit is not flushing if we don't have a physical page somewhere.

Which is was 90%. By lines changed most actually is the diffs for
the cache flushing.

> Would it make sense to split this patch set into a few bits: one to
> drop all the useless BUG_ON()s, one to convert all the open coded
> stuff to accessor functions, then another to do the actual page-less
> sg stuff?

Without the ifs the BUG_ON() actually are useful to assert we
never feed the sort of physical addresses we can't otherwise support,
so I don't think that part is doable.

A simple series to make more use of sg_phys and add sg_pfn might
still be useful, though.
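The accessor idea in the message above (use sg_phys consistently and add an sg_pfn) can be sketched roughly as follows. This is a user-space model under stated assumptions, not the kernel's code: `struct model_sg`, `model_sg_pfn()`, `model_sg_phys()` and the 4 KiB page size are hypothetical names and values chosen for illustration, and the entry is assumed to carry a page frame number directly instead of a struct page pointer.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* Hypothetical, simplified stand-in for a scatterlist entry that
 * stores a page frame number rather than a struct page pointer. */
struct model_sg {
    uint64_t pfn;    /* page frame number of the backing memory */
    uint32_t offset; /* byte offset into that page */
    uint32_t length; /* length of the segment in bytes */
};

/* Return the page frame number of the entry. */
static inline uint64_t model_sg_pfn(const struct model_sg *sg)
{
    return sg->pfn;
}

/* Return the physical address of the first byte of the entry. */
static inline uint64_t model_sg_phys(const struct model_sg *sg)
{
    assert(sg->offset < (1u << MODEL_PAGE_SHIFT));
    return (sg->pfn << MODEL_PAGE_SHIFT) + sg->offset;
}
```

With accessors like these, a DMA mapping routine in this model never needs to look at a page structure, which is presumably why converting open-coded address arithmetic to accessors is a prerequisite for page-less entries.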
Re: RFC: prepare for struct scatterlist entries without page backing
Hi,

On Wed, Aug 12, 2015 at 10:42 PM, Boaz Harrosh wrote:
> On 08/12/2015 10:05 AM, Christoph Hellwig wrote:
>> It turns out most DMA mapping implementation can handle SGLs without
>> page structures with some fairly simple mechanical work. Most of it
>> is just about consistently using sg_phys. For implementations that
>> need to flush caches we need a new helper that skips these cache
>> flushes if a entry doesn't have a kernel virtual address.
>>
>> However the ccio (parisc) and sba_iommu (parisc & ia64) IOMMUs seem
>> to be operate mostly on virtual addresses. It's a fairly odd concept
>> that I don't fully grasp, so I'll need some help with those if we want
>> to bring this forward.
>>
>> Additional this series skips ARM entirely for now. The reason is
>> that most arm implementations of the .map_sg operation just iterate
>> over all entries and call ->map_page for it, which means we'd need
>> to convert those to a ->map_pfn similar to Dan's previous approach.
>>
> [snip]
>
> It is a bit of work but is worth while, and accelerating tremendously
> lots of workloads. Not like this abomination which only branches
> things more and more, and making things fatter and slower.

As a random guy reading a big bunch of patches on code I know almost
nothing about, parts of this comment really resonated with me:
overall, we seem to be adding a lot of if statements to code that
appears to be in a hot path.

I.e. ~90% of this patch set seems to be just mechanically dropping
BUG_ON()s and converting open coded stuff to use accessor functions
(which should be macros or get inlined, right?) - and the remaining
bit is not flushing if we don't have a physical page somewhere.

Would it make sense to split this patch set into a few bits: one to
drop all the useless BUG_ON()s, one to convert all the open coded
stuff to accessor functions, then another to do the actual page-less
sg stuff?

Thanks,

--
Julian Calaby
Email: julian.cal...@gmail.com
Profile: http://www.google.com/profiles/julian.calaby/
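The "helper that skips cache flushes if an entry doesn't have a kernel virtual address", quoted above, might look something like the following. It is only a user-space sketch: `struct model_sg_entry`, `model_flush_range()` and `model_sg_flush()` are hypothetical names, a NULL `kvaddr` is assumed to mark an entry without page backing, and the "flush" merely counts calls instead of touching any cache.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scatterlist entry: kvaddr is NULL when the entry has
 * no kernel virtual address (no page backing). */
struct model_sg_entry {
    void *kvaddr;
    size_t length;
};

/* Counts how often the stand-in flush primitive was invoked. */
static unsigned int model_flush_calls;

/* Stand-in for an architecture cache flush; it only counts calls. */
static void model_flush_range(void *addr, size_t len)
{
    (void)addr;
    (void)len;
    model_flush_calls++;
}

/* Flush every entry that has a kernel virtual address and skip the
 * rest. Returns the number of entries that were flushed. */
static unsigned int model_sg_flush(const struct model_sg_entry *sgl,
                                   size_t nents)
{
    unsigned int flushed = 0;

    assert(sgl != NULL || nents == 0);
    for (size_t i = 0; i < nents; i++) {
        if (!sgl[i].kvaddr)
            continue; /* page-less entry: nothing to flush */
        model_flush_range(sgl[i].kvaddr, sgl[i].length);
        flushed++;
    }
    return flushed;
}
```

The single `if` inside the loop is the kind of extra branch the discussion is about: it costs one test per entry on the mapping path, in exchange for letting page-less entries share the same code.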
Re: RFC: prepare for struct scatterlist entries without page backing
On Wed, Aug 12, 2015 at 10:00 AM, James Bottomley wrote:
> On Wed, 2015-08-12 at 09:05 +0200, Christoph Hellwig wrote:
...
>> However the ccio (parisc) and sba_iommu (parisc & ia64) IOMMUs seem
>> to be operate mostly on virtual addresses. It's a fairly odd concept
>> that I don't fully grasp, so I'll need some help with those if we want
>> to bring this forward.

James explained the primary function of IOMMUs on parisc (DMA-cache
coherency) much better than I ever could.

Three more observations:

1) The IOMMU can be bypassed by 64-bit DMA devices on IA64.

2) The IOMMU enables 32-bit DMA devices to reach > 32-bit physical
   memory and thus avoid bounce buffers. parisc and older IA-64 systems
   have some 32-bit PCI devices, e.g. the IDE boot HDD.

3) The IOMMU acts as a proxy for I/O devices by fetching cachelines of
   data on PA-RISC systems whose memory controllers ONLY serve
   cacheline-sized transactions, i.e. a 32-bit DMA results in the IOMMU
   fetching the cacheline and updating just the 32 bits in a DMA
   cache-coherent fashion.

Bonus thought:

4) The IOMMU can improve DMA performance in some cases using "hints"
   provided by the OS (e.g. prefetching DMA data or using READ_CURRENT
   bus transactions instead of normal memory fetches).

cheers,
grant
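Observation 2 above can be made concrete with a small reachability check. This is a hedged sketch, not kernel code: `MODEL_DMA_MASK()` and `model_dma_reachable()` are hypothetical names, and the model only asks whether a buffer's physical range fits under a device's addressing limit. When it does not, and no IOMMU remaps the buffer to a lower bus address, a bounce buffer would be needed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: mask with the low n address bits set. */
#define MODEL_DMA_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Return true if the whole range [phys, phys + len) can be addressed
 * directly by a device limited to dma_mask. */
static bool model_dma_reachable(uint64_t phys, uint64_t len,
                                uint64_t dma_mask)
{
    uint64_t last;

    assert(len > 0);
    last = phys + (len - 1);
    if (last < phys)
        return false; /* range wrapped around the address space */
    return last <= dma_mask;
}
```

For example, a page ending exactly at the 4 GiB boundary is reachable by a 32-bit device, while anything at or above 4 GiB is not unless it is remapped.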
Re: RFC: prepare for struct scatterlist entries without page backing
On Wed, 2015-08-12 at 09:05 +0200, Christoph Hellwig wrote:
> Dan Williams started to look into addressing I/O to and from
> Persistent Memory in his series from June:
>
> http://thread.gmane.org/gmane.linux.kernel.cross-arch/27944
>
> I've started looking into DMA mapping of these SGLs specifically instead
> of the map_pfn method in there. In addition to supporting NVDIMM backed
> I/O I also suspect this would be highly useful for media drivers that
> go through nasty hoops to be able to DMA from/to their ioremapped regions,
> with vb2_dc_get_userptr in drivers/media/v4l2-core/videobuf2-dma-contig.c
> being a prime example for the unsafe hacks currently used.
>
> It turns out most DMA mapping implementation can handle SGLs without
> page structures with some fairly simple mechanical work. Most of it
> is just about consistently using sg_phys. For implementations that
> need to flush caches we need a new helper that skips these cache
> flushes if a entry doesn't have a kernel virtual address.
>
> However the ccio (parisc) and sba_iommu (parisc & ia64) IOMMUs seem
> to be operate mostly on virtual addresses. It's a fairly odd concept
> that I don't fully grasp, so I'll need some help with those if we want
> to bring this forward.

I can explain that. I think this doesn't apply to ia64 because its
cache is PIPT, but on parisc, we have a VIPT cache.

On normal physically indexed architectures, when the iommu sees a DMA
transfer to/from physical memory, it also notifies the CPU to flush the
internal CPU caches of those lines. This is usually an interlocking
step of the transfer to make sure the page is coherent before transfer
to/from the device (it's why the ia32 for instance is a coherent
architecture). Because the system is physically indexed, there's no
need to worry about aliases.

On Virtually Indexed systems, like parisc, there is an aliasing problem.
The CCIO iommu unit (and all other iommu systems on parisc) has what's
called a local coherence index (LCI). You program it as part of the
IOMMU page table and it tells the system which Virtual line in the cache
to flush as part of the IO transaction, thus still ensuring cache
coherence. That's why we have to know the virtual as well as physical
addresses for the page. The problem we have in Linux is that we have
two virtual addresses, which are often incoherent aliases: the user
virtual address and a kernel virtual address, but we can only make the
page coherent with a single alias (only one LCI). The way I/O on Linux
currently works is that get_user_pages actually flushes the user virtual
address, so that's expected to be coherent, so the address we program
into the LCI is the kernel virtual address. Usually nothing in the
kernel has ever touched the page, so there's nothing to flush, but we do
it just in case.

In theory, for these non kernel page backed SG entries, we can make the
process more efficient by not flushing in gup and instead programming
the user virtual address into the local coherence index. However,
simply zeroing the LCI will also work (except that poor VI zero line
will get flushed repeatedly, so it's probably best to pick a known
untouched line in the kernel).

James
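The aliasing problem described above can be illustrated with a toy model of a virtually indexed cache. Everything here is an assumption made for illustration: `model_cache_colour()` and `model_aliases_coherent()` are hypothetical names, the 4 KiB page size and 64 KiB per-way index span are invented parameters, and real parisc hardware derives its coherence index itself rather than with arithmetic like this. The sketch only shows why two virtual addresses for the same physical page can select different cache lines.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12 /* assumption: 4 KiB pages */
#define MODEL_WAY_SHIFT  16 /* assumption: 64 KiB indexed per cache way */

/* Return the virtual index bits above the page offset (the "colour").
 * In a virtually indexed cache these bits help choose the line. */
static uint64_t model_cache_colour(uint64_t vaddr)
{
    uint64_t way_mask = (1ULL << MODEL_WAY_SHIFT) - 1;

    assert(MODEL_WAY_SHIFT >= MODEL_PAGE_SHIFT);
    return (vaddr & way_mask) >> MODEL_PAGE_SHIFT;
}

/* Two mappings of one physical page land on the same cache lines
 * only if their colours match; otherwise they are incoherent aliases. */
static bool model_aliases_coherent(uint64_t va1, uint64_t va2)
{
    return model_cache_colour(va1) == model_cache_colour(va2);
}
```

In this model a user mapping at 0x40003000 and a kernel mapping at 0x10007000 have colours 3 and 7, so a flush programmed for one alias says nothing about the other, which matches the point that a single coherence index can cover only one of the two.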
Re: RFC: prepare for struct scatterlist entries without page backing
On 08/12/2015 10:05 AM, Christoph Hellwig wrote:
> Dan Williams started to look into addressing I/O to and from
> Persistent Memory in his series from June:
>
> http://thread.gmane.org/gmane.linux.kernel.cross-arch/27944
>
> I've started looking into DMA mapping of these SGLs specifically instead
> of the map_pfn method in there. In addition to supporting NVDIMM backed
> I/O I also suspect this would be highly useful for media drivers that
> go through nasty hoops to be able to DMA from/to their ioremapped regions,
> with vb2_dc_get_userptr in drivers/media/v4l2-core/videobuf2-dma-contig.c
> being a prime example for the unsafe hacks currently used.
>

The support I have suggested and submitted for zone-less sections
(in my add_persistent_memory() patchset) would work perfectly well and
transparently for all such multimedia cases (all hacks removed). In fact
I have loaded pmem (with pages) on VRAM a few times and it is great,
easy fun. (I wanted to experiment with cached memory over PCIe.)

> It turns out most DMA mapping implementation can handle SGLs without
> page structures with some fairly simple mechanical work. Most of it
> is just about consistently using sg_phys. For implementations that
> need to flush caches we need a new helper that skips these cache
> flushes if a entry doesn't have a kernel virtual address.
>
> However the ccio (parisc) and sba_iommu (parisc & ia64) IOMMUs seem
> to be operate mostly on virtual addresses. It's a fairly odd concept
> that I don't fully grasp, so I'll need some help with those if we want
> to bring this forward.
>
> Additional this series skips ARM entirely for now. The reason is
> that most arm implementations of the .map_sg operation just iterate
> over all entries and call ->map_page for it, which means we'd need
> to convert those to a ->map_pfn similar to Dan's previous approach.
>

All this endless work for nothing more than uglifying the kernel, and
it will never end, when a real and fully working solution has been
right here for more than a year.

If you are really up for a deep audit and a mammoth testing effort, why
not do a more worthy, and an order of magnitude smaller, piece of work
and support 2M and 1G variable-sized "pages"?

All the virtual-vs-physical-vs-caching just works. Most of the core work
is there. The block layer and lots of other subsystems already support
sending a single page pointer with bvec_offset and bvec_len bigger than
4K. Other subsystems will need small fixes sprinkled around, but not at
all this endless stream of patches to one subsystem after another. And
for what?

The novelty of pages is the section object: the section is reached from
page*, from the virtual as well as the physical planes, and is a center
that translates from all planes to all planes. You keep this concept,
only you make 2M-page sections and 1G-page sections.

It is a bit of work but it is worthwhile, and it tremendously
accelerates lots of workloads. Not like this abomination, which only
branches things more and more, making things fatter and slower.

It all feels like a typhoon, the inertia of tons and tons of man-hours
of work, in a huge wave. How will you ever stop such a rushing mass?
I'm trying to duck under, but surely it makes me sad.

Thanks
Boaz