Re: [PATCH] dma-direct: Don't over-decrypt memory
On Fri, 20 May 2022, Robin Murphy wrote: > The original x86 sev_alloc() only called set_memory_decrypted() on > memory returned by alloc_pages_node(), so the page order calculation > fell out of that logic. However, the common dma-direct code has several > potential allocators, not all of which are guaranteed to round up the > underlying allocation to a power-of-two size, so carrying over that > calculation for the encryption/decryption size was a mistake. Fix it by > rounding to a *number* of pages, rather than an order. > > Until recently there was an even worse interaction with DMA_DIRECT_REMAP > where we could have ended up decrypting part of the next adjacent > vmalloc area, only averted by no architecture actually supporting both > configs at once. Don't ask how I found that one out... > > CC: sta...@vger.kernel.org > Fixes: c10f07aa27da ("dma/direct: Handle force decryption for DMA coherent > buffers in common code") > Signed-off-by: Robin Murphy Acked-by: David Rientjes ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH 4/4] dma-direct: factor out a helper for DMA_ATTR_NO_KERNEL_MAPPING allocations
On Tue, 19 Oct 2021, Christoph Hellwig wrote: > Split the code for DMA_ATTR_NO_KERNEL_MAPPING allocations into a separate > helper to make dma_direct_alloc a little more readable. > > Signed-off-by: Christoph Hellwig Acked-by: David Rientjes (I think my name got mangled in your To: field on the original emails :)
Re: [PATCH 2/4] dma-direct: leak memory that can't be re-encrypted
On Tue, 19 Oct 2021, Christoph Hellwig wrote: > We must never let unencrypted memory go back into the general page pool. > So if we fail to set it back to encrypted when freeing DMA memory, leak > the memory instead and warn the user. > > Signed-off-by: Christoph Hellwig > --- > kernel/dma/direct.c | 17 + > 1 file changed, 13 insertions(+), 4 deletions(-) > > diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c > index 96f02248e7fa2..6673f7aebf787 100644 > --- a/kernel/dma/direct.c > +++ b/kernel/dma/direct.c > @@ -264,9 +264,11 @@ void *dma_direct_alloc(struct device *dev, size_t size, > return ret; > > out_encrypt_pages: > - /* If memory cannot be re-encrypted, it must be leaked */ > - if (dma_set_encrypted(dev, page_address(page), size)) > + if (dma_set_encrypted(dev, page_address(page), size)) { > + pr_warn_ratelimited( > + "leaking DMA memory that can't be re-encrypted\n"); > return NULL; > + } > out_free_pages: > __dma_direct_free_pages(dev, page, size); > return NULL; > @@ -305,7 +307,11 @@ void dma_direct_free(struct device *dev, size_t size, > dma_free_from_pool(dev, cpu_addr, PAGE_ALIGN(size))) > return; > > - dma_set_encrypted(dev, cpu_addr, 1 << page_order); > + if (dma_set_encrypted(dev, cpu_addr, 1 << page_order)) { > + pr_warn_ratelimited( > + "leaking DMA memory that can't be re-encrypted\n"); > + return; > + } > > if (IS_ENABLED(CONFIG_DMA_REMAP) && is_vmalloc_addr(cpu_addr)) > vunmap(cpu_addr); > @@ -363,7 +369,10 @@ void dma_direct_free_pages(struct device *dev, size_t > size, > dma_free_from_pool(dev, vaddr, size)) > return; > > - dma_set_encrypted(dev, vaddr, 1 << page_order); > + if (dma_set_encrypted(dev, vaddr, 1 << page_order)) { > + pr_warn_ratelimited( > + "leaking DMA memory that can't be re-encrypted\n"); > + } Is this actually leaking the memory? > __dma_direct_free_pages(dev, page, size); > } > > -- > 2.30.2 > >
Re: [PATCH 1/4] dma-direct: factor out dma_set_{de,en}crypted helpers
On Tue, 19 Oct 2021, Christoph Hellwig wrote: > Factor out helpers that make dealing with memory encryption a little less > cumbersome. > > Signed-off-by: Christoph Hellwig > --- > kernel/dma/direct.c | 55 + > 1 file changed, 25 insertions(+), 30 deletions(-) > > diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c > index 4c6c5e0635e34..96f02248e7fa2 100644 > --- a/kernel/dma/direct.c > +++ b/kernel/dma/direct.c > @@ -75,6 +75,20 @@ static bool dma_coherent_ok(struct device *dev, > phys_addr_t phys, size_t size) > min_not_zero(dev->coherent_dma_mask, dev->bus_dma_limit); > } > > +static int dma_set_decrypted(struct device *dev, void *vaddr, size_t size) > +{ > + if (!force_dma_unencrypted(dev)) > + return 0; > + return set_memory_decrypted((unsigned long)vaddr, 1 << get_order(size)); > +} > + > +static int dma_set_encrypted(struct device *dev, void *vaddr, size_t size) > +{ > + if (!force_dma_unencrypted(dev)) > + return 0; > + return set_memory_encrypted((unsigned long)vaddr, 1 << get_order(size)); > +} > + > static void __dma_direct_free_pages(struct device *dev, struct page *page, > size_t size) > { > @@ -216,12 +230,7 @@ void *dma_direct_alloc(struct device *dev, size_t size, > __builtin_return_address(0)); > if (!ret) > goto out_free_pages; > - if (force_dma_unencrypted(dev)) { > - err = set_memory_decrypted((unsigned long)ret, > -1 << get_order(size)); > - if (err) > - goto out_free_pages; > - } > + err = dma_set_decrypted(dev, ret, size); Should be if (dma_set_decrypted(dev, ret, size)) goto out_free_pages; ? Otherwise we ignore the value?
> memset(ret, 0, size); > goto done; > } > @@ -238,13 +247,9 @@ void *dma_direct_alloc(struct device *dev, size_t size, > } > > ret = page_address(page); > - if (force_dma_unencrypted(dev)) { > - err = set_memory_decrypted((unsigned long)ret, > -1 << get_order(size)); > - if (err) > - goto out_free_pages; > - } > - > + err = dma_set_decrypted(dev, ret, size); > + if (err) > + goto out_free_pages; > memset(ret, 0, size); > > if (IS_ENABLED(CONFIG_ARCH_HAS_DMA_SET_UNCACHED) && > @@ -259,13 +264,9 @@ void *dma_direct_alloc(struct device *dev, size_t size, > return ret; > > out_encrypt_pages: > - if (force_dma_unencrypted(dev)) { > - err = set_memory_encrypted((unsigned long)page_address(page), > -1 << get_order(size)); > - /* If memory cannot be re-encrypted, it must be leaked */ > - if (err) > - return NULL; > - } > + /* If memory cannot be re-encrypted, it must be leaked */ > + if (dma_set_encrypted(dev, page_address(page), size)) > + return NULL; > out_free_pages: > __dma_direct_free_pages(dev, page, size); > return NULL; > @@ -304,8 +305,7 @@ void dma_direct_free(struct device *dev, size_t size, > dma_free_from_pool(dev, cpu_addr, PAGE_ALIGN(size))) > return; > > - if (force_dma_unencrypted(dev)) > - set_memory_encrypted((unsigned long)cpu_addr, 1 << page_order); > + dma_set_encrypted(dev, cpu_addr, 1 << page_order); > > if (IS_ENABLED(CONFIG_DMA_REMAP) && is_vmalloc_addr(cpu_addr)) > vunmap(cpu_addr); > @@ -341,11 +341,8 @@ struct page *dma_direct_alloc_pages(struct device *dev, > size_t size, > } > > ret = page_address(page); > - if (force_dma_unencrypted(dev)) { > - if (set_memory_decrypted((unsigned long)ret, > - 1 << get_order(size))) > - goto out_free_pages; > - } > + if (dma_set_decrypted(dev, ret, size)) > + goto out_free_pages; > memset(ret, 0, size); > *dma_handle = phys_to_dma_direct(dev, page_to_phys(page)); > return page; > @@ -366,9 +363,7 @@ void dma_direct_free_pages(struct device *dev, size_t > size, > dma_free_from_pool(dev, vaddr, size)) > 
return; > > - if (force_dma_unencrypted(dev)) > - set_memory_encrypted((unsigned long)vaddr, 1 << page_order); > - > + dma_set_encrypted(dev, vaddr, 1 << page_order); > __dma_direct_free_pages(dev, page, size); > } > > -- > 2.30.2 > > ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [RFC PATCH v3 1/2] mempinfd: Add new syscall to provide memory pin
On Sun, 7 Feb 2021, Song Bao Hua (Barry Song) wrote: > NUMA balancer is just one of many reasons for page migration. Even one > simple alloc_pages() can cause memory migration in just single NUMA > node or UMA system. > > The other reasons for page migration include but are not limited to: > * memory move due to CMA > * memory move due to huge pages creation > > We can hardly ask users to disable compaction, CMA and huge pages > in the whole system. > What about only for mlocked memory, i.e. disable vm.compact_unevictable_allowed? Adding syscalls is a big deal; we can make a reasonable inference that we'll have to support this forever if it's merged. I haven't seen mention of what other unevictable memory *should* be migratable that would be adversely affected if we disable that sysctl. Maybe that gets you part of the way there and there are some other deficiencies, but it seems like a good start would be to describe how CONFIG_NUMA_BALANCING=n + vm.compact_unevictable_allowed + mlock() doesn't get you mostly there, and then look into what's missing. If it's a very compelling case where there simply are no alternatives, it would make sense. The alternative is to find a more generic way, perhaps in combination with vm.compact_unevictable_allowed, to achieve what you're looking to do that can be useful even beyond your originally intended use case.
Re: revert scope for 5.8, was Re: dma-pool fixes
On Mon, 3 Aug 2020, Christoph Hellwig wrote: > On Sun, Aug 02, 2020 at 09:14:41PM -0700, David Rientjes wrote: > > Christoph: since we have atomic DMA coherent pools in 5.8 now, recall our > > previous discussions, including Greg KH, regarding backports to stable > > trees (we are interested in 5.4 and 5.6) of this support for the purposes > > of confidential VM users who wish to run their own SEV-enabled guests in > > cloud. > > > > Would you have any objections to backports being offered for this support > > now? We can prepare them immediately. Or, if you would prefer we hold > > off for a while, please let me know and we can delay it until you are more > > comfortable. (For confidential VM users, the ability to run your own > > unmodified stable kernel is a much different security story than a guest > > modified by the cloud provider.) > > Before we start backporting I think we need the two fixes from > the "dma pool fixes" series. Can you start reviewing them? I also > think we should probably wait at least until -rc2 for any fallout > before starting the backport. > Thanks Christoph, this makes perfect sense. I see Nicolas has refreshed the two patches: dma-pool: fix coherent pool allocations for IOMMU mappings dma-pool: Only allocate from CMA when in same memory zone and posted them again today on LKML for review. I'll review those and we'll send a full series of backports for stable consideration for 5.4 LTS and 5.6 around the 5.9-rc2 timeframe. Thanks!
Re: revert scope for 5.8, was Re: dma-pool fixes
On Sun, 2 Aug 2020, Amit Pundir wrote: > > > > Hi, I found the problematic memory region. It was a memory > > > > chunk reserved/removed in the downstream tree but was > > > > seemingly reserved upstream for different drivers. I failed to > > > > calculate the length of the total region reserved downstream > > > > correctly. And there was still a portion of memory left unmarked, > > > > which I should have marked as reserved in my testing earlier > > > > today. > > > > > > > > Sorry for all the noise and thanks Nicolas, Christoph and David > > > > for your patience. > > > > > > So you'll need to patch the upstream DTS to fix this up? Do you also > > > need my two fixes? What about the Oneplus phones? Can you send a > > > mail with a summary of the status? > > > > Poco's DTS is not upstreamed yet. I have updated it for this fix > > and sent out a newer version for review. > > https://lkml.org/lkml/2020/8/1/184 > > > > I didn't need to try your two add-on fixes. I'll give them a spin > > later today. > > Hi Christoph, > > I see no obvious regressions with your twin dma-pool fixes on my > PocoF1 phone. > > Caleb also confirmed that a similar reserved-memory region fix in his > One Plus 6 device-tree worked for him too. > Thanks very much for the update, Amit. Christoph: since we have atomic DMA coherent pools in 5.8 now, recall our previous discussions, including Greg KH, regarding backports to stable trees (we are interested in 5.4 and 5.6) of this support for the purposes of confidential VM users who wish to run their own SEV-enabled guests in cloud. Would you have any objections to backports being offered for this support now? We can prepare them immediately. Or, if you would prefer we hold off for a while, please let me know and we can delay it until you are more comfortable. (For confidential VM users, the ability to run your own unmodified stable kernel is a much different security story than a guest modified by the cloud provider.) 
Re: dma-pool fixes
On Fri, 31 Jul 2020, David Rientjes wrote: > > > Hi Nicolas, Christoph, > > > > > > Just out of curiosity, I'm wondering if we can restore the earlier > > > behaviour and make DMA atomic allocation configured thru platform > > > specific device tree instead? > > > > > > Or if you can allow a more hackish approach to restore the earlier > > > logic, using of_machine_is_compatible() just for my device for the > > > time being. Meanwhile I'm checking with other developers running the > > > mainline kernel on sdm845 phones like OnePlus 6/6T, if they see this > > > issue too. > > > > If we don't find a fix for your platform I'm going to send Linus a > > last minute revert this weekend, to stick to the no regressions policy. > > I still hope we can fix the issue for real. > > > > What would be the scope of this potential revert? > To follow-up on this, the introduction of the DMA atomic pools in 5.8 fixes an issue for any AMD SEV enabled guest that has a driver that requires atomic DMA allocations (for us, nvme) because runtime decryption of memory allocated through the DMA API may block. This manifests itself as "sleeping in invalid context" BUGs for any confidential VM user in cloud. I unfortunately don't have Amit's device to be able to independently debug this issue and certainly could not have done a better job at working the bug than Nicolas and Christoph have done so far. I'm as baffled by the results as anybody else. I fully understand the no regressions policy. I'd also ask that we consider that *all* SEV guests are currently broken if they use nvme or any other driver that does atomic DMA allocations. It's an extremely serious issue for cloud. If there is *anything* that I can do to make forward progress on this issue for 5.8, including some of the workarounds above that Amit requested, I'd be very happy to help. Christoph will make the right decision for DMA in 5.8, but I simply wanted to state how critical working SEV guests are to users. 
Re: dma-pool fixes
On Fri, 31 Jul 2020, Christoph Hellwig wrote: > > Hi Nicolas, Christoph, > > > > Just out of curiosity, I'm wondering if we can restore the earlier > > behaviour and make DMA atomic allocation configured thru platform > > specific device tree instead? > > > > Or if you can allow a more hackish approach to restore the earlier > > logic, using of_machine_is_compatible() just for my device for the > > time being. Meanwhile I'm checking with other developers running the > > mainline kernel on sdm845 phones like OnePlus 6/6T, if they see this > > issue too. > > If we don't find a fix for your platform I'm going to send Linus a > last minute revert this weekend, to stick to the no regressions policy. > I still hope we can fix the issue for real. > What would be the scope of this potential revert?
Re: [PATCH 2/4] dma-pool: Get rid of dma_in_atomic_pool()
On Thu, 9 Jul 2020, Nicolas Saenz Julienne wrote: > The function is only used once and can be simplified to a one-liner. > > Signed-off-by: Nicolas Saenz Julienne I'll leave this one to Christoph to decide on. One thing I really liked about hacking around in kernel/dma is the coding style: it really follows "one function does one thing and does it well" even if there is only one caller. dma_in_atomic_pool() was an attempt to follow in those footsteps.
Re: [PATCH] dma-pool: use single atomic pool for both DMA zones
On Wed, 8 Jul 2020, Christoph Hellwig wrote: > On Wed, Jul 08, 2020 at 06:00:35PM +0200, Nicolas Saenz Julienne wrote: > > On Wed, 2020-07-08 at 17:35 +0200, Christoph Hellwig wrote: > > > On Tue, Jul 07, 2020 at 02:28:04PM +0200, Nicolas Saenz Julienne wrote: > > > > When allocating atomic DMA memory for a device, the dma-pool core > > > > queries __dma_direct_optimal_gfp_mask() to check which atomic pool to > > > > use. It turns out the GFP flag returned is only an optimistic guess. > > > > The pool selected might sometimes live in a zone higher than the > > > > device's view of memory. > > > > > > > > As there isn't a way to guarantee a mapping between a device's DMA > > > > constraints and correct GFP flags, this unifies both DMA atomic pools. > > > > The resulting pool is allocated in the lowest DMA zone available, if any, > > > > so that devices always get accessible memory while having the > > > > flexibility of using dma_pool_kernel for the non-constrained ones. > > > > > > > > Fixes: c84dc6e68a1d ("dma-pool: add additional coherent pools to map to > > > > gfp > > > > mask") > > > > Reported-by: Jeremy Linton > > > > Suggested-by: Robin Murphy > > > > Signed-off-by: Nicolas Saenz Julienne > > > > > > Hmm, this is not what I expected from the previous thread. I thought > > > we'd just use one dma pool based on runtime availability of the zones... > > > > I may be misunderstanding you, but isn't that going back to how things used > > to > > be before pulling in David Rientjes' work? The benefit of having a > > GFP_KERNEL > > pool is that non-address-constrained devices can get their atomic memory > > there, > > instead of consuming somewhat scarcer low memory. > > Yes, I think we are misunderstanding each other. I don't want to remove > any pool, just make better runtime decisions when to use them. > Just to be extra explicit for the record and for my own understanding: Nicolas, your patch series "dma-pool: Fix atomic pool selection" obsoletes this patch, right?
:)
Re: [PATCH] dma-pool: Do not allocate pool memory from CMA
On Wed, 8 Jul 2020, Nicolas Saenz Julienne wrote: > There is no guarantee to CMA's placement, so allocating a zone specific > atomic pool from CMA might return memory from a completely different > memory zone. So stop using it. > > Fixes: c84dc6e68a1d ("dma-pool: add additional coherent pools to map to gfp > mask") > Reported-by: Jeremy Linton > Signed-off-by: Nicolas Saenz Julienne Acked-by: David Rientjes Thanks Nicolas!
Re: [patch] dma-pool: warn when coherent pool is depleted
On Sun, 21 Jun 2020, Guenter Roeck wrote: > > When a DMA coherent pool is depleted, allocation failures may or may not > > get reported in the kernel log depending on the allocator. > > > > The admin does have a workaround, however, by using coherent_pool= on the > > kernel command line. > > > > Provide some guidance on the failure and a recommended minimum size for > > the pools (double the size). > > > > Signed-off-by: David Rientjes > > Tested-by: Guenter Roeck > > Also confirmed that coherent_pool=256k works around the crash > I had observed. > Thanks Guenter. Christoph, does it make sense to apply this patch since there may not be an artifact left behind in the kernel log on allocation failure by the caller? > Guenter > > > --- > > kernel/dma/pool.c | 6 +- > > 1 file changed, 5 insertions(+), 1 deletion(-) > > > > diff --git a/kernel/dma/pool.c b/kernel/dma/pool.c > > --- a/kernel/dma/pool.c > > +++ b/kernel/dma/pool.c > > @@ -239,12 +239,16 @@ void *dma_alloc_from_pool(struct device *dev, size_t > > size, > > } > > > > val = gen_pool_alloc(pool, size); > > - if (val) { > > + if (likely(val)) { > > phys_addr_t phys = gen_pool_virt_to_phys(pool, val); > > > > *ret_page = pfn_to_page(__phys_to_pfn(phys)); > > ptr = (void *)val; > > memset(ptr, 0, size); > > + } else { > > + WARN_ONCE(1, "DMA coherent pool depleted, increase size " > > +"(recommended min coherent_pool=%zuK)\n", > > + gen_pool_size(pool) >> 9); > > } > > if (gen_pool_avail(pool) < atomic_pool_size) > > schedule_work(&atomic_pool_work); > ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[patch] dma-pool: warn when coherent pool is depleted
When a DMA coherent pool is depleted, allocation failures may or may not get reported in the kernel log depending on the allocator. The admin does have a workaround, however, by using coherent_pool= on the kernel command line. Provide some guidance on the failure and a recommended minimum size for the pools (double the size). Signed-off-by: David Rientjes --- kernel/dma/pool.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/kernel/dma/pool.c b/kernel/dma/pool.c --- a/kernel/dma/pool.c +++ b/kernel/dma/pool.c @@ -239,12 +239,16 @@ void *dma_alloc_from_pool(struct device *dev, size_t size, } val = gen_pool_alloc(pool, size); - if (val) { + if (likely(val)) { phys_addr_t phys = gen_pool_virt_to_phys(pool, val); *ret_page = pfn_to_page(__phys_to_pfn(phys)); ptr = (void *)val; memset(ptr, 0, size); + } else { + WARN_ONCE(1, "DMA coherent pool depleted, increase size " +"(recommended min coherent_pool=%zuK)\n", + gen_pool_size(pool) >> 9); } if (gen_pool_avail(pool) < atomic_pool_size) schedule_work(&atomic_pool_work); ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v2] dma-pool: Fix too large DMA pools on medium systems
On Sun, 21 Jun 2020, Guenter Roeck wrote: > >> This patch results in a boot failure in some of my powerpc boot tests, > >> specifically those testing boots from mptsas1068 devices. Error message: > >> > >> mptsas :00:02.0: enabling device ( -> 0002) > >> mptbase: ioc0: Initiating bringup > >> ioc0: LSISAS1068 A0: Capabilities={Initiator} > >> mptbase: ioc0: ERROR - Unable to allocate Reply, Request, Chain Buffers! > >> mptbase: ioc0: ERROR - didn't initialize properly! (-3) > >> mptsas: probe of :00:02.0 failed with error -3 > >> > >> Configuration is bamboo:44x/bamboo_defconfig plus various added drivers. > >> Qemu command line is > >> > >> qemu-system-ppc -kernel vmlinux -M bamboo \ > >> -m 256 -no-reboot -snapshot -device mptsas1068,id=scsi \ > >> -device scsi-hd,bus=scsi.0,drive=d0,wwn=0x5000c50015ea71ac -drive \ > >> file=rootfs.ext2,format=raw,if=none,id=d0 \ > >> --append "panic=-1 slub_debug=FZPUA root=/dev/sda mem=256M > >> console=ttyS0" \ > >> -monitor none -nographic > >> > >> canyonlands_defconfig with sam460ex machine and otherwise similar command > >> line > >> fails as well. > >> > >> Reverting this patch fixes the problem. > > > > This looks like the minimum value of 128 KiB is not sufficient, and the > > bug is in the intention of 1d659236fb43c4d2 ("dma-pool: scale the > > default DMA coherent pool size with memory capacity")? > > Before, there was a single pool of (fixed) 256 KiB size, now there are > > up to three coherent pools (DMA, DMA32, and kernel), albeit of smaller > > size (128 KiB each). > > > > Can you print the requested size in drivers/message/fusion/mptbase.c: > > PrimeIocFifos()? > > 172928 bytes > > > Does replacing all SZ_128K by SZ_256K in my patch help? > > Yes, it does. > The new coherent pools should auto expand when they are close to being depleted but there's no guarantee that it can be done fast enough. 
Switching the min size back to the previous minimum (256KB) seems like the best option, and it matches what Documentation/admin-guide/kernel-parameters.txt still says. I'll also send a patch to point in the right direction when this happens.
Re: kernel BUG at mm/huge_memory.c:2613!
On Fri, 19 Jun 2020, Roman Gushchin wrote: > > > [ 40.287524] BUG: unable to handle page fault for address: > > > a77b833df000 > > > [ 40.287529] #PF: supervisor write access in kernel mode > > > [ 40.287531] #PF: error_code(0x000b) - reserved bit violation > > > [ 40.287532] PGD 40d14e067 P4D 40d14e067 PUD 40d14f067 PMD 3ec54d067 > > > PTE 80001688033d9163 > > > [ 40.287538] Oops: 000b [#2] SMP NOPTI > > > [ 40.287542] CPU: 9 PID: 1986 Comm: pulseaudio Tainted: G D > > > 5.8.0-rc1+ #697 > > > [ 40.287544] Hardware name: Gigabyte Technology Co., Ltd. > > > AB350-Gaming/AB350-Gaming-CF, BIOS F25 01/16/2019 > > > [ 40.287550] RIP: 0010:__memset+0x24/0x30 > > > [ 40.287553] Code: cc cc cc cc cc cc 0f 1f 44 00 00 49 89 f9 48 89 > > > d1 83 e2 07 48 c1 e9 03 40 0f b6 f6 48 b8 01 01 01 01 01 01 01 01 48 > > > 0f af c6 48 ab 89 d1 f3 aa 4c 89 c8 c3 90 49 89 f9 40 88 f0 48 89 > > > d1 f3 > > > [ 40.287556] RSP: 0018:a77b827a7e08 EFLAGS: 00010216 > > > [ 40.287558] RAX: RBX: 90f77dced800 RCX: > > > 08a0 > > > [ 40.287560] RDX: RSI: RDI: > > > a77b833df000 > > > [ 40.287561] RBP: 90f7898c7000 R08: 90f78c507768 R09: > > > a77b833df000 > > > [ 40.287563] R10: a77b833df000 R11: 90f7839f2d40 R12: > > > > > > [ 40.287564] R13: 90f76a802e00 R14: c0cb8880 R15: > > > 90f770f4e700 > > > [ 40.287567] FS: 7f3d8e8df880() GS:90f78ee4() > > > knlGS: > > > [ 40.287569] CS: 0010 DS: ES: CR0: 80050033 > > > [ 40.287570] CR2: a77b833df000 CR3: 0003fa556000 CR4: > > > 003406e0 > > > [ 40.287572] Call Trace: > > > [ 40.287584] snd_pcm_hw_params+0x3fd/0x490 [snd_pcm] > > > [ 40.287593] snd_pcm_common_ioctl+0x1c5/0x1110 [snd_pcm] > > > [ 40.287601] ? 
snd_pcm_info_user+0x64/0x80 [snd_pcm] > > > [ 40.287608] snd_pcm_ioctl+0x23/0x30 [snd_pcm] > > > [ 40.287613] ksys_ioctl+0x82/0xc0 > > > [ 40.287617] __x64_sys_ioctl+0x16/0x20 > > > [ 40.287622] do_syscall_64+0x4d/0x90 > > > [ 40.287627] entry_SYSCALL_64_after_hwframe+0x44/0xa9 > > > > Hi Roman, > > > > If you have CONFIG_AMD_MEM_ENCRYPT set, this should be resolved by > > > > commit dbed452a078d56bc7f1abecc3edd6a75e8e4484e > > Author: David Rientjes > > Date: Thu Jun 11 00:25:57 2020 -0700 > > > > dma-pool: decouple DMA_REMAP from DMA_COHERENT_POOL > > > > Or you might want to wait for 5.8-rc2 instead which includes this fix. > > > > Hello, David! > > Thank you for pointing at it! Unfortunately, there must be something wrong > with drivers, your patch didn't help much. I still see the same stacktrace. > Wow, I'm very surprised. Do you have CONFIG_AMD_MEM_ENCRYPT enabled? Adding Takashi. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: kernel BUG at mm/huge_memory.c:2613!
On Fri, 19 Jun 2020, Roman Gushchin wrote: > [ 40.287524] BUG: unable to handle page fault for address: a77b833df000 > [ 40.287529] #PF: supervisor write access in kernel mode > [ 40.287531] #PF: error_code(0x000b) - reserved bit violation > [ 40.287532] PGD 40d14e067 P4D 40d14e067 PUD 40d14f067 PMD 3ec54d067 > PTE 80001688033d9163 > [ 40.287538] Oops: 000b [#2] SMP NOPTI > [ 40.287542] CPU: 9 PID: 1986 Comm: pulseaudio Tainted: G D > 5.8.0-rc1+ #697 > [ 40.287544] Hardware name: Gigabyte Technology Co., Ltd. > AB350-Gaming/AB350-Gaming-CF, BIOS F25 01/16/2019 > [ 40.287550] RIP: 0010:__memset+0x24/0x30 > [ 40.287553] Code: cc cc cc cc cc cc 0f 1f 44 00 00 49 89 f9 48 89 > d1 83 e2 07 48 c1 e9 03 40 0f b6 f6 48 b8 01 01 01 01 01 01 01 01 48 > 0f af c6 48 ab 89 d1 f3 aa 4c 89 c8 c3 90 49 89 f9 40 88 f0 48 89 > d1 f3 > [ 40.287556] RSP: 0018:a77b827a7e08 EFLAGS: 00010216 > [ 40.287558] RAX: RBX: 90f77dced800 RCX: > 08a0 > [ 40.287560] RDX: RSI: RDI: > a77b833df000 > [ 40.287561] RBP: 90f7898c7000 R08: 90f78c507768 R09: > a77b833df000 > [ 40.287563] R10: a77b833df000 R11: 90f7839f2d40 R12: > > [ 40.287564] R13: 90f76a802e00 R14: c0cb8880 R15: > 90f770f4e700 > [ 40.287567] FS: 7f3d8e8df880() GS:90f78ee4() > knlGS: > [ 40.287569] CS: 0010 DS: ES: CR0: 80050033 > [ 40.287570] CR2: a77b833df000 CR3: 0003fa556000 CR4: > 003406e0 > [ 40.287572] Call Trace: > [ 40.287584] snd_pcm_hw_params+0x3fd/0x490 [snd_pcm] > [ 40.287593] snd_pcm_common_ioctl+0x1c5/0x1110 [snd_pcm] > [ 40.287601] ? 
snd_pcm_info_user+0x64/0x80 [snd_pcm] > [ 40.287608] snd_pcm_ioctl+0x23/0x30 [snd_pcm] > [ 40.287613] ksys_ioctl+0x82/0xc0 > [ 40.287617] __x64_sys_ioctl+0x16/0x20 > [ 40.287622] do_syscall_64+0x4d/0x90 > [ 40.287627] entry_SYSCALL_64_after_hwframe+0x44/0xa9 Hi Roman, If you have CONFIG_AMD_MEM_ENCRYPT set, this should be resolved by commit dbed452a078d56bc7f1abecc3edd6a75e8e4484e Author: David Rientjes Date: Thu Jun 11 00:25:57 2020 -0700 dma-pool: decouple DMA_REMAP from DMA_COHERENT_POOL Or you might want to wait for 5.8-rc2 instead which includes this fix. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH] dma-mapping: DMA_COHERENT_POOL should select GENERIC_ALLOCATOR
On Thu, 18 Jun 2020, Christoph Hellwig wrote: > The dma coherent pool code needs genalloc. Move the select over > from DMA_REMAP, which doesn't actually need it. > > Fixes: dbed452a078d ("dma-pool: decouple DMA_REMAP from DMA_COHERENT_POOL") > Reported-by: kernel test robot Acked-by: David Rientjes Thanks Christoph. In the initial bug report from Alex Xu, his .config set CONFIG_GENERIC_ALLOCATOR already, so I think there is very little risk with this patch.
[patch for-5.8 1/4] dma-direct: always align allocation size in dma_direct_alloc_pages()
dma_alloc_contiguous() does size >> PAGE_SHIFT and set_memory_decrypted() works at page granularity. It's necessary to page align the allocation size in dma_direct_alloc_pages() for consistent behavior. This also fixes an issue when arch_dma_prep_coherent() is called on an unaligned allocation size for dma_alloc_need_uncached() when CONFIG_DMA_DIRECT_REMAP is disabled but CONFIG_ARCH_HAS_DMA_SET_UNCACHED is enabled. Cc: sta...@vger.kernel.org Signed-off-by: David Rientjes --- kernel/dma/direct.c | 17 ++--- 1 file changed, 10 insertions(+), 7 deletions(-) diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c --- a/kernel/dma/direct.c +++ b/kernel/dma/direct.c @@ -112,11 +112,12 @@ static inline bool dma_should_free_from_pool(struct device *dev, struct page *__dma_direct_alloc_pages(struct device *dev, size_t size, gfp_t gfp, unsigned long attrs) { - size_t alloc_size = PAGE_ALIGN(size); int node = dev_to_node(dev); struct page *page = NULL; u64 phys_limit; + VM_BUG_ON(!PAGE_ALIGNED(size)); + if (attrs & DMA_ATTR_NO_WARN) gfp |= __GFP_NOWARN; @@ -124,14 +125,14 @@ struct page *__dma_direct_alloc_pages(struct device *dev, size_t size, gfp &= ~__GFP_ZERO; gfp |= dma_direct_optimal_gfp_mask(dev, dev->coherent_dma_mask, &phys_limit); - page = dma_alloc_contiguous(dev, alloc_size, gfp); + page = dma_alloc_contiguous(dev, size, gfp); if (page && !dma_coherent_ok(dev, page_to_phys(page), size)) { - dma_free_contiguous(dev, page, alloc_size); + dma_free_contiguous(dev, page, size); page = NULL; } again: if (!page) - page = alloc_pages_node(node, gfp, get_order(alloc_size)); + page = alloc_pages_node(node, gfp, get_order(size)); if (page && !dma_coherent_ok(dev, page_to_phys(page), size)) { dma_free_contiguous(dev, page, size); page = NULL; @@ -158,8 +159,10 @@ void *dma_direct_alloc_pages(struct device *dev, size_t size, struct page *page; void *ret; + size = PAGE_ALIGN(size); + if (dma_should_alloc_from_pool(dev, gfp, attrs)) { - ret = dma_alloc_from_pool(dev, 
PAGE_ALIGN(size), &page, gfp); + ret = dma_alloc_from_pool(dev, size, &page, gfp); if (!ret) return NULL; goto done; @@ -183,10 +186,10 @@ void *dma_direct_alloc_pages(struct device *dev, size_t size, dma_alloc_need_uncached(dev, attrs)) || (IS_ENABLED(CONFIG_DMA_REMAP) && PageHighMem(page))) { /* remove any dirty cache lines on the kernel alias */ - arch_dma_prep_coherent(page, PAGE_ALIGN(size)); + arch_dma_prep_coherent(page, size); /* create a coherent mapping */ - ret = dma_common_contiguous_remap(page, PAGE_ALIGN(size), + ret = dma_common_contiguous_remap(page, size, dma_pgprot(dev, PAGE_KERNEL, attrs), __builtin_return_address(0)); if (!ret) ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
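The distinction this patch relies on — page-aligning a size versus rounding it up to a power-of-two order — can be checked with a small userspace sketch. PAGE_ALIGN() and get_order() are re-implemented here purely for illustration (assuming 4 KiB pages); they are not the kernel's headers.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SHIFT 12			/* assumed: 4 KiB pages */
#define PAGE_SIZE  (1UL << PAGE_SHIFT)
#define PAGE_ALIGN(x) (((x) + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1))

/* Kernel-style get_order(): smallest order such that 2^order pages cover size. */
static unsigned int get_order(size_t size)
{
	unsigned int order = 0;

	size = (size - 1) >> PAGE_SHIFT;
	while (size) {
		order++;
		size >>= 1;
	}
	return order;
}
```

A 6-page request page-aligns to exactly 6 pages, but `1 << get_order()` rounds it up to 8 — which is why carrying an order-based size into set_memory_decrypted() can touch pages that were never part of the allocation, the bug fixed at the top of this thread.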
[patch for-5.8 4/4] dma-direct: add missing set_memory_decrypted() for coherent mapping
When a coherent mapping is created in dma_direct_alloc_pages(), it needs to be decrypted if the device requires unencrypted DMA before returning. Fixes: 3acac065508f ("dma-mapping: merge the generic remapping helpers into dma-direct") Cc: sta...@vger.kernel.org # 5.5+ Signed-off-by: David Rientjes --- kernel/dma/direct.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c --- a/kernel/dma/direct.c +++ b/kernel/dma/direct.c @@ -195,6 +195,12 @@ void *dma_direct_alloc_pages(struct device *dev, size_t size, __builtin_return_address(0)); if (!ret) goto out_free_pages; + if (force_dma_unencrypted(dev)) { + err = set_memory_decrypted((unsigned long)ret, + 1 << get_order(size)); + if (err) + goto out_free_pages; + } memset(ret, 0, size); goto done; } ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[patch for-5.8 2/4] dma-direct: re-encrypt memory if dma_direct_alloc_pages() fails
If arch_dma_set_uncached() fails after memory has been decrypted, it needs to be re-encrypted before freeing. Fixes: fa7e2247c572 ("dma-direct: make uncached_kernel_address more general") Cc: sta...@vger.kernel.org # 5.7 Signed-off-by: David Rientjes --- kernel/dma/direct.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c --- a/kernel/dma/direct.c +++ b/kernel/dma/direct.c @@ -220,7 +220,7 @@ void *dma_direct_alloc_pages(struct device *dev, size_t size, arch_dma_prep_coherent(page, size); ret = arch_dma_set_uncached(ret, size); if (IS_ERR(ret)) - goto out_free_pages; + goto out_encrypt_pages; } done: if (force_dma_unencrypted(dev)) @@ -228,6 +228,10 @@ void *dma_direct_alloc_pages(struct device *dev, size_t size, else *dma_handle = phys_to_dma(dev, page_to_phys(page)); return ret; +out_encrypt_pages: + if (force_dma_unencrypted(dev)) + set_memory_encrypted((unsigned long)page_address(page), +1 << get_order(size)); out_free_pages: dma_free_contiguous(dev, page, size); return NULL; ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[patch for-5.8 0/4] dma-direct: dma_direct_alloc_pages() fixes for AMD SEV
While debugging recently reported issues concerning DMA allocation practices when CONFIG_AMD_MEM_ENCRYPT is enabled, some curiosities arose when looking at dma_direct_alloc_pages() behavior. Fix these up. These are likely all stable material, so proposing for 5.8. --- kernel/dma/direct.c | 42 -- 1 file changed, 32 insertions(+), 10 deletions(-)
[patch for-5.8 3/4] dma-direct: check return value when encrypting or decrypting memory
__change_page_attr() can fail which will cause set_memory_encrypted() and set_memory_decrypted() to return non-zero. If the device requires unencrypted DMA memory and decryption fails, simply free the memory and fail. If attempting to re-encrypt in the failure path and that encryption fails, there is no alternative other than to leak the memory. Fixes: c10f07aa27da ("dma/direct: Handle force decryption for DMA coherent buffers in common code") Cc: sta...@vger.kernel.org # 4.17+ Signed-off-by: David Rientjes --- kernel/dma/direct.c | 19 ++- 1 file changed, 14 insertions(+), 5 deletions(-) diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c --- a/kernel/dma/direct.c +++ b/kernel/dma/direct.c @@ -158,6 +158,7 @@ void *dma_direct_alloc_pages(struct device *dev, size_t size, { struct page *page; void *ret; + int err; size = PAGE_ALIGN(size); @@ -210,8 +211,12 @@ void *dma_direct_alloc_pages(struct device *dev, size_t size, } ret = page_address(page); - if (force_dma_unencrypted(dev)) - set_memory_decrypted((unsigned long)ret, 1 << get_order(size)); + if (force_dma_unencrypted(dev)) { + err = set_memory_decrypted((unsigned long)ret, + 1 << get_order(size)); + if (err) + goto out_free_pages; + } memset(ret, 0, size); @@ -229,9 +234,13 @@ void *dma_direct_alloc_pages(struct device *dev, size_t size, *dma_handle = phys_to_dma(dev, page_to_phys(page)); return ret; out_encrypt_pages: - if (force_dma_unencrypted(dev)) - set_memory_encrypted((unsigned long)page_address(page), -1 << get_order(size)); + if (force_dma_unencrypted(dev)) { + err = set_memory_encrypted((unsigned long)page_address(page), + 1 << get_order(size)); + /* If memory cannot be re-encrypted, it must be leaked */ + if (err) + return NULL; + } out_free_pages: dma_free_contiguous(dev, page, size); return NULL; ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
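The error-handling policy in the commit message — free on decryption failure, leak on re-encryption failure — can be mocked in userspace. The function below is a hypothetical stand-in for the tail of dma_direct_alloc_pages(); set_memory_decrypted()/set_memory_encrypted() outcomes are simulated with boolean parameters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Mock of the allocation tail: decrypt the buffer, possibly hit a later
 * failure (e.g. arch_dma_set_uncached()), then unwind.  Returns the
 * buffer on success, NULL on failure; *leaked reports an unrecoverable
 * buffer that must never be freed back to the allocator.
 */
static void *alloc_unencrypted(bool decrypt_fails, bool later_step_fails,
			       bool encrypt_fails, bool *leaked)
{
	void *buf = malloc(64);		/* stands in for the page allocation */

	*leaked = false;
	if (!buf)
		return NULL;
	if (decrypt_fails)
		goto out_free;		/* nothing decrypted yet: safe to free */
	if (!later_step_fails)
		return buf;		/* success path */
	/* failure after decryption: must re-encrypt before freeing */
	if (encrypt_fails) {
		*leaked = true;		/* cannot re-encrypt: leak, never free */
		return NULL;
	}
out_free:
	free(buf);
	return NULL;
}
```

The asymmetry is the whole point of the patch: freeing still-decrypted memory back to the page allocator would hand unencrypted pages to arbitrary later users, so leaking is the only safe failure mode.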
Re: [PATCH v2] dma-pool: Fix too large DMA pools on medium systems
On Mon, 8 Jun 2020, Geert Uytterhoeven wrote: > On systems with at least 32 MiB, but less than 32 GiB of RAM, the DMA > memory pools are much larger than intended (e.g. 2 MiB instead of 128 > KiB on a 256 MiB system). > > Fix this by correcting the calculation of the number of GiBs of RAM in > the system. Invert the order of the min/max operations, to keep on > calculating in pages until the last step, which aids readability. > > Fixes: 1d659236fb43c4d2 ("dma-pool: scale the default DMA coherent pool size > with memory capacity") > Signed-off-by: Geert Uytterhoeven This works as well and is much more readable. Thanks Geert! Acked-by: David Rientjes
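The corrected sizing rule described above — 128 KiB of pool per GiB of RAM, computed in pages until the last step and clamped between 128 KiB and one MAX_ORDER-1 block — can be checked standalone. PAGE_SHIFT=12 and MAX_ORDER=11 are the common x86-64 values, assumed here for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SHIFT 12			/* assumed: 4 KiB pages */
#define MAX_ORDER  11			/* assumed: x86-64 default */
#define MAX_ORDER_NR_PAGES (1UL << (MAX_ORDER - 1))
#define SZ_128K (128UL << 10)
#define SZ_1G   (1UL << 30)

/* 128 KiB of pool per GiB of RAM, clamped to [128 KiB, one order-10 block]. */
static size_t default_pool_size(unsigned long totalram_pages)
{
	unsigned long pages = totalram_pages / (SZ_1G / SZ_128K);
	size_t size;

	if (pages > MAX_ORDER_NR_PAGES)	/* cap at the largest allocatable block */
		pages = MAX_ORDER_NR_PAGES;
	size = (size_t)pages << PAGE_SHIFT;
	return size > SZ_128K ? size : SZ_128K;
}
```

On the 256 MiB system from the report this now yields the 128 KiB floor rather than 2 MiB, because the GiB count stays fractional (in pages) instead of being rounded before scaling.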
Re: [patch 0/7] unencrypted atomic DMA pools with dynamic expansion
On Fri, 17 Apr 2020, Christoph Hellwig wrote: > So modulo a few comments that I can fix up myself this looks good. Unless > you want to resend for some reason I'm ready to pick this up once I open > the dma-mapping tree after -rc2. > Yes, please do, and thanks to both you and Thomas for the guidance and code reviews. Once these patches take on their final form in your branch, how supportive would you be of stable backports going back to 4.19 LTS? There have been several changes to this area over time, so there are varying levels of rework that need to be done for each stable kernel back to 4.19. But I'd be happy to do that work if you are receptive to it. For rationale, without these fixes, all SEV enabled guests warn of blocking in rcu read side critical sections when using drivers that allocate atomically though the DMA API that calls set_memory_decrypted(). Users can see warnings such as these in the guest: BUG: sleeping function called from invalid context at mm/vmalloc.c:1710 in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 3383, name: fio 2 locks held by fio/3383: #0: 93b6a8568348 (&sb->s_type->i_mutex_key#16){+.+.}, at: ext4_file_write_iter+0xa2/0x5d0 #1: a52a61a0 (rcu_read_lock){}, at: hctx_lock+0x1a/0xe0 CPU: 0 PID: 3383 Comm: fio Tainted: GW 5.5.10 #14 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: dump_stack+0x98/0xd5 ___might_sleep+0x175/0x260 __might_sleep+0x4a/0x80 _vm_unmap_aliases+0x45/0x250 vm_unmap_aliases+0x19/0x20 __set_memory_enc_dec+0xa4/0x130 set_memory_decrypted+0x10/0x20 dma_direct_alloc_pages+0x148/0x150 dma_direct_alloc+0xe/0x10 dma_alloc_attrs+0x86/0xc0 dma_pool_alloc+0x16f/0x2b0 nvme_queue_rq+0x878/0xc30 [nvme] __blk_mq_try_issue_directly+0x135/0x200 blk_mq_request_issue_directly+0x4f/0x80 blk_mq_try_issue_list_directly+0x46/0xb0 blk_mq_sched_insert_requests+0x19b/0x2b0 blk_mq_flush_plug_list+0x22f/0x3b0 blk_flush_plug_list+0xd1/0x100 blk_finish_plug+0x2c/0x40 
iomap_dio_rw+0x427/0x490 ext4_file_write_iter+0x181/0x5d0 aio_write+0x109/0x1b0 io_submit_one+0x7d0/0xfa0 __x64_sys_io_submit+0xa2/0x280 do_syscall_64+0x5f/0x250 ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[patch 4/7] dma-direct: atomic allocations must come from atomic coherent pools
When a device requires unencrypted memory and the context does not allow blocking, memory must be returned from the atomic coherent pools. This avoids the remap when CONFIG_DMA_DIRECT_REMAP is not enabled and the config only requires CONFIG_DMA_COHERENT_POOL. This will be used for CONFIG_AMD_MEM_ENCRYPT in a subsequent patch. Keep all memory in these pools unencrypted. When set_memory_decrypted() fails, this prohibits the memory from being added. If adding memory to the genpool fails, and set_memory_encrypted() subsequently fails, there is no alternative other than leaking the memory. Signed-off-by: David Rientjes --- kernel/dma/direct.c | 46 ++--- kernel/dma/pool.c | 27 +++--- 2 files changed, 63 insertions(+), 10 deletions(-) diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c index a834ee22f8ff..07ecc5c4d134 100644 --- a/kernel/dma/direct.c +++ b/kernel/dma/direct.c @@ -76,6 +76,39 @@ static bool dma_coherent_ok(struct device *dev, phys_addr_t phys, size_t size) min_not_zero(dev->coherent_dma_mask, dev->bus_dma_limit); } +/* + * Decrypting memory is allowed to block, so if this device requires + * unencrypted memory it must come from atomic pools. 
+ */ +static inline bool dma_should_alloc_from_pool(struct device *dev, gfp_t gfp, + unsigned long attrs) +{ + if (!IS_ENABLED(CONFIG_DMA_COHERENT_POOL)) + return false; + if (gfpflags_allow_blocking(gfp)) + return false; + if (force_dma_unencrypted(dev)) + return true; + if (!IS_ENABLED(CONFIG_DMA_DIRECT_REMAP)) + return false; + if (dma_alloc_need_uncached(dev, attrs)) + return true; + return false; +} + +static inline bool dma_should_free_from_pool(struct device *dev, +unsigned long attrs) +{ + if (IS_ENABLED(CONFIG_DMA_COHERENT_POOL)) + return true; + if ((attrs & DMA_ATTR_NO_KERNEL_MAPPING) && + !force_dma_unencrypted(dev)) + return false; + if (IS_ENABLED(CONFIG_DMA_DIRECT_REMAP)) + return true; + return false; +} + struct page *__dma_direct_alloc_pages(struct device *dev, size_t size, gfp_t gfp, unsigned long attrs) { @@ -125,9 +158,7 @@ void *dma_direct_alloc_pages(struct device *dev, size_t size, struct page *page; void *ret; - if (IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) && - dma_alloc_need_uncached(dev, attrs) && - !gfpflags_allow_blocking(gfp)) { + if (dma_should_alloc_from_pool(dev, gfp, attrs)) { ret = dma_alloc_from_pool(dev, PAGE_ALIGN(size), &page, gfp); if (!ret) return NULL; @@ -204,6 +235,11 @@ void dma_direct_free_pages(struct device *dev, size_t size, void *cpu_addr, { unsigned int page_order = get_order(size); + /* If cpu_addr is not from an atomic pool, dma_free_from_pool() fails */ + if (dma_should_free_from_pool(dev, attrs) && + dma_free_from_pool(dev, cpu_addr, PAGE_ALIGN(size))) + return; + if ((attrs & DMA_ATTR_NO_KERNEL_MAPPING) && !force_dma_unencrypted(dev)) { /* cpu_addr is a struct page cookie, not a kernel address */ @@ -211,10 +247,6 @@ void dma_direct_free_pages(struct device *dev, size_t size, void *cpu_addr, return; } - if (IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) && - dma_free_from_pool(dev, cpu_addr, PAGE_ALIGN(size))) - return; - if (force_dma_unencrypted(dev)) set_memory_encrypted((unsigned long)cpu_addr, 1 << page_order); diff 
--git a/kernel/dma/pool.c b/kernel/dma/pool.c index 9e2da17ed17b..cf052314d9e4 100644 --- a/kernel/dma/pool.c +++ b/kernel/dma/pool.c @@ -7,6 +7,7 @@ #include #include #include +#include #include #include #include @@ -53,22 +54,42 @@ static int atomic_pool_expand(struct gen_pool *pool, size_t pool_size, arch_dma_prep_coherent(page, pool_size); +#ifdef CONFIG_DMA_DIRECT_REMAP addr = dma_common_contiguous_remap(page, pool_size, pgprot_dmacoherent(PAGE_KERNEL), __builtin_return_address(0)); if (!addr) goto free_page; - +#else + addr = page_to_virt(page); +#endif + /* +* Memory in the atomic DMA pools must be unencrypted, the pools do not +* shrink so no re-encryption occurs in dma_direct_free_pages(). +*/ + ret = set_memory_decrypted((unsigned long)page_to_virt(page), + 1 << order); + if (ret) + goto remove_mapping; ret = gen_pool_add_virt(pool, (unsigned long)addr, page_to_phys(page),
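The gating logic that dma_should_alloc_from_pool() introduces can be expressed as a pure predicate; in this illustrative sketch the config options and device state become boolean parameters so the decision table is testable on its own.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Atomic-pool policy: only non-blocking contexts qualify, and only when
 * the memory must be unencrypted, or (with DMA_DIRECT_REMAP) when the
 * device needs an uncached remap that cannot be created atomically.
 */
static bool should_alloc_from_pool(bool coherent_pool, bool direct_remap,
				   bool gfp_allows_blocking,
				   bool force_unencrypted, bool need_uncached)
{
	if (!coherent_pool)
		return false;
	if (gfp_allows_blocking)
		return false;
	if (force_unencrypted)
		return true;
	if (!direct_remap)
		return false;
	return need_uncached;
}
```

The key new case is the third check: an SEV guest (force_unencrypted) now takes the pool path even when CONFIG_DMA_DIRECT_REMAP is disabled, which is exactly what lets CONFIG_AMD_MEM_ENCRYPT depend on CONFIG_DMA_COHERENT_POOL alone.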
[patch 1/7] dma-remap: separate DMA atomic pools from direct remap code
DMA atomic pools will be needed beyond only CONFIG_DMA_DIRECT_REMAP so separate them out into their own file. This also adds a new Kconfig option that can be subsequently used for options, such as CONFIG_AMD_MEM_ENCRYPT, that will utilize the coherent pools but do not have a dependency on direct remapping. For this patch alone, there is no functional change introduced. Reviewed-by: Christoph Hellwig Signed-off-by: David Rientjes --- kernel/dma/Kconfig | 6 ++- kernel/dma/Makefile | 1 + kernel/dma/pool.c | 123 kernel/dma/remap.c | 114 4 files changed, 129 insertions(+), 115 deletions(-) create mode 100644 kernel/dma/pool.c diff --git a/kernel/dma/Kconfig b/kernel/dma/Kconfig index 4c103a24e380..d006668c0027 100644 --- a/kernel/dma/Kconfig +++ b/kernel/dma/Kconfig @@ -79,10 +79,14 @@ config DMA_REMAP select DMA_NONCOHERENT_MMAP bool -config DMA_DIRECT_REMAP +config DMA_COHERENT_POOL bool select DMA_REMAP +config DMA_DIRECT_REMAP + bool + select DMA_COHERENT_POOL + config DMA_CMA bool "DMA Contiguous Memory Allocator" depends on HAVE_DMA_CONTIGUOUS && CMA diff --git a/kernel/dma/Makefile b/kernel/dma/Makefile index d237cf3dc181..370f63344e9c 100644 --- a/kernel/dma/Makefile +++ b/kernel/dma/Makefile @@ -6,4 +6,5 @@ obj-$(CONFIG_DMA_DECLARE_COHERENT) += coherent.o obj-$(CONFIG_DMA_VIRT_OPS) += virt.o obj-$(CONFIG_DMA_API_DEBUG)+= debug.o obj-$(CONFIG_SWIOTLB) += swiotlb.o +obj-$(CONFIG_DMA_COHERENT_POOL)+= pool.o obj-$(CONFIG_DMA_REMAP)+= remap.o diff --git a/kernel/dma/pool.c b/kernel/dma/pool.c new file mode 100644 index ..6612c2d51d3c --- /dev/null +++ b/kernel/dma/pool.c @@ -0,0 +1,123 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2020 Google LLC + */ +#include +#include +#include +#include +#include +#include +#include + +static struct gen_pool *atomic_pool __ro_after_init; + +#define DEFAULT_DMA_COHERENT_POOL_SIZE SZ_256K +static size_t atomic_pool_size __initdata = DEFAULT_DMA_COHERENT_POOL_SIZE; + +static int __init early_coherent_pool(char *p) 
+{ + atomic_pool_size = memparse(p, &p); + return 0; +} +early_param("coherent_pool", early_coherent_pool); + +static gfp_t dma_atomic_pool_gfp(void) +{ + if (IS_ENABLED(CONFIG_ZONE_DMA)) + return GFP_DMA; + if (IS_ENABLED(CONFIG_ZONE_DMA32)) + return GFP_DMA32; + return GFP_KERNEL; +} + +static int __init dma_atomic_pool_init(void) +{ + unsigned int pool_size_order = get_order(atomic_pool_size); + unsigned long nr_pages = atomic_pool_size >> PAGE_SHIFT; + struct page *page; + void *addr; + int ret; + + if (dev_get_cma_area(NULL)) + page = dma_alloc_from_contiguous(NULL, nr_pages, +pool_size_order, false); + else + page = alloc_pages(dma_atomic_pool_gfp(), pool_size_order); + if (!page) + goto out; + + arch_dma_prep_coherent(page, atomic_pool_size); + + atomic_pool = gen_pool_create(PAGE_SHIFT, -1); + if (!atomic_pool) + goto free_page; + + addr = dma_common_contiguous_remap(page, atomic_pool_size, + pgprot_dmacoherent(PAGE_KERNEL), + __builtin_return_address(0)); + if (!addr) + goto destroy_genpool; + + ret = gen_pool_add_virt(atomic_pool, (unsigned long)addr, + page_to_phys(page), atomic_pool_size, -1); + if (ret) + goto remove_mapping; + gen_pool_set_algo(atomic_pool, gen_pool_first_fit_order_align, NULL); + + pr_info("DMA: preallocated %zu KiB pool for atomic allocations\n", + atomic_pool_size / 1024); + return 0; + +remove_mapping: + dma_common_free_remap(addr, atomic_pool_size); +destroy_genpool: + gen_pool_destroy(atomic_pool); + atomic_pool = NULL; +free_page: + if (!dma_release_from_contiguous(NULL, page, nr_pages)) + __free_pages(page, pool_size_order); +out: + pr_err("DMA: failed to allocate %zu KiB pool for atomic coherent allocation\n", + atomic_pool_size / 1024); + return -ENOMEM; +} +postcore_initcall(dma_atomic_pool_init); + +bool dma_in_atomic_pool(void *start, size_t size) +{ + if (unlikely(!atomic_pool)) + return false; + + return gen_pool_has_addr(atomic_pool, (unsigned long)start, size); +} + +void *dma_alloc_from_pool(size_t size, struct page 
**ret_page, gfp_t flags) +{ + unsigned long val; + void *ptr = NULL; + + if (!atomic_pool) { + WARN(1, "coherent pool not initialised!\n"); + return NULL; +
[patch 3/7] dma-pool: dynamically expanding atomic pools
When an atomic pool becomes fully depleted because it is now relied upon for all non-blocking allocations through the DMA API, allow background expansion of each pool by a kworker. When an atomic pool has less than the default size of memory left, kick off a kworker to dynamically expand the pool in the background. The pool is doubled in size, up to MAX_ORDER-1. If memory cannot be allocated at the requested order, smaller allocation(s) are attempted. This allows the default size to be kept quite low when one or more of the atomic pools is not used. Allocations for lowmem should also use GFP_KERNEL for the benefits of reclaim, so use GFP_KERNEL | GFP_DMA and GFP_KERNEL | GFP_DMA32 for lowmem allocations. This also allows __dma_atomic_pool_init() to return a pointer to the pool to make initialization cleaner. Also switch over some node ids to the more appropriate NUMA_NO_NODE. Signed-off-by: David Rientjes --- kernel/dma/pool.c | 122 +++--- 1 file changed, 84 insertions(+), 38 deletions(-) diff --git a/kernel/dma/pool.c b/kernel/dma/pool.c index 5c98ab991b16..9e2da17ed17b 100644 --- a/kernel/dma/pool.c +++ b/kernel/dma/pool.c @@ -9,13 +9,17 @@ #include #include #include +#include static struct gen_pool *atomic_pool_dma __ro_after_init; static struct gen_pool *atomic_pool_dma32 __ro_after_init; static struct gen_pool *atomic_pool_kernel __ro_after_init; #define DEFAULT_DMA_COHERENT_POOL_SIZE SZ_256K -static size_t atomic_pool_size __initdata = DEFAULT_DMA_COHERENT_POOL_SIZE; +static size_t atomic_pool_size = DEFAULT_DMA_COHERENT_POOL_SIZE; + +/* Dynamic background expansion when the atomic pool is near capacity */ +static struct work_struct atomic_pool_work; static int __init early_coherent_pool(char *p) { @@ -24,76 +28,116 @@ static int __init early_coherent_pool(char *p) } early_param("coherent_pool", early_coherent_pool); -static int __init __dma_atomic_pool_init(struct gen_pool **pool, -size_t pool_size, gfp_t gfp) +static int atomic_pool_expand(struct gen_pool 
*pool, size_t pool_size, + gfp_t gfp) { - const unsigned int order = get_order(pool_size); - const unsigned long nr_pages = pool_size >> PAGE_SHIFT; + unsigned int order; struct page *page; void *addr; - int ret; + int ret = -ENOMEM; + + /* Cannot allocate larger than MAX_ORDER-1 */ + order = min(get_order(pool_size), MAX_ORDER-1); + + do { + pool_size = 1 << (PAGE_SHIFT + order); - if (dev_get_cma_area(NULL)) - page = dma_alloc_from_contiguous(NULL, nr_pages, order, false); - else - page = alloc_pages(gfp, order); + if (dev_get_cma_area(NULL)) + page = dma_alloc_from_contiguous(NULL, 1 << order, +order, false); + else + page = alloc_pages(gfp, order); + } while (!page && order-- > 0); if (!page) goto out; arch_dma_prep_coherent(page, pool_size); - *pool = gen_pool_create(PAGE_SHIFT, -1); - if (!*pool) - goto free_page; - addr = dma_common_contiguous_remap(page, pool_size, pgprot_dmacoherent(PAGE_KERNEL), __builtin_return_address(0)); if (!addr) - goto destroy_genpool; + goto free_page; - ret = gen_pool_add_virt(*pool, (unsigned long)addr, page_to_phys(page), - pool_size, -1); + ret = gen_pool_add_virt(pool, (unsigned long)addr, page_to_phys(page), + pool_size, NUMA_NO_NODE); if (ret) goto remove_mapping; - gen_pool_set_algo(*pool, gen_pool_first_fit_order_align, NULL); - pr_info("DMA: preallocated %zu KiB %pGg pool for atomic allocations\n", - pool_size >> 10, &gfp); return 0; remove_mapping: dma_common_free_remap(addr, pool_size); -destroy_genpool: - gen_pool_destroy(*pool); - *pool = NULL; free_page: - if (!dma_release_from_contiguous(NULL, page, nr_pages)) + if (!dma_release_from_contiguous(NULL, page, 1 << order)) __free_pages(page, order); out: - pr_err("DMA: failed to allocate %zu KiB %pGg pool for atomic allocation\n", - pool_size >> 10, &gfp); - return -ENOMEM; + return ret; +} + +static void atomic_pool_resize(struct gen_pool *pool, gfp_t gfp) +{ + if (pool && gen_pool_avail(pool) < atomic_pool_size) + atomic_pool_expand(pool, gen_pool_size(pool), gfp); 
+} + +static void atomic_pool_work_fn(struct work_struct *work) +{ + if (IS_ENABLED(CONFIG_ZONE_DMA)) + atomic_pool_resize(atomic_pool_dma, +
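The expansion retry loop — start at the requested order, capped at MAX_ORDER-1, and fall back to smaller orders until an allocation succeeds — can be exercised against a mocked page allocator. PAGE_SHIFT=12 and MAX_ORDER=11 are assumed x86-64 values; mock_alloc() is purely illustrative.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SHIFT 12			/* assumed: 4 KiB pages */
#define MAX_ORDER  11			/* assumed: x86-64 default */

static unsigned int get_order(size_t size)	/* kernel-style helper */
{
	unsigned int order = 0;

	size = (size - 1) >> PAGE_SHIFT;
	while (size) {
		order++;
		size >>= 1;
	}
	return order;
}

/* Mock allocator: only orders <= max_avail succeed; -1 means nothing does. */
static int mock_alloc(unsigned int order, int max_avail)
{
	return max_avail >= 0 && (int)order <= max_avail;
}

/* Returns the size actually added to the pool, or 0 on total failure. */
static size_t pool_expand(size_t pool_size, int max_avail)
{
	unsigned int order = get_order(pool_size);
	int got;

	if (order > MAX_ORDER - 1)	/* cannot allocate larger than this */
		order = MAX_ORDER - 1;
	do {
		pool_size = (size_t)1 << (PAGE_SHIFT + order);
		got = mock_alloc(order, max_avail);
	} while (!got && order-- > 0);

	return got ? pool_size : 0;
}
```

Note the `order-- > 0` idiom from the patch: the decrement happens after the test, so order 0 is still attempted before the loop gives up.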
[patch 6/7] x86/mm: unencrypted non-blocking DMA allocations use coherent pools
When CONFIG_AMD_MEM_ENCRYPT is enabled and a device requires unencrypted DMA, all non-blocking allocations must originate from the atomic DMA coherent pools. Select CONFIG_DMA_COHERENT_POOL for CONFIG_AMD_MEM_ENCRYPT. Signed-off-by: David Rientjes --- arch/x86/Kconfig | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 1d6104ea8af0..2bf819d3 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -1520,6 +1520,7 @@ config X86_CPA_STATISTICS config AMD_MEM_ENCRYPT bool "AMD Secure Memory Encryption (SME) support" depends on X86_64 && CPU_SUP_AMD + select DMA_COHERENT_POOL select DYNAMIC_PHYSICAL_MASK select ARCH_USE_MEMREMAP_PROT select ARCH_HAS_FORCE_DMA_UNENCRYPTED
[patch 0/7] unencrypted atomic DMA pools with dynamic expansion
set_memory_decrypted() may block, so it is not possible to do non-blocking allocations through the DMA API for devices that require unencrypted memory. The solution is to expand the atomic DMA pools for the various possible gfp requirements as a means to prevent an unnecessary depletion of lowmem. These atomic pools are separated from the remap code and can be selected for configurations that need them outside the scope of CONFIG_DMA_DIRECT_REMAP, such as CONFIG_AMD_MEM_ENCRYPT. These atomic DMA pools are kept unencrypted so they can immediately be used for non-blocking allocations. Since the need for this type of memory depends on the kernel config and devices being used, these pools are also dynamically expandable. The sizes of the various atomic DMA pools are exported through debugfs at /sys/kernel/debug/dma_pools. This patchset is based on the latest Linus HEAD: commit 8632e9b5645bbc2331d21d892b0d6961c1a08429 Merge: 6cc9306b8fc0 f3a99e761efa Author: Linus Torvalds Date: Tue Apr 14 11:58:04 2020 -0700 Merge tag 'hyperv-fixes-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux --- arch/x86/Kconfig| 1 + drivers/iommu/dma-iommu.c | 5 +- include/linux/dma-direct.h | 2 + include/linux/dma-mapping.h | 6 +- kernel/dma/Kconfig | 6 +- kernel/dma/Makefile | 1 + kernel/dma/direct.c | 56 ++-- kernel/dma/pool.c | 275 kernel/dma/remap.c | 114 --- 9 files changed, 334 insertions(+), 132 deletions(-) create mode 100644 kernel/dma/pool.c
[patch 7/7] dma-pool: scale the default DMA coherent pool size with memory capacity
When AMD memory encryption is enabled, some devices may use more than 256KB/sec from the atomic pools. It would be more appropriate to scale the default size based on memory capacity unless the coherent_pool option is used on the kernel command line. This provides a slight optimization on initial expansion and is deemed appropriate due to the increased reliance on the atomic pools. Note that the default size of 128KB per pool will normally be larger than the single coherent pool implementation since there are now up to three coherent pools (DMA, DMA32, and kernel). Note that even prior to this patch, coherent_pool= for sizes larger than 1 << (PAGE_SHIFT + MAX_ORDER-1) can fail. With new dynamic expansion support, this would be trivially extensible to allow even larger initial sizes. Signed-off-by: David Rientjes --- kernel/dma/pool.c | 14 -- 1 file changed, 12 insertions(+), 2 deletions(-) diff --git a/kernel/dma/pool.c b/kernel/dma/pool.c index 3e22022c933b..763b687569b0 100644 --- a/kernel/dma/pool.c +++ b/kernel/dma/pool.c @@ -22,8 +22,8 @@ static unsigned long pool_size_dma32; static unsigned long pool_size_kernel; #endif -#define DEFAULT_DMA_COHERENT_POOL_SIZE SZ_256K -static size_t atomic_pool_size = DEFAULT_DMA_COHERENT_POOL_SIZE; +/* Size can be defined by the coherent_pool command line */ +static size_t atomic_pool_size; /* Dynamic background expansion when the atomic pool is near capacity */ static struct work_struct atomic_pool_work; @@ -181,6 +181,16 @@ static int __init dma_atomic_pool_init(void) { int ret = 0; + /* +* If coherent_pool was not used on the command line, default the pool +* sizes to 128KB per 1GB of memory, min 128KB, max MAX_ORDER-1. 
+*/ + if (!atomic_pool_size) { + atomic_pool_size = max(totalram_pages() >> PAGE_SHIFT, 1UL) * + SZ_128K; + atomic_pool_size = min_t(size_t, atomic_pool_size, +1 << (PAGE_SHIFT + MAX_ORDER-1)); + } INIT_WORK(&atomic_pool_work, atomic_pool_work_fn); atomic_pool_kernel = __dma_atomic_pool_init(atomic_pool_size, ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[patch 5/7] dma-pool: add pool sizes to debugfs
The atomic DMA pools can dynamically expand based on non-blocking allocations that need to use it. Export the sizes of each of these pools, in bytes, through debugfs for measurement. Suggested-by: Christoph Hellwig Signed-off-by: David Rientjes --- kernel/dma/pool.c | 41 + 1 file changed, 41 insertions(+) diff --git a/kernel/dma/pool.c b/kernel/dma/pool.c index cf052314d9e4..3e22022c933b 100644 --- a/kernel/dma/pool.c +++ b/kernel/dma/pool.c @@ -2,6 +2,7 @@ /* * Copyright (C) 2020 Google LLC */ +#include #include #include #include @@ -15,6 +16,11 @@ static struct gen_pool *atomic_pool_dma __ro_after_init; static struct gen_pool *atomic_pool_dma32 __ro_after_init; static struct gen_pool *atomic_pool_kernel __ro_after_init; +#ifdef CONFIG_DEBUG_FS +static unsigned long pool_size_dma; +static unsigned long pool_size_dma32; +static unsigned long pool_size_kernel; +#endif #define DEFAULT_DMA_COHERENT_POOL_SIZE SZ_256K static size_t atomic_pool_size = DEFAULT_DMA_COHERENT_POOL_SIZE; @@ -29,6 +35,38 @@ static int __init early_coherent_pool(char *p) } early_param("coherent_pool", early_coherent_pool); +#ifdef CONFIG_DEBUG_FS +static void __init dma_atomic_pool_debugfs_init(void) +{ + struct dentry *root; + + root = debugfs_create_dir("dma_pools", NULL); + if (IS_ERR_OR_NULL(root)) + return; + + debugfs_create_ulong("pool_size_dma", 0400, root, &pool_size_dma); + debugfs_create_ulong("pool_size_dma32", 0400, root, &pool_size_dma32); + debugfs_create_ulong("pool_size_kernel", 0400, root, &pool_size_kernel); +} + +static void dma_atomic_pool_size_add(gfp_t gfp, size_t size) +{ + if (gfp & __GFP_DMA) + pool_size_dma += size; + else if (gfp & __GFP_DMA32) + pool_size_dma32 += size; + else + pool_size_kernel += size; +} +#else +static inline void dma_atomic_pool_debugfs_init(void) +{ +} +static inline void dma_atomic_pool_size_add(gfp_t gfp, size_t size) +{ +} +#endif /* CONFIG_DEBUG_FS */ + static int atomic_pool_expand(struct gen_pool *pool, size_t pool_size, gfp_t gfp) { @@ 
-76,6 +114,7 @@ static int atomic_pool_expand(struct gen_pool *pool, size_t pool_size, if (ret) goto encrypt_mapping; + dma_atomic_pool_size_add(gfp, pool_size); return 0; encrypt_mapping: @@ -160,6 +199,8 @@ static int __init dma_atomic_pool_init(void) if (!atomic_pool_dma32) ret = -ENOMEM; } + + dma_atomic_pool_debugfs_init(); return ret; } postcore_initcall(dma_atomic_pool_init); ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
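The per-zone accounting behind the new debugfs counters is a three-way bucket keyed on the gfp flags. A standalone version (with illustrative flag values, not the kernel's) behaves like this:

```c
#include <assert.h>

#define MOCK_GFP_DMA   0x1u	/* illustrative flag bits, not the kernel's */
#define MOCK_GFP_DMA32 0x2u

static unsigned long pool_size_dma, pool_size_dma32, pool_size_kernel;

/* Mirror of dma_atomic_pool_size_add(): DMA wins over DMA32 over kernel. */
static void pool_size_add(unsigned int gfp, unsigned long size)
{
	if (gfp & MOCK_GFP_DMA)
		pool_size_dma += size;
	else if (gfp & MOCK_GFP_DMA32)
		pool_size_dma32 += size;
	else
		pool_size_kernel += size;
}
```

The if/else-if ordering matters: an expansion tagged with both zone flags must only be counted once, against the most restrictive pool.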
[patch 2/7] dma-pool: add additional coherent pools to map to gfp mask
The single atomic pool is allocated from the lowest zone possible since it is guaranteed to be applicable for any DMA allocation. Devices may allocate through the DMA API but not have a strict reliance on GFP_DMA memory. Since the atomic pool will be used for all non-blockable allocations, returning all memory from ZONE_DMA may unnecessarily deplete the zone. Provision for multiple atomic pools that will map to the optimal gfp mask of the device. When allocating non-blockable memory, determine the optimal gfp mask of the device and use the appropriate atomic pool. The coherent DMA mask will remain the same between allocation and free and, thus, memory will be freed to the same atomic pool it was allocated from. __dma_atomic_pool_init() will be changed to return struct gen_pool * later once dynamic expansion is added. Signed-off-by: David Rientjes --- drivers/iommu/dma-iommu.c | 5 +- include/linux/dma-direct.h | 2 + include/linux/dma-mapping.h | 6 +- kernel/dma/direct.c | 12 ++-- kernel/dma/pool.c | 120 +++- 5 files changed, 91 insertions(+), 54 deletions(-) diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c index ba128d1cdaee..4959f5df21bd 100644 --- a/drivers/iommu/dma-iommu.c +++ b/drivers/iommu/dma-iommu.c @@ -952,7 +952,7 @@ static void __iommu_dma_free(struct device *dev, size_t size, void *cpu_addr) /* Non-coherent atomic allocation? 
Easy */ if (IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) && - dma_free_from_pool(cpu_addr, alloc_size)) + dma_free_from_pool(dev, cpu_addr, alloc_size)) return; if (IS_ENABLED(CONFIG_DMA_REMAP) && is_vmalloc_addr(cpu_addr)) { @@ -1035,7 +1035,8 @@ static void *iommu_dma_alloc(struct device *dev, size_t size, if (IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) && !gfpflags_allow_blocking(gfp) && !coherent) - cpu_addr = dma_alloc_from_pool(PAGE_ALIGN(size), &page, gfp); + cpu_addr = dma_alloc_from_pool(dev, PAGE_ALIGN(size), &page, + gfp); else cpu_addr = iommu_dma_alloc_pages(dev, size, &page, gfp, attrs); if (!cpu_addr) diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h index 24b8684aa21d..136f984df0d9 100644 --- a/include/linux/dma-direct.h +++ b/include/linux/dma-direct.h @@ -67,6 +67,8 @@ static inline bool dma_capable(struct device *dev, dma_addr_t addr, size_t size, } u64 dma_direct_get_required_mask(struct device *dev); +gfp_t dma_direct_optimal_gfp_mask(struct device *dev, u64 dma_mask, + u64 *phys_mask); void *dma_direct_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle, gfp_t gfp, unsigned long attrs); void dma_direct_free(struct device *dev, size_t size, void *cpu_addr, diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h index 330ad58fbf4d..b43116a6405d 100644 --- a/include/linux/dma-mapping.h +++ b/include/linux/dma-mapping.h @@ -630,9 +630,9 @@ void *dma_common_pages_remap(struct page **pages, size_t size, pgprot_t prot, const void *caller); void dma_common_free_remap(void *cpu_addr, size_t size); -bool dma_in_atomic_pool(void *start, size_t size); -void *dma_alloc_from_pool(size_t size, struct page **ret_page, gfp_t flags); -bool dma_free_from_pool(void *start, size_t size); +void *dma_alloc_from_pool(struct device *dev, size_t size, + struct page **ret_page, gfp_t flags); +bool dma_free_from_pool(struct device *dev, void *start, size_t size); int dma_common_get_sgtable(struct device *dev, struct sg_table *sgt, void 
*cpu_addr, diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c index 8f4bbdaf965e..a834ee22f8ff 100644 --- a/kernel/dma/direct.c +++ b/kernel/dma/direct.c @@ -45,8 +45,8 @@ u64 dma_direct_get_required_mask(struct device *dev) return (1ULL << (fls64(max_dma) - 1)) * 2 - 1; } -static gfp_t __dma_direct_optimal_gfp_mask(struct device *dev, u64 dma_mask, - u64 *phys_limit) +gfp_t dma_direct_optimal_gfp_mask(struct device *dev, u64 dma_mask, + u64 *phys_limit) { u64 dma_limit = min_not_zero(dma_mask, dev->bus_dma_limit); @@ -89,8 +89,8 @@ struct page *__dma_direct_alloc_pages(struct device *dev, size_t size, /* we always manually zero the memory once we are done: */ gfp &= ~__GFP_ZERO; - gfp |= __dma_direct_optimal_gfp_mask(dev, dev->coherent_dma_mask, - &phys_limit); + gfp |= dma_direct_optimal_gfp_mask(dev, dev->coherent_dma_mask, + &phys_limit); page = dma_alloc_contiguous(dev, alloc_size, gfp); if (page && !dma_coherent_ok(dev, page_to_phys(page), size)) { dma_free_contiguous(dev, page, alloc_size); @@ -128,7 +128,7 @@ void *dma_direct_al
Re: [rfc v2 3/6] dma-pool: dynamically expanding atomic pools
On Tue, 14 Apr 2020, Christoph Hellwig wrote: > > I'll rely on Christoph to determine whether it makes sense to add some > > periodic scavenging of the atomic pools, whether that's needed for this to > > be merged, or whether we should enforce some maximum pool size. > > I don't really see the point. In fact the only part of the series > I feel uneasy about is the growing of the pools, because it already > adds a fair amount of complexity that we might not need for simple > things, but shrinking really doesn't make any sense. So I'm tempted > to not ever support shrinking, and even make growing optional code under > a new config variable. We'll also need a way to query the current size > through e.g. a debugfs file. > New debugfs file sounds good, I'll add it. If we want to disable dynamic expansion when the pool is depleted under a new config option, let me know. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [rfc v2 4/6] dma-direct: atomic allocations must come from atomic coherent pools
On Tue, 14 Apr 2020, Christoph Hellwig wrote: > > + /* > > +* Unencrypted memory must come directly from DMA atomic pools if > > +* blocking is not allowed. > > +*/ > > + if (IS_ENABLED(CONFIG_DMA_COHERENT_POOL) && > > + force_dma_unencrypted(dev) && !gfpflags_allow_blocking(gfp)) { > > + ret = dma_alloc_from_pool(dev, PAGE_ALIGN(size), &page, gfp); > > + if (!ret) > > + return NULL; > > + goto done; > > + } > > + > > if (IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) && > > dma_alloc_need_uncached(dev, attrs) && > > !gfpflags_allow_blocking(gfp)) { > > Can we keep a single conditional for the pool allocations? Maybe > add a new dma_alloc_from_pool helper ala: > > static inline bool dma_alloc_from_pool(struct device *dev, gfp_t gfp) > { > if (!IS_ENABLED(CONFIG_DMA_COHERENT_POOL)) > return false; > if (gfpflags_allow_blocking(gfp)) > return false; > if (force_dma_unencrypted(dev)) > return true; > if (dma_alloc_need_uncached(dev)) > return true; > } Looks good, fixed. I renamed it to dma_should_alloc_from_pool() to avoid confusing it with the actual allocation function and added a dma_should_free_from_pool() as well. diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c --- a/kernel/dma/direct.c +++ b/kernel/dma/direct.c @@ -75,6 +75,39 @@ static bool dma_coherent_ok(struct device *dev, phys_addr_t phys, size_t size) min_not_zero(dev->coherent_dma_mask, dev->bus_dma_limit); } +/* + * Decrypting memory is allowed to block, so if this device requires + * unencrypted memory it must come from atomic pools. 
+ */ +static inline bool dma_should_alloc_from_pool(struct device *dev, gfp_t gfp, + unsigned long attrs) +{ + if (!IS_ENABLED(CONFIG_DMA_COHERENT_POOL)) + return false; + if (gfpflags_allow_blocking(gfp)) + return false; + if (force_dma_unencrypted(dev)) + return true; + if (!IS_ENABLED(CONFIG_DMA_DIRECT_REMAP)) + return false; + if (dma_alloc_need_uncached(dev, attrs)) + return true; + return false; +} + +static inline bool dma_should_free_from_pool(struct device *dev, +unsigned long attrs) +{ + if (IS_ENABLED(CONFIG_DMA_COHERENT_POOL)) + return true; + if ((attrs & DMA_ATTR_NO_KERNEL_MAPPING) && + !force_dma_unencrypted(dev)) + return false; + if (IS_ENABLED(CONFIG_DMA_DIRECT_REMAP)) + return true; + return false; +} + struct page *__dma_direct_alloc_pages(struct device *dev, size_t size, gfp_t gfp, unsigned long attrs) { @@ -124,9 +157,7 @@ void *dma_direct_alloc_pages(struct device *dev, size_t size, struct page *page; void *ret; - if (IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) && - dma_alloc_need_uncached(dev, attrs) && - !gfpflags_allow_blocking(gfp)) { + if (dma_should_alloc_from_pool(dev, gfp, attrs)) { ret = dma_alloc_from_pool(dev, PAGE_ALIGN(size), &page, gfp); if (!ret) return NULL; @@ -202,6 +233,11 @@ void dma_direct_free_pages(struct device *dev, size_t size, void *cpu_addr, { unsigned int page_order = get_order(size); + /* If cpu_addr is not from an atomic pool, dma_free_from_pool() fails */ + if (dma_should_free_from_pool(dev, attrs) && + dma_free_from_pool(dev, cpu_addr, PAGE_ALIGN(size))) + return; + if ((attrs & DMA_ATTR_NO_KERNEL_MAPPING) && !force_dma_unencrypted(dev)) { /* cpu_addr is a struct page cookie, not a kernel address */ @@ -209,10 +245,6 @@ void dma_direct_free_pages(struct device *dev, size_t size, void *cpu_addr, return; } - if (IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) && - dma_free_from_pool(dev, cpu_addr, PAGE_ALIGN(size))) - return; - if (force_dma_unencrypted(dev)) set_memory_encrypted((unsigned long)cpu_addr, 1 << page_order); ___ iommu
mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [rfc v2 4/6] dma-direct: atomic allocations must come from atomic coherent pools
On Thu, 9 Apr 2020, Tom Lendacky wrote: > > When a device required unencrypted memory and the context does not allow > > required => requires > Fixed, thanks. > > blocking, memory must be returned from the atomic coherent pools. > > > > This avoids the remap when CONFIG_DMA_DIRECT_REMAP is not enabled and the > > config only requires CONFIG_DMA_COHERENT_POOL. This will be used for > > CONFIG_AMD_MEM_ENCRYPT in a subsequent patch. > > > > Keep all memory in these pools unencrypted. > > > > Signed-off-by: David Rientjes > > --- > > kernel/dma/direct.c | 16 > > kernel/dma/pool.c | 15 +-- > > 2 files changed, 29 insertions(+), 2 deletions(-) > > > > diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c > > index 70800ca64f13..44165263c185 100644 > > --- a/kernel/dma/direct.c > > +++ b/kernel/dma/direct.c > > @@ -124,6 +124,18 @@ void *dma_direct_alloc_pages(struct device *dev, size_t > > size, > > struct page *page; > > void *ret; > > + /* > > +* Unencrypted memory must come directly from DMA atomic pools if > > +* blocking is not allowed. 
> > +*/ > > + if (IS_ENABLED(CONFIG_DMA_COHERENT_POOL) && > > + force_dma_unencrypted(dev) && !gfpflags_allow_blocking(gfp)) { > > + ret = dma_alloc_from_pool(dev, PAGE_ALIGN(size), &page, gfp); > > + if (!ret) > > + return NULL; > > + goto done; > > + } > > + > > if (IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) && > > dma_alloc_need_uncached(dev, attrs) && > > !gfpflags_allow_blocking(gfp)) { > > @@ -203,6 +215,10 @@ void dma_direct_free_pages(struct device *dev, size_t > > size, void *cpu_addr, > > { > > unsigned int page_order = get_order(size); > > + if (IS_ENABLED(CONFIG_DMA_COHERENT_POOL) && > > + dma_free_from_pool(dev, cpu_addr, PAGE_ALIGN(size))) > > + return; > > + > > if ((attrs & DMA_ATTR_NO_KERNEL_MAPPING) && > > !force_dma_unencrypted(dev)) { > > /* cpu_addr is a struct page cookie, not a kernel address */ > > diff --git a/kernel/dma/pool.c b/kernel/dma/pool.c > > index e14c5a2da734..6685ab89cfa7 100644 > > --- a/kernel/dma/pool.c > > +++ b/kernel/dma/pool.c > > @@ -9,6 +9,7 @@ > > #include > > #include > > #include > > +#include > > #include > > #include > > #include > > @@ -55,12 +56,20 @@ static int atomic_pool_expand(struct gen_pool *pool, > > size_t pool_size, > > arch_dma_prep_coherent(page, pool_size); > > +#ifdef CONFIG_DMA_DIRECT_REMAP > > addr = dma_common_contiguous_remap(page, pool_size, > >pgprot_dmacoherent(PAGE_KERNEL), > >__builtin_return_address(0)); > > if (!addr) > > goto free_page; > > - > > +#else > > + addr = page_to_virt(page); > > +#endif > > + /* > > +* Memory in the atomic DMA pools must be unencrypted, the pools do > > not > > +* shrink so no re-encryption occurs in dma_direct_free_pages(). 
> > +*/ > > + set_memory_decrypted((unsigned long)page_to_virt(page), 1 << order); > > ret = gen_pool_add_virt(pool, (unsigned long)addr, page_to_phys(page), > > pool_size, NUMA_NO_NODE); > > if (ret) > > @@ -69,8 +78,10 @@ static int atomic_pool_expand(struct gen_pool *pool, > > size_t pool_size, > > return 0; > > remove_mapping: > > +#ifdef CONFIG_DMA_DIRECT_REMAP > > dma_common_free_remap(addr, pool_size); > > You're about to free the memory, but you've called set_memory_decrypted() > against it, so you need to do a set_memory_encrypted() to bring it back to a > state ready for allocation again. > Ah, good catch, thanks. I notice that I should also be checking the return value of set_memory_decrypted() because pages added to the coherent pools *must* be unencrypted. If it fails, we fail the expansion. And do the same thing for set_memory_encrypted(), which would be a bizarre situation (decrypt succeeded, encrypt failed), by simply leaking the page. diff --git a/kernel/dma/pool.c b/kernel/dma/pool.c --- a/kernel/dma/pool.c +++ b/kernel/dma/pool.c @@ -7,6 +7,7 @@ #include #include #include +#include #include #include #include @@ -53,22 +54,42 @@ static int atomic_pool_expand(struct gen_pool *pool, size_t pool_size, arch_dma_prep_coherent(page, pool_size); +#ifdef CONFIG_DMA_DIRECT_REMAP addr = dma_common_contiguous_remap(page, pool_size, pgprot_dmacoherent(PAGE_KERNEL), __builtin_return_address(0)); if (!addr) goto free_page; - +#else + addr = page_to_virt(page); +#endif + /* +* Memory in the atomic DMA pools must be unencrypted, the pools do not +* shrink so no re-encryption occurs in dma_direct_free_pages(). +*/ + ret = set_memory_decrypted((unsigned long)page_to_virt(page)
Re: [rfc v2 3/6] dma-pool: dynamically expanding atomic pools
On Fri, 10 Apr 2020, Hillf Danton wrote: > > On Wed, 8 Apr 2020 14:21:06 -0700 (PDT) David Rientjes wrote: > > > > When an atomic pool becomes fully depleted because it is now relied upon > > for all non-blocking allocations through the DMA API, allow background > > expansion of each pool by a kworker. > > > > When an atomic pool has less than the default size of memory left, kick > > off a kworker to dynamically expand the pool in the background. The pool > > is doubled in size, up to MAX_ORDER-1. If memory cannot be allocated at > > the requested order, smaller allocation(s) are attempted. > > > What is proposed looks like a path of single lane without how to > dynamically shrink the pool taken into account. Thus the risk may > rise in corner cases where pools are over-expanded in long run > after one-off peak allocation requests. > To us, this is actually a benefit: we prefer the peak size to be maintained so that we do not need to dynamically resize the pool later at the cost of throughput. Genpool also does not have great support for scavenging and freeing unused chunks. Perhaps we could enforce a maximum size on the pools just as we allow the default size to be defined by coherent_pool= on the command line. Our use case would not set this, however, since we have not seen egregious genpool sizes as the result of non-blockable DMA allocations (perhaps the drivers we use just play friendlier and you have seen excessive usage?). I'll rely on Christoph to determine whether it makes sense to add some periodic scavenging of the atomic pools, whether that's needed for this to be merged, or whether we should enforce some maximum pool size. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[rfc v2 0/6] unencrypted atomic DMA pools with dynamic expansion
set_memory_decrypted() may block, so it is not possible to do non-blocking allocations through the DMA API for devices that require unencrypted memory. The solution is to expand the atomic DMA pools for the various possible gfp requirements as a means to prevent an unnecessary depletion of lowmem. These atomic pools are separated from the remap code and can be selected for configurations that need them outside the scope of CONFIG_DMA_DIRECT_REMAP, such as CONFIG_AMD_MEM_ENCRYPT. These atomic DMA pools are kept unencrypted so they can immediately be used for non-blocking allocations. Since the need for this type of memory depends on the kernel config and devices being used, these pools are also dynamically expandable. There are a number of changes to v1 of the patchset based on Christoph's feedback and the separation of the atomic pools out into the new kernel/dma/pool.c. NOTE! I'd like eyes from Christoph specifically on patch 4 in the series where we handle the coherent pools differently depending on CONFIG_DMA_COHERENT_POOL and/or CONFIG_DMA_DIRECT_REMAP and from Thomas on the requirement for set_memory_decrypted() for all DMA coherent pools. Why still an RFC? We are starting to aggressively test this series but since there were a number of changes that were requested for the first RFC, it would be better to have feedback and factor that into the series earlier rather than later so testing can be focused on a series that could be merged upstream. 
This patchset is based on latest Linus HEAD: commit ae46d2aa6a7fbe8ca0946f24b061b6ccdc6c3f25 Author: Hillf Danton Date: Wed Apr 8 11:59:24 2020 -0400 mm/gup: Let __get_user_pages_locked() return -EINTR for fatal signal --- arch/x86/Kconfig| 1 + drivers/iommu/dma-iommu.c | 5 +- include/linux/dma-direct.h | 2 + include/linux/dma-mapping.h | 6 +- kernel/dma/Kconfig | 6 +- kernel/dma/Makefile | 1 + kernel/dma/direct.c | 28 - kernel/dma/pool.c | 224 kernel/dma/remap.c | 114 -- 9 files changed, 261 insertions(+), 126 deletions(-) create mode 100644 kernel/dma/pool.c ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[rfc v2 6/6] dma-pool: scale the default DMA coherent pool size with memory capacity
When AMD memory encryption is enabled, some devices may use more than 256KB/sec from the atomic pools. It would be more appropriate to scale the default size based on memory capacity unless the coherent_pool option is used on the kernel command line. This provides a slight optimization on initial expansion and is deemed appropriate due to the increased reliance on the atomic pools. Note that the default size of 128KB per pool will normally be larger than the single coherent pool implementation since there are now up to three coherent pools (DMA, DMA32, and kernel). Alternatively, this could be done only when CONFIG_AMD_MEM_ENCRYPT is enabled. Signed-off-by: David Rientjes --- kernel/dma/pool.c | 14 -- 1 file changed, 12 insertions(+), 2 deletions(-) diff --git a/kernel/dma/pool.c b/kernel/dma/pool.c index 6685ab89cfa7..42bac953548c 100644 --- a/kernel/dma/pool.c +++ b/kernel/dma/pool.c @@ -18,8 +18,8 @@ static struct gen_pool *atomic_pool_dma __ro_after_init; static struct gen_pool *atomic_pool_dma32 __ro_after_init; static struct gen_pool *atomic_pool_kernel __ro_after_init; -#define DEFAULT_DMA_COHERENT_POOL_SIZE SZ_256K -static size_t atomic_pool_size = DEFAULT_DMA_COHERENT_POOL_SIZE; +/* Size can be defined by the coherent_pool command line */ +static size_t atomic_pool_size; /* Dynamic background expansion when the atomic pool is near capacity */ static struct work_struct atomic_pool_work; @@ -132,6 +132,16 @@ static int __init dma_atomic_pool_init(void) { int ret = 0; + /* +* If coherent_pool was not used on the command line, default the pool +* sizes to 128KB per 1GB of memory, min 128KB, max MAX_ORDER-1. 
+*/ + if (!atomic_pool_size) { + atomic_pool_size = max(totalram_pages() >> PAGE_SHIFT, 1UL) * + SZ_128K; + atomic_pool_size = min_t(size_t, atomic_pool_size, +1 << (PAGE_SHIFT + MAX_ORDER-1)); + } INIT_WORK(&atomic_pool_work, atomic_pool_work_fn); atomic_pool_kernel = __dma_atomic_pool_init(atomic_pool_size, ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[rfc v2 3/6] dma-pool: dynamically expanding atomic pools
When an atomic pool becomes fully depleted because it is now relied upon for all non-blocking allocations through the DMA API, allow background expansion of each pool by a kworker. When an atomic pool has less than the default size of memory left, kick off a kworker to dynamically expand the pool in the background. The pool is doubled in size, up to MAX_ORDER-1. If memory cannot be allocated at the requested order, smaller allocation(s) are attempted. This allows the default size to be kept quite low when one or more of the atomic pools is not used. This also allows __dma_atomic_pool_init to return a pointer to the pool to make initialization cleaner. Also switch over some node ids to the more appropriate NUMA_NO_NODE. Signed-off-by: David Rientjes --- kernel/dma/pool.c | 120 +++--- 1 file changed, 82 insertions(+), 38 deletions(-) diff --git a/kernel/dma/pool.c b/kernel/dma/pool.c index 9b806f5eded8..e14c5a2da734 100644 --- a/kernel/dma/pool.c +++ b/kernel/dma/pool.c @@ -11,13 +11,17 @@ #include #include #include +#include static struct gen_pool *atomic_pool_dma __ro_after_init; static struct gen_pool *atomic_pool_dma32 __ro_after_init; static struct gen_pool *atomic_pool_kernel __ro_after_init; #define DEFAULT_DMA_COHERENT_POOL_SIZE SZ_256K -static size_t atomic_pool_size __initdata = DEFAULT_DMA_COHERENT_POOL_SIZE; +static size_t atomic_pool_size = DEFAULT_DMA_COHERENT_POOL_SIZE; + +/* Dynamic background expansion when the atomic pool is near capacity */ +static struct work_struct atomic_pool_work; static int __init early_coherent_pool(char *p) { @@ -26,76 +30,114 @@ static int __init early_coherent_pool(char *p) } early_param("coherent_pool", early_coherent_pool); -static int __init __dma_atomic_pool_init(struct gen_pool **pool, -size_t pool_size, gfp_t gfp) +static int atomic_pool_expand(struct gen_pool *pool, size_t pool_size, + gfp_t gfp) { - const unsigned int order = get_order(pool_size); - const unsigned long nr_pages = pool_size >> PAGE_SHIFT; + unsigned 
int order; struct page *page; void *addr; - int ret; + int ret = -ENOMEM; + + /* Cannot allocate larger than MAX_ORDER-1 */ + order = min(get_order(pool_size), MAX_ORDER-1); + + do { + pool_size = 1 << (PAGE_SHIFT + order); - if (dev_get_cma_area(NULL)) - page = dma_alloc_from_contiguous(NULL, nr_pages, order, false); - else - page = alloc_pages(gfp, order); + if (dev_get_cma_area(NULL)) + page = dma_alloc_from_contiguous(NULL, 1 << order, +order, false); + else + page = alloc_pages(gfp, order); + } while (!page && order-- > 0); if (!page) goto out; arch_dma_prep_coherent(page, pool_size); - *pool = gen_pool_create(PAGE_SHIFT, -1); - if (!*pool) - goto free_page; - addr = dma_common_contiguous_remap(page, pool_size, pgprot_dmacoherent(PAGE_KERNEL), __builtin_return_address(0)); if (!addr) - goto destroy_genpool; + goto free_page; - ret = gen_pool_add_virt(*pool, (unsigned long)addr, page_to_phys(page), - pool_size, -1); + ret = gen_pool_add_virt(pool, (unsigned long)addr, page_to_phys(page), + pool_size, NUMA_NO_NODE); if (ret) goto remove_mapping; - gen_pool_set_algo(*pool, gen_pool_first_fit_order_align, NULL); - pr_info("DMA: preallocated %zu KiB %pGg pool for atomic allocations\n", - pool_size >> 10, &gfp); return 0; remove_mapping: dma_common_free_remap(addr, pool_size); -destroy_genpool: - gen_pool_destroy(*pool); - *pool = NULL; free_page: - if (!dma_release_from_contiguous(NULL, page, nr_pages)) + if (!dma_release_from_contiguous(NULL, page, 1 << order)) __free_pages(page, order); out: - pr_err("DMA: failed to allocate %zu KiB %pGg pool for atomic allocation\n", - pool_size >> 10, &gfp); - return -ENOMEM; + return ret; +} + +static void atomic_pool_resize(struct gen_pool *pool, gfp_t gfp) +{ + if (pool && gen_pool_avail(pool) < atomic_pool_size) + atomic_pool_expand(pool, gen_pool_size(pool), gfp); +} + +static void atomic_pool_work_fn(struct work_struct *work) +{ + if (IS_ENABLED(CONFIG_ZONE_DMA)) + atomic_pool_resize(atomic_pool_dma, GFP_DMA); + if 
(IS_ENABLED(CONFIG_ZONE_DMA32)) + atomic_pool_resize(atomic_pool_dma32, GFP_DMA32); + atomic_pool_resize(atomic_pool_kernel, GFP_KE
[rfc v2 5/6] x86/mm: unencrypted non-blocking DMA allocations use coherent pools
When CONFIG_AMD_MEM_ENCRYPT is enabled and a device requires unencrypted DMA, all non-blocking allocations must originate from the atomic DMA coherent pools. Select CONFIG_DMA_COHERENT_POOL for CONFIG_AMD_MEM_ENCRYPT. Signed-off-by: David Rientjes --- arch/x86/Kconfig | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 8d078642b4be..b7c9e78a5374 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -1520,6 +1520,7 @@ config X86_CPA_STATISTICS config AMD_MEM_ENCRYPT bool "AMD Secure Memory Encryption (SME) support" depends on X86_64 && CPU_SUP_AMD + select DMA_COHERENT_POOL select DYNAMIC_PHYSICAL_MASK select ARCH_USE_MEMREMAP_PROT select ARCH_HAS_FORCE_DMA_UNENCRYPTED ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[rfc v2 2/6] dma-pool: add additional coherent pools to map to gfp mask
The single atomic pool is allocated from the lowest zone possible since it is guaranteed to be applicable for any DMA allocation. Devices may allocate through the DMA API but not have a strict reliance on GFP_DMA memory. Since the atomic pool will be used for all non-blockable allocations, returning all memory from ZONE_DMA may unnecessarily deplete the zone. Provision for multiple atomic pools that will map to the optimal gfp mask of the device. When allocating non-blockable memory, determine the optimal gfp mask of the device and use the appropriate atomic pool. The coherent DMA mask will remain the same between allocation and free and, thus, memory will be freed to the same atomic pool it was allocated from. __dma_atomic_pool_init() will be changed to return struct gen_pool * later once dynamic expansion is added. Signed-off-by: David Rientjes --- drivers/iommu/dma-iommu.c | 5 +- include/linux/dma-direct.h | 2 + include/linux/dma-mapping.h | 6 +- kernel/dma/direct.c | 12 ++-- kernel/dma/pool.c | 120 +++- 5 files changed, 91 insertions(+), 54 deletions(-) diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c index ba128d1cdaee..4959f5df21bd 100644 --- a/drivers/iommu/dma-iommu.c +++ b/drivers/iommu/dma-iommu.c @@ -952,7 +952,7 @@ static void __iommu_dma_free(struct device *dev, size_t size, void *cpu_addr) /* Non-coherent atomic allocation? 
Easy */ if (IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) && - dma_free_from_pool(cpu_addr, alloc_size)) + dma_free_from_pool(dev, cpu_addr, alloc_size)) return; if (IS_ENABLED(CONFIG_DMA_REMAP) && is_vmalloc_addr(cpu_addr)) { @@ -1035,7 +1035,8 @@ static void *iommu_dma_alloc(struct device *dev, size_t size, if (IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) && !gfpflags_allow_blocking(gfp) && !coherent) - cpu_addr = dma_alloc_from_pool(PAGE_ALIGN(size), &page, gfp); + cpu_addr = dma_alloc_from_pool(dev, PAGE_ALIGN(size), &page, + gfp); else cpu_addr = iommu_dma_alloc_pages(dev, size, &page, gfp, attrs); if (!cpu_addr) diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h index 24b8684aa21d..136f984df0d9 100644 --- a/include/linux/dma-direct.h +++ b/include/linux/dma-direct.h @@ -67,6 +67,8 @@ static inline bool dma_capable(struct device *dev, dma_addr_t addr, size_t size, } u64 dma_direct_get_required_mask(struct device *dev); +gfp_t dma_direct_optimal_gfp_mask(struct device *dev, u64 dma_mask, + u64 *phys_mask); void *dma_direct_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle, gfp_t gfp, unsigned long attrs); void dma_direct_free(struct device *dev, size_t size, void *cpu_addr, diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h index 330ad58fbf4d..b43116a6405d 100644 --- a/include/linux/dma-mapping.h +++ b/include/linux/dma-mapping.h @@ -630,9 +630,9 @@ void *dma_common_pages_remap(struct page **pages, size_t size, pgprot_t prot, const void *caller); void dma_common_free_remap(void *cpu_addr, size_t size); -bool dma_in_atomic_pool(void *start, size_t size); -void *dma_alloc_from_pool(size_t size, struct page **ret_page, gfp_t flags); -bool dma_free_from_pool(void *start, size_t size); +void *dma_alloc_from_pool(struct device *dev, size_t size, + struct page **ret_page, gfp_t flags); +bool dma_free_from_pool(struct device *dev, void *start, size_t size); int dma_common_get_sgtable(struct device *dev, struct sg_table *sgt, void 
*cpu_addr, diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c index a8560052a915..70800ca64f13 100644 --- a/kernel/dma/direct.c +++ b/kernel/dma/direct.c @@ -44,8 +44,8 @@ u64 dma_direct_get_required_mask(struct device *dev) return (1ULL << (fls64(max_dma) - 1)) * 2 - 1; } -static gfp_t __dma_direct_optimal_gfp_mask(struct device *dev, u64 dma_mask, - u64 *phys_limit) +gfp_t dma_direct_optimal_gfp_mask(struct device *dev, u64 dma_mask, + u64 *phys_limit) { u64 dma_limit = min_not_zero(dma_mask, dev->bus_dma_limit); @@ -88,8 +88,8 @@ struct page *__dma_direct_alloc_pages(struct device *dev, size_t size, /* we always manually zero the memory once we are done: */ gfp &= ~__GFP_ZERO; - gfp |= __dma_direct_optimal_gfp_mask(dev, dev->coherent_dma_mask, - &phys_limit); + gfp |= dma_direct_optimal_gfp_mask(dev, dev->coherent_dma_mask, + &phys_limit); page = dma_alloc_contiguous(dev, alloc_size, gfp); if (page && !dma_coherent_ok(dev, page_to_phys(page), size)) { dma_free_contiguous(dev, page, alloc_size); @@ -127,7 +127,7 @@ void *dma_direct_al
[rfc v2 1/6] dma-remap: separate DMA atomic pools from direct remap code
DMA atomic pools will be needed beyond only CONFIG_DMA_DIRECT_REMAP so separate them out into their own file. This also adds a new Kconfig option that can be subsequently used for options, such as CONFIG_AMD_MEM_ENCRYPT, that will utilize the coherent pools but do not have a dependency on direct remapping. For this patch alone, there is no functional change introduced. Signed-off-by: David Rientjes --- kernel/dma/Kconfig | 6 ++- kernel/dma/Makefile | 1 + kernel/dma/pool.c | 125 kernel/dma/remap.c | 114 4 files changed, 131 insertions(+), 115 deletions(-) create mode 100644 kernel/dma/pool.c diff --git a/kernel/dma/Kconfig b/kernel/dma/Kconfig index 4c103a24e380..d006668c0027 100644 --- a/kernel/dma/Kconfig +++ b/kernel/dma/Kconfig @@ -79,10 +79,14 @@ config DMA_REMAP select DMA_NONCOHERENT_MMAP bool -config DMA_DIRECT_REMAP +config DMA_COHERENT_POOL bool select DMA_REMAP +config DMA_DIRECT_REMAP + bool + select DMA_COHERENT_POOL + config DMA_CMA bool "DMA Contiguous Memory Allocator" depends on HAVE_DMA_CONTIGUOUS && CMA diff --git a/kernel/dma/Makefile b/kernel/dma/Makefile index d237cf3dc181..370f63344e9c 100644 --- a/kernel/dma/Makefile +++ b/kernel/dma/Makefile @@ -6,4 +6,5 @@ obj-$(CONFIG_DMA_DECLARE_COHERENT) += coherent.o obj-$(CONFIG_DMA_VIRT_OPS) += virt.o obj-$(CONFIG_DMA_API_DEBUG)+= debug.o obj-$(CONFIG_SWIOTLB) += swiotlb.o +obj-$(CONFIG_DMA_COHERENT_POOL)+= pool.o obj-$(CONFIG_DMA_REMAP)+= remap.o diff --git a/kernel/dma/pool.c b/kernel/dma/pool.c new file mode 100644 index ..64cbe5852cf6 --- /dev/null +++ b/kernel/dma/pool.c @@ -0,0 +1,125 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2012 ARM Ltd. 
+ * Copyright (c) 2014 The Linux Foundation + * Copyright (C) 2020 Google LLC + */ +#include +#include +#include +#include +#include +#include +#include + +static struct gen_pool *atomic_pool __ro_after_init; + +#define DEFAULT_DMA_COHERENT_POOL_SIZE SZ_256K +static size_t atomic_pool_size __initdata = DEFAULT_DMA_COHERENT_POOL_SIZE; + +static int __init early_coherent_pool(char *p) +{ + atomic_pool_size = memparse(p, &p); + return 0; +} +early_param("coherent_pool", early_coherent_pool); + +static gfp_t dma_atomic_pool_gfp(void) +{ + if (IS_ENABLED(CONFIG_ZONE_DMA)) + return GFP_DMA; + if (IS_ENABLED(CONFIG_ZONE_DMA32)) + return GFP_DMA32; + return GFP_KERNEL; +} + +static int __init dma_atomic_pool_init(void) +{ + unsigned int pool_size_order = get_order(atomic_pool_size); + unsigned long nr_pages = atomic_pool_size >> PAGE_SHIFT; + struct page *page; + void *addr; + int ret; + + if (dev_get_cma_area(NULL)) + page = dma_alloc_from_contiguous(NULL, nr_pages, +pool_size_order, false); + else + page = alloc_pages(dma_atomic_pool_gfp(), pool_size_order); + if (!page) + goto out; + + arch_dma_prep_coherent(page, atomic_pool_size); + + atomic_pool = gen_pool_create(PAGE_SHIFT, -1); + if (!atomic_pool) + goto free_page; + + addr = dma_common_contiguous_remap(page, atomic_pool_size, + pgprot_dmacoherent(PAGE_KERNEL), + __builtin_return_address(0)); + if (!addr) + goto destroy_genpool; + + ret = gen_pool_add_virt(atomic_pool, (unsigned long)addr, + page_to_phys(page), atomic_pool_size, -1); + if (ret) + goto remove_mapping; + gen_pool_set_algo(atomic_pool, gen_pool_first_fit_order_align, NULL); + + pr_info("DMA: preallocated %zu KiB pool for atomic allocations\n", + atomic_pool_size / 1024); + return 0; + +remove_mapping: + dma_common_free_remap(addr, atomic_pool_size); +destroy_genpool: + gen_pool_destroy(atomic_pool); + atomic_pool = NULL; +free_page: + if (!dma_release_from_contiguous(NULL, page, nr_pages)) + __free_pages(page, pool_size_order); +out: + pr_err("DMA: 
failed to allocate %zu KiB pool for atomic coherent allocation\n", + atomic_pool_size / 1024); + return -ENOMEM; +} +postcore_initcall(dma_atomic_pool_init); + +bool dma_in_atomic_pool(void *start, size_t size) +{ + if (unlikely(!atomic_pool)) + return false; + + return gen_pool_has_addr(atomic_pool, (unsigned long)start, size); +} + +void *dma_alloc_from_pool(size_t size, struct page **ret_page, gfp_t flags) +{ + unsigned long val; + void *ptr = NULL; + + if (!atomic_pool) { + WARN(1, "coherent pool not initiali
[rfc v2 4/6] dma-direct: atomic allocations must come from atomic coherent pools
When a device requires unencrypted memory and the context does not allow blocking, memory must be returned from the atomic coherent pools. This avoids the remap when CONFIG_DMA_DIRECT_REMAP is not enabled and the config only requires CONFIG_DMA_COHERENT_POOL. This will be used for CONFIG_AMD_MEM_ENCRYPT in a subsequent patch. Keep all memory in these pools unencrypted. Signed-off-by: David Rientjes --- kernel/dma/direct.c | 16 kernel/dma/pool.c | 15 +-- 2 files changed, 29 insertions(+), 2 deletions(-) diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c index 70800ca64f13..44165263c185 100644 --- a/kernel/dma/direct.c +++ b/kernel/dma/direct.c @@ -124,6 +124,18 @@ void *dma_direct_alloc_pages(struct device *dev, size_t size, struct page *page; void *ret; + /* +* Unencrypted memory must come directly from DMA atomic pools if +* blocking is not allowed. +*/ + if (IS_ENABLED(CONFIG_DMA_COHERENT_POOL) && + force_dma_unencrypted(dev) && !gfpflags_allow_blocking(gfp)) { + ret = dma_alloc_from_pool(dev, PAGE_ALIGN(size), &page, gfp); + if (!ret) + return NULL; + goto done; + } + if (IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) && dma_alloc_need_uncached(dev, attrs) && !gfpflags_allow_blocking(gfp)) { @@ -203,6 +215,10 @@ void dma_direct_free_pages(struct device *dev, size_t size, void *cpu_addr, { unsigned int page_order = get_order(size); + if (IS_ENABLED(CONFIG_DMA_COHERENT_POOL) && + dma_free_from_pool(dev, cpu_addr, PAGE_ALIGN(size))) + return; + if ((attrs & DMA_ATTR_NO_KERNEL_MAPPING) && !force_dma_unencrypted(dev)) { /* cpu_addr is a struct page cookie, not a kernel address */ diff --git a/kernel/dma/pool.c b/kernel/dma/pool.c index e14c5a2da734..6685ab89cfa7 100644 --- a/kernel/dma/pool.c +++ b/kernel/dma/pool.c @@ -9,6 +9,7 @@ #include #include #include +#include #include #include #include @@ -55,12 +56,20 @@ static int atomic_pool_expand(struct gen_pool *pool, size_t pool_size, arch_dma_prep_coherent(page, pool_size); +#ifdef CONFIG_DMA_DIRECT_REMAP addr = 
dma_common_contiguous_remap(page, pool_size, pgprot_dmacoherent(PAGE_KERNEL), __builtin_return_address(0)); if (!addr) goto free_page; - +#else + addr = page_to_virt(page); +#endif + /* +* Memory in the atomic DMA pools must be unencrypted, the pools do not +* shrink so no re-encryption occurs in dma_direct_free_pages(). +*/ + set_memory_decrypted((unsigned long)page_to_virt(page), 1 << order); ret = gen_pool_add_virt(pool, (unsigned long)addr, page_to_phys(page), pool_size, NUMA_NO_NODE); if (ret) @@ -69,8 +78,10 @@ static int atomic_pool_expand(struct gen_pool *pool, size_t pool_size, return 0; remove_mapping: +#ifdef CONFIG_DMA_DIRECT_REMAP dma_common_free_remap(addr, pool_size); -free_page: +#endif +free_page: __maybe_unused if (!dma_release_from_contiguous(NULL, page, 1 << order)) __free_pages(page, order); out: ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
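For readers following the control flow, the decision this patch adds can be condensed into a small userspace model. Everything below is an illustrative sketch — the names (`pick_path`, `PATH_*`) and boolean parameters are invented for this example, not kernel API — but it captures the ordering: non-blocking allocations for devices that need unencrypted memory are served from the pre-decrypted atomic pools first, because set_memory_decrypted() may block.

```c
#include <stdbool.h>

/* Illustrative model of the branch ordering in dma_direct_alloc_pages().
 * All names are invented for this sketch. */
enum alloc_path { PATH_ATOMIC_POOL, PATH_REMAP_POOL, PATH_PAGE_ALLOCATOR };

enum alloc_path pick_path(bool coherent_pool, bool direct_remap,
                          bool force_unencrypted, bool need_uncached,
                          bool gfp_allows_blocking)
{
        /* New case: unencrypted memory must come directly from the DMA
         * atomic pools if blocking is not allowed, since decrypting
         * freshly allocated pages could sleep. */
        if (coherent_pool && force_unencrypted && !gfp_allows_blocking)
                return PATH_ATOMIC_POOL;

        /* Pre-existing case: uncached memory that cannot be remapped in a
         * non-blocking context also comes from the (remapped) pool. */
        if (direct_remap && need_uncached && !gfp_allows_blocking)
                return PATH_REMAP_POOL;

        return PATH_PAGE_ALLOCATOR;
}
```

The point of the ordering is that the unencrypted-pool check fires even when CONFIG_DMA_DIRECT_REMAP is disabled, which is exactly what the commit message describes.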
Re: [rfc 5/6] dma-direct: atomic allocations must come from unencrypted pools
On Thu, 5 Mar 2020, Christoph Hellwig wrote: > On Sun, Mar 01, 2020 at 04:05:23PM -0800, David Rientjes wrote: > > When AMD memory encryption is enabled, all non-blocking DMA allocations > > must originate from the atomic pools depending on the device and the gfp > > mask of the allocation. > > > > Keep all memory in these pools unencrypted. > > > > Signed-off-by: David Rientjes > > --- > > arch/x86/Kconfig| 1 + > > kernel/dma/direct.c | 9 - > > kernel/dma/remap.c | 2 ++ > > 3 files changed, 7 insertions(+), 5 deletions(-) > > > > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig > > --- a/arch/x86/Kconfig > > +++ b/arch/x86/Kconfig > > @@ -1523,6 +1523,7 @@ config X86_CPA_STATISTICS > > config AMD_MEM_ENCRYPT > > bool "AMD Secure Memory Encryption (SME) support" > > depends on X86_64 && CPU_SUP_AMD > > + select DMA_DIRECT_REMAP > > I think we need to split the pool from remapping so that we don't drag > in the remap code for x86. > Thanks for the review, Christoph. I can address all the comments that you provided for the series but am hoping to get a clarification on this one depending on how elaborate the change you would prefer. As a preliminary change to this series, I could move the atomic pools and coherent_pool command line to a new kernel/dma/atomic_pools.c file with a new CONFIG_DMA_ATOMIC_POOLS that would get "select"ed by CONFIG_DMA_REMAP and CONFIG_AMD_MEM_ENCRYPT and call into dma_common_contiguous_remap() if we have CONFIG_DMA_DIRECT_REMAP when adding pages to the pool. I think that's what you mean by splitting the pool from remapping, otherwise we still have a full CONFIG_DMA_REMAP dependency here. If you had something else in mind, please let me know. Thanks! > > if (IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) && > > - dma_alloc_need_uncached(dev, attrs) && > > We still need a check here for either uncached or memory encryption. 
> > > @@ -141,6 +142,7 @@ static int atomic_pool_expand(struct gen_pool *pool, > > size_t pool_size, > > if (!addr) > > goto free_page; > > > > + set_memory_decrypted((unsigned long)page_to_virt(page), nr_pages); > > This probably warrants a comment. > > Also I think the infrastructure changes should be split from the x86 > wire up. > ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [rfc 4/6] dma-remap: dynamically expanding atomic pools
On Sun, 1 Mar 2020, David Rientjes wrote: > When an atomic pool becomes fully depleted because it is now relied upon > for all non-blocking allocations through the DMA API, allow background > expansion of each pool by a kworker. > > When an atomic pool has less than the default size of memory left, kick > off a kworker to dynamically expand the pool in the background. The pool > is doubled in size. > > This allows the default size to be kept quite low when one or more of the > atomic pools is not used. > > Also switch over some node ids to the more appropriate NUMA_NO_NODE. > > Signed-off-by: David Rientjes > --- > kernel/dma/remap.c | 79 ++ > 1 file changed, 58 insertions(+), 21 deletions(-) > > diff --git a/kernel/dma/remap.c b/kernel/dma/remap.c > --- a/kernel/dma/remap.c > +++ b/kernel/dma/remap.c > @@ -10,6 +10,7 @@ > #include > #include > #include > +#include > > struct page **dma_common_find_pages(void *cpu_addr) > { > @@ -104,7 +105,10 @@ static struct gen_pool *atomic_pool_dma32 > __ro_after_init; > static struct gen_pool *atomic_pool_normal __ro_after_init; > > #define DEFAULT_DMA_COHERENT_POOL_SIZE SZ_256K > -static size_t atomic_pool_size __initdata = DEFAULT_DMA_COHERENT_POOL_SIZE; > +static size_t atomic_pool_size = DEFAULT_DMA_COHERENT_POOL_SIZE; > + > +/* Dynamic background expansion when the atomic pool is near capacity */ > +struct work_struct atomic_pool_work; > > static int __init early_coherent_pool(char *p) > { > @@ -113,14 +117,14 @@ static int __init early_coherent_pool(char *p) > } > early_param("coherent_pool", early_coherent_pool); > > -static int __init __dma_atomic_pool_init(struct gen_pool **pool, > - size_t pool_size, gfp_t gfp) > +static int atomic_pool_expand(struct gen_pool *pool, size_t pool_size, > + gfp_t gfp) > { > - const unsigned int order = get_order(pool_size); > const unsigned long nr_pages = pool_size >> PAGE_SHIFT; > + const unsigned int order = get_order(pool_size); > struct page *page; > void *addr; > - int ret; > + 
int ret = -ENOMEM; > > if (dev_get_cma_area(NULL)) > page = dma_alloc_from_contiguous(NULL, nr_pages, order, false); There's an issue here if the pool grows too large which would result in order > MAX_ORDER-1. We can fix that by limiting order to MAX_ORDER-1 and doing nr_pages = 1 << order. I should also add support for trying smaller page allocations if our preferred expansion size results in an allocation failure. Other than that, I'll remove the RFC tag and send a refreshed series by the end of the week unless there are other comments or suggestions to factor in. Thanks! > @@ -131,38 +135,67 @@ static int __init __dma_atomic_pool_init(struct > gen_pool **pool, > > arch_dma_prep_coherent(page, pool_size); > > - *pool = gen_pool_create(PAGE_SHIFT, -1); > - if (!*pool) > - goto free_page; > - > addr = dma_common_contiguous_remap(page, pool_size, > pgprot_dmacoherent(PAGE_KERNEL), > __builtin_return_address(0)); > if (!addr) > - goto destroy_genpool; > + goto free_page; > > - ret = gen_pool_add_virt(*pool, (unsigned long)addr, page_to_phys(page), > - pool_size, -1); > + ret = gen_pool_add_virt(pool, (unsigned long)addr, page_to_phys(page), > + pool_size, NUMA_NO_NODE); > if (ret) > goto remove_mapping; > - gen_pool_set_algo(*pool, gen_pool_first_fit_order_align, NULL); > > - pr_info("DMA: preallocated %zu KiB %pGg pool for atomic allocations\n", > - pool_size >> 10, &gfp); > return 0; > > remove_mapping: > dma_common_free_remap(addr, pool_size); > -destroy_genpool: > - gen_pool_destroy(*pool); > - *pool = NULL; > free_page: > if (!dma_release_from_contiguous(NULL, page, nr_pages)) > __free_pages(page, order); > out: > - pr_err("DMA: failed to allocate %zu KiB %pGg pool for atomic > allocation\n", > - atomic_pool_size >> 10, &gfp); > - return -ENOMEM; > + return ret; > +} > + > +static void atomic_pool_resize(struct gen_pool *pool, gfp_t gfp) > +{ > + if (pool && gen_pool_avail(pool) < atomic_pool_size) > + atomic_pool_expand(pool, gen_pool_size(pool), gfp); > +} > + 
> +static void atomic_pool_work_fn(struct work_struct *work) > +{ > + if (IS_ENABLED(CONFIG_ZONE_DMA)) > + atomic_pool_resize(atomic_pool, GFP_DMA); > + if (IS_ENABLED(CONFIG_ZONE_DMA32)) > + atomic_pool_resize(atomic_pool_dma32, GFP_DMA32); > + atomic_pool_resize(atomic_pool_normal, GFP_KERNEL); > +} > + > +static int __init __dma_atomic_pool_init(struct gen_pool **pool, > +
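The MAX_ORDER concern raised above can be sketched numerically. This is a hedged userspace model — `MODEL_PAGE_SHIFT` and `MODEL_MAX_ORDER` are illustrative stand-ins for the kernel's config-dependent values, and `size_to_order()` only approximates get_order() — but it shows the proposed fix: clamp the order, then recompute `nr_pages` from the clamped order rather than from the requested pool size.

```c
#include <stddef.h>

#define MODEL_PAGE_SHIFT 12   /* 4KB pages; illustrative */
#define MODEL_MAX_ORDER  11   /* illustrative; the kernel's limit is config-dependent */

/* Rough equivalent of get_order(): smallest order whose block covers size. */
static unsigned int size_to_order(size_t size)
{
        size_t pages = (size + (1UL << MODEL_PAGE_SHIFT) - 1) >> MODEL_PAGE_SHIFT;
        unsigned int order = 0;

        while ((1UL << order) < pages)
                order++;
        return order;
}

/* The fix discussed above: limit order to MAX_ORDER-1 and derive nr_pages
 * as 1 << order, so the two can never disagree after clamping. */
void clamp_expansion(size_t pool_size, unsigned int *order,
                     unsigned long *nr_pages)
{
        *order = size_to_order(pool_size);
        if (*order > MODEL_MAX_ORDER - 1)
                *order = MODEL_MAX_ORDER - 1;
        *nr_pages = 1UL << *order;
}
```

With these model constants, asking to double a pool past 4MB silently caps the allocation at order 10 (1024 pages) instead of requesting an impossible order from the page allocator.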
[rfc 3/6] dma-remap: wire up the atomic pools depending on gfp mask
When allocating non-blockable memory, determine the optimal gfp mask of the device and use the appropriate atomic pool. The coherent DMA mask will remain the same between allocation and free and, thus, memory will be freed to the same atomic pool it was allocated from. Signed-off-by: David Rientjes --- include/linux/dma-direct.h | 2 ++ kernel/dma/direct.c| 6 +++--- kernel/dma/remap.c | 34 ++ 3 files changed, 31 insertions(+), 11 deletions(-) diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h --- a/include/linux/dma-direct.h +++ b/include/linux/dma-direct.h @@ -67,6 +67,8 @@ static inline bool dma_capable(struct device *dev, dma_addr_t addr, size_t size, } u64 dma_direct_get_required_mask(struct device *dev); +gfp_t dma_direct_optimal_gfp_mask(struct device *dev, u64 dma_mask, + u64 *phys_mask); void *dma_direct_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle, gfp_t gfp, unsigned long attrs); void dma_direct_free(struct device *dev, size_t size, void *cpu_addr, diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c --- a/kernel/dma/direct.c +++ b/kernel/dma/direct.c @@ -44,8 +44,8 @@ u64 dma_direct_get_required_mask(struct device *dev) return (1ULL << (fls64(max_dma) - 1)) * 2 - 1; } -static gfp_t __dma_direct_optimal_gfp_mask(struct device *dev, u64 dma_mask, - u64 *phys_limit) +gfp_t dma_direct_optimal_gfp_mask(struct device *dev, u64 dma_mask, + u64 *phys_limit) { u64 dma_limit = min_not_zero(dma_mask, dev->bus_dma_limit); @@ -88,7 +88,7 @@ struct page *__dma_direct_alloc_pages(struct device *dev, size_t size, /* we always manually zero the memory once we are done: */ gfp &= ~__GFP_ZERO; - gfp |= __dma_direct_optimal_gfp_mask(dev, dev->coherent_dma_mask, + gfp |= dma_direct_optimal_gfp_mask(dev, dev->coherent_dma_mask, &phys_limit); page = dma_alloc_contiguous(dev, alloc_size, gfp); if (page && !dma_coherent_ok(dev, page_to_phys(page), size)) { diff --git a/kernel/dma/remap.c b/kernel/dma/remap.c --- a/kernel/dma/remap.c +++ 
b/kernel/dma/remap.c @@ -188,28 +188,44 @@ static int __init dma_atomic_pool_init(void) } postcore_initcall(dma_atomic_pool_init); +static inline struct gen_pool *dev_to_pool(struct device *dev) +{ + u64 phys_mask; + gfp_t gfp; + + gfp = dma_direct_optimal_gfp_mask(dev, dev->coherent_dma_mask, + &phys_mask); + if (IS_ENABLED(CONFIG_ZONE_DMA) && gfp == GFP_DMA) + return atomic_pool; + if (IS_ENABLED(CONFIG_ZONE_DMA32) && gfp == GFP_DMA32) + return atomic_pool_dma32; + return atomic_pool_normal; +} + static bool dma_in_atomic_pool(struct device *dev, void *start, size_t size) { - if (unlikely(!atomic_pool)) - return false; + struct gen_pool *pool = dev_to_pool(dev); - return gen_pool_has_addr(atomic_pool, (unsigned long)start, size); + if (unlikely(!pool)) + return false; + return gen_pool_has_addr(pool, (unsigned long)start, size); } void *dma_alloc_from_pool(struct device *dev, size_t size, struct page **ret_page, gfp_t flags) { + struct gen_pool *pool = dev_to_pool(dev); unsigned long val; void *ptr = NULL; - if (!atomic_pool) { - WARN(1, "coherent pool not initialised!\n"); + if (!pool) { + WARN(1, "%pGg atomic pool not initialised!\n", &flags); return NULL; } - val = gen_pool_alloc(atomic_pool, size); + val = gen_pool_alloc(pool, size); if (val) { - phys_addr_t phys = gen_pool_virt_to_phys(atomic_pool, val); + phys_addr_t phys = gen_pool_virt_to_phys(pool, val); *ret_page = pfn_to_page(__phys_to_pfn(phys)); ptr = (void *)val; @@ -221,9 +237,11 @@ void *dma_alloc_from_pool(struct device *dev, size_t size, bool dma_free_from_pool(struct device *dev, void *start, size_t size) { + struct gen_pool *pool = dev_to_pool(dev); + if (!dma_in_atomic_pool(dev, start, size)) return false; - gen_pool_free(atomic_pool, (unsigned long)start, size); + gen_pool_free(pool, (unsigned long)start, size); return true; } #endif /* CONFIG_DMA_DIRECT_REMAP */ ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
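The correctness of this patch hinges on dev_to_pool() being a pure function of the device, so that dma_free_from_pool() lands in the same pool dma_alloc_from_pool() drew from. A hedged model of that mapping is below: the mask thresholds follow the conventional x86 zone limits (16MB for ZONE_DMA, 4GB for ZONE_DMA32) and are assumptions for illustration — the real dma_direct_optimal_gfp_mask() also folds in bus_dma_limit and physical address offsets.

```c
#include <stdbool.h>
#include <stdint.h>

enum model_pool { MODEL_POOL_DMA, MODEL_POOL_DMA32, MODEL_POOL_NORMAL };

/* Sketch of dev_to_pool(): derive the pool purely from the device's
 * coherent DMA mask, so alloc and free always agree on the pool.
 * Thresholds are illustrative x86 zone limits, not kernel constants. */
enum model_pool dev_to_pool(uint64_t coherent_dma_mask,
                            bool have_zone_dma, bool have_zone_dma32)
{
        if (have_zone_dma && coherent_dma_mask <= 0xFFFFFFULL)     /* <=16MB */
                return MODEL_POOL_DMA;
        if (have_zone_dma32 && coherent_dma_mask <= 0xFFFFFFFFULL) /* <=4GB */
                return MODEL_POOL_DMA32;
        return MODEL_POOL_NORMAL;
}
```

Because the device's coherent_dma_mask does not change between allocation and free, calling this twice with the same device state yields the same pool — the invariant the commit message relies on.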
[rfc 4/6] dma-remap: dynamically expanding atomic pools
When an atomic pool becomes fully depleted because it is now relied upon for all non-blocking allocations through the DMA API, allow background expansion of each pool by a kworker. When an atomic pool has less than the default size of memory left, kick off a kworker to dynamically expand the pool in the background. The pool is doubled in size. This allows the default size to be kept quite low when one or more of the atomic pools is not used. Also switch over some node ids to the more appropriate NUMA_NO_NODE. Signed-off-by: David Rientjes --- kernel/dma/remap.c | 79 ++ 1 file changed, 58 insertions(+), 21 deletions(-) diff --git a/kernel/dma/remap.c b/kernel/dma/remap.c --- a/kernel/dma/remap.c +++ b/kernel/dma/remap.c @@ -10,6 +10,7 @@ #include #include #include +#include struct page **dma_common_find_pages(void *cpu_addr) { @@ -104,7 +105,10 @@ static struct gen_pool *atomic_pool_dma32 __ro_after_init; static struct gen_pool *atomic_pool_normal __ro_after_init; #define DEFAULT_DMA_COHERENT_POOL_SIZE SZ_256K -static size_t atomic_pool_size __initdata = DEFAULT_DMA_COHERENT_POOL_SIZE; +static size_t atomic_pool_size = DEFAULT_DMA_COHERENT_POOL_SIZE; + +/* Dynamic background expansion when the atomic pool is near capacity */ +struct work_struct atomic_pool_work; static int __init early_coherent_pool(char *p) { @@ -113,14 +117,14 @@ static int __init early_coherent_pool(char *p) } early_param("coherent_pool", early_coherent_pool); -static int __init __dma_atomic_pool_init(struct gen_pool **pool, -size_t pool_size, gfp_t gfp) +static int atomic_pool_expand(struct gen_pool *pool, size_t pool_size, + gfp_t gfp) { - const unsigned int order = get_order(pool_size); const unsigned long nr_pages = pool_size >> PAGE_SHIFT; + const unsigned int order = get_order(pool_size); struct page *page; void *addr; - int ret; + int ret = -ENOMEM; if (dev_get_cma_area(NULL)) page = dma_alloc_from_contiguous(NULL, nr_pages, order, false); @@ -131,38 +135,67 @@ static int __init 
__dma_atomic_pool_init(struct gen_pool **pool, arch_dma_prep_coherent(page, pool_size); - *pool = gen_pool_create(PAGE_SHIFT, -1); - if (!*pool) - goto free_page; - addr = dma_common_contiguous_remap(page, pool_size, pgprot_dmacoherent(PAGE_KERNEL), __builtin_return_address(0)); if (!addr) - goto destroy_genpool; + goto free_page; - ret = gen_pool_add_virt(*pool, (unsigned long)addr, page_to_phys(page), - pool_size, -1); + ret = gen_pool_add_virt(pool, (unsigned long)addr, page_to_phys(page), + pool_size, NUMA_NO_NODE); if (ret) goto remove_mapping; - gen_pool_set_algo(*pool, gen_pool_first_fit_order_align, NULL); - pr_info("DMA: preallocated %zu KiB %pGg pool for atomic allocations\n", - pool_size >> 10, &gfp); return 0; remove_mapping: dma_common_free_remap(addr, pool_size); -destroy_genpool: - gen_pool_destroy(*pool); - *pool = NULL; free_page: if (!dma_release_from_contiguous(NULL, page, nr_pages)) __free_pages(page, order); out: - pr_err("DMA: failed to allocate %zu KiB %pGg pool for atomic allocation\n", - atomic_pool_size >> 10, &gfp); - return -ENOMEM; + return ret; +} + +static void atomic_pool_resize(struct gen_pool *pool, gfp_t gfp) +{ + if (pool && gen_pool_avail(pool) < atomic_pool_size) + atomic_pool_expand(pool, gen_pool_size(pool), gfp); +} + +static void atomic_pool_work_fn(struct work_struct *work) +{ + if (IS_ENABLED(CONFIG_ZONE_DMA)) + atomic_pool_resize(atomic_pool, GFP_DMA); + if (IS_ENABLED(CONFIG_ZONE_DMA32)) + atomic_pool_resize(atomic_pool_dma32, GFP_DMA32); + atomic_pool_resize(atomic_pool_normal, GFP_KERNEL); +} + +static int __init __dma_atomic_pool_init(struct gen_pool **pool, +size_t pool_size, gfp_t gfp) +{ + int ret; + + *pool = gen_pool_create(PAGE_SHIFT, NUMA_NO_NODE); + if (!*pool) + return -ENOMEM; + + gen_pool_set_algo(*pool, gen_pool_first_fit_order_align, NULL); + + ret = atomic_pool_expand(*pool, pool_size, gfp); + if (ret) { + gen_pool_destroy(*pool); + *pool = NULL; + pr_err("DMA: failed to allocate %zu KiB %pGg pool for 
atomic allocation\n", + atomic_pool_size >> 10, &gfp); + return ret; + } + + + pr_info("DMA: preallocated %zu KiB %pGg pool for atomic allocations\n", +
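The resize policy in atomic_pool_resize() above is compact enough to restate as a model: a pool is expanded whenever its available memory drops below the default pool size, and the expansion amount is the pool's current total size — i.e. the pool doubles. The names below are invented for this sketch; `gen_pool_avail()`/`gen_pool_size()` are what the real code consults.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_DEFAULT_POOL_SIZE (256UL * 1024)  /* 256KB default, as in the patch */

/* atomic_pool_resize() trigger: expand when less than one default-size
 * worth of memory remains available in the pool. */
bool pool_needs_expand(size_t avail)
{
        return avail < MODEL_DEFAULT_POOL_SIZE;
}

/* atomic_pool_expand() is passed gen_pool_size(pool) as the amount to
 * grow by, so each successful expansion doubles the pool's total size. */
size_t pool_size_after_expand(size_t current_total)
{
        return current_total * 2;
}
```

This keeps the initial footprint at 256KB per pool while letting heavily used pools grow geometrically in the background kworker.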
[rfc 6/6] dma-remap: double the default DMA coherent pool size
When AMD memory encryption is enabled, some devices may use more than 256KB/sec from the atomic pools. Double the default size to make the original size and expansion more appropriate.

This provides a slight optimization on initial expansion and is deemed appropriate for all configs with CONFIG_DMA_REMAP enabled because of the increased reliance on the atomic pools. Alternatively, this could be done only when CONFIG_AMD_MEM_ENCRYPT is enabled.

Signed-off-by: David Rientjes
---
 kernel/dma/remap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/dma/remap.c b/kernel/dma/remap.c
--- a/kernel/dma/remap.c
+++ b/kernel/dma/remap.c
@@ -105,7 +105,7 @@ static struct gen_pool *atomic_pool __ro_after_init;
 static struct gen_pool *atomic_pool_dma32 __ro_after_init;
 static struct gen_pool *atomic_pool_normal __ro_after_init;
 
-#define DEFAULT_DMA_COHERENT_POOL_SIZE	SZ_256K
+#define DEFAULT_DMA_COHERENT_POOL_SIZE	SZ_512K
 static size_t atomic_pool_size = DEFAULT_DMA_COHERENT_POOL_SIZE;
 
 /* Dynamic background expansion when the atomic pool is near capacity */
___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[rfc 5/6] dma-direct: atomic allocations must come from unencrypted pools
When AMD memory encryption is enabled, all non-blocking DMA allocations must originate from the atomic pools depending on the device and the gfp mask of the allocation. Keep all memory in these pools unencrypted. Signed-off-by: David Rientjes --- arch/x86/Kconfig| 1 + kernel/dma/direct.c | 9 - kernel/dma/remap.c | 2 ++ 3 files changed, 7 insertions(+), 5 deletions(-) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -1523,6 +1523,7 @@ config X86_CPA_STATISTICS config AMD_MEM_ENCRYPT bool "AMD Secure Memory Encryption (SME) support" depends on X86_64 && CPU_SUP_AMD + select DMA_DIRECT_REMAP select DYNAMIC_PHYSICAL_MASK select ARCH_USE_MEMREMAP_PROT select ARCH_HAS_FORCE_DMA_UNENCRYPTED diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c --- a/kernel/dma/direct.c +++ b/kernel/dma/direct.c @@ -125,7 +125,6 @@ void *dma_direct_alloc_pages(struct device *dev, size_t size, void *ret; if (IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) && - dma_alloc_need_uncached(dev, attrs) && !gfpflags_allow_blocking(gfp)) { ret = dma_alloc_from_pool(dev, PAGE_ALIGN(size), &page, gfp); if (!ret) @@ -202,6 +201,10 @@ void dma_direct_free_pages(struct device *dev, size_t size, void *cpu_addr, { unsigned int page_order = get_order(size); + if (IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) && + dma_free_from_pool(dev, cpu_addr, PAGE_ALIGN(size))) + return; + if ((attrs & DMA_ATTR_NO_KERNEL_MAPPING) && !force_dma_unencrypted(dev)) { /* cpu_addr is a struct page cookie, not a kernel address */ @@ -209,10 +212,6 @@ void dma_direct_free_pages(struct device *dev, size_t size, void *cpu_addr, return; } - if (IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) && - dma_free_from_pool(dev, cpu_addr, PAGE_ALIGN(size))) - return; - if (force_dma_unencrypted(dev)) set_memory_encrypted((unsigned long)cpu_addr, 1 << page_order); diff --git a/kernel/dma/remap.c b/kernel/dma/remap.c --- a/kernel/dma/remap.c +++ b/kernel/dma/remap.c @@ -8,6 +8,7 @@ #include #include #include +#include 
#include #include #include @@ -141,6 +142,7 @@ static int atomic_pool_expand(struct gen_pool *pool, size_t pool_size, if (!addr) goto free_page; + set_memory_decrypted((unsigned long)page_to_virt(page), nr_pages); ret = gen_pool_add_virt(pool, (unsigned long)addr, page_to_phys(page), pool_size, NUMA_NO_NODE); if (ret) ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[rfc 0/6] unencrypted atomic DMA pools with dynamic expansion
set_memory_decrypted() may block, so it is not possible to do non-blocking allocations through the DMA API for devices that require unencrypted memory. The solution is to expand the atomic DMA pools for the various possible gfp requirements as a means to prevent an unnecessary depletion of lowmem.

These atomic DMA pools are kept unencrypted so they can immediately be used for non-blocking allocations. Since the need for this type of memory depends on the kernel config and devices being used, these pools are also dynamically expandable.

Some of these patches could be squashed into each other, but they were separated out to ease code review.

This patchset is based on 5.6-rc4.
---
 arch/x86/Kconfig            |   1 +
 drivers/iommu/dma-iommu.c   |   5 +-
 include/linux/dma-direct.h  |   2 +
 include/linux/dma-mapping.h |   6 +-
 kernel/dma/direct.c         |  17 ++--
 kernel/dma/remap.c          | 173 +---
 6 files changed, 140 insertions(+), 64 deletions(-)
___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
[rfc 2/6] dma-remap: add additional atomic pools to map to gfp mask
The single atomic pool is allocated from the lowest zone possible since it is guaranteed to be applicable for any DMA allocation. Devices may allocate through the DMA API but not have a strict reliance on GFP_DMA memory. Since the atomic pool will be used for all non-blockable allocations, returning all memory from ZONE_DMA may unnecessarily deplete the zone. Provision for multiple atomic pools that will map to the optimal gfp mask of the device. These will be wired up in a subsequent patch. Signed-off-by: David Rientjes --- kernel/dma/remap.c | 75 +++--- 1 file changed, 45 insertions(+), 30 deletions(-) diff --git a/kernel/dma/remap.c b/kernel/dma/remap.c --- a/kernel/dma/remap.c +++ b/kernel/dma/remap.c @@ -100,6 +100,8 @@ void dma_common_free_remap(void *cpu_addr, size_t size) #ifdef CONFIG_DMA_DIRECT_REMAP static struct gen_pool *atomic_pool __ro_after_init; +static struct gen_pool *atomic_pool_dma32 __ro_after_init; +static struct gen_pool *atomic_pool_normal __ro_after_init; #define DEFAULT_DMA_COHERENT_POOL_SIZE SZ_256K static size_t atomic_pool_size __initdata = DEFAULT_DMA_COHERENT_POOL_SIZE; @@ -111,66 +113,79 @@ static int __init early_coherent_pool(char *p) } early_param("coherent_pool", early_coherent_pool); -static gfp_t dma_atomic_pool_gfp(void) +static int __init __dma_atomic_pool_init(struct gen_pool **pool, +size_t pool_size, gfp_t gfp) { - if (IS_ENABLED(CONFIG_ZONE_DMA)) - return GFP_DMA; - if (IS_ENABLED(CONFIG_ZONE_DMA32)) - return GFP_DMA32; - return GFP_KERNEL; -} - -static int __init dma_atomic_pool_init(void) -{ - unsigned int pool_size_order = get_order(atomic_pool_size); - unsigned long nr_pages = atomic_pool_size >> PAGE_SHIFT; + const unsigned int order = get_order(pool_size); + const unsigned long nr_pages = pool_size >> PAGE_SHIFT; struct page *page; void *addr; int ret; if (dev_get_cma_area(NULL)) - page = dma_alloc_from_contiguous(NULL, nr_pages, -pool_size_order, false); + page = dma_alloc_from_contiguous(NULL, nr_pages, order, 
false); else - page = alloc_pages(dma_atomic_pool_gfp(), pool_size_order); + page = alloc_pages(gfp, order); if (!page) goto out; - arch_dma_prep_coherent(page, atomic_pool_size); + arch_dma_prep_coherent(page, pool_size); - atomic_pool = gen_pool_create(PAGE_SHIFT, -1); - if (!atomic_pool) + *pool = gen_pool_create(PAGE_SHIFT, -1); + if (!*pool) goto free_page; - addr = dma_common_contiguous_remap(page, atomic_pool_size, + addr = dma_common_contiguous_remap(page, pool_size, pgprot_dmacoherent(PAGE_KERNEL), __builtin_return_address(0)); if (!addr) goto destroy_genpool; - ret = gen_pool_add_virt(atomic_pool, (unsigned long)addr, - page_to_phys(page), atomic_pool_size, -1); + ret = gen_pool_add_virt(*pool, (unsigned long)addr, page_to_phys(page), + pool_size, -1); if (ret) goto remove_mapping; - gen_pool_set_algo(atomic_pool, gen_pool_first_fit_order_align, NULL); + gen_pool_set_algo(*pool, gen_pool_first_fit_order_align, NULL); - pr_info("DMA: preallocated %zu KiB pool for atomic allocations\n", - atomic_pool_size / 1024); + pr_info("DMA: preallocated %zu KiB %pGg pool for atomic allocations\n", + pool_size >> 10, &gfp); return 0; remove_mapping: - dma_common_free_remap(addr, atomic_pool_size); + dma_common_free_remap(addr, pool_size); destroy_genpool: - gen_pool_destroy(atomic_pool); - atomic_pool = NULL; + gen_pool_destroy(*pool); + *pool = NULL; free_page: if (!dma_release_from_contiguous(NULL, page, nr_pages)) - __free_pages(page, pool_size_order); + __free_pages(page, order); out: - pr_err("DMA: failed to allocate %zu KiB pool for atomic coherent allocation\n", - atomic_pool_size / 1024); + pr_err("DMA: failed to allocate %zu KiB %pGg pool for atomic allocation\n", + atomic_pool_size >> 10, &gfp); return -ENOMEM; } + +static int __init dma_atomic_pool_init(void) +{ + int ret = 0; + int err; + + ret = __dma_atomic_pool_init(&atomic_pool_normal, atomic_pool_size, +GFP_KERNEL); + if (IS_ENABLED(CONFIG_ZONE_DMA)) { + err = __dma_atomic_pool_init(&atomic_pool, 
atomic_pool_size, +GFP_DMA); + if (!ret && err) +
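The multi-pool initialization sketched at the end of this patch follows a "try everything, report the first failure" pattern: the GFP_KERNEL pool is always created, and the GFP_DMA/GFP_DMA32 pools are created only when the corresponding zone is configured. A hedged userspace model (the function names and the `pool_init_fn` indirection are invented; `init_one` stands in for `__dma_atomic_pool_init()`):

```c
#include <errno.h>
#include <stdbool.h>

/* Zone ids for this sketch: 0 = normal/GFP_KERNEL, 1 = GFP_DMA, 2 = GFP_DMA32. */
typedef int (*pool_init_fn)(int zone_id);

/* Model of dma_atomic_pool_init(): attempt every configured pool even if an
 * earlier one failed, but return the first error encountered. */
int init_all_pools(pool_init_fn init_one, bool have_dma, bool have_dma32)
{
        int ret = init_one(0);  /* the GFP_KERNEL pool always exists */
        int err;

        if (have_dma) {
                err = init_one(1);
                if (!ret && err)
                        ret = err;
        }
        if (have_dma32) {
                err = init_one(2);
                if (!ret && err)
                        ret = err;
        }
        return ret;
}

/* Example stubs for exercising the logic. */
static int stub_ok(int zone) { (void)zone; return 0; }
static int stub_fail_dma(int zone) { return zone == 1 ? -ENOMEM : 0; }
```

Continuing past a failed pool matters because a usable subset of pools is still better than none: a device whose mask maps to a missing pool fails its allocation, but other devices keep working.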
[rfc 1/6] dma-mapping: pass device to atomic allocation functions
This augments the dma_{alloc,free}_from_pool() functions with a pointer to the struct device of the allocation. This introduces no functional change and will be used later to determine the optimal gfp mask to allocate memory from. dma_in_atomic_pool() is not used outside kernel/dma/remap.c, so remove its declaration from the header file. Signed-off-by: David Rientjes --- drivers/iommu/dma-iommu.c | 5 +++-- include/linux/dma-mapping.h | 6 +++--- kernel/dma/direct.c | 4 ++-- kernel/dma/remap.c | 9 + 4 files changed, 13 insertions(+), 11 deletions(-) diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c --- a/drivers/iommu/dma-iommu.c +++ b/drivers/iommu/dma-iommu.c @@ -952,7 +952,7 @@ static void __iommu_dma_free(struct device *dev, size_t size, void *cpu_addr) /* Non-coherent atomic allocation? Easy */ if (IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) && - dma_free_from_pool(cpu_addr, alloc_size)) + dma_free_from_pool(dev, cpu_addr, alloc_size)) return; if (IS_ENABLED(CONFIG_DMA_REMAP) && is_vmalloc_addr(cpu_addr)) { @@ -1035,7 +1035,8 @@ static void *iommu_dma_alloc(struct device *dev, size_t size, if (IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) && !gfpflags_allow_blocking(gfp) && !coherent) - cpu_addr = dma_alloc_from_pool(PAGE_ALIGN(size), &page, gfp); + cpu_addr = dma_alloc_from_pool(dev, PAGE_ALIGN(size), &page, + gfp); else cpu_addr = iommu_dma_alloc_pages(dev, size, &page, gfp, attrs); if (!cpu_addr) diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h --- a/include/linux/dma-mapping.h +++ b/include/linux/dma-mapping.h @@ -630,9 +630,9 @@ void *dma_common_pages_remap(struct page **pages, size_t size, pgprot_t prot, const void *caller); void dma_common_free_remap(void *cpu_addr, size_t size); -bool dma_in_atomic_pool(void *start, size_t size); -void *dma_alloc_from_pool(size_t size, struct page **ret_page, gfp_t flags); -bool dma_free_from_pool(void *start, size_t size); +void *dma_alloc_from_pool(struct device *dev, size_t size, + struct page 
**ret_page, gfp_t flags); +bool dma_free_from_pool(struct device *dev, void *start, size_t size); int dma_common_get_sgtable(struct device *dev, struct sg_table *sgt, void *cpu_addr, diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c --- a/kernel/dma/direct.c +++ b/kernel/dma/direct.c @@ -127,7 +127,7 @@ void *dma_direct_alloc_pages(struct device *dev, size_t size, if (IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) && dma_alloc_need_uncached(dev, attrs) && !gfpflags_allow_blocking(gfp)) { - ret = dma_alloc_from_pool(PAGE_ALIGN(size), &page, gfp); + ret = dma_alloc_from_pool(dev, PAGE_ALIGN(size), &page, gfp); if (!ret) return NULL; goto done; @@ -210,7 +210,7 @@ void dma_direct_free_pages(struct device *dev, size_t size, void *cpu_addr, } if (IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) && - dma_free_from_pool(cpu_addr, PAGE_ALIGN(size))) + dma_free_from_pool(dev, cpu_addr, PAGE_ALIGN(size))) return; if (force_dma_unencrypted(dev)) diff --git a/kernel/dma/remap.c b/kernel/dma/remap.c --- a/kernel/dma/remap.c +++ b/kernel/dma/remap.c @@ -173,7 +173,7 @@ static int __init dma_atomic_pool_init(void) } postcore_initcall(dma_atomic_pool_init); -bool dma_in_atomic_pool(void *start, size_t size) +static bool dma_in_atomic_pool(struct device *dev, void *start, size_t size) { if (unlikely(!atomic_pool)) return false; @@ -181,7 +181,8 @@ bool dma_in_atomic_pool(void *start, size_t size) return gen_pool_has_addr(atomic_pool, (unsigned long)start, size); } -void *dma_alloc_from_pool(size_t size, struct page **ret_page, gfp_t flags) +void *dma_alloc_from_pool(struct device *dev, size_t size, + struct page **ret_page, gfp_t flags) { unsigned long val; void *ptr = NULL; @@ -203,9 +204,9 @@ void *dma_alloc_from_pool(size_t size, struct page **ret_page, gfp_t flags) return ptr; } -bool dma_free_from_pool(void *start, size_t size) +bool dma_free_from_pool(struct device *dev, void *start, size_t size) { - if (!dma_in_atomic_pool(start, size)) + if (!dma_in_atomic_pool(dev, start, size)) return 
false; gen_pool_free(atomic_pool, (unsigned long)start, size); return true; ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [rfc] dma-mapping: preallocate unencrypted DMA atomic pool
On Fri, 17 Jan 2020, Tom Lendacky wrote:

> On 12/31/19 7:54 PM, David Rientjes wrote:
> > Christoph, Thomas, is something like this (without the diagnostic
> > information included in this patch) acceptable for these allocations?
> > Adding expansion support when the pool is half depleted wouldn't be *that*
> > hard.
>
> Sorry for the delay in responding... Overall, I think this will work
> well. If you want the default size to be adjustable, you can always go
> with a Kconfig setting or a command line parameter or both (I realize the
> command line parameter is not optimal).

Ok, so we've determined that we don't only have a dependency on GFP_DMA memory through the DMA API in a non-blocking context that needs to be unencrypted, but also GFP_KERNEL. We don't have a dependency on GFP_DMA32 memory (yet) but should likely provision for it.

So my patch would change by actually providing three separate pools, one for ZONE_DMA, one for ZONE_DMA32, and one for ZONE_NORMAL. The ZONE_DMA pool already exists in the form of the atomic_pool, so it would add two additional pools that likely start out at the same size and dynamically expand with a kworker when their usage approaches the limitations of the pool. I don't necessarily like needing three separate pools, but I can't think of a better way to provide unencrypted memory for non-blocking allocations that works for all possible device gfp masks.

My idea right now is to create all three pools instead of the single atomic_pool, all 256KB in size, and anytime their usage reaches half their limit, we kick off some background work to double the size of the pool with GFP_KERNEL | __GFP_NORETRY. Our experimentation suggests that a kworker would actually respond in time for this.

Any objections to this approach?
If not, an alternative suggestion would be appreciated :) I plan on all atomic pools being unencrypted at the time the allocation is successful, unless there is some strange need for non-blocking atomic allocations through the DMA API that should *not* be unencrypted.

> Just a couple of overall comments about the use of variable names and
> messages using both unencrypted and encrypted, I think the use should be
> consistent throughout the patch.
>
> Thanks,
> Tom
>
> > Or are there alternatives we should consider? Thanks!
> >
> > When AMD SEV is enabled in the guest, all allocations through
> > dma_pool_alloc_page() must call set_memory_decrypted() for unencrypted
> > DMA. This includes dma_pool_alloc() and dma_direct_alloc_pages(). These
> > calls may block which is not allowed in atomic allocation contexts such as
> > from the NVMe driver.
> >
> > Preallocate a complementary unencrypted DMA atomic pool that is initially
> > 4MB in size. This patch does not contain dynamic expansion, but that
> > could be added if necessary.
> >
> > In our stress testing, our peak unencrypted DMA atomic allocation
> > requirement is ~1.4MB, so 4MB is plenty. This pool is similar to the
> > existing DMA atomic pool but is unencrypted.
> >
> > Signed-off-by: David Rientjes
> > ---
> > Based on v5.4 HEAD.
> >
> > This commit contains diagnostic information and is not intended for use
> > in a production environment.
> >
> >  arch/x86/Kconfig            |   1 +
> >  drivers/iommu/dma-iommu.c   |   5 +-
> >  include/linux/dma-mapping.h |   7 ++-
> >  kernel/dma/direct.c         |  16 -
> >  kernel/dma/remap.c          | 116 ++--
> >  5 files changed, 108 insertions(+), 37 deletions(-)
> >
> > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> > --- a/arch/x86/Kconfig
> > +++ b/arch/x86/Kconfig
> > @@ -1530,6 +1530,7 @@ config X86_CPA_STATISTICS
> >  config AMD_MEM_ENCRYPT
> >  	bool "AMD Secure Memory Encryption (SME) support"
> >  	depends on X86_64 && CPU_SUP_AMD
> > +	select DMA_DIRECT_REMAP
> >  	select DYNAMIC_PHYSICAL_MASK
> >  	select ARCH_USE_MEMREMAP_PROT
> >  	select ARCH_HAS_FORCE_DMA_UNENCRYPTED
> > diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> > --- a/drivers/iommu/dma-iommu.c
> > +++ b/drivers/iommu/dma-iommu.c
> > @@ -928,7 +928,7 @@ static void __iommu_dma_free(struct device *dev, size_t size, void *cpu_addr)
> >
> >  	/* Non-coherent atomic allocation?  Easy */
> >  	if (IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) &&
> > -	    dma_free_from_pool(cpu_addr, alloc_size))
> > +	    dma_free_from_pool(dev, cpu_addr, alloc_size))
> >  		return;
> >
> >  	if (IS_ENABLED(CONFIG_DMA_REMAP) && is_vmalloc_addr(cpu_addr)) {
> > @@ -1011,7 +1011,8 @@ static void *iommu_dma_alloc(struct device *dev, size_t size,
> >
> >  	if (IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) &&
> >  	    !gfpflags_allow_blocking(gfp) && !coherent)
> > -		cpu_addr = dma_alloc_from_pool(PAGE_ALIGN(size), &page, gfp);
> > +		cpu_addr = dma_alloc_from_pool(dev, PAGE_ALIGN(size), &page,
> > +
Re: [rfc] dma-mapping: preallocate unencrypted DMA atomic pool
On Thu, 9 Jan 2020, Christoph Hellwig wrote:

> > I'll rely on Thomas to chime in if this doesn't make sense for the SEV
> > usecase.
> >
> > I think the sizing of the single atomic pool needs to be determined. Our
> > peak usage that we have measured from NVMe is ~1.4MB and atomic_pool is
> > currently sized to 256KB by default. I'm unsure at this time if we need
> > to be able to dynamically expand this pool with a kworker.
> >
> > Maybe when CONFIG_AMD_MEM_ENCRYPT is enabled this atomic pool should be
> > sized to 2MB or so and then when it reaches half capacity we schedule some
> > background work to dynamically increase it? That wouldn't be hard unless
> > the pool can be rapidly depleted.
>
> Note that a non-coherent architecture with the same workload would need
> the same size.
>
> > Do we want to increase the atomic pool size by default and then do
> > background dynamic expansion?
>
> For now I'd just scale with system memory size.
>

Thanks Christoph and Robin for the help, we're running some additional stress tests to double check that our required amount of memory from this pool is accurate. Once that's done, I'll refresh the patch with the suggestions and propose it formally.

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [rfc] dma-mapping: preallocate unencrypted DMA atomic pool
On Tue, 7 Jan 2020, Christoph Hellwig wrote:

> > On 01/01/2020 1:54 am, David Rientjes via iommu wrote:
> >> Christoph, Thomas, is something like this (without the diagnostic
> >> information included in this patch) acceptable for these allocations?
> >> Adding expansion support when the pool is half depleted wouldn't be *that*
> >> hard.
> >>
> >> Or are there alternatives we should consider? Thanks!
> >
> > Are there any platforms which require both non-cacheable remapping *and*
> > unencrypted remapping for distinct subsets of devices?
> >
> > If not (and I'm assuming there aren't, because otherwise this patch is
> > incomplete in covering only 2 of the 3 possible combinations), then
> > couldn't we keep things simpler by just attributing both properties to the
> > single "atomic pool" on the basis that one or the other will always be a
> > no-op? In other words, basically just tweaking the existing "!coherent"
> > tests to "!coherent || force_dma_unencrypted()" and doing
> > set_dma_unencrypted() unconditionally in atomic_pool_init().
>
> I think that would make most sense.
>

I'll rely on Thomas to chime in if this doesn't make sense for the SEV usecase.

I think the sizing of the single atomic pool needs to be determined. Our peak usage that we have measured from NVMe is ~1.4MB and atomic_pool is currently sized to 256KB by default. I'm unsure at this time if we need to be able to dynamically expand this pool with a kworker.

Maybe when CONFIG_AMD_MEM_ENCRYPT is enabled this atomic pool should be sized to 2MB or so and then when it reaches half capacity we schedule some background work to dynamically increase it? That wouldn't be hard unless the pool can be rapidly depleted.

Do we want to increase the atomic pool size by default and then do background dynamic expansion?
[rfc] dma-mapping: preallocate unencrypted DMA atomic pool
Christoph, Thomas, is something like this (without the diagnostic information included in this patch) acceptable for these allocations? Adding expansion support when the pool is half depleted wouldn't be *that* hard.

Or are there alternatives we should consider? Thanks!

When AMD SEV is enabled in the guest, all allocations through dma_pool_alloc_page() must call set_memory_decrypted() for unencrypted DMA. This includes dma_pool_alloc() and dma_direct_alloc_pages(). These calls may block, which is not allowed in atomic allocation contexts such as from the NVMe driver.

Preallocate a complementary unencrypted DMA atomic pool that is initially 4MB in size. This patch does not contain dynamic expansion, but that could be added if necessary.

In our stress testing, our peak unencrypted DMA atomic allocation requirement is ~1.4MB, so 4MB is plenty. This pool is similar to the existing DMA atomic pool but is unencrypted.

Signed-off-by: David Rientjes
---
Based on v5.4 HEAD.

This commit contains diagnostic information and is not intended for use in a production environment.

 arch/x86/Kconfig            |   1 +
 drivers/iommu/dma-iommu.c   |   5 +-
 include/linux/dma-mapping.h |   7 ++-
 kernel/dma/direct.c         |  16 -
 kernel/dma/remap.c          | 116 ++--
 5 files changed, 108 insertions(+), 37 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1530,6 +1530,7 @@ config X86_CPA_STATISTICS
 config AMD_MEM_ENCRYPT
 	bool "AMD Secure Memory Encryption (SME) support"
 	depends on X86_64 && CPU_SUP_AMD
+	select DMA_DIRECT_REMAP
 	select DYNAMIC_PHYSICAL_MASK
 	select ARCH_USE_MEMREMAP_PROT
 	select ARCH_HAS_FORCE_DMA_UNENCRYPTED
diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -928,7 +928,7 @@ static void __iommu_dma_free(struct device *dev, size_t size, void *cpu_addr)

 	/* Non-coherent atomic allocation?  Easy */
 	if (IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) &&
-	    dma_free_from_pool(cpu_addr, alloc_size))
+	    dma_free_from_pool(dev, cpu_addr, alloc_size))
 		return;

 	if (IS_ENABLED(CONFIG_DMA_REMAP) && is_vmalloc_addr(cpu_addr)) {
@@ -1011,7 +1011,8 @@ static void *iommu_dma_alloc(struct device *dev, size_t size,

 	if (IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) &&
 	    !gfpflags_allow_blocking(gfp) && !coherent)
-		cpu_addr = dma_alloc_from_pool(PAGE_ALIGN(size), &page, gfp);
+		cpu_addr = dma_alloc_from_pool(dev, PAGE_ALIGN(size), &page,
+					       gfp);
 	else
 		cpu_addr = iommu_dma_alloc_pages(dev, size, &page, gfp, attrs);
 	if (!cpu_addr)
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -629,9 +629,10 @@ void *dma_common_pages_remap(struct page **pages, size_t size,
 			pgprot_t prot, const void *caller);
 void dma_common_free_remap(void *cpu_addr, size_t size);

-bool dma_in_atomic_pool(void *start, size_t size);
-void *dma_alloc_from_pool(size_t size, struct page **ret_page, gfp_t flags);
-bool dma_free_from_pool(void *start, size_t size);
+bool dma_in_atomic_pool(struct device *dev, void *start, size_t size);
+void *dma_alloc_from_pool(struct device *dev, size_t size,
+			struct page **ret_page, gfp_t flags);
+bool dma_free_from_pool(struct device *dev, void *start, size_t size);

 int
 dma_common_get_sgtable(struct device *dev, struct sg_table *sgt, void *cpu_addr,
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -10,6 +10,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -131,6 +132,13 @@ void *dma_direct_alloc_pages(struct device *dev, size_t size,
 	struct page *page;
 	void *ret;

+	if (!gfpflags_allow_blocking(gfp) && force_dma_unencrypted(dev)) {
+		ret = dma_alloc_from_pool(dev, size, &page, gfp);
+		if (!ret)
+			return NULL;
+		goto done;
+	}
+
 	page = __dma_direct_alloc_pages(dev, size, dma_handle, gfp, attrs);
 	if (!page)
 		return NULL;
@@ -156,7 +164,7 @@ void *dma_direct_alloc_pages(struct device *dev, size_t size,
 		__dma_direct_free_pages(dev, size, page);
 		return NULL;
 	}
-
+done:
 	ret = page_address(page);
 	if (force_dma_unencrypted(dev)) {
 		set_memory_decrypted((unsigned long)ret, 1 << get_order(size));
@@ -185,6 +193,12 @@ void dma_direct_free_pages(struct device *dev, size_t size, void *cpu_addr,
 {
 	unsigned int page_order = get_order(size);

+	if (force_dma_unencrypted(dev) &&
Re: [bug] __blk_mq_run_hw_queue suspicious rcu usage
On Thu, 12 Dec 2019, David Rientjes wrote:

> Since all DMA must be unencrypted in this case, what happens if all
> dma_direct_alloc_pages() calls go through the DMA pool in
> kernel/dma/remap.c when force_dma_unencrypted(dev) == true since
> __PAGE_ENC is cleared for these ptes? (Ignoring for a moment that this
> special pool should likely be a separate dma pool.)
>
> I assume a general depletion of that atomic pool so
> DEFAULT_DMA_COHERENT_POOL_SIZE becomes insufficient. I'm not sure what
> size any DMA pool wired up for this specific purpose would need to be
> sized at, so I assume dynamic resizing is required.
>
> It shouldn't be *that* difficult to supplement kernel/dma/remap.c with the
> ability to do background expansion of the atomic pool when nearing its
> capacity for this purpose? I imagine that if we just can't allocate pages
> within the DMA mask that it's the only blocker to dynamic expansion and we
> don't oom kill for lowmem. But perhaps vm.lowmem_reserve_ratio is good
> enough protection?
>
> Beyond that, I'm not sure what sizing would be appropriate if this is to
> be a generic solution in the DMA API for all devices that may require
> unencrypted memory.
>

Optimizations involving the lowmem reserve ratio aside, is it ok that CONFIG_AMD_MEM_ENCRYPT develops a dependency on DMA_DIRECT_REMAP because set_memory_decrypted() must be allowed to block? If so, we could allocate from the atomic pool when we can't block and the device requires unencrypted DMA from dma_direct_alloc_pages(). I assume we need this to be its own atomic pool specifically for force_dma_unencrypted() devices and to check addr_in_gen_pool() for this new unencrypted pool in dma_direct_free_pages().

I have no idea how large this unencrypted atomic pool should be sized. We could determine a nice default and growth size for nvme itself, but as Christoph mentioned, many drivers require non-blockable allocations that can be run inside a SEV-encrypted guest.

A trivial implementation would be to just double the size of the unencrypted pool when it reaches half capacity, perhaps done with GFP_KERNEL | __GFP_DMA allocations in a workqueue. We can reclaim from ZONE_DMA or ZONE_DMA32 in this context, but when that fails I'm not sure if it's satisfactory to just fail the dma_pool_alloc() when the unencrypted pool runs out.

Heuristics can be tweaked, of course, but I want to make sure I'm not missing anything obvious with this approach before implementing it. Please let me know, thanks.
Re: [bug] __blk_mq_run_hw_queue suspicious rcu usage
On Thu, 12 Dec 2019, David Rientjes wrote:

> Since all DMA must be unencrypted in this case, what happens if all
> dma_direct_alloc_pages() calls go through the DMA pool in
> kernel/dma/remap.c when force_dma_unencrypted(dev) == true since
> __PAGE_ENC is cleared for these ptes? (Ignoring for a moment that this
> special pool should likely be a separate dma pool.)
>
> I assume a general depletion of that atomic pool so
> DEFAULT_DMA_COHERENT_POOL_SIZE becomes insufficient. I'm not sure what
> size any DMA pool wired up for this specific purpose would need to be
> sized at, so I assume dynamic resizing is required.
>
> It shouldn't be *that* difficult to supplement kernel/dma/remap.c with the
> ability to do background expansion of the atomic pool when nearing its
> capacity for this purpose? I imagine that if we just can't allocate pages
> within the DMA mask that it's the only blocker to dynamic expansion and we
> don't oom kill for lowmem. But perhaps vm.lowmem_reserve_ratio is good
> enough protection?
>
> Beyond that, I'm not sure what sizing would be appropriate if this is to
> be a generic solution in the DMA API for all devices that may require
> unencrypted memory.
>

Secondly, I'm wondering about how the DMA pool for atomic allocations compares with the lowmem reserve for both ZONE_DMA and ZONE_DMA32. For allocations where the classzone index is one of these zones, the lowmem reserve is static; we don't account the amount of lowmem allocated and adjust this for future watermark checks in the page allocator. We always guarantee that reserve is free (absent the depletion of the zone due to GFP_ATOMIC allocations where we fall below the min watermarks).

If all DMA memory needs to have _PAGE_ENC cleared when the guest is SEV encrypted, I'm wondering if the entire lowmem reserve could be designed as a pool of lowmem pages rather than a watermark check.

If implemented as a pool of pages in the page allocator itself, and today's reserve is static, maybe we could get away with a dynamic resizing based on that static amount? We could offload the handling of this reserve to kswapd such that when the pool falls below today's reserve amount, we dynamically expand, do the necessary unencryption in a blockable context, and add to the pool. A bonus is that this provides a high-order lowmem reserve if implemented as per-order freelists rather than the current watermark check, which provides no guarantees for any high-order lowmem.

I don't want to distract from the first set of questions in my previous email because I need an understanding of that anyway, but I'm hoping Christoph can guide me on why the above wouldn't be an improvement even for non-encrypted guests.
Re: [bug] __blk_mq_run_hw_queue suspicious rcu usage
On Thu, 28 Nov 2019, Christoph Hellwig wrote:

> > So we're left with making dma_pool_alloc(GFP_ATOMIC) actually be atomic
> > even when the DMA needs to be unencrypted for SEV. Christoph's suggestion
> > was to wire up dmapool in kernel/dma/remap.c for this. Is that necessary
> > to be done for all devices that need to do dma_pool_alloc(GFP_ATOMIC) or
> > can we do it within the DMA API itself so it's transparent to the driver?
>
> It needs to be transparent to the driver. Lots of drivers use GFP_ATOMIC
> dma allocations, and all of them are broken on SEV setups currently.
>

Not my area, so bear with me.

Since all DMA must be unencrypted in this case, what happens if all dma_direct_alloc_pages() calls go through the DMA pool in kernel/dma/remap.c when force_dma_unencrypted(dev) == true since __PAGE_ENC is cleared for these ptes? (Ignoring for a moment that this special pool should likely be a separate dma pool.)

I assume a general depletion of that atomic pool so DEFAULT_DMA_COHERENT_POOL_SIZE becomes insufficient. I'm not sure what size any DMA pool wired up for this specific purpose would need to be sized at, so I assume dynamic resizing is required.

It shouldn't be *that* difficult to supplement kernel/dma/remap.c with the ability to do background expansion of the atomic pool when nearing its capacity for this purpose? I imagine that if we just can't allocate pages within the DMA mask that it's the only blocker to dynamic expansion and we don't oom kill for lowmem. But perhaps vm.lowmem_reserve_ratio is good enough protection?

Beyond that, I'm not sure what sizing would be appropriate if this is to be a generic solution in the DMA API for all devices that may require unencrypted memory.
Re: [bug] __blk_mq_run_hw_queue suspicious rcu usage
On Wed, 18 Sep 2019, Christoph Hellwig wrote:

> On Tue, Sep 17, 2019 at 06:41:02PM +0000, Lendacky, Thomas wrote:
> > > diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> > > --- a/drivers/nvme/host/pci.c
> > > +++ b/drivers/nvme/host/pci.c
> > > @@ -1613,7 +1613,8 @@ static int nvme_alloc_admin_tags(struct nvme_dev *dev)
> > >  		dev->admin_tagset.timeout = ADMIN_TIMEOUT;
> > >  		dev->admin_tagset.numa_node = dev_to_node(dev->dev);
> > >  		dev->admin_tagset.cmd_size = sizeof(struct nvme_iod);
> > > -		dev->admin_tagset.flags = BLK_MQ_F_NO_SCHED;
> > > +		dev->admin_tagset.flags = BLK_MQ_F_NO_SCHED |
> > > +					  BLK_MQ_F_BLOCKING;
> >
> > I think you want to only set the BLK_MQ_F_BLOCKING if the DMA is required
> > to be unencrypted. Unfortunately, force_dma_unencrypted() can't be called
> > from a module. Is there a DMA API that could be called to get that info?
>
> The DMA API must support non-blocking calls, and various drivers rely
> on that. So we need to provide that even for the SEV case. If the
> actual blocking can't be made to work we'll need to wire up the DMA
> pool in kernel/dma/remap.c for it (and probably move it to separate
> file).
>

Resurrecting this thread from a couple months ago because it appears that this is still an issue with 5.4 guests.

dma_pool_alloc(), regardless of whether mem_flags allows blocking or not, can always sleep if the device's DMA must be unencrypted and mem_encrypt_active() == true. We know this because vm_unmap_aliases() can always block. NVMe's setup of PRPs and SGLs uses dma_pool_alloc(GFP_ATOMIC), but when this is a SEV-enabled guest this allocation may block due to the possibility of allocating DMA coherent memory through dma_direct_alloc().

It seems like one solution would be to accept the added latency of BLK_MQ_F_BLOCKING if force_dma_unencrypted() is true for the device, but this comes with significant downsides.
So we're left with making dma_pool_alloc(GFP_ATOMIC) actually be atomic even when the DMA needs to be unencrypted for SEV. Christoph's suggestion was to wire up dmapool in kernel/dma/remap.c for this. Is that necessary to be done for all devices that need to do dma_pool_alloc(GFP_ATOMIC) or can we do it within the DMA API itself so it's transparent to the driver?

Thomas/Brijesh: separately, it seems the use of set_memory_encrypted() or set_memory_decrypted() must be possible without blocking; is this only an issue from the DMA API point of view or can it be done elsewhere?
Re: [bug] __blk_mq_run_hw_queue suspicious rcu usage
On Thu, 5 Sep 2019, Christoph Hellwig wrote:

> > Hi Christoph, Jens, and Ming,
> >
> > While booting a 5.2 SEV-enabled guest we have encountered the following
> > WARNING that is followed up by a BUG because we are in atomic context
> > while trying to call set_memory_decrypted:
>
> Well, this really is a x86 / DMA API issue unfortunately. Drivers
> are allowed to do GFP_ATOMIC dma allocation under locks / rcu critical
> sections and from interrupts. And it seems like the SEV case can't
> handle that. We have some semi-generic code to have a fixed sized
> pool in kernel/dma for non-coherent platforms that have similar issues
> that we could try to wire up, but I wonder if there is a better way
> to handle the issue, so I've added Tom and the x86 maintainers.
>
> Now independent of that issue using DMA coherent memory for the nvme
> PRPs/SGLs doesn't actually feel very optional. We could do with
> normal kmalloc allocations and just sync it to the device and back.
> I wonder if we should create some general mempool-like helpers for that.
>

Thanks for looking into this. I assume it's a non-starter to try to address this in _vm_unmap_aliases() itself, i.e. rely on a purge spinlock to do all synchronization (or trylock if not forced) for purge_vmap_area_lazy() rather than only the vmap_area_lock within it. In other words, no mutex. If that's the case, and set_memory_encrypted() can't be fixed to not need to sleep by changing _vm_unmap_aliases() locking, then I assume dmapool is our only alternative? I have no idea how large this should be.
Re: [GIT PULL] dma mapping fixes for 4.17-rc3
On Tue, 24 Apr 2018, Christoph Hellwig wrote:

> On Tue, Apr 24, 2018 at 11:54:26PM -0700, David Rientjes wrote:
> > Shouldn't that test for dev->coherent_dma_mask < DMA_BIT_MASK(32) be more
> > accurately <=?
>
> No, it should really be <. The exactly 32-bit case is already covered
> with GFP_DMA32. Eventually it should be < 24-bit with a separate
> GFP_DMA32 case for > 32-bit && < 64-bit. Takashi has been working on
> that, but it is 4.18 material.
>

Ack, thanks.
Re: [GIT PULL] dma mapping fixes for 4.17-rc3
On Wed, 25 Apr 2018, Christoph Hellwig wrote:

> The following changes since commit 6d08b06e67cd117f6992c46611dfb4ce267cd71e:
>
>   Linux 4.17-rc2 (2018-04-22 19:20:09 -0700)
>
> are available in the Git repository at:
>
>   git://git.infradead.org/users/hch/dma-mapping.git tags/dma-mapping-4.17-3
>
> for you to fetch changes up to 60695be2bb6b0623f8e53bd9949d582a83c6d44a:
>
>   dma-mapping: postpone cpu addr translation on mmap (2018-04-23 14:44:24 +0200)
>
> A few small dma-mapping fixes for Linux 4.17-rc3:
>
>  - don't loop to try GFP_DMA allocations if ZONE_DMA is not actually
>    enabled (regression in 4.16)

Shouldn't that test for dev->coherent_dma_mask < DMA_BIT_MASK(32) be more accurately <=?