Re: [PATCH] dma-direct: Don't over-decrypt memory

2022-05-22 Thread David Rientjes via iommu
On Fri, 20 May 2022, Robin Murphy wrote:

> The original x86 sev_alloc() only called set_memory_decrypted() on
> memory returned by alloc_pages_node(), so the page order calculation
> fell out of that logic. However, the common dma-direct code has several
> potential allocators, not all of which are guaranteed to round up the
> underlying allocation to a power-of-two size, so carrying over that
> calculation for the encryption/decryption size was a mistake. Fix it by
> rounding to a *number* of pages, rather than an order.
> 
> Until recently there was an even worse interaction with DMA_DIRECT_REMAP
> where we could have ended up decrypting part of the next adjacent
> vmalloc area, only averted by no architecture actually supporting both
> configs at once. Don't ask how I found that one out...
> 
> CC: sta...@vger.kernel.org
> Fixes: c10f07aa27da ("dma/direct: Handle force decryption for DMA coherent 
> buffers in common code")
> Signed-off-by: Robin Murphy 

Acked-by: David Rientjes 
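
For readers without the patch body in front of them, the distinction being
fixed is roughly the following (a sketch, not the upstream hunk): sizing the
set_memory_decrypted()/set_memory_encrypted() calls by order over-counts
whenever the underlying allocator did not round the buffer up to a power of
two.

	/* sketch: a 5-page (20 KiB) buffer with 4 KiB pages */
	size_t size = 5 * PAGE_SIZE;

	/* order-based: 1 << get_order(size) = 8 pages, so 32 KiB gets
	 * decrypted and later re-encrypted, spilling past the buffer */
	unsigned long npages_order = 1 << get_order(size);

	/* page-count-based: PAGE_ALIGN(size) >> PAGE_SHIFT = 5 pages,
	 * exactly the memory that was allocated */
	unsigned long npages = PAGE_ALIGN(size) >> PAGE_SHIFT;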


Re: [PATCH 4/4] dma-direct: factor out a helper for DMA_ATTR_NO_KERNEL_MAPPING allocations

2021-10-19 Thread David Rientjes via iommu
On Tue, 19 Oct 2021, Christoph Hellwig wrote:

> Split the code for DMA_ATTR_NO_KERNEL_MAPPING allocations into a separate
> helper to make dma_direct_alloc a little more readable.
> 
> Signed-off-by: Christoph Hellwig 

Acked-by: David Rientjes 

(I think my name got mangled in your To: field on the original emails :)
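
For context, a hedged sketch of what the factored-out helper could look
like (names and details are assumptions here, not a quote of the patch):

static void *dma_direct_alloc_no_mapping(struct device *dev, size_t size,
		dma_addr_t *dma_handle, gfp_t gfp)
{
	struct page *page;

	page = __dma_direct_alloc_pages(dev, size, gfp & ~__GFP_ZERO);
	if (!page)
		return NULL;

	/* remove any dirty cache lines on the kernel alias */
	if (!PageHighMem(page))
		arch_dma_prep_coherent(page, size);

	/* no kernel mapping: return the struct page as an opaque cookie */
	*dma_handle = phys_to_dma_direct(dev, page_to_phys(page));
	return page;
}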


Re: [PATCH 2/4] dma-direct: leak memory that can't be re-encrypted

2021-10-19 Thread David Rientjes via iommu
On Tue, 19 Oct 2021, Christoph Hellwig wrote:

> We must never let unencrypted memory go back into the general page pool.
> So if we fail to set it back to encrypted when freeing DMA memory, leak
> the memory instead and warn the user.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  kernel/dma/direct.c | 17 +
>  1 file changed, 13 insertions(+), 4 deletions(-)
> 
> diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
> index 96f02248e7fa2..6673f7aebf787 100644
> --- a/kernel/dma/direct.c
> +++ b/kernel/dma/direct.c
> @@ -264,9 +264,11 @@ void *dma_direct_alloc(struct device *dev, size_t size,
>   return ret;
>  
>  out_encrypt_pages:
> - /* If memory cannot be re-encrypted, it must be leaked */
> - if (dma_set_encrypted(dev, page_address(page), size))
> + if (dma_set_encrypted(dev, page_address(page), size)) {
> + pr_warn_ratelimited(
> + "leaking DMA memory that can't be re-encrypted\n");
>   return NULL;
> + }
>  out_free_pages:
>   __dma_direct_free_pages(dev, page, size);
>   return NULL;
> @@ -305,7 +307,11 @@ void dma_direct_free(struct device *dev, size_t size,
>   dma_free_from_pool(dev, cpu_addr, PAGE_ALIGN(size)))
>   return;
>  
> - dma_set_encrypted(dev, cpu_addr, 1 << page_order);
> + if (dma_set_encrypted(dev, cpu_addr, 1 << page_order)) {
> + pr_warn_ratelimited(
> + "leaking DMA memory that can't be re-encrypted\n");
> + return;
> + }
>  
>   if (IS_ENABLED(CONFIG_DMA_REMAP) && is_vmalloc_addr(cpu_addr))
>   vunmap(cpu_addr);
> @@ -363,7 +369,10 @@ void dma_direct_free_pages(struct device *dev, size_t 
> size,
>   dma_free_from_pool(dev, vaddr, size))
>   return;
>  
> - dma_set_encrypted(dev, vaddr, 1 << page_order);
> + if (dma_set_encrypted(dev, vaddr, 1 << page_order)) {
> + pr_warn_ratelimited(
> + "leaking DMA memory that can't be re-encrypted\n");
> + }

Is this actually leaking the memory?
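(If the intent is to really leak it, presumably the hunk would need an
early return before the free, along the lines of this sketch:

	if (dma_set_encrypted(dev, vaddr, 1 << page_order)) {
		pr_warn_ratelimited(
			"leaking DMA memory that can't be re-encrypted\n");
		return;
	}

otherwise the page still flows into __dma_direct_free_pages() below.)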

>   __dma_direct_free_pages(dev, page, size);
>  }
>  
> -- 
> 2.30.2
> 
> 


Re: [PATCH 1/4] dma-direct: factor out dma_set_{de,en}crypted helpers

2021-10-19 Thread David Rientjes via iommu
On Tue, 19 Oct 2021, Christoph Hellwig wrote:

> Factor out helpers that make dealing with memory encryption a little less
> cumbersome.
> 
> Signed-off-by: Christoph Hellwig 
> ---
>  kernel/dma/direct.c | 55 +
>  1 file changed, 25 insertions(+), 30 deletions(-)
> 
> diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
> index 4c6c5e0635e34..96f02248e7fa2 100644
> --- a/kernel/dma/direct.c
> +++ b/kernel/dma/direct.c
> @@ -75,6 +75,20 @@ static bool dma_coherent_ok(struct device *dev, 
> phys_addr_t phys, size_t size)
>   min_not_zero(dev->coherent_dma_mask, dev->bus_dma_limit);
>  }
>  
> +static int dma_set_decrypted(struct device *dev, void *vaddr, size_t size)
> +{
> + if (!force_dma_unencrypted(dev))
> + return 0;
> + return set_memory_decrypted((unsigned long)vaddr, 1 << get_order(size));
> +}
> +
> +static int dma_set_encrypted(struct device *dev, void *vaddr, size_t size)
> +{
> + if (!force_dma_unencrypted(dev))
> + return 0;
> + return set_memory_encrypted((unsigned long)vaddr, 1 << get_order(size));
> +}
> +
>  static void __dma_direct_free_pages(struct device *dev, struct page *page,
>   size_t size)
>  {
> @@ -216,12 +230,7 @@ void *dma_direct_alloc(struct device *dev, size_t size,
>   __builtin_return_address(0));
>   if (!ret)
>   goto out_free_pages;
> - if (force_dma_unencrypted(dev)) {
> - err = set_memory_decrypted((unsigned long)ret,
> -1 << get_order(size));
> - if (err)
> - goto out_free_pages;
> - }
> + err = dma_set_decrypted(dev, ret, size);

Should be

if (dma_set_decrypted(dev, ret, size))
goto out_free_pages;

?  Otherwise we ignore the value?

>   memset(ret, 0, size);
>   goto done;
>   }
> @@ -238,13 +247,9 @@ void *dma_direct_alloc(struct device *dev, size_t size,
>   }
>  
>   ret = page_address(page);
> - if (force_dma_unencrypted(dev)) {
> - err = set_memory_decrypted((unsigned long)ret,
> -1 << get_order(size));
> - if (err)
> - goto out_free_pages;
> - }
> -
> + err = dma_set_decrypted(dev, ret, size);
> + if (err)
> + goto out_free_pages;
>   memset(ret, 0, size);
>  
>   if (IS_ENABLED(CONFIG_ARCH_HAS_DMA_SET_UNCACHED) &&
> @@ -259,13 +264,9 @@ void *dma_direct_alloc(struct device *dev, size_t size,
>   return ret;
>  
>  out_encrypt_pages:
> - if (force_dma_unencrypted(dev)) {
> - err = set_memory_encrypted((unsigned long)page_address(page),
> -1 << get_order(size));
> - /* If memory cannot be re-encrypted, it must be leaked */
> - if (err)
> - return NULL;
> - }
> + /* If memory cannot be re-encrypted, it must be leaked */
> + if (dma_set_encrypted(dev, page_address(page), size))
> + return NULL;
>  out_free_pages:
>   __dma_direct_free_pages(dev, page, size);
>   return NULL;
> @@ -304,8 +305,7 @@ void dma_direct_free(struct device *dev, size_t size,
>   dma_free_from_pool(dev, cpu_addr, PAGE_ALIGN(size)))
>   return;
>  
> - if (force_dma_unencrypted(dev))
> - set_memory_encrypted((unsigned long)cpu_addr, 1 << page_order);
> + dma_set_encrypted(dev, cpu_addr, 1 << page_order);
>  
>   if (IS_ENABLED(CONFIG_DMA_REMAP) && is_vmalloc_addr(cpu_addr))
>   vunmap(cpu_addr);
> @@ -341,11 +341,8 @@ struct page *dma_direct_alloc_pages(struct device *dev, 
> size_t size,
>   }
>  
>   ret = page_address(page);
> - if (force_dma_unencrypted(dev)) {
> - if (set_memory_decrypted((unsigned long)ret,
> - 1 << get_order(size)))
> - goto out_free_pages;
> - }
> + if (dma_set_decrypted(dev, ret, size))
> + goto out_free_pages;
>   memset(ret, 0, size);
>   *dma_handle = phys_to_dma_direct(dev, page_to_phys(page));
>   return page;
> @@ -366,9 +363,7 @@ void dma_direct_free_pages(struct device *dev, size_t 
> size,
>   dma_free_from_pool(dev, vaddr, size))
>   return;
>  
> - if (force_dma_unencrypted(dev))
> - set_memory_encrypted((unsigned long)vaddr, 1 << page_order);
> -
> + dma_set_encrypted(dev, vaddr, 1 << page_order);
>   __dma_direct_free_pages(dev, page, size);
>  }
>  
> -- 
> 2.30.2
> 
> 


RE: [RFC PATCH v3 1/2] mempinfd: Add new syscall to provide memory pin

2021-02-07 Thread David Rientjes via iommu
On Sun, 7 Feb 2021, Song Bao Hua (Barry Song) wrote:

> NUMA balancer is just one of many reasons for page migration. Even one
> simple alloc_pages() can cause memory migration in just single NUMA
> node or UMA system.
> 
> The other reasons for page migration include but are not limited to:
> * memory move due to CMA
> * memory move due to huge pages creation
> 
> We can hardly ask users to disable COMPACTION, CMA and huge pages
> across the whole system.
> 

What about only for mlocked memory, i.e. disable 
vm.compact_unevictable_allowed?

Adding syscalls is a big deal, we can make a reasonable inference that 
we'll have to support this forever if it's merged.  I haven't seen mention 
of what other unevictable memory *should* be migratable that would be 
adversely affected if we disable that sysctl.  Maybe that gets you part of 
the way there and there are some other deficiencies, but it seems like a 
good start would be to describe how CONFIG_NUMA_BALANCING=n + 
vm.compact_unevictable_allowed + mlock() doesn't get you mostly there and 
then look into what's missing.

If it's a very compelling case where there simply are no alternatives, it 
would make sense.  Alternative is to find a more generic way, perhaps in 
combination with vm.compact_unevictable_allowed, to achieve what you're 
looking to do that can be useful even beyond your originally intended use 
case.
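
As a rough illustration of the combination suggested above, a minimal
userspace sketch (buffer size and alignment are arbitrary, chosen only for
the example):

	#define _POSIX_C_SOURCE 200112L
	#include <stdio.h>
	#include <stdlib.h>
	#include <sys/mman.h>

	int main(void)
	{
		const size_t len = 1 << 20;	/* 1 MiB, illustrative */
		void *buf;

		if (posix_memalign(&buf, 4096, len)) {
			perror("posix_memalign");
			return 1;
		}
		/* Pin the buffer: unevictable and never swapped.  Combined
		 * with vm.compact_unevictable_allowed=0 and
		 * CONFIG_NUMA_BALANCING=n, the migration sources discussed
		 * above are largely avoided for this memory. */
		if (mlock(buf, len)) {
			perror("mlock");
			return 1;
		}
		printf("buffer pinned at %p\n", buf);
		munlock(buf, len);
		free(buf);
		return 0;
	}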


Re: revert scope for 5.8, was Re: dma-pool fixes

2020-08-03 Thread David Rientjes via iommu
On Mon, 3 Aug 2020, Christoph Hellwig wrote:

> On Sun, Aug 02, 2020 at 09:14:41PM -0700, David Rientjes wrote:
> > Christoph: since we have atomic DMA coherent pools in 5.8 now, recall our 
> > previous discussions, including Greg KH, regarding backports to stable 
> > trees (we are interested in 5.4 and 5.6) of this support for the purposes 
> > of confidential VM users who wish to run their own SEV-enabled guests in 
> > cloud.
> > 
> > Would you have any objections to backports being offered for this support 
> > now?  We can prepare them immediately.  Or, if you would prefer we hold 
> > off for a while, please let me know and we can delay it until you are more 
> > comfortable.  (For confidential VM users, the ability to run your own 
> > unmodified stable kernel is a much different security story than a guest 
> > modified by the cloud provider.)
> 
> Before we start backporting I think we need the two fixes from
> the "dma pool fixes" series.  Can you start reviewing them?  I also
> think we should probably wait at least until -rc2 for any fallout
> before starting the backport.
> 

Thanks Christoph, this makes perfect sense.  I see Nicolas has refreshed 
the two patches:

dma-pool: fix coherent pool allocations for IOMMU mappings
dma-pool: Only allocate from CMA when in same memory zone

and posted them again today on LKML for review.  I'll review those and 
we'll send a full series of backports for stable consideration for 5.4 LTS 
and 5.6 around 5.9-rc2 timeframe.

Thanks!


Re: revert scope for 5.8, was Re: dma-pool fixes

2020-08-02 Thread David Rientjes via iommu
On Sun, 2 Aug 2020, Amit Pundir wrote:

> > > > Hi, I found the problematic memory region. It was a memory
> > > > chunk reserved/removed in the downstream tree but was
> > > > seemingly reserved upstream for different drivers. I failed to
> > > > calculate the length of the total region reserved downstream
> > > > correctly. And there was still a portion of memory left unmarked,
> > > > which I should have marked as reserved in my testing earlier
> > > > today.
> > > >
> > > > Sorry for all the noise and thanks Nicolas, Christoph and David
> > > > for your patience.
> > >
> > > So you'll need to patch the upstream DTS to fix this up?  Do you also
> > > need my two fixes?  What about the Oneplus phones?  Can you send a
> > > mail with a summary of the status?
> >
> > Poco's DTS is not upstreamed yet. I have updated it for this fix
> > and sent out a newer version for review.
> > https://lkml.org/lkml/2020/8/1/184
> >
> > I didn't need to try your two add-on fixes. I'll give them a spin
> > later today.
> 
> Hi Christoph,
> 
> I see no obvious regressions with your twin dma-pool fixes on my
> PocoF1 phone.
> 
> Caleb also confirmed that a similar reserved-memory region fix in his
> One Plus 6 device-tree worked for him too.
> 

Thanks very much for the update, Amit.

Christoph: since we have atomic DMA coherent pools in 5.8 now, recall our 
previous discussions, including Greg KH, regarding backports to stable 
trees (we are interested in 5.4 and 5.6) of this support for the purposes 
of confidential VM users who wish to run their own SEV-enabled guests in 
cloud.

Would you have any objections to backports being offered for this support 
now?  We can prepare them immediately.  Or, if you would prefer we hold 
off for a while, please let me know and we can delay it until you are more 
comfortable.  (For confidential VM users, the ability to run your own 
unmodified stable kernel is a much different security story than a guest 
modified by the cloud provider.)


Re: dma-pool fixes

2020-08-01 Thread David Rientjes via iommu
On Fri, 31 Jul 2020, David Rientjes wrote:

> > > Hi Nicolas, Christoph,
> > > 
> > > Just out of curiosity, I'm wondering if we can restore the earlier
> > > behaviour and make DMA atomic allocation configured thru platform
> > > specific device tree instead?
> > > 
> > > Or if you can allow a more hackish approach to restore the earlier
> > > logic, using of_machine_is_compatible() just for my device for the
> > > time being. Meanwhile I'm checking with other developers running the
> > > mainline kernel on sdm845 phones like OnePlus 6/6T, if they see this
> > > issue too.
> > 
> > If we don't find a fix for your platform I'm going to send Linus a
> > last minute revert this weekend, to stick to the no regressions policy.
> > I still hope we can fix the issue for real.
> > 
> 
> What would be the scope of this potential revert?
> 

To follow-up on this, the introduction of the DMA atomic pools in 5.8 
fixes an issue for any AMD SEV enabled guest that has a driver that 
requires atomic DMA allocations (for us, nvme) because runtime decryption 
of memory allocated through the DMA API may block.  This manifests itself 
as "sleeping in invalid context" BUGs for any confidential VM user in 
cloud.
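
Reduced to a sketch (mirroring the dma_should_alloc_from_pool() logic from
the series quoted later in this archive, not the literal 5.8 code):

	/* If the caller cannot block and the device needs unencrypted
	 * memory, set_memory_decrypted() at allocation time is not an
	 * option; hand out pre-decrypted memory from the atomic pools. */
	if (force_dma_unencrypted(dev) && !gfpflags_allow_blocking(gfp))
		ret = dma_alloc_from_pool(dev, PAGE_ALIGN(size), &page, gfp);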

I unfortunately don't have Amit's device to be able to independently debug 
this issue and certainly could not have done a better job at working the 
bug than Nicolas and Christoph have done so far.  I'm as baffled by the 
results as anybody else.

I fully understand the no regressions policy.  I'd also ask that we 
consider that *all* SEV guests are currently broken if they use nvme or 
any other driver that does atomic DMA allocations.  It's an extremely 
serious issue for cloud.  If there is *anything* that I can do to make 
forward progress on this issue for 5.8, including some of the workarounds 
above that Amit requested, I'd be very happy to help.  Christoph will make 
the right decision for DMA in 5.8, but I simply wanted to state how 
critical working SEV guests are to users.


Re: dma-pool fixes

2020-07-31 Thread David Rientjes via iommu
On Fri, 31 Jul 2020, Christoph Hellwig wrote:

> > Hi Nicolas, Christoph,
> > 
> > Just out of curiosity, I'm wondering if we can restore the earlier
> > behaviour and make DMA atomic allocation configured thru platform
> > specific device tree instead?
> > 
> > Or if you can allow a more hackish approach to restore the earlier
> > logic, using of_machine_is_compatible() just for my device for the
> > time being. Meanwhile I'm checking with other developers running the
> > mainline kernel on sdm845 phones like OnePlus 6/6T, if they see this
> > issue too.
> 
> If we don't find a fix for your platform I'm going to send Linus a
> last minute revert this weekend, to stick to the no regressions policy.
> I still hope we can fix the issue for real.
> 

What would be the scope of this potential revert?


Re: [PATCH 2/4] dma-pool: Get rid of dma_in_atomic_pool()

2020-07-09 Thread David Rientjes via iommu
On Thu, 9 Jul 2020, Nicolas Saenz Julienne wrote:

> The function is only used once and can be simplified to a one-liner.
> 
> Signed-off-by: Nicolas Saenz Julienne 

I'll leave this one to Christoph to decide on.  One thing I really liked 
about hacking around in kernel/dma is the coding style: it really follows 
"one function does one thing and does it well" even if there is only one 
caller.  dma_in_atomic_pool() was an attempt to follow in those footsteps.
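
(For context, the helper in question, as quoted later in this archive, is
essentially a guard around gen_pool_has_addr(); the proposed fold into its
single caller would presumably look roughly like this sketch:

	if (unlikely(!atomic_pool) ||
	    !gen_pool_has_addr(atomic_pool, (unsigned long)start, size))
		return false;

with the caller and pool variable depending on the version of the series.)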


Re: [PATCH] dma-pool: use single atomic pool for both DMA zones

2020-07-09 Thread David Rientjes via iommu
On Wed, 8 Jul 2020, Christoph Hellwig wrote:

> On Wed, Jul 08, 2020 at 06:00:35PM +0200, Nicolas Saenz Julienne wrote:
> > On Wed, 2020-07-08 at 17:35 +0200, Christoph Hellwig wrote:
> > > On Tue, Jul 07, 2020 at 02:28:04PM +0200, Nicolas Saenz Julienne wrote:
> > > > When allocating atomic DMA memory for a device, the dma-pool core
> > > > queries __dma_direct_optimal_gfp_mask() to check which atomic pool to
> > > > use. It turns out the GFP flag returned is only an optimistic guess.
> > > > The pool selected might sometimes live in a zone higher than the
> > > > device's view of memory.
> > > > 
> > > > As there isn't a way to guarantee a mapping between a device's DMA
> > > > constraints and correct GFP flags, this unifies both DMA atomic pools.
> > > > The resulting pool is allocated in the lowest DMA zone available, if any,
> > > > so that devices always get accessible memory while retaining the
> > > > flexibility of using dma_pool_kernel for the non-constrained ones.
> > > > 
> > > > Fixes: c84dc6e68a1d ("dma-pool: add additional coherent pools to map to 
> > > > gfp
> > > > mask")
> > > > Reported-by: Jeremy Linton 
> > > > Suggested-by: Robin Murphy 
> > > > Signed-off-by: Nicolas Saenz Julienne 
> > > 
> > > Hmm, this is not what I expected from the previous thread.  I thought
> > > we'd just use one dma pool based on runtime available of the zones..
> > 
> > I may be misunderstanding you, but isn't that going back to how things used 
> > to
> > be before pulling in David Rientjes' work? The benefit of having a 
> > GFP_KERNEL
> > pool is that non-address-constrained devices can get their atomic memory 
> > there,
> > instead of consuming somewhat scarcer low memory.
> 
> Yes, I think we are misunderstanding each other.  I don't want to remove
> any pool, just make better runtime decisions when to use them.
> 

Just to be extra explicit for the record and for my own understanding: 
Nicolas, your patch series "dma-pool: Fix atomic pool selection" obsoletes 
this patch, right? :)


Re: [PATCH] dma-pool: Do not allocate pool memory from CMA

2020-07-09 Thread David Rientjes via iommu
On Wed, 8 Jul 2020, Nicolas Saenz Julienne wrote:

> There is no guarantee about CMA's placement, so allocating a zone-specific
> atomic pool from CMA might return memory from a completely different
> memory zone. So stop using it.
> 
> Fixes: c84dc6e68a1d ("dma-pool: add additional coherent pools to map to gfp 
> mask")
> Reported-by: Jeremy Linton 
> Signed-off-by: Nicolas Saenz Julienne 

Acked-by: David Rientjes 

Thanks Nicolas!
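
The direction of the change being acked, as a sketch rather than the
literal hunk:

	/* take the pool's backing pages straight from the page allocator,
	 * so the zone is the one implied by the gfp mask rather than
	 * wherever CMA happens to have been placed */
	page = alloc_pages(gfp, order);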


Re: [patch] dma-pool: warn when coherent pool is depleted

2020-06-27 Thread David Rientjes via iommu
On Sun, 21 Jun 2020, Guenter Roeck wrote:

> > When a DMA coherent pool is depleted, allocation failures may or may not
> > get reported in the kernel log depending on the allocator.
> > 
> > The admin does have a workaround, however, by using coherent_pool= on the
> > kernel command line.
> > 
> > Provide some guidance on the failure and a recommended minimum size for
> > the pools (double the size).
> > 
> > Signed-off-by: David Rientjes 
> 
> Tested-by: Guenter Roeck 
> 
> Also confirmed that coherent_pool=256k works around the crash
> I had observed.
> 

Thanks Guenter.  Christoph, does it make sense to apply this patch since 
there may not be an artifact left behind in the kernel log on allocation 
failure by the caller?

> Guenter
> 
> > ---
> >  kernel/dma/pool.c | 6 +-
> >  1 file changed, 5 insertions(+), 1 deletion(-)
> > 
> > diff --git a/kernel/dma/pool.c b/kernel/dma/pool.c
> > --- a/kernel/dma/pool.c
> > +++ b/kernel/dma/pool.c
> > @@ -239,12 +239,16 @@ void *dma_alloc_from_pool(struct device *dev, size_t 
> > size,
> > }
> >  
> > val = gen_pool_alloc(pool, size);
> > -   if (val) {
> > +   if (likely(val)) {
> > phys_addr_t phys = gen_pool_virt_to_phys(pool, val);
> >  
> > *ret_page = pfn_to_page(__phys_to_pfn(phys));
> > ptr = (void *)val;
> > memset(ptr, 0, size);
> > +   } else {
> > +   WARN_ONCE(1, "DMA coherent pool depleted, increase size "
> > +"(recommended min coherent_pool=%zuK)\n",
> > + gen_pool_size(pool) >> 9);
> > }
> > if (gen_pool_avail(pool) < atomic_pool_size)
> > schedule_work(&atomic_pool_work);
> 


[patch] dma-pool: warn when coherent pool is depleted

2020-06-21 Thread David Rientjes via iommu
When a DMA coherent pool is depleted, allocation failures may or may not
get reported in the kernel log depending on the allocator.

The admin does have a workaround, however, by using coherent_pool= on the
kernel command line.

Provide some guidance on the failure and a recommended minimum size for
the pools (double the size).

Signed-off-by: David Rientjes 
---
 kernel/dma/pool.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/kernel/dma/pool.c b/kernel/dma/pool.c
--- a/kernel/dma/pool.c
+++ b/kernel/dma/pool.c
@@ -239,12 +239,16 @@ void *dma_alloc_from_pool(struct device *dev, size_t size,
}
 
val = gen_pool_alloc(pool, size);
-   if (val) {
+   if (likely(val)) {
phys_addr_t phys = gen_pool_virt_to_phys(pool, val);
 
*ret_page = pfn_to_page(__phys_to_pfn(phys));
ptr = (void *)val;
memset(ptr, 0, size);
+   } else {
+   WARN_ONCE(1, "DMA coherent pool depleted, increase size "
+"(recommended min coherent_pool=%zuK)\n",
+ gen_pool_size(pool) >> 9);
}
if (gen_pool_avail(pool) < atomic_pool_size)
schedule_work(&atomic_pool_work);


Re: [PATCH v2] dma-pool: Fix too large DMA pools on medium systems

2020-06-21 Thread David Rientjes via iommu
On Sun, 21 Jun 2020, Guenter Roeck wrote:

> >> This patch results in a boot failure in some of my powerpc boot tests,
> >> specifically those testing boots from mptsas1068 devices. Error message:
> >>
> >> mptsas :00:02.0: enabling device ( -> 0002)
> >> mptbase: ioc0: Initiating bringup
> >> ioc0: LSISAS1068 A0: Capabilities={Initiator}
> >> mptbase: ioc0: ERROR - Unable to allocate Reply, Request, Chain Buffers!
> >> mptbase: ioc0: ERROR - didn't initialize properly! (-3)
> >> mptsas: probe of :00:02.0 failed with error -3
> >>
> >> Configuration is bamboo:44x/bamboo_defconfig plus various added drivers.
> >> Qemu command line is
> >>
> >> qemu-system-ppc -kernel vmlinux -M bamboo \
> >>  -m 256 -no-reboot -snapshot -device mptsas1068,id=scsi \
> >>  -device scsi-hd,bus=scsi.0,drive=d0,wwn=0x5000c50015ea71ac -drive \
> >>  file=rootfs.ext2,format=raw,if=none,id=d0 \
> >>  --append "panic=-1 slub_debug=FZPUA root=/dev/sda  mem=256M 
> >> console=ttyS0" \
> >>  -monitor none -nographic
> >>
> >> canyonlands_defconfig with sam460ex machine and otherwise similar command 
> >> line
> >> fails as well.
> >>
> >> Reverting this patch fixes the problem.
> > 
> > This looks like the minimum value of 128 KiB is not sufficient, and the
> > bug is in the intention of 1d659236fb43c4d2 ("dma-pool: scale the
> > default DMA coherent pool size with memory capacity")?
> > Before, there was a single pool of (fixed) 256 KiB size, now there are
> > up to three coherent pools (DMA, DMA32, and kernel), albeit of smaller
> > size (128 KiB each).
> > 
> > Can you print the requested size in drivers/message/fusion/mptbase.c:
> > PrimeIocFifos()?
> 
> 172928 bytes
> 
> > Does replacing all SZ_128K by SZ_256K in my patch help?
> 
> Yes, it does.
> 

The new coherent pools should auto expand when they are close to being 
depleted but there's no guarantee that it can be done fast enough.  
Switching the minimum back to the previous min size (256KB) seems like the 
best option, and it matches what 
Documentation/admin-guide/kernel-parameters.txt still says.
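
(For the record: 172928 bytes is just under 169 KiB, so it can never be
satisfied from a 128 KiB pool but fits comfortably within the previous
256 KiB default.)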

I'll also send a patch to point in the right direction when this happens.


Re: kernel BUG at mm/huge_memory.c:2613!

2020-06-21 Thread David Rientjes via iommu
On Fri, 19 Jun 2020, Roman Gushchin wrote:

> > > [   40.287524] BUG: unable to handle page fault for address: 
> > > a77b833df000
> > > [   40.287529] #PF: supervisor write access in kernel mode
> > > [   40.287531] #PF: error_code(0x000b) - reserved bit violation
> > > [   40.287532] PGD 40d14e067 P4D 40d14e067 PUD 40d14f067 PMD 3ec54d067
> > > PTE 80001688033d9163
> > > [   40.287538] Oops: 000b [#2] SMP NOPTI
> > > [   40.287542] CPU: 9 PID: 1986 Comm: pulseaudio Tainted: G  D
> > >   5.8.0-rc1+ #697
> > > [   40.287544] Hardware name: Gigabyte Technology Co., Ltd.
> > > AB350-Gaming/AB350-Gaming-CF, BIOS F25 01/16/2019
> > > [   40.287550] RIP: 0010:__memset+0x24/0x30
> > > [   40.287553] Code: cc cc cc cc cc cc 0f 1f 44 00 00 49 89 f9 48 89
> > > d1 83 e2 07 48 c1 e9 03 40 0f b6 f6 48 b8 01 01 01 01 01 01 01 01 48
> > > 0f af c6  48 ab 89 d1 f3 aa 4c 89 c8 c3 90 49 89 f9 40 88 f0 48 89
> > > d1 f3
> > > [   40.287556] RSP: 0018:a77b827a7e08 EFLAGS: 00010216
> > > [   40.287558] RAX:  RBX: 90f77dced800 RCX: 
> > > 08a0
> > > [   40.287560] RDX:  RSI:  RDI: 
> > > a77b833df000
> > > [   40.287561] RBP: 90f7898c7000 R08: 90f78c507768 R09: 
> > > a77b833df000
> > > [   40.287563] R10: a77b833df000 R11: 90f7839f2d40 R12: 
> > > 
> > > [   40.287564] R13: 90f76a802e00 R14: c0cb8880 R15: 
> > > 90f770f4e700
> > > [   40.287567] FS:  7f3d8e8df880() GS:90f78ee4()
> > > knlGS:
> > > [   40.287569] CS:  0010 DS:  ES:  CR0: 80050033
> > > [   40.287570] CR2: a77b833df000 CR3: 0003fa556000 CR4: 
> > > 003406e0
> > > [   40.287572] Call Trace:
> > > [   40.287584]  snd_pcm_hw_params+0x3fd/0x490 [snd_pcm]
> > > [   40.287593]  snd_pcm_common_ioctl+0x1c5/0x1110 [snd_pcm]
> > > [   40.287601]  ? snd_pcm_info_user+0x64/0x80 [snd_pcm]
> > > [   40.287608]  snd_pcm_ioctl+0x23/0x30 [snd_pcm]
> > > [   40.287613]  ksys_ioctl+0x82/0xc0
> > > [   40.287617]  __x64_sys_ioctl+0x16/0x20
> > > [   40.287622]  do_syscall_64+0x4d/0x90
> > > [   40.287627]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> > 
> > Hi Roman,
> > 
> > If you have CONFIG_AMD_MEM_ENCRYPT set, this should be resolved by
> > 
> > commit dbed452a078d56bc7f1abecc3edd6a75e8e4484e
> > Author: David Rientjes 
> > Date:   Thu Jun 11 00:25:57 2020 -0700
> > 
> > dma-pool: decouple DMA_REMAP from DMA_COHERENT_POOL
> > 
> > Or you might want to wait for 5.8-rc2 instead which includes this fix.
> > 
> 
> Hello, David!
> 
> Thank you for pointing at it! Unfortunately, there must be something wrong
> with drivers, your patch didn't help much. I still see the same stacktrace.
> 

Wow, I'm very surprised.  Do you have CONFIG_AMD_MEM_ENCRYPT enabled?

Adding Takashi.



Re: kernel BUG at mm/huge_memory.c:2613!

2020-06-19 Thread David Rientjes via iommu
On Fri, 19 Jun 2020, Roman Gushchin wrote:

> [   40.287524] BUG: unable to handle page fault for address: a77b833df000
> [   40.287529] #PF: supervisor write access in kernel mode
> [   40.287531] #PF: error_code(0x000b) - reserved bit violation
> [   40.287532] PGD 40d14e067 P4D 40d14e067 PUD 40d14f067 PMD 3ec54d067
> PTE 80001688033d9163
> [   40.287538] Oops: 000b [#2] SMP NOPTI
> [   40.287542] CPU: 9 PID: 1986 Comm: pulseaudio Tainted: G  D
>   5.8.0-rc1+ #697
> [   40.287544] Hardware name: Gigabyte Technology Co., Ltd.
> AB350-Gaming/AB350-Gaming-CF, BIOS F25 01/16/2019
> [   40.287550] RIP: 0010:__memset+0x24/0x30
> [   40.287553] Code: cc cc cc cc cc cc 0f 1f 44 00 00 49 89 f9 48 89
> d1 83 e2 07 48 c1 e9 03 40 0f b6 f6 48 b8 01 01 01 01 01 01 01 01 48
> 0f af c6  48 ab 89 d1 f3 aa 4c 89 c8 c3 90 49 89 f9 40 88 f0 48 89
> d1 f3
> [   40.287556] RSP: 0018:a77b827a7e08 EFLAGS: 00010216
> [   40.287558] RAX:  RBX: 90f77dced800 RCX: 
> 08a0
> [   40.287560] RDX:  RSI:  RDI: 
> a77b833df000
> [   40.287561] RBP: 90f7898c7000 R08: 90f78c507768 R09: 
> a77b833df000
> [   40.287563] R10: a77b833df000 R11: 90f7839f2d40 R12: 
> 
> [   40.287564] R13: 90f76a802e00 R14: c0cb8880 R15: 
> 90f770f4e700
> [   40.287567] FS:  7f3d8e8df880() GS:90f78ee4()
> knlGS:
> [   40.287569] CS:  0010 DS:  ES:  CR0: 80050033
> [   40.287570] CR2: a77b833df000 CR3: 0003fa556000 CR4: 
> 003406e0
> [   40.287572] Call Trace:
> [   40.287584]  snd_pcm_hw_params+0x3fd/0x490 [snd_pcm]
> [   40.287593]  snd_pcm_common_ioctl+0x1c5/0x1110 [snd_pcm]
> [   40.287601]  ? snd_pcm_info_user+0x64/0x80 [snd_pcm]
> [   40.287608]  snd_pcm_ioctl+0x23/0x30 [snd_pcm]
> [   40.287613]  ksys_ioctl+0x82/0xc0
> [   40.287617]  __x64_sys_ioctl+0x16/0x20
> [   40.287622]  do_syscall_64+0x4d/0x90
> [   40.287627]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

Hi Roman,

If you have CONFIG_AMD_MEM_ENCRYPT set, this should be resolved by

commit dbed452a078d56bc7f1abecc3edd6a75e8e4484e
Author: David Rientjes 
Date:   Thu Jun 11 00:25:57 2020 -0700

dma-pool: decouple DMA_REMAP from DMA_COHERENT_POOL

Or you might want to wait for 5.8-rc2 instead which includes this fix.


Re: [PATCH] dma-mapping: DMA_COHERENT_POOL should select GENERIC_ALLOCATOR

2020-06-18 Thread David Rientjes via iommu
On Thu, 18 Jun 2020, Christoph Hellwig wrote:

> The dma coherent pool code needs genalloc.  Move the select over
> from DMA_REMAP, which doesn't actually need it.
> 
> Fixes: dbed452a078d ("dma-pool: decouple DMA_REMAP from DMA_COHERENT_POOL")
> Reported-by: kernel test robot 

Acked-by: David Rientjes 

Thanks Christoph.  In the initial bug report from Alex Xu, his .config set 
CONFIG_GENERIC_ALLOCATOR already so I think there is very little risk with 
this patch.
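
The resulting Kconfig arrangement would presumably look something like the
stanza below (a sketch; surrounding context lines may differ), with
GENERIC_ALLOCATOR no longer selected by DMA_REMAP:

config DMA_COHERENT_POOL
	select GENERIC_ALLOCATOR
	bool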


[patch for-5.8 1/4] dma-direct: always align allocation size in dma_direct_alloc_pages()

2020-06-11 Thread David Rientjes via iommu
dma_alloc_contiguous() does size >> PAGE_SHIFT and set_memory_decrypted()
works at page granularity.  It's necessary to page align the allocation
size in dma_direct_alloc_pages() for consistent behavior.

This also fixes an issue when arch_dma_prep_coherent() is called on an
unaligned allocation size for dma_alloc_need_uncached() when
CONFIG_DMA_DIRECT_REMAP is disabled but CONFIG_ARCH_HAS_DMA_SET_UNCACHED
is enabled.

Cc: sta...@vger.kernel.org
Signed-off-by: David Rientjes 
---
 kernel/dma/direct.c | 17 ++---
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -112,11 +112,12 @@ static inline bool dma_should_free_from_pool(struct 
device *dev,
 struct page *__dma_direct_alloc_pages(struct device *dev, size_t size,
gfp_t gfp, unsigned long attrs)
 {
-   size_t alloc_size = PAGE_ALIGN(size);
int node = dev_to_node(dev);
struct page *page = NULL;
u64 phys_limit;
 
+   VM_BUG_ON(!PAGE_ALIGNED(size));
+
if (attrs & DMA_ATTR_NO_WARN)
gfp |= __GFP_NOWARN;
 
@@ -124,14 +125,14 @@ struct page *__dma_direct_alloc_pages(struct device *dev, 
size_t size,
gfp &= ~__GFP_ZERO;
gfp |= dma_direct_optimal_gfp_mask(dev, dev->coherent_dma_mask,
   &phys_limit);
-   page = dma_alloc_contiguous(dev, alloc_size, gfp);
+   page = dma_alloc_contiguous(dev, size, gfp);
if (page && !dma_coherent_ok(dev, page_to_phys(page), size)) {
-   dma_free_contiguous(dev, page, alloc_size);
+   dma_free_contiguous(dev, page, size);
page = NULL;
}
 again:
if (!page)
-   page = alloc_pages_node(node, gfp, get_order(alloc_size));
+   page = alloc_pages_node(node, gfp, get_order(size));
if (page && !dma_coherent_ok(dev, page_to_phys(page), size)) {
dma_free_contiguous(dev, page, size);
page = NULL;
@@ -158,8 +159,10 @@ void *dma_direct_alloc_pages(struct device *dev, size_t 
size,
struct page *page;
void *ret;
 
+   size = PAGE_ALIGN(size);
+
if (dma_should_alloc_from_pool(dev, gfp, attrs)) {
-   ret = dma_alloc_from_pool(dev, PAGE_ALIGN(size), &page, gfp);
+   ret = dma_alloc_from_pool(dev, size, &page, gfp);
if (!ret)
return NULL;
goto done;
@@ -183,10 +186,10 @@ void *dma_direct_alloc_pages(struct device *dev, size_t 
size,
 dma_alloc_need_uncached(dev, attrs)) ||
(IS_ENABLED(CONFIG_DMA_REMAP) && PageHighMem(page))) {
/* remove any dirty cache lines on the kernel alias */
-   arch_dma_prep_coherent(page, PAGE_ALIGN(size));
+   arch_dma_prep_coherent(page, size);
 
/* create a coherent mapping */
-   ret = dma_common_contiguous_remap(page, PAGE_ALIGN(size),
+   ret = dma_common_contiguous_remap(page, size,
dma_pgprot(dev, PAGE_KERNEL, attrs),
__builtin_return_address(0));
if (!ret)


[patch for-5.8 4/4] dma-direct: add missing set_memory_decrypted() for coherent mapping

2020-06-11 Thread David Rientjes via iommu
When a coherent mapping is created in dma_direct_alloc_pages(), it needs
to be decrypted if the device requires unencrypted DMA before returning.

Fixes: 3acac065508f ("dma-mapping: merge the generic remapping helpers
into dma-direct")
Cc: sta...@vger.kernel.org # 5.5+
Signed-off-by: David Rientjes 
---
 kernel/dma/direct.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -195,6 +195,12 @@ void *dma_direct_alloc_pages(struct device *dev, size_t 
size,
__builtin_return_address(0));
if (!ret)
goto out_free_pages;
+   if (force_dma_unencrypted(dev)) {
+   err = set_memory_decrypted((unsigned long)ret,
+  1 << get_order(size));
+   if (err)
+   goto out_free_pages;
+   }
memset(ret, 0, size);
goto done;
}


[patch for-5.8 2/4] dma-direct: re-encrypt memory if dma_direct_alloc_pages() fails

2020-06-11 Thread David Rientjes via iommu
If arch_dma_set_uncached() fails after memory has been decrypted, it needs
to be re-encrypted before freeing.

Fixes: fa7e2247c572 ("dma-direct: make uncached_kernel_address more
general")
Cc: sta...@vger.kernel.org # 5.7
Signed-off-by: David Rientjes 
---
 kernel/dma/direct.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -220,7 +220,7 @@ void *dma_direct_alloc_pages(struct device *dev, size_t 
size,
arch_dma_prep_coherent(page, size);
ret = arch_dma_set_uncached(ret, size);
if (IS_ERR(ret))
-   goto out_free_pages;
+   goto out_encrypt_pages;
}
 done:
if (force_dma_unencrypted(dev))
@@ -228,6 +228,10 @@ void *dma_direct_alloc_pages(struct device *dev, size_t 
size,
else
*dma_handle = phys_to_dma(dev, page_to_phys(page));
return ret;
+out_encrypt_pages:
+   if (force_dma_unencrypted(dev))
+   set_memory_encrypted((unsigned long)page_address(page),
+1 << get_order(size));
 out_free_pages:
dma_free_contiguous(dev, page, size);
return NULL;


[patch for-5.8 0/4] dma-direct: dma_direct_alloc_pages() fixes for AMD SEV

2020-06-11 Thread David Rientjes via iommu
While debugging recently reported issues concerning DMA allocation
practices when CONFIG_AMD_MEM_ENCRYPT is enabled, some curiosities arose
when looking at dma_direct_alloc_pages() behavior.

Fix these up.  These are likely all stable material, so proposing for 5.8.
---
 kernel/dma/direct.c | 42 --
 1 file changed, 32 insertions(+), 10 deletions(-)


[patch for-5.8 3/4] dma-direct: check return value when encrypting or decrypting memory

2020-06-11 Thread David Rientjes via iommu
__change_page_attr() can fail which will cause set_memory_encrypted() and
set_memory_decrypted() to return non-zero.

If the device requires unencrypted DMA memory and decryption fails, simply
free the memory and fail.

If attempting to re-encrypt in the failure path and that encryption fails,
there is no alternative other than to leak the memory.

Fixes: c10f07aa27da ("dma/direct: Handle force decryption for DMA coherent
buffers in common code")
Cc: sta...@vger.kernel.org # 4.17+
Signed-off-by: David Rientjes 
---
 kernel/dma/direct.c | 19 ++-
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -158,6 +158,7 @@ void *dma_direct_alloc_pages(struct device *dev, size_t 
size,
 {
struct page *page;
void *ret;
+   int err;
 
size = PAGE_ALIGN(size);
 
@@ -210,8 +211,12 @@ void *dma_direct_alloc_pages(struct device *dev, size_t 
size,
}
 
ret = page_address(page);
-   if (force_dma_unencrypted(dev))
-   set_memory_decrypted((unsigned long)ret, 1 << get_order(size));
+   if (force_dma_unencrypted(dev)) {
+   err = set_memory_decrypted((unsigned long)ret,
+  1 << get_order(size));
+   if (err)
+   goto out_free_pages;
+   }
 
memset(ret, 0, size);
 
@@ -229,9 +234,13 @@ void *dma_direct_alloc_pages(struct device *dev, size_t 
size,
*dma_handle = phys_to_dma(dev, page_to_phys(page));
return ret;
 out_encrypt_pages:
-   if (force_dma_unencrypted(dev))
-   set_memory_encrypted((unsigned long)page_address(page),
-1 << get_order(size));
+   if (force_dma_unencrypted(dev)) {
+   err = set_memory_encrypted((unsigned long)page_address(page),
+  1 << get_order(size));
+   /* If memory cannot be re-encrypted, it must be leaked */
+   if (err)
+   return NULL;
+   }
 out_free_pages:
dma_free_contiguous(dev, page, size);
return NULL;


Re: [PATCH v2] dma-pool: Fix too large DMA pools on medium systems

2020-06-08 Thread David Rientjes via iommu
On Mon, 8 Jun 2020, Geert Uytterhoeven wrote:

> On systems with at least 32 MiB, but less than 32 GiB of RAM, the DMA
> memory pools are much larger than intended (e.g. 2 MiB instead of 128
> KiB on a 256 MiB system).
> 
> Fix this by correcting the calculation of the number of GiBs of RAM in
> the system.  Invert the order of the min/max operations, to keep on
> calculating in pages until the last step, which aids readability.
> 
> Fixes: 1d659236fb43c4d2 ("dma-pool: scale the default DMA coherent pool size 
> with memory capacity")
> Signed-off-by: Geert Uytterhoeven 

This works as well and is much more readable.  Thanks Geert!

Acked-by: David Rientjes 
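
For reference, a sketch of the corrected scaling as described above,
keeping the arithmetic in pages until the final step (treat the exact
identifiers as assumptions rather than a quote of the patch):

	unsigned long pages = totalram_pages() / (SZ_1G / SZ_128K);

	/* cap at the largest buddy allocation, floor at 128 KiB */
	pages = min_t(unsigned long, pages, MAX_ORDER_NR_PAGES);
	atomic_pool_size = max_t(size_t, pages << PAGE_SHIFT, SZ_128K);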


Re: [patch 0/7] unencrypted atomic DMA pools with dynamic expansion

2020-04-17 Thread David Rientjes via iommu
On Fri, 17 Apr 2020, Christoph Hellwig wrote:

> So modulo a few comments that I can fix up myself this looks good.  Unless
> you want to resend for some reason I'm ready to pick this up once I open
> the dma-mapping tree after -rc2.
> 

Yes, please do, and thanks to both you and Thomas for the guidance and 
code reviews.

Once these patches take on their final form in your branch, how supportive 
would you be of stable backports going back to 4.19 LTS?

There have been several changes to this area over time, so there are 
varying levels of rework that need to be done for each stable kernel back 
to 4.19.  But I'd be happy to do that work if you are receptive to it.

For rationale: without these fixes, all SEV-enabled guests warn of 
blocking in RCU read-side critical sections when using drivers that 
allocate atomically through the DMA API, which calls set_memory_decrypted().  
Users can see warnings such as these in the guest:

BUG: sleeping function called from invalid context at mm/vmalloc.c:1710
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 3383, name: fio
2 locks held by fio/3383:
 #0: 93b6a8568348 (&sb->s_type->i_mutex_key#16){+.+.}, at: 
ext4_file_write_iter+0xa2/0x5d0
 #1: a52a61a0 (rcu_read_lock){}, at: hctx_lock+0x1a/0xe0
CPU: 0 PID: 3383 Comm: fio Tainted: GW 5.5.10 #14
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 
01/01/2011
Call Trace:
 dump_stack+0x98/0xd5
 ___might_sleep+0x175/0x260
 __might_sleep+0x4a/0x80
 _vm_unmap_aliases+0x45/0x250
 vm_unmap_aliases+0x19/0x20
 __set_memory_enc_dec+0xa4/0x130
 set_memory_decrypted+0x10/0x20
 dma_direct_alloc_pages+0x148/0x150
 dma_direct_alloc+0xe/0x10
 dma_alloc_attrs+0x86/0xc0
 dma_pool_alloc+0x16f/0x2b0
 nvme_queue_rq+0x878/0xc30 [nvme]
 __blk_mq_try_issue_directly+0x135/0x200
 blk_mq_request_issue_directly+0x4f/0x80
 blk_mq_try_issue_list_directly+0x46/0xb0
 blk_mq_sched_insert_requests+0x19b/0x2b0
 blk_mq_flush_plug_list+0x22f/0x3b0
 blk_flush_plug_list+0xd1/0x100
 blk_finish_plug+0x2c/0x40
 iomap_dio_rw+0x427/0x490
 ext4_file_write_iter+0x181/0x5d0
 aio_write+0x109/0x1b0
 io_submit_one+0x7d0/0xfa0
 __x64_sys_io_submit+0xa2/0x280
 do_syscall_64+0x5f/0x250


[patch 4/7] dma-direct: atomic allocations must come from atomic coherent pools

2020-04-14 Thread David Rientjes via iommu
When a device requires unencrypted memory and the context does not allow
blocking, memory must be returned from the atomic coherent pools.

This avoids the remap when CONFIG_DMA_DIRECT_REMAP is not enabled and the
config only requires CONFIG_DMA_COHERENT_POOL.  This will be used for
CONFIG_AMD_MEM_ENCRYPT in a subsequent patch.

Keep all memory in these pools unencrypted.  When set_memory_decrypted()
fails, this prohibits the memory from being added.  If adding memory to
the genpool fails, and set_memory_encrypted() subsequently fails, there
is no alternative other than leaking the memory.

Signed-off-by: David Rientjes 
---
 kernel/dma/direct.c | 46 ++---
 kernel/dma/pool.c   | 27 +++---
 2 files changed, 63 insertions(+), 10 deletions(-)

diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index a834ee22f8ff..07ecc5c4d134 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -76,6 +76,39 @@ static bool dma_coherent_ok(struct device *dev, phys_addr_t 
phys, size_t size)
min_not_zero(dev->coherent_dma_mask, 
dev->bus_dma_limit);
 }
 
+/*
+ * Decrypting memory is allowed to block, so if this device requires
+ * unencrypted memory it must come from atomic pools.
+ */
+static inline bool dma_should_alloc_from_pool(struct device *dev, gfp_t gfp,
+ unsigned long attrs)
+{
+   if (!IS_ENABLED(CONFIG_DMA_COHERENT_POOL))
+   return false;
+   if (gfpflags_allow_blocking(gfp))
+   return false;
+   if (force_dma_unencrypted(dev))
+   return true;
+   if (!IS_ENABLED(CONFIG_DMA_DIRECT_REMAP))
+   return false;
+   if (dma_alloc_need_uncached(dev, attrs))
+   return true;
+   return false;
+}
+
+static inline bool dma_should_free_from_pool(struct device *dev,
+unsigned long attrs)
+{
+   if (IS_ENABLED(CONFIG_DMA_COHERENT_POOL))
+   return true;
+   if ((attrs & DMA_ATTR_NO_KERNEL_MAPPING) &&
+   !force_dma_unencrypted(dev))
+   return false;
+   if (IS_ENABLED(CONFIG_DMA_DIRECT_REMAP))
+   return true;
+   return false;
+}
+
 struct page *__dma_direct_alloc_pages(struct device *dev, size_t size,
gfp_t gfp, unsigned long attrs)
 {
@@ -125,9 +158,7 @@ void *dma_direct_alloc_pages(struct device *dev, size_t 
size,
struct page *page;
void *ret;
 
-   if (IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) &&
-   dma_alloc_need_uncached(dev, attrs) &&
-   !gfpflags_allow_blocking(gfp)) {
+   if (dma_should_alloc_from_pool(dev, gfp, attrs)) {
ret = dma_alloc_from_pool(dev, PAGE_ALIGN(size), &page, gfp);
if (!ret)
return NULL;
@@ -204,6 +235,11 @@ void dma_direct_free_pages(struct device *dev, size_t 
size, void *cpu_addr,
 {
unsigned int page_order = get_order(size);
 
+   /* If cpu_addr is not from an atomic pool, dma_free_from_pool() fails */
+   if (dma_should_free_from_pool(dev, attrs) &&
+   dma_free_from_pool(dev, cpu_addr, PAGE_ALIGN(size)))
+   return;
+
if ((attrs & DMA_ATTR_NO_KERNEL_MAPPING) &&
!force_dma_unencrypted(dev)) {
/* cpu_addr is a struct page cookie, not a kernel address */
@@ -211,10 +247,6 @@ void dma_direct_free_pages(struct device *dev, size_t 
size, void *cpu_addr,
return;
}
 
-   if (IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) &&
-   dma_free_from_pool(dev, cpu_addr, PAGE_ALIGN(size)))
-   return;
-
if (force_dma_unencrypted(dev))
set_memory_encrypted((unsigned long)cpu_addr, 1 << page_order);
 
diff --git a/kernel/dma/pool.c b/kernel/dma/pool.c
index 9e2da17ed17b..cf052314d9e4 100644
--- a/kernel/dma/pool.c
+++ b/kernel/dma/pool.c
@@ -7,6 +7,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -53,22 +54,42 @@ static int atomic_pool_expand(struct gen_pool *pool, size_t 
pool_size,
 
arch_dma_prep_coherent(page, pool_size);
 
+#ifdef CONFIG_DMA_DIRECT_REMAP
addr = dma_common_contiguous_remap(page, pool_size,
   pgprot_dmacoherent(PAGE_KERNEL),
   __builtin_return_address(0));
if (!addr)
goto free_page;
-
+#else
+   addr = page_to_virt(page);
+#endif
+   /*
+* Memory in the atomic DMA pools must be unencrypted, the pools do not
+* shrink so no re-encryption occurs in dma_direct_free_pages().
+*/
+   ret = set_memory_decrypted((unsigned long)page_to_virt(page),
+  1 << order);
+   if (ret)
+   goto remove_mapping;
ret = gen_pool_add_virt(pool, (unsigned long)addr, page_to_phys(page),
 

[patch 1/7] dma-remap: separate DMA atomic pools from direct remap code

2020-04-14 Thread David Rientjes via iommu
DMA atomic pools will be needed beyond only CONFIG_DMA_DIRECT_REMAP so
separate them out into their own file.

This also adds a new Kconfig option that can be subsequently used for
options, such as CONFIG_AMD_MEM_ENCRYPT, that will utilize the coherent
pools but do not have a dependency on direct remapping.

For this patch alone, there is no functional change introduced.

Reviewed-by: Christoph Hellwig 
Signed-off-by: David Rientjes 
---
 kernel/dma/Kconfig  |   6 ++-
 kernel/dma/Makefile |   1 +
 kernel/dma/pool.c   | 123 
 kernel/dma/remap.c  | 114 
 4 files changed, 129 insertions(+), 115 deletions(-)
 create mode 100644 kernel/dma/pool.c

diff --git a/kernel/dma/Kconfig b/kernel/dma/Kconfig
index 4c103a24e380..d006668c0027 100644
--- a/kernel/dma/Kconfig
+++ b/kernel/dma/Kconfig
@@ -79,10 +79,14 @@ config DMA_REMAP
select DMA_NONCOHERENT_MMAP
bool
 
-config DMA_DIRECT_REMAP
+config DMA_COHERENT_POOL
bool
select DMA_REMAP
 
+config DMA_DIRECT_REMAP
+   bool
+   select DMA_COHERENT_POOL
+
 config DMA_CMA
bool "DMA Contiguous Memory Allocator"
depends on HAVE_DMA_CONTIGUOUS && CMA
diff --git a/kernel/dma/Makefile b/kernel/dma/Makefile
index d237cf3dc181..370f63344e9c 100644
--- a/kernel/dma/Makefile
+++ b/kernel/dma/Makefile
@@ -6,4 +6,5 @@ obj-$(CONFIG_DMA_DECLARE_COHERENT)  += coherent.o
 obj-$(CONFIG_DMA_VIRT_OPS) += virt.o
 obj-$(CONFIG_DMA_API_DEBUG)+= debug.o
 obj-$(CONFIG_SWIOTLB)  += swiotlb.o
+obj-$(CONFIG_DMA_COHERENT_POOL)+= pool.o
 obj-$(CONFIG_DMA_REMAP)+= remap.o
diff --git a/kernel/dma/pool.c b/kernel/dma/pool.c
new file mode 100644
index ..6612c2d51d3c
--- /dev/null
+++ b/kernel/dma/pool.c
@@ -0,0 +1,123 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2020 Google LLC
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+static struct gen_pool *atomic_pool __ro_after_init;
+
+#define DEFAULT_DMA_COHERENT_POOL_SIZE  SZ_256K
+static size_t atomic_pool_size __initdata = DEFAULT_DMA_COHERENT_POOL_SIZE;
+
+static int __init early_coherent_pool(char *p)
+{
+   atomic_pool_size = memparse(p, &p);
+   return 0;
+}
+early_param("coherent_pool", early_coherent_pool);
+
+static gfp_t dma_atomic_pool_gfp(void)
+{
+   if (IS_ENABLED(CONFIG_ZONE_DMA))
+   return GFP_DMA;
+   if (IS_ENABLED(CONFIG_ZONE_DMA32))
+   return GFP_DMA32;
+   return GFP_KERNEL;
+}
+
+static int __init dma_atomic_pool_init(void)
+{
+   unsigned int pool_size_order = get_order(atomic_pool_size);
+   unsigned long nr_pages = atomic_pool_size >> PAGE_SHIFT;
+   struct page *page;
+   void *addr;
+   int ret;
+
+   if (dev_get_cma_area(NULL))
+   page = dma_alloc_from_contiguous(NULL, nr_pages,
+pool_size_order, false);
+   else
+   page = alloc_pages(dma_atomic_pool_gfp(), pool_size_order);
+   if (!page)
+   goto out;
+
+   arch_dma_prep_coherent(page, atomic_pool_size);
+
+   atomic_pool = gen_pool_create(PAGE_SHIFT, -1);
+   if (!atomic_pool)
+   goto free_page;
+
+   addr = dma_common_contiguous_remap(page, atomic_pool_size,
+  pgprot_dmacoherent(PAGE_KERNEL),
+  __builtin_return_address(0));
+   if (!addr)
+   goto destroy_genpool;
+
+   ret = gen_pool_add_virt(atomic_pool, (unsigned long)addr,
+   page_to_phys(page), atomic_pool_size, -1);
+   if (ret)
+   goto remove_mapping;
+   gen_pool_set_algo(atomic_pool, gen_pool_first_fit_order_align, NULL);
+
+   pr_info("DMA: preallocated %zu KiB pool for atomic allocations\n",
+   atomic_pool_size / 1024);
+   return 0;
+
+remove_mapping:
+   dma_common_free_remap(addr, atomic_pool_size);
+destroy_genpool:
+   gen_pool_destroy(atomic_pool);
+   atomic_pool = NULL;
+free_page:
+   if (!dma_release_from_contiguous(NULL, page, nr_pages))
+   __free_pages(page, pool_size_order);
+out:
+   pr_err("DMA: failed to allocate %zu KiB pool for atomic coherent 
allocation\n",
+   atomic_pool_size / 1024);
+   return -ENOMEM;
+}
+postcore_initcall(dma_atomic_pool_init);
+
+bool dma_in_atomic_pool(void *start, size_t size)
+{
+   if (unlikely(!atomic_pool))
+   return false;
+
+   return gen_pool_has_addr(atomic_pool, (unsigned long)start, size);
+}
+
+void *dma_alloc_from_pool(size_t size, struct page **ret_page, gfp_t flags)
+{
+   unsigned long val;
+   void *ptr = NULL;
+
+   if (!atomic_pool) {
+   WARN(1, "coherent pool not initialised!\n");
+   return NULL;
+

[patch 3/7] dma-pool: dynamically expanding atomic pools

2020-04-14 Thread David Rientjes via iommu
When an atomic pool becomes fully depleted because it is now relied upon
for all non-blocking allocations through the DMA API, allow background
expansion of each pool by a kworker.

When an atomic pool has less than the default size of memory left, kick
off a kworker to dynamically expand the pool in the background.  The pool
is doubled in size, up to MAX_ORDER-1.  If memory cannot be allocated at
the requested order, smaller allocation(s) are attempted.

This allows the default size to be kept quite low when one or more of the
atomic pools is not used.

Allocations for lowmem should also use GFP_KERNEL for the benefits of
reclaim, so use GFP_KERNEL | GFP_DMA and GFP_KERNEL | GFP_DMA32 for
lowmem allocations.

This also allows __dma_atomic_pool_init() to return a pointer to the pool
to make initialization cleaner.

Also switch over some node ids to the more appropriate NUMA_NO_NODE.

Signed-off-by: David Rientjes 
---
 kernel/dma/pool.c | 122 +++---
 1 file changed, 84 insertions(+), 38 deletions(-)

diff --git a/kernel/dma/pool.c b/kernel/dma/pool.c
index 5c98ab991b16..9e2da17ed17b 100644
--- a/kernel/dma/pool.c
+++ b/kernel/dma/pool.c
@@ -9,13 +9,17 @@
 #include 
 #include 
 #include 
+#include 
 
 static struct gen_pool *atomic_pool_dma __ro_after_init;
 static struct gen_pool *atomic_pool_dma32 __ro_after_init;
 static struct gen_pool *atomic_pool_kernel __ro_after_init;
 
 #define DEFAULT_DMA_COHERENT_POOL_SIZE  SZ_256K
-static size_t atomic_pool_size __initdata = DEFAULT_DMA_COHERENT_POOL_SIZE;
+static size_t atomic_pool_size = DEFAULT_DMA_COHERENT_POOL_SIZE;
+
+/* Dynamic background expansion when the atomic pool is near capacity */
+static struct work_struct atomic_pool_work;
 
 static int __init early_coherent_pool(char *p)
 {
@@ -24,76 +28,116 @@ static int __init early_coherent_pool(char *p)
 }
 early_param("coherent_pool", early_coherent_pool);
 
-static int __init __dma_atomic_pool_init(struct gen_pool **pool,
-size_t pool_size, gfp_t gfp)
+static int atomic_pool_expand(struct gen_pool *pool, size_t pool_size,
+ gfp_t gfp)
 {
-   const unsigned int order = get_order(pool_size);
-   const unsigned long nr_pages = pool_size >> PAGE_SHIFT;
+   unsigned int order;
struct page *page;
void *addr;
-   int ret;
+   int ret = -ENOMEM;
+
+   /* Cannot allocate larger than MAX_ORDER-1 */
+   order = min(get_order(pool_size), MAX_ORDER-1);
+
+   do {
+   pool_size = 1 << (PAGE_SHIFT + order);
 
-   if (dev_get_cma_area(NULL))
-   page = dma_alloc_from_contiguous(NULL, nr_pages, order, false);
-   else
-   page = alloc_pages(gfp, order);
+   if (dev_get_cma_area(NULL))
+   page = dma_alloc_from_contiguous(NULL, 1 << order,
+order, false);
+   else
+   page = alloc_pages(gfp, order);
+   } while (!page && order-- > 0);
if (!page)
goto out;
 
arch_dma_prep_coherent(page, pool_size);
 
-   *pool = gen_pool_create(PAGE_SHIFT, -1);
-   if (!*pool)
-   goto free_page;
-
addr = dma_common_contiguous_remap(page, pool_size,
   pgprot_dmacoherent(PAGE_KERNEL),
   __builtin_return_address(0));
if (!addr)
-   goto destroy_genpool;
+   goto free_page;
 
-   ret = gen_pool_add_virt(*pool, (unsigned long)addr, page_to_phys(page),
-   pool_size, -1);
+   ret = gen_pool_add_virt(pool, (unsigned long)addr, page_to_phys(page),
+   pool_size, NUMA_NO_NODE);
if (ret)
goto remove_mapping;
-   gen_pool_set_algo(*pool, gen_pool_first_fit_order_align, NULL);
 
-   pr_info("DMA: preallocated %zu KiB %pGg pool for atomic allocations\n",
-   pool_size >> 10, &gfp);
return 0;
 
 remove_mapping:
dma_common_free_remap(addr, pool_size);
-destroy_genpool:
-   gen_pool_destroy(*pool);
-   *pool = NULL;
 free_page:
-   if (!dma_release_from_contiguous(NULL, page, nr_pages))
+   if (!dma_release_from_contiguous(NULL, page, 1 << order))
__free_pages(page, order);
 out:
-   pr_err("DMA: failed to allocate %zu KiB %pGg pool for atomic 
allocation\n",
-  pool_size >> 10, &gfp);
-   return -ENOMEM;
+   return ret;
+}
+
+static void atomic_pool_resize(struct gen_pool *pool, gfp_t gfp)
+{
+   if (pool && gen_pool_avail(pool) < atomic_pool_size)
+   atomic_pool_expand(pool, gen_pool_size(pool), gfp);
+}
+
+static void atomic_pool_work_fn(struct work_struct *work)
+{
+   if (IS_ENABLED(CONFIG_ZONE_DMA))
+   atomic_pool_resize(atomic_pool_dma,
+

[patch 6/7] x86/mm: unencrypted non-blocking DMA allocations use coherent pools

2020-04-14 Thread David Rientjes via iommu
When CONFIG_AMD_MEM_ENCRYPT is enabled and a device requires unencrypted
DMA, all non-blocking allocations must originate from the atomic DMA
coherent pools.

Select CONFIG_DMA_COHERENT_POOL for CONFIG_AMD_MEM_ENCRYPT.

Signed-off-by: David Rientjes 
---
 arch/x86/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 1d6104ea8af0..2bf819d3 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1520,6 +1520,7 @@ config X86_CPA_STATISTICS
 config AMD_MEM_ENCRYPT
bool "AMD Secure Memory Encryption (SME) support"
depends on X86_64 && CPU_SUP_AMD
+   select DMA_COHERENT_POOL
select DYNAMIC_PHYSICAL_MASK
select ARCH_USE_MEMREMAP_PROT
select ARCH_HAS_FORCE_DMA_UNENCRYPTED


[patch 0/7] unencrypted atomic DMA pools with dynamic expansion

2020-04-14 Thread David Rientjes via iommu
set_memory_decrypted() may block, so it is not possible to do non-blocking
allocations through the DMA API for devices that require unencrypted
memory.

The solution is to expand the atomic DMA pools for the various possible
gfp requirements as a means to prevent an unnecessary depletion of lowmem.
These atomic pools are separated from the remap code and can be selected
for configurations that need them outside the scope of
CONFIG_DMA_DIRECT_REMAP, such as CONFIG_AMD_MEM_ENCRYPT.

These atomic DMA pools are kept unencrypted so they can immediately be
used for non-blocking allocations.  Since the need for this type of memory
depends on the kernel config and devices being used, these pools are also
dynamically expandable.

The sizes of the various atomic DMA pools are exported through debugfs at
/sys/kernel/debug/dma_pools.
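
For reference, each pool is an ordinary genalloc pool; the following is a
minimal sketch of how memory is carved out of, and returned to, such a pool
(illustrative only, not code from the series; assumes linux/genalloc.h and
linux/mm.h):

	static void *sketch_alloc_from_atomic_pool(struct gen_pool *pool,
						   size_t size,
						   struct page **ret_page)
	{
		unsigned long val;
		phys_addr_t phys;

		/* Non-blocking: the pool was populated ahead of time. */
		val = gen_pool_alloc(pool, size);
		if (!val)
			return NULL;

		phys = gen_pool_virt_to_phys(pool, val);
		*ret_page = pfn_to_page(__phys_to_pfn(phys));
		memset((void *)val, 0, size);
		return (void *)val;
	}

	static void sketch_free_to_atomic_pool(struct gen_pool *pool,
					       void *vaddr, size_t size)
	{
		gen_pool_free(pool, (unsigned long)vaddr, size);
	}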

This patchset is based on latest Linus HEAD:

commit 8632e9b5645bbc2331d21d892b0d6961c1a08429
Merge: 6cc9306b8fc0 f3a99e761efa
Author: Linus Torvalds 
Date:   Tue Apr 14 11:58:04 2020 -0700

Merge tag 'hyperv-fixes-signed' of 
git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux
---
 arch/x86/Kconfig|   1 +
 drivers/iommu/dma-iommu.c   |   5 +-
 include/linux/dma-direct.h  |   2 +
 include/linux/dma-mapping.h |   6 +-
 kernel/dma/Kconfig  |   6 +-
 kernel/dma/Makefile |   1 +
 kernel/dma/direct.c |  56 ++--
 kernel/dma/pool.c   | 275 
 kernel/dma/remap.c  | 114 ---
 9 files changed, 334 insertions(+), 132 deletions(-)
 create mode 100644 kernel/dma/pool.c


[patch 7/7] dma-pool: scale the default DMA coherent pool size with memory capacity

2020-04-14 Thread David Rientjes via iommu
When AMD memory encryption is enabled, some devices may use more than
256KB/sec from the atomic pools.  It would be more appropriate to scale
the default size based on memory capacity unless the coherent_pool
option is used on the kernel command line.

This provides a slight optimization on initial expansion and is deemed
appropriate due to the increased reliance on the atomic pools.  Note that
the default size of 128KB per pool will normally be larger than the
single coherent pool implementation since there are now up to three
coherent pools (DMA, DMA32, and kernel).

Note that even prior to this patch, coherent_pool= for sizes larger than
1 << (PAGE_SHIFT + MAX_ORDER-1) can fail.  With new dynamic expansion
support, this would be trivially extensible to allow even larger initial
sizes.
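
For concreteness: with 4 KiB pages (PAGE_SHIFT == 12) and the common default
of MAX_ORDER == 11, that limit works out to 1 << (12 + 10) = 4 MiB per pool;
the exact value depends on the architecture's PAGE_SHIFT and MAX_ORDER.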

Signed-off-by: David Rientjes 
---
 kernel/dma/pool.c | 14 --
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/kernel/dma/pool.c b/kernel/dma/pool.c
index 3e22022c933b..763b687569b0 100644
--- a/kernel/dma/pool.c
+++ b/kernel/dma/pool.c
@@ -22,8 +22,8 @@ static unsigned long pool_size_dma32;
 static unsigned long pool_size_kernel;
 #endif
 
-#define DEFAULT_DMA_COHERENT_POOL_SIZE  SZ_256K
-static size_t atomic_pool_size = DEFAULT_DMA_COHERENT_POOL_SIZE;
+/* Size can be defined by the coherent_pool command line */
+static size_t atomic_pool_size;
 
 /* Dynamic background expansion when the atomic pool is near capacity */
 static struct work_struct atomic_pool_work;
@@ -181,6 +181,16 @@ static int __init dma_atomic_pool_init(void)
 {
int ret = 0;
 
+   /*
+* If coherent_pool was not used on the command line, default the pool
+* sizes to 128KB per 1GB of memory, min 128KB, max MAX_ORDER-1.
+*/
+   if (!atomic_pool_size) {
+   atomic_pool_size = max(totalram_pages() >> PAGE_SHIFT, 1UL) *
+   SZ_128K;
+   atomic_pool_size = min_t(size_t, atomic_pool_size,
+1 << (PAGE_SHIFT + MAX_ORDER-1));
+   }
INIT_WORK(&atomic_pool_work, atomic_pool_work_fn);
 
atomic_pool_kernel = __dma_atomic_pool_init(atomic_pool_size,


[patch 5/7] dma-pool: add pool sizes to debugfs

2020-04-14 Thread David Rientjes via iommu
The atomic DMA pools can dynamically expand based on non-blocking
allocations that need to use them.

Export the sizes of each of these pools, in bytes, through debugfs for
measurement.

Suggested-by: Christoph Hellwig 
Signed-off-by: David Rientjes 
---
 kernel/dma/pool.c | 41 +
 1 file changed, 41 insertions(+)

diff --git a/kernel/dma/pool.c b/kernel/dma/pool.c
index cf052314d9e4..3e22022c933b 100644
--- a/kernel/dma/pool.c
+++ b/kernel/dma/pool.c
@@ -2,6 +2,7 @@
 /*
  * Copyright (C) 2020 Google LLC
  */
+#include 
 #include 
 #include 
 #include 
@@ -15,6 +16,11 @@
 static struct gen_pool *atomic_pool_dma __ro_after_init;
 static struct gen_pool *atomic_pool_dma32 __ro_after_init;
 static struct gen_pool *atomic_pool_kernel __ro_after_init;
+#ifdef CONFIG_DEBUG_FS
+static unsigned long pool_size_dma;
+static unsigned long pool_size_dma32;
+static unsigned long pool_size_kernel;
+#endif
 
 #define DEFAULT_DMA_COHERENT_POOL_SIZE  SZ_256K
 static size_t atomic_pool_size = DEFAULT_DMA_COHERENT_POOL_SIZE;
@@ -29,6 +35,38 @@ static int __init early_coherent_pool(char *p)
 }
 early_param("coherent_pool", early_coherent_pool);
 
+#ifdef CONFIG_DEBUG_FS
+static void __init dma_atomic_pool_debugfs_init(void)
+{
+   struct dentry *root;
+
+   root = debugfs_create_dir("dma_pools", NULL);
+   if (IS_ERR_OR_NULL(root))
+   return;
+
+   debugfs_create_ulong("pool_size_dma", 0400, root, &pool_size_dma);
+   debugfs_create_ulong("pool_size_dma32", 0400, root, &pool_size_dma32);
+   debugfs_create_ulong("pool_size_kernel", 0400, root, &pool_size_kernel);
+}
+
+static void dma_atomic_pool_size_add(gfp_t gfp, size_t size)
+{
+   if (gfp & __GFP_DMA)
+   pool_size_dma += size;
+   else if (gfp & __GFP_DMA32)
+   pool_size_dma32 += size;
+   else
+   pool_size_kernel += size;
+}
+#else
+static inline void dma_atomic_pool_debugfs_init(void)
+{
+}
+static inline void dma_atomic_pool_size_add(gfp_t gfp, size_t size)
+{
+}
+#endif /* CONFIG_DEBUG_FS */
+
 static int atomic_pool_expand(struct gen_pool *pool, size_t pool_size,
  gfp_t gfp)
 {
@@ -76,6 +114,7 @@ static int atomic_pool_expand(struct gen_pool *pool, size_t 
pool_size,
if (ret)
goto encrypt_mapping;
 
+   dma_atomic_pool_size_add(gfp, pool_size);
return 0;
 
 encrypt_mapping:
@@ -160,6 +199,8 @@ static int __init dma_atomic_pool_init(void)
if (!atomic_pool_dma32)
ret = -ENOMEM;
}
+
+   dma_atomic_pool_debugfs_init();
return ret;
 }
 postcore_initcall(dma_atomic_pool_init);


[patch 2/7] dma-pool: add additional coherent pools to map to gfp mask

2020-04-14 Thread David Rientjes via iommu
The single atomic pool is allocated from the lowest zone possible since
it is guaranteed to be applicable for any DMA allocation.

Devices may allocate through the DMA API but not have a strict reliance
on GFP_DMA memory.  Since the atomic pool will be used for all
non-blockable allocations, returning all memory from ZONE_DMA may
unnecessarily deplete the zone.

Provision for multiple atomic pools that will map to the optimal gfp
mask of the device.

When allocating non-blockable memory, determine the optimal gfp mask of
the device and use the appropriate atomic pool.

The coherent DMA mask will remain the same between allocation and free
and, thus, memory will be freed to the same atomic pool it was allocated
from.
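
As a rough sketch of that mapping (illustrative only; this mirrors the
earlier RFC's dev_to_pool(), adjusted to the per-zone pool names used in
this series):

	static struct gen_pool *dev_to_pool(struct device *dev)
	{
		u64 phys_mask;
		gfp_t gfp;

		/* Same helper the direct allocator uses to pick a zone. */
		gfp = dma_direct_optimal_gfp_mask(dev, dev->coherent_dma_mask,
						  &phys_mask);
		if (IS_ENABLED(CONFIG_ZONE_DMA) && gfp == GFP_DMA)
			return atomic_pool_dma;
		if (IS_ENABLED(CONFIG_ZONE_DMA32) && gfp == GFP_DMA32)
			return atomic_pool_dma32;
		return atomic_pool_kernel;
	}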

__dma_atomic_pool_init() will be changed to return struct gen_pool *
later once dynamic expansion is added.

Signed-off-by: David Rientjes 
---
 drivers/iommu/dma-iommu.c   |   5 +-
 include/linux/dma-direct.h  |   2 +
 include/linux/dma-mapping.h |   6 +-
 kernel/dma/direct.c |  12 ++--
 kernel/dma/pool.c   | 120 +++-
 5 files changed, 91 insertions(+), 54 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index ba128d1cdaee..4959f5df21bd 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -952,7 +952,7 @@ static void __iommu_dma_free(struct device *dev, size_t 
size, void *cpu_addr)
 
/* Non-coherent atomic allocation? Easy */
if (IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) &&
-   dma_free_from_pool(cpu_addr, alloc_size))
+   dma_free_from_pool(dev, cpu_addr, alloc_size))
return;
 
if (IS_ENABLED(CONFIG_DMA_REMAP) && is_vmalloc_addr(cpu_addr)) {
@@ -1035,7 +1035,8 @@ static void *iommu_dma_alloc(struct device *dev, size_t 
size,
 
if (IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) &&
!gfpflags_allow_blocking(gfp) && !coherent)
-   cpu_addr = dma_alloc_from_pool(PAGE_ALIGN(size), &page, gfp);
+   cpu_addr = dma_alloc_from_pool(dev, PAGE_ALIGN(size), &page,
+  gfp);
else
cpu_addr = iommu_dma_alloc_pages(dev, size, &page, gfp, attrs);
if (!cpu_addr)
diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h
index 24b8684aa21d..136f984df0d9 100644
--- a/include/linux/dma-direct.h
+++ b/include/linux/dma-direct.h
@@ -67,6 +67,8 @@ static inline bool dma_capable(struct device *dev, dma_addr_t 
addr, size_t size,
 }
 
 u64 dma_direct_get_required_mask(struct device *dev);
+gfp_t dma_direct_optimal_gfp_mask(struct device *dev, u64 dma_mask,
+ u64 *phys_mask);
 void *dma_direct_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle,
gfp_t gfp, unsigned long attrs);
 void dma_direct_free(struct device *dev, size_t size, void *cpu_addr,
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 330ad58fbf4d..b43116a6405d 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -630,9 +630,9 @@ void *dma_common_pages_remap(struct page **pages, size_t 
size,
pgprot_t prot, const void *caller);
 void dma_common_free_remap(void *cpu_addr, size_t size);
 
-bool dma_in_atomic_pool(void *start, size_t size);
-void *dma_alloc_from_pool(size_t size, struct page **ret_page, gfp_t flags);
-bool dma_free_from_pool(void *start, size_t size);
+void *dma_alloc_from_pool(struct device *dev, size_t size,
+ struct page **ret_page, gfp_t flags);
+bool dma_free_from_pool(struct device *dev, void *start, size_t size);
 
 int
 dma_common_get_sgtable(struct device *dev, struct sg_table *sgt, void 
*cpu_addr,
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 8f4bbdaf965e..a834ee22f8ff 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -45,8 +45,8 @@ u64 dma_direct_get_required_mask(struct device *dev)
return (1ULL << (fls64(max_dma) - 1)) * 2 - 1;
 }
 
-static gfp_t __dma_direct_optimal_gfp_mask(struct device *dev, u64 dma_mask,
-   u64 *phys_limit)
+gfp_t dma_direct_optimal_gfp_mask(struct device *dev, u64 dma_mask,
+ u64 *phys_limit)
 {
u64 dma_limit = min_not_zero(dma_mask, dev->bus_dma_limit);
 
@@ -89,8 +89,8 @@ struct page *__dma_direct_alloc_pages(struct device *dev, 
size_t size,
 
/* we always manually zero the memory once we are done: */
gfp &= ~__GFP_ZERO;
-   gfp |= __dma_direct_optimal_gfp_mask(dev, dev->coherent_dma_mask,
-   &phys_limit);
+   gfp |= dma_direct_optimal_gfp_mask(dev, dev->coherent_dma_mask,
+  &phys_limit);
page = dma_alloc_contiguous(dev, alloc_size, gfp);
if (page && !dma_coherent_ok(dev, page_to_phys(page), size)) {
dma_free_contiguous(dev, page, alloc_size);
@@ -128,7 +128,7 @@ void *dma_direct_al

Re: [rfc v2 3/6] dma-pool: dynamically expanding atomic pools

2020-04-14 Thread David Rientjes via iommu
On Tue, 14 Apr 2020, Christoph Hellwig wrote:

> > I'll rely on Christoph to determine whether it makes sense to add some 
> > periodic scavenging of the atomic pools, whether that's needed for this to 
> > be merged, or whether we should enforce some maximum pool size.
> 
> I don't really see the point.  In fact the only part of the series
> I feel uneasy about is the growing of the pools, because it already
> adds a fair amount of complexity that we might not need for simple
> things, but shrinking really doesn't make any sense.  So I'm tempted
> to not ever support shrinking, and even make growing optional code under
> a new config variable.  We'll also need a way to query the current size
> through e.g. a debugfs file.
> 

New debugfs file sounds good, I'll add it.  If we want to disable dynamic 
expansion when the pool is depleted under a new config option, let me 
know.


Re: [rfc v2 4/6] dma-direct: atomic allocations must come from atomic coherent pools

2020-04-14 Thread David Rientjes via iommu
On Tue, 14 Apr 2020, Christoph Hellwig wrote:

> > +   /*
> > +* Unencrypted memory must come directly from DMA atomic pools if
> > +* blocking is not allowed.
> > +*/
> > +   if (IS_ENABLED(CONFIG_DMA_COHERENT_POOL) &&
> > +   force_dma_unencrypted(dev) && !gfpflags_allow_blocking(gfp)) {
> > +   ret = dma_alloc_from_pool(dev, PAGE_ALIGN(size), &page, gfp);
> > +   if (!ret)
> > +   return NULL;
> > +   goto done;
> > +   }
> > +
> > if (IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) &&
> > dma_alloc_need_uncached(dev, attrs) &&
> > !gfpflags_allow_blocking(gfp)) {
> 
> Can we keep a single conditional for the pool allocations?  Maybe
> add a new dma_alloc_from_pool helper ala:
> 
> static inline bool dma_alloc_from_pool(struct device *dev, gfp_t gfp)
> {
>   if (!IS_ENABLED(CONFIG_DMA_COHERENT_POOL))
>   return false;
>   if (gfpflags_allow_blocking(gfp))
>   return false;
>   if (force_dma_unencrypted(dev))
>   return true;
>   if (dma_alloc_need_uncached(dev))
>   return true;
> }

Looks good, fixed.  I renamed it to dma_should_alloc_from_pool() to avoid 
confusing it with the actual allocation function and added a 
dma_should_free_from_pool() as well.

diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -75,6 +75,39 @@ static bool dma_coherent_ok(struct device *dev, phys_addr_t 
phys, size_t size)
min_not_zero(dev->coherent_dma_mask, 
dev->bus_dma_limit);
 }
 
+/*
+ * Decrypting memory is allowed to block, so if this device requires
+ * unencrypted memory it must come from atomic pools.
+ */
+static inline bool dma_should_alloc_from_pool(struct device *dev, gfp_t gfp,
+ unsigned long attrs)
+{
+   if (!IS_ENABLED(CONFIG_DMA_COHERENT_POOL))
+   return false;
+   if (gfpflags_allow_blocking(gfp))
+   return false;
+   if (force_dma_unencrypted(dev))
+   return true;
+   if (!IS_ENABLED(CONFIG_DMA_DIRECT_REMAP))
+   return false;
+   if (dma_alloc_need_uncached(dev, attrs))
+   return true;
+   return false;
+}
+
+static inline bool dma_should_free_from_pool(struct device *dev,
+unsigned long attrs)
+{
+   if (IS_ENABLED(CONFIG_DMA_COHERENT_POOL))
+   return true;
+   if ((attrs & DMA_ATTR_NO_KERNEL_MAPPING) &&
+   !force_dma_unencrypted(dev))
+   return false;
+   if (IS_ENABLED(CONFIG_DMA_DIRECT_REMAP))
+   return true;
+   return false;
+}
+
 struct page *__dma_direct_alloc_pages(struct device *dev, size_t size,
gfp_t gfp, unsigned long attrs)
 {
@@ -124,9 +157,7 @@ void *dma_direct_alloc_pages(struct device *dev, size_t 
size,
struct page *page;
void *ret;
 
-   if (IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) &&
-   dma_alloc_need_uncached(dev, attrs) &&
-   !gfpflags_allow_blocking(gfp)) {
+   if (dma_should_alloc_from_pool(dev, gfp, attrs)) {
ret = dma_alloc_from_pool(dev, PAGE_ALIGN(size), &page, gfp);
if (!ret)
return NULL;
@@ -202,6 +233,11 @@ void dma_direct_free_pages(struct device *dev, size_t 
size, void *cpu_addr,
 {
unsigned int page_order = get_order(size);
 
+   /* If cpu_addr is not from an atomic pool, dma_free_from_pool() fails */
+   if (dma_should_free_from_pool(dev, attrs) &&
+   dma_free_from_pool(dev, cpu_addr, PAGE_ALIGN(size)))
+   return;
+
if ((attrs & DMA_ATTR_NO_KERNEL_MAPPING) &&
!force_dma_unencrypted(dev)) {
/* cpu_addr is a struct page cookie, not a kernel address */
@@ -209,10 +245,6 @@ void dma_direct_free_pages(struct device *dev, size_t 
size, void *cpu_addr,
return;
}
 
-   if (IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) &&
-   dma_free_from_pool(dev, cpu_addr, PAGE_ALIGN(size)))
-   return;
-
if (force_dma_unencrypted(dev))
set_memory_encrypted((unsigned long)cpu_addr, 1 << page_order);
 


Re: [rfc v2 4/6] dma-direct: atomic allocations must come from atomic coherent pools

2020-04-14 Thread David Rientjes via iommu
On Thu, 9 Apr 2020, Tom Lendacky wrote:

> > When a device required unencrypted memory and the context does not allow
> 
> required => requires
> 

Fixed, thanks.

> > blocking, memory must be returned from the atomic coherent pools.
> > 
> > This avoids the remap when CONFIG_DMA_DIRECT_REMAP is not enabled and the
> > config only requires CONFIG_DMA_COHERENT_POOL.  This will be used for
> > CONFIG_AMD_MEM_ENCRYPT in a subsequent patch.
> > 
> > Keep all memory in these pools unencrypted.
> > 
> > Signed-off-by: David Rientjes 
> > ---
> >   kernel/dma/direct.c | 16 
> >   kernel/dma/pool.c   | 15 +--
> >   2 files changed, 29 insertions(+), 2 deletions(-)
> > 
> > diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
> > index 70800ca64f13..44165263c185 100644
> > --- a/kernel/dma/direct.c
> > +++ b/kernel/dma/direct.c
> > @@ -124,6 +124,18 @@ void *dma_direct_alloc_pages(struct device *dev, size_t
> > size,
> > struct page *page;
> > void *ret;
> >   + /*
> > +* Unencrypted memory must come directly from DMA atomic pools if
> > +* blocking is not allowed.
> > +*/
> > +   if (IS_ENABLED(CONFIG_DMA_COHERENT_POOL) &&
> > +   force_dma_unencrypted(dev) && !gfpflags_allow_blocking(gfp)) {
> > +   ret = dma_alloc_from_pool(dev, PAGE_ALIGN(size), &page, gfp);
> > +   if (!ret)
> > +   return NULL;
> > +   goto done;
> > +   }
> > +
> > if (IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) &&
> > dma_alloc_need_uncached(dev, attrs) &&
> > !gfpflags_allow_blocking(gfp)) {
> > @@ -203,6 +215,10 @@ void dma_direct_free_pages(struct device *dev, size_t
> > size, void *cpu_addr,
> >   {
> > unsigned int page_order = get_order(size);
> >   + if (IS_ENABLED(CONFIG_DMA_COHERENT_POOL) &&
> > +   dma_free_from_pool(dev, cpu_addr, PAGE_ALIGN(size)))
> > +   return;
> > +
> > if ((attrs & DMA_ATTR_NO_KERNEL_MAPPING) &&
> > !force_dma_unencrypted(dev)) {
> > /* cpu_addr is a struct page cookie, not a kernel address */
> > diff --git a/kernel/dma/pool.c b/kernel/dma/pool.c
> > index e14c5a2da734..6685ab89cfa7 100644
> > --- a/kernel/dma/pool.c
> > +++ b/kernel/dma/pool.c
> > @@ -9,6 +9,7 @@
> >   #include 
> >   #include 
> >   #include 
> > +#include 
> >   #include 
> >   #include 
> >   #include 
> > @@ -55,12 +56,20 @@ static int atomic_pool_expand(struct gen_pool *pool,
> > size_t pool_size,
> > arch_dma_prep_coherent(page, pool_size);
> >   +#ifdef CONFIG_DMA_DIRECT_REMAP
> > addr = dma_common_contiguous_remap(page, pool_size,
> >pgprot_dmacoherent(PAGE_KERNEL),
> >__builtin_return_address(0));
> > if (!addr)
> > goto free_page;
> > -
> > +#else
> > +   addr = page_to_virt(page);
> > +#endif
> > +   /*
> > +* Memory in the atomic DMA pools must be unencrypted, the pools do
> > not
> > +* shrink so no re-encryption occurs in dma_direct_free_pages().
> > +*/
> > +   set_memory_decrypted((unsigned long)page_to_virt(page), 1 << order);
> > ret = gen_pool_add_virt(pool, (unsigned long)addr, page_to_phys(page),
> > pool_size, NUMA_NO_NODE);
> > if (ret)
> > @@ -69,8 +78,10 @@ static int atomic_pool_expand(struct gen_pool *pool,
> > size_t pool_size,
> > return 0;
> > remove_mapping:
> > +#ifdef CONFIG_DMA_DIRECT_REMAP
> > dma_common_free_remap(addr, pool_size);
> 
> You're about to free the memory, but you've called set_memory_decrypted()
> against it, so you need to do a set_memory_encrypted() to bring it back to a
> state ready for allocation again.
> 

Ah, good catch, thanks.  I notice that I should also be checking the 
return value of set_memory_decrypted() because pages added to the coherent 
pools *must* be unencrypted.  If it fails, we fail the expansion.

And do the same thing for set_memory_encrypted(), which would be a bizarre 
situation (decrypt succeeded, encrypt failed), by simply leaking the page.
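
In other words, the expansion path ends up looking roughly like this (a
sketch of the intent only; the surrounding allocation/remap code and the
CONFIG_DMA_DIRECT_REMAP #ifdefs are omitted):

	ret = set_memory_decrypted((unsigned long)page_to_virt(page),
				   1 << order);
	if (ret)
		goto remove_mapping;

	ret = gen_pool_add_virt(pool, (unsigned long)addr, page_to_phys(page),
				pool_size, NUMA_NO_NODE);
	if (ret)
		goto encrypt_mapping;

	return 0;

encrypt_mapping:
	if (WARN_ON_ONCE(set_memory_encrypted((unsigned long)page_to_virt(page),
					      1 << order))) {
		/* Decrypt succeeded but encrypt failed: purposely leak. */
		return ret;
	}
remove_mapping:
	dma_common_free_remap(addr, pool_size);
	if (!dma_release_from_contiguous(NULL, page, 1 << order))
		__free_pages(page, order);
	return ret;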

diff --git a/kernel/dma/pool.c b/kernel/dma/pool.c
--- a/kernel/dma/pool.c
+++ b/kernel/dma/pool.c
@@ -7,6 +7,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -53,22 +54,42 @@ static int atomic_pool_expand(struct gen_pool *pool, size_t 
pool_size,
 
arch_dma_prep_coherent(page, pool_size);
 
+#ifdef CONFIG_DMA_DIRECT_REMAP
addr = dma_common_contiguous_remap(page, pool_size,
   pgprot_dmacoherent(PAGE_KERNEL),
   __builtin_return_address(0));
if (!addr)
goto free_page;
-
+#else
+   addr = page_to_virt(page);
+#endif
+   /*
+* Memory in the atomic DMA pools must be unencrypted, the pools do not
+* shrink so no re-encryption occurs in dma_direct_free_pages().
+*/
+   ret = set_memory_decrypted((unsigned long)page_to_virt(page)

Re: [rfc v2 3/6] dma-pool: dynamically expanding atomic pools

2020-04-10 Thread David Rientjes via iommu
On Fri, 10 Apr 2020, Hillf Danton wrote:

> 
> On Wed, 8 Apr 2020 14:21:06 -0700 (PDT) David Rientjes wrote:
> > 
> > When an atomic pool becomes fully depleted because it is now relied upon
> > for all non-blocking allocations through the DMA API, allow background
> > expansion of each pool by a kworker.
> > 
> > When an atomic pool has less than the default size of memory left, kick
> > off a kworker to dynamically expand the pool in the background.  The pool
> > is doubled in size, up to MAX_ORDER-1.  If memory cannot be allocated at
> > the requested order, smaller allocation(s) are attempted.
> > 
> What is proposed looks like a one-way path that does not take dynamically
> shrinking the pool into account. Thus the risk may rise in corner cases
> where pools are over-expanded in the long run after one-off peak
> allocation requests.
> 

To us, this is actually a benefit: we prefer the peak size to be 
maintained so that we do not need to dynamically resize the pool later at the 
cost of throughput.  Genpool also does not have great support for 
scavenging and freeing unused chunks.

Perhaps we could enforce a maximum size on the pools just as we allow the 
default size to be defined by coherent_pool= on the command line.  Our use 
case would not set this, however, since we have not seen egregious genpool 
sizes as the result of non-blockable DMA allocations (perhaps the drivers 
we use just play friendlier and you have seen excessive usage?).
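
If such a cap were wanted, it could live entirely in the resize path; a
minimal sketch (atomic_pool_max_size is a hypothetical knob, not something
in this series):

	static void atomic_pool_resize(struct gen_pool *pool, gfp_t gfp)
	{
		if (!pool || gen_pool_avail(pool) >= atomic_pool_size)
			return;
		/* Hypothetical upper bound on background growth. */
		if (atomic_pool_max_size &&
		    gen_pool_size(pool) >= atomic_pool_max_size)
			return;
		atomic_pool_expand(pool, gen_pool_size(pool), gfp);
	}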

I'll rely on Christoph to determine whether it makes sense to add some 
periodic scavenging of the atomic pools, whether that's needed for this to 
be merged, or whether we should enforce some maximum pool size.


[rfc v2 0/6] unencrypted atomic DMA pools with dynamic expansion

2020-04-08 Thread David Rientjes via iommu
set_memory_decrypted() may block, so it is not possible to do non-blocking
allocations through the DMA API for devices that require unencrypted
memory.

The solution is to expand the atomic DMA pools for the various possible
gfp requirements as a means to prevent an unnecessary depletion of lowmem.
These atomic pools are separated from the remap code and can be selected
for configurations that need them outside the scope of
CONFIG_DMA_DIRECT_REMAP, such as CONFIG_AMD_MEM_ENCRYPT.

These atomic DMA pools are kept unencrypted so they can immediately be
used for non-blocking allocations.  Since the need for this type of memory
depends on the kernel config and devices being used, these pools are also
dynamically expandable.

There are a number of changes to v1 of the patchset based on Christoph's
feedback and the separation of the atomic pools out into the new
kernel/dma/pool.c.

NOTE!  I'd like eyes from Christoph specifically on patch 4 in the series
where we handle the coherent pools differently depending on
CONFIG_DMA_COHERENT_POOL and/or CONFIG_DMA_DIRECT_REMAP and from Thomas
on the requirement for set_memory_decrypted() for all DMA coherent pools.

Why still an RFC?  We are starting to aggressively test this series but
since there were a number of changes that were requested for the first
RFC, it would be better to have feedback and factor that into the series
earlier rather than later so testing can be focused on a series that
could be merged upstream.

This patchset is based on latest Linus HEAD:

commit ae46d2aa6a7fbe8ca0946f24b061b6ccdc6c3f25
Author: Hillf Danton 
Date:   Wed Apr 8 11:59:24 2020 -0400

mm/gup: Let __get_user_pages_locked() return -EINTR for fatal signal
---
 arch/x86/Kconfig|   1 +
 drivers/iommu/dma-iommu.c   |   5 +-
 include/linux/dma-direct.h  |   2 +
 include/linux/dma-mapping.h |   6 +-
 kernel/dma/Kconfig  |   6 +-
 kernel/dma/Makefile |   1 +
 kernel/dma/direct.c |  28 -
 kernel/dma/pool.c   | 224 
 kernel/dma/remap.c  | 114 --
 9 files changed, 261 insertions(+), 126 deletions(-)
 create mode 100644 kernel/dma/pool.c



[rfc v2 6/6] dma-pool: scale the default DMA coherent pool size with memory capacity

2020-04-08 Thread David Rientjes via iommu
When AMD memory encryption is enabled, some devices may use more than
256KB/sec from the atomic pools.  It would be more appropriate to scale
the default size based on memory capacity unless the coherent_pool
option is used on the kernel command line.

This provides a slight optimization on initial expansion and is deemed
appropriate due to the increased reliance on the atomic pools.  Note that
the default size of 128KB per pool will normally be larger than the
single coherent pool implementation since there are now up to three
coherent pools (DMA, DMA32, and kernel).

Alternatively, this could be done only when CONFIG_AMD_MEM_ENCRYPT is
enabled.

Signed-off-by: David Rientjes 
---
 kernel/dma/pool.c | 14 --
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/kernel/dma/pool.c b/kernel/dma/pool.c
index 6685ab89cfa7..42bac953548c 100644
--- a/kernel/dma/pool.c
+++ b/kernel/dma/pool.c
@@ -18,8 +18,8 @@ static struct gen_pool *atomic_pool_dma __ro_after_init;
 static struct gen_pool *atomic_pool_dma32 __ro_after_init;
 static struct gen_pool *atomic_pool_kernel __ro_after_init;
 
-#define DEFAULT_DMA_COHERENT_POOL_SIZE  SZ_256K
-static size_t atomic_pool_size = DEFAULT_DMA_COHERENT_POOL_SIZE;
+/* Size can be defined by the coherent_pool command line */
+static size_t atomic_pool_size;
 
 /* Dynamic background expansion when the atomic pool is near capacity */
 static struct work_struct atomic_pool_work;
@@ -132,6 +132,16 @@ static int __init dma_atomic_pool_init(void)
 {
int ret = 0;
 
+   /*
+* If coherent_pool was not used on the command line, default the pool
+* sizes to 128KB per 1GB of memory, min 128KB, max MAX_ORDER-1.
+*/
+   if (!atomic_pool_size) {
+   atomic_pool_size = max(totalram_pages() >> PAGE_SHIFT, 1UL) *
+   SZ_128K;
+   atomic_pool_size = min_t(size_t, atomic_pool_size,
+1 << (PAGE_SHIFT + MAX_ORDER-1));
+   }
INIT_WORK(&atomic_pool_work, atomic_pool_work_fn);
 
atomic_pool_kernel = __dma_atomic_pool_init(atomic_pool_size,


[rfc v2 3/6] dma-pool: dynamically expanding atomic pools

2020-04-08 Thread David Rientjes via iommu
When an atomic pool becomes fully depleted because it is now relied upon
for all non-blocking allocations through the DMA API, allow background
expansion of each pool by a kworker.

When an atomic pool has less than the default size of memory left, kick
off a kworker to dynamically expand the pool in the background.  The pool
is doubled in size, up to MAX_ORDER-1.  If memory cannot be allocated at
the requested order, smaller allocation(s) are attempted.

This allows the default size to be kept quite low when one or more of the
atomic pools is not used.

This also allows __dma_atomic_pool_init to return a pointer to the pool
to make initialization cleaner.

Also switch over some node ids to the more appropriate NUMA_NO_NODE.
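
The expansion is kicked from the allocation fast path; roughly (a sketch,
assuming the check sits at the end of dma_alloc_from_pool()):

	/* Schedule background growth once the pool runs low. */
	if (gen_pool_avail(pool) < atomic_pool_size)
		schedule_work(&atomic_pool_work);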

Signed-off-by: David Rientjes 
---
 kernel/dma/pool.c | 120 +++---
 1 file changed, 82 insertions(+), 38 deletions(-)

diff --git a/kernel/dma/pool.c b/kernel/dma/pool.c
index 9b806f5eded8..e14c5a2da734 100644
--- a/kernel/dma/pool.c
+++ b/kernel/dma/pool.c
@@ -11,13 +11,17 @@
 #include 
 #include 
 #include 
+#include 
 
 static struct gen_pool *atomic_pool_dma __ro_after_init;
 static struct gen_pool *atomic_pool_dma32 __ro_after_init;
 static struct gen_pool *atomic_pool_kernel __ro_after_init;
 
 #define DEFAULT_DMA_COHERENT_POOL_SIZE  SZ_256K
-static size_t atomic_pool_size __initdata = DEFAULT_DMA_COHERENT_POOL_SIZE;
+static size_t atomic_pool_size = DEFAULT_DMA_COHERENT_POOL_SIZE;
+
+/* Dynamic background expansion when the atomic pool is near capacity */
+static struct work_struct atomic_pool_work;
 
 static int __init early_coherent_pool(char *p)
 {
@@ -26,76 +30,114 @@ static int __init early_coherent_pool(char *p)
 }
 early_param("coherent_pool", early_coherent_pool);
 
-static int __init __dma_atomic_pool_init(struct gen_pool **pool,
-size_t pool_size, gfp_t gfp)
+static int atomic_pool_expand(struct gen_pool *pool, size_t pool_size,
+ gfp_t gfp)
 {
-   const unsigned int order = get_order(pool_size);
-   const unsigned long nr_pages = pool_size >> PAGE_SHIFT;
+   unsigned int order;
struct page *page;
void *addr;
-   int ret;
+   int ret = -ENOMEM;
+
+   /* Cannot allocate larger than MAX_ORDER-1 */
+   order = min(get_order(pool_size), MAX_ORDER-1);
+
+   do {
+   pool_size = 1 << (PAGE_SHIFT + order);
 
-   if (dev_get_cma_area(NULL))
-   page = dma_alloc_from_contiguous(NULL, nr_pages, order, false);
-   else
-   page = alloc_pages(gfp, order);
+   if (dev_get_cma_area(NULL))
+   page = dma_alloc_from_contiguous(NULL, 1 << order,
+order, false);
+   else
+   page = alloc_pages(gfp, order);
+   } while (!page && order-- > 0);
if (!page)
goto out;
 
arch_dma_prep_coherent(page, pool_size);
 
-   *pool = gen_pool_create(PAGE_SHIFT, -1);
-   if (!*pool)
-   goto free_page;
-
addr = dma_common_contiguous_remap(page, pool_size,
   pgprot_dmacoherent(PAGE_KERNEL),
   __builtin_return_address(0));
if (!addr)
-   goto destroy_genpool;
+   goto free_page;
 
-   ret = gen_pool_add_virt(*pool, (unsigned long)addr, page_to_phys(page),
-   pool_size, -1);
+   ret = gen_pool_add_virt(pool, (unsigned long)addr, page_to_phys(page),
+   pool_size, NUMA_NO_NODE);
if (ret)
goto remove_mapping;
-   gen_pool_set_algo(*pool, gen_pool_first_fit_order_align, NULL);
 
-   pr_info("DMA: preallocated %zu KiB %pGg pool for atomic allocations\n",
-   pool_size >> 10, &gfp);
return 0;
 
 remove_mapping:
dma_common_free_remap(addr, pool_size);
-destroy_genpool:
-   gen_pool_destroy(*pool);
-   *pool = NULL;
 free_page:
-   if (!dma_release_from_contiguous(NULL, page, nr_pages))
+   if (!dma_release_from_contiguous(NULL, page, 1 << order))
__free_pages(page, order);
 out:
-   pr_err("DMA: failed to allocate %zu KiB %pGg pool for atomic 
allocation\n",
-  pool_size >> 10, &gfp);
-   return -ENOMEM;
+   return ret;
+}
+
+static void atomic_pool_resize(struct gen_pool *pool, gfp_t gfp)
+{
+   if (pool && gen_pool_avail(pool) < atomic_pool_size)
+   atomic_pool_expand(pool, gen_pool_size(pool), gfp);
+}
+
+static void atomic_pool_work_fn(struct work_struct *work)
+{
+   if (IS_ENABLED(CONFIG_ZONE_DMA))
+   atomic_pool_resize(atomic_pool_dma, GFP_DMA);
+   if (IS_ENABLED(CONFIG_ZONE_DMA32))
+   atomic_pool_resize(atomic_pool_dma32, GFP_DMA32);
+   atomic_pool_resize(atomic_pool_kernel, GFP_KE

[rfc v2 5/6] x86/mm: unencrypted non-blocking DMA allocations use coherent pools

2020-04-08 Thread David Rientjes via iommu
When CONFIG_AMD_MEM_ENCRYPT is enabled and a device requires unencrypted
DMA, all non-blocking allocations must originate from the atomic DMA
coherent pools.

Select CONFIG_DMA_COHERENT_POOL for CONFIG_AMD_MEM_ENCRYPT.

Signed-off-by: David Rientjes 
---
 arch/x86/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 8d078642b4be..b7c9e78a5374 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1520,6 +1520,7 @@ config X86_CPA_STATISTICS
 config AMD_MEM_ENCRYPT
bool "AMD Secure Memory Encryption (SME) support"
depends on X86_64 && CPU_SUP_AMD
+   select DMA_COHERENT_POOL
select DYNAMIC_PHYSICAL_MASK
select ARCH_USE_MEMREMAP_PROT
select ARCH_HAS_FORCE_DMA_UNENCRYPTED


[rfc v2 2/6] dma-pool: add additional coherent pools to map to gfp mask

2020-04-08 Thread David Rientjes via iommu
The single atomic pool is allocated from the lowest zone possible since
it is guaranteed to be applicable for any DMA allocation.

Devices may allocate through the DMA API but not have a strict reliance
on GFP_DMA memory.  Since the atomic pool will be used for all
non-blockable allocations, returning all memory from ZONE_DMA may
unnecessarily deplete the zone.

Provision for multiple atomic pools that will map to the optimal gfp
mask of the device.

When allocating non-blockable memory, determine the optimal gfp mask of
the device and use the appropriate atomic pool.

The coherent DMA mask will remain the same between allocation and free
and, thus, memory will be freed to the same atomic pool it was allocated
from.

__dma_atomic_pool_init() will be changed to return struct gen_pool *
later once dynamic expansion is added.

Signed-off-by: David Rientjes 
---
 drivers/iommu/dma-iommu.c   |   5 +-
 include/linux/dma-direct.h  |   2 +
 include/linux/dma-mapping.h |   6 +-
 kernel/dma/direct.c |  12 ++--
 kernel/dma/pool.c   | 120 +++-
 5 files changed, 91 insertions(+), 54 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index ba128d1cdaee..4959f5df21bd 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -952,7 +952,7 @@ static void __iommu_dma_free(struct device *dev, size_t 
size, void *cpu_addr)
 
/* Non-coherent atomic allocation? Easy */
if (IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) &&
-   dma_free_from_pool(cpu_addr, alloc_size))
+   dma_free_from_pool(dev, cpu_addr, alloc_size))
return;
 
if (IS_ENABLED(CONFIG_DMA_REMAP) && is_vmalloc_addr(cpu_addr)) {
@@ -1035,7 +1035,8 @@ static void *iommu_dma_alloc(struct device *dev, size_t 
size,
 
if (IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) &&
!gfpflags_allow_blocking(gfp) && !coherent)
-   cpu_addr = dma_alloc_from_pool(PAGE_ALIGN(size), &page, gfp);
+   cpu_addr = dma_alloc_from_pool(dev, PAGE_ALIGN(size), &page,
+  gfp);
else
cpu_addr = iommu_dma_alloc_pages(dev, size, &page, gfp, attrs);
if (!cpu_addr)
diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h
index 24b8684aa21d..136f984df0d9 100644
--- a/include/linux/dma-direct.h
+++ b/include/linux/dma-direct.h
@@ -67,6 +67,8 @@ static inline bool dma_capable(struct device *dev, dma_addr_t 
addr, size_t size,
 }
 
 u64 dma_direct_get_required_mask(struct device *dev);
+gfp_t dma_direct_optimal_gfp_mask(struct device *dev, u64 dma_mask,
+ u64 *phys_mask);
 void *dma_direct_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle,
gfp_t gfp, unsigned long attrs);
 void dma_direct_free(struct device *dev, size_t size, void *cpu_addr,
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 330ad58fbf4d..b43116a6405d 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -630,9 +630,9 @@ void *dma_common_pages_remap(struct page **pages, size_t 
size,
pgprot_t prot, const void *caller);
 void dma_common_free_remap(void *cpu_addr, size_t size);
 
-bool dma_in_atomic_pool(void *start, size_t size);
-void *dma_alloc_from_pool(size_t size, struct page **ret_page, gfp_t flags);
-bool dma_free_from_pool(void *start, size_t size);
+void *dma_alloc_from_pool(struct device *dev, size_t size,
+ struct page **ret_page, gfp_t flags);
+bool dma_free_from_pool(struct device *dev, void *start, size_t size);
 
 int
 dma_common_get_sgtable(struct device *dev, struct sg_table *sgt, void 
*cpu_addr,
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index a8560052a915..70800ca64f13 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -44,8 +44,8 @@ u64 dma_direct_get_required_mask(struct device *dev)
return (1ULL << (fls64(max_dma) - 1)) * 2 - 1;
 }
 
-static gfp_t __dma_direct_optimal_gfp_mask(struct device *dev, u64 dma_mask,
-   u64 *phys_limit)
+gfp_t dma_direct_optimal_gfp_mask(struct device *dev, u64 dma_mask,
+ u64 *phys_limit)
 {
u64 dma_limit = min_not_zero(dma_mask, dev->bus_dma_limit);
 
@@ -88,8 +88,8 @@ struct page *__dma_direct_alloc_pages(struct device *dev, 
size_t size,
 
/* we always manually zero the memory once we are done: */
gfp &= ~__GFP_ZERO;
-   gfp |= __dma_direct_optimal_gfp_mask(dev, dev->coherent_dma_mask,
-   &phys_limit);
+   gfp |= dma_direct_optimal_gfp_mask(dev, dev->coherent_dma_mask,
+  &phys_limit);
page = dma_alloc_contiguous(dev, alloc_size, gfp);
if (page && !dma_coherent_ok(dev, page_to_phys(page), size)) {
dma_free_contiguous(dev, page, alloc_size);
@@ -127,7 +127,7 @@ void *dma_direct_al

[rfc v2 1/6] dma-remap: separate DMA atomic pools from direct remap code

2020-04-08 Thread David Rientjes via iommu
DMA atomic pools will be needed beyond only CONFIG_DMA_DIRECT_REMAP so
separate them out into their own file.

This also adds a new Kconfig option that can be subsequently used for
options, such as CONFIG_AMD_MEM_ENCRYPT, that will utilize the coherent
pools but do not have a dependency on direct remapping.

For this patch alone, there is no functional change introduced.

Signed-off-by: David Rientjes 
---
 kernel/dma/Kconfig  |   6 ++-
 kernel/dma/Makefile |   1 +
 kernel/dma/pool.c   | 125 
 kernel/dma/remap.c  | 114 
 4 files changed, 131 insertions(+), 115 deletions(-)
 create mode 100644 kernel/dma/pool.c

diff --git a/kernel/dma/Kconfig b/kernel/dma/Kconfig
index 4c103a24e380..d006668c0027 100644
--- a/kernel/dma/Kconfig
+++ b/kernel/dma/Kconfig
@@ -79,10 +79,14 @@ config DMA_REMAP
select DMA_NONCOHERENT_MMAP
bool
 
-config DMA_DIRECT_REMAP
+config DMA_COHERENT_POOL
bool
select DMA_REMAP
 
+config DMA_DIRECT_REMAP
+   bool
+   select DMA_COHERENT_POOL
+
 config DMA_CMA
bool "DMA Contiguous Memory Allocator"
depends on HAVE_DMA_CONTIGUOUS && CMA
diff --git a/kernel/dma/Makefile b/kernel/dma/Makefile
index d237cf3dc181..370f63344e9c 100644
--- a/kernel/dma/Makefile
+++ b/kernel/dma/Makefile
@@ -6,4 +6,5 @@ obj-$(CONFIG_DMA_DECLARE_COHERENT)  += coherent.o
 obj-$(CONFIG_DMA_VIRT_OPS) += virt.o
 obj-$(CONFIG_DMA_API_DEBUG)+= debug.o
 obj-$(CONFIG_SWIOTLB)  += swiotlb.o
+obj-$(CONFIG_DMA_COHERENT_POOL)+= pool.o
 obj-$(CONFIG_DMA_REMAP)+= remap.o
diff --git a/kernel/dma/pool.c b/kernel/dma/pool.c
new file mode 100644
index ..64cbe5852cf6
--- /dev/null
+++ b/kernel/dma/pool.c
@@ -0,0 +1,125 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2012 ARM Ltd.
+ * Copyright (c) 2014 The Linux Foundation
+ * Copyright (C) 2020 Google LLC
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+static struct gen_pool *atomic_pool __ro_after_init;
+
+#define DEFAULT_DMA_COHERENT_POOL_SIZE  SZ_256K
+static size_t atomic_pool_size __initdata = DEFAULT_DMA_COHERENT_POOL_SIZE;
+
+static int __init early_coherent_pool(char *p)
+{
+   atomic_pool_size = memparse(p, &p);
+   return 0;
+}
+early_param("coherent_pool", early_coherent_pool);
+
+static gfp_t dma_atomic_pool_gfp(void)
+{
+   if (IS_ENABLED(CONFIG_ZONE_DMA))
+   return GFP_DMA;
+   if (IS_ENABLED(CONFIG_ZONE_DMA32))
+   return GFP_DMA32;
+   return GFP_KERNEL;
+}
+
+static int __init dma_atomic_pool_init(void)
+{
+   unsigned int pool_size_order = get_order(atomic_pool_size);
+   unsigned long nr_pages = atomic_pool_size >> PAGE_SHIFT;
+   struct page *page;
+   void *addr;
+   int ret;
+
+   if (dev_get_cma_area(NULL))
+   page = dma_alloc_from_contiguous(NULL, nr_pages,
+pool_size_order, false);
+   else
+   page = alloc_pages(dma_atomic_pool_gfp(), pool_size_order);
+   if (!page)
+   goto out;
+
+   arch_dma_prep_coherent(page, atomic_pool_size);
+
+   atomic_pool = gen_pool_create(PAGE_SHIFT, -1);
+   if (!atomic_pool)
+   goto free_page;
+
+   addr = dma_common_contiguous_remap(page, atomic_pool_size,
+  pgprot_dmacoherent(PAGE_KERNEL),
+  __builtin_return_address(0));
+   if (!addr)
+   goto destroy_genpool;
+
+   ret = gen_pool_add_virt(atomic_pool, (unsigned long)addr,
+   page_to_phys(page), atomic_pool_size, -1);
+   if (ret)
+   goto remove_mapping;
+   gen_pool_set_algo(atomic_pool, gen_pool_first_fit_order_align, NULL);
+
+   pr_info("DMA: preallocated %zu KiB pool for atomic allocations\n",
+   atomic_pool_size / 1024);
+   return 0;
+
+remove_mapping:
+   dma_common_free_remap(addr, atomic_pool_size);
+destroy_genpool:
+   gen_pool_destroy(atomic_pool);
+   atomic_pool = NULL;
+free_page:
+   if (!dma_release_from_contiguous(NULL, page, nr_pages))
+   __free_pages(page, pool_size_order);
+out:
+   pr_err("DMA: failed to allocate %zu KiB pool for atomic coherent 
allocation\n",
+   atomic_pool_size / 1024);
+   return -ENOMEM;
+}
+postcore_initcall(dma_atomic_pool_init);
+
+bool dma_in_atomic_pool(void *start, size_t size)
+{
+   if (unlikely(!atomic_pool))
+   return false;
+
+   return gen_pool_has_addr(atomic_pool, (unsigned long)start, size);
+}
+
+void *dma_alloc_from_pool(size_t size, struct page **ret_page, gfp_t flags)
+{
+   unsigned long val;
+   void *ptr = NULL;
+
+   if (!atomic_pool) {
+   WARN(1, "coherent pool not initiali

[rfc v2 4/6] dma-direct: atomic allocations must come from atomic coherent pools

2020-04-08 Thread David Rientjes via iommu
When a device required unencrypted memory and the context does not allow
blocking, memory must be returned from the atomic coherent pools.

This avoids the remap when CONFIG_DMA_DIRECT_REMAP is not enabled and the
config only requires CONFIG_DMA_COHERENT_POOL.  This will be used for
CONFIG_AMD_MEM_ENCRYPT in a subsequent patch.

Keep all memory in these pools unencrypted.

Signed-off-by: David Rientjes 
---
 kernel/dma/direct.c | 16 
 kernel/dma/pool.c   | 15 +--
 2 files changed, 29 insertions(+), 2 deletions(-)

diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 70800ca64f13..44165263c185 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -124,6 +124,18 @@ void *dma_direct_alloc_pages(struct device *dev, size_t 
size,
struct page *page;
void *ret;
 
+   /*
+* Unencrypted memory must come directly from DMA atomic pools if
+* blocking is not allowed.
+*/
+   if (IS_ENABLED(CONFIG_DMA_COHERENT_POOL) &&
+   force_dma_unencrypted(dev) && !gfpflags_allow_blocking(gfp)) {
+   ret = dma_alloc_from_pool(dev, PAGE_ALIGN(size), &page, gfp);
+   if (!ret)
+   return NULL;
+   goto done;
+   }
+
if (IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) &&
dma_alloc_need_uncached(dev, attrs) &&
!gfpflags_allow_blocking(gfp)) {
@@ -203,6 +215,10 @@ void dma_direct_free_pages(struct device *dev, size_t 
size, void *cpu_addr,
 {
unsigned int page_order = get_order(size);
 
+   if (IS_ENABLED(CONFIG_DMA_COHERENT_POOL) &&
+   dma_free_from_pool(dev, cpu_addr, PAGE_ALIGN(size)))
+   return;
+
if ((attrs & DMA_ATTR_NO_KERNEL_MAPPING) &&
!force_dma_unencrypted(dev)) {
/* cpu_addr is a struct page cookie, not a kernel address */
diff --git a/kernel/dma/pool.c b/kernel/dma/pool.c
index e14c5a2da734..6685ab89cfa7 100644
--- a/kernel/dma/pool.c
+++ b/kernel/dma/pool.c
@@ -9,6 +9,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -55,12 +56,20 @@ static int atomic_pool_expand(struct gen_pool *pool, size_t 
pool_size,
 
arch_dma_prep_coherent(page, pool_size);
 
+#ifdef CONFIG_DMA_DIRECT_REMAP
addr = dma_common_contiguous_remap(page, pool_size,
   pgprot_dmacoherent(PAGE_KERNEL),
   __builtin_return_address(0));
if (!addr)
goto free_page;
-
+#else
+   addr = page_to_virt(page);
+#endif
+   /*
+* Memory in the atomic DMA pools must be unencrypted, the pools do not
+* shrink so no re-encryption occurs in dma_direct_free_pages().
+*/
+   set_memory_decrypted((unsigned long)page_to_virt(page), 1 << order);
ret = gen_pool_add_virt(pool, (unsigned long)addr, page_to_phys(page),
pool_size, NUMA_NO_NODE);
if (ret)
@@ -69,8 +78,10 @@ static int atomic_pool_expand(struct gen_pool *pool, size_t 
pool_size,
return 0;
 
 remove_mapping:
+#ifdef CONFIG_DMA_DIRECT_REMAP
dma_common_free_remap(addr, pool_size);
-free_page:
+#endif
+free_page: __maybe_unused
if (!dma_release_from_contiguous(NULL, page, 1 << order))
__free_pages(page, order);
 out:


Re: [rfc 5/6] dma-direct: atomic allocations must come from unencrypted pools

2020-03-06 Thread David Rientjes via iommu
On Thu, 5 Mar 2020, Christoph Hellwig wrote:

> On Sun, Mar 01, 2020 at 04:05:23PM -0800, David Rientjes wrote:
> > When AMD memory encryption is enabled, all non-blocking DMA allocations
> > must originate from the atomic pools depending on the device and the gfp
> > mask of the allocation.
> > 
> > Keep all memory in these pools unencrypted.
> > 
> > Signed-off-by: David Rientjes 
> > ---
> >  arch/x86/Kconfig| 1 +
> >  kernel/dma/direct.c | 9 -
> >  kernel/dma/remap.c  | 2 ++
> >  3 files changed, 7 insertions(+), 5 deletions(-)
> > 
> > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> > --- a/arch/x86/Kconfig
> > +++ b/arch/x86/Kconfig
> > @@ -1523,6 +1523,7 @@ config X86_CPA_STATISTICS
> >  config AMD_MEM_ENCRYPT
> > bool "AMD Secure Memory Encryption (SME) support"
> > depends on X86_64 && CPU_SUP_AMD
> > +   select DMA_DIRECT_REMAP
> 
> I think we need to split the pool from remapping so that we don't drag
> in the remap code for x86.
> 

Thanks for the review, Christoph.  I can address all the comments that you 
provided for the series but am hoping to get a clarification on this one 
depending on how elaborate a change you would prefer.

As a preliminary change to this series, I could move the atomic pools and 
coherent_pool command line to a new kernel/dma/atomic_pools.c file with a 
new CONFIG_DMA_ATOMIC_POOLS that would get "select"ed by CONFIG_DMA_REMAP 
and CONFIG_AMD_MEM_ENCRYPT and call into dma_common_contiguous_remap() if 
we have CONFIG_DMA_DIRECT_REMAP when adding pages to the pool.

I think that's what you mean by splitting the pool from remapping, 
otherwise we still have a full CONFIG_DMA_REMAP dependency here.  If you 
had something else in mind, please let me know.  Thanks!

> > if (IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) &&
> > -   dma_alloc_need_uncached(dev, attrs) &&
> 
> We still need a check here for either uncached or memory encryption.
> 
> > @@ -141,6 +142,7 @@ static int atomic_pool_expand(struct gen_pool *pool, 
> > size_t pool_size,
> > if (!addr)
> > goto free_page;
> >  
> > +   set_memory_decrypted((unsigned long)page_to_virt(page), nr_pages);
> 
> This probably warrants a comment.
> 
> Also I think the infrastructure changes should be split from the x86
> wire up.
> 


Re: [rfc 4/6] dma-remap: dynamically expanding atomic pools

2020-03-03 Thread David Rientjes via iommu
On Sun, 1 Mar 2020, David Rientjes wrote:

> When an atomic pool becomes fully depleted because it is now relied upon
> for all non-blocking allocations through the DMA API, allow background
> expansion of each pool by a kworker.
> 
> When an atomic pool has less than the default size of memory left, kick
> off a kworker to dynamically expand the pool in the background.  The pool
> is doubled in size.
> 
> This allows the default size to be kept quite low when one or more of the
> atomic pools is not used.
> 
> Also switch over some node ids to the more appropriate NUMA_NO_NODE.
> 
> Signed-off-by: David Rientjes 
> ---
>  kernel/dma/remap.c | 79 ++
>  1 file changed, 58 insertions(+), 21 deletions(-)
> 
> diff --git a/kernel/dma/remap.c b/kernel/dma/remap.c
> --- a/kernel/dma/remap.c
> +++ b/kernel/dma/remap.c
> @@ -10,6 +10,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  struct page **dma_common_find_pages(void *cpu_addr)
>  {
> @@ -104,7 +105,10 @@ static struct gen_pool *atomic_pool_dma32 
> __ro_after_init;
>  static struct gen_pool *atomic_pool_normal __ro_after_init;
>  
>  #define DEFAULT_DMA_COHERENT_POOL_SIZE  SZ_256K
> -static size_t atomic_pool_size __initdata = DEFAULT_DMA_COHERENT_POOL_SIZE;
> +static size_t atomic_pool_size = DEFAULT_DMA_COHERENT_POOL_SIZE;
> +
> +/* Dynamic background expansion when the atomic pool is near capacity */
> +struct work_struct atomic_pool_work;
>  
>  static int __init early_coherent_pool(char *p)
>  {
> @@ -113,14 +117,14 @@ static int __init early_coherent_pool(char *p)
>  }
>  early_param("coherent_pool", early_coherent_pool);
>  
> -static int __init __dma_atomic_pool_init(struct gen_pool **pool,
> -  size_t pool_size, gfp_t gfp)
> +static int atomic_pool_expand(struct gen_pool *pool, size_t pool_size,
> +   gfp_t gfp)
>  {
> - const unsigned int order = get_order(pool_size);
>   const unsigned long nr_pages = pool_size >> PAGE_SHIFT;
> + const unsigned int order = get_order(pool_size);
>   struct page *page;
>   void *addr;
> - int ret;
> + int ret = -ENOMEM;
>  
>   if (dev_get_cma_area(NULL))
>   page = dma_alloc_from_contiguous(NULL, nr_pages, order, false);

There's an issue here if the pool grows too large which would result in
order > MAX_ORDER-1.  We can fix that by limiting order to MAX_ORDER-1 and 
doing nr_pages = 1 << order.

I should also add support for trying smaller page allocations if our 
preferred expansion size results in an allocation failure.
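
Roughly (a sketch of both fixes; something close to this is what I intend
for the next revision):

	/*
	 * Cap the request at MAX_ORDER-1 and retry with smaller orders if
	 * the preferred size cannot be allocated.
	 */
	order = min(get_order(pool_size), MAX_ORDER - 1);
	do {
		pool_size = 1 << (PAGE_SHIFT + order);
		if (dev_get_cma_area(NULL))
			page = dma_alloc_from_contiguous(NULL, 1 << order,
							 order, false);
		else
			page = alloc_pages(gfp, order);
	} while (!page && order-- > 0);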

Other than that, I'll remove the RFC tag and send a refreshed series by 
the end of the week unless there are other comments or suggestions to 
factor in.

Thanks!

> @@ -131,38 +135,67 @@ static int __init __dma_atomic_pool_init(struct 
> gen_pool **pool,
>  
>   arch_dma_prep_coherent(page, pool_size);
>  
> - *pool = gen_pool_create(PAGE_SHIFT, -1);
> - if (!*pool)
> - goto free_page;
> -
>   addr = dma_common_contiguous_remap(page, pool_size,
>  pgprot_dmacoherent(PAGE_KERNEL),
>  __builtin_return_address(0));
>   if (!addr)
> - goto destroy_genpool;
> + goto free_page;
>  
> - ret = gen_pool_add_virt(*pool, (unsigned long)addr, page_to_phys(page),
> - pool_size, -1);
> + ret = gen_pool_add_virt(pool, (unsigned long)addr, page_to_phys(page),
> + pool_size, NUMA_NO_NODE);
>   if (ret)
>   goto remove_mapping;
> - gen_pool_set_algo(*pool, gen_pool_first_fit_order_align, NULL);
>  
> - pr_info("DMA: preallocated %zu KiB %pGg pool for atomic allocations\n",
> - pool_size >> 10, &gfp);
>   return 0;
>  
>  remove_mapping:
>   dma_common_free_remap(addr, pool_size);
> -destroy_genpool:
> - gen_pool_destroy(*pool);
> - *pool = NULL;
>  free_page:
>   if (!dma_release_from_contiguous(NULL, page, nr_pages))
>   __free_pages(page, order);
>  out:
> - pr_err("DMA: failed to allocate %zu KiB %pGg pool for atomic 
> allocation\n",
> - atomic_pool_size >> 10, &gfp);
> - return -ENOMEM;
> + return ret;
> +}
> +
> +static void atomic_pool_resize(struct gen_pool *pool, gfp_t gfp)
> +{
> + if (pool && gen_pool_avail(pool) < atomic_pool_size)
> + atomic_pool_expand(pool, gen_pool_size(pool), gfp);
> +}
> +
> +static void atomic_pool_work_fn(struct work_struct *work)
> +{
> + if (IS_ENABLED(CONFIG_ZONE_DMA))
> + atomic_pool_resize(atomic_pool, GFP_DMA);
> + if (IS_ENABLED(CONFIG_ZONE_DMA32))
> + atomic_pool_resize(atomic_pool_dma32, GFP_DMA32);
> + atomic_pool_resize(atomic_pool_normal, GFP_KERNEL);
> +}
> +
> +static int __init __dma_atomic_pool_init(struct gen_pool **pool,
> +

[rfc 3/6] dma-remap: wire up the atomic pools depending on gfp mask

2020-03-01 Thread David Rientjes via iommu


When allocating non-blockable memory, determine the optimal gfp mask of
the device and use the appropriate atomic pool.

The coherent DMA mask will remain the same between allocation and free
and, thus, memory will be freed to the same atomic pool it was allocated
from.

Signed-off-by: David Rientjes 
---
 include/linux/dma-direct.h |  2 ++
 kernel/dma/direct.c|  6 +++---
 kernel/dma/remap.c | 34 ++
 3 files changed, 31 insertions(+), 11 deletions(-)

diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h
--- a/include/linux/dma-direct.h
+++ b/include/linux/dma-direct.h
@@ -67,6 +67,8 @@ static inline bool dma_capable(struct device *dev, dma_addr_t 
addr, size_t size,
 }
 
 u64 dma_direct_get_required_mask(struct device *dev);
+gfp_t dma_direct_optimal_gfp_mask(struct device *dev, u64 dma_mask,
+ u64 *phys_mask);
 void *dma_direct_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle,
gfp_t gfp, unsigned long attrs);
 void dma_direct_free(struct device *dev, size_t size, void *cpu_addr,
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -44,8 +44,8 @@ u64 dma_direct_get_required_mask(struct device *dev)
return (1ULL << (fls64(max_dma) - 1)) * 2 - 1;
 }
 
-static gfp_t __dma_direct_optimal_gfp_mask(struct device *dev, u64 dma_mask,
-   u64 *phys_limit)
+gfp_t dma_direct_optimal_gfp_mask(struct device *dev, u64 dma_mask,
+ u64 *phys_limit)
 {
u64 dma_limit = min_not_zero(dma_mask, dev->bus_dma_limit);
 
@@ -88,7 +88,7 @@ struct page *__dma_direct_alloc_pages(struct device *dev, 
size_t size,
 
/* we always manually zero the memory once we are done: */
gfp &= ~__GFP_ZERO;
-   gfp |= __dma_direct_optimal_gfp_mask(dev, dev->coherent_dma_mask,
+   gfp |= dma_direct_optimal_gfp_mask(dev, dev->coherent_dma_mask,
&phys_limit);
page = dma_alloc_contiguous(dev, alloc_size, gfp);
if (page && !dma_coherent_ok(dev, page_to_phys(page), size)) {
diff --git a/kernel/dma/remap.c b/kernel/dma/remap.c
--- a/kernel/dma/remap.c
+++ b/kernel/dma/remap.c
@@ -188,28 +188,44 @@ static int __init dma_atomic_pool_init(void)
 }
 postcore_initcall(dma_atomic_pool_init);
 
+static inline struct gen_pool *dev_to_pool(struct device *dev)
+{
+   u64 phys_mask;
+   gfp_t gfp;
+
+   gfp = dma_direct_optimal_gfp_mask(dev, dev->coherent_dma_mask,
+ &phys_mask);
+   if (IS_ENABLED(CONFIG_ZONE_DMA) && gfp == GFP_DMA)
+   return atomic_pool;
+   if (IS_ENABLED(CONFIG_ZONE_DMA32) && gfp == GFP_DMA32)
+   return atomic_pool_dma32;
+   return atomic_pool_normal;
+}
+
 static bool dma_in_atomic_pool(struct device *dev, void *start, size_t size)
 {
-   if (unlikely(!atomic_pool))
-   return false;
+   struct gen_pool *pool = dev_to_pool(dev);
 
-   return gen_pool_has_addr(atomic_pool, (unsigned long)start, size);
+   if (unlikely(!pool))
+   return false;
+   return gen_pool_has_addr(pool, (unsigned long)start, size);
 }
 
 void *dma_alloc_from_pool(struct device *dev, size_t size,
  struct page **ret_page, gfp_t flags)
 {
+   struct gen_pool *pool = dev_to_pool(dev);
unsigned long val;
void *ptr = NULL;
 
-   if (!atomic_pool) {
-   WARN(1, "coherent pool not initialised!\n");
+   if (!pool) {
+   WARN(1, "%pGg atomic pool not initialised!\n", &flags);
return NULL;
}
 
-   val = gen_pool_alloc(atomic_pool, size);
+   val = gen_pool_alloc(pool, size);
if (val) {
-   phys_addr_t phys = gen_pool_virt_to_phys(atomic_pool, val);
+   phys_addr_t phys = gen_pool_virt_to_phys(pool, val);
 
*ret_page = pfn_to_page(__phys_to_pfn(phys));
ptr = (void *)val;
@@ -221,9 +237,11 @@ void *dma_alloc_from_pool(struct device *dev, size_t size,
 
 bool dma_free_from_pool(struct device *dev, void *start, size_t size)
 {
+   struct gen_pool *pool = dev_to_pool(dev);
+
if (!dma_in_atomic_pool(dev, start, size))
return false;
-   gen_pool_free(atomic_pool, (unsigned long)start, size);
+   gen_pool_free(pool, (unsigned long)start, size);
return true;
 }
 #endif /* CONFIG_DMA_DIRECT_REMAP */


[rfc 4/6] dma-remap: dynamically expanding atomic pools

2020-03-01 Thread David Rientjes via iommu
When an atomic pool becomes fully depleted because it is now relied upon
for all non-blocking allocations through the DMA API, allow background
expansion of each pool by a kworker.

When an atomic pool has less than the default size of memory left, kick
off a kworker to dynamically expand the pool in the background.  The pool
is doubled in size.

This allows the default size to be kept quite low when one or more of the
atomic pools is not used.

Also switch over some node ids to the more appropriate NUMA_NO_NODE.

Signed-off-by: David Rientjes 
---
 kernel/dma/remap.c | 79 ++
 1 file changed, 58 insertions(+), 21 deletions(-)

diff --git a/kernel/dma/remap.c b/kernel/dma/remap.c
--- a/kernel/dma/remap.c
+++ b/kernel/dma/remap.c
@@ -10,6 +10,7 @@
 #include 
 #include 
 #include 
+#include 
 
 struct page **dma_common_find_pages(void *cpu_addr)
 {
@@ -104,7 +105,10 @@ static struct gen_pool *atomic_pool_dma32 __ro_after_init;
 static struct gen_pool *atomic_pool_normal __ro_after_init;
 
 #define DEFAULT_DMA_COHERENT_POOL_SIZE  SZ_256K
-static size_t atomic_pool_size __initdata = DEFAULT_DMA_COHERENT_POOL_SIZE;
+static size_t atomic_pool_size = DEFAULT_DMA_COHERENT_POOL_SIZE;
+
+/* Dynamic background expansion when the atomic pool is near capacity */
+struct work_struct atomic_pool_work;
 
 static int __init early_coherent_pool(char *p)
 {
@@ -113,14 +117,14 @@ static int __init early_coherent_pool(char *p)
 }
 early_param("coherent_pool", early_coherent_pool);
 
-static int __init __dma_atomic_pool_init(struct gen_pool **pool,
-size_t pool_size, gfp_t gfp)
+static int atomic_pool_expand(struct gen_pool *pool, size_t pool_size,
+ gfp_t gfp)
 {
-   const unsigned int order = get_order(pool_size);
const unsigned long nr_pages = pool_size >> PAGE_SHIFT;
+   const unsigned int order = get_order(pool_size);
struct page *page;
void *addr;
-   int ret;
+   int ret = -ENOMEM;
 
if (dev_get_cma_area(NULL))
page = dma_alloc_from_contiguous(NULL, nr_pages, order, false);
@@ -131,38 +135,67 @@ static int __init __dma_atomic_pool_init(struct gen_pool **pool,
 
arch_dma_prep_coherent(page, pool_size);
 
-   *pool = gen_pool_create(PAGE_SHIFT, -1);
-   if (!*pool)
-   goto free_page;
-
addr = dma_common_contiguous_remap(page, pool_size,
   pgprot_dmacoherent(PAGE_KERNEL),
   __builtin_return_address(0));
if (!addr)
-   goto destroy_genpool;
+   goto free_page;
 
-   ret = gen_pool_add_virt(*pool, (unsigned long)addr, page_to_phys(page),
-   pool_size, -1);
+   ret = gen_pool_add_virt(pool, (unsigned long)addr, page_to_phys(page),
+   pool_size, NUMA_NO_NODE);
if (ret)
goto remove_mapping;
-   gen_pool_set_algo(*pool, gen_pool_first_fit_order_align, NULL);
 
-   pr_info("DMA: preallocated %zu KiB %pGg pool for atomic allocations\n",
-   pool_size >> 10, &gfp);
return 0;
 
 remove_mapping:
dma_common_free_remap(addr, pool_size);
-destroy_genpool:
-   gen_pool_destroy(*pool);
-   *pool = NULL;
 free_page:
if (!dma_release_from_contiguous(NULL, page, nr_pages))
__free_pages(page, order);
 out:
-   pr_err("DMA: failed to allocate %zu KiB %pGg pool for atomic 
allocation\n",
-   atomic_pool_size >> 10, &gfp);
-   return -ENOMEM;
+   return ret;
+}
+
+static void atomic_pool_resize(struct gen_pool *pool, gfp_t gfp)
+{
+   if (pool && gen_pool_avail(pool) < atomic_pool_size)
+   atomic_pool_expand(pool, gen_pool_size(pool), gfp);
+}
+
+static void atomic_pool_work_fn(struct work_struct *work)
+{
+   if (IS_ENABLED(CONFIG_ZONE_DMA))
+   atomic_pool_resize(atomic_pool, GFP_DMA);
+   if (IS_ENABLED(CONFIG_ZONE_DMA32))
+   atomic_pool_resize(atomic_pool_dma32, GFP_DMA32);
+   atomic_pool_resize(atomic_pool_normal, GFP_KERNEL);
+}
+
+static int __init __dma_atomic_pool_init(struct gen_pool **pool,
+size_t pool_size, gfp_t gfp)
+{
+   int ret;
+
+   *pool = gen_pool_create(PAGE_SHIFT, NUMA_NO_NODE);
+   if (!*pool)
+   return -ENOMEM;
+
+   gen_pool_set_algo(*pool, gen_pool_first_fit_order_align, NULL);
+
+   ret = atomic_pool_expand(*pool, pool_size, gfp);
+   if (ret) {
+   gen_pool_destroy(*pool);
+   *pool = NULL;
+   pr_err("DMA: failed to allocate %zu KiB %pGg pool for atomic 
allocation\n",
+  atomic_pool_size >> 10, &gfp);
+   return ret;
+   }
+
+
+   pr_info("DMA: preallocated %zu KiB %pGg pool for atomic allocations\n",
+ 

[rfc 6/6] dma-remap: double the default DMA coherent pool size

2020-03-01 Thread David Rientjes via iommu
When AMD memory encryption is enabled, some devices may use more than 
256KB/sec from the atomic pools.  Double the default size to make the
original size and expansion more appropriate.

This provides a slight optimization on initial expansion and is deemed
appropriate for all configs with CONFIG_DMA_REMAP enabled because of the
increased reliance on the atomic pools.

Alternatively, this could be done only when CONFIG_AMD_MEM_ENCRYPT is
enabled.

Signed-off-by: David Rientjes 
---
 kernel/dma/remap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/dma/remap.c b/kernel/dma/remap.c
--- a/kernel/dma/remap.c
+++ b/kernel/dma/remap.c
@@ -105,7 +105,7 @@ static struct gen_pool *atomic_pool __ro_after_init;
 static struct gen_pool *atomic_pool_dma32 __ro_after_init;
 static struct gen_pool *atomic_pool_normal __ro_after_init;
 
-#define DEFAULT_DMA_COHERENT_POOL_SIZE  SZ_256K
+#define DEFAULT_DMA_COHERENT_POOL_SIZE  SZ_512K
 static size_t atomic_pool_size = DEFAULT_DMA_COHERENT_POOL_SIZE;
 
 /* Dynamic background expansion when the atomic pool is near capacity */


[rfc 5/6] dma-direct: atomic allocations must come from unencrypted pools

2020-03-01 Thread David Rientjes via iommu
When AMD memory encryption is enabled, all non-blocking DMA allocations
must originate from the atomic pools depending on the device and the gfp
mask of the allocation.

Keep all memory in these pools unencrypted.

Signed-off-by: David Rientjes 
---
 arch/x86/Kconfig| 1 +
 kernel/dma/direct.c | 9 -
 kernel/dma/remap.c  | 2 ++
 3 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1523,6 +1523,7 @@ config X86_CPA_STATISTICS
 config AMD_MEM_ENCRYPT
bool "AMD Secure Memory Encryption (SME) support"
depends on X86_64 && CPU_SUP_AMD
+   select DMA_DIRECT_REMAP
select DYNAMIC_PHYSICAL_MASK
select ARCH_USE_MEMREMAP_PROT
select ARCH_HAS_FORCE_DMA_UNENCRYPTED
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -125,7 +125,6 @@ void *dma_direct_alloc_pages(struct device *dev, size_t size,
void *ret;
 
if (IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) &&
-   dma_alloc_need_uncached(dev, attrs) &&
!gfpflags_allow_blocking(gfp)) {
ret = dma_alloc_from_pool(dev, PAGE_ALIGN(size), &page, gfp);
if (!ret)
@@ -202,6 +201,10 @@ void dma_direct_free_pages(struct device *dev, size_t size, void *cpu_addr,
 {
unsigned int page_order = get_order(size);
 
+   if (IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) &&
+   dma_free_from_pool(dev, cpu_addr, PAGE_ALIGN(size)))
+   return;
+
if ((attrs & DMA_ATTR_NO_KERNEL_MAPPING) &&
!force_dma_unencrypted(dev)) {
/* cpu_addr is a struct page cookie, not a kernel address */
@@ -209,10 +212,6 @@ void dma_direct_free_pages(struct device *dev, size_t size, void *cpu_addr,
return;
}
 
-   if (IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) &&
-   dma_free_from_pool(dev, cpu_addr, PAGE_ALIGN(size)))
-   return;
-
if (force_dma_unencrypted(dev))
set_memory_encrypted((unsigned long)cpu_addr, 1 << page_order);
 
diff --git a/kernel/dma/remap.c b/kernel/dma/remap.c
--- a/kernel/dma/remap.c
+++ b/kernel/dma/remap.c
@@ -8,6 +8,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -141,6 +142,7 @@ static int atomic_pool_expand(struct gen_pool *pool, size_t pool_size,
if (!addr)
goto free_page;
 
+   set_memory_decrypted((unsigned long)page_to_virt(page), nr_pages);
ret = gen_pool_add_virt(pool, (unsigned long)addr, page_to_phys(page),
pool_size, NUMA_NO_NODE);
if (ret)


[rfc 0/6] unencrypted atomic DMA pools with dynamic expansion

2020-03-01 Thread David Rientjes via iommu
set_memory_decrypted() may block, so it is not possible to do non-blocking
allocations through the DMA API for devices that require unencrypted
memory.

The solution is to expand the atomic DMA pools for the various possible
gfp requirements as a means to prevent an unnecessary depletion of lowmem.

These atomic DMA pools are kept unencrypted so they can immediately be
used for non-blocking allocations.  Since the need for this type of memory
depends on the kernel config and devices being used, these pools are also
dynamically expandable.
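
As an illustration of the class of caller this is for (not code from the
series itself): a driver in an SEV guest doing

        dma_addr_t dma;
        void *vaddr = dma_alloc_coherent(dev, PAGE_SIZE, &dma, GFP_ATOMIC);

cannot have set_memory_decrypted() called on its behalf at that point, so
the memory has to come from a pool that was decrypted up front.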

Some of these patches could be squashed into each other, but they were
separated out to ease code review.

This patchset is based on 5.6-rc4.
---
 arch/x86/Kconfig|   1 +
 drivers/iommu/dma-iommu.c   |   5 +-
 include/linux/dma-direct.h  |   2 +
 include/linux/dma-mapping.h |   6 +-
 kernel/dma/direct.c |  17 ++--
 kernel/dma/remap.c  | 173 +---
 6 files changed, 140 insertions(+), 64 deletions(-)


[rfc 2/6] dma-remap: add additional atomic pools to map to gfp mask

2020-03-01 Thread David Rientjes via iommu
The single atomic pool is allocated from the lowest zone possible since
it is guaranteed to be applicable for any DMA allocation.

Devices may allocate through the DMA API but not have a strict reliance
on GFP_DMA memory.  Since the atomic pool will be used for all
non-blockable allocations, returning all memory from ZONE_DMA may
unnecessarily deplete the zone.

Provision for multiple atomic pools that will map to the optimal gfp
mask of the device.  These will be wired up in a subsequent patch.

Signed-off-by: David Rientjes 
---
 kernel/dma/remap.c | 75 +++---
 1 file changed, 45 insertions(+), 30 deletions(-)

diff --git a/kernel/dma/remap.c b/kernel/dma/remap.c
--- a/kernel/dma/remap.c
+++ b/kernel/dma/remap.c
@@ -100,6 +100,8 @@ void dma_common_free_remap(void *cpu_addr, size_t size)
 
 #ifdef CONFIG_DMA_DIRECT_REMAP
 static struct gen_pool *atomic_pool __ro_after_init;
+static struct gen_pool *atomic_pool_dma32 __ro_after_init;
+static struct gen_pool *atomic_pool_normal __ro_after_init;
 
 #define DEFAULT_DMA_COHERENT_POOL_SIZE  SZ_256K
 static size_t atomic_pool_size __initdata = DEFAULT_DMA_COHERENT_POOL_SIZE;
@@ -111,66 +113,79 @@ static int __init early_coherent_pool(char *p)
 }
 early_param("coherent_pool", early_coherent_pool);
 
-static gfp_t dma_atomic_pool_gfp(void)
+static int __init __dma_atomic_pool_init(struct gen_pool **pool,
+size_t pool_size, gfp_t gfp)
 {
-   if (IS_ENABLED(CONFIG_ZONE_DMA))
-   return GFP_DMA;
-   if (IS_ENABLED(CONFIG_ZONE_DMA32))
-   return GFP_DMA32;
-   return GFP_KERNEL;
-}
-
-static int __init dma_atomic_pool_init(void)
-{
-   unsigned int pool_size_order = get_order(atomic_pool_size);
-   unsigned long nr_pages = atomic_pool_size >> PAGE_SHIFT;
+   const unsigned int order = get_order(pool_size);
+   const unsigned long nr_pages = pool_size >> PAGE_SHIFT;
struct page *page;
void *addr;
int ret;
 
if (dev_get_cma_area(NULL))
-   page = dma_alloc_from_contiguous(NULL, nr_pages,
-pool_size_order, false);
+   page = dma_alloc_from_contiguous(NULL, nr_pages, order, false);
else
-   page = alloc_pages(dma_atomic_pool_gfp(), pool_size_order);
+   page = alloc_pages(gfp, order);
if (!page)
goto out;
 
-   arch_dma_prep_coherent(page, atomic_pool_size);
+   arch_dma_prep_coherent(page, pool_size);
 
-   atomic_pool = gen_pool_create(PAGE_SHIFT, -1);
-   if (!atomic_pool)
+   *pool = gen_pool_create(PAGE_SHIFT, -1);
+   if (!*pool)
goto free_page;
 
-   addr = dma_common_contiguous_remap(page, atomic_pool_size,
+   addr = dma_common_contiguous_remap(page, pool_size,
   pgprot_dmacoherent(PAGE_KERNEL),
   __builtin_return_address(0));
if (!addr)
goto destroy_genpool;
 
-   ret = gen_pool_add_virt(atomic_pool, (unsigned long)addr,
-   page_to_phys(page), atomic_pool_size, -1);
+   ret = gen_pool_add_virt(*pool, (unsigned long)addr, page_to_phys(page),
+   pool_size, -1);
if (ret)
goto remove_mapping;
-   gen_pool_set_algo(atomic_pool, gen_pool_first_fit_order_align, NULL);
+   gen_pool_set_algo(*pool, gen_pool_first_fit_order_align, NULL);
 
-   pr_info("DMA: preallocated %zu KiB pool for atomic allocations\n",
-   atomic_pool_size / 1024);
+   pr_info("DMA: preallocated %zu KiB %pGg pool for atomic allocations\n",
+   pool_size >> 10, &gfp);
return 0;
 
 remove_mapping:
-   dma_common_free_remap(addr, atomic_pool_size);
+   dma_common_free_remap(addr, pool_size);
 destroy_genpool:
-   gen_pool_destroy(atomic_pool);
-   atomic_pool = NULL;
+   gen_pool_destroy(*pool);
+   *pool = NULL;
 free_page:
if (!dma_release_from_contiguous(NULL, page, nr_pages))
-   __free_pages(page, pool_size_order);
+   __free_pages(page, order);
 out:
-   pr_err("DMA: failed to allocate %zu KiB pool for atomic coherent 
allocation\n",
-   atomic_pool_size / 1024);
+   pr_err("DMA: failed to allocate %zu KiB %pGg pool for atomic 
allocation\n",
+   atomic_pool_size >> 10, &gfp);
return -ENOMEM;
 }
+
+static int __init dma_atomic_pool_init(void)
+{
+   int ret = 0;
+   int err;
+
+   ret = __dma_atomic_pool_init(&atomic_pool_normal, atomic_pool_size,
+GFP_KERNEL);
+   if (IS_ENABLED(CONFIG_ZONE_DMA)) {
+   err = __dma_atomic_pool_init(&atomic_pool, atomic_pool_size,
+GFP_DMA);
+   if (!ret && err)
+   

[rfc 1/6] dma-mapping: pass device to atomic allocation functions

2020-03-01 Thread David Rientjes via iommu
This augments the dma_{alloc,free}_from_pool() functions with a pointer
to the struct device of the allocation.   This introduces no functional
change and will be used later to determine the optimal gfp mask to
allocate memory from.

dma_in_atomic_pool() is not used outside kernel/dma/remap.c, so remove
its declaration from the header file.

Signed-off-by: David Rientjes 
---
 drivers/iommu/dma-iommu.c   | 5 +++--
 include/linux/dma-mapping.h | 6 +++---
 kernel/dma/direct.c | 4 ++--
 kernel/dma/remap.c  | 9 +
 4 files changed, 13 insertions(+), 11 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -952,7 +952,7 @@ static void __iommu_dma_free(struct device *dev, size_t size, void *cpu_addr)
 
/* Non-coherent atomic allocation? Easy */
if (IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) &&
-   dma_free_from_pool(cpu_addr, alloc_size))
+   dma_free_from_pool(dev, cpu_addr, alloc_size))
return;
 
if (IS_ENABLED(CONFIG_DMA_REMAP) && is_vmalloc_addr(cpu_addr)) {
@@ -1035,7 +1035,8 @@ static void *iommu_dma_alloc(struct device *dev, size_t size,
 
if (IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) &&
!gfpflags_allow_blocking(gfp) && !coherent)
-   cpu_addr = dma_alloc_from_pool(PAGE_ALIGN(size), &page, gfp);
+   cpu_addr = dma_alloc_from_pool(dev, PAGE_ALIGN(size), &page,
+  gfp);
else
cpu_addr = iommu_dma_alloc_pages(dev, size, &page, gfp, attrs);
if (!cpu_addr)
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -630,9 +630,9 @@ void *dma_common_pages_remap(struct page **pages, size_t size,
pgprot_t prot, const void *caller);
 void dma_common_free_remap(void *cpu_addr, size_t size);
 
-bool dma_in_atomic_pool(void *start, size_t size);
-void *dma_alloc_from_pool(size_t size, struct page **ret_page, gfp_t flags);
-bool dma_free_from_pool(void *start, size_t size);
+void *dma_alloc_from_pool(struct device *dev, size_t size,
+ struct page **ret_page, gfp_t flags);
+bool dma_free_from_pool(struct device *dev, void *start, size_t size);
 
 int
 dma_common_get_sgtable(struct device *dev, struct sg_table *sgt, void *cpu_addr,
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -127,7 +127,7 @@ void *dma_direct_alloc_pages(struct device *dev, size_t size,
if (IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) &&
dma_alloc_need_uncached(dev, attrs) &&
!gfpflags_allow_blocking(gfp)) {
-   ret = dma_alloc_from_pool(PAGE_ALIGN(size), &page, gfp);
+   ret = dma_alloc_from_pool(dev, PAGE_ALIGN(size), &page, gfp);
if (!ret)
return NULL;
goto done;
@@ -210,7 +210,7 @@ void dma_direct_free_pages(struct device *dev, size_t size, void *cpu_addr,
}
 
if (IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) &&
-   dma_free_from_pool(cpu_addr, PAGE_ALIGN(size)))
+   dma_free_from_pool(dev, cpu_addr, PAGE_ALIGN(size)))
return;
 
if (force_dma_unencrypted(dev))
diff --git a/kernel/dma/remap.c b/kernel/dma/remap.c
--- a/kernel/dma/remap.c
+++ b/kernel/dma/remap.c
@@ -173,7 +173,7 @@ static int __init dma_atomic_pool_init(void)
 }
 postcore_initcall(dma_atomic_pool_init);
 
-bool dma_in_atomic_pool(void *start, size_t size)
+static bool dma_in_atomic_pool(struct device *dev, void *start, size_t size)
 {
if (unlikely(!atomic_pool))
return false;
@@ -181,7 +181,8 @@ bool dma_in_atomic_pool(void *start, size_t size)
return gen_pool_has_addr(atomic_pool, (unsigned long)start, size);
 }
 
-void *dma_alloc_from_pool(size_t size, struct page **ret_page, gfp_t flags)
+void *dma_alloc_from_pool(struct device *dev, size_t size,
+ struct page **ret_page, gfp_t flags)
 {
unsigned long val;
void *ptr = NULL;
@@ -203,9 +204,9 @@ void *dma_alloc_from_pool(size_t size, struct page **ret_page, gfp_t flags)
return ptr;
 }
 
-bool dma_free_from_pool(void *start, size_t size)
+bool dma_free_from_pool(struct device *dev, void *start, size_t size)
 {
-   if (!dma_in_atomic_pool(start, size))
+   if (!dma_in_atomic_pool(dev, start, size))
return false;
gen_pool_free(atomic_pool, (unsigned long)start, size);
return true;


Re: [rfc] dma-mapping: preallocate unencrypted DMA atomic pool

2020-02-28 Thread David Rientjes via iommu
On Fri, 17 Jan 2020, Tom Lendacky wrote:

> On 12/31/19 7:54 PM, David Rientjes wrote:
> > Christoph, Thomas, is something like this (without the diagnostic 
> > information included in this patch) acceptable for these allocations?  
> > Adding expansion support when the pool is half depleted wouldn't be *that* 
> > hard.
> 
> Sorry for the delay in responding...  Overall, I think this will work
> well. If you want the default size to be adjustable, you can always go
> with a Kconfig setting or a command line parameter or both (I realize the
> command line parameter is not optimal).
> 

Ok, so we've determined that we have a dependency not only on GFP_DMA 
memory through the DMA API in a non-blocking context that needs to be 
unencrypted, but also on GFP_KERNEL.  We don't have a dependency on GFP_DMA32 
memory (yet) but should likely provision for it.

So my patch would change to provide three separate pools: one 
for ZONE_DMA, one for ZONE_DMA32, and one for ZONE_NORMAL.  The ZONE_DMA 
pool already exists in the form of the atomic_pool, so it would add two 
additional pools that likely start out at the same size and dynamically 
expand with a kworker when its usage approaches the limits of the 
pool.  I don't necessarily like needing three separate pools, but I can't 
think of a better way to provide unencrypted memory for non-blocking 
allocations that work for all possible device gfp masks.

My idea right now is to create all three pools instead of the single 
atomic_pool, all 256KB in size, and anytime their usage reaches half their 
limit, we kick off some background work to double the size of the pool 
with GFP_KERNEL | __GFP_NORETRY.  Our experimentation suggests that a 
kworker would actually respond in time for this.

Any objections to this approach?  If so, an alternative suggestion would 
be appreciated :)  I plan for all atomic pools to be unencrypted at the 
time the allocation is successful unless there is some strange need for 
non-blocking atomic allocations through the DMA API that should *not* be 
encrypted.
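
A rough sketch of the background expansion being described, purely
illustrative and not a posted patch (the pool names and the
atomic_pool_expand() helper follow the [rfc] series above; gen_pool_avail()
and gen_pool_size() are the existing genalloc helpers):

static void atomic_pool_maybe_expand(struct gen_pool *pool, gfp_t gfp)
{
        /* Double the pool once less than half of it remains available. */
        if (pool && gen_pool_avail(pool) < gen_pool_size(pool) / 2)
                atomic_pool_expand(pool, gen_pool_size(pool),
                                   gfp | __GFP_NORETRY);
}

A kworker would call this for each of the three pools so the
GFP_KERNEL/GFP_DMA32/GFP_DMA | __GFP_NORETRY allocation never happens in
the caller's non-blocking context.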

> Just a couple of overall comments about the use of variable names and
> messages using both unencrypted and encrypted, I think the use should be
> consistent throughout the patch.
> 
> Thanks,
> Tom
> 
> > 
> > Or are there alternatives we should consider?  Thanks!
> > 
> > 
> > 
> > 
> > When AMD SEV is enabled in the guest, all allocations through 
> > dma_pool_alloc_page() must call set_memory_decrypted() for unencrypted 
> > DMA.  This includes dma_pool_alloc() and dma_direct_alloc_pages().  These 
> > calls may block which is not allowed in atomic allocation contexts such as 
> > from the NVMe driver.
> > 
> > Preallocate a complementary unencrypted DMA atomic pool that is initially 
> > 4MB in size.  This patch does not contain dynamic expansion, but that 
> > could be added if necessary.
> > 
> > In our stress testing, our peak unencrypted DMA atomic allocation 
> > requirement is ~1.4MB, so 4MB is plenty.  This pool is similar to the 
> > existing DMA atomic pool but is unencrypted.
> > 
> > Signed-off-by: David Rientjes 
> > ---
> >  Based on v5.4 HEAD.
> > 
> >  This commit contains diagnostic information and is not intended for use 
> >  in a production environment.
> > 
> >  arch/x86/Kconfig|   1 +
> >  drivers/iommu/dma-iommu.c   |   5 +-
> >  include/linux/dma-mapping.h |   7 ++-
> >  kernel/dma/direct.c |  16 -
> >  kernel/dma/remap.c  | 116 ++--
> >  5 files changed, 108 insertions(+), 37 deletions(-)
> > 
> > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> > --- a/arch/x86/Kconfig
> > +++ b/arch/x86/Kconfig
> > @@ -1530,6 +1530,7 @@ config X86_CPA_STATISTICS
> >  config AMD_MEM_ENCRYPT
> > bool "AMD Secure Memory Encryption (SME) support"
> > depends on X86_64 && CPU_SUP_AMD
> > +   select DMA_DIRECT_REMAP
> > select DYNAMIC_PHYSICAL_MASK
> > select ARCH_USE_MEMREMAP_PROT
> > select ARCH_HAS_FORCE_DMA_UNENCRYPTED
> > diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> > --- a/drivers/iommu/dma-iommu.c
> > +++ b/drivers/iommu/dma-iommu.c
> > @@ -928,7 +928,7 @@ static void __iommu_dma_free(struct device *dev, size_t size, void *cpu_addr)
> >  
> > /* Non-coherent atomic allocation? Easy */
> > if (IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) &&
> > -   dma_free_from_pool(cpu_addr, alloc_size))
> > +   dma_free_from_pool(dev, cpu_addr, alloc_size))
> > return;
> >  
> > if (IS_ENABLED(CONFIG_DMA_REMAP) && is_vmalloc_addr(cpu_addr)) {
> > @@ -1011,7 +1011,8 @@ static void *iommu_dma_alloc(struct device *dev, size_t size,
> >  
> > if (IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) &&
> > !gfpflags_allow_blocking(gfp) && !coherent)
> > -   cpu_addr = dma_alloc_from_pool(PAGE_ALIGN(size), &page, gfp);
> > +   cpu_addr = dma_alloc_from_pool(dev, PAGE_ALIGN(size), &page,
> > +   

Re: [rfc] dma-mapping: preallocate unencrypted DMA atomic pool

2020-01-09 Thread David Rientjes via iommu
On Thu, 9 Jan 2020, Christoph Hellwig wrote:

> > I'll rely on Thomas to chime in if this doesn't make sense for the SEV 
> > usecase.
> > 
> > I think the sizing of the single atomic pool needs to be determined.  Our 
> > peak usage that we have measured from NVMe is ~1.4MB and atomic_pool is 
> > currently sized to 256KB by default.  I'm unsure at this time if we need 
> > to be able to dynamically expand this pool with a kworker.
> >
> > Maybe when CONFIG_AMD_MEM_ENCRYPT is enabled this atomic pool should be 
> > sized to 2MB or so and then when it reaches half capacity we schedule some 
> > background work to dynamically increase it?  That wouldn't be hard unless 
> > the pool can be rapidly depleted.
> > 
> 
> Note that a non-coherent architecture with the same workload would need
> the same size.
> 
> > Do we want to increase the atomic pool size by default and then do 
> > background dynamic expansion?
> 
> For now I'd just scale with system memory size.
> 

Thanks Christoph and Robin for the help; we're running some additional 
stress tests to double-check that our required amount of memory from this 
pool is accurate.  Once that's done, I'll refresh the patch with the 
suggestions and propose it formally.


Re: [rfc] dma-mapping: preallocate unencrypted DMA atomic pool

2020-01-07 Thread David Rientjes via iommu
On Tue, 7 Jan 2020, Christoph Hellwig wrote:

> > On 01/01/2020 1:54 am, David Rientjes via iommu wrote:
> >> Christoph, Thomas, is something like this (without the diagnostic
> >> information included in this patch) acceptable for these allocations?
> >> Adding expansion support when the pool is half depleted wouldn't be *that*
> >> hard.
> >>
> >> Or are there alternatives we should consider?  Thanks!
> >
> > Are there any platforms which require both non-cacheable remapping *and* 
> > unencrypted remapping for distinct subsets of devices?
> >
> > If not (and I'm assuming there aren't, because otherwise this patch is 
> > incomplete in covering only 2 of the 3 possible combinations), then 
> > couldn't we keep things simpler by just attributing both properties to the 
> > single "atomic pool" on the basis that one or the other will always be a 
> > no-op? In other words, basically just tweaking the existing "!coherent" 
> > tests to "!coherent || force_dma_unencrypted()" and doing 
> > set_dma_unencrypted() unconditionally in atomic_pool_init().
> 
> I think that would make most sense.
> 

I'll rely on Thomas to chime in if this doesn't make sense for the SEV 
usecase.

I think the sizing of the single atomic pool needs to be determined.  Our 
peak usage that we have measured from NVMe is ~1.4MB and atomic_pool is 
currently sized to 256KB by default.  I'm unsure at this time if we need 
to be able to dynamically expand this pool with a kworker.

Maybe when CONFIG_AMD_MEM_ENCRYPT is enabled this atomic pool should be 
sized to 2MB or so and then when it reaches half capacity we schedule some 
background work to dynamically increase it?  That wouldn't be hard unless 
the pool can be rapidly depleted.

Do we want to increase the atomic pool size by default and then do 
background dynamic expansion?
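
For illustration only (not a posted patch), the sizing idea could be as
simple as the following, where mem_encrypt_active() and the SZ_* macros
are existing kernel helpers and the 2MB figure is the guess from above:

static unsigned long atomic_pool_default_size(void)
{
        /* DEFAULT_DMA_COHERENT_POOL_SIZE is 256KB today. */
        if (IS_ENABLED(CONFIG_AMD_MEM_ENCRYPT) && mem_encrypt_active())
                return SZ_2M;
        return SZ_256K;
}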


[rfc] dma-mapping: preallocate unencrypted DMA atomic pool

2019-12-31 Thread David Rientjes via iommu
Christoph, Thomas, is something like this (without the diagnostic 
information included in this patch) acceptable for these allocations?  
Adding expansion support when the pool is half depleted wouldn't be *that* 
hard.

Or are there alternatives we should consider?  Thanks!




When AMD SEV is enabled in the guest, all allocations through 
dma_pool_alloc_page() must call set_memory_decrypted() for unencrypted 
DMA.  This includes dma_pool_alloc() and dma_direct_alloc_pages().  These 
calls may block which is not allowed in atomic allocation contexts such as 
from the NVMe driver.

Preallocate a complementary unencrypted DMA atomic pool that is initially 
4MB in size.  This patch does not contain dynamic expansion, but that 
could be added if necessary.

In our stress testing, our peak unencrypted DMA atomic allocation 
requirement is ~1.4MB, so 4MB is plenty.  This pool is similar to the 
existing DMA atomic pool but is unencrypted.

Signed-off-by: David Rientjes 
---
 Based on v5.4 HEAD.

 This commit contains diagnostic information and is not intended for use 
 in a production environment.

 arch/x86/Kconfig|   1 +
 drivers/iommu/dma-iommu.c   |   5 +-
 include/linux/dma-mapping.h |   7 ++-
 kernel/dma/direct.c |  16 -
 kernel/dma/remap.c  | 116 ++--
 5 files changed, 108 insertions(+), 37 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1530,6 +1530,7 @@ config X86_CPA_STATISTICS
 config AMD_MEM_ENCRYPT
bool "AMD Secure Memory Encryption (SME) support"
depends on X86_64 && CPU_SUP_AMD
+   select DMA_DIRECT_REMAP
select DYNAMIC_PHYSICAL_MASK
select ARCH_USE_MEMREMAP_PROT
select ARCH_HAS_FORCE_DMA_UNENCRYPTED
diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -928,7 +928,7 @@ static void __iommu_dma_free(struct device *dev, size_t size, void *cpu_addr)
 
/* Non-coherent atomic allocation? Easy */
if (IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) &&
-   dma_free_from_pool(cpu_addr, alloc_size))
+   dma_free_from_pool(dev, cpu_addr, alloc_size))
return;
 
if (IS_ENABLED(CONFIG_DMA_REMAP) && is_vmalloc_addr(cpu_addr)) {
@@ -1011,7 +1011,8 @@ static void *iommu_dma_alloc(struct device *dev, size_t size,
 
if (IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) &&
!gfpflags_allow_blocking(gfp) && !coherent)
-   cpu_addr = dma_alloc_from_pool(PAGE_ALIGN(size), &page, gfp);
+   cpu_addr = dma_alloc_from_pool(dev, PAGE_ALIGN(size), &page,
+  gfp);
else
cpu_addr = iommu_dma_alloc_pages(dev, size, &page, gfp, attrs);
if (!cpu_addr)
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -629,9 +629,10 @@ void *dma_common_pages_remap(struct page **pages, size_t size,
pgprot_t prot, const void *caller);
 void dma_common_free_remap(void *cpu_addr, size_t size);
 
-bool dma_in_atomic_pool(void *start, size_t size);
-void *dma_alloc_from_pool(size_t size, struct page **ret_page, gfp_t flags);
-bool dma_free_from_pool(void *start, size_t size);
+bool dma_in_atomic_pool(struct device *dev, void *start, size_t size);
+void *dma_alloc_from_pool(struct device *dev, size_t size,
+ struct page **ret_page, gfp_t flags);
+bool dma_free_from_pool(struct device *dev, void *start, size_t size);
 
 int
 dma_common_get_sgtable(struct device *dev, struct sg_table *sgt, void *cpu_addr,
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -10,6 +10,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -131,6 +132,13 @@ void *dma_direct_alloc_pages(struct device *dev, size_t size,
struct page *page;
void *ret;
 
+   if (!gfpflags_allow_blocking(gfp) && force_dma_unencrypted(dev)) {
+   ret = dma_alloc_from_pool(dev, size, &page, gfp);
+   if (!ret)
+   return NULL;
+   goto done;
+   }
+
page = __dma_direct_alloc_pages(dev, size, dma_handle, gfp, attrs);
if (!page)
return NULL;
@@ -156,7 +164,7 @@ void *dma_direct_alloc_pages(struct device *dev, size_t size,
__dma_direct_free_pages(dev, size, page);
return NULL;
}
-
+done:
ret = page_address(page);
if (force_dma_unencrypted(dev)) {
set_memory_decrypted((unsigned long)ret, 1 << get_order(size));
@@ -185,6 +193,12 @@ void dma_direct_free_pages(struct device *dev, size_t size, void *cpu_addr,
 {
unsigned int page_order = get_order(size);
 
+   if (force_dma_unencrypted(dev) &&

Re: [bug] __blk_mq_run_hw_queue suspicious rcu usage

2019-12-14 Thread David Rientjes via iommu
On Thu, 12 Dec 2019, David Rientjes wrote:

> Since all DMA must be unencrypted in this case, what happens if all 
> dma_direct_alloc_pages() calls go through the DMA pool in 
> kernel/dma/remap.c when force_dma_unencrypted(dev) == true since 
> __PAGE_ENC is cleared for these ptes?  (Ignoring for a moment that this 
> special pool should likely be a separate dma pool.)
> 
> I assume a general depletion of that atomic pool so 
> DEFAULT_DMA_COHERENT_POOL_SIZE becomes insufficient.  I'm not sure what 
> size any DMA pool wired up for this specific purpose would need to be 
> sized at, so I assume dynamic resizing is required.
> 
> It shouldn't be *that* difficult to supplement kernel/dma/remap.c with the 
> ability to do background expansion of the atomic pool when nearing its 
> capacity for this purpose?  I imagine that if we just can't allocate pages 
> within the DMA mask that it's the only blocker to dynamic expansion and we 
> don't oom kill for lowmem.  But perhaps vm.lowmem_reserve_ratio is good 
> enough protection?
> 
> Beyond that, I'm not sure what sizing would be appropriate if this is to 
> be a generic solution in the DMA API for all devices that may require 
> unencrypted memory.
> 

Optimizations involving lowmem reserve ratio aside, is it ok that 
CONFIG_AMD_MEM_ENCRYPT develops a dependency on DMA_DIRECT_REMAP because 
set_memory_decrypted() must be allowed to block?

If so, we could allocate from the atomic pool when we can't block and the 
device requires unencrypted DMA from dma_direct_alloc_pages().  I assume 
we need this to be its own atomic pool specifically for 
force_dma_unencrypted() devices and to check addr_in_gen_pool() for this 
new unencrypted pool in dma_direct_free_pages().
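
A minimal sketch of that free-path check, illustrative only (the
"atomic_pool_unencrypted" name is hypothetical; addr_in_gen_pool() and
gen_pool_free() are the existing genalloc helpers):

static bool dma_free_from_unencrypted_pool(struct device *dev,
                                           void *cpu_addr, size_t size)
{
        if (!force_dma_unencrypted(dev) || !atomic_pool_unencrypted ||
            !addr_in_gen_pool(atomic_pool_unencrypted,
                              (unsigned long)cpu_addr, size))
                return false;
        /* Pool memory stays decrypted, so no set_memory_encrypted() here. */
        gen_pool_free(atomic_pool_unencrypted, (unsigned long)cpu_addr, size);
        return true;
}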

I have no idea how large this unencrypted atomic pool should be.  We 
could determine a nice default and growth size for nvme itself, but as 
Christoph mentioned many drivers require non-blockable allocations that 
can be run inside a SEV encrypted guest.

A trivial implementation would be to just double the size of the unencrypted 
pool when it reaches half capacity.  Perhaps done with GFP_KERNEL | 
__GFP_DMA allocations in a workqueue.  We can reclaim from ZONE_DMA or 
ZONE_DMA32 in this context but when that fails I'm not sure if it's 
satisfactory to just fail the dma_pool_alloc() when the unencrypted pool 
runs out.

Heuristics can be tweaked, of course, but I want to make sure I'm not 
missing anything obvious with this approach before implementing it.  
Please let me know, thanks.


Re: [bug] __blk_mq_run_hw_queue suspicious rcu usage

2019-12-13 Thread David Rientjes via iommu
On Thu, 12 Dec 2019, David Rientjes wrote:

> Since all DMA must be unencrypted in this case, what happens if all 
> dma_direct_alloc_pages() calls go through the DMA pool in 
> kernel/dma/remap.c when force_dma_unencrypted(dev) == true since 
> __PAGE_ENC is cleared for these ptes?  (Ignoring for a moment that this 
> special pool should likely be a separate dma pool.)
> 
> I assume a general depletion of that atomic pool so 
> DEFAULT_DMA_COHERENT_POOL_SIZE becomes insufficient.  I'm not sure what 
> size any DMA pool wired up for this specific purpose would need to be 
> sized at, so I assume dynamic resizing is required.
> 
> It shouldn't be *that* difficult to supplement kernel/dma/remap.c with the 
> ability to do background expansion of the atomic pool when nearing its 
> capacity for this purpose?  I imagine that if we just can't allocate pages 
> within the DMA mask that it's the only blocker to dynamic expansion and we 
> don't oom kill for lowmem.  But perhaps vm.lowmem_reserve_ratio is good 
> enough protection?
> 
> Beyond that, I'm not sure what sizing would be appropriate if this is to 
> be a generic solution in the DMA API for all devices that may require 
> unencrypted memory.
> 

Secondly, I'm wondering about how the DMA pool for atomic allocations 
compares with lowmem reserve for both ZONE_DMA and ZONE_DMA32.  For 
allocations where the classzone index is one of these zones, the lowmem 
reserve is static; we don't account for the amount of lowmem allocated or 
adjust it for future watermark checks in the page allocator.  We always 
guarantee that the reserve is free (absent the depletion of the zone due to 
GFP_ATOMIC allocations where we fall below the min watermarks).

If all DMA memory needs to have _PAGE_ENC cleared when the guest is SEV 
encrypted, I'm wondering if the entire lowmem reserve could be designed as 
a pool of lowmem pages rather than a watermark check.  If implemented as a 
pool of pages in the page allocator itself, and today's reserve is static, 
maybe we could get away with a dynamic resizing based on that static 
amount?  We could offload the handling of this reserve to kswapd such that 
when the pool falls below today's reserve amount, we dynamically expand, 
do the necessary unencryption in blockable context, and add to the pool.  
Bonus is that this provides high-order lowmem reserve if implemented as 
per-order freelists rather than the current watermark check that provides 
no guarantees for any high-order lowmem.
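
Purely as an illustration of the per-order freelist idea (nothing like
this was posted), the reserve could be described by something like:

struct unencrypted_reserve {
        spinlock_t              lock;
        struct list_head        freelist[MAX_ORDER];    /* per-order pages */
        unsigned long           nr_pages;       /* currently in the reserve */
        unsigned long           low_watermark;  /* refill trigger for kswapd */
};

with kswapd refilling it in blockable context, where set_memory_decrypted()
is allowed to sleep.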

I don't want to distract from the first set of questions in my previous 
email because I need an understanding of that anyway, but I'm hoping 
Christoph can guide me on why the above wouldn't be an improvement even 
for non-encrypted guests.


Re: [bug] __blk_mq_run_hw_queue suspicious rcu usage

2019-12-12 Thread David Rientjes via iommu
On Thu, 28 Nov 2019, Christoph Hellwig wrote:

> > So we're left with making dma_pool_alloc(GFP_ATOMIC) actually be atomic 
> > even when the DMA needs to be unencrypted for SEV.  Christoph's suggestion 
> > was to wire up dmapool in kernel/dma/remap.c for this.  Is that necessary 
> > to be done for all devices that need to do dma_pool_alloc(GFP_ATOMIC) or 
> > can we do it within the DMA API itself so it's transparent to the driver?
> 
> It needs to be transparent to the driver.  Lots of drivers use GFP_ATOMIC
> dma allocations, and all of them are broken on SEV setups currently.
> 

Not my area, so bear with me.

Since all DMA must be unencrypted in this case, what happens if all 
dma_direct_alloc_pages() calls go through the DMA pool in 
kernel/dma/remap.c when force_dma_unencrypted(dev) == true since 
__PAGE_ENC is cleared for these ptes?  (Ignoring for a moment that this 
special pool should likely be a separate dma pool.)
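
For illustration, with the 5.4-era dma_alloc_from_pool() signature, the
routing being asked about would amount to a helper like this used at the
top of dma_direct_alloc_pages() (hypothetical, not a posted patch):

static void *dma_direct_alloc_from_unencrypted_pool(struct device *dev,
                size_t size, dma_addr_t *dma_handle, gfp_t gfp)
{
        struct page *page;
        void *ret;

        ret = dma_alloc_from_pool(PAGE_ALIGN(size), &page, gfp);
        if (!ret)
                return NULL;
        *dma_handle = phys_to_dma(dev, page_to_phys(page));
        return ret;
}

called whenever force_dma_unencrypted(dev) is true.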

I assume a general depletion of that atomic pool so 
DEFAULT_DMA_COHERENT_POOL_SIZE becomes insufficient.  I'm not sure what 
size any DMA pool wired up for this specific purpose would need to be 
sized at, so I assume dynamic resizing is required.

It shouldn't be *that* difficult to supplement kernel/dma/remap.c with the 
ability to do background expansion of the atomic pool when nearing its 
capacity for this purpose?  I imagine that if we just can't allocate pages 
within the DMA mask that it's the only blocker to dynamic expansion and we 
don't oom kill for lowmem.  But perhaps vm.lowmem_reserve_ratio is good 
enough protection?

Beyond that, I'm not sure what sizing would be appropriate if this is to 
be a generic solution in the DMA API for all devices that may require 
unencrypted memory.


Re: [bug] __blk_mq_run_hw_queue suspicious rcu usage

2019-11-27 Thread David Rientjes via iommu
On Wed, 18 Sep 2019, Christoph Hellwig wrote:

> On Tue, Sep 17, 2019 at 06:41:02PM +, Lendacky, Thomas wrote:
> > > diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> > > --- a/drivers/nvme/host/pci.c
> > > +++ b/drivers/nvme/host/pci.c
> > > @@ -1613,7 +1613,8 @@ static int nvme_alloc_admin_tags(struct nvme_dev *dev)
> > >   dev->admin_tagset.timeout = ADMIN_TIMEOUT;
> > >   dev->admin_tagset.numa_node = dev_to_node(dev->dev);
> > >   dev->admin_tagset.cmd_size = sizeof(struct nvme_iod);
> > > - dev->admin_tagset.flags = BLK_MQ_F_NO_SCHED;
> > > + dev->admin_tagset.flags = BLK_MQ_F_NO_SCHED |
> > > +   BLK_MQ_F_BLOCKING;
> > 
> > I think you want to only set the BLK_MQ_F_BLOCKING if the DMA is required
> > to be unencrypted. Unfortunately, force_dma_unencrypted() can't be called
> > from a module. Is there a DMA API that could be called to get that info?
> 
> The DMA API must support non-blocking calls, and various drivers rely
> on that.  So we need to provide that even for the SEV case.  If the
> actual blocking can't be made to work we'll need to wire up the DMA
> pool in kernel/dma/remap.c for it (and probably move it to separate
> file).
> 

Resurrecting this thread from a couple months ago because it appears that 
this is still an issue with 5.4 guests.

dma_pool_alloc(), regardless of whether mem_flags allows blocking or not, 
can always sleep if the device's DMA must be unencrypted and 
mem_encrypt_active() == true.  We know this because vm_unmap_aliases() can 
always block.

NVMe's setup of PRPs and SGLs uses dma_pool_alloc(GFP_ATOMIC) but when 
this is a SEV-enabled guest this allocation may block due to the 
possibility of allocating DMA coherent memory through dma_direct_alloc().
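
For reference, the problematic call chain being described, simplified and
reconstructed from this thread (not a verbatim trace):

/*
 *   nvme_setup_prps()
 *     dma_pool_alloc(dev->prp_page_pool, GFP_ATOMIC, &prp_dma)
 *       pool_alloc_page() -> dma_alloc_coherent()
 *         dma_direct_alloc()
 *           set_memory_decrypted()      <- required when SEV is active
 *             vm_unmap_aliases()        <- may sleep, hence the atomic-context splat
 */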

It seems like one solution would be to add significant latency by doing 
BLK_MQ_F_BLOCKING if force_dma_unencrypted() is true for the device but 
this comes with significant downsides.

So we're left with making dma_pool_alloc(GFP_ATOMIC) actually be atomic 
even when the DMA needs to be unencrypted for SEV.  Christoph's suggestion 
was to wire up dmapool in kernel/dma/remap.c for this.  Is that necessary 
to be done for all devices that need to do dma_pool_alloc(GFP_ATOMIC) or 
can we do it within the DMA API itself so it's transparent to the driver?

Thomas/Brijesh: separately, it seems the use of set_memory_encrypted() or 
set_memory_decrypted() must be possible without blocking; is this only an 
issue from the DMA API point of view or can it be done elsewhere?


Re: [bug] __blk_mq_run_hw_queue suspicious rcu usage

2019-09-05 Thread David Rientjes via iommu
On Thu, 5 Sep 2019, Christoph Hellwig wrote:

> > Hi Christoph, Jens, and Ming,
> > 
> > While booting a 5.2 SEV-enabled guest we have encountered the following 
> > WARNING that is followed up by a BUG because we are in atomic context 
> > while trying to call set_memory_decrypted:
> 
> Well, this really is a x86 / DMA API issue unfortunately.  Drivers
> are allowed to do GFP_ATOMIC dma allocation under locks / rcu critical
> sections and from interrupts.  And it seems like the SEV case can't
> handle that.  We have some semi-generic code to have a fixed sized
> pool in kernel/dma for non-coherent platforms that have similar issues
> that we could try to wire up, but I wonder if there is a better way
> to handle the issue, so I've added Tom and the x86 maintainers.
> 
> Now independent of that issue using DMA coherent memory for the nvme
> PRPs/SGLs doesn't actually feel very optional.  We could do with
> normal kmalloc allocations and just sync it to the device and back.
> I wonder if we should create some general mempool-like helpers for that.
> 

Thanks for looking into this.  I assume it's a non-starter to try to 
address this in _vm_unmap_aliases() itself, i.e. rely on a purge spinlock 
to do all synchronization (or trylock if not forced) for 
purge_vmap_area_lazy() rather than only the vmap_area_lock within it.  In 
other words, no mutex.

If that's the case, and set_memory_encrypted() can't be fixed to not need 
to sleep by changing _vm_unmap_aliases() locking, then I assume dmapool is 
our only alternative?  I have no idea how large this should be.


Re: [GIT PULL] dma mapping fixes for 4.17-rc3

2018-04-25 Thread David Rientjes via iommu
On Tue, 24 Apr 2018, Christoph Hellwig wrote:

> On Tue, Apr 24, 2018 at 11:54:26PM -0700, David Rientjes wrote:
> > Shouldn't that test for dev->coherent_dma_mask < DMA_BIT_MASK(32) be more 
> > accurately <=? 
> 
> No, it should really be <.  The exactly 32-bit case is already covered
> with GFP_DMA32.  Eventualy it should be < 24-bit with a separate
> GFP_DMA32 case for > 32-bit && < 64-bit.  Takashi has been working on
> that, but it is 4.18 material.
> 

Ack, thanks.
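
For context, a simplified recollection of the 4.17-era dma_direct_alloc()
logic being discussed (not a verbatim quote of the tree): the initial gfp
selection uses <= so an exactly-32-bit coherent mask already gets
GFP_DMA32, and only the retry after a dma_coherent_ok() failure needs the
strict < check for GFP_DMA:

        if (dev->coherent_dma_mask <= DMA_BIT_MASK(32) && !(gfp & GFP_DMA))
                gfp |= GFP_DMA32;
        ...
        /* on a dma_coherent_ok() failure: */
        if (IS_ENABLED(CONFIG_ZONE_DMA) &&
            dev->coherent_dma_mask < DMA_BIT_MASK(32) &&
            !(gfp & GFP_DMA)) {
                gfp = (gfp & ~GFP_DMA32) | GFP_DMA;
                goto again;
        }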


Re: [GIT PULL] dma mapping fixes for 4.17-rc3

2018-04-25 Thread David Rientjes via iommu
On Wed, 25 Apr 2018, Christoph Hellwig wrote:

> The following changes since commit 6d08b06e67cd117f6992c46611dfb4ce267cd71e:
> 
>   Linux 4.17-rc2 (2018-04-22 19:20:09 -0700)
> 
> are available in the Git repository at:
> 
>   git://git.infradead.org/users/hch/dma-mapping.git tags/dma-mapping-4.17-3
> 
> for you to fetch changes up to 60695be2bb6b0623f8e53bd9949d582a83c6d44a:
> 
>   dma-mapping: postpone cpu addr translation on mmap (2018-04-23 14:44:24 
> +0200)
> 
> 
> A few small dma-mapping fixes for Linux 4.17-rc3:
> 
>  - don't loop to try GFP_DMA allocations if ZONE_DMA is not actually
>enabled (regression in 4.16)

Shouldn't that test for dev->coherent_dma_mask < DMA_BIT_MASK(32) be more 
accurately <=? 