Re: [RFC v2] /dev/iommu uAPI proposal

2021-08-09 Thread David Gibson
On Mon, Aug 09, 2021 at 08:34:06AM +, Tian, Kevin wrote:
> > From: David Gibson 
> > Sent: Friday, August 6, 2021 12:45 PM
> > > > > In concept I feel the purpose of DMA endpoint is equivalent to the
> > routing
> > > > > info in this proposal.
> > > >
> > > > Maybe?  I'm afraid I never quite managed to understand the role of the
> > > > routing info in your proposal.
> > > >
> > >
> > > the IOMMU routes incoming DMA packets to a specific I/O page table,
> > > according to RID or RID+PASID carried in the packet. RID or RID+PASID
> > > is the routing information (represented by device cookie +PASID in
> > > proposed uAPI) and what the iommu driver really cares when activating
> > > the I/O page table in the iommu.
> > 
> > Ok, so yes, endpoint is roughly equivalent to that.  But my point is
> > that the IOMMU layer really only cares about that (device+routing)
> > combination, not other aspects of what the device is.  So that's the
> > concept we should give a name and put front and center in the API.
> > 
> 
> This is how this proposal works, centered around the routing info. The 
> uAPI doesn't care what the device is. It just requires the user to specify 
> the user view of the routing info (device fd + optional PASID) to tag an IOAS. 

Which works as long as (just device fd) and (device fd + PASID) cover
all the options.  If we have new possibilities we need new interfaces.
And that can't even handle the case of one endpoint for multiple
devices (e.g. an ACS-incapable bridge).

> the user view is then converted to the kernel view of routing (RID or 
> RID+PASID) by the vfio driver and then passed to the iommu fd in the attach 
> operation. And a GET_INFO interface is provided for the user to check 
> whether a device supports multiple IOASes and whether the PASID space is 
> delegated to the user. We just need a better name if pasid is considered 
> too pci specific...
> 
> But creating an endpoint per ioasid and making it central in the uAPI is not 
> what the IOMMU layer cares about.

It's not an endpoint per ioasid.  You can have multiple endpoints per
ioasid, just not the other way around.  As it is, multiple IOASes per
device means *some* sort of disambiguation (generally by PASID), which
is hard to describe generally.  Having endpoints as a first-class
concept makes that simpler.
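
To make the two views concrete, the attach operation being discussed would
look roughly like this (purely illustrative C with invented names; this is
*not* the actual proposed /dev/iommu uAPI):

#include <linux/types.h>

/* Hypothetical sketch only -- field names and layout are invented here. */
struct ioas_attach_data {
    __u32   argsz;
    __u32   flags;
#define IOAS_ATTACH_PASID   (1U << 0)   /* the pasid field is valid */
    __s32   device_fd;  /* user view of routing: this device ... */
    __u32   ioas_id;    /* ... attaches to this I/O address space */
    __u32   pasid;      /* optional finer-grained routing */
};

/*
 * The VFIO driver would translate (device_fd [+ pasid]) into the kernel
 * view of routing (RID or RID+PASID) before handing it to the IOMMU
 * layer, which only ever cares about that routing information.
 */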

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson



[Patch V3 2/2] iommu/arm-smmu: Fix race condition during iommu_group creation

2021-08-09 Thread Ashish Mhetre via iommu
From: Krishna Reddy 

When two devices with the same SID are probed concurrently through
iommu_probe_device(), the iommu_group is sometimes allocated more than
once because the call to arm_smmu_device_group() is not protected
against concurrency. This leads to each device holding a different
iommu_group and domain pointer and a separate IOVA space, while only one
of the devices' domains is used for translations by the IOMMU. Accesses
from the other device then fault or see incorrect translations.
Fix this by protecting the iommu_group allocation against concurrency in
arm_smmu_device_group().
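
To make the race concrete, the problematic interleaving looks roughly like
the following (simplified sketch, not lifted verbatim from the driver):

/*
 * CPU0: probe device A (SID x)          CPU1: probe device B (same SID x)
 * arm_smmu_device_group()               arm_smmu_device_group()
 *   sees s2crs[idx].group == NULL         sees s2crs[idx].group == NULL
 *   group = pci_device_group(A)           group = pci_device_group(B)
 *   s2crs[idx].group = group_A            s2crs[idx].group = group_B
 *
 * Both devices now hold different iommu_groups (and therefore different
 * default domains) even though they share a stream ID.  Holding
 * smmu->stream_map_mutex across the lookup and the assignment, as done
 * below, closes the window.
 */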

Signed-off-by: Krishna Reddy 
Signed-off-by: Ashish Mhetre 
---
 drivers/iommu/arm/arm-smmu/arm-smmu.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c 
b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index dba15f3..f9c6648 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -1457,6 +1457,7 @@ static struct iommu_group *arm_smmu_device_group(struct 
device *dev)
struct iommu_group *group = NULL;
int i, idx;
 
+   mutex_lock(&smmu->stream_map_mutex);
for_each_cfg_sme(cfg, fwspec, i, idx) {
if (group && smmu->s2crs[idx].group &&
group != smmu->s2crs[idx].group)
@@ -1465,8 +1466,10 @@ static struct iommu_group *arm_smmu_device_group(struct 
device *dev)
group = smmu->s2crs[idx].group;
}
 
-   if (group)
+   if (group) {
+   mutex_unlock(&smmu->stream_map_mutex);
return iommu_group_ref_get(group);
+   }
 
if (dev_is_pci(dev))
group = pci_device_group(dev);
@@ -1480,6 +1483,7 @@ static struct iommu_group *arm_smmu_device_group(struct 
device *dev)
for_each_cfg_sme(cfg, fwspec, i, idx)
smmu->s2crs[idx].group = group;
 
+   mutex_unlock(&smmu->stream_map_mutex);
return group;
 }
 
-- 
2.7.4



[Patch V3 1/2] iommu: Fix race condition during default domain allocation

2021-08-09 Thread Ashish Mhetre via iommu
When two devices with the same SID are probed concurrently through
iommu_probe_device(), the iommu_domain is sometimes allocated more than
once because the call to iommu_alloc_default_domain() is not protected
against concurrency. This leads to each device holding a different
iommu_domain pointer and a separate IOVA space, while only one of the
devices' domains is used for translations by the IOMMU. Accesses from
the other device then fault or see incorrect translations.
Fix this by protecting the iommu_alloc_default_domain() call with
group->mutex and letting all devices with the same SID share the same
iommu_domain.

Signed-off-by: Ashish Mhetre 
---
 drivers/iommu/iommu.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/iommu/iommu.c b/drivers/iommu/iommu.c
index 5419c4b..80c5a1c 100644
--- a/drivers/iommu/iommu.c
+++ b/drivers/iommu/iommu.c
@@ -273,7 +273,9 @@ int iommu_probe_device(struct device *dev)
 * support default domains, so the return value is not yet
 * checked.
 */
+   mutex_lock(&group->mutex);
iommu_alloc_default_domain(group, dev);
+   mutex_unlock(&group->mutex);
 
if (group->default_domain) {
ret = __iommu_attach_device(group->default_domain, dev);
-- 
2.7.4



[Patch V3 0/2] iommu/arm-smmu: Fix races in iommu domain/group creation

2021-08-09 Thread Ashish Mhetre via iommu
When two devices with the same SID are probed concurrently through
iommu_probe_device(), the iommu_group and iommu_domain are allocated more
than once because their creation is not protected against concurrency.
This leads to context faults when one device accesses an IOVA from the
other device. Fix this by protecting iommu_domain and iommu_group
creation with mutexes.

Changes in v3:
* Updated commit messages.
* Added Signed-off-by in patch 2.

Ashish Mhetre (1):
  iommu: Fix race condition during default domain allocation

Krishna Reddy (1):
  iommu/arm-smmu: Fix race condition during iommu_group creation

 drivers/iommu/arm/arm-smmu/arm-smmu.c | 6 +-
 drivers/iommu/iommu.c | 2 ++
 2 files changed, 7 insertions(+), 1 deletion(-)

-- 
2.7.4



Re: [PATCH v10 01/17] iova: Export alloc_iova_fast() and free_iova_fast()

2021-08-09 Thread Jason Wang


On 2021/8/9 1:56 PM, Yongji Xie wrote:

On Thu, Aug 5, 2021 at 9:31 PM Jason Wang  wrote:


On 2021/8/5 8:34 PM, Yongji Xie wrote:

My main point, though, is that if you've already got something else
keeping track of the actual addresses, then the way you're using an
iova_domain appears to be something you could do with a trivial bitmap
allocator. That's why I don't buy the efficiency argument. The main
design points of the IOVA allocator are to manage large address spaces
while trying to maximise spatial locality to minimise the underlying
pagetable usage, and allocating with a flexible limit to support
multiple devices with different addressing capabilities in the same
address space. If none of those aspects are relevant to the use-case -
which AFAICS appears to be true here - then as a general-purpose
resource allocator it's rubbish and has an unreasonably massive memory
overhead and there are many, many better choices.


OK, I get your point. Actually we used the genpool allocator in the
early version. Maybe we can fall back to using it.


I think maybe you can share some perf numbers to see how much
alloc_iova_fast() can help.


I did some fio tests[1] with a ram-backend vduse block device[2].

Following are some performance data:

                  numjobs=1    numjobs=2    numjobs=4    numjobs=8
iova_alloc_fast   145k iops    265k iops    514k iops    758k iops
iova_alloc        137k iops    170k iops    128k iops    113k iops
gen_pool_alloc    143k iops    270k iops    458k iops    521k iops

iova_alloc_fast() has the best performance since we always hit the
per-CPU cache. Setting the per-CPU cache aside, the genpool allocator
should still be better than the plain iova allocator.
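
For reference, the two allocation paths being compared look roughly like
this (illustrative sketch only, simplified from the actual VDUSE code;
assumes <linux/iova.h>, <linux/genalloc.h> and <linux/dma-mapping.h>):

/* iova allocator with the per-CPU rcache fast path */
static dma_addr_t alloc_with_iova(struct iova_domain *iovad,
                                  size_t size, u64 limit)
{
    unsigned long shift = iova_shift(iovad);
    unsigned long pfn;

    pfn = alloc_iova_fast(iovad, iova_align(iovad, size) >> shift,
                          limit >> shift, true);
    return pfn ? (dma_addr_t)pfn << shift : DMA_MAPPING_ERROR;
}

/* plain genpool allocator, no per-CPU caching */
static dma_addr_t alloc_with_genpool(struct gen_pool *pool, size_t size)
{
    unsigned long addr = gen_pool_alloc(pool, size);

    return addr ? (dma_addr_t)addr : DMA_MAPPING_ERROR;
}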



I think we're seeing convincing numbers for using iova_alloc_fast() over 
gen_pool_alloc() (758k vs. 521k iops at numjobs=8, roughly a 45% improvement).


Thanks




[1] fio jobfile:

[global]
rw=randread
direct=1
ioengine=libaio
iodepth=16
time_based=1
runtime=60s
group_reporting
bs=4k
filename=/dev/vda
[job]
numjobs=..

[2] $ qemu-storage-daemon \
      --chardev socket,id=charmonitor,path=/tmp/qmp.sock,server,nowait \
      --monitor chardev=charmonitor \
      --blockdev driver=host_device,cache.direct=on,aio=native,filename=/dev/nullb0,node-name=disk0 \
      --export type=vduse-blk,id=test,node-name=disk0,writable=on,name=vduse-null,num-queues=16,queue-size=128

The qemu-storage-daemon can be built from this repo:
https://github.com/bytedance/qemu/tree/vduse-test.

Thanks,
Yongji




Re: [PATCH v2 3/9] dma-iommu: bounce buffers for untrusted devices

2021-08-09 Thread David Stevens
On Tue, Aug 10, 2021 at 10:19 AM Mi, Dapeng1  wrote:
>
> Hi David,
>
> I like this patch set; it is crucial for reducing the significant vIOMMU 
> performance overhead. It looks like you completely rewrote the IOMMU 
> mapping/unmapping part and use dynamically allocated memory from the buddy 
> system as the bounce buffer instead of the legacy SWIOTLB bounce buffer. As 
> far as I know, some legacy devices' DMA cannot access memory beyond the 
> 32-bit address space, and the dynamically allocated memory address could 
> exceed the 32-bit space. Is that a problem?

My understanding is that when devices with that sort of limitation sit
behind an IOMMU, the IOVA is what matters, not the physical address.
The bounce buffers use the same limits for IOVA allocation as the
regular dma-iommu path, so compatible IOVAs will be allocated for
the bounce buffers.
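
Roughly how the limit is honoured (paraphrased from the dma-iommu IOVA
allocation path, not the exact code in this series):

static dma_addr_t alloc_iova_for_dev(struct device *dev,
                                     struct iova_domain *iovad, size_t size)
{
    /* the ceiling comes from the device's DMA mask ... */
    u64 dma_limit = min_not_zero(dma_get_mask(dev), dev->bus_dma_limit);
    unsigned long shift = iova_shift(iovad);
    unsigned long pfn;

    /* ... so a 32-bit mask caps the returned IOVA below 4 GiB, no
     * matter where the bounce page physically lives */
    pfn = alloc_iova_fast(iovad, iova_align(iovad, size) >> shift,
                          dma_limit >> shift, true);
    return (dma_addr_t)pfn << shift;
}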

-David

> Thx,
> Dapeng Mi


RE: [PATCH v2 3/9] dma-iommu: bounce buffers for untrusted devices

2021-08-09 Thread Mi, Dapeng1
Hi David,

I like this patch set; it is crucial for reducing the significant vIOMMU 
performance overhead. It looks like you completely rewrote the IOMMU 
mapping/unmapping part and use dynamically allocated memory from the buddy 
system as the bounce buffer instead of the legacy SWIOTLB bounce buffer. As far 
as I know, some legacy devices' DMA cannot access memory beyond the 32-bit 
address space, and the dynamically allocated memory address could exceed the 
32-bit space. Is that a problem?

Thx,
Dapeng Mi

-Original Message-
From: iommu  On Behalf Of David 
Stevens
Sent: Friday, August 6, 2021 6:34 PM
To: Robin Murphy 
Cc: linux-ker...@vger.kernel.org; Sergey Senozhatsky 
; iommu@lists.linux-foundation.org; David Stevens 
; Will Deacon ; Christoph Hellwig 

Subject: [PATCH v2 3/9] dma-iommu: bounce buffers for untrusted devices

From: David Stevens 

Add support for dynamic bounce buffers to the dma-api for use with subgranule 
IOMMU mappings with untrusted devices. Bounce buffer management is split into 
two parts. First, there is a buffer manager that is responsible for allocating 
and tracking buffers. Second, there is a layer that uses the managed buffers as 
bounce buffers. It is responsible for managing the IOMMU mapping and for 
syncing between the original and bounce buffers.

For now, buffer management is very simple - every mapping allocates a new 
bounce buffer.
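
The map path described above, in rough pseudocode (illustrative only;
io_buffer_manager_alloc() and io_bounce_map_iova() are hypothetical
helpers, not the functions added by this patch):

static dma_addr_t io_bounce_map(struct device *dev, struct page *page,
                                unsigned long offset, size_t size,
                                enum dma_data_direction dir)
{
    void *bounce;

    /* 1. buffer manager: allocate and track a new bounce buffer */
    bounce = io_buffer_manager_alloc(dev, size);

    /* 2. sync: copy the original data into the bounce buffer when the
     *    device will read from it */
    if (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL)
        memcpy(bounce, page_address(page) + offset, size);

    /* 3. bounce layer: map only the bounce buffer into the IOMMU, so
     *    the untrusted device never sees the rest of the original page */
    return io_bounce_map_iova(dev, virt_to_phys(bounce), size, dir);
}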

Signed-off-by: David Stevens 
---
 drivers/iommu/Makefile            |   2 +-
 drivers/iommu/dma-iommu.c         |  70 +-
 drivers/iommu/io-bounce-buffers.c | 358 ++
 drivers/iommu/io-bounce-buffers.h |  46
 drivers/iommu/io-buffer-manager.c | 212 ++
 drivers/iommu/io-buffer-manager.h |  43
 6 files changed, 728 insertions(+), 3 deletions(-)
 create mode 100644 drivers/iommu/io-bounce-buffers.c
 create mode 100644 drivers/iommu/io-bounce-buffers.h
 create mode 100644 drivers/iommu/io-buffer-manager.c
 create mode 100644 drivers/iommu/io-buffer-manager.h

diff --git a/drivers/iommu/Makefile b/drivers/iommu/Makefile index 
c0fb0ba88143..4edaf7adc082 100644
--- a/drivers/iommu/Makefile
+++ b/drivers/iommu/Makefile
@@ -4,7 +4,7 @@ obj-$(CONFIG_IOMMU_API) += iommu.o
 obj-$(CONFIG_IOMMU_API) += iommu-traces.o
 obj-$(CONFIG_IOMMU_API) += iommu-sysfs.o
 obj-$(CONFIG_IOMMU_DEBUGFS) += iommu-debugfs.o
-obj-$(CONFIG_IOMMU_DMA) += dma-iommu.o
+obj-$(CONFIG_IOMMU_DMA) += dma-iommu.o io-bounce-buffers.o 
+io-buffer-manager.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE) += io-pgtable.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE_ARMV7S) += io-pgtable-arm-v7s.o
 obj-$(CONFIG_IOMMU_IO_PGTABLE_LPAE) += io-pgtable-arm.o diff --git 
a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c index 
055ccda5eba1..908eb6fb7dc3 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -24,6 +24,8 @@
 #include 
 #include 
 
+#include "io-bounce-buffers.h"
+
 struct iommu_dma_msi_page {
struct list_headlist;
dma_addr_t  iova;
@@ -44,6 +46,7 @@ struct iommu_dma_cookie {
dma_addr_t  msi_iova;
};
struct list_headmsi_page_list;
+   struct io_bounce_buffers*bounce_buffers;
 
/* Domain for flush queue callback; NULL if flush queue not in use */
struct iommu_domain *fq_domain;
@@ -81,6 +84,14 @@ static inline size_t cookie_msi_granule(struct 
iommu_dma_cookie *cookie)
return PAGE_SIZE;
 }
 
+static struct io_bounce_buffers *dev_to_io_bounce_buffers(struct device *dev)
+{
+   struct iommu_domain *domain = iommu_get_dma_domain(dev);
+   struct iommu_dma_cookie *cookie = domain->iova_cookie;
+
+   return cookie->bounce_buffers;
+}
+
 static struct iommu_dma_cookie *cookie_alloc(enum iommu_dma_cookie_type type)  
{
struct iommu_dma_cookie *cookie;
@@ -160,6 +171,9 @@ void iommu_put_dma_cookie(struct iommu_domain *domain)
if (!cookie)
return;
 
+   if (cookie->bounce_buffers)
+   io_bounce_buffers_destroy(cookie->bounce_buffers);
+
if (cookie->type == IOMMU_DMA_IOVA_COOKIE && cookie->iovad.granule)
put_iova_domain(&cookie->iovad);
 
@@ -333,6 +347,7 @@ static int iommu_dma_init_domain(struct iommu_domain 
*domain, dma_addr_t base,
struct iommu_dma_cookie *cookie = domain->iova_cookie;
unsigned long order, base_pfn;
struct iova_domain *iovad;
+   int ret;
 
if (!cookie || cookie->type != IOMMU_DMA_IOVA_COOKIE)
return -EINVAL;
@@ -380,7 +395,16 @@ static int iommu_dma_init_domain(struct iommu_domain 
*domain, dma_addr_t base,
if (!dev)
return 0;
 
-   return iova_reserve_iommu_regions(dev, domain);
+   ret = iova_reserve_iommu_regions(dev, domain);
+
+   if (ret == 0 && dev_is_untrusted(dev)) {
+   cookie->bounce_buffers =
+   io_bounce_buffers_init(dev, domain, iovad);
+   

[iommu:apple/dart 3/3] drivers/iommu/apple-dart.c:730:17: error: initialization of 'size_t (*)(struct iommu_domain *, long unsigned int, size_t, struct iommu_iotlb_gather *)' {aka 'long unsigned int

2021-08-09 Thread kernel test robot
tree:   https://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git 
apple/dart
head:   05ce9d20d699b093dec985192a7db63b48f26ca2
commit: 05ce9d20d699b093dec985192a7db63b48f26ca2 [3/3] iommu/dart: Add DART 
iommu driver
config: sparc-allyesconfig (attached as .config)
compiler: sparc64-linux-gcc (GCC) 10.3.0
reproduce (this is a W=1 build):
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# 
https://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git/commit/?id=05ce9d20d699b093dec985192a7db63b48f26ca2
git remote add iommu 
https://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu.git
git fetch --no-tags iommu apple/dart
git checkout 05ce9d20d699b093dec985192a7db63b48f26ca2
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-10.3.0 make.cross 
ARCH=sparc 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All errors (new ones prefixed by >>):

   drivers/iommu/apple-dart.c: In function 'apple_dart_map_pages':
   drivers/iommu/apple-dart.c:380:12: error: 'struct io_pgtable_ops' has no 
member named 'map_pages'
 380 |  return ops->map_pages(ops, iova, paddr, pgsize, pgcount, prot, gfp,
 |^~
   drivers/iommu/apple-dart.c: In function 'apple_dart_unmap_pages':
   drivers/iommu/apple-dart.c:392:12: error: 'struct io_pgtable_ops' has no 
member named 'unmap_pages'
 392 |  return ops->unmap_pages(ops, iova, pgsize, pgcount, gather);
 |^~
   drivers/iommu/apple-dart.c: At top level:
   drivers/iommu/apple-dart.c:729:3: error: 'const struct iommu_ops' has no 
member named 'map_pages'
 729 |  .map_pages = apple_dart_map_pages,
 |   ^
   drivers/iommu/apple-dart.c:729:15: error: initialization of 'int (*)(struct 
iommu_domain *, long unsigned int,  phys_addr_t,  size_t,  int,  gfp_t)' {aka 
'int (*)(struct iommu_domain *, long unsigned int,  long long unsigned int,  
long unsigned int,  int,  unsigned int)'} from incompatible pointer type 'int 
(*)(struct iommu_domain *, long unsigned int,  phys_addr_t,  size_t,  size_t,  
int,  gfp_t,  size_t *)' {aka 'int (*)(struct iommu_domain *, long unsigned 
int,  long long unsigned int,  long unsigned int,  long unsigned int,  int,  
unsigned int,  long unsigned int *)'} [-Werror=incompatible-pointer-types]
 729 |  .map_pages = apple_dart_map_pages,
 |   ^~~~
   drivers/iommu/apple-dart.c:729:15: note: (near initialization for 
'apple_dart_iommu_ops.map')
   drivers/iommu/apple-dart.c:730:3: error: 'const struct iommu_ops' has no 
member named 'unmap_pages'
 730 |  .unmap_pages = apple_dart_unmap_pages,
 |   ^~~
>> drivers/iommu/apple-dart.c:730:17: error: initialization of 'size_t 
>> (*)(struct iommu_domain *, long unsigned int,  size_t,  struct 
>> iommu_iotlb_gather *)' {aka 'long unsigned int (*)(struct iommu_domain *, 
>> long unsigned int,  long unsigned int,  struct iommu_iotlb_gather *)'} from 
>> incompatible pointer type 'size_t (*)(struct iommu_domain *, long unsigned 
>> int,  size_t,  size_t,  struct iommu_iotlb_gather *)' {aka 'long unsigned 
>> int (*)(struct iommu_domain *, long unsigned int,  long unsigned int,  long 
>> unsigned int,  struct iommu_iotlb_gather *)'} 
>> [-Werror=incompatible-pointer-types]
 730 |  .unmap_pages = apple_dart_unmap_pages,
 | ^~
   drivers/iommu/apple-dart.c:730:17: note: (near initialization for 
'apple_dart_iommu_ops.unmap')
   drivers/iommu/apple-dart.c: In function 'apple_dart_unmap_pages':
   drivers/iommu/apple-dart.c:393:1: error: control reaches end of non-void 
function [-Werror=return-type]
 393 | }
 | ^
   drivers/iommu/apple-dart.c: In function 'apple_dart_map_pages':
   drivers/iommu/apple-dart.c:382:1: error: control reaches end of non-void 
function [-Werror=return-type]
 382 | }
 | ^
   cc1: some warnings being treated as errors


vim +730 drivers/iommu/apple-dart.c

   723  
   724  static const struct iommu_ops apple_dart_iommu_ops = {
   725  .domain_alloc = apple_dart_domain_alloc,
   726  .domain_free = apple_dart_domain_free,
   727  .attach_dev = apple_dart_attach_dev,
   728  .detach_dev = apple_dart_detach_dev,
 > 729  .map_pages = apple_dart_map_pages,
 > 730  .unmap_pages = apple_dart_unmap_pages,
   731  .flush_iotlb_all = apple_dart_flush_iotlb_all,
   732  .iotlb_sync = apple_dart_iotlb_sync,
   733  .iotlb_sync_map = apple_dart_iotlb_sync_map,
   734  .iova_to_phys = apple_dart_iova_to_phys,
   735  .probe_device = apple_dart_probe_device,
   736  .release_device = apple_dart_release_device,
   737  .device_group = apple_dart_device_group,
   738  .of_xlate = 

Re: [PATCH 00/11] Implement generic prot_guest_has() helper function

2021-08-09 Thread Tom Lendacky via iommu
On 8/8/21 8:41 PM, Kuppuswamy, Sathyanarayanan wrote:
> Hi Tom,
> 
> On 7/27/21 3:26 PM, Tom Lendacky wrote:
>> This patch series provides a generic helper function, prot_guest_has(),
>> to replace the sme_active(), sev_active(), sev_es_active() and
>> mem_encrypt_active() functions.
>>
>> It is expected that as new protected virtualization technologies are
>> added to the kernel, they can all be covered by a single function call
>> instead of a collection of specific function calls all called from the
>> same locations.
>>
>> The powerpc and s390 patches have been compile tested only. Can the
>> folks copied on this series verify that nothing breaks for them.
> 
> With this patch set, selecting ARCH_HAS_PROTECTED_GUEST and setting
> CONFIG_AMD_MEM_ENCRYPT=n creates the following error.
> 
> ld: arch/x86/mm/ioremap.o: in function `early_memremap_is_setup_data':
> arch/x86/mm/ioremap.c:672: undefined reference to `early_memremap_decrypted'
> 
> It looks like early_memremap_is_setup_data() is not protected by the
> appropriate config option.

Ok, thanks for finding that. I'll fix that.

Thanks,
Tom
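
For reference, the shape of the conversion the series performs, with the
attribute names as used in the series' patches, is roughly:

static void example(void)
{
    /* old: one helper per technology
     *   if (mem_encrypt_active()) ...
     *   if (sme_active()) ...
     *   if (sev_es_active()) ...
     */

    /* new: one helper, parameterised by the protection attribute */
    if (prot_guest_has(PATTR_MEM_ENCRYPT))
        pr_info("some form of memory encryption is active\n");
    if (prot_guest_has(PATTR_HOST_MEM_ENCRYPT))
        pr_info("host memory encryption (SME) is active\n");
    if (prot_guest_has(PATTR_GUEST_PROT_STATE))
        pr_info("guest protection state (e.g. SEV-ES) is active\n");
}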

> 
> 
>>
>> Cc: Andi Kleen 
>> Cc: Andy Lutomirski 
>> Cc: Ard Biesheuvel 
>> Cc: Baoquan He 
>> Cc: Benjamin Herrenschmidt 
>> Cc: Borislav Petkov 
>> Cc: Christian Borntraeger 
>> Cc: Daniel Vetter 
>> Cc: Dave Hansen 
>> Cc: Dave Young 
>> Cc: David Airlie 
>> Cc: Heiko Carstens 
>> Cc: Ingo Molnar 
>> Cc: Joerg Roedel 
>> Cc: Maarten Lankhorst 
>> Cc: Maxime Ripard 
>> Cc: Michael Ellerman 
>> Cc: Paul Mackerras 
>> Cc: Peter Zijlstra 
>> Cc: Thomas Gleixner 
>> Cc: Thomas Zimmermann 
>> Cc: Vasily Gorbik 
>> Cc: VMware Graphics 
>> Cc: Will Deacon 
>>
>> ---
>>
>> Patches based on:
>>   
>> https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git
>> master
>>    commit 79e920060fa7 ("Merge branch 'WIP/fixes'")
>>
>> Tom Lendacky (11):
>>    mm: Introduce a function to check for virtualization protection
>>  features
>>    x86/sev: Add an x86 version of prot_guest_has()
>>    powerpc/pseries/svm: Add a powerpc version of prot_guest_has()
>>    x86/sme: Replace occurrences of sme_active() with prot_guest_has()
>>    x86/sev: Replace occurrences of sev_active() with prot_guest_has()
>>    x86/sev: Replace occurrences of sev_es_active() with prot_guest_has()
>>    treewide: Replace the use of mem_encrypt_active() with
>>  prot_guest_has()
>>    mm: Remove the now unused mem_encrypt_active() function
>>    x86/sev: Remove the now unused mem_encrypt_active() function
>>    powerpc/pseries/svm: Remove the now unused mem_encrypt_active()
>>  function
>>    s390/mm: Remove the now unused mem_encrypt_active() function
>>
>>   arch/Kconfig   |  3 ++
>>   arch/powerpc/include/asm/mem_encrypt.h |  5 --
>>   arch/powerpc/include/asm/protected_guest.h | 30 +++
>>   arch/powerpc/platforms/pseries/Kconfig |  1 +
>>   arch/s390/include/asm/mem_encrypt.h    |  2 -
>>   arch/x86/Kconfig   |  1 +
>>   arch/x86/include/asm/kexec.h   |  2 +-
>>   arch/x86/include/asm/mem_encrypt.h | 13 +
>>   arch/x86/include/asm/protected_guest.h | 27 ++
>>   arch/x86/kernel/crash_dump_64.c    |  4 +-
>>   arch/x86/kernel/head64.c   |  4 +-
>>   arch/x86/kernel/kvm.c  |  3 +-
>>   arch/x86/kernel/kvmclock.c |  4 +-
>>   arch/x86/kernel/machine_kexec_64.c | 19 +++
>>   arch/x86/kernel/pci-swiotlb.c  |  9 ++--
>>   arch/x86/kernel/relocate_kernel_64.S   |  2 +-
>>   arch/x86/kernel/sev.c  |  6 +--
>>   arch/x86/kvm/svm/svm.c |  3 +-
>>   arch/x86/mm/ioremap.c  | 16 +++---
>>   arch/x86/mm/mem_encrypt.c  | 60 +++---
>>   arch/x86/mm/mem_encrypt_identity.c |  3 +-
>>   arch/x86/mm/pat/set_memory.c   |  3 +-
>>   arch/x86/platform/efi/efi_64.c |  9 ++--
>>   arch/x86/realmode/init.c   |  8 +--
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c    |  4 +-
>>   drivers/gpu/drm/drm_cache.c    |  4 +-
>>   drivers/gpu/drm/vmwgfx/vmwgfx_drv.c    |  4 +-
>>   drivers/gpu/drm/vmwgfx/vmwgfx_msg.c    |  6 +--
>>   drivers/iommu/amd/init.c   |  7 +--
>>   drivers/iommu/amd/iommu.c  |  3 +-
>>   drivers/iommu/amd/iommu_v2.c   |  3 +-
>>   drivers/iommu/iommu.c  |  3 +-
>>   fs/proc/vmcore.c   |  6 +--
>>   include/linux/mem_encrypt.h    |  4 --
>>   

Re: [PATCH V3 03/13] x86/HV: Add new hvcall guest address host visibility support

2021-08-09 Thread Dave Hansen
On 8/9/21 10:56 AM, Tianyu Lan wrote:
> From: Tianyu Lan 
> 
> Add new hvcall guest address host visibility support to mark
> memory visible to host. Call it inside set_memory_decrypted
> /encrypted(). Add HYPERVISOR feature check in the
> hv_is_isolation_supported() to optimize in non-virtualization
> environment.

>From an x86/mm perspective:

Acked-by: Dave Hansen 

A tiny nit:

> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index 0bb4d9ca7a55..b3683083208a 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -607,6 +607,12 @@ EXPORT_SYMBOL_GPL(hv_get_isolation_type);
>  
>  bool hv_is_isolation_supported(void)
>  {
> + if (!cpu_feature_enabled(X86_FEATURE_HYPERVISOR))
> + return 0;
> +
> + if (!hypervisor_is_type(X86_HYPER_MS_HYPERV))
> + return 0;
> +
>   return hv_get_isolation_type() != HV_ISOLATION_TYPE_NONE;
>  }
It might be worthwhile to move this to a header.  That ensures that
callers of hv_is_isolation_supported() can avoid even a function call.
But I see this is used in modules and its use here is also on a slow
path, so it's not a big deal.
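
One possible shape of that (sketch only; the body is just the one from
the patch above, made static inline in a header such as asm/mshyperv.h):

static inline bool hv_is_isolation_supported(void)
{
    if (!cpu_feature_enabled(X86_FEATURE_HYPERVISOR))
        return false;

    if (!hypervisor_is_type(X86_HYPER_MS_HYPERV))
        return false;

    return hv_get_isolation_type() != HV_ISOLATION_TYPE_NONE;
}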


Re: [PATCH 06/11] x86/sev: Replace occurrences of sev_es_active() with prot_guest_has()

2021-08-09 Thread Kuppuswamy, Sathyanarayanan




On 8/9/21 2:59 PM, Tom Lendacky wrote:

Not sure how TDX will handle AP booting, are you sure it needs this
special setup as well? Otherwise a check for SEV-ES would be better
instead of the generic PATTR_GUEST_PROT_STATE.

Yes, I'm not sure either. I figure that change can be made, if needed, as
part of the TDX support.


We don't plan to set PROT_STATE. So it does not affect TDX.
For SMP, we use MADT ACPI table for AP booting.

--
Sathyanarayanan Kuppuswamy
Linux Kernel Developer


Re: [PATCH 07/11] treewide: Replace the use of mem_encrypt_active() with prot_guest_has()

2021-08-09 Thread Tom Lendacky via iommu
On 8/2/21 7:42 AM, Christophe Leroy wrote:
> 
> 
> Le 28/07/2021 à 00:26, Tom Lendacky a écrit :
>> Replace occurrences of mem_encrypt_active() with calls to prot_guest_has()
>> with the PATTR_MEM_ENCRYPT attribute.
> 
> 
> What about
> https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20210730114231.23445-1-will@kernel.org/ ?

Ah, looks like that just went into the PPC tree and isn't part of the tip
tree. I'll have to look into how to handle that one.

Thanks,
Tom

> 
> Christophe
> 
> 
>>
>> Cc: Thomas Gleixner 
>> Cc: Ingo Molnar 
>> Cc: Borislav Petkov 
>> Cc: Dave Hansen 
>> Cc: Andy Lutomirski 
>> Cc: Peter Zijlstra 
>> Cc: David Airlie 
>> Cc: Daniel Vetter 
>> Cc: Maarten Lankhorst 
>> Cc: Maxime Ripard 
>> Cc: Thomas Zimmermann 
>> Cc: VMware Graphics 
>> Cc: Joerg Roedel 
>> Cc: Will Deacon 
>> Cc: Dave Young 
>> Cc: Baoquan He 
>> Signed-off-by: Tom Lendacky 
>> ---
>>   arch/x86/kernel/head64.c    | 4 ++--
>>   arch/x86/mm/ioremap.c   | 4 ++--
>>   arch/x86/mm/mem_encrypt.c   | 5 ++---
>>   arch/x86/mm/pat/set_memory.c    | 3 ++-
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 4 +++-
>>   drivers/gpu/drm/drm_cache.c | 4 ++--
>>   drivers/gpu/drm/vmwgfx/vmwgfx_drv.c | 4 ++--
>>   drivers/gpu/drm/vmwgfx/vmwgfx_msg.c | 6 +++---
>>   drivers/iommu/amd/iommu.c   | 3 ++-
>>   drivers/iommu/amd/iommu_v2.c    | 3 ++-
>>   drivers/iommu/iommu.c   | 3 ++-
>>   fs/proc/vmcore.c    | 6 +++---
>>   kernel/dma/swiotlb.c    | 4 ++--
>>   13 files changed, 29 insertions(+), 24 deletions(-)
>>
>> diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c
>> index de01903c3735..cafed6456d45 100644
>> --- a/arch/x86/kernel/head64.c
>> +++ b/arch/x86/kernel/head64.c
>> @@ -19,7 +19,7 @@
>>   #include 
>>   #include 
>>   #include 
>> -#include 
>> +#include 
>>   #include 
>>     #include 
>> @@ -285,7 +285,7 @@ unsigned long __head __startup_64(unsigned long
>> physaddr,
>>    * there is no need to zero it after changing the memory encryption
>>    * attribute.
>>    */
>> -    if (mem_encrypt_active()) {
>> +    if (prot_guest_has(PATTR_MEM_ENCRYPT)) {
>>   vaddr = (unsigned long)__start_bss_decrypted;
>>   vaddr_end = (unsigned long)__end_bss_decrypted;
>>   for (; vaddr < vaddr_end; vaddr += PMD_SIZE) {
>> diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
>> index 0f2d5ace5986..5e1c1f5cbbe8 100644
>> --- a/arch/x86/mm/ioremap.c
>> +++ b/arch/x86/mm/ioremap.c
>> @@ -693,7 +693,7 @@ static bool __init
>> early_memremap_is_setup_data(resource_size_t phys_addr,
>>   bool arch_memremap_can_ram_remap(resource_size_t phys_addr, unsigned
>> long size,
>>    unsigned long flags)
>>   {
>> -    if (!mem_encrypt_active())
>> +    if (!prot_guest_has(PATTR_MEM_ENCRYPT))
>>   return true;
>>     if (flags & MEMREMAP_ENC)
>> @@ -723,7 +723,7 @@ pgprot_t __init
>> early_memremap_pgprot_adjust(resource_size_t phys_addr,
>>   {
>>   bool encrypted_prot;
>>   -    if (!mem_encrypt_active())
>> +    if (!prot_guest_has(PATTR_MEM_ENCRYPT))
>>   return prot;
>>     encrypted_prot = true;
>> diff --git a/arch/x86/mm/mem_encrypt.c b/arch/x86/mm/mem_encrypt.c
>> index 451de8e84fce..0f1533dbe81c 100644
>> --- a/arch/x86/mm/mem_encrypt.c
>> +++ b/arch/x86/mm/mem_encrypt.c
>> @@ -364,8 +364,7 @@ int __init early_set_memory_encrypted(unsigned long
>> vaddr, unsigned long size)
>>   /*
>>    * SME and SEV are very similar but they are not the same, so there are
>>    * times that the kernel will need to distinguish between SME and SEV.
>> The
>> - * sme_active() and sev_active() functions are used for this.  When a
>> - * distinction isn't needed, the mem_encrypt_active() function can be
>> used.
>> + * sme_active() and sev_active() functions are used for this.
>>    *
>>    * The trampoline code is a good example for this requirement.  Before
>>    * paging is activated, SME will access all memory as decrypted, but SEV
>> @@ -451,7 +450,7 @@ void __init mem_encrypt_free_decrypted_mem(void)
>>    * The unused memory range was mapped decrypted, change the
>> encryption
>>    * attribute from decrypted to encrypted before freeing it.
>>    */
>> -    if (mem_encrypt_active()) {
>> +    if (sme_me_mask) {
>>   r = set_memory_encrypted(vaddr, npages);
>>   if (r) {
>>   pr_warn("failed to free unused decrypted pages\n");
>> diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c

Re: [PATCH 06/11] x86/sev: Replace occurrences of sev_es_active() with prot_guest_has()

2021-08-09 Thread Tom Lendacky via iommu
On 8/2/21 5:45 AM, Joerg Roedel wrote:
> On Tue, Jul 27, 2021 at 05:26:09PM -0500, Tom Lendacky wrote:
>> @@ -48,7 +47,7 @@ static void sme_sev_setup_real_mode(struct 
>> trampoline_header *th)
>>  if (prot_guest_has(PATTR_HOST_MEM_ENCRYPT))
>>  th->flags |= TH_FLAGS_SME_ACTIVE;
>>  
>> -if (sev_es_active()) {
>> +if (prot_guest_has(PATTR_GUEST_PROT_STATE)) {
>>  /*
>>   * Skip the call to verify_cpu() in secondary_startup_64 as it
>>   * will cause #VC exceptions when the AP can't handle them yet.
> 
> Not sure how TDX will handle AP booting, are you sure it needs this
> special setup as well? Otherwise a check for SEV-ES would be better
> instead of the generic PATTR_GUEST_PROT_STATE.

Yes, I'm not sure either. I figure that change can be made, if needed, as
part of the TDX support.

Thanks,
Tom

> 
> Regards,
> 
> Joerg
> 


Re: [PATCH 07/11] treewide: Replace the use of mem_encrypt_active() with prot_guest_has()

2021-08-09 Thread Tom Lendacky via iommu
On 7/30/21 5:34 PM, Sean Christopherson wrote:
> On Tue, Jul 27, 2021, Tom Lendacky wrote:
>> @@ -451,7 +450,7 @@ void __init mem_encrypt_free_decrypted_mem(void)
>>   * The unused memory range was mapped decrypted, change the encryption
>>   * attribute from decrypted to encrypted before freeing it.
>>   */
>> -if (mem_encrypt_active()) {
>> +if (sme_me_mask) {
> 
> Any reason this uses sme_me_mask?  The helper it calls, 
> __set_memory_enc_dec(),
> uses prot_guest_has(PATTR_MEM_ENCRYPT) so I assume it's available?

Probably just a slip on my part. I was debating at one point calling the
helper vs. referencing the variables/functions directly in the
mem_encrypt.c file.

Thanks,
Tom

> 
>>  r = set_memory_encrypted(vaddr, npages);
>>  if (r) {
>>  pr_warn("failed to free unused decrypted pages\n");
> 


Re: [RFC PATCH 2/3] iommu/dma-iommu: Support iovad->granule > PAGE_SIZE

2021-08-09 Thread Sven Peter via iommu



On Mon, Aug 9, 2021, at 19:41, Robin Murphy wrote:
> On 2021-08-07 12:47, Sven Peter via iommu wrote:
> > 
> > 
> > On Fri, Aug 6, 2021, at 20:04, Robin Murphy wrote:
> >> On 2021-08-06 16:55, Sven Peter via iommu wrote:
> >>> @@ -1006,6 +1019,31 @@ static int iommu_dma_map_sg(struct device *dev, 
> >>> struct scatterlist *sg,
> >>>   if (dev_is_untrusted(dev))
> >>>   return iommu_dma_map_sg_swiotlb(dev, sg, nents, dir, 
> >>> attrs);
> >>>
> >>> + /*
> >>> +  * If the IOMMU pagesize is larger than the CPU pagesize we will
> >>> +  * very likely run into sgs with a physical address that is not aligned
> >>> +  * to an IOMMU page boundary. Fall back to just mapping every entry
> >>> +  * independently with __iommu_dma_map then.
> >>
> >> Scatterlist segments often don't have nicely aligned ends, which is why
> >> we already align things to IOVA granules in main loop here. I think in
> >> principle we'd just need to move the non-IOVA-aligned part of the
> >> address from sg->page to sg->offset in the temporary transformation for
> >> the rest of the assumptions to hold. I don't blame you for being timid
> >> about touching that, though - it took me 3 tries to get right when I
> >> first wrote it...
> >>
> > 
> > 
> > I've spent some time with that code now and I think we cannot use it
> > but have to fall back to iommu_dma_map_sg_swiotlb (even though that swiotlb
> > part is a lie then):
> > 
> > When we have sg_phys(s) = 0x802e65000 with s->offset = 0 the paddr
> > is aligned to PAGE_SIZE but has an offset of 0x1000 from something
> > the IOMMU can map.
> > Now this would result in s->offset = -0x1000 which is already weird
> > enough.
> > Offset is unsigned (and 32bit) so this will actually look like
> > s->offset = 0xf000 then, which isn't much better.
> > And then sg_phys(s) = 0x902e64000 (instead of 0x802e64000) and
> > we'll map some random memory in iommu_map_sg_atomic and a little bit later
> > everything explodes.
> > 
> > Now I could probably adjust the phys addr backwards and make sure offset is
> > always positive (and possibly larger than PAGE_SIZE) and later restore it
> > in __finalise_sg then but I feel like that's pushing this a little bit too 
> > far.
> 
> Yes, that's what I meant. At a quick guess, something like the
> completely untested diff below.

That unfortunately results in unaligned mappings

[9.630334] iommu: unaligned: iova 0xbff4 pa 0x000801a3b000 size 
0x4000 min_pagesz 0x4000

I'll take a closer look later this week and see if I can fix it.

> It really comes down to what we want to
> achieve here - if it's just to make this thing work at all, then I'd
> favour bolting on the absolute minimum changes, possibly even cheating
> by tainting the kernel and saying all bets are off instead of trying to
> handle the more involved corners really properly. However if you want to
> work towards this being a properly-supported thing, then I think it's
> worth generalising the existing assumptions of page alignment from the
> beginning.

I'd like to try and see if we can make this a properly-supported thing.

That will likely take a few iterations but realistically the rest of the drivers
required to make this platform actually useful (and especially the display 
controller
and GPU drivers) won't be ready for a few more months anyway. And even on 4KB 
PAGE_SIZE
kernels half the USB ports and NVMe will work fine, which should be enough to 
install
a distro and some third-party package that just ships the distro kernel with 
16KB
pages.




Sven



Re: [PATCH v3 24/25] iommu/dma: Factor out flush queue init

2021-08-09 Thread Rajat Jain via iommu
On Mon, Aug 9, 2021 at 12:59 PM Robin Murphy  wrote:
>
> On 2021-08-09 20:05, Rajat Jain wrote:
> > On Wed, Aug 4, 2021 at 10:16 AM Robin Murphy  wrote:
> >>
> >> Factor out flush queue setup from the initial domain init so that we
> >> can potentially trigger it from sysfs later on in a domain's lifetime.
> >>
> >> Reviewed-by: Lu Baolu 
> >> Reviewed-by: John Garry 
> >> Signed-off-by: Robin Murphy 
> >> ---
> >>   drivers/iommu/dma-iommu.c | 30 --
> >>   include/linux/dma-iommu.h |  9 ++---
> >>   2 files changed, 26 insertions(+), 13 deletions(-)
> >>
> >> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> >> index 2e19505dddf9..f51b8dc99ac6 100644
> >> --- a/drivers/iommu/dma-iommu.c
> >> +++ b/drivers/iommu/dma-iommu.c
> >> @@ -310,6 +310,25 @@ static bool dev_is_untrusted(struct device *dev)
> >>  return dev_is_pci(dev) && to_pci_dev(dev)->untrusted;
> >>   }
> >>
> >> +int iommu_dma_init_fq(struct iommu_domain *domain)
> >> +{
> >> +   struct iommu_dma_cookie *cookie = domain->iova_cookie;
> >> +
> >> +   if (domain->type != IOMMU_DOMAIN_DMA_FQ)
> >> +   return -EINVAL;
> >> +   if (cookie->fq_domain)
> >> +   return 0;
> >> +
> >> +   if (init_iova_flush_queue(&cookie->iovad, 
> >> iommu_dma_flush_iotlb_all,
> >> + iommu_dma_entry_dtor)) {
> >> +   pr_warn("iova flush queue initialization failed\n");
> >> +   domain->type = IOMMU_DOMAIN_DMA;
> >> +   return -ENODEV;
> >> +   }
> >> +   cookie->fq_domain = domain;
> >> +   return 0;
> >> +}
> >> +
> >>   /**
> >>* iommu_dma_init_domain - Initialise a DMA mapping domain
> >>* @domain: IOMMU domain previously prepared by iommu_get_dma_cookie()
> >> @@ -362,16 +381,7 @@ static int iommu_dma_init_domain(struct iommu_domain 
> >> *domain, dma_addr_t base,
> >>  }
> >>
> >>  init_iova_domain(iovad, 1UL << order, base_pfn);
> >> -
> >> -   if (domain->type == IOMMU_DOMAIN_DMA_FQ && !cookie->fq_domain) {
> >> -   if (init_iova_flush_queue(iovad, iommu_dma_flush_iotlb_all,
> >> - iommu_dma_entry_dtor)) {
> >> -   pr_warn("iova flush queue initialization 
> >> failed\n");
> >> -   domain->type = IOMMU_DOMAIN_DMA;
> >> -   } else {
> >> -   cookie->fq_domain = domain;
> >> -   }
> >> -   }
> >> +   iommu_dma_init_fq(domain);
> >>
> >>  return iova_reserve_iommu_regions(dev, domain);
> >>   }
> >> diff --git a/include/linux/dma-iommu.h b/include/linux/dma-iommu.h
> >> index 758ca4694257..81ab647f1618 100644
> >> --- a/include/linux/dma-iommu.h
> >> +++ b/include/linux/dma-iommu.h
> >> @@ -20,6 +20,7 @@ void iommu_put_dma_cookie(struct iommu_domain *domain);
> >>
> >>   /* Setup call for arch DMA mapping code */
> >>   void iommu_setup_dma_ops(struct device *dev, u64 dma_base, u64 
> >> dma_limit);
> >> +int iommu_dma_init_fq(struct iommu_domain *domain);
> >>
> >>   /* The DMA API isn't _quite_ the whole story, though... */
> >>   /*
> >> @@ -37,9 +38,6 @@ void iommu_dma_compose_msi_msg(struct msi_desc *desc,
> >>
> >>   void iommu_dma_get_resv_regions(struct device *dev, struct list_head 
> >> *list);
> >>
> >> -void iommu_dma_free_cpu_cached_iovas(unsigned int cpu,
> >> -   struct iommu_domain *domain);
> >> -
> >
> > This looks like an unrelated code cleanup. Should this be a separate patch?
>
> Ha, busted! Much of this was done in the "stream of consciousness" style
> where I made a big sprawling mess then split it up into patches and
> branches afterwards. TBH it was already feeling pretty tenuous having a
> separate patch just to move this one function, and it only gets more so
> with the simplification Will pointed out earlier. I think I'll squash
> iommu_dma_init_fq() into the next patch then do a thorough header sweep,
> since I've now spotted some things in iova.h which could probably go as
> well.

Thank you. I chanced upon this only because I've backported your
patchset (and some other changes that it depends on) to 5.10 which is
the kernel we currently use for our Intel platforms, and this cleanup
hunk was creating a problem (since 5.10 still uses the symbol you
removed). I'll be giving your v3 patchset a spin in my setup and
update you in case I see any issue.

Thanks,

Rajat


>
> Thanks for the poke!
>
> Robin.
>
> >
> > Thanks,
> >
> > Rajat
> >
> >
> >>   extern bool iommu_dma_forcedac;
> >>
> >>   #else /* CONFIG_IOMMU_DMA */
> >> @@ -54,6 +52,11 @@ static inline void iommu_setup_dma_ops(struct device 
> >> *dev, u64 dma_base,
> >>   {
> >>   }
> >>
> >> +static inline int iommu_dma_init_fq(struct iommu_domain *domain)
> >> +{
> >> +   return -EINVAL;
> >> +}
> >> +
> >>   static inline int iommu_get_dma_cookie(struct iommu_domain *domain)
> >>   {
> >>  return -ENODEV;
> 

Re: [RFC PATCH 2/3] iommu/dma-iommu: Support iovad->granule > PAGE_SIZE

2021-08-09 Thread Sven Peter via iommu
Hi,

On Mon, Aug 9, 2021, at 20:37, Robin Murphy wrote:
> On 2021-08-07 09:41, Sven Peter wrote:
> > Hi,
> > 
> > Thanks a lot for quick reply!
> > 
> > On Fri, Aug 6, 2021, at 20:04, Robin Murphy wrote:
> >> On 2021-08-06 16:55, Sven Peter via iommu wrote:
> >>> DMA IOMMU domains can support hardware where the IOMMU page size is
> >>> larger than the CPU page size.
> >>> Alignments need to be done with respect to both PAGE_SIZE and
> >>> iovad->granule. Additionally, the sg list optimization to use a single
> >>> IOVA allocation cannot be used in those cases since the physical
> >>> addresses will very likely not be aligned to the larger IOMMU page size.
> >>>
> >>> Signed-off-by: Sven Peter 
> >>> ---
> >>>drivers/iommu/dma-iommu.c | 87 ++-
> >>>1 file changed, 77 insertions(+), 10 deletions(-)
> >>>
> >>> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> >>> index 6f0df629353f..e072d9030d9f 100644
> >>> --- a/drivers/iommu/dma-iommu.c
> >>> +++ b/drivers/iommu/dma-iommu.c
> >>> @@ -8,6 +8,7 @@
> >>> * Copyright (C) 2000-2004 Russell King
> >>> */
> >>>
> >>> +#include 
> >>>#include 
> >>>#include 
> >>>#include 
> >>> @@ -51,6 +52,15 @@ struct iommu_dma_cookie {
> >>>   struct iommu_domain *fq_domain;
> >>>};
> >>>
> >>> +/* aligns size to CPU and IOMMU page size */
> >>> +static inline size_t iommu_page_align(struct device *dev, size_t size)
> >>> +{
> >>> + struct iommu_domain *domain = iommu_get_dma_domain(dev);
> >>> + struct iommu_dma_cookie *cookie = domain->iova_cookie;
> >>> +
> >>> + return iova_align(&cookie->iovad, PAGE_ALIGN(size));
> >>> +}
> >>> +
> >>>static DEFINE_STATIC_KEY_FALSE(iommu_deferred_attach_enabled);
> >>>bool iommu_dma_forcedac __read_mostly;
> >>>
> >>> @@ -647,6 +657,8 @@ static struct page **__iommu_dma_alloc_pages(struct 
> >>> device *dev,
> >>>/*
> >>> * If size is less than PAGE_SIZE, then a full CPU page will be 
> >>> allocated,
> >>> * but an IOMMU which supports smaller pages might not map the whole 
> >>> thing.
> >>> + * If the IOMMU page size is larger than the CPU page size, then the size
> >>> + * will be aligned to that granularity and some memory will be left 
> >>> unused.
> >>
> >> Why do we need to increase the actual memory allocation? The point here
> >> is that we allocate the smallest thing we can allocate and map the
> >> smallest thing we can map - I think that still works the "wrong" way
> >> round too, we should just need to start taking an IOVA offset into
> >> account as in dma_map_page() if we can no longer assume it's 0 for a CPU
> >> page. Sure we may expose some unrelated adjacent pages, but we'll
> >> already be doing that to excess for streaming DMA so whoop de do.
> > 
> > I agree for trusted devices, but untrusted ones (Thunderbolt, and depending 
> > on your
> > risk tolerance possibly even the broadcom wifi) might also end up calling 
> > this.
> 
> Oh, right, I hadn't considered actual untrusted device support at this 
> stage.


Me neither :-)
I did the alignment at first without thinking too much about it,
then read your reply, and only *then* realized that there are untrusted devices
for which this just happens to do the right thing (at the cost of wasting
memory for everyone else, but I'll fix that).

> 
> > For streaming DMA swiotlb will make sure that these won't see memory
> > they're not supposed to access.
> 
> I was slightly surprised to see that that does appear to work out OK, 
> but I guess SWIOTLB slots are already smaller than typical IOMMU pages, 
> so it falls out of that. Neat.
> 
> > But, at least as far as I understand it, no swiotlb is in the way to catch 
> > devices
> > who end up calling this function. That wasn't required because we used to 
> > get
> > PAGE_SIZE aligned allocation here and every IOMMU so far would be able to 
> > easily
> > map them without any spill overs.
> > But now we'll end up exposing three more unrelated pages if the allocation
> > is not increased.
> 
> Fair enough, but then why still waste memory in the (usual) cases where 
> it logically *isn't* necessary?

See above, didn't even consider this either.

I should be able to fix this by first allocating <= 3 pages to reach an
iovad->granule boundary (or to satisfy the full allocation length already),
adjusting the DMA offset forward for this first allocation, and then just
allocating whole iovad->granule blocks for the rest of the requested size.

The current code already does the second part, and I think I'll just need
to add the first alignment step.
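
A rough sketch of the arithmetic for that first step (assumption: this is
just an illustration of the idea, not the eventual patch):

static unsigned long granule_lead_in_pages(struct iova_domain *iovad,
                                           unsigned long count,
                                           unsigned long *dma_offset)
{
    unsigned long per_granule = iovad->granule >> PAGE_SHIFT; /* e.g. 4 */
    unsigned long first = count % per_granule;                /* 0..3   */

    /* the returned dma_handle would point this far into the first
     * granule, so that the remaining pages start granule-aligned */
    *dma_offset = first ? (per_granule - first) * PAGE_SIZE : 0;

    /* number of CPU pages to allocate for the partial first chunk;
     * everything after it is allocated in whole granules, which the
     * existing loop already handles */
    return first;
}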

> 
> >>> */
> >>>static struct page **__iommu_dma_alloc_noncontiguous(struct device 
> >>> *dev,
> >>>   size_t size, struct sg_table *sgt, gfp_t gfp, pgprot_t 
> >>> prot,
> >>> @@ -736,7 +748,7 @@ static void *iommu_dma_alloc_remap(struct device 
> >>> *dev, size_t size,
> >>>
> >>>out_unmap:
> >>>   

Re: [PATCH v3 24/25] iommu/dma: Factor out flush queue init

2021-08-09 Thread Robin Murphy

On 2021-08-09 20:05, Rajat Jain wrote:

On Wed, Aug 4, 2021 at 10:16 AM Robin Murphy  wrote:


Factor out flush queue setup from the initial domain init so that we
can potentially trigger it from sysfs later on in a domain's lifetime.

Reviewed-by: Lu Baolu 
Reviewed-by: John Garry 
Signed-off-by: Robin Murphy 
---
  drivers/iommu/dma-iommu.c | 30 --
  include/linux/dma-iommu.h |  9 ++---
  2 files changed, 26 insertions(+), 13 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 2e19505dddf9..f51b8dc99ac6 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -310,6 +310,25 @@ static bool dev_is_untrusted(struct device *dev)
 return dev_is_pci(dev) && to_pci_dev(dev)->untrusted;
  }

+int iommu_dma_init_fq(struct iommu_domain *domain)
+{
+   struct iommu_dma_cookie *cookie = domain->iova_cookie;
+
+   if (domain->type != IOMMU_DOMAIN_DMA_FQ)
+   return -EINVAL;
+   if (cookie->fq_domain)
+   return 0;
+
+   if (init_iova_flush_queue(&cookie->iovad, iommu_dma_flush_iotlb_all,
+ iommu_dma_entry_dtor)) {
+   pr_warn("iova flush queue initialization failed\n");
+   domain->type = IOMMU_DOMAIN_DMA;
+   return -ENODEV;
+   }
+   cookie->fq_domain = domain;
+   return 0;
+}
+
  /**
   * iommu_dma_init_domain - Initialise a DMA mapping domain
   * @domain: IOMMU domain previously prepared by iommu_get_dma_cookie()
@@ -362,16 +381,7 @@ static int iommu_dma_init_domain(struct iommu_domain 
*domain, dma_addr_t base,
 }

 init_iova_domain(iovad, 1UL << order, base_pfn);
-
-   if (domain->type == IOMMU_DOMAIN_DMA_FQ && !cookie->fq_domain) {
-   if (init_iova_flush_queue(iovad, iommu_dma_flush_iotlb_all,
- iommu_dma_entry_dtor)) {
-   pr_warn("iova flush queue initialization failed\n");
-   domain->type = IOMMU_DOMAIN_DMA;
-   } else {
-   cookie->fq_domain = domain;
-   }
-   }
+   iommu_dma_init_fq(domain);

 return iova_reserve_iommu_regions(dev, domain);
  }
diff --git a/include/linux/dma-iommu.h b/include/linux/dma-iommu.h
index 758ca4694257..81ab647f1618 100644
--- a/include/linux/dma-iommu.h
+++ b/include/linux/dma-iommu.h
@@ -20,6 +20,7 @@ void iommu_put_dma_cookie(struct iommu_domain *domain);

  /* Setup call for arch DMA mapping code */
  void iommu_setup_dma_ops(struct device *dev, u64 dma_base, u64 dma_limit);
+int iommu_dma_init_fq(struct iommu_domain *domain);

  /* The DMA API isn't _quite_ the whole story, though... */
  /*
@@ -37,9 +38,6 @@ void iommu_dma_compose_msi_msg(struct msi_desc *desc,

  void iommu_dma_get_resv_regions(struct device *dev, struct list_head *list);

-void iommu_dma_free_cpu_cached_iovas(unsigned int cpu,
-   struct iommu_domain *domain);
-


This looks like an unrelated code cleanup. Should this be a separate patch?


Ha, busted! Much of this was done in the "stream of consciousness" style 
where I made a big sprawling mess then split it up into patches and 
branches afterwards. TBH it was already feeling pretty tenuous having a 
separate patch just to move this one function, and it only gets more so 
with the simplification Will pointed out earlier. I think I'll squash 
iommu_dma_init_fq() into the next patch then do a thorough header sweep, 
since I've now spotted some things in iova.h which could probably go as 
well.


Thanks for the poke!

Robin.



Thanks,

Rajat



  extern bool iommu_dma_forcedac;

  #else /* CONFIG_IOMMU_DMA */
@@ -54,6 +52,11 @@ static inline void iommu_setup_dma_ops(struct device *dev, 
u64 dma_base,
  {
  }

+static inline int iommu_dma_init_fq(struct iommu_domain *domain)
+{
+   return -EINVAL;
+}
+
  static inline int iommu_get_dma_cookie(struct iommu_domain *domain)
  {
 return -ENODEV;
--
2.25.1




Re: [PATCH v3 24/25] iommu/dma: Factor out flush queue init

2021-08-09 Thread Rajat Jain via iommu
On Wed, Aug 4, 2021 at 10:16 AM Robin Murphy  wrote:
>
> Factor out flush queue setup from the initial domain init so that we
> can potentially trigger it from sysfs later on in a domain's lifetime.
>
> Reviewed-by: Lu Baolu 
> Reviewed-by: John Garry 
> Signed-off-by: Robin Murphy 
> ---
>  drivers/iommu/dma-iommu.c | 30 --
>  include/linux/dma-iommu.h |  9 ++---
>  2 files changed, 26 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index 2e19505dddf9..f51b8dc99ac6 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -310,6 +310,25 @@ static bool dev_is_untrusted(struct device *dev)
> return dev_is_pci(dev) && to_pci_dev(dev)->untrusted;
>  }
>
> +int iommu_dma_init_fq(struct iommu_domain *domain)
> +{
> +   struct iommu_dma_cookie *cookie = domain->iova_cookie;
> +
> +   if (domain->type != IOMMU_DOMAIN_DMA_FQ)
> +   return -EINVAL;
> +   if (cookie->fq_domain)
> +   return 0;
> +
> +   if (init_iova_flush_queue(&cookie->iovad, iommu_dma_flush_iotlb_all,
> + iommu_dma_entry_dtor)) {
> +   pr_warn("iova flush queue initialization failed\n");
> +   domain->type = IOMMU_DOMAIN_DMA;
> +   return -ENODEV;
> +   }
> +   cookie->fq_domain = domain;
> +   return 0;
> +}
> +
>  /**
>   * iommu_dma_init_domain - Initialise a DMA mapping domain
>   * @domain: IOMMU domain previously prepared by iommu_get_dma_cookie()
> @@ -362,16 +381,7 @@ static int iommu_dma_init_domain(struct iommu_domain 
> *domain, dma_addr_t base,
> }
>
> init_iova_domain(iovad, 1UL << order, base_pfn);
> -
> -   if (domain->type == IOMMU_DOMAIN_DMA_FQ && !cookie->fq_domain) {
> -   if (init_iova_flush_queue(iovad, iommu_dma_flush_iotlb_all,
> - iommu_dma_entry_dtor)) {
> -   pr_warn("iova flush queue initialization failed\n");
> -   domain->type = IOMMU_DOMAIN_DMA;
> -   } else {
> -   cookie->fq_domain = domain;
> -   }
> -   }
> +   iommu_dma_init_fq(domain);
>
> return iova_reserve_iommu_regions(dev, domain);
>  }
> diff --git a/include/linux/dma-iommu.h b/include/linux/dma-iommu.h
> index 758ca4694257..81ab647f1618 100644
> --- a/include/linux/dma-iommu.h
> +++ b/include/linux/dma-iommu.h
> @@ -20,6 +20,7 @@ void iommu_put_dma_cookie(struct iommu_domain *domain);
>
>  /* Setup call for arch DMA mapping code */
>  void iommu_setup_dma_ops(struct device *dev, u64 dma_base, u64 dma_limit);
> +int iommu_dma_init_fq(struct iommu_domain *domain);
>
>  /* The DMA API isn't _quite_ the whole story, though... */
>  /*
> @@ -37,9 +38,6 @@ void iommu_dma_compose_msi_msg(struct msi_desc *desc,
>
>  void iommu_dma_get_resv_regions(struct device *dev, struct list_head *list);
>
> -void iommu_dma_free_cpu_cached_iovas(unsigned int cpu,
> -   struct iommu_domain *domain);
> -

This looks like an unrelated code cleanup. Should this be a separate patch?

Thanks,

Rajat


>  extern bool iommu_dma_forcedac;
>
>  #else /* CONFIG_IOMMU_DMA */
> @@ -54,6 +52,11 @@ static inline void iommu_setup_dma_ops(struct device *dev, 
> u64 dma_base,
>  {
>  }
>
> +static inline int iommu_dma_init_fq(struct iommu_domain *domain)
> +{
> +   return -EINVAL;
> +}
> +
>  static inline int iommu_get_dma_cookie(struct iommu_domain *domain)
>  {
> return -ENODEV;
> --
> 2.25.1
>


Re: [RFC PATCH 2/3] iommu/dma-iommu: Support iovad->granule > PAGE_SIZE

2021-08-09 Thread Robin Murphy

On 2021-08-07 09:41, Sven Peter wrote:

Hi,

Thanks a lot for quick reply!

On Fri, Aug 6, 2021, at 20:04, Robin Murphy wrote:

On 2021-08-06 16:55, Sven Peter via iommu wrote:

DMA IOMMU domains can support hardware where the IOMMU page size is
larger than the CPU page size.
Alignments need to be done with respect to both PAGE_SIZE and
iovad->granule. Additionally, the sg list optimization to use a single
IOVA allocation cannot be used in those cases since the physical
addresses will very likely not be aligned to the larger IOMMU page size.

Signed-off-by: Sven Peter 
---
   drivers/iommu/dma-iommu.c | 87 ++-
   1 file changed, 77 insertions(+), 10 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 6f0df629353f..e072d9030d9f 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -8,6 +8,7 @@
* Copyright (C) 2000-2004 Russell King
*/
   
+#include 

   #include 
   #include 
   #include 
@@ -51,6 +52,15 @@ struct iommu_dma_cookie {
struct iommu_domain *fq_domain;
   };
   
+/* aligns size to CPU and IOMMU page size */

+static inline size_t iommu_page_align(struct device *dev, size_t size)
+{
+   struct iommu_domain *domain = iommu_get_dma_domain(dev);
+   struct iommu_dma_cookie *cookie = domain->iova_cookie;
+
+   return iova_align(&cookie->iovad, PAGE_ALIGN(size));
+}
+
   static DEFINE_STATIC_KEY_FALSE(iommu_deferred_attach_enabled);
   bool iommu_dma_forcedac __read_mostly;
   
@@ -647,6 +657,8 @@ static struct page **__iommu_dma_alloc_pages(struct device *dev,

   /*
* If size is less than PAGE_SIZE, then a full CPU page will be allocated,
* but an IOMMU which supports smaller pages might not map the whole thing.
+ * If the IOMMU page size is larger than the CPU page size, then the size
+ * will be aligned to that granularity and some memory will be left unused.


Why do we need to increase the actual memory allocation? The point here
is that we allocate the smallest thing we can allocate and map the
smallest thing we can map - I think that still works the "wrong" way
round too, we should just need to start taking an IOVA offset into
account as in dma_map_page() if we can no longer assume it's 0 for a CPU
page. Sure we may expose some unrelated adjacent pages, but we'll
already be doing that to excess for streaming DMA so whoop de do.
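
(A rough illustration of the offset handling suggested here, assuming
iovad->granule > PAGE_SIZE; the helper and its parameters are invented for
illustration and are not code from either patch:)

	/* Sketch: keep the sub-granule offset of a CPU page in the returned IOVA. */
	static dma_addr_t sketch_map_with_offset(struct iommu_domain *domain,
						 struct iova_domain *iovad,
						 dma_addr_t iova, phys_addr_t phys,
						 size_t size, int prot)
	{
		unsigned long off = iova_offset(iovad, phys);	/* no longer 0 */
		size_t len = iova_align(iovad, size + off);

		if (iommu_map(domain, iova, phys - off, len, prot))
			return DMA_MAPPING_ERROR;

		return iova + off;	/* hand the sub-granule offset back to the caller */
	}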


I agree for trusted devices, but untrusted ones (Thunderbolt, and depending on 
your
risk tolerance possibly even the broadcom wifi) might also end up calling this.


Oh, right, I hadn't considered actual untrusted device support at this 
stage.



For streaming DMA swiotlb will make sure that these won't see memory
they're not supposed to access.


I was slightly surprised to see that that does appear to work out OK, 
but I guess SWIOTLB slots are already smaller than typical IOMMU pages, 
so it falls out of that. Neat.



But, at least as far as I understand it, no swiotlb is in the way to catch 
devices
who end up calling this function. That wasn't required because we used to get
PAGE_SIZE aligned allocation here and every IOMMU so far would be able to easily
map them without any spill overs.
But now we'll end up exposing three more unrelated pages if the allocation
is not increased.


Fair enough, but then why still waste memory in the (usual) cases where 
it logically *isn't* necessary?



*/
   static struct page **__iommu_dma_alloc_noncontiguous(struct device *dev,
size_t size, struct sg_table *sgt, gfp_t gfp, pgprot_t prot,
@@ -736,7 +748,7 @@ static void *iommu_dma_alloc_remap(struct device *dev, 
size_t size,
   
   out_unmap:

__iommu_dma_unmap(dev, *dma_handle, size);
-   __iommu_dma_free_pages(pages, PAGE_ALIGN(size) >> PAGE_SHIFT);
+   __iommu_dma_free_pages(pages, iommu_page_align(dev, size) >> 
PAGE_SHIFT);
return NULL;
   }
   
@@ -766,7 +778,8 @@ static void iommu_dma_free_noncontiguous(struct device *dev, size_t size,

struct dma_sgt_handle *sh = sgt_handle(sgt);
   
   	__iommu_dma_unmap(dev, sgt->sgl->dma_address, size);

-   __iommu_dma_free_pages(sh->pages, PAGE_ALIGN(size) >> PAGE_SHIFT);
+   __iommu_dma_free_pages(sh->pages,
+   iommu_page_align(dev, size) >> PAGE_SHIFT);
	sg_free_table(&sh->sgt);
kfree(sh);
   }
@@ -1006,6 +1019,31 @@ static int iommu_dma_map_sg(struct device *dev, struct 
scatterlist *sg,
if (dev_is_untrusted(dev))
return iommu_dma_map_sg_swiotlb(dev, sg, nents, dir, attrs);
   
+	/*

+* If the IOMMU pagesize is larger than the CPU pagesize we will
+* very likely run into sgs with a physical address that is not aligned
+* to an IOMMU page boundary. Fall back to just mapping every entry
+* independently with __iommu_dma_map then.


Scatterlist segments often don't have nicely aligned ends, which is why
we already align things to IOVA granules in main loop 

Re: [Freedreno] [PATCH 0/3] iommu/drm/msm: Allow non-coherent masters to use system cache

2021-08-09 Thread Sai Prakash Ranjan

On 2021-08-10 00:00, Rob Clark wrote:

On Mon, Aug 9, 2021 at 11:11 AM Sai Prakash Ranjan
 wrote:


On 2021-08-09 23:37, Rob Clark wrote:
> On Mon, Aug 9, 2021 at 10:47 AM Sai Prakash Ranjan
>  wrote:
>>
>> On 2021-08-09 23:10, Will Deacon wrote:
>> > On Mon, Aug 09, 2021 at 10:18:21AM -0700, Rob Clark wrote:
>> >> On Mon, Aug 9, 2021 at 10:05 AM Will Deacon  wrote:
>> >> >
>> >> > On Mon, Aug 09, 2021 at 09:57:08AM -0700, Rob Clark wrote:
>> >> > > On Mon, Aug 9, 2021 at 7:56 AM Will Deacon  wrote:
>> >> > > > On Mon, Aug 02, 2021 at 06:36:04PM -0700, Rob Clark wrote:
>> >> > > > > On Mon, Aug 2, 2021 at 8:14 AM Will Deacon  
wrote:
>> >> > > > > > On Mon, Aug 02, 2021 at 08:08:07AM -0700, Rob Clark wrote:
>> >> > > > > > > On Mon, Aug 2, 2021 at 3:55 AM Will Deacon  
wrote:
>> >> > > > > > > > On Thu, Jul 29, 2021 at 10:08:22AM +0530, Sai Prakash 
Ranjan wrote:
>> >> > > > > > > > > On 2021-07-28 19:30, Georgi Djakov wrote:
>> >> > > > > > > > > > On Mon, Jan 11, 2021 at 07:45:02PM +0530, Sai Prakash 
Ranjan wrote:
>> >> > > > > > > > > > > commit ecd7274fb4cd ("iommu: Remove unused 
IOMMU_SYS_CACHE_ONLY flag")
>> >> > > > > > > > > > > removed unused IOMMU_SYS_CACHE_ONLY prot flag and 
along with it went
>> >> > > > > > > > > > > the memory type setting required for the non-coherent 
masters to use
>> >> > > > > > > > > > > system cache. Now that system cache support for GPU 
is added, we will
>> >> > > > > > > > > > > need to set the right PTE attribute for GPU buffers 
to be sys cached.
>> >> > > > > > > > > > > Without this, the system cache lines are not 
allocated for GPU.
>> >> > > > > > > > > > >
>> >> > > > > > > > > > > So the patches in this series introduces a new prot 
flag IOMMU_LLC,
>> >> > > > > > > > > > > renames IO_PGTABLE_QUIRK_ARM_OUTER_WBWA to 
IO_PGTABLE_QUIRK_PTW_LLC
>> >> > > > > > > > > > > and makes GPU the user of this protection flag.
>> >> > > > > > > > > >
>> >> > > > > > > > > > Thank you for the patchset! Are you planning to refresh 
it, as it does
>> >> > > > > > > > > > not apply anymore?
>> >> > > > > > > > > >
>> >> > > > > > > > >
>> >> > > > > > > > > I was waiting on Will's reply [1]. If there are no 
changes needed, then
>> >> > > > > > > > > I can repost the patch.
>> >> > > > > > > >
>> >> > > > > > > > I still think you need to handle the mismatched alias, no? 
You're adding
>> >> > > > > > > > a new memory type to the SMMU which doesn't exist on the 
CPU side. That
>> >> > > > > > > > can't be right.
>> >> > > > > > > >
>> >> > > > > > >
>> >> > > > > > > Just curious, and maybe this is a dumb question, but what is 
your
>> >> > > > > > > concern about mismatched aliases?  I mean the cache hierarchy 
on the
>> >> > > > > > > GPU device side (anything beyond the LLC) is pretty different 
and
>> >> > > > > > > doesn't really care about the smmu pgtable attributes..
>> >> > > > > >
>> >> > > > > > If the CPU accesses a shared buffer with different attributes 
to those which
>> >> > > > > > the device is using then you fall into the "mismatched memory 
attributes"
>> >> > > > > > part of the Arm architecture. It's reasonably unforgiving (you 
should go and
>> >> > > > > > read it) and in some cases can apply to speculative accesses as 
well, but
>> >> > > > > > the end result is typically loss of coherency.
>> >> > > > >
>> >> > > > > Ok, I might have a few other sections to read first to decipher 
the
>> >> > > > > terminology..
>> >> > > > >
>> >> > > > > But my understanding of LLC is that it looks just like system 
memory
>> >> > > > > to the CPU and GPU (I think that would make it "the point of
>> >> > > > > coherence" between the GPU and CPU?)  If that is true, shouldn't 
it be
>> >> > > > > invisible from the point of view of different CPU mapping options?
>> >> > > >
>> >> > > > You could certainly build a system where mismatched attributes 
don't cause
>> >> > > > loss of coherence, but as it's not guaranteed by the architecture 
and the
>> >> > > > changes proposed here affect APIs which are exposed across SoCs, 
then I
>> >> > > > don't think it helps much.
>> >> > > >
>> >> > >
>> >> > > Hmm, the description of the new mapping flag is that it applies only
>> >> > > to transparent outer level cache:
>> >> > >
>> >> > > +/*
>> >> > > + * Non-coherent masters can use this page protection flag to set 
cacheable
>> >> > > + * memory attributes for only a transparent outer level of cache, 
also known as
>> >> > > + * the last-level or system cache.
>> >> > > + */
>> >> > > +#define IOMMU_LLC  (1 << 6)
>> >> > >
>> >> > > But I suppose we could call it instead IOMMU_QCOM_LLC or something
>> >> > > like that to make it more clear that it is not necessarily something
>> >> > > that would work with a different outer level cache implementation?
>> >> >
>> >> > ... or we could just deal with the problem so that other people can 
reuse
>> >> > the code. I haven't really understood the reluctance to solve this 
properly.
>> >> >
>> >> > Am I missing some 

Re: [Freedreno] [PATCH 0/3] iommu/drm/msm: Allow non-coherent masters to use system cache

2021-08-09 Thread Rob Clark
On Mon, Aug 9, 2021 at 11:11 AM Sai Prakash Ranjan
 wrote:
>
> On 2021-08-09 23:37, Rob Clark wrote:
> > On Mon, Aug 9, 2021 at 10:47 AM Sai Prakash Ranjan
> >  wrote:
> >>
> >> On 2021-08-09 23:10, Will Deacon wrote:
> >> > On Mon, Aug 09, 2021 at 10:18:21AM -0700, Rob Clark wrote:
> >> >> On Mon, Aug 9, 2021 at 10:05 AM Will Deacon  wrote:
> >> >> >
> >> >> > On Mon, Aug 09, 2021 at 09:57:08AM -0700, Rob Clark wrote:
> >> >> > > On Mon, Aug 9, 2021 at 7:56 AM Will Deacon  wrote:
> >> >> > > > On Mon, Aug 02, 2021 at 06:36:04PM -0700, Rob Clark wrote:
> >> >> > > > > On Mon, Aug 2, 2021 at 8:14 AM Will Deacon  
> >> >> > > > > wrote:
> >> >> > > > > > On Mon, Aug 02, 2021 at 08:08:07AM -0700, Rob Clark wrote:
> >> >> > > > > > > On Mon, Aug 2, 2021 at 3:55 AM Will Deacon 
> >> >> > > > > > >  wrote:
> >> >> > > > > > > > On Thu, Jul 29, 2021 at 10:08:22AM +0530, Sai Prakash 
> >> >> > > > > > > > Ranjan wrote:
> >> >> > > > > > > > > On 2021-07-28 19:30, Georgi Djakov wrote:
> >> >> > > > > > > > > > On Mon, Jan 11, 2021 at 07:45:02PM +0530, Sai Prakash 
> >> >> > > > > > > > > > Ranjan wrote:
> >> >> > > > > > > > > > > commit ecd7274fb4cd ("iommu: Remove unused 
> >> >> > > > > > > > > > > IOMMU_SYS_CACHE_ONLY flag")
> >> >> > > > > > > > > > > removed unused IOMMU_SYS_CACHE_ONLY prot flag and 
> >> >> > > > > > > > > > > along with it went
> >> >> > > > > > > > > > > the memory type setting required for the 
> >> >> > > > > > > > > > > non-coherent masters to use
> >> >> > > > > > > > > > > system cache. Now that system cache support for GPU 
> >> >> > > > > > > > > > > is added, we will
> >> >> > > > > > > > > > > need to set the right PTE attribute for GPU buffers 
> >> >> > > > > > > > > > > to be sys cached.
> >> >> > > > > > > > > > > Without this, the system cache lines are not 
> >> >> > > > > > > > > > > allocated for GPU.
> >> >> > > > > > > > > > >
> >> >> > > > > > > > > > > So the patches in this series introduces a new prot 
> >> >> > > > > > > > > > > flag IOMMU_LLC,
> >> >> > > > > > > > > > > renames IO_PGTABLE_QUIRK_ARM_OUTER_WBWA to 
> >> >> > > > > > > > > > > IO_PGTABLE_QUIRK_PTW_LLC
> >> >> > > > > > > > > > > and makes GPU the user of this protection flag.
> >> >> > > > > > > > > >
> >> >> > > > > > > > > > Thank you for the patchset! Are you planning to 
> >> >> > > > > > > > > > refresh it, as it does
> >> >> > > > > > > > > > not apply anymore?
> >> >> > > > > > > > > >
> >> >> > > > > > > > >
> >> >> > > > > > > > > I was waiting on Will's reply [1]. If there are no 
> >> >> > > > > > > > > changes needed, then
> >> >> > > > > > > > > I can repost the patch.
> >> >> > > > > > > >
> >> >> > > > > > > > I still think you need to handle the mismatched alias, 
> >> >> > > > > > > > no? You're adding
> >> >> > > > > > > > a new memory type to the SMMU which doesn't exist on the 
> >> >> > > > > > > > CPU side. That
> >> >> > > > > > > > can't be right.
> >> >> > > > > > > >
> >> >> > > > > > >
> >> >> > > > > > > Just curious, and maybe this is a dumb question, but what 
> >> >> > > > > > > is your
> >> >> > > > > > > concern about mismatched aliases?  I mean the cache 
> >> >> > > > > > > hierarchy on the
> >> >> > > > > > > GPU device side (anything beyond the LLC) is pretty 
> >> >> > > > > > > different and
> >> >> > > > > > > doesn't really care about the smmu pgtable attributes..
> >> >> > > > > >
> >> >> > > > > > If the CPU accesses a shared buffer with different attributes 
> >> >> > > > > > to those which
> >> >> > > > > > the device is using then you fall into the "mismatched memory 
> >> >> > > > > > attributes"
> >> >> > > > > > part of the Arm architecture. It's reasonably unforgiving 
> >> >> > > > > > (you should go and
> >> >> > > > > > read it) and in some cases can apply to speculative accesses 
> >> >> > > > > > as well, but
> >> >> > > > > > the end result is typically loss of coherency.
> >> >> > > > >
> >> >> > > > > Ok, I might have a few other sections to read first to decipher 
> >> >> > > > > the
> >> >> > > > > terminology..
> >> >> > > > >
> >> >> > > > > But my understanding of LLC is that it looks just like system 
> >> >> > > > > memory
> >> >> > > > > to the CPU and GPU (I think that would make it "the point of
> >> >> > > > > coherence" between the GPU and CPU?)  If that is true, 
> >> >> > > > > shouldn't it be
> >> >> > > > > invisible from the point of view of different CPU mapping 
> >> >> > > > > options?
> >> >> > > >
> >> >> > > > You could certainly build a system where mismatched attributes 
> >> >> > > > don't cause
> >> >> > > > loss of coherence, but as it's not guaranteed by the architecture 
> >> >> > > > and the
> >> >> > > > changes proposed here affect APIs which are exposed across SoCs, 
> >> >> > > > then I
> >> >> > > > don't think it helps much.
> >> >> > > >
> >> >> > >
> >> >> > > Hmm, the description of the new mapping flag is that it applies only
> >> >> > > to transparent outer level cache:
> >> >> > >
> >> >> > > +/*
> 

Re: [Freedreno] [PATCH 0/3] iommu/drm/msm: Allow non-coherent masters to use system cache

2021-08-09 Thread Sai Prakash Ranjan

On 2021-08-09 23:37, Rob Clark wrote:

On Mon, Aug 9, 2021 at 10:47 AM Sai Prakash Ranjan
 wrote:


On 2021-08-09 23:10, Will Deacon wrote:
> On Mon, Aug 09, 2021 at 10:18:21AM -0700, Rob Clark wrote:
>> On Mon, Aug 9, 2021 at 10:05 AM Will Deacon  wrote:
>> >
>> > On Mon, Aug 09, 2021 at 09:57:08AM -0700, Rob Clark wrote:
>> > > On Mon, Aug 9, 2021 at 7:56 AM Will Deacon  wrote:
>> > > > On Mon, Aug 02, 2021 at 06:36:04PM -0700, Rob Clark wrote:
>> > > > > On Mon, Aug 2, 2021 at 8:14 AM Will Deacon  wrote:
>> > > > > > On Mon, Aug 02, 2021 at 08:08:07AM -0700, Rob Clark wrote:
>> > > > > > > On Mon, Aug 2, 2021 at 3:55 AM Will Deacon  
wrote:
>> > > > > > > > On Thu, Jul 29, 2021 at 10:08:22AM +0530, Sai Prakash Ranjan 
wrote:
>> > > > > > > > > On 2021-07-28 19:30, Georgi Djakov wrote:
>> > > > > > > > > > On Mon, Jan 11, 2021 at 07:45:02PM +0530, Sai Prakash 
Ranjan wrote:
>> > > > > > > > > > > commit ecd7274fb4cd ("iommu: Remove unused 
IOMMU_SYS_CACHE_ONLY flag")
>> > > > > > > > > > > removed unused IOMMU_SYS_CACHE_ONLY prot flag and along 
with it went
>> > > > > > > > > > > the memory type setting required for the non-coherent 
masters to use
>> > > > > > > > > > > system cache. Now that system cache support for GPU is 
added, we will
>> > > > > > > > > > > need to set the right PTE attribute for GPU buffers to 
be sys cached.
>> > > > > > > > > > > Without this, the system cache lines are not allocated 
for GPU.
>> > > > > > > > > > >
>> > > > > > > > > > > So the patches in this series introduces a new prot flag 
IOMMU_LLC,
>> > > > > > > > > > > renames IO_PGTABLE_QUIRK_ARM_OUTER_WBWA to 
IO_PGTABLE_QUIRK_PTW_LLC
>> > > > > > > > > > > and makes GPU the user of this protection flag.
>> > > > > > > > > >
>> > > > > > > > > > Thank you for the patchset! Are you planning to refresh 
it, as it does
>> > > > > > > > > > not apply anymore?
>> > > > > > > > > >
>> > > > > > > > >
>> > > > > > > > > I was waiting on Will's reply [1]. If there are no changes 
needed, then
>> > > > > > > > > I can repost the patch.
>> > > > > > > >
>> > > > > > > > I still think you need to handle the mismatched alias, no? 
You're adding
>> > > > > > > > a new memory type to the SMMU which doesn't exist on the CPU 
side. That
>> > > > > > > > can't be right.
>> > > > > > > >
>> > > > > > >
>> > > > > > > Just curious, and maybe this is a dumb question, but what is your
>> > > > > > > concern about mismatched aliases?  I mean the cache hierarchy on 
the
>> > > > > > > GPU device side (anything beyond the LLC) is pretty different and
>> > > > > > > doesn't really care about the smmu pgtable attributes..
>> > > > > >
>> > > > > > If the CPU accesses a shared buffer with different attributes to 
those which
>> > > > > > the device is using then you fall into the "mismatched memory 
attributes"
>> > > > > > part of the Arm architecture. It's reasonably unforgiving (you 
should go and
>> > > > > > read it) and in some cases can apply to speculative accesses as 
well, but
>> > > > > > the end result is typically loss of coherency.
>> > > > >
>> > > > > Ok, I might have a few other sections to read first to decipher the
>> > > > > terminology..
>> > > > >
>> > > > > But my understanding of LLC is that it looks just like system memory
>> > > > > to the CPU and GPU (I think that would make it "the point of
>> > > > > coherence" between the GPU and CPU?)  If that is true, shouldn't it 
be
>> > > > > invisible from the point of view of different CPU mapping options?
>> > > >
>> > > > You could certainly build a system where mismatched attributes don't 
cause
>> > > > loss of coherence, but as it's not guaranteed by the architecture and 
the
>> > > > changes proposed here affect APIs which are exposed across SoCs, then I
>> > > > don't think it helps much.
>> > > >
>> > >
>> > > Hmm, the description of the new mapping flag is that it applies only
>> > > to transparent outer level cache:
>> > >
>> > > +/*
>> > > + * Non-coherent masters can use this page protection flag to set 
cacheable
>> > > + * memory attributes for only a transparent outer level of cache, also 
known as
>> > > + * the last-level or system cache.
>> > > + */
>> > > +#define IOMMU_LLC  (1 << 6)
>> > >
>> > > But I suppose we could call it instead IOMMU_QCOM_LLC or something
>> > > like that to make it more clear that it is not necessarily something
>> > > that would work with a different outer level cache implementation?
>> >
>> > ... or we could just deal with the problem so that other people can reuse
>> > the code. I haven't really understood the reluctance to solve this 
properly.
>> >
>> > Am I missing some reason this isn't solvable?
>>
>> Oh, was there another way to solve it (other than foregoing setting
>> INC_OCACHE in the pgtables)?  Maybe I misunderstood, is there a
>> corresponding setting on the MMU pgtables side of things?
>
> Right -- we just need to program the CPU's MMU with the matching memory
> attributes! It's a bit more fiddly if you're 

Re: [Freedreno] [PATCH 0/3] iommu/drm/msm: Allow non-coherent masters to use system cache

2021-08-09 Thread Rob Clark
On Mon, Aug 9, 2021 at 10:47 AM Sai Prakash Ranjan
 wrote:
>
> On 2021-08-09 23:10, Will Deacon wrote:
> > On Mon, Aug 09, 2021 at 10:18:21AM -0700, Rob Clark wrote:
> >> On Mon, Aug 9, 2021 at 10:05 AM Will Deacon  wrote:
> >> >
> >> > On Mon, Aug 09, 2021 at 09:57:08AM -0700, Rob Clark wrote:
> >> > > On Mon, Aug 9, 2021 at 7:56 AM Will Deacon  wrote:
> >> > > > On Mon, Aug 02, 2021 at 06:36:04PM -0700, Rob Clark wrote:
> >> > > > > On Mon, Aug 2, 2021 at 8:14 AM Will Deacon  wrote:
> >> > > > > > On Mon, Aug 02, 2021 at 08:08:07AM -0700, Rob Clark wrote:
> >> > > > > > > On Mon, Aug 2, 2021 at 3:55 AM Will Deacon  
> >> > > > > > > wrote:
> >> > > > > > > > On Thu, Jul 29, 2021 at 10:08:22AM +0530, Sai Prakash Ranjan 
> >> > > > > > > > wrote:
> >> > > > > > > > > On 2021-07-28 19:30, Georgi Djakov wrote:
> >> > > > > > > > > > On Mon, Jan 11, 2021 at 07:45:02PM +0530, Sai Prakash 
> >> > > > > > > > > > Ranjan wrote:
> >> > > > > > > > > > > commit ecd7274fb4cd ("iommu: Remove unused 
> >> > > > > > > > > > > IOMMU_SYS_CACHE_ONLY flag")
> >> > > > > > > > > > > removed unused IOMMU_SYS_CACHE_ONLY prot flag and 
> >> > > > > > > > > > > along with it went
> >> > > > > > > > > > > the memory type setting required for the non-coherent 
> >> > > > > > > > > > > masters to use
> >> > > > > > > > > > > system cache. Now that system cache support for GPU is 
> >> > > > > > > > > > > added, we will
> >> > > > > > > > > > > need to set the right PTE attribute for GPU buffers to 
> >> > > > > > > > > > > be sys cached.
> >> > > > > > > > > > > Without this, the system cache lines are not allocated 
> >> > > > > > > > > > > for GPU.
> >> > > > > > > > > > >
> >> > > > > > > > > > > So the patches in this series introduces a new prot 
> >> > > > > > > > > > > flag IOMMU_LLC,
> >> > > > > > > > > > > renames IO_PGTABLE_QUIRK_ARM_OUTER_WBWA to 
> >> > > > > > > > > > > IO_PGTABLE_QUIRK_PTW_LLC
> >> > > > > > > > > > > and makes GPU the user of this protection flag.
> >> > > > > > > > > >
> >> > > > > > > > > > Thank you for the patchset! Are you planning to refresh 
> >> > > > > > > > > > it, as it does
> >> > > > > > > > > > not apply anymore?
> >> > > > > > > > > >
> >> > > > > > > > >
> >> > > > > > > > > I was waiting on Will's reply [1]. If there are no changes 
> >> > > > > > > > > needed, then
> >> > > > > > > > > I can repost the patch.
> >> > > > > > > >
> >> > > > > > > > I still think you need to handle the mismatched alias, no? 
> >> > > > > > > > You're adding
> >> > > > > > > > a new memory type to the SMMU which doesn't exist on the CPU 
> >> > > > > > > > side. That
> >> > > > > > > > can't be right.
> >> > > > > > > >
> >> > > > > > >
> >> > > > > > > Just curious, and maybe this is a dumb question, but what is 
> >> > > > > > > your
> >> > > > > > > concern about mismatched aliases?  I mean the cache hierarchy 
> >> > > > > > > on the
> >> > > > > > > GPU device side (anything beyond the LLC) is pretty different 
> >> > > > > > > and
> >> > > > > > > doesn't really care about the smmu pgtable attributes..
> >> > > > > >
> >> > > > > > If the CPU accesses a shared buffer with different attributes to 
> >> > > > > > those which
> >> > > > > > the device is using then you fall into the "mismatched memory 
> >> > > > > > attributes"
> >> > > > > > part of the Arm architecture. It's reasonably unforgiving (you 
> >> > > > > > should go and
> >> > > > > > read it) and in some cases can apply to speculative accesses as 
> >> > > > > > well, but
> >> > > > > > the end result is typically loss of coherency.
> >> > > > >
> >> > > > > Ok, I might have a few other sections to read first to decipher the
> >> > > > > terminology..
> >> > > > >
> >> > > > > But my understanding of LLC is that it looks just like system 
> >> > > > > memory
> >> > > > > to the CPU and GPU (I think that would make it "the point of
> >> > > > > coherence" between the GPU and CPU?)  If that is true, shouldn't 
> >> > > > > it be
> >> > > > > invisible from the point of view of different CPU mapping options?
> >> > > >
> >> > > > You could certainly build a system where mismatched attributes don't 
> >> > > > cause
> >> > > > loss of coherence, but as it's not guaranteed by the architecture 
> >> > > > and the
> >> > > > changes proposed here affect APIs which are exposed across SoCs, 
> >> > > > then I
> >> > > > don't think it helps much.
> >> > > >
> >> > >
> >> > > Hmm, the description of the new mapping flag is that it applies only
> >> > > to transparent outer level cache:
> >> > >
> >> > > +/*
> >> > > + * Non-coherent masters can use this page protection flag to set 
> >> > > cacheable
> >> > > + * memory attributes for only a transparent outer level of cache, 
> >> > > also known as
> >> > > + * the last-level or system cache.
> >> > > + */
> >> > > +#define IOMMU_LLC  (1 << 6)
> >> > >
> >> > > But I suppose we could call it instead IOMMU_QCOM_LLC or something
> >> > > like that to make it more clear that it is not 

[PATCH V3 12/13] HV/Netvsc: Add Isolation VM support for netvsc driver

2021-08-09 Thread Tianyu Lan
From: Tianyu Lan 

In an Isolation VM, all memory shared with the host needs to be marked
visible to the host via hvcall. vmbus_establish_gpadl() has already
done this for the netvsc rx/tx ring buffers. The page buffers used by
vmbus_sendpacket_pagebuffer() still need to be handled. Use the DMA API
to map/unmap this memory while sending/receiving packets, and the
Hyper-V DMA ops callback will use swiotlb to allocate a bounce buffer
and copy data from/to the bounce buffer.
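
(Illustrative sketch only, not the patch itself: struct and field names
below mirror the diff, the helper is made up. Each page-buffer fragment is
DMA-mapped and recorded in the per-packet dma_range[] so it can be
unmapped on send completion.)

	/* Sketch: map one fragment and record it for later unmap. */
	static int sketch_map_fragment(struct device *dev, struct hv_page_buffer *pb,
				       struct hv_dma_range *range)
	{
		dma_addr_t dma = dma_map_page(dev, pfn_to_page(pb->pfn), pb->offset,
					      pb->len, DMA_TO_DEVICE);

		if (dma_mapping_error(dev, dma))
			return -ENOMEM;

		range->dma = dma;
		range->mapping_size = pb->len;
		pb->pfn = dma >> HV_HYP_PAGE_SHIFT;	/* device-visible pfn (assumption) */
		return 0;
	}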

Signed-off-by: Tianyu Lan 
---
 drivers/net/hyperv/hyperv_net.h   |   6 ++
 drivers/net/hyperv/netvsc.c   | 144 +-
 drivers/net/hyperv/rndis_filter.c |   2 +
 include/linux/hyperv.h|   5 ++
 4 files changed, 154 insertions(+), 3 deletions(-)

diff --git a/drivers/net/hyperv/hyperv_net.h b/drivers/net/hyperv/hyperv_net.h
index bc48855dff10..862419912bfb 100644
--- a/drivers/net/hyperv/hyperv_net.h
+++ b/drivers/net/hyperv/hyperv_net.h
@@ -164,6 +164,7 @@ struct hv_netvsc_packet {
u32 total_bytes;
u32 send_buf_index;
u32 total_data_buflen;
+   struct hv_dma_range *dma_range;
 };
 
 #define NETVSC_HASH_KEYLEN 40
@@ -1074,6 +1075,7 @@ struct netvsc_device {
 
/* Receive buffer allocated by us but manages by NetVSP */
void *recv_buf;
+   void *recv_original_buf;
u32 recv_buf_size; /* allocated bytes */
u32 recv_buf_gpadl_handle;
u32 recv_section_cnt;
@@ -1082,6 +1084,8 @@ struct netvsc_device {
 
/* Send buffer allocated by us */
void *send_buf;
+   void *send_original_buf;
+   u32 send_buf_size;
u32 send_buf_gpadl_handle;
u32 send_section_cnt;
u32 send_section_size;
@@ -1730,4 +1734,6 @@ struct rndis_message {
 #define RETRY_US_HI1
 #define RETRY_MAX  2000/* >10 sec */
 
+void netvsc_dma_unmap(struct hv_device *hv_dev,
+ struct hv_netvsc_packet *packet);
 #endif /* _HYPERV_NET_H */
diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index 7bd935412853..fc312e5db4d5 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -153,8 +153,21 @@ static void free_netvsc_device(struct rcu_head *head)
int i;
 
kfree(nvdev->extension);
-   vfree(nvdev->recv_buf);
-   vfree(nvdev->send_buf);
+
+   if (nvdev->recv_original_buf) {
+   vunmap(nvdev->recv_buf);
+   vfree(nvdev->recv_original_buf);
+   } else {
+   vfree(nvdev->recv_buf);
+   }
+
+   if (nvdev->send_original_buf) {
+   vunmap(nvdev->send_buf);
+   vfree(nvdev->send_original_buf);
+   } else {
+   vfree(nvdev->send_buf);
+   }
+
kfree(nvdev->send_section_map);
 
for (i = 0; i < VRSS_CHANNEL_MAX; i++) {
@@ -330,6 +343,27 @@ int netvsc_alloc_recv_comp_ring(struct netvsc_device 
*net_device, u32 q_idx)
return nvchan->mrc.slots ? 0 : -ENOMEM;
 }
 
+static void *netvsc_remap_buf(void *buf, unsigned long size)
+{
+   unsigned long *pfns;
+   void *vaddr;
+   int i;
+
+   pfns = kcalloc(size / HV_HYP_PAGE_SIZE, sizeof(unsigned long),
+  GFP_KERNEL);
+   if (!pfns)
+   return NULL;
+
+   for (i = 0; i < size / HV_HYP_PAGE_SIZE; i++)
+   pfns[i] = virt_to_hvpfn(buf + i * HV_HYP_PAGE_SIZE)
+   + (ms_hyperv.shared_gpa_boundary >> HV_HYP_PAGE_SHIFT);
+
+   vaddr = vmap_pfn(pfns, size / HV_HYP_PAGE_SIZE, PAGE_KERNEL_IO);
+   kfree(pfns);
+
+   return vaddr;
+}
+
 static int netvsc_init_buf(struct hv_device *device,
   struct netvsc_device *net_device,
   const struct netvsc_device_info *device_info)
@@ -340,6 +374,7 @@ static int netvsc_init_buf(struct hv_device *device,
unsigned int buf_size;
size_t map_words;
int i, ret = 0;
+   void *vaddr;
 
/* Get receive buffer area. */
buf_size = device_info->recv_sections * device_info->recv_section_size;
@@ -375,6 +410,15 @@ static int netvsc_init_buf(struct hv_device *device,
goto cleanup;
}
 
+   if (hv_isolation_type_snp()) {
+   vaddr = netvsc_remap_buf(net_device->recv_buf, buf_size);
+   if (!vaddr)
+   goto cleanup;
+
+   net_device->recv_original_buf = net_device->recv_buf;
+   net_device->recv_buf = vaddr;
+   }
+
/* Notify the NetVsp of the gpadl handle */
	init_packet = &net_device->channel_init_pkt;
memset(init_packet, 0, sizeof(struct nvsp_message));
@@ -477,6 +521,15 @@ static int netvsc_init_buf(struct hv_device *device,
goto cleanup;
}
 
+   if (hv_isolation_type_snp()) {
+   vaddr = netvsc_remap_buf(net_device->send_buf, buf_size);
+   if (!vaddr)
+   goto cleanup;
+
+   net_device->send_original_buf = 

[PATCH V3 13/13] HV/Storvsc: Add Isolation VM support for storvsc driver

2021-08-09 Thread Tianyu Lan
From: Tianyu Lan 

In an Isolation VM, all memory shared with the host needs to be marked
visible to the host via hvcall. vmbus_establish_gpadl() has already
done this for the storvsc rx/tx ring buffers. The page buffers used by
vmbus_sendpacket_mpb_desc() still need to be handled. Use the DMA API
to map/unmap this memory while sending/receiving packets, and the
Hyper-V DMA ops callback will use swiotlb to allocate a bounce buffer
and copy data from/to the bounce buffer.
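
(A conceptual sketch of the completion-path cleanup added below; the names
mirror the diff, but treat this as illustration rather than the patch.)

	/* Sketch: unmap every Hyper-V page that was DMA-mapped for a request. */
	static void sketch_unmap_request(struct device *dev,
					 struct storvsc_cmd_request *req,
					 bool from_device)
	{
		int i;

		for (i = 0; i < req->hvpg_count; i++)
			dma_unmap_page(dev, req->dma_range[i].dma,
				       req->dma_range[i].mapping_size,
				       from_device ? DMA_FROM_DEVICE : DMA_TO_DEVICE);

		kfree(req->dma_range);
	}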

Signed-off-by: Tianyu Lan 
---
 drivers/scsi/storvsc_drv.c | 68 +++---
 1 file changed, 63 insertions(+), 5 deletions(-)

diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
index 328bb961c281..78320719bdd8 100644
--- a/drivers/scsi/storvsc_drv.c
+++ b/drivers/scsi/storvsc_drv.c
@@ -21,6 +21,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 #include 
 #include 
 #include 
@@ -427,6 +429,8 @@ struct storvsc_cmd_request {
u32 payload_sz;
 
struct vstor_packet vstor_packet;
+   u32 hvpg_count;
+   struct hv_dma_range *dma_range;
 };
 
 
@@ -509,6 +513,14 @@ struct storvsc_scan_work {
u8 tgt_id;
 };
 
+#define storvsc_dma_map(dev, page, offset, size, dir) \
+   dma_map_page(dev, page, offset, size, dir)
+
+#define storvsc_dma_unmap(dev, dma_range, dir) \
+   dma_unmap_page(dev, dma_range.dma,  \
+  dma_range.mapping_size,  \
+  dir ? DMA_FROM_DEVICE : DMA_TO_DEVICE)
+
 static void storvsc_device_scan(struct work_struct *work)
 {
struct storvsc_scan_work *wrk;
@@ -1260,6 +1272,7 @@ static void storvsc_on_channel_callback(void *context)
struct hv_device *device;
struct storvsc_device *stor_device;
struct Scsi_Host *shost;
+   int i;
 
if (channel->primary_channel != NULL)
device = channel->primary_channel->device_obj;
@@ -1314,6 +1327,15 @@ static void storvsc_on_channel_callback(void *context)
request = (struct storvsc_cmd_request 
*)scsi_cmd_priv(scmnd);
}
 
+   if (request->dma_range) {
+   for (i = 0; i < request->hvpg_count; i++)
+   storvsc_dma_unmap(&device->device,
+   request->dma_range[i],
+   request->vstor_packet.vm_srb.data_in == READ_TYPE);
+
+   kfree(request->dma_range);
+   }
+
storvsc_on_receive(stor_device, packet, request);
continue;
}
@@ -1810,7 +1832,9 @@ static int storvsc_queuecommand(struct Scsi_Host *host, 
struct scsi_cmnd *scmnd)
unsigned int hvpgoff, hvpfns_to_add;
unsigned long offset_in_hvpg = offset_in_hvpage(sgl->offset);
unsigned int hvpg_count = HVPFN_UP(offset_in_hvpg + length);
+   dma_addr_t dma;
u64 hvpfn;
+   u32 size;
 
if (hvpg_count > MAX_PAGE_BUFFER_COUNT) {
 
@@ -1824,6 +1848,13 @@ static int storvsc_queuecommand(struct Scsi_Host *host, 
struct scsi_cmnd *scmnd)
payload->range.len = length;
payload->range.offset = offset_in_hvpg;
 
+   cmd_request->dma_range = kcalloc(hvpg_count,
+sizeof(*cmd_request->dma_range),
+GFP_ATOMIC);
+   if (!cmd_request->dma_range) {
+   ret = -ENOMEM;
+   goto free_payload;
+   }
 
for (i = 0; sgl != NULL; sgl = sg_next(sgl)) {
/*
@@ -1847,9 +1878,29 @@ static int storvsc_queuecommand(struct Scsi_Host *host, 
struct scsi_cmnd *scmnd)
 * last sgl should be reached at the same time that
 * the PFN array is filled.
 */
-   while (hvpfns_to_add--)
-   payload->range.pfn_array[i++] = hvpfn++;
+   while (hvpfns_to_add--) {
+   size = min(HV_HYP_PAGE_SIZE - offset_in_hvpg,
+  (unsigned long)length);
+   dma = storvsc_dma_map(&dev->device, pfn_to_page(hvpfn++),
+ offset_in_hvpg, size,
+ scmnd->sc_data_direction);
+   if (dma_mapping_error(&dev->device, dma)) {
+   ret = -ENOMEM;
+   goto free_dma_range;
+   }
+
+   if (offset_in_hvpg) {
+   payload->range.offset = dma & 
~HV_HYP_PAGE_MASK;
+   

[PATCH V3 11/13] HV/IOMMU: Enable swiotlb bounce buffer for Isolation VM

2021-08-09 Thread Tianyu Lan
From: Tianyu Lan 

Hyper-V Isolation VMs require bounce buffer support to copy data
from/to encrypted memory, so enable swiotlb force mode to use the
swiotlb bounce buffer for DMA transactions.

In an Isolation VM with AMD SEV, the bounce buffer needs to be
accessed via an extra address space above shared_gpa_boundary
(e.g. a 39-bit address line) reported by the Hyper-V CPUID
ISOLATION_CONFIG leaf. The access physical address is the original
physical address + shared_gpa_boundary. In the AMD SEV-SNP spec,
shared_gpa_boundary is called the virtual top of memory (vTOM):
memory addresses below vTOM are automatically treated as private
while memory above vTOM is treated as shared.
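
(For illustration only, not part of the patch: under the assumption above,
translating a bounce-buffer physical address to its host-shared alias is
just an addition of shared_gpa_boundary. The helper name is invented.)

	/* Hypothetical helper: shared (above-vTOM) alias of a private PA. */
	static inline phys_addr_t sketch_shared_pa(phys_addr_t pa)
	{
		return pa + ms_hyperv.shared_gpa_boundary;
	}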

The swiotlb bounce buffer code calls dma_map_decrypted() to mark the
bounce buffer visible to the host and map it in the extra address
space. Populate the dma memory decrypted ops with the hv map/unmap
functions.

Hyper-V initializes its own swiotlb bounce buffer, so the default
swiotlb needs to be disabled. pci_swiotlb_detect_override() and
pci_swiotlb_detect_4gb() enable the default one. To override that,
hyperv_swiotlb_detect() needs to run before these detect functions,
so make pci_xen_swiotlb_init() depend on hyperv_swiotlb_detect()
to keep the order.

The map function vmap_pfn() can't work in the early
hyperv_iommu_swiotlb_init() path, so the swiotlb bounce buffer is
initialized in hyperv_iommu_swiotlb_later_init() instead.

Signed-off-by: Tianyu Lan 
---
 arch/x86/hyperv/ivm.c   | 28 ++
 arch/x86/include/asm/mshyperv.h |  2 +
 arch/x86/xen/pci-swiotlb-xen.c  |  3 +-
 drivers/hv/vmbus_drv.c  |  3 ++
 drivers/iommu/hyperv-iommu.c| 65 +
 include/linux/hyperv.h  |  1 +
 6 files changed, 101 insertions(+), 1 deletion(-)

diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
index c13ec5560d73..0f05e4d6fc62 100644
--- a/arch/x86/hyperv/ivm.c
+++ b/arch/x86/hyperv/ivm.c
@@ -265,3 +265,31 @@ int hv_set_mem_host_visibility(unsigned long addr, int 
numpages, bool visible)
 
return __hv_set_mem_host_visibility((void *)addr, numpages, visibility);
 }
+
+/*
+ * hv_map_memory - map memory to extra space in the AMD SEV-SNP Isolation VM.
+ */
+void *hv_map_memory(void *addr, unsigned long size)
+{
+   unsigned long *pfns = kcalloc(size / HV_HYP_PAGE_SIZE,
+ sizeof(unsigned long), GFP_KERNEL);
+   void *vaddr;
+   int i;
+
+   if (!pfns)
+   return NULL;
+
+   for (i = 0; i < size / HV_HYP_PAGE_SIZE; i++)
+   pfns[i] = virt_to_hvpfn(addr + i * HV_HYP_PAGE_SIZE) +
+   (ms_hyperv.shared_gpa_boundary >> HV_HYP_PAGE_SHIFT);
+
+   vaddr = vmap_pfn(pfns, size / HV_HYP_PAGE_SIZE, PAGE_KERNEL_IO);
+   kfree(pfns);
+
+   return vaddr;
+}
+
+void hv_unmap_memory(void *addr)
+{
+   vunmap(addr);
+}
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index a30c60f189a3..b247739f57ac 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -250,6 +250,8 @@ int hv_unmap_ioapic_interrupt(int ioapic_id, struct 
hv_interrupt_entry *entry);
 int hv_mark_gpa_visibility(u16 count, const u64 pfn[],
   enum hv_mem_host_visibility visibility);
 int hv_set_mem_host_visibility(unsigned long addr, int numpages, bool visible);
+void *hv_map_memory(void *addr, unsigned long size);
+void hv_unmap_memory(void *addr);
 void hv_sint_wrmsrl_ghcb(u64 msr, u64 value);
 void hv_sint_rdmsrl_ghcb(u64 msr, u64 *value);
 void hv_signal_eom_ghcb(void);
diff --git a/arch/x86/xen/pci-swiotlb-xen.c b/arch/x86/xen/pci-swiotlb-xen.c
index 54f9aa7e8457..43bd031aa332 100644
--- a/arch/x86/xen/pci-swiotlb-xen.c
+++ b/arch/x86/xen/pci-swiotlb-xen.c
@@ -4,6 +4,7 @@
 
 #include 
 #include 
+#include 
 #include 
 
 #include 
@@ -91,6 +92,6 @@ int pci_xen_swiotlb_init_late(void)
 EXPORT_SYMBOL_GPL(pci_xen_swiotlb_init_late);
 
 IOMMU_INIT_FINISH(pci_xen_swiotlb_detect,
- NULL,
+ hyperv_swiotlb_detect,
  pci_xen_swiotlb_init,
  NULL);
diff --git a/drivers/hv/vmbus_drv.c b/drivers/hv/vmbus_drv.c
index 57bbbaa4e8f7..f068e22a5636 100644
--- a/drivers/hv/vmbus_drv.c
+++ b/drivers/hv/vmbus_drv.c
@@ -23,6 +23,7 @@
 #include 
 #include 
 
+#include 
 #include 
 #include 
 #include 
@@ -2081,6 +2082,7 @@ struct hv_device *vmbus_device_create(const guid_t *type,
return child_device_obj;
 }
 
+static u64 vmbus_dma_mask = DMA_BIT_MASK(64);
 /*
  * vmbus_device_register - Register the child device
  */
@@ -2121,6 +2123,7 @@ int vmbus_device_register(struct hv_device 
*child_device_obj)
}
hv_debug_add_dev_dir(child_device_obj);
 
+   child_device_obj->device.dma_mask = &vmbus_dma_mask;
return 0;
 
 err_kset_unregister:
diff --git a/drivers/iommu/hyperv-iommu.c b/drivers/iommu/hyperv-iommu.c
index e285a220c913..01e874b3b43a 100644
--- 

[PATCH V3 10/13] x86/Swiotlb: Add Swiotlb bounce buffer remap function for HV IVM

2021-08-09 Thread Tianyu Lan
From: Tianyu Lan 

In an Isolation VM with AMD SEV, the bounce buffer needs to be accessed
via an extra address space above shared_gpa_boundary (e.g. a 39-bit
address line) reported by the Hyper-V CPUID ISOLATION_CONFIG leaf. The
access physical address is the original physical address +
shared_gpa_boundary. In the AMD SEV-SNP spec, shared_gpa_boundary is
called the virtual top of memory (vTOM): memory addresses below vTOM
are automatically treated as private while memory above vTOM is
treated as shared.

Use dma_map_decrypted() in the swiotlb code, store the remapped address
it returns, and use that address to copy data from/to the swiotlb
bounce buffer.
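
(Illustrative sketch of what "use the remapped address to copy data" means;
the real change is in swiotlb_bounce() in the diff below, and this helper is
made up for illustration.)

	/* Sketch: copy to/from a bounce slot through the remapped mem->vaddr. */
	static void sketch_bounce(struct io_tlb_mem *mem, phys_addr_t tlb_addr,
				  void *buf, size_t size, bool to_device)
	{
		unsigned char *vaddr = mem->vaddr + (tlb_addr - mem->start);

		if (to_device)
			memcpy(vaddr, buf, size);
		else
			memcpy(buf, vaddr, size);
	}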

Signed-off-by: Tianyu Lan 
---
Change since v1:
   * Make swiotlb_init_io_tlb_mem() return error code and return
 error when dma_map_decrypted() fails.
---
 include/linux/swiotlb.h |  4 
 kernel/dma/swiotlb.c| 32 
 2 files changed, 28 insertions(+), 8 deletions(-)

diff --git a/include/linux/swiotlb.h b/include/linux/swiotlb.h
index f507e3eacbea..584560ecaa8e 100644
--- a/include/linux/swiotlb.h
+++ b/include/linux/swiotlb.h
@@ -72,6 +72,9 @@ extern enum swiotlb_force swiotlb_force;
  * @end:   The end address of the swiotlb memory pool. Used to do a quick
  * range check to see if the memory was in fact allocated by this
  * API.
+ * @vaddr: The vaddr of the swiotlb memory pool. The swiotlb
+ * memory pool may be remapped in the memory encrypted case and 
store
+ * virtual address for bounce buffer operation.
  * @nslabs:The number of IO TLB blocks (in groups of 64) between @start and
  * @end. For default swiotlb, this is command line adjustable via
  * setup_io_tlb_npages.
@@ -89,6 +92,7 @@ extern enum swiotlb_force swiotlb_force;
 struct io_tlb_mem {
phys_addr_t start;
phys_addr_t end;
+   void *vaddr;
unsigned long nslabs;
unsigned long used;
unsigned int index;
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index 1fa81c096c1d..29b6d888ef3b 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -176,7 +176,7 @@ void __init swiotlb_update_mem_attributes(void)
memset(vaddr, 0, bytes);
 }
 
-static void swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start,
+static int swiotlb_init_io_tlb_mem(struct io_tlb_mem *mem, phys_addr_t start,
unsigned long nslabs, bool late_alloc)
 {
void *vaddr = phys_to_virt(start);
@@ -194,14 +194,21 @@ static void swiotlb_init_io_tlb_mem(struct io_tlb_mem 
*mem, phys_addr_t start,
mem->slots[i].alloc_size = 0;
}
 
-   set_memory_decrypted((unsigned long)vaddr, bytes >> PAGE_SHIFT);
-   memset(vaddr, 0, bytes);
+   mem->vaddr = dma_map_decrypted(vaddr, bytes);
+   if (!mem->vaddr) {
+   pr_err("Failed to decrypt memory.\n");
+   return -ENOMEM;
+   }
+
+   memset(mem->vaddr, 0, bytes);
+   return 0;
 }
 
 int __init swiotlb_init_with_tbl(char *tlb, unsigned long nslabs, int verbose)
 {
struct io_tlb_mem *mem;
size_t alloc_size;
+   int ret;
 
if (swiotlb_force == SWIOTLB_NO_FORCE)
return 0;
@@ -216,7 +223,11 @@ int __init swiotlb_init_with_tbl(char *tlb, unsigned long 
nslabs, int verbose)
panic("%s: Failed to allocate %zu bytes align=0x%lx\n",
  __func__, alloc_size, PAGE_SIZE);
 
-   swiotlb_init_io_tlb_mem(mem, __pa(tlb), nslabs, false);
+   ret = swiotlb_init_io_tlb_mem(mem, __pa(tlb), nslabs, false);
+   if (ret) {
+   memblock_free(__pa(mem), alloc_size);
+   return ret;
+   }
 
io_tlb_default_mem = mem;
if (verbose)
@@ -304,6 +315,8 @@ int
 swiotlb_late_init_with_tbl(char *tlb, unsigned long nslabs)
 {
struct io_tlb_mem *mem;
+   int size = get_order(struct_size(mem, slots, nslabs));
+   int ret;
 
if (swiotlb_force == SWIOTLB_NO_FORCE)
return 0;
@@ -312,12 +325,15 @@ swiotlb_late_init_with_tbl(char *tlb, unsigned long 
nslabs)
if (WARN_ON_ONCE(io_tlb_default_mem))
return -ENOMEM;
 
-   mem = (void *)__get_free_pages(GFP_KERNEL,
-   get_order(struct_size(mem, slots, nslabs)));
+   mem = (void *)__get_free_pages(GFP_KERNEL, size);
if (!mem)
return -ENOMEM;
 
-   swiotlb_init_io_tlb_mem(mem, virt_to_phys(tlb), nslabs, true);
+   ret = swiotlb_init_io_tlb_mem(mem, virt_to_phys(tlb), nslabs, true);
+   if (ret) {
+   free_pages((unsigned long)mem, size);
+   return ret;
+   }
 
io_tlb_default_mem = mem;
swiotlb_print_info();
@@ -360,7 +376,7 @@ static void swiotlb_bounce(struct device *dev, phys_addr_t 
tlb_addr, size_t size
phys_addr_t orig_addr = mem->slots[index].orig_addr;
size_t alloc_size = 

[PATCH V3 09/13] DMA: Add dma_map_decrypted/dma_unmap_encrypted() function

2021-08-09 Thread Tianyu Lan
From: Tianyu Lan 

In a Hyper-V Isolation VM with AMD SEV, the swiotlb bounce buffer
needs to be mapped into the address space above vTOM, so introduce
dma_map_decrypted()/dma_unmap_encrypted() to map/unmap bounce
buffer memory. The platform can populate the map/unmap callbacks
in the dma memory decrypted ops.
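
(A minimal usage sketch, assuming the hv_map_memory()/hv_unmap_memory()
helpers added later in this series; the init function itself is
hypothetical and only shows how a platform would wire up the ops.)

	/* Hypothetical platform init: plug vTOM remap helpers into the ops. */
	static void __init sketch_init_decrypted_ops(void)
	{
		dma_memory_generic_decrypted_ops.map = hv_map_memory;
		dma_memory_generic_decrypted_ops.unmap = hv_unmap_memory;
	}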
---
 include/linux/dma-map-ops.h |  9 +
 kernel/dma/mapping.c| 22 ++
 2 files changed, 31 insertions(+)

diff --git a/include/linux/dma-map-ops.h b/include/linux/dma-map-ops.h
index 0d53a96a3d64..01d60a024e45 100644
--- a/include/linux/dma-map-ops.h
+++ b/include/linux/dma-map-ops.h
@@ -71,6 +71,11 @@ struct dma_map_ops {
unsigned long (*get_merge_boundary)(struct device *dev);
 };
 
+struct dma_memory_decrypted_ops {
+   void *(*map)(void *addr, unsigned long size);
+   void (*unmap)(void *addr);
+};
+
 #ifdef CONFIG_DMA_OPS
 #include 
 
@@ -374,6 +379,10 @@ static inline void debug_dma_dump_mappings(struct device 
*dev)
 }
 #endif /* CONFIG_DMA_API_DEBUG */
 
+void *dma_map_decrypted(void *addr, unsigned long size);
+int dma_unmap_decrypted(void *addr, unsigned long size);
+
 extern const struct dma_map_ops dma_dummy_ops;
+extern struct dma_memory_decrypted_ops dma_memory_generic_decrypted_ops;
 
 #endif /* _LINUX_DMA_MAP_OPS_H */
diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c
index 2b06a809d0b9..6fb150dc1750 100644
--- a/kernel/dma/mapping.c
+++ b/kernel/dma/mapping.c
@@ -13,11 +13,13 @@
 #include 
 #include 
 #include 
+#include 
 #include "debug.h"
 #include "direct.h"
 
 bool dma_default_coherent;
 
+struct dma_memory_decrypted_ops dma_memory_generic_decrypted_ops;
 /*
  * Managed DMA API
  */
@@ -736,3 +738,23 @@ unsigned long dma_get_merge_boundary(struct device *dev)
return ops->get_merge_boundary(dev);
 }
 EXPORT_SYMBOL_GPL(dma_get_merge_boundary);
+
+void *dma_map_decrypted(void *addr, unsigned long size)
+{
+   if (set_memory_decrypted((unsigned long)addr,
+size / PAGE_SIZE))
+   return NULL;
+
+   if (dma_memory_generic_decrypted_ops.map)
+   return dma_memory_generic_decrypted_ops.map(addr, size);
+   else
+   return addr;
+}
+
+int dma_unmap_encrypted(void *addr, unsigned long size)
+{
+   if (dma_memory_generic_decrypted_ops.unmap)
+   dma_memory_generic_decrypted_ops.unmap(addr);
+
+   return set_memory_encrypted((unsigned long)addr, size / PAGE_SIZE);
+}
-- 
2.25.1



[PATCH V3 08/13] HV/Vmbus: Initialize VMbus ring buffer for Isolation VM

2021-08-09 Thread Tianyu Lan
From: Tianyu Lan 

VMbus ring buffers are shared with the host and need to be accessed
via the extra address space of an Isolation VM with SNP support. This
patch maps the ring buffer address into the extra address space via
vmap_pfn(). The HV host visibility hvcall smears data in the ring
buffer, so reset the ring buffer memory to zero after calling the
visibility hvcall.
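
(Background sketch, not the patch: the data pages after the header are
mapped twice back to back so a packet that wraps past the end of the ring
stays virtually contiguous; the PFNs used are assumed to be the shared
aliases above shared_gpa_boundary. The helper name is invented.)

	/* Sketch: fill a wraparound pfn array from the first shared-alias pfn. */
	static void sketch_fill_wraparound_pfns(unsigned long *pfns,
						unsigned long start_pfn, u32 page_cnt)
	{
		int i;

		pfns[0] = start_pfn;			/* struct hv_ring_buffer header */
		for (i = 0; i < 2 * (page_cnt - 1); i++)
			pfns[i + 1] = start_pfn + i % (page_cnt - 1) + 1;
	}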

Signed-off-by: Tianyu Lan 
---
 drivers/hv/Kconfig|  1 +
 drivers/hv/channel.c  | 10 +
 drivers/hv/hyperv_vmbus.h |  2 +
 drivers/hv/ring_buffer.c  | 84 ++-
 4 files changed, 79 insertions(+), 18 deletions(-)

diff --git a/drivers/hv/Kconfig b/drivers/hv/Kconfig
index d1123ceb38f3..dd12af20e467 100644
--- a/drivers/hv/Kconfig
+++ b/drivers/hv/Kconfig
@@ -8,6 +8,7 @@ config HYPERV
|| (ARM64 && !CPU_BIG_ENDIAN))
select PARAVIRT
select X86_HV_CALLBACK_VECTOR if X86
+   select VMAP_PFN
help
  Select this option to run Linux as a Hyper-V client operating
  system.
diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
index 4c4717c26240..60ef881a700c 100644
--- a/drivers/hv/channel.c
+++ b/drivers/hv/channel.c
@@ -712,6 +712,16 @@ static int __vmbus_open(struct vmbus_channel *newchannel,
if (err)
goto error_clean_ring;
 
+   err = hv_ringbuffer_post_init(&newchannel->outbound,
+ page, send_pages);
+   if (err)
+   goto error_free_gpadl;
+
+   err = hv_ringbuffer_post_init(&newchannel->inbound,
+ &page[send_pages], recv_pages);
+   if (err)
+   goto error_free_gpadl;
+
/* Create and init the channel open message */
open_info = kzalloc(sizeof(*open_info) +
   sizeof(struct vmbus_channel_open_channel),
diff --git a/drivers/hv/hyperv_vmbus.h b/drivers/hv/hyperv_vmbus.h
index 40bc0eff6665..15cd23a561f3 100644
--- a/drivers/hv/hyperv_vmbus.h
+++ b/drivers/hv/hyperv_vmbus.h
@@ -172,6 +172,8 @@ extern int hv_synic_cleanup(unsigned int cpu);
 /* Interface */
 
 void hv_ringbuffer_pre_init(struct vmbus_channel *channel);
+int hv_ringbuffer_post_init(struct hv_ring_buffer_info *ring_info,
+   struct page *pages, u32 page_cnt);
 
 int hv_ringbuffer_init(struct hv_ring_buffer_info *ring_info,
   struct page *pages, u32 pagecnt, u32 max_pkt_size);
diff --git a/drivers/hv/ring_buffer.c b/drivers/hv/ring_buffer.c
index 2aee356840a2..d4f93fca1108 100644
--- a/drivers/hv/ring_buffer.c
+++ b/drivers/hv/ring_buffer.c
@@ -17,6 +17,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 #include "hyperv_vmbus.h"
 
@@ -179,43 +181,89 @@ void hv_ringbuffer_pre_init(struct vmbus_channel *channel)
mutex_init(>outbound.ring_buffer_mutex);
 }
 
-/* Initialize the ring buffer. */
-int hv_ringbuffer_init(struct hv_ring_buffer_info *ring_info,
-  struct page *pages, u32 page_cnt, u32 max_pkt_size)
+int hv_ringbuffer_post_init(struct hv_ring_buffer_info *ring_info,
+  struct page *pages, u32 page_cnt)
 {
+   u64 physic_addr = page_to_pfn(pages) << PAGE_SHIFT;
+   unsigned long *pfns_wraparound;
+   void *vaddr;
int i;
-   struct page **pages_wraparound;
 
-   BUILD_BUG_ON((sizeof(struct hv_ring_buffer) != PAGE_SIZE));
+   if (!hv_isolation_type_snp())
+   return 0;
+
+   physic_addr += ms_hyperv.shared_gpa_boundary;
 
/*
 * First page holds struct hv_ring_buffer, do wraparound mapping for
 * the rest.
 */
-   pages_wraparound = kcalloc(page_cnt * 2 - 1, sizeof(struct page *),
+   pfns_wraparound = kcalloc(page_cnt * 2 - 1, sizeof(unsigned long),
   GFP_KERNEL);
-   if (!pages_wraparound)
+   if (!pfns_wraparound)
return -ENOMEM;
 
-   pages_wraparound[0] = pages;
+   pfns_wraparound[0] = physic_addr >> PAGE_SHIFT;
for (i = 0; i < 2 * (page_cnt - 1); i++)
-   pages_wraparound[i + 1] = &pages[i % (page_cnt - 1) + 1];
-
-   ring_info->ring_buffer = (struct hv_ring_buffer *)
-   vmap(pages_wraparound, page_cnt * 2 - 1, VM_MAP, PAGE_KERNEL);
-
-   kfree(pages_wraparound);
+   pfns_wraparound[i + 1] = (physic_addr >> PAGE_SHIFT) +
+   i % (page_cnt - 1) + 1;
 
-
-   if (!ring_info->ring_buffer)
+   vaddr = vmap_pfn(pfns_wraparound, page_cnt * 2 - 1, PAGE_KERNEL_IO);
+   kfree(pfns_wraparound);
+   if (!vaddr)
return -ENOMEM;
 
-   ring_info->ring_buffer->read_index =
-   ring_info->ring_buffer->write_index = 0;
+   /* Clean memory after setting host visibility. */
+   memset((void *)vaddr, 0x00, page_cnt * PAGE_SIZE);
+
+   ring_info->ring_buffer = (struct hv_ring_buffer *)vaddr;
+   ring_info->ring_buffer->read_index = 0;
+   ring_info->ring_buffer->write_index = 0;
 
 

[PATCH V3 07/13] HV/Vmbus: Add SNP support for VMbus channel initiate message

2021-08-09 Thread Tianyu Lan
From: Tianyu Lan 

The monitor pages in the CHANNELMSG_INITIATE_CONTACT message are shared
with the host in an Isolation VM, so it's necessary to use an hvcall to
make them visible to the host. In an Isolation VM with AMD SEV-SNP, the
access address should be in the extra space above the shared GPA
boundary, so remap these pages to the extra address (pa +
shared_gpa_boundary). Introduce monitor_pages_va so the remapping can
be undone when the VMbus disconnects.
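
(Hypothetical sketch of the remap step described above; the helper name is
made up, and the real code lives in vmbus_negotiate_version() in the diff
below.)

	/* Sketch: remap one monitor page through its shared (above-vTOM) alias. */
	static void *sketch_remap_monitor_page(void *page_va)
	{
		phys_addr_t shared = virt_to_phys(page_va) + ms_hyperv.shared_gpa_boundary;

		return memremap(shared, HV_HYP_PAGE_SIZE, MEMREMAP_WB);
	}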

Signed-off-by: Tianyu Lan 
---
Change since v1:
* Not remap monitor pages in the non-SNP isolation VM.
---
 drivers/hv/connection.c   | 65 +++
 drivers/hv/hyperv_vmbus.h |  1 +
 2 files changed, 66 insertions(+)

diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
index 6d315c1465e0..bf0ac3167bd2 100644
--- a/drivers/hv/connection.c
+++ b/drivers/hv/connection.c
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include "hyperv_vmbus.h"
@@ -104,6 +105,12 @@ int vmbus_negotiate_version(struct vmbus_channel_msginfo 
*msginfo, u32 version)
 
msg->monitor_page1 = virt_to_phys(vmbus_connection.monitor_pages[0]);
msg->monitor_page2 = virt_to_phys(vmbus_connection.monitor_pages[1]);
+
+   if (hv_isolation_type_snp()) {
+   msg->monitor_page1 += ms_hyperv.shared_gpa_boundary;
+   msg->monitor_page2 += ms_hyperv.shared_gpa_boundary;
+   }
+
msg->target_vcpu = hv_cpu_number_to_vp_number(VMBUS_CONNECT_CPU);
 
/*
@@ -148,6 +155,31 @@ int vmbus_negotiate_version(struct vmbus_channel_msginfo 
*msginfo, u32 version)
return -ECONNREFUSED;
}
 
+   if (hv_isolation_type_snp()) {
+   vmbus_connection.monitor_pages_va[0]
+   = vmbus_connection.monitor_pages[0];
+   vmbus_connection.monitor_pages[0]
+   = memremap(msg->monitor_page1, HV_HYP_PAGE_SIZE,
+  MEMREMAP_WB);
+   if (!vmbus_connection.monitor_pages[0])
+   return -ENOMEM;
+
+   vmbus_connection.monitor_pages_va[1]
+   = vmbus_connection.monitor_pages[1];
+   vmbus_connection.monitor_pages[1]
+   = memremap(msg->monitor_page2, HV_HYP_PAGE_SIZE,
+  MEMREMAP_WB);
+   if (!vmbus_connection.monitor_pages[1]) {
+   memunmap(vmbus_connection.monitor_pages[0]);
+   return -ENOMEM;
+   }
+
+   memset(vmbus_connection.monitor_pages[0], 0x00,
+  HV_HYP_PAGE_SIZE);
+   memset(vmbus_connection.monitor_pages[1], 0x00,
+  HV_HYP_PAGE_SIZE);
+   }
+
return ret;
 }
 
@@ -159,6 +191,7 @@ int vmbus_connect(void)
struct vmbus_channel_msginfo *msginfo = NULL;
int i, ret = 0;
__u32 version;
+   u64 pfn[2];
 
/* Initialize the vmbus connection */
vmbus_connection.conn_state = CONNECTING;
@@ -216,6 +249,16 @@ int vmbus_connect(void)
goto cleanup;
}
 
+   if (hv_is_isolation_supported()) {
+   pfn[0] = virt_to_hvpfn(vmbus_connection.monitor_pages[0]);
+   pfn[1] = virt_to_hvpfn(vmbus_connection.monitor_pages[1]);
+   if (hv_mark_gpa_visibility(2, pfn,
+   VMBUS_PAGE_VISIBLE_READ_WRITE)) {
+   ret = -EFAULT;
+   goto cleanup;
+   }
+   }
+
msginfo = kzalloc(sizeof(*msginfo) +
  sizeof(struct vmbus_channel_initiate_contact),
  GFP_KERNEL);
@@ -284,6 +327,8 @@ int vmbus_connect(void)
 
 void vmbus_disconnect(void)
 {
+   u64 pfn[2];
+
/*
 * First send the unload request to the host.
 */
@@ -303,6 +348,26 @@ void vmbus_disconnect(void)
vmbus_connection.int_page = NULL;
}
 
+   if (hv_is_isolation_supported()) {
+   if (vmbus_connection.monitor_pages_va[0]) {
+   memunmap(vmbus_connection.monitor_pages[0]);
+   vmbus_connection.monitor_pages[0]
+   = vmbus_connection.monitor_pages_va[0];
+   vmbus_connection.monitor_pages_va[0] = NULL;
+   }
+
+   if (vmbus_connection.monitor_pages_va[1]) {
+   memunmap(vmbus_connection.monitor_pages[1]);
+   vmbus_connection.monitor_pages[1]
+   = vmbus_connection.monitor_pages_va[1];
+   vmbus_connection.monitor_pages_va[1] = NULL;
+   }
+
+   pfn[0] = virt_to_hvpfn(vmbus_connection.monitor_pages[0]);
+   pfn[1] = virt_to_hvpfn(vmbus_connection.monitor_pages[1]);
+   hv_mark_gpa_visibility(2, pfn, VMBUS_PAGE_NOT_VISIBLE);
+

[PATCH V3 06/13] HV: Add ghcb hvcall support for SNP VM

2021-08-09 Thread Tianyu Lan
From: Tianyu Lan 

Hyper-V provides a GHCB hvcall to handle the VMBus
HVCALL_SIGNAL_EVENT and HVCALL_POST_MESSAGE
messages in an SNP Isolation VM. Add such support.
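
(Caller sketch mirroring the vmbus_set_event() change in the diff below;
shown here only to illustrate the intended calling convention, the function
name is invented.)

	/* Sketch: signal an event via GHCB hypercall on SNP, else fast hypercall. */
	static void sketch_signal_event(struct vmbus_channel *channel)
	{
		if (hv_isolation_type_snp())
			hv_ghcb_hypercall(HVCALL_SIGNAL_EVENT, &channel->sig_event,
					  NULL, sizeof(u64));
		else
			hv_do_fast_hypercall8(HVCALL_SIGNAL_EVENT, channel->sig_event);
	}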

Signed-off-by: Tianyu Lan 
---
 arch/x86/hyperv/ivm.c   | 43 +
 arch/x86/include/asm/mshyperv.h |  1 +
 drivers/hv/connection.c |  6 -
 drivers/hv/hv.c |  8 +-
 include/asm-generic/mshyperv.h  | 29 ++
 5 files changed, 85 insertions(+), 2 deletions(-)

diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
index ec0e5c259740..c13ec5560d73 100644
--- a/arch/x86/hyperv/ivm.c
+++ b/arch/x86/hyperv/ivm.c
@@ -15,6 +15,49 @@
 #include 
 #include 
 
+#define GHCB_USAGE_HYPERV_CALL 1
+
+u64 hv_ghcb_hypercall(u64 control, void *input, void *output, u32 input_size)
+{
+   union hv_ghcb *hv_ghcb;
+   void **ghcb_base;
+   unsigned long flags;
+
+   if (!ms_hyperv.ghcb_base)
+   return -EFAULT;
+
+   WARN_ON(in_nmi());
+
+   local_irq_save(flags);
+   ghcb_base = (void **)this_cpu_ptr(ms_hyperv.ghcb_base);
+   hv_ghcb = (union hv_ghcb *)*ghcb_base;
+   if (!hv_ghcb) {
+   local_irq_restore(flags);
+   return -EFAULT;
+   }
+
+   hv_ghcb->ghcb.protocol_version = GHCB_PROTOCOL_MAX;
+   hv_ghcb->ghcb.ghcb_usage = GHCB_USAGE_HYPERV_CALL;
+
+   hv_ghcb->hypercall.outputgpa = (u64)output;
+   hv_ghcb->hypercall.hypercallinput.asuint64 = 0;
+   hv_ghcb->hypercall.hypercallinput.callcode = control;
+
+   if (input_size)
+   memcpy(hv_ghcb->hypercall.hypercalldata, input, input_size);
+
+   VMGEXIT();
+
+   hv_ghcb->ghcb.ghcb_usage = 0xffffffff;
+   memset(hv_ghcb->ghcb.save.valid_bitmap, 0,
+  sizeof(hv_ghcb->ghcb.save.valid_bitmap));
+
+   local_irq_restore(flags);
+
+   return hv_ghcb->hypercall.hypercalloutput.callstatus;
+}
+EXPORT_SYMBOL_GPL(hv_ghcb_hypercall);
+
 void hv_ghcb_msr_write(u64 msr, u64 value)
 {
union hv_ghcb *hv_ghcb;
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index 730985676ea3..a30c60f189a3 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -255,6 +255,7 @@ void hv_sint_rdmsrl_ghcb(u64 msr, u64 *value);
 void hv_signal_eom_ghcb(void);
 void hv_ghcb_msr_write(u64 msr, u64 value);
 void hv_ghcb_msr_read(u64 msr, u64 *value);
+u64 hv_ghcb_hypercall(u64 control, void *input, void *output, u32 input_size);
 
 #define hv_get_synint_state_ghcb(int_num, val) \
hv_sint_rdmsrl_ghcb(HV_X64_MSR_SINT0 + int_num, val)
diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
index 5e479d54918c..6d315c1465e0 100644
--- a/drivers/hv/connection.c
+++ b/drivers/hv/connection.c
@@ -447,6 +447,10 @@ void vmbus_set_event(struct vmbus_channel *channel)
 
++channel->sig_events;
 
-   hv_do_fast_hypercall8(HVCALL_SIGNAL_EVENT, channel->sig_event);
+   if (hv_isolation_type_snp())
+   hv_ghcb_hypercall(HVCALL_SIGNAL_EVENT, &channel->sig_event,
+   NULL, sizeof(u64));
+   else
+   hv_do_fast_hypercall8(HVCALL_SIGNAL_EVENT, channel->sig_event);
 }
 EXPORT_SYMBOL_GPL(vmbus_set_event);
diff --git a/drivers/hv/hv.c b/drivers/hv/hv.c
index 59f7173c4d9f..e5c9fc467893 100644
--- a/drivers/hv/hv.c
+++ b/drivers/hv/hv.c
@@ -98,7 +98,13 @@ int hv_post_message(union hv_connection_id connection_id,
aligned_msg->payload_size = payload_size;
memcpy((void *)aligned_msg->payload, payload, payload_size);
 
-   status = hv_do_hypercall(HVCALL_POST_MESSAGE, aligned_msg, NULL);
+   if (hv_isolation_type_snp())
+   status = hv_ghcb_hypercall(HVCALL_POST_MESSAGE,
+   (void *)aligned_msg, NULL,
+   sizeof(struct hv_input_post_message));
+   else
+   status = hv_do_hypercall(HVCALL_POST_MESSAGE,
+   aligned_msg, NULL);
 
/* Preemption must remain disabled until after the hypercall
 * so some other thread can't get scheduled onto this cpu and
diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
index 90dac369a2dc..400181b855c1 100644
--- a/include/asm-generic/mshyperv.h
+++ b/include/asm-generic/mshyperv.h
@@ -31,6 +31,35 @@
 
 union hv_ghcb {
struct ghcb ghcb;
+   struct {
+   u64 hypercalldata[509];
+   u64 outputgpa;
+   union {
+   union {
+   struct {
+   u32 callcode: 16;
+   u32 isfast  : 1;
+   u32 reserved1   : 14;
+   u32 isnested: 1;
+   u32 countofelements : 12;
+   u32 reserved2 

[PATCH V3 05/13] HV: Add Write/Read MSR registers via ghcb page

2021-08-09 Thread Tianyu Lan
From: Tianyu Lan 

Hyper-V provides the GHCB protocol to write Synthetic Interrupt
Controller MSR registers in an Isolation VM with AMD SEV-SNP, and
these registers are emulated by the hypervisor directly. Hyper-V
requires the SINTx MSR registers to be written twice: first via the
GHCB page to communicate with the hypervisor, and then with a wrmsr
instruction to talk to the paravisor which runs in VMPL0. The guest
OS ID MSR also needs to be set via the GHCB.
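
(Conceptual sketch of the double-write rule described above; the helper name
is invented, while hv_ghcb_msr_write() and wrmsrl() are the interfaces the
diff below actually uses.)

	/* Sketch: write a SINTx MSR to the hypervisor (GHCB) and the paravisor. */
	static void sketch_set_sint(int sint, u64 val)
	{
		hv_ghcb_msr_write(HV_X64_MSR_SINT0 + sint, val);	/* hypervisor */
		wrmsrl(HV_X64_MSR_SINT0 + sint, val);			/* VMPL0 paravisor */
	}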

Signed-off-by: Tianyu Lan 
---
Change since v1:
 * Introduce sev_es_ghcb_hv_call_simple() and share code
   between SEV and Hyper-V code.
---
 arch/x86/hyperv/hv_init.c   |  33 ++---
 arch/x86/hyperv/ivm.c   | 110 +
 arch/x86/include/asm/mshyperv.h |  78 +++-
 arch/x86/include/asm/sev.h  |   3 +
 arch/x86/kernel/cpu/mshyperv.c  |   3 +
 arch/x86/kernel/sev-shared.c|  63 ++---
 drivers/hv/hv.c | 121 ++--
 include/asm-generic/mshyperv.h  |  12 +++-
 8 files changed, 329 insertions(+), 94 deletions(-)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index b3683083208a..ab0b33f621e7 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -423,7 +423,7 @@ void __init hyperv_init(void)
goto clean_guest_os_id;
 
if (hv_isolation_type_snp()) {
-   ms_hyperv.ghcb_base = alloc_percpu(void *);
+   ms_hyperv.ghcb_base = alloc_percpu(union hv_ghcb __percpu *);
if (!ms_hyperv.ghcb_base)
goto clean_guest_os_id;
 
@@ -432,6 +432,9 @@ void __init hyperv_init(void)
ms_hyperv.ghcb_base = NULL;
goto clean_guest_os_id;
}
+
+   /* Hyper-V requires to write guest os id via ghcb in SNP IVM. */
+   hv_ghcb_msr_write(HV_X64_MSR_GUEST_OS_ID, guest_id);
}
 
rdmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
@@ -523,6 +526,7 @@ void hyperv_cleanup(void)
 
/* Reset our OS id */
wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0);
+   hv_ghcb_msr_write(HV_X64_MSR_GUEST_OS_ID, 0);
 
/*
 * Reset hypercall page reference before reset the page,
@@ -596,30 +600,3 @@ bool hv_is_hyperv_initialized(void)
return hypercall_msr.enable;
 }
 EXPORT_SYMBOL_GPL(hv_is_hyperv_initialized);
-
-enum hv_isolation_type hv_get_isolation_type(void)
-{
-   if (!(ms_hyperv.priv_high & HV_ISOLATION))
-   return HV_ISOLATION_TYPE_NONE;
-   return FIELD_GET(HV_ISOLATION_TYPE, ms_hyperv.isolation_config_b);
-}
-EXPORT_SYMBOL_GPL(hv_get_isolation_type);
-
-bool hv_is_isolation_supported(void)
-{
-   if (!cpu_feature_enabled(X86_FEATURE_HYPERVISOR))
-   return 0;
-
-   if (!hypervisor_is_type(X86_HYPER_MS_HYPERV))
-   return 0;
-
-   return hv_get_isolation_type() != HV_ISOLATION_TYPE_NONE;
-}
-
-DEFINE_STATIC_KEY_FALSE(isolation_type_snp);
-
-bool hv_isolation_type_snp(void)
-{
-   return static_branch_unlikely(&isolation_type_snp);
-}
-EXPORT_SYMBOL_GPL(hv_isolation_type_snp);
diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
index 8c905ffdba7f..ec0e5c259740 100644
--- a/arch/x86/hyperv/ivm.c
+++ b/arch/x86/hyperv/ivm.c
@@ -6,6 +6,8 @@
  *  Tianyu Lan 
  */
 
+#include 
+#include 
 #include 
 #include 
 #include 
@@ -13,6 +15,114 @@
 #include 
 #include 
 
+void hv_ghcb_msr_write(u64 msr, u64 value)
+{
+   union hv_ghcb *hv_ghcb;
+   void **ghcb_base;
+   unsigned long flags;
+
+   if (!ms_hyperv.ghcb_base)
+   return;
+
+   WARN_ON(in_nmi());
+
+   local_irq_save(flags);
+   ghcb_base = (void **)this_cpu_ptr(ms_hyperv.ghcb_base);
+   hv_ghcb = (union hv_ghcb *)*ghcb_base;
+   if (!hv_ghcb) {
+   local_irq_restore(flags);
+   return;
+   }
+
+   ghcb_set_rcx(&hv_ghcb->ghcb, msr);
+   ghcb_set_rax(&hv_ghcb->ghcb, lower_32_bits(value));
+   ghcb_set_rdx(&hv_ghcb->ghcb, value >> 32);
+
+   if (sev_es_ghcb_hv_call_simple(&hv_ghcb->ghcb, SVM_EXIT_MSR, 1, 0))
+   pr_warn("Fail to write msr via ghcb %llx.\n", msr);
+
+   local_irq_restore(flags);
+}
+
+void hv_ghcb_msr_read(u64 msr, u64 *value)
+{
+   union hv_ghcb *hv_ghcb;
+   void **ghcb_base;
+   unsigned long flags;
+
+   if (!ms_hyperv.ghcb_base)
+   return;
+
+   WARN_ON(in_nmi());
+
+   local_irq_save(flags);
+   ghcb_base = (void **)this_cpu_ptr(ms_hyperv.ghcb_base);
+   hv_ghcb = (union hv_ghcb *)*ghcb_base;
+   if (!hv_ghcb) {
+   local_irq_restore(flags);
+   return;
+   }
+
+   ghcb_set_rcx(&hv_ghcb->ghcb, msr);
+   if (sev_es_ghcb_hv_call_simple(&hv_ghcb->ghcb, SVM_EXIT_MSR, 0, 0))
+   pr_warn("Fail to read msr via ghcb %llx.\n", msr);
+   else
+   *value = (u64)lower_32_bits(hv_ghcb->ghcb.save.rax)
+   | 

[PATCH V3 04/13] HV: Mark vmbus ring buffer visible to host in Isolation VM

2021-08-09 Thread Tianyu Lan
From: Tianyu Lan 

Mark the vmbus ring buffer visible to the host with set_memory_decrypted()
when establishing the GPADL handle.

Signed-off-by: Tianyu Lan 
---
 drivers/hv/channel.c   | 44 --
 include/linux/hyperv.h | 11 +++
 2 files changed, 53 insertions(+), 2 deletions(-)

diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
index f3761c73b074..4c4717c26240 100644
--- a/drivers/hv/channel.c
+++ b/drivers/hv/channel.c
@@ -17,6 +17,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -465,7 +466,14 @@ static int __vmbus_establish_gpadl(struct vmbus_channel 
*channel,
struct list_head *curr;
u32 next_gpadl_handle;
unsigned long flags;
-   int ret = 0;
+   int ret = 0, index;
+
+   index = atomic_inc_return(&channel->gpadl_index) - 1;
+
+   if (index > VMBUS_GPADL_RANGE_COUNT - 1) {
+   pr_err("Gpadl handle position(%d) has been occupied.\n", index);
+   return -ENOSPC;
+   }
 
next_gpadl_handle =
	(atomic_inc_return(&vmbus_connection.next_gpadl_handle) - 1);
@@ -474,6 +482,13 @@ static int __vmbus_establish_gpadl(struct vmbus_channel 
*channel,
if (ret)
return ret;
 
+   ret = set_memory_decrypted((unsigned long)kbuffer,
+  HVPFN_UP(size));
+   if (ret) {
+   pr_warn("Failed to set host visibility.\n");
+   return ret;
+   }
+
	init_completion(&msginfo->waitevent);
msginfo->waiting_channel = channel;
 
@@ -539,6 +554,10 @@ static int __vmbus_establish_gpadl(struct vmbus_channel 
*channel,
/* At this point, we received the gpadl created msg */
*gpadl_handle = gpadlmsg->gpadl;
 
+   channel->gpadl_array[index].size = size;
+   channel->gpadl_array[index].buffer = kbuffer;
+   channel->gpadl_array[index].gpadlhandle = *gpadl_handle;
+
 cleanup:
	spin_lock_irqsave(&vmbus_connection.channelmsg_lock, flags);
	list_del(&msginfo->msglistentry);
@@ -549,6 +568,13 @@ static int __vmbus_establish_gpadl(struct vmbus_channel 
*channel,
}
 
kfree(msginfo);
+
+   if (ret) {
+   set_memory_encrypted((unsigned long)kbuffer,
+HVPFN_UP(size));
+   atomic_dec(&channel->gpadl_index);
+   }
+
return ret;
 }
 
@@ -676,6 +702,7 @@ static int __vmbus_open(struct vmbus_channel *newchannel,
 
/* Establish the gpadl for the ring buffer */
newchannel->ringbuffer_gpadlhandle = 0;
+   atomic_set(&newchannel->gpadl_index, 0);
 
err = __vmbus_establish_gpadl(newchannel, HV_GPADL_RING,
  page_address(newchannel->ringbuffer_page),
@@ -811,7 +838,7 @@ int vmbus_teardown_gpadl(struct vmbus_channel *channel, u32 
gpadl_handle)
struct vmbus_channel_gpadl_teardown *msg;
struct vmbus_channel_msginfo *info;
unsigned long flags;
-   int ret;
+   int ret, i;
 
info = kzalloc(sizeof(*info) +
   sizeof(struct vmbus_channel_gpadl_teardown), GFP_KERNEL);
@@ -859,6 +886,19 @@ int vmbus_teardown_gpadl(struct vmbus_channel *channel, 
u32 gpadl_handle)
	spin_unlock_irqrestore(&vmbus_connection.channelmsg_lock, flags);
 
kfree(info);
+
+   /* Find gpadl buffer virtual address and size. */
+   for (i = 0; i < VMBUS_GPADL_RANGE_COUNT; i++)
+   if (channel->gpadl_array[i].gpadlhandle == gpadl_handle)
+   break;
+
+   if (set_memory_encrypted((unsigned long)channel->gpadl_array[i].buffer,
+   HVPFN_UP(channel->gpadl_array[i].size)))
+   pr_warn("Fail to set mem host visibility.\n");
+
+   channel->gpadl_array[i].gpadlhandle = 0;
+   atomic_dec(&channel->gpadl_index);
+
return ret;
 }
 EXPORT_SYMBOL_GPL(vmbus_teardown_gpadl);
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index ddc8713ce57b..90b542597143 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -803,6 +803,14 @@ struct vmbus_device {
 
 #define VMBUS_DEFAULT_MAX_PKT_SIZE 4096
 
+struct vmbus_gpadl {
+   u32 gpadlhandle;
+   u32 size;
+   void *buffer;
+};
+
+#define VMBUS_GPADL_RANGE_COUNT3
+
 struct vmbus_channel {
struct list_head listentry;
 
@@ -823,6 +831,9 @@ struct vmbus_channel {
struct completion rescind_event;
 
u32 ringbuffer_gpadlhandle;
+   /* GPADL_RING and Send/Receive GPADL_BUFFER. */
+   struct vmbus_gpadl gpadl_array[VMBUS_GPADL_RANGE_COUNT];
+   atomic_t gpadl_index;
 
/* Allocated memory for ring buffer */
struct page *ringbuffer_page;
-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH V3 03/13] x86/HV: Add new hvcall guest address host visibility support

2021-08-09 Thread Tianyu Lan
From: Tianyu Lan 

Add a new hvcall for guest address host visibility support to mark
memory visible to the host. Call it inside set_memory_decrypted()/
set_memory_encrypted(). Add a HYPERVISOR feature check in
hv_is_isolation_supported() to optimize for the non-virtualization
environment.
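
The set_memory.c hunk is not fully visible in this excerpt, so as a hedged
sketch only, the platform check described above could take roughly this shape;
hv_set_mem_host_visibility() comes from this series, while the AMD helper name
below is a placeholder for the existing SME/SEV path and the exact signatures
are approximations:

/* Rough shape of the dispatch, not the actual __set_memory_enc_dec() hunk. */
static int __set_memory_enc_dec(unsigned long addr, int numpages, bool enc)
{
	if (hv_is_isolation_supported())
		return hv_set_mem_host_visibility(addr, numpages, !enc);

	/* placeholder for the existing AMD SME/SEV handling */
	return amd_set_memory_enc_dec(addr, numpages, enc);
}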

Signed-off-by: Tianyu Lan 
---
Change since v2:
   * Rework __set_memory_enc_dec() and call Hyper-V and AMD function
 according to platform check.

Change since v1:
   * Use new static call x86_set_memory_enc to avoid adding a Hyper-V
     specific check in the set_memory code.
---
 arch/x86/hyperv/Makefile   |   2 +-
 arch/x86/hyperv/hv_init.c  |   6 ++
 arch/x86/hyperv/ivm.c  | 114 +
 arch/x86/include/asm/hyperv-tlfs.h |  20 +
 arch/x86/include/asm/mshyperv.h|   4 +-
 arch/x86/mm/pat/set_memory.c   |  19 +++--
 include/asm-generic/hyperv-tlfs.h  |   1 +
 include/asm-generic/mshyperv.h |   1 +
 8 files changed, 160 insertions(+), 7 deletions(-)
 create mode 100644 arch/x86/hyperv/ivm.c

diff --git a/arch/x86/hyperv/Makefile b/arch/x86/hyperv/Makefile
index 48e2c51464e8..5d2de10809ae 100644
--- a/arch/x86/hyperv/Makefile
+++ b/arch/x86/hyperv/Makefile
@@ -1,5 +1,5 @@
 # SPDX-License-Identifier: GPL-2.0-only
-obj-y  := hv_init.o mmu.o nested.o irqdomain.o
+obj-y  := hv_init.o mmu.o nested.o irqdomain.o ivm.o
 obj-$(CONFIG_X86_64)   += hv_apic.o hv_proc.o
 
 ifdef CONFIG_X86_64
diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index 0bb4d9ca7a55..b3683083208a 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -607,6 +607,12 @@ EXPORT_SYMBOL_GPL(hv_get_isolation_type);
 
 bool hv_is_isolation_supported(void)
 {
+   if (!cpu_feature_enabled(X86_FEATURE_HYPERVISOR))
+   return 0;
+
+   if (!hypervisor_is_type(X86_HYPER_MS_HYPERV))
+   return 0;
+
return hv_get_isolation_type() != HV_ISOLATION_TYPE_NONE;
 }
 
diff --git a/arch/x86/hyperv/ivm.c b/arch/x86/hyperv/ivm.c
new file mode 100644
index ..8c905ffdba7f
--- /dev/null
+++ b/arch/x86/hyperv/ivm.c
@@ -0,0 +1,114 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Hyper-V Isolation VM interface with paravisor and hypervisor
+ *
+ * Author:
+ *  Tianyu Lan 
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/*
+ * hv_mark_gpa_visibility - Set pages visible to host via hvcall.
+ *
+ * In Isolation VM, all guest memory is encrypted from host and guest
+ * needs to set memory visible to host via hvcall before sharing memory
+ * with host.
+ */
+int hv_mark_gpa_visibility(u16 count, const u64 pfn[],
+  enum hv_mem_host_visibility visibility)
+{
+   struct hv_gpa_range_for_visibility **input_pcpu, *input;
+   u16 pages_processed;
+   u64 hv_status;
+   unsigned long flags;
+
+   /* no-op if partition isolation is not enabled */
+   if (!hv_is_isolation_supported())
+   return 0;
+
+   if (count > HV_MAX_MODIFY_GPA_REP_COUNT) {
+   pr_err("Hyper-V: GPA count:%d exceeds supported:%lu\n", count,
+   HV_MAX_MODIFY_GPA_REP_COUNT);
+   return -EINVAL;
+   }
+
+   local_irq_save(flags);
+   input_pcpu = (struct hv_gpa_range_for_visibility **)
+   this_cpu_ptr(hyperv_pcpu_input_arg);
+   input = *input_pcpu;
+   if (unlikely(!input)) {
+   local_irq_restore(flags);
+   return -EINVAL;
+   }
+
+   input->partition_id = HV_PARTITION_ID_SELF;
+   input->host_visibility = visibility;
+   input->reserved0 = 0;
+   input->reserved1 = 0;
+   memcpy((void *)input->gpa_page_list, pfn, count * sizeof(*pfn));
+   hv_status = hv_do_rep_hypercall(
+   HVCALL_MODIFY_SPARSE_GPA_PAGE_HOST_VISIBILITY, count,
+   0, input, &pages_processed);
+   local_irq_restore(flags);
+
+   if (!(hv_status & HV_HYPERCALL_RESULT_MASK))
+   return 0;
+
+   return hv_status & HV_HYPERCALL_RESULT_MASK;
+}
+EXPORT_SYMBOL(hv_mark_gpa_visibility);
+
+static int __hv_set_mem_host_visibility(void *kbuffer, int pagecount,
+ enum hv_mem_host_visibility visibility)
+{
+   u64 *pfn_array;
+   int ret = 0;
+   int i, pfn;
+
+   if (!hv_is_isolation_supported() || !ms_hyperv.ghcb_base)
+   return 0;
+
+   pfn_array = kzalloc(HV_HYP_PAGE_SIZE, GFP_KERNEL);
+   if (!pfn_array)
+   return -ENOMEM;
+
+   for (i = 0, pfn = 0; i < pagecount; i++) {
+   pfn_array[pfn] = virt_to_hvpfn(kbuffer + i * HV_HYP_PAGE_SIZE);
+   pfn++;
+
+   if (pfn == HV_MAX_MODIFY_GPA_REP_COUNT || i == pagecount - 1) {
+   ret |= hv_mark_gpa_visibility(pfn, pfn_array,
+   visibility);
+   pfn = 0;
+
+  

[PATCH V3 02/13] x86/HV: Initialize shared memory boundary in the Isolation VM.

2021-08-09 Thread Tianyu Lan
From: Tianyu Lan 

Hyper-V exposes the shared memory boundary via the cpuid leaf
HYPERV_CPUID_ISOLATION_CONFIG and stores it in the
shared_gpa_boundary field of the ms_hyperv struct. This prepares
for sharing memory with the host in SNP guests.

Signed-off-by: Tianyu Lan 
---
 arch/x86/kernel/cpu/mshyperv.c |  2 ++
 include/asm-generic/mshyperv.h | 12 +++-
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index 6b5835a087a3..2b7f396ef1a5 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -313,6 +313,8 @@ static void __init ms_hyperv_init_platform(void)
if (ms_hyperv.priv_high & HV_ISOLATION) {
ms_hyperv.isolation_config_a = 
cpuid_eax(HYPERV_CPUID_ISOLATION_CONFIG);
ms_hyperv.isolation_config_b = 
cpuid_ebx(HYPERV_CPUID_ISOLATION_CONFIG);
+   ms_hyperv.shared_gpa_boundary =
+   (u64)1 << ms_hyperv.shared_gpa_boundary_bits;
 
pr_info("Hyper-V: Isolation Config: Group A 0x%x, Group B 
0x%x\n",
ms_hyperv.isolation_config_a, 
ms_hyperv.isolation_config_b);
diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
index 4269f3174e58..aa26d24a5ca9 100644
--- a/include/asm-generic/mshyperv.h
+++ b/include/asm-generic/mshyperv.h
@@ -35,8 +35,18 @@ struct ms_hyperv_info {
u32 max_vp_index;
u32 max_lp_index;
u32 isolation_config_a;
-   u32 isolation_config_b;
+   union {
+   u32 isolation_config_b;
+   struct {
+   u32 cvm_type : 4;
+   u32 Reserved11 : 1;
+   u32 shared_gpa_boundary_active : 1;
+   u32 shared_gpa_boundary_bits : 6;
+   u32 Reserved12 : 20;
+   };
+   };
void  __percpu **ghcb_base;
+   u64 shared_gpa_boundary;
 };
 extern struct ms_hyperv_info ms_hyperv;
 
-- 
2.25.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH V3 01/13] x86/HV: Initialize GHCB page in Isolation VM

2021-08-09 Thread Tianyu Lan
From: Tianyu Lan 

Hyper-V exposes the GHCB page via the SEV-ES GHCB MSR so that an SNP
guest can communicate with the hypervisor. Map the GHCB page for all
cpus to read/write MSR registers and submit hvcall requests
via the GHCB.

Signed-off-by: Tianyu Lan 
---
 arch/x86/hyperv/hv_init.c   | 66 +++--
 arch/x86/include/asm/mshyperv.h |  2 +
 include/asm-generic/mshyperv.h  |  2 +
 3 files changed, 66 insertions(+), 4 deletions(-)

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index 708a2712a516..0bb4d9ca7a55 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -20,6 +20,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -42,6 +43,31 @@ static void *hv_hypercall_pg_saved;
 struct hv_vp_assist_page **hv_vp_assist_page;
 EXPORT_SYMBOL_GPL(hv_vp_assist_page);
 
+static int hyperv_init_ghcb(void)
+{
+   u64 ghcb_gpa;
+   void *ghcb_va;
+   void **ghcb_base;
+
+   if (!ms_hyperv.ghcb_base)
+   return -EINVAL;
+
+   /*
+* GHCB page is allocated by paravisor. The address
+* returned by MSR_AMD64_SEV_ES_GHCB is above shared
+* ghcb boundary and map it here.
+*/
+   rdmsrl(MSR_AMD64_SEV_ES_GHCB, ghcb_gpa);
+   ghcb_va = memremap(ghcb_gpa, HV_HYP_PAGE_SIZE, MEMREMAP_WB);
+   if (!ghcb_va)
+   return -ENOMEM;
+
+   ghcb_base = (void **)this_cpu_ptr(ms_hyperv.ghcb_base);
+   *ghcb_base = ghcb_va;
+
+   return 0;
+}
+
 static int hv_cpu_init(unsigned int cpu)
 {
union hv_vp_assist_msr_contents msr = { 0 };
@@ -85,6 +111,8 @@ static int hv_cpu_init(unsigned int cpu)
}
}
 
+   hyperv_init_ghcb();
+
return 0;
 }
 
@@ -177,6 +205,14 @@ static int hv_cpu_die(unsigned int cpu)
 {
struct hv_reenlightenment_control re_ctrl;
unsigned int new_cpu;
+   void **ghcb_va = NULL;
+
+   if (ms_hyperv.ghcb_base) {
+   ghcb_va = (void **)this_cpu_ptr(ms_hyperv.ghcb_base);
+   if (*ghcb_va)
+   memunmap(*ghcb_va);
+   *ghcb_va = NULL;
+   }
 
hv_common_cpu_die(cpu);
 
@@ -383,9 +419,19 @@ void __init hyperv_init(void)
VMALLOC_END, GFP_KERNEL, PAGE_KERNEL_ROX,
VM_FLUSH_RESET_PERMS, NUMA_NO_NODE,
__builtin_return_address(0));
-   if (hv_hypercall_pg == NULL) {
-   wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0);
-   goto remove_cpuhp_state;
+   if (hv_hypercall_pg == NULL)
+   goto clean_guest_os_id;
+
+   if (hv_isolation_type_snp()) {
+   ms_hyperv.ghcb_base = alloc_percpu(void *);
+   if (!ms_hyperv.ghcb_base)
+   goto clean_guest_os_id;
+
+   if (hyperv_init_ghcb()) {
+   free_percpu(ms_hyperv.ghcb_base);
+   ms_hyperv.ghcb_base = NULL;
+   goto clean_guest_os_id;
+   }
}
 
rdmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
@@ -456,7 +502,8 @@ void __init hyperv_init(void)
hv_query_ext_cap(0);
return;
 
-remove_cpuhp_state:
+clean_guest_os_id:
+   wrmsrl(HV_X64_MSR_GUEST_OS_ID, 0);
cpuhp_remove_state(cpuhp);
 free_vp_assist_page:
kfree(hv_vp_assist_page);
@@ -484,6 +531,9 @@ void hyperv_cleanup(void)
 */
hv_hypercall_pg = NULL;
 
+   if (ms_hyperv.ghcb_base)
+   free_percpu(ms_hyperv.ghcb_base);
+
/* Reset the hypercall page */
hypercall_msr.as_uint64 = 0;
wrmsrl(HV_X64_MSR_HYPERCALL, hypercall_msr.as_uint64);
@@ -559,3 +609,11 @@ bool hv_is_isolation_supported(void)
 {
return hv_get_isolation_type() != HV_ISOLATION_TYPE_NONE;
 }
+
+DEFINE_STATIC_KEY_FALSE(isolation_type_snp);
+
+bool hv_isolation_type_snp(void)
+{
+   return static_branch_unlikely(&isolation_type_snp);
+}
+EXPORT_SYMBOL_GPL(hv_isolation_type_snp);
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index adccbc209169..6627cfd2bfba 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -11,6 +11,8 @@
 #include 
 #include 
 
+DECLARE_STATIC_KEY_FALSE(isolation_type_snp);
+
 typedef int (*hyperv_fill_flush_list_func)(
struct hv_guest_mapping_flush_list *flush,
void *data);
diff --git a/include/asm-generic/mshyperv.h b/include/asm-generic/mshyperv.h
index c1ab6a6e72b5..4269f3174e58 100644
--- a/include/asm-generic/mshyperv.h
+++ b/include/asm-generic/mshyperv.h
@@ -36,6 +36,7 @@ struct ms_hyperv_info {
u32 max_lp_index;
u32 isolation_config_a;
u32 isolation_config_b;
+   void  __percpu **ghcb_base;
 };
 extern struct ms_hyperv_info ms_hyperv;
 
@@ -237,6 +238,7 @@ bool hv_is_hyperv_initialized(void);
 bool hv_is_hibernation_supported(void);
 enum hv_isolation_type hv_get_isolation_type(void);
 bool 

[PATCH V3 00/13] x86/Hyper-V: Add Hyper-V Isolation VM support

2021-08-09 Thread Tianyu Lan
From: Tianyu Lan 

Hyper-V provides two kinds of Isolation VMs: VBS (Virtualization-Based
Security) and AMD SEV-SNP unenlightened Isolation VMs. This patchset
adds support for these Isolation VMs in Linux.

The memory of these VMs is encrypted and the host can't access guest
memory directly. Hyper-V provides a new host visibility hvcall, and
the guest needs to call it to mark memory visible to the host
before sharing memory with the host. For security, network/storage
stack memory should not be shared with the host, so bounce
buffers are required.

The vmbus channel ring buffer already plays the bounce buffer role
because all data from/to the host is copied between the ring buffer
and IO stack memory. So mark the vmbus channel ring buffer visible.

There are two exceptions - packets sent by vmbus_sendpacket_
pagebuffer() and vmbus_sendpacket_mpb_desc(). These packets
contain IO stack memory addresses that the host will access.
So add bounce buffer allocation support in vmbus for these packets.

For an SNP Isolation VM, the guest needs to access shared memory via
an extra address space which is specified by the Hyper-V CPUID leaf
HYPERV_CPUID_ISOLATION_CONFIG. The physical address used to access
the shared memory should be the bounce buffer memory GPA plus the
shared_gpa_boundary reported by CPUID.
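
As a small illustration of that address arithmetic (a sketch under the
assumptions above, not code from the series):

/* Sketch: the host-visible alias of a bounce buffer page in an SNP
 * Isolation VM sits above the CPUID-reported shared GPA boundary. */
static u64 hv_shared_gpa(u64 bounce_gpa, u32 shared_gpa_boundary_bits)
{
	u64 shared_gpa_boundary = (u64)1 << shared_gpa_boundary_bits;

	return bounce_gpa + shared_gpa_boundary;
}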


Change since V2:
   - Drop x86_set_memory_enc static call and use platform check
 in the __set_memory_enc_dec() to run platform callback of
 set memory encrypted or decrypted.

Change since V1:
   - Introduce x86_set_memory_enc static call and so platforms can
 override __set_memory_enc_dec() with their implementation
   - Introduce sev_es_ghcb_hv_call_simple() and share code
 between SEV and Hyper-V code.
   - Not remap monitor pages in the non-SNP isolation VM
   - Make swiotlb_init_io_tlb_mem() return error code and return
 error when dma_map_decrypted() fails.

Change since RFC V4:
   - Introduce dma map decrypted function to remap bounce buffer
  and provide dma map decrypted ops for platform to hook callback.  
  
   - Split swiotlb and dma map decrypted change into two patches
   - Replace vstart with vaddr in swiotlb changes.

Change since RFC v3:
   - Add interface set_memory_decrypted_map() to decrypt memory and
 map bounce buffer in extra address space
   - Remove swiotlb remap function and store the remap address
 returned by set_memory_decrypted_map() in swiotlb mem data structure.
   - Introduce hv_set_mem_enc() to make code more readable in the 
__set_memory_enc_dec().

Change since RFC v2:
   - Remove not UIO driver in Isolation VM patch
   - Use vmap_pfn() to replace ioremap_page_range function in
   order to avoid exposing symbol ioremap_page_range() and
   ioremap_page_range()
   - Call hv set mem host visibility hvcall in 
set_memory_encrypted/decrypted()
   - Enable swiotlb force mode instead of adding Hyper-V dma map/unmap hook
   - Fix code style


Tianyu Lan (13):
  x86/HV: Initialize GHCB page in Isolation VM
  x86/HV: Initialize shared memory boundary in the Isolation VM.
  x86/HV: Add new hvcall guest address host visibility support
  HV: Mark vmbus ring buffer visible to host in Isolation VM
  HV: Add Write/Read MSR registers via ghcb page
  HV: Add ghcb hvcall support for SNP VM
  HV/Vmbus: Add SNP support for VMbus channel initiate message
  HV/Vmbus: Initialize VMbus ring buffer for Isolation VM
  DMA: Add dma_map_decrypted/dma_unmap_encrypted() function
  x86/Swiotlb: Add Swiotlb bounce buffer remap function for HV IVM
  HV/IOMMU: Enable swiotlb bounce buffer for Isolation VM
  HV/Netvsc: Add Isolation VM support for netvsc driver
  HV/Storvsc: Add Isolation VM support for storvsc driver

 arch/x86/hyperv/Makefile   |   2 +-
 arch/x86/hyperv/hv_init.c  |  75 ++--
 arch/x86/hyperv/ivm.c  | 295 +
 arch/x86/include/asm/hyperv-tlfs.h |  20 ++
 arch/x86/include/asm/mshyperv.h|  87 -
 arch/x86/include/asm/sev.h |   3 +
 arch/x86/kernel/cpu/mshyperv.c |   5 +
 arch/x86/kernel/sev-shared.c   |  63 +++---
 arch/x86/mm/pat/set_memory.c   |  19 +-
 arch/x86/xen/pci-swiotlb-xen.c |   3 +-
 drivers/hv/Kconfig |   1 +
 drivers/hv/channel.c   |  54 +-
 drivers/hv/connection.c|  71 ++-
 drivers/hv/hv.c| 129 +
 drivers/hv/hyperv_vmbus.h  |   3 +
 drivers/hv/ring_buffer.c   |  84 ++--
 drivers/hv/vmbus_drv.c |   3 +
 drivers/iommu/hyperv-iommu.c   |  65 +++
 drivers/net/hyperv/hyperv_net.h|   6 +
 drivers/net/hyperv/netvsc.c| 144 +-
 drivers/net/hyperv/rndis_filter.c  |   2 +
 drivers/scsi/storvsc_drv.c |  68 ++-
 include/asm-generic/hyperv-tlfs.h  |   1 +
 include/asm-generic/mshyperv.h |  54 +-
 

Re: [Freedreno] [PATCH 0/3] iommu/drm/msm: Allow non-coherent masters to use system cache

2021-08-09 Thread Sai Prakash Ranjan

On 2021-08-09 23:10, Will Deacon wrote:

On Mon, Aug 09, 2021 at 10:18:21AM -0700, Rob Clark wrote:

On Mon, Aug 9, 2021 at 10:05 AM Will Deacon  wrote:
>
> On Mon, Aug 09, 2021 at 09:57:08AM -0700, Rob Clark wrote:
> > On Mon, Aug 9, 2021 at 7:56 AM Will Deacon  wrote:
> > > On Mon, Aug 02, 2021 at 06:36:04PM -0700, Rob Clark wrote:
> > > > On Mon, Aug 2, 2021 at 8:14 AM Will Deacon  wrote:
> > > > > On Mon, Aug 02, 2021 at 08:08:07AM -0700, Rob Clark wrote:
> > > > > > On Mon, Aug 2, 2021 at 3:55 AM Will Deacon  wrote:
> > > > > > > On Thu, Jul 29, 2021 at 10:08:22AM +0530, Sai Prakash Ranjan 
wrote:
> > > > > > > > On 2021-07-28 19:30, Georgi Djakov wrote:
> > > > > > > > > On Mon, Jan 11, 2021 at 07:45:02PM +0530, Sai Prakash Ranjan 
wrote:
> > > > > > > > > > commit ecd7274fb4cd ("iommu: Remove unused IOMMU_SYS_CACHE_ONLY 
flag")
> > > > > > > > > > removed unused IOMMU_SYS_CACHE_ONLY prot flag and along 
with it went
> > > > > > > > > > the memory type setting required for the non-coherent 
masters to use
> > > > > > > > > > system cache. Now that system cache support for GPU is 
added, we will
> > > > > > > > > > need to set the right PTE attribute for GPU buffers to be 
sys cached.
> > > > > > > > > > Without this, the system cache lines are not allocated for 
GPU.
> > > > > > > > > >
> > > > > > > > > > So the patches in this series introduces a new prot flag 
IOMMU_LLC,
> > > > > > > > > > renames IO_PGTABLE_QUIRK_ARM_OUTER_WBWA to 
IO_PGTABLE_QUIRK_PTW_LLC
> > > > > > > > > > and makes GPU the user of this protection flag.
> > > > > > > > >
> > > > > > > > > Thank you for the patchset! Are you planning to refresh it, 
as it does
> > > > > > > > > not apply anymore?
> > > > > > > > >
> > > > > > > >
> > > > > > > > I was waiting on Will's reply [1]. If there are no changes 
needed, then
> > > > > > > > I can repost the patch.
> > > > > > >
> > > > > > > I still think you need to handle the mismatched alias, no? You're 
adding
> > > > > > > a new memory type to the SMMU which doesn't exist on the CPU 
side. That
> > > > > > > can't be right.
> > > > > > >
> > > > > >
> > > > > > Just curious, and maybe this is a dumb question, but what is your
> > > > > > concern about mismatched aliases?  I mean the cache hierarchy on the
> > > > > > GPU device side (anything beyond the LLC) is pretty different and
> > > > > > doesn't really care about the smmu pgtable attributes..
> > > > >
> > > > > If the CPU accesses a shared buffer with different attributes to 
those which
> > > > > the device is using then you fall into the "mismatched memory 
attributes"
> > > > > part of the Arm architecture. It's reasonably unforgiving (you should 
go and
> > > > > read it) and in some cases can apply to speculative accesses as well, 
but
> > > > > the end result is typically loss of coherency.
> > > >
> > > > Ok, I might have a few other sections to read first to decipher the
> > > > terminology..
> > > >
> > > > But my understanding of LLC is that it looks just like system memory
> > > > to the CPU and GPU (I think that would make it "the point of
> > > > coherence" between the GPU and CPU?)  If that is true, shouldn't it be
> > > > invisible from the point of view of different CPU mapping options?
> > >
> > > You could certainly build a system where mismatched attributes don't cause
> > > loss of coherence, but as it's not guaranteed by the architecture and the
> > > changes proposed here affect APIs which are exposed across SoCs, then I
> > > don't think it helps much.
> > >
> >
> > Hmm, the description of the new mapping flag is that it applies only
> > to transparent outer level cache:
> >
> > +/*
> > + * Non-coherent masters can use this page protection flag to set cacheable
> > + * memory attributes for only a transparent outer level of cache, also 
known as
> > + * the last-level or system cache.
> > + */
> > +#define IOMMU_LLC  (1 << 6)
> >
> > But I suppose we could call it instead IOMMU_QCOM_LLC or something
> > like that to make it more clear that it is not necessarily something
> > that would work with a different outer level cache implementation?
>
> ... or we could just deal with the problem so that other people can reuse
> the code. I haven't really understood the reluctance to solve this properly.
>
> Am I missing some reason this isn't solvable?

Oh, was there another way to solve it (other than foregoing setting
INC_OCACHE in the pgtables)?  Maybe I misunderstood, is there a
corresponding setting on the MMU pgtables side of things?


Right -- we just need to program the CPU's MMU with the matching memory
attributes! It's a bit more fiddly if you're just using ioremap_wc()
though, as it's usually the DMA API which handles the attributes under
the hood.

Anyway, sorry, I should've said that explicitly earlier on. We've done
this sort of thing in the Android tree so I assumed Sai knew what needed
to be done and then I didn't think to explain to you :(



Right I was aware of that but even in 

Re: [RFC PATCH 2/3] iommu/dma-iommu: Support iovad->granule > PAGE_SIZE

2021-08-09 Thread Robin Murphy

On 2021-08-07 12:47, Sven Peter via iommu wrote:



On Fri, Aug 6, 2021, at 20:04, Robin Murphy wrote:

On 2021-08-06 16:55, Sven Peter via iommu wrote:

@@ -1006,6 +1019,31 @@ static int iommu_dma_map_sg(struct device *dev, struct 
scatterlist *sg,
if (dev_is_untrusted(dev))
return iommu_dma_map_sg_swiotlb(dev, sg, nents, dir, attrs);
   
+	/*
+	 * If the IOMMU pagesize is larger than the CPU pagesize we will
+	 * very likely run into sgs with a physical address that is not aligned
+	 * to an IOMMU page boundary. Fall back to just mapping every entry
+	 * independently with __iommu_dma_map then.


Scatterlist segments often don't have nicely aligned ends, which is why
we already align things to IOVA granules in main loop here. I think in
principle we'd just need to move the non-IOVA-aligned part of the
address from sg->page to sg->offset in the temporary transformation for
the rest of the assumptions to hold. I don't blame you for being timid
about touching that, though - it took me 3 tries to get right when I
first wrote it...




I've spent some time with that code now and I think we cannot use it
but have to fall back to iommu_dma_map_sg_swiotlb (even though that swiotlb
part is a lie then):

When we have sg_phys(s) = 0x802e65000 with s->offset = 0 the paddr
is aligned to PAGE_SIZE but has an offset of 0x1000 from something
the IOMMU can map.
Now this would result in s->offset = -0x1000 which is already weird
enough.
Offset is unsigned (and 32bit) so this will actually look like
s->offset = 0xf000 then, which isn't much better.
And then sg_phys(s) = 0x902e64000 (instead of 0x802e64000) and
we'll map some random memory in iommu_map_sg_atomic and a little bit later
everything explodes.

Now I could probably adjust the phys addr backwards and make sure offset is
always positive (and possibly larger than PAGE_SIZE) and later restore it
in __finalise_sg then but I feel like that's pushing this a little bit too far.


Yes, that's what I meant. At a quick guess, something like the
completely untested diff below. It really comes down to what we want to
achieve here - if it's just to make this thing work at all, then I'd
favour bolting on the absolute minimum changes, possibly even cheating
by tainting the kernel and saying all bets are off instead of trying to
handle the more involved corners really properly. However if you want to
work towards this being a properly-supported thing, then I think it's
worth generalising the existing assumptions of page alignment from the
beginning.

Robin.

->8-
diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 58cc7297aab5..73aeaa8bfc73 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -895,8 +895,8 @@ static int __finalise_sg(struct device *dev, struct 
scatterlist *sg, int nents,
unsigned int s_length = sg_dma_len(s);
unsigned int s_iova_len = s->length;
 
-		s->offset += s_iova_off;

-   s->length = s_length;
+   sg_set_page(s, phys_to_page(sg_phys(s)), s_length,
+   s_iova_off & ~PAGE_MASK);
sg_dma_address(s) = DMA_MAPPING_ERROR;
sg_dma_len(s) = 0;
 
@@ -940,10 +940,9 @@ static void __invalidate_sg(struct scatterlist *sg, int nents)

int i;
 
 	for_each_sg(sg, s, nents, i) {

-   if (sg_dma_address(s) != DMA_MAPPING_ERROR)
-   s->offset += sg_dma_address(s);
if (sg_dma_len(s))
-   s->length = sg_dma_len(s);
+   sg_set_page(s, phys_to_page(sg_phys(s)), sg_dma_len(s),
+   sg_dma_address(s) & ~PAGE_MASK);
sg_dma_address(s) = DMA_MAPPING_ERROR;
sg_dma_len(s) = 0;
}
@@ -1019,15 +1018,16 @@ static int iommu_dma_map_sg(struct device *dev, struct 
scatterlist *sg,
 * stashing the unaligned parts in the as-yet-unused DMA fields.
 */
for_each_sg(sg, s, nents, i) {
-   size_t s_iova_off = iova_offset(iovad, s->offset);
+   phys_addr_t s_phys = sg_phys(s);
+   size_t s_iova_off = iova_offset(iovad, s_phys);
size_t s_length = s->length;
size_t pad_len = (mask - iova_len + 1) & mask;
 
 		sg_dma_address(s) = s_iova_off;

sg_dma_len(s) = s_length;
-   s->offset -= s_iova_off;
s_length = iova_align(iovad, s_length + s_iova_off);
-   s->length = s_length;
+   sg_set_page(s, phys_to_page(s_phys - s_iova_off),
+   s_length, s->offset ^ s_iova_off);
 
 		/*

 * Due to the alignment of our single IOVA allocation, we can
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [Freedreno] [PATCH 0/3] iommu/drm/msm: Allow non-coherent masters to use system cache

2021-08-09 Thread Will Deacon
On Mon, Aug 09, 2021 at 10:18:21AM -0700, Rob Clark wrote:
> On Mon, Aug 9, 2021 at 10:05 AM Will Deacon  wrote:
> >
> > On Mon, Aug 09, 2021 at 09:57:08AM -0700, Rob Clark wrote:
> > > On Mon, Aug 9, 2021 at 7:56 AM Will Deacon  wrote:
> > > > On Mon, Aug 02, 2021 at 06:36:04PM -0700, Rob Clark wrote:
> > > > > On Mon, Aug 2, 2021 at 8:14 AM Will Deacon  wrote:
> > > > > > On Mon, Aug 02, 2021 at 08:08:07AM -0700, Rob Clark wrote:
> > > > > > > On Mon, Aug 2, 2021 at 3:55 AM Will Deacon  
> > > > > > > wrote:
> > > > > > > > On Thu, Jul 29, 2021 at 10:08:22AM +0530, Sai Prakash Ranjan 
> > > > > > > > wrote:
> > > > > > > > > On 2021-07-28 19:30, Georgi Djakov wrote:
> > > > > > > > > > On Mon, Jan 11, 2021 at 07:45:02PM +0530, Sai Prakash 
> > > > > > > > > > Ranjan wrote:
> > > > > > > > > > > commit ecd7274fb4cd ("iommu: Remove unused 
> > > > > > > > > > > IOMMU_SYS_CACHE_ONLY flag")
> > > > > > > > > > > removed unused IOMMU_SYS_CACHE_ONLY prot flag and along 
> > > > > > > > > > > with it went
> > > > > > > > > > > the memory type setting required for the non-coherent 
> > > > > > > > > > > masters to use
> > > > > > > > > > > system cache. Now that system cache support for GPU is 
> > > > > > > > > > > added, we will
> > > > > > > > > > > need to set the right PTE attribute for GPU buffers to be 
> > > > > > > > > > > sys cached.
> > > > > > > > > > > Without this, the system cache lines are not allocated 
> > > > > > > > > > > for GPU.
> > > > > > > > > > >
> > > > > > > > > > > So the patches in this series introduces a new prot flag 
> > > > > > > > > > > IOMMU_LLC,
> > > > > > > > > > > renames IO_PGTABLE_QUIRK_ARM_OUTER_WBWA to 
> > > > > > > > > > > IO_PGTABLE_QUIRK_PTW_LLC
> > > > > > > > > > > and makes GPU the user of this protection flag.
> > > > > > > > > >
> > > > > > > > > > Thank you for the patchset! Are you planning to refresh it, 
> > > > > > > > > > as it does
> > > > > > > > > > not apply anymore?
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > I was waiting on Will's reply [1]. If there are no changes 
> > > > > > > > > needed, then
> > > > > > > > > I can repost the patch.
> > > > > > > >
> > > > > > > > I still think you need to handle the mismatched alias, no? 
> > > > > > > > You're adding
> > > > > > > > a new memory type to the SMMU which doesn't exist on the CPU 
> > > > > > > > side. That
> > > > > > > > can't be right.
> > > > > > > >
> > > > > > >
> > > > > > > Just curious, and maybe this is a dumb question, but what is your
> > > > > > > concern about mismatched aliases?  I mean the cache hierarchy on 
> > > > > > > the
> > > > > > > GPU device side (anything beyond the LLC) is pretty different and
> > > > > > > doesn't really care about the smmu pgtable attributes..
> > > > > >
> > > > > > If the CPU accesses a shared buffer with different attributes to 
> > > > > > those which
> > > > > > the device is using then you fall into the "mismatched memory 
> > > > > > attributes"
> > > > > > part of the Arm architecture. It's reasonably unforgiving (you 
> > > > > > should go and
> > > > > > read it) and in some cases can apply to speculative accesses as 
> > > > > > well, but
> > > > > > the end result is typically loss of coherency.
> > > > >
> > > > > Ok, I might have a few other sections to read first to decipher the
> > > > > terminology..
> > > > >
> > > > > But my understanding of LLC is that it looks just like system memory
> > > > > to the CPU and GPU (I think that would make it "the point of
> > > > > coherence" between the GPU and CPU?)  If that is true, shouldn't it be
> > > > > invisible from the point of view of different CPU mapping options?
> > > >
> > > > You could certainly build a system where mismatched attributes don't 
> > > > cause
> > > > loss of coherence, but as it's not guaranteed by the architecture and 
> > > > the
> > > > changes proposed here affect APIs which are exposed across SoCs, then I
> > > > don't think it helps much.
> > > >
> > >
> > > Hmm, the description of the new mapping flag is that it applies only
> > > to transparent outer level cache:
> > >
> > > +/*
> > > + * Non-coherent masters can use this page protection flag to set 
> > > cacheable
> > > + * memory attributes for only a transparent outer level of cache, also 
> > > known as
> > > + * the last-level or system cache.
> > > + */
> > > +#define IOMMU_LLC  (1 << 6)
> > >
> > > But I suppose we could call it instead IOMMU_QCOM_LLC or something
> > > like that to make it more clear that it is not necessarily something
> > > that would work with a different outer level cache implementation?
> >
> > ... or we could just deal with the problem so that other people can reuse
> > the code. I haven't really understood the reluctance to solve this properly.
> >
> > Am I missing some reason this isn't solvable?
> 
> Oh, was there another way to solve it (other than foregoing setting
> INC_OCACHE in the pgtables)?  Maybe I misunderstood, is there a
> corresponding 

Re: [Freedreno] [PATCH 0/3] iommu/drm/msm: Allow non-coherent masters to use system cache

2021-08-09 Thread Rob Clark
On Mon, Aug 9, 2021 at 10:05 AM Will Deacon  wrote:
>
> On Mon, Aug 09, 2021 at 09:57:08AM -0700, Rob Clark wrote:
> > On Mon, Aug 9, 2021 at 7:56 AM Will Deacon  wrote:
> > > On Mon, Aug 02, 2021 at 06:36:04PM -0700, Rob Clark wrote:
> > > > On Mon, Aug 2, 2021 at 8:14 AM Will Deacon  wrote:
> > > > > On Mon, Aug 02, 2021 at 08:08:07AM -0700, Rob Clark wrote:
> > > > > > On Mon, Aug 2, 2021 at 3:55 AM Will Deacon  wrote:
> > > > > > > On Thu, Jul 29, 2021 at 10:08:22AM +0530, Sai Prakash Ranjan 
> > > > > > > wrote:
> > > > > > > > On 2021-07-28 19:30, Georgi Djakov wrote:
> > > > > > > > > On Mon, Jan 11, 2021 at 07:45:02PM +0530, Sai Prakash Ranjan 
> > > > > > > > > wrote:
> > > > > > > > > > commit ecd7274fb4cd ("iommu: Remove unused 
> > > > > > > > > > IOMMU_SYS_CACHE_ONLY flag")
> > > > > > > > > > removed unused IOMMU_SYS_CACHE_ONLY prot flag and along 
> > > > > > > > > > with it went
> > > > > > > > > > the memory type setting required for the non-coherent 
> > > > > > > > > > masters to use
> > > > > > > > > > system cache. Now that system cache support for GPU is 
> > > > > > > > > > added, we will
> > > > > > > > > > need to set the right PTE attribute for GPU buffers to be 
> > > > > > > > > > sys cached.
> > > > > > > > > > Without this, the system cache lines are not allocated for 
> > > > > > > > > > GPU.
> > > > > > > > > >
> > > > > > > > > > So the patches in this series introduces a new prot flag 
> > > > > > > > > > IOMMU_LLC,
> > > > > > > > > > renames IO_PGTABLE_QUIRK_ARM_OUTER_WBWA to 
> > > > > > > > > > IO_PGTABLE_QUIRK_PTW_LLC
> > > > > > > > > > and makes GPU the user of this protection flag.
> > > > > > > > >
> > > > > > > > > Thank you for the patchset! Are you planning to refresh it, 
> > > > > > > > > as it does
> > > > > > > > > not apply anymore?
> > > > > > > > >
> > > > > > > >
> > > > > > > > I was waiting on Will's reply [1]. If there are no changes 
> > > > > > > > needed, then
> > > > > > > > I can repost the patch.
> > > > > > >
> > > > > > > I still think you need to handle the mismatched alias, no? You're 
> > > > > > > adding
> > > > > > > a new memory type to the SMMU which doesn't exist on the CPU 
> > > > > > > side. That
> > > > > > > can't be right.
> > > > > > >
> > > > > >
> > > > > > Just curious, and maybe this is a dumb question, but what is your
> > > > > > concern about mismatched aliases?  I mean the cache hierarchy on the
> > > > > > GPU device side (anything beyond the LLC) is pretty different and
> > > > > > doesn't really care about the smmu pgtable attributes..
> > > > >
> > > > > If the CPU accesses a shared buffer with different attributes to 
> > > > > those which
> > > > > the device is using then you fall into the "mismatched memory 
> > > > > attributes"
> > > > > part of the Arm architecture. It's reasonably unforgiving (you should 
> > > > > go and
> > > > > read it) and in some cases can apply to speculative accesses as well, 
> > > > > but
> > > > > the end result is typically loss of coherency.
> > > >
> > > > Ok, I might have a few other sections to read first to decipher the
> > > > terminology..
> > > >
> > > > But my understanding of LLC is that it looks just like system memory
> > > > to the CPU and GPU (I think that would make it "the point of
> > > > coherence" between the GPU and CPU?)  If that is true, shouldn't it be
> > > > invisible from the point of view of different CPU mapping options?
> > >
> > > You could certainly build a system where mismatched attributes don't cause
> > > loss of coherence, but as it's not guaranteed by the architecture and the
> > > changes proposed here affect APIs which are exposed across SoCs, then I
> > > don't think it helps much.
> > >
> >
> > Hmm, the description of the new mapping flag is that it applies only
> > to transparent outer level cache:
> >
> > +/*
> > + * Non-coherent masters can use this page protection flag to set cacheable
> > + * memory attributes for only a transparent outer level of cache, also 
> > known as
> > + * the last-level or system cache.
> > + */
> > +#define IOMMU_LLC  (1 << 6)
> >
> > But I suppose we could call it instead IOMMU_QCOM_LLC or something
> > like that to make it more clear that it is not necessarily something
> > that would work with a different outer level cache implementation?
>
> ... or we could just deal with the problem so that other people can reuse
> the code. I haven't really understood the reluctance to solve this properly.
>
> Am I missing some reason this isn't solvable?
>

Oh, was there another way to solve it (other than foregoing setting
INC_OCACHE in the pgtables)?  Maybe I misunderstood, is there a
corresponding setting on the MMU pgtables side of things?

BR,
-R
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [Freedreno] [PATCH 0/3] iommu/drm/msm: Allow non-coherent masters to use system cache

2021-08-09 Thread Will Deacon
On Mon, Aug 09, 2021 at 09:57:08AM -0700, Rob Clark wrote:
> On Mon, Aug 9, 2021 at 7:56 AM Will Deacon  wrote:
> > On Mon, Aug 02, 2021 at 06:36:04PM -0700, Rob Clark wrote:
> > > On Mon, Aug 2, 2021 at 8:14 AM Will Deacon  wrote:
> > > > On Mon, Aug 02, 2021 at 08:08:07AM -0700, Rob Clark wrote:
> > > > > On Mon, Aug 2, 2021 at 3:55 AM Will Deacon  wrote:
> > > > > > On Thu, Jul 29, 2021 at 10:08:22AM +0530, Sai Prakash Ranjan wrote:
> > > > > > > On 2021-07-28 19:30, Georgi Djakov wrote:
> > > > > > > > On Mon, Jan 11, 2021 at 07:45:02PM +0530, Sai Prakash Ranjan 
> > > > > > > > wrote:
> > > > > > > > > commit ecd7274fb4cd ("iommu: Remove unused 
> > > > > > > > > IOMMU_SYS_CACHE_ONLY flag")
> > > > > > > > > removed unused IOMMU_SYS_CACHE_ONLY prot flag and along with 
> > > > > > > > > it went
> > > > > > > > > the memory type setting required for the non-coherent masters 
> > > > > > > > > to use
> > > > > > > > > system cache. Now that system cache support for GPU is added, 
> > > > > > > > > we will
> > > > > > > > > need to set the right PTE attribute for GPU buffers to be sys 
> > > > > > > > > cached.
> > > > > > > > > Without this, the system cache lines are not allocated for 
> > > > > > > > > GPU.
> > > > > > > > >
> > > > > > > > > So the patches in this series introduces a new prot flag 
> > > > > > > > > IOMMU_LLC,
> > > > > > > > > renames IO_PGTABLE_QUIRK_ARM_OUTER_WBWA to 
> > > > > > > > > IO_PGTABLE_QUIRK_PTW_LLC
> > > > > > > > > and makes GPU the user of this protection flag.
> > > > > > > >
> > > > > > > > Thank you for the patchset! Are you planning to refresh it, as 
> > > > > > > > it does
> > > > > > > > not apply anymore?
> > > > > > > >
> > > > > > >
> > > > > > > I was waiting on Will's reply [1]. If there are no changes 
> > > > > > > needed, then
> > > > > > > I can repost the patch.
> > > > > >
> > > > > > I still think you need to handle the mismatched alias, no? You're 
> > > > > > adding
> > > > > > a new memory type to the SMMU which doesn't exist on the CPU side. 
> > > > > > That
> > > > > > can't be right.
> > > > > >
> > > > >
> > > > > Just curious, and maybe this is a dumb question, but what is your
> > > > > concern about mismatched aliases?  I mean the cache hierarchy on the
> > > > > GPU device side (anything beyond the LLC) is pretty different and
> > > > > doesn't really care about the smmu pgtable attributes..
> > > >
> > > > If the CPU accesses a shared buffer with different attributes to those 
> > > > which
> > > > the device is using then you fall into the "mismatched memory 
> > > > attributes"
> > > > part of the Arm architecture. It's reasonably unforgiving (you should 
> > > > go and
> > > > read it) and in some cases can apply to speculative accesses as well, 
> > > > but
> > > > the end result is typically loss of coherency.
> > >
> > > Ok, I might have a few other sections to read first to decipher the
> > > terminology..
> > >
> > > But my understanding of LLC is that it looks just like system memory
> > > to the CPU and GPU (I think that would make it "the point of
> > > coherence" between the GPU and CPU?)  If that is true, shouldn't it be
> > > invisible from the point of view of different CPU mapping options?
> >
> > You could certainly build a system where mismatched attributes don't cause
> > loss of coherence, but as it's not guaranteed by the architecture and the
> > changes proposed here affect APIs which are exposed across SoCs, then I
> > don't think it helps much.
> >
> 
> Hmm, the description of the new mapping flag is that it applies only
> to transparent outer level cache:
> 
> +/*
> + * Non-coherent masters can use this page protection flag to set cacheable
> + * memory attributes for only a transparent outer level of cache, also known 
> as
> + * the last-level or system cache.
> + */
> +#define IOMMU_LLC  (1 << 6)
> 
> But I suppose we could call it instead IOMMU_QCOM_LLC or something
> like that to make it more clear that it is not necessarily something
> that would work with a different outer level cache implementation?

... or we could just deal with the problem so that other people can reuse
the code. I haven't really understood the reluctance to solve this properly.

Am I missing some reason this isn't solvable?

Will
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [Freedreno] [PATCH 0/3] iommu/drm/msm: Allow non-coherent masters to use system cache

2021-08-09 Thread Rob Clark
On Mon, Aug 9, 2021 at 7:56 AM Will Deacon  wrote:
>
> On Mon, Aug 02, 2021 at 06:36:04PM -0700, Rob Clark wrote:
> > On Mon, Aug 2, 2021 at 8:14 AM Will Deacon  wrote:
> > >
> > > On Mon, Aug 02, 2021 at 08:08:07AM -0700, Rob Clark wrote:
> > > > On Mon, Aug 2, 2021 at 3:55 AM Will Deacon  wrote:
> > > > >
> > > > > On Thu, Jul 29, 2021 at 10:08:22AM +0530, Sai Prakash Ranjan wrote:
> > > > > > On 2021-07-28 19:30, Georgi Djakov wrote:
> > > > > > > On Mon, Jan 11, 2021 at 07:45:02PM +0530, Sai Prakash Ranjan 
> > > > > > > wrote:
> > > > > > > > commit ecd7274fb4cd ("iommu: Remove unused IOMMU_SYS_CACHE_ONLY 
> > > > > > > > flag")
> > > > > > > > removed unused IOMMU_SYS_CACHE_ONLY prot flag and along with it 
> > > > > > > > went
> > > > > > > > the memory type setting required for the non-coherent masters 
> > > > > > > > to use
> > > > > > > > system cache. Now that system cache support for GPU is added, 
> > > > > > > > we will
> > > > > > > > need to set the right PTE attribute for GPU buffers to be sys 
> > > > > > > > cached.
> > > > > > > > Without this, the system cache lines are not allocated for GPU.
> > > > > > > >
> > > > > > > > So the patches in this series introduces a new prot flag 
> > > > > > > > IOMMU_LLC,
> > > > > > > > renames IO_PGTABLE_QUIRK_ARM_OUTER_WBWA to 
> > > > > > > > IO_PGTABLE_QUIRK_PTW_LLC
> > > > > > > > and makes GPU the user of this protection flag.
> > > > > > >
> > > > > > > Thank you for the patchset! Are you planning to refresh it, as it 
> > > > > > > does
> > > > > > > not apply anymore?
> > > > > > >
> > > > > >
> > > > > > I was waiting on Will's reply [1]. If there are no changes needed, 
> > > > > > then
> > > > > > I can repost the patch.
> > > > >
> > > > > I still think you need to handle the mismatched alias, no? You're 
> > > > > adding
> > > > > a new memory type to the SMMU which doesn't exist on the CPU side. 
> > > > > That
> > > > > can't be right.
> > > > >
> > > >
> > > > Just curious, and maybe this is a dumb question, but what is your
> > > > concern about mismatched aliases?  I mean the cache hierarchy on the
> > > > GPU device side (anything beyond the LLC) is pretty different and
> > > > doesn't really care about the smmu pgtable attributes..
> > >
> > > If the CPU accesses a shared buffer with different attributes to those 
> > > which
> > > the device is using then you fall into the "mismatched memory attributes"
> > > part of the Arm architecture. It's reasonably unforgiving (you should go 
> > > and
> > > read it) and in some cases can apply to speculative accesses as well, but
> > > the end result is typically loss of coherency.
> >
> > Ok, I might have a few other sections to read first to decipher the
> > terminology..
> >
> > But my understanding of LLC is that it looks just like system memory
> > to the CPU and GPU (I think that would make it "the point of
> > coherence" between the GPU and CPU?)  If that is true, shouldn't it be
> > invisible from the point of view of different CPU mapping options?
>
> You could certainly build a system where mismatched attributes don't cause
> loss of coherence, but as it's not guaranteed by the architecture and the
> changes proposed here affect APIs which are exposed across SoCs, then I
> don't think it helps much.
>

Hmm, the description of the new mapping flag is that it applies only
to transparent outer level cache:

+/*
+ * Non-coherent masters can use this page protection flag to set cacheable
+ * memory attributes for only a transparent outer level of cache, also known as
+ * the last-level or system cache.
+ */
+#define IOMMU_LLC  (1 << 6)

But I suppose we could call it instead IOMMU_QCOM_LLC or something
like that to make it more clear that it is not necessarily something
that would work with a different outer level cache implementation?

BR,
-R
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 06/14] x86: Secure Launch main header file

2021-08-09 Thread Ross Philipson
Introduce the main Secure Launch header file used in the early SL stub
and the early setup code.

Signed-off-by: Ross Philipson 
---
 include/linux/slaunch.h | 532 
 1 file changed, 532 insertions(+)
 create mode 100644 include/linux/slaunch.h

diff --git a/include/linux/slaunch.h b/include/linux/slaunch.h
new file mode 100644
index 000..c125b67
--- /dev/null
+++ b/include/linux/slaunch.h
@@ -0,0 +1,532 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Main Secure Launch header file.
+ *
+ * Copyright (c) 2021, Oracle and/or its affiliates.
+ */
+
+#ifndef _LINUX_SLAUNCH_H
+#define _LINUX_SLAUNCH_H
+
+/*
+ * Secure Launch Defined State Flags
+ */
+#define SL_FLAG_ACTIVE 0x0001
+#define SL_FLAG_ARCH_SKINIT0x0002
+#define SL_FLAG_ARCH_TXT   0x0004
+
+/*
+ * Secure Launch CPU Type
+ */
+#define SL_CPU_AMD 1
+#define SL_CPU_INTEL   2
+
+#if IS_ENABLED(CONFIG_SECURE_LAUNCH)
+
+#define __SL32_CS  0x0008
+#define __SL32_DS  0x0010
+
+/*
+ * Intel Safer Mode Extensions (SMX)
+ *
+ * Intel SMX provides a programming interface to establish a Measured Launched
+ * Environment (MLE). The measurement and protection mechanisms supported by 
the
+ * capabilities of an Intel Trusted Execution Technology (TXT) platform. SMX is
+ * the processor’s programming interface in an Intel TXT platform.
+ *
+ * See Intel SDM Volume 2 - 6.1 "Safer Mode Extensions Reference"
+ */
+
+/*
+ * SMX GETSEC Leaf Functions
+ */
+#define SMX_X86_GETSEC_SEXIT   5
+#define SMX_X86_GETSEC_SMCTRL  7
+#define SMX_X86_GETSEC_WAKEUP  8
+
+/*
+ * Intel Trusted Execution Technology MMIO Registers Banks
+ */
+#define TXT_PUB_CONFIG_REGS_BASE   0xfed3
+#define TXT_PRIV_CONFIG_REGS_BASE  0xfed2
+#define TXT_NR_CONFIG_PAGES ((TXT_PUB_CONFIG_REGS_BASE - \
+ TXT_PRIV_CONFIG_REGS_BASE) >> PAGE_SHIFT)
+
+/*
+ * Intel Trusted Execution Technology (TXT) Registers
+ */
+#define TXT_CR_STS 0x
+#define TXT_CR_ESTS0x0008
+#define TXT_CR_ERRORCODE   0x0030
+#define TXT_CR_CMD_RESET   0x0038
+#define TXT_CR_CMD_CLOSE_PRIVATE   0x0048
+#define TXT_CR_DIDVID  0x0110
+#define TXT_CR_VER_EMIF0x0200
+#define TXT_CR_CMD_UNLOCK_MEM_CONFIG   0x0218
+#define TXT_CR_SINIT_BASE  0x0270
+#define TXT_CR_SINIT_SIZE  0x0278
+#define TXT_CR_MLE_JOIN0x0290
+#define TXT_CR_HEAP_BASE   0x0300
+#define TXT_CR_HEAP_SIZE   0x0308
+#define TXT_CR_SCRATCHPAD  0x0378
+#define TXT_CR_CMD_OPEN_LOCALITY1  0x0380
+#define TXT_CR_CMD_CLOSE_LOCALITY1 0x0388
+#define TXT_CR_CMD_OPEN_LOCALITY2  0x0390
+#define TXT_CR_CMD_CLOSE_LOCALITY2 0x0398
+#define TXT_CR_CMD_SECRETS 0x08e0
+#define TXT_CR_CMD_NO_SECRETS  0x08e8
+#define TXT_CR_E2STS   0x08f0
+
+/* TXT default register value */
+#define TXT_REGVALUE_ONE   0x1ULL
+
+/* TXTCR_STS status bits */
+#define TXT_SENTER_DONE_STS(1<<0)
+#define TXT_SEXIT_DONE_STS (1<<1)
+
+/*
+ * SINIT/MLE Capabilities Field Bit Definitions
+ */
+#define TXT_SINIT_MLE_CAP_WAKE_GETSEC  0
+#define TXT_SINIT_MLE_CAP_WAKE_MONITOR 1
+
+/*
+ * OS/MLE Secure Launch Specific Definitions
+ */
+#define TXT_OS_MLE_STRUCT_VERSION  1
+#define TXT_OS_MLE_MAX_VARIABLE_MTRRS  32
+
+/*
+ * TXT Heap Table Enumeration
+ */
+#define TXT_BIOS_DATA_TABLE1
+#define TXT_OS_MLE_DATA_TABLE  2
+#define TXT_OS_SINIT_DATA_TABLE3
+#define TXT_SINIT_MLE_DATA_TABLE   4
+#define TXT_SINIT_TABLE_MAXTXT_SINIT_MLE_DATA_TABLE
+
+/*
+ * Secure Launch Defined Error Codes used in MLE-initiated TXT resets.
+ *
+ * TXT Specification
+ * Appendix I ACM Error Codes
+ */
+#define SL_ERROR_GENERIC   0xc0008001
+#define SL_ERROR_TPM_INIT  0xc0008002
+#define SL_ERROR_TPM_INVALID_LOG20 0xc0008003
+#define SL_ERROR_TPM_LOGGING_FAILED0xc0008004
+#define SL_ERROR_REGION_STRADDLE_4GB   0xc0008005
+#define SL_ERROR_TPM_EXTEND0xc0008006
+#define SL_ERROR_MTRR_INV_VCNT 0xc0008007
+#define SL_ERROR_MTRR_INV_DEF_TYPE 0xc0008008
+#define SL_ERROR_MTRR_INV_BASE 0xc0008009
+#define SL_ERROR_MTRR_INV_MASK 0xc000800a
+#define SL_ERROR_MSR_INV_MISC_EN   0xc000800b
+#define SL_ERROR_INV_AP_INTERRUPT  0xc000800c
+#define SL_ERROR_INTEGER_OVERFLOW  0xc000800d
+#define SL_ERROR_HEAP_WALK 0xc000800e
+#define SL_ERROR_HEAP_MAP  0xc000800f
+#define SL_ERROR_REGION_ABOVE_4GB  0xc0008010
+#define SL_ERROR_HEAP_INVALID_DMAR 0xc0008011
+#define SL_ERROR_HEAP_DMAR_SIZE0xc0008012
+#define SL_ERROR_HEAP_DMAR_MAP 0xc0008013
+#define SL_ERROR_HI_PMR_BASE   0xc0008014
+#define SL_ERROR_HI_PMR_SIZE   0xc0008015
+#define 

[PATCH v3 12/14] reboot: Secure Launch SEXIT support on reboot paths

2021-08-09 Thread Ross Philipson
If the MLE kernel is being powered off, rebooted or halted,
then SEXIT must be called. Note that the SEXIT GETSEC leaf
can only be called after a machine_shutdown() has been done on
these paths. The machine_shutdown() is not called on a few paths
like when poweroff action does not have a poweroff callback (into
ACPI code) or when an emergency reset is done. In these cases,
just the TXT registers are finalized but SEXIT is skipped.

Signed-off-by: Ross Philipson 
---
 arch/x86/kernel/reboot.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/arch/x86/kernel/reboot.c b/arch/x86/kernel/reboot.c
index ebfb911..fe9d8cc 100644
--- a/arch/x86/kernel/reboot.c
+++ b/arch/x86/kernel/reboot.c
@@ -12,6 +12,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -731,6 +732,7 @@ static void native_machine_restart(char *__unused)
 
if (!reboot_force)
machine_shutdown();
+   slaunch_finalize(!reboot_force);
__machine_emergency_restart(0);
 }
 
@@ -741,6 +743,9 @@ static void native_machine_halt(void)
 
tboot_shutdown(TB_SHUTDOWN_HALT);
 
+   /* SEXIT done after machine_shutdown() to meet TXT requirements */
+   slaunch_finalize(1);
+
stop_this_cpu(NULL);
 }
 
@@ -749,8 +754,12 @@ static void native_machine_power_off(void)
if (pm_power_off) {
if (!reboot_force)
machine_shutdown();
+   slaunch_finalize(!reboot_force);
pm_power_off();
+   } else {
+   slaunch_finalize(0);
}
+
/* A fallback in case there is no PM info available */
tboot_shutdown(TB_SHUTDOWN_HALT);
 }
@@ -778,6 +787,7 @@ void machine_shutdown(void)
 
 void machine_emergency_restart(void)
 {
+   slaunch_finalize(0);
__machine_emergency_restart(1);
 }
 
-- 
1.8.3.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 08/14] x86: Secure Launch kernel early boot stub

2021-08-09 Thread Ross Philipson
The Secure Launch (SL) stub provides the entry point for Intel TXT (and
later AMD SKINIT) to vector to during the late launch. The symbol
sl_stub_entry is that entry point and its offset into the kernel is
conveyed to the launching code using the MLE (Measured Launch
Environment) header in the structure named mle_header. The offset of the
MLE header is set in the kernel_info. The routine sl_stub contains the
very early late launch setup code responsible for setting up the basic
environment to allow the normal kernel startup_32 code to proceed. It is
also responsible for properly waking and handling the APs on Intel
platforms. The routine sl_main which runs after entering 64b mode is
responsible for measuring configuration and module information before
it is used, such as the boot params, the kernel command line, the TXT heap,
an external initramfs, etc.
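
For illustration only (not part of the patch or diff below), a boot loader that
already knows where kernel_info sits inside the protected-mode kernel image
could locate the MLE header roughly as sketched here. The function name, the
in-memory image handling, and treating the 0x10 field as an image-relative
offset (per the rva() convention used for kernel_info fields) are assumptions
of this sketch:

/*
 * Rough sketch, not part of this patch. Assumes the caller already knows
 * the offset of kernel_info within the kernel image and that
 * mle_header_offset (at offset 0x10 in kernel_info, see the boot.rst hunk
 * below) is relative to the start of that image.
 */
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define KI_MLE_HEADER_OFFSET	0x10	/* documented field offset */

static const void *find_mle_header(const uint8_t *image, size_t kernel_info_off)
{
	uint32_t mle_off;

	memcpy(&mle_off, image + kernel_info_off + KI_MLE_HEADER_OFFSET,
	       sizeof(mle_off));

	/* A zero offset means the kernel has no Secure Launch support. */
	if (!mle_off)
		return NULL;

	return image + mle_off;
}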

Signed-off-by: Ross Philipson 
---
 Documentation/x86/boot.rst |  13 +
 arch/x86/boot/compressed/Makefile  |   3 +-
 arch/x86/boot/compressed/head_64.S |  37 ++
 arch/x86/boot/compressed/kernel_info.S |  34 ++
 arch/x86/boot/compressed/sl_main.c | 549 ++
 arch/x86/boot/compressed/sl_stub.S | 685 +
 arch/x86/kernel/asm-offsets.c  |  19 +
 7 files changed, 1339 insertions(+), 1 deletion(-)
 create mode 100644 arch/x86/boot/compressed/sl_main.c
 create mode 100644 arch/x86/boot/compressed/sl_stub.S

diff --git a/Documentation/x86/boot.rst b/Documentation/x86/boot.rst
index 894a198..2fbcb77 100644
--- a/Documentation/x86/boot.rst
+++ b/Documentation/x86/boot.rst
@@ -1026,6 +1026,19 @@ Offset/size: 0x000c/4
 
   This field contains maximal allowed type for setup_data and setup_indirect 
structs.
 
+   =
+Field name:mle_header_offset
+Offset/size:   0x0010/4
+   =
+
+  This field contains the offset to the Secure Launch Measured Launch 
Environment
+  (MLE) header. This offset is used to locate information needed during a 
secure
+  late launch using Intel TXT. If the offset is zero, the kernel does not have
+  Secure Launch capabilities. The MLE entry point is called from TXT on the BSP
+  following a successful measured launch. The specific state of the processors is
+  outlined in the TXT Software Development Guide, the latest can be found here:
+  
https://www.intel.com/content/dam/www/public/us/en/documents/guides/intel-txt-software-development-guide.pdf
+
 
 The Image Checksum
 ==
diff --git a/arch/x86/boot/compressed/Makefile 
b/arch/x86/boot/compressed/Makefile
index 059d49a..1fe55a5 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -102,7 +102,8 @@ vmlinux-objs-$(CONFIG_ACPI) += $(obj)/acpi.o
 vmlinux-objs-$(CONFIG_EFI_MIXED) += $(obj)/efi_thunk_$(BITS).o
 efi-obj-$(CONFIG_EFI_STUB) = $(objtree)/drivers/firmware/efi/libstub/lib.a
 
-vmlinux-objs-$(CONFIG_SECURE_LAUNCH) += $(obj)/early_sha1.o 
$(obj)/early_sha256.o
+vmlinux-objs-$(CONFIG_SECURE_LAUNCH) += $(obj)/early_sha1.o 
$(obj)/early_sha256.o \
+   $(obj)/sl_main.o $(obj)/sl_stub.o
 
 $(obj)/vmlinux: $(vmlinux-objs-y) $(efi-obj-y) FORCE
$(call if_changed,ld)
diff --git a/arch/x86/boot/compressed/head_64.S 
b/arch/x86/boot/compressed/head_64.S
index a2347de..b35e072 100644
--- a/arch/x86/boot/compressed/head_64.S
+++ b/arch/x86/boot/compressed/head_64.S
@@ -498,6 +498,17 @@ trampoline_return:
pushq   $0
popfq
 
+#ifdef CONFIG_SECURE_LAUNCH
+   pushq   %rsi
+
+   /* Ensure the relocation region is covered by a PMR */
+   movq%rbx, %rdi
+   movl$(_bss - startup_32), %esi
+   callq   sl_check_region
+
+   popq%rsi
+#endif
+
 /*
  * Copy the compressed kernel to the end of our buffer
  * where decompression in place becomes safe.
@@ -556,6 +567,32 @@ SYM_FUNC_START_LOCAL_NOALIGN(.Lrelocated)
shrq$3, %rcx
rep stosq
 
+#ifdef CONFIG_SECURE_LAUNCH
+   /*
+* Have to do the final early sl stub work in 64b area.
+*
+* *** NOTE ***
+*
+* Several boot params get used before we get a chance to measure
+* them in this call. This is a known issue and we currently don't
+* have a solution. The scratch field doesn't matter. There is no
+* obvious way to do anything about the use of kernel_alignment or
+* init_size though these seem low risk with all the PMR and overlap
+* checks in place.
+*/
+   pushq   %rsi
+
+   movq%rsi, %rdi
+   callq   sl_main
+
+   /* Ensure the decompression location is covered by a PMR */
+   movq%rbp, %rdi
+   movqoutput_len(%rip), %rsi
+   callq   sl_check_region
+
+   popq%rsi
+#endif
+
 /*
  * If running as an SEV guest, the encryption mask is required in the
  * page-table setup code below. When the guest also has SEV-ES enabled
diff --git 

[PATCH v3 07/14] x86: Add early SHA support for Secure Launch early measurements

2021-08-09 Thread Ross Philipson
From: "Daniel P. Smith" 

The SHA algorithms are necessary to measure configuration information into
the TPM as early as possible before using the values. This implementation
uses the established approach of #including the SHA libraries directly in
the code since the compressed kernel is not uncompressed at this point.

The SHA code here has its origins in the code from the main kernel, commit
c4d5b9ffa31f (crypto: sha1 - implement base layer for SHA-1). That code could
not be pulled directly into the setup portion of the compressed kernel because
of other dependencies it pulls in. The result is this is a modified copy of
that code that still leverages the core SHA algorithms.
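
For reference, the wrapper API added here is used along these lines (a sketch
only; the buffer name is made up, and in the real stub the resulting digest is
written into the TPM event log rather than discarded):

/*
 * Sketch of how the early SHA-1 wrappers are meant to be called from the
 * compressed-kernel stub. struct sha1_state and SHA1_DIGEST_SIZE are
 * assumed to come from the crypto sha1 headers pulled in by early_sha1.h;
 * the real caller measures things like boot params and the command line.
 */
#include <linux/types.h>
#include "early_sha1.h"

static void example_measure(const u8 *buf, unsigned int len)
{
	struct sha1_state sctx;
	u8 digest[SHA1_DIGEST_SIZE];

	early_sha1_init(&sctx);
	early_sha1_update(&sctx, buf, len);
	early_sha1_final(&sctx, digest);

	/* digest[] now holds the measurement to be logged */
}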

Signed-off-by: Daniel P. Smith 
Signed-off-by: Ross Philipson 
---
 arch/x86/boot/compressed/Makefile   |   2 +
 arch/x86/boot/compressed/early_sha1.c   | 103 
 arch/x86/boot/compressed/early_sha1.h   |  17 ++
 arch/x86/boot/compressed/early_sha256.c |   7 +++
 lib/crypto/sha256.c |   8 +++
 lib/sha1.c  |   4 ++
 6 files changed, 141 insertions(+)
 create mode 100644 arch/x86/boot/compressed/early_sha1.c
 create mode 100644 arch/x86/boot/compressed/early_sha1.h
 create mode 100644 arch/x86/boot/compressed/early_sha256.c

diff --git a/arch/x86/boot/compressed/Makefile 
b/arch/x86/boot/compressed/Makefile
index 431bf7f..059d49a 100644
--- a/arch/x86/boot/compressed/Makefile
+++ b/arch/x86/boot/compressed/Makefile
@@ -102,6 +102,8 @@ vmlinux-objs-$(CONFIG_ACPI) += $(obj)/acpi.o
 vmlinux-objs-$(CONFIG_EFI_MIXED) += $(obj)/efi_thunk_$(BITS).o
 efi-obj-$(CONFIG_EFI_STUB) = $(objtree)/drivers/firmware/efi/libstub/lib.a
 
+vmlinux-objs-$(CONFIG_SECURE_LAUNCH) += $(obj)/early_sha1.o 
$(obj)/early_sha256.o
+
 $(obj)/vmlinux: $(vmlinux-objs-y) $(efi-obj-y) FORCE
$(call if_changed,ld)
 
diff --git a/arch/x86/boot/compressed/early_sha1.c 
b/arch/x86/boot/compressed/early_sha1.c
new file mode 100644
index 000..74f4654
--- /dev/null
+++ b/arch/x86/boot/compressed/early_sha1.c
@@ -0,0 +1,103 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2021 Apertus Solutions, LLC.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "early_sha1.h"
+
+#define SHA1_DISABLE_EXPORT
+#include "../../../../lib/sha1.c"
+
+/* The SHA1 implementation in lib/sha1.c was written to get the workspace
+ * buffer as a parameter. This wrapper function provides a container
+ * around a temporary workspace that is cleared after the transform completes.
+ */
+static void __sha_transform(u32 *digest, const char *data)
+{
+   u32 ws[SHA1_WORKSPACE_WORDS];
+
+   sha1_transform(digest, data, ws);
+
+   memset(ws, 0, sizeof(ws));
+   /*
+* As this is cryptographic code, prevent the memset 0 from being
+* optimized out potentially leaving secrets in memory.
+*/
+   wmb();
+
+}
+
+void early_sha1_init(struct sha1_state *sctx)
+{
+   sha1_init(sctx->state);
+   sctx->count = 0;
+}
+
+void early_sha1_update(struct sha1_state *sctx,
+  const u8 *data,
+  unsigned int len)
+{
+   unsigned int partial = sctx->count % SHA1_BLOCK_SIZE;
+
+   sctx->count += len;
+
+   if (likely((partial + len) >= SHA1_BLOCK_SIZE)) {
+   int blocks;
+
+   if (partial) {
+   int p = SHA1_BLOCK_SIZE - partial;
+
+   memcpy(sctx->buffer + partial, data, p);
+   data += p;
+   len -= p;
+
+   __sha_transform(sctx->state, sctx->buffer);
+   }
+
+   blocks = len / SHA1_BLOCK_SIZE;
+   len %= SHA1_BLOCK_SIZE;
+
+   if (blocks) {
+   while (blocks--) {
+   __sha_transform(sctx->state, data);
+   data += SHA1_BLOCK_SIZE;
+   }
+   }
+   partial = 0;
+   }
+
+   if (len)
+   memcpy(sctx->buffer + partial, data, len);
+}
+
+void early_sha1_final(struct sha1_state *sctx, u8 *out)
+{
+   const int bit_offset = SHA1_BLOCK_SIZE - sizeof(__be64);
+   __be64 *bits = (__be64 *)(sctx->buffer + bit_offset);
+   __be32 *digest = (__be32 *)out;
+   unsigned int partial = sctx->count % SHA1_BLOCK_SIZE;
+   int i;
+
+   sctx->buffer[partial++] = 0x80;
+   if (partial > bit_offset) {
+   memset(sctx->buffer + partial, 0x0, SHA1_BLOCK_SIZE - partial);
+   partial = 0;
+
+   __sha_transform(sctx->state, sctx->buffer);
+   }
+
+   memset(sctx->buffer + partial, 0x0, bit_offset - partial);
+   *bits = cpu_to_be64(sctx->count << 3);
+   __sha_transform(sctx->state, sctx->buffer);
+
+   for (i = 0; i < SHA1_DIGEST_SIZE / sizeof(__be32); i++)
+   put_unaligned_be32(sctx->state[i], digest++);
+
+   *sctx 

[PATCH v3 09/14] x86: Secure Launch kernel late boot stub

2021-08-09 Thread Ross Philipson
The routine slaunch_setup is called out of the x86 specific setup_arch
routine during early kernel boot. After determining what platform is
present, various operations specific to that platform occur. This
includes finalizing settings for the platform's late launch and verifying
that memory protections are in place.

For TXT, this code also reserves the original compressed kernel setup
area where the APs were left looping so that this memory cannot be used.
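
As a rough illustration (not code from this patch), the platform check boils
down to confirming that the launch actually completed, e.g. that the
SENTER-done status bit is set in the TXT public register space. The real
slaunch_setup_txt() performs this plus considerably more validation:

/*
 * Minimal sketch only. TXT_PUB_CONFIG_REGS_BASE, TXT_NR_CONFIG_PAGES,
 * TXT_CR_STS and TXT_SENTER_DONE_STS are taken from the Secure Launch
 * header; the function name and the early_ioremap-based flow are
 * assumptions of this example.
 */
static bool __init example_txt_launch_done(void)
{
	void __iomem *txt;
	u64 sts;

	txt = early_ioremap(TXT_PUB_CONFIG_REGS_BASE,
			    TXT_NR_CONFIG_PAGES * PAGE_SIZE);
	if (!txt)
		return false;

	memcpy_fromio(&sts, txt + TXT_CR_STS, sizeof(sts));
	early_iounmap(txt, TXT_NR_CONFIG_PAGES * PAGE_SIZE);

	return sts & TXT_SENTER_DONE_STS;
}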

Signed-off-by: Ross Philipson 
---
 arch/x86/kernel/Makefile   |   1 +
 arch/x86/kernel/setup.c|   3 +
 arch/x86/kernel/slaunch.c  | 460 +
 drivers/iommu/intel/dmar.c |   4 +
 4 files changed, 468 insertions(+)
 create mode 100644 arch/x86/kernel/slaunch.c

diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 3e625c6..d6ee904 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -80,6 +80,7 @@ obj-$(CONFIG_X86_32)  += tls.o
 obj-$(CONFIG_IA32_EMULATION)   += tls.o
 obj-y  += step.o
 obj-$(CONFIG_INTEL_TXT)+= tboot.o
+obj-$(CONFIG_SECURE_LAUNCH)+= slaunch.o
 obj-$(CONFIG_ISA_DMA_API)  += i8237.o
 obj-$(CONFIG_STACKTRACE)   += stacktrace.o
 obj-y  += cpu/
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 055a834..482bd76 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -976,6 +977,8 @@ void __init setup_arch(char **cmdline_p)
early_gart_iommu_check();
 #endif
 
+   slaunch_setup_txt();
+
/*
 * partially used pages are not usable - thus
 * we are rounding upwards:
diff --git a/arch/x86/kernel/slaunch.c b/arch/x86/kernel/slaunch.c
new file mode 100644
index 000..f91f0b5
--- /dev/null
+++ b/arch/x86/kernel/slaunch.c
@@ -0,0 +1,460 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Secure Launch late validation/setup and finalization support.
+ *
+ * Copyright (c) 2021, Oracle and/or its affiliates.
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+static u32 sl_flags;
+static struct sl_ap_wake_info ap_wake_info;
+static u64 evtlog_addr;
+static u32 evtlog_size;
+static u64 vtd_pmr_lo_size;
+
+/* This should be plenty of room */
+static u8 txt_dmar[PAGE_SIZE] __aligned(16);
+
+u32 slaunch_get_flags(void)
+{
+   return sl_flags;
+}
+EXPORT_SYMBOL(slaunch_get_flags);
+
+struct sl_ap_wake_info *slaunch_get_ap_wake_info(void)
+{
   return &ap_wake_info;
+}
+
+struct acpi_table_header *slaunch_get_dmar_table(struct acpi_table_header 
*dmar)
+{
+   /* The DMAR is only stashed and provided via TXT on Intel systems */
+   if (memcmp(txt_dmar, "DMAR", 4))
+   return dmar;
+
   return (struct acpi_table_header *)(&txt_dmar[0]);
+}
+
+void __noreturn slaunch_txt_reset(void __iomem *txt,
+ const char *msg, u64 error)
+{
+   u64 one = 1, val;
+
+   pr_err("%s", msg);
+
+   /*
+* This performs a TXT reset with a sticky error code. The reads of
+* TXT_CR_E2STS act as barriers.
+*/
+   memcpy_toio(txt + TXT_CR_ERRORCODE, &error, sizeof(error));
+   memcpy_fromio(&val, txt + TXT_CR_E2STS, sizeof(val));
+   memcpy_toio(txt + TXT_CR_CMD_NO_SECRETS, &one, sizeof(one));
+   memcpy_fromio(&val, txt + TXT_CR_E2STS, sizeof(val));
+   memcpy_toio(txt + TXT_CR_CMD_UNLOCK_MEM_CONFIG, &one, sizeof(one));
+   memcpy_fromio(&val, txt + TXT_CR_E2STS, sizeof(val));
+   memcpy_toio(txt + TXT_CR_CMD_RESET, &one, sizeof(one));
+
+   for ( ; ; )
+   asm volatile ("hlt");
+
+   unreachable();
+}
+
+/*
+ * The TXT heap is too big to map all at once with early_ioremap
+ * so it is done a table at a time.
+ */
+static void __init *txt_early_get_heap_table(void __iomem *txt, u32 type,
+u32 bytes)
+{
+   void *heap;
+   u64 base, size, offset = 0;
+   int i;
+
+   if (type > TXT_SINIT_TABLE_MAX)
+   slaunch_txt_reset(txt,
+   "Error invalid table type for early heap walk\n",
+   SL_ERROR_HEAP_WALK);
+
+   memcpy_fromio(&base, txt + TXT_CR_HEAP_BASE, sizeof(base));
+   memcpy_fromio(&size, txt + TXT_CR_HEAP_SIZE, sizeof(size));
+
+   /* Iterate over heap tables looking for table of "type" */
+   for (i = 0; i < type; i++) {
+   base += offset;
+   heap = early_memremap(base, sizeof(u64));
+   if (!heap)
+   slaunch_txt_reset(txt,
+   "Error early_memremap of heap for heap walk\n",
+   SL_ERROR_HEAP_MAP);
+
+   offset = *((u64 *)heap);
+
+   /*
+* After the 

[PATCH v3 04/14] Documentation/x86: Secure Launch kernel documentation

2021-08-09 Thread Ross Philipson
Introduce background, overview and configuration/ABI information
for the Secure Launch kernel feature.

Signed-off-by: Daniel P. Smith 
Signed-off-by: Ross Philipson 
---
 Documentation/x86/index.rst |   1 +
 Documentation/x86/secure-launch.rst | 714 
 2 files changed, 715 insertions(+)
 create mode 100644 Documentation/x86/secure-launch.rst

diff --git a/Documentation/x86/index.rst b/Documentation/x86/index.rst
index 3830483..e5a058f 100644
--- a/Documentation/x86/index.rst
+++ b/Documentation/x86/index.rst
@@ -31,6 +31,7 @@ x86-specific Documentation
tsx_async_abort
buslock
usb-legacy-support
+   secure-launch
i386/index
x86_64/index
sva
diff --git a/Documentation/x86/secure-launch.rst 
b/Documentation/x86/secure-launch.rst
new file mode 100644
index 000..cc5995c6
--- /dev/null
+++ b/Documentation/x86/secure-launch.rst
@@ -0,0 +1,714 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=
+Secure Launch
+=
+
+Background
+==
+
+The Trusted Computing Group (TCG) architecture defines two methods in
+which the target operating system is started, aka launched, on a system
+for which Intel and AMD provides implementations. These two launch types
+are static launch and dynamic launch. Static launch is referred to as
+such because it happens at one fixed point, at system startup, during
+the defined life-cycle of a system. Dynamic launch is referred to as
+such because it is not limited to being done once nor bound to system
+startup. It can in fact happen at any time without incurring/requiring an
+associated power event for the system. Since dynamic launch can happen
+at any time, this results in dynamic launch being split into two types of
+its own. The first is referred to as an early launch, where the dynamic
+launch is done in conjunction with the static launch of the system. The
+second is referred to as a late launch, where a dynamic launch is
+initiated after the static launch was fully completed and the system was
+under the control of some target operating system or run-time kernel.
+These two top-level launch methods, static launch and dynamic launch
+provide different models for establishing the launch integrity, i.e. the
+load-time integrity, of the target operating system. When cryptographic
+hashing is used to create an integrity assessment for the launch
+integrity, then for a static launch this is referred to as the Static
+Root of Trust for Measurement (SRTM) and for dynamic launch it is
+referred to as the Dynamic Root of Trust for Measurement (DRTM).
+
+The reasoning for needing the two integrity models is driven by the fact
+that these models leverage what is referred to as a "transitive trust".
+A transitive trust is commonly referred to as a "trust chain", which is
+created through the process of an entity making an integrity assessment
+of another entity and upon success transfers control to the new entity.
+This process is then repeated by each entity until the Trusted Computing
+Base (TCB) of the system has been established. A challenge for transitive
+trust is that the process is susceptible to cumulative error
+and in this case that error is inaccurate or improper integrity
+assessments. The way to address cumulative error is to reduce the
+number of instances that can create error involved in the process. In
+this case it would be to reduce the number of entities involved in the
+transitive trust. It is not possible to reduce the number of firmware
+components or the boot loader(s) involved during static launch. This is
+where dynamic launch comes in, as it introduces the concept for a CPU to
+provide an instruction that results in a transitive trust starting with
+the CPU doing an integrity assessment of a special loader that can then
+start a target operating system. This reduces the trust chain down to
+the CPU, a special loader, and the target operating system. It is also
+why it is said that the DRTM is rooted in hardware since the CPU is what
+does the first integrity assessment, i.e. the first measurement, in the
+trust chain.
+
+Overview
+
+
+Prior to the start of the TrenchBoot project, the only active Open
+Source project supporting dynamic launch was Intel's Tboot project to
+support their implementation of dynamic launch known as Intel Trusted
+eXecution Technology (TXT). The approach taken by Tboot was to provide
+an exokernel that could handle the launch protocol implemented by
+Intel's special loader, the SINIT Authenticated Code Module (ACM [3]_)
+and remained in memory to manage the SMX CPU mode that a dynamic launch
+would put a system in. While it is not precluded from being used for doing
+a late launch, Tboot's primary use case was to be used as an early
+launch solution. As a result the TrenchBoot project started the
+development of the Secure Launch kernel feature to provide a more
+generalized approach. The focus of the effort is twofold: the first is to
+make the Linux 

[PATCH v3 14/14] tpm: Allow locality 2 to be set when initializing the TPM for Secure Launch

2021-08-09 Thread Ross Philipson
The Secure Launch MLE environment uses PCRs that are only accessible from
the DRTM locality 2. By default the TPM drivers always initialize the
locality to 0. When a Secure Launch is in progress, initialize the
locality to 2.

Signed-off-by: Ross Philipson 
---
 drivers/char/tpm/tpm-chip.c | 13 +++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/drivers/char/tpm/tpm-chip.c b/drivers/char/tpm/tpm-chip.c
index ddaeceb..48b9351 100644
--- a/drivers/char/tpm/tpm-chip.c
+++ b/drivers/char/tpm/tpm-chip.c
@@ -23,6 +23,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "tpm.h"
 
 DEFINE_IDR(dev_nums_idr);
@@ -34,12 +35,20 @@
 
 static int tpm_request_locality(struct tpm_chip *chip)
 {
-   int rc;
+   int rc, locality;
 
if (!chip->ops->request_locality)
return 0;
 
-   rc = chip->ops->request_locality(chip, 0);
+   if (slaunch_get_flags() & SL_FLAG_ACTIVE) {
   dev_dbg(&chip->dev, "setting TPM locality to 2 for MLE\n");
+   locality = 2;
+   } else {
   dev_dbg(&chip->dev, "setting TPM locality to 0\n");
+   locality = 0;
+   }
+
+   rc = chip->ops->request_locality(chip, locality);
if (rc < 0)
return rc;
 
-- 
1.8.3.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 11/14] kexec: Secure Launch kexec SEXIT support

2021-08-09 Thread Ross Philipson
Prior to running the next kernel via kexec, the Secure Launch code
closes down private SMX resources and does an SEXIT. This allows the
next kernel to start normally without any issues starting the APs etc.

Signed-off-by: Ross Philipson 
---
 arch/x86/kernel/slaunch.c | 71 +++
 kernel/kexec_core.c   |  4 +++
 2 files changed, 75 insertions(+)

diff --git a/arch/x86/kernel/slaunch.c b/arch/x86/kernel/slaunch.c
index f91f0b5..60a193a 100644
--- a/arch/x86/kernel/slaunch.c
+++ b/arch/x86/kernel/slaunch.c
@@ -458,3 +458,74 @@ void __init slaunch_setup_txt(void)
 
pr_info("Intel TXT setup complete\n");
 }
+
+static inline void smx_getsec_sexit(void)
+{
+   asm volatile (".byte 0x0f,0x37\n"
+ : : "a" (SMX_X86_GETSEC_SEXIT));
+}
+
+void slaunch_finalize(int do_sexit)
+{
+   void __iomem *config;
+   u64 one = TXT_REGVALUE_ONE, val;
+
+   if ((slaunch_get_flags() & (SL_FLAG_ACTIVE|SL_FLAG_ARCH_TXT)) !=
+   (SL_FLAG_ACTIVE|SL_FLAG_ARCH_TXT))
+   return;
+
+   config = ioremap(TXT_PRIV_CONFIG_REGS_BASE, TXT_NR_CONFIG_PAGES *
+PAGE_SIZE);
+   if (!config) {
+   pr_emerg("Error SEXIT failed to ioremap TXT private regs\n");
+   return;
+   }
+
+   /* Clear secrets bit for SEXIT */
+   memcpy_toio(config + TXT_CR_CMD_NO_SECRETS, &one, sizeof(one));
+   memcpy_fromio(&val, config + TXT_CR_E2STS, sizeof(val));
+
+   /* Unlock memory configurations */
+   memcpy_toio(config + TXT_CR_CMD_UNLOCK_MEM_CONFIG, &one, sizeof(one));
+   memcpy_fromio(&val, config + TXT_CR_E2STS, sizeof(val));
+
+   /* Close the TXT private register space */
+   memcpy_toio(config + TXT_CR_CMD_CLOSE_PRIVATE, &one, sizeof(one));
+   memcpy_fromio(&val, config + TXT_CR_E2STS, sizeof(val));
+
+   /*
+* Calls to iounmap are not being done because of the state of the
+* system this late in the kexec process. Local IRQs are disabled and
+* iounmap causes a TLB flush which in turn causes a warning. Leaving
+* these mappings is not an issue since the next kernel is going to
+* completely re-setup memory management.
+*/
+
+   /* Map public registers and do a final read fence */
+   config = ioremap(TXT_PUB_CONFIG_REGS_BASE, TXT_NR_CONFIG_PAGES *
+PAGE_SIZE);
+   if (!config) {
+   pr_emerg("Error SEXIT failed to ioremap TXT public regs\n");
+   return;
+   }
+
+   memcpy_fromio(&val, config + TXT_CR_E2STS, sizeof(val));
+
+   pr_emerg("TXT clear secrets bit and unlock memory complete.");
+
+   if (!do_sexit)
+   return;
+
+   if (smp_processor_id() != 0) {
+   pr_emerg("Error TXT SEXIT must be called on CPU 0\n");
+   return;
+   }
+
+   /* Enable SMX mode (CR4.SMXE) so the GETSEC[SEXIT] leaf can execute */
+   cr4_set_bits(X86_CR4_SMXE);
+
+   /* Do the SEXIT SMX operation */
+   smx_getsec_sexit();
+
+   pr_emerg("TXT SEXIT complete.");
+}
diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index 4b34a9a..fdf0a27 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -39,6 +39,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -1179,6 +1180,9 @@ int kernel_kexec(void)
cpu_hotplug_enable();
pr_notice("Starting new kernel\n");
machine_shutdown();
+
+   /* Finalize TXT registers and do SEXIT */
+   slaunch_finalize(1);
}
 
kmsg_dump(KMSG_DUMP_SHUTDOWN);
-- 
1.8.3.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 05/14] x86: Secure Launch Kconfig

2021-08-09 Thread Ross Philipson
Initial bits to bring in Secure Launch functionality. Add Kconfig
options for compiling in/out the Secure Launch code.

Signed-off-by: Ross Philipson 
---
 arch/x86/Kconfig | 32 
 1 file changed, 32 insertions(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 88fb922..b5e25c5 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1949,6 +1949,38 @@ config EFI_MIXED
 
   If unsure, say N.
 
+config SECURE_LAUNCH
+   bool "Secure Launch support"
+   default n
+   depends on X86_64 && X86_X2APIC
+   help
+  The Secure Launch feature allows a kernel to be loaded
+  directly through an Intel TXT measured launch. Intel TXT
+  establishes a Dynamic Root of Trust for Measurement (DRTM)
+  where the CPU measures the kernel image. This feature then
+  continues the measurement chain over kernel configuration
+  information and init images.
+
+config SECURE_LAUNCH_ALT_PCR19
+   bool "Secure Launch Alternate PCR 19 usage"
+   default n
+   depends on SECURE_LAUNCH
+   help
+  In the post ACM environment, Secure Launch by default measures
+  configuration information into PCR18. This feature allows finer
+  control over measurements by moving configuration measurements
+  into PCR19.
+
+config SECURE_LAUNCH_ALT_PCR20
+   bool "Secure Launch Alternate PCR 20 usage"
+   default n
+   depends on SECURE_LAUNCH
+   help
+  In the post ACM environment, Secure Launch by default measures
+  image data like any external initrd into PCR17. This feature
+  allows finer control over measurements by moving image measurements
+  into PCR20.
+
 source "kernel/Kconfig.hz"
 
 config KEXEC
-- 
1.8.3.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 13/14] x86: Secure Launch late initcall platform module

2021-08-09 Thread Ross Philipson
From: "Daniel P. Smith" 

The Secure Launch platform module is a late init module. During the
init call, the TPM event log is read and measurements taken in the
early boot stub code are located. These measurements are extended
into the TPM PCRs using the mainline TPM kernel driver.

The platform module also registers the securityfs nodes to allow
access to TXT register fields on Intel, along with fetching from
and writing events to the late launch TPM log.
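
For orientation only, extending one measurement into a DRTM PCR through the
mainline driver boils down to something like the sketch below. tpm_default_chip()
and tpm_pcr_extend() are existing kernel interfaces; the function name and the
digest handling are simplifications, not the module code itself:

/*
 * Simplified sketch, not the module code. Assumes the digest was already
 * read out of the event log in the algorithm of one allocated PCR bank.
 */
#include <linux/tpm.h>
#include <linux/slab.h>
#include <linux/string.h>
#include <linux/device.h>

static int example_extend_pcr(u32 pcr, u16 alg_id, const u8 *hash, size_t len)
{
	struct tpm_chip *chip = tpm_default_chip();
	struct tpm_digest *digests;
	int i, ret;

	if (!chip)
		return -ENODEV;

	/* tpm_pcr_extend() wants one entry per allocated PCR bank */
	digests = kcalloc(chip->nr_allocated_banks, sizeof(*digests), GFP_KERNEL);
	if (!digests) {
		ret = -ENOMEM;
		goto out;
	}

	for (i = 0; i < chip->nr_allocated_banks; i++) {
		digests[i].alg_id = chip->allocated_banks[i].alg_id;
		if (digests[i].alg_id == alg_id)
			memcpy(digests[i].digest, hash, len);
	}

	ret = tpm_pcr_extend(chip, pcr, digests);
	kfree(digests);
out:
	put_device(&chip->dev);
	return ret;
}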

Signed-off-by: Daniel P. Smith 
Signed-off-by: garnetgrimm 
Signed-off-by: Ross Philipson 
---
 arch/x86/kernel/Makefile   |   1 +
 arch/x86/kernel/slmodule.c | 495 +
 2 files changed, 496 insertions(+)
 create mode 100644 arch/x86/kernel/slmodule.c

diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index d6ee904..09b730a 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -81,6 +81,7 @@ obj-$(CONFIG_IA32_EMULATION)  += tls.o
 obj-y  += step.o
 obj-$(CONFIG_INTEL_TXT)+= tboot.o
 obj-$(CONFIG_SECURE_LAUNCH)+= slaunch.o
+obj-$(CONFIG_SECURE_LAUNCH)+= slmodule.o
 obj-$(CONFIG_ISA_DMA_API)  += i8237.o
 obj-$(CONFIG_STACKTRACE)   += stacktrace.o
 obj-y  += cpu/
diff --git a/arch/x86/kernel/slmodule.c b/arch/x86/kernel/slmodule.c
new file mode 100644
index 000..807f9ca
--- /dev/null
+++ b/arch/x86/kernel/slmodule.c
@@ -0,0 +1,495 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Secure Launch late validation/setup, securityfs exposure and
+ * finalization support.
+ *
+ * Copyright (c) 2021 Apertus Solutions, LLC
+ * Copyright (c) 2021 Assured Information Security, Inc.
+ * Copyright (c) 2021, Oracle and/or its affiliates.
+ *
+ * Author(s):
+ * Daniel P. Smith 
+ * Garnet T. Grimm 
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define SL_FS_ENTRIES  10
+/* root directory node must be last */
+#define SL_ROOT_DIR_ENTRY  (SL_FS_ENTRIES - 1)
+#define SL_TXT_DIR_ENTRY   (SL_FS_ENTRIES - 2)
+#define SL_TXT_FILE_FIRST  (SL_TXT_DIR_ENTRY - 1)
+#define SL_TXT_ENTRY_COUNT 7
+
+#define DECLARE_TXT_PUB_READ_U(size, fmt, msg_size)\
+static ssize_t txt_pub_read_u##size(unsigned int offset,   \
+   loff_t *read_offset,\
+   size_t read_len,\
+   char __user *buf)   \
+{  \
+   void __iomem *txt;  \
+   char msg_buffer[msg_size];  \
+   u##size reg_value = 0;  \
+   txt = ioremap(TXT_PUB_CONFIG_REGS_BASE, \
+   TXT_NR_CONFIG_PAGES * PAGE_SIZE);   \
+   if (IS_ERR(txt))\
+   return PTR_ERR(txt);\
+   memcpy_fromio(&reg_value, txt + offset, sizeof(u##size));   \
+   iounmap(txt);   \
+   snprintf(msg_buffer, msg_size, fmt, reg_value); \
+   return simple_read_from_buffer(buf, read_len, read_offset,  \
+   &msg_buffer, msg_size); \
+}
+
+DECLARE_TXT_PUB_READ_U(8, "%#04x\n", 6);
+DECLARE_TXT_PUB_READ_U(32, "%#010x\n", 12);
+DECLARE_TXT_PUB_READ_U(64, "%#018llx\n", 20);
+
+#define DECLARE_TXT_FOPS(reg_name, reg_offset, reg_size)   \
+static ssize_t txt_##reg_name##_read(struct file *flip,
\
+   char __user *buf, size_t read_len, loff_t *read_offset) \
+{  \
+   return txt_pub_read_u##reg_size(reg_offset, read_offset,\
+   read_len, buf); \
+}  \
+static const struct file_operations reg_name##_ops = { \
+   .read = txt_##reg_name##_read,  \
+}
+
+DECLARE_TXT_FOPS(sts, TXT_CR_STS, 64);
+DECLARE_TXT_FOPS(ests, TXT_CR_ESTS, 8);
+DECLARE_TXT_FOPS(errorcode, TXT_CR_ERRORCODE, 32);
+DECLARE_TXT_FOPS(didvid, TXT_CR_DIDVID, 64);
+DECLARE_TXT_FOPS(e2sts, TXT_CR_E2STS, 64);
+DECLARE_TXT_FOPS(ver_emif, TXT_CR_VER_EMIF, 32);
+DECLARE_TXT_FOPS(scratchpad, TXT_CR_SCRATCHPAD, 64);
+
+/*
+ * Securityfs exposure
+ */
+struct memfile {
+   char *name;
+   void *addr;
+   size_t size;
+};
+
+static struct memfile sl_evtlog = {"eventlog", 0, 0};
+static void *txt_heap;
+static struct 

[PATCH v3 02/14] x86/boot: Add missing handling of setup_indirect structures

2021-08-09 Thread Ross Philipson
One of the two functions in ioremap.c that handle setup_data was
missing the correct handling of setup_indirect structures.

Functionality missing from original commit:

commit b3c72fc9a78e (x86/boot: Introduce setup_indirect)
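
For context, the two structures involved are reproduced here for reference,
as defined in arch/x86/include/uapi/asm/bootparam.h (field comments are
paraphrased). When setup_data.type is SETUP_INDIRECT, data[] holds a struct
setup_indirect and the real payload lives out of line at .addr/.len, which is
what the mapping code has to account for:

struct setup_data {
	__u64 next;	/* physical address of the next node, 0 terminates */
	__u32 type;
	__u32 len;	/* length of data[] */
	__u8  data[];
};

struct setup_indirect {
	__u32 type;	/* real type of the out-of-line payload */
	__u32 reserved;	/* must be zero */
	__u64 len;	/* length of the payload */
	__u64 addr;	/* physical address of the payload */
};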

Signed-off-by: Ross Philipson 
---
 arch/x86/mm/ioremap.c | 21 +++--
 1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c
index ab74e4f..f2b34c5 100644
--- a/arch/x86/mm/ioremap.c
+++ b/arch/x86/mm/ioremap.c
@@ -669,17 +669,34 @@ static bool __init 
early_memremap_is_setup_data(resource_size_t phys_addr,
 
paddr = boot_params.hdr.setup_data;
while (paddr) {
-   unsigned int len;
+   unsigned int len, size;
 
if (phys_addr == paddr)
return true;
 
data = early_memremap_decrypted(paddr, sizeof(*data));
+   size = sizeof(*data);
 
paddr_next = data->next;
len = data->len;
 
-   early_memunmap(data, sizeof(*data));
+   if ((phys_addr > paddr) && (phys_addr < (paddr + len))) {
+   early_memunmap(data, sizeof(*data));
+   return true;
+   }
+
+   if (data->type == SETUP_INDIRECT) {
+   size += len;
+   early_memunmap(data, sizeof(*data));
+   data = early_memremap_decrypted(paddr, size);
+
+   if (((struct setup_indirect *)data->data)->type != 
SETUP_INDIRECT) {
+   paddr = ((struct setup_indirect 
*)data->data)->addr;
+   len = ((struct setup_indirect 
*)data->data)->len;
+   }
+   }
+
+   early_memunmap(data, size);
 
if ((phys_addr > paddr) && (phys_addr < (paddr + len)))
return true;
-- 
1.8.3.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 00/14] x86: Trenchboot secure dynamic launch Linux kernel support

2021-08-09 Thread Ross Philipson
The focus of the TrenchBoot project (https://github.com/TrenchBoot) is to
enhance boot security and integrity. This requires the Linux kernel
to be directly invoked by the x86 dynamic launch to establish the
Dynamic Root of Trust for Measurement (DRTM). The dynamic launch will
be initiated by a boot loader with associated support added to it, for
example the first targeted boot loader will be GRUB2. An integral part of
establishing the DRTM involves measuring everything that is intended to
be run (kernel image, initrd, etc) and everything that will configure
that kernel to run (command line, boot params, etc) into specific PCRs,
the DRTM PCRs (17-22), in the TPM. Another key aspect is the dynamic
launch is rooted in hardware, that is to say the hardware (CPU) is what
takes the first measurement for the chain of integrity measurements. On
Intel this is done using the GETSEC instruction provided by Intel's TXT
and the SKINIT instruction provided by AMD's AMD-V. Information on these
technologies can be readily found online. This patchset introduces Intel
TXT support.

To enable the kernel to be launched by GETSEC, a stub must be built
into the setup section of the compressed kernel to handle the specific
state that the dynamic launch process leaves the BSP in. Also this stub
must measure everything that is going to be used as early as possible.
This stub code and subsequent code must also deal with the specific
state that the dynamic launch leaves the APs in.

A quick note on terminology. The larger open source project itself is
called Trenchboot, which is hosted on Github (links below). The kernel
feature enabling the use of the x86 technology is referred to as "Secure
Launch" within the kernel code. As such the prefixes sl_/SL_ or
slaunch/SLAUNCH will be seen in the code. The stub code discussed above
is referred to as the SL stub.

The new feature starts with patch #4. There are several preceding patches
before that. Patches 1 and 2 are fixes to an earlier patch set that
introduced the x86 setup_data type setup_indirect. Patch 3 was authored
by Arvind Sankar. There is no further status on this patch at this point but
Secure Launch depends on it so it is included with the set.

The basic flow is:

 - Entry from the dynamic launch jumps to the SL stub
 - SL stub fixes up the world on the BSP
 - For TXT, SL stub wakes the APs, fixes up their worlds
 - For TXT, APs are left halted waiting for an NMI to wake them
 - SL stub jumps to startup_32
 - SL main locates the TPM event log and writes the measurements of
   configuration and module information into it.
 - Kernel boot proceeds normally from this point.
 - During early setup, slaunch_setup() runs to finish some validation
   and setup tasks.
 - The SMP bringup code is modified to wake the waiting APs. APs vector
   to rmpiggy and start up normally from that point.
 - SL platform module is registered as a late initcall module. It reads
   the TPM event log and extends the measurements taken into the TPM PCRs.
 - SL platform module initializes the securityfs interface to allow
   access to the TPM event log and TXT public registers.
 - Kernel boot finishes booting normally
 - SEXIT support to leave SMX mode is present on the kexec path and
   the various reboot paths (poweroff, reset, halt).

Links:

The Trenchboot project including documentation:

https://github.com/trenchboot

Intel TXT is documented in its own specification and in the SDM Instruction Set 
volume:

https://www.intel.com/content/dam/www/public/us/en/documents/guides/intel-txt-software-development-guide.pdf
https://software.intel.com/en-us/articles/intel-sdm

AMD SKINIT is documented in the System Programming manual:

https://www.amd.com/system/files/TechDocs/24593.pdf

GRUB2 pre-launch support patchset (WIP):

https://lists.gnu.org/archive/html/grub-devel/2020-05/msg00011.html

Thanks
Ross Philipson and Daniel P. Smith

Changes in v2:

 - Modified 32b entry code to prevent causing relocations in the compressed
   kernel.
 - Dropped patches for compressed kernel TPM PCR extender.
 - Modified event log code to insert log delimiter events and not rely
   on TPM access.
 - Stop extending PCRs in the early Secure Launch stub code.
 - Removed Kconfig options for hash algorithms and use the algorithms the
   ACM used.
 - Match Secure Launch measurement algorithm use to those reported in the
   TPM 2.0 event log.
 - Read the TPM events out of the TPM and extend them into the PCRs using
   the mainline TPM driver. This is done in the late initcall module.
 - Allow use of alternate PCR 19 and 20 for post ACM measurements.
 - Add Kconfig constraints needed by Secure Launch (disable KASLR
   and add x2apic dependency).
 - Fix testing of SL_FLAGS when determining if Secure Launch is active
   and the architecture is TXT.
 - Use SYM_DATA_START_LOCAL macros in early entry point code.
 - Security audit changes:
   - Validate buffers passed to MLE do not overlap the MLE and are
 properly laid out.
 

[PATCH v3 10/14] x86: Secure Launch SMP bringup support

2021-08-09 Thread Ross Philipson
On Intel, the APs are left in a well documented state after TXT performs
the late launch. Specifically they cannot have #INIT asserted on them so
a standard startup via INIT/SIPI/SIPI cannot be performed. Instead the
early SL stub code parked the APs in a pause/jmp loop waiting for an NMI.
The modified SMP boot code is called for the Secure Launch case. The
jump address for the RM piggy entry point is fixed up in the jump where
the APs are waiting and an NMI IPI is sent to the AP. The AP vectors to
the Secure Launch entry point in the RM piggy which mimics what the real
mode code would do and then jumps to the standard RM piggy protected mode
entry point.

Signed-off-by: Ross Philipson 
---
 arch/x86/include/asm/realmode.h  |  3 ++
 arch/x86/kernel/smpboot.c| 86 
 arch/x86/realmode/rm/header.S|  3 ++
 arch/x86/realmode/rm/trampoline_64.S | 37 
 4 files changed, 129 insertions(+)

diff --git a/arch/x86/include/asm/realmode.h b/arch/x86/include/asm/realmode.h
index 5db5d08..ef37bf1 100644
--- a/arch/x86/include/asm/realmode.h
+++ b/arch/x86/include/asm/realmode.h
@@ -37,6 +37,9 @@ struct real_mode_header {
 #ifdef CONFIG_X86_64
u32 machine_real_restart_seg;
 #endif
+#ifdef CONFIG_SECURE_LAUNCH
+   u32 sl_trampoline_start32;
+#endif
 };
 
 /* This must match data at realmode/rm/trampoline_{32,64}.S */
diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 9320285..aafe627 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -57,6 +57,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -1021,6 +1022,83 @@ int common_cpu_up(unsigned int cpu, struct task_struct 
*idle)
return 0;
 }
 
+#ifdef CONFIG_SECURE_LAUNCH
+
+static atomic_t first_ap_only = {1};
+
+/*
+ * Called to fix the long jump address for the waiting APs to vector to
+ * the correct startup location in the Secure Launch stub in the rmpiggy.
+ */
+static int
+slaunch_fixup_jump_vector(void)
+{
+   struct sl_ap_wake_info *ap_wake_info;
+   u32 *ap_jmp_ptr = NULL;
+
+   if (!atomic_dec_and_test(&first_ap_only))
+   return 0;
+
+   ap_wake_info = slaunch_get_ap_wake_info();
+
+   ap_jmp_ptr = (u32 *)__va(ap_wake_info->ap_wake_block +
+ap_wake_info->ap_jmp_offset);
+
+   *ap_jmp_ptr = real_mode_header->sl_trampoline_start32;
+
+   pr_info("TXT AP long jump address updated\n");
+
+   return 0;
+}
+
+/*
+ * TXT AP startup is quite different than normal. The APs cannot have #INIT
+ * asserted on them or receive SIPIs. The early Secure Launch code has parked
+ * the APs in a pause loop waiting to receive an NMI. This will wake the APs
+ * and have them jump to the protected mode code in the rmpiggy where the rest
+ * of the SMP boot of the AP will proceed normally.
+ */
+static int
+slaunch_wakeup_cpu_from_txt(int cpu, int apicid)
+{
+   unsigned long send_status = 0, accept_status = 0;
+
+   /* Only done once */
+   if (slaunch_fixup_jump_vector())
+   return -1;
+
+   /* Send NMI IPI to idling AP and wake it up */
+   apic_icr_write(APIC_DM_NMI, apicid);
+
+   if (init_udelay == 0)
+   udelay(10);
+   else
+   udelay(300);
+
+   send_status = safe_apic_wait_icr_idle();
+
+   if (init_udelay == 0)
+   udelay(10);
+   else
+   udelay(300);
+
+   accept_status = (apic_read(APIC_ESR) & 0xEF);
+
+   if (send_status)
+   pr_err("Secure Launch IPI never delivered???\n");
+   if (accept_status)
+   pr_err("Secure Launch IPI delivery error (%lx)\n",
+   accept_status);
+
+   return (send_status | accept_status);
+}
+
+#else
+
+#define slaunch_wakeup_cpu_from_txt(cpu, apicid)   0
+
+#endif  /* !CONFIG_SECURE_LAUNCH */
+
 /*
  * NOTE - on most systems this is a PHYSICAL apic ID, but on multiquad
  * (ie clustered apic addressing mode), this is a LOGICAL apic ID.
@@ -1075,6 +1153,13 @@ static int do_boot_cpu(int apicid, int cpu, struct 
task_struct *idle,
cpumask_clear_cpu(cpu, cpu_initialized_mask);
smp_mb();
 
+   /* With Intel TXT, the AP startup is totally different */
+   if ((slaunch_get_flags() & (SL_FLAG_ACTIVE|SL_FLAG_ARCH_TXT)) ==
+  (SL_FLAG_ACTIVE|SL_FLAG_ARCH_TXT)) {
+   boot_error = slaunch_wakeup_cpu_from_txt(cpu, apicid);
+   goto txt_wake;
+   }
+
/*
 * Wake up a CPU in difference cases:
 * - Use the method in the APIC driver if it's defined
@@ -1087,6 +1172,7 @@ static int do_boot_cpu(int apicid, int cpu, struct 
task_struct *idle,
boot_error = wakeup_cpu_via_init_nmi(cpu, start_ip, apicid,
 cpu0_nmi_registered);
 
+txt_wake:
if (!boot_error) {
/*
 * Wait 10s total for 

[PATCH v3 03/14] x86/boot: Place kernel_info at a fixed offset

2021-08-09 Thread Ross Philipson
From: Arvind Sankar 

There are use cases for storing the offset of a symbol in kernel_info.
For example, the trenchboot series [0] needs to store the offset of the
Measured Launch Environment header in kernel_info.

Since commit (note: commit ID from tip/master)

  527afc212231 ("x86/boot: Check that there are no run-time relocations")

run-time relocations are not allowed in the compressed kernel, so simply
using the symbol in kernel_info, as

.long   symbol

will cause a linker error because this is not position-independent.

With kernel_info being a separate object file and in a different section
from startup_32, there is no way to calculate the offset of a symbol
from the start of the image in a position-independent way.

To enable such use cases, put kernel_info into its own section which is
placed at a predetermined offset (KERNEL_INFO_OFFSET) via the linker
script. This will allow calculating the symbol offset in a
position-independent way, by adding the offset from the start of
kernel_info to KERNEL_INFO_OFFSET.

Ensure that kernel_info is aligned, and use the SYM_DATA.* macros
instead of bare labels. This stores the size of the kernel_info
structure in the ELF symbol table.

Signed-off-by: Arvind Sankar 
Cc: Ross Philipson 
Signed-off-by: Ross Philipson 
---
 arch/x86/boot/compressed/kernel_info.S | 19 +++
 arch/x86/boot/compressed/kernel_info.h | 12 
 arch/x86/boot/compressed/vmlinux.lds.S |  6 ++
 3 files changed, 33 insertions(+), 4 deletions(-)
 create mode 100644 arch/x86/boot/compressed/kernel_info.h

diff --git a/arch/x86/boot/compressed/kernel_info.S 
b/arch/x86/boot/compressed/kernel_info.S
index f818ee8..c18f071 100644
--- a/arch/x86/boot/compressed/kernel_info.S
+++ b/arch/x86/boot/compressed/kernel_info.S
@@ -1,12 +1,23 @@
 /* SPDX-License-Identifier: GPL-2.0 */
 
+#include 
 #include 
+#include "kernel_info.h"
 
-   .section ".rodata.kernel_info", "a"
+/*
+ * If a field needs to hold the offset of a symbol from the start
+ * of the image, use the macro below, eg
+ * .long   rva(symbol)
+ * This will avoid creating run-time relocations, which are not
+ * allowed in the compressed kernel.
+ */
+
+#define rva(X) (((X) - kernel_info) + KERNEL_INFO_OFFSET)
 
-   .global kernel_info
+   .section ".rodata.kernel_info", "a"
 
-kernel_info:
+   .balign 16
+SYM_DATA_START(kernel_info)
/* Header, Linux top (structure). */
.ascii  "LToP"
/* Size. */
@@ -19,4 +30,4 @@ kernel_info:
 
 kernel_info_var_len_data:
/* Empty for time being... */
-kernel_info_end:
+SYM_DATA_END_LABEL(kernel_info, SYM_L_LOCAL, kernel_info_end)
diff --git a/arch/x86/boot/compressed/kernel_info.h 
b/arch/x86/boot/compressed/kernel_info.h
new file mode 100644
index 000..c127f84
--- /dev/null
+++ b/arch/x86/boot/compressed/kernel_info.h
@@ -0,0 +1,12 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef BOOT_COMPRESSED_KERNEL_INFO_H
+#define BOOT_COMPRESSED_KERNEL_INFO_H
+
+#ifdef CONFIG_X86_64
+#define KERNEL_INFO_OFFSET 0x500
+#else /* 32-bit */
+#define KERNEL_INFO_OFFSET 0x100
+#endif
+
+#endif /* BOOT_COMPRESSED_KERNEL_INFO_H */
diff --git a/arch/x86/boot/compressed/vmlinux.lds.S 
b/arch/x86/boot/compressed/vmlinux.lds.S
index 112b237..84c7b4d 100644
--- a/arch/x86/boot/compressed/vmlinux.lds.S
+++ b/arch/x86/boot/compressed/vmlinux.lds.S
@@ -7,6 +7,7 @@ OUTPUT_FORMAT(CONFIG_OUTPUT_FORMAT)
 
 #include 
 #include 
+#include "kernel_info.h"
 
 #ifdef CONFIG_X86_64
 OUTPUT_ARCH(i386:x86-64)
@@ -27,6 +28,11 @@ SECTIONS
HEAD_TEXT
_ehead = . ;
}
+   .rodata.kernel_info KERNEL_INFO_OFFSET : {
+   *(.rodata.kernel_info)
+   }
+   ASSERT(ABSOLUTE(kernel_info) == KERNEL_INFO_OFFSET, "kernel_info at bad 
address!")
+
.rodata..compressed : {
*(.rodata..compressed)
}
-- 
1.8.3.1

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


[PATCH v3 01/14] x86/boot: Fix memremap of setup_indirect structures

2021-08-09 Thread Ross Philipson
As documented, the setup_indirect structure is nested inside
the setup_data structures in the setup_data list. The code was
accessing the fields inside the setup_indirect structure but only
the sizeof(struct setup_data) was being memremapped. No crash
occurred, but this is just due to how the area was remapped under the
covers.

The setup_indirect structure was introduced in commit:

commit b3c72fc9a78e (x86/boot: Introduce setup_indirect)
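
To make the sizing issue concrete, here is a minimal sketch of the corrected
access pattern (not the diff itself; the helper name and the memremap-based
flow are illustrative):

/*
 * Sketch only: map the setup_data header first, and only if it turns out
 * to be a SETUP_INDIRECT node widen the mapping to cover the embedded
 * struct setup_indirect before dereferencing it.
 */
#include <linux/io.h>
#include <asm/bootparam.h>

static u32 example_setup_data_len(u64 pa_data)
{
	struct setup_data *data;
	u32 len;

	data = memremap(pa_data, sizeof(*data), MEMREMAP_WB);
	if (!data)
		return 0;
	len = data->len;

	if (data->type == SETUP_INDIRECT) {
		/* widen the mapping so data->data[] is actually mapped */
		memunmap(data);
		data = memremap(pa_data, sizeof(*data) + len, MEMREMAP_WB);
		if (!data)
			return 0;
		if (((struct setup_indirect *)data->data)->type != SETUP_INDIRECT)
			len = ((struct setup_indirect *)data->data)->len;
	}

	memunmap(data);
	return len;
}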

Signed-off-by: Ross Philipson 
---
 arch/x86/kernel/e820.c | 31 -
 arch/x86/kernel/kdebugfs.c | 28 ---
 arch/x86/kernel/ksysfs.c   | 56 --
 arch/x86/kernel/setup.c| 23 +--
 arch/x86/mm/ioremap.c  | 13 +++
 5 files changed, 109 insertions(+), 42 deletions(-)

diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c
index bc0657f..e023950 100644
--- a/arch/x86/kernel/e820.c
+++ b/arch/x86/kernel/e820.c
@@ -996,7 +996,8 @@ static int __init parse_memmap_opt(char *str)
 void __init e820__reserve_setup_data(void)
 {
struct setup_data *data;
-   u64 pa_data;
+   u64 pa_data, pa_next;
+   u32 len;
 
pa_data = boot_params.hdr.setup_data;
if (!pa_data)
@@ -1004,6 +1005,9 @@ void __init e820__reserve_setup_data(void)
 
while (pa_data) {
data = early_memremap(pa_data, sizeof(*data));
+   len = sizeof(*data);
+   pa_next = data->next;
+
e820__range_update(pa_data, sizeof(*data)+data->len, 
E820_TYPE_RAM, E820_TYPE_RESERVED_KERN);
 
/*
@@ -1015,18 +1019,23 @@ void __init e820__reserve_setup_data(void)
 sizeof(*data) + data->len,
 E820_TYPE_RAM, 
E820_TYPE_RESERVED_KERN);
 
-   if (data->type == SETUP_INDIRECT &&
-   ((struct setup_indirect *)data->data)->type != 
SETUP_INDIRECT) {
-   e820__range_update(((struct setup_indirect 
*)data->data)->addr,
-  ((struct setup_indirect 
*)data->data)->len,
-  E820_TYPE_RAM, 
E820_TYPE_RESERVED_KERN);
-   e820__range_update_kexec(((struct setup_indirect 
*)data->data)->addr,
-((struct setup_indirect 
*)data->data)->len,
-E820_TYPE_RAM, 
E820_TYPE_RESERVED_KERN);
+   if (data->type == SETUP_INDIRECT) {
+   len += data->len;
+   early_memunmap(data, sizeof(*data));
+   data = early_memremap(pa_data, len);
+
+   if (((struct setup_indirect *)data->data)->type != 
SETUP_INDIRECT) {
+   e820__range_update(((struct setup_indirect 
*)data->data)->addr,
+  ((struct setup_indirect 
*)data->data)->len,
+  E820_TYPE_RAM, 
E820_TYPE_RESERVED_KERN);
+   e820__range_update_kexec(((struct 
setup_indirect *)data->data)->addr,
+((struct 
setup_indirect *)data->data)->len,
+E820_TYPE_RAM, 
E820_TYPE_RESERVED_KERN);
+   }
}
 
-   pa_data = data->next;
-   early_memunmap(data, sizeof(*data));
+   pa_data = pa_next;
+   early_memunmap(data, len);
}
 
e820__update_table(e820_table);
diff --git a/arch/x86/kernel/kdebugfs.c b/arch/x86/kernel/kdebugfs.c
index 64b6da9..2fa1b93 100644
--- a/arch/x86/kernel/kdebugfs.c
+++ b/arch/x86/kernel/kdebugfs.c
@@ -92,7 +92,8 @@ static int __init create_setup_data_nodes(struct dentry 
*parent)
struct setup_data *data;
int error;
struct dentry *d;
-   u64 pa_data;
+   u64 pa_data, pa_next;
+   u32 len;
int no = 0;
 
d = debugfs_create_dir("setup_data", parent);
@@ -112,12 +113,23 @@ static int __init create_setup_data_nodes(struct dentry 
*parent)
error = -ENOMEM;
goto err_dir;
}
-
-   if (data->type == SETUP_INDIRECT &&
-   ((struct setup_indirect *)data->data)->type != 
SETUP_INDIRECT) {
-   node->paddr = ((struct setup_indirect 
*)data->data)->addr;
-   node->type  = ((struct setup_indirect 
*)data->data)->type;
-   node->len   = ((struct setup_indirect 
*)data->data)->len;
+   pa_next = data->next;
+
+   if (data->type == SETUP_INDIRECT) {
+   len = sizeof(*data) + data->len;
+   memunmap(data);
+   data = memremap(pa_data, len, 

Re: [PATCH resend] dma-debug: Use memory_intersects() directly

2021-08-09 Thread Christoph Hellwig
Thanks,

applied.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH] dma-debug: fix debugfs initialization order

2021-08-09 Thread Christoph Hellwig
Thanks, applied.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [Freedreno] [PATCH 0/3] iommu/drm/msm: Allow non-coherent masters to use system cache

2021-08-09 Thread Will Deacon
On Mon, Aug 02, 2021 at 06:36:04PM -0700, Rob Clark wrote:
> On Mon, Aug 2, 2021 at 8:14 AM Will Deacon  wrote:
> >
> > On Mon, Aug 02, 2021 at 08:08:07AM -0700, Rob Clark wrote:
> > > On Mon, Aug 2, 2021 at 3:55 AM Will Deacon  wrote:
> > > >
> > > > On Thu, Jul 29, 2021 at 10:08:22AM +0530, Sai Prakash Ranjan wrote:
> > > > > On 2021-07-28 19:30, Georgi Djakov wrote:
> > > > > > On Mon, Jan 11, 2021 at 07:45:02PM +0530, Sai Prakash Ranjan wrote:
> > > > > > > commit ecd7274fb4cd ("iommu: Remove unused IOMMU_SYS_CACHE_ONLY 
> > > > > > > flag")
> > > > > > > removed unused IOMMU_SYS_CACHE_ONLY prot flag and along with it 
> > > > > > > went
> > > > > > > the memory type setting required for the non-coherent masters to 
> > > > > > > use
> > > > > > > system cache. Now that system cache support for GPU is added, we 
> > > > > > > will
> > > > > > > need to set the right PTE attribute for GPU buffers to be sys 
> > > > > > > cached.
> > > > > > > Without this, the system cache lines are not allocated for GPU.
> > > > > > >
> > > > > > > So the patches in this series introduces a new prot flag 
> > > > > > > IOMMU_LLC,
> > > > > > > renames IO_PGTABLE_QUIRK_ARM_OUTER_WBWA to 
> > > > > > > IO_PGTABLE_QUIRK_PTW_LLC
> > > > > > > and makes GPU the user of this protection flag.
> > > > > >
> > > > > > Thank you for the patchset! Are you planning to refresh it, as it 
> > > > > > does
> > > > > > not apply anymore?
> > > > > >
> > > > >
> > > > > I was waiting on Will's reply [1]. If there are no changes needed, 
> > > > > then
> > > > > I can repost the patch.
> > > >
> > > > I still think you need to handle the mismatched alias, no? You're adding
> > > > a new memory type to the SMMU which doesn't exist on the CPU side. That
> > > > can't be right.
> > > >
> > >
> > > Just curious, and maybe this is a dumb question, but what is your
> > > concern about mismatched aliases?  I mean the cache hierarchy on the
> > > GPU device side (anything beyond the LLC) is pretty different and
> > > doesn't really care about the smmu pgtable attributes..
> >
> > If the CPU accesses a shared buffer with different attributes to those which
> > the device is using then you fall into the "mismatched memory attributes"
> > part of the Arm architecture. It's reasonably unforgiving (you should go and
> > read it) and in some cases can apply to speculative accesses as well, but
> > the end result is typically loss of coherency.
> 
> Ok, I might have a few other sections to read first to decipher the
> terminology..
> 
> But my understanding of LLC is that it looks just like system memory
> to the CPU and GPU (I think that would make it "the point of
> coherence" between the GPU and CPU?)  If that is true, shouldn't it be
> invisible from the point of view of different CPU mapping options?

You could certainly build a system where mismatched attributes don't cause
loss of coherence, but as it's not guaranteed by the architecture and the
changes proposed here affect APIs which are exposed across SoCs, then I
don't think it helps much.

Will
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [Patch V2 0/2] iommu/arm-smmu: Fix races in iommu domain/group creation

2021-08-09 Thread Will Deacon
On Mon, Aug 02, 2021 at 04:46:37PM +0100, Robin Murphy wrote:
> On 2021-08-02 16:16, Will Deacon wrote:
> > On Fri, Jun 18, 2021 at 02:00:35AM +0530, Ashish Mhetre wrote:
> > > Multiple iommu domains and iommu groups are getting created for the 
> > > devices
> > > sharing same SID. It is expected for devices sharing same SID to be in 
> > > same
> > > iommu group and same iommu domain.
> > > This is leading to context faults when one device is accessing IOVA from
> > > other device which shouldn't be the case for devices sharing same SID.
> > > Fix this by protecting iommu domain and iommu group creation with mutexes.
> > 
> > Robin -- any chance you could take a look at these, please? You had some
> > comments on the first version which convinced me that they are needed,
> > but I couldn't tell whether you wanted to solve this a different way or not.
> 
> Sorry, I was lamenting that this came to light due to the
> of_iommu_configure() flow being yucky, but that wasn't meant to imply that
> there aren't - or couldn't be in future - better reasons for
> iommu_probe_device() to be robust against concurrency anyway. I do think
> these are legitimate fixes to make in their own right, even if the current
> need might get swept back under the rug in future.
> 
> I would say, however, that the commit messages seem to focus too much on the
> wrong details and aren't overly useful, and patch #2 is missing Ashish's
> sign-off.

Ashish -- please can you send a v3 fixing these issues?

Will
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v3 24/25] iommu/dma: Factor out flush queue init

2021-08-09 Thread Robin Murphy

On 2021-08-09 13:52, Will Deacon wrote:

On Wed, Aug 04, 2021 at 06:15:52PM +0100, Robin Murphy wrote:

Factor out flush queue setup from the initial domain init so that we
can potentially trigger it from sysfs later on in a domain's lifetime.

Reviewed-by: Lu Baolu 
Reviewed-by: John Garry 
Signed-off-by: Robin Murphy 
---
  drivers/iommu/dma-iommu.c | 30 --
  include/linux/dma-iommu.h |  9 ++---
  2 files changed, 26 insertions(+), 13 deletions(-)

diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
index 2e19505dddf9..f51b8dc99ac6 100644
--- a/drivers/iommu/dma-iommu.c
+++ b/drivers/iommu/dma-iommu.c
@@ -310,6 +310,25 @@ static bool dev_is_untrusted(struct device *dev)
return dev_is_pci(dev) && to_pci_dev(dev)->untrusted;
  }
  
+int iommu_dma_init_fq(struct iommu_domain *domain)

+{
+   struct iommu_dma_cookie *cookie = domain->iova_cookie;
+
+   if (domain->type != IOMMU_DOMAIN_DMA_FQ)
+   return -EINVAL;
+   if (cookie->fq_domain)
+   return 0;
+
+   if (init_iova_flush_queue(&cookie->iovad, iommu_dma_flush_iotlb_all,
+ iommu_dma_entry_dtor)) {
+   pr_warn("iova flush queue initialization failed\n");
+   domain->type = IOMMU_DOMAIN_DMA;
+   return -ENODEV;


I do find this a bit odd: we assert that the caller has set domain->type
to IOMMU_DOMAIN_DMA_FQ but then on failure we reset it to IOMMU_DOMAIN_DMA
here. I think it would be less error-prone if the setting of domain->type
was handled in the same function.


On reflection I think I agree. For some reason I settled on the idea of 
doing this to make the callers simpler, but it turns out that unpicking 
it to flow logically is in fact a +4/-5 diff essentially just moving all 
the same statements to different places, and that's before I update 
comments since that theoretical race between the sysfs and DMA ops paths 
only exists because of sysfs having to dance around the type check here...


I'll send v4 later today or possibly tomorrow, but not in such a hurry 
that I skimp on the build-testing this time!


Cheers,
Robin.
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


Re: [PATCH v3 25/25] iommu: Allow enabling non-strict mode dynamically

2021-08-09 Thread Robin Murphy

On 2021-08-09 13:49, Will Deacon wrote:

On Wed, Aug 04, 2021 at 06:15:53PM +0100, Robin Murphy wrote:

Allocating and enabling a flush queue is in fact something we can
reasonably do while a DMA domain is active, without having to rebuild it
from scratch. Thus we can allow a strict -> non-strict transition from
sysfs without needing to unbind the device's driver, which is of
particular interest to users who want to make selective relaxations to
critical devices like the one serving their root filesystem.

Disabling and draining a queue also seems technically possible to
achieve without rebuilding the whole domain, but would certainly be more
involved. Furthermore there's not such a clear use-case for tightening
up security *after* the device may already have done whatever it is that
you don't trust it not to do, so we only consider the relaxation case.

CC: Sai Praneeth Prakhya 
Signed-off-by: Robin Murphy 

---

v3: Actually think about concurrency, rework most of the fq data
 accesses to be (hopefully) safe and comment it all
---
  drivers/iommu/dma-iommu.c | 25 ++---
  drivers/iommu/iommu.c | 16 
  drivers/iommu/iova.c  |  9 ++---
  3 files changed, 36 insertions(+), 14 deletions(-)


I failed to break this, so hopefully you've caught everything now.

Only thing I wasn't sure of is why we still need the smp_wmb() in
init_iova_flush_queue(). Can we remove it now that we have one before
assigning into the cookie?


Mostly because I failed to spot it, I think :)

Indeed now that we don't have any callers other than iommu_dma_init_fq() 
to worry about, I don't think that one matters any more. It would if 
we were testing cookie->iovad->fq directly as our indicator instead of 
cookie->fq_domain, but then we'd still need the new barrier to ensure 
iommu_dma_flush_iotlb_all() properly observes the latter, so we may as 
well rely on that everywhere and let it fully replace the old one.
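
Concretely, the pairing being described is along these lines (an illustrative 
sketch using the names from the patch, not the literal v3/v4 hunks):

/* Writer, at the end of iommu_dma_init_fq(): publish the queue only once
 * everything in cookie->iovad has been initialised. */
smp_wmb();
WRITE_ONCE(cookie->fq_domain, domain);

/* Reader, e.g. the flush callback: a non-NULL fq_domain can only be
 * observed after the initialisation above. */
struct iommu_domain *fq_domain = READ_ONCE(cookie->fq_domain);

if (fq_domain)
        iommu_flush_iotlb_all(fq_domain);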


Thanks,
Robin.


Re: [PATCH v3 24/25] iommu/dma: Factor out flush queue init

2021-08-09 Thread Will Deacon
On Wed, Aug 04, 2021 at 06:15:52PM +0100, Robin Murphy wrote:
> Factor out flush queue setup from the initial domain init so that we
> can potentially trigger it from sysfs later on in a domain's lifetime.
> 
> Reviewed-by: Lu Baolu 
> Reviewed-by: John Garry 
> Signed-off-by: Robin Murphy 
> ---
>  drivers/iommu/dma-iommu.c | 30 --
>  include/linux/dma-iommu.h |  9 ++---
>  2 files changed, 26 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c
> index 2e19505dddf9..f51b8dc99ac6 100644
> --- a/drivers/iommu/dma-iommu.c
> +++ b/drivers/iommu/dma-iommu.c
> @@ -310,6 +310,25 @@ static bool dev_is_untrusted(struct device *dev)
>   return dev_is_pci(dev) && to_pci_dev(dev)->untrusted;
>  }
>  
> +int iommu_dma_init_fq(struct iommu_domain *domain)
> +{
> + struct iommu_dma_cookie *cookie = domain->iova_cookie;
> +
> + if (domain->type != IOMMU_DOMAIN_DMA_FQ)
> + return -EINVAL;
> + if (cookie->fq_domain)
> + return 0;
> +
> + if (init_iova_flush_queue(&cookie->iovad, iommu_dma_flush_iotlb_all,
> +   iommu_dma_entry_dtor)) {
> + pr_warn("iova flush queue initialization failed\n");
> + domain->type = IOMMU_DOMAIN_DMA;
> + return -ENODEV;

I do find this a bit odd: we assert that the caller has set domain->type
to IOMMU_DOMAIN_DMA_FQ but then on failure we reset it to IOMMU_DOMAIN_DMA
here. I think it would be less error-prone if the setting of domain->type
was handled in the same function.

Will


Re: [PATCH v3 25/25] iommu: Allow enabling non-strict mode dynamically

2021-08-09 Thread Will Deacon
On Wed, Aug 04, 2021 at 06:15:53PM +0100, Robin Murphy wrote:
> Allocating and enabling a flush queue is in fact something we can
> reasonably do while a DMA domain is active, without having to rebuild it
> from scratch. Thus we can allow a strict -> non-strict transition from
> sysfs without needing to unbind the device's driver, which is of
> particular interest to users who want to make selective relaxations to
> critical devices like the one serving their root filesystem.
> 
> Disabling and draining a queue also seems technically possible to
> achieve without rebuilding the whole domain, but would certainly be more
> involved. Furthermore there's not such a clear use-case for tightening
> up security *after* the device may already have done whatever it is that
> you don't trust it not to do, so we only consider the relaxation case.
> 
> CC: Sai Praneeth Prakhya 
> Signed-off-by: Robin Murphy 
> 
> ---
> 
> v3: Actually think about concurrency, rework most of the fq data
> accesses to be (hopefully) safe and comment it all
> ---
>  drivers/iommu/dma-iommu.c | 25 ++---
>  drivers/iommu/iommu.c | 16 
>  drivers/iommu/iova.c  |  9 ++---
>  3 files changed, 36 insertions(+), 14 deletions(-)

I failed to break this, so hopefully you've caught everything now.

Only thing I wasn't sure of is why we still need the smp_wmb() in
init_iova_flush_queue(). Can we remove it now that we have one before
assigning into the cookie?

Will


Re: [PATCH v5 0/3] Apple M1 DART IOMMU driver

2021-08-09 Thread Joerg Roedel
On Tue, Aug 03, 2021 at 02:16:48PM +0200, Sven Peter wrote:
> Sven Peter (3):
>   iommu/io-pgtable: Add DART pagetable format
>   dt-bindings: iommu: add DART iommu bindings
>   iommu/dart: Add DART iommu driver

Applied, thanks. This driver now lives in the apple/dart branch of the
IOMMU tree.



Re: [PATCH v7 00/12] Clean up "mediatek,larb"

2021-08-09 Thread j...@8bytes.org
On Mon, Aug 09, 2021 at 08:30:03AM +, Yong Wu (吴勇) wrote:
> Thanks very much for your confirmation. I will add your Ack for the iommu
> part in the next version.

Note that my ack is conditional on the premise that Matthias has
reviewed the IOMMU parts.

Thanks,

Joerg

RE: [RFC v2] /dev/iommu uAPI proposal

2021-08-09 Thread Tian, Kevin
> From: David Gibson 
> Sent: Friday, August 6, 2021 12:45 PM
> > > > In concept I feel the purpose of DMA endpoint is equivalent to the
> routing
> > > > info in this proposal.
> > >
> > > Maybe?  I'm afraid I never quite managed to understand the role of the
> > > routing info in your proposal.
> > >
> >
> > the IOMMU routes incoming DMA packets to a specific I/O page table,
> > according to RID or RID+PASID carried in the packet. RID or RID+PASID
> > is the routing information (represented by device cookie +PASID in
> > proposed uAPI) and what the iommu driver really cares when activating
> > the I/O page table in the iommu.
> 
> Ok, so yes, endpoint is roughly equivalent to that.  But my point is
> that the IOMMU layer really only cares about that (device+routing)
> combination, not other aspects of what the device is.  So that's the
> concept we should give a name and put front and center in the API.
> 

This is how this proposal works, centered around the routing info. the 
uAPI doesn't care what the device is. It just requires the user to specify 
the user view of routing info (device fd + optional pasid) to tag an IOAS. 
the user view is then converted to the kernel view of routing (rid or 
rid+pasid) by vfio driver and then passed to iommu fd in the attaching 
operation. and GET_INFO interface is provided for the user to check 
whether a device supports multiple IOASes and whether pasid space is 
delegated to the user. We just need a better name if pasid is considered 
too pci specific...
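
Purely to illustrate the "user view of routing info" described above (the 
structure and field names below are made up for the example and are not the 
uAPI actually defined in the RFC):

/* Hypothetical sketch: an attach request issued against the device fd,
 * which supplies the RID half of the routing; the pasid field is only
 * valid when GET_INFO reported that the PASID space is delegated. */
struct example_attach_ioas {
        __u32   argsz;
        __u32   flags;          /* e.g. bit 0: pasid field is valid */
        __u32   ioas_id;        /* the IOAS being tagged */
        __u32   pasid;          /* optional routing qualifier */
};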

But creating an endpoint per ioasid and making it centered in uAPI is not 
what the IOMMU layer cares about.

Thanks
Kevin