Re: [RFC PATCH 10/11] vfio/iommu_type1: Optimize dirty bitmap population based on iommu HWDBM

2021-02-17 Thread Keqian Zhu
Hi Yi,

On 2021/2/9 19:57, Yi Sun wrote:
> On 21-02-07 18:40:36, Keqian Zhu wrote:
>> Hi Yi,
>>
>> On 2021/2/7 17:56, Yi Sun wrote:
>>> Hi,
>>>
>>> On 21-01-28 23:17:41, Keqian Zhu wrote:
>>>
>>> [...]
>>>
 +static void vfio_dma_dirty_log_start(struct vfio_iommu *iommu,
 +   struct vfio_dma *dma)
 +{
 +  struct vfio_domain *d;
 +
 +  list_for_each_entry(d, >domain_list, next) {
 +  /* Go through all domain anyway even if we fail */
 +  iommu_split_block(d->domain, dma->iova, dma->size);
 +  }
 +}
>>>
>>> This should be a switch to prepare for dirty log start. Per Intel
>>> Vtd spec, there is SLADE defined in Scalable-Mode PASID Table Entry.
>>> It enables Accessed/Dirty Flags in second-level paging entries.
>>> So, a generic iommu interface here is better. For Intel iommu, it
>>> enables SLADE. For ARM, it splits block.
>> Indeed, a generic interface name is better.
>>
>> The vendor iommu driver plays vendor's specific actions to start dirty log, 
>> and Intel iommu and ARM smmu may differ. Besides, we may add more actions in 
>> ARM smmu driver in future.
>>
>> One question: Though I am not familiar with Intel iommu, I think it also 
>> should split block mapping besides enable SLADE. Right?
>>
> I am not familiar with ARM smmu. :) So I want to clarify if the block
> in smmu is big page, e.g. 2M page? Intel Vtd manages the memory per
Yes, for ARM, the "block" is big page :).

> page, 4KB/2MB/1GB. There are two ways to manage dirty pages.
> 1. Keep default granularity. Just set SLADE to enable the dirty track.
> 2. Split big page to 4KB to get finer granularity.
According to your statement, I see that VT-D's SLADE behaves like smmu HTTU. 
They are both based on page-table.

Right, we should give more freedom to iommu vendor driver, so a generic 
interface is better.
1) As you said, set SLADE when enable dirty log.
2) IOMMUs of other architecture may has completely different dirty tracking 
mechanism.

> 
> But question about the second solution is if it can benefit the user
> space, e.g. live migration. If my understanding about smmu block (i.e.
> the big page) is correct, have you collected some performance data to
> prove that the split can improve performance? Thanks!
The purpose of splitting block mapping is to reduce the amount of dirty bytes, 
which depends on actual DMA transaction.
Take an extreme example, if DMA writes one byte, under 1G mapping, the dirty 
amount reported to userspace is 1G, but under 4K mapping, the dirty amount is 
just 4K.

I will detail the commit message in v2.

Thanks,
Keqian
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH v3 00/66] KVM: arm64: ARMv8.3/8.4 Nested Virtualization support

2021-02-17 Thread Marc Zyngier
On Thu, 04 Feb 2021 07:51:37 +,
Haibo Xu  wrote:
> 
> Kindly ping!
> 
> On Thu, 21 Jan 2021 at 11:03, Haibo Xu  wrote:
> >
> > Re-send in case the previous email was blocked for the inlined hyper-link.
> >
> > Hi Marc,
> >
> > I have tried to enable the NV support in Qemu, and now I can
> > successfully boot a L2 guest
> > in Qemu KVM mode.
> >
> > This patch series looks good from the Qemu side except for two minor
> > requirements:
> > (1) Qemu will check whether a feature was supported by the KVM cap
> > when the user tries to enable it in the command line, so a new
> > capability was prefered for the NV(KVM_CAP_ARM_NV?).

I have added KVM_CAP_ARM_EL2 (rather than NV) to that effect.

> > (2) According to the Documentation/virt/kvm/api.rst, userspace can
> > call KVM_ARM_VCPU_INIT multiple times for a given vcpu, but the
> > kvm_vcpu_init_nested() do have some issue when called multiple
> > times(please refer to the detailed comments in patch 63)

This is now fixed, I believe.

I have pushed out a branch [1] that addresses all the reported
issues, though it currently lack some testing. Please let me know if
it works for you.

Thanks,

M.

[1] 
https://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git/log/?h=kvm-arm64/nv-5.12-WIP

-- 
Without deviation from the norm, progress is not possible.
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [RFC PATCH v2 00/26] KVM/arm64: A stage 2 for the host

2021-02-17 Thread Quentin Perret
Hi Mate,

On Wednesday 17 Feb 2021 at 17:27:07 (+0100), Mate Toth-Pal wrote:
> We tested the pKVM changes pulled from here:
> 
> 
> >  https://android-kvm.googlesource.com/linux qperret/host-stage2-v2
> 
> 
> We were using a target with Arm architecture with FEAT_S2FWB, and found that
> there is a bug in the patch.
> 
> 
> It turned out that the Kernel checks for the extension, and sets up the
> stage 2 translation so that it forces the host memory type to write-through.
> However it seems that the code doesn't turn on the feature in the HCR_EL2
> register.
> 
> 
> We were able to fix the issue by applying the following patch:
> 
> 
> diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> index 0cd3eb178f3b..e8521a072ea6 100644
> --- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> +++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
> @@ -105,6 +105,8 @@ int kvm_host_prepare_stage2(void *mem_pgt_pool, void
> *dev_pgt_pool)
>     params->vttbr = kvm_get_vttbr(mmu);
>     params->vtcr = host_kvm.arch.vtcr;
>     params->hcr_el2 |= HCR_VM;
> +   if (cpus_have_const_cap(ARM64_HAS_STAGE2_FWB))
> +   params->hcr_el2 |= HCR_FWB;
>     __flush_dcache_area(params, sizeof(*params));
>     }

Aha, indeed, this looks right. I'll double check HCR_EL2 to see if I'm
missing any other, and I'll add this to v3.

Thanks for testing, and the for the report.
Quentin
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH kvmtool 05/21] hw/i8042: Clean up data types

2021-02-17 Thread Andre Przywara
On Thu, 11 Feb 2021 16:55:43 +
Alexandru Elisei  wrote:

Hi,

> On 12/10/20 2:28 PM, Andre Przywara wrote:
> 
> > The i8042 is clearly an 8-bit era device, so there is little room for
> > 32-bit registers.
> > Clean up the data types used.
> >
> > Signed-off-by: Andre Przywara 
> > ---
> >  hw/i8042.c | 20 ++--
> >  1 file changed, 10 insertions(+), 10 deletions(-)
> >
> > diff --git a/hw/i8042.c b/hw/i8042.c
> > index 37a99a2d..36ee183f 100644
> > --- a/hw/i8042.c
> > +++ b/hw/i8042.c
> > @@ -64,11 +64,11 @@
> >  struct kbd_state {
> > struct kvm  *kvm;
> >  
> > -   charkq[QUEUE_SIZE]; /* Keyboard queue */
> > +   u8  kq[QUEUE_SIZE]; /* Keyboard queue */
> > int kread, kwrite;  /* Indexes into the queue */
> > int kcount; /* number of elements in queue 
> > */
> >  
> > -   charmq[QUEUE_SIZE];
> > +   u8  mq[QUEUE_SIZE];
> > int mread, mwrite;
> > int mcount;  
> 
> I think the write_cmd field further down should also be u8 because it stores 
> the
> first byte of a command (and it's set only to an 8 bit value in 
> kbd_write_command()).

Yes, looks like it, will change it on the way. I guess I wanted to
suppress my rewrite-everything complex ;-)

If you allow, I will also fix the confusing indentation bug in the big
switch statement on the way.
 
> Otherwise, it looks ok to me. osdev wiki seems to confirm that the device is
> indeed 8 bit only, and all the registers are 8 bit now:
> 
> Reviewed-by: Alexandru Elisei 

Thanks!
Andre

> 
> >  
> > @@ -173,9 +173,9 @@ static void kbd_write_command(struct kvm *kvm, u8 val)
> >  /*
> >   * Called when the OS reads from port 0x60 (PS/2 data)
> >   */
> > -static u32 kbd_read_data(void)
> > +static u8 kbd_read_data(void)
> >  {
> > -   u32 ret;
> > +   u8 ret;
> > int i;
> >  
> > if (state.kcount != 0) {
> > @@ -202,9 +202,9 @@ static u32 kbd_read_data(void)
> >  /*
> >   * Called when the OS read from port 0x64, the command port
> >   */
> > -static u32 kbd_read_status(void)
> > +static u8 kbd_read_status(void)
> >  {
> > -   return (u32)state.status;
> > +   return state.status;
> >  }
> >  
> >  /*
> > @@ -212,7 +212,7 @@ static u32 kbd_read_status(void)
> >   * Things written here are generally arguments to commands previously
> >   * written to port 0x64 and stored in state.write_cmd
> >   */
> > -static void kbd_write_data(u32 val)
> > +static void kbd_write_data(u8 val)
> >  {
> > switch (state.write_cmd) {
> > case I8042_CMD_CTL_WCTR:
> > @@ -304,8 +304,8 @@ static bool kbd_in(struct ioport *ioport, struct 
> > kvm_cpu *vcpu, u16 port, void *
> > break;
> > }
> > case I8042_DATA_REG: {
> > -   u32 value = kbd_read_data();
> > -   ioport__write32(data, value);
> > +   u8 value = kbd_read_data();
> > +   ioport__write8(data, value);
> > break;
> > }
> > case I8042_PORT_B_REG: {
> > @@ -328,7 +328,7 @@ static bool kbd_out(struct ioport *ioport, struct 
> > kvm_cpu *vcpu, u16 port, void
> > break;
> > }
> > case I8042_DATA_REG: {
> > -   u32 value = ioport__read32(data);
> > +   u8 value = ioport__read8(data);
> > kbd_write_data(value);
> > break;
> > }  

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH kvmtool 04/21] mmio: Extend handling to include ioport emulation

2021-02-17 Thread Andre Przywara
On Thu, 11 Feb 2021 16:10:16 +
Alexandru Elisei  wrote:

Hi,

> On 12/10/20 2:28 PM, Andre Przywara wrote:
> > In their core functionality MMIO and I/O port traps are not really
> > different, yet we still have two totally separate code paths for
> > handling them. Devices need to decide on one conduit or need to provide
> > different handler functions for each of them.
> >
> > Extend the existing MMIO emulation to also cover ioport handlers.
> > This just adds another RB tree root for holding the I/O port handlers,
> > but otherwise uses the same tree population and lookup code.  
> 
> Maybe I'm missing something, but why two trees? Is it valid to have an overlap
> between IO port and MMIO emulation? Or was it done to make the removal of 
> ioport
> emulation easier?

So I thought about it as well, but figured it's easier this way:
- x86 allows overlap, PIO is a totally separate address space from
  memory/MMIO. Early x86 CPUs had pins to indicate a PIO bus cycle, but
  using the same address and data pins otherwise. In practise there
  might be no overlap when it comes to *MMIO* traps vs PIO on x86
  (there is DRAM only at the lowest 64K of the IBM PC memory map),
  but not sure we should rely on this.
- For non-x86 this would indeed be non-overlapping, but this would need
  to be translated at init time then? And then we can't move those
  anymore, I guess? So I found it cleaner to keep this separate, and do
  the translation at trap time.
- As a consequence we would need to have a bit indicating the address
  space. I haven't actually tried this, but my understanding is that
  this would spoil the whole rb_tree functions, since they rely on a
  linear addressing scheme, and adding another bit there would be at
  least cumbersome?

At the end I decided to go for separate trees, as also this was less
change.

I agree that it would be nice to have one tree, from a design point of
view, but I am afraid that would require more changes.
If need be, I think we can always unify them later on, on top of this
series?

> If it's not valid to have that overlap, then I think having one tree for both
> would better. Struct mmio_mapping would have to be augmented with a flags 
> field
> that holds the same flags given to kvm__register_iotrap to differentiate 
> between
> the two slightly different emulations. Saving the IOTRAP_COALESCE flag would 
> also
> make it trivial to call KVM_UNREGISTER_COALESCED_MMIO in 
> kvm__deregister_iotrap,
> which we currently don't do.
> 
> > "ioport" or "mmio" just become a flag in the registration function.
> > Provide wrappers to not break existing users, and allow an easy
> > transition for the existing ioport handlers.
> >
> > This also means that ioport handlers now can use the same emulation
> > callback prototype as MMIO handlers, which means we have to migrate them
> > over. To allow a smooth transition, we hook up the new I/O emulate
> > function to the end of the existing ioport emulation code.  
> 
> I'm sorry, but I don't understand that last sentence. Do you mean that the 
> ioport
> emulation code has been modified to use kvm__emulate_pio() as a fallback for 
> when
> the port is not found in the ioport_tree?

I meant that for the transition period we have all of traditional MMIO,
traditional PIO, *and* just transformed PIO.

That means there are still PIO devices registered through ioport.c's
ioport__register(), *and* PIO devices registered through mmio.c's
kvm__register_pio(). Which means they end up in two separate PIO trees.
And only the traditional kvm__emulate_io() from ioport.c is called upon
a trap, so it needs to check both trees, which it does by calling into
kvm__emulate_pio(), shall a search in the local tree fail.

Or did you mean something else?

> >
> > Signed-off-by: Andre Przywara 
> > ---
> >  include/kvm/kvm.h | 42 +
> >  ioport.c  |  4 ++--
> >  mmio.c| 59 +++
> >  3 files changed, 89 insertions(+), 16 deletions(-)
> >
> > diff --git a/include/kvm/kvm.h b/include/kvm/kvm.h
> > index ee99c28e..14f9d58b 100644
> > --- a/include/kvm/kvm.h
> > +++ b/include/kvm/kvm.h
> > @@ -27,10 +27,16 @@
> >  #define PAGE_SIZE (sysconf(_SC_PAGE_SIZE))
> >  #endif
> >  
> > +#define IOTRAP_BUS_MASK0xf  
> 
> It's not immediately obvious what this mask does. It turns out it's used to 
> mask
> the enum flags defined in the header devices.h, header which is not included 
> in
> this file.
> 
> The flag names we pass to kvm__register_iotrap() are slightly inconsistent
> (DEVICE_BUS_PCI, DEVICE_BUS_MMIO and IOTRAP_COALESCE), where DEVICE_BUS_{PCI,
> MMIO} come from devices.h as an enum. I was wondering if I'm missing 
> something and
> there is a particular reason why we don't define our own flags for that here
> (something like IOTRAP_PIO and IOTRAP_MMIO).

I am not sure why this would be needed?
We already define and use DEVICE_BUS_x elsewhere, so why not 

Re: [PATCH kvmtool 20/21] hw/serial: ARM/arm64: Use MMIO at higher addresses

2021-02-17 Thread Alexandru Elisei
Hi Andre,

On 12/10/20 2:29 PM, Andre Przywara wrote:
> Using the UART devices at their legacy I/O addresses as set by IBM in
> 1981 was a kludge we used for simplicity on ARM platforms as well.
> However this imposes problems due to their missing alignment and overlap
> with the PCI I/O address space.
>
> Now that we can switch a device easily between using ioports and MMIO,
> let's move the UARTs out of the first 4K of memory on ARM platforms.
>
> That should be transparent for well behaved guests, since the change is
> naturally reflected in the device tree. Even "earlycon" keeps working,
> as the stdout-path property is adjusted automatically.
>
> People providing direct earlycon parameters via the command line need to
> adjust it to: "earlycon=uart,mmio,0x100".
>
> Signed-off-by: Andre Przywara 
> ---
>  hw/serial.c | 52 
>  1 file changed, 36 insertions(+), 16 deletions(-)
>
> diff --git a/hw/serial.c b/hw/serial.c
> index d840eebc..00fb3aa8 100644
> --- a/hw/serial.c
> +++ b/hw/serial.c
> @@ -13,6 +13,24 @@
>  
>  #include 
>  
> +#if defined(CONFIG_ARM) || defined(CONFIG_ARM64)
> +#define serial_iobase(nr)(0x100 + (nr) * 0x1000)
> +#define serial_irq(nr)   (32 + (nr))
> +#define SERIAL8250_BUS_TYPE  DEVICE_BUS_MMIO
> +#else
> +#define serial_iobase_0  0x3f8
> +#define serial_iobase_1  0x2f8
> +#define serial_iobase_2  0x3e8
> +#define serial_iobase_3  0x2e8
> +#define serial_irq_0 4
> +#define serial_irq_1 3
> +#define serial_irq_2 4
> +#define serial_irq_3 3

Nitpick: serial_iobase_* and serial_irq_* could be changed to have two leading
underscores, to stress the fact that they are helpers for serial_iobase() and
serial_irq() and are not meant to be used by themselves. But that's just 
personal
preference, otherwise the patch looks really nice and clean:

Reviewed-by: Alexandru Elisei 

Thanks,

Alex

> +#define serial_iobase(nr)serial_iobase_##nr
> +#define serial_irq(nr)   serial_irq_##nr
> +#define SERIAL8250_BUS_TYPE  DEVICE_BUS_IOPORT
> +#endif
> +
>  /*
>   * This fakes a U6_16550A. The fifo len needs to be 64 as the kernel
>   * expects that for autodetection.
> @@ -27,7 +45,7 @@ struct serial8250_device {
>   struct mutexmutex;
>   u8  id;
>  
> - u16 iobase;
> + u32 iobase;
>   u8  irq;
>   u8  irq_state;
>   int txcnt;
> @@ -65,56 +83,56 @@ static struct serial8250_device devices[] = {
>   /* ttyS0 */
>   [0] = {
>   .dev_hdr = {
> - .bus_type   = DEVICE_BUS_IOPORT,
> + .bus_type   = SERIAL8250_BUS_TYPE,
>   .data   = serial8250_generate_fdt_node,
>   },
>   .mutex  = MUTEX_INITIALIZER,
>  
>   .id = 0,
> - .iobase = 0x3f8,
> - .irq= 4,
> + .iobase = serial_iobase(0),
> + .irq= serial_irq(0),
>  
>   SERIAL_REGS_SETTING
>   },
>   /* ttyS1 */
>   [1] = {
>   .dev_hdr = {
> - .bus_type   = DEVICE_BUS_IOPORT,
> + .bus_type   = SERIAL8250_BUS_TYPE,
>   .data   = serial8250_generate_fdt_node,
>   },
>   .mutex  = MUTEX_INITIALIZER,
>  
>   .id = 1,
> - .iobase = 0x2f8,
> - .irq= 3,
> + .iobase = serial_iobase(1),
> + .irq= serial_irq(1),
>  
>   SERIAL_REGS_SETTING
>   },
>   /* ttyS2 */
>   [2] = {
>   .dev_hdr = {
> - .bus_type   = DEVICE_BUS_IOPORT,
> + .bus_type   = SERIAL8250_BUS_TYPE,
>   .data   = serial8250_generate_fdt_node,
>   },
>   .mutex  = MUTEX_INITIALIZER,
>  
>   .id = 2,
> - .iobase = 0x3e8,
> - .irq= 4,
> + .iobase = serial_iobase(2),
> + .irq= serial_irq(2),
>  
>   SERIAL_REGS_SETTING
>   },
>   /* ttyS3 */
>   [3] = {
>   .dev_hdr = {
> - .bus_type   = DEVICE_BUS_IOPORT,
> + .bus_type   = SERIAL8250_BUS_TYPE,
>   .data   = serial8250_generate_fdt_node,
>   },
>   .mutex  = MUTEX_INITIALIZER,
>  
>   

Re: [PATCH kvmtool 01/21] ioport: Remove ioport__setup_arch()

2021-02-17 Thread Andre Przywara
On Thu, 11 Feb 2021 17:32:01 +
Alexandru Elisei  wrote:

Hi,

> On 2/11/21 5:16 PM, Andre Przywara wrote:
> > On Wed, 10 Feb 2021 17:44:59 +
> > Alexandru Elisei  wrote:
> >
> > Hi Alex,
> >  
> >> On 12/10/20 2:28 PM, Andre Przywara wrote:  
> >>> Since x86 had a special need for registering tons of special I/O ports,
> >>> we had an ioport__setup_arch() callback, to allow each architecture
> >>> to do the same. As it turns out no one uses it beside x86, so we remove
> >>> that unnecessary abstraction.
> >>>
> >>> The generic function was registered via a device_base_init() call, so
> >>> we just do the same for the x86 specific function only, and can remove
> >>> the unneeded ioport__setup_arch().
> >>>
> >>> Signed-off-by: Andre Przywara 
> >>> ---
> >>>  arm/ioport.c |  5 -
> >>>  include/kvm/ioport.h |  1 -
> >>>  ioport.c | 28 
> >>>  mips/kvm.c   |  5 -
> >>>  powerpc/ioport.c |  6 --
> >>>  x86/ioport.c | 25 -
> >>>  6 files changed, 24 insertions(+), 46 deletions(-)
> >>>
> >>> diff --git a/arm/ioport.c b/arm/ioport.c
> >>> index 2f0feb9a..24092c9d 100644
> >>> --- a/arm/ioport.c
> >>> +++ b/arm/ioport.c
> >>> @@ -1,11 +1,6 @@
> >>>  #include "kvm/ioport.h"
> >>>  #include "kvm/irq.h"
> >>>  
> >>> -int ioport__setup_arch(struct kvm *kvm)
> >>> -{
> >>> - return 0;
> >>> -}
> >>> -
> >>>  void ioport__map_irq(u8 *irq)
> >>>  {
> >>>   *irq = irq__alloc_line();
> >>> diff --git a/include/kvm/ioport.h b/include/kvm/ioport.h
> >>> index 039633f7..d0213541 100644
> >>> --- a/include/kvm/ioport.h
> >>> +++ b/include/kvm/ioport.h
> >>> @@ -35,7 +35,6 @@ struct ioport_operations {
> >>>   enum irq_type));
> >>>  };
> >>>  
> >>> -int ioport__setup_arch(struct kvm *kvm);
> >>>  void ioport__map_irq(u8 *irq);
> >>>  
> >>>  int __must_check ioport__register(struct kvm *kvm, u16 port, struct 
> >>> ioport_operations *ops,
> >>> diff --git a/ioport.c b/ioport.c
> >>> index 844a832d..667e8386 100644
> >>> --- a/ioport.c
> >>> +++ b/ioport.c
> >>> @@ -158,21 +158,6 @@ int ioport__unregister(struct kvm *kvm, u16 port)
> >>>   return 0;
> >>>  }
> >>>  
> >>> -static void ioport__unregister_all(void)
> >>> -{
> >>> - struct ioport *entry;
> >>> - struct rb_node *rb;
> >>> - struct rb_int_node *rb_node;
> >>> -
> >>> - rb = rb_first(_tree);
> >>> - while (rb) {
> >>> - rb_node = rb_int(rb);
> >>> - entry = ioport_node(rb_node);
> >>> - ioport_unregister(_tree, entry);
> >>> - rb = rb_first(_tree);
> >>> - }
> >>> -}
> >> I get the impression this is a rebasing artifact. The commit message 
> >> doesn't
> >> mention anything about removing ioport__exit() -> 
> >> ioport__unregister_all(), and as
> >> far as I can tell it's still needed because there are places other than
> >> ioport__setup_arch() from where ioport__register() is called.  
> > I agree that the commit message is a bit thin on this fact, but the
> > functionality of ioport__unregister_all() is now in
> > x86/ioport.c:ioport__remove_arch(). I think removing ioport__init()
> > without removing ioport__exit() as well would look very weird, if not
> > hackish.  
> 
> Not necessarily. ioport__unregister_all() removes the ioports added by
> x86/ioport.c::ioport__setup_arch(), *plus* ioports added by different devices,
> like serial, rtc, virtio-pci and vfio-pci (which are used by arm/arm64).

Right, indeed. Not that it really matters, since we are about to exit
anyway, but it looks indeed I need to move this to a generic teardown
method, or actually just keep that part here in this file.

Will give this a try.

Thanks!
Andre

> >
> > I can amend the commit message to mention this, or is there anything
> > else I missed?
> >
> > Cheers,
> > Andre
> >  
> >>> -
> >>>  static const char *to_direction(int direction)
> >>>  {
> >>>   if (direction == KVM_EXIT_IO_IN)
> >>> @@ -220,16 +205,3 @@ out:
> >>>  
> >>>   return !kvm->cfg.ioport_debug;
> >>>  }
> >>> -
> >>> -int ioport__init(struct kvm *kvm)
> >>> -{
> >>> - return ioport__setup_arch(kvm);
> >>> -}
> >>> -dev_base_init(ioport__init);
> >>> -
> >>> -int ioport__exit(struct kvm *kvm)
> >>> -{
> >>> - ioport__unregister_all();
> >>> - return 0;
> >>> -}
> >>> -dev_base_exit(ioport__exit);
> >>> diff --git a/mips/kvm.c b/mips/kvm.c
> >>> index 26355930..e110e5d5 100644
> >>> --- a/mips/kvm.c
> >>> +++ b/mips/kvm.c
> >>> @@ -100,11 +100,6 @@ void kvm__irq_trigger(struct kvm *kvm, int irq)
> >>>   die_perror("KVM_IRQ_LINE ioctl");
> >>>  }
> >>>  
> >>> -int ioport__setup_arch(struct kvm *kvm)
> >>> -{
> >>> - return 0;
> >>> -}
> >>> -
> >>>  bool kvm__arch_cpu_supports_vm(void)
> >>>  {
> >>>   return true;
> >>> diff --git a/powerpc/ioport.c b/powerpc/ioport.c
> >>> index 0c188b61..a5cff4ee 100644
> >>> --- a/powerpc/ioport.c
> >>> +++ b/powerpc/ioport.c
> >>> @@ -12,12 +12,6 @@
> >>>  
> >>>  

Re: [RFC PATCH v2 00/26] KVM/arm64: A stage 2 for the host

2021-02-17 Thread Mate Toth-Pal

Hi Quentin,


On 2021-01-08 13:14, Quentin Perret wrote:

Hi all,

This is the v2 of the series previously posted here:

   https://lore.kernel.org/kvmarm/20201117181607.1761516-1-qper...@google.com/

This basically allows us to wrap the host with a stage 2 when running in
nVHE, hence paving the way for protecting guest memory from the host in
the future (among other use-cases). For more details about the
motivation and the design angle taken here, I would recommend to have a
look at the cover letter of v1, and/or to watch these presentations at
LPC [1] and KVM forum 2020 [2].



We tested the pKVM changes pulled from here:



 https://android-kvm.googlesource.com/linux qperret/host-stage2-v2



We were using a target with Arm architecture with FEAT_S2FWB, and found 
that there is a bug in the patch.



It turned out that the Kernel checks for the extension, and sets up the 
stage 2 translation so that it forces the host memory type to 
write-through. However it seems that the code doesn't turn on the 
feature in the HCR_EL2 register.



We were able to fix the issue by applying the following patch:


diff --git a/arch/arm64/kvm/hyp/nvhe/mem_protect.c 
b/arch/arm64/kvm/hyp/nvhe/mem_protect.c

index 0cd3eb178f3b..e8521a072ea6 100644
--- a/arch/arm64/kvm/hyp/nvhe/mem_protect.c
+++ b/arch/arm64/kvm/hyp/nvhe/mem_protect.c
@@ -105,6 +105,8 @@ int kvm_host_prepare_stage2(void *mem_pgt_pool, void 
*dev_pgt_pool)

    params->vttbr = kvm_get_vttbr(mmu);
    params->vtcr = host_kvm.arch.vtcr;
    params->hcr_el2 |= HCR_VM;
+   if (cpus_have_const_cap(ARM64_HAS_STAGE2_FWB))
+   params->hcr_el2 |= HCR_FWB;
    __flush_dcache_area(params, sizeof(*params));
    }


Best regards,
Mate Toth-Pal

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH kvmtool 19/21] Remove ioport specific routines

2021-02-17 Thread Alexandru Elisei
Hi Andre,

On 2/17/21 3:49 PM, Alexandru Elisei wrote:
> Hi Andre,
>
> On 12/10/20 2:29 PM, Andre Przywara wrote:
>> Now that all users of the dedicated ioport trap handler interface are
>> gone, we can retire the code associated with it.
>>
>> This removes ioport.c and ioport.h, along with removing prototypes from
>> other header files.
>>
>> This also transfers the responsibility for port I/O trap handling
>> entirely into the new routine in mmio.c.
>>
>> Signed-off-by: Andre Przywara 
>> ---
>>  Makefile |   1 -
>>  include/kvm/ioport.h |  20 -
>>  include/kvm/kvm.h|   2 -
>>  ioport.c | 173 ---
>>  mmio.c   |   2 +-
>>  5 files changed, 1 insertion(+), 197 deletions(-)
>>  delete mode 100644 ioport.c
>>
>> diff --git a/Makefile b/Makefile
>> index 35bb1182..94ff5da6 100644
>> --- a/Makefile
>> +++ b/Makefile
>> @@ -56,7 +56,6 @@ OBJS   += framebuffer.o
>>  OBJS+= guest_compat.o
>>  OBJS+= hw/rtc.o
>>  OBJS+= hw/serial.o
>> -OBJS+= ioport.o
>>  OBJS+= irq.o
>>  OBJS+= kvm-cpu.o
>>  OBJS+= kvm.o
>> diff --git a/include/kvm/ioport.h b/include/kvm/ioport.h
>> index a61038e2..38636553 100644
>> --- a/include/kvm/ioport.h
>> +++ b/include/kvm/ioport.h
>> @@ -17,28 +17,8 @@
>>  
>>  struct kvm;
> Looks to me like the above forward declaration can be removed; same for all 
> the
> includes except linux/byteorder.h, needed for the lexx_to_cpu/cpu_to_lexx
> functions, and linux/types.h for the uxx typedefs. Otherwise looks good.

Actually, ignore the part about removing the includes, it opens a new can of 
worms
- byteorder.h doesn't include compiler.h where __always_inline is defined, and
various files where struct kvm_cpu is used don't include kvm-cpu.h (like pci.c,
hw/serial.c, etc). The header removal is not trivial and I think it should be 
part
of another cleanup patch.

Thanks,

Alex

>
> Thanks,
>
> Alex
>
>>  
>> -struct ioport {
>> -struct rb_int_node  node;
>> -struct ioport_operations*ops;
>> -void*priv;
>> -struct device_headerdev_hdr;
>> -u32 refcount;
>> -boolremove;
>> -};
>> -
>> -struct ioport_operations {
>> -bool (*io_in)(struct ioport *ioport, struct kvm_cpu *vcpu, u16 port, 
>> void *data, int size);
>> -bool (*io_out)(struct ioport *ioport, struct kvm_cpu *vcpu, u16 port, 
>> void *data, int size);
>> -};
>> -
>>  void ioport__map_irq(u8 *irq);
>>  
>> -int __must_check ioport__register(struct kvm *kvm, u16 port, struct 
>> ioport_operations *ops,
>> -  int count, void *param);
>> -int ioport__unregister(struct kvm *kvm, u16 port);
>> -int ioport__init(struct kvm *kvm);
>> -int ioport__exit(struct kvm *kvm);
>> -
>>  static inline u8 ioport__read8(u8 *data)
>>  {
>>  return *data;
>> diff --git a/include/kvm/kvm.h b/include/kvm/kvm.h
>> index 14f9d58b..e70f8ef6 100644
>> --- a/include/kvm/kvm.h
>> +++ b/include/kvm/kvm.h
>> @@ -119,8 +119,6 @@ void kvm__irq_line(struct kvm *kvm, int irq, int level);
>>  void kvm__irq_trigger(struct kvm *kvm, int irq);
>>  bool kvm__emulate_io(struct kvm_cpu *vcpu, u16 port, void *data, int 
>> direction, int size, u32 count);
>>  bool kvm__emulate_mmio(struct kvm_cpu *vcpu, u64 phys_addr, u8 *data, u32 
>> len, u8 is_write);
>> -bool kvm__emulate_pio(struct kvm_cpu *vcpu, u16 port, void *data,
>> -  int direction, int size, u32 count);
>>  int kvm__destroy_mem(struct kvm *kvm, u64 guest_phys, u64 size, void 
>> *userspace_addr);
>>  int kvm__register_mem(struct kvm *kvm, u64 guest_phys, u64 size, void 
>> *userspace_addr,
>>enum kvm_mem_type type);
>> diff --git a/ioport.c b/ioport.c
>> deleted file mode 100644
>> index 204d8103..
>> --- a/ioport.c
>> +++ /dev/null
>> @@ -1,173 +0,0 @@
>> -#include "kvm/ioport.h"
>> -
>> -#include "kvm/kvm.h"
>> -#include "kvm/util.h"
>> -#include "kvm/rbtree-interval.h"
>> -#include "kvm/mutex.h"
>> -
>> -#include   /* for KVM_EXIT_* */
>> -#include 
>> -
>> -#include 
>> -#include 
>> -#include 
>> -#include 
>> -
>> -#define ioport_node(n) rb_entry(n, struct ioport, node)
>> -
>> -static DEFINE_MUTEX(ioport_lock);
>> -
>> -static struct rb_root   ioport_tree = RB_ROOT;
>> -
>> -static struct ioport *ioport_search(struct rb_root *root, u64 addr)
>> -{
>> -struct rb_int_node *node;
>> -
>> -node = rb_int_search_single(root, addr);
>> -if (node == NULL)
>> -return NULL;
>> -
>> -return ioport_node(node);
>> -}
>> -
>> -static int ioport_insert(struct rb_root *root, struct ioport *data)
>> -{
>> -return rb_int_insert(root, >node);
>> -}
>> -
>> -static void ioport_remove(struct rb_root *root, struct ioport *data)
>> -{
>> -rb_int_erase(root, >node);
>> -}
>> -
>> -static struct ioport *ioport_get(struct rb_root 

Re: [PATCH kvmtool 03/21] ioport: Retire .generate_fdt_node functionality

2021-02-17 Thread Alexandru Elisei
Hi Andre,

On 2/17/21 3:54 PM, Andre Przywara wrote:
> On Thu, 11 Feb 2021 14:05:27 +
> Alexandru Elisei  wrote:
>
>> Hi Andre,
>>
>> On 12/10/20 2:28 PM, Andre Przywara wrote:
>>> The ioport routines support a special way of registering FDT node
>>> generator functions. There is no reason to have this separate from the
>>> already existing way via the device header.
>>>
>>> Now that the only user of this special ioport variety has been
>>> transferred, we can retire this code, to simplify ioport handling.  
>> One comment below, but otherwise very nice cleanup.
>>
>>> Signed-off-by: Andre Przywara 
>>> ---
>>>  include/kvm/ioport.h |  4 
>>>  ioport.c | 34 --
>>>  2 files changed, 38 deletions(-)
>>>
>>> diff --git a/include/kvm/ioport.h b/include/kvm/ioport.h
>>> index d0213541..a61038e2 100644
>>> --- a/include/kvm/ioport.h
>>> +++ b/include/kvm/ioport.h
>>> @@ -29,10 +29,6 @@ struct ioport {
>>>  struct ioport_operations {
>>> bool (*io_in)(struct ioport *ioport, struct kvm_cpu *vcpu, u16 port, 
>>> void *data, int size);
>>> bool (*io_out)(struct ioport *ioport, struct kvm_cpu *vcpu, u16 port, 
>>> void *data, int size);
>>> -   void (*generate_fdt_node)(struct ioport *ioport, void *fdt,
>>> - void (*generate_irq_prop)(void *fdt,
>>> -   u8 irq,
>>> -   enum irq_type));
>>>  };
>>>  
>>>  void ioport__map_irq(u8 *irq);
>>> diff --git a/ioport.c b/ioport.c
>>> index 667e8386..b98836d3 100644
>>> --- a/ioport.c
>>> +++ b/ioport.c
>>> @@ -56,7 +56,6 @@ static struct ioport *ioport_get(struct rb_root *root, 
>>> u64 addr)
>>>  /* Called with ioport_lock held. */
>>>  static void ioport_unregister(struct rb_root *root, struct ioport *data)
>>>  {
>>> -   device__unregister(>dev_hdr);
>>> ioport_remove(root, data);
>>> free(data);
>>>  }
>>> @@ -70,30 +69,6 @@ static void ioport_put(struct rb_root *root, struct 
>>> ioport *data)
>>> mutex_unlock(_lock);
>>>  }
>>>  
>>> -#ifdef CONFIG_HAS_LIBFDT
>>> -static void generate_ioport_fdt_node(void *fdt,
>>> -struct device_header *dev_hdr,
>>> -void (*generate_irq_prop)(void *fdt,
>>> -  u8 irq,
>>> -  enum irq_type))
>>> -{
>>> -   struct ioport *ioport = container_of(dev_hdr, struct ioport, dev_hdr);
>>> -   struct ioport_operations *ops = ioport->ops;
>>> -
>>> -   if (ops->generate_fdt_node)
>>> -   ops->generate_fdt_node(ioport, fdt, generate_irq_prop);
>>> -}
>>> -#else
>>> -static void generate_ioport_fdt_node(void *fdt,
>>> -struct device_header *dev_hdr,
>>> -void (*generate_irq_prop)(void *fdt,
>>> -  u8 irq,
>>> -  enum irq_type))
>>> -{
>>> -   die("Unable to generate device tree nodes without libfdt\n");
>>> -}
>>> -#endif
>>> -
>>>  int ioport__register(struct kvm *kvm, u16 port, struct ioport_operations 
>>> *ops, int count, void *param)
>>>  {
>>> struct ioport *entry;
>>> @@ -107,10 +82,6 @@ int ioport__register(struct kvm *kvm, u16 port, struct 
>>> ioport_operations *ops, i
>>> .node   = RB_INT_INIT(port, port + count),
>>> .ops= ops,
>>> .priv   = param,
>>> -   .dev_hdr= (struct device_header) {
>>> -   .bus_type   = DEVICE_BUS_IOPORT,
>>> -   .data   = generate_ioport_fdt_node,
>>> -   },  
>> Since the dev_hdr field is not used anymore, maybe it could also be removed 
>> from
>> struct ioport in include/kvm/ioport.h?
> I could (seems to indeed still work without it), but this whole
> structure will go away with a later patch, so I didn't bother so far.
> That's why I am not sure it's useful to do this at this point then.

Yes, you're totally right, no need to fiddle too much with it now because it 
will
removed later:

Reviewed-by: Alexandru Elisei 

Thanks,

Alex

>
> Cheers,
> Andre
>
>>> /*
>>>  * Start from 0 because ioport__unregister() doesn't decrement
>>>  * the reference count.
>>> @@ -123,15 +94,10 @@ int ioport__register(struct kvm *kvm, u16 port, struct 
>>> ioport_operations *ops, i
>>> r = ioport_insert(_tree, entry);
>>> if (r < 0)
>>> goto out_free;
>>> -   r = device__register(>dev_hdr);
>>> -   if (r < 0)
>>> -   goto out_remove;
>>> mutex_unlock(_lock);
>>>  
>>> return port;
>>>  
>>> -out_remove:
>>> -   ioport_remove(_tree, entry);
>>>  out_free:
>>> free(entry);
>>> mutex_unlock(_lock);  
___
kvmarm mailing 

Re: [PATCH kvmtool 03/21] ioport: Retire .generate_fdt_node functionality

2021-02-17 Thread Andre Przywara
On Thu, 11 Feb 2021 14:05:27 +
Alexandru Elisei  wrote:

> Hi Andre,
> 
> On 12/10/20 2:28 PM, Andre Przywara wrote:
> > The ioport routines support a special way of registering FDT node
> > generator functions. There is no reason to have this separate from the
> > already existing way via the device header.
> >
> > Now that the only user of this special ioport variety has been
> > transferred, we can retire this code, to simplify ioport handling.  
> 
> One comment below, but otherwise very nice cleanup.
> 
> >
> > Signed-off-by: Andre Przywara 
> > ---
> >  include/kvm/ioport.h |  4 
> >  ioport.c | 34 --
> >  2 files changed, 38 deletions(-)
> >
> > diff --git a/include/kvm/ioport.h b/include/kvm/ioport.h
> > index d0213541..a61038e2 100644
> > --- a/include/kvm/ioport.h
> > +++ b/include/kvm/ioport.h
> > @@ -29,10 +29,6 @@ struct ioport {
> >  struct ioport_operations {
> > bool (*io_in)(struct ioport *ioport, struct kvm_cpu *vcpu, u16 port, 
> > void *data, int size);
> > bool (*io_out)(struct ioport *ioport, struct kvm_cpu *vcpu, u16 port, 
> > void *data, int size);
> > -   void (*generate_fdt_node)(struct ioport *ioport, void *fdt,
> > - void (*generate_irq_prop)(void *fdt,
> > -   u8 irq,
> > -   enum irq_type));
> >  };
> >  
> >  void ioport__map_irq(u8 *irq);
> > diff --git a/ioport.c b/ioport.c
> > index 667e8386..b98836d3 100644
> > --- a/ioport.c
> > +++ b/ioport.c
> > @@ -56,7 +56,6 @@ static struct ioport *ioport_get(struct rb_root *root, 
> > u64 addr)
> >  /* Called with ioport_lock held. */
> >  static void ioport_unregister(struct rb_root *root, struct ioport *data)
> >  {
> > -   device__unregister(>dev_hdr);
> > ioport_remove(root, data);
> > free(data);
> >  }
> > @@ -70,30 +69,6 @@ static void ioport_put(struct rb_root *root, struct 
> > ioport *data)
> > mutex_unlock(_lock);
> >  }
> >  
> > -#ifdef CONFIG_HAS_LIBFDT
> > -static void generate_ioport_fdt_node(void *fdt,
> > -struct device_header *dev_hdr,
> > -void (*generate_irq_prop)(void *fdt,
> > -  u8 irq,
> > -  enum irq_type))
> > -{
> > -   struct ioport *ioport = container_of(dev_hdr, struct ioport, dev_hdr);
> > -   struct ioport_operations *ops = ioport->ops;
> > -
> > -   if (ops->generate_fdt_node)
> > -   ops->generate_fdt_node(ioport, fdt, generate_irq_prop);
> > -}
> > -#else
> > -static void generate_ioport_fdt_node(void *fdt,
> > -struct device_header *dev_hdr,
> > -void (*generate_irq_prop)(void *fdt,
> > -  u8 irq,
> > -  enum irq_type))
> > -{
> > -   die("Unable to generate device tree nodes without libfdt\n");
> > -}
> > -#endif
> > -
> >  int ioport__register(struct kvm *kvm, u16 port, struct ioport_operations 
> > *ops, int count, void *param)
> >  {
> > struct ioport *entry;
> > @@ -107,10 +82,6 @@ int ioport__register(struct kvm *kvm, u16 port, struct 
> > ioport_operations *ops, i
> > .node   = RB_INT_INIT(port, port + count),
> > .ops= ops,
> > .priv   = param,
> > -   .dev_hdr= (struct device_header) {
> > -   .bus_type   = DEVICE_BUS_IOPORT,
> > -   .data   = generate_ioport_fdt_node,
> > -   },  
> 
> Since the dev_hdr field is not used anymore, maybe it could also be removed 
> from
> struct ioport in include/kvm/ioport.h?

I could (seems to indeed still work without it), but this whole
structure will go away with a later patch, so I didn't bother so far.
That's why I am not sure it's useful to do this at this point then.

Cheers,
Andre

> > /*
> >  * Start from 0 because ioport__unregister() doesn't decrement
> >  * the reference count.
> > @@ -123,15 +94,10 @@ int ioport__register(struct kvm *kvm, u16 port, struct 
> > ioport_operations *ops, i
> > r = ioport_insert(_tree, entry);
> > if (r < 0)
> > goto out_free;
> > -   r = device__register(>dev_hdr);
> > -   if (r < 0)
> > -   goto out_remove;
> > mutex_unlock(_lock);
> >  
> > return port;
> >  
> > -out_remove:
> > -   ioport_remove(_tree, entry);
> >  out_free:
> > free(entry);
> > mutex_unlock(_lock);  

___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH kvmtool 19/21] Remove ioport specific routines

2021-02-17 Thread Alexandru Elisei
Hi Andre,

On 12/10/20 2:29 PM, Andre Przywara wrote:
> Now that all users of the dedicated ioport trap handler interface are
> gone, we can retire the code associated with it.
>
> This removes ioport.c and ioport.h, along with removing prototypes from
> other header files.
>
> This also transfers the responsibility for port I/O trap handling
> entirely into the new routine in mmio.c.
>
> Signed-off-by: Andre Przywara 
> ---
>  Makefile |   1 -
>  include/kvm/ioport.h |  20 -
>  include/kvm/kvm.h|   2 -
>  ioport.c | 173 ---
>  mmio.c   |   2 +-
>  5 files changed, 1 insertion(+), 197 deletions(-)
>  delete mode 100644 ioport.c
>
> diff --git a/Makefile b/Makefile
> index 35bb1182..94ff5da6 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -56,7 +56,6 @@ OBJS+= framebuffer.o
>  OBJS += guest_compat.o
>  OBJS += hw/rtc.o
>  OBJS += hw/serial.o
> -OBJS += ioport.o
>  OBJS += irq.o
>  OBJS += kvm-cpu.o
>  OBJS += kvm.o
> diff --git a/include/kvm/ioport.h b/include/kvm/ioport.h
> index a61038e2..38636553 100644
> --- a/include/kvm/ioport.h
> +++ b/include/kvm/ioport.h
> @@ -17,28 +17,8 @@
>  
>  struct kvm;

Looks to me like the above forward declaration can be removed; same for all the
includes except linux/byteorder.h, needed for the lexx_to_cpu/cpu_to_lexx
functions, and linux/types.h for the uxx typedefs. Otherwise looks good.

Thanks,

Alex

>  
> -struct ioport {
> - struct rb_int_node  node;
> - struct ioport_operations*ops;
> - void*priv;
> - struct device_headerdev_hdr;
> - u32 refcount;
> - boolremove;
> -};
> -
> -struct ioport_operations {
> - bool (*io_in)(struct ioport *ioport, struct kvm_cpu *vcpu, u16 port, 
> void *data, int size);
> - bool (*io_out)(struct ioport *ioport, struct kvm_cpu *vcpu, u16 port, 
> void *data, int size);
> -};
> -
>  void ioport__map_irq(u8 *irq);
>  
> -int __must_check ioport__register(struct kvm *kvm, u16 port, struct 
> ioport_operations *ops,
> -   int count, void *param);
> -int ioport__unregister(struct kvm *kvm, u16 port);
> -int ioport__init(struct kvm *kvm);
> -int ioport__exit(struct kvm *kvm);
> -
>  static inline u8 ioport__read8(u8 *data)
>  {
>   return *data;
> diff --git a/include/kvm/kvm.h b/include/kvm/kvm.h
> index 14f9d58b..e70f8ef6 100644
> --- a/include/kvm/kvm.h
> +++ b/include/kvm/kvm.h
> @@ -119,8 +119,6 @@ void kvm__irq_line(struct kvm *kvm, int irq, int level);
>  void kvm__irq_trigger(struct kvm *kvm, int irq);
>  bool kvm__emulate_io(struct kvm_cpu *vcpu, u16 port, void *data, int 
> direction, int size, u32 count);
>  bool kvm__emulate_mmio(struct kvm_cpu *vcpu, u64 phys_addr, u8 *data, u32 
> len, u8 is_write);
> -bool kvm__emulate_pio(struct kvm_cpu *vcpu, u16 port, void *data,
> -   int direction, int size, u32 count);
>  int kvm__destroy_mem(struct kvm *kvm, u64 guest_phys, u64 size, void 
> *userspace_addr);
>  int kvm__register_mem(struct kvm *kvm, u64 guest_phys, u64 size, void 
> *userspace_addr,
> enum kvm_mem_type type);
> diff --git a/ioport.c b/ioport.c
> deleted file mode 100644
> index 204d8103..
> --- a/ioport.c
> +++ /dev/null
> @@ -1,173 +0,0 @@
> -#include "kvm/ioport.h"
> -
> -#include "kvm/kvm.h"
> -#include "kvm/util.h"
> -#include "kvm/rbtree-interval.h"
> -#include "kvm/mutex.h"
> -
> -#include/* for KVM_EXIT_* */
> -#include 
> -
> -#include 
> -#include 
> -#include 
> -#include 
> -
> -#define ioport_node(n) rb_entry(n, struct ioport, node)
> -
> -static DEFINE_MUTEX(ioport_lock);
> -
> -static struct rb_rootioport_tree = RB_ROOT;
> -
> -static struct ioport *ioport_search(struct rb_root *root, u64 addr)
> -{
> - struct rb_int_node *node;
> -
> - node = rb_int_search_single(root, addr);
> - if (node == NULL)
> - return NULL;
> -
> - return ioport_node(node);
> -}
> -
> -static int ioport_insert(struct rb_root *root, struct ioport *data)
> -{
> - return rb_int_insert(root, >node);
> -}
> -
> -static void ioport_remove(struct rb_root *root, struct ioport *data)
> -{
> - rb_int_erase(root, >node);
> -}
> -
> -static struct ioport *ioport_get(struct rb_root *root, u64 addr)
> -{
> - struct ioport *ioport;
> -
> - mutex_lock(_lock);
> - ioport = ioport_search(root, addr);
> - if (ioport)
> - ioport->refcount++;
> - mutex_unlock(_lock);
> -
> - return ioport;
> -}
> -
> -/* Called with ioport_lock held. */
> -static void ioport_unregister(struct rb_root *root, struct ioport *data)
> -{
> - ioport_remove(root, data);
> - free(data);
> -}
> -
> -static void ioport_put(struct rb_root *root, struct ioport *data)
> -{
> - mutex_lock(_lock);
> - data->refcount--;
> - if (data->remove && data->refcount 

Re: [PATCH kvmtool 18/21] pci: Switch trap handling to use MMIO handler

2021-02-17 Thread Alexandru Elisei
Hi Andre,

On 12/10/20 2:29 PM, Andre Przywara wrote:
> With the planned retirement of the special ioport emulation code, we
> need to provide an emulation function compatible with the MMIO prototype.
>
> Merge the existing _in and _out handlers to adhere to that MMIO
> interface, and register these using the new registration function.
>
> Signed-off-by: Andre Przywara 

It looks like there's no change in functionality, the patch looks correct to me:

Reviewed-by: Alexandru Elisei 

Thanks,

Alex

> ---
>  pci.c | 82 +--
>  1 file changed, 24 insertions(+), 58 deletions(-)
>
> diff --git a/pci.c b/pci.c
> index 2e2c0270..d6da79e0 100644
> --- a/pci.c
> +++ b/pci.c
> @@ -87,29 +87,16 @@ static void *pci_config_address_ptr(u16 port)
>   return base + offset;
>  }
>  
> -static bool pci_config_address_out(struct ioport *ioport, struct kvm_cpu 
> *vcpu, u16 port, void *data, int size)
> +static void pci_config_address_mmio(struct kvm_cpu *vcpu, u64 addr, u8 *data,
> + u32 len, u8 is_write, void *ptr)
>  {
> - void *p = pci_config_address_ptr(port);
> + void *p = pci_config_address_ptr(addr);
>  
> - memcpy(p, data, size);
> -
> - return true;
> -}
> -
> -static bool pci_config_address_in(struct ioport *ioport, struct kvm_cpu 
> *vcpu, u16 port, void *data, int size)
> -{
> - void *p = pci_config_address_ptr(port);
> -
> - memcpy(data, p, size);
> -
> - return true;
> + if (is_write)
> + memcpy(p, data, len);
> + else
> + memcpy(data, p, len);
>  }
> -
> -static struct ioport_operations pci_config_address_ops = {
> - .io_in  = pci_config_address_in,
> - .io_out = pci_config_address_out,
> -};
> -
>  static bool pci_device_exists(u8 bus_number, u8 device_number, u8 
> function_number)
>  {
>   union pci_config_address pci_config_address;
> @@ -125,49 +112,27 @@ static bool pci_device_exists(u8 bus_number, u8 
> device_number, u8 function_numbe
>   return !IS_ERR_OR_NULL(device__find_dev(DEVICE_BUS_PCI, device_number));
>  }
>  
> -static bool pci_config_data_out(struct ioport *ioport, struct kvm_cpu *vcpu, 
> u16 port, void *data, int size)
> -{
> - union pci_config_address pci_config_address;
> -
> - if (size > 4)
> - size = 4;
> -
> - pci_config_address.w = ioport__read32(_config_address_bits);
> - /*
> -  * If someone accesses PCI configuration space offsets that are not
> -  * aligned to 4 bytes, it uses ioports to signify that.
> -  */
> - pci_config_address.reg_offset = port - PCI_CONFIG_DATA;
> -
> - pci__config_wr(vcpu->kvm, pci_config_address, data, size);
> -
> - return true;
> -}
> -
> -static bool pci_config_data_in(struct ioport *ioport, struct kvm_cpu *vcpu, 
> u16 port, void *data, int size)
> +static void pci_config_data_mmio(struct kvm_cpu *vcpu, u64 addr, u8 *data,
> +  u32 len, u8 is_write, void *kvm)
>  {
>   union pci_config_address pci_config_address;
>  
> - if (size > 4)
> - size = 4;
> + if (len > 4)
> + len = 4;
>  
>   pci_config_address.w = ioport__read32(_config_address_bits);
>   /*
>* If someone accesses PCI configuration space offsets that are not
>* aligned to 4 bytes, it uses ioports to signify that.
>*/
> - pci_config_address.reg_offset = port - PCI_CONFIG_DATA;
> + pci_config_address.reg_offset = addr - PCI_CONFIG_DATA;
>  
> - pci__config_rd(vcpu->kvm, pci_config_address, data, size);
> -
> - return true;
> + if (is_write)
> + pci__config_wr(vcpu->kvm, pci_config_address, data, len);
> + else
> + pci__config_rd(vcpu->kvm, pci_config_address, data, len);
>  }
>  
> -static struct ioport_operations pci_config_data_ops = {
> - .io_in  = pci_config_data_in,
> - .io_out = pci_config_data_out,
> -};
> -
>  static int pci_activate_bar(struct kvm *kvm, struct pci_device_header 
> *pci_hdr,
>   int bar_num)
>  {
> @@ -512,11 +477,12 @@ int pci__init(struct kvm *kvm)
>  {
>   int r;
>  
> - r = ioport__register(kvm, PCI_CONFIG_DATA + 0, _config_data_ops, 4, 
> NULL);
> + r = kvm__register_pio(kvm, PCI_CONFIG_DATA, 4,
> +  pci_config_data_mmio, NULL);
>   if (r < 0)
>   return r;
> -
> - r = ioport__register(kvm, PCI_CONFIG_ADDRESS + 0, 
> _config_address_ops, 4, NULL);
> + r = kvm__register_pio(kvm, PCI_CONFIG_ADDRESS, 4,
> +  pci_config_address_mmio, NULL);
>   if (r < 0)
>   goto err_unregister_data;
>  
> @@ -528,17 +494,17 @@ int pci__init(struct kvm *kvm)
>   return 0;
>  
>  err_unregister_addr:
> - ioport__unregister(kvm, PCI_CONFIG_ADDRESS);
> + kvm__deregister_pio(kvm, PCI_CONFIG_ADDRESS);
>  err_unregister_data:
> - ioport__unregister(kvm, PCI_CONFIG_DATA);
> + 

Re: [PATCH] KVM: arm64: Handle CMOs on Read Only memslots

2021-02-17 Thread Alexandru Elisei
Hi Drew,

On 2/17/21 10:43 AM, Andrew Jones wrote:
> On Tue, Feb 16, 2021 at 12:18:31PM +, Alexandru Elisei wrote:
>> Hi Marc,
>>
>> Played with this for a bit to try to understand the problem better, wrote a 
>> simple
>> MMIO device in kvmtool which maps the memory as a read-only memslot [1] and 
>> poked
>> it with kvm-unit-tests [2].
>>
>> [1] https://gitlab.arm.com/linux-arm/kvmtool-ae/-/tree/mmiodev-wip1
>>
>> [2] https://gitlab.arm.com/linux-arm/kvm-unit-tests-ae/-/tree/mmiodev-wip1
> Looks like you forgot to add arm/mmiodev.c to your commit.

Fixed, thanks for pointing that out!

Thanks,

Alex

>
> Thanks,
> drew
>
>> On 2/11/21 2:27 PM, Marc Zyngier wrote:
>>> It appears that when a guest traps into KVM because it is
>>> performing a CMO on a Read Only memslot, our handling of
>>> this operation is "slightly suboptimal", as we treat it as
>>> an MMIO access without a valid syndrome.
>>>
>>> The chances that userspace is adequately equiped to deal
>>> with such an exception being slim, it would be better to
>>> handle it in the kernel.
>>>
>>> What we need to provide is roughly as follows:
>>>
>>> (a) if a CMO hits writeable memory, handle it as a normal memory acess
>>> (b) if a CMO hits non-memory, skip it
>>> (c) if a CMO hits R/O memory, that's where things become fun:
>>>   (1) if the CMO is DC IVAC, the architecture says this should result
>>>   in a permission fault
>>>   (2) if the CMO is DC CIVAC, it should work similarly to (a)
>>>
>>> We already perform (a) and (b) correctly, but (c) is a total mess.
>>> Hence we need to distinguish between IVAC (c.1) and CIVAC (c.2).
>>>
>>> One way to do it is to treat CMOs generating a translation fault as
>>> a *read*, even when they are on a RW memslot. This allows us to
>>> further triage things:
>>>
>>> If they come back with a permission fault, that is because this is
>>> a DC IVAC instruction:
>>> - inside a RW memslot: no problem, treat it as a write (a)(c.2)
>>> - inside a RO memslot: inject a data abort in the guest (c.1)
>>>
>>> The only drawback is that DC IVAC on a yet unmapped page faults
>>> twice: one for the initial translation fault that result in a RO
>>> mapping, and once for the permission fault. I think we can live with
>>> that.
>>>
>>> Reported-by: Jianyong Wu 
>>> Signed-off-by: Marc Zyngier 
>>> ---
>>>
>>> Notes:
>>> I have taken the option to inject an abort in the guest when
>>> it issues a DC IVAC on a R/O memslot, but another option would
>>> be to just perform the invalidation ourselves as a DC CIAVAC.
>>> 
>>> This would have the advantage of being consistent with what we
>>> do for emulated MMIO.
>>>
>>>  arch/arm64/kvm/mmu.c | 53 ++--
>>>  1 file changed, 41 insertions(+), 12 deletions(-)
>>>
>>> diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
>>> index 7d2257cc5438..c7f4388bea45 100644
>>> --- a/arch/arm64/kvm/mmu.c
>>> +++ b/arch/arm64/kvm/mmu.c
>>> @@ -760,7 +760,17 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, 
>>> phys_addr_t fault_ipa,
>>> struct kvm_pgtable *pgt;
>>>  
>>> fault_granule = 1UL << ARM64_HW_PGTABLE_LEVEL_SHIFT(fault_level);
>>> -   write_fault = kvm_is_write_fault(vcpu);
>>> +   /*
>>> +* Treat translation faults on CMOs as read faults. Should
>>> +* this further generate a permission fault on a R/O memslot,
>>> +* it will be caught in kvm_handle_guest_abort(), with
>>> +* prejudice. Permission faults on non-R/O memslot will be
>>> +* gracefully handled as writes.
>>> +*/
>>> +   if (fault_status == FSC_FAULT && kvm_vcpu_dabt_is_cm(vcpu))
>>> +   write_fault = false;
>> This means that every DC CIVAC will map the IPA with read permissions in the 
>> stage
>> 2 tables, regardless of the IPA being already mapped. It's harmless, but a 
>> bit
>> unexpected.
>>
>>> +   else
>>> +   write_fault = kvm_is_write_fault(vcpu);
>>> exec_fault = kvm_vcpu_trap_is_exec_fault(vcpu);
>>> VM_BUG_ON(write_fault && exec_fault);
>>>  
>>> @@ -1013,19 +1023,37 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
>>> }
>>>  
>>> /*
>>> -* Check for a cache maintenance operation. Since we
>>> -* ended-up here, we know it is outside of any memory
>>> -* slot. But we can't find out if that is for a device,
>>> -* or if the guest is just being stupid. The only thing
>>> -* we know for sure is that this range cannot be cached.
>>> +* Check for a cache maintenance operation. Three cases:
>>> +*
>>> +* - It is outside of any memory slot. But we can't find out
>>> +*   if that is for a device, or if the guest is just being
>>> +*   stupid. The only thing we know for sure is that this
>>> +*   range cannot be cached.  So let's assume that the guest
>>> +*   is just being cautious, and skip the instruction.
>>> +   

Re: [PATCH] KVM: arm64: Handle CMOs on Read Only memslots

2021-02-17 Thread Andrew Jones
On Tue, Feb 16, 2021 at 12:18:31PM +, Alexandru Elisei wrote:
> Hi Marc,
> 
> Played with this for a bit to try to understand the problem better, wrote a 
> simple
> MMIO device in kvmtool which maps the memory as a read-only memslot [1] and 
> poked
> it with kvm-unit-tests [2].
> 
> [1] https://gitlab.arm.com/linux-arm/kvmtool-ae/-/tree/mmiodev-wip1
> 
> [2] https://gitlab.arm.com/linux-arm/kvm-unit-tests-ae/-/tree/mmiodev-wip1

Looks like you forgot to add arm/mmiodev.c to your commit.

Thanks,
drew

> 
> On 2/11/21 2:27 PM, Marc Zyngier wrote:
> > It appears that when a guest traps into KVM because it is
> > performing a CMO on a Read Only memslot, our handling of
> > this operation is "slightly suboptimal", as we treat it as
> > an MMIO access without a valid syndrome.
> >
> > The chances that userspace is adequately equiped to deal
> > with such an exception being slim, it would be better to
> > handle it in the kernel.
> >
> > What we need to provide is roughly as follows:
> >
> > (a) if a CMO hits writeable memory, handle it as a normal memory acess
> > (b) if a CMO hits non-memory, skip it
> > (c) if a CMO hits R/O memory, that's where things become fun:
> >   (1) if the CMO is DC IVAC, the architecture says this should result
> >   in a permission fault
> >   (2) if the CMO is DC CIVAC, it should work similarly to (a)
> >
> > We already perform (a) and (b) correctly, but (c) is a total mess.
> > Hence we need to distinguish between IVAC (c.1) and CIVAC (c.2).
> >
> > One way to do it is to treat CMOs generating a translation fault as
> > a *read*, even when they are on a RW memslot. This allows us to
> > further triage things:
> >
> > If they come back with a permission fault, that is because this is
> > a DC IVAC instruction:
> > - inside a RW memslot: no problem, treat it as a write (a)(c.2)
> > - inside a RO memslot: inject a data abort in the guest (c.1)
> >
> > The only drawback is that DC IVAC on a yet unmapped page faults
> > twice: one for the initial translation fault that result in a RO
> > mapping, and once for the permission fault. I think we can live with
> > that.
> >
> > Reported-by: Jianyong Wu 
> > Signed-off-by: Marc Zyngier 
> > ---
> >
> > Notes:
> > I have taken the option to inject an abort in the guest when
> > it issues a DC IVAC on a R/O memslot, but another option would
> > be to just perform the invalidation ourselves as a DC CIAVAC.
> > 
> > This would have the advantage of being consistent with what we
> > do for emulated MMIO.
> >
> >  arch/arm64/kvm/mmu.c | 53 ++--
> >  1 file changed, 41 insertions(+), 12 deletions(-)
> >
> > diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
> > index 7d2257cc5438..c7f4388bea45 100644
> > --- a/arch/arm64/kvm/mmu.c
> > +++ b/arch/arm64/kvm/mmu.c
> > @@ -760,7 +760,17 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, 
> > phys_addr_t fault_ipa,
> > struct kvm_pgtable *pgt;
> >  
> > fault_granule = 1UL << ARM64_HW_PGTABLE_LEVEL_SHIFT(fault_level);
> > -   write_fault = kvm_is_write_fault(vcpu);
> > +   /*
> > +* Treat translation faults on CMOs as read faults. Should
> > +* this further generate a permission fault on a R/O memslot,
> > +* it will be caught in kvm_handle_guest_abort(), with
> > +* prejudice. Permission faults on non-R/O memslot will be
> > +* gracefully handled as writes.
> > +*/
> > +   if (fault_status == FSC_FAULT && kvm_vcpu_dabt_is_cm(vcpu))
> > +   write_fault = false;
> 
> This means that every DC CIVAC will map the IPA with read permissions in the 
> stage
> 2 tables, regardless of the IPA being already mapped. It's harmless, but a bit
> unexpected.
> 
> > +   else
> > +   write_fault = kvm_is_write_fault(vcpu);
> > exec_fault = kvm_vcpu_trap_is_exec_fault(vcpu);
> > VM_BUG_ON(write_fault && exec_fault);
> >  
> > @@ -1013,19 +1023,37 @@ int kvm_handle_guest_abort(struct kvm_vcpu *vcpu)
> > }
> >  
> > /*
> > -* Check for a cache maintenance operation. Since we
> > -* ended-up here, we know it is outside of any memory
> > -* slot. But we can't find out if that is for a device,
> > -* or if the guest is just being stupid. The only thing
> > -* we know for sure is that this range cannot be cached.
> > +* Check for a cache maintenance operation. Three cases:
> > +*
> > +* - It is outside of any memory slot. But we can't find out
> > +*   if that is for a device, or if the guest is just being
> > +*   stupid. The only thing we know for sure is that this
> > +*   range cannot be cached.  So let's assume that the guest
> > +*   is just being cautious, and skip the instruction.
> > +*
> > +* - Otherwise, check whether this is a permission fault.
> > +*   If so, that's a DC IVAC on