Re: [PATCH 2/3] KVM: arm64: Handle S1PTW translation with TCR_HA set as a write

2022-12-24 Thread Marc Zyngier
On Thu, 22 Dec 2022 20:58:40 +,
Oliver Upton  wrote:
> 
> On Thu, Dec 22, 2022 at 09:01:15AM +, Marc Zyngier wrote:
> > On Wed, 21 Dec 2022 17:46:24 +, Oliver Upton  
> > wrote:
> > >  - When UFFD is in use, translation faults are reported to userspace as
> > > >    writes when from a RW memslot and reads when from an RO memslot.
> > 
> > Not quite: translation faults are reported as reads if TCR_EL1.HA
> > isn't set, and as writes if it is. Ignoring TCR_EL1.HD for a moment,
> > this matches exactly the behaviour of the page-table walker, which
> > will update the S1 PTs only if this bit is set.
> 
> My bad, yes you're right. I conflated the use case here with the
> architectural state.
> 
> I'm probably being way too pedantic, but I just wanted to make sure we
> agree about the ensuing subtlety. More below:
> 
> > Or is it what userfaultfd does on its own? That'd be confusing...
> > 
> > > 
> > >  - S1 page table memory is spuriously marked as dirty, as we presume a
> > > >    write immediately follows the translation fault. That isn't entirely
> > > >    senseless, as it would mean both the target page and the S1 PT that
> > > >    maps it are both old. This is nothing new I suppose, just weird.
> > 
> > s/old/young/ ?
> > 
> > I think you're confusing the PT access with the access that caused the
> > PT access (I'll have that printed on a t-shirt, thank you very much).
> 
> I'd buy it!
> 
> > Here, we're not considering the cause of the PT access anymore. If
> > TCR_EL1.HA is set, the S1 PT page will be marked as accessed even on a
> > read, and only that page.
> 
> I think this is where the disconnect might be. TCR_EL1.HA == 1 suggests
> a write could possibly follow, but I don't think it requires it. The
> page table walker must first load the S1 PTE before writing to it.

Ah, you're talking of the write to the PTE. Too many writes!

My reasoning is based on Rule LFTXR in DDI0487I.a, which says:

"When the PE performs a hardware update of the AF, it sets the AF to 1
 in the corresponding descriptor in memory, in a coherent manner,
 using an atomic read-modify-write of that descriptor."

An atomic-or operation fits this description, and I cannot see
anything in the architecture that would prevent the write of a PTE
even if AF is already set, such as mandating something like a
test-and-set or compare-and-swap.

I'm not saying this is the only possible implementation, or even a
good one. But I don't think this is incompatible with what the
architecture mandates.
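
To make the two readings concrete, here is a minimal userspace sketch in
C11 of both ways an implementation could set AF. It is only an
illustration of the argument above, not a description of any real
page-table walker. The names PTE_AF, set_af_atomic_or() and set_af_cas()
are made up for this example; the only architectural fact assumed is that
AF is bit 10 of the descriptor.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Access Flag: bit 10 of a stage-1 descriptor. */
#define PTE_AF (UINT64_C(1) << 10)

/*
 * Variant A: unconditional atomic OR. The descriptor is always written,
 * even when AF is already set. Returns true if a write was issued.
 */
static inline bool set_af_atomic_or(_Atomic uint64_t *pte)
{
    assert(pte != NULL);
    atomic_fetch_or(pte, PTE_AF);
    return true;
}

/*
 * Variant B: load first, then compare-and-swap only while AF is clear.
 * Returns true if a write was issued, false if the load alone sufficed.
 */
static inline bool set_af_cas(_Atomic uint64_t *pte)
{
    assert(pte != NULL);
    uint64_t old = atomic_load(pte);

    while (!(old & PTE_AF)) {
        /* On failure, 'old' is refreshed and AF is re-checked. */
        if (atomic_compare_exchange_weak(pte, &old, old | PTE_AF))
            return true;
    }
    return false;
}
```

With a descriptor whose AF is already set, variant A still reports a
write while variant B reports none. Both are atomic read-modify-write
sequences, which is why the rule quoted above does not obviously exclude
either.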

> 
> From AArch64.S1Translate() (DDI0487H.a):
> 
> (fault, descaddress, walkstate, descriptor) = AArch64.S1Walk(fault, walkparams, va, regime,
>                                                              ss, acctype, iswrite, ispriv);
> 
> [...]
> 
> new_desc = descriptor;
> if walkparams.ha == '1' && AArch64.FaultAllowsSetAccessFlag(fault) then
>   // Set descriptor AF bit
>   new_desc<10> = '1';
> 
> [...]
> 
> // Either the access flag was clear or AP<2> is set
> if new_desc != descriptor then
>   if regime == Regime_EL10 && EL2Enabled() then
>     s1aarch64 = TRUE;
>     s2fs1walk = TRUE;
>     aligned = TRUE;
>     iswrite = TRUE;
>     (s2fault, descupdateaddress) = AArch64.S2Translate(fault, descaddress, s1aarch64,
>                                                        ss, s2fs1walk, AccType_ATOMICRW,
>                                                        aligned, iswrite, ispriv);
> 
>     if s2fault.statuscode != Fault_None then
>       return (s2fault, AddressDescriptor UNKNOWN);
>   else
>     descupdateaddress = descaddress;
> 
>   (fault, mem_desc) = AArch64.MemSwapTableDesc(fault, descriptor, new_desc,
>                                                walkparams.ee, descupdateaddress)
> 
> Buried in AArch64.S1Walk() is a stage-2 walk for a read to fetch the
> descriptor. The second stage-2 walk for write is conditioned on having
> already fetched the stage-1 descriptor and determining the AF needs
> to be set.

The question is whether this is one possible implementation, or the
only possible implementation. My bet is on the former.

> Relating back to UFFD: if we expect KVM to do exactly what hardware
> does, UFFD should see an attempted read when the first walk fails
> because of an S2 translation fault. Based on this patch, though, we'd
> promote it to a write if TCR_EL1.HA == 1.
> 
> This has the additional nuance of marking the S1 PT's IPA as dirty, even
> though it might not actually have been written to. Having said that,
> the false positive rate should be negligible given that S1 PTs ought to
> account for a small amount of guest memory.
> 
> Like I said before, I'm probably being unnecessarily pedantic :) It just
> seems to me that the view we're giving userspace of S1PTW aborts isn't
> exactly architectural and I want to make sure that is explicitly
> intentional.

I think it is perfectly fine to be pedantic about these things,

Re: [PATCH 2/3] KVM: arm64: Handle S1PTW translation with TCR_HA set as a write

2022-12-22 Thread Ricardo Koller
On Thu, Dec 22, 2022 at 08:58:40PM +, Oliver Upton wrote:
> On Thu, Dec 22, 2022 at 09:01:15AM +, Marc Zyngier wrote:
> > On Wed, 21 Dec 2022 17:46:24 +, Oliver Upton  
> > wrote:
> > >  - When UFFD is in use, translation faults are reported to userspace as
> > > >    writes when from a RW memslot and reads when from an RO memslot.
> > 
> > Not quite: translation faults are reported as reads if TCR_EL1.HA
> > isn't set, and as writes if it is. Ignoring TCR_EL1.HD for a moment,
> > this matches exactly the behaviour of the page-table walker, which
> > will update the S1 PTs only if this bit is set.
> 
> My bad, yes you're right. I conflated the use case here with the
> architectural state.
> 
> I'm probably being way too pedantic, but I just wanted to make sure we
> agree about the ensuing subtlety. More below:
> 
> > Or is it what userfaultfd does on its own? That'd be confusing...
> > 
> > > 
> > >  - S1 page table memory is spuriously marked as dirty, as we presume a
> > > >    write immediately follows the translation fault. That isn't entirely
> > > >    senseless, as it would mean both the target page and the S1 PT that
> > > >    maps it are both old. This is nothing new I suppose, just weird.
> > 
> > s/old/young/ ?
> > 
> > I think you're confusing the PT access with the access that caused the
> > PT access (I'll have that printed on a t-shirt, thank you very much).
> 
> I'd buy it!
> 
> > Here, we're not considering the cause of the PT access anymore. If
> > TCR_EL1.HA is set, the S1 PT page will be marked as accessed even on a
> > read, and only that page.
> 
> I think this is where the disconnect might be. TCR_EL1.HA == 1 suggests
> a write could possibly follow, but I don't think it requires it. The
> page table walker must first load the S1 PTE before writing to it.
> 
> From AArch64.S1Translate() (DDI0487H.a):
> 
> (fault, descaddress, walkstate, descriptor) = AArch64.S1Walk(fault, walkparams, va, regime,
>                                                              ss, acctype, iswrite, ispriv);
> 
> [...]
> 
> new_desc = descriptor;
> if walkparams.ha == '1' && AArch64.FaultAllowsSetAccessFlag(fault) then
>   // Set descriptor AF bit
>   new_desc<10> = '1';
> 
> [...]
> 
> // Either the access flag was clear or AP<2> is set
> if new_desc != descriptor then
>   if regime == Regime_EL10 && EL2Enabled() then
>     s1aarch64 = TRUE;
>     s2fs1walk = TRUE;
>     aligned = TRUE;
>     iswrite = TRUE;
>     (s2fault, descupdateaddress) = AArch64.S2Translate(fault, descaddress, s1aarch64,
>                                                        ss, s2fs1walk, AccType_ATOMICRW,
>                                                        aligned, iswrite, ispriv);
> 
>     if s2fault.statuscode != Fault_None then
>       return (s2fault, AddressDescriptor UNKNOWN);
>   else
>     descupdateaddress = descaddress;
> 
>   (fault, mem_desc) = AArch64.MemSwapTableDesc(fault, descriptor, new_desc,
>                                                walkparams.ee, descupdateaddress)
> 
> Buried in AArch64.S1Walk() is a stage-2 walk for a read to fetch the
> descriptor. The second stage-2 walk for write is conditioned on having
> already fetched the stage-1 descriptor and determining the AF needs
> to be set.
> 
> Relating back to UFFD: if we expect KVM to do exactly what hardware
> does, UFFD should see an attempted read when the first walk fails
> because of an S2 translation fault. Based on this patch, though, we'd
> promote it to a write if TCR_EL1.HA == 1.
> 
> This has the additional nuance of marking the S1 PT's IPA as dirty, even
> though it might not actually have been written to. Having said that,
> the false positive rate should be negligible given that S1 PTs ought to
> account for a small amount of guest memory.

Another false positive is when TCR_EL1.HA == 1 and the AF bit is already
set in the PTE. This results in a write, when I don't think it should.

> 
> Like I said before, I'm probably being unnecessarily pedantic :) It just
> seems to me that the view we're giving userspace of S1PTW aborts isn't
> exactly architectural and I want to make sure that is explicitly
> intentional.
>
> --
> Thanks,
> Oliver
___
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm


Re: [PATCH 2/3] KVM: arm64: Handle S1PTW translation with TCR_HA set as a write

2022-12-22 Thread Ricardo Koller
Hi Marc,

On Wed, Dec 21, 2022 at 05:43:03PM +, Marc Zyngier wrote:
> Hi Ricardo,
> 
> On Wed, 21 Dec 2022 16:46:06 +,
> Ricardo Koller  wrote:
> > 
> > Hello,
> > 
> > On Tue, Dec 20, 2022 at 08:09:22PM +, Marc Zyngier wrote:
> > > As a minor optimisation, we can retrofit the "S1PTW is a write
> > > even on translation fault" concept *if* the vcpu is using the
> > > HW-managed Access Flag, as setting TCR_EL1.HA is guaranteed
> > > to result in an update of the PTE.
> > > 
> > > However, we cannot do the same thing for DB, as it would require
> > > us to parse the PTs to find out if the DBM bit is set there.
> > > This is not going to happen.
> > > 
> > > Signed-off-by: Marc Zyngier 
> > > ---
> > >  arch/arm64/include/asm/kvm_emulate.h | 20 +++-
> > >  1 file changed, 19 insertions(+), 1 deletion(-)
> > > 
> > > > diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
> > > index fd6ad8b21f85..4ee467065042 100644
> > > --- a/arch/arm64/include/asm/kvm_emulate.h
> > > +++ b/arch/arm64/include/asm/kvm_emulate.h
> > > @@ -374,6 +374,9 @@ static __always_inline int kvm_vcpu_sys_get_rt(struct 
> > > kvm_vcpu *vcpu)
> > >  static inline bool kvm_is_write_fault(struct kvm_vcpu *vcpu)
> > >  {
> > >   if (kvm_vcpu_abt_iss1tw(vcpu)) {
> > > + unsigned int afdb;
> > > + u64 mmfr1;
> > > +
> > >   /*
> > >* Only a permission fault on a S1PTW should be
> > >* considered as a write. Otherwise, page tables baked
> > > @@ -385,12 +388,27 @@ static inline bool kvm_is_write_fault(struct 
> > > kvm_vcpu *vcpu)
> > >* to map the page containing the PT (read only at
> > >* first), then a permission fault to allow the flags
> > >* to be set.
> > > +  *
> > > +  * We can improve things if the guest uses AF, as this
> > > +  * is guaranteed to result in a write to the PTE. For
> > > +  * DB, however, we'd need to parse the guest's PTs,
> > > +  * and that's not on. DB is crap anyway.
> > >*/
> > >   switch (kvm_vcpu_trap_get_fault_type(vcpu)) {
> > 
> > Nit: fault_status is calculated once when taking the fault, and passed
> > around to all users (like user_mem_abort()). Not sure if this is because
> > of the extra cycles needed to get it, or just style. Anyway, maybe it
> > applies here.
> 
> All these things are just fields in ESR_EL2, which we keep looking at
> all the time. The compiler actually does a pretty good job at keeping
> > that around, especially considering that this function is inlined (at
> least here, kvm_handle_guest_abort and kvm_user_mem_abort are merged
> into a single monster).
> 
> So passing the parameter wouldn't change a thing, and I find the above
> > more readable (I know that all the information in this function is
> derived from the same data structure).
>

Got it, thanks for the info.

> > 
> > >   case ESR_ELx_FSC_PERM:
> > >   return true;
> > >   default:
> > > - return false;
> > > + /* Can't introspect TCR_EL1 with pKVM */
> > > + if (kvm_vm_is_protected(vcpu->kvm))
> > > + return false;
> > > +
> > > + mmfr1 = read_sanitised_ftr_reg(SYS_ID_AA64MMFR1_EL1);
> > > > + afdb = cpuid_feature_extract_unsigned_field(mmfr1, ID_AA64MMFR1_EL1_HAFDBS_SHIFT);
> > > +
> > > + if (afdb == ID_AA64MMFR1_EL1_HAFDBS_NI)
> > > + return false;
> > > +
> > > + return (vcpu_read_sys_reg(vcpu, TCR_EL1) & TCR_HA);
> > 
> > Also tested this specific case using page_fault_test when the PT page is
> > marked for dirty logging with and without AF. In both cases there's a
> > > single _FSC_FAULT (no PERM_FAULT) as expected, and the PT page is marked dirty
> > in the AF case. The RO and UFFD cases also work as expected.
> 
> Ah, thanks for checking this.
> 
> > 
> > Need to send some changes for page_fault_test as many tests assume that
> > any S1PTW is always a PT write, and are failing. Also need to add some new
> > tests for PTs in RO memslots (as it didn't make much sense before this
> > change).
> 
> I think this is what I really quite didn't grok in these tests. They
> seem to verify the KVM behaviour, which is not what we should check
> for.
> 
> Instead, we should check for the architectural behaviour, which is
> that if HAFDBS is enabled, we can observe updates to the PTs even when
> we do not write to them directly.

There are some tests checking that case (e.g., AF set by HW), but they
also do it while interacting with dirty-logging, userfaultfd, and/or RO
memslots. Some checks are clearly dealing with architectural behavior,
while others are not that clear. Let me use this sample test to get more
specific.  This test deals with HW setting the AF bit on a punched hole
backed by userfaultfd:


Re: [PATCH 2/3] KVM: arm64: Handle S1PTW translation with TCR_HA set as a write

2022-12-22 Thread Oliver Upton
On Thu, Dec 22, 2022 at 09:01:15AM +, Marc Zyngier wrote:
> On Wed, 21 Dec 2022 17:46:24 +, Oliver Upton  
> wrote:
> >  - When UFFD is in use, translation faults are reported to userspace as
> >    writes when from a RW memslot and reads when from an RO memslot.
> 
> Not quite: translation faults are reported as reads if TCR_EL1.HA
> isn't set, and as writes if it is. Ignoring TCR_EL1.HD for a moment,
> this matches exactly the behaviour of the page-table walker, which
> will update the S1 PTs only if this bit is set.

My bad, yes you're right. I conflated the use case here with the
architectural state.

I'm probably being way too pedantic, but I just wanted to make sure we
agree about the ensuing subtlety. More below:

> Or is it what userfaultfd does on its own? That'd be confusing...
> 
> > 
> >  - S1 page table memory is spuriously marked as dirty, as we presume a
> >    write immediately follows the translation fault. That isn't entirely
> >    senseless, as it would mean both the target page and the S1 PT that
> >    maps it are both old. This is nothing new I suppose, just weird.
> 
> s/old/young/ ?
> 
> I think you're confusing the PT access with the access that caused the
> PT access (I'll have that printed on a t-shirt, thank you very much).

I'd buy it!

> Here, we're not considering the cause of the PT access anymore. If
> TCR_EL1.HA is set, the S1 PT page will be marked as accessed even on a
> read, and only that page.

I think this is where the disconnect might be. TCR_EL1.HA == 1 suggests
a write could possibly follow, but I don't think it requires it. The
page table walker must first load the S1 PTE before writing to it.

From AArch64.S1Translate() (DDI0487H.a):

(fault, descaddress, walkstate, descriptor) = AArch64.S1Walk(fault, walkparams, va, regime,
                                                             ss, acctype, iswrite, ispriv);

[...]

new_desc = descriptor;
if walkparams.ha == '1' && AArch64.FaultAllowsSetAccessFlag(fault) then
  // Set descriptor AF bit
  new_desc<10> = '1';

[...]

// Either the access flag was clear or AP<2> is set
if new_desc != descriptor then
  if regime == Regime_EL10 && EL2Enabled() then
    s1aarch64 = TRUE;
    s2fs1walk = TRUE;
    aligned = TRUE;
    iswrite = TRUE;
    (s2fault, descupdateaddress) = AArch64.S2Translate(fault, descaddress, s1aarch64,
                                                       ss, s2fs1walk, AccType_ATOMICRW,
                                                       aligned, iswrite, ispriv);

    if s2fault.statuscode != Fault_None then
      return (s2fault, AddressDescriptor UNKNOWN);
  else
    descupdateaddress = descaddress;

  (fault, mem_desc) = AArch64.MemSwapTableDesc(fault, descriptor, new_desc,
                                               walkparams.ee, descupdateaddress)

Buried in AArch64.S1Walk() is a stage-2 walk for a read to fetch the
descriptor. The second stage-2 walk for write is conditioned on having
already fetched the stage-1 descriptor and determining the AF needs
to be set.
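
As a rough illustration of that ordering, the sketch below models which
stage-2 accesses the pseudocode would issue for one descriptor. It is a
simplification under stated assumptions: it ignores
AArch64.FaultAllowsSetAccessFlag() and the AP<2>/HD case entirely, and
the names model_s1_walk(), struct walk_trace and the S2_* values are
hypothetical, chosen only for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Access Flag: bit 10 of the descriptor, as in new_desc<10> above. */
#define DESC_AF (UINT64_C(1) << 10)
static_assert(DESC_AF == 0x400, "AF is descriptor bit 10");

enum s2_access { S2_NONE, S2_READ, S2_ATOMIC_RW };

struct walk_trace {
    enum s2_access first;   /* stage-2 access used to fetch the descriptor */
    enum s2_access second;  /* stage-2 access used to update it, if any */
};

/* Model only the AF part of the pseudocode quoted above. */
static inline struct walk_trace model_s1_walk(uint64_t descriptor, bool ha)
{
    struct walk_trace t = { .first = S2_READ, .second = S2_NONE };
    uint64_t new_desc = descriptor;

    if (ha)
        new_desc |= DESC_AF;

    /* The write-side stage-2 walk only happens if something changed. */
    if (new_desc != descriptor)
        t.second = S2_ATOMIC_RW;

    return t;
}
```

In this model the first access is always a read, and a descriptor that
already has AF set never produces the second, write-side walk, even with
HA enabled.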

Relating back to UFFD: if we expect KVM to do exactly what hardware
does, UFFD should see an attempted read when the first walk fails
because of an S2 translation fault. Based on this patch, though, we'd
promote it to a write if TCR_EL1.HA == 1.

This has the additional nuance of marking the S1 PT's IPA as dirty, even
though it might not actually have been written to. Having said that,
the false positive rate should be negligible given that S1 PTs ought to
account for a small amount of guest memory.
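
A back-of-the-envelope check of "a small amount", as a sketch only. It
assumes 8-byte descriptors, guest memory fully mapped at page
granularity, and counts last-level tables only; leaf_table_pages() is a
hypothetical helper written for this example.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Number of last-level page-table pages needed to map guest_bytes of
 * memory with pages of page_size bytes and 8-byte descriptors.
 */
static inline uint64_t leaf_table_pages(uint64_t guest_bytes, uint64_t page_size)
{
    assert(page_size >= 8);

    uint64_t entries_per_table = page_size / 8;
    uint64_t data_pages = (guest_bytes + page_size - 1) / page_size;

    /* Round up: a partially used table still occupies a whole page. */
    return (data_pages + entries_per_table - 1) / entries_per_table;
}
```

For a 1 GiB guest with 4 KiB pages this gives 512 table pages against
262144 data pages, i.e. roughly 0.2% of guest memory; block mappings
would reduce it further.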

Like I said before, I'm probably being unnecessarily pedantic :) It just
seems to me that the view we're giving userspace of S1PTW aborts isn't
exactly architectural and I want to make sure that is explicitly
intentional.

--
Thanks,
Oliver


Re: [PATCH 2/3] KVM: arm64: Handle S1PTW translation with TCR_HA set as a write

2022-12-22 Thread Marc Zyngier
On Wed, 21 Dec 2022 17:46:24 +,
Oliver Upton  wrote:
> 
> On Wed, Dec 21, 2022 at 08:46:06AM -0800, Ricardo Koller wrote:
> 
> [...]
> 
> > > - return false;
> > > + /* Can't introspect TCR_EL1 with pKVM */
> > > + if (kvm_vm_is_protected(vcpu->kvm))
> > > + return false;
> > > +
> > > + mmfr1 = read_sanitised_ftr_reg(SYS_ID_AA64MMFR1_EL1);
> > > > + afdb = cpuid_feature_extract_unsigned_field(mmfr1, ID_AA64MMFR1_EL1_HAFDBS_SHIFT);
> > > +
> > > + if (afdb == ID_AA64MMFR1_EL1_HAFDBS_NI)
> > > + return false;
> > > +
> > > + return (vcpu_read_sys_reg(vcpu, TCR_EL1) & TCR_HA);
> > 
> > Also tested this specific case using page_fault_test when the PT page is
> > marked for dirty logging with and without AF. In both cases there's a
> > > single _FSC_FAULT (no PERM_FAULT) as expected, and the PT page is marked dirty
> > in the AF case. The RO and UFFD cases also work as expected.
> > 
> > Need to send some changes for page_fault_test as many tests assume that
> > any S1PTW is always a PT write, and are failing. Also need to add some new
> > tests for PTs in RO memslots (as it didn't make much sense before this
> > change).
> 
> So I actually wanted to bring up the issue of user visibility, glad your
> test picked up something.
> 
> This has two implications, which are rather odd.
> 
>  - When UFFD is in use, translation faults are reported to userspace as
>    writes when from a RW memslot and reads when from an RO memslot.

Not quite: translation faults are reported as reads if TCR_EL1.HA
isn't set, and as writes if it is. Ignoring TCR_EL1.HD for a moment,
this matches exactly the behaviour of the page-table walker, which
will update the S1 PTs only if this bit is set.

Or is it what userfaultfd does on its own? That'd be confusing...

> 
>  - S1 page table memory is spuriously marked as dirty, as we presume a
>    write immediately follows the translation fault. That isn't entirely
>    senseless, as it would mean both the target page and the S1 PT that
>    maps it are both old. This is nothing new I suppose, just weird.

s/old/young/ ?

I think you're confusing the PT access with the access that caused the
PT access (I'll have that printed on a t-shirt, thank you very much).

Here, we're not considering the cause of the PT access anymore. If
TCR_EL1.HA is set, the S1 PT page will be marked as accessed even on a
read, and only that page.

TCR_EL1.HD is what muddies the waters a bit. If it is set without HA
being set, we still handle the translation fault as a read, followed
by a write permission fault. But again, that's solely for the purpose
of the S1 PT. What happens for the mapped page is completely
independent.
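
For reference, the read/write classification described here can be
modelled in userspace along the lines of the patch under discussion.
This is a sketch, not the kernel code: struct s1ptw_fault and its fields
are hypothetical stand-ins for the kernel helpers used in the diff
(kvm_vm_is_protected(), the HAFDBS check and the TCR_EL1.HA read), and
only the S1PTW branch is covered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum fault_type { FAULT_TRANSLATION, FAULT_ACCESS, FAULT_PERMISSION };

/* Hypothetical snapshot of the state the patch looks at for an S1PTW abort. */
struct s1ptw_fault {
    enum fault_type type;
    bool vm_protected;        /* stands in for kvm_vm_is_protected() */
    bool hafdbs_implemented;  /* stands in for HAFDBS != NI */
    bool tcr_ha;              /* stands in for TCR_EL1 & TCR_HA */
};

/* Should this stage-1 page-table-walk abort be handled as a write? */
static inline bool s1ptw_is_write(const struct s1ptw_fault *f)
{
    assert(f != NULL);

    /* A permission fault on a S1PTW is always treated as a write. */
    if (f->type == FAULT_PERMISSION)
        return true;

    /* TCR_EL1 cannot be inspected for a protected VM. */
    if (f->vm_protected)
        return false;

    /* Without hardware AF support, HA cannot have any effect. */
    if (!f->hafdbs_implemented)
        return false;

    return f->tcr_ha;
}
```

With tcr_ha false, a translation fault comes out as a read and only a
subsequent permission fault is classified as a write, which is the
two-step sequence described for the HD-without-HA case.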

> Marc, do you have any concerns about leaving this as-is for the time
> being? At least before we were doing the same thing (write fault) every
> time.

I have the ugly feeling we're talking at cross purposes here, mostly
because I don't get how userfaultfd fits in that picture. Can you shed
some light here?

Thanks,

M.

-- 
Without deviation from the norm, progress is not possible.


Re: [PATCH 2/3] KVM: arm64: Handle S1PTW translation with TCR_HA set as a write

2022-12-21 Thread Oliver Upton
On Wed, Dec 21, 2022 at 08:46:06AM -0800, Ricardo Koller wrote:

[...]

> > -   return false;
> > +   /* Can't introspect TCR_EL1 with pKVM */
> > +   if (kvm_vm_is_protected(vcpu->kvm))
> > +   return false;
> > +
> > +   mmfr1 = read_sanitised_ftr_reg(SYS_ID_AA64MMFR1_EL1);
> > +   afdb = cpuid_feature_extract_unsigned_field(mmfr1, ID_AA64MMFR1_EL1_HAFDBS_SHIFT);
> > +
> > +   if (afdb == ID_AA64MMFR1_EL1_HAFDBS_NI)
> > +   return false;
> > +
> > +   return (vcpu_read_sys_reg(vcpu, TCR_EL1) & TCR_HA);
> 
> Also tested this specific case using page_fault_test when the PT page is
> marked for dirty logging with and without AF. In both cases there's a
> single _FSC_FAULT (no PERM_FAULT) as expected, and the PT page is marked dirty
> in the AF case. The RO and UFFD cases also work as expected.
> 
> Need to send some changes for page_fault_test as many tests assume that
> any S1PTW is always a PT write, and are failing. Also need to add some new
> tests for PTs in RO memslots (as it didn't make much sense before this
> change).

So I actually wanted to bring up the issue of user visibility, glad your
test picked up something.

This has two implications, which are rather odd.

 - When UFFD is in use, translation faults are reported to userspace as
   writes when from a RW memslot and reads when from an RO memslot.

 - S1 page table memory is spuriously marked as dirty, as we presume a
   write immediately follows the translation fault. That isn't entirely
   senseless, as it would mean both the target page and the S1 PT that
   maps it are both old. This is nothing new I suppose, just weird.

Marc, do you have any concerns about leaving this as-is for the time
being? At least before we were doing the same thing (write fault) every
time.

--
Thanks,
Oliver


Re: [PATCH 2/3] KVM: arm64: Handle S1PTW translation with TCR_HA set as a write

2022-12-21 Thread Marc Zyngier
Hi Ricardo,

On Wed, 21 Dec 2022 16:46:06 +,
Ricardo Koller  wrote:
> 
> Hello,
> 
> On Tue, Dec 20, 2022 at 08:09:22PM +, Marc Zyngier wrote:
> > As a minor optimisation, we can retrofit the "S1PTW is a write
> > even on translation fault" concept *if* the vcpu is using the
> > HW-managed Access Flag, as setting TCR_EL1.HA is guaranteed
> > to result in an update of the PTE.
> > 
> > However, we cannot do the same thing for DB, as it would require
> > us to parse the PTs to find out if the DBM bit is set there.
> > This is not going to happen.
> > 
> > Signed-off-by: Marc Zyngier 
> > ---
> >  arch/arm64/include/asm/kvm_emulate.h | 20 +++-
> >  1 file changed, 19 insertions(+), 1 deletion(-)
> > 
> > > diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
> > index fd6ad8b21f85..4ee467065042 100644
> > --- a/arch/arm64/include/asm/kvm_emulate.h
> > +++ b/arch/arm64/include/asm/kvm_emulate.h
> > @@ -374,6 +374,9 @@ static __always_inline int kvm_vcpu_sys_get_rt(struct 
> > kvm_vcpu *vcpu)
> >  static inline bool kvm_is_write_fault(struct kvm_vcpu *vcpu)
> >  {
> > if (kvm_vcpu_abt_iss1tw(vcpu)) {
> > +   unsigned int afdb;
> > +   u64 mmfr1;
> > +
> > /*
> >  * Only a permission fault on a S1PTW should be
> >  * considered as a write. Otherwise, page tables baked
> > @@ -385,12 +388,27 @@ static inline bool kvm_is_write_fault(struct kvm_vcpu 
> > *vcpu)
> >  * to map the page containing the PT (read only at
> >  * first), then a permission fault to allow the flags
> >  * to be set.
> > +*
> > +* We can improve things if the guest uses AF, as this
> > +* is guaranteed to result in a write to the PTE. For
> > +* DB, however, we'd need to parse the guest's PTs,
> > +* and that's not on. DB is crap anyway.
> >  */
> > switch (kvm_vcpu_trap_get_fault_type(vcpu)) {
> 
> Nit: fault_status is calculated once when taking the fault, and passed
> around to all users (like user_mem_abort()). Not sure if this is because
> of the extra cycles needed to get it, or just style. Anyway, maybe it
> applies here.

All these things are just fields in ESR_EL2, which we keep looking at
all the time. The compiler actually does a pretty good job at keeping
that around, especially considering that this function is inlined (at
least here, kvm_handle_guest_abort and kvm_user_mem_abort are merged
into a single monster).

So passing the parameter wouldn't change a thing, and I find the above
more readable (I know that all the information in this function is
derived from the same data structure).
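
To illustrate the "just fields in ESR_EL2" point, here is a small
userspace sketch that pulls the relevant bits out of a syndrome value.
The bit positions (WnR at bit 6, S1PTW at bit 7, fault status code in
bits [5:0]) follow the data abort ISS encoding; the 0x3C type mask and
the FSC_* values mirror the kernel's ESR_ELx_FSC_* definitions as I
understand them at the time of this thread, so treat them as an
assumption. struct abort_info and decode_abort() are hypothetical names
for this example only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ESR_WNR       (UINT64_C(1) << 6)  /* write-not-read */
#define ESR_S1PTW     (UINT64_C(1) << 7)  /* fault on a stage-1 table walk */
#define ESR_FSC_TYPE  UINT64_C(0x3C)      /* fault type, level masked out */
#define FSC_FAULT     UINT64_C(0x04)      /* translation fault */
#define FSC_ACCESS    UINT64_C(0x08)      /* access flag fault */
#define FSC_PERM      UINT64_C(0x0C)      /* permission fault */

static_assert((UINT64_C(0x0F) & ESR_FSC_TYPE) == FSC_PERM,
              "a level 3 permission fault decodes as FSC_PERM");

struct abort_info {
    bool s1ptw;
    bool wnr;
    uint64_t fsc_type;
};

/* Everything below is derived from the single syndrome value. */
static inline struct abort_info decode_abort(uint64_t esr)
{
    struct abort_info info = {
        .s1ptw = (esr & ESR_S1PTW) != 0,
        .wnr = (esr & ESR_WNR) != 0,
        .fsc_type = esr & ESR_FSC_TYPE,
    };
    return info;
}
```

Since each field is a mask of the same register value, recomputing the
fault type inside an inlined helper costs little compared to passing it
around as a parameter.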

> 
> > case ESR_ELx_FSC_PERM:
> > return true;
> > default:
> > -   return false;
> > +   /* Can't introspect TCR_EL1 with pKVM */
> > +   if (kvm_vm_is_protected(vcpu->kvm))
> > +   return false;
> > +
> > +   mmfr1 = read_sanitised_ftr_reg(SYS_ID_AA64MMFR1_EL1);
> > > +   afdb = cpuid_feature_extract_unsigned_field(mmfr1, ID_AA64MMFR1_EL1_HAFDBS_SHIFT);
> > +
> > +   if (afdb == ID_AA64MMFR1_EL1_HAFDBS_NI)
> > +   return false;
> > +
> > +   return (vcpu_read_sys_reg(vcpu, TCR_EL1) & TCR_HA);
> 
> Also tested this specific case using page_fault_test when the PT page is
> marked for dirty logging with and without AF. In both cases there's a
> single _FSC_FAULT (no PERM_FAULT) as expected, and the PT page is marked dirty
> in the AF case. The RO and UFFD cases also work as expected.

Ah, thanks for checking this.

> 
> Need to send some changes for page_fault_test as many tests assume that
> any S1PTW is always a PT write, and are failing. Also need to add some new
> tests for PTs in RO memslots (as it didn't make much sense before this
> change).

I think this is what I really quite didn't grok in these tests. They
seem to verify the KVM behaviour, which is not what we should check
for.

Instead, we should check for the architectural behaviour, which is
that if HAFDBS is enabled, we can observe updates to the PTs even when
we do not write to them directly.

> 
> > }
> > }
> >  
> > -- 
> > 2.34.1
> > 
> > ___
> > kvmarm mailing list
> > kvmarm@lists.cs.columbia.edu
> > https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
> 
> Reviewed-by: Ricardo Koller 

Thanks!

M.

-- 
Without deviation from the norm, progress is not possible.


Re: [PATCH 2/3] KVM: arm64: Handle S1PTW translation with TCR_HA set as a write

2022-12-21 Thread Ricardo Koller
Hello,

On Tue, Dec 20, 2022 at 08:09:22PM +, Marc Zyngier wrote:
> As a minor optimisation, we can retrofit the "S1PTW is a write
> even on translation fault" concept *if* the vcpu is using the
> HW-managed Access Flag, as setting TCR_EL1.HA is guaranteed
> to result in an update of the PTE.
> 
> However, we cannot do the same thing for DB, as it would require
> us to parse the PTs to find out if the DBM bit is set there.
> This is not going to happen.
> 
> Signed-off-by: Marc Zyngier 
> ---
>  arch/arm64/include/asm/kvm_emulate.h | 20 +++-
>  1 file changed, 19 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_emulate.h b/arch/arm64/include/asm/kvm_emulate.h
> index fd6ad8b21f85..4ee467065042 100644
> --- a/arch/arm64/include/asm/kvm_emulate.h
> +++ b/arch/arm64/include/asm/kvm_emulate.h
> @@ -374,6 +374,9 @@ static __always_inline int kvm_vcpu_sys_get_rt(struct kvm_vcpu *vcpu)
>  static inline bool kvm_is_write_fault(struct kvm_vcpu *vcpu)
>  {
>   if (kvm_vcpu_abt_iss1tw(vcpu)) {
> + unsigned int afdb;
> + u64 mmfr1;
> +
>   /*
>* Only a permission fault on a S1PTW should be
>* considered as a write. Otherwise, page tables baked
> @@ -385,12 +388,27 @@ static inline bool kvm_is_write_fault(struct kvm_vcpu *vcpu)
>* to map the page containing the PT (read only at
>* first), then a permission fault to allow the flags
>* to be set.
> +  *
> +  * We can improve things if the guest uses AF, as this
> +  * is guaranteed to result in a write to the PTE. For
> +  * DB, however, we'd need to parse the guest's PTs,
> +  * and that's not on. DB is crap anyway.
>*/
>   switch (kvm_vcpu_trap_get_fault_type(vcpu)) {

Nit: fault_status is calculated once when taking the fault, and passed
around to all users (like user_mem_abort()). Not sure if this is because
of the extra cycles needed to get it, or just style. Anyway, maybe it
applies here.

>   case ESR_ELx_FSC_PERM:
>   return true;
>   default:
> - return false;
> + /* Can't introspect TCR_EL1 with pKVM */
> + if (kvm_vm_is_protected(vcpu->kvm))
> + return false;
> +
> + mmfr1 = read_sanitised_ftr_reg(SYS_ID_AA64MMFR1_EL1);
> + afdb = cpuid_feature_extract_unsigned_field(mmfr1, ID_AA64MMFR1_EL1_HAFDBS_SHIFT);
> +
> + if (afdb == ID_AA64MMFR1_EL1_HAFDBS_NI)
> + return false;
> +
> + return (vcpu_read_sys_reg(vcpu, TCR_EL1) & TCR_HA);

Also tested this specific case using page_fault_test when the PT page is
marked for dirty logging with and without AF. In both cases there's a
single _FSC_FAULT (no PERM_FAULT) as expected, and the PT page is marked dirty
in the AF case. The RO and UFFD cases also work as expected.

Need to send some changes for page_fault_test as many tests assume that
any S1PTW is always a PT write, and are failing. Also need to add some new
tests for PTs in RO memslots (as it didn't make much sense before this
change).

>   }
>   }
>  
> -- 
> 2.34.1
> 
> ___
> kvmarm mailing list
> kvmarm@lists.cs.columbia.edu
> https://lists.cs.columbia.edu/mailman/listinfo/kvmarm

Reviewed-by: Ricardo Koller 

Thanks,
Ricardo