Re: [Xen-devel] [PATCH-for-4.13 v4] Rationalize max_grant_frames and max_maptrack_frames handling

2019-11-28 Thread Durrant, Paul
> -Original Message-
> From: Jan Beulich 
> Sent: 28 November 2019 16:37
> To: Durrant, Paul 
> Cc: xen-devel@lists.xenproject.org; Andrew Cooper
> ; Anthony PERARD ;
> George Dunlap ; Roger Pau Monné
> ; Volodymyr Babchuk ;
> George Dunlap ; Ian Jackson
> ; Marek Marczykowski-Górecki
> ; Stefano Stabellini
> ; Konrad Rzeszutek Wilk ;
> Julien Grall ; Wei Liu 
> Subject: Re: [PATCH-for-4.13 v4] Rationalize max_grant_frames and
> max_maptrack_frames handling
> 
> On 28.11.2019 14:58, Paul Durrant wrote:
> > --- a/xen/common/grant_table.c
> > +++ b/xen/common/grant_table.c
> > @@ -84,11 +84,43 @@ struct grant_table {
> >  struct grant_table_arch arch;
> >  };
> >
> > +static int __init parse_gnttab_limit(const char *param, const char
> *arg,
> 
> No __init please here and below, for this being runtime option
> parsing functions.
> 

Sorry, yes... forgot about the runtime part.

> > + unsigned int *valp)
> > +{
> > +const char *e;
> > +unsigned long val;
> > +
> > +val = simple_strtoul(arg, , 0);
> > +if ( *e )
> > +return -EINVAL;
> > +
> > +if ( val <= INT_MAX )
> > +*valp = val;
> > +else
> > +printk("parameter \"%s\" value \"%s\" is out of range; using
> value \"%u\"\n",
> > +   param, arg, *valp);
> 
> Better store INT_MAX in this case rather than leaving the value
> unchanged? Or otherwise ...
> 
> > +return 0;
> 
> ... at least don't return success?

TBH I wasn't sure what the best thing to do was. In the end I opted for the 
warning and a successful completion as I thought a failure would be largely 
unhelpful. I can change this into an error though.

> 
> > +}
> > +
> >  unsigned int __read_mostly opt_max_grant_frames = 64;
> > -integer_runtime_param("gnttab_max_frames", opt_max_grant_frames);
> > +
> > +static int __init parse_gnttab_max_frames(const char *arg)
> > +{
> > +return parse_gnttab_limit("gnttab_max_frames", arg,
> > +  _max_grant_frames);
> > +}
> > +custom_runtime_param("gnttab_max_frames", parse_gnttab_max_frames);
> >
> >  unsigned int __read_mostly opt_max_maptrack_frames = 1024;
> 
> As indicated this wants to become static now.

Sorry I forgot about that.

  Paul
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2] x86: irq: Do not BUG_ON multiple unbind calls for shared pirqs

2019-12-18 Thread Durrant, Paul
> -Original Message-
> From: Xen-devel  On Behalf Of
> Varad Gautam
> Sent: 18 December 2019 10:54
> To: xen-devel@lists.xenproject.org
> Cc: Andrew Cooper ; Gautam, Varad
> ; Jan Beulich ; Roger Pau Monné
> 
> Subject: [Xen-devel] [PATCH v2] x86: irq: Do not BUG_ON multiple unbind
> calls for shared pirqs
> 
> XEN_DOMCTL_destroydomain creates a continuation if domain_kill -ERESTARTS.
> In that scenario, it is possible to receive multiple _pirq_guest_unbind
> calls for the same pirq from domain_kill, if the pirq has not yet been
> removed from the domain's pirq_tree, as:
>   domain_kill()
> -> domain_relinquish_resources()
>   -> pci_release_devices()
> -> pci_clean_dpci_irq()
>   -> pirq_guest_unbind()
> -> __pirq_guest_unbind()
> 
> For a shared pirq (nr_guests > 1), the first call would zap the current
> domain from the pirq's guests[] list, but the action handler is never
> freed
> as there are other guests using this pirq. As a result, on the second
> call,
> __pirq_guest_unbind searches for the current domain which has been removed
> from the guests[] list, and hits a BUG_ON.
> 
> Make __pirq_guest_unbind safe to be called multiple times by letting xen
> continue if a shared pirq has already been unbound from this guest. The
> PIRQ will be cleaned up from the domain's pirq_tree during the destruction
> in complete_domain_destroy anyways.
> 
> Signed-off-by: Varad Gautam 
> CC: Jan Beulich 
> CC: Roger Pau Monné 
> CC: Andrew Cooper 

Reviewed-by: Paul Durrant 

> 
> v2: Split the check on action->nr_guests > 0 and make it an ASSERT,
> reword.
> ---
>  xen/arch/x86/irq.c | 11 ++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
> index 5d0d94c..3eb7b22 100644
> --- a/xen/arch/x86/irq.c
> +++ b/xen/arch/x86/irq.c
> @@ -1863,7 +1863,16 @@ static irq_guest_action_t *__pirq_guest_unbind(
> 
>  for ( i = 0; (i < action->nr_guests) && (action->guest[i] != d); i++
> )
>  continue;
> -BUG_ON(i == action->nr_guests);
> +if ( i == action->nr_guests ) {
> +ASSERT(action->nr_guests > 0) ;
> +/* In case the pirq was shared, unbound for this domain in an
> earlier call, but still
> + * existed on the domain's pirq_tree, we still reach here if
> there are any later
> + * unbind calls on the same pirq. Return if such an unbind
> happens. */
> +if ( action->shareable )
> +return NULL;
> +BUG();
> +}
> +
>  memmove(>guest[i], >guest[i+1],
>  (action->nr_guests-i-1) * sizeof(action->guest[0]));
>  action->nr_guests--;
> --
> 2.7.4
> 
> 
> 
> 
> Amazon Development Center Germany GmbH
> Krausenstr. 38
> 10117 Berlin
> Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
> Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B
> Sitz: Berlin
> Ust-ID: DE 289 237 879
> 
> 
> ___
> Xen-devel mailing list
> Xen-devel@lists.xenproject.org
> https://lists.xenproject.org/mailman/listinfo/xen-devel
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2] x86/save: reserve HVM save record numbers that have been consumed...

2019-12-19 Thread Durrant, Paul
> -Original Message-
> From: Jan Beulich 
> Sent: 19 December 2019 13:25
> To: Durrant, Paul 
> Cc: Andrew Cooper ; xen-
> de...@lists.xenproject.org; Wei Liu ; Roger Pau Monné
> 
> Subject: Re: [PATCH v2] x86/save: reserve HVM save record numbers that
> have been consumed...
> 
> On 19.12.2019 14:15, Durrant, Paul wrote:
> >> From: Andrew Cooper 
> >> Sent: 19 December 2019 13:08
> >>
> >> On 19/12/2019 12:37, Durrant, Paul wrote:
> >>>> From: Andrew Cooper 
> >>>> Sent: 19 December 2019 12:16
> >>>>
> >>>> On 19/12/2019 12:04, Paul Durrant wrote:
> >>>>> --- a/xen/include/public/arch-x86/hvm/save.h
> >>>>> +++ b/xen/include/public/arch-x86/hvm/save.h
> >>>>> @@ -639,6 +639,8 @@ struct hvm_msr {
> >>>>>
> >>>>>  #define CPU_MSR_CODE  20
> >>>>>
> >>>>> +/* Range 22 - 40 reserved for Amazon */
> >>>> What about 21 and 22?  And where does the actual number stop, because
> >>>> based on v1, its not 40.
> >>>>
> >>> The range is inclusive (maybe that's not obvious?). For some reason 21
> >> was skipped but why do you say the top is not 40? That was what I set
> >> HVM_SAVE_CODE_MAX to in v1.
> >>
> >> You also said that included prospective headroom, which definitely
> isn't
> >> fair to reserve for ABI breakage reasons.
> >>
> >> Which numbers have actually been allocated?
> >>
> >
> > The problem is that I don't yet know for sure. I have definitely seen
> > patches using 22 thru 34. It is *probably* safe to restrict to that but
> > does it really cost that much more to reserve some extra space just in
> > case? I.e. if someone else adds 41 vs. 35 is it going to make much of a
> > difference?
> 
> Not _that much_, but still - it's a bodge, so let's try to limit it as
> much as we can.
> 

OK, I'll send a v3 using 34 as the limit.

  Paul

> Jan
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2] x86/save: reserve HVM save record numbers that have been consumed...

2019-12-19 Thread Durrant, Paul
> -Original Message-
> From: Andrew Cooper 
> Sent: 19 December 2019 13:08
> To: Durrant, Paul ; xen-devel@lists.xenproject.org
> Cc: Wei Liu ; Jan Beulich ; Roger Pau Monné
> 
> Subject: Re: [PATCH v2] x86/save: reserve HVM save record numbers that
> have been consumed...
> 
> On 19/12/2019 12:37, Durrant, Paul wrote:
> >> -Original Message-
> >> From: Andrew Cooper 
> >> Sent: 19 December 2019 12:16
> >> To: Durrant, Paul ; xen-devel@lists.xenproject.org
> >> Cc: Wei Liu ; Jan Beulich ; Roger Pau
> Monné
> >> 
> >> Subject: Re: [PATCH v2] x86/save: reserve HVM save record numbers that
> >> have been consumed...
> >>
> >> On 19/12/2019 12:04, Paul Durrant wrote:
> >>> ...for patches not (yet) upstream.
> >>>
> >>> This patch is simply adding a comment to reserve save record number
> >> space
> >>> to avoid the risk of clashes between existent downstream changes made
> by
> >>> Amazon and future upstream changes which may be incompatible.
> >>>
> >>> Signed-off-by: Paul Durrant 
> >>> Reviewed-by: Wei Liu 
> >>> ---
> >>> Cc: Jan Beulich 
> >>> Cc: Andrew Cooper 
> >>> Cc: "Roger Pau Monné" 
> >>>
> >>> v2:
> >>>  - Reduce patch to just a comment
> >>> ---
> >>>  xen/include/public/arch-x86/hvm/save.h | 2 ++
> >>>  1 file changed, 2 insertions(+)
> >>>
> >>> diff --git a/xen/include/public/arch-x86/hvm/save.h
> >> b/xen/include/public/arch-x86/hvm/save.h
> >>> index b2ad3fcd74..b3d445acf7 100644
> >>> --- a/xen/include/public/arch-x86/hvm/save.h
> >>> +++ b/xen/include/public/arch-x86/hvm/save.h
> >>> @@ -639,6 +639,8 @@ struct hvm_msr {
> >>>
> >>>  #define CPU_MSR_CODE  20
> >>>
> >>> +/* Range 22 - 40 reserved for Amazon */
> >> What about 21 and 22?  And where does the actual number stop, because
> >> based on v1, its not 40.
> >>
> > The range is inclusive (maybe that's not obvious?). For some reason 21
> was skipped but why do you say the top is not 40? That was what I set
> HVM_SAVE_CODE_MAX to in v1.
> 
> You also said that included prospective headroom, which definitely isn't
> fair to reserve for ABI breakage reasons.
> 
> Which numbers have actually been allocated?
> 

The problem is that I don't yet know for sure. I have definitely seen patches 
using 22 thru 34. It is *probably* safe to restrict to that but does it really 
cost that much more to reserve some extra space just in case? I.e. if someone 
else adds 41 vs. 35 is it going to make much of a difference?

  Paul
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 5/4] x86/viridian: drop a wrong invalid value from reference TSC implementation

2019-12-21 Thread Durrant, Paul
> -Original Message-
> From: Wei Liu  On Behalf Of Wei Liu
> Sent: 20 December 2019 21:21
> To: Xen Development List 
> Cc: Durrant, Paul ; Michael Kelley
> ; Wei Liu ; Paul Durrant
> ; Jan Beulich ; Andrew Cooper
> ; Wei Liu ; Roger Pau Monné
> 
> Subject: [PATCH 5/4] x86/viridian: drop a wrong invalid value from
> reference TSC implementation
> 
> The only invalid value mentioned in Hyper-V TLFS 5.0c is 0. Michael
> Kelley confirmed that 0x was never used [0].
> 
> [0] https://lists.xen.org/archives/html/xen-devel/2019-12/msg01564.html
> 
> Signed-off-by: Wei Liu 
> ---
>  xen/arch/x86/hvm/viridian/time.c | 13 -
>  1 file changed, 4 insertions(+), 9 deletions(-)
> 
> diff --git a/xen/arch/x86/hvm/viridian/time.c
> b/xen/arch/x86/hvm/viridian/time.c
> index 6b2d745f3a..b8280a1a60 100644
> --- a/xen/arch/x86/hvm/viridian/time.c
> +++ b/xen/arch/x86/hvm/viridian/time.c
> @@ -45,14 +45,9 @@ static void update_reference_tsc(const struct domain
> *d, bool initialize)
>  if ( !host_tsc_is_safe() || d->arch.vtsc )
>  {
>  /*
> - * The specification states that valid values of TscSequence
> range
> - * from 0 to 0xFFFE. The value 0x is used to indicate
> - * this mechanism is no longer a reliable source of time and that
> - * the VM should fall back to a different source.
> - *
> - * Server 2012 (6.2 kernel) and 2012 R2 (6.3 kernel) actually
> - * violate the spec. and rely on a value of 0 to indicate that
> this
> - * enlightenment should no longer be used.
> + * The value 0 is used to indicate this mechanism is no longer a
> + * reliable source of time and that the VM should fall back to a
> + * different source.
>   */
>  p->tsc_sequence = 0;
> 
> @@ -77,7 +72,7 @@ static void update_reference_tsc(const struct domain *d,
> bool initialize)
>  smp_wmb();
> 
>  seq = p->tsc_sequence + 1;
> -if ( seq == 0x || seq == 0 ) /* Avoid both 'invalid' values
> */
> +if ( seq == 0 ) /* Avoid 'invalid' value */
>  seq = 1;
> 
>  p->tsc_sequence = seq;

Now that we're just dealing with 0, I think the code would be neater as:

seq = p->tsc_sequence + 1;
p->tsc_sequence = seq ? seq : 1; /* Avoid 'invalid' value 0 */

  Paul



> --
> 2.20.1


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 2/4] x86/viridian: drop private copy of HV_REFERENCE_TSC_PAGE in time.c

2019-12-21 Thread Durrant, Paul
> -Original Message-
> From: Wei Liu  On Behalf Of Wei Liu
> Sent: 20 December 2019 19:52
> To: Xen Development List 
> Cc: Michael Kelley ; Durrant, Paul
> ; Wei Liu ; Paul Durrant
> ; Jan Beulich ; Andrew Cooper
> ; Wei Liu ; Roger Pau Monné
> 
> Subject: [PATCH 2/4] x86/viridian: drop private copy of
> HV_REFERENCE_TSC_PAGE in time.c
> 
> Use the one defined in hyperv-tlfs.h instead. No functional change
> intended.
> 
> Signed-off-by: Wei Liu 

Reviewed-by: Paul Durrant 

> ---
>  xen/arch/x86/hvm/viridian/time.c | 20 ++--
>  1 file changed, 6 insertions(+), 14 deletions(-)
> 
> diff --git a/xen/arch/x86/hvm/viridian/time.c
> b/xen/arch/x86/hvm/viridian/time.c
> index 6ddca29b29..32e79bbcc4 100644
> --- a/xen/arch/x86/hvm/viridian/time.c
> +++ b/xen/arch/x86/hvm/viridian/time.c
> @@ -13,19 +13,11 @@
> 
>  #include 
>  #include 
> +#include 
>  #include 
> 
>  #include "private.h"
> 
> -typedef struct _HV_REFERENCE_TSC_PAGE
> -{
> -uint32_t TscSequence;
> -uint32_t Reserved1;
> -uint64_t TscScale;
> -int64_t  TscOffset;
> -uint64_t Reserved2[509];
> -} HV_REFERENCE_TSC_PAGE, *PHV_REFERENCE_TSC_PAGE;
> -
>  static void update_reference_tsc(const struct domain *d, bool initialize)
>  {
>  struct viridian_domain *vd = d->arch.hvm.viridian;
> @@ -61,7 +53,7 @@ static void update_reference_tsc(const struct domain *d,
> bool initialize)
>   * violate the spec. and rely on a value of 0 to indicate that
> this
>   * enlightenment should no longer be used.
>   */
> -p->TscSequence = 0;
> +p->tsc_sequence = 0;
> 
>  printk(XENLOG_G_INFO "d%d: VIRIDIAN REFERENCE_TSC:
> invalidated\n",
> d->domain_id);
> @@ -79,15 +71,15 @@ static void update_reference_tsc(const struct domain
> *d, bool initialize)
>   * The offset value is calculated on restore after migration and
>   * ensures that Windows will not see a large jump in ReferenceTime.
>   */
> -p->TscScale = ((1ul << 32) / d->arch.tsc_khz) << 32;
> -p->TscOffset = trc->off;
> +p->tsc_scale = ((1ul << 32) / d->arch.tsc_khz) << 32;
> +p->tsc_offset = trc->off;
>  smp_wmb();
> 
> -seq = p->TscSequence + 1;
> +seq = p->tsc_sequence + 1;
>  if ( seq == 0x || seq == 0 ) /* Avoid both 'invalid' values
> */
>  seq = 1;
> 
> -p->TscSequence = seq;
> +p->tsc_sequence = seq;
>  }
> 
>  /*
> --
> 2.20.1


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 1/4] x86/viridian: drop duplicate defines from private.h and viridian.c

2019-12-21 Thread Durrant, Paul
> -Original Message-
> From: Wei Liu  On Behalf Of Wei Liu
> Sent: 20 December 2019 19:52
> To: Xen Development List 
> Cc: Michael Kelley ; Durrant, Paul
> ; Wei Liu ; Paul Durrant
> ; Jan Beulich ; Andrew Cooper
> ; Wei Liu ; Roger Pau Monné
> 
> Subject: [PATCH 1/4] x86/viridian: drop duplicate defines from private.h
> and viridian.c
> 
> Also add HVCALL_EXT_CALL_QUERY_CAPABILITIES to hyperv-tlfs.h.
> HvGetPartitionID was never used in code so just dropped it.
> 
> No functional change intended.
> 
> Signed-off-by: Wei Liu 

Reviewed-by: Paul Durrant 

> ---
>  xen/arch/x86/hvm/viridian/private.h | 66 -
>  xen/arch/x86/hvm/viridian/viridian.c| 29 +++
>  xen/include/asm-x86/guest/hyperv-tlfs.h |  1 +
>  3 files changed, 8 insertions(+), 88 deletions(-)
> 
> diff --git a/xen/arch/x86/hvm/viridian/private.h
> b/xen/arch/x86/hvm/viridian/private.h
> index c272c34cda..958a2814c2 100644
> --- a/xen/arch/x86/hvm/viridian/private.h
> +++ b/xen/arch/x86/hvm/viridian/private.h
> @@ -5,72 +5,6 @@
> 
>  #include 
> 
> -/* Viridian MSR numbers. */
> -#define HV_X64_MSR_GUEST_OS_ID   0x4000
> -#define HV_X64_MSR_HYPERCALL 0x4001
> -#define HV_X64_MSR_VP_INDEX  0x4002
> -#define HV_X64_MSR_RESET 0x4003
> -#define HV_X64_MSR_VP_RUNTIME0x4010
> -#define HV_X64_MSR_TIME_REF_COUNT0x4020
> -#define HV_X64_MSR_REFERENCE_TSC 0x4021
> -#define HV_X64_MSR_TSC_FREQUENCY 0x4022
> -#define HV_X64_MSR_APIC_FREQUENCY0x4023
> -#define HV_X64_MSR_EOI   0x4070
> -#define HV_X64_MSR_ICR   0x4071
> -#define HV_X64_MSR_TPR   0x4072
> -#define HV_X64_MSR_VP_ASSIST_PAGE0x4073
> -#define HV_X64_MSR_SCONTROL  0x4080
> -#define HV_X64_MSR_SVERSION  0x4081
> -#define HV_X64_MSR_SIEFP 0x4082
> -#define HV_X64_MSR_SIMP  0x4083
> -#define HV_X64_MSR_EOM   0x4084
> -#define HV_X64_MSR_SINT0 0x4090
> -#define HV_X64_MSR_SINT1 0x4091
> -#define HV_X64_MSR_SINT2 0x4092
> -#define HV_X64_MSR_SINT3 0x4093
> -#define HV_X64_MSR_SINT4 0x4094
> -#define HV_X64_MSR_SINT5 0x4095
> -#define HV_X64_MSR_SINT6 0x4096
> -#define HV_X64_MSR_SINT7 0x4097
> -#define HV_X64_MSR_SINT8 0x4098
> -#define HV_X64_MSR_SINT9 0x4099
> -#define HV_X64_MSR_SINT100x409A
> -#define HV_X64_MSR_SINT110x409B
> -#define HV_X64_MSR_SINT120x409C
> -#define HV_X64_MSR_SINT130x409D
> -#define HV_X64_MSR_SINT140x409E
> -#define HV_X64_MSR_SINT150x409F
> -#define HV_X64_MSR_STIMER0_CONFIG0x40B0
> -#define HV_X64_MSR_STIMER0_COUNT 0x40B1
> -#define HV_X64_MSR_STIMER1_CONFIG0x40B2
> -#define HV_X64_MSR_STIMER1_COUNT 0x40B3
> -#define HV_X64_MSR_STIMER2_CONFIG0x40B4
> -#define HV_X64_MSR_STIMER2_COUNT 0x40B5
> -#define HV_X64_MSR_STIMER3_CONFIG0x40B6
> -#define HV_X64_MSR_STIMER3_COUNT 0x40B7
> -#define HV_X64_MSR_POWER_STATE_TRIGGER_C10x40C1
> -#define HV_X64_MSR_POWER_STATE_TRIGGER_C20x40C2
> -#define HV_X64_MSR_POWER_STATE_TRIGGER_C30x40C3
> -#define HV_X64_MSR_POWER_STATE_CONFIG_C1 0x40D1
> -#define HV_X64_MSR_POWER_STATE_CONFIG_C2 0x40D2
> -#define HV_X64_MSR_POWER_STATE_CONFIG_C3 0x40D3
> -#define HV_X64_MSR_STATS_PARTITION_RETAIL_PAGE   0x40E0
> -#define HV_X64_MSR_STATS_PARTITION_INTERNAL_PAGE 0x40E1
> -#define HV_X64_MSR_STATS_VP_RETAIL_PAGE  0x40E2
> -#define HV_X64_MSR_STATS_VP_INTERNAL_PAGE0x40E3
> -#define HV_X64_MSR_GUEST_IDLE0x40F0
> -#define HV_X64_MSR_SYNTH_DEBUG_CONTROL   0x40F1
> -#define HV_X64_MSR_SYNTH_DEBUG_STATUS0x40F2
> -#define HV_X64_MSR_SYNTH_DEBUG_SEND_BUFFER   0x40F3
> -#define HV_X64_MSR_SYNTH_DEBUG_RECEIVE_BUFFER0x40F4
> -#de

Re: [Xen-devel] [PATCH 4/4] x86: move viridian_guest_os_id_msr to hyperv-tlfs.h

2019-12-21 Thread Durrant, Paul
> -Original Message-
> From: Wei Liu  On Behalf Of Wei Liu
> Sent: 20 December 2019 19:52
> To: Xen Development List 
> Cc: Michael Kelley ; Durrant, Paul
> ; Wei Liu ; Paul Durrant
> ; Jan Beulich ; Andrew Cooper
> ; Wei Liu ; Roger Pau Monné
> 
> Subject: [PATCH 4/4] x86: move viridian_guest_os_id_msr to hyperv-tlfs.h
> 
> Suggested-by: Paul Durrant 
> Signed-off-by: Wei Liu 

Reviewed-by: Paul Durrant 

> ---
>  xen/arch/x86/hvm/viridian/viridian.c|  2 +-
>  xen/include/asm-x86/guest/hyperv-tlfs.h | 13 +
>  xen/include/asm-x86/hvm/viridian.h  | 18 +++---
>  3 files changed, 17 insertions(+), 16 deletions(-)
> 
> diff --git a/xen/arch/x86/hvm/viridian/viridian.c
> b/xen/arch/x86/hvm/viridian/viridian.c
> index 341592f054..44c8e6cac6 100644
> --- a/xen/arch/x86/hvm/viridian/viridian.c
> +++ b/xen/arch/x86/hvm/viridian/viridian.c
> @@ -218,7 +218,7 @@ void cpuid_viridian_leaves(const struct vcpu *v,
> uint32_t leaf,
> 
>  static void dump_guest_os_id(const struct domain *d)
>  {
> -const union viridian_guest_os_id_msr *goi;
> +const union hv_guest_os_id *goi;
> 
>  goi = >arch.hvm.viridian->guest_os_id;
> 
> diff --git a/xen/include/asm-x86/guest/hyperv-tlfs.h b/xen/include/asm-
> x86/guest/hyperv-tlfs.h
> index b128807b2c..4402854c80 100644
> --- a/xen/include/asm-x86/guest/hyperv-tlfs.h
> +++ b/xen/include/asm-x86/guest/hyperv-tlfs.h
> @@ -327,6 +327,19 @@ struct ms_hyperv_tsc_page {
>   */
> 
>  #define HV_LINUX_VENDOR_ID  0x8100
> +union hv_guest_os_id
> +{
> +uint64_t raw;
> +struct
> +{
> +uint64_t build_number:16;
> +uint64_t service_pack:8;
> +uint64_t minor:8;
> +uint64_t major:8;
> +uint64_t os:8;
> +uint64_t vendor:16;
> +};
> +};
> 
>  struct hv_reenlightenment_control {
>   __u64 vector:8;
> diff --git a/xen/include/asm-x86/hvm/viridian.h b/xen/include/asm-
> x86/hvm/viridian.h
> index 010c8b58d4..cfbaede158 100644
> --- a/xen/include/asm-x86/hvm/viridian.h
> +++ b/xen/include/asm-x86/hvm/viridian.h
> @@ -9,6 +9,8 @@
>  #ifndef __ASM_X86_HVM_VIRIDIAN_H__
>  #define __ASM_X86_HVM_VIRIDIAN_H__
> 
> +#include 
> +
>  union viridian_page_msr
>  {
>  uint64_t raw;
> @@ -83,20 +85,6 @@ struct viridian_vcpu
>  uint64_t crash_param[5];
>  };
> 
> -union viridian_guest_os_id_msr
> -{
> -uint64_t raw;
> -struct
> -{
> -uint64_t build_number:16;
> -uint64_t service_pack:8;
> -uint64_t minor:8;
> -uint64_t major:8;
> -uint64_t os:8;
> -uint64_t vendor:16;
> -};
> -};
> -
>  struct viridian_time_ref_count
>  {
>  unsigned long flags;
> @@ -112,7 +100,7 @@ struct viridian_time_ref_count
> 
>  struct viridian_domain
>  {
> -union viridian_guest_os_id_msr guest_os_id;
> +union hv_guest_os_id guest_os_id;
>  union viridian_page_msr hypercall_gpa;
>  struct viridian_time_ref_count time_ref_count;
>  struct viridian_page reference_tsc;
> --
> 2.20.1


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 3/4] x86: provide and use hv_tsc_scale

2019-12-21 Thread Durrant, Paul
> -Original Message-
> From: Wei Liu  On Behalf Of Wei Liu
> Sent: 20 December 2019 19:52
> To: Xen Development List 
> Cc: Michael Kelley ; Durrant, Paul
> ; Wei Liu ; Paul Durrant
> ; Jan Beulich ; Andrew Cooper
> ; Wei Liu ; Roger Pau Monné
> 
> Subject: [PATCH 3/4] x86: provide and use hv_tsc_scale
> 
> The Hyper-V clock source and Xen's own viridian code need the same
> functionality.
> 
> Move the function in viridian/time.c to hyperv.h and use it in both
> places.
> 
> No functional change.
> 
> Signed-off-by: Wei Liu 

Reviewed-by: Paul Durrant 

> ---
>  xen/arch/x86/hvm/viridian/time.c   | 30 ++--
>  xen/arch/x86/time.c|  7 +--
>  xen/include/asm-x86/guest/hyperv.h | 32 --
>  3 files changed, 33 insertions(+), 36 deletions(-)
> 
> diff --git a/xen/arch/x86/hvm/viridian/time.c
> b/xen/arch/x86/hvm/viridian/time.c
> index 32e79bbcc4..6b2d745f3a 100644
> --- a/xen/arch/x86/hvm/viridian/time.c
> +++ b/xen/arch/x86/hvm/viridian/time.c
> @@ -13,6 +13,7 @@
> 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
> 
> @@ -82,33 +83,6 @@ static void update_reference_tsc(const struct domain
> *d, bool initialize)
>  p->tsc_sequence = seq;
>  }
> 
> -/*
> - * The specification says: "The partition reference time is computed
> - * by the following formula:
> - *
> - * ReferenceTime = ((VirtualTsc * TscScale) >> 64) + TscOffset
> - *
> - * The multiplication is a 64 bit multiplication, which results in a
> - * 128 bit number which is then shifted 64 times to the right to obtain
> - * the high 64 bits."
> - */
> -static uint64_t scale_tsc(uint64_t tsc, uint64_t scale, int64_t offset)
> -{
> -uint64_t result;
> -
> -/*
> - * Quadword MUL takes an implicit operand in RAX, and puts the result
> - * in RDX:RAX. Because we only want the result of the multiplication
> - * after shifting right by 64 bits, we therefore only need the
> content
> - * of RDX.
> - */
> -asm ( "mulq %[scale]"
> -  : "+a" (tsc), "=d" (result)
> -  : [scale] "rm" (scale) );
> -
> -return result + offset;
> -}
> -
>  static uint64_t trc_val(const struct domain *d, int64_t offset)
>  {
>  uint64_t tsc, scale;
> @@ -116,7 +90,7 @@ static uint64_t trc_val(const struct domain *d, int64_t
> offset)
>  tsc = hvm_get_guest_tsc(pt_global_vcpu_target(d));
>  scale = ((1ul << 32) / d->arch.tsc_khz) << 32;
> 
> -return scale_tsc(tsc, scale, offset);
> +return hv_scale_tsc(tsc, scale, offset);
>  }
> 
>  static void time_ref_count_freeze(const struct domain *d)
> diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
> index bbcc9b10b8..d21875de9e 100644
> --- a/xen/arch/x86/time.c
> +++ b/xen/arch/x86/time.c
> @@ -725,12 +725,7 @@ static inline uint64_t read_hyperv_timer(void)
> 
>  } while ( tsc_page->tsc_sequence != seq );
> 
> -/* ret = ((tsc * scale) >> 64) + offset; */
> -asm ( "mul %[scale]; add %[offset], %[ret]"
> -  : "+a" (tsc), [ret] "=" (ret)
> -  : [scale] "rm" (scale), [offset] "rm" (offset) );
> -
> -return ret;
> +return hv_scale_tsc(tsc, scale, offset);
>  }
> 
>  static struct platform_timesource __initdata plt_hyperv_timer =
> diff --git a/xen/include/asm-x86/guest/hyperv.h b/xen/include/asm-
> x86/guest/hyperv.h
> index cc21b9abfc..c7a7f32bd5 100644
> --- a/xen/include/asm-x86/guest/hyperv.h
> +++ b/xen/include/asm-x86/guest/hyperv.h
> @@ -19,10 +19,38 @@
>  #ifndef __X86_GUEST_HYPERV_H__
>  #define __X86_GUEST_HYPERV_H__
> 
> -#ifdef CONFIG_HYPERV_GUEST
> -
>  #include 
> 
> +/*
> + * The specification says: "The partition reference time is computed
> + * by the following formula:
> + *
> + * ReferenceTime = ((VirtualTsc * TscScale) >> 64) + TscOffset
> + *
> + * The multiplication is a 64 bit multiplication, which results in a
> + * 128 bit number which is then shifted 64 times to the right to obtain
> + * the high 64 bits."
> + */
> +static inline uint64_t hv_scale_tsc(uint64_t tsc, uint64_t scale,
> +int64_t offset)
> +{
> +uint64_t result;
> +
> +/*
> + * Quadword MUL takes an implicit operand in RAX, and puts the result
> + * in RDX:RAX. Because we only want the result of the multiplication
> + * after shifting right by 64 bits, we therefore only need the
> content
> + * of RDX.
> + */
> +asm ( "mulq %[scale]"
> +  : "+a" (tsc), "=d" (result)
> +  : [scale] "rm" (scale) );
> +
> +return result + offset;
> +}
> +
> +#ifdef CONFIG_HYPERV_GUEST
> +
>  #include 
> 
>  struct ms_hyperv_info {
> --
> 2.20.1


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH net-next] xen-netback: support dynamic unbind/bind

2019-12-23 Thread Durrant, Paul
> -Original Message-
> From: Wei Liu 
> Sent: 23 December 2019 11:36
> To: Durrant, Paul 
> Cc: xen-devel@lists.xenproject.org; net...@vger.kernel.org; linux-
> ker...@vger.kernel.org; Wei Liu ; Paul Durrant
> ; David S. Miller 
> Subject: Re: [PATCH net-next] xen-netback: support dynamic unbind/bind
> 
> On Mon, Dec 23, 2019 at 09:59:23AM +, Paul Durrant wrote:
> [...]
> > diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-
> netback/interface.c
> > index f15ba3de6195..0c8a02a1ead7 100644
> > --- a/drivers/net/xen-netback/interface.c
> > +++ b/drivers/net/xen-netback/interface.c
> > @@ -585,6 +585,7 @@ int xenvif_connect_ctrl(struct xenvif *vif,
> grant_ref_t ring_ref,
> > struct net_device *dev = vif->dev;
> > void *addr;
> > struct xen_netif_ctrl_sring *shared;
> > +   RING_IDX rsp_prod, req_prod;
> > int err;
> >
> > err = xenbus_map_ring_valloc(xenvif_to_xenbus_device(vif),
> > @@ -593,7 +594,14 @@ int xenvif_connect_ctrl(struct xenvif *vif,
> grant_ref_t ring_ref,
> > goto err;
> >
> > shared = (struct xen_netif_ctrl_sring *)addr;
> > -   BACK_RING_INIT(>ctrl, shared, XEN_PAGE_SIZE);
> > +   rsp_prod = READ_ONCE(shared->rsp_prod);
> > +   req_prod = READ_ONCE(shared->req_prod);
> > +
> > +   BACK_RING_ATTACH(>ctrl, shared, rsp_prod, XEN_PAGE_SIZE);
> > +
> > +   err = -EIO;
> > +   if (req_prod - rsp_prod > RING_SIZE(>ctrl))
> > +   goto err_unmap;
> 
> I think it makes more sense to attach the ring after this check has been
> done, but I can see you want to structure code like this to reuse the
> unmap error path.

Looks a little odd, agreed. The reason I did it this way is so that I can use 
RING_SIZE() rather than having to use __RING_SIZE(); makes the code just a 
little bit shorter... which reminds me I ought to neaten up blkback similarly.

> 
> So:
> 
> Reviewed-by: Wei Liu 
> 
> Nice work btw.

Thanks :-)

  Paul

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2 2/6] x86/viridian: drop duplicate defines from private.h and viridian.c

2019-12-18 Thread Durrant, Paul
> -Original Message-
> From: Wei Liu  On Behalf Of Wei Liu
> Sent: 18 December 2019 14:42
> To: Xen Development List 
> Cc: Michael Kelley ; Durrant, Paul
> ; Wei Liu ; Paul Durrant
> ; Jan Beulich ; Andrew Cooper
> ; Wei Liu ; Roger Pau Monné
> 
> Subject: [PATCH v2 2/6] x86/viridian: drop duplicate defines from
> private.h and viridian.c
> 
> No functional change intended.
> 
> Signed-off-by: Wei Liu 

[snip]
> diff --git a/xen/arch/x86/hvm/viridian/viridian.c
> b/xen/arch/x86/hvm/viridian/viridian.c
> index 4b06b78a27..76f6b6510b 100644
> --- a/xen/arch/x86/hvm/viridian/viridian.c
> +++ b/xen/arch/x86/hvm/viridian/viridian.c
> @@ -10,6 +10,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -19,22 +20,10 @@
> 
>  #include "private.h"
> 
> -/* Viridian Hypercall Status Codes. */
> -#define HV_STATUS_SUCCESS   0x
> -#define HV_STATUS_INVALID_HYPERCALL_CODE0x0002
> -#define HV_STATUS_INVALID_PARAMETER 0x0005
> -
>  /* Viridian Hypercall Codes. */
> -#define HvFlushVirtualAddressSpace 0x0002
> -#define HvFlushVirtualAddressList  0x0003
> -#define HvNotifyLongSpinWait   0x0008
> -#define HvSendSyntheticClusterIpi  0x000b

>  #define HvGetPartitionId   0x0046
>  #define HvExtCallQueryCapabilities 0x8001

These ought to be added to hyperv-tlfs.h. After all they are specified in the 
TLFS.

  Paul


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2 4/6] x86/viridian: drop private copy of HV_REFERENCE_TSC_PAGE in time.c

2019-12-18 Thread Durrant, Paul
> -Original Message-
> From: Wei Liu  On Behalf Of Wei Liu
> Sent: 18 December 2019 14:43
> To: Xen Development List 
> Cc: Michael Kelley ; Durrant, Paul
> ; Wei Liu ; Paul Durrant
> ; Jan Beulich ; Andrew Cooper
> ; Wei Liu ; Roger Pau Monné
> 
> Subject: [PATCH v2 4/6] x86/viridian: drop private copy of
> HV_REFERENCE_TSC_PAGE in time.c
> 
> Use the one defined in hyperv-tlfs.h instead. No functional change
> intended.
> 
> Signed-off-by: Wei Liu 
> ---
>  xen/arch/x86/hvm/viridian/time.c | 30 +++---
>  1 file changed, 11 insertions(+), 19 deletions(-)
> 
> diff --git a/xen/arch/x86/hvm/viridian/time.c
> b/xen/arch/x86/hvm/viridian/time.c
> index 6ddca29b29..33c15782e4 100644
> --- a/xen/arch/x86/hvm/viridian/time.c
> +++ b/xen/arch/x86/hvm/viridian/time.c
> @@ -13,19 +13,11 @@
> 
>  #include 
>  #include 
> +#include 
>  #include 
> 
>  #include "private.h"
> 
> -typedef struct _HV_REFERENCE_TSC_PAGE
> -{
> -uint32_t TscSequence;
> -uint32_t Reserved1;
> -uint64_t TscScale;
> -int64_t  TscOffset;
> -uint64_t Reserved2[509];
> -} HV_REFERENCE_TSC_PAGE, *PHV_REFERENCE_TSC_PAGE;
> -
>  static void update_reference_tsc(const struct domain *d, bool initialize)
>  {
>  struct viridian_domain *vd = d->arch.hvm.viridian;
> @@ -41,18 +33,18 @@ static void update_reference_tsc(const struct domain
> *d, bool initialize)
>   * This enlightenment must be disabled is the host TSC is not
> invariant.
>   * However it is also disabled if vtsc is true (which means rdtsc is
>   * being emulated). This generally happens when guest TSC freq and
> host
> - * TSC freq don't match. The TscScale value could be adjusted to cope
> + * TSC freq don't match. The tsc_scale value could be adjusted to
> cope
>   * with this, allowing vtsc to be turned off, but support for this is
>   * not yet present in the hypervisor. Thus is it is possible that
>   * migrating a Windows VM between hosts of differing TSC frequencies
>   * may result in large differences in guest performance. Any jump in
>   * TSC due to migration down-time can, however, be compensated for by
> - * setting the TscOffset value (see below).
> + * setting the tsc_offset value (see below).
>   */
>  if ( !host_tsc_is_safe() || d->arch.vtsc )
>  {
>  /*
> - * The specification states that valid values of TscSequence
> range
> + * The specification states that valid values of tsc_sequence
> range
>   * from 0 to 0xFFFE. The value 0x is used to indicate
>   * this mechanism is no longer a reliable source of time and that
>   * the VM should fall back to a different source.
> @@ -61,7 +53,7 @@ static void update_reference_tsc(const struct domain *d,
> bool initialize)
>   * violate the spec. and rely on a value of 0 to indicate that
> this
>   * enlightenment should no longer be used.
>   */
> -p->TscSequence = 0;
> +p->tsc_sequence = 0;
> 
>  printk(XENLOG_G_INFO "d%d: VIRIDIAN REFERENCE_TSC:
> invalidated\n",
> d->domain_id);
> @@ -72,29 +64,29 @@ static void update_reference_tsc(const struct domain
> *d, bool initialize)
>   * The guest will calculate reference time according to the following
>   * formula:
>   *
> - * ReferenceTime = ((RDTSC() * TscScale) >> 64) + TscOffset
> + * ReferenceTime = ((RDTSC() * tsc_scale) >> 64) + tsc_offset
>   *
>   * Windows uses a 100ns tick, so we need a scale which is cpu
>   * ticks per 100ns shifted left by 64.
>   * The offset value is calculated on restore after migration and
>   * ensures that Windows will not see a large jump in ReferenceTime.
>   */
> -p->TscScale = ((1ul << 32) / d->arch.tsc_khz) << 32;
> -p->TscOffset = trc->off;
> +p->tsc_scale = ((1ul << 32) / d->arch.tsc_khz) << 32;
> +p->tsc_offset = trc->off;
>  smp_wmb();
> 
> -seq = p->TscSequence + 1;
> +seq = p->tsc_sequence + 1;
>  if ( seq == 0x || seq == 0 ) /* Avoid both 'invalid' values
> */
>  seq = 1;
> 
> -p->TscSequence = seq;
> +p->tsc_sequence = seq;
>  }
> 
>  /*
>   * The specification says: "The partition reference time is computed
>   * by the following formula:
>   *
> - * ReferenceTime = ((VirtualTsc * TscScale) >> 64) + TscOffset
> + * ReferenceTime = ((VirtualTsc * tsc_scale) >> 64) + tsc_offset

I'd prefer keeping the CamelCase here as it's text lifted from the TLFS and not 
reliant on the header definitions.

  Paul

>   *
>   * The multiplication is a 64 bit multiplication, which results in a
>   * 128 bit number which is then shifted 64 times to the right to obtain
> --
> 2.20.1


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2 6/6] x86: implement Hyper-V clock source

2019-12-18 Thread Durrant, Paul
> -Original Message-
> From: Wei Liu  On Behalf Of Wei Liu
> Sent: 18 December 2019 14:43
> To: Xen Development List 
> Cc: Michael Kelley ; Durrant, Paul
> ; Wei Liu ; Jan Beulich
> ; Andrew Cooper ; Wei Liu
> ; Roger Pau Monné 
> Subject: [PATCH v2 6/6] x86: implement Hyper-V clock source
> 
> Implement a clock source using Hyper-V's reference TSC page.
> 
> Signed-off-by: Wei Liu 
> ---
> v2:
> 1. Address Jan's comments.
> 
> Relevant spec:
> 
> https://github.com/MicrosoftDocs/Virtualization-
> Documentation/raw/live/tlfs/Hypervisor%20Top%20Level%20Functional%20Specif
> ication%20v5.0C.pdf
> 
> Section 12.6.
> ---
>  xen/arch/x86/time.c | 101 
>  1 file changed, 101 insertions(+)
> 
> diff --git a/xen/arch/x86/time.c b/xen/arch/x86/time.c
> index 216169a025..8b96b2e9a5 100644
> --- a/xen/arch/x86/time.c
> +++ b/xen/arch/x86/time.c
> @@ -31,6 +31,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -644,6 +645,103 @@ static struct platform_timesource __initdata
> plt_xen_timer =
>  };
>  #endif
> 
> +#ifdef CONFIG_HYPERV_GUEST
> +/
> + * HYPER-V REFERENCE TSC
> + */
> +
> +static struct ms_hyperv_tsc_page *hyperv_tsc;
> +static struct page_info *hyperv_tsc_page;
> +
> +static int64_t __init init_hyperv_timer(struct platform_timesource *pts)
> +{
> +paddr_t maddr;
> +uint64_t tsc_msr, freq;
> +
> +if ( !(ms_hyperv.features & HV_MSR_REFERENCE_TSC_AVAILABLE) )
> +return 0;
> +
> +hyperv_tsc_page = alloc_domheap_page(NULL, 0);
> +if ( !hyperv_tsc_page )
> +return 0;
> +
> +hyperv_tsc = __map_domain_page_global(hyperv_tsc_page);
> +if ( !hyperv_tsc )
> +{
> +free_domheap_page(hyperv_tsc_page);
> +hyperv_tsc_page = NULL;
> +return 0;
> +}
> +
> +maddr = page_to_maddr(hyperv_tsc_page);
> +
> +/*
> + * Per Hyper-V TLFS:
> + *   1. Read existing MSR value
> + *   2. Preserve bits [11:1]
> + *   3. Set bits [63:12] to be guest physical address of tsc page
> + *   4. Set enabled bit (0)
> + *   5. Write back new MSR value
> + */
> +rdmsrl(HV_X64_MSR_REFERENCE_TSC, tsc_msr);
> +tsc_msr &= 0xffeULL;
> +tsc_msr |=  maddr | 1 /* enabled */;
> +wrmsrl(HV_X64_MSR_REFERENCE_TSC, tsc_msr);
> +

You need to check for the HV_X64_ACCESS_FREQUENCY_MSRS feature or you risk a 
#GP below I think.

> +/* Get TSC frequency from Hyper-V */
> +rdmsrl(HV_X64_MSR_TSC_FREQUENCY, freq);
> +pts->frequency = freq;
> +
> +return freq;
> +}
> +
> +static inline uint64_t read_hyperv_timer(void)
> +{
> +uint64_t scale, offset, ret, tsc;
> +uint32_t seq;
> +const struct ms_hyperv_tsc_page *tsc_page = hyperv_tsc;
> +
> +do {
> +seq = tsc_page->tsc_sequence;
> +
> +/* Seq 0 is special. It means the TSC enlightenment is not
> + * available at the moment. The reference time can only be
> + * obtained from the Reference Counter MSR.
> + */
> +if ( seq == 0 )

Older versions of the spec used to use 0x I think, although when I look 
again they seem to have been retro-actively fixed. In any case I think you 
should treat both 0x and 0 as invalid.

> +{
> +rdmsrl(HV_X64_MSR_TIME_REF_COUNT, ret);
> +return ret;
> +}
> +
> +/* rdtsc_ordered already contains a load fence */
> +tsc = rdtsc_ordered();
> +scale = tsc_page->tsc_scale;
> +offset = tsc_page->tsc_offset;
> +
> +smp_rmb();
> +
> +} while (tsc_page->tsc_sequence != seq);
> +
> +/* ret = ((tsc * scale) >> 64) + offset; */
> +asm ( "mul %[scale]; add %[offset], %[ret]"
> +  : "+a" (tsc), [ret] "=d" (ret)
> +  : [scale] "rm" (scale), [offset] "rm" (offset) );
> +

It would be nice to common this up with scale_tsc() in viridian/time.c.

  Paul

> +return ret;
> +}
> +
> +static struct platform_timesource __initdata plt_hyperv_timer =
> +{
> +.id = "hyperv",
> +.name = "HYPER-V REFERENCE TSC",
> +.read_counter = read_hyperv_timer,
> +.init = init_hyperv_timer,
> +/* See TSC time source for why counter_bits is set to 63 */
> +.counter_bits = 63,
> +};
> +#endif
> +
>  /
>   * GENERIC PLATFORM TIMER INFRASTRUCTURE
>   */
> @@ -793,6 +891,9 @@ static u64 __init init_platform_timer(void)
>  static struct platform_timesource * __initdata plt_timers[] = {
>  #ifdef CONFIG_XEN_GUEST
>  _xen_timer,
> +#endif
> +#ifdef CONFIG_HYPERV_GUEST
> +_hyperv_timer,
>  #endif
>  _hpet, _pmtimer, _pit
>  };
> --
> 2.20.1


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2 3/6] x86/viridian: drop private copy of definitions from synic.c

2019-12-18 Thread Durrant, Paul
> -Original Message-
> From: Wei Liu  On Behalf Of Wei Liu
> Sent: 18 December 2019 14:43
> To: Xen Development List 
> Cc: Michael Kelley ; Durrant, Paul
> ; Wei Liu ; Paul Durrant
> ; Jan Beulich ; Andrew Cooper
> ; Wei Liu ; Roger Pau Monné
> 
> Subject: [PATCH v2 3/6] x86/viridian: drop private copy of definitions
> from synic.c
> 
> Use hyperv-tlfs.h instead. No functional change intended.
> 
> Signed-off-by: Wei Liu 

Reviewed-by: Paul Durrant 

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2 1/6] x86: import hyperv-tlfs.h from Linux

2019-12-18 Thread Durrant, Paul
> -Original Message-
> From: Wei Liu  On Behalf Of Wei Liu
> Sent: 18 December 2019 14:42
> To: Xen Development List 
> Cc: Michael Kelley ; Durrant, Paul
> ; Wei Liu ; Jan Beulich
> ; Andrew Cooper ; Wei Liu
> ; Roger Pau Monné 
> Subject: [PATCH v2 1/6] x86: import hyperv-tlfs.h from Linux
> 
> Take a pristine copy from Linux commit
> b2d8b167e15bb5ec2691d1119c025630a247f649.
> 
> Do the following to fix it up for Xen:
> 
> 1. include xen/types.h and xen/bitops.h
> 2. fix up invocations of BIT macro
> 
> Signed-off-by: Wei Liu 
> Acked-by: Jan Beulich 
[snip]
> +/*
> + * The guest OS needs to register the guest ID with the hypervisor.
> + * The guest ID is a 64 bit entity and the structure of this ID is
> + * specified in the Hyper-V specification:
> + *
> + * msdn.microsoft.com/en-
> us/library/windows/hardware/ff542653%28v=vs.85%29.aspx
> + *
> + * While the current guideline does not specify how Linux guest ID(s)
> + * need to be generated, our plan is to publish the guidelines for
> + * Linux and other guest operating systems that currently are hosted
> + * on Hyper-V. The implementation here conforms to this yet
> + * unpublished guidelines.
> + *
> + *
> + * Bit(s)
> + * 63 - Indicates if the OS is Open Source or not; 1 is Open Source
> + * 62:56 - Os Type; Linux is 0x100
> + * 55:48 - Distro specific identification
> + * 47:16 - Linux kernel version number
> + * 15:0  - Distro specific identification
> + *
> + *

It might be useful to pull the declaration of union viridian_guest_os_id_msr in 
here since the comment is explaining the format.

  Paul


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] x86/save: reserve HVM save record numbers that have been consumed...

2019-12-19 Thread Durrant, Paul
> -Original Message-
> From: Xen-devel  On Behalf Of
> Andrew Cooper
> Sent: 18 December 2019 19:45
> To: Durrant, Paul ; xen-devel@lists.xenproject.org
> Cc: Wei Liu ; Jan Beulich ; Roger Pau Monné
> 
> Subject: Re: [Xen-devel] [PATCH] x86/save: reserve HVM save record numbers
> that have been consumed...
> 
> On 18/12/2019 16:09, Paul Durrant wrote:
> > ...for patches not (yet) upstream.
> >
> > This patch is simply reserving save record number space to avoid the
> > risk of clashes between existent downstream changes made by Amazon and
> > future upstream changes which may be incompatible.
> >
> > Signed-off-by: Paul Durrant 
> 
> Is this "you've already used some of these", or you plan to?

Already used in code that has been deployed, although I have left some headroom 
because I know there is code in development which is using new ones.

Where records can be upstreamed in a way that is compatible with downstream 
use, we will keep the existing number. If incompatible changes are necessary to 
get the code upstream then we will have to use a new number and maintain 
downstream compatibility patches.

  Paul

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2 6/6] x86: implement Hyper-V clock source

2019-12-19 Thread Durrant, Paul
> -Original Message-
> From: Wei Liu 
> Sent: 18 December 2019 22:21
> To: Michael Kelley 
> Cc: Durrant, Paul ; Wei Liu ; Xen
> Development List ; Wei Liu
> ; Jan Beulich ; Andrew Cooper
> ; Roger Pau Monné 
> Subject: Re: [PATCH v2 6/6] x86: implement Hyper-V clock source
> 
> On Wed, 18 Dec 2019 at 20:24, Michael Kelley 
> wrote:
> >
> > From: Durrant, Paul  Sent: Wednesday, December 18,
> 2019 7:24 AM
> >
> > > > From: Wei Liu  On Behalf Of Wei Liu
> > > > Sent: 18 December 2019 14:43
> >
> > [snip]
> >
> > > > +
> > > > +static inline uint64_t read_hyperv_timer(void)
> > > > +{
> > > > +uint64_t scale, offset, ret, tsc;
> > > > +uint32_t seq;
> > > > +const struct ms_hyperv_tsc_page *tsc_page = hyperv_tsc;
> > > > +
> > > > +do {
> > > > +seq = tsc_page->tsc_sequence;
> > > > +
> > > > +/* Seq 0 is special. It means the TSC enlightenment is not
> > > > + * available at the moment. The reference time can only be
> > > > + * obtained from the Reference Counter MSR.
> > > > + */
> > > > +if ( seq == 0 )
> > >
> > > Older versions of the spec used to use 0x I think, although
> when I look again they
> > > seem to have been retro-actively fixed. In any case I think you should
> treat both
> > > 0x and 0 as invalid.
> >
> > FWIW, the 0x was just a bug in the spec.  Hyper-V
> implementations only
> > set the value to 0 to indicate invalid.  The equivalent Linux code
> checks only for 0.
> >
> 
> Thanks for chiming in, Michael.
> 
> In that case I will submit a fix to change Xen's viridian code to
> remove the wrong value there.

If no consuming version of Windows is going to be upset seeing all-Fs then 
that's fine. Thanks for the clarification.

Cheers,

  Paul
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] x86/save: reserve HVM save record numbers that have been consumed...

2019-12-19 Thread Durrant, Paul
> -Original Message-
> From: Andrew Cooper 
> Sent: 19 December 2019 10:52
> To: Durrant, Paul ; xen-devel@lists.xenproject.org
> Cc: Wei Liu ; Jan Beulich ; Roger Pau Monné
> 
> Subject: Re: [Xen-devel] [PATCH] x86/save: reserve HVM save record numbers
> that have been consumed...
> 
> On 19/12/2019 08:52, Durrant, Paul wrote:
> >> -Original Message-
> >> From: Xen-devel  On Behalf Of
> >> Andrew Cooper
> >> Sent: 18 December 2019 19:45
> >> To: Durrant, Paul ; xen-devel@lists.xenproject.org
> >> Cc: Wei Liu ; Jan Beulich ; Roger Pau
> Monné
> >> 
> >> Subject: Re: [Xen-devel] [PATCH] x86/save: reserve HVM save record
> numbers
> >> that have been consumed...
> >>
> >> On 18/12/2019 16:09, Paul Durrant wrote:
> >>> ...for patches not (yet) upstream.
> >>>
> >>> This patch is simply reserving save record number space to avoid the
> >>> risk of clashes between existent downstream changes made by Amazon and
> >>> future upstream changes which may be incompatible.
> >>>
> >>> Signed-off-by: Paul Durrant 
> >> Is this "you've already used some of these", or you plan to?
> > Already used in code that has been deployed, although I have left some
> headroom because I know there is code in development which is using new
> ones.
> >
> > Where records can be upstreamed in a way that is compatible with
> downstream use, we will keep the existing number. If incompatible changes
> are necessary to get the code upstream then we will have to use a new
> number and maintain downstream compatibility patches.
> 
> Every bump to this number is more wasted memory in Xen.
> 

How much more memory?

> It is not fair or reasonable to include extra headroom in a "oh dear we
> screwed up - will the community be kind enough to help us work around
> our ABI problems" scenario.
> 

I would have thought all the pain you went through to keep XenServer going with 
its non-upstreamed hypercall numbers would have made you a little more 
sympathetic to dealing with past mistakes.

> For now, I'd just leave it as a comment, and strictly only covering the
> ones you have used.  There is no need to actually bump the structure
> sizes in xen for now - simply that the ones you have currently used
> don't get allocated for something different in the future.
>

Ok, we can defer actually bumping HVM_SAVE_CODE_MAX, but it's almost certain to 
happen eventually.

  Paul

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] x86/save: reserve HVM save record numbers that have been consumed...

2019-12-19 Thread Durrant, Paul
> -Original Message-
> From: Jan Beulich 
> Sent: 19 December 2019 10:33
> To: Durrant, Paul 
> Cc: xen-devel@lists.xenproject.org; Andrew Cooper
> ; Wei Liu ; Roger Pau Monné
> 
> Subject: Re: [PATCH] x86/save: reserve HVM save record numbers that have
> been consumed...
> 
> On 18.12.2019 17:09, Paul Durrant wrote:
> > --- a/xen/include/public/arch-x86/hvm/save.h
> > +++ b/xen/include/public/arch-x86/hvm/save.h
> > @@ -639,10 +639,12 @@ struct hvm_msr {
> >
> >  #define CPU_MSR_CODE  20
> >
> > +/* Range 22 - 40 reserved for Amazon */
> > +
> >  /*
> >   * Largest type-code in use
> >   */
> > -#define HVM_SAVE_CODE_MAX 20
> > +#define HVM_SAVE_CODE_MAX 40
> 
> I'm not overly happy to see the respective in-Xen array double its
> size for no use at all. I also don't think out-of-tree extensions
> should have been added using numbers consecutive to the upstream
> ones. Instead, an Amazon range should have been picked high up in
> the number space (e.g. with the upper byte being ASCII 'A').
> 

Totally agreed, but we don't have a time machine handy.

  Paul

> Jan
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] x86/hvm/rtc: preserved guest RTC offset during suspend/resume/migrate

2019-12-19 Thread Durrant, Paul
> -Original Message-
> From: Jan Beulich 
> Sent: 19 December 2019 11:30
> To: Durrant, Paul 
> Cc: xen-devel@lists.xenproject.org; Stefano Stabellini
> ; Julien Grall ; Volodymyr Babchuk
> ; Andrew Cooper ;
> George Dunlap ; Ian Jackson
> ; Konrad Rzeszutek Wilk
> ; Wei Liu ; Roger Pau Monné
> 
> Subject: Re: [PATCH] x86/hvm/rtc: preserved guest RTC offset during
> suspend/resume/migrate
> 
> On 18.12.2019 15:41, Paul Durrant wrote:
> > The emulated RTC is synchronized with the PV wallclock; any write to the
> > RTC will update struct domain's 'time_offset_seconds' field and call
> > update_domain_wallclock().
> >
> > However, the value of 'time_offset_seconds' is not preserved in any save
> > record and indeed, when the RTC save record is loaded, the CMOS values
> > will be updated based on an offset value which may or may not have been
> > set by the toolstack [1]. This may result in making bogus values
> available
> > to the guest and messing up any calculations done in the call to
> > alarm_timer_update() at the end of rtc_load().
> >
> > This patch extends the RTC save record to contain an offset value, which
> > will be zero filled on load of an older record. The
> 'time_offset_secoonds'
> > field in struct domain is also modified into a 'time_offset' struct,
> > containing a 'seconds' field and a boolean 'set' field.
> >
> > The code in rtc_load() then uses the new value in the save record to
> > update the value of struct domain's 'time_offset.seconds' unless
> > 'time_offset.set' is true, which will only be the case if the toolstack
> has
> > already performed a XEN_DOMCTL_settimeoffset.
> >
> > [1] There is currently no way for a toolstack to read the value of
> > 'time_offset_seconds' from struct domain. In the past, any hope of
> > preservation of the value across a guest life-cycle operation was
> based
> > on relying on qemu-dm to write a value into xenstore whenever the
> RTC
> > was updated, in response to an IOREQ with type IOREQ_TYPE_TIMEOFFSET
> > being sent by Xen; see:
> >
> > https://xenbits.xen.org/gitweb/?p=qemu-xen-
> traditional.git;a=blob;f=i386-dm/helper2.c#l457
> >
> > but this behaviour was never forward-ported into upstream QEMU,
> which
> > completely ignores that IOREQ type.
> > In either case, nothing in xl or libxl ever samples the value of
> > RTC offset from xenstore so any offset adjustment to a non-zero
> value
> > performed by the guest (which in the case of Windows is highly
> likely
> > as it normally writes RTC in local time, whereas Xen maintains time
> in
> > UTC) is completely lost with the de-facto toolstack, and always has
> > been. Instead, PV drivers are relied upon to paper over this gaping
> > hole.
> >
> > Signed-off-by: Paul Durrant 
> 
> Reviewed-by: Jan Beulich 

Thanks.

> with one remark:
> 
> > @@ -771,6 +773,12 @@ static int rtc_load(struct domain *d,
> hvm_domain_context_t *h)
> >
> >  /* Reset the wall-clock time.  In normal running, this runs with
> host
> >   * time, so let's keep doing that. */
> > +if ( !d->time_offset.set )
> > +{
> > +d->time_offset.seconds = s->hw.rtc_offset;
> > +update_domain_wallclock_time(d);
> > +}
> 
> It's not really clear to me which of the possible behaviors is the
> better one - overriding a tool stack set value with what the save
> record says, or (like you do) the other way around.

It's not clear to me either, which is why I erred on the side of caution. I 
didn't want to break any out-of-tree toolstacks that might be overriding the 
offset early in the restore phase from a value acquired via QEMU hackery. I 
guess the boolean could be dispensed with if we retire the IOREQ and silently 
ignore the DOMCTL for HVM guests (so overrides from the aforementioned 
toolstacks simply have no effect).

  Paul

> 
> Jan
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] MAINTAINERS: put hyperv-tlfs.h under viridian maintainership

2019-12-23 Thread Durrant, Paul
> -Original Message-
> From: Xen-devel  On Behalf Of Wei
> Liu
> Sent: 23 December 2019 12:52
> To: Xen Development List 
> Cc: Andrew Cooper ; Jan Beulich
> ; Wei Liu ; Paul Durrant 
> Subject: [Xen-devel] [PATCH] MAINTAINERS: put hyperv-tlfs.h under viridian
> maintainership
> 
> Suggested-by: Jan Beulich 
> Signed-off-by: Wei Liu 

Acked-by: Paul Durrant 

> ---
>  MAINTAINERS | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 012c847ebd..eaea4620e2 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -516,6 +516,7 @@ M:Paul Durrant 
>  S:   Supported
>  F:   xen/arch/x86/hvm/viridian/
>  F:   xen/include/asm-x86/hvm/viridian.h
> +F:   xen/include/asm-x86/guest/hyperv-tlfs.h
> 
>  XENTRACE
>  M:   George Dunlap 
> --
> 2.20.1
> 
> 
> ___
> Xen-devel mailing list
> Xen-devel@lists.xenproject.org
> https://lists.xenproject.org/mailman/listinfo/xen-devel
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] x86/save: reserve HVM save record numbers that have been consumed...

2019-12-19 Thread Durrant, Paul
> -Original Message-
> From: Andrew Cooper 
> Sent: 19 December 2019 11:30
> To: Durrant, Paul ; xen-devel@lists.xenproject.org
> Cc: Wei Liu ; Jan Beulich ; Roger Pau Monné
> 
> Subject: Re: [Xen-devel] [PATCH] x86/save: reserve HVM save record numbers
> that have been consumed...
> 
> On 19/12/2019 11:06, Durrant, Paul wrote:
> >> It is not fair or reasonable to include extra headroom in a "oh dear we
> >> screwed up - will the community be kind enough to help us work around
> >> our ABI problems" scenario.
> >>
> > I would have thought all the pain you went through to keep XenServer
> going with its non-upstreamed hypercall numbers would have made you a
> little more sympathetic to dealing with past mistakes.
> 
> I could object to the principle of the patch, if you'd prefer :)
> 
> If you recall for the legacy window PV driver ABI breakages, I didn't
> actually reserve any numbers upstream in the end.  All compatibility was
> handled locally.

And I remember how nasty the hacks were ;-)

Given we don't yet have a clash that requires such nastiness, I just want to 
avoid it happening before we manage to dispense with the downstream-only legacy 
code.

> 
> >> For now, I'd just leave it as a comment, and strictly only covering the
> >> ones you have used.  There is no need to actually bump the structure
> >> sizes in xen for now - simply that the ones you have currently used
> >> don't get allocated for something different in the future.
> >>
> > Ok, we can defer actually bumping HVM_SAVE_CODE_MAX, but it's almost
> certain to happen eventually.
> 
> That's fine.
> 

Ok. I'll send a v2 with just the comment (and assume Wei's R-b still stands).

  Paul
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2] x86/save: reserve HVM save record numbers that have been consumed...

2019-12-19 Thread Durrant, Paul
> -Original Message-
> From: Andrew Cooper 
> Sent: 19 December 2019 12:16
> To: Durrant, Paul ; xen-devel@lists.xenproject.org
> Cc: Wei Liu ; Jan Beulich ; Roger Pau Monné
> 
> Subject: Re: [PATCH v2] x86/save: reserve HVM save record numbers that
> have been consumed...
> 
> On 19/12/2019 12:04, Paul Durrant wrote:
> > ...for patches not (yet) upstream.
> >
> > This patch is simply adding a comment to reserve save record number
> space
> > to avoid the risk of clashes between existent downstream changes made by
> > Amazon and future upstream changes which may be incompatible.
> >
> > Signed-off-by: Paul Durrant 
> > Reviewed-by: Wei Liu 
> > ---
> > Cc: Jan Beulich 
> > Cc: Andrew Cooper 
> > Cc: "Roger Pau Monné" 
> >
> > v2:
> >  - Reduce patch to just a comment
> > ---
> >  xen/include/public/arch-x86/hvm/save.h | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/xen/include/public/arch-x86/hvm/save.h
> b/xen/include/public/arch-x86/hvm/save.h
> > index b2ad3fcd74..b3d445acf7 100644
> > --- a/xen/include/public/arch-x86/hvm/save.h
> > +++ b/xen/include/public/arch-x86/hvm/save.h
> > @@ -639,6 +639,8 @@ struct hvm_msr {
> >
> >  #define CPU_MSR_CODE  20
> >
> > +/* Range 22 - 40 reserved for Amazon */
> 
> What about 21 and 22?  And where does the actual number stop, because
> based on v1, its not 40.
> 

The range is inclusive (maybe that's not obvious?). For some reason 21 was 
skipped but why do you say the top is not 40? That was what I set 
HVM_SAVE_CODE_MAX to in v1.

  Paul
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v4 6/9] x86/mm: add an end_of_loop label in map_pages_to_xen

2019-12-05 Thread Durrant, Paul
> -Original Message-
> From: Xen-devel  On Behalf Of Jan
> Beulich
> Sent: 05 December 2019 10:26
> To: Xia, Hongyan 
> Cc: andrew.coop...@citrix.com; xen-devel@lists.xenproject.org; w...@xen.org;
> roger@citrix.com
> Subject: Re: [Xen-devel] [PATCH v4 6/9] x86/mm: add an end_of_loop label
> in map_pages_to_xen
> 
> On 05.12.2019 11:21, Xia, Hongyan wrote:
> >> On 02.10.2019 19:16, Hongyan Xia wrote:
> >>> We will soon need to clean up mappings whenever the out most loop is
> >>> ended. Add a new label and turn relevant continue's into goto's.
> >>
> >> I think already when this still was RFC I did indicate that I'm not
> >> happy about the introduction of these labels (including also patch 8).
> >> I realize it's quite a lot to ask, but both functions would benefit
> >>from splitting up into per-level helper functions, which - afaict -
> >> would avoid the need for such labels, and which would at the same
> >> time likely make it quite a bit easier to extend these to the
> >> 5-level page tables case down the road.
> >
> > A common pattern I have found when mapping PTE pages on-demand (and I
> > think is the exact intention of these labels from Wei, also described
> > in the commit message) is that we often need to do:
> >
> > map some pages - process those pages - error occurs or this iteration
> > of loop can be skipped - _clean up the mappings_ - continue or return
> >
> > As long as cleaning up is required, these labels will likely be needed
> > as the clean-up path before skipping or returning, so I would say we
> > will see such labels even if we split it into helper functions
> > (virt_to_xen_l[123]e() later in the patch series is an example). I see
> > the labels more or less as orthogonal to modularising into helper
> > functions.
> 
> I think differently: The fact that labels are needed is because of
> the complexity of the functions. Simpler functions would allow
> goto-free handling of such error conditions (by instead being able
> to use continue, break, or return without making the code less
> readable, often even improving readability).

And what is wrong with using goto-s? It is a *very* common style of error 
handling use widely in e.g. the linux kernel. IMO it often makes error paths 
much more obvious and easier to reason about. In fact I very much dislike 
returns from the middle of functions as they can easily lead to avoidance of 
necessary error cleanup.

  Paul

> 
> Jan
> 
> ___
> Xen-devel mailing list
> Xen-devel@lists.xenproject.org
> https://lists.xenproject.org/mailman/listinfo/xen-devel
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2] x86 / iommu: set up a scratch page in the quarantine domain

2019-12-10 Thread Durrant, Paul
> -Original Message-
> From: Jürgen Groß 
> Sent: 10 December 2019 08:12
> To: Jan Beulich ; Tian, Kevin ;
> Durrant, Paul 
> Cc: Andrew Cooper ; xen-
> de...@lists.xenproject.org; Roger Pau Monné ; Wei
> Liu 
> Subject: Re: [PATCH v2] x86 / iommu: set up a scratch page in the
> quarantine domain
> 
> On 10.12.19 09:05, Jan Beulich wrote:
> > On 10.12.2019 08:16, Tian, Kevin wrote:
> >>> From: Jan Beulich 
> >>> Sent: Tuesday, December 3, 2019 5:36 PM
> >>>
> >>> On 28.11.2019 12:32, Jürgen Groß wrote:
> >>>> On 28.11.19 12:17, Jan Beulich wrote:
> >>>>> On 27.11.2019 18:11, Paul Durrant wrote:
> >>>>>> This patch introduces a new iommu_op to facilitate a per-
> >>> implementation
> >>>>>> quarantine set up, and then further code for x86 implementations
> >>>>>> (amd and vtd) to set up a read-only scratch page to serve as the
> source
> >>>>>> for DMA reads whilst a device is assigned to dom_io. DMA writes
> will
> >>>>>> continue to fault as before.
> >>>>>>
> >>>>>> The reason for doing this is that some hardware may continue to re-
> try
> >>>>>> DMA (despite FLR) in the event of an error, or even BME being
> cleared,
> >>> and
> >>>>>> will fail to deal with DMA read faults gracefully. Having a scratch
> page
> >>>>>> mapped will allow pending DMA reads to complete and thus such buggy
> >>>>>> hardware will eventually be quiesced.
> >>>>>>
> >>>>>> NOTE: These modifications are restricted to x86 implementations
> only as
> >>>>>> the buggy h/w I am aware of is only used with Xen in an x86
> >>>>>> environment. ARM may require similar code but, since I am
> not
> >>>>>> aware of the need, this patch does not modify any ARM
> >>> implementation.
> >>>>>>
> >>>>>> Signed-off-by: Paul Durrant 
> >>>>>
> >>>>> Reviewed-by: Jan Beulich 
> >>>>>
> >>>>>> There is still the open question of whether use of a scratch page
> ought
> >>>>>> to be gated on something, either are run-time or compile-time.
> >>>>>
> >>>>> I have no clear opinion either way here. The workaround seems low
> >>>>> overhead enough that there may not be a need to have an admin (or
> >>>>> build time) control for this.
> >>>>>
> >>>>> As to 4.13: The quarantining as a whole is pretty fresh. While it
> >>>>> has been backported to security maintained trees, I'd still consider
> >>>>> it a new feature in 4.13, and hence this workaround at least
> eligible
> >>>>> for consideration.
> >>>>
> >>>> I agree.
> >>>>
> >>>> Release-acked-by: Juergen Gross 
> >>>
> >>> I notice this has been committed meanwhile. I had specifically not
> >>> done so due to the still missing VT-d ack, seeing that this wasn't
> >>> an entirely "trivial" change.
> >>>
> >>
> >> While the quarantine idea sounds good overall, I'm still not convinced
> >> to have it the only way in place just for handling some known-buggy
> >> device. It kills the possibility of identifying a new buggy device and
> then
> >> deciding not to use it in the first space... I thought about whether it
> >> will get better when future IOMMU implements A/D bit - by checking
> >> access bit being set then we'll know some buggy device exists, but,
> >> the scratch page is shared by all devices then we cannot rely on this
> >> feature to find out the actual buggy one.
> >
> > Thinking about it - yes, I think I agree. This (as with so many
> > workarounds) would better be an off-by-default one. The main issue
> > I understand this would have is that buggy systems then might hang
> > without even having managed to get a log message out - Paul?
> >

Yes, that's the concern. Some systems will wedge hard on the first fault with 
no logging. I have no objection to a command line parameter but IMO it ought to 
default on, at least in non-debug mode.

  Paul


> > Jürgen - would you be amenable to an almost last minute refinement
> > here (would then also need to still be backported to 4.12.2, or
> > the original backport reverted, to avoid giving the impression of
> > a regression)?
> 
> So what is your suggestion here? To have a boot option (defaulting to
> off) for enabling the scratch page?
> 
> 
> Juergen
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v9 0/4] purge free_shared_domheap_page()

2020-02-13 Thread Durrant, Paul
> -Original Message-
[snip]
> 
> Ok, thanks. Kevin has completed his acks (patches #1 and #4).
> 
> George, Julien, Tim,
> 
>   Can I have acks or otherwise, please?
> 

I have acks from Julien and Tim. George, can you ack or otherwise please?

  Paul 
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v7 0/6] xl/libxl: domid allocation/preservation changes

2020-02-24 Thread Durrant, Paul
> -Original Message-
> From: Xen-devel  On Behalf Of
> Durrant, Paul
> Sent: 24 February 2020 16:33
> To: Ian Jackson 
> Cc: Stefano Stabellini ; Julien Grall
> ; Wei Liu ; Jason Andryuk
> ; Andrew Cooper ; Konrad
> Rzeszutek Wilk ; George Dunlap
> ; Jan Beulich ; Anthony
> Perard ; xen-devel@lists.xenproject.org
> Subject: Re: [Xen-devel] [PATCH v7 0/6] xl/libxl: domid
> allocation/preservation changes
> 
> > -Original Message-
> > From: Ian Jackson 
> > Sent: 24 February 2020 15:54
> > To: Durrant, Paul 
> > Cc: xen-devel@lists.xenproject.org; Andrew Cooper
> > ; Anthony Perard ;
> > George Dunlap ; Jan Beulich
> ;
> > Jason Andryuk ; Julien Grall ;
> Konrad
> > Rzeszutek Wilk ; Stefano Stabellini
> > ; Wei Liu 
> > Subject: Re: [PATCH v7 0/6] xl/libxl: domid allocation/preservation
> > changes
> >
> > Paul Durrant writes ("[PATCH v7 0/6] xl/libxl: domid
> > allocation/preservation changes"):
> > > Paul Durrant (6):
> > >   libxl: add infrastructure to track and query 'recent' domids
> > >   libxl: modify libxl__logv() to only log valid domid values
> > >   public/xen.h: add a definition for a 'valid domid' mask
> > >   libxl: allow creation of domains with a specified or random domid
> > >   xl.conf: introduce 'domid_policy'
> > >   xl: allow domid to be preserved on save/restore or migrate
> >
> > Thanks for this.  I think this has enough acks to go in now.  Would
> > you care to fold the acks/reviews and pass me a git branch ?  If it's
> > easy for you, that would be more convenient and reliable than relying
> > on git-am.  Feel free to negotiate about details on irc...
> >
> 
> Sure, I'll point you at a branch, hopefully within the next hour or so.
> 

See 
https://xenbits.xen.org/gitweb/?p=people/pauldu/xen.git;a=shortlog;h=refs/heads/domid12

Cheers,

  Paul

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 2/2] domain: use PGC_extra domheap page for shared_info

2020-02-26 Thread Durrant, Paul
> -Original Message-
> From: Andrew Cooper 
> Sent: 26 February 2020 11:46
> To: Durrant, Paul ; xen-devel@lists.xenproject.org
> Cc: Stefano Stabellini ; Julien Grall
> ; Volodymyr Babchuk ; George
> Dunlap ; Ian Jackson
> ; Jan Beulich ; Konrad
> Rzeszutek Wilk ; Wei Liu 
> Subject: Re: [PATCH 2/2] domain: use PGC_extra domheap page for
> shared_info
> 
> On 25/02/2020 09:53, Paul Durrant wrote:
> > There's no particular reason shared_info need use a xenheap page.
> 

Ok, 'particular' is too vague, agreed. I'll elaborate.

> ?
> 
> There are a number of ABI-critical reasons, not least the evtchn_pending
> field which Xen needs to write.
> 

I do address this specifically in the commit comment.

"ASSERTions are added before apparently vulnerable dereferences in the
event channel code. These should not be hit because domain_kill() calls
evtchn_destroy() before calling domain_relinquish_resources() but are
added to catch any future re-ordering."

> I can see what you're trying to do, and it looks fine in principle, but
> the commit message isn't remotely accurate.  Remember that you are in
> the process of changing the meaning/usage of the xenheap - preexiting
> uses conform to the old meaning, where the sole distinction between
> domheap and xenheap pages was whether Xen required a virtual mapping or
> not.  The answer is "absolutely yes" for the shared info page.
> 

Was that the case? I haven't mined to find when map_domain_page() was 
introduced. At that point, of course, any strict 'being virtually mapped' 
distinction between xenheap and domheap was rendered inaccurate.

  Paul
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 2/2] domain: use PGC_extra domheap page for shared_info

2020-02-26 Thread Durrant, Paul
> -Original Message-
> From: Julien Grall 
> Sent: 26 February 2020 11:33
> To: Durrant, Paul ; xen-devel@lists.xenproject.org
> Cc: Stefano Stabellini ; Volodymyr Babchuk
> ; Andrew Cooper ;
> George Dunlap ; Ian Jackson
> ; Jan Beulich ; Konrad
> Rzeszutek Wilk ; Wei Liu 
> Subject: Re: [PATCH 2/2] domain: use PGC_extra domheap page for
> shared_info
> 
> Hi Paul,
> 
> On 25/02/2020 09:53, Paul Durrant wrote:
> > There's no particular reason shared_info need use a xenheap page.
> 
> AFAICT, a side-effect of this change is the shared_info is now going to
> be part of the migration stream. One issue with this is on the restore
> side, they will be accounted in number of pages allocated to the domain.
> 
> I am fairly certain dirty logging on page with PGC_extra set would not
> work properly as well.

True, those two do need fixing before expanding the use of PGC_extra. I guess 
this may already be a problem with the current VMX use case.

> 
> As the pages will be recreated in the restore side, I would suggest to
> skip them in XEN_DOMCTL_getpageframeinfo3.
>

At some point in future I guess it may be useful to migrate PGC_extra pages, 
but they would need to be marked as such in the stream. Skipping sounds like 
the right approach for now.

> > It's
> > only purpose is to be mapped by the guest so use a PGC_extra domheap
> page
> > instead. This does entail freeing shared_info during
> > domain_relinquish_resources() rather than domain_destroy() so care is
> > needed to avoid de-referencing a NULL shared_info pointer hence some
> > extra checks of 'is_dying' are needed.
> > ASSERTions are added before apparently vulnerable dereferences in the
> > event channel code. These should not be hit because domain_kill() calls
> > evtchn_destroy() before calling domain_relinquish_resources() but are
> > added to catch any future re-ordering.
> 
> IHMO, the ASSERTions in the event channel pending/mask/... helpers will
> not protect against re-ordering. You may never hit them even if there is
> a re-ordering. It would be better to add an ASSERT()/BUG_ON() in
> evtchn_destroy() and possibly a comment in domain_kill() to mention
> ordering.

I'm not sure about that. Why would they not be hit?

> 
> > For Arm, the call to free_shared_info() in arch_domain_destroy() is left
> > in place since it called in the error path for arch_domain_create().
> >
> > NOTE: A modification in p2m_alloc_table() is required to avoid a false
> >positive when checking for domain memory.
> >A fix is also needed in dom0_construct_pv() to avoid
> automatically
> >adding PGC_extra pages to the physmap.
> 
> I am not entirely sure how this is related to this patch. Was it a
> latent bug? If so, I think it would make sense to split it from this
> patch.
> 

It's related because the check relies on the fact that no PGC_extra pages are 
created before it is called. This is indeed true until this patch is applied.

> >
> > Signed-off-by: Paul Durrant 
> > ---
> > Cc: Stefano Stabellini 
> > Cc: Julien Grall 
> > Cc: Volodymyr Babchuk 
> > Cc: Andrew Cooper 
> > Cc: George Dunlap 
> > Cc: Ian Jackson 
> > Cc: Jan Beulich 
> > Cc: Konrad Rzeszutek Wilk 
> > Cc: Wei Liu 
> > ---
> >   xen/arch/arm/domain.c|  2 ++
> >   xen/arch/x86/domain.c|  3 ++-
> >   xen/arch/x86/mm/p2m.c|  3 +--
> >   xen/arch/x86/pv/dom0_build.c |  4 
> >   xen/common/domain.c  | 25 +++--
> >   xen/common/event_2l.c|  4 
> >   xen/common/event_fifo.c  |  1 +
> >   xen/common/time.c|  3 +++
> >   8 files changed, 36 insertions(+), 9 deletions(-)
> >
> > diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
> > index 2cbcdaac08..3904519256 100644
> > --- a/xen/arch/arm/domain.c
> > +++ b/xen/arch/arm/domain.c
> > @@ -1006,6 +1006,8 @@ int domain_relinquish_resources(struct domain *d)
> >   BUG();
> >   }
> >
> > +free_shared_info(d);
> > +
> >   return 0;
> >   }
> >
> > diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
> > index eb7b0fc51c..3ad532eccf 100644
> > --- a/xen/arch/x86/domain.c
> > +++ b/xen/arch/x86/domain.c
> > @@ -691,7 +691,6 @@ void arch_domain_destroy(struct domain *d)
> >   pv_domain_destroy(d);
> >   free_perdomain_mappings(d);
> >
> > -free_shared_info(d);
> >   cleanup_domain_irq_mapping(d);
> >
> >   psr_domain_free(d);
> > @@ -2246,6 +2245,8 @@ int domain_relinqui

Re: [Xen-devel] [PATCH 2/2] domain: use PGC_extra domheap page for shared_info

2020-02-26 Thread Durrant, Paul
> -Original Message-
> From: Jan Beulich 
> Sent: 26 February 2020 13:58
> To: Durrant, Paul 
> Cc: xen-devel@lists.xenproject.org; Stefano Stabellini
> ; Julien Grall ; Volodymyr Babchuk
> ; Andrew Cooper ;
> George Dunlap ; Ian Jackson
> ; Konrad Rzeszutek Wilk
> ; Wei Liu 
> Subject: Re: [PATCH 2/2] domain: use PGC_extra domheap page for
> shared_info
> 
> On 25.02.2020 10:53, Paul Durrant wrote:
> > There's no particular reason shared_info need use a xenheap page. It's
> > only purpose is to be mapped by the guest so use a PGC_extra domheap
> page
> > instead.
> 
> Since the cover letter also doesn't give any background - is there a
> problem with the current arrangements? Are there any further plans
> depending on this being changed? Or is this simply "let's do it
> because now we can"?
> 

The general direction is to get rid of shared xenheap pages. Knowing that a 
xenheap page is not shared with a guest makes dealing with live update much 
easier, and also allows a chunk of code to be removed from 
domain_relinquish_resoureses() (the shared xenheap walk which de-assigns them 
from the dying guest).
The only xenheap pages shared with a normal domU (as opposed to a system 
domain, which would be re-created on live update anyway) are AFAICT shared-info 
and grant table/status frames. This series deals with shared-info but I do have 
proto-patches for the grant table code too.

  Paul
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 5/6] mm: add 'is_special_page' macro...

2020-02-29 Thread Durrant, Paul
> -Original Message-
> From: Tamas K Lengyel 
> Sent: 28 February 2020 19:32
> To: Durrant, Paul 
> Cc: Xen-devel ; Jan Beulich
> ; Andrew Cooper ; Wei Liu
> ; Roger Pau Monné ; George Dunlap
> ; Ian Jackson ;
> Julien Grall ; Konrad Rzeszutek Wilk
> ; Stefano Stabellini ; Tim
> Deegan 
> Subject: Re: [PATCH 5/6] mm: add 'is_special_page' macro...
> 
> > diff --git a/xen/arch/x86/mm/mem_sharing.c
> b/xen/arch/x86/mm/mem_sharing.c
> > index 3835bc928f..c14a724c6d 100644
> > --- a/xen/arch/x86/mm/mem_sharing.c
> > +++ b/xen/arch/x86/mm/mem_sharing.c
> > @@ -842,7 +842,7 @@ static int nominate_page(struct domain *d, gfn_t
> gfn,
> >
> >  /* Skip xen heap pages */
> 
> Perhaps adjust (or remove) the comment?
>

Yes. The comment can just be dropped I think.

  Paul
 
> >  page = mfn_to_page(mfn);
> > -if ( !page || is_xen_heap_page(page) )
> > +if ( !page || is_special_page(page) )
> >  goto out;
> 
> Thanks,
> Tamas
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2] Introduce CHANGELOG.md

2020-01-21 Thread Durrant, Paul
Ping? I have acks from Lars and Wei but this doesn't appear to have been 
committed. Are any more acks required?

  Paul

> -Original Message-
> From: Paul Durrant 
> Sent: 13 January 2020 15:32
> To: xen-devel@lists.xenproject.org
> Cc: Durrant, Paul ; Andrew Cooper
> ; George Dunlap ;
> Ian Jackson ; Jan Beulich ;
> Julien Grall ; Konrad Rzeszutek Wilk
> ; Stefano Stabellini ; Wei
> Liu 
> Subject: [PATCH v2] Introduce CHANGELOG.md
> 
> As agreed during the 2020-01 community call [1] this patch introduces a
> changelog, based on the principles explained at keepachangelog.com [2].
> A new MAINTAINERS entry is also added, with myself as (currently sole)
> maintainer.
> 
> [1] See C.2 at https://cryptpad.fr/pad/#/2/pad/edit/ERZtMYD5j6k0sv-NG6Htl-
> AJ/
> [2] https://keepachangelog.com/en/1.0.0/
> 
> Signed-off-by: Paul Durrant 
> Acked-by: Lars Kurth 
> ---
> Cc: Andrew Cooper 
> Cc: George Dunlap 
> Cc: Ian Jackson 
> Cc: Jan Beulich 
> Cc: Julien Grall 
> Cc: Konrad Rzeszutek Wilk 
> Cc: Stefano Stabellini 
> Cc: Wei Liu 
> 
> v2:
>  - Dropped 'All' from 'All notable changes'
>  - Added Lars as a designated reviewer
> ---
>  CHANGELOG.md | 14 ++
>  MAINTAINERS  |  6 ++
>  2 files changed, 20 insertions(+)
>  create mode 100644 CHANGELOG.md
> 
> diff --git a/CHANGELOG.md b/CHANGELOG.md
> new file mode 100644
> index 00..b11e9bc4e3
> --- /dev/null
> +++ b/CHANGELOG.md
> @@ -0,0 +1,14 @@
> +# Changelog
> +
> +Notable changes to Xen will be documented in this file.
> +
> +The format is based on [Keep a
> Changelog](https://keepachangelog.com/en/1.0.0/)
> +
> +## [Unreleased](https://xenbits.xen.org/gitweb/?p=xen.git;a=shortlog)
> +
> +### Added
> + - This file and MAINTAINERS entry.
> +
> +##
> [4.13.0](https://xenbits.xen.org/gitweb/?p=xen.git;a=shortlog;h=RELEASE-
> 4.13.0) - 2019-12-17
> +
> +> Pointer to release from which CHANGELOG tracking starts
> diff --git a/MAINTAINERS b/MAINTAINERS
> index d5bd83073c..1ffc3dc600 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -198,6 +198,12 @@ F:   xen/include/asm-arm/
>  F:   xen/include/public/arch-arm/
>  F:   xen/include/public/arch-arm.h
> 
> +Change Log
> +M:   Paul Durrant 
> +R:   Lars Kurth 
> +S:   Maintained
> +F:   CHANGELOG.md
> +
>  Continuous Integration (CI)
>  M:   Doug Goldstein 
>  W:   https://gitlab.com/xen-project/xen
> --
> 2.17.1


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 3/3] x86 / vmx: use a 'normal' domheap page for APIC_DEFAULT_PHYS_BASE

2020-01-21 Thread Durrant, Paul
> -Original Message-
> From: Xen-devel  On Behalf Of
> Julien Grall
> Sent: 21 January 2020 12:29
> To: Durrant, Paul ; xen-devel@lists.xenproject.org
> Cc: Kevin Tian ; Stefano Stabellini
> ; Jun Nakajima ; Wei Liu
> ; Konrad Rzeszutek Wilk ; George
> Dunlap ; Andrew Cooper
> ; Ian Jackson ;
> Roger Pau Monné 
> Subject: Re: [Xen-devel] [PATCH 3/3] x86 / vmx: use a 'normal' domheap
> page for APIC_DEFAULT_PHYS_BASE
> 
> Hi,
> 
> On 21/01/2020 12:00, Paul Durrant wrote:
> > diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
> > index 919a270587..ef327072ed 100644
> > --- a/xen/common/page_alloc.c
> > +++ b/xen/common/page_alloc.c
> > @@ -2269,7 +2269,8 @@ int assign_pages(
> >
> >   if ( !(memflags & MEMF_no_refcount) )
> >   {
> > -if ( unlikely((d->tot_pages + (1 << order)) > d->max_pages) )
> > +if ( unlikely((d->tot_pages + (1 << order)) > d->max_pages) &&
> > + d->creation_finished )
> 
> This is not entirely obvious to me how this is safe. What would happen
> if d->creation_finished is set on another CPU at the same time? At least
> on Arm, this may not be seen directly.
> 
> I guess the problem would not only happen in this use case (I am more
> concerned in the physmap code), but it would be good to document how it
> is meant to be safe to use.
> 
> However, AFAIU, the only reason for the check to be here is because
> d->max_pages is set quite late. How about setting max_pages as part of
> the domain_create hypercall?

That would be useful but certainly more invasive. There's no way a guest vcpu 
can see creation_finished set to true as it is still paused. The only concern 
would be a stub domain causing domheap pages to be allocated on behalf of the 
guest, and can we not trust a stub domain until it's guest has been unpaused 
(since there is no scope for the guest to attack it until then)?

  Paul

> 
> Cheers,
> 
> --
> Julien Grall
> 
> ___
> Xen-devel mailing list
> Xen-devel@lists.xenproject.org
> https://lists.xenproject.org/mailman/listinfo/xen-devel
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v5 1/2] docs/designs: Add a design document for non-cooperative live migration

2020-03-04 Thread Durrant, Paul
> -Original Message-
> From: Xen-devel  On Behalf Of Julien 
> Grall
> Sent: 04 March 2020 15:11
> To: Durrant, Paul ; xen-devel@lists.xenproject.org
> Cc: Stefano Stabellini ; Wei Liu ; 
> Konrad Rzeszutek Wilk
> ; George Dunlap ; Andrew 
> Cooper
> ; Ian Jackson ; Jan 
> Beulich 
> Subject: Re: [Xen-devel] [PATCH v5 1/2] docs/designs: Add a design document 
> for non-cooperative live
> migration
> 
> Hi Paul,
> 
> The proposal looks sensible to me. Some NITpicking below.
> 
> On 13/02/2020 10:53, Paul Durrant wrote:
> > It has become apparent to some large cloud providers that the current
> > model of cooperative migration of guests under Xen is not usable as it
> > relies on software running inside the guest, which is likely beyond the
> > provider's control.
> > This patch introduces a proposal for non-cooperative live migration,
> > designed not to rely on any guest-side software.
> >
> > Signed-off-by: Paul Durrant 
> > ---
> > Cc: Andrew Cooper 
> > Cc: George Dunlap 
> > Cc: Ian Jackson 
> > Cc: Jan Beulich 
> > Cc: Julien Grall 
> > Cc: Konrad Rzeszutek Wilk 
> > Cc: Stefano Stabellini 
> > Cc: Wei Liu 
> >
> > v5:
> >   - Note that PV domain are not just expected to co-operate, they are
> > required to
> >
> > v4:
> >   - Fix issues raised by Wei
> >
> > v2:
> >   - Use the term 'non-cooperative' instead of 'transparent'
> >   - Replace 'trust in' with 'reliance on' when referring to guest-side
> > software
> > ---
> >   docs/designs/non-cooperative-migration.md | 272 ++
> >   1 file changed, 272 insertions(+)
> >   create mode 100644 docs/designs/non-cooperative-migration.md
> >
> > diff --git a/docs/designs/non-cooperative-migration.md 
> > b/docs/designs/non-cooperative-migration.md
> > new file mode 100644
> > index 00..09f74c8c0d
> > --- /dev/null
> > +++ b/docs/designs/non-cooperative-migration.md
> > @@ -0,0 +1,272 @@
> > +# Non-Cooperative Migration of Guests on Xen
> > +
> > +## Background
> > +
> > +The normal model of migration in Xen is driven by the guest because it was
> > +originally implemented for PV guests, where the guest must be aware it is
> > +running under Xen and is hence expected to co-operate. This model dates 
> > from
> > +an era when it was assumed that the host administrator had control of at 
> > least
> > +the privileged software running in the guest (i.e. the guest kernel) which 
> > may
> > +still be true in an enterprise deployment but is not generally true in a 
> > cloud
> > +environment. The aim of this design is to provide a model which is purely 
> > host
> > +driven, requiring no co-operation from the software running in the
> > +guest, and is thus suitable for cloud scenarios.
> > +
> > +PV guests are out of scope for this project because, as is outlined above, 
> > they
> > +have a symbiotic relationship with the hypervisor and therefore a certain 
> > level
> > +of co-operation is required.
> > +
> > +HVM guests can already be migrated on Xen without guest co-operation but 
> > only
> > +if they don’t have PV drivers installed[1] or are in power state S3. The
> 
> S3 is very ACPI centric, so I would prefer if we avoid the term. I think
> the non-ACPI description is "suspend to RAM". I would be OK is you
> mention S3 in parenthesis.

I'm actually pulling this from the way the code is currently written, which is 
clearly quite x86 specific:

xc_hvm_param_get(CTX->xch, domid, HVM_PARAM_ACPI_S_STATE, _s_state)
.
.
.
if (dsps->type == LIBXL_DOMAIN_TYPE_HVM && (!hvm_pvdrv || hvm_s_state)) {
LOGD(DEBUG, domid, "Calling xc_domain_shutdown on HVM domain");
ret = xc_domain_shutdown(CTX->xch, domid, SHUTDOWN_suspend);
.
.
}

So actually I should say 'not in power state S0'.

> 
> > +reason for not expecting co-operation if the guest is in S3 is obvious, 
> > but the
> > +reason co-operation is expected if PV drivers are installed is due to the
> > +nature of PV protocols.
> > +
> > +## Xenstore Nodes and Domain ID
> > +
> > +The PV driver model consists of a *frontend* and a *backend*. The frontend 
> > runs
> > +inside the guest domain and the backend runs inside a *service domain* 
> > which
> > +may or may not be domain 0. The frontend and backend typically pass data 
> > via
> > +memory pages which are shared between the two domains, but this channel of
> > +communication is gene

Re: [Xen-devel] [PATCH v5 1/2] docs/designs: Add a design document for non-cooperative live migration

2020-03-04 Thread Durrant, Paul
> -Original Message-
> >>> +HVM guests can already be migrated on Xen without guest co-operation but 
> >>> only
> >>> +if they don’t have PV drivers installed[1] or are in power state S3. The
> >>
> >> S3 is very ACPI centric, so I would prefer if we avoid the term. I think
> >> the non-ACPI description is "suspend to RAM". I would be OK is you
> >> mention S3 in parenthesis.
> >
> > I'm actually pulling this from the way the code is currently written, which 
> > is clearly quite x86
> specific:
> >
> > xc_hvm_param_get(CTX->xch, domid, HVM_PARAM_ACPI_S_STATE, _s_state)
> > .
> > .
> > .
> > if (dsps->type == LIBXL_DOMAIN_TYPE_HVM && (!hvm_pvdrv || hvm_s_state)) {
> >  LOGD(DEBUG, domid, "Calling xc_domain_shutdown on HVM domain");
> >  ret = xc_domain_shutdown(CTX->xch, domid, SHUTDOWN_suspend);
> >  .
> >  .
> > }
> >
> > So actually I should say 'not in power state S0'.
> 
> I understand that the current code is x86 specific. Arm would likely
> have a similar requirement although not based on ACPI.
> 
> However, my point here is nothing in the document says it is focusing on
> x86 only. The concept itself is not arch specific, the document is
> mostly x86 free except in a couple of bits. So I would like them to be
> rewritten in an arch-agnostic way.
> 
> Note that I am ok with arch-specific example.
> 

Sure. I'll try not to be x86 specific where it's not necessary.

  Paul
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] x86/cpuid: Untangle Invariant TSC handling

2020-03-05 Thread Durrant, Paul
> -Original Message-
> From: Andrew Cooper 
> Sent: 04 March 2020 17:31
> To: Durrant, Paul ; Xen-devel 
> 
> Cc: Wei Liu ; Jan Beulich ; Anthony PERARD 
> ;
> Ian Jackson ; Roger Pau Monné 
> Subject: RE: [EXTERNAL][Xen-devel] [PATCH] x86/cpuid: Untangle Invariant TSC 
> handling
> 
> CAUTION: This email originated from outside of the organization. Do not click 
> links or open
> attachments unless you can confirm the sender and know the content is safe.
> 
> 
> 
> On 04/03/2020 09:33, Durrant, Paul wrote:
> >> -Original Message-
> >> From: Xen-devel  On Behalf Of 
> >> Andrew Cooper
> >> Sent: 03 March 2020 18:25
> >> To: Xen-devel 
> >> Cc: Wei Liu ; Andrew Cooper ; Jan 
> >> Beulich
> ;
> >> Anthony PERARD ; Ian Jackson 
> >> ; Roger Pau Monné
> >> 
> >> Subject: [Xen-devel] [PATCH] x86/cpuid: Untangle Invariant TSC handling
> >>
> >> ITSC being visible to the guest is currently implicit with the toolstack
> >> unconditionally asking for it, and Xen clipping it based on the vTSC and/or
> >> XEN_DOMCTL_disable_migrate settings.
> >>
> >> This is problematic for several reasons.
> >>
> >> First, the implicit vTSC behaviour manifests as a real bug on migration to 
> >> a
> >> host with a different frequency, with ITSC but without TSC scaling
> >> capabilities, whereby the ITSC feature becomes advertised to the guest.  
> >> ITSC
> >> will disappear again if the guest migrates to server with the same 
> >> frequency
> >> as the original, or to one with TSC scaling support.
> >>
> >> Secondly, disallowing ITSC unless the guest doesn't migrate is conceptually
> >> wrong.  It is common to have migration pools of identical hardware, at 
> >> which
> >> point the TSC frequency is the same, and more modern hardware has TSC 
> >> scaling
> >> support anyway.  In both cases, it is safe to advertise ITSC and migrate 
> >> the
> >> guest.
> >>
> >> Remove all implicit logic logic in Xen, and make ITSC part of the max CPUID
> > One too many 'logic's there.
> 
> Oops.
> 
> >
> >> policies for guests.  Plumb an itsc parameter into xc_cpuid_apply_policy() 
> >> and
> >> have libxl__cpuid_legacy() fill in the two cases where it can reasonably
> >> expect ITSC to be safe for the guest to see.
> >>
> >> This is a behaviour change for TSC_MODE_NATIVE, where the ITSC will now
> >> reliably not appear, and for the case where the user explicitly requests 
> >> ITSC,
> >> in which case it will appear even if the guest isn't marked as nomigrate.
> >>
> > Does this mean a guest that would have seen ITSC on 4.13 may now no longer 
> > see it in 4.14 or is the
> TSC_MODE_NATIVE case just the one where the feature may erroneously appear 
> after migration?
> 
> In general, guests don't get to see ITSC at all, even before this
> change.  This is something which needs working on, but it is only a
> tractable problem in a multi-host toolstack.
> 
> After this change, the TSC_MODE_NATIVE case will now not see a
> metastable ITSC feature depending on the properties of the host it
> happens to be on.  It will default to consistently 0, unless overridden
> by the toolstack.

Ok, as long guests running on an older Xen won't see a stable ITSC disappear 
when moved to a newer Xen then there should be no problem here.

  Paul

> 
> ~Andrew
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v5 2/2] docs/designs: Add a design document for migration of xenstore data

2020-03-05 Thread Durrant, Paul
> -Original Message-
> From: Julien Grall 
> Sent: 04 March 2020 18:32
> To: Durrant, Paul ; xen-devel@lists.xenproject.org
> Cc: Andrew Cooper ; George Dunlap 
> ; Ian
> Jackson ; Jan Beulich ; Konrad 
> Rzeszutek Wilk
> ; Stefano Stabellini ; Wei 
> Liu 
> Subject: RE: [EXTERNAL][PATCH v5 2/2] docs/designs: Add a design document for 
> migration of xenstore
> data
> 
> CAUTION: This email originated from outside of the organization. Do not click 
> links or open
> attachments unless you can confirm the sender and know the content is safe.
> 
> 
> 
> Hi Paul,
> 
> On 13/02/2020 10:53, Paul Durrant wrote:
> > This patch details proposes extra migration data and xenstore protocol
> > extensions to support non-cooperative live migration of guests.
> >
> > Signed-off-by: Paul Durrant 
> > ---
> > Cc: Andrew Cooper 
> > Cc: George Dunlap 
> > Cc: Ian Jackson 
> > Cc: Jan Beulich 
> > Cc: Julien Grall 
> > Cc: Konrad Rzeszutek Wilk 
> > Cc: Stefano Stabellini 
> > Cc: Wei Liu 
> >
> > v5:
> >   - Add QUIESCE
> >   - Make semantics of  in GET_DOMAIN_WATCHES more clear
> >
> > v4:
> >   - Drop the restrictions on special paths
> >
> > v3:
> >   - New in v3
> > ---
> >   docs/designs/xenstore-migration.md | 136 +
> >   1 file changed, 136 insertions(+)
> >   create mode 100644 docs/designs/xenstore-migration.md
> >
> > diff --git a/docs/designs/xenstore-migration.md 
> > b/docs/designs/xenstore-migration.md
> > new file mode 100644
> > index 00..5cfe2d9a7d
> > --- /dev/null
> > +++ b/docs/designs/xenstore-migration.md
> > @@ -0,0 +1,136 @@
> > +# Xenstore Migration
> > +
> > +## Background
> > +
> > +The design for *Non-Cooperative Migration of Guests*[1] explains that extra
> > +save records are required in the migrations stream to allow a guest running
> > +PV drivers to be migrated without its co-operation. Moreover the save
> > +records must include details of registered xenstore watches as well as
> > +content; information that cannot currently be recovered from `xenstored`,
> > +and hence some extension to the xenstore protocol[2] will also be required.
> > +
> > +The *libxenlight Domain Image Format* specification[3] already defines a
> > +record type `EMULATOR_XENSTORE_DATA` but this is not suitable for
> > +transferring xenstore data pertaining to the domain directly as it is
> > +specified such that keys are relative to the path
> > +`/local/domain/$dm_domid/device-model/$domid`. Thus it is necessary to
> > +define at least one new save record type.
> > +
> > +## Proposal
> > +
> > +### New Save Record
> > +
> > +A new mandatory record type should be defined within the libxenlight Domain
> > +Image Format:
> > +
> > +`0x0007: DOMAIN_XENSTORE_DATA`
> > +
> > +The format of each of these new records should be as follows:
> > +
> > +
> > +```
> > +0 1 2 3 4 5 6 7 octet
> > ++++
> > +| type   | record specific data   |
> > +++|
> > +...
> > ++-+
> > +```
> > +
> > +
> > +| Field | Description |
> > +|---|---|
> 
> Did you indend to add more - so | is on the same column as the onter lines?
> 

Yep, cut'n'paste error.

> > +| `type` | 0x: invalid |
> > +|| 0x0001: node data |
> > +|| 0x0002: watch data |
> 
> Should not the last | be some of the columns on all the lines?
> 
> > +|| 0x0003 - 0x: reserved for future use |
> 
> Looking at the spec, the command TRANSACTION_END *must* be used with an
> existing transaction. As a guest would be migrate to a new domain, the
> transaction ID would now be invalid.
> 
> I understand that xenstored is able to cope with it, but such behavior
> is not described in the spec. So I am not sure we can expect a guest to
> cope with an error value other than the ones described for the command.
> 

And (as we discussed offline) there would be an issue if the migrated guest 
started a new transaction before completing one that was started pre-migration, 
as the ids may clash. So, we are going to need a record to transfer open 
transaction ids so that we can reserve them in the receiving xenstored.

> > +
> > +
> > +where data is always in the form of a NUL separated and termin

Re: [Xen-devel] [PATCH v5 1/4] x86/HVM: cancel emulation when register state got altered

2020-03-03 Thread Durrant, Paul
> -Original Message-
> From: Jan Beulich 
> Sent: 03 March 2020 14:57
> To: Durrant, Paul 
> Cc: xen-devel@lists.xenproject.org; Andrew Cooper 
> ; Roger Pau Monné
> ; Wei Liu ; Paul Durrant 
> Subject: RE: [EXTERNAL][PATCH v5 1/4] x86/HVM: cancel emulation when register 
> state got altered
> 
> CAUTION: This email originated from outside of the organization. Do not click 
> links or open
> attachments unless you can confirm the sender and know the content is safe.
> 
> 
> 
> On 03.03.2020 15:44, Durrant, Paul wrote:
> >> -Original Message-
> >> From: Jan Beulich 
> >> Sent: 03 March 2020 14:34
> >> To: Durrant, Paul 
> >> Cc: xen-devel@lists.xenproject.org; Andrew Cooper
> >> ; Roger Pau Monné ; Wei
> >> Liu ; Paul Durrant 
> >> Subject: RE: [EXTERNAL][PATCH v5 1/4] x86/HVM: cancel emulation when
> >> register state got altered
> >>
> >> CAUTION: This email originated from outside of the organization. Do not
> >> click links or open attachments unless you can confirm the sender and know
> >> the content is safe.
> >>
> >>
> >>
> >> On 03.03.2020 15:25, Durrant, Paul wrote:
> >>>> -Original Message-
> >>>> From: Jan Beulich 
> >>>> Sent: 03 March 2020 14:21
> >>>> To: Durrant, Paul 
> >>>> Cc: xen-devel@lists.xenproject.org; Andrew Cooper
> >>>> ; Roger Pau Monné ;
> >> Wei
> >>>> Liu ; Paul Durrant 
> >>>> Subject: RE: [EXTERNAL][Xen-devel] [PATCH v5 1/4] x86/HVM: cancel
> >>>> emulation when register state got altered
> >>>>
> >>>> On 03.03.2020 14:16, Durrant, Paul wrote:
> >>>>>> -Original Message-
> >>>>>> From: Xen-devel  On Behalf Of
> >>>> Jan
> >>>>>> Beulich
> >>>>>> Sent: 03 March 2020 10:17
> >>>>>> To: xen-devel@lists.xenproject.org
> >>>>>> Cc: Andrew Cooper ; Roger Pau Monné
> >>>>>> ; Wei Liu ; Paul Durrant
> >>>> 
> >>>>>> Subject: [EXTERNAL][Xen-devel] [PATCH v5 1/4] x86/HVM: cancel
> >> emulation
> >>>>>> when register state got altered
> >>>>>>
> >>>>>> Re-execution (after having received data from a device model) relies
> >> on
> >>>>>> the same register state still being in place as it was when the
> >> request
> >>>>>> was first sent to the device model. Therefore vCPU state changes
> >>>>>> effected by remote sources need to result in no attempt of re-
> >>>> execution.
> >>>>>> Instead the returned data is to simply be ignored.
> >>>>>>
> >>>>>> Note that any such asynchronous state changes happen with the vCPU at
> >>>>>> least paused (potentially down and/or not marked ->is_initialised),
> >> so
> >>>>>> there's no issue with fiddling with register state behind the
> >> actively
> >>>>>> running emulator's back. Hence the new function doesn't need to
> >>>>>> synchronize with the core emulation logic.
> >>>>>>
> >>>>>> Suggested-by: Andrew Cooper 
> >>>>>> Signed-off-by: Jan Beulich 
> >>>>>
> >>>>> Need we be concerned with any page-split I/O here? That may manifest
> >> as
> >>>>> two separate emulations and AFAICT it would be possible for only the
> >>>>> second part to be aborted by this change.
> >>>>
> >>>> I'm not sure whether e.g. INIT is recognized only on insn boundaries.
> >>>> I.e. this may not be that different from real hardware behavior. _If_
> >>>> we were to take care of this, how would you envision undoing the
> >>>> first part of such an access, most notably when the access has side
> >>>> effects?
> >>>
> >>> I wasn't thinking of undoing... I was more thinking that vcpu_pause()
> >>> ought to defer until an in-progress emulation has fully completed.
> >>
> >> Hmm, at the first glance this looks ugly/fragile to arrange for. I'm
> >> having a hard time thinking of a rough sketch of how such could be
> >> made work, in particular without blocking the vcpu_pause() itself
> >> for too long.
> >>
> >
> > If the vcpu is at the mercy of an external emulator it could take a
> > while. I can't really think of a way to avoid that though. Maybe
> > pausing at a non-architectural boundary is ok here though.
> 
> Well, at the very least I'd call it good enough until we can think
> of a sensible way to deal with this.
> 

Ok. You can have my R-b on this one then.

  Paul

> Jan
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v6 2/6] x86/paging: add TLB flush hooks

2020-03-03 Thread Durrant, Paul
> -Original Message-
> From: Xen-devel  On Behalf Of Roger 
> Pau Monne
> Sent: 03 March 2020 17:21
> To: xen-devel@lists.xenproject.org
> Cc: Wei Liu ; Andrew Cooper ; 
> Durrant, Paul
> ; Tim Deegan ; George Dunlap 
> ; Jan
> Beulich ; Roger Pau Monne 
> Subject: [Xen-devel] [PATCH v6 2/6] x86/paging: add TLB flush hooks
> 
> Add shadow and hap implementation specific helpers to perform guest
> TLB flushes. Note that the code for both is exactly the same at the
> moment, and is copied from hvm_flush_vcpu_tlb. This will be changed by
> further patches that will add implementation specific optimizations to
> them.
> 
> No functional change intended.
> 
> Signed-off-by: Roger Pau Monné 

Viridian part...

Reviewed-by: Paul Durrant 

> Reviewed-by: Wei Liu 
> Acked-by: Tim Deegan 
> ---
> Changes since v5:
>  - Make the flush tlb operation a paging_mode hook.
> 
> Changes since v3:
>  - Fix stray newline removal.
>  - Fix return of shadow_flush_tlb dummy function.
> ---
>  xen/arch/x86/hvm/hvm.c   | 56 +--
>  xen/arch/x86/hvm/viridian/viridian.c |  2 +-
>  xen/arch/x86/mm/hap/hap.c| 58 
>  xen/arch/x86/mm/shadow/common.c  | 55 ++
>  xen/arch/x86/mm/shadow/multi.c   |  1 +
>  xen/arch/x86/mm/shadow/private.h |  4 ++
>  xen/include/asm-x86/hvm/hvm.h|  3 --
>  xen/include/asm-x86/paging.h | 10 +
>  8 files changed, 130 insertions(+), 59 deletions(-)
> 
> diff --git a/xen/arch/x86/hvm/hvm.c b/xen/arch/x86/hvm/hvm.c
> index db5d7b4d30..a2abad9f76 100644
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -3988,60 +3988,6 @@ static void hvm_s3_resume(struct domain *d)
>  }
>  }
> 
> -bool hvm_flush_vcpu_tlb(bool (*flush_vcpu)(void *ctxt, struct vcpu *v),
> -void *ctxt)
> -{
> -static DEFINE_PER_CPU(cpumask_t, flush_cpumask);
> -cpumask_t *mask = _cpu(flush_cpumask);
> -struct domain *d = current->domain;
> -struct vcpu *v;
> -
> -/* Avoid deadlock if more than one vcpu tries this at the same time. */
> -if ( !spin_trylock(>hypercall_deadlock_mutex) )
> -return false;
> -
> -/* Pause all other vcpus. */
> -for_each_vcpu ( d, v )
> -if ( v != current && flush_vcpu(ctxt, v) )
> -vcpu_pause_nosync(v);
> -
> -/* Now that all VCPUs are signalled to deschedule, we wait... */
> -for_each_vcpu ( d, v )
> -if ( v != current && flush_vcpu(ctxt, v) )
> -while ( !vcpu_runnable(v) && v->is_running )
> -cpu_relax();
> -
> -/* All other vcpus are paused, safe to unlock now. */
> -spin_unlock(>hypercall_deadlock_mutex);
> -
> -cpumask_clear(mask);
> -
> -/* Flush paging-mode soft state (e.g., va->gfn cache; PAE PDPE cache). */
> -for_each_vcpu ( d, v )
> -{
> -unsigned int cpu;
> -
> -if ( !flush_vcpu(ctxt, v) )
> -continue;
> -
> -paging_update_cr3(v, false);
> -
> -cpu = read_atomic(>dirty_cpu);
> -if ( is_vcpu_dirty_cpu(cpu) )
> -__cpumask_set_cpu(cpu, mask);
> -}
> -
> -/* Flush TLBs on all CPUs with dirty vcpu state. */
> -flush_tlb_mask(mask);
> -
> -/* Done. */
> -for_each_vcpu ( d, v )
> -if ( v != current && flush_vcpu(ctxt, v) )
> -vcpu_unpause(v);
> -
> -return true;
> -}
> -
>  static bool always_flush(void *ctxt, struct vcpu *v)
>  {
>  return true;
> @@ -4052,7 +3998,7 @@ static int hvmop_flush_tlb_all(void)
>  if ( !is_hvm_domain(current->domain) )
>  return -EINVAL;
> 
> -return hvm_flush_vcpu_tlb(always_flush, NULL) ? 0 : -ERESTART;
> +return paging_flush_tlb(always_flush, NULL) ? 0 : -ERESTART;
>  }
> 
>  static int hvmop_set_evtchn_upcall_vector(
> diff --git a/xen/arch/x86/hvm/viridian/viridian.c 
> b/xen/arch/x86/hvm/viridian/viridian.c
> index cd8f210198..977c1bc54f 100644
> --- a/xen/arch/x86/hvm/viridian/viridian.c
> +++ b/xen/arch/x86/hvm/viridian/viridian.c
> @@ -609,7 +609,7 @@ int viridian_hypercall(struct cpu_user_regs *regs)
>   * A false return means that another vcpu is currently trying
>   * a similar operation, so back off.
>   */
> -if ( !hvm_flush_vcpu_tlb(need_flush, _params.vcpu_mask) )
> +if ( !paging_flush_tlb(need_flush, _params.vcpu_mask) )
>  return HVM_HCALL_preempted;
> 
>  output.rep_complete = input.rep_count;
> diff --git a/xen/arch/x86/mm/ha

Re: [Xen-devel] [PATCH] MAINTAINERS: Paul to co-maintain vendor-independent IOMMU code

2020-03-03 Thread Durrant, Paul
> -Original Message-
> From: Xen-devel  On Behalf Of
> Durrant, Paul
> Sent: 03 March 2020 12:13
> To: Jan Beulich ; xen-devel@lists.xenproject.org
> Cc: Stefano Stabellini ; Julien Grall
> ; Wei Liu ; Konrad Wilk
> ; George Dunlap ;
> Andrew Cooper ; Paul Durrant ;
> Ian Jackson 
> Subject: Re: [Xen-devel] [PATCH] MAINTAINERS: Paul to co-maintain vendor-
> independent IOMMU code
> 
> > -Original Message-
> [snip]
> >
> > Having just a single maintainer is not helpful anywhere, and can be
> > avoided here quite easily, seeing that Paul has been doing quite a bit
> > of IOMMU work lately.
> >
> > Signed-off-by: Jan Beulich 
> 
> Reviewed-by: Paul Durrant 
> 
> >
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -323,6 +323,7 @@ F:  xen/arch/x86/cpu/vpmu_intel.c
> >
> >  IOMMU VENDOR INDEPENDENT CODE
> >  M: Jan Beulich 
> > +M: Paul Durrant 

Actually, could my address here be set to pdurr...@amzn.com here please? I'll 
send a patch for my other entries now.

  Thanks,

Paul

> >  S: Supported
> >  F: xen/drivers/passthrough/
> >  X: xen/drivers/passthrough/amd/
> >
> > ___
> > Xen-devel mailing list
> > Xen-devel@lists.xenproject.org
> > https://lists.xenproject.org/mailman/listinfo/xen-devel
> ___
> Xen-devel mailing list
> Xen-devel@lists.xenproject.org
> https://lists.xenproject.org/mailman/listinfo/xen-devel
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v5 1/4] x86/HVM: cancel emulation when register state got altered

2020-03-03 Thread Durrant, Paul
> -Original Message-
> From: Jan Beulich 
> Sent: 03 March 2020 14:21
> To: Durrant, Paul 
> Cc: xen-devel@lists.xenproject.org; Andrew Cooper
> ; Roger Pau Monné ; Wei
> Liu ; Paul Durrant 
> Subject: RE: [EXTERNAL][Xen-devel] [PATCH v5 1/4] x86/HVM: cancel
> emulation when register state got altered
> 
> On 03.03.2020 14:16, Durrant, Paul wrote:
> >> -Original Message-
> >> From: Xen-devel  On Behalf Of
> Jan
> >> Beulich
> >> Sent: 03 March 2020 10:17
> >> To: xen-devel@lists.xenproject.org
> >> Cc: Andrew Cooper ; Roger Pau Monné
> >> ; Wei Liu ; Paul Durrant
> 
> >> Subject: [EXTERNAL][Xen-devel] [PATCH v5 1/4] x86/HVM: cancel emulation
> >> when register state got altered
> >>
> >> Re-execution (after having received data from a device model) relies on
> >> the same register state still being in place as it was when the request
> >> was first sent to the device model. Therefore vCPU state changes
> >> effected by remote sources need to result in no attempt of re-
> execution.
> >> Instead the returned data is to simply be ignored.
> >>
> >> Note that any such asynchronous state changes happen with the vCPU at
> >> least paused (potentially down and/or not marked ->is_initialised), so
> >> there's no issue with fiddling with register state behind the actively
> >> running emulator's back. Hence the new function doesn't need to
> >> synchronize with the core emulation logic.
> >>
> >> Suggested-by: Andrew Cooper 
> >> Signed-off-by: Jan Beulich 
> >
> > Need we be concerned with any page-split I/O here? That may manifest as
> > two separate emulations and AFAICT it would be possible for only the
> > second part to be aborted by this change.
> 
> I'm not sure whether e.g. INIT is recognized only on insn boundaries.
> I.e. this may not be that different from real hardware behavior. _If_
> we were to take care of this, how would you envision undoing the
> first part of such an access, most notably when the access has side
> effects?

I wasn't thinking of undoing... I was more thinking that vcpu_pause() ought to 
defer until an in-progress emulation has fully completed.

  Paul

> 
> Jan
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v5 3/4] x86/mm: use cache in guest_walk_tables()

2020-03-03 Thread Durrant, Paul
> -Original Message-
> From: Xen-devel  On Behalf Of Jan 
> Beulich
> Sent: 03 March 2020 10:19
> To: xen-devel@lists.xenproject.org
> Cc: Andrew Cooper ; Roger Pau Monné 
> ; Tim Deegan
> ; Wei Liu ; Paul Durrant 
> Subject: [EXTERNAL][Xen-devel] [PATCH v5 3/4] x86/mm: use cache in 
> guest_walk_tables()
> 
> CAUTION: This email originated from outside of the organization. Do not click 
> links or open
> attachments unless you can confirm the sender and know the content is safe.
> 
> 
> 
> Emulation requiring device model assistance uses a form of instruction
> re-execution, assuming that the second (and any further) pass takes
> exactly the same path. This is a valid assumption as far as use of CPU
> registers goes (as those can't change without any other instruction
> executing in between [1]), but is wrong for memory accesses. In
> particular it has been observed that Windows might page out buffers
> underneath an instruction currently under emulation (hitting between two
> passes). If the first pass translated a linear address successfully, any
> subsequent pass needs to do so too, yielding the exact same translation.
> To guarantee this, leverage the caching that now backs HVM insn
> emulation.
> 
> [1] Other than on actual hardware, actions like
> XEN_DOMCTL_sethvmcontext, XEN_DOMCTL_setvcpucontext,
> VCPUOP_initialise, INIT, or SIPI issued against the vCPU can occur
> while the vCPU is blocked waiting for a device model to return data.
> In such cases emulation now gets canceled, though, and hence re-
> execution correctness is unaffected.
> 
> Signed-off-by: Jan Beulich 

Reviewed-by: Paul Durrant 
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v5 4/4] x86/HVM: __hvm_copy()'s size parameter is an unsigned quantity

2020-03-03 Thread Durrant, Paul
> -Original Message-
> From: Xen-devel  On Behalf Of Jan 
> Beulich
> Sent: 03 March 2020 10:20
> To: xen-devel@lists.xenproject.org
> Cc: Andrew Cooper ; Roger Pau Monné 
> ; Wei Liu
> ; Paul Durrant 
> Subject: [EXTERNAL][Xen-devel] [PATCH v5 4/4] x86/HVM: __hvm_copy()'s size 
> parameter is an unsigned
> quantity
> 
> CAUTION: This email originated from outside of the organization. Do not click 
> links or open
> attachments unless you can confirm the sender and know the content is safe.
> 
> 
> 
> There are no negative sizes. Make the function's parameter as well as
> that of its derivates "unsigned int". Similarly make its local "count"
> variable "unsigned int", and drop "todo" altogether. Don't use min_t()
> anymore to calculate "count". Restrict its scope as well as that of
> other local variables of the function.
> 
> While at it I've also noticed that {copy_{from,to},clear}_user_hvm()
> have been returning "unsigned long" for no apparent reason, as their
> respective "size" parameters have already been "unsigned int". Adjust
> this as well as a slightly wrong comment there at the same time.
> 
> Signed-off-by: Jan Beulich 

Reviewed-by: Paul Durrant 

> ---
> v5: New.
> 
> --- a/xen/arch/x86/hvm/hvm.c
> +++ b/xen/arch/x86/hvm/hvm.c
> @@ -3249,14 +3249,9 @@ enum hvm_translation_result hvm_translat
>  #define HVMCOPY_phys   (0u<<2)
>  #define HVMCOPY_linear (1u<<2)
>  static enum hvm_translation_result __hvm_copy(
> -void *buf, paddr_t addr, int size, struct vcpu *v, unsigned int flags,
> +void *buf, paddr_t addr, unsigned int size, struct vcpu *v, unsigned int 
> flags,
>  uint32_t pfec, pagefault_info_t *pfinfo)
>  {
> -gfn_t gfn;
> -struct page_info *page;
> -p2m_type_t p2mt;
> -int count, todo = size;
> -
>  ASSERT(is_hvm_vcpu(v));
> 
>  /*
> @@ -3275,12 +3270,14 @@ static enum hvm_translation_result __hvm
>  return HVMTRANS_unhandleable;
>  #endif
> 
> -while ( todo > 0 )
> +while ( size > 0 )

'while ( size )'

ought to do.

>  {
> +struct page_info *page;
> +gfn_t gfn;
> +p2m_type_t p2mt;
>  enum hvm_translation_result res;
>  unsigned int pgoff = addr & ~PAGE_MASK;
> -
> -count = min_t(int, PAGE_SIZE - pgoff, todo);
> +unsigned int count = min((unsigned int)PAGE_SIZE - pgoff, size);
> 
>  res = hvm_translate_get_page(v, addr, flags & HVMCOPY_linear,
>   pfec, pfinfo, , , );
> @@ -3336,7 +,7 @@ static enum hvm_translation_result __hvm
>  addr += count;
>  if ( buf )
>  buf += count;
> -todo -= count;
> +size -= count;
>  put_page(page);
>  }
> 
> @@ -3344,21 +3341,21 @@ static enum hvm_translation_result __hvm
>  }
> 
>  enum hvm_translation_result hvm_copy_to_guest_phys(
> -paddr_t paddr, void *buf, int size, struct vcpu *v)
> +paddr_t paddr, void *buf, unsigned int size, struct vcpu *v)
>  {
>  return __hvm_copy(buf, paddr, size, v,
>HVMCOPY_to_guest | HVMCOPY_phys, 0, NULL);
>  }
> 
>  enum hvm_translation_result hvm_copy_from_guest_phys(
> -void *buf, paddr_t paddr, int size)
> +void *buf, paddr_t paddr, unsigned int size)
>  {
>  return __hvm_copy(buf, paddr, size, current,
>HVMCOPY_from_guest | HVMCOPY_phys, 0, NULL);
>  }
> 
>  enum hvm_translation_result hvm_copy_to_guest_linear(
> -unsigned long addr, void *buf, int size, uint32_t pfec,
> +unsigned long addr, void *buf, unsigned int size, uint32_t pfec,
>  pagefault_info_t *pfinfo)
>  {
>  return __hvm_copy(buf, addr, size, current,
> @@ -3367,7 +3364,7 @@ enum hvm_translation_result hvm_copy_to_
>  }
> 
>  enum hvm_translation_result hvm_copy_from_guest_linear(
> -void *buf, unsigned long addr, int size, uint32_t pfec,
> +void *buf, unsigned long addr, unsigned int size, uint32_t pfec,
>  pagefault_info_t *pfinfo)
>  {
>  return __hvm_copy(buf, addr, size, current,
> @@ -3375,7 +3372,7 @@ enum hvm_translation_result hvm_copy_fro
>PFEC_page_present | pfec, pfinfo);
>  }
> 
> -unsigned long copy_to_user_hvm(void *to, const void *from, unsigned int len)
> +unsigned int copy_to_user_hvm(void *to, const void *from, unsigned int len)
>  {
>  int rc;
> 
> @@ -3389,7 +3386,7 @@ unsigned long copy_to_user_hvm(void *to,
>  return rc ? len : 0; /* fake a copy_to_user() return code */
>  }
> 
> -unsigned long clear_user_hvm(void *to, unsigned int len)
> +unsigned int clear_user_hvm(void *to, unsigned int len)
>  {
>  int rc;
> 
> @@ -3400,10 +3397,11 @@ unsigned long clear_user_hvm(void *to, u
>  }
> 
>  rc = hvm_copy_to_guest_linear((unsigned long)to, NULL, len, 0, NULL);
> -return rc ? len : 0; /* fake a copy_to_user() return code */
> +
> +return rc ? len : 0; /* fake a clear_user() return code */
>  }
> 
> -unsigned long copy_from_user_hvm(void *to, const 

Re: [Xen-devel] [PATCH v5 1/4] x86/HVM: cancel emulation when register state got altered

2020-03-03 Thread Durrant, Paul
> -Original Message-
> From: Xen-devel  On Behalf Of Jan
> Beulich
> Sent: 03 March 2020 10:17
> To: xen-devel@lists.xenproject.org
> Cc: Andrew Cooper ; Roger Pau Monné
> ; Wei Liu ; Paul Durrant 
> Subject: [EXTERNAL][Xen-devel] [PATCH v5 1/4] x86/HVM: cancel emulation
> when register state got altered
> 
> Re-execution (after having received data from a device model) relies on
> the same register state still being in place as it was when the request
> was first sent to the device model. Therefore vCPU state changes
> effected by remote sources need to result in no attempt of re-execution.
> Instead the returned data is to simply be ignored.
> 
> Note that any such asynchronous state changes happen with the vCPU at
> least paused (potentially down and/or not marked ->is_initialised), so
> there's no issue with fiddling with register state behind the actively
> running emulator's back. Hence the new function doesn't need to
> synchronize with the core emulation logic.
> 
> Suggested-by: Andrew Cooper 
> Signed-off-by: Jan Beulich 

Need we be concerned with any page-split I/O here? That may manifest as two 
separate emulations and AFAICT it would be possible for only the second part to 
be aborted by this change.

  Paul
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v5 2/4] x86/HVM: implement memory read caching for insn emulation

2020-03-03 Thread Durrant, Paul
> -Original Message-
> From: Jan Beulich 
> Sent: 03 March 2020 15:24
> To: Durrant, Paul 
> Cc: xen-devel@lists.xenproject.org; Kevin Tian ; Wei 
> Liu ; Paul
> Durrant ; Andrew Cooper ; Jun 
> Nakajima
> ; Roger Pau Monné 
> Subject: RE: [EXTERNAL][PATCH v5 2/4] x86/HVM: implement memory read caching 
> for insn emulation
> 
> CAUTION: This email originated from outside of the organization. Do not click 
> links or open
> attachments unless you can confirm the sender and know the content is safe.
> 
> 
> 
> On 03.03.2020 16:16, Durrant, Paul wrote:
> >> From: Xen-devel  On Behalf Of Jan 
> >> Beulich
> >> Sent: 03 March 2020 10:17
> >>
> >> --- a/xen/arch/x86/hvm/emulate.c
> >> +++ b/xen/arch/x86/hvm/emulate.c
> >> @@ -28,6 +28,19 @@
> >>  #include 
> >>  #include 
> >>
> >> +struct hvmemul_cache
> >> +{
> >> +/* The cache is disabled as long as num_ents > max_ents. */
> >> +unsigned int num_ents;
> >> +unsigned int max_ents;
> >> +struct {
> >> +paddr_t gpa:PADDR_BITS;
> >> +unsigned int :BITS_PER_LONG - PADDR_BITS - 8;
> >
> > Is clang ok with unnamed fields?
> 
> Clang 5 at least is, and iirc this is a standard C feature.
> 

Google didn't give me a conclusive answer which was I asked. I've not found 
anything that says it's a gcc-ism though.

> >> +void hvmemul_cache_restore(struct vcpu *v, unsigned int token)
> >> +{
> >> +struct hvmemul_cache *cache = v->arch.hvm.hvm_io.cache;
> >> +
> >> +ASSERT(cache->num_ents > cache->max_ents);
> >> +cache->num_ents = token;
> >
> > ASSERT(token <= cache->max_ents) here?
> 
> Hmm, not sure. Definitely not exactly as you say, as disabling
> in already disabled state would return max_ents + 1, and hence
> this value could also be fed back here.

Ok, I can see keeping that idempotent is useful.

> But even beyond that I
> don't see a need to restrict the value range here - anything
> larger than max_ents will simply result in the cache being
> disabled.
> 

Fair enough.

Reviewed-by: Paul Durrant 

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] MAINTAINERS: Paul to co-maintain vendor-independent IOMMU code

2020-03-03 Thread Durrant, Paul
> -Original Message-
[snip]
> 
> Having just a single maintainer is not helpful anywhere, and can be
> avoided here quite easily, seeing that Paul has been doing quite a bit
> of IOMMU work lately.
> 
> Signed-off-by: Jan Beulich 

Reviewed-by: Paul Durrant 

> 
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -323,6 +323,7 @@ F:  xen/arch/x86/cpu/vpmu_intel.c
> 
>  IOMMU VENDOR INDEPENDENT CODE
>  M: Jan Beulich 
> +M: Paul Durrant 
>  S: Supported
>  F: xen/drivers/passthrough/
>  X: xen/drivers/passthrough/amd/
> 
> ___
> Xen-devel mailing list
> Xen-devel@lists.xenproject.org
> https://lists.xenproject.org/mailman/listinfo/xen-devel
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v5 1/4] x86/HVM: cancel emulation when register state got altered

2020-03-03 Thread Durrant, Paul
> -Original Message-
> From: Jan Beulich 
> Sent: 03 March 2020 14:34
> To: Durrant, Paul 
> Cc: xen-devel@lists.xenproject.org; Andrew Cooper
> ; Roger Pau Monné ; Wei
> Liu ; Paul Durrant 
> Subject: RE: [EXTERNAL][PATCH v5 1/4] x86/HVM: cancel emulation when
> register state got altered
> 
> CAUTION: This email originated from outside of the organization. Do not
> click links or open attachments unless you can confirm the sender and know
> the content is safe.
> 
> 
> 
> On 03.03.2020 15:25, Durrant, Paul wrote:
> >> -Original Message-
> >> From: Jan Beulich 
> >> Sent: 03 March 2020 14:21
> >> To: Durrant, Paul 
> >> Cc: xen-devel@lists.xenproject.org; Andrew Cooper
> >> ; Roger Pau Monné ;
> Wei
> >> Liu ; Paul Durrant 
> >> Subject: RE: [EXTERNAL][Xen-devel] [PATCH v5 1/4] x86/HVM: cancel
> >> emulation when register state got altered
> >>
> >> On 03.03.2020 14:16, Durrant, Paul wrote:
> >>>> -Original Message-
> >>>> From: Xen-devel  On Behalf Of
> >> Jan
> >>>> Beulich
> >>>> Sent: 03 March 2020 10:17
> >>>> To: xen-devel@lists.xenproject.org
> >>>> Cc: Andrew Cooper ; Roger Pau Monné
> >>>> ; Wei Liu ; Paul Durrant
> >> 
> >>>> Subject: [EXTERNAL][Xen-devel] [PATCH v5 1/4] x86/HVM: cancel
> emulation
> >>>> when register state got altered
> >>>>
> >>>> Re-execution (after having received data from a device model) relies
> on
> >>>> the same register state still being in place as it was when the
> request
> >>>> was first sent to the device model. Therefore vCPU state changes
> >>>> effected by remote sources need to result in no attempt of re-
> >> execution.
> >>>> Instead the returned data is to simply be ignored.
> >>>>
> >>>> Note that any such asynchronous state changes happen with the vCPU at
> >>>> least paused (potentially down and/or not marked ->is_initialised),
> so
> >>>> there's no issue with fiddling with register state behind the
> actively
> >>>> running emulator's back. Hence the new function doesn't need to
> >>>> synchronize with the core emulation logic.
> >>>>
> >>>> Suggested-by: Andrew Cooper 
> >>>> Signed-off-by: Jan Beulich 
> >>>
> >>> Need we be concerned with any page-split I/O here? That may manifest
> as
> >>> two separate emulations and AFAICT it would be possible for only the
> >>> second part to be aborted by this change.
> >>
> >> I'm not sure whether e.g. INIT is recognized only on insn boundaries.
> >> I.e. this may not be that different from real hardware behavior. _If_
> >> we were to take care of this, how would you envision undoing the
> >> first part of such an access, most notably when the access has side
> >> effects?
> >
> > I wasn't thinking of undoing... I was more thinking that vcpu_pause()
> > ought to defer until an in-progress emulation has fully completed.
> 
> Hmm, at the first glance this looks ugly/fragile to arrange for. I'm
> having a hard time thinking of a rough sketch of how such could be
> made work, in particular without blocking the vcpu_pause() itself
> for too long.
> 

If the vcpu is at the mercy of an external emulator it could take a while. I 
can't really think of a way to avoid that though. Maybe pausing at a 
non-architectural boundary is ok here though.

  Paul

> Jan
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v5 2/4] x86/HVM: implement memory read caching for insn emulation

2020-03-03 Thread Durrant, Paul
> -Original Message-
> From: Xen-devel  On Behalf Of Jan 
> Beulich
> Sent: 03 March 2020 10:17
> To: xen-devel@lists.xenproject.org
> Cc: Kevin Tian ; Wei Liu ; Paul Durrant 
> ; Andrew
> Cooper ; Jun Nakajima ; 
> Roger Pau Monné
> 
> Subject: [EXTERNAL][Xen-devel] [PATCH v5 2/4] x86/HVM: implement memory read 
> caching for insn
> emulation
> 
> CAUTION: This email originated from outside of the organization. Do not click 
> links or open
> attachments unless you can confirm the sender and know the content is safe.
> 
> 
> 
> Emulation requiring device model assistance uses a form of instruction
> re-execution, assuming that the second (and any further) pass takes
> exactly the same path. This is a valid assumption as far as use of CPU
> registers goes (as those can't change without any other instruction
> executing in between [1]), but is wrong for memory accesses. In
> particular it has been observed that Windows might page out buffers
> underneath an instruction currently under emulation (hitting between two
> passes). If the first pass read a memory operand successfully, any
> subsequent pass needs to get to see the exact same value.
> 
> Introduce a cache to make sure above described assumption holds. This
> is a very simplistic implementation for now: Only exact matches are
> satisfied (no overlaps or partial reads or anything); this is sufficient
> for the immediate purpose of making re-execution an exact replay. The
> cache also won't be used just yet for guest page walks; that'll be the
> subject of a subsequent change.
> 
> With the cache being generally transparent to upper layers, but with it
> having limited capacity yet being required for correctness, certain
> users of hvm_copy_from_guest_*() need to disable caching temporarily,
> without invalidating the cache. Note that the adjustments here to
> hvm_hypercall() and hvm_task_switch() are benign at this point; they'll
> become relevant once we start to be able to emulate respective insns
> through the main emulator (and more changes will then likely be needed
> to nested code).
> 
> As to the actual data page in a problamtic scenario, there are a couple
> of aspects to take into consideration:
> - We must be talking about an insn accessing two locations (two memory
>   ones, one of which is MMIO, or a memory and an I/O one).
> - If the non I/O / MMIO side is being read, the re-read (if it occurs at
>   all) is having its result discarded, by taking the shortcut through
>   the first switch()'s STATE_IORESP_READY case in hvmemul_do_io(). Note
>   how, among all the re-issue sanity checks there, we avoid comparing
>   the actual data.
> - If the non I/O / MMIO side is being written, it is the OSes
>   responsibility to avoid actually moving page contents to disk while
>   there might still be a write access in flight - this is no different
>   in behavior from bare hardware.
> - Read-modify-write accesses are, as always, complicated, and while we
>   deal with them better nowadays than we did in the past, we're still
>   not quite there to guarantee hardware like behavior in all cases
>   anyway. Nothing is getting worse by the changes made here, afaict.
> 
> In __hvm_copy() also reduce p's scope and change its type to void *.
> 
> [1] Other than on actual hardware, actions like
> XEN_DOMCTL_sethvmcontext, XEN_DOMCTL_setvcpucontext,
> VCPUOP_initialise, INIT, or SIPI issued against the vCPU can occur
> while the vCPU is blocked waiting for a device model to return data.
> In such cases emulation now gets canceled, though, and hence re-
> execution correctness is unaffected.
> 
> Signed-off-by: Jan Beulich 
> ---
> TBD: In principle the caching here yields unnecessary the one used for
>  insn bytes (vio->mmio_insn{,_bytes}. However, to seed the cache
>  with the data SVM may have made available, we'd have to also know
>  the corresponding GPA. It's not safe, however, to re-walk the page
>  tables to find out, as the page tables may have changed in the
>  meantime. Therefore I guess we need to keep the duplicate
>  functionality for now. A possible solution to this could be to use
>  a physical-address-based cache for page table accesses (and looking
>  forward also e.g. SVM/VMX insn emulation), and a linear-address-
>  based one for all other reads.
> ---
> v5: Re-arrange bitfield. Use domain_crash() in hvmemul_write_cache().
> Move hvmemul_{read,write}_cache() stubs to later patch. Also adjust
> hvmemul_cancel(). Add / extend comments. Re-base.
> v4: Re-write for cache to become transparent to callers.
> v3: Add text about the actual data page to the description.
> v2: Re-base.
> 

Generally LGTM, just a couple of points below...

> --- a/xen/arch/x86/hvm/emulate.c
> +++ b/xen/arch/x86/hvm/emulate.c
> @@ -28,6 +28,19 @@
>  #include 
>  #include 
> 
> +struct hvmemul_cache
> +{
> +/* The cache is disabled as long as num_ents > max_ents. */
> +unsigned int 

Re: [Xen-devel] [PATCH] x86/cpuid: Untangle Invariant TSC handling

2020-03-04 Thread Durrant, Paul
> -Original Message-
> From: Xen-devel  On Behalf Of Andrew 
> Cooper
> Sent: 03 March 2020 18:25
> To: Xen-devel 
> Cc: Wei Liu ; Andrew Cooper ; Jan 
> Beulich ;
> Anthony PERARD ; Ian Jackson 
> ; Roger Pau Monné
> 
> Subject: [Xen-devel] [PATCH] x86/cpuid: Untangle Invariant TSC handling
> 
> ITSC being visible to the guest is currently implicit with the toolstack
> unconditionally asking for it, and Xen clipping it based on the vTSC and/or
> XEN_DOMCTL_disable_migrate settings.
> 
> This is problematic for several reasons.
> 
> First, the implicit vTSC behaviour manifests as a real bug on migration to a
> host with a different frequency, with ITSC but without TSC scaling
> capabilities, whereby the ITSC feature becomes advertised to the guest.  ITSC
> will disappear again if the guest migrates to server with the same frequency
> as the original, or to one with TSC scaling support.
> 
> Secondly, disallowing ITSC unless the guest doesn't migrate is conceptually
> wrong.  It is common to have migration pools of identical hardware, at which
> point the TSC frequency is the same, and more modern hardware has TSC scaling
> support anyway.  In both cases, it is safe to advertise ITSC and migrate the
> guest.
> 
> Remove all implicit logic logic in Xen, and make ITSC part of the max CPUID

One too many 'logic's there.

> policies for guests.  Plumb an itsc parameter into xc_cpuid_apply_policy() and
> have libxl__cpuid_legacy() fill in the two cases where it can reasonably
> expect ITSC to be safe for the guest to see.
> 
> This is a behaviour change for TSC_MODE_NATIVE, where the ITSC will now
> reliably not appear, and for the case where the user explicitly requests ITSC,
> in which case it will appear even if the guest isn't marked as nomigrate.
> 

Does this mean a guest that would have seen ITSC on 4.13 may now no longer see 
it in 4.14 or is the TSC_MODE_NATIVE case just the one where the feature may 
erroneously appear after migration?

  Paul

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] tools/libxc: Reduce feature handling complexity in xc_cpuid_apply_policy()

2020-03-04 Thread Durrant, Paul
> -Original Message-
> From: Xen-devel  On Behalf Of Andrew 
> Cooper
> Sent: 03 March 2020 18:23
> To: Xen-devel 
> Cc: Andrew Cooper ; Ian Jackson 
> ; Wei Liu
> ; Jan Beulich ; Roger Pau Monné 
> 
> Subject: [Xen-devel] [PATCH] tools/libxc: Reduce feature handling complexity 
> in
> xc_cpuid_apply_policy()
> 
> xc_cpuid_apply_policy() is gaining extra parameters to untangle CPUID
> complexity in Xen.  While an improvement in general, it does have the
> unfortunate side effect of duplicating some settings across muliple
> parameters.
> 
> Rearrange the logic to only consider 'pae' if no explicit featureset is
> provided.  This reduces the complexity for callers who have already provided a
> pae setting in the featureset.
> 
> Signed-off-by: Andrew Cooper 

Reviewed-by: Paul Durrant 
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 2/3] libxl: make creation of xenstore suspend event channel node optional

2020-02-28 Thread Durrant, Paul
> -Original Message-
> From: Julien Grall 
> Sent: 27 February 2020 22:52
> To: Durrant, Paul ; xen-devel@lists.xenproject.org
> Cc: Anthony PERARD ; Ian Jackson
> ; Wei Liu 
> Subject: Re: [Xen-devel] [PATCH 2/3] libxl: make creation of xenstore
> suspend event channel node optional
> 
> Hi,
> 
> On 26/02/2020 16:08, Paul Durrant wrote:
> > The purpose and semantics of the node are explained in
> > xenstore-paths.pandoc [1]. It was originally introduced in xend by
> commit
> > 17636f47a474 "Teach xc_save to use event-channel-based domain suspend if
> > available.". Note that, because, the top-level frontend 'device' node
> was
> > created writable by the guest in xend, there was no need to explicitly
> > create the 'suspend-event-channel' node as writable node.
> >
> > However, libxl creates the 'device' node as read-only by the guest and
> so
> > explicit creation of the 'suspend-event-channel' node is necessary to
> make
> > it usable. This unfortunately has the side-effect of making some old
> > Windows PV drivers [2] cease to function. This is because they scan the
> top
> > level 'device' node, find the 'suspend' node and expect it to contain
> the
> > usual sub-nodes describing a PV frontend. When this is found not to be
> the
> > case, enumeration ceases and (because the 'suspend' node is observed
> before
> > the 'vbd' node) no system disk is enumerated. Windows will then crash
> with
> > bugcheck code 0x7B.
> >
> > This patch adds a boolean 'suspend_event_channel' field into
> > libxl_create_info to control whether the xenstore node is created and a
> > similarly named option in xl.cfg which, for compatibility with previous
> > libxl behaviour, defaults to true.
> >
> > [1]
> https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=docs/misc/xenstore-
> paths.pandoc;hb=HEAD#l177
> > [2] https://access.redhat.com/documentation/en-
> us/red_hat_enterprise_linux/5/html/para-
> virtualized_windows_drivers_guide/sect-para-
> virtualized_windows_drivers_guide-
> installing_and_configuring_the_para_virtualized_drivers-
> installing_the_para_virtualized_drivers
> >
> > NOTE: While adding the new LIBXL_HAVE_CREATEINFO_SUSPEND_EVENT_CHANNEL
> >definition into libxl.h, this patch corrects the previous stanza
> >which erroneously implies ibxl_domain_create_info is a function.
> >
> > Signed-off-by: Paul Durrant 
> > ---
> > Cc: Ian Jackson 
> > Cc: Wei Liu 
> > Cc: Anthony PERARD 
> > ---
> >   docs/man/xl.cfg.5.pod.in|  7 +++
> >   tools/libxl/libxl.h | 13 -
> >   tools/libxl/libxl_create.c  | 12 +---
> >   tools/libxl/libxl_types.idl |  1 +
> >   tools/xl/xl_parse.c |  3 +++
> 
> You may want to update xenstore-paths.pandoc as the document mention the
> node will be created by the toolstack.
> 

The doc already says that:

-
 ~/device/suspend/event-channel = ""|EVTCHN [w]
 
The domain's suspend event channel. The toolstack will create this
path with an empty value which the guest may choose to overwrite.
-

> >   5 files changed, 32 insertions(+), 4 deletions(-)
> >
> > diff --git a/docs/man/xl.cfg.5.pod.in b/docs/man/xl.cfg.5.pod.in
> > index 0cad561375..5f476f1e1d 100644
> > --- a/docs/man/xl.cfg.5.pod.in
> > +++ b/docs/man/xl.cfg.5.pod.in
> > @@ -668,6 +668,13 @@ file.
> >
> >   =back
> >
> > +=item B
> > +
> > +Create the xenstore path for the domain's suspend event channel. The
> > +existence of this path can cause problems with older PV drivers running
> > +in the guest. If this option is not specified then it will default to
> > +B.
> 
> In the next patch you are going to make device/ rw. Do you see any issue
> with just not creating the node for everyone? Are PV drivers allowed to
> assume a node will be present?

Well that's the problem. Some of them may have become accustomed to it being 
present. Also the documentation does say it is created by the toolstack (but 
not when). Perhaps I should update the doc to say the toolstack *may* create 
this path when the domain is created.

> 
> My knowledge of xenstore is limited, so I thought I would ask the
> questions to understand a bit more how stable the ABI is meant to be. :).

Stable? That'll be the day :-)

  Paul

> 
> Cheers,
> 
> --
> Julien Grall
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 2/3] libxl: make creation of xenstore suspend event channel node optional

2020-02-28 Thread Durrant, Paul
> -Original Message-
> From: Julien Grall 
> Sent: 28 February 2020 10:26
> To: Durrant, Paul ; xen-devel@lists.xenproject.org
> Cc: Anthony PERARD ; Ian Jackson
> ; Wei Liu 
> Subject: Re: [Xen-devel] [PATCH 2/3] libxl: make creation of xenstore
> suspend event channel node optional
> 
> Hi Paul,
> 
> On 28/02/2020 09:28, Durrant, Paul wrote:
> >> -Original Message-
> >> From: Julien Grall 
> >> Sent: 27 February 2020 22:52
> >> To: Durrant, Paul ; xen-
> de...@lists.xenproject.org
> >> Cc: Anthony PERARD ; Ian Jackson
> >> ; Wei Liu 
> >> Subject: Re: [Xen-devel] [PATCH 2/3] libxl: make creation of xenstore
> >> suspend event channel node optional
> >>
> >> Hi,
> >>
> >> On 26/02/2020 16:08, Paul Durrant wrote:
> >>> The purpose and semantics of the node are explained in
> >>> xenstore-paths.pandoc [1]. It was originally introduced in xend by
> >> commit
> >>> 17636f47a474 "Teach xc_save to use event-channel-based domain suspend
> if
> >>> available.". Note that, because, the top-level frontend 'device' node
> >> was
> >>> created writable by the guest in xend, there was no need to explicitly
> >>> create the 'suspend-event-channel' node as writable node.
> >>>
> >>> However, libxl creates the 'device' node as read-only by the guest and
> >> so
> >>> explicit creation of the 'suspend-event-channel' node is necessary to
> >> make
> >>> it usable. This unfortunately has the side-effect of making some old
> >>> Windows PV drivers [2] cease to function. This is because they scan
> the
> >> top
> >>> level 'device' node, find the 'suspend' node and expect it to contain
> >> the
> >>> usual sub-nodes describing a PV frontend. When this is found not to be
> >> the
> >>> case, enumeration ceases and (because the 'suspend' node is observed
> >> before
> >>> the 'vbd' node) no system disk is enumerated. Windows will then crash
> >> with
> >>> bugcheck code 0x7B.
> >>>
> >>> This patch adds a boolean 'suspend_event_channel' field into
> >>> libxl_create_info to control whether the xenstore node is created and
> a
> >>> similarly named option in xl.cfg which, for compatibility with
> previous
> >>> libxl behaviour, defaults to true.
> >>>
> >>> [1]
> >> https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=docs/misc/xenstore-
> >> paths.pandoc;hb=HEAD#l177
> >>> [2] https://access.redhat.com/documentation/en-
> >> us/red_hat_enterprise_linux/5/html/para-
> >> virtualized_windows_drivers_guide/sect-para-
> >> virtualized_windows_drivers_guide-
> >> installing_and_configuring_the_para_virtualized_drivers-
> >> installing_the_para_virtualized_drivers
> >>>
> >>> NOTE: While adding the new LIBXL_HAVE_CREATEINFO_SUSPEND_EVENT_CHANNEL
> >>> definition into libxl.h, this patch corrects the previous
> stanza
> >>> which erroneously implies ibxl_domain_create_info is a
> function.
> >>>
> >>> Signed-off-by: Paul Durrant 
> >>> ---
> >>> Cc: Ian Jackson 
> >>> Cc: Wei Liu 
> >>> Cc: Anthony PERARD 
> >>> ---
> >>>docs/man/xl.cfg.5.pod.in|  7 +++
> >>>tools/libxl/libxl.h | 13 -
> >>>tools/libxl/libxl_create.c  | 12 +---
> >>>tools/libxl/libxl_types.idl |  1 +
> >>>tools/xl/xl_parse.c |  3 +++
> >>
> >> You may want to update xenstore-paths.pandoc as the document mention
> the
> >> node will be created by the toolstack.
> >>
> >
> > The doc already says that:
> >
> > -
> >  ~/device/suspend/event-channel = ""|EVTCHN [w]
> >
> > The domain's suspend event channel. The toolstack will create this
> > path with an empty value which the guest may choose to overwrite.
> > -
> 
> The paragraph you quoted does not suggest the node may not exist. So I
> think you want to update the documentation to reflect the node may not
> exist.
> 
> >>>5 files changed, 32 insertions(+), 4 deletions(-)
> >>>
> >>> diff --git a/docs/man/xl.cfg.5.pod.in b/docs/man/xl.cfg.5.pod.in
> >>> index 0cad561375..5f476f1e1d 100644
> >>> --- a/docs/man/xl.cfg.5.pod.in
> >>> +++ b/

Re: [Xen-devel] [PATCH v5 0/2] docs: Migration design documents

2020-02-28 Thread Durrant, Paul
Ping again...

> -Original Message-
> From: Durrant, Paul 
> Sent: 20 February 2020 12:54
> To: Durrant, Paul ; xen-devel@lists.xenproject.org
> Cc: Andrew Cooper ; George Dunlap
> ; Ian Jackson ;
> Jan Beulich ; Julien Grall ; Konrad
> Rzeszutek Wilk ; Stefano Stabellini
> ; Wei Liu 
> Subject: RE: [PATCH v5 0/2] docs: Migration design documents
> 
> Ping?
> 
> I have not receieved any further comments on v5. Can I please get acks or
> otherwise so we can (hopefully) move on with coding?
> 
>   Paul
> 
> > -Original Message-
> > From: Paul Durrant 
> > Sent: 13 February 2020 10:53
> > To: xen-devel@lists.xenproject.org
> > Cc: Durrant, Paul ; Andrew Cooper
> > ; George Dunlap
> ;
> > Ian Jackson ; Jan Beulich
> ;
> > Julien Grall ; Konrad Rzeszutek Wilk
> > ; Stefano Stabellini ;
> Wei
> > Liu 
> > Subject: [PATCH v5 0/2] docs: Migration design documents
> >
> > Paul Durrant (2):
> >   docs/designs: Add a design document for non-cooperative live migration
> >   docs/designs: Add a design document for migration of xenstore data
> >
> >  docs/designs/non-cooperative-migration.md | 272 ++
> >  docs/designs/xenstore-migration.md| 136 +++
> >  2 files changed, 408 insertions(+)
> >  create mode 100644 docs/designs/non-cooperative-migration.md
> >  create mode 100644 docs/designs/xenstore-migration.md
> > ---
> > Cc: Andrew Cooper 
> > Cc: George Dunlap 
> > Cc: Ian Jackson 
> > Cc: Jan Beulich 
> > Cc: Julien Grall 
> > Cc: Konrad Rzeszutek Wilk 
> > Cc: Stefano Stabellini 
> > Cc: Wei Liu 
> > --
> > 2.20.1


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v3 5/6] mm: add 'is_special_page' macro...

2020-03-06 Thread Durrant, Paul
> -Original Message-
> From: detected-as-s...@amazon.com  On Behalf Of 
> Alan Robinson
> Sent: 06 March 2020 07:02
> To: pdurr...@amzn.com
> Cc: xen-devel@lists.xenproject.org; Stefano Stabellini 
> ; Julien Grall
> ; Wei Liu ; Konrad Rzeszutek Wilk 
> ; Andrew Cooper
> ; Durrant, Paul ; Ian 
> Jackson
> ; George Dunlap ; Tim 
> Deegan ; Tamas
> K Lengyel ; Jan Beulich ; Roger Pau 
> Monné
> 
> Subject: RE: [EXTERNAL][Xen-devel] [PATCH v3 5/6] mm: add 'is_special_page' 
> macro...
> 
> CAUTION: This email originated from outside of the organization. Do not click 
> links or open
> attachments unless you can confirm the sender and know the content is safe.
> 
> 
> 
> A typo...
> 
> On Thu, Mar 05, 2020 at 01:45:03PM +0100, pdurr...@amzn.com wrote:
> >
> > PGC_extra pages are intended to hold data structures that are associated
> > with a domain and my be mapped by that domain. They should not be treated
> 
> s/my/may/

Good spot. Will fix. Thanks,

  Paul

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v3 3/6] x86 / pv: do not treat PGC_extra pages as RAM when constructing dom0

2020-03-06 Thread Durrant, Paul
> -Original Message-
> From: Xen-devel  On Behalf Of Jan 
> Beulich
> Sent: 06 March 2020 11:56
> To: pdurr...@amzn.com
> Cc: xen-devel@lists.xenproject.org; Durrant, Paul ; 
> Roger Pau Monné
> ; Wei Liu ; Andrew Cooper 
> 
> Subject: Re: [Xen-devel] [PATCH v3 3/6] x86 / pv: do not treat PGC_extra 
> pages as RAM when
> constructing dom0
> 
> On 05.03.2020 13:45, pdurr...@amzn.com wrote:
> > --- a/xen/arch/x86/pv/dom0_build.c
> > +++ b/xen/arch/x86/pv/dom0_build.c
> > @@ -792,6 +792,10 @@ int __init dom0_construct_pv(struct domain *d,
> >  {
> >  mfn = mfn_x(page_to_mfn(page));
> >  BUG_ON(SHARED_M2P(get_gpfn_from_mfn(mfn)));
> > +
> > +if ( page->count_info & PGC_extra )
> > +continue;
> 
> This surely is a pattern, i.e. there are more similar changes to
> make: tboot_gen_domain_integrity() e.g. ignores d->xenpage_list,
> and hence with the goal of converting the shared info page would
> also want adjustment. For dump_numa() it may be less important,
> but it would still look more correct if it too got changed.
> audit_p2m() might apparently complain about such pages (and
> hence might be a problem with the one PGC_extra page VMX domains
> now have). And this is only from me looking at
> page_list_for_each(..., >page_list) constructs; who knows
> what else there is.
> 

Those are dealt with by the is_special_page() patch later on I think. It didn't 
seem appropriate to use that macro here though since we know pages on the page 
list cannot be xenheap pages.

  Paul
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v3 2/6] x86 / p2m: remove page_list check in p2m_alloc_table

2020-03-06 Thread Durrant, Paul
> -Original Message-
> From: Jan Beulich 
> Sent: 06 March 2020 11:46
> To: pdurr...@amzn.com
> Cc: xen-devel@lists.xenproject.org; Durrant, Paul ; 
> Andrew Cooper
> ; George Dunlap ; Wei 
> Liu ; Roger Pau
> Monné 
> Subject: Re: [PATCH v3 2/6] x86 / p2m: remove page_list check in 
> p2m_alloc_table
> 
> On 05.03.2020 13:45, pdurr...@amzn.com wrote:
> > From: Paul Durrant 
> >
> > There does not seem to be any justification for refusing to create the
> > domain's p2m table simply because it may have assigned pages.
> 
> I think there is: If any such allocation had happened before, how
> would it be represented in the domain's p2m?

Insertion into the p2m is a separate action from page allocation. Why should 
they be linked?

> 
> > Particularly
> > it prevents the prior allocation of PGC_extra pages.
> 
> That's unfortunate, but will need taking care of differently then:
> 
> > --- a/xen/arch/x86/mm/p2m.c
> > +++ b/xen/arch/x86/mm/p2m.c
> > @@ -695,14 +695,6 @@ int p2m_alloc_table(struct p2m_domain *p2m)
> >
> >  p2m_lock(p2m);
> >
> > -if ( p2m_is_hostp2m(p2m)
> > - && !page_list_empty(>page_list) )
> > -{
> > -P2M_ERROR("dom %d already has memory allocated\n", d->domain_id);
> > -p2m_unlock(p2m);
> > -return -EINVAL;
> > -}
> 
> Instead of checking the list to be empty, how about checking
> domain_tot_pages() to return zero?

I could do that, and in fact my original code did, but with more consideration 
the whole test just didn't make sense to me. Yes, clearly the p2m has to be 
there before pages can be added into it, but I can't see any reason why you 
couldn't even allocate the entire guest RAM, then create the p2m and then add 
the pages into it.

  Paul

> 
> Jan
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH 2/2] domain: use PGC_extra domheap page for shared_info

2020-03-06 Thread Durrant, Paul
> -Original Message-
> From: David Woodhouse 
> Sent: 06 March 2020 11:53
> To: Jan Beulich ; Durrant, Paul 
> Cc: jul...@xen.org; andrew.coop...@citrix.com; sstabell...@kernel.org; 
> konrad.w...@oracle.com;
> volodymyr_babc...@epam.com; ian.jack...@eu.citrix.com; w...@xen.org; 
> george.dun...@citrix.com; xen-
> de...@lists.xenproject.org
> Subject: RE: [EXTERNAL][PATCH 2/2] domain: use PGC_extra domheap page for 
> shared_info
> 
> On Fri, 2020-03-06 at 12:37 +0100, Jan Beulich wrote:
> > I've started looking at the latest version of Paul's series, but I'm
> > still struggling to see the picture: There's no true distinction
> > between Xen heap and domain heap on x86-64 (except on very large
> > systems). Therefore it is unclear to me what "those pages" is actually
> > referring to above. Surely new Xen can't be given any pages in use
> > _in any way_ by old Xen, no matter whether it's ones assigned to
> > domains, or ones used internally to (old) Xen.
> 
> Old and new Xen do not coexist. There is a kexec (via kexec_reloc.S and
> purgatory) from old to new.
> 
> There are some pages which new Xen MUST NOT scribble on, because they
> actually belong to the domains being preserved. That includes the EPT
> (or at least IOMMU) page tables.
> 
> I suppose new Xen also mustn't scribble on the pages in which old Xen
> has placed the migration information for those domains either. At
> least, not until it's consumed the data.
> 
> Anything else, however, is fine for new Xen to scribble on. Fairly much
> anything that the old Xen had allocated from its xenheap (and not
> subsequently shared to a guest, qv) is no longer required and can be
> treated as free memory by the new Xen, which now owns the machine.
> 

... so getting rid of shared xenheap pages altogether just makes life easier.

  Paul

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v3 5/6] mm: add 'is_special_page' macro...

2020-03-05 Thread Durrant, Paul
> -Original Message-
> From: Tamas K Lengyel 
> Sent: 05 March 2020 15:10
> To: pdurr...@amzn.com
> Cc: Xen-devel ; Durrant, Paul 
> ; Jan Beulich
> ; Andrew Cooper ; Wei Liu 
> ; Roger Pau Monné
> ; George Dunlap ; Ian Jackson
> ; Julien Grall ; Konrad Rzeszutek 
> Wilk
> ; Stefano Stabellini ; Tim 
> Deegan 
> Subject: RE: [EXTERNAL][PATCH v3 5/6] mm: add 'is_special_page' macro...
> 
> CAUTION: This email originated from outside of the organization. Do not click 
> links or open
> attachments unless you can confirm the sender and know the content is safe.
> 
> 
> 
> On Thu, Mar 5, 2020 at 5:45 AM  wrote:
> >
> > From: Paul Durrant 
> >
> > ... to cover xenheap and PGC_extra pages.
> >
> > PGC_extra pages are intended to hold data structures that are associated
> > with a domain and my be mapped by that domain. They should not be treated
> > as 'normal' guest pages (i.e. RAM or page tables). Hence, in many cases
> > where code currently tests is_xen_heap_page() it should also check for
> > the PGC_extra bit in 'count_info'.
> >
> > This patch therefore defines is_special_page() to cover both cases and
> > converts tests if is_xen_heap_page() to is_special_page() where
> > appropriate.
> 
> In context of VM forking, are these pages only used by some type of PV
> mechanism? If not, would we need to get them copied somehow or are
> these setup during the regular createdomain routine? Can they be
> copied on-demand, ie. do these pages pass a p2m_is_ram() check?

PGC_extra domheap pages are intended as direct replacements for shared xenheap 
pages and should be treated the same way. Thus they do not form part of the 
migration stream. Their p2m type depends entirely on how they are added to the 
p2m, as it is for any other page.

  Paul

> 
> Thanks,
> Tamas
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v3 2/6] x86 / p2m: remove page_list check in p2m_alloc_table

2020-03-06 Thread Durrant, Paul
> -Original Message-
> From: Jan Beulich 
> Sent: 06 March 2020 12:47
> To: Durrant, Paul 
> Cc: pdurr...@amzn.com; xen-devel@lists.xenproject.org; Andrew Cooper 
> ;
> George Dunlap ; Wei Liu ; Roger Pau 
> Monné 
> Subject: Re: [PATCH v3 2/6] x86 / p2m: remove page_list check in 
> p2m_alloc_table
> 
> On 06.03.2020 13:07, Durrant, Paul wrote:
> >> -Original Message-
> >> From: Jan Beulich 
> >> Sent: 06 March 2020 11:46
> >> To: pdurr...@amzn.com
> >> Cc: xen-devel@lists.xenproject.org; Durrant, Paul ; 
> >> Andrew Cooper
> >> ; George Dunlap ; Wei 
> >> Liu ; Roger
> Pau
> >> Monné 
> >> Subject: Re: [PATCH v3 2/6] x86 / p2m: remove page_list check in 
> >> p2m_alloc_table
> >>
> >> On 05.03.2020 13:45, pdurr...@amzn.com wrote:
> >>> From: Paul Durrant 
> >>>
> >>> There does not seem to be any justification for refusing to create the
> >>> domain's p2m table simply because it may have assigned pages.
> >>
> >> I think there is: If any such allocation had happened before, how
> >> would it be represented in the domain's p2m?
> >
> > Insertion into the p2m is a separate action from page allocation. Why 
> > should they be linked?
> 
> They are, because of how XENMEM_populate_physmap works. Yes,
> they _could_ be separate steps, but that's only a theoretical
> consideration.

Then surely the check should be in the XENMEM_populate_physmap code?

> 
> >>> Particularly
> >>> it prevents the prior allocation of PGC_extra pages.
> >>
> >> That's unfortunate, but will need taking care of differently then:
> >>
> >>> --- a/xen/arch/x86/mm/p2m.c
> >>> +++ b/xen/arch/x86/mm/p2m.c
> >>> @@ -695,14 +695,6 @@ int p2m_alloc_table(struct p2m_domain *p2m)
> >>>
> >>>  p2m_lock(p2m);
> >>>
> >>> -if ( p2m_is_hostp2m(p2m)
> >>> - && !page_list_empty(>page_list) )
> >>> -{
> >>> -P2M_ERROR("dom %d already has memory allocated\n", d->domain_id);
> >>> -p2m_unlock(p2m);
> >>> -return -EINVAL;
> >>> -}
> >>
> >> Instead of checking the list to be empty, how about checking
> >> domain_tot_pages() to return zero?
> >
> > I could do that, and in fact my original code did, but with more
> > consideration the whole test just didn't make sense to me. Yes,
> > clearly the p2m has to be there before pages can be added into it,
> > but I can't see any reason why you couldn't even allocate the
> > entire guest RAM, then create the p2m and then add the pages into
> > it.
> 
> Right - more hypercalls (XENMEM_increase_reservation + operations
> like XENMAPSPACE_gmfn), and hence slower overall domain creation.
> Plus - XENMEM_increase_reservation is not very useful for
> translated domains, as it won't return the MFNs allocated, and
> there's no way to specify where they should appear in GFN space.
> Hence in practice I don't see how this whole operation could
> work without XENMEM_populate_physmap.
> 

Oh, it would mean a big change in the tools etc. so I'm not saying it's a good 
idea or even possible at the moment. I was just pointing out that, as far as 
the lower layers of code in Xen go, page allocation and p2m insertion are 
distinct actions.

  Paul

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v3] x86: irq: Do not BUG_ON multiple unbind calls for shared pirqs

2020-03-06 Thread Durrant, Paul
> -Original Message-
> From: Xen-devel  On Behalf Of 
> Durrant, Paul
> Sent: 05 March 2020 17:37
> To: Jan Beulich ; Gautam, Varad 
> Cc: xen-devel@lists.xenproject.org; Roger Pau Monné ; 
> Julien Grall
> ; Andrew Cooper 
> Subject: Re: [Xen-devel] [PATCH v3] x86: irq: Do not BUG_ON multiple unbind 
> calls for shared pirqs
> 
> > -Original Message-
> > From: Xen-devel  On Behalf Of Jan 
> > Beulich
> > Sent: 05 March 2020 09:37
> > To: Gautam, Varad 
> > Cc: xen-devel@lists.xenproject.org; Andrew Cooper 
> > ; Julien Grall
> > ; Roger Pau Monné 
> > Subject: Re: [Xen-devel] [PATCH v3] x86: irq: Do not BUG_ON multiple unbind 
> > calls for shared pirqs
> >
> > On 29.01.2020 12:47, Jan Beulich wrote:
> > > On 29.01.2020 11:30, Roger Pau Monné wrote:
> > >> Hello,
> > >>
> > >> Thanks for the patch! Next time could you please try to reply to the
> > >> previous questions before sending a new version:
> > >>
> > >> https://lists.xenproject.org/archives/html/xen-devel/2019-07/msg00257.html
> > >>
> > >> On Wed, Jan 29, 2020 at 10:28:07AM +0100, Varad Gautam wrote:
> > >>> XEN_DOMCTL_destroydomain creates a continuation if domain_kill 
> > >>> -ERESTARTS.
> > >>> In that scenario, it is possible to receive multiple _pirq_guest_unbind
> > >>> calls for the same pirq from domain_kill, if the pirq has not yet been
> > >>> removed from the domain's pirq_tree, as:
> > >>>   domain_kill()
> > >>> -> domain_relinquish_resources()
> > >>>   -> pci_release_devices()
> > >>> -> pci_clean_dpci_irq()
> > >>>   -> pirq_guest_unbind()
> > >>> -> __pirq_guest_unbind()
> > >>>
> > >>> For a shared pirq (nr_guests > 1), the first call would zap the current
> > >>> domain from the pirq's guests[] list, but the action handler is never 
> > >>> freed
> > >>> as there are other guests using this pirq. As a result, on the second 
> > >>> call,
> > >>> __pirq_guest_unbind searches for the current domain which has been 
> > >>> removed
> > >>> from the guests[] list, and hits a BUG_ON.
> > >>>
> > >>> Make __pirq_guest_unbind safe to be called multiple times by letting xen
> > >>> continue if a shared pirq has already been unbound from this guest. The
> > >>> PIRQ will be cleaned up from the domain's pirq_tree during the 
> > >>> destruction
> > >>> in complete_domain_destroy anyways.
> > >>
> > >> So AFAICT this is because pt_pirq_softirq_active() returns true in
> > >> pci_clean_dpci_irq() and hence the iteration is stopped and
> > >> hvm_domain_irq(d)->dpci is not set to NULL.
> > >>
> > >> Would it be possible to clean the already processed IRQs from the
> > >> domain pirq_tree?
> > >
> > > This might work, perhaps by way of invoking unmap_domain_pirq()
> > > right after pirq_guest_unbind(), as long as hvm_dirq_assist() (as
> > > called from dpci_softirq()) can be made skip all actual work it
> > > means to do in such a case. Unfortunately the two ->masked fields
> > > acted upon are different between __pirq_guest_unbind() and
> > > hvm_dirq_assist().
> >
> > Ping? Unless I hear back soon, I'm afraid I'm going to drop this
> > patch from my "pending" mail folder, as not being agreed whether
> > to stick to the current version or whether to go this alternative
> > route. A more "natural" approach to fixing the issue would be
> > quite nice, after all.
> 
> I'll try to pick this up tomorrow as I helped diagnose the problem being 
> fixed.
> 

I'm looking at this now and I am finding the code very confusing, but I think 
we could achieve the cleanup by passing back the irq index out of 
__pirq_guest_unbind() such that pirq_guest_unbind() can call 
clean_domain_irq_pirq(). I still haven't got much of a clue as to how all the 
data structures hang together though.

  Paul


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v7 0/6] xl/libxl: domid allocation/preservation changes

2020-02-25 Thread Durrant, Paul
> -Original Message-
> From: Ian Jackson 
> Sent: 24 February 2020 17:57
> To: Durrant, Paul 
> Cc: Stefano Stabellini ; Julien Grall
> ; Wei Liu ; Jason Andryuk
> ; Andrew Cooper ; Konrad
> Rzeszutek Wilk ; George Dunlap
> ; Jan Beulich ; Anthony
> Perard ; xen-devel@lists.xenproject.org
> Subject: RE: [PATCH v7 0/6] xl/libxl: domid allocation/preservation
> changes
> 
> Durrant, Paul writes ("RE: [PATCH v7 0/6] xl/libxl: domid
> allocation/preservation changes"):
> > See
> https://xenbits.xen.org/gitweb/?p=people/pauldu/xen.git;a=shortlog;h=refs/
> heads/domid12
> 
> Thanks.  LGTM, pushed.
> 

Excellent. Thanks,

  Paul

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] automation: document vsyscall=emulate for old glibc

2020-02-25 Thread Durrant, Paul
> -Original Message-
> From: Xen-devel  On Behalf Of Wei
> Liu
> Sent: 25 February 2020 12:11
> To: Xen Development List 
> Cc: Doug Goldstein ; Wei Liu 
> Subject: [Xen-devel] [PATCH] automation: document vsyscall=emulate for old
> glibc
> 
> Signed-off-by: Wei Liu 
> ---
>  automation/build/README.md | 10 ++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/automation/build/README.md b/automation/build/README.md
> index 476f82cf45..8cda2b65a5 100644
> --- a/automation/build/README.md
> +++ b/automation/build/README.md
> @@ -58,6 +58,16 @@ understands.
>  - XEN_CONFIG_EXPERT: If this is defined in your shell it will be
>automatically passed through to the container.
> 
> +If your docker host has Linux kernel > 4.11, and you want to use
> containers
> +that run old glibc (for example, CentOS 6 or SLES11SP4), you may need to
> add
> +
> +```
> +vsyscall=emulate
> +```
> +
> +to the host kernel command line. That enables a legacy interface that is
> used
> +by old glibc.
> +

I can vouch that it is indeed necessary.

Reviewed-by: Paul Durrant 

> 
>  Building a container
>  
> --
> 2.20.1
> 
> 
> ___
> Xen-devel mailing list
> Xen-devel@lists.xenproject.org
> https://lists.xenproject.org/mailman/listinfo/xen-devel
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v7 0/6] xl/libxl: domid allocation/preservation changes

2020-02-24 Thread Durrant, Paul
> -Original Message-
> From: Ian Jackson 
> Sent: 24 February 2020 15:54
> To: Durrant, Paul 
> Cc: xen-devel@lists.xenproject.org; Andrew Cooper
> ; Anthony Perard ;
> George Dunlap ; Jan Beulich ;
> Jason Andryuk ; Julien Grall ; Konrad
> Rzeszutek Wilk ; Stefano Stabellini
> ; Wei Liu 
> Subject: Re: [PATCH v7 0/6] xl/libxl: domid allocation/preservation
> changes
> 
> Paul Durrant writes ("[PATCH v7 0/6] xl/libxl: domid
> allocation/preservation changes"):
> > Paul Durrant (6):
> >   libxl: add infrastructure to track and query 'recent' domids
> >   libxl: modify libxl__logv() to only log valid domid values
> >   public/xen.h: add a definition for a 'valid domid' mask
> >   libxl: allow creation of domains with a specified or random domid
> >   xl.conf: introduce 'domid_policy'
> >   xl: allow domid to be preserved on save/restore or migrate
> 
> Thanks for this.  I think this has enough acks to go in now.  Would
> you care to fold the acks/reviews and pass me a git branch ?  If it's
> easy for you, that would be more convenient and reliable than relying
> on git-am.  Feel free to negotiate about details on irc...
> 

Sure, I'll point you at a branch, hopefully within the next hour or so.

  Thanks,

Paul

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v3 3/3] x86 / vmx: use a 'normal' domheap page for APIC_DEFAULT_PHYS_BASE

2020-01-24 Thread Durrant, Paul
> -Original Message-
> From: George Dunlap [mailto:george.dun...@citrix.com]
> Sent: 23 January 2020 14:46
> To: Durrant, Paul ; xen-devel@lists.xenproject.org
> Cc: Jan Beulich ; Andrew Cooper
> ; Wei Liu ; Roger Pau Monné
> ; George Dunlap ; Ian
> Jackson ; Julien Grall ; Konrad
> Rzeszutek Wilk ; Stefano Stabellini
> ; Jun Nakajima ; Kevin
> Tian 
> Subject: Re: [PATCH v3 3/3] x86 / vmx: use a 'normal' domheap page for
> APIC_DEFAULT_PHYS_BASE
> 
> On 1/23/20 2:03 PM, Paul Durrant wrote:
> > vmx_alloc_vlapic_mapping() currently contains some very odd looking
> > code that allocates a MEMF_no_owner domheap page and then shares with
> > the guest as if it were a xenheap page. This then requires
> > vmx_free_vlapic_mapping() to call a special function in the mm code:
> free_shared_domheap_page().
> >
> > By using a 'normal' domheap page (i.e. by not passing MEMF_no_owner to
> > alloc_domheap_page()), the odd looking code in
> > vmx_alloc_vlapic_mapping() can simply use get_page_and_type() to set
> > up a writable mapping before insertion in the P2M and
> > vmx_free_vlapic_mapping() can simply release the page using
> > put_page_alloc_ref() followed by put_page_and_type(). This then allows
> free_shared_domheap_page() to be purged.
> >
> > There is, however, some fall-out from this simplification:
> >
> > - alloc_domheap_page() will now call assign_pages() and run into the
> fact
> >   that 'max_pages' is not set until some time after domain_create(). To
> >   avoid an allocation failure, domain_create() is modified to set
> >   max_pages to an initial value, sufficient to cover any domheap
> >   allocations required to complete domain creation. The value will be
> >   set to the 'real' max_pages when the tool-stack later performs the
> >   XEN_DOMCTL_max_mem operation, thus allowing the rest of the domain's
> >   memory to be allocated.
> >
> > - Because the domheap page is no longer a pseudo-xenheap page, the
> >   reference counting will prevent the domain from being destroyed. Thus
> >   the call to vmx_free_vlapic_mapping() is moved from the
> >   domain_destroy() method into the domain_relinquish_resources() method.
> >   Whilst in the area, make the domain_destroy() method an optional
> >   alternative_vcall() (since it will no longer peform any function in
> VMX
> >   and is stubbed in SVM anyway).
> >
> > Signed-off-by: Paul Durrant 
> 
> This is an excellent change, thank you:
> 
> Reviewed-by: George Dunlap 
> 
> My only comment is that this would have been a bit easier to review broken
> down into probably three patches: 1) making domain_destroy optional, 2)
> moving vmx teardown to a relinquish_resources call 3) using "normal"
> pages".  But I don't think it's worth a re-send just for that.

Since I appear to need to do a v4 to re-work the allocation (assuming I can 
steal another PGC bit) I'll split things as you suggest.

  Paul

> 
>  -George
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [Vote] For Xen Project Code of Conduct (deadline March 31st)

2020-01-23 Thread Durrant, Paul
> -Original Message-
> 
> People allowed to vote on behalf of the Hypervisor project are:
> Julien Grall, Andy Cooper, George Dunlap, Ian Jackson, Jan Beulich, Konrad
> R
> Wilk, Stefano Stabellini, Wei Liu and Paul Durrant (as Release Manager).

+1

> 
> People allowed to vote on behalf of the XAPI project are:
> Chandrika Srinivasan, Christian Lindig, Konstantina Chremmou,
> Rob Hoes, Zhang Li
> 
> People allowed to vote on behalf of the Windows PV Driver Project are:
> Paul Durrant, Ben Chalmers, Owen Smith
> 

+1

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2] cmdline: treat hyphens and underscores the same

2020-01-23 Thread Durrant, Paul
> -Original Message-
> From: Xen-devel  On Behalf Of Jan
> Beulich
> Sent: 23 January 2020 11:43
> To: xen-devel@lists.xenproject.org
> Cc: Stefano Stabellini ; Julien Grall
> ; Wei Liu ; Konrad Wilk
> ; George Dunlap ;
> Andrew Cooper ; Ian Jackson
> 
> Subject: [Xen-devel] [PATCH v2] cmdline: treat hyphens and underscores the
> same
> 
> In order to avoid permanently having to ask that no new command line
> options using underscores be introduced (albeit I'm likely to still make
> remarks), and in order to also allow extending the use of hyphens to
> pre-existing ones, introduce custom comparison functions treating both
> characters as matching.
> 
> Signed-off-by: Jan Beulich 
> ---
> v2: Rename to opt_str{,n}cmp(). Don't use the new function for comapring
> against "no-" in parse_params(). Add comment to cdiff().
> 
> --- a/docs/misc/xen-command-line.pandoc
> +++ b/docs/misc/xen-command-line.pandoc
> @@ -72,6 +72,11 @@ Some options take a comma separated list
>  Some parameters act as combinations of the above, most commonly a mix
>  of Boolean and String.  These are noted in the relevant sections.
> 
> +### Spelling
> +
> +Parameter names may include hyphens or underscores.  These are
> +generally being treated as matching one another by the parsing logic.
> +
>  ## Parameter details
> 
>  ### acpi
> --- a/xen/common/kernel.c
> +++ b/xen/common/kernel.c
> @@ -23,6 +23,53 @@ enum system_state system_state = SYS_STA
>  xen_commandline_t saved_cmdline;
>  static const char __initconst opt_builtin_cmdline[] = CONFIG_CMDLINE;
> 
> +/*
> + * Calculate the difference between two characters for command line
> parsing
> + * purposes, i.e. treating '-' and '_' the same.
> + */
> +static int cdiff(unsigned char c1, unsigned char c2)
> +{
> +int res = c1 - c2;
> +
> +if ( res && (c1 ^ c2) == ('-' ^ '_') &&
> + (c1 == '-' || c1 == '_') )
> +res = 0;
> +

Wow! That makes my head hurt. How about:

static int cdiff(unsigned char c1, unsigned char c2)
{
if ( c1 == '-' )
c1 = '_';

if ( c2 == '-' )
c2 = '_';

return c1 - c2;
}

?

  Paul
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2 1/3] x86 / vmx: make apic_access_mfn type-safe

2020-01-23 Thread Durrant, Paul
> -Original Message-
> From: Jan Beulich 
> Sent: 23 January 2020 13:30
> To: Durrant, Paul 
> Cc: Kevin Tian ; Wei Liu ; Andrew Cooper
> ; Jun Nakajima ; xen-
> de...@lists.xenproject.org; Roger Pau Monné 
> Subject: Re: [PATCH v2 1/3] x86 / vmx: make apic_access_mfn type-safe
> 
> On 23.01.2020 14:09, Durrant, Paul wrote:
> >> -Original Message-
> >> From: Xen-devel  On Behalf Of
> Jan
> >> Beulich
> >> Sent: 23 January 2020 12:45
> >> To: Durrant, Paul 
> >> Cc: Kevin Tian ; Wei Liu ; Andrew
> Cooper
> >> ; Jun Nakajima ;
> xen-
> >> de...@lists.xenproject.org; Roger Pau Monné 
> >> Subject: Re: [Xen-devel] [PATCH v2 1/3] x86 / vmx: make apic_access_mfn
> >> type-safe
> >>
> >> On 23.01.2020 13:21, Paul Durrant wrote:
> >>> Use mfn_t rather than unsigned long and change previous tests against
> 0
> >> to
> >>> tests against INVALID_MFN (also introducing initialization to that
> >> value).
> >>>
> >>> Signed-off-by: Paul Durrant 
> >>> Acked-by: Kevin Tian 
> >>> Reviewed-by: Jan Beulich 
> >>
> >> No, this isn't what the R-b was given for.
> >
> > Oh, sorry, I misunderstood; I thought the R-b was good as long as
> idempotency was ensured.
> >
> >>
> >>> v2:
> >>>  - Set apic_access_mfn to INVALID_MFN in vmx_free_vlapic_mapping() to
> >> make
> >>>the function idempotent
> >>
> >> Andrew had suggested to use 0 instead of INVALID_MFN. I don't see
> >> how you achieved idempotency with this adjustment. Aiui
> >> vmx_free_vlapic_mapping() is supposed to also run correctly if
> >> vmx_alloc_vlapic_mapping() was never called.
> >
> > It will. vmx_domain_initialise() will set apic_access_mfn to INVALID_MFN
> > so vmx_free_vlapic_mapping() will do nothing.
> 
> I'm sorry, it was implied that it also needs to work if
> vmx_domain_initialise() was never called. Andrew's goal after
> all is, aiui, to be able to call "destroy" functions on error
> paths irrespective of how far "create" had managed to progress.
> 

Oh, I see. That implication was not at all obvious to me. I thought he was just 
after being able to 'destroy' as many times as it took to finish, in which case 
our choice for the value of INVALID_MFN is indeed unfortunate. If that's the 
goal I'll switch to use _mfn(0) as a comparator.

  Paul

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2 1/3] x86 / vmx: make apic_access_mfn type-safe

2020-01-23 Thread Durrant, Paul
> -Original Message-
> From: Xen-devel  On Behalf Of Jan
> Beulich
> Sent: 23 January 2020 12:45
> To: Durrant, Paul 
> Cc: Kevin Tian ; Wei Liu ; Andrew Cooper
> ; Jun Nakajima ; xen-
> de...@lists.xenproject.org; Roger Pau Monné 
> Subject: Re: [Xen-devel] [PATCH v2 1/3] x86 / vmx: make apic_access_mfn
> type-safe
> 
> On 23.01.2020 13:21, Paul Durrant wrote:
> > Use mfn_t rather than unsigned long and change previous tests against 0
> to
> > tests against INVALID_MFN (also introducing initialization to that
> value).
> >
> > Signed-off-by: Paul Durrant 
> > Acked-by: Kevin Tian 
> > Reviewed-by: Jan Beulich 
> 
> No, this isn't what the R-b was given for.

Oh, sorry, I misunderstood; I thought the R-b was good as long as idempotency 
was ensured.

> 
> > v2:
> >  - Set apic_access_mfn to INVALID_MFN in vmx_free_vlapic_mapping() to
> make
> >the function idempotent
> 
> Andrew had suggested to use 0 instead of INVALID_MFN. I don't see
> how you achieved idempotency with this adjustment. Aiui
> vmx_free_vlapic_mapping() is supposed to also run correctly if
> vmx_alloc_vlapic_mapping() was never called.

It will. vmx_domain_initialise() will set apic_access_mfn to INVALID_MFN so 
vmx_free_vlapic_mapping() will do nothing.

  Paul
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] live migration from 4.12 to 4.13 fails due to qemu-xen bug

2020-01-27 Thread Durrant, Paul
> -Original Message-
> From: Olaf Hering [mailto:o...@aepfle.de]
> Sent: 27 January 2020 13:00
> To: Durrant, Paul 
> Cc: xen-de...@lists.xen.org
> Subject: Re: [Xen-devel] live migration from 4.12 to 4.13 fails due to
> qemu-xen bug
> 
> Am Mon, 27 Jan 2020 11:54:37 +0000
> schrieb "Durrant, Paul" :
> 
> > > Should the string "pc-i440fx-3.0" become a configure option?
> > I suppose. Could we have "pc-i440fx" as the default, which libxl prefix
> matches against qemu's supported versions to select the latest?
> 
> I think the qemu machine variant must become a property of the running
> domU, so that it will not get lost during migration. For incoming domUs
> without such property some default must be selected by libxl. libxl at
> runtime has no info what the initial qemu command was. So this fallback
> must become a compile or runtime knob as well. Not sure if it would be too
> cumbersome for host admins to apply the equivalent of
> "device_model_args_hvm=" to a five or six digit number of running domUs
> during or prior their migration.
> 
> There should be a --qemuu-hvm-machine, which may just default to 'pc-1.0'
> if not specified. That string should go to
> domain_build_info.u.hvm.qemuu_machine, so that it becomes part of the domU
> properties.
> 

Could we have an opinion from a toolstack maintainer (cc-ed), please?

  Paul

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] live migration from 4.12 to 4.13 fails due to qemu-xen bug

2020-01-27 Thread Durrant, Paul
> -Original Message-
> From: Xen-devel [mailto:xen-devel-boun...@lists.xenproject.org] On Behalf
> Of Olaf Hering
> Sent: 27 January 2020 11:30
> To: xen-de...@lists.xen.org
> Subject: Re: [Xen-devel] live migration from 4.12 to 4.13 fails due to
> qemu-xen bug
> 
> Am Mon, 13 Jan 2020 11:36:27 +0100
> schrieb Olaf Hering :
> 
> > This HVM domU fails to live migrate from staging-4.12 to staging-4.13:
> 
> It turned out libxl does not configure qemu correctly at runtime:
> libxl__build_device_model_args_new() uses 'qemu -machine xenfv', perhaps
> with the assumption that 'xenfv' does the right thing. Unfortunately,
> 'xenfv' entirely ignores compatibility of "pc-i440fx" between qemu
> versions, 'xenfv' just maps to 'pc' aka 'the lastest'. Instead of 'qemu -
> machine xenfv', libxl should run 'qemu -machine pc-i440fx-3.0 -device xen-
> platform -accel xen' to make sure the domU can be migrated safely to
> future versions of qemu.

Agreed, I think use xenfv needs to be dropped and xl/libxl ought to specify the 
pc version it wants, as you suggest. For compat though, if the pc version is 
not specified in xl.cfg we'd need a mechanism to scan the versions supported by 
the installed qemu and then pick the latest, such that it then gets baked into 
the json blob for save/restore/migration purposes. 

> 
> Maybe there should also be a way to select a specific variant of "pc-
> i440fx". Currently the only way to do that is to use
> device_model_args_hvm= in xl.cfg. Unfortunately libvirt does not support
> "b_info->extra*".
> 

Yeah, it should be a first class config option.

> Should the string "pc-i440fx-3.0" become a configure option?
> 

I suppose. Could we have "pc-i440fx" as the default, which libxl prefix matches 
against qemu's supported versions to select the latest? I guess that would work.

Functionally your code looks fine, but I don't think fixing on 3.0 is the right 
thing to do. What happens if someone is trying to use an older version of qemu? 
It's going to cause unexpected breakage I think.

  Paul


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v5 2/4] mm: modify domain_adjust_tot_pages() to better handle a zero adjustment

2020-01-29 Thread Durrant, Paul
> -Original Message-
> From: Jan Beulich 
> Sent: 29 January 2020 11:13
> To: Durrant, Paul 
> Cc: xen-devel@lists.xenproject.org; Andrew Cooper
> ; George Dunlap ;
> Ian Jackson ; Julien Grall ;
> Konrad Rzeszutek Wilk ; Stefano Stabellini
> ; Wei Liu 
> Subject: Re: [PATCH v5 2/4] mm: modify domain_adjust_tot_pages() to better
> handle a zero adjustment
> 
> On 29.01.2020 11:16, Paul Durrant wrote:
> > Currently the function will pointlessly acquire and release the global
> > 'heap_lock' in this case.
> >
> > NOTE: No caller yet calls domain_adjust_tot_pages() with a zero 'pages'
> >   argument, but a subsequent patch will make this possible.
> 
> With this memory_exchange(), as previously indicated, now needlessly
> prevents the call when !dec_count. I do think, as said there, that
> together with the addition here then redundant checks in callers
> should be dropped (and as it looks the named one is the only one).
> 

Ok, yes I missed that.

  Paul

> Jan
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [ANNOUNCE] Xen 4.14 Development Update

2020-01-29 Thread Durrant, Paul
> -Original Message-
> From: David Woodhouse 
> Sent: 29 January 2020 13:05
> To: xen-devel@lists.xenproject.org; Durrant, Paul 
> Cc: Woodhouse, David ; luwei.k...@intel.com;
> marma...@invisiblethingslab.com; andrew.coop...@citrix.com;
> roger@citrix.com
> Subject: Re: [ANNOUNCE] Xen 4.14 Development Update
> 
> On Wed, 2020-01-29 at 12:36 +, Paul Durrant wrote:
> > This email only tracks big items for xen.git tree. Please reply for
> items
> > you would like to see in 4.14 so that people have an idea what
> > is going on and prioritise accordingly.
> >
> > You're welcome to provide description and use cases of the feature
> you're
> > working on.
> >
> > = Timeline =
> >
> > We now adopt a fixed cut-off date scheme. We will release about every 8
> >  months.
> > The critical dates for Xen 4.14 are as follows:
> >
> > ---> We are here
> > * Last posting date: May 1st, 2020
> > * Hard code freeze: May 22nd, 2020
> > * Release: June 26th, 2020
> >
> > Note that we don't have a freeze exception scheme anymore. All patches
> > that wish to go into 4.14 must be posted initially no later than the
> > last posting date and finally no later than the hard code freeze.
> > All patches posted after that date will be automatically queued into
> next
> > release.
> >
> > RCs will be arranged immediately after freeze.
> >
> > There is also a jira instance to track all the tasks (not only big)
> > for the project. See:
> https://xenproject.atlassian.net/projects/XEN/issues.
> >
> > Some of the tasks tracked by this e-mail also have a corresponding jira
> task
> > referred by XEN-N.
> >
> > There is a version number for patch series associated to each feature.
> > Can each owner send an update giving the latest version number if the
> > series has been re-posted? Also, can the owners of any completed items
> > please respond so that the item can be moved into the 'Completed'
> section.
> >
> > = Projects =
> >
> > == Hypervisor ==
> >
> > *  Live-Updating Xen
> >   -  David Woodhouse
> 
> Latest RFC patchset posted is [RFC v2] at
> https://lists.xenproject.org/archives/html/xen-devel/2020-01/msg01764.html
> 
> The tree at
> https://xenbits.xen.org/gitweb/?p=people/dwmw2/xen.git;a=shortlog;h=refs/h
> eads/lu-master
> as moved on a little since then, and I'll post [RFC v3] shortly.
> 
> So far this is mostly just the basic handover of a stream of migration
> data from one Xen to the next, and allowing the new Xen to vmap and
> process it early enough, and reserve the pages which already contain
> domain data during its "boot".
> 
> In our development tree, we have a PV Dom0 surviving the transition.
> We're working on turning those hacks into something we can show in
> public.
> 
> I have updated the wiki page at https://wiki.xenproject.org/wiki/Live-
> Updating_Xen
> which includes a reference to the handover protocol documentation.
> 
> This also depends on the hypervisor side of non-cooperative live
> migration, mentioned below. But as well as that, parts of the memory
> management are going to intersect with the secret hiding work that you
> *didn't* call out in this email AFAICT.

Yes, I omitted Secret Hiding. Agreed it ought to be tracked.

  Paul

> 
> 
> > *  Non-Cooperative Live Migration
> >   -  Paul Durrant
> >
> > === x86 ===
> >
> > *  Intel Processor Trace virtualization enabling (v1)
> >   -  Luwei Kang
> >
> > *  Linux stub domains (RFC v2)
> >   -  Marek Marczykowski-Górecki
> >
> > *  Fixes to #DB injection
> >   -  Andrew Cooper
> >
> > *  CPUID/MSR Xen/toolstack improvements
> >   -  Andrew Cooper
> >
> > *  Improvements to domain_crash()
> >   -  Andrew Cooper
> >
> > *  EIBRS
> >   -  Andrew Cooper
> >
> > *  Xen ioreq server (v3)
> >   -  Roger Pau Monne
> >
> > === ARM ===
> >
> > == Completed ==
> >
> >
> > Paul Durrant

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] Xen 4.14 dates

2020-01-29 Thread Durrant, Paul
> -Original Message-
> From: Jan Beulich 
> Sent: 29 January 2020 11:08
> To: Durrant, Paul ; xen-devel@lists.xenproject.org
> Subject: Re: [Xen-devel] Xen 4.14 dates
> 
> On 29.01.2020 11:55, Durrant, Paul wrote:
> >   I'm not aware on any prior discussion w.r.t. dates for Xen 4.14 so as
> RM I'd like to propose the following:
> >
> > Last posting: May 1st 2020
> > Hard Freeze: May 22nd 2020
> > Release: June 26th
> 
> Was 4.13 really more than 1.5 months late? The above would make
> its originally planned release date Oct 26th (the actual one was
> Dec 18th) with our current 8 month cadence.
> 
> Just wondering,

I found the last 4.13 update email from Juergen stating release on Nov 7th. 8 
months would put release on Jul 7th but I rounded down as I'm conscious that 
any slip is going to move things into holiday season.
Another option would be to slip out to ~1 year after the planned 4.13 dates.

  Paul

> Jan
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [ANNOUNCE] Xen 4.14 Development Update

2020-01-29 Thread Durrant, Paul
> -Original Message-
> From: Marek Marczykowski-Górecki 
> Sent: 29 January 2020 12:44
> To: xen-devel@lists.xenproject.org; Durrant, Paul 
> Cc: Woodhouse, David ; luwei.k...@intel.com;
> andrew.coop...@citrix.com; roger@citrix.com
> Subject: Re: [ANNOUNCE] Xen 4.14 Development Update
> 
> On Wed, Jan 29, 2020 at 12:36:18PM +, Paul Durrant wrote:
> > *  Linux stub domains (RFC v2)
> >   -  Marek Marczykowski-Górecki
> 
> There is v4 series already.
> 

Thanks. Will update for next time.

  Paul
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v6 2/4] mm: modify domain_adjust_tot_pages() to better handle a zero adjustment

2020-01-29 Thread Durrant, Paul
> -Original Message-
> From: Jan Beulich 
> Sent: 29 January 2020 15:08
> To: Durrant, Paul 
> Cc: xen-devel@lists.xenproject.org; Andrew Cooper
> ; George Dunlap ;
> Ian Jackson ; Julien Grall ;
> Konrad Rzeszutek Wilk ; Stefano Stabellini
> ; Wei Liu 
> Subject: Re: [PATCH v6 2/4] mm: modify domain_adjust_tot_pages() to better
> handle a zero adjustment
> 
> On 29.01.2020 15:38, Paul Durrant wrote:
> > --- a/xen/common/memory.c
> > +++ b/xen/common/memory.c
> > @@ -727,8 +727,7 @@ static long
> memory_exchange(XEN_GUEST_HANDLE_PARAM(xen_memory_exchange_t) arg)
> >   (j * (1UL << exch.out.extent_order)));
> >
> >  spin_lock(>page_alloc_lock);
> > -drop_dom_ref = (dec_count &&
> > -!domain_adjust_tot_pages(d, -
> dec_count));
> > +drop_dom_ref = !domain_adjust_tot_pages(d, -dec_count);
> 
> And it's only now that I see it in this shape that it becomes
> clear to me why the change above shouldn't be done, and why in
> your other patch code should be written similar to the above:
> The abstract model requires that the domain reference be
> dropped only when ->tot_pages _transitions_ to zero. No drop
> should occur if the count was already zero. Granted this may
> be technically impossible in the specific case here, but the
> code would still better reflect this general model, to prevent
> it getting (mis-)cloned into other places.
> 

Ok, I guess I'll drop this and then make sure that free_domheap_pages() avoids 
an erroneous ref drop.

  Paul
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v6 3/4] mm: make MEMF_no_refcount pages safe to assign

2020-01-29 Thread Durrant, Paul


> -Original Message-
> From: Jan Beulich 
> Sent: 29 January 2020 15:15
> To: Durrant, Paul 
> Cc: xen-devel@lists.xenproject.org; Andrew Cooper
> ; George Dunlap ;
> Ian Jackson ; Julien Grall ;
> Konrad Rzeszutek Wilk ; Stefano Stabellini
> ; Wei Liu ; Volodymyr Babchuk
> ; Roger Pau Monné 
> Subject: Re: [PATCH v6 3/4] mm: make MEMF_no_refcount pages safe to assign
> 
> On 29.01.2020 15:38, Paul Durrant wrote:
> > @@ -2371,6 +2383,8 @@ void free_domheap_pages(struct page_info *pg,
> unsigned int order)
> >
> >  if ( likely(d) && likely(d != dom_cow) )
> >  {
> > +long pages = 0;
> > +
> >  /* NB. May recursively lock from relinquish_memory(). */
> >  spin_lock_recursive(>page_alloc_lock);
> >
> > @@ -2386,9 +2400,11 @@ void free_domheap_pages(struct page_info *pg,
> unsigned int order)
> >  BUG();
> >  }
> >  arch_free_heap_page(d, [i]);
> > +if ( !(pg[i].count_info & PGC_no_refcount) )
> > +pages--;
> >  }
> >
> > -drop_dom_ref = !domain_adjust_tot_pages(d, -(1 << order));
> > +drop_dom_ref = !domain_adjust_tot_pages(d, pages);
> 
> Following from what I've just said on the previous patch, this needs
> further changing then as well. There'll need to be a per-domain
> "non-refcounted-pages" count, which - when transitioning from zero
> to non-zero is accompanied by obtaining a domain ref, and when
> transitioning back to zero causes this domain ref to be dropped.
> Otherwise, once the last ref-counted page was freed, the domain
> may become ready for final destruction, no matter how many non-
> refcounted pages there still are on its page lists. (An alternative
> model might be to include all pages in ->tot_pages, keep using just
> that for the domain ref acquire/release, and subtract the new
> count when e.g. comparing against ->max_pages.)

Yes, I think I'll adjust totpages unconditionally and then subtract the 
secondary count for comparison as it means I can leave the ref counting alone.

  Paul

> 
> Jan
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [ANNOUNCE] Xen 4.14 Development Update

2020-01-29 Thread Durrant, Paul
> -Original Message-
> From: Jan Beulich 
> Sent: 29 January 2020 12:48
> To: Durrant, Paul 
> Cc: xen-devel@lists.xenproject.org
> Subject: Re: [ANNOUNCE] Xen 4.14 Development Update
> 
> On 29.01.2020 13:36, Paul Durrant wrote:
> > === x86 ===
> >
> > *  Intel Processor Trace virtualization enabling (v1)
> >   -  Luwei Kang
> >
> > *  Linux stub domains (RFC v2)
> >   -  Marek Marczykowski-Górecki
> >
> > *  Fixes to #DB injection
> >   -  Andrew Cooper
> >
> > *  CPUID/MSR Xen/toolstack improvements
> >   -  Andrew Cooper
> >
> > *  Improvements to domain_crash()
> >   -  Andrew Cooper
> >
> > *  EIBRS
> >   -  Andrew Cooper
> >
> > *  Xen ioreq server (v3)
> >   -  Roger Pau Monne
> 
> Do you want to add "x86/HVM: implement memory read caching"
> (https://lists.xenproject.org/archives/html/xen-devel/2018-
> 09/msg00976.html)
> here? I think I now have something coming a little closer to
> what Andrew wants. It has its own downsides, so there being a
> v4 (after well over a year) doesn't mean this will get it any
> closer to going in.

It sounds like something that is reasonable to track. I can add it in (starting 
at v3, if v4 has not been posted by that point)

> 
> Do you also want to add the ongoing x86 insn emulator work
> here? Some parts were posted, some parts are still in needed
> of finding time to carry out, and some parts are still pretty
> vague.

Ok. It's on-going work so it may as well be tracked.

  Paul

> 
> Jan
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] docs: Fix StudlyCaps in libxc-migration-stream.pandoc and xl.1.pod

2020-01-27 Thread Durrant, Paul
> -Original Message-
> From: Ian Jackson 
> Sent: 27 January 2020 16:46
> To: xen-devel@lists.xenproject.org
> Cc: Durrant, Paul ; Ian Jackson
> ; Wei Liu ; Andrew Cooper
> ; George Dunlap ;
> Jan Beulich ; Julien Grall ; Konrad
> Rzeszutek Wilk ; Stefano Stabellini
> 
> Subject: [PATCH] docs: Fix StudlyCaps in libxc-migration-stream.pandoc and
> xl.1.pod
> 
> $ git-grep libxenctrl | wc -l
> 99
> $ git-grep libxc | wc -l
> 206
> $ git-grep libxenlight | wc -l
> 48
> $ git-grep libxl | wc -l
> 13433
> $ git-grep LibXen | wc -l
> 2
> $
> 
> Reported-by: Paul Durrant 
> Signed-off-by: Ian Jackson 
> ---
>  docs/man/xl.1.pod.in | 2 +-
>  docs/specs/libxc-migration-stream.pandoc | 2 +-

What about docs/specs/libxl-migration-stream.pandoc? It could use a similar fix 
while you're at it.

  Paul

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH resend] docs: add DIRECTORY_PART specification do xenstore protocol doc

2020-01-27 Thread Durrant, Paul
> -Original Message-
> From: Xen-devel  On Behalf Of
> Juergen Gross
> Sent: 27 January 2020 16:51
> To: xen-devel@lists.xenproject.org
> Cc: Juergen Gross ; Stefano Stabellini
> ; Julien Grall ; Wei Liu
> ; Konrad Rzeszutek Wilk ; George
> Dunlap ; Andrew Cooper
> ; Ian Jackson 
> Subject: [Xen-devel] [PATCH resend] docs: add DIRECTORY_PART specification
> do xenstore protocol doc
> 
> DIRECTORY_PART was missing in docs/misc/xenstore.txt. Add it.
> 
> Signed-off-by: Juergen Gross 

Reviewed-by: Paul Durrant 

> ---
>  docs/misc/xenstore.txt | 9 +
>  1 file changed, 9 insertions(+)
> 
> diff --git a/docs/misc/xenstore.txt b/docs/misc/xenstore.txt
> index ae1b6a8c6e..65570183b6 100644
> --- a/docs/misc/xenstore.txt
> +++ b/docs/misc/xenstore.txt
> @@ -152,6 +152,15 @@ DIRECTORY|  leaf-name>|*
>   leafnames.  The resulting children are each named
>   /.
> 
> +DIRECTORY_PART   | | name>|*
> + Same as DIRECTORY, but to be used for children lists longer than
> + XENSTORE_PAYLOAD_MAX. Input are  and the byte offset into
> + the list of children to return. Return values are the generation
> + count  of the node (to be used to ensure the node hasn't
> + changed between two reads:  being the same for multiple
> + reads guarantees the node hasn't changed) and the list of children
> + starting at the specified  of the complete list.
> +
>  GET_PERMS| |+
>  SET_PERMS||+?
>is one of the following
> --
> 2.16.4
> 
> 
> ___
> Xen-devel mailing list
> Xen-devel@lists.xenproject.org
> https://lists.xenproject.org/mailman/listinfo/xen-devel
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v4 4/7] x86 / vmx: move teardown from domain_destroy()...

2020-01-28 Thread Durrant, Paul
> -Original Message-
> From: Jan Beulich 
> Sent: 28 January 2020 08:15
> To: Durrant, Paul 
> Cc: xen-devel@lists.xenproject.org; Jun Nakajima ;
> Kevin Tian ; Andrew Cooper
> ; Wei Liu ; Roger Pau Monné
> ; George Dunlap 
> Subject: Re: [PATCH v4 4/7] x86 / vmx: move teardown from
> domain_destroy()...
> 
> On 24.01.2020 16:31, Paul Durrant wrote:
> > ... to domain_relinquish_resources().
> >
> > The teardown code frees the APICv page. This does not need to be done
> late
> > so do it in domain_relinquish_resources() rather than domain_destroy().
> 
> For the normal domain destruction path this is fine. For the error path
> of domain_create(), however, this will leak the page, as in this case
> hvm_domain_relinquish_resources() won't be called.

Well it's really arch_domain_create() that's at fault but, yes that needs 
fixing.

> I'm afraid there
> already is a similar issue with e.g. viridian_domain_deinit(). I guess
> I'll make a patch.
> 

Ok, thanks.

  Paul

> Jan
> 
> > Signed-off-by: Paul Durrant 
> > ---
> > Cc: Jun Nakajima 
> > Cc: Kevin Tian 
> > Cc: Jan Beulich 
> > Cc: Andrew Cooper 
> > Cc: Wei Liu 
> > Cc: "Roger Pau Monné" 
> > Cc: George Dunlap 
> >
> > v4:
> >   - New in v4 (disaggregated from v3 patch #3)
> > ---
> >  xen/arch/x86/hvm/vmx/vmx.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/xen/arch/x86/hvm/vmx/vmx.c b/xen/arch/x86/hvm/vmx/vmx.c
> > index b262d38a7c..606f3dc2eb 100644
> > --- a/xen/arch/x86/hvm/vmx/vmx.c
> > +++ b/xen/arch/x86/hvm/vmx/vmx.c
> > @@ -419,7 +419,7 @@ static int vmx_domain_initialise(struct domain *d)
> >  return 0;
> >  }
> >
> > -static void vmx_domain_destroy(struct domain *d)
> > +static void vmx_domain_relinquish_resources(struct domain *d)
> >  {
> >  if ( !has_vlapic(d) )
> >  return;
> > @@ -2240,7 +2240,7 @@ static struct hvm_function_table __initdata
> vmx_function_table = {
> >  .cpu_up_prepare   = vmx_cpu_up_prepare,
> >  .cpu_dead = vmx_cpu_dead,
> >  .domain_initialise= vmx_domain_initialise,
> > -.domain_destroy   = vmx_domain_destroy,
> > +.domain_relinquish_resources = vmx_domain_relinquish_resources,
> >  .vcpu_initialise  = vmx_vcpu_initialise,
> >  .vcpu_destroy = vmx_vcpu_destroy,
> >  .save_cpu_ctxt= vmx_save_vmcs_ctxt,
> >

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH] docs: retrospectively add XS_DIRECTORY_PART to the xenstore protocol...

2020-01-27 Thread Durrant, Paul
> -Original Message-
> From: Ian Jackson [mailto:ian.jack...@citrix.com]
> Sent: 27 January 2020 15:49
> To: Jürgen Groß 
> Cc: Durrant, Paul ; xen-devel@lists.xenproject.org;
> Andrew Cooper ; George Dunlap
> ; Jan Beulich ; Julien Grall
> ; Konrad Rzeszutek Wilk ; Stefano
> Stabellini ; Wei Liu 
> Subject: Re: [PATCH] docs: retrospectively add XS_DIRECTORY_PART to the
> xenstore protocol...
> 
> Jürgen Groß writes ("Re: [PATCH] docs: retrospectively add
> XS_DIRECTORY_PART to the xenstore protocol..."):
> > On 27.01.20 16:33, Ian Jackson wrote:
> > > Paul Durrant writes ("[PATCH] docs: retrospectively add
> XS_DIRECTORY_PART to the xenstore protocol..."):
> > >> ... specification.
> > >>
> > >> This was added by commit 0ca64ed8 "xenstore: add support for
> > >> reading directory with many children" but not added to the
> > >> specification at that point. A version of xenstored supporting the
> > >> command was first released in Xen 4.9.
> > >
> > > Thanks for documenting this.  A docs fix like this should be
> > > backported if it applies, IMO.
> > >
> > > Acked-by: Ian Jackson 
> > > Backport: 4.9+
> > >
> > > I will commit it to staging momentarily.
> > >
> > >> +DIRECTORY_PART  | |*
> > >> +Performs the same function as DIRECTORY, but returns a
> > >> +sub-list of children starting at  in the overall
> > >> +child list and less than or equal to XENSTORE_PAYLOAD_MAX
> > >> +octets in length. If  is beyond the end of the
> > >> +overall child list then the returned sub-list will be
> > >> +empty.
> > >
> > > I wonder if it should be somehow made more explicit that `index' is
> > > a count of directory entries, not bytes.  Maybe this is obvious.
> >
> > But this is wrong. It is bytes, and the generation count returned is
> > missing (see my original patch back in 2017).
> 
> Sorry for being too quick.  I have reverted my commit.
> 

Since I got it wrong, I suggest just taking Juergen's original text (which I 
was unaware of before). It seems ok to me.

  Paul

> Ian.

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v4 1/2] docs/designs: Add a design document for non-cooperative live migration

2020-01-30 Thread Durrant, Paul
> -Original Message-
> From: Andrew Cooper 
> Sent: 29 January 2020 19:47
> To: Durrant, Paul ; xen-devel@lists.xenproject.org
> Cc: George Dunlap ; Ian Jackson
> ; Jan Beulich ; Julien Grall
> ; Konrad Rzeszutek Wilk ; Stefano
> Stabellini ; Wei Liu 
> Subject: Re: [PATCH v4 1/2] docs/designs: Add a design document for non-
> cooperative live migration
> 
> On 29/01/2020 14:47, Paul Durrant wrote:
> > diff --git a/docs/designs/non-cooperative-migration.md
> b/docs/designs/non-cooperative-migration.md
> > new file mode 100644
> > index 00..5db3939db5
> > --- /dev/null
> > +++ b/docs/designs/non-cooperative-migration.md
> > @@ -0,0 +1,272 @@
> > +# Non-Cooperative Migration of Guests on Xen
> > +
> > +## Background
> > +
> > +The normal model of migration in Xen is driven by the guest because it
> was
> > +originally implemented for PV guests, where the guest must be aware it
> is
> > +running under Xen and is hence expected to co-operate.
> 
> For PV guests, is more than "expected to co-operate".
> 
> Migrating a PV guest involves rewriting every pagetable entry with a
> different MFN, so even before you consider things like the PV protocols,
> there is no way this could be done without the cooperation of the guest.

Yes, the P2M will change and this is visible to the guest, but does a PV guest 
need to take action when this occurs? I'm not sure.

> 
> Sadly, this fact was depended upon for migration of the PV protocols,
> and has migrated (excuse the pun) into the HVM world as well.
> 

Alas yes.

> > This model dates from
> > +an era when it was assumed that the host administrator had control of
> at least
> > +the privileged software running in the guest (i.e. the guest kernel)
> which may
> > +still be true in an enterprise deployment but is not generally true in
> a cloud
> > +environment.
> 
> I haven't seen it discussed elsewhere, but even enterprise environments
> have problems.
> 
> Having host admin == guest admin doesn't mean that guest drivers aren't
> buggy, or that the VM doesn't explode on migrate.

No, but at least the host admin has a chance to test and update guest software 
to be 'reasonably' confident that migration will work before employing it en 
masse.

> 
> The simple fact is that involving the guest kernel adds unnecessary
> moving parts which can (and do with a non-zero probability) go wrong.
> 

Yes, having written the frontend side of migration in the Windows drivers it is 
*very* hard to get right, particularly in Windows where one has to deal with 
the complex and asynchronous PnP subsystem colliding with a migration. The 
network driver also requires a multi-reader/single-writer lock with odd 
semantics (w.r.t. to IRQL) which I had to code myself 
(https://xenbits.xen.org/gitweb/?p=pvdrivers/win/xenvif.git;a=blob;f=src/xenvif/mrsw.h).
 It took years of fixing subtle races (in that and elsewhere) to get to the 
(AFAIK) reliable code we have now. 
Avoiding execution of code like this (in all OS) certainly avoids the 
opportunity for subtle bugs to manifest themselves.

> >  The aim of this design is to provide a model which is purely host
> > +driven, requiring no co-operation from the software running in the
> > +guest, and is thus suitable for cloud scenarios.
> > +
> > +PV guests are out of scope for this project because, as is outlined
> above, they
> > +have a symbiotic relationship with the hypervisor and therefore a
> certain level
> > +of co-operation can be assumed.
> 
> If nothing else, I'd at least suggest s/can be assumed/is necessary/.

Ok. I'll make that modification.

> 
> > +Because the service domain’s domid is used directly by the guest in
> setting
> > +up grant entries and event channels, the backend drivers in the new
> host
> > +environment must be provided by service domain with the same domid.
> Also,
> > +because the guest can sample its own domid from the frontend area and
> use it in
> > +hypercalls (e.g. HVMOP_set_param) rather than DOMID_SELF, the guest
> domid must
> > +also be preserved to maintain the ABI.
> 
> Has this been true since forever?  The grant and event APIs took some
> care to avoid the guest needing to know its own domid.
> 

The guest doesn't need to know its domid; DOMID_SELF will work, but the guest 
*can* use its own domid in this case (whereas I think grant and event ops will 
insist on DOMID_SELF unless referring to another domain). As far as I know this 
has been the case since forever and so I don't think it is something we can 
change now unless we move to a new ABI.

> > +
> > +Furthermore, it will necessary to modify backend drivers to r

Re: [Xen-devel] [PATCH v5 10/12] x86/hyperv: provide percpu hypercall input page

2020-01-30 Thread Durrant, Paul
> -Original Message-
> From: Xen-devel  On Behalf Of Wei
> Liu
> Sent: 29 January 2020 20:21
> To: Xen Development List 
> Cc: Wei Liu ; Wei Liu ; Andrew Cooper
> ; Durrant, Paul ;
> Michael Kelley ; Roger Pau Monné
> 
> Subject: [Xen-devel] [PATCH v5 10/12] x86/hyperv: provide percpu hypercall
> input page
> 
> Hyper-V's input / output argument must be 8 bytes aligned an not cross
> page boundary. One way to satisfy those requirements is to use percpu
> page.
> 
> For the foreseeable future we only need to provide input for TLB
> and APIC hypercalls, so skip setting up an output page.
> 
> We will also need to provide an ap_setup hook for secondary cpus to
> setup its own input page.
> 
> Signed-off-by: Wei Liu 

Reviewed-by: Paul Durrant 

> ---
> v5:
> 1. Adjust to new ap_setup
> 2. Change variable name to hv_pcpu_input_page
> 
> v4:
> 1. Change wording in commit message
> 2. Prevent leak
> 3. Introduce a private header
> 
> v3:
> 1. Use xenheap page instead
> 2. Drop page tracking structure
> 3. Drop Paul's review tag
> ---
>  xen/arch/x86/guest/hyperv/hyperv.c  | 31 +
>  xen/arch/x86/guest/hyperv/private.h | 29 +++
>  2 files changed, 60 insertions(+)
>  create mode 100644 xen/arch/x86/guest/hyperv/private.h
> 
> diff --git a/xen/arch/x86/guest/hyperv/hyperv.c
> b/xen/arch/x86/guest/hyperv/hyperv.c
> index 4387b6541e..f0facccbad 100644
> --- a/xen/arch/x86/guest/hyperv/hyperv.c
> +++ b/xen/arch/x86/guest/hyperv/hyperv.c
> @@ -27,7 +27,10 @@
>  #include 
>  #include 
> 
> +#include "private.h"
> +
>  struct ms_hyperv_info __read_mostly ms_hyperv;
> +DEFINE_PER_CPU_READ_MOSTLY(void *, hv_pcpu_input_page);
> 
>  static uint64_t generate_guest_id(void)
>  {
> @@ -127,14 +130,42 @@ static void __init setup_hypercall_page(void)
>  }
>  }
> 
> +static int setup_hypercall_pcpu_arg(void)
> +{
> +void *mapping;
> +
> +if ( this_cpu(hv_pcpu_input_page) )
> +return 0;
> +
> +mapping = alloc_xenheap_page();
> +if ( !mapping )
> +{
> +printk("Failed to allocate hypercall input page for CPU%u\n",
> +   smp_processor_id());
> +return -ENOMEM;
> +}
> +
> +this_cpu(hv_pcpu_input_page) = mapping;
> +
> +return 0;
> +}
> +
>  static void __init setup(void)
>  {
>  setup_hypercall_page();
> +if ( setup_hypercall_pcpu_arg() )
> +panic("Hypercall percpu arg setup failed\n");
> +}
> +
> +static int ap_setup(void)
> +{
> +return setup_hypercall_pcpu_arg();
>  }
> 
>  static const struct hypervisor_ops ops = {
>  .name = "Hyper-V",
>  .setup = setup,
> +.ap_setup = ap_setup,
>  };
> 
>  static void __maybe_unused build_assertions(void)
> diff --git a/xen/arch/x86/guest/hyperv/private.h
> b/xen/arch/x86/guest/hyperv/private.h
> new file mode 100644
> index 00..a339274985
> --- /dev/null
> +++ b/xen/arch/x86/guest/hyperv/private.h
> @@ -0,0 +1,29 @@
> +/
> **
> + * arch/x86/guest/hyperv/private.h
> + *
> + * Definitions / declarations only useful to Hyper-V code.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; If not, see <http://www.gnu.org/licenses/>.
> + *
> + * Copyright (c) 2020 Microsoft.
> + */
> +
> +#ifndef __XEN_HYPERV_PRIVIATE_H__
> +#define __XEN_HYPERV_PRIVIATE_H__
> +
> +#include 
> +
> +DECLARE_PER_CPU(void *, hv_pcpu_input_page);
> +
> +#endif /* __XEN_HYPERV_PRIVIATE_H__  */
> --
> 2.20.1
> 
> 
> ___
> Xen-devel mailing list
> Xen-devel@lists.xenproject.org
> https://lists.xenproject.org/mailman/listinfo/xen-devel
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v4 2/2] docs/designs: Add a design document for migration of xenstore data

2020-01-30 Thread Durrant, Paul
> -Original Message-
> From: Marek Marczykowski-Górecki 
> Sent: 29 January 2020 21:23
> To: Durrant, Paul 
> Cc: xen-devel@lists.xenproject.org; Stefano Stabellini
> ; Julien Grall ; Wei Liu
> ; Konrad Rzeszutek Wilk ; George
> Dunlap ; Andrew Cooper
> ; Ian Jackson 
> Subject: Re: [Xen-devel] [PATCH v4 2/2] docs/designs: Add a design
> document for migration of xenstore data
> 
> On Wed, Jan 29, 2020 at 02:47:02PM +, Paul Durrant wrote:
> 
> 
> 
> > +**node data**
> > +
> > +
> > +`|||`
> > +
> > +
> > +`` is considered relative to the domain path
> `/local/domain/$domid`
> > +and hence must not begin with `/`.
> 
> How backend settings are going to be serialized?

They're not going to be. The toolstack will construct new backends; 
co-operation will be required from them in that they must support re-binding 
(which the latest versions of blkback and netback do). We will need to 
white-list backends that are known to do this and refuse non-cooperative 
migration if any other backend is use.

> For example vif backend
> has a bunch of feature-* entries, which should not change under the
> guest feet during non-cooperative migration.
> 

The frontend will normally come back in 'connected' state, in which case a 
change in any feature flags will be irrelevant until the next (if any) 
re-connection (since the protocols only sample them at that point). If the 
frontend is not connected then it will sample the feature flags in the usual 
way. If the frontend/backend are in transition then the locking in the backend 
should prevent the 'unbind' from completing until some level of stability has 
been reached.

> > +`` and `` should be suitable to formulate a `WRITE`
> operation
> > +to the receiving xenstore and `` should be similarly
> suitable
> > +to formulate a subsequent `SET_PERMS` operation.
> > +
> > +**watch data**
> > +
> > +
> > +`||`
> > +
> > +`` again is considered relative and, together with ``,
> should
> > +be suitable to formulate an `ADD_DOMAIN_WATCHES` operation (see below).
> > +
> > +
> > +### Protocol Extension
> > +
> > +The `WATCH` operation does not allow specification of a ``; it
> is
> > +assumed that the watch pertains to the domain that owns the shared ring
> > +over which the operation is passed. Hence, for the tool-stack to be
> able
> > +to register a watch on behalf of a domain a new operation is needed:
> > +
> > +```
> > +ADD_DOMAIN_WATCHES  ||+
> > +
> > +Adds watches on behalf of the specified domain.
> > +
> > + is a NUL separated tuple of |. The semantics of
> this
> > +operation are identical to the domain issuing WATCH || for
> > +each .
> > +```
> 
> Normal WATCH operation triggers an event immediately. Is it intended in
> this case too? On the other hand, guest should cope with spurious watch
> events, so probably not an issue.

I don't think it matters if one is triggered or not. As you say, the watch 
protocol allows them to spuriously fire so the guest should cope either way.

> 
> > +The watch information for a domain also needs to be extracted from the
> > +sending xenstored so the following operation is also needed:
> > +
> > +```
> > +GET_DOMAIN_WATCHES  |   ||*
> > +
> > +Gets the list of watches that are currently registered for the domain.
> > +
> > + is a NUL separated tuple of |. The sub-list
> returned
> > +will start at  into the the overall list of watches and may be
> > +truncated such that the returned data fits within XENSTORE_PAYLOAD_MAX.
> > +If  is beyond the end of the overall list then the returned sub-
> > +list will be empty. If the value of  changes then it indicates
> > +that the overall watch list has changed and thus it may be necessary
> > +to re-issue the operation for previous values of .
> > +```
> 
> In what units  is expressed? bytes? entries?

I thought entries was reasonably well implied since I refer to a list, but I 
can make it explicit.

> Can the response be truncated at arbitrary place, or only between
> records?
> 

Only between records. Again I'll add some words to make that clear.

  Paul

> --
> Best Regards,
> Marek Marczykowski-Górecki
> Invisible Things Lab
> A: Because it messes up the order in which people normally read text.
> Q: Why is top-posting such a bad thing?
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v5 07/12] x86/hyperv: setup hypercall page

2020-01-30 Thread Durrant, Paul
> -Original Message-
> From: Wei Liu  On Behalf Of Wei Liu
> Sent: 29 January 2020 20:20
> To: Xen Development List 
> Cc: Durrant, Paul ; Michael Kelley
> ; Wei Liu ; Jan Beulich
> ; Andrew Cooper ; Wei Liu
> ; Roger Pau Monné 
> Subject: [PATCH v5 07/12] x86/hyperv: setup hypercall page
> 
> Hyper-V uses a technique called overlay page for its hypercall page. It
> will insert a backing page to the guest when the hypercall functionality
> is enabled. That means we can use a page that is not backed by real
> memory for hypercall page.
> 
> Use the top-most addressable page for that purpose. Adjust e820 code
> accordingly.
> 
> We also need to register Xen's guest OS ID to Hyper-V. Use 0x3 as the
> vendor ID. Fix the comment in hyperv-tlfs.h while at it.
> 
> Signed-off-by: Wei Liu 
> ---
> v5:
> 1. use hypervisor_reserve_top_pages
> 2. add a macro for hypercall page mfn
> 3. address other misc comments
> 
> v4:
> 1. Use fixmap
> 2. Follow routines listed in TLFS
> ---
>  xen/arch/x86/e820.c |  5 +++
>  xen/arch/x86/guest/hyperv/hyperv.c  | 57 +++--
>  xen/include/asm-x86/guest/hyperv-tlfs.h |  5 ++-
>  xen/include/asm-x86/guest/hyperv.h  |  3 ++
>  4 files changed, 65 insertions(+), 5 deletions(-)
> 
> diff --git a/xen/arch/x86/e820.c b/xen/arch/x86/e820.c
> index 3892c9cfb7..99643f3ea0 100644
> --- a/xen/arch/x86/e820.c
> +++ b/xen/arch/x86/e820.c
> @@ -343,6 +343,7 @@ static unsigned long __init find_max_pfn(void)
>  {
>  unsigned int i;
>  unsigned long max_pfn = 0;
> +unsigned long top_pfn = ((1ull << paddr_bits) - 1) >> PAGE_SHIFT;
> 
>  for (i = 0; i < e820.nr_map; i++) {
>  unsigned long start, end;
> @@ -357,6 +358,10 @@ static unsigned long __init find_max_pfn(void)
>  max_pfn = end;
>  }
> 
> +top_pfn -= hypervisor_reserve_top_pages();
> +if ( max_pfn >= top_pfn )
> +max_pfn = top_pfn;
> +
>  return max_pfn;
>  }
> 
> diff --git a/xen/arch/x86/guest/hyperv/hyperv.c
> b/xen/arch/x86/guest/hyperv/hyperv.c
> index 8d38313d7a..2bedcc438c 100644
> --- a/xen/arch/x86/guest/hyperv/hyperv.c
> +++ b/xen/arch/x86/guest/hyperv/hyperv.c
> @@ -19,15 +19,26 @@
>   * Copyright (c) 2019 Microsoft.
>   */
>  #include 
> +#include 
> 
> +#include 
>  #include 
>  #include 
> +#include 
> 
>  struct ms_hyperv_info __read_mostly ms_hyperv;
> 
> -static const struct hypervisor_ops ops = {
> -.name = "Hyper-V",
> -};
> +static uint64_t generate_guest_id(void)
> +{
> +uint64_t id;
> +
> +id = (uint64_t)HV_XEN_VENDOR_ID << 48;
> +id |= (xen_major_version() << 16) | xen_minor_version();
> +
> +return id;

I think this should use the hv_guest_os_id union. You can then set the values 
using the bit-fields and return the raw.

  Paul

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v5 08/12] x86/hyperv: provide Hyper-V hypercall functions

2020-01-30 Thread Durrant, Paul
> -Original Message-
> From: Xen-devel  On Behalf Of Wei
> Liu
> Sent: 29 January 2020 20:21
> To: Xen Development List 
> Cc: Stefano Stabellini ; Wei Liu
> ; Wei Liu ; Konrad Rzeszutek Wilk
> ; George Dunlap ;
> Andrew Cooper ; Durrant, Paul
> ; Ian Jackson ; Michael
> Kelley ; Julien Grall ; Roger Pau
> Monné 
> Subject: [Xen-devel] [PATCH v5 08/12] x86/hyperv: provide Hyper-V
> hypercall functions
> 
> These functions will be used later to make hypercalls to Hyper-V.
> 
> Signed-off-by: Wei Liu 

Reviewed-by: Paul Durrant 

> ---
> v5:
> 1. Switch back to direct call
> 2. Fix some issues pointed out by Jan
> 
> I tried using the asm(".equ ..") trick but hit a problem with %c again.
> 
> mm.c:5736:5: error: invalid 'asm': operand is not a condition code,
> invalid operand code 'c'
>asm ( ".equ HV_HCALL_PAGE, %c0; .global HV_HCALL_PAGE"
> ---
>  MAINTAINERS  |  1 +
>  xen/arch/x86/guest/hyperv/hyperv.c   |  6 ++
>  xen/arch/x86/xen.lds.S   |  4 +
>  xen/include/asm-x86/fixmap.h |  3 +-
>  xen/include/asm-x86/guest/hyperv-hcall.h | 96 
>  5 files changed, 108 insertions(+), 2 deletions(-)
>  create mode 100644 xen/include/asm-x86/guest/hyperv-hcall.h
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 04d91482cd..d0a5ed635b 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -519,6 +519,7 @@ S:Supported
>  F:   xen/arch/x86/guest/hyperv/
>  F:   xen/arch/x86/hvm/viridian/
>  F:   xen/include/asm-x86/guest/hyperv.h
> +F:   xen/include/asm-x86/guest/hyperv-hcall.h
>  F:   xen/include/asm-x86/guest/hyperv-tlfs.h
>  F:   xen/include/asm-x86/hvm/viridian.h
> 
> diff --git a/xen/arch/x86/guest/hyperv/hyperv.c
> b/xen/arch/x86/guest/hyperv/hyperv.c
> index 2bedcc438c..932a648ff7 100644
> --- a/xen/arch/x86/guest/hyperv/hyperv.c
> +++ b/xen/arch/x86/guest/hyperv/hyperv.c
> @@ -123,6 +123,12 @@ static const struct hypervisor_ops ops = {
>  .setup = setup,
>  };
> 
> +static void __maybe_unused build_assertions(void)
> +{
> +/* We use 1 in linker script */
> +BUILD_BUG_ON(FIX_X_HYPERV_HCALL != 1);
> +}
> +
>  /*
>   * Local variables:
>   * mode: C
> diff --git a/xen/arch/x86/xen.lds.S b/xen/arch/x86/xen.lds.S
> index 97f9c07891..8e02b4c648 100644
> --- a/xen/arch/x86/xen.lds.S
> +++ b/xen/arch/x86/xen.lds.S
> @@ -329,6 +329,10 @@ SECTIONS
>efi = .;
>  #endif
> 
> +#ifdef CONFIG_HYPERV_GUEST
> +  hv_hcall_page = ABSOLUTE(__fix_x_to_virt(1));
> +#endif
> +
>/* Sections to be discarded */
>/DISCARD/ : {
> *(.exit.text)
> diff --git a/xen/include/asm-x86/fixmap.h b/xen/include/asm-x86/fixmap.h
> index 8094546b75..a9bcb068cb 100644
> --- a/xen/include/asm-x86/fixmap.h
> +++ b/xen/include/asm-x86/fixmap.h
> @@ -16,6 +16,7 @@
> 
>  #define FIXADDR_TOP (VMAP_VIRT_END - PAGE_SIZE)
>  #define FIXADDR_X_TOP (XEN_VIRT_END - PAGE_SIZE)
> +#define __fix_x_to_virt(x) (FIXADDR_X_TOP - ((x) << PAGE_SHIFT))
> 
>  #ifndef __ASSEMBLY__
> 
> @@ -110,8 +111,6 @@ extern void __set_fixmap_x(
> 
>  #define clear_fixmap_x(idx) __set_fixmap_x(idx, 0, 0)
> 
> -#define __fix_x_to_virt(x) (FIXADDR_X_TOP - ((x) << PAGE_SHIFT))
> -
>  #define fix_x_to_virt(x)   ((void *)__fix_x_to_virt(x))
> 
>  #endif /* __ASSEMBLY__ */
> diff --git a/xen/include/asm-x86/guest/hyperv-hcall.h b/xen/include/asm-
> x86/guest/hyperv-hcall.h
> new file mode 100644
> index 00..5b7509b3b5
> --- /dev/null
> +++ b/xen/include/asm-x86/guest/hyperv-hcall.h
> @@ -0,0 +1,96 @@
> +/
> **
> + * asm-x86/guest/hyperv-hcall.h
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms and conditions of the GNU General Public
> + * License, version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public
> + * License along with this program; If not, see
> <http://www.gnu.org/licenses/>.
> + *
> + * Copyright (c) 2019 Microsoft.
> + */
> +
> +#ifndef __X86_HYPERV_HCALL_H__
> +#define __X86_HYPERV_HCALL_H__
> +
> +#include 
> +#include 
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +static inline uint64_t

Re: [Xen-devel] [PATCH v5 01/12] MAINTAINERS: put Hyper-V code under Viridian maintainership

2020-01-30 Thread Durrant, Paul
> -Original Message-
> From: Xen-devel  On Behalf Of Wei
> Liu
> Sent: 29 January 2020 20:20
> To: Xen Development List 
> Cc: Stefano Stabellini ; Wei Liu
> ; Wei Liu ; Konrad Rzeszutek Wilk
> ; George Dunlap ;
> Andrew Cooper ; Durrant, Paul
> ; Ian Jackson ; Michael
> Kelley ; Julien Grall 
> Subject: [Xen-devel] [PATCH v5 01/12] MAINTAINERS: put Hyper-V code under
> Viridian maintainership
> 
> And add myself as a maintainer.
> 
> Sort the list alphabetically while at it.
> 
> Signed-off-by: Wei Liu 
> Signed-off-by: Wei Liu 

Reviewed-by: Paul Durrant 

> ---
>  MAINTAINERS | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 1915e09f8b..04d91482cd 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -514,10 +514,13 @@ F:  xen/arch/x86/mm/shadow/
> 
>  X86 VIRIDIAN ENLIGHTENMENTS
>  M:   Paul Durrant 
> +M:   Wei Liu 
>  S:   Supported
> +F:   xen/arch/x86/guest/hyperv/
>  F:   xen/arch/x86/hvm/viridian/
> -F:   xen/include/asm-x86/hvm/viridian.h
> +F:   xen/include/asm-x86/guest/hyperv.h
>  F:   xen/include/asm-x86/guest/hyperv-tlfs.h
> +F:   xen/include/asm-x86/hvm/viridian.h
> 
>  XENTRACE
>  M:   George Dunlap 
> --
> 2.20.1
> 
> 
> ___
> Xen-devel mailing list
> Xen-devel@lists.xenproject.org
> https://lists.xenproject.org/mailman/listinfo/xen-devel
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v7 2/3] mm: make pages allocated with MEMF_no_refcount safe to assign

2020-01-30 Thread Durrant, Paul
> -Original Message-
> From: Jan Beulich 
> Sent: 30 January 2020 10:20
> To: Durrant, Paul 
> Cc: xen-devel@lists.xenproject.org; Andrew Cooper
> ; George Dunlap ;
> Ian Jackson ; Julien Grall ;
> Konrad Rzeszutek Wilk ; Stefano Stabellini
> ; Wei Liu ; Volodymyr Babchuk
> ; Roger Pau Monné 
> Subject: Re: [PATCH v7 2/3] mm: make pages allocated with MEMF_no_refcount
> safe to assign
> 
> On 29.01.2020 18:10, Paul Durrant wrote:
> > NOTE: steal_page() is also modified to decrement extra_pages in the case
> of
> >   a PGC_extra page being stolen from a domain.
> 
> I don't think stealing of such pages should be allowed. If anything,
> the replacement page then again should be an "extra" one, which I
> guess would be quite ugly to arrange for. But such "extra" pages
> aren't supposed to be properly exposed (and hence played with) to
> the domain in the first place.
> 
> > --- a/xen/common/page_alloc.c
> > +++ b/xen/common/page_alloc.c
> > @@ -2256,6 +2256,7 @@ int assign_pages(
> >  {
> >  int rc = 0;
> >  unsigned long i;
> > +unsigned int extra_pages = 0;
> >
> >  spin_lock(>page_alloc_lock);
> >
> > @@ -2267,13 +2268,19 @@ int assign_pages(
> >  goto out;
> >  }
> >
> > +for ( i = 0; i < (1 << order); i++ )
> > +if ( pg[i].count_info & PGC_extra )
> > +extra_pages++;
> 
> Perhaps assume (and maybe ASSERT()) that all pages in the batch
> are the same in this regard? Then you could ...
> 
> >  if ( !(memflags & MEMF_no_refcount) )
> >  {
> > -if ( unlikely((d->tot_pages + (1 << order)) > d->max_pages) )
> > +unsigned int max_pages = d->max_pages - d->extra_pages -
> extra_pages;
> > +
> > +if ( unlikely((d->tot_pages + (1 << order)) > max_pages) )
> >  {
> >  gprintk(XENLOG_INFO, "Over-allocation for domain %u: "
> >  "%u > %u\n", d->domain_id,
> > -d->tot_pages + (1 << order), d->max_pages);
> > +d->tot_pages + (1 << order), max_pages);
> >  rc = -E2BIG;
> >  goto out;
> >  }
> > @@ -2282,13 +2289,17 @@ int assign_pages(
> >  get_knownalive_domain(d);
> >  }
> >
> > +d->extra_pages += extra_pages;
> 
> ... arrange things like this, I think:
> 
> if ( pg[i].count_info & PGC_extra )
> d->extra_pages += 1U << order;
> else if ( !(memflags & MEMF_no_refcount) )
> {
> unsigned int max_pages = d->max_pages - d->extra_pages;
> ...
> 
> This would, afaict, then also eliminate the need to mask off
> MEMF_no_refcount in alloc_domheap_pages(), ...
> 
> 
> >  for ( i = 0; i < (1 << order); i++ )
> >  {
> > +unsigned long count_info = pg[i].count_info;
> > +
> >  ASSERT(page_get_owner([i]) == NULL);
> > -ASSERT(!pg[i].count_info);
> > +ASSERT(!(count_info & ~PGC_extra));
> 
> ... resulting in my prior comment on this one still applying.
> 
> Besides the changes you've made, what about the code handling
> XENMEM_set_pod_target? What about p2m-pod.c? And
> pv_shim_setup_dom()? I'm also not fully sure whether
> getdomaininfo() shouldn't subtract extra_pages, but I think
> this is the only way to avoid having an externally visible
> effect. There may be more. Perhaps it's best to introduce a
> domain_tot_pages() inline function returning the difference,
> and us it almost everywhere where ->tot_pages is used right
> now.

This is getting very very complicated now, which makes me think that my 
original approach using a 'normal' page and setting an initial max_pages in 
domain_create() was a better approach.

  Paul



___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v5 12/12] x86/hyperv: setup VP assist page

2020-01-30 Thread Durrant, Paul
> -Original Message-
> From: Wei Liu  On Behalf Of Wei Liu
> Sent: 29 January 2020 20:21
> To: Xen Development List 
> Cc: Durrant, Paul ; Michael Kelley
> ; Wei Liu ; Wei Liu
> ; Jan Beulich ; Andrew Cooper
> ; Roger Pau Monné 
> Subject: [PATCH v5 12/12] x86/hyperv: setup VP assist page
> 
> VP assist page is rather important as we need to toggle some bits in it
> for efficient nested virtualisation.
> 
> Signed-off-by: Wei Liu 
> ---
> v5:
> 1. Deal with error properly instead of always panicking
> 2. Swap percpu variables declarations' location
> 
> v4:
> 1. Use private.h
> 2. Prevent leak
> 
> v3:
> 1. Use xenheap page
> 2. Drop set_vp_assist
> 
> v2:
> 1. Use HV_HYP_PAGE_SHIFT instead
> ---
>  xen/arch/x86/guest/hyperv/hyperv.c  | 44 -
>  xen/arch/x86/guest/hyperv/private.h |  1 +
>  2 files changed, 44 insertions(+), 1 deletion(-)
> 
> diff --git a/xen/arch/x86/guest/hyperv/hyperv.c
> b/xen/arch/x86/guest/hyperv/hyperv.c
> index af0d6ed692..bc40a3d338 100644
> --- a/xen/arch/x86/guest/hyperv/hyperv.c
> +++ b/xen/arch/x86/guest/hyperv/hyperv.c
> @@ -31,6 +31,7 @@
> 
>  struct ms_hyperv_info __read_mostly ms_hyperv;
>  DEFINE_PER_CPU_READ_MOSTLY(void *, hv_pcpu_input_page);
> +DEFINE_PER_CPU_READ_MOSTLY(void *, hv_vp_assist);
>  DEFINE_PER_CPU_READ_MOSTLY(unsigned int, hv_vp_index);
> 
>  static uint64_t generate_guest_id(void)
> @@ -155,16 +156,57 @@ static int setup_hypercall_pcpu_arg(void)
>  return 0;
>  }
> 
> +static int setup_vp_assist(void)
> +{
> +void *mapping;
> +uint64_t val;
> +
> +mapping = this_cpu(hv_vp_assist);
> +
> +if ( !mapping )
> +{
> +mapping = alloc_xenheap_page();
> +if ( !mapping )
> +{
> +printk("Failed to allocate vp_assist page for CPU%u\n",
> +   smp_processor_id());
> +return -ENOMEM;
> +}
> +
> +clear_page(mapping);
> +this_cpu(hv_vp_assist) = mapping;
> +}
> +
> +val = (virt_to_mfn(mapping) << HV_HYP_PAGE_SHIFT)
> +| HV_X64_MSR_VP_ASSIST_PAGE_ENABLE;

Perhaps it would be neater to put the viridian_page_msr union into 
hyperv-tlfs.h and then use that.

  Paul

> +wrmsrl(HV_X64_MSR_VP_ASSIST_PAGE, val);
> +
> +return 0;
> +}
> +
>  static void __init setup(void)
>  {
>  setup_hypercall_page();
> +
>  if ( setup_hypercall_pcpu_arg() )
>  panic("Hypercall percpu arg setup failed\n");
> +
> +if ( setup_vp_assist() )
> +panic("VP assist page setup failed\n");
>  }
> 
>  static int ap_setup(void)
>  {
> -return setup_hypercall_pcpu_arg();
> +int rc;
> +
> +rc = setup_hypercall_pcpu_arg();
> +if ( rc )
> +goto out;
> +
> +rc = setup_vp_assist();
> +
> + out:
> +return rc;
>  }
> 
>  static const struct hypervisor_ops ops = {
> diff --git a/xen/arch/x86/guest/hyperv/private.h
> b/xen/arch/x86/guest/hyperv/private.h
> index c1c2431eff..fcddc47544 100644
> --- a/xen/arch/x86/guest/hyperv/private.h
> +++ b/xen/arch/x86/guest/hyperv/private.h
> @@ -25,6 +25,7 @@
>  #include 
> 
>  DECLARE_PER_CPU(void *, hv_pcpu_input_page);
> +DECLARE_PER_CPU(void *, hv_vp_assist);
>  DECLARE_PER_CPU(unsigned int, hv_vp_index);
> 
>  #endif /* __XEN_HYPERV_PRIVIATE_H__  */
> --
> 2.20.1


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v7 2/3] mm: make pages allocated with MEMF_no_refcount safe to assign

2020-01-30 Thread Durrant, Paul
> -Original Message-
> From: Jan Beulich 
> Sent: 30 January 2020 11:02
> To: Durrant, Paul 
> Cc: xen-devel@lists.xenproject.org; Andrew Cooper
> ; George Dunlap ;
> Ian Jackson ; Julien Grall ;
> Konrad Rzeszutek Wilk ; Stefano Stabellini
> ; Wei Liu ; Volodymyr Babchuk
> ; Roger Pau Monné 
> Subject: Re: [PATCH v7 2/3] mm: make pages allocated with MEMF_no_refcount
> safe to assign
> 
> (replying from seeing your reply on the list archives, i.e.
> threading lost/broken)
> 
> On 30.01.2020 10:40, Paul Durrant wrote:
> > This is getting very very complicated now, which makes me think that my
> > original approach using a 'normal' page and setting an initial max_pages
> in
> > domain_create() was a better approach.
> 
> I don't think so, no. I also don't thing auditing all ->{max,tot}_pages
> uses can be called "very very complicated". All I can say (again, I
> think) is that there was a reason this APIC page thing was done the
> way it was done. (It's another thing that this probably wasn't a
> _good_ reason.)
> 

I really want to get rid of shared xenheap pages though, so I will persist. 
I'll add the domain_tot_pages() helper as you suggest. I also agree that 
steal_page() ought not to encounter a PGC_extra page so I think I'll just make 
that an error case.

  Paul
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v9 0/4] purge free_shared_domheap_page()

2020-02-06 Thread Durrant, Paul
AFAICT these patches have the necessary A-b/R-b-s, or are there some missing 
that I need to chase?

  Paul

> -Original Message-
> From: Paul Durrant 
> Sent: 03 February 2020 10:57
> To: xen-devel@lists.xenproject.org
> Cc: Durrant, Paul ; Andrew Cooper
> ; George Dunlap ; Ian
> Jackson ; Julien Grall ; Jun
> Nakajima ; Konrad Rzeszutek Wilk
> ; Roger Pau Monné ; Stefano
> Stabellini ; Tim Deegan ; Volodymyr
> Babchuk ; Wei Liu 
> Subject: [PATCH v9 0/4] purge free_shared_domheap_page()
> 
> Paul Durrant (4):
>   x86 / vmx: move teardown from domain_destroy()...
>   add a domain_tot_pages() helper function
>   mm: make pages allocated with MEMF_no_refcount safe to assign
>   x86 / vmx: use a MEMF_no_refcount domheap page for
> APIC_DEFAULT_PHYS_BASE
> 
>  xen/arch/arm/arm64/domctl.c |  2 +-
>  xen/arch/x86/domain.c   |  2 +-
>  xen/arch/x86/hvm/vmx/vmx.c  | 25 ---
>  xen/arch/x86/mm.c   | 15 ++-
>  xen/arch/x86/mm/p2m-pod.c   | 10 ++---
>  xen/arch/x86/mm/shadow/common.c |  2 +-
>  xen/arch/x86/msi.c  |  2 +-
>  xen/arch/x86/numa.c |  2 +-
>  xen/arch/x86/pv/dom0_build.c| 25 ++-
>  xen/arch/x86/pv/domain.c|  2 +-
>  xen/arch/x86/pv/shim.c  |  4 +-
>  xen/common/domctl.c |  2 +-
>  xen/common/grant_table.c|  4 +-
>  xen/common/keyhandler.c |  2 +-
>  xen/common/memory.c |  2 +-
>  xen/common/page_alloc.c | 78 -
>  xen/include/asm-arm/mm.h|  5 ++-
>  xen/include/asm-x86/mm.h|  9 ++--
>  xen/include/public/memory.h |  4 +-
>  xen/include/xen/sched.h | 27 +---
>  20 files changed, 143 insertions(+), 81 deletions(-)
> ---
> Cc: Andrew Cooper 
> Cc: George Dunlap 
> Cc: Ian Jackson 
> Cc: Julien Grall 
> Cc: Jun Nakajima 
> Cc: Konrad Rzeszutek Wilk 
> Cc: "Roger Pau Monné" 
> Cc: Stefano Stabellini 
> Cc: Tim Deegan 
> Cc: Volodymyr Babchuk 
> Cc: Wei Liu 
> --
> 2.20.1

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v9 0/4] purge free_shared_domheap_page()

2020-02-06 Thread Durrant, Paul
> -Original Message-
> From: Jan Beulich 
> Sent: 06 February 2020 08:46
> To: Durrant, Paul 
> Cc: xen-devel@lists.xenproject.org; Stefano Stabellini
> ; Julien Grall ; Wei Liu
> ; Konrad Rzeszutek Wilk ; Andrew
> Cooper ; Ian Jackson
> ; George Dunlap ; Tim
> Deegan ; Jun Nakajima ; Volodymyr
> Babchuk ; Roger Pau Monné
> 
> Subject: Re: [Xen-devel] [PATCH v9 0/4] purge free_shared_domheap_page()
> 
> On 06.02.2020 09:28, Durrant, Paul wrote:
> > AFAICT these patches have the necessary A-b/R-b-s, or are there some
> missing that I need to chase?
> 
> According to my records ...
> 
> >> -Original Message-
> >> From: Paul Durrant 
> >> Sent: 03 February 2020 10:57
> >>
> >> Paul Durrant (4):
> >>   x86 / vmx: move teardown from domain_destroy()...
> >>   add a domain_tot_pages() helper function
> >>   mm: make pages allocated with MEMF_no_refcount safe to assign
> >>   x86 / vmx: use a MEMF_no_refcount domheap page for
> >> APIC_DEFAULT_PHYS_BASE
> >>
> >>  xen/arch/arm/arm64/domctl.c |  2 +-
> 
> ... this (Arm), ...
> 
> >>  xen/arch/x86/domain.c   |  2 +-
> >>  xen/arch/x86/hvm/vmx/vmx.c  | 25 ---
> 
> ... this (VMX), ...
> 
> >>  xen/arch/x86/mm.c   | 15 ++-
> >>  xen/arch/x86/mm/p2m-pod.c   | 10 ++---
> 
> ... this (MM), ...
> 
> >>  xen/arch/x86/mm/shadow/common.c |  2 +-
> 
> ... this (shadow), ...
> 
> >>  xen/arch/x86/msi.c  |  2 +-
> >>  xen/arch/x86/numa.c |  2 +-
> >>  xen/arch/x86/pv/dom0_build.c| 25 ++-
> >>  xen/arch/x86/pv/domain.c|  2 +-
> >>  xen/arch/x86/pv/shim.c  |  4 +-
> >>  xen/common/domctl.c |  2 +-
> >>  xen/common/grant_table.c|  4 +-
> >>  xen/common/keyhandler.c |  2 +-
> >>  xen/common/memory.c |  2 +-
> >>  xen/common/page_alloc.c | 78 -
> >>  xen/include/asm-arm/mm.h|  5 ++-
> 
> ... and this (Arm again). I think almost all are for patch 2, with
> an Arm one needed on patch 3. If I overlooked any, please point me
> at them.

Ok, thanks. Kevin has completed his acks (patches #1 and #4).

George, Julien, Tim,

  Can I have acks or otherwise, please?

  Paul

> 
> Jan
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v9 3/4] mm: make pages allocated with MEMF_no_refcount safe to assign

2020-02-06 Thread Durrant, Paul
> -Original Message-
> From: Julien Grall 
> Sent: 06 February 2020 10:04
> To: Durrant, Paul ; xen-devel@lists.xenproject.org
> Cc: Jan Beulich ; Andrew Cooper
> ; George Dunlap ;
> Ian Jackson ; Konrad Rzeszutek Wilk
> ; Stefano Stabellini ; Wei
> Liu ; Volodymyr Babchuk ; Roger
> Pau Monné 
> Subject: Re: [PATCH v9 3/4] mm: make pages allocated with MEMF_no_refcount
> safe to assign
> 
> Hi,
> 
> I am sorry to jump that late in the conversation.
> 
> On 03/02/2020 10:56, Paul Durrant wrote:
> >
> > -if ( unlikely(domain_adjust_tot_pages(d, 1 << order) == (1 <<
> order)) )
> > +if ( !(memflags & MEMF_no_refcount) &&
> > + unlikely(domain_adjust_tot_pages(d, 1 << order) == (1 <<
> order)) )
> >   get_knownalive_domain(d);
> > -}
> >
> >   for ( i = 0; i < (1 << order); i++ )
> >   {
> >   ASSERT(page_get_owner([i]) == NULL);
> > -ASSERT(!pg[i].count_info);
> >   page_set_owner([i], d);
> >   smp_wmb(); /* Domain pointer must be visible before updating
> refcnt. */
> > -pg[i].count_info = PGC_allocated | 1;
> > +pg[i].count_info =
> > +(pg[i].count_info & PGC_extra) | PGC_allocated | 1;
> 
> This is technically incorrect because we blindly assume the state of the
> page is inuse (which is thankfully equal to 0).

Assuming the page is inuse seems reasonable at this point.

> 
> See the discussion [1]. This is already an existing bug in the code base
> and I will be taking care of it.

Fair enough; it's a very long standing bug.

> However...
> 
> >   page_list_add_tail([i], >page_list);
> >   }
> >
> > @@ -2315,11 +2338,6 @@ struct page_info *alloc_domheap_pages(
> >
> >   if ( memflags & MEMF_no_owner )
> >   memflags |= MEMF_no_refcount;
> > -else if ( (memflags & MEMF_no_refcount) && d )
> > -{
> > -ASSERT(!(memflags & MEMF_no_refcount));
> > -return NULL;
> > -}
> >
> >   if ( !dma_bitsize )
> >   memflags &= ~MEMF_no_dma;
> > @@ -2332,11 +2350,23 @@ struct page_info *alloc_domheap_pages(
> > memflags, d)) == NULL)) )
> >return NULL;
> >
> > -if ( d && !(memflags & MEMF_no_owner) &&
> > - assign_pages(d, pg, order, memflags) )
> > +if ( d && !(memflags & MEMF_no_owner) )
> >   {
> > -free_heap_pages(pg, order, memflags & MEMF_no_scrub);
> > -return NULL;
> > +if ( memflags & MEMF_no_refcount )
> > +{
> > +unsigned long i;
> > +
> > +for ( i = 0; i < (1ul << order); i++ )
> > +{
> > +ASSERT(!pg[i].count_info);
> > +pg[i].count_info = PGC_extra;
> 
> ... this is pursuing the wrongness of the code above and not safe
> against offlining.
> 
> We could argue this is an already existing bug, however I am a bit
> unease to add more abuse in the code. Jan, what do you think?
> 

I'd consider a straightforward patch-clash. If this patch goes in after yours 
then it needs to be modified accordingly, or vice versa.

  Paul

> > +}
> > +}
> > +if ( assign_pages(d, pg, order, memflags) )
> > +{
> > +free_heap_pages(pg, order, memflags & MEMF_no_scrub);
> > +return NULL;
> > +}
> >   }
> 
> Cheers,
> 
> [1] https://lore.kernel.org/xen-devel/20200204133357.32101-1-
> jul...@xen.org/
> 
> --
> Julien Grall
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2] xen/mm: Avoid assuming the page is inuse in assign_pages()

2020-02-06 Thread Durrant, Paul
> -Original Message-
> From: Julien Grall 
> Sent: 06 February 2020 11:17
> To: Durrant, Paul ; xen-devel@lists.xenproject.org
> Cc: Grall, Julien 
> Subject: Re: [PATCH v2] xen/mm: Avoid assuming the page is inuse in
> assign_pages()
> 
> Hi Paul,
> 
> On 06/02/2020 10:52, Durrant, Paul wrote:
> >> -Original Message-
> >> From: Julien Grall 
> >> Sent: 06 February 2020 10:39
> >> To: xen-devel@lists.xenproject.org
> >> Cc: jul...@xen.org; Durrant, Paul ; Grall,
> Julien
> >> 
> >> Subject: [PATCH v2] xen/mm: Avoid assuming the page is inuse in
> >> assign_pages()
> >>
> >> From: Julien Grall 
> >>
> >> At the moment, assign_pages() on the page to be inuse (PGC_state_inuse)
> >> and the state value to be 0.
> >>
> >> However, the code may race with the page offlining code (see
> >> offline_page()). Depending on the ordering, the page may be in
> offlining
> >> state (PGC_state_offlining) before it is assigned to a domain.
> >>
> >> On debug build, this may result to hit the assert or just clobber the
> >> state. On non-debug build, the state will get clobbered.
> >>
> >> Incidentally the flag PGC_broken will get clobbered as well.
> >>
> >> Grab the heap_lock to prevent a race with offline_page() and keep the
> >> state and broken flag around.
> >>
> >> Signed-off-by: Julien Grall 
> >
> > This seems like a reasonable change. I guess having assign_pages() take
> the global lock is no more problem than its existing call to
> domain_adjust_tot_pages() which also takes the same lock.
> 
> That's my understanding. Summarizing our discussion IRL for the other,
> it is not clear whether the lock is enough here.
> 
>  From my understanding the sequence
> 
> pg[i].count_info &= ...;
> pg[i].count_info |= ...;
> 
> could result to multiple read/write from the compiler. We could use a
> single assignment, but I still don't think this prevent the compiler to
> be use multiple read/write.
> 
> The concern would be a race with get_page_owner_and_reference(). If 1 is
> set before the rest of the bits, then you may be able to get the page.
> 
> So I might want to use write_atomic() below. Any opinion?
> 

TBH I wonder if we ought to say that any update to count_info ought to be done 
by a write_atomic (where it's not already done by cmpxchg).

  Paul

> Cheers,
> 
> --
> Julien Grall
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v2] xen/mm: Avoid assuming the page is inuse in assign_pages()

2020-02-06 Thread Durrant, Paul
> -Original Message-
> From: Julien Grall 
> Sent: 06 February 2020 10:39
> To: xen-devel@lists.xenproject.org
> Cc: jul...@xen.org; Durrant, Paul ; Grall, Julien
> 
> Subject: [PATCH v2] xen/mm: Avoid assuming the page is inuse in
> assign_pages()
> 
> From: Julien Grall 
> 
> At the moment, assign_pages() on the page to be inuse (PGC_state_inuse)
> and the state value to be 0.
> 
> However, the code may race with the page offlining code (see
> offline_page()). Depending on the ordering, the page may be in offlining
> state (PGC_state_offlining) before it is assigned to a domain.
> 
> On debug build, this may result to hit the assert or just clobber the
> state. On non-debug build, the state will get clobbered.
> 
> Incidentally the flag PGC_broken will get clobbered as well.
> 
> Grab the heap_lock to prevent a race with offline_page() and keep the
> state and broken flag around.
> 
> Signed-off-by: Julien Grall 

This seems like a reasonable change. I guess having assign_pages() take the 
global lock is no more problem than its existing call to 
domain_adjust_tot_pages() which also takes the same lock.

  Paul

> 
> ---
> Changes in v2:
> - Superseed <20200204133357.32101-1-jul...@xen.org>
> - Fix the race with offline_page()
> ---
>  xen/common/page_alloc.c | 16 ++--
>  1 file changed, 14 insertions(+), 2 deletions(-)
> 
> diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
> index 97902d42c1..a684dbf37c 100644
> --- a/xen/common/page_alloc.c
> +++ b/xen/common/page_alloc.c
> @@ -2283,15 +2283,27 @@ int assign_pages(
>  get_knownalive_domain(d);
>  }
> 
> +spin_lock(_lock);
>  for ( i = 0; i < (1 << order); i++ )
>  {
> +/*
> + * We should only be here if the page is inuse or offlining.
> + * The latter happen if we race with mark_page_offline() as we
> + * don't hold the heap_lock.
> + */
> +ASSERT(page_state_is([i], inuse) ||
> +   page_state_is([i], offlining));
> +ASSERT(!(pg[i].count_info & ~(PGC_state | PGC_broken)));
>  ASSERT(page_get_owner([i]) == NULL);
> -ASSERT(!pg[i].count_info);
>  page_set_owner([i], d);
>  smp_wmb(); /* Domain pointer must be visible before updating
> refcnt. */
> -pg[i].count_info = PGC_allocated | 1;
> +
> +pg[i].count_info &= PGC_state | PGC_broken;
> +pg[i].count_info |= PGC_allocated | 1;
> +
>  page_list_add_tail([i], >page_list);
>  }
> +spin_unlock(_lock);
> 
>   out:
>  spin_unlock(>page_alloc_lock);
> --
> 2.17.1


___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v4 3/7] x86/HVM: introduce "curr" into hvmemul_rep_{mov, sto}s()

2020-02-06 Thread Durrant, Paul
> -Original Message-
> From: Xen-devel  On Behalf Of Jan
> Beulich
> Sent: 31 January 2020 16:43
> To: xen-devel@lists.xenproject.org
> Cc: George Dunlap ; Andrew Cooper
> ; Roger Pau Monné ; Wei
> Liu ; Paul Durrant 
> Subject: [Xen-devel] [PATCH v4 3/7] x86/HVM: introduce "curr" into
> hvmemul_rep_{mov, sto}s()
> 
> There are a number of uses of "current" already, and more may appear
> down the road. Latch into a local variable.
> 
> At this occasion also drop stray casts from code getting touched anyway.
> 
> Signed-off-by: Jan Beulich 

Reviewed-by: Paul Durrant 

...with one suggestion:

> ---
> v4: New.
> 
> --- a/xen/arch/x86/hvm/emulate.c
> +++ b/xen/arch/x86/hvm/emulate.c
> @@ -1747,7 +1747,8 @@ static int hvmemul_rep_movs(
>  {
>  struct hvm_emulate_ctxt *hvmemul_ctxt =
>  container_of(ctxt, struct hvm_emulate_ctxt, ctxt);
> -struct hvm_vcpu_io *vio = >arch.hvm.hvm_io;
> +struct vcpu *curr = current;
> +struct hvm_vcpu_io *vio = >arch.hvm.hvm_io;
>  unsigned long saddr, daddr, bytes;
>  paddr_t sgpa, dgpa;
>  uint32_t pfec = PFEC_page_present;
> @@ -1807,8 +1808,8 @@ static int hvmemul_rep_movs(
>  }
> 
>  /* Check for MMIO ops */
> -(void) get_gfn_query_unlocked(current->domain, sgpa >> PAGE_SHIFT,
> );
> -(void) get_gfn_query_unlocked(current->domain, dgpa >> PAGE_SHIFT,
> );
> +get_gfn_query_unlocked(curr->domain, sgpa >> PAGE_SHIFT, );
> +get_gfn_query_unlocked(curr->domain, dgpa >> PAGE_SHIFT, );

Since we have more than one 'curr->domain', I think it would be nice to a 
latched 'currd' domain pointer too.

  Paul

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v4 7/7] x86/HVM: reduce scope of pfec in hvm_emulate_init_per_insn()

2020-02-06 Thread Durrant, Paul
> -Original Message-
> From: Xen-devel  On Behalf Of Jan
> Beulich
> Sent: 31 January 2020 16:46
> To: xen-devel@lists.xenproject.org
> Cc: George Dunlap ; Andrew Cooper
> ; Roger Pau Monné ; Wei
> Liu ; Paul Durrant 
> Subject: [Xen-devel] [PATCH v4 7/7] x86/HVM: reduce scope of pfec in
> hvm_emulate_init_per_insn()
> 
> It needs calculating only in one out of three cases. Re-structure the
> code a little such that the variable truly gets calculated only when we
> don't get any insn bytes from elsewhere, and hence need to (try to)
> fetch them. Also OR in PFEC_insn_fetch right in the initializer.
> 
> While in this mood, restrict addr's scope as well.
> 
> Signed-off-by: Jan Beulich 

Reviewed-by: Paul Durrant 

> ---
> v4: New.
> 
> --- a/xen/arch/x86/hvm/emulate.c
> +++ b/xen/arch/x86/hvm/emulate.c
> @@ -2762,8 +2762,6 @@ void hvm_emulate_init_per_insn(
>  unsigned int insn_bytes)
>  {
>  struct vcpu *curr = current;
> -unsigned int pfec = PFEC_page_present;
> -unsigned long addr;
> 
>  hvmemul_ctxt->ctxt.lma = hvm_long_mode_active(curr);
> 
> @@ -2778,14 +2776,23 @@ void hvm_emulate_init_per_insn(
>  hvmemul_ctxt->seg_reg[x86_seg_ss].db ? 32 : 16;
>  }
> 
> -if ( hvmemul_ctxt->seg_reg[x86_seg_ss].dpl == 3 )
> -pfec |= PFEC_user_mode;
> -
>  hvmemul_ctxt->insn_buf_eip = hvmemul_ctxt->ctxt.regs->rip;
> -if ( !insn_bytes )
> +
> +if ( insn_bytes )
>  {
> +hvmemul_ctxt->insn_buf_bytes = insn_bytes;
> +memcpy(hvmemul_ctxt->insn_buf, insn_buf, insn_bytes);
> +}
> +else if ( !(hvmemul_ctxt->insn_buf_bytes =
> +hvm_get_insn_bytes(curr, hvmemul_ctxt->insn_buf)) )
> +{
> +unsigned int pfec = PFEC_page_present | PFEC_insn_fetch;
> +unsigned long addr;
> +
> +if ( hvmemul_ctxt->seg_reg[x86_seg_ss].dpl == 3 )
> +pfec |= PFEC_user_mode;
> +
>  hvmemul_ctxt->insn_buf_bytes =
> -hvm_get_insn_bytes(curr, hvmemul_ctxt->insn_buf) ?:
>  (hvm_virtual_to_linear_addr(x86_seg_cs,
>  _ctxt->seg_reg[x86_seg_cs],
>  hvmemul_ctxt->insn_buf_eip,
> @@ -2795,15 +2802,9 @@ void hvm_emulate_init_per_insn(
>  ) &&
>   hvm_copy_from_guest_linear(hvmemul_ctxt->insn_buf, addr,
>  sizeof(hvmemul_ctxt->insn_buf),
> -pfec | PFEC_insn_fetch,
> -NULL) == HVMTRANS_okay) ?
> +pfec, NULL) == HVMTRANS_okay) ?
>  sizeof(hvmemul_ctxt->insn_buf) : 0;
>  }
> -else
> -{
> -hvmemul_ctxt->insn_buf_bytes = insn_bytes;
> -memcpy(hvmemul_ctxt->insn_buf, insn_buf, insn_bytes);
> -}
>  }
> 
>  void hvm_emulate_writeback(
> 
> 
> ___
> Xen-devel mailing list
> Xen-devel@lists.xenproject.org
> https://lists.xenproject.org/mailman/listinfo/xen-devel
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v8 1/4] x86 / vmx: move teardown from domain_destroy()...

2020-01-31 Thread Durrant, Paul
> -Original Message-
> From: Jan Beulich 
> Sent: 31 January 2020 13:32
> To: Durrant, Paul 
> Cc: xen-devel@lists.xenproject.org; Jun Nakajima ;
> Kevin Tian ; Andrew Cooper
> ; Wei Liu ; Roger Pau Monné
> ; George Dunlap 
> Subject: Re: [PATCH v8 1/4] x86 / vmx: move teardown from
> domain_destroy()...
> 
> On 30.01.2020 15:57, Paul Durrant wrote:
> > ... to domain_relinquish_resources().
> >
> > The teardown code frees the APICv page. This does not need to be done
> late
> > so do it in domain_relinquish_resources() rather than domain_destroy().
> >
> > Signed-off-by: Paul Durrant 
> 
> Btw., this can have my
> Reviewed-by: Jan Beulich 
> as soon as "x86/HVM: relinquish resources also from hvm_domain_destroy()"
> has gone in. But that's still pending an ack or otherwise by you.
> 

Ok, thanks, I'll get on and review that.

  Paul
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v4 1/7] libxl: add definition of INVALID_DOMID to the API

2020-01-31 Thread Durrant, Paul
> -Original Message-
> From: Xen-devel  On Behalf Of
> Roger Pau Monné
> Sent: 22 January 2020 14:53
> To: Durrant, Paul 
> Cc: Anthony PERARD ; xen-
> de...@lists.xenproject.org; Ian Jackson ; Wei
> Liu 
> Subject: Re: [Xen-devel] [PATCH v4 1/7] libxl: add definition of
> INVALID_DOMID to the API
> 
> On Wed, Jan 22, 2020 at 02:44:40PM +, Paul Durrant wrote:
> > Currently both xl and libxl have internal definitions of INVALID_DOMID
> > which happen to be identical. However, for the purposes of describing
> the
> > behaviour of libxl_domain_create_new/restore() it is useful to have a
> > specified invalid value for a domain id.
> >
> > This patch therefore moves the libxl definition from libxl_internal.h to
> > libxl.h and removes the internal definition from xl_utils.h. The
> hardcoded
> > '-1' passed back via domcreate_complete() is then updated to
> INVALID_DOMID
> > and comment above libxl_domain_create_new/restore() is accordingly
> > modified.
> 
> Urg, it's kind of ugly to add another definition of invalid domid when
> there's already DOMID_INVALID in the public headers. I guess there's a
> reason I'm missing for not using DOMID_INVALID instead of introducing
> a new value?
> 

TBH I don't know. As far as xl/libxl goes, domids are uint32_ts so maybe 
DOMID_INVALID was for some reason not considered appropriate? Bear in mind, 
this patch is not truly introducing a new value; it's just making something 
that was internal but identical in both xl and libxl part of the interface.

> If so could this be mentioned in the commit message?
> 

I'll add a note to the commit comment to point out that this is not changing 
any value used by the toolstack.

  Paul

> Thanks, Roger.
> 
> ___
> Xen-devel mailing list
> Xen-devel@lists.xenproject.org
> https://lists.xenproject.org/mailman/listinfo/xen-devel
___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

Re: [Xen-devel] [PATCH v4 4/7] libxl: add infrastructure to track and query 'recent' domids

2020-01-31 Thread Durrant, Paul


> -Original Message-
> From: Anthony PERARD 
> Sent: 31 January 2020 10:55
> To: Durrant, Paul 
> Cc: xen-devel@lists.xenproject.org; Ian Jackson
> ; Wei Liu 
> Subject: Re: [PATCH v4 4/7] libxl: add infrastructure to track and query
> 'recent' domids
> 
> On Wed, Jan 22, 2020 at 02:44:43PM +, Paul Durrant wrote:
> > A domid is considered recent if the domain it represents was destroyed
> > less than a specified number of seconds ago. The number can be set using
> > the environment variable LIBXL_DOMID_REUSE_TIMEOUT. If the variable does
> > not exist then a default value of 60s is used.
> 
> Could you rewrite that part of the commit message? By reading it, it
> sounds to me like LIBXL_DOMID_REUSE_TIMEOUT is a configuration variable
> that a toolstack may want to set. Whereas the comment in
> libxl_internal.h indicates that it's for debuging.  Having env var for
> debugging sounds good, but having env var as configuration doesn't.
> 

Sure, I'll make the commit comment clearer.

  Paul

___
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

<    1   2   3   4   5   >