RE: [PATCH v2 5/8] xen/events: link interdomain events to associated xenbus device

2021-02-11 Thread Paul Durrant
> -Original Message-
> From: Juergen Gross 
> Sent: 11 February 2021 10:16
> To: xen-de...@lists.xenproject.org; linux-bl...@vger.kernel.org; 
> linux-kernel@vger.kernel.org;
> net...@vger.kernel.org; linux-s...@vger.kernel.org
> Cc: Juergen Gross ; Konrad Rzeszutek Wilk 
> ; Roger Pau Monné
> ; Jens Axboe ; Wei Liu 
> ; Paul Durrant
> ; David S. Miller ; Jakub Kicinski 
> ; Boris
> Ostrovsky ; Stefano Stabellini 
> 
> Subject: [PATCH v2 5/8] xen/events: link interdomain events to associated 
> xenbus device
> 
> In order to support the possibility of per-device event channel
> settings (e.g. lateeoi spurious event thresholds), add a xenbus device
> pointer to struct irq_info and modify the related event channel
> binding interfaces to take a pointer to the xenbus device as a
> parameter instead of the domain id of the other side.
> 
> While at it, remove the stale prototype of bind_evtchn_to_irq_lateeoi().
> 
> Signed-off-by: Juergen Gross 
> Reviewed-by: Boris Ostrovsky 
> Reviewed-by: Wei Liu 

Reviewed-by: Paul Durrant 



RE: [PATCH v2 4/8] xen/netback: fix spurious event detection for common event case

2021-02-11 Thread Paul Durrant
> -Original Message-
> From: Juergen Gross 
> Sent: 11 February 2021 10:16
> To: xen-de...@lists.xenproject.org; net...@vger.kernel.org; 
> linux-kernel@vger.kernel.org
> Cc: Juergen Gross ; Wei Liu ; Paul 
> Durrant ; David
> S. Miller ; Jakub Kicinski 
> Subject: [PATCH v2 4/8] xen/netback: fix spurious event detection for common 
> event case
> 
> In case of a common event for the rx and tx queue, the event should be
> regarded as spurious if no rx and no tx requests are pending.
> 
> Unfortunately the condition testing for that is wrong, causing an
> event to be treated as spurious if no rx OR no tx requests are
> pending.
> 
> Fix that, and use local variables for the rx/tx pending indicators in
> order to split the function calls out of the if condition.
> 

Definitely neater.

> Fixes: 23025393dbeb3b ("xen/netback: use lateeoi irq binding")
> Signed-off-by: Juergen Gross 

Reviewed-by: Paul Durrant 



[PATCH v3] xen-blkback: fix compatibility bug with single page rings

2021-02-02 Thread Paul Durrant
From: Paul Durrant 

Prior to commit 4a8c31a1c6f5 ("xen/blkback: rework connect_ring() to avoid
inconsistent xenstore 'ring-page-order' set by malicious blkfront"), the
behaviour of xen-blkback when connecting to a frontend was:

- read 'ring-page-order'
- if not present then expect a single page ring specified by 'ring-ref'
- else expect a ring specified by 'ring-refX' where X is between 0 and
  1 << ring-page-order

This was correct behaviour, but was broken by the aforementioned commit to
become:

- read 'ring-page-order'
- if not present then expect a single page ring (i.e. ring-page-order = 0)
- expect a ring specified by 'ring-refX' where X is between 0 and
  1 << ring-page-order
- if that didn't work then see if there's a single page ring specified by
  'ring-ref'

This incorrect behaviour works most of the time but fails when a frontend
that sets 'ring-page-order' is unloaded and replaced by one that does not,
because, instead of reading 'ring-ref', xen-blkback will read the stale
'ring-ref0' left around by the previous frontend and try to map the wrong
grant reference.

This patch restores the original behaviour.

Fixes: 4a8c31a1c6f5 ("xen/blkback: rework connect_ring() to avoid inconsistent 
xenstore 'ring-page-order' set by malicious blkfront")
Signed-off-by: Paul Durrant 
Reviewed-by: Dongli Zhang 
Reviewed-by: "Roger Pau Monné" 
---
Cc: Konrad Rzeszutek Wilk 
Cc: Jens Axboe 

v3:
 - Whitespace fix

v2:
 - Remove now-spurious error path special-case when nr_grefs == 1
---
 drivers/block/xen-blkback/common.h |  1 +
 drivers/block/xen-blkback/xenbus.c | 38 +-
 2 files changed, 17 insertions(+), 22 deletions(-)

diff --git a/drivers/block/xen-blkback/common.h 
b/drivers/block/xen-blkback/common.h
index b0c71d3a81a0..bda5c815e441 100644
--- a/drivers/block/xen-blkback/common.h
+++ b/drivers/block/xen-blkback/common.h
@@ -313,6 +313,7 @@ struct xen_blkif {
 
struct work_struct  free_work;
unsigned intnr_ring_pages;
+   boolmulti_ref;
/* All rings for this device. */
struct xen_blkif_ring   *rings;
unsigned intnr_rings;
diff --git a/drivers/block/xen-blkback/xenbus.c 
b/drivers/block/xen-blkback/xenbus.c
index 9860d4842f36..6c5e9373e91c 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -998,14 +998,17 @@ static int read_per_ring_refs(struct xen_blkif_ring 
*ring, const char *dir)
for (i = 0; i < nr_grefs; i++) {
char ring_ref_name[RINGREF_NAME_LEN];
 
-   snprintf(ring_ref_name, RINGREF_NAME_LEN, "ring-ref%u", i);
+   if (blkif->multi_ref)
+   snprintf(ring_ref_name, RINGREF_NAME_LEN, "ring-ref%u", 
i);
+   else {
+   WARN_ON(i != 0);
+   snprintf(ring_ref_name, RINGREF_NAME_LEN, "ring-ref");
+   }
+
err = xenbus_scanf(XBT_NIL, dir, ring_ref_name,
   "%u", _ref[i]);
 
if (err != 1) {
-   if (nr_grefs == 1)
-   break;
-
err = -EINVAL;
xenbus_dev_fatal(dev, err, "reading %s/%s",
 dir, ring_ref_name);
@@ -1013,18 +1016,6 @@ static int read_per_ring_refs(struct xen_blkif_ring 
*ring, const char *dir)
}
}
 
-   if (err != 1) {
-   WARN_ON(nr_grefs != 1);
-
-   err = xenbus_scanf(XBT_NIL, dir, "ring-ref", "%u",
-  _ref[0]);
-   if (err != 1) {
-   err = -EINVAL;
-   xenbus_dev_fatal(dev, err, "reading %s/ring-ref", dir);
-   return err;
-   }
-   }
-
err = -ENOMEM;
for (i = 0; i < nr_grefs * XEN_BLKIF_REQS_PER_PAGE; i++) {
req = kzalloc(sizeof(*req), GFP_KERNEL);
@@ -1129,10 +1120,15 @@ static int connect_ring(struct backend_info *be)
 blkif->nr_rings, blkif->blk_protocol, protocol,
 blkif->vbd.feature_gnt_persistent ? "persistent grants" : "");
 
-   ring_page_order = xenbus_read_unsigned(dev->otherend,
-  "ring-page-order", 0);
-
-   if (ring_page_order > xen_blkif_max_ring_order) {
+   err = xenbus_scanf(XBT_NIL, dev->otherend, "ring-page-order", "%u",
+  _page_order);
+   if (err != 1) {
+   blkif->nr_ring_pages = 1;
+   blkif->multi_ref = false;
+   } else if (ring_page_order <= xen_blkif_max_ring_order) {
+   blkif->nr_ring_pages = 1 << ring_pag

RE: [PATCH v2] xen-blkback: fix compatibility bug with single page rings

2021-02-02 Thread Paul Durrant
> -Original Message-
> From: Roger Pau Monné 
> Sent: 02 February 2021 16:29
> To: Paul Durrant 
> Cc: xen-de...@lists.xenproject.org; linux-bl...@vger.kernel.org; 
> linux-kernel@vger.kernel.org; Paul
> Durrant ; Konrad Rzeszutek Wilk 
> ; Jens Axboe
> ; Dongli Zhang 
> Subject: Re: [PATCH v2] xen-blkback: fix compatibility bug with single page 
> rings
> 
> On Thu, Jan 28, 2021 at 01:04:41PM +0000, Paul Durrant wrote:
> > From: Paul Durrant 
> >
> > Prior to commit 4a8c31a1c6f5 ("xen/blkback: rework connect_ring() to avoid
> > inconsistent xenstore 'ring-page-order' set by malicious blkfront"), the
> > behaviour of xen-blkback when connecting to a frontend was:
> >
> > - read 'ring-page-order'
> > - if not present then expect a single page ring specified by 'ring-ref'
> > - else expect a ring specified by 'ring-refX' where X is between 0 and
> >   1 << ring-page-order
> >
> > This was correct behaviour, but was broken by the aforementioned commit to
> > become:
> >
> > - read 'ring-page-order'
> > - if not present then expect a single page ring (i.e. ring-page-order = 0)
> > - expect a ring specified by 'ring-refX' where X is between 0 and
> >   1 << ring-page-order
> > - if that didn't work then see if there's a single page ring specified by
> >   'ring-ref'
> >
> > This incorrect behaviour works most of the time but fails when a frontend
> > that sets 'ring-page-order' is unloaded and replaced by one that does not,
> > because, instead of reading 'ring-ref', xen-blkback will read the stale
> > 'ring-ref0' left around by the previous frontend and try to map the wrong
> > grant reference.
> >
> > This patch restores the original behaviour.
> >
> > Fixes: 4a8c31a1c6f5 ("xen/blkback: rework connect_ring() to avoid
> > inconsistent xenstore 'ring-page-order' set by malicious blkfront")
> > Signed-off-by: Paul Durrant 
> > ---
> > Cc: Konrad Rzeszutek Wilk 
> > Cc: "Roger Pau Monné" 
> > Cc: Jens Axboe 
> > Cc: Dongli Zhang 
> >
> > v2:
> >  - Remove now-spurious error path special-case when nr_grefs == 1
> > ---
> >  drivers/block/xen-blkback/common.h |  1 +
> >  drivers/block/xen-blkback/xenbus.c | 38 +-
> >  2 files changed, 17 insertions(+), 22 deletions(-)
> >
> > diff --git a/drivers/block/xen-blkback/common.h 
> > b/drivers/block/xen-blkback/common.h
> > index b0c71d3a81a0..524a79f10de6 100644
> > --- a/drivers/block/xen-blkback/common.h
> > +++ b/drivers/block/xen-blkback/common.h
> > @@ -313,6 +313,7 @@ struct xen_blkif {
> >
> > struct work_struct  free_work;
> > unsigned intnr_ring_pages;
> > +   boolmulti_ref;
> 
> You seem to have used spaces between the type and the variable name
> here, while neighbors also use hard tabs.
> 

Oops. Xen vs. Linux coding style :-( I'll send a v3 with the whitespace fixed.

> The rest LGTM:
> 
> Reviewed-by: Roger Pau Monné 
> 
> We should have forbidden the usage of ring-page-order = 0 and we could
> have avoided having to add the multi_ref variable, but that's too late
> now.

Thanks. Yes, that cat is out of the bag and has been for a while unfortunately.

  Paul

> 
> Thanks, Roger.



RE: [PATCH v2] xen-blkback: fix compatibility bug with single page rings

2021-02-02 Thread Paul Durrant
> -Original Message-
> From: Xen-devel  On Behalf Of Dongli 
> Zhang
> Sent: 30 January 2021 05:09
> To: p...@xen.org; 'Jürgen Groß' ; 
> xen-de...@lists.xenproject.org; linux-
> bl...@vger.kernel.org; linux-kernel@vger.kernel.org
> Cc: 'Paul Durrant' ; 'Konrad Rzeszutek Wilk' 
> ; 'Roger Pau
> Monné' ; 'Jens Axboe' 
> Subject: Re: [PATCH v2] xen-blkback: fix compatibility bug with single page 
> rings
> 
> 
> 
> On 1/29/21 12:13 AM, Paul Durrant wrote:
> >> -Original Message-
> >> From: Jürgen Groß 
> >> Sent: 29 January 2021 07:35
> >> To: Dongli Zhang ; Paul Durrant ; 
> >> xen-
> >> de...@lists.xenproject.org; linux-bl...@vger.kernel.org; 
> >> linux-kernel@vger.kernel.org
> >> Cc: Paul Durrant ; Konrad Rzeszutek Wilk 
> >> ; Roger Pau
> >> Monné ; Jens Axboe 
> >> Subject: Re: [PATCH v2] xen-blkback: fix compatibility bug with single 
> >> page rings
> >>
> >> On 29.01.21 07:20, Dongli Zhang wrote:
> >>>
> >>>
> >>> On 1/28/21 5:04 AM, Paul Durrant wrote:
> >>>> From: Paul Durrant 
> >>>>
> >>>> Prior to commit 4a8c31a1c6f5 ("xen/blkback: rework connect_ring() to 
> >>>> avoid
> >>>> inconsistent xenstore 'ring-page-order' set by malicious blkfront"), the
> >>>> behaviour of xen-blkback when connecting to a frontend was:
> >>>>
> >>>> - read 'ring-page-order'
> >>>> - if not present then expect a single page ring specified by 'ring-ref'
> >>>> - else expect a ring specified by 'ring-refX' where X is between 0 and
> >>>>1 << ring-page-order
> >>>>
> >>>> This was correct behaviour, but was broken by the aforementioned commit
> >>>> to
> >>>> become:
> >>>>
> >>>> - read 'ring-page-order'
> >>>> - if not present then expect a single page ring (i.e. ring-page-order = 
> >>>> 0)
> >>>> - expect a ring specified by 'ring-refX' where X is between 0 and
> >>>>1 << ring-page-order
> >>>> - if that didn't work then see if there's a single page ring specified by
> >>>>'ring-ref'
> >>>>
> >>>> This incorrect behaviour works most of the time but fails when a frontend
> >>>> that sets 'ring-page-order' is unloaded and replaced by one that does not,
> >>>> because, instead of reading 'ring-ref', xen-blkback will read the stale
> >>>> 'ring-ref0' left around by the previous frontend and try to map the wrong
> >>>> grant reference.
> >>>>
> >>>> This patch restores the original behaviour.
> >>>>
> >>>> Fixes: 4a8c31a1c6f5 ("xen/blkback: rework connect_ring() to avoid
> >>>> inconsistent xenstore 'ring-page-order' set by malicious blkfront")
> >>>> Signed-off-by: Paul Durrant 
> >>>> ---
> >>>> Cc: Konrad Rzeszutek Wilk 
> >>>> Cc: "Roger Pau Monné" 
> >>>> Cc: Jens Axboe 
> >>>> Cc: Dongli Zhang 
> >>>>
> >>>> v2:
> >>>>   - Remove now-spurious error path special-case when nr_grefs == 1
> >>>> ---
> >>>>   drivers/block/xen-blkback/common.h |  1 +
> >>>>   drivers/block/xen-blkback/xenbus.c | 38 +-
> >>>>   2 files changed, 17 insertions(+), 22 deletions(-)
> >>>>
> >>>> diff --git a/drivers/block/xen-blkback/common.h 
> >>>> b/drivers/block/xen-blkback/common.h
> >>>> index b0c71d3a81a0..524a79f10de6 100644
> >>>> --- a/drivers/block/xen-blkback/common.h
> >>>> +++ b/drivers/block/xen-blkback/common.h
> >>>> @@ -313,6 +313,7 @@ struct xen_blkif {
> >>>>
> >>>>  struct work_struct  free_work;
> >>>>  unsigned intnr_ring_pages;
> >>>> +boolmulti_ref;
> >>>
> >>> Is it really necessary to introduce 'multi_ref' here or we may just re-use
> >>> 'nr_ring_pages'?
> >>>
> >>> According to blkfront code, 'ring-page-order' is set only when it is not 
> >>> zero,
> >>> that is, only when (info->nr_ring_pages > 1).
> >>
> >
> > That's how it is *supposed* to be. Windows certainly behaves that way too.
> >
> >> Did you look into all other OS's (Windows, OpenBSD, FreebSD, NetBSD,
> >> Solaris, Netware, other proprietary systems) implementations to verify
> >> that claim?
> >>
> >> I don't think so. So better safe than sorry.
> >>
> >
> > Indeed. It was unfortunate that the commit to blkif.h documenting
> > multi-page (829f2a9c6dfae) was not crystal clear and (possibly as a
> > consequence) blkback was implemented to read 'ring-ref0' rather than
> > 'ring-ref' if 'ring-page-order' was present and 0. Hence the only safe
> > thing to do is to restore that behaviour.
> >
> 
> Thank you very much for the explanation!
> 
> Reviewed-by: Dongli Zhang 
> 

Thanks.

Roger, Konrad, can I get a maintainer ack or otherwise, please?

  Paul




[PATCH v2] xen-blkback: fix compatibility bug with single page rings

2021-01-28 Thread Paul Durrant
From: Paul Durrant 

Prior to commit 4a8c31a1c6f5 ("xen/blkback: rework connect_ring() to avoid
inconsistent xenstore 'ring-page-order' set by malicious blkfront"), the
behaviour of xen-blkback when connecting to a frontend was:

- read 'ring-page-order'
- if not present then expect a single page ring specified by 'ring-ref'
- else expect a ring specified by 'ring-refX' where X is between 0 and
  1 << ring-page-order

This was correct behaviour, but was broken by the aforementioned commit to
become:

- read 'ring-page-order'
- if not present then expect a single page ring (i.e. ring-page-order = 0)
- expect a ring specified by 'ring-refX' where X is between 0 and
  1 << ring-page-order
- if that didn't work then see if there's a single page ring specified by
  'ring-ref'

This incorrect behaviour works most of the time but fails when a frontend
that sets 'ring-page-order' is unloaded and replaced by one that does not,
because, instead of reading 'ring-ref', xen-blkback will read the stale
'ring-ref0' left around by the previous frontend and try to map the wrong
grant reference.

This patch restores the original behaviour.

Fixes: 4a8c31a1c6f5 ("xen/blkback: rework connect_ring() to avoid inconsistent 
xenstore 'ring-page-order' set by malicious blkfront")
Signed-off-by: Paul Durrant 
---
Cc: Konrad Rzeszutek Wilk 
Cc: "Roger Pau Monné" 
Cc: Jens Axboe 
Cc: Dongli Zhang 

v2:
 - Remove now-spurious error path special-case when nr_grefs == 1
---
 drivers/block/xen-blkback/common.h |  1 +
 drivers/block/xen-blkback/xenbus.c | 38 +-
 2 files changed, 17 insertions(+), 22 deletions(-)

diff --git a/drivers/block/xen-blkback/common.h 
b/drivers/block/xen-blkback/common.h
index b0c71d3a81a0..524a79f10de6 100644
--- a/drivers/block/xen-blkback/common.h
+++ b/drivers/block/xen-blkback/common.h
@@ -313,6 +313,7 @@ struct xen_blkif {
 
struct work_struct  free_work;
unsigned intnr_ring_pages;
+   boolmulti_ref;
/* All rings for this device. */
struct xen_blkif_ring   *rings;
unsigned intnr_rings;
diff --git a/drivers/block/xen-blkback/xenbus.c 
b/drivers/block/xen-blkback/xenbus.c
index 9860d4842f36..6c5e9373e91c 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -998,14 +998,17 @@ static int read_per_ring_refs(struct xen_blkif_ring 
*ring, const char *dir)
for (i = 0; i < nr_grefs; i++) {
char ring_ref_name[RINGREF_NAME_LEN];
 
-   snprintf(ring_ref_name, RINGREF_NAME_LEN, "ring-ref%u", i);
+   if (blkif->multi_ref)
+   snprintf(ring_ref_name, RINGREF_NAME_LEN, "ring-ref%u", 
i);
+   else {
+   WARN_ON(i != 0);
+   snprintf(ring_ref_name, RINGREF_NAME_LEN, "ring-ref");
+   }
+
err = xenbus_scanf(XBT_NIL, dir, ring_ref_name,
   "%u", _ref[i]);
 
if (err != 1) {
-   if (nr_grefs == 1)
-   break;
-
err = -EINVAL;
xenbus_dev_fatal(dev, err, "reading %s/%s",
 dir, ring_ref_name);
@@ -1013,18 +1016,6 @@ static int read_per_ring_refs(struct xen_blkif_ring 
*ring, const char *dir)
}
}
 
-   if (err != 1) {
-   WARN_ON(nr_grefs != 1);
-
-   err = xenbus_scanf(XBT_NIL, dir, "ring-ref", "%u",
-  _ref[0]);
-   if (err != 1) {
-   err = -EINVAL;
-   xenbus_dev_fatal(dev, err, "reading %s/ring-ref", dir);
-   return err;
-   }
-   }
-
err = -ENOMEM;
for (i = 0; i < nr_grefs * XEN_BLKIF_REQS_PER_PAGE; i++) {
req = kzalloc(sizeof(*req), GFP_KERNEL);
@@ -1129,10 +1120,15 @@ static int connect_ring(struct backend_info *be)
 blkif->nr_rings, blkif->blk_protocol, protocol,
 blkif->vbd.feature_gnt_persistent ? "persistent grants" : "");
 
-   ring_page_order = xenbus_read_unsigned(dev->otherend,
-  "ring-page-order", 0);
-
-   if (ring_page_order > xen_blkif_max_ring_order) {
+   err = xenbus_scanf(XBT_NIL, dev->otherend, "ring-page-order", "%u",
+  _page_order);
+   if (err != 1) {
+   blkif->nr_ring_pages = 1;
+   blkif->multi_ref = false;
+   } else if (ring_page_order <= xen_blkif_max_ring_order) {
+   blkif->nr_ring_pages = 1 << ring_page_order;
+   blkif->

RE: [PATCH] xen-blkback: fix compatibility bug with single page rings

2021-01-28 Thread Paul Durrant
Apologies; I missed the v2 from the subject line. I'll re-send.

  Paul

> -Original Message-
> From: Xen-devel  On Behalf Of Paul 
> Durrant
> Sent: 28 January 2021 12:55
> To: xen-de...@lists.xenproject.org; linux-bl...@vger.kernel.org; 
> linux-kernel@vger.kernel.org
> Cc: Paul Durrant ; Konrad Rzeszutek Wilk 
> ; Roger Pau
> Monné ; Jens Axboe ; Dongli Zhang 
> 
> Subject: [PATCH] xen-blkback: fix compatibility bug with single page rings
> 
> From: Paul Durrant 
> 
> Prior to commit 4a8c31a1c6f5 ("xen/blkback: rework connect_ring() to avoid
> inconsistent xenstore 'ring-page-order' set by malicious blkfront"), the
> behaviour of xen-blkback when connecting to a frontend was:
> 
> - read 'ring-page-order'
> - if not present then expect a single page ring specified by 'ring-ref'
> - else expect a ring specified by 'ring-refX' where X is between 0 and
>   1 << ring-page-order
> 
> This was correct behaviour, but was broken by the aforementioned commit to
> become:
> 
> - read 'ring-page-order'
> - if not present then expect a single page ring (i.e. ring-page-order = 0)
> - expect a ring specified by 'ring-refX' where X is between 0 and
>   1 << ring-page-order
> - if that didn't work then see if there's a single page ring specified by
>   'ring-ref'
> 
> This incorrect behaviour works most of the time but fails when a frontend
> that sets 'ring-page-order' is unloaded and replaced by one that does not,
> because, instead of reading 'ring-ref', xen-blkback will read the stale
> 'ring-ref0' left around by the previous frontend and try to map the wrong
> grant reference.
> 
> This patch restores the original behaviour.
> 
> Fixes: 4a8c31a1c6f5 ("xen/blkback: rework connect_ring() to avoid
> inconsistent xenstore 'ring-page-order' set by malicious blkfront")
> Signed-off-by: Paul Durrant 
> ---
> Cc: Konrad Rzeszutek Wilk 
> Cc: "Roger Pau Monné" 
> Cc: Jens Axboe 
> Cc: Dongli Zhang 
> 
> v2:
>  - Remove now-spurious error path special-case when nr_grefs == 1
> ---
>  drivers/block/xen-blkback/common.h |  1 +
>  drivers/block/xen-blkback/xenbus.c | 38 +-
>  2 files changed, 17 insertions(+), 22 deletions(-)
> 
> diff --git a/drivers/block/xen-blkback/common.h 
> b/drivers/block/xen-blkback/common.h
> index b0c71d3a81a0..524a79f10de6 100644
> --- a/drivers/block/xen-blkback/common.h
> +++ b/drivers/block/xen-blkback/common.h
> @@ -313,6 +313,7 @@ struct xen_blkif {
> 
>   struct work_struct  free_work;
>   unsigned intnr_ring_pages;
> + boolmulti_ref;
>   /* All rings for this device. */
>   struct xen_blkif_ring   *rings;
>   unsigned intnr_rings;
> diff --git a/drivers/block/xen-blkback/xenbus.c 
> b/drivers/block/xen-blkback/xenbus.c
> index 9860d4842f36..6c5e9373e91c 100644
> --- a/drivers/block/xen-blkback/xenbus.c
> +++ b/drivers/block/xen-blkback/xenbus.c
> @@ -998,14 +998,17 @@ static int read_per_ring_refs(struct xen_blkif_ring 
> *ring, const char *dir)
>   for (i = 0; i < nr_grefs; i++) {
>   char ring_ref_name[RINGREF_NAME_LEN];
> 
> - snprintf(ring_ref_name, RINGREF_NAME_LEN, "ring-ref%u", i);
> + if (blkif->multi_ref)
> + snprintf(ring_ref_name, RINGREF_NAME_LEN, "ring-ref%u", 
> i);
> + else {
> + WARN_ON(i != 0);
> + snprintf(ring_ref_name, RINGREF_NAME_LEN, "ring-ref");
> + }
> +
>   err = xenbus_scanf(XBT_NIL, dir, ring_ref_name,
>  "%u", _ref[i]);
> 
>   if (err != 1) {
> - if (nr_grefs == 1)
> - break;
> -
>   err = -EINVAL;
>   xenbus_dev_fatal(dev, err, "reading %s/%s",
>dir, ring_ref_name);
> @@ -1013,18 +1016,6 @@ static int read_per_ring_refs(struct xen_blkif_ring 
> *ring, const char *dir)
>   }
>   }
> 
> - if (err != 1) {
> - WARN_ON(nr_grefs != 1);
> -
> - err = xenbus_scanf(XBT_NIL, dir, "ring-ref", "%u",
> -_ref[0]);
> - if (err != 1) {
> - err = -EINVAL;
> - xenbus_dev_fatal(dev, err, "reading %s/ring-ref", dir);
> - return err;
> - }
> - }
> -
>   err = -ENOMEM;
>   for (i = 0; i < nr_grefs * XEN_BLKIF_REQS

[PATCH] xen-blkback: fix compatibility bug with single page rings

2021-01-28 Thread Paul Durrant
From: Paul Durrant 

Prior to commit 4a8c31a1c6f5 ("xen/blkback: rework connect_ring() to avoid
inconsistent xenstore 'ring-page-order' set by malicious blkfront"), the
behaviour of xen-blkback when connecting to a frontend was:

- read 'ring-page-order'
- if not present then expect a single page ring specified by 'ring-ref'
- else expect a ring specified by 'ring-refX' where X is between 0 and
  1 << ring-page-order

This was correct behaviour, but was broken by the aforementioned commit to
become:

- read 'ring-page-order'
- if not present then expect a single page ring (i.e. ring-page-order = 0)
- expect a ring specified by 'ring-refX' where X is between 0 and
  1 << ring-page-order
- if that didn't work then see if there's a single page ring specified by
  'ring-ref'

This incorrect behaviour works most of the time but fails when a frontend
that sets 'ring-page-order' is unloaded and replaced by one that does not,
because, instead of reading 'ring-ref', xen-blkback will read the stale
'ring-ref0' left around by the previous frontend and try to map the wrong
grant reference.

This patch restores the original behaviour.

Fixes: 4a8c31a1c6f5 ("xen/blkback: rework connect_ring() to avoid inconsistent 
xenstore 'ring-page-order' set by malicious blkfront")
Signed-off-by: Paul Durrant 
---
Cc: Konrad Rzeszutek Wilk 
Cc: "Roger Pau Monné" 
Cc: Jens Axboe 
Cc: Dongli Zhang 

v2:
 - Remove now-spurious error path special-case when nr_grefs == 1
---
 drivers/block/xen-blkback/common.h |  1 +
 drivers/block/xen-blkback/xenbus.c | 38 +-
 2 files changed, 17 insertions(+), 22 deletions(-)

diff --git a/drivers/block/xen-blkback/common.h 
b/drivers/block/xen-blkback/common.h
index b0c71d3a81a0..524a79f10de6 100644
--- a/drivers/block/xen-blkback/common.h
+++ b/drivers/block/xen-blkback/common.h
@@ -313,6 +313,7 @@ struct xen_blkif {
 
struct work_struct  free_work;
unsigned intnr_ring_pages;
+   boolmulti_ref;
/* All rings for this device. */
struct xen_blkif_ring   *rings;
unsigned intnr_rings;
diff --git a/drivers/block/xen-blkback/xenbus.c 
b/drivers/block/xen-blkback/xenbus.c
index 9860d4842f36..6c5e9373e91c 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -998,14 +998,17 @@ static int read_per_ring_refs(struct xen_blkif_ring 
*ring, const char *dir)
for (i = 0; i < nr_grefs; i++) {
char ring_ref_name[RINGREF_NAME_LEN];
 
-   snprintf(ring_ref_name, RINGREF_NAME_LEN, "ring-ref%u", i);
+   if (blkif->multi_ref)
+   snprintf(ring_ref_name, RINGREF_NAME_LEN, "ring-ref%u", 
i);
+   else {
+   WARN_ON(i != 0);
+   snprintf(ring_ref_name, RINGREF_NAME_LEN, "ring-ref");
+   }
+
err = xenbus_scanf(XBT_NIL, dir, ring_ref_name,
   "%u", _ref[i]);
 
if (err != 1) {
-   if (nr_grefs == 1)
-   break;
-
err = -EINVAL;
xenbus_dev_fatal(dev, err, "reading %s/%s",
 dir, ring_ref_name);
@@ -1013,18 +1016,6 @@ static int read_per_ring_refs(struct xen_blkif_ring 
*ring, const char *dir)
}
}
 
-   if (err != 1) {
-   WARN_ON(nr_grefs != 1);
-
-   err = xenbus_scanf(XBT_NIL, dir, "ring-ref", "%u",
-  _ref[0]);
-   if (err != 1) {
-   err = -EINVAL;
-   xenbus_dev_fatal(dev, err, "reading %s/ring-ref", dir);
-   return err;
-   }
-   }
-
err = -ENOMEM;
for (i = 0; i < nr_grefs * XEN_BLKIF_REQS_PER_PAGE; i++) {
req = kzalloc(sizeof(*req), GFP_KERNEL);
@@ -1129,10 +1120,15 @@ static int connect_ring(struct backend_info *be)
 blkif->nr_rings, blkif->blk_protocol, protocol,
 blkif->vbd.feature_gnt_persistent ? "persistent grants" : "");
 
-   ring_page_order = xenbus_read_unsigned(dev->otherend,
-  "ring-page-order", 0);
-
-   if (ring_page_order > xen_blkif_max_ring_order) {
+   err = xenbus_scanf(XBT_NIL, dev->otherend, "ring-page-order", "%u",
+  _page_order);
+   if (err != 1) {
+   blkif->nr_ring_pages = 1;
+   blkif->multi_ref = false;
+   } else if (ring_page_order <= xen_blkif_max_ring_order) {
+   blkif->nr_ring_pages = 1 << ring_page_order;
+   blkif->

RE: [PATCH] xen-blkback: fix compatibility bug with single page rings

2021-01-28 Thread Paul Durrant
> -Original Message-
> From: Xen-devel  On Behalf Of Dongli 
> Zhang
> Sent: 27 January 2021 19:57
> To: Paul Durrant ; xen-de...@lists.xenproject.org; 
> linux-bl...@vger.kernel.org; linux-
> ker...@vger.kernel.org
> Cc: Paul Durrant ; Konrad Rzeszutek Wilk 
> ; Roger Pau
> Monné ; Jens Axboe 
> Subject: Re: [PATCH] xen-blkback: fix compatibility bug with single page rings
> 
> 
> 
> On 1/27/21 2:30 AM, Paul Durrant wrote:
> > From: Paul Durrant 
> >
> > Prior to commit 4a8c31a1c6f5 ("xen/blkback: rework connect_ring() to avoid
> > inconsistent xenstore 'ring-page-order' set by malicious blkfront"), the
> > behaviour of xen-blkback when connecting to a frontend was:
> >
> > - read 'ring-page-order'
> > - if not present then expect a single page ring specified by 'ring-ref'
> > - else expect a ring specified by 'ring-refX' where X is between 0 and
> >   1 << ring-page-order
> >
> > This was correct behaviour, but was broken by the aforementioned commit to
> > become:
> >
> > - read 'ring-page-order'
> > - if not present then expect a single page ring
> > - expect a ring specified by 'ring-refX' where X is between 0 and
> >   1 << ring-page-order
> > - if that didn't work then see if there's a single page ring specified by
> >   'ring-ref'
> >
> > This incorrect behaviour works most of the time but fails when a frontend
> > that sets 'ring-page-order' is unloaded and replaced by one that does not,
> > because, instead of reading 'ring-ref', xen-blkback will read the stale
> > 'ring-ref0' left around by the previous frontend and try to map the wrong
> > grant reference.
> >
> > This patch restores the original behaviour.
> >
> > Fixes: 4a8c31a1c6f5 ("xen/blkback: rework connect_ring() to avoid
> > inconsistent xenstore 'ring-page-order' set by malicious blkfront")
> > Signed-off-by: Paul Durrant 
> > ---
> > Cc: Konrad Rzeszutek Wilk 
> > Cc: "Roger Pau Monné" 
> > Cc: Jens Axboe 
> > Cc: Dongli Zhang 
> > ---
> >  drivers/block/xen-blkback/common.h |  1 +
> >  drivers/block/xen-blkback/xenbus.c | 36 +-
> >  2 files changed, 17 insertions(+), 20 deletions(-)
> >
> > diff --git a/drivers/block/xen-blkback/common.h 
> > b/drivers/block/xen-blkback/common.h
> > index b0c71d3a81a0..524a79f10de6 100644
> > --- a/drivers/block/xen-blkback/common.h
> > +++ b/drivers/block/xen-blkback/common.h
> > @@ -313,6 +313,7 @@ struct xen_blkif {
> >
> > struct work_struct  free_work;
> > unsigned intnr_ring_pages;
> > +   boolmulti_ref;
> > /* All rings for this device. */
> > struct xen_blkif_ring   *rings;
> > unsigned intnr_rings;
> > diff --git a/drivers/block/xen-blkback/xenbus.c 
> > b/drivers/block/xen-blkback/xenbus.c
> > index 9860d4842f36..4c1541cde68c 100644
> > --- a/drivers/block/xen-blkback/xenbus.c
> > +++ b/drivers/block/xen-blkback/xenbus.c
> > @@ -998,10 +998,15 @@ static int read_per_ring_refs(struct xen_blkif_ring 
> > *ring, const char *dir)
> > for (i = 0; i < nr_grefs; i++) {
> > char ring_ref_name[RINGREF_NAME_LEN];
> >
> > -   snprintf(ring_ref_name, RINGREF_NAME_LEN, "ring-ref%u", i);
> > +   if (blkif->multi_ref)
> > +   snprintf(ring_ref_name, RINGREF_NAME_LEN, "ring-ref%u", 
> > i);
> > +   else {
> > +   WARN_ON(i != 0);
> > +   snprintf(ring_ref_name, RINGREF_NAME_LEN, "ring-ref");
> > +   }
> > +
> > err = xenbus_scanf(XBT_NIL, dir, ring_ref_name,
> >"%u", _ref[i]);
> > -
> > if (err != 1) {
> > if (nr_grefs == 1)
> > break;
> 
> I think we should not simply break here, because the failure can also
> occur when (nr_grefs == 1) and we are reading from the legacy "ring-ref".
> 

Yes, you're quite right. This special case is no longer correct.

> Should we do something as below?
> 
> err = -EINVAL;
> xenbus_dev_fatal(dev, err, "reading %s/ring-ref", dir);
> return err;
> 

I think simply removing the 'if (nr_grefs == 1)' will be sufficient.

  Paul

> Dongli Zhang
> 
> 
> > @@ -1013,18 +1018,6 @@ static int read_per_ring_refs(struct xen_blkif_ring 
> > *ring, const char *dir)
> > }
> >  

RE: [PATCH] xen-blkback: fix compatibility bug with single page rings

2021-01-27 Thread Paul Durrant
> -Original Message-
> From: Jan Beulich 
> Sent: 27 January 2021 11:21
> To: p...@xen.org
> Cc: 'Paul Durrant' ; 'Konrad Rzeszutek Wilk' 
> ; 'Roger Pau
> Monné' ; 'Jens Axboe' ; 'Dongli Zhang'
> ; linux-kernel@vger.kernel.org; 
> linux-bl...@vger.kernel.org; xen-
> de...@lists.xenproject.org
> Subject: Re: [PATCH] xen-blkback: fix compatibility bug with single page rings
> 
> On 27.01.2021 12:09, Paul Durrant wrote:
> >> -Original Message-
> >> From: Jan Beulich 
> >> Sent: 27 January 2021 10:57
> >> To: Paul Durrant 
> >> Cc: Paul Durrant ; Konrad Rzeszutek Wilk 
> >> ; Roger Pau
> >> Monné ; Jens Axboe ; Dongli Zhang 
> >> ;
> >> linux-kernel@vger.kernel.org; linux-bl...@vger.kernel.org; 
> >> xen-de...@lists.xenproject.org
> >> Subject: Re: [PATCH] xen-blkback: fix compatibility bug with single page 
> >> rings
> >>
> >> On 27.01.2021 11:30, Paul Durrant wrote:
> >>> From: Paul Durrant 
> >>>
> >>> Prior to commit 4a8c31a1c6f5 ("xen/blkback: rework connect_ring() to avoid
> >>> inconsistent xenstore 'ring-page-order' set by malicious blkfront"), the
> >>> behaviour of xen-blkback when connecting to a frontend was:
> >>>
> >>> - read 'ring-page-order'
> >>> - if not present then expect a single page ring specified by 'ring-ref'
> >>> - else expect a ring specified by 'ring-refX' where X is between 0 and
> >>>   1 << ring-page-order
> >>>
> >>> This was correct behaviour, but was broken by the aforementioned commit
> >>> to
> >>> become:
> >>>
> >>> - read 'ring-page-order'
> >>> - if not present then expect a single page ring
> >>> - expect a ring specified by 'ring-refX' where X is between 0 and
> >>>   1 << ring-page-order
> >>> - if that didn't work then see if there's a single page ring specified by
> >>>   'ring-ref'
> >>>
> >>> This incorrect behaviour works most of the time but fails when a frontend
> >>> that sets 'ring-page-order' is unloaded and replaced by one that does not
> >>> because, instead of reading 'ring-ref', xen-blkback will read the stale
> >>> 'ring-ref0' left around by the previous frontend and will try to map the wrong
> >>> grant reference.
> >>>
> >>> This patch restores the original behaviour.
> >>
> >> Isn't this only the 2nd of a pair of fixes that's needed, the
> >> first being the drivers, upon being unloaded, to fully clean up
> >> after itself? Any stale key left may lead to confusion upon
> >> re-use of the containing directory.
> >
> > In a backend we shouldn't be relying on, nor really expect IMO, a frontend 
> > to clean up after itself.
> > Any backend should know *exactly* what xenstore nodes it’s looking for from a
> > frontend.
> 
> But the backend can't know whether a node exists because the present
> frontend has written it, or because an earlier instance forgot to
> delete it. It can only honor what's there. (In fact the other day I
> was wondering whether some of the writes of boolean "false" nodes
> wouldn't better be xenbus_rm() instead.)

In the particular case this patch is fixing for me, the frontends are the 
Windows XENVBD driver and the Windows crash version of the same driver 
(actually built from different code). The 'normal' instance is multi-page aware 
and the crash instance is not quite, i.e. it uses the old ring-ref but knows to 
clean up 'ring-page-order'.
Clearly, in a crash situation, we cannot rely on the frontend to clean up, so 
what you say does highlight that there indeed needs to be a second patch to 
xen-blkback to make sure it removes 'ring-page-order' itself as 'state' cycles 
through Closed and back to InitWait. I think this patch still stands on its 
own though.

  Paul

> 
> Jan



RE: [PATCH] xen-blkback: fix compatibility bug with single page rings

2021-01-27 Thread Paul Durrant
> -Original Message-
> From: Jan Beulich 
> Sent: 27 January 2021 10:57
> To: Paul Durrant 
> Cc: Paul Durrant ; Konrad Rzeszutek Wilk 
> ; Roger Pau
> Monné ; Jens Axboe ; Dongli Zhang 
> ;
> linux-kernel@vger.kernel.org; linux-bl...@vger.kernel.org; 
> xen-de...@lists.xenproject.org
> Subject: Re: [PATCH] xen-blkback: fix compatibility bug with single page rings
> 
> On 27.01.2021 11:30, Paul Durrant wrote:
> > From: Paul Durrant 
> >
> > Prior to commit 4a8c31a1c6f5 ("xen/blkback: rework connect_ring() to avoid
> > inconsistent xenstore 'ring-page-order' set by malicious blkfront"), the
> > behaviour of xen-blkback when connecting to a frontend was:
> >
> > - read 'ring-page-order'
> > - if not present then expect a single page ring specified by 'ring-ref'
> > - else expect a ring specified by 'ring-refX' where X is between 0 and
> >   1 << ring-page-order
> >
> > This was correct behaviour, but was broken by the aforementioned commit to
> > become:
> >
> > - read 'ring-page-order'
> > - if not present then expect a single page ring
> > - expect a ring specified by 'ring-refX' where X is between 0 and
> >   1 << ring-page-order
> > - if that didn't work then see if there's a single page ring specified by
> >   'ring-ref'
> >
> > This incorrect behaviour works most of the time but fails when a frontend
> > that sets 'ring-page-order' is unloaded and replaced by one that does not
> > because, instead of reading 'ring-ref', xen-blkback will read the stale
> > 'ring-ref0' left around by the previous frontend and will try to map the wrong
> > grant reference.
> >
> > This patch restores the original behaviour.
> 
> Isn't this only the 2nd of a pair of fixes that's needed, the
> first being the drivers, upon being unloaded, to fully clean up
> after itself? Any stale key left may lead to confusion upon
> re-use of the containing directory.

In a backend we shouldn't be relying on, nor really expect IMO, a frontend to 
clean up after itself. Any backend should know *exactly* what xenstore nodes 
it’s looking for from a frontend.

  Paul

> 
> Jan



[PATCH] xen-blkback: fix compatibility bug with single page rings

2021-01-27 Thread Paul Durrant
From: Paul Durrant 

Prior to commit 4a8c31a1c6f5 ("xen/blkback: rework connect_ring() to avoid
inconsistent xenstore 'ring-page-order' set by malicious blkfront"), the
behaviour of xen-blkback when connecting to a frontend was:

- read 'ring-page-order'
- if not present then expect a single page ring specified by 'ring-ref'
- else expect a ring specified by 'ring-refX' where X is between 0 and
  1 << ring-page-order

This was correct behaviour, but was broken by the aforementioned commit to
become:

- read 'ring-page-order'
- if not present then expect a single page ring
- expect a ring specified by 'ring-refX' where X is between 0 and
  1 << ring-page-order
- if that didn't work then see if there's a single page ring specified by
  'ring-ref'

This incorrect behaviour works most of the time but fails when a frontend
that sets 'ring-page-order' is unloaded and replaced by one that does not
because, instead of reading 'ring-ref', xen-blkback will read the stale
'ring-ref0' left around by the previous frontend and will try to map the wrong
grant reference.

This patch restores the original behaviour.

Fixes: 4a8c31a1c6f5 ("xen/blkback: rework connect_ring() to avoid inconsistent xenstore 'ring-page-order' set by malicious blkfront")
Signed-off-by: Paul Durrant 
---
Cc: Konrad Rzeszutek Wilk 
Cc: "Roger Pau Monné" 
Cc: Jens Axboe 
Cc: Dongli Zhang 
---
 drivers/block/xen-blkback/common.h |  1 +
 drivers/block/xen-blkback/xenbus.c | 36 ++++++++++++++++--------------------
 2 files changed, 17 insertions(+), 20 deletions(-)

diff --git a/drivers/block/xen-blkback/common.h b/drivers/block/xen-blkback/common.h
index b0c71d3a81a0..524a79f10de6 100644
--- a/drivers/block/xen-blkback/common.h
+++ b/drivers/block/xen-blkback/common.h
@@ -313,6 +313,7 @@ struct xen_blkif {
 
struct work_struct  free_work;
unsigned intnr_ring_pages;
+   boolmulti_ref;
/* All rings for this device. */
struct xen_blkif_ring   *rings;
unsigned intnr_rings;
diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
index 9860d4842f36..4c1541cde68c 100644
--- a/drivers/block/xen-blkback/xenbus.c
+++ b/drivers/block/xen-blkback/xenbus.c
@@ -998,10 +998,15 @@ static int read_per_ring_refs(struct xen_blkif_ring *ring, const char *dir)
for (i = 0; i < nr_grefs; i++) {
char ring_ref_name[RINGREF_NAME_LEN];
 
-   snprintf(ring_ref_name, RINGREF_NAME_LEN, "ring-ref%u", i);
+   if (blkif->multi_ref)
+   snprintf(ring_ref_name, RINGREF_NAME_LEN, "ring-ref%u", i);
+   else {
+   WARN_ON(i != 0);
+   snprintf(ring_ref_name, RINGREF_NAME_LEN, "ring-ref");
+   }
+
err = xenbus_scanf(XBT_NIL, dir, ring_ref_name,
   "%u", &ring_ref[i]);
-
if (err != 1) {
if (nr_grefs == 1)
break;
@@ -1013,18 +1018,6 @@ static int read_per_ring_refs(struct xen_blkif_ring *ring, const char *dir)
}
}
 
-   if (err != 1) {
-   WARN_ON(nr_grefs != 1);
-
-   err = xenbus_scanf(XBT_NIL, dir, "ring-ref", "%u",
-  &ring_ref[0]);
-   if (err != 1) {
-   err = -EINVAL;
-   xenbus_dev_fatal(dev, err, "reading %s/ring-ref", dir);
-   return err;
-   }
-   }
-
err = -ENOMEM;
for (i = 0; i < nr_grefs * XEN_BLKIF_REQS_PER_PAGE; i++) {
req = kzalloc(sizeof(*req), GFP_KERNEL);
@@ -1129,10 +1122,15 @@ static int connect_ring(struct backend_info *be)
 blkif->nr_rings, blkif->blk_protocol, protocol,
 blkif->vbd.feature_gnt_persistent ? "persistent grants" : "");
 
-   ring_page_order = xenbus_read_unsigned(dev->otherend,
-  "ring-page-order", 0);
-
-   if (ring_page_order > xen_blkif_max_ring_order) {
+   err = xenbus_scanf(XBT_NIL, dev->otherend, "ring-page-order", "%u",
+  &ring_page_order);
+   if (err != 1) {
+   blkif->nr_ring_pages = 1;
+   blkif->multi_ref = false;
+   } else if (ring_page_order <= xen_blkif_max_ring_order) {
+   blkif->nr_ring_pages = 1 << ring_page_order;
+   blkif->multi_ref = true;
+   } else {
err = -EINVAL;
xenbus_dev_fatal(dev, err,
 "requested ring page order %d exceed max:%d",
@@ -1141,8 +1139,6 @@ static int connect_ring(struct backend_info *be)
return err;
}
 
-   blkif->nr_ring_pages = 1 << ring_page_order;
-
if (blkif->nr_rings == 1)
return read_per_ring_refs(&blkif->rings[0], dev->otherend);
else {
-- 
2.17.1



RE: [PATCH 2/7] net: xen-netback: xenbus: Demote nonconformant kernel-doc headers

2021-01-18 Thread Paul Durrant
> -Original Message-
> From: Lee Jones 
> Sent: 15 January 2021 20:09
> To: lee.jo...@linaro.org
> Cc: linux-kernel@vger.kernel.org; Wei Liu ; Paul Durrant 
> ; David S.
> Miller ; Jakub Kicinski ; Alexei 
> Starovoitov ;
> Daniel Borkmann ; Jesper Dangaard Brouer 
> ; John Fastabend
> ; Rusty Russell ; 
> xen-de...@lists.xenproject.org;
> net...@vger.kernel.org; b...@vger.kernel.org; Andrew Lunn 
> Subject: [PATCH 2/7] net: xen-netback: xenbus: Demote nonconformant 
> kernel-doc headers
> 
> Fixes the following W=1 kernel build warning(s):
> 
>  drivers/net/xen-netback/xenbus.c:419: warning: Function parameter or member 
> 'dev' not described in
> 'frontend_changed'
>  drivers/net/xen-netback/xenbus.c:419: warning: Function parameter or member 
> 'frontend_state' not
> described in 'frontend_changed'
>  drivers/net/xen-netback/xenbus.c:1001: warning: Function parameter or member 
> 'dev' not described in
> 'netback_probe'
>  drivers/net/xen-netback/xenbus.c:1001: warning: Function parameter or member 
> 'id' not described in
> 'netback_probe'
> 
> Cc: Wei Liu 
> Cc: Paul Durrant 
> Cc: "David S. Miller" 
> Cc: Jakub Kicinski 
> Cc: Alexei Starovoitov 
> Cc: Daniel Borkmann 
> Cc: Jesper Dangaard Brouer 
> Cc: John Fastabend 
> Cc: Rusty Russell 
> Cc: xen-de...@lists.xenproject.org
> Cc: net...@vger.kernel.org
> Cc: b...@vger.kernel.org
> Reviewed-by: Andrew Lunn 
> Signed-off-by: Lee Jones 

Reviewed-by: Paul Durrant 



RE: [PATCH v3 3/3] xen/privcmd: Convert get_user_pages*() to pin_user_pages*()

2020-07-13 Thread Paul Durrant
> -Original Message-
> From: Souptick Joarder 
> Sent: 12 July 2020 04:40
> To: boris.ostrov...@oracle.com; jgr...@suse.com; sstabell...@kernel.org
> Cc: xen-de...@lists.xenproject.org; linux-kernel@vger.kernel.org; Souptick 
> Joarder
> ; John Hubbard ; Paul Durrant 
> 
> Subject: [PATCH v3 3/3] xen/privcmd: Convert get_user_pages*() to 
> pin_user_pages*()
> 
> In 2019, we introduced pin_user_pages*() and now we are converting
> get_user_pages*() to the new API as appropriate. [1] & [2] could
> be referred to for more information. This is case 5 as per document [1].
> 
> [1] Documentation/core-api/pin_user_pages.rst
> 
> [2] "Explicit pinning of user-space pages":
> https://lwn.net/Articles/807108/
> 
> Signed-off-by: Souptick Joarder 
> Reviewed-by: Juergen Gross 
> Cc: John Hubbard 
> Cc: Boris Ostrovsky 
> Cc: Paul Durrant 

Reviewed-by: Paul Durrant 

> ---
>  drivers/xen/privcmd.c | 10 ++--------
>  1 file changed, 2 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c
> index 079d35b..63abe6c 100644
> --- a/drivers/xen/privcmd.c
> +++ b/drivers/xen/privcmd.c
> @@ -593,7 +593,7 @@ static int lock_pages(
>   if (requested > nr_pages)
>   return -ENOSPC;
> 
> - page_count = get_user_pages_fast(
> + page_count = pin_user_pages_fast(
>   (unsigned long) kbufs[i].uptr,
>   requested, FOLL_WRITE, pages);
>   if (page_count < 0)
> @@ -609,13 +609,7 @@ static int lock_pages(
> 
>  static void unlock_pages(struct page *pages[], unsigned int nr_pages)
>  {
> - unsigned int i;
> -
> - for (i = 0; i < nr_pages; i++) {
> - if (!PageDirty(pages[i]))
> - set_page_dirty_lock(pages[i]);
> - put_page(pages[i]);
> - }
> + unpin_user_pages_dirty_lock(pages, nr_pages, true);
>  }
> 
>  static long privcmd_ioctl_dm_op(struct file *file, void __user *udata)
> --
> 1.9.1




RE: [PATCH v3 2/3] xen/privcmd: Mark pages as dirty

2020-07-13 Thread Paul Durrant
> -Original Message-
> From: Souptick Joarder 
> Sent: 12 July 2020 04:40
> To: boris.ostrov...@oracle.com; jgr...@suse.com; sstabell...@kernel.org
> Cc: xen-de...@lists.xenproject.org; linux-kernel@vger.kernel.org; Souptick 
> Joarder
> ; John Hubbard ; Paul Durrant 
> 
> Subject: [PATCH v3 2/3] xen/privcmd: Mark pages as dirty
> 
> pages need to be marked as dirty before being unpinned in
> unlock_pages(), which was an oversight. This is fixed now.
> 
> Signed-off-by: Souptick Joarder 
> Suggested-by: John Hubbard 
> Reviewed-by: Juergen Gross 
> Cc: John Hubbard 
> Cc: Boris Ostrovsky 
> Cc: Paul Durrant 

Reviewed-by: Paul Durrant 

> ---
>  drivers/xen/privcmd.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c
> index b001673..079d35b 100644
> --- a/drivers/xen/privcmd.c
> +++ b/drivers/xen/privcmd.c
> @@ -611,8 +611,11 @@ static void unlock_pages(struct page *pages[], unsigned 
> int nr_pages)
>  {
>   unsigned int i;
> 
> - for (i = 0; i < nr_pages; i++)
> + for (i = 0; i < nr_pages; i++) {
> + if (!PageDirty(pages[i]))
> + set_page_dirty_lock(pages[i]);
>   put_page(pages[i]);
> + }
>  }
> 
>  static long privcmd_ioctl_dm_op(struct file *file, void __user *udata)
> --
> 1.9.1




RE: [PATCH v3 1/3] xen/privcmd: Corrected error handling path

2020-07-13 Thread Paul Durrant
> -Original Message-
> From: Souptick Joarder 
> Sent: 12 July 2020 04:40
> To: boris.ostrov...@oracle.com; jgr...@suse.com; sstabell...@kernel.org
> Cc: xen-de...@lists.xenproject.org; linux-kernel@vger.kernel.org; Souptick 
> Joarder
> ; John Hubbard ; Paul Durrant 
> 
> Subject: [PATCH v3 1/3] xen/privcmd: Corrected error handling path
> 
> Previously, if lock_pages() ended up partially mapping pages, it
> returned -ERRNO, due to which unlock_pages() had to go through
> each pages[i] up to *nr_pages* to validate them. This can be avoided
> by passing the correct number of partially mapped pages & -ERRNO
> separately when returning from lock_pages() due to an error.
> 
> With this fix unlock_pages() doesn't need to validate pages[i] up to
> *nr_pages* for the error scenario and a few condition checks can be removed.
> 
> Signed-off-by: Souptick Joarder 
> Reviewed-by: Juergen Gross 
> Cc: John Hubbard 
> Cc: Boris Ostrovsky 
> Cc: Paul Durrant 

Reviewed-by: Paul Durrant 

> ---
>  drivers/xen/privcmd.c | 31 +++++++++++++++----------------
>  1 file changed, 15 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c
> index 5dfc59f..b001673 100644
> --- a/drivers/xen/privcmd.c
> +++ b/drivers/xen/privcmd.c
> @@ -579,13 +579,13 @@ static long privcmd_ioctl_mmap_batch(
> 
>  static int lock_pages(
>   struct privcmd_dm_op_buf kbufs[], unsigned int num,
> - struct page *pages[], unsigned int nr_pages)
> + struct page *pages[], unsigned int nr_pages, unsigned int *pinned)
>  {
>   unsigned int i;
> 
>   for (i = 0; i < num; i++) {
>   unsigned int requested;
> - int pinned;
> + int page_count;
> 
>   requested = DIV_ROUND_UP(
>   offset_in_page(kbufs[i].uptr) + kbufs[i].size,
> @@ -593,14 +593,15 @@ static int lock_pages(
>   if (requested > nr_pages)
>   return -ENOSPC;
> 
> - pinned = get_user_pages_fast(
> + page_count = get_user_pages_fast(
>   (unsigned long) kbufs[i].uptr,
>   requested, FOLL_WRITE, pages);
> - if (pinned < 0)
> - return pinned;
> + if (page_count < 0)
> + return page_count;
> 
> - nr_pages -= pinned;
> - pages += pinned;
> + *pinned += page_count;
> + nr_pages -= page_count;
> + pages += page_count;
>   }
> 
>   return 0;
> @@ -610,13 +611,8 @@ static void unlock_pages(struct page *pages[], unsigned 
> int nr_pages)
>  {
>   unsigned int i;
> 
> - if (!pages)
> - return;
> -
> - for (i = 0; i < nr_pages; i++) {
> - if (pages[i])
> - put_page(pages[i]);
> - }
> + for (i = 0; i < nr_pages; i++)
> + put_page(pages[i]);
>  }
> 
>  static long privcmd_ioctl_dm_op(struct file *file, void __user *udata)
> @@ -629,6 +625,7 @@ static long privcmd_ioctl_dm_op(struct file *file, void 
> __user *udata)
>   struct xen_dm_op_buf *xbufs = NULL;
>   unsigned int i;
>   long rc;
> + unsigned int pinned = 0;
> 
> >   if (copy_from_user(&kdata, udata, sizeof(kdata)))
>   return -EFAULT;
> @@ -682,9 +679,11 @@ static long privcmd_ioctl_dm_op(struct file *file, void 
> __user *udata)
>   goto out;
>   }
> 
> - rc = lock_pages(kbufs, kdata.num, pages, nr_pages);
> - if (rc)
> > + rc = lock_pages(kbufs, kdata.num, pages, nr_pages, &pinned);
> + if (rc < 0) {
> + nr_pages = pinned;
>   goto out;
> + }
> 
>   for (i = 0; i < kdata.num; i++) {
>   set_xen_guest_handle(xbufs[i].h, kbufs[i].uptr);
> --
> 1.9.1




RE: [RFC PATCH] xen/privcmd: Convert get_user_pages*() to pin_user_pages*()

2020-06-19 Thread Paul Durrant
> -Original Message-
> From: Boris Ostrovsky 
> Sent: 17 June 2020 18:57
> To: Souptick Joarder ; jgr...@suse.com; 
> sstabell...@kernel.org
> Cc: xen-de...@lists.xenproject.org; linux-kernel@vger.kernel.org; John 
> Hubbard ;
> p...@xen.org
> Subject: Re: [RFC PATCH] xen/privcmd: Convert get_user_pages*() to 
> pin_user_pages*()
> 
> On 6/16/20 11:14 PM, Souptick Joarder wrote:
> > In 2019, we introduced pin_user_pages*() and now we are converting
> > get_user_pages*() to the new API as appropriate. [1] & [2] could
> > be referred to for more information.
> >
> > [1] Documentation/core-api/pin_user_pages.rst
> >
> > [2] "Explicit pinning of user-space pages":
> > https://lwn.net/Articles/807108/
> >
> > Signed-off-by: Souptick Joarder 
> > Cc: John Hubbard 
> > ---
> > Hi,
> >
> > I have compile tested this patch but unable to run-time test,
> > so any testing help is much appriciated.
> >
> > Also have a question, why the existing code is not marking the
> > pages dirty (since it did FOLL_WRITE) ?
> 
> 
> Indeed, seems to me it should. Paul?
> 

Yes, it looks like that was an oversight. The hypercall may well result in data 
being copied back into the buffers so the whole pages array should be 
considered dirty.

  Paul

> 
> >
> >  drivers/xen/privcmd.c | 7 ++-
> >  1 file changed, 2 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c
> > index a250d11..543739e 100644
> > --- a/drivers/xen/privcmd.c
> > +++ b/drivers/xen/privcmd.c
> > @@ -594,7 +594,7 @@ static int lock_pages(
> > if (requested > nr_pages)
> > return -ENOSPC;
> >
> > -   pinned = get_user_pages_fast(
> > +   pinned = pin_user_pages_fast(
> > (unsigned long) kbufs[i].uptr,
> > requested, FOLL_WRITE, pages);
> > if (pinned < 0)
> > @@ -614,10 +614,7 @@ static void unlock_pages(struct page *pages[], 
> > unsigned int nr_pages)
> > if (!pages)
> > return;
> >
> > -   for (i = 0; i < nr_pages; i++) {
> > -   if (pages[i])
> > -   put_page(pages[i]);
> > -   }
> > +   unpin_user_pages(pages, nr_pages);
> 
> 
> Why are you no longer checking for valid pages?
> 
> 
> -boris
> 
> 
> 




Re: [PATCH 2/2] xen/netback: cleanup init and deinit code

2019-10-14 Thread Paul Durrant
On Mon, 14 Oct 2019 at 10:09, Juergen Gross  wrote:
>
> Do some cleanup of the netback init and deinit code:
>
> - add an omnipotent queue deinit function usable from
>   xenvif_disconnect_data() and the error path of xenvif_connect_data()
> - only install the irq handlers after initializing all relevant items
>   (especially the kthreads related to the queue)
> - there is no need to use get_task_struct() after creating a kthread
>   and using put_task_struct() again after having stopped it.
> - use kthread_run() instead of kthread_create() to spare the call of
>   wake_up_process().

I guess the reason it was done that way was to ensure that queue->task
and queue->dealloc_task would be set before the relevant threads
executed, but I don't see anything relying on this, so I guess the change
is safe. The rest of it looks fine.

>
> Signed-off-by: Juergen Gross 

Reviewed-by: Paul Durrant 

> ---
>  drivers/net/xen-netback/interface.c | 114 +---
>  1 file changed, 54 insertions(+), 60 deletions(-)
>
> diff --git a/drivers/net/xen-netback/interface.c 
> b/drivers/net/xen-netback/interface.c
> index 103ed00775eb..68dd7bb07ca6 100644
> --- a/drivers/net/xen-netback/interface.c
> +++ b/drivers/net/xen-netback/interface.c
> @@ -626,6 +626,38 @@ int xenvif_connect_ctrl(struct xenvif *vif, grant_ref_t 
> ring_ref,
> return err;
>  }
>
> +static void xenvif_disconnect_queue(struct xenvif_queue *queue)
> +{
> +   if (queue->tx_irq) {
> +   unbind_from_irqhandler(queue->tx_irq, queue);
> +   if (queue->tx_irq == queue->rx_irq)
> +   queue->rx_irq = 0;
> +   queue->tx_irq = 0;
> +   }
> +
> +   if (queue->rx_irq) {
> +   unbind_from_irqhandler(queue->rx_irq, queue);
> +   queue->rx_irq = 0;
> +   }
> +
> +   if (queue->task) {
> +   kthread_stop(queue->task);
> +   queue->task = NULL;
> +   }
> +
> +   if (queue->dealloc_task) {
> +   kthread_stop(queue->dealloc_task);
> +   queue->dealloc_task = NULL;
> +   }
> +
> +   if (queue->napi.poll) {
> > +   netif_napi_del(&queue->napi);
> +   queue->napi.poll = NULL;
> +   }
> +
> +   xenvif_unmap_frontend_data_rings(queue);
> +}
> +
>  int xenvif_connect_data(struct xenvif_queue *queue,
> unsigned long tx_ring_ref,
> unsigned long rx_ring_ref,
> @@ -651,13 +683,27 @@ int xenvif_connect_data(struct xenvif_queue *queue,
> > netif_napi_add(queue->vif->dev, &queue->napi, xenvif_poll,
> XENVIF_NAPI_WEIGHT);
>
> +   queue->stalled = true;
> +
> +   task = kthread_run(xenvif_kthread_guest_rx, queue,
> +  "%s-guest-rx", queue->name);
> +   if (IS_ERR(task))
> +   goto kthread_err;
> +   queue->task = task;
> +
> +   task = kthread_run(xenvif_dealloc_kthread, queue,
> +  "%s-dealloc", queue->name);
> +   if (IS_ERR(task))
> +   goto kthread_err;
> +   queue->dealloc_task = task;
> +
> if (tx_evtchn == rx_evtchn) {
> /* feature-split-event-channels == 0 */
> err = bind_interdomain_evtchn_to_irqhandler(
> queue->vif->domid, tx_evtchn, xenvif_interrupt, 0,
> queue->name, queue);
> if (err < 0)
> -   goto err_unmap;
> +   goto err;
> queue->tx_irq = queue->rx_irq = err;
> disable_irq(queue->tx_irq);
> } else {
> @@ -668,7 +714,7 @@ int xenvif_connect_data(struct xenvif_queue *queue,
> queue->vif->domid, tx_evtchn, xenvif_tx_interrupt, 0,
> queue->tx_irq_name, queue);
> if (err < 0)
> -   goto err_unmap;
> +   goto err;
> queue->tx_irq = err;
> disable_irq(queue->tx_irq);
>
> @@ -678,47 +724,18 @@ int xenvif_connect_data(struct xenvif_queue *queue,
> queue->vif->domid, rx_evtchn, xenvif_rx_interrupt, 0,
> queue->rx_irq_name, queue);
> if (err < 0)
> -   goto err_tx_unbind;
> +   goto err;
> queue->rx_irq = err;
> disable_irq(queue->rx_irq);
> }

Re: [PATCH 1/2] xen/netback: fix error path of xenvif_connect_data()

2019-10-14 Thread Paul Durrant
On Mon, 14 Oct 2019 at 10:09, Juergen Gross  wrote:
>
> xenvif_connect_data() calls module_put() in case of error. This is
> wrong as there is no related module_get().
>
> Remove the superfluous module_put().
>
> Fixes: 279f438e36c0a7 ("xen-netback: Don't destroy the netdev until the vif 
> is shut down")
> Cc:  # 3.12
> Signed-off-by: Juergen Gross 

Yes, looks like this should have been cleaned up a long time ago.

Reviewed-by: Paul Durrant 

> ---
>  drivers/net/xen-netback/interface.c | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/drivers/net/xen-netback/interface.c 
> b/drivers/net/xen-netback/interface.c
> index 240f762b3749..103ed00775eb 100644
> --- a/drivers/net/xen-netback/interface.c
> +++ b/drivers/net/xen-netback/interface.c
> @@ -719,7 +719,6 @@ int xenvif_connect_data(struct xenvif_queue *queue,
> xenvif_unmap_frontend_data_rings(queue);
> > netif_napi_del(&queue->napi);
>  err:
> -   module_put(THIS_MODULE);
> return err;
>  }
>
> --
> 2.16.4
>


RE: [PATCH] xen-netback: don't populate the hash cache on XenBus disconnect

2019-02-28 Thread Paul Durrant
> -Original Message-
> From: Igor Druzhinin [mailto:igor.druzhi...@citrix.com]
> Sent: 28 February 2019 14:11
> To: xen-de...@lists.xenproject.org; net...@vger.kernel.org; 
> linux-kernel@vger.kernel.org
> Cc: Wei Liu ; Paul Durrant ; 
> da...@davemloft.net; Igor
> Druzhinin 
> Subject: [PATCH] xen-netback: don't populate the hash cache on XenBus 
> disconnect
> 
> Occasionally, during the disconnection procedure on XenBus which
> includes hash cache deinitialization there might be some packets
> still in-flight on other processors. Handling of these packets includes
> hashing and hash cache population that finally results in hash cache
> data structure corruption.
> 
> In order to avoid this we prevent hashing of those packets if there
> are no queues initialized. In that case RCU protection of queues guards
> the hash cache as well.
> 
> Signed-off-by: Igor Druzhinin 

Reviewed-by: Paul Durrant 

> ---
> 
> Found this while applying the previous patch to our patchqueue. Seems it
> never went to the mailing list and, to my knowledge, the problem is still
> present. From my recollection, it only happened on a stress frontend on/off
> test with Windows guests (since only those detach the frontend completely).
> So better late than never.
> 
> ---
>  drivers/net/xen-netback/hash.c  | 2 ++
>  drivers/net/xen-netback/interface.c | 7 +++++++
>  2 files changed, 9 insertions(+)
> 
> diff --git a/drivers/net/xen-netback/hash.c b/drivers/net/xen-netback/hash.c
> index 0ccb021..10d580c 100644
> --- a/drivers/net/xen-netback/hash.c
> +++ b/drivers/net/xen-netback/hash.c
> @@ -454,6 +454,8 @@ void xenvif_init_hash(struct xenvif *vif)
>   if (xenvif_hash_cache_size == 0)
>   return;
> 
> + BUG_ON(vif->hash.cache.count);
> +
>   spin_lock_init(&vif->hash.cache.lock);
>   INIT_LIST_HEAD(&vif->hash.cache.list);
>  }
> diff --git a/drivers/net/xen-netback/interface.c 
> b/drivers/net/xen-netback/interface.c
> index 182d677..6da1251 100644
> --- a/drivers/net/xen-netback/interface.c
> +++ b/drivers/net/xen-netback/interface.c
> @@ -153,6 +153,13 @@ static u16 xenvif_select_queue(struct net_device *dev, 
> struct sk_buff *skb,
>  {
>   struct xenvif *vif = netdev_priv(dev);
>   unsigned int size = vif->hash.size;
> + unsigned int num_queues;
> +
> + /* If queues are not set up internally - always return 0
> +  * as the packet is going to be dropped anyway */
> + num_queues = READ_ONCE(vif->num_queues);
> + if (num_queues < 1)
> + return 0;
> 
>   if (vif->hash.alg == XEN_NETIF_CTRL_HASH_ALGORITHM_NONE)
>   return fallback(dev, skb, NULL) % dev->real_num_tx_queues;
> --
> 2.7.4



RE: [PATCH] xen-netback: fix occasional leak of grant ref mappings under memory pressure

2019-02-28 Thread Paul Durrant


> -Original Message-
> From: Xen-devel [mailto:xen-devel-boun...@lists.xenproject.org] On Behalf Of 
> Paul Durrant
> Sent: 28 February 2019 11:22
> To: Wei Liu 
> Cc: Igor Druzhinin ; Wei Liu 
> ; net...@vger.kernel.org;
> linux-kernel@vger.kernel.org; xen-de...@lists.xenproject.org; 
> da...@davemloft.net
> Subject: Re: [Xen-devel] [PATCH] xen-netback: fix occasional leak of grant 
> ref mappings under memory
> pressure
> 
> > -Original Message-
> > From: Wei Liu [mailto:wei.l...@citrix.com]
> > Sent: 28 February 2019 11:02
> > To: Paul Durrant 
> > Cc: Igor Druzhinin ; 
> > xen-de...@lists.xenproject.org;
> > net...@vger.kernel.org; linux-kernel@vger.kernel.org; Wei Liu 
> > ;
> > da...@davemloft.net
> > Subject: Re: [PATCH] xen-netback: fix occasional leak of grant ref mappings 
> > under memory pressure
> >
> > On Thu, Feb 28, 2019 at 09:46:57AM +, Paul Durrant wrote:
> > > > -Original Message-
> > > > From: Igor Druzhinin [mailto:igor.druzhi...@citrix.com]
> > > > Sent: 28 February 2019 02:03
> > > > To: xen-de...@lists.xenproject.org; net...@vger.kernel.org; 
> > > > linux-kernel@vger.kernel.org
> > > > Cc: Wei Liu ; Paul Durrant 
> > > > ; da...@davemloft.net;
> > Igor
> > > > Druzhinin 
> > > > Subject: [PATCH] xen-netback: fix occasional leak of grant ref mappings 
> > > > under memory pressure
> > > >
> > > > Zero-copy callback flag is not yet set on frag list skb at the moment
> > > > xenvif_handle_frag_list() returns -ENOMEM. This eventually results in
> > > > leaking grant ref mappings since xenvif_zerocopy_callback() is never
> > > > called for these fragments. Those eventually build up and cause Xen
> > > > to kill Dom0 as the slots get reused for new mappings.
> > > >
> > > > That behavior is observed under certain workloads where sudden spikes
> > > > of page cache usage for writes coexist with active atomic skb 
> > > > allocations.
> > > >
> > > > Signed-off-by: Igor Druzhinin 
> > > > ---
> > > >  drivers/net/xen-netback/netback.c | 3 +++
> > > >  1 file changed, 3 insertions(+)
> > > >
> > > > diff --git a/drivers/net/xen-netback/netback.c 
> > > > b/drivers/net/xen-netback/netback.c
> > > > index 80aae3a..2023317 100644
> > > > --- a/drivers/net/xen-netback/netback.c
> > > > +++ b/drivers/net/xen-netback/netback.c
> > > > @@ -1146,9 +1146,12 @@ static int xenvif_tx_submit(struct xenvif_queue 
> > > > *queue)
> > > >
> > > > if (unlikely(skb_has_frag_list(skb))) {
> > > > if (xenvif_handle_frag_list(queue, skb)) {
> > > > +   struct sk_buff *nskb =
> > > > +   skb_shinfo(skb)->frag_list;
> > > > if (net_ratelimit())
> > > > netdev_err(queue->vif->dev,
> > > >"Not enough memory to consolidate frag_list!\n");
> > > > +   xenvif_skb_zerocopy_prepare(queue, nskb);
> > > > xenvif_skb_zerocopy_prepare(queue, skb);
> > > > kfree_skb(skb);
> > > > continue;
> > >
> > > Whilst this fix will do the job, I think it would be better to get rid of 
> > > the kfree_skb() from
> > inside xenvif_handle_frag_list() and always deal with it here rather than 
> > having it happen in two
> > different places. Something like the following...
> >
> > +1 for having only one place.
> >
> > >
> > > ---8<---
> > > diff --git a/drivers/net/xen-netback/netback.c 
> > > b/drivers/net/xen-netback/netback.c
> > > index 80aae3a32c2a..093c7b860772 100644
> > > --- a/drivers/net/xen-netback/netback.c
> > > +++ b/drivers/net/xen-netback/netback.c
> > > @@ -1027,13 +1027,13 @@ static void xenvif_tx_build_gops(struct 
> > > xenvif_queue *queue,
> > >  /* Consolidate skb with a frag_list into a brand new one with local 
> > > pages on
> > >   * frags. Returns 0 or -ENOMEM if can't allocate new pages.
> > >   

RE: [PATCH] xen-netback: fix occasional leak of grant ref mappings under memory pressure

2019-02-28 Thread Paul Durrant
> -Original Message-
> From: Igor Druzhinin [mailto:igor.druzhi...@citrix.com]
> Sent: 28 February 2019 11:44
> To: Paul Durrant ; Wei Liu 
> Cc: xen-de...@lists.xenproject.org; net...@vger.kernel.org; 
> linux-kernel@vger.kernel.org;
> da...@davemloft.net
> Subject: Re: [PATCH] xen-netback: fix occasional leak of grant ref mappings 
> under memory pressure
> 
> On 28/02/2019 11:21, Paul Durrant wrote:
> >>> @@ -1153,6 +1152,10 @@ static int xenvif_tx_submit(struct xenvif_queue 
> >>> *queue)
> >>> kfree_skb(skb);
> >>> continue;
> >>> }
> >>> +
> >>> +   /* Copied all the bits from the frag list. */
> >>> +   skb_frag_list_init(skb);
> >>> +   kfree(nskb);
> >>
> >> I think you want kfree_skb here?
> >
> > No. nskb is the frag list... it is unlinked from skb by the call to 
> > skb_frag_list_init() and then it
> can be freed on its own. The skb is what we need to retain, because that now 
> contains all the data.
> >
> 
> Are you saying previous code in xenvif_handle_frag_list() incorrectly
> called kfree_skb()?

No, it correctly called kfree_skb() on nskb in the success case. What Wei and 
I would prefer is that we have a single place where the frag list is freed, 
in both the success and error cases.

  Paul

> 
> Igor


RE: [PATCH] xen-netback: fix occasional leak of grant ref mappings under memory pressure

2019-02-28 Thread Paul Durrant
> -Original Message-
> From: Wei Liu [mailto:wei.l...@citrix.com]
> Sent: 28 February 2019 11:02
> To: Paul Durrant 
> Cc: Igor Druzhinin ; 
> xen-de...@lists.xenproject.org;
> net...@vger.kernel.org; linux-kernel@vger.kernel.org; Wei Liu 
> ;
> da...@davemloft.net
> Subject: Re: [PATCH] xen-netback: fix occasional leak of grant ref mappings 
> under memory pressure
> 
> On Thu, Feb 28, 2019 at 09:46:57AM +, Paul Durrant wrote:
> > > -Original Message-
> > > From: Igor Druzhinin [mailto:igor.druzhi...@citrix.com]
> > > Sent: 28 February 2019 02:03
> > > To: xen-de...@lists.xenproject.org; net...@vger.kernel.org; 
> > > linux-kernel@vger.kernel.org
> > > Cc: Wei Liu ; Paul Durrant 
> > > ; da...@davemloft.net;
> Igor
> > > Druzhinin 
> > > Subject: [PATCH] xen-netback: fix occasional leak of grant ref mappings 
> > > under memory pressure
> > >
> > > Zero-copy callback flag is not yet set on frag list skb at the moment
> > > xenvif_handle_frag_list() returns -ENOMEM. This eventually results in
> > > leaking grant ref mappings since xenvif_zerocopy_callback() is never
> > > called for these fragments. Those eventually build up and cause Xen
> > > to kill Dom0 as the slots get reused for new mappings.
> > >
> > > That behavior is observed under certain workloads where sudden spikes
> > > of page cache usage for writes coexist with active atomic skb allocations.
> > >
> > > Signed-off-by: Igor Druzhinin 
> > > ---
> > >  drivers/net/xen-netback/netback.c | 3 +++
> > >  1 file changed, 3 insertions(+)
> > >
> > > diff --git a/drivers/net/xen-netback/netback.c 
> > > b/drivers/net/xen-netback/netback.c
> > > index 80aae3a..2023317 100644
> > > --- a/drivers/net/xen-netback/netback.c
> > > +++ b/drivers/net/xen-netback/netback.c
> > > @@ -1146,9 +1146,12 @@ static int xenvif_tx_submit(struct xenvif_queue 
> > > *queue)
> > >
> > >   if (unlikely(skb_has_frag_list(skb))) {
> > >   if (xenvif_handle_frag_list(queue, skb)) {
> > > + struct sk_buff *nskb =
> > > + skb_shinfo(skb)->frag_list;
> > >   if (net_ratelimit())
> > >   netdev_err(queue->vif->dev,
> > >  "Not enough memory to consolidate frag_list!\n");
> > > + xenvif_skb_zerocopy_prepare(queue, nskb);
> > >   xenvif_skb_zerocopy_prepare(queue, skb);
> > >   kfree_skb(skb);
> > >   continue;
> >
> > Whilst this fix will do the job, I think it would be better to get rid of 
> > the kfree_skb() from
> inside xenvif_handle_frag_list() and always deal with it here rather than 
> having it happen in two
> different places. Something like the following...
> 
> +1 for having only one place.
> 
> >
> > ---8<---
> > diff --git a/drivers/net/xen-netback/netback.c 
> > b/drivers/net/xen-netback/netback.c
> > index 80aae3a32c2a..093c7b860772 100644
> > --- a/drivers/net/xen-netback/netback.c
> > +++ b/drivers/net/xen-netback/netback.c
> > @@ -1027,13 +1027,13 @@ static void xenvif_tx_build_gops(struct xenvif_queue *queue,
> >  /* Consolidate skb with a frag_list into a brand new one with local pages on
> >   * frags. Returns 0 or -ENOMEM if can't allocate new pages.
> >   */
> > -static int xenvif_handle_frag_list(struct xenvif_queue *queue, struct sk_buff *skb)
> > +static int xenvif_handle_frag_list(struct xenvif_queue *queue, struct sk_buff *skb,
> > +  struct sk_buff *nskb)

RE: [PATCH] xen-netback: fix occasional leak of grant ref mappings under memory pressure

2019-02-28 Thread Paul Durrant
> -Original Message-
> From: Igor Druzhinin [mailto:igor.druzhi...@citrix.com]
> Sent: 28 February 2019 02:03
> To: xen-de...@lists.xenproject.org; net...@vger.kernel.org; 
> linux-kernel@vger.kernel.org
> Cc: Wei Liu ; Paul Durrant ; 
> da...@davemloft.net; Igor
> Druzhinin 
> Subject: [PATCH] xen-netback: fix occasional leak of grant ref mappings under 
> memory pressure
> 
> Zero-copy callback flag is not yet set on frag list skb at the moment
> xenvif_handle_frag_list() returns -ENOMEM. This eventually results in
> leaking grant ref mappings since xenvif_zerocopy_callback() is never
> called for these fragments. Those eventually build up and cause Xen
> to kill Dom0 as the slots get reused for new mappings.
> 
> That behavior is observed under certain workloads where sudden spikes
> of page cache usage for writes coexist with active atomic skb allocations.
> 
> Signed-off-by: Igor Druzhinin 
> ---
>  drivers/net/xen-netback/netback.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/net/xen-netback/netback.c 
> b/drivers/net/xen-netback/netback.c
> index 80aae3a..2023317 100644
> --- a/drivers/net/xen-netback/netback.c
> +++ b/drivers/net/xen-netback/netback.c
> @@ -1146,9 +1146,12 @@ static int xenvif_tx_submit(struct xenvif_queue *queue)
> 
>   if (unlikely(skb_has_frag_list(skb))) {
>   if (xenvif_handle_frag_list(queue, skb)) {
> + struct sk_buff *nskb =
> + skb_shinfo(skb)->frag_list;
>   if (net_ratelimit())
>   netdev_err(queue->vif->dev,
>  "Not enough memory to consolidate frag_list!\n");
> + xenvif_skb_zerocopy_prepare(queue, nskb);
>   xenvif_skb_zerocopy_prepare(queue, skb);
>   kfree_skb(skb);
>   continue;

Whilst this fix will do the job, I think it would be better to get rid of the 
kfree_skb() from inside xenvif_handle_frag_list() and always deal with it here 
rather than having it happen in two different places. Something like the 
following...

---8<---
diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index 80aae3a32c2a..093c7b860772 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -1027,13 +1027,13 @@ static void xenvif_tx_build_gops(struct xenvif_queue *queue,
 /* Consolidate skb with a frag_list into a brand new one with local pages on
  * frags. Returns 0 or -ENOMEM if can't allocate new pages.
  */
-static int xenvif_handle_frag_list(struct xenvif_queue *queue, struct sk_buff *skb)
+static int xenvif_handle_frag_list(struct xenvif_queue *queue, struct sk_buff *skb,
+  struct sk_buff *nskb)
 {
unsigned int offset = skb_headlen(skb);
skb_frag_t frags[MAX_SKB_FRAGS];
int i, f;
struct ubuf_info *uarg;
-   struct sk_buff *nskb = skb_shinfo(skb)->frag_list;

queue->stats.tx_zerocopy_sent += 2;
queue->stats.tx_frag_overflow++;
@@ -1072,11 +1072,6 @@ static int xenvif_handle_frag_list(struct xenvif_queue *queue, struct sk_buff *skb)
	skb_frag_size_set(&frags[i], len);
}

-   /* Copied all the bits from the frag list -- free it. */
-   skb_frag_list_init(skb);
-   xenvif_skb_zerocopy_prepare(queue, nskb);
-   kfree_skb(nskb);
-
/* Release all the original (foreign) frags. */
for (f = 0; f < skb_shinfo(skb)->nr_frags; f++)
skb_frag_unref(skb, f);
@@ -1145,7 +1140,11 @@ static int xenvif_tx_submit(struct xenvif_queue *queue)
xenvif_fill_frags(queue, skb);

if (unlikely(skb_has_frag_list(skb))) {
-   if (xenvif_handle_frag_list(queue, skb)) {
+   struct sk_buff *nskb = skb_shinfo(skb)->frag_list;
+
+   xenvif_skb_zerocopy_prepare(queue, nskb);
+
+   if (xenvif_handle_frag_list(queue, skb, nskb)) {
if (net_ratelim

RE: [Xen-devel] xen/evtchn and forced threaded irq

2019-02-26 Thread Paul Durrant
> -Original Message-
> From: Xen-devel [mailto:xen-devel-boun...@lists.xenproject.org] On Behalf Of 
> Andrew Cooper
> Sent: 26 February 2019 09:30
> To: Roger Pau Monne ; Julien Grall 
> 
> Cc: Juergen Gross ; Stefano Stabellini 
> ; Oleksandr
> Andrushchenko ; linux-kernel@vger.kernel.org; Jan Beulich 
> ;
> xen-devel ; Boris Ostrovsky 
> ; Dave P
> Martin 
> Subject: Re: [Xen-devel] xen/evtchn and forced threaded irq
> 
> On 26/02/2019 09:14, Roger Pau Monné wrote:
> > On Mon, Feb 25, 2019 at 01:55:42PM +, Julien Grall wrote:
> >> Hi Oleksandr,
> >>
> >> On 25/02/2019 13:24, Oleksandr Andrushchenko wrote:
> >>> On 2/22/19 3:33 PM, Julien Grall wrote:
>  Hi,
> 
>  On 22/02/2019 12:38, Oleksandr Andrushchenko wrote:
> > On 2/20/19 10:46 PM, Julien Grall wrote:
> >> Discussing with my team, a solution that came up would be to
> >> introduce one atomic field per event to record the number of
> >> event received. I will explore that solution tomorrow.
> > How will this help if events have some payload?
>  What payload? The event channel does not carry any payload. It only
>  notify you that something happen. Then this is up to the user to
>  decide what to you with it.
> >>> Sorry, I was probably not precise enough. I mean that an event might have
> >>> associated payload in the ring buffer, for example [1]. So, counting 
> >>> events
> >>> may help somehow, but the ring's data may still be lost
> >> From my understanding of event channels are edge interrupts. By definition,
> > IMO event channels are active high level interrupts.
> >
> > Let's take into account the following situation: you have an event
> > channel masked and the event channel pending bit (akin to the line on
> > bare metal) goes from low to high (0 -> 1), then you unmask the
> > interrupt and you get an event injected. If it was an edge interrupt
> > you won't get an event injected after unmasking, because you would
> > have lost the edge. I think the problem here is that Linux treats
> > event channels as edge interrupts, when they are actually level.
> 
> Event channels are edge interrupts.  There are several very subtle bugs
> to be had by software which treats them as line interrupts.

They are more subtle than that, are they not? There is a single per-vcpu ACK 
which can cover multiple event channels.

  Paul

> 
> Most critically, if you fail to ack them, rebind them to a new vcpu, and
> reenable interrupts, you don't get a new interrupt notification.  This
> was the source of a 4 month bug when XenServer was moving from
> classic-xen to PVOps where using irqbalance would cause dom0 to
> occasionally lose interrupts.
> 
> ~Andrew
> 
> ___
> Xen-devel mailing list
> xen-de...@lists.xenproject.org
> https://lists.xenproject.org/mailman/listinfo/xen-devel


RE: [PATCH v4 2/2] xen/blkback: rework connect_ring() to avoid inconsistent xenstore 'ring-page-order' set by malicious blkfront

2019-01-07 Thread Paul Durrant
> -Original Message-
> From: Dongli Zhang [mailto:dongli.zh...@oracle.com]
> Sent: 07 January 2019 05:36
> To: xen-de...@lists.xenproject.org; linux-bl...@vger.kernel.org; linux-
> ker...@vger.kernel.org
> Cc: konrad.w...@oracle.com; Roger Pau Monne ;
> ax...@kernel.dk; Paul Durrant 
> Subject: [PATCH v4 2/2] xen/blkback: rework connect_ring() to avoid
> inconsistent xenstore 'ring-page-order' set by malicious blkfront
> 
> The xenstore 'ring-page-order' is used globally for each blkback queue and
> therefore should be read from xenstore only once. However, it is obtained
> in read_per_ring_refs() which might be called multiple times during the
> initialization of each blkback queue.
> 
> If the blkfront is malicious and the 'ring-page-order' is set in different
> value by blkfront every time before blkback reads it, this may end up at
> the "WARN_ON(i != (XEN_BLKIF_REQS_PER_PAGE * blkif->nr_ring_pages));" in
> xen_blkif_disconnect() when frontend is destroyed.
> 
> This patch reworks connect_ring() to read xenstore 'ring-page-order' only
> once.
> 
> Signed-off-by: Dongli Zhang 
> ---
> Changed since v1:
>   * change the order of xenstore read in read_per_ring_refs
>   * use xenbus_read_unsigned() in connect_ring()
> 
> Changed since v2:
>   * simplify the condition check as "(err != 1 && nr_grefs > 1)"
>   * avoid setting err as -EINVAL to remove extra one line of code
> 
> Changed since v3:
>   * exit at the beginning if !nr_grefs
>   * change the if statements to avoid test (err != 1) twice
>   * initialize a 'blkif' stack variable (refer to PATCH 1/2)
> 
>  drivers/block/xen-blkback/xenbus.c | 76 +
> -
>  1 file changed, 43 insertions(+), 33 deletions(-)
> 
> diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
> index a4aadac..a2acbc9 100644
> --- a/drivers/block/xen-blkback/xenbus.c
> +++ b/drivers/block/xen-blkback/xenbus.c
> @@ -926,7 +926,7 @@ static int read_per_ring_refs(struct xen_blkif_ring
> *ring, const char *dir)
>   int err, i, j;
>   struct xen_blkif *blkif = ring->blkif;
>   struct xenbus_device *dev = blkif->be->dev;
> - unsigned int ring_page_order, nr_grefs, evtchn;
> + unsigned int nr_grefs, evtchn;
> 
>   err = xenbus_scanf(XBT_NIL, dir, "event-channel", "%u",
> &evtchn);
> @@ -936,43 +936,38 @@ static int read_per_ring_refs(struct xen_blkif_ring
> *ring, const char *dir)
>   return err;
>   }
> 
> - err = xenbus_scanf(XBT_NIL, dev->otherend, "ring-page-order", "%u",
> -   &ring_page_order);
> - if (err != 1) {
> - err = xenbus_scanf(XBT_NIL, dir, "ring-ref", "%u",
> &ring_ref[0]);
> + nr_grefs = blkif->nr_ring_pages;
> +
> + if (unlikely(!nr_grefs))
> + return -EINVAL;
> +
> + for (i = 0; i < nr_grefs; i++) {
> + char ring_ref_name[RINGREF_NAME_LEN];
> +
> + snprintf(ring_ref_name, RINGREF_NAME_LEN, "ring-ref%u", i);
> + err = xenbus_scanf(XBT_NIL, dir, ring_ref_name,
> +"%u", &ring_ref[i]);
> +
>   if (err != 1) {
> - err = -EINVAL;
> - xenbus_dev_fatal(dev, err, "reading %s/ring-ref", dir);
> - return err;
> - }
> - nr_grefs = 1;
> - } else {
> - unsigned int i;
> -
> - if (ring_page_order > xen_blkif_max_ring_order) {
> - err = -EINVAL;
> - xenbus_dev_fatal(dev, err, "%s/request %d ring page order exceed max:%d",
> -  dir, ring_page_order,
> -  xen_blkif_max_ring_order);
> - return err;
> + if (nr_grefs == 1)
> + break;
> +
> + xenbus_dev_fatal(dev, err, "reading %s/%s",
> +  dir, ring_ref_name);

This patch looks much better, but I guess you don't want to be using 'err' in 
the above call as it will still be set to whatever xenbus_scanf() returned. 
Probably neatest to just leave the "err = -EINVAL" and "return err" alone.

> + return -EINVAL;
>   }
> + }
> 
> - nr_grefs = 1 << ring_page_order;
> - for (i = 0; i < nr_grefs; i++) {
> - char ring_ref_name[

RE: [PATCH v4 1/2] xen/blkback: add stack variable 'blkif' in connect_ring()

2019-01-07 Thread Paul Durrant
> -Original Message-
> From: Dongli Zhang [mailto:dongli.zh...@oracle.com]
> Sent: 07 January 2019 05:36
> To: xen-de...@lists.xenproject.org; linux-bl...@vger.kernel.org; linux-
> ker...@vger.kernel.org
> Cc: konrad.w...@oracle.com; Roger Pau Monne ;
> ax...@kernel.dk; Paul Durrant 
> Subject: [PATCH v4 1/2] xen/blkback: add stack variable 'blkif' in
> connect_ring()
> 
> As 'be->blkif' is used many times in connect_ring(), the stack variable
> 'blkif' is added to substitute for 'be->blkif'.
> 
> Suggested-by: Paul Durrant 
> Signed-off-by: Dongli Zhang 

That looks better :-)

Reviewed-by: Paul Durrant 

> ---
>  drivers/block/xen-blkback/xenbus.c | 27 ++-
>  1 file changed, 14 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
> index a4bc74e..a4aadac 100644
> --- a/drivers/block/xen-blkback/xenbus.c
> +++ b/drivers/block/xen-blkback/xenbus.c
> @@ -1023,6 +1023,7 @@ static int read_per_ring_refs(struct xen_blkif_ring
> *ring, const char *dir)
>  static int connect_ring(struct backend_info *be)
>  {
>   struct xenbus_device *dev = be->dev;
> + struct xen_blkif *blkif = be->blkif;
>   unsigned int pers_grants;
>   char protocol[64] = "";
>   int err, i;
> @@ -1033,25 +1034,25 @@ static int connect_ring(struct backend_info *be)
> 
>   pr_debug("%s %s\n", __func__, dev->otherend);
> 
> - be->blkif->blk_protocol = BLKIF_PROTOCOL_DEFAULT;
> + blkif->blk_protocol = BLKIF_PROTOCOL_DEFAULT;
>   err = xenbus_scanf(XBT_NIL, dev->otherend, "protocol",
>  "%63s", protocol);
>   if (err <= 0)
>   strcpy(protocol, "unspecified, assuming default");
>   else if (0 == strcmp(protocol, XEN_IO_PROTO_ABI_NATIVE))
> - be->blkif->blk_protocol = BLKIF_PROTOCOL_NATIVE;
> + blkif->blk_protocol = BLKIF_PROTOCOL_NATIVE;
>   else if (0 == strcmp(protocol, XEN_IO_PROTO_ABI_X86_32))
> - be->blkif->blk_protocol = BLKIF_PROTOCOL_X86_32;
> + blkif->blk_protocol = BLKIF_PROTOCOL_X86_32;
>   else if (0 == strcmp(protocol, XEN_IO_PROTO_ABI_X86_64))
> - be->blkif->blk_protocol = BLKIF_PROTOCOL_X86_64;
> + blkif->blk_protocol = BLKIF_PROTOCOL_X86_64;
>   else {
>   xenbus_dev_fatal(dev, err, "unknown fe protocol %s",
> protocol);
>   return -ENOSYS;
>   }
>   pers_grants = xenbus_read_unsigned(dev->otherend, "feature-persistent",
>  0);
> - be->blkif->vbd.feature_gnt_persistent = pers_grants;
> - be->blkif->vbd.overflow_max_grants = 0;
> + blkif->vbd.feature_gnt_persistent = pers_grants;
> + blkif->vbd.overflow_max_grants = 0;
> 
>   /*
>* Read the number of hardware queues from frontend.
> @@ -1067,16 +1068,16 @@ static int connect_ring(struct backend_info *be)
>   requested_num_queues, xenblk_max_queues);
>   return -ENOSYS;
>   }
> - be->blkif->nr_rings = requested_num_queues;
> - if (xen_blkif_alloc_rings(be->blkif))
> + blkif->nr_rings = requested_num_queues;
> + if (xen_blkif_alloc_rings(blkif))
>   return -ENOMEM;
> 
>   pr_info("%s: using %d queues, protocol %d (%s) %s\n", dev->nodename,
> -  be->blkif->nr_rings, be->blkif->blk_protocol, protocol,
> +  blkif->nr_rings, blkif->blk_protocol, protocol,
>pers_grants ? "persistent grants" : "");
> 
> - if (be->blkif->nr_rings == 1)
> - return read_per_ring_refs(&be->blkif->rings[0], dev->otherend);
> + if (blkif->nr_rings == 1)
> + return read_per_ring_refs(&blkif->rings[0], dev->otherend);
>   else {
>   xspathsize = strlen(dev->otherend) + xenstore_path_ext_size;
>   xspath = kmalloc(xspathsize, GFP_KERNEL);
> @@ -1085,10 +1086,10 @@ static int connect_ring(struct backend_info *be)
>   return -ENOMEM;
>   }
> 
> - for (i = 0; i < be->blkif->nr_rings; i++) {
> + for (i = 0; i < blkif->nr_rings; i++) {
>   memset(xspath, 0, xspathsize);
> snprintf(xspath, xspathsize, "%s/queue-%u", dev->otherend, i);
> - err = read_per_ring_refs(>blkif->rings[i], xspath);
> + err = read_per_ring_refs(>rings[i], xspath);
>   if (err) {
>   kfree(xspath);
>   return err;
> --
> 2.7.4



RE: [PATCH v3 1/1] xen/blkback: rework connect_ring() to avoid inconsistent xenstore 'ring-page-order' set by malicious blkfront

2019-01-04 Thread Paul Durrant
> -Original Message-
> On 12/19/2018 09:23 PM, Dongli Zhang wrote:
> > The xenstore 'ring-page-order' is used globally for each blkback queue
> and
> > therefore should be read from xenstore only once. However, it is
> obtained
> > in read_per_ring_refs() which might be called multiple times during the
> > initialization of each blkback queue.
> >
> > If the blkfront is malicious and the 'ring-page-order' is set in
> different
> > value by blkfront every time before blkback reads it, this may end up at
> > the "WARN_ON(i != (XEN_BLKIF_REQS_PER_PAGE * blkif->nr_ring_pages));" in
> > xen_blkif_disconnect() when frontend is destroyed.
> >
> > This patch reworks connect_ring() to read xenstore 'ring-page-order'
> only
> > once.
> >
> > Signed-off-by: Dongli Zhang 
> > ---
> > Changed since v1:
> >   * change the order of xenstore read in read_per_ring_refs
> >   * use xenbus_read_unsigned() in connect_ring()
> >
> > Changed since v2:
> >   * simplify the condition check as "(err != 1 && nr_grefs > 1)"
> >   * avoid setting err as -EINVAL to remove extra one line of code
> >
> >  drivers/block/xen-blkback/xenbus.c | 74 +--
> ---
> >  1 file changed, 41 insertions(+), 33 deletions(-)
> >
> > diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
> > index a4bc74e..dfea3a4 100644
> > --- a/drivers/block/xen-blkback/xenbus.c
> > +++ b/drivers/block/xen-blkback/xenbus.c
> > @@ -926,7 +926,7 @@ static int read_per_ring_refs(struct xen_blkif_ring
> *ring, const char *dir)
> > int err, i, j;
> > struct xen_blkif *blkif = ring->blkif;
> > struct xenbus_device *dev = blkif->be->dev;
> > -   unsigned int ring_page_order, nr_grefs, evtchn;
> > +   unsigned int nr_grefs, evtchn;
> >
> > err = xenbus_scanf(XBT_NIL, dir, "event-channel", "%u",
> >   &evtchn);
> > @@ -936,43 +936,36 @@ static int read_per_ring_refs(struct
> xen_blkif_ring *ring, const char *dir)
> > return err;
> > }
> >
> > -   err = xenbus_scanf(XBT_NIL, dev->otherend, "ring-page-order", "%u",
> > - &ring_page_order);
> > +   nr_grefs = blkif->nr_ring_pages;
> > +   WARN_ON(!nr_grefs);

Why not exit if !nr_grefs? There's nothing useful for this function to do in 
that case.

> > +
> > +   for (i = 0; i < nr_grefs; i++) {
> > +   char ring_ref_name[RINGREF_NAME_LEN];
> > +
> > +   snprintf(ring_ref_name, RINGREF_NAME_LEN, "ring-ref%u", i);
> > +   err = xenbus_scanf(XBT_NIL, dir, ring_ref_name,
> > +  "%u", &ring_ref[i]);
> > +
> > +   if (err != 1 && nr_grefs > 1) {
> > +   xenbus_dev_fatal(dev, err, "reading %s/%s",
> > +dir, ring_ref_name);
> > +   return -EINVAL;
> > +   }
> > +
> > +   if (err != 1)
> > +   break;

Seems odd to test (err != 1) twice. I'd prefer:

if (err != 1) {
if (nr_grefs == 1)
break;


}

Either that or simply break if err != 1 and then...

> > +   }
> > +
> > if (err != 1) {

...add a check and fatal error exit here if nr_grefs != 1.

> > -   err = xenbus_scanf(XBT_NIL, dir, "ring-ref", "%u",
> &ring_ref[0]);
> > +   WARN_ON(nr_grefs != 1);
> > +
> > +   err = xenbus_scanf(XBT_NIL, dir, "ring-ref", "%u",
> > +  &ring_ref[0]);
> > if (err != 1) {
> > -   err = -EINVAL;
> > xenbus_dev_fatal(dev, err, "reading %s/ring-ref", dir);
> > -   return err;
> > -   }
> > -   nr_grefs = 1;
> > -   } else {
> > -   unsigned int i;
> > -
> > -   if (ring_page_order > xen_blkif_max_ring_order) {
> > -   err = -EINVAL;
> > -   xenbus_dev_fatal(dev, err, "%s/request %d ring page order exceed max:%d",
> > -dir, ring_page_order,
> > -xen_blkif_max_ring_order);
> > -   return err;
> > -   }
> > -
> > -   nr_grefs = 1 << ring_page_order;
> > -   for (i = 0; i < nr_grefs; i++) {
> > -   char ring_ref_name[RINGREF_NAME_LEN];
> > -
> > -   snprintf(ring_ref_name, RINGREF_NAME_LEN, "ring-ref%u",
> i);
> > -   err = xenbus_scanf(XBT_NIL, dir, ring_ref_name,
> > -  "%u", &ring_ref[i]);
> > -   if (err != 1) {
> > -   err = -EINVAL;
> > -   xenbus_dev_fatal(dev, err, "reading %s/%s",
> > -dir, ring_ref_name);
> > -   return err;
> > -   }
> > +   return -EINVAL;
> > }
> > }
> > -   blkif->nr_ring_pages = nr_grefs;
> >
> > for (i = 0; i < nr_grefs * XEN_BLKIF_REQS_PER_PAGE; i++) {
> > req = 

RE: [Xen-devel] [PATCH 1/1] xen/blkback: rework connect_ring() to avoid inconsistent xenstore 'ring-page-order' set by malicious blkfront

2018-12-07 Thread Paul Durrant
> -Original Message-
> From: Dongli Zhang [mailto:dongli.zh...@oracle.com]
> Sent: 07 December 2018 15:10
> To: Paul Durrant ; linux-kernel@vger.kernel.org;
> xen-de...@lists.xenproject.org; linux-bl...@vger.kernel.org
> Cc: ax...@kernel.dk; Roger Pau Monne ;
> konrad.w...@oracle.com
> Subject: Re: [Xen-devel] [PATCH 1/1] xen/blkback: rework connect_ring() to
> avoid inconsistent xenstore 'ring-page-order' set by malicious blkfront
> 
> Hi Paul,
> 
> On 12/07/2018 05:39 PM, Paul Durrant wrote:
> >> -Original Message-
> >> From: Xen-devel [mailto:xen-devel-boun...@lists.xenproject.org] On
> Behalf
> >> Of Dongli Zhang
> >> Sent: 07 December 2018 04:18
> >> To: linux-kernel@vger.kernel.org; xen-de...@lists.xenproject.org;
> linux-
> >> bl...@vger.kernel.org
> >> Cc: ax...@kernel.dk; Roger Pau Monne ;
> >> konrad.w...@oracle.com
> >> Subject: [Xen-devel] [PATCH 1/1] xen/blkback: rework connect_ring() to
> >> avoid inconsistent xenstore 'ring-page-order' set by malicious blkfront
> >>
> >> The xenstore 'ring-page-order' is used globally for each blkback queue
> and
> >> therefore should be read from xenstore only once. However, it is
> obtained
> >> in read_per_ring_refs() which might be called multiple times during the
> >> initialization of each blkback queue.
> >
> > That is certainly sub-optimal.
> >
> >>
> >> If the blkfront is malicious and the 'ring-page-order' is set in
> different
> >> value by blkfront every time before blkback reads it, this may end up
> at
> >> the "WARN_ON(i != (XEN_BLKIF_REQS_PER_PAGE * blkif->nr_ring_pages));"
> in
> >> xen_blkif_disconnect() when frontend is destroyed.
> >
> > I can't actually see what useful function blkif->nr_ring_pages actually
> performs any more. Perhaps you could actually get rid of it?
> 
> How about we keep it? Other than reading from xenstore, it is the only
> place for
> us to know the value from 'ring-page-order'.
> 
> This helps calculate the initialized number of elements on all
> xen_blkif_ring->pending_free lists. That's how "WARN_ON(i !=
> (XEN_BLKIF_REQS_PER_PAGE * blkif->nr_ring_pages));" is used to double
> check if
> there is no leak of elements reclaimed from all xen_blkif_ring->pending_free.
> 
> It helps vmcore analysis as well. Given blkif->nr_ring_pages, we would be
> able
> to double check if the number of ring buffer slots are correct.
> 
> I could not see any drawback leaving blkif->nr_ring_pages in the code.

No, there's no drawback apart from space, but apart from that cross-check and, 
as you say, core analysis it seems to have little value.

  Paul

> 
> Dongli Zhang
> 
> >
> >>
> >> This patch reworks connect_ring() to read xenstore 'ring-page-order'
> only
> >> once.
> >
> > That is certainly a good thing :-)
> >
> >   Paul
> >
> >>
> >> Signed-off-by: Dongli Zhang 
> >> ---
> >>  drivers/block/xen-blkback/xenbus.c | 49 --
> ---
> >> -
> >>  1 file changed, 31 insertions(+), 18 deletions(-)
> >>
> >> diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
> >> index a4bc74e..4a8ce20 100644
> >> --- a/drivers/block/xen-blkback/xenbus.c
> >> +++ b/drivers/block/xen-blkback/xenbus.c
> >> @@ -919,14 +919,15 @@ static void connect(struct backend_info *be)
> >>  /*
> >>   * Each ring may have multi pages, depends on "ring-page-order".
> >>   */
> >> -static int read_per_ring_refs(struct xen_blkif_ring *ring, const char
> >> *dir)
> >> +static int read_per_ring_refs(struct xen_blkif_ring *ring, const char
> >> *dir,
> >> +bool use_ring_page_order)
> >>  {
> >>unsigned int ring_ref[XENBUS_MAX_RING_GRANTS];
> >>struct pending_req *req, *n;
> >>int err, i, j;
> >>struct xen_blkif *blkif = ring->blkif;
> >>struct xenbus_device *dev = blkif->be->dev;
> >> -  unsigned int ring_page_order, nr_grefs, evtchn;
> >> +  unsigned int nr_grefs, evtchn;
> >>
> >>err = xenbus_scanf(XBT_NIL, dir, "event-channel", "%u",
> >>  &evtchn);
> >> @@ -936,28 +937,18 @@ static int read_per_ring_refs(struct
> xen_blkif_ring
> >> *ring, const char *dir)
> >>return err;
> >>  

RE: [Xen-devel] [PATCH 1/1] xen/blkback: rework connect_ring() to avoid inconsistent xenstore 'ring-page-order' set by malicious blkfront

2018-12-07 Thread Paul Durrant
> -Original Message-
> From: Xen-devel [mailto:xen-devel-boun...@lists.xenproject.org] On Behalf
> Of Dongli Zhang
> Sent: 07 December 2018 04:18
> To: linux-kernel@vger.kernel.org; xen-de...@lists.xenproject.org; linux-
> bl...@vger.kernel.org
> Cc: ax...@kernel.dk; Roger Pau Monne ;
> konrad.w...@oracle.com
> Subject: [Xen-devel] [PATCH 1/1] xen/blkback: rework connect_ring() to
> avoid inconsistent xenstore 'ring-page-order' set by malicious blkfront
> 
> The xenstore 'ring-page-order' is used globally for each blkback queue and
> therefore should be read from xenstore only once. However, it is obtained
> in read_per_ring_refs() which might be called multiple times during the
> initialization of each blkback queue.

That is certainly sub-optimal.

> 
> If the blkfront is malicious and the 'ring-page-order' is set in different
> value by blkfront every time before blkback reads it, this may end up at
> the "WARN_ON(i != (XEN_BLKIF_REQS_PER_PAGE * blkif->nr_ring_pages));" in
> xen_blkif_disconnect() when frontend is destroyed.

I can't actually see what useful function blkif->nr_ring_pages actually 
performs any more. Perhaps you could actually get rid of it?

> 
> This patch reworks connect_ring() to read xenstore 'ring-page-order' only
> once.

That is certainly a good thing :-)

  Paul

> 
> Signed-off-by: Dongli Zhang 
> ---
>  drivers/block/xen-blkback/xenbus.c | 49 -
> -
>  1 file changed, 31 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/block/xen-blkback/xenbus.c b/drivers/block/xen-blkback/xenbus.c
> index a4bc74e..4a8ce20 100644
> --- a/drivers/block/xen-blkback/xenbus.c
> +++ b/drivers/block/xen-blkback/xenbus.c
> @@ -919,14 +919,15 @@ static void connect(struct backend_info *be)
>  /*
>   * Each ring may have multi pages, depends on "ring-page-order".
>   */
> -static int read_per_ring_refs(struct xen_blkif_ring *ring, const char
> *dir)
> +static int read_per_ring_refs(struct xen_blkif_ring *ring, const char
> *dir,
> +   bool use_ring_page_order)
>  {
>   unsigned int ring_ref[XENBUS_MAX_RING_GRANTS];
>   struct pending_req *req, *n;
>   int err, i, j;
>   struct xen_blkif *blkif = ring->blkif;
>   struct xenbus_device *dev = blkif->be->dev;
> - unsigned int ring_page_order, nr_grefs, evtchn;
> + unsigned int nr_grefs, evtchn;
> 
>   err = xenbus_scanf(XBT_NIL, dir, "event-channel", "%u",
> &evtchn);
> @@ -936,28 +937,18 @@ static int read_per_ring_refs(struct xen_blkif_ring
> *ring, const char *dir)
>   return err;
>   }
> 
> - err = xenbus_scanf(XBT_NIL, dev->otherend, "ring-page-order", "%u",
> -   &ring_page_order);
> - if (err != 1) {
> + nr_grefs = blkif->nr_ring_pages;
> +
> + if (!use_ring_page_order) {
>   err = xenbus_scanf(XBT_NIL, dir, "ring-ref", "%u",
> &ring_ref[0]);
>   if (err != 1) {
>   err = -EINVAL;
>   xenbus_dev_fatal(dev, err, "reading %s/ring-ref", dir);
>   return err;
>   }
> - nr_grefs = 1;
>   } else {
>   unsigned int i;
> 
> - if (ring_page_order > xen_blkif_max_ring_order) {
> - err = -EINVAL;
> - xenbus_dev_fatal(dev, err, "%s/request %d ring page order exceed max:%d",
> -  dir, ring_page_order,
> -  xen_blkif_max_ring_order);
> - return err;
> - }
> -
> - nr_grefs = 1 << ring_page_order;
>   for (i = 0; i < nr_grefs; i++) {
>   char ring_ref_name[RINGREF_NAME_LEN];
> 
> @@ -972,7 +963,6 @@ static int read_per_ring_refs(struct xen_blkif_ring
> *ring, const char *dir)
>   }
>   }
>   }
> - blkif->nr_ring_pages = nr_grefs;
> 
>   for (i = 0; i < nr_grefs * XEN_BLKIF_REQS_PER_PAGE; i++) {
>   req = kzalloc(sizeof(*req), GFP_KERNEL);
> @@ -1030,6 +1020,8 @@ static int connect_ring(struct backend_info *be)
>   size_t xspathsize;
> const size_t xenstore_path_ext_size = 11; /* sufficient for "/queue-NNN" */
>   unsigned int requested_num_queues = 0;
> + bool use_ring_page_order = false;
> + unsigned int ring_page_order;
> 
>   pr_debug("%s %s\n", __func__, dev->otherend);
> 
> @@ -1075,8 +1067,28 @@ static int connect_ring(struct backend_info *be)
>be->blkif->nr_rings, be->blkif->blk_protocol, protocol,
>pers_grants ? "persistent grants" : "");
> 
> + err = xenbus_scanf(XBT_NIL, dev->otherend, "ring-page-order", "%u",
> +&ring_page_order);
> +
> + if (err != 1) {
> + be->blkif->nr_ring_pages = 1;
> + } else {
> + if (ring_page_order > xen_blkif_max_ring_order) {
> + err = -EINVAL;
> + 

RE: [PATCH v5] xen/privcmd: add IOCTL_PRIVCMD_MMAP_RESOURCE

2018-05-09 Thread Paul Durrant
Apologies. I appear to have already sent this a while ago.

  Paul

> -Original Message-
> From: Paul Durrant [mailto:paul.durr...@citrix.com]
> Sent: 09 May 2018 14:16
> To: xen-de...@lists.xenproject.org; linux-kernel@vger.kernel.org; linux-
> arm-ker...@lists.infradead.org
> Cc: Paul Durrant <paul.durr...@citrix.com>; Juergen Gross
> <jgr...@suse.com>; Thomas Gleixner <t...@linutronix.de>; Ingo Molnar
> <mi...@redhat.com>; Stefano Stabellini <sstabell...@kernel.org>
> Subject: [PATCH v5] xen/privcmd: add IOCTL_PRIVCMD_MMAP_RESOURCE
> 
> My recent Xen patch series introduces a new HYPERVISOR_memory_op to
> support direct priv-mapping of certain guest resources (such as ioreq
> pages, used by emulators) by a tools domain, rather than having to access
> such resources via the guest P2M.
> 
> This patch adds the necessary infrastructure to the privcmd driver and
> Xen MMU code to support direct resource mapping.
> 
> NOTE: The adjustment in the MMU code is partially cosmetic. Xen will now
>   allow a PV tools domain to map guest pages either by GFN or MFN, thus
>   the term 'mfn' has been swapped for 'pfn' in the lower layers of the
>   remap code.
> 
> Signed-off-by: Paul Durrant <paul.durr...@citrix.com>
> Reviewed-by: Boris Ostrovsky <boris.ostrov...@oracle.com>
> ---
> Cc: Juergen Gross <jgr...@suse.com>
> Cc: Thomas Gleixner <t...@linutronix.de>
> Cc: Ingo Molnar <mi...@redhat.com>
> Cc: Stefano Stabellini <sstabell...@kernel.org>
> 
> v5:
>  - Handle the case of xen_feature(XENFEAT_auto_translated_physmap) &&
>(PAGE_SIZE > XEN_PAGE_SIZE)
> 
> v4:
>  - Remove stray line of debug that causes a build warning on ARM 32-bit
> 
> v3:
>  - Address comments from Boris
>  - Fix ARM build
> 
> v2:
>  - Fix bug when mapping multiple pages of a resource
> ---
>  arch/arm/xen/enlighten.c   |  11 
>  arch/x86/xen/mmu.c |  60 +--
>  drivers/xen/privcmd.c  | 133
> +
>  include/uapi/xen/privcmd.h |  11 
>  include/xen/interface/memory.h |  66 
>  include/xen/interface/xen.h|   7 ++-
>  include/xen/xen-ops.h  |  24 +++-
>  7 files changed, 291 insertions(+), 21 deletions(-)
> 
> diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c
> index ba7f4c8f5c3e..8073625371f5 100644
> --- a/arch/arm/xen/enlighten.c
> +++ b/arch/arm/xen/enlighten.c
> @@ -89,6 +89,17 @@ int xen_unmap_domain_gfn_range(struct
> vm_area_struct *vma,
>  }
>  EXPORT_SYMBOL_GPL(xen_unmap_domain_gfn_range);
> 
> +/* Not used by XENFEAT_auto_translated guests. */
> +int xen_remap_domain_mfn_array(struct vm_area_struct *vma,
> +unsigned long addr,
> +xen_pfn_t *mfn, int nr,
> +int *err_ptr, pgprot_t prot,
> +unsigned int domid, struct page **pages)
> +{
> + return -ENOSYS;
> +}
> +EXPORT_SYMBOL_GPL(xen_remap_domain_mfn_array);
> +
>  static void xen_read_wallclock(struct timespec64 *ts)
>  {
>   u32 version;
> diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
> index d33e7dbe3129..af2960cb7a3e 100644
> --- a/arch/x86/xen/mmu.c
> +++ b/arch/x86/xen/mmu.c
> @@ -65,37 +65,44 @@ static void xen_flush_tlb_all(void)
>  #define REMAP_BATCH_SIZE 16
> 
>  struct remap_data {
> - xen_pfn_t *mfn;
> + xen_pfn_t *pfn;
>   bool contiguous;
> + bool no_translate;
>   pgprot_t prot;
>   struct mmu_update *mmu_update;
>  };
> 
> -static int remap_area_mfn_pte_fn(pte_t *ptep, pgtable_t token,
> +static int remap_area_pfn_pte_fn(pte_t *ptep, pgtable_t token,
>unsigned long addr, void *data)
>  {
>   struct remap_data *rmd = data;
> - pte_t pte = pte_mkspecial(mfn_pte(*rmd->mfn, rmd->prot));
> + pte_t pte = pte_mkspecial(mfn_pte(*rmd->pfn, rmd->prot));
> 
> - /* If we have a contiguous range, just update the mfn itself,
> -else update pointer to be "next mfn". */
> + /*
> +  * If we have a contiguous range, just update the pfn itself,
> +  * else update pointer to be "next pfn".
> +  */
>   if (rmd->contiguous)
> - (*rmd->mfn)++;
> + (*rmd->pfn)++;
>   else
> - rmd->mfn++;
> + rmd->pfn++;
> 
> - rmd->mmu_update->ptr = virt_to_machine(ptep).maddr |
> MMU_NORMAL_PT_UPDATE;
> + rmd->mmu_update->ptr = virt_to_machine(ptep).maddr;
> + rmd->mmu_update->ptr |= rmd->no_translate ?
> + MMU_PT_UPDATE_NO_TRANSLATE :
> + MMU_NORMAL_PT_UPDATE;

[PATCH v5] xen/privcmd: add IOCTL_PRIVCMD_MMAP_RESOURCE

2018-05-09 Thread Paul Durrant
My recent Xen patch series introduces a new HYPERVISOR_memory_op to
support direct priv-mapping of certain guest resources (such as ioreq
pages, used by emulators) by a tools domain, rather than having to access
such resources via the guest P2M.

This patch adds the necessary infrastructure to the privcmd driver and
Xen MMU code to support direct resource mapping.

NOTE: The adjustment in the MMU code is partially cosmetic. Xen will now
  allow a PV tools domain to map guest pages either by GFN or MFN, thus
  the term 'mfn' has been swapped for 'pfn' in the lower layers of the
  remap code.

Signed-off-by: Paul Durrant <paul.durr...@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrov...@oracle.com>
---
Cc: Juergen Gross <jgr...@suse.com>
Cc: Thomas Gleixner <t...@linutronix.de>
Cc: Ingo Molnar <mi...@redhat.com>
Cc: Stefano Stabellini <sstabell...@kernel.org>

v5:
 - Handle the case of xen_feature(XENFEAT_auto_translated_physmap) &&
   (PAGE_SIZE > XEN_PAGE_SIZE)

v4:
 - Remove stray line of debug that causes a build warning on ARM 32-bit

v3:
 - Address comments from Boris
 - Fix ARM build

v2:
 - Fix bug when mapping multiple pages of a resource
---
 arch/arm/xen/enlighten.c   |  11 
 arch/x86/xen/mmu.c |  60 +--
 drivers/xen/privcmd.c  | 133 +
 include/uapi/xen/privcmd.h |  11 
 include/xen/interface/memory.h |  66 
 include/xen/interface/xen.h|   7 ++-
 include/xen/xen-ops.h  |  24 +++-
 7 files changed, 291 insertions(+), 21 deletions(-)

diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c
index ba7f4c8f5c3e..8073625371f5 100644
--- a/arch/arm/xen/enlighten.c
+++ b/arch/arm/xen/enlighten.c
@@ -89,6 +89,17 @@ int xen_unmap_domain_gfn_range(struct vm_area_struct *vma,
 }
 EXPORT_SYMBOL_GPL(xen_unmap_domain_gfn_range);
 
+/* Not used by XENFEAT_auto_translated guests. */
+int xen_remap_domain_mfn_array(struct vm_area_struct *vma,
+  unsigned long addr,
+  xen_pfn_t *mfn, int nr,
+  int *err_ptr, pgprot_t prot,
+  unsigned int domid, struct page **pages)
+{
+   return -ENOSYS;
+}
+EXPORT_SYMBOL_GPL(xen_remap_domain_mfn_array);
+
 static void xen_read_wallclock(struct timespec64 *ts)
 {
u32 version;
diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index d33e7dbe3129..af2960cb7a3e 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -65,37 +65,44 @@ static void xen_flush_tlb_all(void)
 #define REMAP_BATCH_SIZE 16
 
 struct remap_data {
-   xen_pfn_t *mfn;
+   xen_pfn_t *pfn;
bool contiguous;
+   bool no_translate;
pgprot_t prot;
struct mmu_update *mmu_update;
 };
 
-static int remap_area_mfn_pte_fn(pte_t *ptep, pgtable_t token,
+static int remap_area_pfn_pte_fn(pte_t *ptep, pgtable_t token,
 unsigned long addr, void *data)
 {
struct remap_data *rmd = data;
-   pte_t pte = pte_mkspecial(mfn_pte(*rmd->mfn, rmd->prot));
+   pte_t pte = pte_mkspecial(mfn_pte(*rmd->pfn, rmd->prot));
 
-   /* If we have a contiguous range, just update the mfn itself,
-  else update pointer to be "next mfn". */
+   /*
+* If we have a contiguous range, just update the pfn itself,
+* else update pointer to be "next pfn".
+*/
if (rmd->contiguous)
-   (*rmd->mfn)++;
+   (*rmd->pfn)++;
else
-   rmd->mfn++;
+   rmd->pfn++;
 
-   rmd->mmu_update->ptr = virt_to_machine(ptep).maddr | 
MMU_NORMAL_PT_UPDATE;
+   rmd->mmu_update->ptr = virt_to_machine(ptep).maddr;
+   rmd->mmu_update->ptr |= rmd->no_translate ?
+   MMU_PT_UPDATE_NO_TRANSLATE :
+   MMU_NORMAL_PT_UPDATE;
rmd->mmu_update->val = pte_val_ma(pte);
rmd->mmu_update++;
 
return 0;
 }
 
-static int do_remap_gfn(struct vm_area_struct *vma,
+static int do_remap_pfn(struct vm_area_struct *vma,
unsigned long addr,
-   xen_pfn_t *gfn, int nr,
+   xen_pfn_t *pfn, int nr,
int *err_ptr, pgprot_t prot,
-   unsigned domid,
+   unsigned int domid,
+   bool no_translate,
struct page **pages)
 {
int err = 0;
@@ -106,11 +113,14 @@ static int do_remap_gfn(struct vm_area_struct *vma,
 
BUG_ON(!((vma->vm_flags & (VM_PFNMAP | VM_IO)) == (VM_PFNMAP | VM_IO)));
 
-   rmd.mfn = gfn;
+   rmd.pfn = pfn;
rmd.prot = prot;
-   /* We use the err_ptr to indicate if there we are doing a contiguous
-* mapping or a discontigious mapping. */
+   /*
+* We use the err_ptr to indicate if there we are doing a contiguous
+* mapping or a discontigious mapping.
+*/

RE: [Xen-devel] [PATCH 0/1] drm/xen-zcopy: Add Xen zero-copy helper DRM driver

2018-04-18 Thread Paul Durrant
> -Original Message-
> From: Oleksandr Andrushchenko [mailto:andr2...@gmail.com]
> Sent: 18 April 2018 11:21
> To: Paul Durrant <paul.durr...@citrix.com>; Roger Pau Monne
> <roger@citrix.com>
> Cc: jgr...@suse.com; Artem Mygaiev <artem_myga...@epam.com>;
> Dongwon Kim <dongwon@intel.com>; airl...@linux.ie;
> oleksandr_andrushche...@epam.com; linux-kernel@vger.kernel.org; dri-
> de...@lists.freedesktop.org; Potrola, MateuszX
> <mateuszx.potr...@intel.com>; xen-de...@lists.xenproject.org;
> daniel.vet...@intel.com; boris.ostrov...@oracle.com; Matt Roper
> <matthew.d.ro...@intel.com>
> Subject: Re: [Xen-devel] [PATCH 0/1] drm/xen-zcopy: Add Xen zero-copy
> helper DRM driver
> 
> On 04/18/2018 01:18 PM, Paul Durrant wrote:
> >> -Original Message-
> >> From: Xen-devel [mailto:xen-devel-boun...@lists.xenproject.org] On
> Behalf
> >> Of Roger Pau Monné
> >> Sent: 18 April 2018 11:11
> >> To: Oleksandr Andrushchenko <andr2...@gmail.com>
> >> Cc: jgr...@suse.com; Artem Mygaiev <artem_myga...@epam.com>;
> >> Dongwon Kim <dongwon@intel.com>; airl...@linux.ie;
> >> oleksandr_andrushche...@epam.com; linux-kernel@vger.kernel.org;
> dri-
> >> de...@lists.freedesktop.org; Potrola, MateuszX
> >> <mateuszx.potr...@intel.com>; xen-de...@lists.xenproject.org;
> >> daniel.vet...@intel.com; boris.ostrov...@oracle.com; Matt Roper
> >> <matthew.d.ro...@intel.com>
> >> Subject: Re: [Xen-devel] [PATCH 0/1] drm/xen-zcopy: Add Xen zero-copy
> >> helper DRM driver
> >>
> >> On Wed, Apr 18, 2018 at 11:01:12AM +0300, Oleksandr Andrushchenko
> >> wrote:
> >>> On 04/18/2018 10:35 AM, Roger Pau Monné wrote:
> >>>> On Wed, Apr 18, 2018 at 09:38:39AM +0300, Oleksandr Andrushchenko
> >> wrote:
> >>>>> On 04/17/2018 11:57 PM, Dongwon Kim wrote:
> >>>>>> On Tue, Apr 17, 2018 at 09:59:28AM +0200, Daniel Vetter wrote:
> >>>>>>> On Mon, Apr 16, 2018 at 12:29:05PM -0700, Dongwon Kim wrote:
> >>>>> 3.2 Backend exports dma-buf to xen-front
> >>>>>
> >>>>> In this case Dom0 pages are shared with DomU. As before, DomU can
> >> only write
> >>>>> to these pages, not any other page from Dom0, so it can be still
> >> considered
> >>>>> safe.
> >>>>> But, the following must be considered (highlighted in xen-front's
> Kernel
> >>>>> documentation):
> >>>>>    - If guest domain dies then pages/grants received from the backend
> >> cannot
> >>>>>      be claimed back - think of it as memory lost to Dom0 (won't be used
> >> for
> >>>>> any
> >>>>>      other guest)
> >>>>>    - Misbehaving guest may send too many requests to the backend
> >> exhausting
> >>>>>      its grant references and memory (consider this from security POV).
> >> As the
> >>>>>      backend runs in the trusted domain we also assume that it is
> trusted
> >> as
> >>>>> well,
> >>>>>      e.g. must take measures to prevent DDoS attacks.
> >>>> I cannot parse the above sentence:
> >>>>
> >>>> "As the backend runs in the trusted domain we also assume that it is
> >>>> trusted as well, e.g. must take measures to prevent DDoS attacks."
> >>>>
> >>>> What's the relation between being trusted and protecting from DoS
> >>>> attacks?
> >>> I mean that we trust the backend that it can prevent Dom0
> >>> from crashing in case DomU's frontend misbehaves, e.g.
> >>> if the frontend sends too many memory requests etc.
> >>>> In any case, all? PV protocols are implemented with the frontend
> >>>> sharing pages to the backend, and I think there's a reason why this
> >>>> model is used, and it should continue to be used.
> >>> This is the first use-case above. But there are real-world
> >>> use-cases (embedded in my case) when physically contiguous memory
> >>> needs to be shared, one of the possible ways to achieve this is
> >>> to share contiguous memory from Dom0 to DomU (the second use-case
> >> above)
> >>>> Having to add logic in the backend to prevent such attacks means
> >>>> that:
> >>>>
> >>>>- We need more code in the backend, which increases complexity and
> >>>>  chances of bugs.
> >>>>- Such code/logic could be wrong, thus allowing DoS.

RE: [Xen-devel] [PATCH 0/1] drm/xen-zcopy: Add Xen zero-copy helper DRM driver

2018-04-18 Thread Paul Durrant
> -Original Message-
> From: Xen-devel [mailto:xen-devel-boun...@lists.xenproject.org] On Behalf
> Of Roger Pau Monné
> Sent: 18 April 2018 11:11
> To: Oleksandr Andrushchenko 
> Cc: jgr...@suse.com; Artem Mygaiev ;
> Dongwon Kim ; airl...@linux.ie;
> oleksandr_andrushche...@epam.com; linux-kernel@vger.kernel.org; dri-
> de...@lists.freedesktop.org; Potrola, MateuszX
> ; xen-de...@lists.xenproject.org;
> daniel.vet...@intel.com; boris.ostrov...@oracle.com; Matt Roper
> 
> Subject: Re: [Xen-devel] [PATCH 0/1] drm/xen-zcopy: Add Xen zero-copy
> helper DRM driver
> 
> On Wed, Apr 18, 2018 at 11:01:12AM +0300, Oleksandr Andrushchenko
> wrote:
> > On 04/18/2018 10:35 AM, Roger Pau Monné wrote:
> > > On Wed, Apr 18, 2018 at 09:38:39AM +0300, Oleksandr Andrushchenko
> wrote:
> > > > On 04/17/2018 11:57 PM, Dongwon Kim wrote:
> > > > > On Tue, Apr 17, 2018 at 09:59:28AM +0200, Daniel Vetter wrote:
> > > > > > On Mon, Apr 16, 2018 at 12:29:05PM -0700, Dongwon Kim wrote:
> > > > 3.2 Backend exports dma-buf to xen-front
> > > >
> > > > In this case Dom0 pages are shared with DomU. As before, DomU can
> only write
> > > > to these pages, not any other page from Dom0, so it can be still
> considered
> > > > safe.
> > > > But, the following must be considered (highlighted in xen-front's Kernel
> > > > documentation):
> > > >   - If guest domain dies then pages/grants received from the backend
> cannot
> > > >     be claimed back - think of it as memory lost to Dom0 (won't be used
> for
> > > > any
> > > >     other guest)
> > > >   - Misbehaving guest may send too many requests to the backend
> exhausting
> > > >     its grant references and memory (consider this from security POV).
> As the
> > > >     backend runs in the trusted domain we also assume that it is trusted
> as
> > > > well,
> > > >     e.g. must take measures to prevent DDoS attacks.
> > > I cannot parse the above sentence:
> > >
> > > "As the backend runs in the trusted domain we also assume that it is
> > > trusted as well, e.g. must take measures to prevent DDoS attacks."
> > >
> > > What's the relation between being trusted and protecting from DoS
> > > attacks?
> > I mean that we trust the backend that it can prevent Dom0
> > from crashing in case DomU's frontend misbehaves, e.g.
> > if the frontend sends too many memory requests etc.
> > > In any case, all? PV protocols are implemented with the frontend
> > > sharing pages to the backend, and I think there's a reason why this
> > > model is used, and it should continue to be used.
> > This is the first use-case above. But there are real-world
> > use-cases (embedded in my case) when physically contiguous memory
> > needs to be shared, one of the possible ways to achieve this is
> > to share contiguous memory from Dom0 to DomU (the second use-case
> above)
> > > Having to add logic in the backend to prevent such attacks means
> > > that:
> > >
> > >   - We need more code in the backend, which increases complexity and
> > > chances of bugs.
> > >   - Such code/logic could be wrong, thus allowing DoS.
> > You can live without this code at all, but this is then up to
> > backend which may make Dom0 down because of DomU's frontend doing
> evil
> > things
> 
> IMO we should design protocols that do not allow such attacks instead
> of having to defend against them.
> 
> > > > 4. xen-front/backend/xen-zcopy synchronization
> > > >
> > > > 4.1. As I already said in 2) all the inter VM communication happens
> between
> > > > xen-front and the backend, xen-zcopy is NOT involved in that.
> > > > When xen-front wants to destroy a display buffer (dumb/dma-buf) it
> issues a
> > > > XENDISPL_OP_DBUF_DESTROY command (opposite to
> XENDISPL_OP_DBUF_CREATE).
> > > > This call is synchronous, so xen-front expects that backend does free
> the
> > > > buffer pages on return.
> > > >
> > > > 4.2. Backend, on XENDISPL_OP_DBUF_DESTROY:
> > > >    - closes all dumb handles/fd's of the buffer according to [3]
> > > >    - issues DRM_IOCTL_XEN_ZCOPY_DUMB_WAIT_FREE IOCTL to xen-
> zcopy to make
> > > > sure
> > > >      the buffer is freed (think of it as it waits for dma-buf->release
> > > > callback)
> > > So this zcopy thing keeps some kind of track of the memory usage? Why
> > > can't the user-space backend keep track of the buffer usage?
> > Because there is no dma-buf UAPI which allows to track the buffer life cycle
> > (e.g. wait until dma-buf's .release callback is called)
> > > >    - replies to xen-front that the buffer can be destroyed.
> > > > This way deletion of the buffer happens synchronously on both Dom0
> and DomU
> > > > sides. In case if DRM_IOCTL_XEN_ZCOPY_DUMB_WAIT_FREE returns
> with time-out
> > > > error
> > > > (BTW, wait time is a parameter of this IOCTL), Xen will defer grant
> > > > reference
> > > > removal and will retry later until those are free.
> > > >
> > > 

RE: [Xen-devel] [PATCH 0/1] drm/xen-zcopy: Add Xen zero-copy helper DRM driver

2018-04-18 Thread Paul Durrant
> -Original Message-
> From: Xen-devel [mailto:xen-devel-boun...@lists.xenproject.org] On Behalf
> Of Roger Pau Monné
> Sent: 18 April 2018 11:11
> To: Oleksandr Andrushchenko 
> Cc: jgr...@suse.com; Artem Mygaiev ;
> Dongwon Kim ; airl...@linux.ie;
> oleksandr_andrushche...@epam.com; linux-kernel@vger.kernel.org; dri-
> de...@lists.freedesktop.org; Potrola, MateuszX
> ; xen-de...@lists.xenproject.org;
> daniel.vet...@intel.com; boris.ostrov...@oracle.com; Matt Roper
> 
> Subject: Re: [Xen-devel] [PATCH 0/1] drm/xen-zcopy: Add Xen zero-copy
> helper DRM driver
> 
> On Wed, Apr 18, 2018 at 11:01:12AM +0300, Oleksandr Andrushchenko
> wrote:
> > On 04/18/2018 10:35 AM, Roger Pau Monné wrote:
> > > On Wed, Apr 18, 2018 at 09:38:39AM +0300, Oleksandr Andrushchenko
> wrote:
> > > > On 04/17/2018 11:57 PM, Dongwon Kim wrote:
> > > > > On Tue, Apr 17, 2018 at 09:59:28AM +0200, Daniel Vetter wrote:
> > > > > > On Mon, Apr 16, 2018 at 12:29:05PM -0700, Dongwon Kim wrote:
> > > > 3.2 Backend exports dma-buf to xen-front
> > > >
> > > > In this case Dom0 pages are shared with DomU. As before, DomU can
> only write
> > > > to these pages, not any other page from Dom0, so it can be still
> considered
> > > > safe.
> > > > But, the following must be considered (highlighted in xen-front's Kernel
> > > > documentation):
> > > >   - If guest domain dies then pages/grants received from the backend
> cannot
> > > >     be claimed back - think of it as memory lost to Dom0 (won't be used
> for
> > > > any
> > > >     other guest)
> > > >   - Misbehaving guest may send too many requests to the backend
> exhausting
> > > >     its grant references and memory (consider this from security POV).
> As the
> > > >     backend runs in the trusted domain we also assume that it is trusted
> as
> > > > well,
> > > >     e.g. must take measures to prevent DDoS attacks.
> > > I cannot parse the above sentence:
> > >
> > > "As the backend runs in the trusted domain we also assume that it is
> > > trusted as well, e.g. must take measures to prevent DDoS attacks."
> > >
> > > What's the relation between being trusted and protecting from DoS
> > > attacks?
> > I mean that we trust the backend that it can prevent Dom0
> > from crashing in case DomU's frontend misbehaves, e.g.
> > if the frontend sends too many memory requests etc.
> > > In any case, all? PV protocols are implemented with the frontend
> > > sharing pages to the backend, and I think there's a reason why this
> > > model is used, and it should continue to be used.
> > This is the first use-case above. But there are real-world
> > use-cases (embedded in my case) when physically contiguous memory
> > needs to be shared, one of the possible ways to achieve this is
> > to share contiguous memory from Dom0 to DomU (the second use-case
> > above)
> > > Having to add logic in the backend to prevent such attacks means
> > > that:
> > >
> > >   - We need more code in the backend, which increases complexity and
> > > chances of bugs.
> > >   - Such code/logic could be wrong, thus allowing DoS.
> > You can live without this code at all, but then it is up to the
> > backend, which may bring Dom0 down because of a DomU frontend
> > doing evil things
> 
> IMO we should design protocols that do not allow such attacks instead
> of having to defend against them.
> 
> > > > 4. xen-front/backend/xen-zcopy synchronization
> > > >
> > > > 4.1. As I already said in 2) all the inter VM communication happens
> > > > between xen-front and the backend, xen-zcopy is NOT involved in that.
> > > > When xen-front wants to destroy a display buffer (dumb/dma-buf) it
> > > > issues a XENDISPL_OP_DBUF_DESTROY command (opposite to
> > > > XENDISPL_OP_DBUF_CREATE).
> > > > This call is synchronous, so xen-front expects that backend does free
> > > > the buffer pages on return.
> > > >
> > > > 4.2. Backend, on XENDISPL_OP_DBUF_DESTROY:
> > > >    - closes all dumb handles/fd's of the buffer according to [3]
> > > >    - issues DRM_IOCTL_XEN_ZCOPY_DUMB_WAIT_FREE IOCTL to xen-zcopy
> > > >      to make sure the buffer is freed (think of it as it waits for
> > > >      dma-buf->release callback)
> > > So this zcopy thing keeps some kind of track of the memory usage? Why
> > > can't the user-space backend keep track of the buffer usage?
> > Because there is no dma-buf UAPI which allows tracking the buffer life
> > cycle (e.g. wait until dma-buf's .release callback is called)
> > > >    - replies to xen-front that the buffer can be destroyed.
> > > > This way deletion of the buffer happens synchronously on both Dom0
> > > > and DomU sides. In case DRM_IOCTL_XEN_ZCOPY_DUMB_WAIT_FREE
> > > > returns with a time-out error (BTW, wait time is a parameter of this
> > > > IOCTL), Xen will defer grant reference removal and will retry later
> > > > until those are free.
> > > >
> > > > Hope this helps understand how buffers are synchronously deleted in
> > > > case of xen-zcopy with a single protocol 

[PATCH v5] xen/privcmd: add IOCTL_PRIVCMD_MMAP_RESOURCE

2018-04-11 Thread Paul Durrant
My recent Xen patch series introduces a new HYPERVISOR_memory_op to
support direct priv-mapping of certain guest resources (such as ioreq
pages, used by emulators) by a tools domain, rather than having to access
such resources via the guest P2M.

This patch adds the necessary infrastructure to the privcmd driver and
Xen MMU code to support direct resource mapping.

NOTE: The adjustment in the MMU code is partially cosmetic. Xen will now
  allow a PV tools domain to map guest pages either by GFN or MFN, thus
  the term 'mfn' has been swapped for 'pfn' in the lower layers of the
  remap code.

Signed-off-by: Paul Durrant <paul.durr...@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrov...@oracle.com>
---
Cc: Juergen Gross <jgr...@suse.com>
Cc: Thomas Gleixner <t...@linutronix.de>
Cc: Ingo Molnar <mi...@redhat.com>
Cc: Stefano Stabellini <sstabell...@kernel.org>

v5:
 - Handle the case of xen_feature(XENFEAT_auto_translated_physmap) &&
   (PAGE_SIZE > XEN_PAGE_SIZE)

v4:
 - Remove stray line of debug that causes a build warning on ARM 32-bit

v3:
 - Address comments from Boris
 - Fix ARM build

v2:
 - Fix bug when mapping multiple pages of a resource
---
 arch/arm/xen/enlighten.c   |  11 
 arch/x86/xen/mmu.c |  60 +--
 drivers/xen/privcmd.c  | 133 +
 include/uapi/xen/privcmd.h |  11 
 include/xen/interface/memory.h |  66 
 include/xen/interface/xen.h|   7 ++-
 include/xen/xen-ops.h  |  24 +++-
 7 files changed, 291 insertions(+), 21 deletions(-)

diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c
index ba7f4c8f5c3e..8073625371f5 100644
--- a/arch/arm/xen/enlighten.c
+++ b/arch/arm/xen/enlighten.c
@@ -89,6 +89,17 @@ int xen_unmap_domain_gfn_range(struct vm_area_struct *vma,
 }
 EXPORT_SYMBOL_GPL(xen_unmap_domain_gfn_range);
 
+/* Not used by XENFEAT_auto_translated guests. */
+int xen_remap_domain_mfn_array(struct vm_area_struct *vma,
+  unsigned long addr,
+  xen_pfn_t *mfn, int nr,
+  int *err_ptr, pgprot_t prot,
+  unsigned int domid, struct page **pages)
+{
+   return -ENOSYS;
+}
+EXPORT_SYMBOL_GPL(xen_remap_domain_mfn_array);
+
 static void xen_read_wallclock(struct timespec64 *ts)
 {
u32 version;
diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index d33e7dbe3129..af2960cb7a3e 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -65,37 +65,44 @@ static void xen_flush_tlb_all(void)
 #define REMAP_BATCH_SIZE 16
 
 struct remap_data {
-   xen_pfn_t *mfn;
+   xen_pfn_t *pfn;
bool contiguous;
+   bool no_translate;
pgprot_t prot;
struct mmu_update *mmu_update;
 };
 
-static int remap_area_mfn_pte_fn(pte_t *ptep, pgtable_t token,
+static int remap_area_pfn_pte_fn(pte_t *ptep, pgtable_t token,
 unsigned long addr, void *data)
 {
struct remap_data *rmd = data;
-   pte_t pte = pte_mkspecial(mfn_pte(*rmd->mfn, rmd->prot));
+   pte_t pte = pte_mkspecial(mfn_pte(*rmd->pfn, rmd->prot));
 
-   /* If we have a contiguous range, just update the mfn itself,
-  else update pointer to be "next mfn". */
+   /*
+* If we have a contiguous range, just update the pfn itself,
+* else update pointer to be "next pfn".
+*/
if (rmd->contiguous)
-   (*rmd->mfn)++;
+   (*rmd->pfn)++;
else
-   rmd->mfn++;
+   rmd->pfn++;
 
-   rmd->mmu_update->ptr = virt_to_machine(ptep).maddr | MMU_NORMAL_PT_UPDATE;
+   rmd->mmu_update->ptr = virt_to_machine(ptep).maddr;
+   rmd->mmu_update->ptr |= rmd->no_translate ?
+   MMU_PT_UPDATE_NO_TRANSLATE :
+   MMU_NORMAL_PT_UPDATE;
rmd->mmu_update->val = pte_val_ma(pte);
rmd->mmu_update++;
 
return 0;
 }
 
-static int do_remap_gfn(struct vm_area_struct *vma,
+static int do_remap_pfn(struct vm_area_struct *vma,
unsigned long addr,
-   xen_pfn_t *gfn, int nr,
+   xen_pfn_t *pfn, int nr,
int *err_ptr, pgprot_t prot,
-   unsigned domid,
+   unsigned int domid,
+   bool no_translate,
struct page **pages)
 {
int err = 0;
@@ -106,11 +113,14 @@ static int do_remap_gfn(struct vm_area_struct *vma,
 
BUG_ON(!((vma->vm_flags & (VM_PFNMAP | VM_IO)) == (VM_PFNMAP | VM_IO)));
 
-   rmd.mfn = gfn;
+   rmd.pfn = pfn;
rmd.prot = prot;
-   /* We use the err_ptr to indicate if there we are doing a contiguous
-* mapping or a discontigious mapping. */
+ 

RE: [PATCH v4] xen/privcmd: add IOCTL_PRIVCMD_MMAP_RESOURCE

2018-04-11 Thread Paul Durrant
> -Original Message-
> From: Julien Grall [mailto:julien.gr...@arm.com]
> Sent: 11 April 2018 10:46
> To: Paul Durrant <paul.durr...@citrix.com>; xen-de...@lists.xenproject.org;
> linux-arm-ker...@lists.infradead.org; linux-kernel@vger.kernel.org
> Cc: Juergen Gross <jgr...@suse.com>; Thomas Gleixner
> <t...@linutronix.de>; Stefano Stabellini <sstabell...@kernel.org>; Ingo
> Molnar <mi...@redhat.com>
> Subject: Re: [PATCH v4] xen/privcmd: add IOCTL_PRIVCMD_MMAP_RESOURCE
> 
> Hi,
> 
> On 10/04/18 08:58, Paul Durrant wrote:
> > +static long privcmd_ioctl_mmap_resource(struct file *file, void __user *udata)
> > +{
> > +   struct privcmd_data *data = file->private_data;
> > +   struct mm_struct *mm = current->mm;
> > +   struct vm_area_struct *vma;
> > +   struct privcmd_mmap_resource kdata;
> > +   xen_pfn_t *pfns = NULL;
> > +   struct xen_mem_acquire_resource xdata;
> > +   int rc;
> > +
> > +   if (copy_from_user(&kdata, udata, sizeof(kdata)))
> > +   return -EFAULT;
> > +
> > +   /* If restriction is in place, check the domid matches */
> > +   if (data->domid != DOMID_INVALID && data->domid != kdata.dom)
> > +   return -EPERM;
> > +
> > +   down_write(&mm->mmap_sem);
> > +
> > +   vma = find_vma(mm, kdata.addr);
> > +   if (!vma || vma->vm_ops != &privcmd_vm_ops) {
> > +   rc = -EINVAL;
> > +   goto out;
> > +   }
> > +
> > +   pfns = kcalloc(kdata.num, sizeof(*pfns), GFP_KERNEL);
> > +   if (!pfns) {
> > +   rc = -ENOMEM;
> > +   goto out;
> > +   }
> > +
> > +   if (xen_feature(XENFEAT_auto_translated_physmap)) {
> > +   struct page **pages;
> > +   unsigned int i;
> > +
> > +   rc = alloc_empty_pages(vma, kdata.num);
> > +   if (rc < 0)
> > +   goto out;
> > +
> > +   pages = vma->vm_private_data;
> > +   for (i = 0; i < kdata.num; i++)
> > +   pfns[i] = page_to_pfn(pages[i]);
> 
> I don't think this is going to work well if the hypervisor is using a
> different granularity for the page.
> 
> Imagine Xen is using 4K but the kernel 64K. You would end up with the
> resource not mapped contiguously in memory.

Good point. I do need to take account of the kernel page size in this case.

  Paul

> 
> Cheers,
> 
> --
> Julien Grall



[PATCH v4] xen/privcmd: add IOCTL_PRIVCMD_MMAP_RESOURCE

2018-04-10 Thread Paul Durrant
My recent Xen patch series introduces a new HYPERVISOR_memory_op to
support direct priv-mapping of certain guest resources (such as ioreq
pages, used by emulators) by a tools domain, rather than having to access
such resources via the guest P2M.

This patch adds the necessary infrastructure to the privcmd driver and
Xen MMU code to support direct resource mapping.

NOTE: The adjustment in the MMU code is partially cosmetic. Xen will now
  allow a PV tools domain to map guest pages either by GFN or MFN, thus
  the term 'mfn' has been swapped for 'pfn' in the lower layers of the
  remap code.

Signed-off-by: Paul Durrant <paul.durr...@citrix.com>
Reviewed-by: Boris Ostrovsky <boris.ostrov...@oracle.com>
---
Cc: Juergen Gross <jgr...@suse.com>
Cc: Thomas Gleixner <t...@linutronix.de>
Cc: Ingo Molnar <mi...@redhat.com>
Cc: Stefano Stabellini <sstabell...@kernel.org>

v4:
 - Remove stray line of debug that causes a build warning on ARM 32-bit

v3:
 - Address comments from Boris
 - Fix ARM build

v2:
 - Fix bug when mapping multiple pages of a resource
---
 arch/arm/xen/enlighten.c   |  11 
 arch/x86/xen/mmu.c |  60 +--
 drivers/xen/privcmd.c  | 128 +
 include/uapi/xen/privcmd.h |  11 
 include/xen/interface/memory.h |  66 +
 include/xen/interface/xen.h|   7 ++-
 include/xen/xen-ops.h  |  24 +++-
 7 files changed, 286 insertions(+), 21 deletions(-)

diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c
index ba7f4c8f5c3e..8073625371f5 100644
--- a/arch/arm/xen/enlighten.c
+++ b/arch/arm/xen/enlighten.c
@@ -89,6 +89,17 @@ int xen_unmap_domain_gfn_range(struct vm_area_struct *vma,
 }
 EXPORT_SYMBOL_GPL(xen_unmap_domain_gfn_range);
 
+/* Not used by XENFEAT_auto_translated guests. */
+int xen_remap_domain_mfn_array(struct vm_area_struct *vma,
+  unsigned long addr,
+  xen_pfn_t *mfn, int nr,
+  int *err_ptr, pgprot_t prot,
+  unsigned int domid, struct page **pages)
+{
+   return -ENOSYS;
+}
+EXPORT_SYMBOL_GPL(xen_remap_domain_mfn_array);
+
 static void xen_read_wallclock(struct timespec64 *ts)
 {
u32 version;
diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index d33e7dbe3129..af2960cb7a3e 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -65,37 +65,44 @@ static void xen_flush_tlb_all(void)
 #define REMAP_BATCH_SIZE 16
 
 struct remap_data {
-   xen_pfn_t *mfn;
+   xen_pfn_t *pfn;
bool contiguous;
+   bool no_translate;
pgprot_t prot;
struct mmu_update *mmu_update;
 };
 
-static int remap_area_mfn_pte_fn(pte_t *ptep, pgtable_t token,
+static int remap_area_pfn_pte_fn(pte_t *ptep, pgtable_t token,
 unsigned long addr, void *data)
 {
struct remap_data *rmd = data;
-   pte_t pte = pte_mkspecial(mfn_pte(*rmd->mfn, rmd->prot));
+   pte_t pte = pte_mkspecial(mfn_pte(*rmd->pfn, rmd->prot));
 
-   /* If we have a contiguous range, just update the mfn itself,
-  else update pointer to be "next mfn". */
+   /*
+* If we have a contiguous range, just update the pfn itself,
+* else update pointer to be "next pfn".
+*/
if (rmd->contiguous)
-   (*rmd->mfn)++;
+   (*rmd->pfn)++;
else
-   rmd->mfn++;
+   rmd->pfn++;
 
-   rmd->mmu_update->ptr = virt_to_machine(ptep).maddr | MMU_NORMAL_PT_UPDATE;
+   rmd->mmu_update->ptr = virt_to_machine(ptep).maddr;
+   rmd->mmu_update->ptr |= rmd->no_translate ?
+   MMU_PT_UPDATE_NO_TRANSLATE :
+   MMU_NORMAL_PT_UPDATE;
rmd->mmu_update->val = pte_val_ma(pte);
rmd->mmu_update++;
 
return 0;
 }
 
-static int do_remap_gfn(struct vm_area_struct *vma,
+static int do_remap_pfn(struct vm_area_struct *vma,
unsigned long addr,
-   xen_pfn_t *gfn, int nr,
+   xen_pfn_t *pfn, int nr,
int *err_ptr, pgprot_t prot,
-   unsigned domid,
+   unsigned int domid,
+   bool no_translate,
struct page **pages)
 {
int err = 0;
@@ -106,11 +113,14 @@ static int do_remap_gfn(struct vm_area_struct *vma,
 
BUG_ON(!((vma->vm_flags & (VM_PFNMAP | VM_IO)) == (VM_PFNMAP | VM_IO)));
 
-   rmd.mfn = gfn;
+   rmd.pfn = pfn;
rmd.prot = prot;
-   /* We use the err_ptr to indicate if there we are doing a contiguous
-* mapping or a discontigious mapping. */
+   /*
+* We use the err_ptr to indicate if there we are doing a contiguous
+* mapping or a discontigi

RE: [PATCH v3] xen/privcmd: add IOCTL_PRIVCMD_MMAP_RESOURCE

2018-04-10 Thread Paul Durrant
> -Original Message-
> From: Boris Ostrovsky [mailto:boris.ostrov...@oracle.com]
> Sent: 09 April 2018 20:19
> To: Paul Durrant <paul.durr...@citrix.com>; xen-de...@lists.xenproject.org;
> linux-arm-ker...@lists.infradead.org; linux-kernel@vger.kernel.org
> Cc: Juergen Gross <jgr...@suse.com>; Thomas Gleixner
> <t...@linutronix.de>; Ingo Molnar <mi...@redhat.com>; Stefano Stabellini
> <sstabell...@kernel.org>
> Subject: Re: [PATCH v3] xen/privcmd: add IOCTL_PRIVCMD_MMAP_RESOURCE
> 
> On 04/09/2018 12:36 PM, Boris Ostrovsky wrote:
> > On 04/09/2018 05:36 AM, Paul Durrant wrote:
> >> My recent Xen patch series introduces a new HYPERVISOR_memory_op to
> >> support direct priv-mapping of certain guest resources (such as ioreq
> >> pages, used by emulators) by a tools domain, rather than having to access
> >> such resources via the guest P2M.
> >>
> >> This patch adds the necessary infrastructure to the privcmd driver and
> >> Xen MMU code to support direct resource mapping.
> >>
> >> NOTE: The adjustment in the MMU code is partially cosmetic. Xen will now
> >>   allow a PV tools domain to map guest pages either by GFN or MFN, thus
> >>   the term 'mfn' has been swapped for 'pfn' in the lower layers of the
> >>   remap code.
> >>
> >> Signed-off-by: Paul Durrant <paul.durr...@citrix.com>
> > Reviewed-by: Boris Ostrovsky <boris.ostrov...@oracle.com>
> >
> > I think this will have to wait until 4.18 though, it's somewhat late for
> > current merge window right now.
> 
> 
> Warns on 32-bit ARM build:
> 
>   CC  drivers/xen/privcmd.o
> In file included from /data/upstream/linux-xen/include/linux/kernel.h:14:0,
>  from /data/upstream/linux-xen/drivers/xen/privcmd.c:11:
> /data/upstream/linux-xen/drivers/xen/privcmd.c: In function
> ‘privcmd_ioctl_mmap_resource’:
> /data/upstream/linux-xen/drivers/xen/privcmd.c:788:33: warning: cast to
> pointer from integer of different size [-Wint-to-pointer-cast]
>     pr_info("pfn[%u] = %p\n", i, (void *)pfns[i]);

I'm glad that was caught. It was a line of debug that was supposed to have been 
removed.

  Paul

>  ^
> /data/upstream/linux-xen/include/linux/printk.h:308:34: note: in
> definition of macro ‘pr_info’
>   printk(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__)
>   ^~~
>   AR  drivers/xen/xen-privcmd.o
> 
> 
> -boris


[PATCH v3] xen/privcmd: add IOCTL_PRIVCMD_MMAP_RESOURCE

2018-04-09 Thread Paul Durrant
My recent Xen patch series introduces a new HYPERVISOR_memory_op to
support direct priv-mapping of certain guest resources (such as ioreq
pages, used by emulators) by a tools domain, rather than having to access
such resources via the guest P2M.

This patch adds the necessary infrastructure to the privcmd driver and
Xen MMU code to support direct resource mapping.

NOTE: The adjustment in the MMU code is partially cosmetic. Xen will now
  allow a PV tools domain to map guest pages either by GFN or MFN, thus
  the term 'mfn' has been swapped for 'pfn' in the lower layers of the
  remap code.

Signed-off-by: Paul Durrant <paul.durr...@citrix.com>
---
Cc: Boris Ostrovsky <boris.ostrov...@oracle.com>
Cc: Juergen Gross <jgr...@suse.com>
Cc: Thomas Gleixner <t...@linutronix.de>
Cc: Ingo Molnar <mi...@redhat.com>
Cc: Stefano Stabellini <sstabell...@kernel.org>

v2:
 - Fix bug when mapping multiple pages of a resource

v3:
 - Address comments from Boris
 - Fix ARM build
---
 arch/arm/xen/enlighten.c   |  11 
 arch/x86/xen/mmu.c |  60 +--
 drivers/xen/privcmd.c  | 130 +
 include/uapi/xen/privcmd.h |  11 
 include/xen/interface/memory.h |  66 +
 include/xen/interface/xen.h|   7 ++-
 include/xen/xen-ops.h  |  24 +++-
 7 files changed, 288 insertions(+), 21 deletions(-)

diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c
index ba7f4c8f5c3e..8073625371f5 100644
--- a/arch/arm/xen/enlighten.c
+++ b/arch/arm/xen/enlighten.c
@@ -89,6 +89,17 @@ int xen_unmap_domain_gfn_range(struct vm_area_struct *vma,
 }
 EXPORT_SYMBOL_GPL(xen_unmap_domain_gfn_range);
 
+/* Not used by XENFEAT_auto_translated guests. */
+int xen_remap_domain_mfn_array(struct vm_area_struct *vma,
+  unsigned long addr,
+  xen_pfn_t *mfn, int nr,
+  int *err_ptr, pgprot_t prot,
+  unsigned int domid, struct page **pages)
+{
+   return -ENOSYS;
+}
+EXPORT_SYMBOL_GPL(xen_remap_domain_mfn_array);
+
 static void xen_read_wallclock(struct timespec64 *ts)
 {
u32 version;
diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index d33e7dbe3129..af2960cb7a3e 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -65,37 +65,44 @@ static void xen_flush_tlb_all(void)
 #define REMAP_BATCH_SIZE 16
 
 struct remap_data {
-   xen_pfn_t *mfn;
+   xen_pfn_t *pfn;
bool contiguous;
+   bool no_translate;
pgprot_t prot;
struct mmu_update *mmu_update;
 };
 
-static int remap_area_mfn_pte_fn(pte_t *ptep, pgtable_t token,
+static int remap_area_pfn_pte_fn(pte_t *ptep, pgtable_t token,
 unsigned long addr, void *data)
 {
struct remap_data *rmd = data;
-   pte_t pte = pte_mkspecial(mfn_pte(*rmd->mfn, rmd->prot));
+   pte_t pte = pte_mkspecial(mfn_pte(*rmd->pfn, rmd->prot));
 
-   /* If we have a contiguous range, just update the mfn itself,
-  else update pointer to be "next mfn". */
+   /*
+* If we have a contiguous range, just update the pfn itself,
+* else update pointer to be "next pfn".
+*/
if (rmd->contiguous)
-   (*rmd->mfn)++;
+   (*rmd->pfn)++;
else
-   rmd->mfn++;
+   rmd->pfn++;
 
-   rmd->mmu_update->ptr = virt_to_machine(ptep).maddr | MMU_NORMAL_PT_UPDATE;
+   rmd->mmu_update->ptr = virt_to_machine(ptep).maddr;
+   rmd->mmu_update->ptr |= rmd->no_translate ?
+   MMU_PT_UPDATE_NO_TRANSLATE :
+   MMU_NORMAL_PT_UPDATE;
rmd->mmu_update->val = pte_val_ma(pte);
rmd->mmu_update++;
 
return 0;
 }
 
-static int do_remap_gfn(struct vm_area_struct *vma,
+static int do_remap_pfn(struct vm_area_struct *vma,
unsigned long addr,
-   xen_pfn_t *gfn, int nr,
+   xen_pfn_t *pfn, int nr,
int *err_ptr, pgprot_t prot,
-   unsigned domid,
+   unsigned int domid,
+   bool no_translate,
struct page **pages)
 {
int err = 0;
@@ -106,11 +113,14 @@ static int do_remap_gfn(struct vm_area_struct *vma,
 
BUG_ON(!((vma->vm_flags & (VM_PFNMAP | VM_IO)) == (VM_PFNMAP | VM_IO)));
 
-   rmd.mfn = gfn;
+   rmd.pfn = pfn;
rmd.prot = prot;
-   /* We use the err_ptr to indicate if there we are doing a contiguous
-* mapping or a discontigious mapping. */
+   /*
+* We use the err_ptr to indicate if there we are doing a contiguous
+* mapping or a discontigious mapping.
+*/
rmd.contiguous = !err_ptr;
+   rmd.no_translate = no_translate;

[PATCH v3] xen/privcmd: add IOCTL_PRIVCMD_MMAP_RESOURCE

2018-04-09 Thread Paul Durrant
My recent Xen patch series introduces a new HYPERVISOR_memory_op to
support direct priv-mapping of certain guest resources (such as ioreq
pages, used by emulators) by a tools domain, rather than having to access
such resources via the guest P2M.

This patch adds the necessary infrastructure to the privcmd driver and
Xen MMU code to support direct resource mapping.

NOTE: The adjustment in the MMU code is partially cosmetic. Xen will now
  allow a PV tools domain to map guest pages either by GFN or MFN, thus
  the term 'mfn' has been swapped for 'pfn' in the lower layers of the
  remap code.

Signed-off-by: Paul Durrant 
---
Cc: Boris Ostrovsky 
Cc: Juergen Gross 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
Cc: Stefano Stabellini 

v2:
 - Fix bug when mapping multiple pages of a resource

v3:
 - Address comments from Boris
 - Fix ARM build
---
 arch/arm/xen/enlighten.c   |  11 
 arch/x86/xen/mmu.c |  60 +--
 drivers/xen/privcmd.c  | 130 +
 include/uapi/xen/privcmd.h |  11 
 include/xen/interface/memory.h |  66 +
 include/xen/interface/xen.h|   7 ++-
 include/xen/xen-ops.h  |  24 +++-
 7 files changed, 288 insertions(+), 21 deletions(-)

diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c
index ba7f4c8f5c3e..8073625371f5 100644
--- a/arch/arm/xen/enlighten.c
+++ b/arch/arm/xen/enlighten.c
@@ -89,6 +89,17 @@ int xen_unmap_domain_gfn_range(struct vm_area_struct *vma,
 }
 EXPORT_SYMBOL_GPL(xen_unmap_domain_gfn_range);
 
+/* Not used by XENFEAT_auto_translated guests. */
+int xen_remap_domain_mfn_array(struct vm_area_struct *vma,
+  unsigned long addr,
+  xen_pfn_t *mfn, int nr,
+  int *err_ptr, pgprot_t prot,
+  unsigned int domid, struct page **pages)
+{
+   return -ENOSYS;
+}
+EXPORT_SYMBOL_GPL(xen_remap_domain_mfn_array);
+
 static void xen_read_wallclock(struct timespec64 *ts)
 {
u32 version;
diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index d33e7dbe3129..af2960cb7a3e 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -65,37 +65,44 @@ static void xen_flush_tlb_all(void)
 #define REMAP_BATCH_SIZE 16
 
 struct remap_data {
-   xen_pfn_t *mfn;
+   xen_pfn_t *pfn;
bool contiguous;
+   bool no_translate;
pgprot_t prot;
struct mmu_update *mmu_update;
 };
 
-static int remap_area_mfn_pte_fn(pte_t *ptep, pgtable_t token,
+static int remap_area_pfn_pte_fn(pte_t *ptep, pgtable_t token,
 unsigned long addr, void *data)
 {
struct remap_data *rmd = data;
-   pte_t pte = pte_mkspecial(mfn_pte(*rmd->mfn, rmd->prot));
+   pte_t pte = pte_mkspecial(mfn_pte(*rmd->pfn, rmd->prot));
 
-   /* If we have a contiguous range, just update the mfn itself,
-  else update pointer to be "next mfn". */
+   /*
+* If we have a contiguous range, just update the pfn itself,
+* else update pointer to be "next pfn".
+*/
if (rmd->contiguous)
-   (*rmd->mfn)++;
+   (*rmd->pfn)++;
else
-   rmd->mfn++;
+   rmd->pfn++;
 
-   rmd->mmu_update->ptr = virt_to_machine(ptep).maddr | MMU_NORMAL_PT_UPDATE;
+   rmd->mmu_update->ptr = virt_to_machine(ptep).maddr;
+   rmd->mmu_update->ptr |= rmd->no_translate ?
+   MMU_PT_UPDATE_NO_TRANSLATE :
+   MMU_NORMAL_PT_UPDATE;
rmd->mmu_update->val = pte_val_ma(pte);
rmd->mmu_update++;
 
return 0;
 }
 
-static int do_remap_gfn(struct vm_area_struct *vma,
+static int do_remap_pfn(struct vm_area_struct *vma,
unsigned long addr,
-   xen_pfn_t *gfn, int nr,
+   xen_pfn_t *pfn, int nr,
int *err_ptr, pgprot_t prot,
-   unsigned domid,
+   unsigned int domid,
+   bool no_translate,
struct page **pages)
 {
int err = 0;
@@ -106,11 +113,14 @@ static int do_remap_gfn(struct vm_area_struct *vma,
 
BUG_ON(!((vma->vm_flags & (VM_PFNMAP | VM_IO)) == (VM_PFNMAP | VM_IO)));
 
-   rmd.mfn = gfn;
+   rmd.pfn = pfn;
rmd.prot = prot;
-   /* We use the err_ptr to indicate if there we are doing a contiguous
-* mapping or a discontigious mapping. */
+   /*
+* We use the err_ptr to indicate if there we are doing a contiguous
+* mapping or a discontigious mapping.
+*/
rmd.contiguous = !err_ptr;
+   rmd.no_translate = no_translate;
 
while (nr) {
int index = 0;
@@ -121,7 +131,7 @@ static int do_remap_gfn(struct vm_area_struct *vma,
 
	rmd.mmu_update = mmu_update;

RE: [PATCH v2] xen/privcmd: add IOCTL_PRIVCMD_MMAP_RESOURCE

2018-04-09 Thread Paul Durrant
> -Original Message-
> From: Boris Ostrovsky [mailto:boris.ostrov...@oracle.com]
> Sent: 05 April 2018 23:34
> To: Paul Durrant <paul.durr...@citrix.com>; x...@kernel.org; xen-
> de...@lists.xenproject.org; linux-kernel@vger.kernel.org
> Cc: Juergen Gross <jgr...@suse.com>; Thomas Gleixner
> <t...@linutronix.de>; Ingo Molnar <mi...@redhat.com>
> Subject: Re: [PATCH v2] xen/privcmd: add
> IOCTL_PRIVCMD_MMAP_RESOURCE
> 
> On 04/05/2018 11:42 AM, Paul Durrant wrote:
> > My recent Xen patch series introduces a new HYPERVISOR_memory_op to
> > support direct priv-mapping of certain guest resources (such as ioreq
> > pages, used by emulators) by a tools domain, rather than having to access
> > such resources via the guest P2M.
> >
> > This patch adds the necessary infrastructure to the privcmd driver and
> > Xen MMU code to support direct resource mapping.
> >
> > NOTE: The adjustment in the MMU code is partially cosmetic. Xen will now
> >   allow a PV tools domain to map guest pages either by GFN or MFN, thus
> >   the term 'mfn' has been swapped for 'pfn' in the lower layers of the
> >   remap code.
> >
> > Signed-off-by: Paul Durrant <paul.durr...@citrix.com>
> > ---
> > Cc: Boris Ostrovsky <boris.ostrov...@oracle.com>
> > Cc: Juergen Gross <jgr...@suse.com>
> > Cc: Thomas Gleixner <t...@linutronix.de>
> > Cc: Ingo Molnar <mi...@redhat.com>
> >
> > v2:
> >  - Fix bug when mapping multiple pages of a resource
> 
> 
> Only a few nits below.
> 
> > ---
> >  arch/x86/xen/mmu.c |  50 +++-
> >  drivers/xen/privcmd.c  | 130
> +
> >  include/uapi/xen/privcmd.h |  11 
> >  include/xen/interface/memory.h |  66 +
> >  include/xen/interface/xen.h|   7 ++-
> >  include/xen/xen-ops.h  |  24 +++-
> >  6 files changed, 270 insertions(+), 18 deletions(-)
> >
> > diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
> > index d33e7dbe3129..8453d7be415c 100644
> > --- a/arch/x86/xen/mmu.c
> > +++ b/arch/x86/xen/mmu.c
> > @@ -65,37 +65,42 @@ static void xen_flush_tlb_all(void)
> >  #define REMAP_BATCH_SIZE 16
> >
> >  struct remap_data {
> > -   xen_pfn_t *mfn;
> > +   xen_pfn_t *pfn;
> > bool contiguous;
> > +   bool no_translate;
> > pgprot_t prot;
> > struct mmu_update *mmu_update;
> >  };
> >
> > -static int remap_area_mfn_pte_fn(pte_t *ptep, pgtable_t token,
> > +static int remap_area_pfn_pte_fn(pte_t *ptep, pgtable_t token,
> >  unsigned long addr, void *data)
> >  {
> > struct remap_data *rmd = data;
> > -   pte_t pte = pte_mkspecial(mfn_pte(*rmd->mfn, rmd->prot));
> > +   pte_t pte = pte_mkspecial(mfn_pte(*rmd->pfn, rmd->prot));
> >
> > /* If we have a contiguous range, just update the mfn itself,
> >else update pointer to be "next mfn". */
> 
> This probably also needs to be updated (and while at it, comment style
> fixed)
> 

Ok.

> > if (rmd->contiguous)
> > -   (*rmd->mfn)++;
> > +   (*rmd->pfn)++;
> > else
> > -   rmd->mfn++;
> > +   rmd->pfn++;
> >
> > -   rmd->mmu_update->ptr = virt_to_machine(ptep).maddr |
> MMU_NORMAL_PT_UPDATE;
> > +   rmd->mmu_update->ptr = virt_to_machine(ptep).maddr;
> > +   rmd->mmu_update->ptr |= rmd->no_translate ?
> > +   MMU_PT_UPDATE_NO_TRANSLATE :
> > +   MMU_NORMAL_PT_UPDATE;
> > rmd->mmu_update->val = pte_val_ma(pte);
> > rmd->mmu_update++;
> >
> > return 0;
> >  }
> >
> > -static int do_remap_gfn(struct vm_area_struct *vma,
> > +static int do_remap_pfn(struct vm_area_struct *vma,
> > unsigned long addr,
> > -   xen_pfn_t *gfn, int nr,
> > +   xen_pfn_t *pfn, int nr,
> > int *err_ptr, pgprot_t prot,
> > -   unsigned domid,
> > +   unsigned int domid,
> > +   bool no_translate,
> > struct page **pages)
> >  {
> > int err = 0;
> > @@ -106,11 +111,12 @@ static int do_remap_gfn(struct vm_area_struct
> *vma,
> >
> > BUG_ON(!((vma->vm_flags & (VM_PFNMAP | VM_IO)) ==
> (VM_PFNMAP | VM_IO)));
> >
> > -   rmd.mfn = gfn;
> > +   rmd.pfn = pfn;
> > +   rmd.pfn = pfn;
> > rmd.prot = prot;
> > /* We use the err_ptr to indicate if there we are doing a contiguous
> >  * mapping or a discontigious mapping. */
> 
> Style.
> 

I'm not modifying this comment but I'll fix it.

[PATCH v2] xen/privcmd: add IOCTL_PRIVCMD_MMAP_RESOURCE

2018-04-05 Thread Paul Durrant
My recent Xen patch series introduces a new HYPERVISOR_memory_op to
support direct priv-mapping of certain guest resources (such as ioreq
pages, used by emulators) by a tools domain, rather than having to access
such resources via the guest P2M.

This patch adds the necessary infrastructure to the privcmd driver and
Xen MMU code to support direct resource mapping.

NOTE: The adjustment in the MMU code is partially cosmetic. Xen will now
  allow a PV tools domain to map guest pages either by GFN or MFN, thus
  the term 'mfn' has been swapped for 'pfn' in the lower layers of the
  remap code.

Signed-off-by: Paul Durrant <paul.durr...@citrix.com>
---
Cc: Boris Ostrovsky <boris.ostrov...@oracle.com>
Cc: Juergen Gross <jgr...@suse.com>
Cc: Thomas Gleixner <t...@linutronix.de>
Cc: Ingo Molnar <mi...@redhat.com>

v2:
 - Fix bug when mapping multiple pages of a resource
---
 arch/x86/xen/mmu.c |  50 +++-
 drivers/xen/privcmd.c  | 130 +
 include/uapi/xen/privcmd.h |  11 
 include/xen/interface/memory.h |  66 +
 include/xen/interface/xen.h|   7 ++-
 include/xen/xen-ops.h  |  24 +++-
 6 files changed, 270 insertions(+), 18 deletions(-)

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index d33e7dbe3129..8453d7be415c 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -65,37 +65,42 @@ static void xen_flush_tlb_all(void)
 #define REMAP_BATCH_SIZE 16
 
 struct remap_data {
-   xen_pfn_t *mfn;
+   xen_pfn_t *pfn;
bool contiguous;
+   bool no_translate;
pgprot_t prot;
struct mmu_update *mmu_update;
 };
 
-static int remap_area_mfn_pte_fn(pte_t *ptep, pgtable_t token,
+static int remap_area_pfn_pte_fn(pte_t *ptep, pgtable_t token,
 unsigned long addr, void *data)
 {
struct remap_data *rmd = data;
-   pte_t pte = pte_mkspecial(mfn_pte(*rmd->mfn, rmd->prot));
+   pte_t pte = pte_mkspecial(mfn_pte(*rmd->pfn, rmd->prot));
 
/* If we have a contiguous range, just update the mfn itself,
   else update pointer to be "next mfn". */
if (rmd->contiguous)
-   (*rmd->mfn)++;
+   (*rmd->pfn)++;
else
-   rmd->mfn++;
+   rmd->pfn++;
 
-   rmd->mmu_update->ptr = virt_to_machine(ptep).maddr | MMU_NORMAL_PT_UPDATE;
+   rmd->mmu_update->ptr = virt_to_machine(ptep).maddr;
+   rmd->mmu_update->ptr |= rmd->no_translate ?
+   MMU_PT_UPDATE_NO_TRANSLATE :
+   MMU_NORMAL_PT_UPDATE;
rmd->mmu_update->val = pte_val_ma(pte);
rmd->mmu_update++;
 
return 0;
 }
 
-static int do_remap_gfn(struct vm_area_struct *vma,
+static int do_remap_pfn(struct vm_area_struct *vma,
unsigned long addr,
-   xen_pfn_t *gfn, int nr,
+   xen_pfn_t *pfn, int nr,
int *err_ptr, pgprot_t prot,
-   unsigned domid,
+   unsigned int domid,
+   bool no_translate,
struct page **pages)
 {
int err = 0;
@@ -106,11 +111,12 @@ static int do_remap_gfn(struct vm_area_struct *vma,
 
BUG_ON(!((vma->vm_flags & (VM_PFNMAP | VM_IO)) == (VM_PFNMAP | VM_IO)));
 
-   rmd.mfn = gfn;
+   rmd.pfn = pfn;
rmd.prot = prot;
/* We use the err_ptr to indicate if there we are doing a contiguous
 * mapping or a discontigious mapping. */
rmd.contiguous = !err_ptr;
+   rmd.no_translate = no_translate;
 
while (nr) {
int index = 0;
@@ -121,7 +127,7 @@ static int do_remap_gfn(struct vm_area_struct *vma,
 
rmd.mmu_update = mmu_update;
err = apply_to_page_range(vma->vm_mm, addr, range,
- remap_area_mfn_pte_fn, &rmd);
+ remap_area_pfn_pte_fn, &rmd);
if (err)
goto out;
 
@@ -175,7 +181,8 @@ int xen_remap_domain_gfn_range(struct vm_area_struct *vma,
if (xen_feature(XENFEAT_auto_translated_physmap))
return -EOPNOTSUPP;
 
-   return do_remap_gfn(vma, addr, &gfn, nr, NULL, prot, domid, pages);
+   return do_remap_pfn(vma, addr, &gfn, nr, NULL, prot, domid, false,
+   pages);
 }
 EXPORT_SYMBOL_GPL(xen_remap_domain_gfn_range);
 
@@ -183,7 +190,7 @@ int xen_remap_domain_gfn_array(struct vm_area_struct *vma,
   unsigned long addr,
   xen_pfn_t *gfn, int nr,
   int *err_ptr, pgprot_t prot,
-  unsigned domid, struct page **pages)
+  unsigned int domid, struct page **pages)
{
	if (xen_feature(XENFEAT_auto_translated_physmap))
		return xen_xlate_remap_gfn_array(vma, addr, gfn, nr, err_ptr,

RE: [PATCH] xen/privcmd: add IOCTL_PRIVCMD_MMAP_RESOURCE

2018-04-05 Thread Paul Durrant
> -Original Message-
> From: Paul Durrant [mailto:paul.durr...@citrix.com]
> Sent: 05 April 2018 10:32
> To: xen-de...@lists.xenproject.org; linux-kernel@vger.kernel.org;
> x...@kernel.org
> Cc: Paul Durrant <paul.durr...@citrix.com>; Boris Ostrovsky
> <boris.ostrov...@oracle.com>; Juergen Gross <jgr...@suse.com>; Thomas
> Gleixner <t...@linutronix.de>; Ingo Molnar <mi...@redhat.com>
> Subject: [PATCH] xen/privcmd: add IOCTL_PRIVCMD_MMAP_RESOURCE
> 
> My recent Xen patch series introduces a new HYPERVISOR_memory_op to
> support direct priv-mapping of certain guest resources (such as ioreq
> pages, used by emulators) by a tools domain, rather than having to access
> such resources via the guest P2M.
> 
> This patch adds the necessary infrastructure to the privcmd driver and
> Xen MMU code to support direct resource mapping.
> 
> NOTE: The adjustment in the MMU code is partially cosmetic. Xen will now
>   allow a PV tools domain to map guest pages either by GFN or MFN, thus
>   the term 'gfn' has been swapped for 'pfn' in the lower layers of the
>   remap code.
> 
> Signed-off-by: Paul Durrant <paul.durr...@citrix.com>

Unfortunately I have just found a bug in this patch when it comes to mapping 
multiple frames. I will send a v2 shortly. Apologies for the noise.

  Paul

> ---
> Cc: Boris Ostrovsky <boris.ostrov...@oracle.com>
> Cc: Juergen Gross <jgr...@suse.com>
> Cc: Thomas Gleixner <t...@linutronix.de>
> Cc: Ingo Molnar <mi...@redhat.com>
> ---
>  arch/x86/xen/mmu.c |  50 -
>  drivers/xen/privcmd.c  | 119
> +
>  include/uapi/xen/privcmd.h |  11 
>  include/xen/interface/memory.h |  67 +++
>  include/xen/interface/xen.h|   7 +--
>  include/xen/xen-ops.h  |  24 -
>  6 files changed, 260 insertions(+), 18 deletions(-)
> 
> diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
> index d33e7dbe3129..8453d7be415c 100644
> --- a/arch/x86/xen/mmu.c
> +++ b/arch/x86/xen/mmu.c
> @@ -65,37 +65,42 @@ static void xen_flush_tlb_all(void)
>  #define REMAP_BATCH_SIZE 16
> 
>  struct remap_data {
> - xen_pfn_t *mfn;
> + xen_pfn_t *pfn;
>   bool contiguous;
> + bool no_translate;
>   pgprot_t prot;
>   struct mmu_update *mmu_update;
>  };
> 
> -static int remap_area_mfn_pte_fn(pte_t *ptep, pgtable_t token,
> +static int remap_area_pfn_pte_fn(pte_t *ptep, pgtable_t token,
>unsigned long addr, void *data)
>  {
>   struct remap_data *rmd = data;
> - pte_t pte = pte_mkspecial(mfn_pte(*rmd->mfn, rmd->prot));
> + pte_t pte = pte_mkspecial(mfn_pte(*rmd->pfn, rmd->prot));
> 
>   /* If we have a contiguous range, just update the mfn itself,
>  else update pointer to be "next mfn". */
>   if (rmd->contiguous)
> - (*rmd->mfn)++;
> + (*rmd->pfn)++;
>   else
> - rmd->mfn++;
> + rmd->pfn++;
> 
> - rmd->mmu_update->ptr = virt_to_machine(ptep).maddr |
> MMU_NORMAL_PT_UPDATE;
> + rmd->mmu_update->ptr = virt_to_machine(ptep).maddr;
> + rmd->mmu_update->ptr |= rmd->no_translate ?
> + MMU_PT_UPDATE_NO_TRANSLATE :
> + MMU_NORMAL_PT_UPDATE;
>   rmd->mmu_update->val = pte_val_ma(pte);
>   rmd->mmu_update++;
> 
>   return 0;
>  }
> 
> -static int do_remap_gfn(struct vm_area_struct *vma,
> +static int do_remap_pfn(struct vm_area_struct *vma,
>   unsigned long addr,
> - xen_pfn_t *gfn, int nr,
> + xen_pfn_t *pfn, int nr,
>   int *err_ptr, pgprot_t prot,
> - unsigned domid,
> + unsigned int domid,
> + bool no_translate,
>   struct page **pages)
>  {
>   int err = 0;
> @@ -106,11 +111,12 @@ static int do_remap_gfn(struct vm_area_struct
> *vma,
> 
>   BUG_ON(!((vma->vm_flags & (VM_PFNMAP | VM_IO)) ==
> (VM_PFNMAP | VM_IO)));
> 
> - rmd.mfn = gfn;
> + rmd.pfn = pfn;
>   rmd.prot = prot;
>   /* We use the err_ptr to indicate if there we are doing a contiguous
>* mapping or a discontigious mapping. */
>   rmd.contiguous = !err_ptr;
> + rmd.no_translate = no_translate;
> 
>   while (nr) {
>   int index = 0;
> @@ -121,7 +127,7 @@ static int do_remap_gfn(struct vm_area_struct *vma,
> 
>   rmd.mmu_update = mmu_update;
> 	err = apply_to_page_range(vma->vm_mm, addr, range,
> -			remap_area_mfn_pte_fn, &rmd);
> +			remap_area_pfn_pte_fn, &rmd);
> 	if (err)
> 		goto out;
>

[PATCH] xen/privcmd: add IOCTL_PRIVCMD_MMAP_RESOURCE

2018-04-05 Thread Paul Durrant
My recent Xen patch series introduces a new HYPERVISOR_memory_op to
support direct priv-mapping of certain guest resources (such as ioreq
pages, used by emulators) by a tools domain, rather than having to access
such resources via the guest P2M.

This patch adds the necessary infrastructure to the privcmd driver and
Xen MMU code to support direct resource mapping.

NOTE: The adjustment in the MMU code is partially cosmetic. Xen will now
  allow a PV tools domain to map guest pages either by GFN or MFN, thus
  the term 'gfn' has been swapped for 'pfn' in the lower layers of the
  remap code.

Signed-off-by: Paul Durrant <paul.durr...@citrix.com>
---
Cc: Boris Ostrovsky <boris.ostrov...@oracle.com>
Cc: Juergen Gross <jgr...@suse.com>
Cc: Thomas Gleixner <t...@linutronix.de>
Cc: Ingo Molnar <mi...@redhat.com>
---
 arch/x86/xen/mmu.c |  50 -
 drivers/xen/privcmd.c  | 119 +
 include/uapi/xen/privcmd.h |  11 
 include/xen/interface/memory.h |  67 +++
 include/xen/interface/xen.h|   7 +--
 include/xen/xen-ops.h  |  24 -
 6 files changed, 260 insertions(+), 18 deletions(-)

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index d33e7dbe3129..8453d7be415c 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -65,37 +65,42 @@ static void xen_flush_tlb_all(void)
 #define REMAP_BATCH_SIZE 16
 
 struct remap_data {
-   xen_pfn_t *mfn;
+   xen_pfn_t *pfn;
bool contiguous;
+   bool no_translate;
pgprot_t prot;
struct mmu_update *mmu_update;
 };
 
-static int remap_area_mfn_pte_fn(pte_t *ptep, pgtable_t token,
+static int remap_area_pfn_pte_fn(pte_t *ptep, pgtable_t token,
 unsigned long addr, void *data)
 {
struct remap_data *rmd = data;
-   pte_t pte = pte_mkspecial(mfn_pte(*rmd->mfn, rmd->prot));
+   pte_t pte = pte_mkspecial(mfn_pte(*rmd->pfn, rmd->prot));
 
/* If we have a contiguous range, just update the mfn itself,
   else update pointer to be "next mfn". */
if (rmd->contiguous)
-   (*rmd->mfn)++;
+   (*rmd->pfn)++;
else
-   rmd->mfn++;
+   rmd->pfn++;
 
-   rmd->mmu_update->ptr = virt_to_machine(ptep).maddr | MMU_NORMAL_PT_UPDATE;
+   rmd->mmu_update->ptr = virt_to_machine(ptep).maddr;
+   rmd->mmu_update->ptr |= rmd->no_translate ?
+   MMU_PT_UPDATE_NO_TRANSLATE :
+   MMU_NORMAL_PT_UPDATE;
rmd->mmu_update->val = pte_val_ma(pte);
rmd->mmu_update++;
 
return 0;
 }
 
-static int do_remap_gfn(struct vm_area_struct *vma,
+static int do_remap_pfn(struct vm_area_struct *vma,
unsigned long addr,
-   xen_pfn_t *gfn, int nr,
+   xen_pfn_t *pfn, int nr,
int *err_ptr, pgprot_t prot,
-   unsigned domid,
+   unsigned int domid,
+   bool no_translate,
struct page **pages)
 {
int err = 0;
@@ -106,11 +111,12 @@ static int do_remap_gfn(struct vm_area_struct *vma,
 
BUG_ON(!((vma->vm_flags & (VM_PFNMAP | VM_IO)) == (VM_PFNMAP | VM_IO)));
 
-   rmd.mfn = gfn;
+   rmd.pfn = pfn;
rmd.prot = prot;
/* We use the err_ptr to indicate if there we are doing a contiguous
 * mapping or a discontigious mapping. */
rmd.contiguous = !err_ptr;
+   rmd.no_translate = no_translate;
 
while (nr) {
int index = 0;
@@ -121,7 +127,7 @@ static int do_remap_gfn(struct vm_area_struct *vma,
 
rmd.mmu_update = mmu_update;
err = apply_to_page_range(vma->vm_mm, addr, range,
- remap_area_mfn_pte_fn, &rmd);
+ remap_area_pfn_pte_fn, &rmd);
if (err)
goto out;
 
@@ -175,7 +181,8 @@ int xen_remap_domain_gfn_range(struct vm_area_struct *vma,
if (xen_feature(XENFEAT_auto_translated_physmap))
return -EOPNOTSUPP;
 
-   return do_remap_gfn(vma, addr, &gfn, nr, NULL, prot, domid, pages);
+   return do_remap_pfn(vma, addr, &gfn, nr, NULL, prot, domid, false,
+   pages);
 }
 EXPORT_SYMBOL_GPL(xen_remap_domain_gfn_range);
 
@@ -183,7 +190,7 @@ int xen_remap_domain_gfn_array(struct vm_area_struct *vma,
   unsigned long addr,
   xen_pfn_t *gfn, int nr,
   int *err_ptr, pgprot_t prot,
-  unsigned domid, struct page **pages)
+  unsigned int domid, struct page **pages)
 {
if (xen_feature(XENFEAT_auto_translated_physma

[PATCH] xen/privcmd: add IOCTL_PRIVCMD_MMAP_RESOURCE

2018-04-05 Thread Paul Durrant
My recent Xen patch series introduces a new HYPERVISOR_memory_op to
support direct priv-mapping of certain guest resources (such as ioreq
pages, used by emulators) by a tools domain, rather than having to access
such resources via the guest P2M.

This patch adds the necessary infrastructure to the privcmd driver and
Xen MMU code to support direct resource mapping.

NOTE: The adjustment in the MMU code is partially cosmetic. Xen will now
  allow a PV tools domain to map guest pages either by GFN or MFN, thus
  the term 'gfn' has been swapped for 'pfn' in the lower layers of the
  remap code.

Signed-off-by: Paul Durrant 
---
Cc: Boris Ostrovsky 
Cc: Juergen Gross 
Cc: Thomas Gleixner 
Cc: Ingo Molnar 
---
 arch/x86/xen/mmu.c |  50 -
 drivers/xen/privcmd.c  | 119 +
 include/uapi/xen/privcmd.h |  11 
 include/xen/interface/memory.h |  67 +++
 include/xen/interface/xen.h|   7 +--
 include/xen/xen-ops.h  |  24 -
 6 files changed, 260 insertions(+), 18 deletions(-)

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index d33e7dbe3129..8453d7be415c 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -65,37 +65,42 @@ static void xen_flush_tlb_all(void)
 #define REMAP_BATCH_SIZE 16
 
 struct remap_data {
-   xen_pfn_t *mfn;
+   xen_pfn_t *pfn;
bool contiguous;
+   bool no_translate;
pgprot_t prot;
struct mmu_update *mmu_update;
 };
 
-static int remap_area_mfn_pte_fn(pte_t *ptep, pgtable_t token,
+static int remap_area_pfn_pte_fn(pte_t *ptep, pgtable_t token,
 unsigned long addr, void *data)
 {
struct remap_data *rmd = data;
-   pte_t pte = pte_mkspecial(mfn_pte(*rmd->mfn, rmd->prot));
+   pte_t pte = pte_mkspecial(mfn_pte(*rmd->pfn, rmd->prot));
 
/* If we have a contiguous range, just update the mfn itself,
   else update pointer to be "next mfn". */
if (rmd->contiguous)
-   (*rmd->mfn)++;
+   (*rmd->pfn)++;
else
-   rmd->mfn++;
+   rmd->pfn++;
 
-   rmd->mmu_update->ptr = virt_to_machine(ptep).maddr | 
MMU_NORMAL_PT_UPDATE;
+   rmd->mmu_update->ptr = virt_to_machine(ptep).maddr;
+   rmd->mmu_update->ptr |= rmd->no_translate ?
+   MMU_PT_UPDATE_NO_TRANSLATE :
+   MMU_NORMAL_PT_UPDATE;
rmd->mmu_update->val = pte_val_ma(pte);
rmd->mmu_update++;
 
return 0;
 }
 
-static int do_remap_gfn(struct vm_area_struct *vma,
+static int do_remap_pfn(struct vm_area_struct *vma,
unsigned long addr,
-   xen_pfn_t *gfn, int nr,
+   xen_pfn_t *pfn, int nr,
int *err_ptr, pgprot_t prot,
-   unsigned domid,
+   unsigned int domid,
+   bool no_translate,
struct page **pages)
 {
int err = 0;
@@ -106,11 +111,12 @@ static int do_remap_gfn(struct vm_area_struct *vma,
 
BUG_ON(!((vma->vm_flags & (VM_PFNMAP | VM_IO)) == (VM_PFNMAP | VM_IO)));
 
-   rmd.mfn = gfn;
+   rmd.pfn = pfn;
rmd.prot = prot;
/* We use the err_ptr to indicate if there we are doing a contiguous
 * mapping or a discontigious mapping. */
rmd.contiguous = !err_ptr;
+   rmd.no_translate = no_translate;
 
while (nr) {
int index = 0;
@@ -121,7 +127,7 @@ static int do_remap_gfn(struct vm_area_struct *vma,
 
rmd.mmu_update = mmu_update;
err = apply_to_page_range(vma->vm_mm, addr, range,
- remap_area_mfn_pte_fn, &rmd);
+ remap_area_pfn_pte_fn, &rmd);
if (err)
goto out;
 
@@ -175,7 +181,8 @@ int xen_remap_domain_gfn_range(struct vm_area_struct *vma,
if (xen_feature(XENFEAT_auto_translated_physmap))
return -EOPNOTSUPP;
 
-   return do_remap_gfn(vma, addr, &gfn, nr, NULL, prot, domid, pages);
+   return do_remap_pfn(vma, addr, &gfn, nr, NULL, prot, domid, false,
+   pages);
 }
 EXPORT_SYMBOL_GPL(xen_remap_domain_gfn_range);
 
@@ -183,7 +190,7 @@ int xen_remap_domain_gfn_array(struct vm_area_struct *vma,
   unsigned long addr,
   xen_pfn_t *gfn, int nr,
   int *err_ptr, pgprot_t prot,
-  unsigned domid, struct page **pages)
+  unsigned int domid, struct page **pages)
 {
if (xen_feature(XENFEAT_auto_translated_physmap))
return xen_xlate_remap_gfn_array(vma, addr, gfn, nr, err_ptr,
@@ -194,10 +201,25 @@ int xen_remap_domain_gfn_a

RE: [PATCH 1/1] xen-netback: process malformed sk_buff correctly to avoid BUG_ON()

2018-03-28 Thread Paul Durrant
> -Original Message-
> From: Dongli Zhang [mailto:dongli.zh...@oracle.com]
> Sent: 28 March 2018 00:42
> To: xen-de...@lists.xenproject.org; linux-kernel@vger.kernel.org
> Cc: net...@vger.kernel.org; Wei Liu <wei.l...@citrix.com>; Paul Durrant
> <paul.durr...@citrix.com>
> Subject: [PATCH 1/1] xen-netback: process malformed sk_buff correctly to
> avoid BUG_ON()
> 
> The "BUG_ON(!frag_iter)" in function xenvif_rx_next_chunk() is triggered if
> the received sk_buff is malformed, that is, when the sk_buff has pattern
> (skb->data_len && !skb_shinfo(skb)->nr_frags). Below is a sample call
> stack:
> 
> [  438.652658] [ cut here ]
> [  438.652660] kernel BUG at drivers/net/xen-netback/rx.c:325!
> [  438.652714] invalid opcode:  [#1] SMP NOPTI
> [  438.652813] CPU: 0 PID: 2492 Comm: vif1.0-q0-guest Tainted: G   O
> 4.16.0-rc6+ #1
> [  438.652896] RIP: e030:xenvif_rx_skb+0x3c2/0x5e0 [xen_netback]
> [  438.652926] RSP: e02b:c90040877dc8 EFLAGS: 00010246
> [  438.652956] RAX: 0160 RBX: 0022 RCX:
> 0001
> [  438.652993] RDX: c900402890d0 RSI:  RDI:
> c90040889000
> [  438.653029] RBP: 88002b460040 R08: c90040877de0 R09:
> 0100
> [  438.653065] R10: 7ff0 R11: 0002 R12:
> c90040889000
> [  438.653100] R13: 8000 R14: 0022 R15:
> 8000
> [  438.653149] FS:  7f15603778c0() GS:88003040()
> knlGS:
> [  438.653188] CS:  e033 DS:  ES:  CR0: 80050033
> [  438.653219] CR2: 01832a08 CR3: 29c12000 CR4:
> 00042660
> [  438.653262] Call Trace:
> [  438.653284]  ? xen_hypercall_event_channel_op+0xa/0x20
> [  438.653313]  xenvif_rx_action+0x41/0x80 [xen_netback]
> [  438.653341]  xenvif_kthread_guest_rx+0xb2/0x2a8 [xen_netback]
> [  438.653374]  ? __schedule+0x352/0x700
> [  438.653398]  ? wait_woken+0x80/0x80
> [  438.653421]  kthread+0xf3/0x130
> [  438.653442]  ? xenvif_rx_action+0x80/0x80 [xen_netback]
> [  438.653470]  ? kthread_destroy_worker+0x40/0x40
> [  438.653497]  ret_from_fork+0x35/0x40
> 
> The issue is hit by xen-netback when there is bug with other networking
> interface (e.g., dom0 physical NIC), who has generated and forwarded
> malformed sk_buff to dom0 vifX.Y. It is possible to reproduce the issue on
> purpose with below sample code in a kernel module:
> 
> skb->dev = dev; // dev of vifX.Y
> skb->len = 386;
> skb->data_len = 352;
> skb->tail = 98;
> skb->end = 384;
> dev->netdev_ops->ndo_start_xmit(skb, dev);
> 
> This patch stops processing sk_buff immediately if it is detected as
> malformed, that is, pkt->frag_iter is NULL but there is still remaining
> pkt->remaining_len.
> 
> Signed-off-by: Dongli Zhang <dongli.zh...@oracle.com>
> ---
>  drivers/net/xen-netback/rx.c | 8 
>  1 file changed, 8 insertions(+)
> 
> diff --git a/drivers/net/xen-netback/rx.c b/drivers/net/xen-netback/rx.c
> index b1cf7c6..289cc82 100644
> --- a/drivers/net/xen-netback/rx.c
> +++ b/drivers/net/xen-netback/rx.c
> @@ -369,6 +369,14 @@ static void xenvif_rx_data_slot(struct xenvif_queue
> *queue,
>   offset += len;
>   pkt->remaining_len -= len;
> 
> + if (unlikely(!pkt->frag_iter && pkt->remaining_len)) {
> + pkt->remaining_len = 0;
> + pkt->extra_count = 0;
> + pr_err_ratelimited("malformed sk_buff at %s\n",
> +queue->name);
> + break;
> + }
> +

This looks fine, but I think it would also be good to indicate the error to the 
frontend by setting rsp->status below. That should cause the frontend to bin 
the packet.

  Paul

>   } while (offset < XEN_PAGE_SIZE && pkt->remaining_len > 0);
> 
>   if (pkt->remaining_len > 0)
> --
> 2.7.4



RE: [PATCH] xen-netback: Fix logging message with spurious period after newline

2017-12-06 Thread Paul Durrant
> -Original Message-
> From: Joe Perches [mailto:j...@perches.com]
> Sent: 06 December 2017 06:40
> To: Wei Liu <wei.l...@citrix.com>; Paul Durrant <paul.durr...@citrix.com>
> Cc: xen-de...@lists.xenproject.org; net...@vger.kernel.org; linux-
> ker...@vger.kernel.org
> Subject: [PATCH] xen-netback: Fix logging message with spurious period
> after newline
> 
> Using a period after a newline causes bad output.
> 
> Signed-off-by: Joe Perches <j...@perches.com>

Reviewed-by: Paul Durrant <paul.durr...@citrix.com>

> ---
>  drivers/net/xen-netback/interface.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-
> netback/interface.c
> index d6dff347f896..78ebe494fef0 100644
> --- a/drivers/net/xen-netback/interface.c
> +++ b/drivers/net/xen-netback/interface.c
> @@ -186,7 +186,7 @@ static int xenvif_start_xmit(struct sk_buff *skb, struct
> net_device *dev)
>   /* Obtain the queue to be used to transmit this packet */
>   index = skb_get_queue_mapping(skb);
>   if (index >= num_queues) {
> - pr_warn_ratelimited("Invalid queue %hu for packet on
> interface %s\n.",
> + pr_warn_ratelimited("Invalid queue %hu for packet on
> interface %s\n",
>   index, vif->dev->name);
>   index %= num_queues;
>   }
> --
> 2.15.0



RE: [PATCH] xen-netfront: remove warning when unloading module

2017-11-20 Thread Paul Durrant
> -Original Message-
> From: Eduardo Otubo [mailto:ot...@redhat.com]
> Sent: 20 November 2017 10:41
> To: xen-de...@lists.xenproject.org
> Cc: net...@vger.kernel.org; Paul Durrant <paul.durr...@citrix.com>; Wei
> Liu <wei.l...@citrix.com>; linux-kernel@vger.kernel.org;
> vkuzn...@redhat.com; cav...@redhat.com; che...@redhat.com;
> mga...@redhat.com; Eduardo Otubo <ot...@redhat.com>
> Subject: [PATCH] xen-netfront: remove warning when unloading module
> 
> When unloading module xen_netfront from guest, dmesg would output
> warning messages like below:
> 
>   [  105.236836] xen:grant_table: WARNING: g.e. 0x903 still in use!
>   [  105.236839] deferring g.e. 0x903 (pfn 0x35805)
> 
> This problem relies on netfront and netback being out of sync. By the time
> netfront revokes the g.e.'s netback didn't have enough time to free all of
> them, hence displaying the warnings on dmesg.
> 
> The trick here is to make netfront to wait until netback frees all the g.e.'s
> and only then continue to cleanup for the module removal, and this is done
> by
> manipulating both device states.
> 
> Signed-off-by: Eduardo Otubo <ot...@redhat.com>
> ---
>  drivers/net/xen-netfront.c | 11 +++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
> index 8b8689c6d887..b948e2a1ce40 100644
> --- a/drivers/net/xen-netfront.c
> +++ b/drivers/net/xen-netfront.c
> @@ -2130,6 +2130,17 @@ static int xennet_remove(struct xenbus_device
> *dev)
> 
>   dev_dbg(&dev->dev, "%s\n", dev->nodename);
> 
> + xenbus_switch_state(dev, XenbusStateClosing);
> + while (xenbus_read_driver_state(dev->otherend) !=
> XenbusStateClosing){
> + cpu_relax();
> + schedule();
> + }
> + xenbus_switch_state(dev, XenbusStateClosed);
> + while (dev->xenbus_state != XenbusStateClosed){
> + cpu_relax();
> + schedule();
> + }
> +

Waiting for closing should be ok, but waiting for closed is risky. As soon as a 
backend is in the closed state, a toolstack can completely remove the 
backend xenstore area, resulting in a state of XenbusStateUnknown, which would 
cause your second loop to spin forever.

  Paul

>   xennet_disconnect_backend(info);
> 
>   unregister_netdev(info->netdev);
> --
> 2.13.6



[PATCH v4] xen: support priv-mapping in an HVM tools domain

2017-11-03 Thread Paul Durrant
If the domain has XENFEAT_auto_translated_physmap then use of the PV-
specific HYPERVISOR_mmu_update hypercall is clearly incorrect.

This patch adds checks in xen_remap_domain_gfn_array() and
xen_unmap_domain_gfn_array() which call through to the appropriate
xlate_mmu function if the feature is present. A check is also added
to xen_remap_domain_gfn_range() to fail with -EOPNOTSUPP since this
should not be used in an HVM tools domain.

Signed-off-by: Paul Durrant <paul.durr...@citrix.com>
---
Cc: Boris Ostrovsky <boris.ostrov...@oracle.com>
Cc: Juergen Gross <jgr...@suse.com>
Cc: Thomas Gleixner <t...@linutronix.de>
Cc: Ingo Molnar <mi...@redhat.com>
Cc: "H. Peter Anvin" <h...@zytor.com>

v4:
 - Restore v1 commit comment.

v3:
 - As v1 but with additional stubs in xen/xen-ops.h to handle
   configurations without CONFIG_XEN_AUTO_XLATE.
---
 arch/x86/xen/mmu.c| 14 --
 include/xen/xen-ops.h | 24 
 2 files changed, 36 insertions(+), 2 deletions(-)

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index 3e15345abfe7..d33e7dbe3129 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -172,6 +172,9 @@ int xen_remap_domain_gfn_range(struct vm_area_struct *vma,
   pgprot_t prot, unsigned domid,
   struct page **pages)
 {
+   if (xen_feature(XENFEAT_auto_translated_physmap))
+   return -EOPNOTSUPP;
+
	return do_remap_gfn(vma, addr, &gfn, nr, NULL, prot, domid, pages);
 }
 EXPORT_SYMBOL_GPL(xen_remap_domain_gfn_range);
@@ -182,6 +185,10 @@ int xen_remap_domain_gfn_array(struct vm_area_struct *vma,
   int *err_ptr, pgprot_t prot,
   unsigned domid, struct page **pages)
 {
+   if (xen_feature(XENFEAT_auto_translated_physmap))
+   return xen_xlate_remap_gfn_array(vma, addr, gfn, nr, err_ptr,
+prot, domid, pages);
+
/* We BUG_ON because it's a programmer error to pass a NULL err_ptr,
 * and the consequences later is quite hard to detect what the actual
 * cause of "wrong memory was mapped in".
@@ -193,9 +200,12 @@ EXPORT_SYMBOL_GPL(xen_remap_domain_gfn_array);
 
 /* Returns: 0 success */
 int xen_unmap_domain_gfn_range(struct vm_area_struct *vma,
-  int numpgs, struct page **pages)
+  int nr, struct page **pages)
 {
-   if (!pages || !xen_feature(XENFEAT_auto_translated_physmap))
+   if (xen_feature(XENFEAT_auto_translated_physmap))
+   return xen_xlate_unmap_gfn_range(vma, nr, pages);
+
+   if (!pages)
return 0;
 
return -EINVAL;
diff --git a/include/xen/xen-ops.h b/include/xen/xen-ops.h
index 218e6aae5433..18b25631a113 100644
--- a/include/xen/xen-ops.h
+++ b/include/xen/xen-ops.h
@@ -103,6 +103,8 @@ int xen_remap_domain_gfn_range(struct vm_area_struct *vma,
   struct page **pages);
 int xen_unmap_domain_gfn_range(struct vm_area_struct *vma,
   int numpgs, struct page **pages);
+
+#ifdef CONFIG_XEN_AUTO_XLATE
 int xen_xlate_remap_gfn_array(struct vm_area_struct *vma,
  unsigned long addr,
  xen_pfn_t *gfn, int nr,
@@ -111,6 +113,28 @@ int xen_xlate_remap_gfn_array(struct vm_area_struct *vma,
  struct page **pages);
 int xen_xlate_unmap_gfn_range(struct vm_area_struct *vma,
  int nr, struct page **pages);
+#else
+/*
+ * These two functions are called from arch/x86/xen/mmu.c and so stubs
+ * are needed for a configuration not specifying CONFIG_XEN_AUTO_XLATE.
+ */
+static inline int xen_xlate_remap_gfn_array(struct vm_area_struct *vma,
+   unsigned long addr,
+   xen_pfn_t *gfn, int nr,
+   int *err_ptr, pgprot_t prot,
+   unsigned int domid,
+   struct page **pages)
+{
+   return -EOPNOTSUPP;
+}
+
+static inline int xen_xlate_unmap_gfn_range(struct vm_area_struct *vma,
+   int nr, struct page **pages)
+{
+   return -EOPNOTSUPP;
+}
+#endif
+
 int xen_xlate_map_ballooned_pages(xen_pfn_t **pfns, void **vaddr,
  unsigned long nr_grant_frames);
 
-- 
2.11.0



RE: [PATCH v3] xen: support priv-mapping in an HVM tools domain

2017-11-03 Thread Paul Durrant
> -Original Message-
> From: Paul Durrant [mailto:paul.durr...@citrix.com]
> Sent: 03 November 2017 16:58
> To: x...@kernel.org; xen-de...@lists.xenproject.org; linux-
> ker...@vger.kernel.org
> Cc: Paul Durrant ; Boris Ostrovsky
> ; Juergen Gross ; Thomas
> Gleixner ; Ingo Molnar ; H. Peter
> Anvin 
> Subject: [PATCH v3] xen: support priv-mapping in an HVM tools domain
> 
> If the domain has XENFEAT_auto_translated_physmap then use of the PV-
> specific HYPERVISOR_mmu_update hypercall is clearly incorrect.
> 
> This patch adds checks in xen_remap_domain_gfn_array() and
> xen_unmap_domain_gfn_array() which call through to the approprate
> xlate_mmu function if the feature is present.
> 
> This patch also moves xen_remap_domain_gfn_range() into the PV-only
> MMU
> code and #ifdefs the (only) calling code in privcmd accordingly.

 I realise now that this paragraph refers to the code in the v2 patch. 
I'll send v4.

  Paul

> 
> Signed-off-by: Paul Durrant 
> ---
> Cc: Boris Ostrovsky 
> Cc: Juergen Gross 
> Cc: Thomas Gleixner 
> Cc: Ingo Molnar 
> Cc: "H. Peter Anvin" 
> 
> v3:
>  - As v1 but with additional stubs in xen/xen-ops.h to handle
>configurations without CONFIG_XEN_AUTO_XLATE.
> ---
>  arch/x86/xen/mmu.c| 14 --
>  include/xen/xen-ops.h | 24 
>  2 files changed, 36 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
> index 3e15345abfe7..d33e7dbe3129 100644
> --- a/arch/x86/xen/mmu.c
> +++ b/arch/x86/xen/mmu.c
> @@ -172,6 +172,9 @@ int xen_remap_domain_gfn_range(struct
> vm_area_struct *vma,
>  pgprot_t prot, unsigned domid,
>  struct page **pages)
>  {
> + if (xen_feature(XENFEAT_auto_translated_physmap))
> + return -EOPNOTSUPP;
> +
>   return do_remap_gfn(vma, addr, , nr, NULL, prot, domid,
> pages);
>  }
>  EXPORT_SYMBOL_GPL(xen_remap_domain_gfn_range);
> @@ -182,6 +185,10 @@ int xen_remap_domain_gfn_array(struct
> vm_area_struct *vma,
>  int *err_ptr, pgprot_t prot,
>  unsigned domid, struct page **pages)
>  {
> + if (xen_feature(XENFEAT_auto_translated_physmap))
> + return xen_xlate_remap_gfn_array(vma, addr, gfn, nr,
> err_ptr,
> +  prot, domid, pages);
> +
>   /* We BUG_ON because it's a programmer error to pass a NULL
> err_ptr,
>* and the consequences later is quite hard to detect what the actual
>* cause of "wrong memory was mapped in".
> @@ -193,9 +200,12 @@
> EXPORT_SYMBOL_GPL(xen_remap_domain_gfn_array);
> 
>  /* Returns: 0 success */
>  int xen_unmap_domain_gfn_range(struct vm_area_struct *vma,
> -int numpgs, struct page **pages)
> +int nr, struct page **pages)
>  {
> - if (!pages || !xen_feature(XENFEAT_auto_translated_physmap))
> + if (xen_feature(XENFEAT_auto_translated_physmap))
> + return xen_xlate_unmap_gfn_range(vma, nr, pages);
> +
> + if (!pages)
>   return 0;
> 
>   return -EINVAL;
> diff --git a/include/xen/xen-ops.h b/include/xen/xen-ops.h
> index 218e6aae5433..18b25631a113 100644
> --- a/include/xen/xen-ops.h
> +++ b/include/xen/xen-ops.h
> @@ -103,6 +103,8 @@ int xen_remap_domain_gfn_range(struct
> vm_area_struct *vma,
>  struct page **pages);
>  int xen_unmap_domain_gfn_range(struct vm_area_struct *vma,
>  int numpgs, struct page **pages);
> +
> +#ifdef CONFIG_XEN_AUTO_XLATE
>  int xen_xlate_remap_gfn_array(struct vm_area_struct *vma,
> unsigned long addr,
> xen_pfn_t *gfn, int nr,
> @@ -111,6 +113,28 @@ int xen_xlate_remap_gfn_array(struct
> vm_area_struct *vma,
> struct page **pages);
>  int xen_xlate_unmap_gfn_range(struct vm_area_struct *vma,
> int nr, struct page **pages);
> +#else
> +/*
> + * These two functions are called from arch/x86/xen/mmu.c and so stubs
> + * are needed for a configuration not specifying
> CONFIG_XEN_AUTO_XLATE.
> + */
> +static inline int xen_xlate_remap_gfn_array(struct vm_area_struct *vma,
> + unsigned long addr,
> + xen_pfn_t *gfn, int nr,
> + int *err_ptr, pgprot_t prot,
> + unsigned int domid,
> + struct page **pages)
> +{
&

[PATCH v3] xen: support priv-mapping in an HVM tools domain

2017-11-03 Thread Paul Durrant
If the domain has XENFEAT_auto_translated_physmap then use of the PV-
specific HYPERVISOR_mmu_update hypercall is clearly incorrect.

This patch adds checks in xen_remap_domain_gfn_array() and
xen_unmap_domain_gfn_array() which call through to the appropriate
xlate_mmu function if the feature is present.

This patch also moves xen_remap_domain_gfn_range() into the PV-only MMU
code and #ifdefs the (only) calling code in privcmd accordingly.

Signed-off-by: Paul Durrant <paul.durr...@citrix.com>
---
Cc: Boris Ostrovsky <boris.ostrov...@oracle.com>
Cc: Juergen Gross <jgr...@suse.com>
Cc: Thomas Gleixner <t...@linutronix.de>
Cc: Ingo Molnar <mi...@redhat.com>
Cc: "H. Peter Anvin" <h...@zytor.com>

v3:
 - As v1 but with additional stubs in xen/xen-ops.h to handle
   configurations without CONFIG_XEN_AUTO_XLATE.
---
 arch/x86/xen/mmu.c    | 14 ++++++++++++--
 include/xen/xen-ops.h | 24 ++++++++++++++++++++++++
 2 files changed, 36 insertions(+), 2 deletions(-)

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index 3e15345abfe7..d33e7dbe3129 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -172,6 +172,9 @@ int xen_remap_domain_gfn_range(struct vm_area_struct *vma,
   pgprot_t prot, unsigned domid,
   struct page **pages)
 {
+   if (xen_feature(XENFEAT_auto_translated_physmap))
+   return -EOPNOTSUPP;
+
return do_remap_gfn(vma, addr, , nr, NULL, prot, domid, pages);
 }
 EXPORT_SYMBOL_GPL(xen_remap_domain_gfn_range);
@@ -182,6 +185,10 @@ int xen_remap_domain_gfn_array(struct vm_area_struct *vma,
   int *err_ptr, pgprot_t prot,
   unsigned domid, struct page **pages)
 {
+   if (xen_feature(XENFEAT_auto_translated_physmap))
+   return xen_xlate_remap_gfn_array(vma, addr, gfn, nr, err_ptr,
+prot, domid, pages);
+
/* We BUG_ON because it's a programmer error to pass a NULL err_ptr,
 * and the consequences later is quite hard to detect what the actual
 * cause of "wrong memory was mapped in".
@@ -193,9 +200,12 @@ EXPORT_SYMBOL_GPL(xen_remap_domain_gfn_array);
 
 /* Returns: 0 success */
 int xen_unmap_domain_gfn_range(struct vm_area_struct *vma,
-  int numpgs, struct page **pages)
+  int nr, struct page **pages)
 {
-   if (!pages || !xen_feature(XENFEAT_auto_translated_physmap))
+   if (xen_feature(XENFEAT_auto_translated_physmap))
+   return xen_xlate_unmap_gfn_range(vma, nr, pages);
+
+   if (!pages)
return 0;
 
return -EINVAL;
diff --git a/include/xen/xen-ops.h b/include/xen/xen-ops.h
index 218e6aae5433..18b25631a113 100644
--- a/include/xen/xen-ops.h
+++ b/include/xen/xen-ops.h
@@ -103,6 +103,8 @@ int xen_remap_domain_gfn_range(struct vm_area_struct *vma,
   struct page **pages);
 int xen_unmap_domain_gfn_range(struct vm_area_struct *vma,
   int numpgs, struct page **pages);
+
+#ifdef CONFIG_XEN_AUTO_XLATE
 int xen_xlate_remap_gfn_array(struct vm_area_struct *vma,
  unsigned long addr,
  xen_pfn_t *gfn, int nr,
@@ -111,6 +113,28 @@ int xen_xlate_remap_gfn_array(struct vm_area_struct *vma,
  struct page **pages);
 int xen_xlate_unmap_gfn_range(struct vm_area_struct *vma,
  int nr, struct page **pages);
+#else
+/*
+ * These two functions are called from arch/x86/xen/mmu.c and so stubs
+ * are needed for a configuration not specifying CONFIG_XEN_AUTO_XLATE.
+ */
+static inline int xen_xlate_remap_gfn_array(struct vm_area_struct *vma,
+   unsigned long addr,
+   xen_pfn_t *gfn, int nr,
+   int *err_ptr, pgprot_t prot,
+   unsigned int domid,
+   struct page **pages)
+{
+   return -EOPNOTSUPP;
+}
+
+static inline int xen_xlate_unmap_gfn_range(struct vm_area_struct *vma,
+   int nr, struct page **pages)
+{
+   return -EOPNOTSUPP;
+}
+#endif
+
 int xen_xlate_map_ballooned_pages(xen_pfn_t **pfns, void **vaddr,
  unsigned long nr_grant_frames);
 
-- 
2.11.0



RE: [PATCH v2] xen: support priv-mapping in an HVM tools domain

2017-11-02 Thread Paul Durrant
> -Original Message-
> From: Boris Ostrovsky [mailto:boris.ostrov...@oracle.com]
> Sent: 01 November 2017 18:19
> To: Juergen Gross <jgr...@suse.com>; Paul Durrant
> <paul.durr...@citrix.com>; x...@kernel.org; xen-
> de...@lists.xenproject.org; linux-kernel@vger.kernel.org
> Cc: Thomas Gleixner <t...@linutronix.de>; Ingo Molnar
> <mi...@redhat.com>; H. Peter Anvin <h...@zytor.com>
> Subject: Re: [PATCH v2] xen: support priv-mapping in an HVM tools domain
> 
> On 11/01/2017 11:37 AM, Juergen Gross wrote:
> >
> > TBH I like V1 better, too.
> >
> > Boris, do you feel strong about the #ifdef part?
> 
> Having looked at what this turned into I now like V1 better too ;-)
> 
> Sorry, Paul.

That's ok. Are you happy with v1 as-is or do you want me to submit a v3 with 
any tweaks?

  Paul

> 
> 
> -boris


RE: [PATCH v2] xen: support priv-mapping in an HVM tools domain

2017-11-01 Thread Paul Durrant
> -Original Message-
> From: Juergen Gross [mailto:jgr...@suse.com]
> Sent: 01 November 2017 13:40
> To: Paul Durrant <paul.durr...@citrix.com>; x...@kernel.org; xen-
> de...@lists.xenproject.org; linux-kernel@vger.kernel.org
> Cc: Boris Ostrovsky <boris.ostrov...@oracle.com>; Thomas Gleixner
> <t...@linutronix.de>; Ingo Molnar <mi...@redhat.com>; H. Peter Anvin
> <h...@zytor.com>
> Subject: Re: [PATCH v2] xen: support priv-mapping in an HVM tools domain
> 
> On 01/11/17 12:31, Paul Durrant wrote:
> > If the domain has XENFEAT_auto_translated_physmap then use of the PV-
> > specific HYPERVISOR_mmu_update hypercall is clearly incorrect.
> >
> > This patch adds checks in xen_remap_domain_gfn_array() and
> > xen_unmap_domain_gfn_array() which call through to the appropriate
> > xlate_mmu function if the feature is present.
> >
> > This patch also moves xen_remap_domain_gfn_range() into the PV-only
> MMU
> > code and #ifdefs the (only) calling code in privcmd accordingly.
> >
> > Signed-off-by: Paul Durrant <paul.durr...@citrix.com>
> > ---
> > Cc: Boris Ostrovsky <boris.ostrov...@oracle.com>
> > Cc: Juergen Gross <jgr...@suse.com>
> > Cc: Thomas Gleixner <t...@linutronix.de>
> > Cc: Ingo Molnar <mi...@redhat.com>
> > Cc: "H. Peter Anvin" <h...@zytor.com>
> > ---
> >  arch/x86/xen/mmu.c| 36 +---
> >  arch/x86/xen/mmu_pv.c | 11 +++
> >  drivers/xen/privcmd.c | 17 +
> >  include/xen/xen-ops.h |  7 +++
> >  4 files changed, 48 insertions(+), 23 deletions(-)
> >
> > diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
> > index 3e15345abfe7..01837c36e293 100644
> > --- a/arch/x86/xen/mmu.c
> > +++ b/arch/x86/xen/mmu.c
> > @@ -91,12 +91,12 @@ static int remap_area_mfn_pte_fn(pte_t *ptep,
> pgtable_t token,
> > return 0;
> >  }
> >
> > -static int do_remap_gfn(struct vm_area_struct *vma,
> > -   unsigned long addr,
> > -   xen_pfn_t *gfn, int nr,
> > -   int *err_ptr, pgprot_t prot,
> > -   unsigned domid,
> > -   struct page **pages)
> > +int xen_remap_gfn(struct vm_area_struct *vma,
> > + unsigned long addr,
> > + xen_pfn_t *gfn, int nr,
> > + int *err_ptr, pgprot_t prot,
> > + unsigned int domid,
> > + struct page **pages)
> >  {
> > int err = 0;
> > struct remap_data rmd;
> > @@ -166,36 +166,34 @@ static int do_remap_gfn(struct vm_area_struct
> *vma,
> > return err < 0 ? err : mapped;
> >  }
> >
> > -int xen_remap_domain_gfn_range(struct vm_area_struct *vma,
> > -  unsigned long addr,
> > -  xen_pfn_t gfn, int nr,
> > -  pgprot_t prot, unsigned domid,
> > -  struct page **pages)
> > -{
> > -   return do_remap_gfn(vma, addr, , nr, NULL, prot, domid,
> pages);
> > -}
> > -EXPORT_SYMBOL_GPL(xen_remap_domain_gfn_range);
> > -
> >  int xen_remap_domain_gfn_array(struct vm_area_struct *vma,
> >unsigned long addr,
> >xen_pfn_t *gfn, int nr,
> >int *err_ptr, pgprot_t prot,
> >unsigned domid, struct page **pages)
> >  {
> > +   if (xen_feature(XENFEAT_auto_translated_physmap))
> > +   return xen_xlate_remap_gfn_array(vma, addr, gfn, nr,
> err_ptr,
> > +prot, domid, pages);
> > +
> > /* We BUG_ON because it's a programmer error to pass a NULL
> err_ptr,
> >  * and the consequences later is quite hard to detect what the actual
> >  * cause of "wrong memory was mapped in".
> >  */
> > BUG_ON(err_ptr == NULL);
> > -   return do_remap_gfn(vma, addr, gfn, nr, err_ptr, prot, domid,
> pages);
> > +   return xen_remap_gfn(vma, addr, gfn, nr, err_ptr, prot, domid,
> > +pages);
> >  }
> >  EXPORT_SYMBOL_GPL(xen_remap_domain_gfn_array);
> >
> >  /* Returns: 0 success */
> >  int xen_unmap_domain_gfn_range(struct vm_area_struct *vma,
> > -  int numpgs, struct page **pages)
> > +  int nr, struct page **pages)
> >  {
> > -   if (!pages || !xen_feature(XENFEAT_auto_translated_physmap))
> > +   if (xen

[PATCH v2] xen: support priv-mapping in an HVM tools domain

2017-11-01 Thread Paul Durrant
If the domain has XENFEAT_auto_translated_physmap then use of the PV-
specific HYPERVISOR_mmu_update hypercall is clearly incorrect.

This patch adds checks in xen_remap_domain_gfn_array() and
xen_unmap_domain_gfn_array() which call through to the appropriate
xlate_mmu function if the feature is present.

This patch also moves xen_remap_domain_gfn_range() into the PV-only MMU
code and #ifdefs the (only) calling code in privcmd accordingly.

Signed-off-by: Paul Durrant <paul.durr...@citrix.com>
---
Cc: Boris Ostrovsky <boris.ostrov...@oracle.com>
Cc: Juergen Gross <jgr...@suse.com>
Cc: Thomas Gleixner <t...@linutronix.de>
Cc: Ingo Molnar <mi...@redhat.com>
Cc: "H. Peter Anvin" <h...@zytor.com>
---
 arch/x86/xen/mmu.c| 36 +---
 arch/x86/xen/mmu_pv.c | 11 +++
 drivers/xen/privcmd.c | 17 +
 include/xen/xen-ops.h |  7 +++
 4 files changed, 48 insertions(+), 23 deletions(-)

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index 3e15345abfe7..01837c36e293 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -91,12 +91,12 @@ static int remap_area_mfn_pte_fn(pte_t *ptep, pgtable_t 
token,
return 0;
 }
 
-static int do_remap_gfn(struct vm_area_struct *vma,
-   unsigned long addr,
-   xen_pfn_t *gfn, int nr,
-   int *err_ptr, pgprot_t prot,
-   unsigned domid,
-   struct page **pages)
+int xen_remap_gfn(struct vm_area_struct *vma,
+ unsigned long addr,
+ xen_pfn_t *gfn, int nr,
+ int *err_ptr, pgprot_t prot,
+ unsigned int domid,
+ struct page **pages)
 {
int err = 0;
struct remap_data rmd;
@@ -166,36 +166,34 @@ static int do_remap_gfn(struct vm_area_struct *vma,
return err < 0 ? err : mapped;
 }
 
-int xen_remap_domain_gfn_range(struct vm_area_struct *vma,
-  unsigned long addr,
-  xen_pfn_t gfn, int nr,
-  pgprot_t prot, unsigned domid,
-  struct page **pages)
-{
-   return do_remap_gfn(vma, addr, , nr, NULL, prot, domid, pages);
-}
-EXPORT_SYMBOL_GPL(xen_remap_domain_gfn_range);
-
 int xen_remap_domain_gfn_array(struct vm_area_struct *vma,
   unsigned long addr,
   xen_pfn_t *gfn, int nr,
   int *err_ptr, pgprot_t prot,
   unsigned domid, struct page **pages)
 {
+   if (xen_feature(XENFEAT_auto_translated_physmap))
+   return xen_xlate_remap_gfn_array(vma, addr, gfn, nr, err_ptr,
+prot, domid, pages);
+
/* We BUG_ON because it's a programmer error to pass a NULL err_ptr,
 * and the consequences later is quite hard to detect what the actual
 * cause of "wrong memory was mapped in".
 */
BUG_ON(err_ptr == NULL);
-   return do_remap_gfn(vma, addr, gfn, nr, err_ptr, prot, domid, pages);
+   return xen_remap_gfn(vma, addr, gfn, nr, err_ptr, prot, domid,
+pages);
 }
 EXPORT_SYMBOL_GPL(xen_remap_domain_gfn_array);
 
 /* Returns: 0 success */
 int xen_unmap_domain_gfn_range(struct vm_area_struct *vma,
-  int numpgs, struct page **pages)
+  int nr, struct page **pages)
 {
-   if (!pages || !xen_feature(XENFEAT_auto_translated_physmap))
+   if (xen_feature(XENFEAT_auto_translated_physmap))
+   return xen_xlate_unmap_gfn_range(vma, nr, pages);
+
+   if (!pages)
return 0;
 
return -EINVAL;
diff --git a/arch/x86/xen/mmu_pv.c b/arch/x86/xen/mmu_pv.c
index 71495f1a86d7..4974d8a6c2b4 100644
--- a/arch/x86/xen/mmu_pv.c
+++ b/arch/x86/xen/mmu_pv.c
@@ -2670,3 +2670,14 @@ phys_addr_t paddr_vmcoreinfo_note(void)
return __pa(vmcoreinfo_note);
 }
 #endif /* CONFIG_KEXEC_CORE */
+
+int xen_remap_domain_gfn_range(struct vm_area_struct *vma,
+  unsigned long addr,
+  xen_pfn_t gfn, int nr,
+  pgprot_t prot, unsigned int domid,
+  struct page **pages)
+{
+   return xen_remap_gfn(vma, addr, , nr, NULL, prot, domid,
+pages);
+}
+EXPORT_SYMBOL_GPL(xen_remap_domain_gfn_range);
diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c
index feca75b07fdd..b58a1719b606 100644
--- a/drivers/xen/privcmd.c
+++ b/drivers/xen/privcmd.c
@@ -215,6 +215,8 @@ static int traverse_pages_block(unsigned nelem, size_t size,
return ret;
 }
 
+#ifdef CONFIG_XEN_PV
+
 struct mmap_gfn_state {
unsigned long va;
struct vm_area_struct *vma;
@@ -261,10 +263,6 @@ static lon

RE: [Xen-devel] [PATCH] x86/xen: support priv-mapping in an HVM tools domain

2017-10-20 Thread Paul Durrant
> -Original Message-
> From: Xen-devel [mailto:xen-devel-boun...@lists.xen.org] On Behalf Of
> Boris Ostrovsky
> Sent: 20 October 2017 16:09
> To: Paul Durrant <paul.durr...@citrix.com>; x...@kernel.org; xen-
> de...@lists.xenproject.org; linux-kernel@vger.kernel.org
> Cc: Juergen Gross <jgr...@suse.com>; Thomas Gleixner
> <t...@linutronix.de>; Ingo Molnar <mi...@redhat.com>; H. Peter Anvin
> <h...@zytor.com>
> Subject: Re: [Xen-devel] [PATCH] x86/xen: support priv-mapping in an HVM
> tools domain
> 
> On 10/20/2017 04:35 AM, Paul Durrant wrote:
> >> -Original Message-
> >> From: Xen-devel [mailto:xen-devel-boun...@lists.xen.org] On Behalf Of
> >> Boris Ostrovsky
> >> Sent: 19 October 2017 18:45
> >> To: Paul Durrant <paul.durr...@citrix.com>; x...@kernel.org; xen-
> >> de...@lists.xenproject.org; linux-kernel@vger.kernel.org
> >> Cc: Juergen Gross <jgr...@suse.com>; Thomas Gleixner
> >> <t...@linutronix.de>; Ingo Molnar <mi...@redhat.com>; H. Peter Anvin
> >> <h...@zytor.com>
> >> Subject: Re: [Xen-devel] [PATCH] x86/xen: support priv-mapping in an
> HVM
> >> tools domain
> >>
> >> On 10/19/2017 11:26 AM, Paul Durrant wrote:
> >>> If the domain has XENFEAT_auto_translated_physmap then use of the
> PV-
> >>> specific HYPERVISOR_mmu_update hypercall is clearly incorrect.
> >>>
> >>> This patch adds checks in xen_remap_domain_gfn_array() and
> >>> xen_unmap_domain_gfn_array() which call through to the appropriate
> >>> xlate_mmu function if the feature is present. A check is also added
> >>> to xen_remap_domain_gfn_range() to fail with -EOPNOTSUPP since this
> >>> should not be used in an HVM tools domain.
> >>>
> >>> Signed-off-by: Paul Durrant <paul.durr...@citrix.com>
> >>> ---
> >>> Cc: Boris Ostrovsky <boris.ostrov...@oracle.com>
> >>> Cc: Juergen Gross <jgr...@suse.com>
> >>> Cc: Thomas Gleixner <t...@linutronix.de>
> >>> Cc: Ingo Molnar <mi...@redhat.com>
> >>> Cc: "H. Peter Anvin" <h...@zytor.com>
> >>> ---
> >>>  arch/x86/xen/mmu.c | 14 --
> >>>  1 file changed, 12 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
> >>> index 3e15345abfe7..d33e7dbe3129 100644
> >>> --- a/arch/x86/xen/mmu.c
> >>> +++ b/arch/x86/xen/mmu.c
> >>> @@ -172,6 +172,9 @@ int xen_remap_domain_gfn_range(struct
> >> vm_area_struct *vma,
> >>>  pgprot_t prot, unsigned domid,
> >>>  struct page **pages)
> >>>  {
> >>> + if (xen_feature(XENFEAT_auto_translated_physmap))
> >>> + return -EOPNOTSUPP;
> >>> +
> >> This is never called on XENFEAT_auto_translated_physmap domains,
> there
> >> is a check in privcmd_ioctl_mmap() for that.
> > Yes, that's true but it seems like the wrong place for such a check. I could
> remove that one if you'd prefer.
> 
> I actually think that perhaps we could wrap privcmd_ioctl_mmap() with
> "#ifdef CONFIG_XEN_PV" (#else return -ENOSYS) and move
> xen_remap_domain_gfn_range() to mmu_pv.c. We can then remove it from
> ARM
> code too.
> 
> >
> >>>   return do_remap_gfn(vma, addr, , nr, NULL, prot, domid,
> >> pages);
> >>>  }
> >>>  EXPORT_SYMBOL_GPL(xen_remap_domain_gfn_range);
> >>> @@ -182,6 +185,10 @@ int xen_remap_domain_gfn_array(struct
> >> vm_area_struct *vma,
> >>>  int *err_ptr, pgprot_t prot,
> >>>  unsigned domid, struct page **pages)
> >>>  {
> >>> + if (xen_feature(XENFEAT_auto_translated_physmap))
> >>> + return xen_xlate_remap_gfn_array(vma, addr, gfn, nr,
> >> err_ptr,
> >>> +  prot, domid, pages);
> >>> +
> >> So how did this work before? In fact, I don't see any callers of
> >> xen_xlate_{re|un}map_gfn_range().
> > I assume you mean 'array' for the map since there is no
> xen_xlate_remap_gfn_range() function. I'm not quite sure what you're
> asking? Without this patch the mmu code in an x86 domain simply assumes
> the domain is PV... the xlate code is currently only used via the arm mmu
> code (where it clearly knows it's not PV). AFAICS this Is 

RE: [Xen-devel] [PATCH] x86/xen: support priv-mapping in an HVM tools domain

2017-10-20 Thread Paul Durrant
> -Original Message-
> From: Xen-devel [mailto:xen-devel-boun...@lists.xen.org] On Behalf Of
> Boris Ostrovsky
> Sent: 20 October 2017 16:09
> To: Paul Durrant ; x...@kernel.org; xen-
> de...@lists.xenproject.org; linux-kernel@vger.kernel.org
> Cc: Juergen Gross ; Thomas Gleixner
> ; Ingo Molnar ; H. Peter Anvin
> 
> Subject: Re: [Xen-devel] [PATCH] x86/xen: support priv-mapping in an HVM
> tools domain
> 
> On 10/20/2017 04:35 AM, Paul Durrant wrote:
> >> -Original Message-
> >> From: Xen-devel [mailto:xen-devel-boun...@lists.xen.org] On Behalf Of
> >> Boris Ostrovsky
> >> Sent: 19 October 2017 18:45
> >> To: Paul Durrant ; x...@kernel.org; xen-
> >> de...@lists.xenproject.org; linux-kernel@vger.kernel.org
> >> Cc: Juergen Gross ; Thomas Gleixner
> >> ; Ingo Molnar ; H. Peter Anvin
> >> 
> >> Subject: Re: [Xen-devel] [PATCH] x86/xen: support priv-mapping in an
> HVM
> >> tools domain
> >>
> >> On 10/19/2017 11:26 AM, Paul Durrant wrote:
> >>> If the domain has XENFEAT_auto_translated_physmap then use of the
> PV-
> >>> specific HYPERVISOR_mmu_update hypercall is clearly incorrect.
> >>>
> >>> This patch adds checks in xen_remap_domain_gfn_array() and
> >>> xen_unmap_domain_gfn_array() which call through to the appropriate
> >>> xlate_mmu function if the feature is present. A check is also added
> >>> to xen_remap_domain_gfn_range() to fail with -EOPNOTSUPP since this
> >>> should not be used in an HVM tools domain.
> >>>
> >>> Signed-off-by: Paul Durrant 
> >>> ---
> >>> Cc: Boris Ostrovsky 
> >>> Cc: Juergen Gross 
> >>> Cc: Thomas Gleixner 
> >>> Cc: Ingo Molnar 
> >>> Cc: "H. Peter Anvin" 
> >>> ---
> >>>  arch/x86/xen/mmu.c | 14 --
> >>>  1 file changed, 12 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
> >>> index 3e15345abfe7..d33e7dbe3129 100644
> >>> --- a/arch/x86/xen/mmu.c
> >>> +++ b/arch/x86/xen/mmu.c
> >>> @@ -172,6 +172,9 @@ int xen_remap_domain_gfn_range(struct
> >> vm_area_struct *vma,
> >>>  pgprot_t prot, unsigned domid,
> >>>  struct page **pages)
> >>>  {
> >>> + if (xen_feature(XENFEAT_auto_translated_physmap))
> >>> + return -EOPNOTSUPP;
> >>> +
> >> This is never called on XENFEAT_auto_translated_physmap domains,
> there
> >> is a check in privcmd_ioctl_mmap() for that.
> > Yes, that's true but it seems like the wrong place for such a check. I could
> remove that one if you'd prefer.
> 
> I actually think that perhaps we could wrap privcmd_ioctl_mmap() with
> "#ifdef CONFIG_XEN_PV" (#else return -ENOSYS) and move
> xen_remap_domain_gfn_range() to mmu_pv.c. We can then remove it from
> ARM
> code too.
> 
> >
> >>>   return do_remap_gfn(vma, addr, &gfn, nr, NULL, prot, domid,
> >> pages);
> >>>  }
> >>>  EXPORT_SYMBOL_GPL(xen_remap_domain_gfn_range);
> >>> @@ -182,6 +185,10 @@ int xen_remap_domain_gfn_array(struct
> >> vm_area_struct *vma,
> >>>  int *err_ptr, pgprot_t prot,
> >>>  unsigned domid, struct page **pages)
> >>>  {
> >>> + if (xen_feature(XENFEAT_auto_translated_physmap))
> >>> + return xen_xlate_remap_gfn_array(vma, addr, gfn, nr,
> >> err_ptr,
> >>> +  prot, domid, pages);
> >>> +
> >> So how did this work before? In fact, I don't see any callers of
> >> xen_xlate_{re|un}map_gfn_range().
> > I assume you mean 'array' for the map since there is no
> xen_xlate_remap_gfn_range() function. I'm not quite sure what you're
> asking? Without this patch the mmu code in an x86 domain simply assumes
> the domain is PV... the xlate code is currently only used via the arm mmu
> code (where it clearly knows it's not PV). AFAICS this is 
> straightforward
> buggy assumption in the x86 code.
> 
> Looks like this was originally intended for dom0 PVH and was removed by
> 063334f. So it should indeed be restored.
> 

Ok, I'll re-work the patch with your suggestion re xen_remap_domain_gfn_range() 
and send a v2.

Thanks,

  Paul

> 
> -boris
> 
> >
> >   Paul
> >
> >> -boris
> >>
> >>

RE: [Xen-devel] [PATCH] x86/xen: support priv-mapping in an HVM tools domain

2017-10-20 Thread Paul Durrant
> -Original Message-
> From: Xen-devel [mailto:xen-devel-boun...@lists.xen.org] On Behalf Of
> Boris Ostrovsky
> Sent: 19 October 2017 18:45
> To: Paul Durrant <paul.durr...@citrix.com>; x...@kernel.org; xen-
> de...@lists.xenproject.org; linux-kernel@vger.kernel.org
> Cc: Juergen Gross <jgr...@suse.com>; Thomas Gleixner
> <t...@linutronix.de>; Ingo Molnar <mi...@redhat.com>; H. Peter Anvin
> <h...@zytor.com>
> Subject: Re: [Xen-devel] [PATCH] x86/xen: support priv-mapping in an HVM
> tools domain
> 
> On 10/19/2017 11:26 AM, Paul Durrant wrote:
> > If the domain has XENFEAT_auto_translated_physmap then use of the PV-
> > specific HYPERVISOR_mmu_update hypercall is clearly incorrect.
> >
> > This patch adds checks in xen_remap_domain_gfn_array() and
> > xen_unmap_domain_gfn_array() which call through to the appropriate
> > xlate_mmu function if the feature is present. A check is also added
> > to xen_remap_domain_gfn_range() to fail with -EOPNOTSUPP since this
> > should not be used in an HVM tools domain.
> >
> > Signed-off-by: Paul Durrant <paul.durr...@citrix.com>
> > ---
> > Cc: Boris Ostrovsky <boris.ostrov...@oracle.com>
> > Cc: Juergen Gross <jgr...@suse.com>
> > Cc: Thomas Gleixner <t...@linutronix.de>
> > Cc: Ingo Molnar <mi...@redhat.com>
> > Cc: "H. Peter Anvin" <h...@zytor.com>
> > ---
> >  arch/x86/xen/mmu.c | 14 --
> >  1 file changed, 12 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
> > index 3e15345abfe7..d33e7dbe3129 100644
> > --- a/arch/x86/xen/mmu.c
> > +++ b/arch/x86/xen/mmu.c
> > @@ -172,6 +172,9 @@ int xen_remap_domain_gfn_range(struct
> vm_area_struct *vma,
> >pgprot_t prot, unsigned domid,
> >struct page **pages)
> >  {
> > +   if (xen_feature(XENFEAT_auto_translated_physmap))
> > +   return -EOPNOTSUPP;
> > +
> 
> This is never called on XENFEAT_auto_translated_physmap domains, there
> is a check in privcmd_ioctl_mmap() for that.

Yes, that's true but it seems like the wrong place for such a check. I could 
remove that one if you'd prefer.

> 
> > return do_remap_gfn(vma, addr, &gfn, nr, NULL, prot, domid,
> pages);
> >  }
> >  EXPORT_SYMBOL_GPL(xen_remap_domain_gfn_range);
> > @@ -182,6 +185,10 @@ int xen_remap_domain_gfn_array(struct
> vm_area_struct *vma,
> >int *err_ptr, pgprot_t prot,
> >unsigned domid, struct page **pages)
> >  {
> > +   if (xen_feature(XENFEAT_auto_translated_physmap))
> > +   return xen_xlate_remap_gfn_array(vma, addr, gfn, nr,
> err_ptr,
> > +prot, domid, pages);
> > +
> 
> So how did this work before? In fact, I don't see any callers of
> xen_xlate_{re|un}map_gfn_range().

I assume you mean 'array' for the map since there is no xen_xlate_remap_gfn_range() 
function. I'm not quite sure what you're asking? Without this patch the mmu 
code in an x86 domain simply assumes the domain is PV... the xlate code is 
currently only used via the arm mmu code (where it clearly knows it's not PV). 
AFAICS this is just a straightforward buggy assumption in the x86 code.

  Paul

> 
> -boris
> 
> 
> > /* We BUG_ON because it's a programmer error to pass a NULL
> err_ptr,
> >  * and the consequences later is quite hard to detect what the actual
> >  * cause of "wrong memory was mapped in".
> > @@ -193,9 +200,12 @@
> EXPORT_SYMBOL_GPL(xen_remap_domain_gfn_array);
> >
> >  /* Returns: 0 success */
> >  int xen_unmap_domain_gfn_range(struct vm_area_struct *vma,
> > -  int numpgs, struct page **pages)
> > +  int nr, struct page **pages)
> >  {
> > -   if (!pages || !xen_feature(XENFEAT_auto_translated_physmap))
> > +   if (xen_feature(XENFEAT_auto_translated_physmap))
> > +   return xen_xlate_unmap_gfn_range(vma, nr, pages);
> > +
> > +   if (!pages)
> > return 0;
> >
> > return -EINVAL;
> 
> 
> ___
> Xen-devel mailing list
> xen-de...@lists.xen.org
> https://lists.xen.org/xen-devel



[PATCH] x86/xen: support priv-mapping in an HVM tools domain

2017-10-19 Thread Paul Durrant
If the domain has XENFEAT_auto_translated_physmap then use of the PV-
specific HYPERVISOR_mmu_update hypercall is clearly incorrect.

This patch adds checks in xen_remap_domain_gfn_array() and
xen_unmap_domain_gfn_array() which call through to the appropriate
xlate_mmu function if the feature is present. A check is also added
to xen_remap_domain_gfn_range() to fail with -EOPNOTSUPP since this
should not be used in an HVM tools domain.

Signed-off-by: Paul Durrant <paul.durr...@citrix.com>
---
Cc: Boris Ostrovsky <boris.ostrov...@oracle.com>
Cc: Juergen Gross <jgr...@suse.com>
Cc: Thomas Gleixner <t...@linutronix.de>
Cc: Ingo Molnar <mi...@redhat.com>
Cc: "H. Peter Anvin" <h...@zytor.com>
---
 arch/x86/xen/mmu.c | 14 --
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index 3e15345abfe7..d33e7dbe3129 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -172,6 +172,9 @@ int xen_remap_domain_gfn_range(struct vm_area_struct *vma,
   pgprot_t prot, unsigned domid,
   struct page **pages)
 {
+   if (xen_feature(XENFEAT_auto_translated_physmap))
+   return -EOPNOTSUPP;
+
	return do_remap_gfn(vma, addr, &gfn, nr, NULL, prot, domid, pages);
 }
 EXPORT_SYMBOL_GPL(xen_remap_domain_gfn_range);
@@ -182,6 +185,10 @@ int xen_remap_domain_gfn_array(struct vm_area_struct *vma,
   int *err_ptr, pgprot_t prot,
   unsigned domid, struct page **pages)
 {
+   if (xen_feature(XENFEAT_auto_translated_physmap))
+   return xen_xlate_remap_gfn_array(vma, addr, gfn, nr, err_ptr,
+prot, domid, pages);
+
/* We BUG_ON because it's a programmer error to pass a NULL err_ptr,
 * and the consequences later is quite hard to detect what the actual
 * cause of "wrong memory was mapped in".
@@ -193,9 +200,12 @@ EXPORT_SYMBOL_GPL(xen_remap_domain_gfn_array);
 
 /* Returns: 0 success */
 int xen_unmap_domain_gfn_range(struct vm_area_struct *vma,
-  int numpgs, struct page **pages)
+  int nr, struct page **pages)
 {
-   if (!pages || !xen_feature(XENFEAT_auto_translated_physmap))
+   if (xen_feature(XENFEAT_auto_translated_physmap))
+   return xen_xlate_unmap_gfn_range(vma, nr, pages);
+
+   if (!pages)
return 0;
 
return -EINVAL;
-- 
2.11.0




RE: [PATCH] x86/xen: support priv-mapping in an HVM tools domain

2017-10-19 Thread Paul Durrant
Apologies... I misformatted this. I will re-send.

  Paul
> -Original Message-
> From: Paul Durrant [mailto:paul.durr...@citrix.com]
> Sent: 19 October 2017 16:24
> To: x...@kernel.org; xen-de...@lists.xenproject.org; linux-
> ker...@vger.kernel.org
> Cc: Paul Durrant <paul.durr...@citrix.com>
> Subject: [PATCH] x86/xen: support priv-mapping in an HVM tools domain
> 
> If the domain has XENFEAT_auto_translated_physmap then use of the PV-
> specific HYPERVISOR_mmu_update hypercall is clearly incorrect.
> 
> This patch adds checks in xen_remap_domain_gfn_array() and
> xen_unmap_domain_gfn_array() which call through to the appropriate
> xlate_mmu function if the feature is present. A check is also added
> to xen_remap_domain_gfn_range() to fail with -EOPNOTSUPP since this
> should not be used in an HVM tools domain.
> 
> Signed-off-by: Paul Durrant <paul.durr...@citrix.com>
> ---
> Boris Ostrovsky <boris.ostrov...@oracle.com>
> Juergen Gross <jgr...@suse.com>
> Thomas Gleixner <t...@linutronix.de>
> Ingo Molnar <mi...@redhat.com>
> "H. Peter Anvin" <h...@zytor.com>
> ---
>  arch/x86/xen/mmu.c | 14 --
>  1 file changed, 12 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
> index 3e15345abfe7..d33e7dbe3129 100644
> --- a/arch/x86/xen/mmu.c
> +++ b/arch/x86/xen/mmu.c
> @@ -172,6 +172,9 @@ int xen_remap_domain_gfn_range(struct
> vm_area_struct *vma,
>  pgprot_t prot, unsigned domid,
>  struct page **pages)
>  {
> + if (xen_feature(XENFEAT_auto_translated_physmap))
> + return -EOPNOTSUPP;
> +
>   return do_remap_gfn(vma, addr, &gfn, nr, NULL, prot, domid,
> pages);
>  }
>  EXPORT_SYMBOL_GPL(xen_remap_domain_gfn_range);
> @@ -182,6 +185,10 @@ int xen_remap_domain_gfn_array(struct
> vm_area_struct *vma,
>  int *err_ptr, pgprot_t prot,
>  unsigned domid, struct page **pages)
>  {
> + if (xen_feature(XENFEAT_auto_translated_physmap))
> + return xen_xlate_remap_gfn_array(vma, addr, gfn, nr,
> err_ptr,
> +  prot, domid, pages);
> +
>   /* We BUG_ON because it's a programmer error to pass a NULL
> err_ptr,
>* and the consequences later is quite hard to detect what the actual
>* cause of "wrong memory was mapped in".
> @@ -193,9 +200,12 @@
> EXPORT_SYMBOL_GPL(xen_remap_domain_gfn_array);
> 
>  /* Returns: 0 success */
>  int xen_unmap_domain_gfn_range(struct vm_area_struct *vma,
> -int numpgs, struct page **pages)
> +int nr, struct page **pages)
>  {
> - if (!pages || !xen_feature(XENFEAT_auto_translated_physmap))
> + if (xen_feature(XENFEAT_auto_translated_physmap))
> + return xen_xlate_unmap_gfn_range(vma, nr, pages);
> +
> + if (!pages)
>   return 0;
> 
>   return -EINVAL;
> --
> 2.11.0





[PATCH] x86/xen: support priv-mapping in an HVM tools domain

2017-10-19 Thread Paul Durrant
If the domain has XENFEAT_auto_translated_physmap then use of the PV-
specific HYPERVISOR_mmu_update hypercall is clearly incorrect.

This patch adds checks in xen_remap_domain_gfn_array() and
xen_unmap_domain_gfn_array() which call through to the appropriate
xlate_mmu function if the feature is present. A check is also added
to xen_remap_domain_gfn_range() to fail with -EOPNOTSUPP since this
should not be used in an HVM tools domain.

Signed-off-by: Paul Durrant <paul.durr...@citrix.com>
---
Boris Ostrovsky <boris.ostrov...@oracle.com>
Juergen Gross <jgr...@suse.com>
Thomas Gleixner <t...@linutronix.de>
Ingo Molnar <mi...@redhat.com>
"H. Peter Anvin" <h...@zytor.com>
---
 arch/x86/xen/mmu.c | 14 --
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index 3e15345abfe7..d33e7dbe3129 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -172,6 +172,9 @@ int xen_remap_domain_gfn_range(struct vm_area_struct *vma,
   pgprot_t prot, unsigned domid,
   struct page **pages)
 {
+   if (xen_feature(XENFEAT_auto_translated_physmap))
+   return -EOPNOTSUPP;
+
	return do_remap_gfn(vma, addr, &gfn, nr, NULL, prot, domid, pages);
 }
 EXPORT_SYMBOL_GPL(xen_remap_domain_gfn_range);
@@ -182,6 +185,10 @@ int xen_remap_domain_gfn_array(struct vm_area_struct *vma,
   int *err_ptr, pgprot_t prot,
   unsigned domid, struct page **pages)
 {
+   if (xen_feature(XENFEAT_auto_translated_physmap))
+   return xen_xlate_remap_gfn_array(vma, addr, gfn, nr, err_ptr,
+prot, domid, pages);
+
/* We BUG_ON because it's a programmer error to pass a NULL err_ptr,
 * and the consequences later is quite hard to detect what the actual
 * cause of "wrong memory was mapped in".
@@ -193,9 +200,12 @@ EXPORT_SYMBOL_GPL(xen_remap_domain_gfn_array);
 
 /* Returns: 0 success */
 int xen_unmap_domain_gfn_range(struct vm_area_struct *vma,
-  int numpgs, struct page **pages)
+  int nr, struct page **pages)
 {
-   if (!pages || !xen_feature(XENFEAT_auto_translated_physmap))
+   if (xen_feature(XENFEAT_auto_translated_physmap))
+   return xen_xlate_unmap_gfn_range(vma, nr, pages);
+
+   if (!pages)
return 0;
 
return -EINVAL;
-- 
2.11.0





RE: [Xen-devel] [PATCH] x86/xen: allow userspace access during hypercalls

2017-06-26 Thread Paul Durrant
> -Original Message-
> From: 'Marek Marczykowski-Górecki'
> [mailto:marma...@invisiblethingslab.com]
> Sent: 26 June 2017 14:22
> To: Paul Durrant <paul.durr...@citrix.com>
> Cc: Juergen Groß <jgr...@suse.com>; Andrew Cooper
> <andrew.coop...@citrix.com>; x...@kernel.org; linux-
> ker...@vger.kernel.org; sta...@vger.kernel.org; xen-
> de...@lists.xenproject.org; Boris Ostrovsky <boris.ostrov...@oracle.com>
> Subject: Re: [Xen-devel] [PATCH] x86/xen: allow userspace access during
> hypercalls
> 
> On Mon, Jun 26, 2017 at 01:09:58PM +, Paul Durrant wrote:
> > > -Original Message-
> > > From: Xen-devel [mailto:xen-devel-boun...@lists.xen.org] On Behalf Of
> > > Marek Marczykowski-Górecki
> > > Sent: 26 June 2017 13:45
> > > To: Juergen Groß <jgr...@suse.com>
> > > Cc: Andrew Cooper <andrew.coop...@citrix.com>; x...@kernel.org;
> linux-
> > > ker...@vger.kernel.org; sta...@vger.kernel.org; xen-
> > > de...@lists.xenproject.org; Boris Ostrovsky
> <boris.ostrov...@oracle.com>
> > > Subject: Re: [Xen-devel] [PATCH] x86/xen: allow userspace access during
> > > hypercalls
> > >
> > > On Mon, Jun 26, 2017 at 02:05:48PM +0200, Juergen Groß wrote:
> > > > On 06/23/2017 02:47 PM, Marek Marczykowski-Górecki wrote:
> > > > > A userspace application can do a hypercall through /dev/xen/privcmd,
> > > > > and for some hypercalls the argument is a pointer to a user-provided
> > > > > structure. When SMAP is supported and enabled, the hypervisor can't
> > > > > access it. So, let's allow it.
> > > >
> > > > What about HYPERVISOR_dm_op?
> > >
> > > Indeed, arguments copied to kernel space there are only addresses of
> > > buffers. Will send v2 in a moment.
> > > But I can't test it right now, as for my understanding this require
> > > HVM/PVHv2 dom0 or stubdomain...
> > >
> >
> > No, you don't need anything particularly special to use dm_op. Just up-to-
> date xen, privcmd, and QEMU. QEMU should end up using dm_op by default
> if all three are in place.
> 
> But the issue this patch fixes applies only to hypercalls issued from HVM.

Oh, I see what you mean. Well I guess you could manually run QEMU from an HVM 
domain, but it would be a bit of a faff to set up.

  Paul

> 
> --
> Best Regards,
> Marek Marczykowski-Górecki
> Invisible Things Lab
> A: Because it messes up the order in which people normally read text.
> Q: Why is top-posting such a bad thing?



RE: [Xen-devel] [PATCH] x86/xen: allow userspace access during hypercalls

2017-06-26 Thread Paul Durrant
> -Original Message-
> From: Xen-devel [mailto:xen-devel-boun...@lists.xen.org] On Behalf Of
> Marek Marczykowski-Górecki
> Sent: 26 June 2017 13:45
> To: Juergen Groß 
> Cc: Andrew Cooper ; x...@kernel.org; linux-
> ker...@vger.kernel.org; sta...@vger.kernel.org; xen-
> de...@lists.xenproject.org; Boris Ostrovsky 
> Subject: Re: [Xen-devel] [PATCH] x86/xen: allow userspace access during
> hypercalls
> 
> On Mon, Jun 26, 2017 at 02:05:48PM +0200, Juergen Groß wrote:
> > On 06/23/2017 02:47 PM, Marek Marczykowski-Górecki wrote:
> > > A userspace application can do a hypercall through /dev/xen/privcmd, and
> > > for some hypercalls the argument is a pointer to a user-provided
> > > structure. When SMAP is supported and enabled, the hypervisor can't access it.
> > > So, let's allow it.
> >
> > What about HYPERVISOR_dm_op?
> 
> Indeed, arguments copied to kernel space there are only addresses of
> buffers. Will send v2 in a moment.
> But I can't test it right now, as for my understanding this require
> HVM/PVHv2 dom0 or stubdomain...
> 

No, you don't need anything particularly special to use dm_op. Just up-to-date 
xen, privcmd, and QEMU. QEMU should end up using dm_op by default if all three 
are in place.

  Paul

> --
> Best Regards,
> Marek Marczykowski-Górecki
> Invisible Things Lab
> A: Because it messes up the order in which people normally read text.
> Q: Why is top-posting such a bad thing?



RE: BUG due to "xen-netback: protect resource cleaning on XenBus disconnect"

2017-03-02 Thread Paul Durrant
> -Original Message-
> From: Juergen Gross [mailto:jgr...@suse.com]
> Sent: 02 March 2017 12:13
> To: Wei Liu <wei.l...@citrix.com>
> Cc: Igor Druzhinin <igor.druzhi...@citrix.com>; xen-devel <xen-de...@lists.xenproject.org>; Linux Kernel Mailing List <linux-ker...@vger.kernel.org>; net...@vger.kernel.org; Boris Ostrovsky
> <boris.ostrov...@oracle.com>; David Miller <da...@davemloft.net>; Paul
> Durrant <paul.durr...@citrix.com>
> Subject: Re: BUG due to "xen-netback: protect resource cleaning on XenBus
> disconnect"
> 
> On 02/03/17 13:06, Wei Liu wrote:
> > On Thu, Mar 02, 2017 at 12:56:20PM +0100, Juergen Gross wrote:
> >> With commits f16f1df65 and 9a6cdf52b we get in our Xen testing:
> >>
> >> [  174.512861] switch: port 2(vif3.0) entered disabled state
> >> [  174.522735] BUG: sleeping function called from invalid context at
> >> /home/build/linux-linus/mm/vmalloc.c:1441
> >> [  174.523451] in_atomic(): 1, irqs_disabled(): 0, pid: 28, name: xenwatch
> >> [  174.524131] CPU: 1 PID: 28 Comm: xenwatch Tainted: GW
> >> 4.10.0upstream-11073-g4977ab6-dirty #1
> >> [  174.524819] Hardware name: MSI MS-7680/H61M-P23 (MS-7680), BIOS
> V17.0
> >> 03/14/2011
> >> [  174.525517] Call Trace:
> >> [  174.526217]  show_stack+0x23/0x60
> >> [  174.526899]  dump_stack+0x5b/0x88
> >> [  174.527562]  ___might_sleep+0xde/0x130
> >> [  174.528208]  __might_sleep+0x35/0xa0
> >> [  174.528840]  ? _raw_spin_unlock_irqrestore+0x13/0x20
> >> [  174.529463]  ? __wake_up+0x40/0x50
> >> [  174.530089]  remove_vm_area+0x20/0x90
> >> [  174.530724]  __vunmap+0x1d/0xc0
> >> [  174.531346]  ? delete_object_full+0x13/0x20
> >> [  174.531973]  vfree+0x40/0x80
> >> [  174.532594]  set_backend_state+0x18a/0xa90
> >> [  174.533221]  ? dwc_scan_descriptors+0x24d/0x430
> >> [  174.533850]  ? kfree+0x5b/0xc0
> >> [  174.534476]  ? xenbus_read+0x3d/0x50
> >> [  174.535101]  ? xenbus_read+0x3d/0x50
> >> [  174.535718]  ? xenbus_gather+0x31/0x90
> >> [  174.536332]  ? ___might_sleep+0xf6/0x130
> >> [  174.536945]  frontend_changed+0x6b/0xd0
> >> [  174.537565]  xenbus_otherend_changed+0x7d/0x80
> >> [  174.538185]  frontend_changed+0x12/0x20
> >> [  174.538803]  xenwatch_thread+0x74/0x110
> >> [  174.539417]  ? woken_wake_function+0x20/0x20
> >> [  174.540049]  kthread+0xe5/0x120
> >> [  174.540663]  ? xenbus_printf+0x50/0x50
> >> [  174.541278]  ? __kthread_init_worker+0x40/0x40
> >> [  174.541898]  ret_from_fork+0x21/0x2c
> >> [  174.548635] switch: port 2(vif3.0) entered disabled state
> >>
> >> I believe calling vfree() when holding a spin_lock isn't a good idea.
> >>
> >
> > Use vfree_atomic instead?
> 
> Hmm, isn't this overkill here?
> 
> You can just set a local variable with the address and do vfree() after
> releasing the lock.
> 

Yep, that's what I was thinking. Patch coming shortly.

  Paul

> 
> Juergen



RE: [PATCH v3 2/3] xen/privcmd: Add IOCTL_PRIVCMD_DM_OP

2017-02-15 Thread Paul Durrant
> -Original Message-
> From: Stefano Stabellini [mailto:sstabell...@kernel.org]
> Sent: 14 February 2017 18:39
> To: Boris Ostrovsky <boris.ostrov...@oracle.com>
> Cc: Paul Durrant <paul.durr...@citrix.com>; xen-de...@lists.xenproject.org;
> linux-kernel@vger.kernel.org; Stefano Stabellini <sstabell...@kernel.org>;
> Juergen Gross <jgr...@suse.com>
> Subject: Re: [PATCH v3 2/3] xen/privcmd: Add IOCTL_PRIVCMD_DM_OP
> 
> On Tue, 14 Feb 2017, Boris Ostrovsky wrote:
> > On 02/13/2017 12:03 PM, Paul Durrant wrote:
> > > Recently a new dm_op[1] hypercall was added to Xen to provide a
> mechanism
> > > for restricting device emulators (such as QEMU) to a limited set of
> > > hypervisor operations, and being able to audit those operations in the
> > > kernel of the domain in which they run.
> > >
> > > This patch adds IOCTL_PRIVCMD_DM_OP as gateway for
> __HYPERVISOR_dm_op.
> > >
> > > NOTE: There is no requirement for user-space code to bounce data
> through
> > >   locked memory buffers (as with IOCTL_PRIVCMD_HYPERCALL) since
> > >   privcmd has enough information to lock the original buffers
> > >   directly.
> > >
> > > [1] http://xenbits.xen.org/gitweb/?p=xen.git;a=commit;h=524a98c2
> > >
> > > Signed-off-by: Paul Durrant <paul.durr...@citrix.com>
> >
> >
> > Stefano,
> >
> > Are you OK with ARM changes?
> 
> Yes:
> 
> Acked-by: Stefano Stabellini <sstabell...@kernel.org>
> 
> Thanks for CC'ing me, I missed the patch.
> 

Sorry. My fault for not re-running get_maintainer.pl after fixing up the ARM 
build.

  Paul



RE: [PATCH v3 2/3] xen/privcmd: Add IOCTL_PRIVCMD_DM_OP

2017-02-14 Thread Paul Durrant
My previous reply got bounced because my tablet insisted on using HTML...

> -Original Message-
> 
> These need to be static. (I can fix it when committing.)

Ok, thanks.

> 
> And I am still not sure about using XEN_PAGE_SIZE. There is no
> dependency in the hypervisor on buffers being page-sized, is there? If
> not, XEN_PAGE_SIZE is here just because it happens to be 4K, which is a
> reasonable value.
> 
> How about just setting it to 4096?
> 

I chose XEN_PAGE_SIZE because the hypercall will eventually copy in the buffer 
so it seemed like a reasonable value to use. If you want to just use 4096 then 
I am ok with that.

  Paul

PS: If you want to change from XEN_PAGE_SIZE to 4096 then I assume you are 
happy to do this at commit and don't need me to send a v4?

> 
> -boris



[PATCH v3 2/3] xen/privcmd: Add IOCTL_PRIVCMD_DM_OP

2017-02-13 Thread Paul Durrant
Recently a new dm_op[1] hypercall was added to Xen to provide a mechanism
for restricting device emulators (such as QEMU) to a limited set of
hypervisor operations, and being able to audit those operations in the
kernel of the domain in which they run.

This patch adds IOCTL_PRIVCMD_DM_OP as gateway for __HYPERVISOR_dm_op.

NOTE: There is no requirement for user-space code to bounce data through
  locked memory buffers (as with IOCTL_PRIVCMD_HYPERCALL) since
  privcmd has enough information to lock the original buffers
  directly.

[1] http://xenbits.xen.org/gitweb/?p=xen.git;a=commit;h=524a98c2

Signed-off-by: Paul Durrant 
---
Cc: Boris Ostrovsky 
Cc: Juergen Gross 

v3:
- Add module parameters for max number of dm_op buffers and max buffer
  size
- Fix arm build
- Fix commit comment to reflect re-worked patch

v2:
- Lock the user pages rather than bouncing through kernel memory
---
 arch/arm/xen/enlighten.c |   1 +
 arch/arm/xen/hypercall.S |   1 +
 arch/arm64/xen/hypercall.S   |   1 +
 arch/x86/include/asm/xen/hypercall.h |   7 ++
 drivers/xen/privcmd.c| 139 +++
 include/uapi/xen/privcmd.h   |  13 
 include/xen/arm/hypercall.h  |   1 +
 include/xen/interface/hvm/dm_op.h|  32 
 include/xen/interface/xen.h  |   1 +
 9 files changed, 196 insertions(+)
 create mode 100644 include/xen/interface/hvm/dm_op.h

diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c
index 11d9f28..81e3217 100644
--- a/arch/arm/xen/enlighten.c
+++ b/arch/arm/xen/enlighten.c
@@ -457,4 +457,5 @@ EXPORT_SYMBOL_GPL(HYPERVISOR_tmem_op);
 EXPORT_SYMBOL_GPL(HYPERVISOR_platform_op);
 EXPORT_SYMBOL_GPL(HYPERVISOR_multicall);
 EXPORT_SYMBOL_GPL(HYPERVISOR_vm_assist);
+EXPORT_SYMBOL_GPL(HYPERVISOR_dm_op);
 EXPORT_SYMBOL_GPL(privcmd_call);
diff --git a/arch/arm/xen/hypercall.S b/arch/arm/xen/hypercall.S
index a648dfc..b0b80c0 100644
--- a/arch/arm/xen/hypercall.S
+++ b/arch/arm/xen/hypercall.S
@@ -92,6 +92,7 @@ HYPERCALL1(tmem_op);
 HYPERCALL1(platform_op_raw);
 HYPERCALL2(multicall);
 HYPERCALL2(vm_assist);
+HYPERCALL3(dm_op);
 
 ENTRY(privcmd_call)
stmdb sp!, {r4}
diff --git a/arch/arm64/xen/hypercall.S b/arch/arm64/xen/hypercall.S
index 947830a..401ceb7 100644
--- a/arch/arm64/xen/hypercall.S
+++ b/arch/arm64/xen/hypercall.S
@@ -84,6 +84,7 @@ HYPERCALL1(tmem_op);
 HYPERCALL1(platform_op_raw);
 HYPERCALL2(multicall);
 HYPERCALL2(vm_assist);
+HYPERCALL3(dm_op);
 
 ENTRY(privcmd_call)
mov x16, x0
diff --git a/arch/x86/include/asm/xen/hypercall.h 
b/arch/x86/include/asm/xen/hypercall.h
index a12a047..f6d20f6 100644
--- a/arch/x86/include/asm/xen/hypercall.h
+++ b/arch/x86/include/asm/xen/hypercall.h
@@ -472,6 +472,13 @@ HYPERVISOR_xenpmu_op(unsigned int op, void *arg)
return _hypercall2(int, xenpmu_op, op, arg);
 }
 
+static inline int
+HYPERVISOR_dm_op(
+   domid_t dom, unsigned int nr_bufs, void *bufs)
+{
+   return _hypercall3(int, dm_op, dom, nr_bufs, bufs);
+}
+
 static inline void
 MULTI_fpu_taskswitch(struct multicall_entry *mcl, int set)
 {
diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c
index 5e5c7ae..a33f17e 100644
--- a/drivers/xen/privcmd.c
+++ b/drivers/xen/privcmd.c
@@ -22,6 +22,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -32,6 +33,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -43,6 +45,17 @@ MODULE_LICENSE("GPL");
 
 #define PRIV_VMA_LOCKED ((void *)1)
 
+unsigned int privcmd_dm_op_max_num = 16;
+module_param_named(dm_op_max_nr_bufs, privcmd_dm_op_max_num, uint, 0644);
+MODULE_PARM_DESC(dm_op_max_nr_bufs,
+"Maximum number of buffers per dm_op hypercall");
+
+unsigned int privcmd_dm_op_buf_max_size = XEN_PAGE_SIZE;
+module_param_named(dm_op_buf_max_size, privcmd_dm_op_buf_max_size, uint,
+  0644);
+MODULE_PARM_DESC(dm_op_buf_max_size,
+"Maximum size of a dm_op hypercall buffer");
+
 static int privcmd_vma_range_is_mapped(
struct vm_area_struct *vma,
unsigned long addr,
@@ -548,6 +561,128 @@ static long privcmd_ioctl_mmap_batch(void __user *udata, 
int version)
goto out;
 }
 
+static int lock_pages(
+   struct privcmd_dm_op_buf kbufs[], unsigned int num,
+   struct page *pages[], unsigned int nr_pages)
+{
+   unsigned int i;
+
+   for (i = 0; i < num; i++) {
+   unsigned int requested;
+   int pinned;
+
+   requested = DIV_ROUND_UP(
+   offset_in_page(kbufs[i].uptr) + kbufs[i].size,
+   PAGE_SIZE);
+   if (requested > nr_pages)
+   return -ENOSPC;
+
+   pinned = get_user_pages_fast(
+   (unsigned long) kbufs[i].uptr,
+   requested, FOLL_WRITE, pages);
+   if (pinned

[PATCH v3 3/3] xen/privcmd: add IOCTL_PRIVCMD_RESTRICT

2017-02-13 Thread Paul Durrant
The purpose of this ioctl is to allow a user of privcmd to restrict its
operation such that it will no longer service arbitrary hypercalls via
IOCTL_PRIVCMD_HYPERCALL, and will check for a matching domid when
servicing IOCTL_PRIVCMD_DM_OP. The aim of this is to limit the attack
surface for a compromised device model.

Signed-off-by: Paul Durrant 
---
Cc: Boris Ostrovsky 
Cc: Juergen Gross 

v3:
- Extend restriction to mapping ioctls

v2:
- Make sure that a restriction cannot be cleared
---
 drivers/xen/privcmd.c  | 88 +-
 include/uapi/xen/privcmd.h |  2 ++
 2 files changed, 81 insertions(+), 9 deletions(-)

diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c
index a33f17e..f50d984 100644
--- a/drivers/xen/privcmd.c
+++ b/drivers/xen/privcmd.c
@@ -56,16 +56,25 @@ module_param_named(dm_op_buf_max_size, 
privcmd_dm_op_buf_max_size, uint,
 MODULE_PARM_DESC(dm_op_buf_max_size,
 "Maximum size of a dm_op hypercall buffer");
 
+struct privcmd_data {
+   domid_t domid;
+};
+
 static int privcmd_vma_range_is_mapped(
struct vm_area_struct *vma,
unsigned long addr,
unsigned long nr_pages);
 
-static long privcmd_ioctl_hypercall(void __user *udata)
+static long privcmd_ioctl_hypercall(struct file *file, void __user *udata)
 {
+   struct privcmd_data *data = file->private_data;
struct privcmd_hypercall hypercall;
long ret;
 
+   /* Disallow arbitrary hypercalls if restricted */
+   if (data->domid != DOMID_INVALID)
+   return -EPERM;
+
if (copy_from_user(, udata, sizeof(hypercall)))
return -EFAULT;
 
@@ -242,8 +251,9 @@ static int mmap_gfn_range(void *data, void *state)
return 0;
 }
 
-static long privcmd_ioctl_mmap(void __user *udata)
+static long privcmd_ioctl_mmap(struct file *file, void __user *udata)
 {
+   struct privcmd_data *data = file->private_data;
struct privcmd_mmap mmapcmd;
struct mm_struct *mm = current->mm;
struct vm_area_struct *vma;
@@ -258,6 +268,10 @@ static long privcmd_ioctl_mmap(void __user *udata)
if (copy_from_user(, udata, sizeof(mmapcmd)))
return -EFAULT;
 
+   /* If restriction is in place, check the domid matches */
+   if (data->domid != DOMID_INVALID && data->domid != mmapcmd.dom)
+   return -EPERM;
+
rc = gather_array(,
  mmapcmd.num, sizeof(struct privcmd_mmap_entry),
  mmapcmd.entry);
@@ -429,8 +443,10 @@ static int alloc_empty_pages(struct vm_area_struct *vma, 
int numpgs)
 
 static const struct vm_operations_struct privcmd_vm_ops;
 
-static long privcmd_ioctl_mmap_batch(void __user *udata, int version)
+static long privcmd_ioctl_mmap_batch(
+   struct file *file, void __user *udata, int version)
 {
+   struct privcmd_data *data = file->private_data;
int ret;
struct privcmd_mmapbatch_v2 m;
struct mm_struct *mm = current->mm;
@@ -459,6 +475,10 @@ static long privcmd_ioctl_mmap_batch(void __user *udata, 
int version)
return -EINVAL;
}
 
+   /* If restriction is in place, check the domid matches */
+   if (data->domid != DOMID_INVALID && data->domid != m.dom)
+   return -EPERM;
+
nr_pages = DIV_ROUND_UP(m.num, XEN_PFN_PER_PAGE);
if ((m.num <= 0) || (nr_pages > (LONG_MAX >> PAGE_SHIFT)))
return -EINVAL;
@@ -603,8 +623,9 @@ static void unlock_pages(struct page *pages[], unsigned int 
nr_pages)
}
 }
 
-static long privcmd_ioctl_dm_op(void __user *udata)
+static long privcmd_ioctl_dm_op(struct file *file, void __user *udata)
 {
+   struct privcmd_data *data = file->private_data;
struct privcmd_dm_op kdata;
struct privcmd_dm_op_buf *kbufs;
unsigned int nr_pages = 0;
@@ -616,6 +637,10 @@ static long privcmd_ioctl_dm_op(void __user *udata)
if (copy_from_user(, udata, sizeof(kdata)))
return -EFAULT;
 
+   /* If restriction is in place, check the domid matches */
+   if (data->domid != DOMID_INVALID && data->domid != kdata.dom)
+   return -EPERM;
+
if (kdata.num == 0)
return 0;
 
@@ -683,6 +708,23 @@ static long privcmd_ioctl_dm_op(void __user *udata)
return rc;
 }
 
+static long privcmd_ioctl_restrict(struct file *file, void __user *udata)
+{
+   struct privcmd_data *data = file->private_data;
+   domid_t dom;
+
+   if (copy_from_user(, udata, sizeof(dom)))
+   return -EFAULT;
+
+   /* Set restriction to the specified domain, or check it matches */
+   if (data->domid == DOMID_INVALID)
+   data->domid = dom;
+   else if (data->domid != dom)
+   return -EINVAL;
+
+   return 0;
+}
+
 static lon

[PATCH v3 0/3] xen/privcmd: support for dm_op and restriction

2017-02-13 Thread Paul Durrant
This patch series follows on from my recent Xen series [1], to provide
support in privcmd for de-privileging of device emulators.

[1] https://lists.xen.org/archives/html/xen-devel/2017-01/msg02558.html

Paul Durrant (3):
  xen/privcmd: return -ENOTTY for unimplemented IOCTLs
  xen/privcmd: Add IOCTL_PRIVCMD_DM_OP
  xen/privcmd: add IOCTL_PRIVCMD_RESTRICT

 arch/arm/xen/enlighten.c |   1 +
 arch/arm/xen/hypercall.S |   1 +
 arch/arm64/xen/hypercall.S   |   1 +
 arch/x86/include/asm/xen/hypercall.h |   7 ++
 drivers/xen/privcmd.c| 226 +--
 include/uapi/xen/privcmd.h   |  15 +++
 include/xen/arm/hypercall.h  |   1 +
 include/xen/interface/hvm/dm_op.h|  32 +
 include/xen/interface/xen.h  |   1 +
 9 files changed, 276 insertions(+), 9 deletions(-)
 create mode 100644 include/xen/interface/hvm/dm_op.h

-- 
2.1.4



