Re: [Xen-devel] [PATCH v9 08/13] Add IOREQ_TYPE_VMWARE_PORT

2015-02-26 Thread Jan Beulich
>>> On 26.02.15 at 20:52,  wrote:
> On 02/26/15 03:07, Jan Beulich wrote:
> On 25.02.15 at 21:20,  wrote:
>>> On 02/24/15 10:34, Jan Beulich wrote:
>>> On 17.02.15 at 00:05,  wrote:
> @@ -2474,7 +2594,12 @@ struct hvm_ioreq_server *hvm_select_ioreq_server(struct domain *d,
>  BUILD_BUG_ON(IOREQ_TYPE_PIO != HVMOP_IO_RANGE_PORT);
>  BUILD_BUG_ON(IOREQ_TYPE_COPY != HVMOP_IO_RANGE_MEMORY);
>  BUILD_BUG_ON(IOREQ_TYPE_PCI_CONFIG != HVMOP_IO_RANGE_PCI);
> +BUILD_BUG_ON(IOREQ_TYPE_VMWARE_PORT != HVMOP_IO_RANGE_VMWARE_PORT);
> +BUILD_BUG_ON(IOREQ_TYPE_TIMEOFFSET != HVMOP_IO_RANGE_TIMEOFFSET);
> +BUILD_BUG_ON(IOREQ_TYPE_INVALIDATE != HVMOP_IO_RANGE_INVALIDATE);
>  r = s->range[type];
> +if ( !r )
> +continue;

 Why, all of the sudden?

>>>
>>> This is the replacement for the deleted "if" above.  Continue will lead
>>> to the same return that was removed above (it is at the end).  They are
>>> currently the same because all ioreq servers have the same set of
>>> ranges.  But if it would help, I can change "continue" into the "return
>>> default".
>> 
>> So further down you talk of the "special range 1" (see there for
>> further remarks in this regard) - how would r be NULL here in the
>> first place?
> 
> Since there is a hole in the #defines (currently 0, 1, 2, 7, 8), range[6],
> for example, is where r will be NULL.  However, no current code should be
> able to get here.  So if you want me to, I can drop the "if".

That's where ASSERT() comes in handy.
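
A minimal sketch of the ASSERT() suggestion, assuming a sparse per-type table like the one described in the quoted text (slots 0, 1, 2, 7 and 8 populated, everything else NULL). The names `range_table`, `range_for_type` and `NR_TYPES` are hypothetical, and the standard `assert()` stands in for Xen's `ASSERT()`; this is not the code from the patch.

```c
#include <assert.h>
#include <stddef.h>

#define NR_TYPES 9  /* hypothetical: one slot per type value 0..8 */

struct range { unsigned long start, end; };

/* Hypothetical backing storage for the five populated slots. */
static struct range ranges[5];

/* Sparse table: slots 3..6 are holes and therefore NULL. */
static struct range *const range_table[NR_TYPES] = {
    [0] = &ranges[0],
    [1] = &ranges[1],
    [2] = &ranges[2],
    [7] = &ranges[3],
    [8] = &ranges[4],
};

static struct range *range_for_type(unsigned int type)
{
    struct range *r;

    assert(type < NR_TYPES);
    r = range_table[type];
    /* No caller should ever ask for a hole; state that invariant
     * explicitly instead of silently skipping with "continue". */
    assert(r != NULL);
    return r;
}
```

The point of the assertion over the `if ( !r ) continue;` form is that an unexpected NULL shows up loudly in debug builds instead of being quietly tolerated.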

>> I understand all this is non-trivial and not necessarily obvious. But
>> as said before - the x86 instruction emulator should please remain
>> something acting along _only_ architectural specifications. Any
>> extensions to support things like what you want to do here should
>> be added using neutral hooks, responsible for keeping state they
>> need on their own.
>> 
> 
> 
> How does (the incorrectly formatted for a smaller diff):

Quite a bit better imo!

Jan


___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel


Re: [Xen-devel] [PATCH] correct mis-conversion set_bit() -> __cpumask_set_cpu() by 4aaca0e9cd

2015-02-26 Thread Jan Beulich
>>> On 26.02.15 at 17:53,  wrote:

> Monday, February 23, 2015, 12:06:00 PM, you wrote:
> 
>> I have no idea how I came to use __cpumask_set_cpu() there, the
>> conversion should have been set_bit() -> __set_bit(). The wrong
>> construct results in problems on systems with relatively few CPUs.
> 
>> Reported-by: Sander Eikelenboom 
>> Signed-off-by: Jan Beulich 
> 
>> --- a/xen/common/softirq.c
>> +++ b/xen/common/softirq.c
>> @@ -106,7 +106,7 @@ void cpu_raise_softirq(unsigned int cpu,
>>  if ( !per_cpu(batching, this_cpu) || in_irq() )
>>  smp_send_event_check_cpu(cpu);
>>  else
>> -__cpumask_set_cpu(nr, &per_cpu(batch_mask, this_cpu));
>> +__set_bit(nr, &per_cpu(batch_mask, this_cpu));
>>  }
>>  
>>  void cpu_raise_softirq_batch_begin(void)
> 
> Hi Jan,
> 
> Any reason this wasn't applied to staging yet ?

It didn't get ack-ed so far (and it was a little too early still to
ping it - I try to allow a week before doing so).
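
One plausible reading of "problems on systems with relatively few CPUs" is that a CPU-mask helper range-checks its index against the number of CPUs, while a plain bit operation does not. The sketch below illustrates that difference under this assumption; `plain_set_bit`, `checked_set_cpu` and the `nr_cpu_ids` value are hypothetical stand-ins, not Xen's implementation.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical small system with only two CPUs. */
static unsigned int nr_cpu_ids = 2;

/* Plain non-atomic bit set: any index that fits in the word is fine. */
static void plain_set_bit(unsigned int nr, unsigned long *word)
{
    assert(nr < sizeof(*word) * CHAR_BIT);
    *word |= 1UL << nr;
}

/* CPU-mask style helper: the index must name a valid CPU.
 * Returns false where a debug build of real code might assert. */
static bool checked_set_cpu(unsigned int cpu, unsigned long *word)
{
    if (cpu >= nr_cpu_ids)
        return false;
    *word |= 1UL << cpu;
    return true;
}
```

With an index of 5 (larger than the two-CPU limit here) the plain helper succeeds while the checked helper rejects the call, which is the kind of mismatch that only appears when the CPU count is small.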

Jan




[Xen-devel] [rumpuserxen test] 35525: regressions - FAIL

2015-02-26 Thread xen . org
flight 35525 rumpuserxen real [real]
http://www.chiark.greenend.org.uk/~xensrcts/logs/35525/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-i386-rumpuserxen    6 xen-build                 fail REGR. vs. 33866
 build-amd64-rumpuserxen   6 xen-build                 fail REGR. vs. 33866

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-rumpuserxen-amd64  1 build-check(1)   blocked  n/a
 test-amd64-i386-rumpuserxen-i386    1 build-check(1)   blocked  n/a

version targeted for testing:
 rumpuserxen  21909666eb2d85c02770d04691795abfd4417392
baseline version:
 rumpuserxen  30d72f3fc5e35cd53afd82c8179cc0e0b11146ad


People who touched revisions under test:
  Antti Kantee 
  Ian Jackson 
  Martin Lucina 


jobs:
 build-amd64                                                  pass
 build-i386                                                   pass
 build-amd64-pvops                                            pass
 build-i386-pvops                                             pass
 build-amd64-rumpuserxen                                      fail
 build-i386-rumpuserxen                                       fail
 test-amd64-amd64-rumpuserxen-amd64                           blocked
 test-amd64-i386-rumpuserxen-i386                             blocked



sg-report-flight on osstest.cam.xci-test.com
logs: /home/xc_osstest/logs
images: /home/xc_osstest/images

Logs, config files, etc. are available at
http://www.chiark.greenend.org.uk/~xensrcts/logs

Test harness code can be found at
http://xenbits.xensource.com/gitweb?p=osstest.git;a=summary


Not pushing.

(No revision log; it would be 339 lines long.)



Re: [Xen-devel] [Qemu-devel] [v2][PATCH] libxl: add one machine property to support IGD GFX passthrough

2015-02-26 Thread Chen, Tiejun

On 2015/2/27 0:17, Ian Campbell wrote:

On Thu, 2015-02-26 at 14:35 +0800, Chen, Tiejun wrote:


If we are going to do this then I think we need to arrange for the
interface to be able to express the need to force the workarounds for a
particular device. IOW a boolean will not suffice since it doesn't
indicate that IGD workarounds are needed.

Probably it would be simplest to just leave this functionality out for
the time being and revisit if/when maintaining the list becomes an
annoyance or an end user trips over it.



You mean we should maintain one list to save all targeted devices, then
tools uses ids as an index to lookup this list to pass something to qemu.


I (think I) meant a list of pci vid:did in libxl, which is matched
against the devices passed to the domain (e.g. "pci = [...]" in xl cfg),
which then enables the igd workarounds, i.e. by passing the option to


Yeah, this is exactly what I'm understanding.


qemu.


But actually one question that I have always been thinking about is: is it
really the responsibility of Xen to determine which device type should be
passed by probing that pair of vendor and device IDs? Xen is just one of
many users of qemu, so such a rare workaround option can be passed
actively by any user, instead of by Xen. Furthermore, this also stays
flexible for those cases where we want to force an override.


I'm not sure, but I think you are suggesting that qemu should autodetect
this situation, without being explicitly told "igd-passthru=on" on the
command line?

If the qemu maintainers are amenable to that, and it's not already the
case that other components (e.g. hvmloader) need to be told about these
workarounds, then I suppose that would work.


So I think qemu should mainly play this role. If qemu realizes we're
passing through an IGD or other targeted device, it should post a warning
or even an error message to indicate what the right behavior is, or what
the potential risk is by default.


Hrm, here it sounds more like you are suggesting that qemu should detect
and warn, rather than detect and do the right thing?

I'm not sure how Qemu could indicate what the right behaviour is going
to be, it'll differ for different hypervisors or even for which Xen
toolstack (xl vs libvirt etc) is in use.

Or maybe I've misunderstood?



IGD is a tricky case since Qemu has to construct an ISA bridge and a host
bridge before we pass the IGD device through. But we don't like to expose
these two bridges unconditionally, and this is also why we need this option.


Here I just mean when Qemu realizes IGD is passed through but without 
that appropriate option set, Qemu can post something to explicitly 
notify user that this option is needed in his case. But it may be a lazy 
idea.


So now I think I'd better go back to handling this on the Xen side, following
your comments. As you said, a boolean doesn't suffice to indicate that IGD
workarounds are needed. So I think we can reuse the existing bool
'gfx_passthru'.


Firstly we can redefine this as a string,

-   ("gfx_passthru", libxl_defbool),
+   ("gfx_passthru", string),

Then

+
+if (libxl__is_igd_vga_passthru(gc, guest_config) ||
+(b_info->u.hvm.gfx_passthru &&
+ strncmp(b_info->u.hvm.gfx_passthru, "igd", 3) == 0) ) {
+machinearg = GCSPRINTF("%s,igd-passthru=on", machinearg);
+}
+

Of course we would need to modify other code to align with this change.
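
The proposed condition can be sketched in isolation as follows. This is a hedged illustration only: `wants_igd_passthru` and `build_machine_arg` are hypothetical helpers, `igd_detected` stands in for the result of `libxl__is_igd_vga_passthru()`, and plain `snprintf` replaces `GCSPRINTF`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Enable the workaround when an IGD was detected, or when the user set
 * gfx_passthru to a value that starts with "igd" (a prefix match, exactly
 * as strncmp(..., 3) behaves in the proposal above). */
static bool wants_igd_passthru(bool igd_detected, const char *gfx_passthru)
{
    return igd_detected ||
           (gfx_passthru != NULL && strncmp(gfx_passthru, "igd", 3) == 0);
}

/* Write machinearg into out, appending ",igd-passthru=on" when required.
 * Returns what snprintf returns (the length that would have been written). */
static int build_machine_arg(char *out, size_t len, const char *machinearg,
                             bool igd_detected, const char *gfx_passthru)
{
    return snprintf(out, len, "%s%s", machinearg,
                    wants_igd_passthru(igd_detected, gfx_passthru)
                        ? ",igd-passthru=on" : "");
}
```

Note that a three-character `strncmp` also accepts values such as "igdx"; if only the exact string "igd" should be accepted, `strcmp` would be the stricter choice.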

Thanks
Tiejun



Re: [Xen-devel] RFC: xen config changes v4

2015-02-26 Thread Juergen Gross

On 02/26/2015 07:48 PM, Luis R. Rodriguez wrote:

On Thu, Feb 26, 2015 at 05:42:57PM +, Stefano Stabellini wrote:

On Thu, 26 Feb 2015, Luis R. Rodriguez wrote:

On Thu, Feb 26, 2015 at 11:08:20AM +, Stefano Stabellini wrote:

On Thu, 26 Feb 2015, David Vrabel wrote:

On 26/02/15 04:59, Juergen Gross wrote:


So we are again in the situation that pv-drivers always imply the pvops
kernel (PARAVIRT selected). I started the whole Kconfig rework to
eliminate this dependency.


Yes.  Can you produce a series that just addresses this one issue.

In the absence of any concrete requirement for this big Kconfig reorg I
don't think it is helpful.


I clearly missed some context as I didn't realize that this was the
intended goal. Why do we want this? Please explain as it won't come
for free.


We have a few PV interfaces for HVM guests that need PARAVIRT in Linux
in order to be used, for example pv_time_ops and HVMOP_pagetable_dying.
They are critical performance improvements and, from the interface
perspective, small enough that it doesn't make much sense to have a separate
Kconfig option for them.


In order to reach the goal above we necessarily need to introduce a
differentiation in terms of PV on HVM guests in Linux:

1) basic guests with PV network, disk, etc but no PV timers, no
HVMOP_pagetable_dying, no PV IPIs
2) full PV on HVM guests that have PV network, disk, timers,
HVMOP_pagetable_dying, PV IPIs and anything else that makes sense.

2) is much faster than 1) on Xen and 2) is only a tiny bit slower than
1) on native x86


Also don't we shove 2) down hvm guests right now? Even when everything is
built in I do not see how we opt out for HVM for 1) at run time right now.

If this is true then the question of motivation for this becomes even
stronger I think.


Yes, indeed there is no way to do 1) at the moment. And for good
reasons, see above.


OK if the goal is to be able to build front end drivers by avoiding building
PARAVIRT / PARAVIRT_CLOCK and if the gains to be able to do so (which haven't
been stated other than just the ability to do so) are small (as Stefano notes
simple hvm containers do not perform great) but requires a bit of work, I'd
rather ask -- why not address *why* we are avoiding PARAVIRT /
PARAVIRT_CLOCK and stick to the original goals behind the pvops model by
addressing what is required to be able to continue to be happy with one single
kernel. The work required to do that might be more than to just be able to
build simple Xen hvm containers without PARAVIRT / PARAVIRT_CLOCK  but I'd
think the gains would be much higher.


I absolutely agree. I think this is a long term goal we should work on.
PVH should address most of the issues, BTW.


If this resonates well then I'd like to ask: what are the current most pressing
issues with enabling PARAVIRT / PARAVIRT_CLOCK?


PARAVIRT: performance, especially memory management

PARAVIRT_CLOCK: none


Juergen




Re: [Xen-devel] RFC: xen config changes v4

2015-02-26 Thread Juergen Gross

On 02/26/2015 06:42 PM, Stefano Stabellini wrote:

On Thu, 26 Feb 2015, Luis R. Rodriguez wrote:

On Thu, Feb 26, 2015 at 11:08:20AM +, Stefano Stabellini wrote:

On Thu, 26 Feb 2015, David Vrabel wrote:

On 26/02/15 04:59, Juergen Gross wrote:


So we are again in the situation that pv-drivers always imply the pvops
kernel (PARAVIRT selected). I started the whole Kconfig rework to
eliminate this dependency.


Yes.  Can you produce a series that just addresses this one issue.

In the absence of any concrete requirement for this big Kconfig reorg I
don't think it is helpful.


I clearly missed some context as I didn't realize that this was the
intended goal. Why do we want this? Please explain as it won't come
for free.


We have a few PV interfaces for HVM guests that need PARAVIRT in Linux
in order to be used, for example pv_time_ops and HVMOP_pagetable_dying.
They are critical performance improvements and, from the interface
perspective, small enough that it doesn't make much sense to have a separate
Kconfig option for them.


In order to reach the goal above we necessarily need to introduce a
differentiation in terms of PV on HVM guests in Linux:

1) basic guests with PV network, disk, etc but no PV timers, no
HVMOP_pagetable_dying, no PV IPIs
2) full PV on HVM guests that have PV network, disk, timers,
HVMOP_pagetable_dying, PV IPIs and anything else that makes sense.

2) is much faster than 1) on Xen and 2) is only a tiny bit slower than
1) on native x86


Also don't we shove 2) down hvm guests right now? Even when everything is
built in I do not see how we opt out for HVM for 1) at run time right now.

If this is true then the question of motivation for this becomes even
stronger I think.


Yes, indeed there is no way to do 1) at the moment. And for good
reasons, see above.


Hmm, after checking the code I'm not convinced:

- HVMOP_pagetable_dying is obsolete on modern hardware supporting
  EPT/HAP

- PV IPIs are not needed on single-vcpu guests

- PARAVIRT_CLOCK doesn't need PARAVIRT (in fact the SUSE kernel configs
  for all x86_64 kernels have CONFIG_PARAVIRT_CLOCK=y)

So I think we really should enable building Xen frontends without
PARAVIRT, implying at least no XEN_PV and no XEN_PVH.

I'll have a try setting up patches.


Juergen



Re: [Xen-devel] [PATCH 3/4] xen: sched: make counters for vCPU tickling generic

2015-02-26 Thread Meng Xu
2015-02-26 8:37 GMT-05:00 Dario Faggioli :

> and update them from Credit2 and RTDS schedulers.
>
> Signed-off-by: Dario Faggioli 
> Cc: Meng Xu 
> Cc: George Dunlap 
> Cc: Jan Beulich 
> Cc: Keir Fraser 
> ---
>  xen/common/sched_credit2.c   |2 ++
>  xen/common/sched_rt.c|2 ++
>  xen/include/xen/perfc_defn.h |4 ++--
>  3 files changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
> index 2b852cc..bf13a84 100644
> --- a/xen/common/sched_credit2.c
> +++ b/xen/common/sched_credit2.c
> @@ -571,9 +571,11 @@ tickle:
>(unsigned char *)&d);
>  }
>  cpumask_set_cpu(ipid, &rqd->tickled);
> +SCHED_STAT_CRANK(tickle_idlers_some);
>  cpu_raise_softirq(ipid, SCHEDULE_SOFTIRQ);
>
>  no_tickle:
> +SCHED_STAT_CRANK(tickle_idlers_none);
>  return;
>  }
>
> diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
> index 49d1b83..2ad0c68 100644
> --- a/xen/common/sched_rt.c
> +++ b/xen/common/sched_rt.c
> @@ -929,6 +929,7 @@ runq_tickle(const struct scheduler *ops, struct rt_vcpu *new)
>  }
>
>  /* didn't tickle any cpu */
> +SCHED_STAT_CRANK(tickle_idlers_none);
>  return;
>  out:
>  /* TRACE */
> @@ -944,6 +945,7 @@ out:
>  }
>
>  cpumask_set_cpu(cpu_to_tickle, &prv->tickled);
> +SCHED_STAT_CRANK(tickle_idlers_some);
>  cpu_raise_softirq(cpu_to_tickle, SCHEDULE_SOFTIRQ);
>  return;
>  }
>


The change for RTDS scheduler looks good to me.

Thanks,

Meng


-- 


---
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania


Re: [Xen-devel] [PATCH 1/4] xen: sched: honour generic perf conuters in the RTDS scheduler

2015-02-26 Thread Meng Xu
Not sure if I should comment with Reviewed-by, I will just do it. Please
just ignore if I should not add Reviewed-by.

2015-02-26 8:36 GMT-05:00 Dario Faggioli :

> more specifically, about vCPU initialization and destruction events,
> in line with adb26c09f26e ("xen: sched: introduce a couple of counters
> in credit2 and SEDF").
>
> Signed-off-by: Dario Faggioli 
> Cc: Meng Xu 
> Cc: George Dunlap 
> Cc: Jan Beulich 
> Cc: Keir Fraser 
> ---
>  xen/common/sched_rt.c |4 
>  1 file changed, 4 insertions(+)
>
> diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
> index df4adac..58dd646 100644
> --- a/xen/common/sched_rt.c
> +++ b/xen/common/sched_rt.c
> @@ -525,6 +525,8 @@ rt_alloc_vdata(const struct scheduler *ops, struct vcpu *vc, void *dd)
>  if ( !is_idle_vcpu(vc) )
>  svc->budget = RTDS_DEFAULT_BUDGET;
>
> +SCHED_STAT_CRANK(vcpu_init);
> +
>  return svc;
>  }
>
> @@ -574,6 +576,8 @@ rt_vcpu_remove(const struct scheduler *ops, struct vcpu *vc)
>  struct rt_dom * const sdom = svc->sdom;
>  spinlock_t *lock;
>
> +SCHED_STAT_CRANK(vcpu_destroy);
> +
>  BUG_ON( sdom == NULL );
>
>  lock = vcpu_schedule_lock_irq(vc);
>
>
Reviewed-by: Meng Xu

Thanks,

Meng



---
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania


[Xen-devel] [linux-3.16 test] 35326: regressions - FAIL

2015-02-26 Thread xen . org
flight 35326 linux-3.16 real [real]
http://www.chiark.greenend.org.uk/~xensrcts/logs/35326/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-xl-credit2  15 guest-localmigrate/x10    fail REGR. vs. 34167
 test-amd64-i386-pair         17 guest-migrate/src_host/dst_host fail REGR. vs. 34167

Tests which are failing intermittently (not blocking):
 test-amd64-i386-rumpuserxen-i386  8 guest-start fail pass in 34793
 test-amd64-amd64-libvirt  9 guest-start fail pass in 34793
 test-amd64-amd64-xl-pvh-intel  5 xen-boot  fail in 34793 pass in 35326
 test-amd64-i386-rhel6hvm-intel  5 xen-boot fail in 34793 pass in 35326
 test-amd64-amd64-xl-sedf  3 host-install(3)  broken in 34793 pass in 35326
 test-amd64-i386-freebsd10-amd64  5 xen-boot    fail in 34793 pass in 35326
 test-amd64-i386-xl-qemut-win7-amd64  5 xen-boot    fail in 34793 pass in 35326
 test-amd64-i386-xl-winxpsp3   5 xen-boot   fail in 34793 pass in 35326
 test-amd64-i386-pair  8 xen-boot/dst_host  fail in 34793 pass in 35326
 test-amd64-i386-pair  7 xen-boot/src_host  fail in 34793 pass in 35326

Regressions which are regarded as allowable (not blocking):
 test-amd64-amd64-rumpuserxen-amd64 13 rumpuserxen-demo-xenstorels/xenstorels/;.repeat fail in 34793 blocked in 34167

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-pvh-intel  9 guest-start  fail  never pass
 test-armhf-armhf-xl-multivcpu 10 migrate-support-check fail  never pass
 test-armhf-armhf-xl-sedf 10 migrate-support-check fail   never pass
 test-armhf-armhf-libvirt 10 migrate-support-check fail   never pass
 test-armhf-armhf-xl-sedf-pin 10 migrate-support-check fail   never pass
 test-armhf-armhf-xl-midway   10 migrate-support-check fail   never pass
 test-amd64-i386-libvirt  10 migrate-support-check fail   never pass
 test-amd64-amd64-xl-pvh-amd   9 guest-start  fail   never pass
 test-armhf-armhf-xl  10 migrate-support-check fail   never pass
 test-amd64-amd64-xl-pcipt-intel  9 guest-start fail never pass
 test-armhf-armhf-xl-credit2  10 migrate-support-check fail   never pass
 test-amd64-i386-freebsd10-i386  7 freebsd-install  fail never pass
 test-amd64-amd64-xl-multivcpu 15 guest-localmigrate/x10   fail  never pass
 test-amd64-amd64-xl-sedf 15 guest-localmigrate/x10   fail   never pass
 test-amd64-i386-freebsd10-amd64  7 freebsd-install fail never pass
 test-amd64-amd64-xl-sedf-pin 15 guest-localmigrate/x10   fail   never pass
 test-amd64-amd64-xl-qemuu-win7-amd64 14 guest-stop fail never pass
 test-amd64-amd64-rumpuserxen-amd64 13 rumpuserxen-demo-xenstorels/xenstorels.repeat fail never pass
 test-amd64-i386-xl-qemut-winxpsp3 14 guest-stop fail never pass
 test-amd64-i386-xl-qemut-win7-amd64 14 guest-stop  fail never pass
 test-amd64-i386-xl-qemuu-winxpsp3-vcpus1 14 guest-stop fail never pass
 test-amd64-amd64-xl-win7-amd64 14 guest-stop   fail never pass
 test-amd64-amd64-xl-qemut-winxpsp3 14 guest-stop   fail never pass
 test-amd64-i386-xl-win7-amd64 14 guest-stop   fail  never pass
 test-amd64-amd64-xl-winxpsp3 14 guest-stop   fail   never pass
 test-amd64-i386-xl-qemuu-win7-amd64 14 guest-stop  fail never pass
 test-amd64-i386-xl-qemuu-winxpsp3 14 guest-stop fail never pass
 test-amd64-amd64-xl-qemut-win7-amd64 14 guest-stop fail never pass
 test-amd64-i386-xl-winxpsp3-vcpus1 14 guest-stop   fail never pass
 test-amd64-i386-xl-qemut-winxpsp3-vcpus1 14 guest-stop fail never pass
 test-amd64-i386-xl-winxpsp3  14 guest-stop   fail   never pass
 test-amd64-amd64-xl-qemuu-winxpsp3  7 windows-install  fail never pass
 test-amd64-amd64-libvirt 10 migrate-support-check fail in 34793 never pass

version targeted for testing:
 linux                4ba6745b95608891fdec154f6e75479e15a8a24e
baseline version:
 linux                19583ca584d6f574384e17fe7613dfaeadcdc4a6


1040 people touched revisions under test,
not listing them all


jobs:
 build-amd64                                                  pass
 build-armhf                                                  pass
 build-i386                                                   pass
 build-amd64-libvirt                                          pass
 build-armhf-libvirt                                          pass
 build-i386-libvirt                                           pass
 build-amd64-pvops                                            pass
 build-armhf-pvops  

Re: [Xen-devel] how to assign resources exclusive to a single domU

2015-02-26 Thread Jürgen Groß

On 02/26/2015 09:57 AM, Olaf Hering wrote:

While working on pvscsi support for libxl I noticed that assigning a
resource exclusively to just a single domU via libxl will be a major
effort. Up to now libxl could rely on the fact that a resource can be
either shared or the backend deals with the attempt to share.

There are two cases in pvscsi:

  1) a single physical HST:CHN:TGT:LUN device must be assigned to just a
 single domU. While the (xenlinux) backend driver allows to assign
 the device to more than one domU the sharing can not work in
 practice.


You should keep in mind that *some* cases might be absolutely okay.
Please don't assume all sharing configurations are nonsense!


  2) the xenlinux backend driver has two modes: emulation and raw. With
 raw mode the SCSI commands coming from domU will be passed directly
     to the physical device. I think it's required to make sure that all
     devices connected to a physical scsi host operate either
     entirely in raw mode or entirely in emulation mode.


This can be mapped to case #1: the raw mode is selected by assigning
all LUNs of a target to a guest via "feature-host". If case #1 is
verified it wouldn't be possible to assign some LUNs multiple times
which would be required to have a mixture of raw and emulation for
a target.

I wouldn't do more than xend in this case. The pvops upstream pvscsi
backend doesn't need the emulation mode any more, this is handled by
the generic target infrastructure.


To handle both cases libxl could either assume that the admin is
responsible for proper configuration:
  - just one domU per physical device
  - if raw mode is enabled all devices on the physical scsi host will be
assigned to just one domU


Like in the non-virtualized world: the admin has to ensure that devices
in the SAN are either used by only one system, or that the systems
using it coordinate the shared usage.


Or libxl gets functionality to verify that two cases above are really
enforced. Doing that means that there has to be some global lock under
which the system state in xenstore is parsed and the to be assigned domU
configuration is compared:
  - are the physical devices already assigned
  - is the raw mode properly configured

In xend the case #1 was not handled. There is some code for case #2, I
have to check how complete the enforcement in xend was.

I wonder what should be done in my changes for libxl.


If you are doing something, please add a flag to be able to disable
the additional security checks regarding multiple assignment.
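
The exclusive-assignment check with an opt-out flag could look roughly like the sketch below. Everything here is hypothetical (`struct pdev_assignment`, `may_assign`, the `allow_sharing` flag); a real implementation in libxl would additionally have to read the existing assignments from xenstore under a global lock, as described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record of one physical HST:CHN:TGT:LUN device and the
 * domain it is currently assigned to. */
struct pdev_assignment {
    unsigned int host, channel, target, lun;
    unsigned int domid;
};

static bool same_pdev(const struct pdev_assignment *a,
                      const struct pdev_assignment *b)
{
    return a->host == b->host && a->channel == b->channel &&
           a->target == b->target && a->lun == b->lun;
}

/* Return true if req may be assigned given the existing assignments.
 * With allow_sharing set, the exclusivity check is skipped entirely,
 * leaving coordination to the admin. */
static bool may_assign(const struct pdev_assignment *existing, size_t n,
                       const struct pdev_assignment *req, bool allow_sharing)
{
    size_t i;

    if (allow_sharing)
        return true;
    for (i = 0; i < n; i++)
        if (same_pdev(&existing[i], req) && existing[i].domid != req->domid)
            return false;
    return true;
}
```

Keeping the override as an explicit parameter means the default stays strict while configurations in which sharing is legitimate remain possible.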


Juergen



[Xen-devel] [qemu-mainline test] 35298: regressions - FAIL

2015-02-26 Thread xen . org
flight 35298 qemu-mainline real [real]
http://www.chiark.greenend.org.uk/~xensrcts/logs/35298/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-i386-xl-win7-amd64  7 windows-install  fail REGR. vs. 33480
 test-amd64-amd64-xl-winxpsp3  7 windows-install   fail REGR. vs. 33480
 test-amd64-i386-xl-winxpsp3   7 windows-install   fail REGR. vs. 33480
 test-amd64-i386-xl-qemuu-winxpsp3  7 windows-install  fail REGR. vs. 33480
 test-amd64-i386-xl-qemuu-winxpsp3-vcpus1 7 windows-install fail REGR. vs. 33480
 test-amd64-amd64-xl-win7-amd64  7 windows-install fail REGR. vs. 33480
 test-amd64-i386-xl-qemuu-ovmf-amd64  7 debian-hvm-install fail REGR. vs. 33480
 test-amd64-i386-qemuu-rhel6hvm-intel  7 redhat-install fail REGR. vs. 33480
 test-amd64-i386-xl-qemuu-debianhvm-amd64 7 debian-hvm-install fail REGR. vs. 33480
 test-amd64-i386-xl-winxpsp3-vcpus1  7 windows-install fail REGR. vs. 33480
 test-amd64-amd64-xl-qemuu-ovmf-amd64 7 debian-hvm-install fail REGR. vs. 33480
 test-amd64-amd64-xl-qemuu-debianhvm-amd64 7 debian-hvm-install fail REGR. vs. 33480
 test-amd64-i386-freebsd10-i386  8 guest-start fail REGR. vs. 33480
 test-amd64-i386-freebsd10-amd64  8 guest-start fail REGR. vs. 33480
 test-amd64-amd64-xl-qemuu-winxpsp3  7 windows-install fail REGR. vs. 33480
 test-amd64-i386-xl-qemuu-win7-amd64  7 windows-install fail REGR. vs. 33480
 test-amd64-i386-qemuu-rhel6hvm-amd  7 redhat-install  fail REGR. vs. 33480
 test-amd64-i386-rhel6hvm-amd  7 redhat-install fail REGR. vs. 33480
 test-amd64-amd64-xl-qemuu-win7-amd64  7 windows-install   fail REGR. vs. 33480
 test-amd64-i386-rhel6hvm-intel  7 redhat-install  fail REGR. vs. 33480

Regressions which are regarded as allowable (not blocking):
 test-amd64-i386-pair 17 guest-migrate/src_host/dst_host fail like 33480

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-xl  10 migrate-support-check fail   never pass
 test-armhf-armhf-xl-sedf 10 migrate-support-check fail   never pass
 test-armhf-armhf-xl-sedf-pin 10 migrate-support-check fail   never pass
 test-armhf-armhf-xl-midway   10 migrate-support-check fail   never pass
 test-amd64-amd64-xl-pvh-intel  9 guest-start  fail  never pass
 test-armhf-armhf-xl-credit2  10 migrate-support-check fail   never pass
 test-armhf-armhf-xl-multivcpu 10 migrate-support-check fail  never pass
 test-armhf-armhf-libvirt 10 migrate-support-check fail   never pass
 test-amd64-i386-libvirt  10 migrate-support-check fail   never pass
 test-amd64-amd64-xl-pvh-amd   9 guest-start  fail   never pass
 test-amd64-amd64-libvirt 10 migrate-support-check fail   never pass
 test-amd64-amd64-xl-pcipt-intel  9 guest-start fail never pass
 test-amd64-i386-xl-qemut-win7-amd64 14 guest-stop  fail never pass
 test-amd64-i386-xl-qemut-winxpsp3-vcpus1 14 guest-stop fail never pass
 test-amd64-amd64-xl-qemut-winxpsp3 14 guest-stop   fail never pass
 test-amd64-amd64-xl-qemut-win7-amd64 14 guest-stop fail never pass
 test-amd64-i386-xl-qemut-winxpsp3 14 guest-stop fail never pass

version targeted for testing:
 qemuu                cd2d5541271f1934345d8ca42f5fafff1744eee7
baseline version:
 qemuu                1e42c353469cb58ca4f3b450eea4211af7d0b147


People who touched revisions under test:
  Alberto Garcia 
  Alex Suykov 
  Alex Williamson 
  Alexander Graf 
  Alexey Kardashevskiy 
  Alistair Francis 
  Amit Shah 
  Andreas Färber 
  Aurelien Jarno 
  Avi Kivity 
  Bastian Koppelmann 
  Ben Taylor 
  Benjamin Herrenschmidt 
  Bharata B Rao 
  Blue Swirl 
  Chen Fan 
  Chen Gang 
  Chen Gang S 
  Christian Borntraeger 
  Christophe Lyon 
  Claudio Fontana 
  Cornelia Huck 
  Daniel P. Berrange 
  Denis V. Lunev 
  Dinar Valeev 
  Don Koch 
  Don Slutz 
  Dr. David Alan Gilbert 
  Ed Swierk 
  Eduardo Habkost 
  Eduardo Otubo 
  Fabrice Bellard 
  Fam Zheng 
  Felix Janda 
  Francesco Romani 
  Frank Blaschka 
  Gerd Hoffmann 
  Gonglei 
  Greg Bellows 
  Greg Kurz 
  Guan Xuetao 
  Igor Mammedov 
  Ildar Isaev 
  Jan Kiszka 
  Jason Wang 
  Jeff Cody 
  Jiri Slaby 
  John Arbuckle 
  Juan Quintela 
  Kevin Wolf 
  Kirill Batuzov 
  Laszlo Ersek 
  Laurent Desnogues 
  Leon Yu 
  Marc-André Lureau 
  Mark Cave-Ayland 
  Markus Armbruster 
  Markus Armbruster 
  Max Filippov 
  Max Reitz 
  Maxim Ostapenko 
  Michael S. Tsirkin 
  Michael Tokarev 
  Paolo Bonzini 
  Paul Brook 
  Paul Durrant 
  Peter Lieven 
  Peter Maydell 
  Peter Wu 
  Pranavkumar Sawargaonkar 
  Programmingkid 
  Richard Henderson 
  Richard Sandiford 
  Riku Voipio 
  Stefan Hajnoczi 
  Stefan Weil 
  Stefano Stabellini 
  Thomas Huth 
  Torbjorn Gr

Re: [Xen-devel] [PATCH 3/3] libxl: libxl__device_from_disk should retrieve backend from xenstore

2015-02-26 Thread Jim Fehlig
Wei Liu wrote:
> On Wed, Feb 11, 2015 at 10:18:18AM -0700, Jim Fehlig wrote:
>   
>> At minimum, libvirt will populate the pdev_path, vdev, backend, and
>> format fields. If backend and format (which, in libvirt-speak,
>> correspond to the 'name' and 'type' attributes on the optional <driver>
>> element) are not specified, they are set to LIBXL_DISK_BACKEND_UNKNOWN
>> and LIBXL_DISK_FORMAT_RAW respectively.
>>
>> 
>
> Since libvirt has a tendency of specifying everything, how come there is
> no "name" and "type" in <driver>?

The <driver> element is optional. From
http://libvirt.org/formatdomain.html#elementsDisks

"driver: The optional driver element allows specifying further details
related to the hypervisor driver used to provide the disk"

And when not specified, Ian C. recommended allowing libxl to pick
suitable defaults

https://www.redhat.com/archives/libvir-list/2013-February/msg01126.html

> Can we actually generate all the fields needed when attaching a disk
> and store in libvirt's diskspec?

Yes, it was this way before the suggested change.

Regards,
Jim




Re: [Xen-devel] freemem-slack and large memory environments

2015-02-26 Thread Mike Latimer
On Thursday, February 26, 2015 01:45:16 PM Mike Latimer wrote:
> On Thursday, February 26, 2015 05:53:06 PM Stefano Stabellini wrote:
> > What is the return value of libxl_set_memory_target and
> > libxl_wait_for_free_memory in that case? Isn't it just a matter of
> > properly handling the return values?
> 
> The return from libxl_set_memory_target is 0, as the assignment works just
> fine. I don't have the return from libxl_wait_for_free_memory in my notes,
> so I'll spin up another test and track that down.

I slightly misspoke here... In my testing, the returns are actually:

   libxl_set_memory_target = 1
   libxl_wait_for_free_memory = -5
   libxl_wait_for_memory_target = 0
  Note - libxl_wait_for_memory_target is confusing, as rc can be set
  to ERROR_FAIL, but the function returns 0 anyway (unless an error
  is encountered earlier.) I guess this just means we need to continue
  to wait...

I was testing spinning up a 64GB guest on a 2TB host. After the ballooning had 
completed, dom0 had ballooned down an extra ~320GB. On this particular 
machine, each iteration of the loop was showing only 5-7GB of memory being 
freed at a time. (The loop took 12 iterations.)

-Mike



Re: [Xen-devel] [RFC 2/2] x86, vdso, pvclock: Simplify and speed up the vdso pvclock reader

2015-02-26 Thread Andy Lutomirski
On Thu, Jan 8, 2015 at 2:43 PM, Andy Lutomirski  wrote:
> On Thu, Jan 8, 2015 at 2:31 PM, Marcelo Tosatti  wrote:
>> On Tue, Jan 06, 2015 at 11:49:09AM -0800, Andy Lutomirski wrote:
>>> On Tue, Jan 6, 2015 at 10:45 AM, Marcelo Tosatti  wrote:
>>> > On Tue, Jan 06, 2015 at 10:26:22AM -0800, Andy Lutomirski wrote:
>>> >> On Tue, Jan 6, 2015 at 10:13 AM, Marcelo Tosatti  wrote:
>>> >> > On Tue, Jan 06, 2015 at 08:56:40AM -0800, Andy Lutomirski wrote:
>>> >> >> On Jan 6, 2015 4:01 AM, "Paolo Bonzini"  wrote:
>>> >> >> >
>>> >> >> >
>>> >> >> >
>>> >> >> > On 06/01/2015 09:42, Paolo Bonzini wrote:
>>> >> >> > > > > Still confused.  So we can freeze all vCPUs in the host, then 
>>> >> >> > > > > update
>>> >> >> > > > > pvti 1, then resume vCPU 1, then update pvti 0?  In that 
>>> >> >> > > > > case, we have
>>> >> >> > > > > a problem, because vCPU 1 can observe pvti 0 mid-update, and 
>>> >> >> > > > > KVM
>>> >> >> > > > > doesn't increment the version pre-update, and we can return 
>>> >> >> > > > > completely
>>> >> >> > > > > bogus results.
>>> >> >> > > > Yes.
>>> >> >> > > But then the getcpu test would fail (1->0).  Even if you have an 
>>> >> >> > > ABA
>>> >> >> > > situation (1->0->1), it's okay because the pvti that is fetched 
>>> >> >> > > is the
>>> >> >> > > one returned by the first getcpu.
>>> >> >> >
>>> >> >> > ... this case of partial update of pvti, which is caught by the 
>>> >> >> > version
>>> >> >> > field, if of course different from the other (extremely unlikely) 
>>> >> >> > that
>>> >> >> > Andy pointed out.  That is when the getcpus are done on the same 
>>> >> >> > vCPU,
>>> >> >> > but the rdtsc is another.
>>> >> >> >
>>> >> >> > That one can be fixed by rdtscp, like
>>> >> >> >
>>> >> >> > do {
>>> >> >> > // get a consistent (pvti, v, tsc) tuple
>>> >> >> > do {
>>> >> >> > cpu = get_cpu();
>>> >> >> > pvti = get_pvti(cpu);
>>> >> >> > v = pvti->version & ~1;
>>> >> >> > // also acts as rmb();
>>> >> >> > rdtsc_barrier();
>>> >> >> > tsc = rdtscp(&cpu1);
>>> >> >>
>>> >> >> Off-topic note: rdtscp doesn't need a barrier at all.  AIUI AMD
>>> >> >> specified it that way and both AMD and Intel implement it correctly.
>>> >> >> (rdtsc, on the other hand, definitely needs the barrier beforehand.)
>>> >> >>
>>> >> >> > // control dependency, no need for rdtsc_barrier?
>>> >> >> > } while(cpu != cpu1);
>>> >> >> >
>>> >> >> > // ... compute nanoseconds from pvti and tsc ...
>>> >> >> > rmb();
>>> >> >> > }   while(v != pvti->version);
>>> >> >>
>>> >> >> Still no good.  We can migrate a bunch of times so we see the same CPU
>>> >> >> all three times and *still* don't get a consistent read, unless we
>>> >> >> play nasty games with lots of version checks (I have a patch for that,
>>> >> >> but I don't like it very much).  The patch is here:
>>> >> >>
>>> >> >> https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/commit/?h=x86/vdso_paranoia&id=a69754dc5ff33f5187162b5338854ad23dd7be8d
>>> >> >>
>>> >> >> but I don't like it.
>>> >> >>
>>> >> >> Thus far, I've been told unambiguously that a guest can't observe pvti
>>> >> >> while it's being written, and I think you're now telling me that this
>>> >> >> isn't true and that a guest *can* observe pvti while it's being
>>> >> >> written while the low bit of the version field is not set.  If so,
>>> >> >> this is rather strongly incompatible with the spec in the KVM docs.
>>> >> >>
>>> >> >> I don't suppose that you and Marcelo could agree on what the actual
>>> >> >> semantics that KVM provides are and could write it down in a way that
>>> >> >> people who haven't spent a long time staring at the request code
>>> >> >> understand?  And maybe you could even fix the implementation while
>>> >> >> you're at it if the implementation is, indeed, broken.  I have ugly
>>> >> >> patches to fix it here:
>>> >> >>
>>> >> >> https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/commit/?h=x86/vdso_paranoia&id=3b718a050cba52563d831febc2e1ca184c02bac0
>>> >> >>
>>> >> >> but I'm not thrilled with them.
>>> >> >>
>>> >> >> --Andy
>>> >> >
>>> >> > I suppose that separating the version write from the rest of the 
>>> >> > pvclock
>>> >> > structure is sufficient, as that would guarantee the writes are not
>>> >> > reordered even with fast string REP MOVS.
>>> >> >
>>> >> > Thanks for catching this Andy!
>>> >> >
>>> >>
>>> >> Don't you stil need:
>>> >>
>>> >> version++;
>>> >> write the rest;
>>> >> version++;
>>> >>
>>> >> with possible smp_wmb() in there to keep the compiler from messing 
>>> >> around?
>>> >
>>> > Correct. Could just as well follow the protocol and use odd/even, which
>>> > is what your patch does.
>>> >
>>> > What is the point with the new flags bit though?
>>>
>>> To try to work around the problem on old hosts.  I'm not at all
>>> convinced that this is worthwhile or that it helps, though.
>>
>> Andy,
>>
>> Are you going to submit t

Re: [Xen-devel] Shared page tables between ETP and IOMMU issue

2015-02-26 Thread Elena Ufimtseva
On Thu, Feb 26, 2015 at 2:31 PM, Roger Pau Monné  wrote:
> El 26/02/15 a les 19.02, Roger Pau Monné ha escrit:
>> El 26/02/15 a les 17.43, Jan Beulich ha escrit:
>> On 26.02.15 at 17:29,  wrote:
>>>> OK, I will try to take a look. All those faults come from physical
>>>> memory ranges that are supposed to be usable, and in fact the CPU seems
>>>> to be able to read/write from them without problems, or else the guest
>>>> would have crashed much more early. Regarding sharing the page tables
>>>> between EPT and the IOMMU, is there some bit that needs to be set in the
>>>> ept entry in order to mark a page as available by the IOMMU?
>>>
>>> Bits 0 and 1 (read and write) are shared between VT-d and EPT
>>> (as is bit 7 - see struct dma_pte and ept_entry_t).
>>
>> I've added some debug prints at the end of construct_dom0 to print the
>> MFN of a RAM page (using get_gfn_query_unlocked) and the VTd entry
>> (using print_vtd_entries):
>>
>> (XEN) print_vtd_entries: iommu 8302197c3a40 dev :00:1f.2 gmfn 43e0
>> (XEN) root_entry = 8302197c
>> (XEN) root_entry[0] = 140144001
>> (XEN) context = 830140144000
>> (XEN) context[fa] = 2_140148001
>> (XEN) l4 = 830140148000
>> (XEN) l4_index = 0
>> (XEN) l4[0] = 140147003
>> (XEN) l3 = 830140147000
>> (XEN) l3_index = 0
>> (XEN) l3[0] = 140146003
>> (XEN) l2 = 830140146000
>> (XEN) l2_index = 21
>> (XEN) l2[21] = 0
>> (XEN) l2[21] not present
>> (XEN) GFN: 0x43e0 MFN: 0x1401e3 type: 0
>>
>> This is before Dom0 has been started, so I think there's something
>> wrong in the way we build the page tables, because AFAICT the VTd
>> code is not able to resolve the GFN, but the EPT code is.
>
> BTW, if I set no-sharept the output is as expected:
>
> (XEN) print_vtd_entries: iommu 8302197c3a40 dev :00:1f.2 gmfn 43e0
> (XEN) root_entry = 8302197c
> (XEN) root_entry[0] = 19793f001
> (XEN) context = 83019793f000
> (XEN) context[fa] = 2_140149001
> (XEN) l4 = 830140149000
> (XEN) l4_index = 0
> (XEN) l4[0] = 140148003
> (XEN) l3 = 830140148000
> (XEN) l3_index = 0
> (XEN) l3[0] = 140147003
> (XEN) l2 = 830140147000
> (XEN) l2_index = 21
> (XEN) l2[21] = 14012c003
> (XEN) l1 = 83014012c000
> (XEN) l1_index = 1e0
> (XEN) l1[1e0] = 1401e3003
> (XEN) GFN: 0x43e0 MFN: 0x1401e3 type: 0
>
>


Hi Roger

Can you please print the same debug output for address 7cb92 (where the
L3 page table is missing), both with and without shared EPT?

Thank you!

-- 
Elena



Re: [Xen-devel] freemem-slack and large memory environments

2015-02-26 Thread Mike Latimer
On Thursday, February 26, 2015 05:53:06 PM Stefano Stabellini wrote:
> What is the return value of libxl_set_memory_target and
> libxl_wait_for_free_memory in that case? Isn't it just a matter of
> properly handle the return values?

The return from libxl_set_memory_target is 0, as the assignment works just 
fine. I don't have the return from libxl_wait_for_free_memory in my notes, so 
I'll spin up another test and track that down.

> Or maybe we just need to change the libxl_set_memory_target call to use
> an absolute memory target to avoid restricting dom0 memory more than
> necessary at each iteration. Also increasing the timeout argument passed
> to the libxl_wait_for_free_memory call could help.

Using an absolute target would help, and would obviously only have to be set 
once - which is similar to what my patch did.

Increasing the timeout would help, but if the timeout were insufficient (say 
when dealing with very large guests), it wouldn't solve the problem.

-Mike



Re: [Xen-devel] freemem-slack and large memory environments

2015-02-26 Thread Mike Latimer
(Sorry for the delayed response, dealing with ENOTIME.)

On Thursday, February 26, 2015 05:47:21 PM Ian Campbell wrote:
> On Thu, 2015-02-26 at 10:38 -0700, Mike Latimer wrote:
>
> >rc = libxl_set_memory_target(ctx, 0, free_memkb - need_memkb, 1, 0);
>
> I think so. In essence we just need to update need_memkb on each
> iteration, right?

Not quite...  need_memkb is used in the loop to determine if we have enough 
free memory for the new domain. So, need_memkb should always remain set to the 
total amount of memory requested - not just the amount of change still 
required.

The easiest thing to do is set the dom0's memory target before the loop, which 
is what my original patch did.

Another approach would be something like this:

uint32_t dom0_memkb, dom0_targetkb, pending_memkb;

libxl_get_memory(ctx, 0, &dom0_memkb);   <--doesn't actually exist

libxl_get_memory_target(ctx, 0, &dom0_targetkb);

pending_memkb = (free_memkb + (dom0_memkb - dom0_targetkb));

if (pending_memkb < need_memkb) {
libxl_set_memory_target(ctx, 0, pending_memkb - need_memkb, 1, 0);
}

which essentially sets pending_memkb to the amount of free memory plus the 
amount of memory which will be freed once dom0 hits its target.
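
As a rough, self-contained sketch of that arithmetic (the helper name
extra_balloon_kb is made up for illustration and is not a libxl function; all
quantities are assumed to be in KiB):

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of libxl.
 *
 * free_kb:        memory currently free on the host
 * dom0_kb:        memory dom0 currently holds
 * dom0_target_kb: dom0's current balloon target
 * need_kb:        total memory the new domain needs
 *
 * Returns how many more KiB dom0's target still has to be lowered by,
 * or 0 if free memory plus the ballooning already in flight is enough.
 */
uint64_t extra_balloon_kb(uint64_t free_kb, uint64_t dom0_kb,
                          uint64_t dom0_target_kb, uint64_t need_kb)
{
    /* Memory that will be freed once dom0 reaches its current target. */
    uint64_t in_flight_kb =
        dom0_kb > dom0_target_kb ? dom0_kb - dom0_target_kb : 0;

    /* Guard against wrap-around before adding the two amounts. */
    assert(free_kb <= UINT64_MAX - in_flight_kb);
    uint64_t pending_kb = free_kb + in_flight_kb;

    if (pending_kb >= need_kb)
        return 0;

    return need_kb - pending_kb;
}
```

With free_kb = 10, dom0_kb = 100, dom0_target_kb = 80 and need_kb = 50, the
pending amount is 30, so the helper reports that 20 more are needed. In the
snippet above, the relative adjustment handed to libxl_set_memory_target would
be the negative of this value.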

The final possibility I can think of is to ensure libxl_wait_for_memory_target 
does not return until the memory target has been reached. That raises some 
concern about what happens if the target cannot be reached though...

-Mike





Re: [Xen-devel] [PATCH v2] RFC: Automatically check xen's public headers for C++ pitfalls.

2015-02-26 Thread Don Slutz
On 02/26/15 14:22, Tim Deegan wrote:
> At 19:49 +0200 on 26 Feb (1424976562), Razvan Cojocaru wrote:
>> On 02/26/2015 07:01 PM, Tim Deegan wrote:
>>> +#ifdef __cplusplus
>>> +/* 'private' is a keyword in C++, so we have to use a different name for
>>> + * private state there.  Leaving the C name alone to avoid unnecessary
>>> + * pain for the existing users. */
>>> +#define XEN_RING_PRIVATE pvt
>>> +#else
>>> +#define XEN_RING_PRIVATE private
>>> +#endif
>>
>> Are there likely to be many users outside of the ones using that code
>> with mem_event?
> 
> Yes, lots.  It's used to implement split drivers for net, block, etc.
> Most users will have taken copies of this header into their own trees,
> though, and so won't face build breakage, and this isn't an ABI change.
> 
> So far, I've seen David and Andrew in favour of just changing the
> field's name and letting out-of-tree users update their copies when/if
> they want to.  Jan would prefer to avoid changing the field name for C
> users.  I'm not delighted with any of these options but I think this
> ifdeffery is worse than the others. :)
> 
> Let's see what anyone else has to say.
> 

Since I am one of the users of C++ and Xen headers, I like this a lot.
I do not like the ifdeffery above.  I am in favour of just changing the
field's name.

-Don Slutz

> Cheers,
> 
> Tim.
> 



Re: [Xen-devel] [PATCH v3] RFC: Automatically check xen's public headers for C++ pitfalls.

2015-02-26 Thread Don Slutz
On 02/26/15 11:24, Tim Deegan wrote:
> Add a check, like the existing check for non-ANSI C in the public
> headers, that runs the public headers through a C++ compiler to
> flag non-C++-friendly constructs.
> 
> Unlike the ANSI C check, we accept GCC-isms (gnu++98), and we also
> check various tools-only headers.
> 
> Explicitly _not_ addressing the use of 'private' in various fields,
> since we'd previously decided not to fix that.

This sentence and the "-Dprivate=private_is_a_keyword_in_cpp" below
appear to be at odds.  Maybe add something like:

The check ignores the use of 'private'.

> 
> Also tidy up the runes for these checks to be a bit more readable.
> 
> Reported-by: Razvan Cojocaru 
> Signed-off-by: Tim Deegan 
> Cc: Jan Beulich 
> 
> ---
> 

You can add my

Tested-by: Don Slutz 

   -Don Slutz

> v3: rebase on staging.
> 
> v2: test more headers;
> define __XEN_TOOLS__;
> use g++98 rather than ansi;
> tidy the makefile for readability;
> add a missing include to flask_op.h, which uses evtchn_port_t.
> ---
>  .gitignore|  1 +
>  config/StdGNU.mk  |  2 ++
>  config/SunOS.mk   |  1 +
>  xen/include/Makefile  | 28 
>  xen/include/public/xsm/flask_op.h |  2 ++
>  5 files changed, 30 insertions(+), 4 deletions(-)
> 
> diff --git a/.gitignore b/.gitignore
> index 13ee05b..78958ea 100644
> --- a/.gitignore
> +++ b/.gitignore
> @@ -233,6 +233,7 @@ xen/arch/*/efi/compat.c
>  xen/arch/*/efi/efi.h
>  xen/arch/*/efi/runtime.c
>  xen/include/headers.chk
> +xen/include/headers++.chk
>  xen/include/asm
>  xen/include/asm-*/asm-offsets.h
>  xen/include/compat/*
> diff --git a/config/StdGNU.mk b/config/StdGNU.mk
> index 4efebe3..e10ed39 100644
> --- a/config/StdGNU.mk
> +++ b/config/StdGNU.mk
> @@ -2,9 +2,11 @@ AS = $(CROSS_COMPILE)as
>  LD = $(CROSS_COMPILE)ld
>  ifeq ($(clang),y)
>  CC = $(CROSS_COMPILE)clang
> +CXX= $(CROSS_COMPILE)clang++
>  LD_LTO = $(CROSS_COMPILE)llvm-ld
>  else
>  CC = $(CROSS_COMPILE)gcc
> +CXX= $(CROSS_COMPILE)g++
>  LD_LTO = $(CROSS_COMPILE)ld
>  endif
>  CPP= $(CC) -E
> diff --git a/config/SunOS.mk b/config/SunOS.mk
> index 3316280..c2be37d 100644
> --- a/config/SunOS.mk
> +++ b/config/SunOS.mk
> @@ -2,6 +2,7 @@ AS = $(CROSS_COMPILE)gas
>  LD = $(CROSS_COMPILE)gld
>  CC = $(CROSS_COMPILE)gcc
>  CPP= $(CROSS_COMPILE)gcc -E
> +CXX= $(CROSS_COMPILE)g++
>  AR = $(CROSS_COMPILE)gar
>  RANLIB = $(CROSS_COMPILE)granlib
>  NM = $(CROSS_COMPILE)gnm
> diff --git a/xen/include/Makefile b/xen/include/Makefile
> index 94112d1..d48a642 100644
> --- a/xen/include/Makefile
> +++ b/xen/include/Makefile
> @@ -87,13 +87,33 @@ compat/xlat.h: $(addprefix compat/.xlat/,$(xlat-y)) 
> Makefile
>  
>  ifeq ($(XEN_TARGET_ARCH),$(XEN_COMPILE_ARCH))
>  
> -all: headers.chk
> +all: headers.chk headers++.chk
>  
> -headers.chk: $(filter-out public/arch-% public/%ctl.h public/xsm/% 
> public/%hvm/save.h, $(wildcard public/*.h public/*/*.h) $(public-y)) Makefile
> - for i in $(filter %.h,$^); do $(CC) -ansi -include stdint.h -Wall -W 
> -Werror -S -o /dev/null -x c $$i || exit 1; echo $$i; done >$@.new
> +PUBLIC_HEADERS := $(filter-out public/arch-% public/dom0_ops.h, $(wildcard 
> public/*.h public/*/*.h) $(public-y))
> +
> +PUBLIC_ANSI_HEADERS := $(filter-out public/%ctl.h public/xsm/% 
> public/%hvm/save.h, $(PUBLIC_HEADERS))
> +
> +headers.chk: $(PUBLIC_ANSI_HEADERS) Makefile
> + for i in $(filter %.h,$^); do \
> + $(CC) -x c -ansi -Wall -Werror -include stdint.h \
> +   -S -o /dev/null $$i || exit 1; \
> + echo $$i; \
> + done >$@.new
> + mv $@.new $@
> +
> +headers++.chk: $(PUBLIC_HEADERS) Makefile
> + if $(CXX) -v >/dev/null 2>&1; then \
> + for i in $(filter %.h,$^); do \
> + $(CXX) -x c++ -std=gnu++98 -Wall -Werror \
> +-D__XEN_TOOLS__ -Dprivate=private_is_a_keyword_in_cpp \
> +-include stdint.h -include public/xen.h \
> +-S -o /dev/null $$i || exit 1; \
> + echo $$i; \
> + done ; \
> + fi >$@.new
>   mv $@.new $@
>  
>  endif
>  
>  clean::
> - rm -rf compat headers.chk
> + rm -rf compat headers.chk headers++.chk
> diff --git a/xen/include/public/xsm/flask_op.h 
> b/xen/include/public/xsm/flask_op.h
> index 233de81..f874589 100644
> --- a/xen/include/public/xsm/flask_op.h
> +++ b/xen/include/public/xsm/flask_op.h
> @@ -25,6 +25,8 @@
>  #ifndef __FLASK_OP_H__
>  #define __FLASK_OP_H__
>  
> +#include "../event_channel.h"
> +
>  #define XEN_FLASK_INTERFACE_VERSION 1
>  
>  struct xen_flask_load {
> 



[Xen-devel] [xen-unstable test] 35257: regressions - FAIL

2015-02-26 Thread xen . org
flight 35257 xen-unstable real [real]
http://www.chiark.greenend.org.uk/~xensrcts/logs/35257/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-armhf-armhf-libvirt 12 guest-start.2 fail REGR. vs. 34629

Regressions which are regarded as allowable (not blocking):
 test-amd64-i386-pair         17 guest-migrate/src_host/dst_host fail like 34629

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-xl-sedf-pin             10 migrate-support-check fail never pass
 test-armhf-armhf-xl                      10 migrate-support-check fail never pass
 test-armhf-armhf-xl-sedf                 10 migrate-support-check fail never pass
 test-amd64-amd64-rumpuserxen-amd64       13 rumpuserxen-demo-xenstorels/xenstorels.repeat fail never pass
 test-armhf-armhf-xl-credit2              10 migrate-support-check fail never pass
 test-armhf-armhf-xl-midway               10 migrate-support-check fail never pass
 test-amd64-amd64-xl-pvh-amd               9 guest-start           fail never pass
 test-armhf-armhf-libvirt                 10 migrate-support-check fail never pass
 test-amd64-amd64-xl-pvh-intel             9 guest-start           fail never pass
 test-amd64-i386-libvirt                  10 migrate-support-check fail never pass
 test-armhf-armhf-xl-multivcpu            10 migrate-support-check fail never pass
 test-amd64-amd64-xl-pcipt-intel           9 guest-start           fail never pass
 test-amd64-amd64-libvirt                 10 migrate-support-check fail never pass
 test-amd64-i386-xl-qemut-winxpsp3        14 guest-stop            fail never pass
 test-amd64-i386-xl-qemuu-winxpsp3-vcpus1 14 guest-stop            fail never pass
 test-amd64-i386-xl-qemuu-winxpsp3        14 guest-stop            fail never pass
 test-amd64-amd64-xl-qemut-win7-amd64     14 guest-stop            fail never pass
 test-amd64-i386-xl-qemuu-win7-amd64      14 guest-stop            fail never pass
 test-amd64-amd64-xl-winxpsp3             14 guest-stop            fail never pass
 test-amd64-amd64-xl-qemuu-win7-amd64     14 guest-stop            fail never pass
 test-amd64-amd64-xl-win7-amd64           14 guest-stop            fail never pass
 test-amd64-i386-xl-qemut-winxpsp3-vcpus1 14 guest-stop            fail never pass
 test-amd64-i386-xl-qemut-win7-amd64      14 guest-stop            fail never pass
 test-amd64-i386-xl-win7-amd64            14 guest-stop            fail never pass
 test-amd64-i386-xl-winxpsp3-vcpus1       14 guest-stop            fail never pass
 test-amd64-i386-xl-winxpsp3              14 guest-stop            fail never pass
 test-amd64-amd64-xl-qemut-winxpsp3       14 guest-stop            fail never pass
 test-amd64-amd64-xl-qemuu-winxpsp3       14 guest-stop            fail never pass

version targeted for testing:
 xen  24b2b8dea180098a3acc809a91cde6c0cc4c8607
baseline version:
 xen  cb34a7c8d741aa447d79e1b01d71168a4088a4d7


People who touched revisions under test:
  Andrew Cooper 
  Dario Faggioli 
  David Scott 
  Don Slutz 
  Elena Ufimsteva 
  George Dunlap 
  Ian Campbell 
  Ian Jackson 
  Jan Beulich 
  Jintack Lim 
  Julien Grall 
  Michael Young 
  Olaf Hering 
  Stefano Stabellini 
  Wei Liu 


jobs:
 build-amd64                                  pass
 build-armhf                                  pass
 build-i386                                   pass
 build-amd64-libvirt                          pass
 build-armhf-libvirt                          pass
 build-i386-libvirt                           pass
 build-amd64-oldkern                          pass
 build-i386-oldkern                           pass
 build-amd64-pvops                            pass
 build-armhf-pvops                            pass
 build-i386-pvops                             pass
 build-amd64-rumpuserxen                      pass
 build-i386-rumpuserxen                       pass
 test-amd64-amd64-xl                          pass
 test-armhf-armhf-xl                          pass
 test-amd64-i386-xl                           pass
 test-amd64-amd64-xl-pvh-amd                  fail
 test-amd64-i386-rhel6hvm-amd                 pass
 test-amd64-i386-qemut-rhel6hvm-amd           pass
 test-amd64-i386-qemuu-rhel6hvm-amd           pass
 test-amd64-amd64-xl-qemut-debianhvm-amd64    pass
 test-amd64-i386-xl-qemut-debianhvm-amd64     pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64    pass
 test-amd64-i386-xl-qemuu-debianhvm-amd64 

Re: [Xen-devel] [PATCH v9 08/13] Add IOREQ_TYPE_VMWARE_PORT

2015-02-26 Thread Don Slutz
On 02/26/15 03:07, Jan Beulich wrote:
 On 25.02.15 at 21:20,  wrote:
>> On 02/24/15 10:34, Jan Beulich wrote:
>> On 17.02.15 at 00:05,  wrote:
>>>> @@ -501,22 +542,50 @@ static void hvm_free_ioreq_gmfn(struct domain *d, unsigned long gmfn)

[snip]

>>>> @@ -2429,9 +2552,6 @@ struct hvm_ioreq_server *hvm_select_ioreq_server(struct domain *d,
>>>>   if ( list_empty(&d->arch.hvm_domain.ioreq_server.list) )
>>>>   return NULL;
>>>>   
>>>> -if ( p->type != IOREQ_TYPE_COPY && p->type != IOREQ_TYPE_PIO )
>>>> -return d->arch.hvm_domain.default_ioreq_server;
>>>
>>> Shouldn't this rather be amended than deleted?
>>>
>>
>> The reason is below:
>>
>>>> @@ -2474,7 +2594,12 @@ struct hvm_ioreq_server *hvm_select_ioreq_server(struct domain *d,
>>>>   BUILD_BUG_ON(IOREQ_TYPE_PIO != HVMOP_IO_RANGE_PORT);
>>>>   BUILD_BUG_ON(IOREQ_TYPE_COPY != HVMOP_IO_RANGE_MEMORY);
>>>>   BUILD_BUG_ON(IOREQ_TYPE_PCI_CONFIG != HVMOP_IO_RANGE_PCI);
>>>> +BUILD_BUG_ON(IOREQ_TYPE_VMWARE_PORT != HVMOP_IO_RANGE_VMWARE_PORT);
>>>> +BUILD_BUG_ON(IOREQ_TYPE_TIMEOFFSET != HVMOP_IO_RANGE_TIMEOFFSET);
>>>> +BUILD_BUG_ON(IOREQ_TYPE_INVALIDATE != HVMOP_IO_RANGE_INVALIDATE);
>>>>   r = s->range[type];
>>>> +if ( !r )
>>>> +continue;
>>>
>>> Why, all of the sudden?
>>>
>>
>> This is the replacement for the deleted "if" above.  Continue will lead
>> to the same return that was remove above (it is at the end).  They are
>> currently the same because all ioreq servers have the same set of
>> ranges.  But if it would help, I can change "continue" into the "return
>> default".
> 
> So further down you talk of the "special range 1" (see there for
> further remarks in this regard) - how would r be NULL here in the
> first place?

Because the #defines leave a hole (the values are currently 0, 1, 2, 7
and 8), range[6], for example, is a slot where r would be NULL.  However,
no current code should be able to get here, so if you want me to, I can
drop the "if".
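
To illustrate the hole (a standalone sketch only; struct rangeset_stub,
NR_RANGE_SLOTS and lookup_range are made-up names, not the Xen code), an
array indexed directly by type value has unused slots wherever no type is
defined, and those slots read back as NULL:

```c
#include <assert.h>
#include <stddef.h>

/* Assumed for illustration: enough slots for type values 0..8. */
#define NR_RANGE_SLOTS 9

/* Hypothetical stand-in for Xen's struct rangeset. */
struct rangeset_stub {
    int unused;
};

/*
 * Return the range registered for `type`, or NULL when the slot is one
 * of the holes between the defined type values.
 */
const struct rangeset_stub *lookup_range(
    const struct rangeset_stub *const *range, unsigned int type)
{
    /* An out-of-bounds type would be a caller bug, so assert on it. */
    assert(type < NR_RANGE_SLOTS);
    return range[type];
}
```

If only slots 0, 1, 2, 7 and 8 are populated, lookup_range(range, 6) returns
NULL, which is the case the "if" guards against.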

> That said - yes, making this explicitly do what is
> intended (perhaps rather using "break" instead of "return") would
> seem very desirable. There simply is no point in continuing the
> loop.
> 

Will use break if the "if" is not dropped.

>>>> @@ -2501,6 +2626,13 @@ struct hvm_ioreq_server *hvm_select_ioreq_server(struct domain *d,
>>>>   }
>>>>   
>>>>   break;
>>>> +case IOREQ_TYPE_VMWARE_PORT:
>>>> +case IOREQ_TYPE_TIMEOFFSET:
>>>> +case IOREQ_TYPE_INVALIDATE:
>>>> +if ( rangeset_contains_singleton(r, 1) )
>>>> +return s;
>>>
>>> This literal 1 at least needs explanation (i.e. a comment).
>>>
>>
>> The comment is below (copied here).  Will duplicate it here (with any
>> adjustments needed):
>>
>>  + * NOTE: The 'special' range of 1 is what is checked for outside
>>  + * of the three types of I/O.
>>
>> How about /* The 'special' range of 1 is checked for being enabled */?
> 
> Along these lines, yes (fixed for coding style). And then "1" is not
> a range of any kind. I suppose writing it as a proper range (e.g.
> [1,1]) would already help.
> 

I will adjust to using [1,1].

>>>> --- a/xen/arch/x86/x86_emulate/x86_emulate.h
>>>> +++ b/xen/arch/x86/x86_emulate/x86_emulate.h
>>>> @@ -112,6 +112,8 @@ struct __packed segment_register {
>>>>   #define X86EMUL_RETRY  3
>>>>    /* (cmpxchg accessor): CMPXCHG failed. Maps to X86EMUL_RETRY in caller. */
>>>>   #define X86EMUL_CMPXCHG_FAILED 3
>>>> + /* Send part of registers also to DM. */
>>>> +#define X86EMUL_VMPORT_SEND 4
>>>
>>> Introducing a new value here seems rather fragile, as various code
>>> paths you don't touch would need auditing that they do the right
>>> thing upon this value being returned. Plus even conceptually this
>>> doesn't belong here - the instruction emulator shouldn't be concerned
>>> with things like VMware emulation.
>>>
>>
>> The only place I know of where rc is not checked by name is in
>> x86_emulate.c.  There are a lot of 0 and != 0 checks.  Also in area of
>> code there are places that return X86EMUL_OKAY when it looks to me that
>> the return value is checked for 0 and ignored otherwise.
> 
> The point aren't the checks against zero, but the ones against the
> #define-d values. Code may exist that, after excluding certain
> values, assumes that only some specific value can be left. While
> we aim at adding ASSERT()s for such cases, I'm nowhere near to
> being convinced this is the case everywhere.
> 
>> So I will agree that the use of these defines is complex.  However, I
>> need a way to pass back X86EMUL_UNHANDLEABLE and send a few registers to
>> QEMU.  Now since the code path that I need to do this is:
>>
>> ...
>>  hvmemul_do_io
>>   hvm_portio_intercept
>>hvm_io_intercept
>> process_portio_intercept
>>  vmport_ioport
>>
>>
>> Since there is only 1 caller to hvm_portio_intercept() --
>> hvmemul_do_io, and hvmemul_do_i

Re: [Xen-devel] Shared page tables between ETP and IOMMU issue

2015-02-26 Thread Roger Pau Monné
El 26/02/15 a les 19.02, Roger Pau Monné ha escrit:
> El 26/02/15 a les 17.43, Jan Beulich ha escrit:
> On 26.02.15 at 17:29,  wrote:
>>> OK, I will try to take a look. All those faults come from physical
>>> memory ranges that are supposed to be usable, and in fact the CPU seems
>>> to be able to read/write from them without problems, or else the guest
>>> would have crashed much more early. Regarding sharing the page tables
>>> between EPT and the IOMMU, is there some bit that needs to be set in the
>>> ept entry in order to mark a page as available by the IOMMU?
>>
>> Bits 0 and 1 (read and write) are shared between VT-d and EPT
>> (as is bit 7 - see struct dma_pte and ept_entry_t).
> 
> I've added some debug prints at the end of construct_dom0 to print the 
> MFN of a RAM page (using get_gfn_query_unlocked) and the VTd entry 
> (using print_vtd_entries):
> 
> (XEN) print_vtd_entries: iommu 8302197c3a40 dev :00:1f.2 gmfn 43e0
> (XEN) root_entry = 8302197c
> (XEN) root_entry[0] = 140144001
> (XEN) context = 830140144000
> (XEN) context[fa] = 2_140148001
> (XEN) l4 = 830140148000
> (XEN) l4_index = 0
> (XEN) l4[0] = 140147003
> (XEN) l3 = 830140147000
> (XEN) l3_index = 0
> (XEN) l3[0] = 140146003
> (XEN) l2 = 830140146000
> (XEN) l2_index = 21
> (XEN) l2[21] = 0
> (XEN) l2[21] not present
> (XEN) GFN: 0x43e0 MFN: 0x1401e3 type: 0
> 
> This is before Dom0 has been started, so I think there's something 
> wrong in the way we build the page tables, because AFAICT the VTd 
> code is not able to resolve the GFN, but the EPT code is.

BTW, if I set no-sharept the output is as expected:

(XEN) print_vtd_entries: iommu 8302197c3a40 dev :00:1f.2 gmfn 43e0
(XEN) root_entry = 8302197c
(XEN) root_entry[0] = 19793f001
(XEN) context = 83019793f000
(XEN) context[fa] = 2_140149001
(XEN) l4 = 830140149000
(XEN) l4_index = 0
(XEN) l4[0] = 140148003
(XEN) l3 = 830140148000
(XEN) l3_index = 0
(XEN) l3[0] = 140147003
(XEN) l2 = 830140147000
(XEN) l2_index = 21
(XEN) l2[21] = 14012c003
(XEN) l1 = 83014012c000
(XEN) l1_index = 1e0
(XEN) l1[1e0] = 1401e3003
(XEN) GFN: 0x43e0 MFN: 0x1401e3 type: 0
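
For reference, the indices in the dumps follow directly from the GFN if each
table level consumes 9 bits of it. That stride is an assumption, though it
matches the l2_index and l1_index values printed above. A small sketch
(pt_index and pte_present are illustrative names, not Xen functions, and the
"present" test assumes an entry counts as present when the read or write bit
mentioned earlier in the thread is set):

```c
#include <assert.h>
#include <stdint.h>

/* Assumed: every page-table level indexes with 9 bits of the GFN. */
#define LEVEL_STRIDE 9u
#define LEVEL_MASK   ((1u << LEVEL_STRIDE) - 1)

/* Bits 0 and 1 of an entry: read and write permission. */
#define PTE_READ  (1u << 0)
#define PTE_WRITE (1u << 1)

/* Index into the table at `level` (1 = leaf, 4 = top) for a given GFN. */
unsigned int pt_index(uint64_t gfn, unsigned int level)
{
    assert(level >= 1 && level <= 4);
    return (unsigned int)((gfn >> (LEVEL_STRIDE * (level - 1))) & LEVEL_MASK);
}

/* Non-zero if the entry grants read or write access. */
int pte_present(uint64_t pte)
{
    return (pte & (PTE_READ | PTE_WRITE)) != 0;
}
```

For GFN 0x43e0 this gives an l1 index of 0x1e0 and an l2 index of 0x21, with
l3 and l4 both 0. The l2 entry 14012c003 from the no-sharept run has both low
bits set, while the shared-table run shows l2[21] = 0, so the walk stops
there.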




Re: [Xen-devel] Shared page tables between ETP and IOMMU issue

2015-02-26 Thread Roger Pau Monné
El 26/02/15 a les 20.28, Konrad Rzeszutek Wilk ha escrit:
> On Thu, Feb 26, 2015 at 07:02:22PM +0100, Roger Pau Monné wrote:
>> El 26/02/15 a les 17.43, Jan Beulich ha escrit:
>> On 26.02.15 at 17:29,  wrote:
>>>> OK, I will try to take a look. All those faults come from physical
>>>> memory ranges that are supposed to be usable, and in fact the CPU seems
>>>> to be able to read/write from them without problems, or else the guest
>>>> would have crashed much more early. Regarding sharing the page tables
>>>> between EPT and the IOMMU, is there some bit that needs to be set in the
>>>> ept entry in order to mark a page as available by the IOMMU?
>>>
>>> Bits 0 and 1 (read and write) are shared between VT-d and EPT
>>> (as is bit 7 - see struct dma_pte and ept_entry_t).
>>
>> I've added some debug prints at the end of construct_dom0 to print the 
>> MFN of a RAM page (using get_gfn_query_unlocked) and the VTd entry 
>> (using print_vtd_entries):
>>
>> (XEN) print_vtd_entries: iommu 8302197c3a40 dev :00:1f.2 gmfn 43e0
>> (XEN) root_entry = 8302197c
>> (XEN) root_entry[0] = 140144001
>> (XEN) context = 830140144000
>> (XEN) context[fa] = 2_140148001
>> (XEN) l4 = 830140148000
>> (XEN) l4_index = 0
>> (XEN) l4[0] = 140147003
>> (XEN) l3 = 830140147000
>> (XEN) l3_index = 0
>> (XEN) l3[0] = 140146003
>> (XEN) l2 = 830140146000
>> (XEN) l2_index = 21
>> (XEN) l2[21] = 0
>> (XEN) l2[21] not present
>> (XEN) GFN: 0x43e0 MFN: 0x1401e3 type: 0
>>
>> This is before Dom0 has been started, so I think there's something 
>> wrong in the way we build the page tables, because AFAICT the VTd 
>> code is not able to resolve the GFN, but the EPT code is.
> 
> This looks like what Elena was hitting (how we parsed E820_RSV or
> MMIO ranges). Are those GPFNs  special? 

No, they are regular RAM (p2m_ram_rw). I think Elena's problem was due
to missing RMRR regions in the ACPI tables. On the other hand this is
the IOMMU failing to provide translations for RAM regions. It seems like
it's caused by sharing the page tables between EPT and the IOMMUs.

Roger.



Re: [Xen-devel] Shared page tables between ETP and IOMMU issue

2015-02-26 Thread Konrad Rzeszutek Wilk
On Thu, Feb 26, 2015 at 07:02:22PM +0100, Roger Pau Monné wrote:
> El 26/02/15 a les 17.43, Jan Beulich ha escrit:
>  On 26.02.15 at 17:29,  wrote:
> >> OK, I will try to take a look. All those faults come from physical
> >> memory ranges that are supposed to be usable, and in fact the CPU seems
> >> to be able to read/write from them without problems, or else the guest
> >> would have crashed much more early. Regarding sharing the page tables
> >> between EPT and the IOMMU, is there some bit that needs to be set in the
> >> ept entry in order to mark a page as available by the IOMMU?
> > 
> > Bits 0 and 1 (read and write) are shared between VT-d and EPT
> > (as is bit 7 - see struct dma_pte and ept_entry_t).
> 
> I've added some debug prints at the end of construct_dom0 to print the 
> MFN of a RAM page (using get_gfn_query_unlocked) and the VTd entry 
> (using print_vtd_entries):
> 
> (XEN) print_vtd_entries: iommu 8302197c3a40 dev :00:1f.2 gmfn 43e0
> (XEN) root_entry = 8302197c
> (XEN) root_entry[0] = 140144001
> (XEN) context = 830140144000
> (XEN) context[fa] = 2_140148001
> (XEN) l4 = 830140148000
> (XEN) l4_index = 0
> (XEN) l4[0] = 140147003
> (XEN) l3 = 830140147000
> (XEN) l3_index = 0
> (XEN) l3[0] = 140146003
> (XEN) l2 = 830140146000
> (XEN) l2_index = 21
> (XEN) l2[21] = 0
> (XEN) l2[21] not present
> (XEN) GFN: 0x43e0 MFN: 0x1401e3 type: 0
> 
> This is before Dom0 has been started, so I think there's something 
> wrong in the way we build the page tables, because AFAICT the VTd 
> code is not able to resolve the GFN, but the EPT code is.

This looks like what Elena was hitting (how we parsed E820_RSV or
MMIO ranges). Are those GPFNs  special? 
> 
> Roger.
> 
> 



Re: [Xen-devel] [PATCH v4] libxl_set_memory_target: retain the same maxmem offset on top of the current target

2015-02-26 Thread Wei Liu
On Thu, Feb 26, 2015 at 06:53:29PM +, Stefano Stabellini wrote:
[...]
> > >  }
> > >  
> > > -libxl_dominfo_init(&ptr);
> > > -xcinfo2xlinfo(ctx, &info, &ptr);
> > 
> > If I'm not mistaken, &info is only used here. I think you can delete
> > info and relevant code all together.
> 
> info is used later as an argument to xc_domain_getinfolist
> 
> 

What I meant was that the sole purpose of info and the two function calls
(xc_domain_getinfolist + xcinfo2xlinfo) is to fill in ptr, which is already
done by a single call to libxl_domain_info at the beginning of your patch,
so it's possible to remove info and those two function calls altogether.

Wei.

> > 
> > > -uuid = libxl__uuid2string(gc, ptr.uuid);
> > >  libxl__xs_write(gc, t, libxl__sprintf(gc, "/vm/%s/memory", uuid),
> > >  "%"PRIu32, new_target_memkb / 1024);
> > > -libxl_dominfo_dispose(&ptr);
> > >  
> > >  out:
> > >  if (!xs_transaction_end(ctx->xsh, t, abort_transaction)
> > > -- 
> > > 1.7.10.4
> > 



Re: [Xen-devel] [PATCH v2] RFC: Automatically check xen's public headers for C++ pitfalls.

2015-02-26 Thread Tim Deegan
At 19:49 +0200 on 26 Feb (1424976562), Razvan Cojocaru wrote:
> On 02/26/2015 07:01 PM, Tim Deegan wrote:
> > +#ifdef __cplusplus
> > +/* 'private' is a keyword in C++, so we have to use a different name for
> > + * private state there.  Leaving the C name alone to avoid unnecessary
> > + * pain for the existing users. */
> > +#define XEN_RING_PRIVATE pvt
> > +#else
> > +#define XEN_RING_PRIVATE private
> > +#endif
> 
> Are there likely to be many users outside of the ones using that code
> with mem_event?

Yes, lots.  It's used to implement split drivers for net, block, etc.
Most users will have taken copies of this header into their own trees,
though, and so won't face build breakage, and this isn't an ABI change.

So far, I've seen David and Andrew in favour of just changing the
field's name and letting out-of-tree users update their copies when/if
they want to.  Jan would prefer to avoid changing the field name for C
users.  I'm not delighted with any of these options but I think this
ifdeffery is worse than the others. :)
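
To make the trade-off concrete, here is a minimal sketch of what the ifdef
variant looks like from a ring user's side. The struct and the two helpers
(demo_ring, demo_ring_set_private, demo_ring_get_private) are invented for
illustration and are not the real ring.h definitions; only the macro
selection is taken from the hunk above.

```c
#include <assert.h>
#include <stddef.h>

/* Same selection logic as the hunk quoted above. */
#ifdef __cplusplus
#define XEN_RING_PRIVATE pvt
#else
#define XEN_RING_PRIVATE private
#endif

/* Hypothetical stand-in for a shared ring; not the real ring.h layout. */
struct demo_ring {
    unsigned int req_prod;
    void *XEN_RING_PRIVATE;    /* named 'private' in C, 'pvt' in C++ */
};

/* Callers that go through the macro compile under both languages. */
void demo_ring_set_private(struct demo_ring *r, void *p)
{
    assert(r != NULL);
    r->XEN_RING_PRIVATE = p;
}

void *demo_ring_get_private(const struct demo_ring *r)
{
    assert(r != NULL);
    return r->XEN_RING_PRIVATE;
}
```

Existing C code that accesses the field as "private" directly keeps
building in this variant; only C++ users have to switch to the new name.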

Let's see what anyone else has to say.

Cheers,

Tim.



[Xen-devel] [PATCH v5] libxl_set_memory_target: retain the same maxmem offset on top of the current target

2015-02-26 Thread Stefano Stabellini
In libxl_set_memory_target when setting the new maxmem, retain the same
offset on top of the current target. In the future the offset will
include memory allocated by QEMU for rom files.

Signed-off-by: Stefano Stabellini 

---

Changes in v5:
- call libxl_dominfo_init;
- move libxl_dominfo_dispose call before returning to the caller;

Changes in v4:
- remove new_target_memkb <= 0 check.

Changes in v3:
- move call to libxl__uuid2string and libxl_dominfo_dispose earlier;
- error out if new_target_memkb <= 0.

Changes in v2:
- remove LIBXL_MAXMEM_CONSTANT from LIBXL__LOG_ERRNO.
---
 tools/libxl/libxl.c |   18 ++
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
index b9a1941..7d42dc6 100644
--- a/tools/libxl/libxl.c
+++ b/tools/libxl/libxl.c
@@ -4720,6 +4720,12 @@ int libxl_set_memory_target(libxl_ctx *ctx, uint32_t domid,
 char *uuid;
 xs_transaction_t t;
 
+libxl_dominfo_init(&ptr);
+if (libxl_domain_info(ctx, &ptr, domid) < 0)
+goto out_no_transaction;
+
+uuid = libxl__uuid2string(gc, ptr.uuid);
+
 retry_transaction:
 t = xs_transaction_start(ctx->xsh);
 
@@ -4795,13 +4801,12 @@ retry_transaction:
 }
 
 if (enforce) {
-memorykb = new_target_memkb + videoram;
-rc = xc_domain_setmaxmem(ctx->xch, domid, memorykb +
-LIBXL_MAXMEM_CONSTANT);
+memorykb = ptr.max_memkb - current_target_memkb + new_target_memkb;
+rc = xc_domain_setmaxmem(ctx->xch, domid, memorykb);
 if (rc != 0) {
 LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR,
 "xc_domain_setmaxmem domid=%d memkb=%d failed "
-"rc=%d\n", domid, memorykb + LIBXL_MAXMEM_CONSTANT, rc);
+"rc=%d\n", domid, memorykb, rc);
 abort_transaction = 1;
 goto out;
 }
@@ -4826,12 +4831,8 @@ retry_transaction:
 goto out;
 }
 
-libxl_dominfo_init(&ptr);
-xcinfo2xlinfo(ctx, &info, &ptr);
-uuid = libxl__uuid2string(gc, ptr.uuid);
 libxl__xs_write(gc, t, libxl__sprintf(gc, "/vm/%s/memory", uuid),
 "%"PRIu32, new_target_memkb / 1024);
-libxl_dominfo_dispose(&ptr);
 
 out:
 if (!xs_transaction_end(ctx->xsh, t, abort_transaction)
@@ -4840,6 +4841,7 @@ out:
 goto retry_transaction;
 
 out_no_transaction:
+libxl_dominfo_dispose(&ptr);
 GC_FREE;
 return rc;
 }
-- 
1.7.10.4
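
As a rough illustration of the arithmetic in the "enforce" branch above (a
sketch, not the libxl code itself): the new maxmem keeps whatever headroom
currently sits between max_memkb and the current target. The helper name
new_maxmem_kb is made up, and the assertion encodes an assumption of this
sketch rather than something the patch checks.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper mirroring
 *   memorykb = ptr.max_memkb - current_target_memkb + new_target_memkb;
 * All values are in KiB. The offset (max - current target) is preserved.
 */
uint64_t new_maxmem_kb(uint64_t max_memkb,
                       uint64_t current_target_memkb,
                       uint64_t new_target_memkb)
{
    /* Assumption of this sketch: maxmem is never below the current target. */
    assert(max_memkb >= current_target_memkb);

    uint64_t offset = max_memkb - current_target_memkb;
    return new_target_memkb + offset;
}
```

For example, a domain with a 1 MiB offset on top of a 1 GiB target keeps
that 1 MiB offset when the target is lowered to 512 MiB.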




Re: [Xen-devel] Crash of guest with nested vmx with Unknown nested vmexit reason 80000021.

2015-02-26 Thread Jeroen Groenewegen van der Weyden

Hi Jan,

Anything planned concerning this?

BR,
Jeroen.

Jan Beulich wrote on 9-12-2014 at 10:17:
> On 09.12.14 at 10:09,  wrote:
>> Did anyone find the time yet? I'm still more than willing to test any
>> patches.
>
> Just yesterday we were told by Intel that they still can't foresee when
> they will find time.
>
> Jan



Re: [Xen-devel] [PATCH v4] libxl_set_memory_target: retain the same maxmem offset on top of the current target

2015-02-26 Thread Stefano Stabellini
On Thu, 26 Feb 2015, Wei Liu wrote:
> On Wed, Feb 25, 2015 at 03:18:45PM +, Stefano Stabellini wrote:
> > In libxl_set_memory_target when setting the new maxmem, retain the same
> > offset on top of the current target. In the future the offset will
> > include memory allocated by QEMU for rom files.
> > 
> > Signed-off-by: Stefano Stabellini 
> > 
> > ---
> > 
> > Changes in v4:
> > - remove new_target_memkb <= 0 check.
> > 
> > Changes in v3:
> > - move call to libxl__uuid2string and libxl_dominfo_dispose earlier;
> > - error out if new_target_memkb <= 0.
> > 
> > Changes in v2:
> > - remove LIBXL_MAXMEM_CONSTANT from LIBXL__LOG_ERRNO.
> > ---
> >  tools/libxl/libxl.c |   12 +++-
> >  1 file changed, 7 insertions(+), 5 deletions(-)
> > 
> > diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
> > index 52a783a..143cb3e 100644
> > --- a/tools/libxl/libxl.c
> > +++ b/tools/libxl/libxl.c
> > @@ -4681,6 +4681,12 @@ int libxl_set_memory_target(libxl_ctx *ctx, uint32_t domid,
> >  char *uuid;
> >  xs_transaction_t t;
> >  
> 
> Should have:
> 
> libxl_dominfo_init(&ptr);
> 
> > +if (libxl_domain_info(ctx, &ptr, domid) < 0)
> > +goto out_no_transaction;
> > +
> > +uuid = libxl__uuid2string(gc, ptr.uuid);
> > +libxl_dominfo_dispose(&ptr);
> > +
> 
> Since you need to use ptr later, you cannot dispose it here.
> 
> You can safely call dispose before returning to caller.
> 
> >  retry_transaction:
> >  t = xs_transaction_start(ctx->xsh);
> >  
> > @@ -4756,7 +4762,7 @@ retry_transaction:
> >  }
> >  
> >  if (enforce) {
> > -memorykb = new_target_memkb + videoram;
> > +memorykb = ptr.max_memkb - current_target_memkb + new_target_memkb;
> >  rc = xc_domain_setmaxmem(ctx->xch, domid, memorykb);
> >  if (rc != 0) {
> >  LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR,
> > @@ -4786,12 +4792,8 @@ retry_transaction:
> >  goto out;
> >  }
> >  
> > -libxl_dominfo_init(&ptr);
> > -xcinfo2xlinfo(ctx, &info, &ptr);
> 
> If I'm not mistaken, &info is only used here. I think you can delete
> info and relevant code all together.

info is used later as an argument to xc_domain_getinfolist


> 
> > -uuid = libxl__uuid2string(gc, ptr.uuid);
> >  libxl__xs_write(gc, t, libxl__sprintf(gc, "/vm/%s/memory", uuid),
> >  "%"PRIu32, new_target_memkb / 1024);
> > -libxl_dominfo_dispose(&ptr);
> >  
> >  out:
> >  if (!xs_transaction_end(ctx->xsh, t, abort_transaction)
> > -- 
> > 1.7.10.4
> 



Re: [Xen-devel] RFC: xen config changes v4

2015-02-26 Thread Luis R. Rodriguez
On Thu, Feb 26, 2015 at 05:42:57PM +, Stefano Stabellini wrote:
> On Thu, 26 Feb 2015, Luis R. Rodriguez wrote:
> > On Thu, Feb 26, 2015 at 11:08:20AM +, Stefano Stabellini wrote:
> > > On Thu, 26 Feb 2015, David Vrabel wrote:
> > > > On 26/02/15 04:59, Juergen Gross wrote:
> > > > > 
> > > > > So we are again in the situation that pv-drivers always imply the pvops
> > > > > kernel (PARAVIRT selected). I started the whole Kconfig rework to
> > > > > eliminate this dependency.
> > > > 
> > > > Yes.  Can you produce a series that just addresses this one issue.
> > > > 
> > > > In the absence of any concrete requirement for this big Kconfig reorg
> > > > I don't think it is helpful.
> > > 
> > > I clearly missed some context as I didn't realize that this was the
> > > intended goal. Why do we want this? Please explain as it won't come
> > > for free.
> > > 
> > > 
> > > We have a few PV interfaces for HVM guests that need PARAVIRT in Linux
> > > in order to be used, for example pv_time_ops and HVMOP_pagetable_dying.
> > > They are critical performance improvements and from the interface
> > > > perspective, small enough that it doesn't make much sense having a separate
> > > KConfig option for them.
> > > 
> > > 
> > > In order to reach the goal above we necessarily need to introduce a
> > > differentiation in terms of PV on HVM guests in Linux:
> > > 
> > > 1) basic guests with PV network, disk, etc but no PV timers, no
> > >HVMOP_pagetable_dying, no PV IPIs
> > > 2) full PV on HVM guests that have PV network, disk, timers,
> > >HVMOP_pagetable_dying, PV IPIs and anything else that makes sense.
> > > 
> > > 2) is much faster than 1) on Xen and 2) is only a tiny bit slower than
> > > 1) on native x86
> > 
> > Also don't we shove 2) down hvm guests right now? Even when everything is
> > built in I do not see how we opt out for HVM for 1) at run time right now.
> >
> > If this is true then the question of motivation for this becomes even
> > stronger I think.
> 
> Yes, indeed there is no way to do 1) at the moment. And for good
> reasons, see above.

OK. If the goal is to be able to build front-end drivers without building
PARAVIRT / PARAVIRT_CLOCK, and the gains from doing so (which haven't been
stated, other than the ability itself) are small (as Stefano notes, simple
HVM containers do not perform great) while requiring a fair bit of work, I'd
rather ask: why not address *why* we are avoiding PARAVIRT / PARAVIRT_CLOCK,
and stick to the original goals behind the pvops model by fixing whatever is
required for us to remain happy with one single kernel? That might be more
work than just being able to build simple Xen HVM containers without
PARAVIRT / PARAVIRT_CLOCK, but I'd think the gains would be much higher.

If this resonates well then I'd like to ask: what are currently the most
pressing issues with enabling PARAVIRT / PARAVIRT_CLOCK?

  Luis



Re: [Xen-devel] Branch Trace Storage for guests and VPMU initialization

2015-02-26 Thread Boris Ostrovsky

On 02/26/2015 12:57 PM, kevin.ma...@gdata.de wrote:




-----Original Message-----
From: Boris Ostrovsky [mailto:boris.ostrov...@oracle.com]
Sent: Thursday, 26 February 2015 17:35
To: Dietmar Hahn; xen-devel@lists.xen.org
Cc: Mayer, Kevin
Subject: Re: [Xen-devel] Branch Trace Storage for guests and
VPMU initialization

On 02/26/2015 03:56 AM, Dietmar Hahn wrote:

On Wednesday, 25 February 2015, 11:31:31, Boris Ostrovsky wrote:

On 02/25/2015 10:12 AM, kevin.ma...@gdata.de wrote:

-----Original Message-----
From: Boris Ostrovsky [mailto:boris.ostrov...@oracle.com]
Sent: Tuesday, 24 February 2015 18:13
To: Mayer, Kevin; xen-devel@lists.xen.org
Subject: Re: [Xen-devel] Branch Trace Storage for guests and VPMU
initialization

On 02/24/2015 10:27 AM, kevin.ma...@gdata.de wrote:

Hi guys

I'm trying to set up the BTS so that I can log the branches taken
in the guest using Xen 4.4.1 with a WinXP SP3 guest on a Core i7
Sandy Bridge.

I added the vpmu=bts boot parameter to my grub2 configuration and
extended libxl, libxc, domctl, … with a command of my own so that I
can trigger the activation of the BTS whenever I want.


I am not sure why you are doing all these changes to Xen code. BTS
is supposed to be managed from the guest. For example, a Fedora HVM
guest will produce this:

[root@dhcp-burlington7-2nd-B-east-10-152-55-140 ~]# perf record -e
branches:u -c 1 -d sleep 1 [ perf record: Woken up 3838 times to
write data ] [ perf record: Captured and wrote 0.704 MB perf.data
(~30756 samples) ]
[root@dhcp-burlington7-2nd-B-east-10-152-55-140 ~]# perf script -f
ip,addr,sym,dso,symoff --show-kernel-path
 8167c347 native_irq_return_iret+0x0 (/proc/kcore) =>
328c001590 [unknown] (/proc/kcore)
 8167c347 native_irq_return_iret+0x0 (/proc/kcore) =>
328c001590 [unknown] ([unknown])
   328c001593 [unknown] ([unknown]) =>   328c004b70 [unknown]
([unknown])
...


I want to be able to log the taken branches (of the guest) without the
need to modify the guest at all.
This means I have to do all the logic in the hypervisor, or am I wrong?

In that case, yes. But then you have to make sure that at least
* you don't load guest's VPMU (or, at least, BTS-related
registers) on context switch

But you need to modify PMU registers when switching to/from the guest
context to get PMU running.




I was thinking that all BTS stuff can be controlled from dom0 and so we can
use dom0's version of these registers. I didn't realize that DS_AREA would
have to be accessed in guest's address space (and that DEBUGCTL is loaded
from VMCS).

Which is what I think I said in response to this message (which didn't show up
on the list because Kevin accidentally dropped xen-devel).

-boris


Terribly sorry about that...

So the VPMU doesn’t get loaded when there is a VMENTER?



Not exactly. For BTS, DEBUGCTL register, which lives in VMCS, does get 
loaded. But not DS_AREA --- it gets loaded by SW during 
context_switch()->vpmu_load().


(As for general VPMU registers such as counters --- they are also loaded 
during context_switch(). But I don't think you care about those. From 
what little I know about BTS, DEBUGCTL and DS_AREA are the only two 
registers you are interested in)
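
For reference, a minimal sketch of the two registers in question. The MSR
numbers and bit positions are the ones documented in the Intel SDM for
IA32_DEBUGCTL and IA32_DS_AREA; the struct and the helper bts_msrs_for are
hypothetical and only compute the values that would have to end up in the
VMCS field and in the MSR loaded by software. Nothing here is Xen code.

```c
#include <assert.h>
#include <stdint.h>

#define MSR_IA32_DEBUGCTL 0x1d9u       /* lives in the VMCS for a guest */
#define MSR_IA32_DS_AREA  0x600u       /* loaded by software on context switch */

#define DEBUGCTL_TR    (1u << 6)       /* enable branch trace messages */
#define DEBUGCTL_BTS   (1u << 7)       /* store them in the BTS buffer */
#define DEBUGCTL_BTINT (1u << 8)       /* interrupt when the buffer fills up */

/* Hypothetical container for the two values. */
struct bts_msrs {
    uint64_t debugctl;
    uint64_t ds_area;
};

/* ds_area must be an address that is valid in the guest's address space. */
struct bts_msrs bts_msrs_for(uint64_t ds_area, int want_irq)
{
    struct bts_msrs m;

    assert(ds_area != 0);
    m.ds_area = ds_area;
    m.debugctl = DEBUGCTL_TR | DEBUGCTL_BTS;
    if (want_irq)
        m.debugctl |= DEBUGCTL_BTINT;
    return m;
}
```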



I thought I could set the domU->vcpu->vpmu to enable BTS while in dom0 (with
modified versions of msr_write_intercept, vpmu_do_wrmsr and core2_vpmu_do_wrmsr
of course, since the built-in ones use the current vcpu, which would be the
dom0 vcpu) and as soon as there is a context switch to domU the vpmu gets
loaded and the guest starts logging.


And it should work, provided that DS_AREA is set up correctly.


If the described behavior is correct the only problem I can see is with 
allocating memory in dom0 in a way that the guest can access it.


This sounds right. All you have to do now is implementation details ;-)

-boris



But if I got it wrong please explain how the vpmu really works.

Cheers

Kevin






I didn't think of using the VPMU stuff with modifying the context from
outside the guest.


* You don't send the interrupt to the guest (meaning that you will
need to somehow inform dom0 of the BTS interrupt)

and probably more.

Essentially, you want dom0 to profile the guest. I have been working
on patches that would allow that but they are still under review.



In this command I do the following:

I set up the memory region for the BTS Buffer and the DS Buffer
Management Area using xzalloc_bytes


I don't think you should be allocating BTS buffers in the
hypervisor, they are in guest's memory.

I agree. As I said I think this is where my main problem is at the moment.
Is there any way I can allocate memory in the hypervisor in a way the
guest can access it?

I am not sure this is what you want since you seem to *not* want the
guest to process the samples, right?

But yes, you can. E.g. something like what map_vcpu_info() does. (I
have no idea how you'd do this from Windows.)

The DS buffer has to be map

Re: [Xen-devel] [PATCH v2 3/3] xen/arm: allow console=hvc0 to be omitted for guests

2015-02-26 Thread Stefano Stabellini
On Wed, 18 Feb 2015, Ian Campbell wrote:
> On Wed, 2015-02-18 at 09:50 -0600, Rob Herring wrote:
> > On Wed, Feb 18, 2015 at 7:51 AM, Julien Grall  
> > wrote:
> > > From: Ard Biesheuvel 
> > >
> > > This patch registers hvc0 as the preferred console if no console
> > > has been specified explicitly on the kernel command line.
> > >
> > > The purpose is to allow platform agnostic kernels and boot images
> > > (such as distro installers) to boot in a Xen/ARM domU without the
> > > need to modify the command line by hand.
> > 
> > How does this interact with DT chosen stdout-path?
> 
> I think it shouldn't any more than the existing calls from e.g. the 8250
> driver to preferred_console do.
>
> > Is there a node for hvc0?
> 
> Not a direct one, it is inferred from the presence of the general Xen
> node.

Xen PV consoles, including hvc0, like all the other Xen PV devices, are
advertised on xenstore.


> I did vaguely consider handling a stdout-path pointing to that --
> but it seemed a bit of an abuse.




Re: [Xen-devel] [PATCH v2 1/3] arm/xen: Correctly check if the event channel interrupt is present

2015-02-26 Thread Stefano Stabellini
On Wed, 18 Feb 2015, Julien Grall wrote:
> The function irq_of_parse_and_map returns 0 when the IRQ is not found.
> 
> Furthermore, move the check before notifying the user that we are running on
> Xen.
> 
> Signed-off-by: Julien Grall 
> Acked-by: Ian Campbell 

Acked-by: Stefano Stabellini 
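
A small sketch of why the old check could never fire. The function
fake_parse_and_map is only a stand-in that follows the "returns 0 when not
found" convention stated in the commit message; the IRQ number 31 and the
init_old/init_new names are made up for illustration.

```c
#include <assert.h>
#include <errno.h>

/* Stand-in for irq_of_parse_and_map(): 0 means "no interrupt found". */
unsigned int fake_parse_and_map(int found)
{
    assert(found == 0 || found == 1);
    return found ? 31u : 0u;    /* 31 is an arbitrary made-up IRQ number */
}

/* Old check: tests for a negative value, so a return of 0 slips through. */
int init_old(int found)
{
    int irq = (int)fake_parse_and_map(found);

    if (irq < 0)
        return -ENODEV;
    return 0;
}

/* New check: treat 0 as "not found", as the patch does. */
int init_new(int found)
{
    unsigned int irq = fake_parse_and_map(found);

    if (!irq)
        return -ENODEV;
    return 0;
}
```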


> ---
> Changes in v2:
> - Add Ian's ack
> - Re-add __read_mostly
> ---
>  arch/arm/xen/enlighten.c | 10 ++
>  1 file changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c
> index 263a204..c8d3a17 100644
> --- a/arch/arm/xen/enlighten.c
> +++ b/arch/arm/xen/enlighten.c
> @@ -51,7 +51,7 @@ EXPORT_SYMBOL_GPL(xen_have_vector_callback);
>  int xen_platform_pci_unplug = XEN_UNPLUG_ALL;
>  EXPORT_SYMBOL_GPL(xen_platform_pci_unplug);
>  
> -static __read_mostly int xen_events_irq = -1;
> +static __read_mostly unsigned int xen_events_irq;
>  
>  /* map fgmfn of domid to lpfn in the current domain */
>  static int map_foreign_page(unsigned long lpfn, unsigned long fgmfn,
> @@ -251,12 +251,14 @@ static int __init xen_guest_init(void)
>   return 0;
>   grant_frames = res.start;
>   xen_events_irq = irq_of_parse_and_map(node, 0);
> + if (!xen_events_irq) {
> + pr_debug("Xen event channel interrupt not found\n");
> + return -ENODEV;
> + }
> +
>   pr_info("Xen %s support found, events_irq=%d gnttab_frame=%pa\n",
>   version, xen_events_irq, &grant_frames);
>  
> - if (xen_events_irq < 0)
> - return -ENODEV;
> -
>   xen_domain_type = XEN_HVM_DOMAIN;
>  
>   xen_setup_features();
> -- 
> 2.1.4
> 



Re: [Xen-devel] [OSSTEST PATCH 8/8] ap-fetch-version: Use osstest's home to find master tree

2015-02-26 Thread Ian Jackson
Ian Campbell writes ("Re: [OSSTEST PATCH 8/8] ap-fetch-version: Use osstest's 
home to find master tree"):
> On Wed, 2015-02-25 at 13:01 +, Ian Jackson wrote:
> > When ap-fetch-version and ap-fetch-version-old are run on the osstest
> > controller but as a different user they should look in ~osstest, not
> > $HOME, for the master testing.git tree.
...
> But what if they are run not on the osstest controller where ~osstest
> may not exist?

Then they ought not to look for the user's $HOME/testing.git, which is
unlikely to (a) exist or (b) be relevant if it does.  They ought to
fail.

> I think your previous changes have already arranged that standalone mode
> won't get to either of these anyway, so, that being the case:

Yes, that's the intent.

> Acked-by: Ian Campbell 

I have added something about this to the commit message (and retained
your ack).

Thanks,
Ian.



Re: [Xen-devel] Shared page tables between ETP and IOMMU issue

2015-02-26 Thread Roger Pau Monné
On 26/02/15 at 17.43, Jan Beulich wrote:
 On 26.02.15 at 17:29,  wrote:
>> OK, I will try to take a look. All those faults come from physical
>> memory ranges that are supposed to be usable, and in fact the CPU seems
>> to be able to read/write from them without problems, or else the guest
>> would have crashed much earlier. Regarding sharing the page tables
>> between EPT and the IOMMU, is there some bit that needs to be set in the
>> ept entry in order to mark a page as available by the IOMMU?
> 
> Bits 0 and 1 (read and write) are shared between VT-d and EPT
> (as is bit 7 - see struct dma_pte and ept_entry_t).

I've added some debug prints at the end of construct_dom0 to print the 
MFN of a RAM page (using get_gfn_query_unlocked) and the VTd entry 
(using print_vtd_entries):

(XEN) print_vtd_entries: iommu 8302197c3a40 dev :00:1f.2 gmfn 43e0
(XEN) root_entry = 8302197c
(XEN) root_entry[0] = 140144001
(XEN) context = 830140144000
(XEN) context[fa] = 2_140148001
(XEN) l4 = 830140148000
(XEN) l4_index = 0
(XEN) l4[0] = 140147003
(XEN) l3 = 830140147000
(XEN) l3_index = 0
(XEN) l3[0] = 140146003
(XEN) l2 = 830140146000
(XEN) l2_index = 21
(XEN) l2[21] = 0
(XEN) l2[21] not present
(XEN) GFN: 0x43e0 MFN: 0x1401e3 type: 0

This is before Dom0 has been started, so I think there's something 
wrong in the way we build the page tables, because AFAICT the VTd 
code is not able to resolve the GFN, but the EPT code is.
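
For reference, the index values in the dump follow directly from the GFN
under the usual 4-level layout with 9 bits per level. Below is a small
sketch (helper names are mine, not Xen's, and I am assuming the dump prints
indices in hex) that reproduces l2_index = 21 for gmfn 43e0 and applies the
read/write-bit test Jan mentioned to the entries shown.

```c
#include <assert.h>
#include <stdint.h>

#define LEVEL_BITS 9u                          /* 512 entries per table */
#define LEVEL_MASK ((1u << LEVEL_BITS) - 1)

/* Index into the table at 'level' (1 = leaf ... 4 = top) for a frame number. */
unsigned int pt_index(uint64_t gfn, unsigned int level)
{
    assert(level >= 1 && level <= 4);
    return (unsigned int)((gfn >> (LEVEL_BITS * (level - 1))) & LEVEL_MASK);
}

/* Bits 0 (read) and 1 (write); an entry with neither set is not present. */
int pte_present(uint64_t pte)
{
    return (pte & 0x3u) != 0;
}
```

With gfn 0x43e0 this gives index 0 at levels 4 and 3 and 0x21 at level 2,
matching the dump; l3[0] = 140146003 has both bits set while l2[21] = 0 has
neither, which is where the walk stops.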

Roger.




Re: [Xen-devel] [PATCH v5.99.1 RFC 1/4] xen/arm: Duplicate gic-v2.c file to support hip04 platform version

2015-02-26 Thread Stefano Stabellini
On Thu, 26 Feb 2015, Ian Campbell wrote:
> On Wed, 2015-02-25 at 16:34 +, Stefano Stabellini wrote:
> > I think we should disable the build of all drivers in Xen by default,
> > except for the ARM standard compliant ones (for aarch64 the SBSA is a
> > nice summary of what is considered compliant), to keep the size of the
> > binary small.
> 
> I think this last statement was based on information that the gic-v2
> driver was of the order of 70-100K in size, but I think that information
> was wrong (I suspect it was the raw .o size, which includes debug info
> and other extraneous bits). Here I see:
> 
> $ du -h xen/arch/arm/gic-v2.o
> 148K  xen/arch/arm/gic-v2.o
> $ aarch64-linux-gnu-size xen/arch/arm/gic-v2.o 
>    text    data     bss     dec     hex filename
>    6619       0      97    6716    1a3c xen/arch/arm/gic-v2.o
> 
> IOW the actual binary size is on the order of 6K (gic-v3.o is around the
> same). This is arm64, I can't be bothered to rebuild for arm32, it'll be
> similar.
> 
> Given that then I really don't think it is worth introducing a two tier
> build over it.
> 
> If we really cared about these sorts of savings we would arrange to
> discard all of the unused GIC/SMMU/UART driver's .text/.data/.bss after
> boot (easy enough to achieve by putting each in a dedicated segment).
> 
> But I don't think we have enough such drivers to start worrying about
> doing that just now. We have that opportunity in our back pocket if we
> ever get to that point, which is good enough I think.
> 
> > Could you please introduce a Xen build time option in
> > xen/arch/arm/Rules.mk, called HAS_NON_STANDARD_DRIVERS, that by default
> > is n, and gate the build of gic-hip04.c on it?
> 
> Frediano, I see you've already done so in v6, thanks for that. Sorry to
> go back on it.
> 
> Assuming the rest of the series in v6 is OK (gets acked and whatever)
> then I expect I can just skip that one patch when applying and fixup the
> Makefile in the obvious way (approx s/HAS_NON.../CONFIG_ARM32/) in the
> dependent patch.

v6 is fine from my POV, you can add my Acked-by to all patches.
I am OK with dropping HAS_NON_STANDARD_DRIVERS.



Re: [Xen-devel] freemem-slack and large memory environments

2015-02-26 Thread Stefano Stabellini
On Thu, 26 Feb 2015, Mike Latimer wrote:
> On Thursday, February 26, 2015 03:57:54 PM Ian Campbell wrote:
> > On Thu, 2015-02-26 at 08:36 -0700, Mike Latimer wrote:
> > > There is still one aspect of my original patch that is important. As the
> > > code currently stands, the target for dom0 is set lower during each
> > > iteration of the loop. Unless only one iteration is required, dom0 will
> > > end up being set to a much lower target than is actually required.
> > 
> > Is this because some sort of slack is applied once per iteration rather
> > than once at the start or is it something else?
> 
> No - the slack reservation just complicated the request by (potentially) 
> requiring more free memory than domU initially requested.
> 
> With or without slack, the central loop in tools/libxl/xl_cmdimpl.c:freemem
> frees memory for domU by lowering the memory target for dom0. However, this
> is not a single request (e.g. free 64GB for domX), rather the memory target
> for dom0 is set lower during every iteration through:
> 
>    rc = libxl_set_memory_target(ctx, 0, free_memkb - need_memkb, 1, 0);
> 
> This causes dom0's memory target to be lowered by the needed amount during
> every iteration of the loop. In practice, this causes the first request to
> lower dom0's target by the full amount (e.g. -64GB), and subsequent
> iterations further lower dom0's target by however much memory still appears
> to be required (e.g. three iterations of the loop might lower dom0's target
> by -64GB, then -25GB, then -25GB, for a total of dom0 ballooning down 114GB).
> The issue itself is due to the loop ignoring the fact that the original
> request set dom0's target to the correct amount, but the ballooning has not
> completed.

What is the return value of libxl_set_memory_target and
libxl_wait_for_free_memory in that case? Isn't it just a matter of
properly handling the return values?

Or maybe we just need to change the libxl_set_memory_target call to use
an absolute memory target to avoid restricting dom0 memory more than
necessary at each iteration. Also increasing the timeout argument passed
to the libxl_wait_for_free_memory call could help.
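
A toy model of the loop may help to compare the two approaches. It is only
a sketch: simulate() and its parameters are invented, the balloon driver is
modelled as freeing at most a fixed amount per wait, and none of this is
the xl code. It contrasts re-issuing a relative reduction on every
iteration with computing an absolute target once.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Toy model, all sizes in GB. dom0 balloons down by at most 'step' per
 * iteration. Returns dom0's final memory target.
 *   relative != 0: each iteration lowers the target by what still looks missing
 *   relative == 0: the target is computed once, from the initial state
 */
int64_t simulate(int64_t dom0_mem, int64_t free_mem, int64_t need,
                 int64_t step, int relative)
{
    assert(step > 0 && need <= dom0_mem + free_mem);

    int64_t target = dom0_mem;
    const int64_t absolute_target = dom0_mem - (need - free_mem);

    while (free_mem < need) {
        if (relative)
            target -= need - free_mem;    /* stacks on earlier requests */
        else
            target = absolute_target;     /* idempotent */

        /* The balloon makes partial progress towards the target. */
        int64_t released = dom0_mem - target;
        if (released > step)
            released = step;
        dom0_mem -= released;
        free_mem += released;
    }
    return target;
}
```

With dom0 at 256GB, nothing free, 64GB needed and 40GB freed per wait, the
relative variant leaves dom0's target at 168GB while the absolute variant
stops at the intended 192GB.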


> The problem itself is easier to see when domU memory sizes are increased. As 
> mentioned before, starting a 512GB domain should guarantee that multiple 
> iterations of the loop are required, and dom0 will balloon down much further 
> than the required 512GB.
> 
> Does this clarify the situation?





[Xen-devel] [OSSTEST PATCH 9/8] README.dev: Runes for adhoc testing in the production environment

2015-02-26 Thread Ian Jackson
Signed-off-by: Ian Jackson 
---
 README.dev |   18 ++
 1 file changed, 18 insertions(+)

diff --git a/README.dev b/README.dev
index aae4f17..03c3e61 100644
--- a/README.dev
+++ b/README.dev
@@ -164,3 +164,21 @@ $HOME/bisects/for-$branch.git/stop
 $HOME/testing.git/$xenbranch.stop
 
   stops everything using $xenbranch
+
+Adhoc testing in the production environment
+===
+
+Adhoc (`play') testing of a proposed osstest branch:
+
+  As yourself on the osstest controller VM:
+
+  Check out the version of osstest to be tested.  If you are editing
+  on your workstation, it is easiest to commit everything and then
+ git-push osstestvm:osstest-wombat-tree.git +HEAD:t
+  and on the controller
+ git checkout t~0
+
+  Create (on the controller) daily-cron-email-foo containing
+ To: something appropriate
+  Then
+ OSSTEST_EMAIL_HEADER=daily-cron-email-foo OSSTEST_USE_HEAD=y OSSTEST_NO_BASELINE=y ./cr-daily-branch osstest
-- 
1.7.10.4




Re: [Xen-devel] [OSSTEST PATCH 5/8] standalone: Always set OSSTEST_NO_BASELINE

2015-02-26 Thread Ian Jackson
Ian Campbell writes ("Re: [OSSTEST PATCH 5/8] standalone: Always set 
OSSTEST_NO_BASELINE"):
> On Wed, 2015-02-25 at 13:01 +, Ian Jackson wrote:
> > OSSTEST_NO_BASELINE disables the thing where cr-daily-branch decides
> Acked-by: Ian Campbell 
> 
> Although:
> > -   --baseline)nobaseline=n; shift 1;;
> > +   --baseline) echo >&2 'warning: --baseline is obsolete'; shift 1;;
> 
> TBH I think you could just nuke it from a tool like this. I rather
> suspect noone is using it... I can't even remember why I wanted it.

OK, I have done that and retained your ack.

Thanks,
Ian.



[Xen-devel] [OSSTEST PATCH] rump kernels: Send build mails to new list

2015-02-26 Thread Ian Jackson
Signed-off-by: Ian Jackson 
CC: Antti Kantee 
---
 daily-cron-email-real--rumpuserxen |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/daily-cron-email-real--rumpuserxen b/daily-cron-email-real--rumpuserxen
index 67c48bf..a9166a0 100644
--- a/daily-cron-email-real--rumpuserxen
+++ b/daily-cron-email-real--rumpuserxen
@@ -1,3 +1,3 @@
 To: xen-de...@lists.xensource.com,
-rumpkernel-bui...@lists.sourceforge.net
+rumpkernel-bui...@freelists.org
 Cc: ian.jack...@eu.citrix.com
-- 
1.7.10.4




Re: [Xen-devel] [PATCH v5.99.1 RFC 1/4] xen/arm: Duplicate gic-v2.c file to support hip04 platform version

2015-02-26 Thread Ian Campbell
On Wed, 2015-02-25 at 16:34 +, Stefano Stabellini wrote:
> I think we should disable the build of all drivers in Xen by default,
> except for the ARM standard compliant ones (for aarch64 the SBSA is a
> nice summary of what is considered compliant), to keep the size of the
> binary small.

I think this last statement was based on information that the gic-v2
driver was of the order of 70-100K in size, but I think that information
was wrong (I suspect it was the raw .o size, which includes debug info
and other extraneous bits). Here I see:

$ du -h xen/arch/arm/gic-v2.o
148K    xen/arch/arm/gic-v2.o
$ aarch64-linux-gnu-size xen/arch/arm/gic-v2.o
   text    data     bss     dec     hex filename
   6619       0      97    6716    1a3c xen/arch/arm/gic-v2.o

IOW the actual binary size is on the order of 6K (gic-v3.o is around the
same). This is arm64, I can't be bothered to rebuild for arm32, it'll be
similar.

Given that then I really don't think it is worth introducing a two tier
build over it.

If we really cared about these sorts of savings we would arrange to
discard all of the unused GIC/SMMU/UART driver's .text/.data/.bss after
boot (easy enough to achieve by putting each in a dedicated segment).

But I don't think we have enough such drivers to start worrying about
doing that just now. We have that opportunity in our back pocket if we
ever get to that point, which is good enough I think.

> Could you please introduce a Xen build time option in
> xen/arch/arm/Rules.mk, called HAS_NON_STANDARD_DRIVERS, that by default
> is n, and gate the build of gic-hip04.c on it?

Frediano, I see you've already done so in v6, thanks for that. Sorry to
go back on it.

Assuming the rest of the series in v6 is OK (gets acked and whatever)
then I expect I can just skip that one patch when applying and fixup the
Makefile in the obvious way (approx s/HAS_NON.../CONFIG_ARM32/) in the
dependent patch.

Ian.




Re: [Xen-devel] Branch Trace Storage for guests and VPMU initialization

2015-02-26 Thread Kevin.Mayer


> -----Original Message-----
> From: Boris Ostrovsky [mailto:boris.ostrov...@oracle.com]
> Sent: Thursday, 26 February 2015 17:35
> To: Dietmar Hahn; xen-devel@lists.xen.org
> Cc: Mayer, Kevin
> Subject: Re: [Xen-devel] Branch Trace Storage for guests and
> VPMU initialization
> 
> On 02/26/2015 03:56 AM, Dietmar Hahn wrote:
> > On Wednesday, 25 February 2015, 11:31:31, Boris Ostrovsky wrote:
> >> On 02/25/2015 10:12 AM, kevin.ma...@gdata.de wrote:
>  -----Original Message-----
>  From: Boris Ostrovsky [mailto:boris.ostrov...@oracle.com]
>  Sent: Tuesday, 24 February 2015 18:13
>  To: Mayer, Kevin; xen-devel@lists.xen.org
>  Subject: Re: [Xen-devel] Branch Trace Storage for guests and VPMU
>  initialization
> 
>  On 02/24/2015 10:27 AM, kevin.ma...@gdata.de wrote:
> > Hi guys
> >
> > I`m trying to set up the BTS so that I can log the branches taken
> > in the guest using Xen 4.4.1 with a WinXP SP3 guest on a Core i7
> > Sandy Bridge.
> >
> > I added the vpmu=bts boot parameter to my grub2 configuration and
> > extended the libxl,libxc,domctl,… with an own command so that I
> > can trigger the activation of the BTS whenever I want.
> >
>  I am not sure why you are doing all these changes to Xen code. BTS
>  is supposed to be managed from the guest. For example, a Fedora
> HVM
>  guest will produce this:
> 
>  [root@dhcp-burlington7-2nd-B-east-10-152-55-140 ~]# perf record -e
>  branches:u -c 1 -d sleep 1 [ perf record: Woken up 3838 times to
>  write data ] [ perf record: Captured and wrote 0.704 MB perf.data
>  (~30756 samples) ]
>  [root@dhcp-burlington7-2nd-B-east-10-152-55-140 ~]# perf script -f
>  ip,addr,sym,dso,symoff --show-kernel-path
>  8167c347 native_irq_return_iret+0x0 (/proc/kcore) =>
>  328c001590 [unknown] (/proc/kcore)
>  8167c347 native_irq_return_iret+0x0 (/proc/kcore) =>
>  328c001590 [unknown] ([unknown])
>    328c001593 [unknown] ([unknown]) =>   328c004b70 [unknown]
>  ([unknown])
>  ...
> 
> >>> I want to be able to log the taken branches (of the guest) without the
> >>> need to modify the guest at all.
> >>> This means I have to do all the logic in the hypervisor, or am I wrong?
> >> In that case, yes. But then you have to make sure that at least
> >>* you don't load guest's VPMU (or, at least, BTS-related
> >> registers) on context switch
> > But you need to modify PMU registers when switching to/from the guest
> > context to get PMU running.
> 
> 
> 
> I was thinking that all BTS stuff can be controlled from dom0 and so we can
> use dom0's version of these registers. I didn't realize that DS_AREA would
> have to be accessed in guest's address space (and that DEBUGCTL is loaded
> from VMCS).
> 
> Which is what I think I said in response to this message (which didn't show up
> on the list because Kevin accidentally dropped xen-devel).
> 
> -boris
 
Terribly sorry about that...

So the VPMU doesn't get loaded on VMENTER?
I thought I could set domU->vcpu->vpmu to enable BTS while in dom0 (with
modified versions of msr_write_intercept, vpmu_do_wrmsr and core2_vpmu_do_wrmsr,
of course, since the built-in ones use the current vcpu, which would be the
dom0 vcpu), and that as soon as there is a context switch to domU the vpmu gets
loaded and the guest starts logging.
If the described behavior is correct, the only problem I can see is
allocating memory in dom0 in a way that the guest can access it.
But if I got it wrong, please explain how the vpmu really works.

Cheers

Kevin


> 
> 
> > I didn't think of using the VPMU stuff with modifying the context from
> > outside the guest.
> >
> >>* You don't send the interrupt to the guest (meaning that you will
> >> need to somehow inform dom0 of the BTS interrupt)
> >>
> >> and probably more.
> >>
> >> Essentially, you want dom0 to profile the guest. I have been working
> >> on patches that would allow that but they are still under review.
> >>
> >>
> > In this command I do the following:
> >
> > I set up the memory region for the BTS Buffer and the DS Buffer
> > Management Area using xzalloc_bytes
> >
>  I don't think you should be allocating BTS buffers in the
>  hypervisor, they are in guest's memory.
> >>> I agree. As I said I think this is where my main problem is at the moment.
> >>> Is there any way I can allocate memory in the hypervisor in a way the
> >>> guest can access it?
> >> I am not sure this is what you want since you seem to *not* want the
> >> guest to process the samples, right?
> >>
> >> But yes, you can. E.g. something like what map_vcpu_info() does. (I
> >> have no idea how you'd do this from Windows.)
> > The DS buffer has to be mapped within the guest's address space so the
> > CPU running in guest context can access this area. Otherwise you get
> > this triple fault.
> > So I woul

Re: [Xen-devel] Branch Trace Storage for guests and VPMU initialization

2015-02-26 Thread Boris Ostrovsky

On 02/26/2015 08:44 AM, kevin.ma...@gdata.de wrote:



-----Original Message-----
From: Boris Ostrovsky [mailto:boris.ostrov...@oracle.com]
Sent: Wednesday, 25 February 2015 23:20
To: Mayer, Kevin
Subject: Re: AW: AW: [Xen-devel] Branch Trace Storage for guests
and VPMU initialization

On 02/25/2015 01:23 PM, kevin.ma...@gdata.de wrote:

-----Original Message-----
From: Boris Ostrovsky [mailto:boris.ostrov...@oracle.com]
Sent: Wednesday, 25 February 2015 17:32
To: Mayer, Kevin
Cc: xen-devel@lists.xen.org
Subject: Re: AW: [Xen-devel] Branch Trace Storage for guests and
VPMU initialization

On 02/25/2015 10:12 AM, kevin.ma...@gdata.de wrote:

-----Original Message-----
From: Boris Ostrovsky [mailto:boris.ostrov...@oracle.com]
Sent: Tuesday, 24 February 2015 18:13
To: Mayer, Kevin; xen-devel@lists.xen.org
Subject: Re: [Xen-devel] Branch Trace Storage for guests and VPMU
initialization

On 02/24/2015 10:27 AM, kevin.ma...@gdata.de wrote:

Hi guys

I'm trying to set up the BTS so that I can log the branches taken
in the guest using Xen 4.4.1 with a WinXP SP3 guest on a Core i7
Sandy Bridge.

I added the vpmu=bts boot parameter to my grub2 configuration and
extended libxl, libxc, domctl, ... with a command of my own so that I
can trigger the activation of the BTS whenever I want.


I am not sure why you are doing all these changes to Xen code. BTS
is supposed to be managed from the guest. For example, a Fedora HVM
guest will produce this:

[root@dhcp-burlington7-2nd-B-east-10-152-55-140 ~]# perf record -e
branches:u -c 1 -d sleep 1 [ perf record: Woken up 3838 times to
write data ] [ perf record: Captured and wrote 0.704 MB perf.data
(~30756 samples) ]
[root@dhcp-burlington7-2nd-B-east-10-152-55-140 ~]# perf script -f
ip,addr,sym,dso,symoff --show-kernel-path
 8167c347 native_irq_return_iret+0x0 (/proc/kcore) =>
328c001590 [unknown] (/proc/kcore)
 8167c347 native_irq_return_iret+0x0 (/proc/kcore) =>
328c001590 [unknown] ([unknown])
   328c001593 [unknown] ([unknown]) =>   328c004b70 [unknown]
([unknown])
...


I want to be able to log the taken branches (of the guest) without
the need to modify the guest at all.

This means I have to do all the logic in the hypervisor, or am I wrong?

In that case, yes. But then you have to make sure that at least
* you don't load guest's VPMU (or, at least, BTS-related
registers) on context switch
* You don't send the interrupt to the guest (meaning that you will
need to somehow inform dom0 of the BTS interrupt)

and probably more.

Essentially, you want dom0 to profile the guest. I have been working
on patches that would allow that but they are still under review.


Yes, this is exactly what I want to do.
Too bad that your patches are under review. Would have been pretty
helpful I think.

To be honest, I never tested them for BTS so they may not work in that
mode. In fact, as you will realize by reading what I said below, they probably
don't ;-(


Maybe I should point out that I'm a total noob with xen and I definitely
don't understand all parts yet.
So there may be some dumb mistakes in my assumptions.


In this command I do the following:

I set up the memory region for the BTS Buffer and the DS Buffer
Management Area using xzalloc_bytes


I don't think you should be allocating BTS buffers in the
hypervisor, they are in guest's memory.

I agree. As I said I think this is where my main problem is at the moment.
Is there any way I can allocate memory in the hypervisor in a way
the guest can access it?

I am not sure this is what you want since you seem to *not* want the
guest to process the samples, right?

But yes, you can. E.g. something like what map_vcpu_info() does. (I
have no idea how you'd do this from Windows.)

Right again. As you said my goal is to profile the guest from dom0. So
whenever the CPU is in guest mode and a branch is taken it should be stored
in the BTS, but not when the CPU is running dom0. My idea was basically to
set up the memory for the BTS and the GUEST_IA32_DEBUGCTL so when
there is a vmexit the logging stops and starts again when there is a vmenter.
As far as I understand the IA32_DEBUGCTL gets switched between the
dom0-value and the guest-value (stored in vmcs) when there is a
vmexit/vmenter, right?

Right. And now I am no longer sure whether your buffer should be in
hypervisor or guest's space: after VMENTER the hardware will load guest's
versions of IA32_DEBUGCTLMSR and MSR_IA32_DS_AREA. I don't know
whether you can prevent this from happening (need to look in the spec).
And if that's the case then you might be able to:

1. Map DS area and BTS buffer in both guest and hypervisor. I believe your
guest will have to have this mapped since these areas will be accessed via
guest's EPT. As I said, I don't know how you'd do this in Windows --- I know
nothing about programming there. I assume it can be done since there are
Windows PV drivers for Xen.
2. Have dom0 set appropriate 

Re: [Xen-devel] [PATCH v2] RFC: Automatically check xen's public headers for C++ pitfalls.

2015-02-26 Thread Razvan Cojocaru
On 02/26/2015 07:01 PM, Tim Deegan wrote:
> +#ifdef __cplusplus
> +/* 'private' is a keyword in C++, so we have to use a different name for
> + * private state there.  Leaving the C name alone to avoid unnecessary
> + * pain for the existing users. */
> +#define XEN_RING_PRIVATE pvt
> +#else
> +#define XEN_RING_PRIVATE private
> +#endif

Are there likely to be many users outside of the ones using that code
with mem_event? Because if there aren't, there are much more drastic
changes happening in Tamas' pending series, so perhaps seen that way the
change becomes more acceptable.
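
As a minimal sketch of how the quoted macro would be used, assuming a cut-down
stand-in for the shared ring (demo_sring and demo_sring_init are hypothetical
names, not part of ring.h): every access to the private member goes through
XEN_RING_PRIVATE, so C users keep the name 'private' while C++ users see 'pvt'.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#ifdef __cplusplus
/* 'private' is a keyword in C++, so a different member name is used there. */
#define XEN_RING_PRIVATE pvt
#else
#define XEN_RING_PRIVATE private
#endif

/* Hypothetical, cut-down stand-in for the shared ring structure. */
struct demo_sring {
    uint32_t req_prod, req_event;
    uint32_t rsp_prod, rsp_event;
    union {
        uint8_t pvt_pad[4];
    } XEN_RING_PRIVATE;   /* expands to 'private' in C, 'pvt' in C++ */
};

/* Mirrors the shape of SHARED_RING_INIT, but only for the demo structure. */
static void demo_sring_init(struct demo_sring *s)
{
    s->req_prod = s->rsp_prod = 0;
    s->req_event = s->rsp_event = 1;
    memset(s->XEN_RING_PRIVATE.pvt_pad, 0,
           sizeof(s->XEN_RING_PRIVATE.pvt_pad));
}
```

Existing C code that spells the member as 'private' directly keeps compiling
under this scheme; only code that must also build as C++ needs to switch to
the macro.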


Thanks,
Razvan



Re: [Xen-devel] freemem-slack and large memory environments

2015-02-26 Thread Ian Campbell
On Thu, 2015-02-26 at 10:38 -0700, Mike Latimer wrote:
> On Thursday, February 26, 2015 03:57:54 PM Ian Campbell wrote:
> > On Thu, 2015-02-26 at 08:36 -0700, Mike Latimer wrote:
> > > There is still one aspect of my original patch that is important. As the
> > > code currently stands, the target for dom0 is set lower during each
> > > iteration of the loop. Unless only one iteration is required, dom0 will
> > > end up being set to a much lower target than is actually required.
> > 
> > Is this because some sort of slack is applied once per iteration rather
> > than once at the start or is it something else?
> 
> No - the slack reservation just complicated the request by (potentially) 
> requiring more free memory than domU initially requested.
> 
> With or without slack, the central loop in tools/libxl/xl_cmdimpl.c:freemem
> frees memory for domU by lowering the memory target for dom0. However, this
> is not a single request (e.g. free 64GB for domX), rather the memory target
> for dom0 is set lower during every iteration through:
> 
>    rc = libxl_set_memory_target(ctx, 0, free_memkb - need_memkb, 1, 0);
> 
> This causes dom0's memory target to be lowered by the needed amount during
> every iteration of the loop. In practice, this causes the first request to
> lower dom0's target by the full amount (e.g. -64GB), and subsequent
> iterations further lower dom0's target by however much memory still appears
> to be required (e.g. three iterations of the loop might lower dom0's target
> by -64GB, then -25GB, then -25GB, for a total of dom0 ballooning down
> 114GB). The issue itself is due to the loop ignoring the fact that the
> original request set dom0's target to the correct amount, but the ballooning
> has not completed.
> 
> The problem itself is easier to see when domU memory sizes are increased. As 
> mentioned before, starting a 512GB domain should guarantee that multiple 
> iterations of the loop are required, and dom0 will balloon down much further 
> than the required 512GB.
> 
> Does this clarify the situation?

I think so. In essence we just need to update need_memkb on each
iteration, right?

Ian.





Re: [Xen-devel] Xen's Linux kernel config options

2015-02-26 Thread Stefano Stabellini
On Thu, 26 Feb 2015, Luis R. Rodriguez wrote:
> On Thu, Feb 26, 2015 at 11:19:17AM +, Stefano Stabellini wrote:
> > On Wed, 25 Feb 2015, Luis R. Rodriguez wrote:
> > > On Wed, Feb 25, 2015 at 12:01:31PM +, Stefano Stabellini wrote:
> > > > On Tue, 24 Feb 2015, Luis R. Rodriguez wrote:
> > > > > On Tue, Feb 24, 2015 at 7:21 AM, Stefano Stabellini
> > > > >  wrote:
> > > > > > On Mon, 23 Feb 2015, Luis R. Rodriguez wrote:
> > > > > >> On Thu, Feb 19, 2015 at 3:43 PM, Luis R. Rodriguez 
> > > > > >>  wrote:
> > > > > >> > On Fri, Dec 12, 2014 at 9:29 AM, David Vrabel 
> > > > > >> >  wrote:
> > > > > >> >> On 12/12/14 13:17, Juergen Gross wrote:
> > > > > >> >>> XEN_PVHVM
> > > > > >> >>
> > > > > >> >> Move XEN_PVHVM under XEN and have it select PARAVIRT and 
> > > > > >> >> PARAVIRT_CLOCK.
> > > > > >> >
> > > > > > >> > FWIW, although it seems we do not want to let users just build
> > > > > > >> > XEN_PVHVM hypervisors I have the changes required now to at
> > > > > > >> > least get this to build so I do know what it takes.
> > > > > >> >
> > > > > > >> >>> XEN_FRONTEND        XEN_PV || XEN_PVH || XEN_PVHVM
> > > > > >> >>
> > > > > > >> >> This enables all the basic infrastructure for frontends: event
> > > > > > >> >> channels, grant tables and Xenbus.
> > > > > > >> >>
> > > > > > >> >> Don't make XEN_FRONTEND depend on any XEN_* variant.  It should
> > > > > > >> >> be possible to have frontend drivers without support for any of
> > > > > > >> >> the PV/PVHVM/PVH guest types.
> > > > > >> >
> > > > > > >> > David, can you elaborate on the type of Xen guest it would be on
> > > > > > >> > x86 if it's not PV, PVHVM, or PVH? I'm particularly curious about
> > > > > > >> > the xen_domain_type and how it would end up being selected. As it
> > > > > > >> > is we tie in XEN_PVHVM at build time with XEN_PVH; in order to
> > > > > > >> > have XEN_PVHVM completely removed from XEN_PVH we need quite a
> > > > > > >> > bit of code changes, which at least as a code exercise I have
> > > > > > >> > completed already. If we want at the very least xen_domain_type
> > > > > > >> > set when XEN_PV, XEN_PVHVM, and XEN_PVH are not available we need
> > > > > > >> > a bit more work.
> > > > > >>
> > > > > > >> OK I think I see the issue. We have nothing quite like
> > > > > > >> xen_guest_init() on x86 enlighten.c, we do have this for ARM and I
> > > > > > >> think that can close the gap I'm observing.
> > > > > >>
> > > > > >> >>  Frontends only need event channels, grant
> > > > > >> >> table and xenbus.
> > > > > >> >
> > > > > > >> > Well xenbus_probe_initcall() will check for xen_domain() and that
> > > > > > >> > won't be set on x86 right now unless we have XEN_PV, XEN_PVHVM or
> > > > > > >> > XEN_PVH set -- to start off with. Then
> > > > > > >> > drivers/xen/xenbus/xenbus_client.c will check xen_feature in
> > > > > > >> > quite a few places as well; that won't be set unless
> > > > > > >> > xen_setup_features() is called, which right now is only done on
> > > > > > >> > x86 in arch/x86/xen/enlighten.c, which as Juergen pointed out is
> > > > > > >> > not needed if you don't have XEN_PV or XEN_PVH. As it turns out
> > > > > > >> > this is incorrect though: it's needed for XEN_PVHVM as well, and
> > > > > > >> > my split exercise in code addresses this. Now, at least in my
> > > > > > >> > code, if you don't have XEN_PV, XEN_PVHVM, or XEN_PVH we don't
> > > > > > >> > call xen_setup_features(), and it's unclear to me where or how
> > > > > > >> > that should happen in other cases.
> > > > > >>
> > > > > > >> Yeah I think having an x86 equivalent of xen_guest_init() would
> > > > > > >> solve this, Stefano, thoughts?
> > > > > >
> > > > > > > Having xen_guest_init() on x86 would be nice.  Being able to set
> > > > > > > xen_domain_type to XEN_HVM_DOMAIN if we are running on Xen,
> > > > > > > regardless of XEN_PV/PVH/PVHVM, also makes sense from Linux POV.
> > > > > 
> > > > > OK great, thanks for the feedback.
> > > > > 
> > > > > > > That said, I don't see much value in removing XEN_PVHVM: why are we
> > > > > > > even doing this? What is the improvement we are seeking?
> > > > > > 
> > > > > > We would not; the above was about the possibility of letting
> > > > > > users enable XEN_PVHVM without XEN_PVH, that's all.
> > > > 
> > > > OK, that makes sense.
> > > > 
> > > > > As is the only thing that can enable XEN_PVHVM is if you enable
> > > > > XEN_PVH.
> > > > 
> > > > This is the bit that we need to change but it shouldn't be difficult.
> > > > 
> > > > > If we want
> > > > > xen_guest_init() alone thoug

Re: [Xen-devel] [OSSTEST PATCH 3/8] emails: honour OSSTEST_EMAIL_SUBJECT_PREFIX

2015-02-26 Thread Ian Jackson
Ian Campbell writes ("Re: [OSSTEST PATCH 3/8] emails: honour 
OSSTEST_EMAIL_SUBJECT_PREFIX"):
> On Wed, 2015-02-25 at 13:01 +, Ian Jackson wrote:
> > This is prefixed before the other computed prefixes.  It makes it
> > easier to distinguish an adhoc cr-daily-branch test run for a real
> > branch.
> 
> Do they not already get "adhoc" in the $subject? i.e. my commissioning
> runs for the new arm create (following README.dev procedure) resulted in
> mails with:
> 
> [adhoc test] 34418: trouble: blocked/broken/fail/pass
> 
> (IOW it seems $branch is replaced by adhoc somewhere along the way)

That happens if you use mg-execute-flight.  If you let cr-daily-branch
run the flight for you, it uses the standard email stuff.

Ian.



Re: [Xen-devel] RFC: xen config changes v4

2015-02-26 Thread Stefano Stabellini
On Thu, 26 Feb 2015, Luis R. Rodriguez wrote:
> On Thu, Feb 26, 2015 at 11:08:20AM +, Stefano Stabellini wrote:
> > On Thu, 26 Feb 2015, David Vrabel wrote:
> > > On 26/02/15 04:59, Juergen Gross wrote:
> > > > 
> > > > So we are again in the situation that pv-drivers always imply the pvops
> > > > kernel (PARAVIRT selected). I started the whole Kconfig rework to
> > > > eliminate this dependency.
> > > 
> > > Yes.  Can you produce a series that just addresses this one issue.
> > > 
> > > In the absence of any concrete requirement for this big Kconfig reorg
> > > I don't think it is helpful.
> > 
> > I clearly missed some context as I didn't realize that this was the
> > intended goal. Why do we want this? Please explain as it won't come
> > for free.
> > 
> > 
> > We have a few PV interfaces for HVM guests that need PARAVIRT in Linux
> > in order to be used, for example pv_time_ops and HVMOP_pagetable_dying.
> > They are critical performance improvements and, from the interface
> > perspective, small enough that it doesn't make much sense to have a
> > separate Kconfig option for them.
> > 
> > 
> > In order to reach the goal above we necessarily need to introduce a
> > differentiation in terms of PV on HVM guests in Linux:
> > 
> > 1) basic guests with PV network, disk, etc but no PV timers, no
> >HVMOP_pagetable_dying, no PV IPIs
> > 2) full PV on HVM guests that have PV network, disk, timers,
> >HVMOP_pagetable_dying, PV IPIs and anything else that makes sense.
> > 
> > 2) is much faster than 1) on Xen and 2) is only a tiny bit slower than
> > 1) on native x86
> 
> Also don't we shove 2) down hvm guests right now? Even when everything is
> built in I do not see how we opt out for HVM for 1) at run time right now.
>
> If this is true then the question of motivation for this becomes even
> stronger I think.

Yes, indeed there is no way to do 1) at the moment. And for good
reasons, see above.



Re: [Xen-devel] freemem-slack and large memory environments

2015-02-26 Thread Mike Latimer
On Thursday, February 26, 2015 03:57:54 PM Ian Campbell wrote:
> On Thu, 2015-02-26 at 08:36 -0700, Mike Latimer wrote:
> > There is still one aspect of my original patch that is important. As the
> > code currently stands, the target for dom0 is set lower during each
> > iteration of the loop. Unless only one iteration is required, dom0 will
> > end up being set to a much lower target than is actually required.
> 
> Is this because some sort of slack is applied once per iteration rather
> than once at the start or is it something else?

No - the slack reservation just complicated the request by (potentially) 
requiring more free memory than domU initially requested.

With or without slack, the central loop in tools/libxl/xl_cmdimpl.c:freemem, 
frees memory for domU by lowering the memory target for dom0. However, this is 
not a single request (e.g. free 64GB for domX), rather the memory target for 
dom0 is set lower during every iteration through:

   rc = libxl_set_memory_target(ctx, 0, free_memkb - need_memkb, 1, 0);

This causes dom0's memory target to be lowered by the needed amount during 
every iteration of the loop. In practice, this causes the first request to 
lower dom0's target by the full amount (e.g. -64GB), and subsequent iterations
further lower dom0's target by however much memory still appears to be
required (e.g. three iterations of the loop might lower dom0's target by
-64GB, then -25GB, then -25GB, for a total of dom0 ballooning down 114GB). The issue
itself is due to the loop ignoring the fact that the original request set 
dom0's target to the correct amount, but the ballooning has not completed.
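
The effect described above can be reproduced with a small model. This is only
a sketch under stated assumptions: host_model, balloon_step and
simulate_freemem are hypothetical names, sizes are in GB rather than kB, and
dom0 is assumed to release at most rate_gb per polling interval. With
account_pending set to 0 the loop behaves like the current code (the target is
lowered by the remaining shortfall on every pass); with 1 it subtracts the
memory dom0 has already been asked to give up, which is one possible way of
avoiding the overshoot and not necessarily what a final patch would do.

```c
#include <assert.h>
#include <stdint.h>

#define DOM0_START_GB 256   /* hypothetical initial dom0 size */

struct host_model {
    int64_t dom0_target_gb; /* target requested by the toolstack */
    int64_t dom0_actual_gb; /* memory dom0 currently holds */
    int64_t free_gb;        /* free host memory */
};

/* dom0 balloons toward its target, at most rate_gb per interval. */
static void balloon_step(struct host_model *m, int64_t rate_gb)
{
    int64_t delta = m->dom0_actual_gb - m->dom0_target_gb;

    if (delta > rate_gb)
        delta = rate_gb;
    if (delta > 0) {
        m->dom0_actual_gb -= delta;
        m->free_gb += delta;
    }
}

/* Returns the total amount by which dom0's target ended up being lowered. */
static int64_t simulate_freemem(int64_t need_gb, int64_t rate_gb,
                                int account_pending)
{
    struct host_model m = { DOM0_START_GB, DOM0_START_GB, 0 };
    int i;

    for (i = 0; i < 100 && m.free_gb < need_gb; i++) {
        int64_t shortfall = need_gb - m.free_gb;

        /* Optionally discount memory that is requested but not yet freed. */
        if (account_pending)
            shortfall -= m.dom0_actual_gb - m.dom0_target_gb;
        if (shortfall > 0)
            m.dom0_target_gb -= shortfall;  /* relative target adjustment */
        balloon_step(&m, rate_gb);
    }
    return DOM0_START_GB - m.dom0_target_gb;
}
```

In this model, simulate_freemem(64, 20, 0) returns 136 (the target is lowered
by 64, 44, 24 and 4 on successive passes), while simulate_freemem(64, 20, 1)
returns 64.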

The problem itself is easier to see when domU memory sizes are increased. As 
mentioned before, starting a 512GB domain should guarantee that multiple 
iterations of the loop are required, and dom0 will balloon down much further 
than the required 512GB.

Does this clarify the situation?

-Mike



[Xen-devel] [PATCH v10 0/4] enable Memory Bandwidth Monitoring (MBM) for VMs

2015-02-26 Thread Chao Peng
Changes from v9:
* Move libxc refactoring code into standalone patch;
* Make libxl get_sample interface more generic;

Changes from v8:
* Merge event mask patch to MBM enabling patch;
* Address comments from Ian Campbell (details in the patch itself).

Changes from v7:
* Make the obfuscation more complex, as Jan suggested.
* Minor adjustment for commit message.

Changes from v6:
* Obfuscate the read value of MSR_IA32_TSC by adding a boot-time random value;
* Minor coding style/comments adjustment;

Changes from v5:
* Remove the common IRQ disable flag; instead disable IRQs when another MSR
  read is followed by an MSR_IA32_TSC read;
* Add comments for the special handling of MSR_IA32_TSC;

Changes from v4:
* Make the counter read and timestamp read atomic by disabling IRQs;
* Treat MSR_IA32_TSC as a special case and return NOW() for read path;
* Add MBM description in xl command line.

Changes from v3:
* Get timestamp information from host along with the monitoring counter;
  This is required for counter overflow detection.
* Address comments from Wei on the last patch.

Changes from v2:
* Remove the usage of "static" to cache data in xc;
  NOTE: Other pre-existing places are not touched because fixing them
  needs an API change. Will fix in a separate patch if desirable.
* Coding style;

Changes from v1:
* Move event type check from xc to xl;
* Add retry capability for MBM sampling;
* Fix Coding style/docs;

The hypervisor part of this series is already in; this contains only the
tools-side changes.

Intel Memory Bandwidth Monitoring (MBM) is a new hardware feature
which builds on the CMT infrastructure to allow monitoring of system
memory bandwidth. Event codes are provided to monitor both "total"
and "local" bandwidth, meaning bandwidth over QPI and other external
links can be monitored.

For Xen, MBM is used to monitor memory bandwidth for VMs. Due to its
dependency on CMT, the software also makes use of most of the CMT code.
Actually, besides introducing two additional events and some cpuid
feature bits, there are no extra changes compared to cache occupancy
monitoring in CMT. Due to this, CMT should be enabled first to use
this feature.

For interface changes, the patch series introduces a new command
"XEN_SYSCTL_PSR_CMT_get_l3_event_mask" which exposes MBM feature
capability to user space, and modifies "resource_op" to support reading
host system time together with the monitored counter.
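
To illustrate why the timestamp is read together with the counter, here is a
minimal sketch of turning two samples into a bandwidth figure. Everything in
it is an assumption for illustration: demo_sample and demo_bandwidth are
hypothetical names (not the libxc/libxl interface), the counter width of 24
bits is assumed, and 'upscale' stands for the bytes-per-count factor the
hardware reports. The masked subtraction only copes with a single wrap, so the
elapsed time is what tells a caller whether the interval was short enough to
trust the result.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sample: raw monitoring counter plus host time in ns. */
struct demo_sample {
    uint64_t counter;
    uint64_t time_ns;
};

#define DEMO_COUNTER_BITS 24   /* assumed counter width */
#define DEMO_COUNTER_MASK ((UINT64_C(1) << DEMO_COUNTER_BITS) - 1)

/*
 * Returns bytes per second between two samples, or 0 if time did not
 * advance. 'upscale' is the assumed bytes-per-count conversion factor.
 */
static double demo_bandwidth(struct demo_sample earlier,
                             struct demo_sample later,
                             uint64_t upscale)
{
    uint64_t delta, dt;

    if (later.time_ns <= earlier.time_ns)
        return 0.0;
    dt = later.time_ns - earlier.time_ns;

    /* Modular subtraction: still correct if the counter wrapped once. */
    delta = (later.counter - earlier.counter) & DEMO_COUNTER_MASK;

    return (double)(delta * upscale) * 1e9 / (double)dt;
}
```

For example, samples of 100 and 300 counts taken one second apart with an
upscale factor of 64 give 12800 bytes per second.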

On the tool stack side, two additional options are introduced for
"xl psr-cmt-show":
total_mem_bandwidth: Show total memory bandwidth
local_mem_bandwidth: Show local memory bandwidth

The usage flow stays the same as with CMT.


Chao Peng (4):
  tools: correct coding style for psr
  tools/libxc: code refactoring in xc_psr_cmt_get_data
  tools/libxl: code refactoring for MBM
  tools, docs: add total/local memory bandwith monitoring

 docs/man/xl.pod.1                   |  11 +++-
 docs/misc/xen-command-line.markdown |   3 +
 tools/libxc/include/xenctrl.h       |  14 +++--
 tools/libxc/xc_msr_x86.h            |   1 +
 tools/libxc/xc_psr.c                |  76 ++--
 tools/libxl/libxl.h                 |  28 +++--
 tools/libxl/libxl_psr.c             |  59 +++
 tools/libxl/libxl_types.idl         |   2 +
 tools/libxl/xl_cmdimpl.c            | 113 +---
 tools/libxl/xl_cmdtable.c           |   4 +-
 10 files changed, 253 insertions(+), 58 deletions(-)

-- 
1.9.1




Re: [Xen-devel] [PATCH v2] RFC: Automatically check xen's public headers for C++ pitfalls.

2015-02-26 Thread Tim Deegan
At 16:11 + on 26 Feb (1424963496), Tim Deegan wrote:
> Add a check, like the existing check for non-ANSI C in the public
> headers, that runs the public headers through a C++ compiler to
> flag non-C++-friendly constructs.

Oops, this still has the EFI changes in it.  v3, rebased, is on its way.

> Unlike the ANSI C check, we accept GCC-isms (gnu++98), and we also
> check various tools-only headers.
> 
> Explicitly _not_ addressing the use of 'private' in various fields,
> since we'd previously decided not to fix that.

BTW, ring.h is the only instance of that, so the extra diff to clear
that up too is pretty small (see below).

Not sure what people think about that though - it might be
quite a PITA for downstream users of it, though they ought really to
be using local copies so they can update in a controlled way.

diff --git a/xen/include/Makefile b/xen/include/Makefile
index d48a642..c7a1d52 100644
--- a/xen/include/Makefile
+++ b/xen/include/Makefile
@@ -104,8 +104,7 @@ headers.chk: $(PUBLIC_ANSI_HEADERS) Makefile
 headers++.chk: $(PUBLIC_HEADERS) Makefile
if $(CXX) -v >/dev/null 2>&1; then \
for i in $(filter %.h,$^); do \
-   $(CXX) -x c++ -std=gnu++98 -Wall -Werror \
-  -D__XEN_TOOLS__ -Dprivate=private_is_a_keyword_in_cpp \
+   $(CXX) -x c++ -std=gnu++98 -Wall -Werror -D__XEN_TOOLS__ \
   -include stdint.h -include public/xen.h \
   -S -o /dev/null $$i || exit 1; \
echo $$i; \
diff --git a/xen/include/public/io/ring.h b/xen/include/public/io/ring.h
index 73e13d7..bb13494 100644
--- a/xen/include/public/io/ring.h
+++ b/xen/include/public/io/ring.h
@@ -111,7 +111,7 @@ struct __name##_sring { \
 uint8_t msg;\
 } tapif_user;   \
 uint8_t pvt_pad[4]; \
-} private;  \
+} local;\
 uint8_t __pad[44];  \
 union __name##_sring_entry ring[1]; /* variable-length */   \
 };  \
@@ -156,7 +156,7 @@ typedef struct __name##_back_ring __name##_back_ring_t
 #define SHARED_RING_INIT(_s) do {   \
 (_s)->req_prod  = (_s)->rsp_prod  = 0;  \
 (_s)->req_event = (_s)->rsp_event = 1;  \
-(void)memset((_s)->private.pvt_pad, 0, sizeof((_s)->private.pvt_pad)); \
+(void)memset((_s)->local.pvt_pad, 0, sizeof((_s)->local.pvt_pad));  \
 (void)memset((_s)->__pad, 0, sizeof((_s)->__pad));  \
 } while(0)
 





[Xen-devel] [PATCH v6 16/23] libxl: disallow memory relocation when vNUMA is enabled

2015-02-26 Thread Wei Liu
Disallow memory relocation when vNUMA is enabled, because relocated
memory ends up off node. Furthermore, even if we dynamically expand
node coverage in hvmloader, low memory and high memory may reside
in different physical nodes, so blindly relocating low memory to high
memory gives us a sub-optimal configuration.

Introduce a function called libxl__vnuma_configured and use it.
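
The resulting condition can be sketched in isolation as follows. The demo_*
types and functions are hypothetical stand-ins for the libxl structures,
reduced to the two fields the check depends on; only the shape of the
predicate is taken from the diff below.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the libxl device model version enum. */
enum demo_dm_version {
    DEMO_DM_QEMU_XEN_TRADITIONAL,
    DEMO_DM_QEMU_XEN
};

/* Hypothetical, cut-down build info: only the fields the check needs. */
struct demo_build_info {
    enum demo_dm_version device_model_version;
    unsigned int num_vnuma_nodes;
};

/* vNUMA counts as configured as soon as at least one vnode is defined. */
static bool demo_vnuma_configured(const struct demo_build_info *b_info)
{
    return b_info->num_vnuma_nodes != 0;
}

/*
 * Value written to hvmloader/allow-memory-relocate: 1 only for
 * qemu-traditional without vNUMA, 0 in every other case.
 */
static int demo_allow_memory_relocate(const struct demo_build_info *b_info)
{
    return b_info->device_model_version == DEMO_DM_QEMU_XEN_TRADITIONAL &&
           !demo_vnuma_configured(b_info);
}
```

In other words, configuring even a single vnode turns relocation off, whatever
the device model.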

Signed-off-by: Wei Liu 
Cc: Ian Campbell 
Cc: Ian Jackson 
Cc: Konrad Wilk 
---
Changes in v6:
1. Introduce a helper function.
---
 tools/libxl/libxl_dm.c   | 6 --
 tools/libxl/libxl_internal.h | 1 +
 tools/libxl/libxl_vnuma.c| 5 +
 3 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c
index 8599a6a..7b09512 100644
--- a/tools/libxl/libxl_dm.c
+++ b/tools/libxl/libxl_dm.c
@@ -1365,13 +1365,15 @@ void libxl__spawn_local_dm(libxl__egc *egc, libxl__dm_spawn_state *dmss)
 libxl__sprintf(gc, "%s/hvmloader/bios", path),
 "%s", libxl_bios_type_to_string(b_info->u.hvm.bios));
 /* Disable relocating memory to make the MMIO hole larger
- * unless we're running qemu-traditional */
+ * unless we're running qemu-traditional and vNUMA is not
+ * configured. */
 libxl__xs_write(gc, XBT_NULL,
 libxl__sprintf(gc,
"%s/hvmloader/allow-memory-relocate",
path),
 "%d",
-b_info->device_model_version==LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN_TRADITIONAL);
+b_info->device_model_version==LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN_TRADITIONAL &&
+!libxl__vnuma_configured(b_info));
 free(path);
 }
 
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index e93089a..d04b6aa 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -3413,6 +3413,7 @@ int libxl__vnuma_build_vmemrange_hvm(libxl__gc *gc,
  libxl_domain_build_info *b_info,
  libxl__domain_build_state *state,
  struct xc_hvm_build_args *args);
+bool libxl__vnuma_configured(const libxl_domain_build_info *b_info);
 
 _hidden int libxl__ms_vm_genid_set(libxl__gc *gc, uint32_t domid,
const libxl_ms_vm_genid *id);
diff --git a/tools/libxl/libxl_vnuma.c b/tools/libxl/libxl_vnuma.c
index a0576ee..6af3cde 100644
--- a/tools/libxl/libxl_vnuma.c
+++ b/tools/libxl/libxl_vnuma.c
@@ -17,6 +17,11 @@
 #include "libxl_arch.h"
 #include 
 
+bool libxl__vnuma_configured(const libxl_domain_build_info *b_info)
+{
+return b_info->num_vnuma_nodes != 0;
+}
+
 /* Sort vmemranges in ascending order with "start" */
 static int compare_vmemrange(const void *a, const void *b)
 {
-- 
1.9.1




Re: [Xen-devel] [PATCH] correct mis-conversion set_bit() -> __cpumask_set_cpu() by 4aaca0e9cd

2015-02-26 Thread Sander Eikelenboom

Monday, February 23, 2015, 12:06:00 PM, you wrote:

> I have no idea how I came to use __cpumask_set_cpu() there, the
> conversion should have been set_bit() -> __set_bit(). The wrong
> construct results in problems on systems with relatively few CPUs.

> Reported-by: Sander Eikelenboom 
> Signed-off-by: Jan Beulich 

> --- a/xen/common/softirq.c
> +++ b/xen/common/softirq.c
> @@ -106,7 +106,7 @@ void cpu_raise_softirq(unsigned int cpu,
>  if ( !per_cpu(batching, this_cpu) || in_irq() )
>  smp_send_event_check_cpu(cpu);
>  else
> -__cpumask_set_cpu(nr, &per_cpu(batch_mask, this_cpu));
> +__set_bit(nr, &per_cpu(batch_mask, this_cpu));
>  }
>  
>  void cpu_raise_softirq_batch_begin(void)

Hi Jan,

Any reason this hasn't been applied to staging yet?

--
Sander





[Xen-devel] [PATCH v6 09/23] libxl: introduce libxl__vnuma_config_check

2015-02-26 Thread Wei Liu
This function is used to check whether vNUMA configuration (be it
auto-generated or supplied by user) is valid.

Define a new error code ERROR_VNUMA_CONFIG_INVALID.

The checks performed can be found in the comment of the function.

This vNUMA function (and future ones) is placed in a new file called
libxl_vnuma.c

Signed-off-by: Wei Liu 
Cc: Ian Campbell 
Cc: Ian Jackson 
Cc: Dario Faggioli 
Cc: Elena Ufimtseva 
---
Changes in v6:
1. Address comments from Andrew.
2. Check vdistances.
3. use libxl_numainfo_list_free.
4. Change p to v.

Changes in v5:
1. Define and use new error code.
2. Use LOG macro.
3. Fix hard tabs.

Changes in v4:
1. Adapt to new interface.

Changes in v3:
1. Rewrite commit log.
2. Shorten two error messages.
---
 tools/libxl/Makefile |   2 +-
 tools/libxl/libxl_internal.h |   7 ++
 tools/libxl/libxl_types.idl  |   1 +
 tools/libxl/libxl_vnuma.c| 151 +++
 4 files changed, 160 insertions(+), 1 deletion(-)
 create mode 100644 tools/libxl/libxl_vnuma.c

diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile
index 7329521..1b16598 100644
--- a/tools/libxl/Makefile
+++ b/tools/libxl/Makefile
@@ -93,7 +93,7 @@ LIBXL_LIBS += -lyajl
 LIBXL_OBJS = flexarray.o libxl.o libxl_create.o libxl_dm.o libxl_pci.o \
libxl_dom.o libxl_exec.o libxl_xshelp.o libxl_device.o \
libxl_internal.o libxl_utils.o libxl_uuid.o \
-   libxl_json.o libxl_aoutils.o libxl_numa.o \
-   libxl_json.o libxl_aoutils.o libxl_numa.o \
+   libxl_json.o libxl_aoutils.o libxl_numa.o libxl_vnuma.o \
libxl_save_callout.o _libxl_save_msgs_callout.o \
libxl_qmp.o libxl_event.o libxl_fork.o $(LIBXL_OBJS-y)
 LIBXL_OBJS += libxl_genid.o
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 6d3ac58..258be0d 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -3394,6 +3394,13 @@ void libxl__numa_candidate_put_nodemap(libxl__gc *gc,
 libxl_bitmap_copy(CTX, &cndt->nodemap, nodemap);
 }
 
+/* Check if vNUMA config is valid. Returns 0 if valid,
+ * ERROR_VNUMA_CONFIG_INVALID otherwise.
+ */
+int libxl__vnuma_config_check(libxl__gc *gc,
+  const libxl_domain_build_info *b_info,
+  const libxl__domain_build_state *state);
+
 _hidden int libxl__ms_vm_genid_set(libxl__gc *gc, uint32_t domid,
const libxl_ms_vm_genid *id);
 
diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 14c7e7c..23951fc 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -63,6 +63,7 @@ libxl_error = Enumeration("error", [
 (-17, "DEVICE_EXISTS"),
 (-18, "REMUS_DEVOPS_DOES_NOT_MATCH"),
 (-19, "REMUS_DEVICE_NOT_SUPPORTED"),
+(-20, "VNUMA_CONFIG_INVALID"),
 ], value_namespace = "")
 
 libxl_domain_type = Enumeration("domain_type", [
diff --git a/tools/libxl/libxl_vnuma.c b/tools/libxl/libxl_vnuma.c
new file mode 100644
index 000..33d7a3c
--- /dev/null
+++ b/tools/libxl/libxl_vnuma.c
@@ -0,0 +1,151 @@
+/*
+ * Copyright (C) 2014  Citrix Ltd.
+ * Author Wei Liu 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU Lesser General Public License as published
+ * by the Free Software Foundation; version 2.1 only. with the special
+ * exception on linking described in file LICENSE.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU Lesser General Public License for more details.
+ */
+#include "libxl_osdeps.h" /* must come before any other headers */
+#include "libxl_internal.h"
+#include 
+
+/* Sort vmemranges in ascending order with "start" */
+static int compare_vmemrange(const void *a, const void *b)
+{
+const xen_vmemrange_t *x = a, *y = b;
+if (x->start < y->start)
+return -1;
+if (x->start > y->start)
+return 1;
+return 0;
+}
+
+/* Check if vNUMA configuration is valid:
+ *  1. all pnodes inside vnode_to_pnode array are valid
+ *  2. each vcpu belongs to one and only one vnode
+ *  3. each vmemrange is valid and doesn't overlap with any other
+ *  4. local distance cannot be larger than remote distance
+ */
+int libxl__vnuma_config_check(libxl__gc *gc,
+  const libxl_domain_build_info *b_info,
+  const libxl__domain_build_state *state)
+{
+int nr_nodes = 0, rc = ERROR_VNUMA_CONFIG_INVALID;
+unsigned int i, j;
+libxl_numainfo *ninfo = NULL;
+uint64_t total_memkb = 0;
+libxl_bitmap cpumap;
+libxl_vnode_info *v;
+
+libxl_bitmap_init(&cpumap);
+
+/* Check pnode specified is valid */
+ninfo = libxl_get_numainfo(CTX, &nr_nodes);
+if (!ninfo) {
+LOG(ERROR, "libxl_get_numain

Re: [Xen-devel] [PATCH v2] RFC: Automatically check xen's public headers for C++ pitfalls.

2015-02-26 Thread David Vrabel
On 26/02/15 16:28, Tim Deegan wrote:
> 
> BTW, ring.h is the only instance of that, so the extra diff to clear
> that up too is pretty small (see below).
> 
> Not sure what people think about that though - it might be
> quite a PITA for downstream users of it, though they ought really to
> be using local copies so they can update in a controlled way.

With my linux maintainer hat on, this is fine by me.

David



[Xen-devel] [PATCH v6 05/23] libxc: add p2m_size to xc_dom_image

2015-02-26 Thread Wei Liu
Add a new field p2m_size to keep track of the number of pages covered by
p2m.  Change total_pages to p2m_size in functions which in fact need
the size of p2m.

This is needed because we are going to ditch the assumption that PV x86
has only one contiguous RAM region. Originally the p2m size was always
equal to total_pages, but we will change that in a later patch.

This patch doesn't change the behaviour of libxc.
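
To see why the two quantities can diverge, here is a minimal sketch under the assumption that the p2m is a flat array indexed by pfn and therefore has to reach the highest populated pfn. The types and helpers (toy_region, toy_total_pages, toy_p2m_size) are hypothetical and not part of libxc:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12

/* Hypothetical RAM region in byte addresses, covering [start, end). */
struct toy_region {
    uint64_t start;
    uint64_t end;
};

/* Number of pages actually backed by RAM. */
static uint64_t toy_total_pages(const struct toy_region *r, size_t n)
{
    uint64_t total = 0;
    size_t i;

    for (i = 0; i < n; i++)
        total += (r[i].end - r[i].start) >> TOY_PAGE_SHIFT;
    return total;
}

/* Number of pfns a flat, pfn-indexed table must cover: holes count too. */
static uint64_t toy_p2m_size(const struct toy_region *r, size_t n)
{
    uint64_t size = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        uint64_t end_pfn = r[i].end >> TOY_PAGE_SHIFT;

        if (end_pfn > size)
            size = end_pfn;
    }
    return size;
}
```

With a single region starting at 0 both helpers return the same value, which corresponds to the situation before the later patches; with a hole between regions the table size exceeds the page count.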

Signed-off-by: Wei Liu 
Reviewed-by: Dario Faggioli 
Cc: Ian Campbell 
Cc: Ian Jackson 
---
 tools/libxc/include/xc_dom.h |  1 +
 tools/libxc/xc_dom_arm.c |  1 +
 tools/libxc/xc_dom_core.c|  8 
 tools/libxc/xc_dom_x86.c | 19 +++
 4 files changed, 17 insertions(+), 12 deletions(-)

diff --git a/tools/libxc/include/xc_dom.h b/tools/libxc/include/xc_dom.h
index 07d7224..6b8ddf4 100644
--- a/tools/libxc/include/xc_dom.h
+++ b/tools/libxc/include/xc_dom.h
@@ -129,6 +129,7 @@ struct xc_dom_image {
  */
 xen_pfn_t rambase_pfn;
 xen_pfn_t total_pages;
+xen_pfn_t p2m_size; /* number of pfns covered by p2m */
 struct xc_dom_phys *phys_pages;
 int realmodearea_log;
 #if defined (__arm__) || defined(__aarch64__)
diff --git a/tools/libxc/xc_dom_arm.c b/tools/libxc/xc_dom_arm.c
index c7feca7..b9fa66d 100644
--- a/tools/libxc/xc_dom_arm.c
+++ b/tools/libxc/xc_dom_arm.c
@@ -449,6 +449,7 @@ int arch_setup_meminit(struct xc_dom_image *dom)
 assert(dom->rambank_size[0] != 0);
 assert(ramsize == 0); /* Too much RAM is rejected above */
 
+dom->p2m_size = p2m_size;
 dom->p2m_host = xc_dom_malloc(dom, sizeof(xen_pfn_t) * p2m_size);
 if ( dom->p2m_host == NULL )
 return -EINVAL;
diff --git a/tools/libxc/xc_dom_core.c b/tools/libxc/xc_dom_core.c
index ecbf981..b100ce1 100644
--- a/tools/libxc/xc_dom_core.c
+++ b/tools/libxc/xc_dom_core.c
@@ -931,9 +931,9 @@ int xc_dom_update_guest_p2m(struct xc_dom_image *dom)
 {
 case 4:
 DOMPRINTF("%s: dst 32bit, pages 0x%" PRIpfn "",
-  __FUNCTION__, dom->total_pages);
+  __FUNCTION__, dom->p2m_size);
 p2m_32 = dom->p2m_guest;
-for ( i = 0; i < dom->total_pages; i++ )
+for ( i = 0; i < dom->p2m_size; i++ )
 if ( dom->p2m_host[i] != INVALID_P2M_ENTRY )
 p2m_32[i] = dom->p2m_host[i];
 else
@@ -941,9 +941,9 @@ int xc_dom_update_guest_p2m(struct xc_dom_image *dom)
 break;
 case 8:
 DOMPRINTF("%s: dst 64bit, pages 0x%" PRIpfn "",
-  __FUNCTION__, dom->total_pages);
+  __FUNCTION__, dom->p2m_size);
 p2m_64 = dom->p2m_guest;
-for ( i = 0; i < dom->total_pages; i++ )
+for ( i = 0; i < dom->p2m_size; i++ )
 if ( dom->p2m_host[i] != INVALID_P2M_ENTRY )
 p2m_64[i] = dom->p2m_host[i];
 else
diff --git a/tools/libxc/xc_dom_x86.c b/tools/libxc/xc_dom_x86.c
index 9dbaedb..bea54f2 100644
--- a/tools/libxc/xc_dom_x86.c
+++ b/tools/libxc/xc_dom_x86.c
@@ -122,11 +122,11 @@ static int count_pgtables(struct xc_dom_image *dom, int pae,
 
 try_pfn_end = (try_virt_end - dom->parms.virt_base) >> PAGE_SHIFT_X86;
 
-if ( try_pfn_end > dom->total_pages )
+if ( try_pfn_end > dom->p2m_size )
 {
 xc_dom_panic(dom->xch, XC_OUT_OF_MEMORY,
  "%s: not enough memory for initial mapping (%#"PRIpfn" > %#"PRIpfn")",
- __FUNCTION__, try_pfn_end, dom->total_pages);
+ __FUNCTION__, try_pfn_end, dom->p2m_size);
 return -ENOMEM;
 }
 
@@ -440,10 +440,11 @@ pfn_error:
 
 static int alloc_magic_pages(struct xc_dom_image *dom)
 {
-size_t p2m_size = dom->total_pages * dom->arch_hooks->sizeof_pfn;
+size_t p2m_alloc_size = dom->p2m_size * dom->arch_hooks->sizeof_pfn;
 
 /* allocate phys2mach table */
-if ( xc_dom_alloc_segment(dom, &dom->p2m_seg, "phys2mach", 0, p2m_size) )
+if ( xc_dom_alloc_segment(dom, &dom->p2m_seg, "phys2mach",
+  0, p2m_alloc_size) )
 return -1;
 dom->p2m_guest = xc_dom_seg_to_ptr(dom, &dom->p2m_seg);
 if ( dom->p2m_guest == NULL )
@@ -777,8 +778,9 @@ int arch_setup_meminit(struct xc_dom_image *dom)
 int count = dom->total_pages >> SUPERPAGE_PFN_SHIFT;
 xen_pfn_t extents[count];
 
+dom->p2m_size = dom->total_pages;
 dom->p2m_host = xc_dom_malloc(dom, sizeof(xen_pfn_t) *
-  dom->total_pages);
+  dom->p2m_size);
 if ( dom->p2m_host == NULL )
 return -EINVAL;
 
@@ -810,8 +812,9 @@ int arch_setup_meminit(struct xc_dom_image *dom)
 return rc;
 }
 /* setup initial p2m */
+dom->p2m_size = dom->total_pages;
 dom->p2m_host = xc_dom_malloc(dom, sizeof(xen_pfn_t) *
-  dom->total_pages);
+  

Re: [Xen-devel] Shared page tables between EPT and IOMMU issue

2015-02-26 Thread Jan Beulich
>>> On 26.02.15 at 16:45,  wrote:
> While testing PVH Dom0 support on a newer Core i3-5010U I've found that 
> sharing the page tables between EPT and the IOMMUs doesn't work. Booting 
> with iommu=no-sharept solves the problem, but I'm unsure what causes 
> this issue.

Is FreeBSD fiddling with its own memory map in some way? It's rather
surprising to see not just an occasional fault, but many of them, and
with L2 or even L3 entries not present. I.e. if it's not the OS
requesting re-arrangements, I would suppose table setup itself is
screwed up in some way. In the end - knowing the valid GFN range
for the guest - you may want to monitor/log how tables get created
and whether (and if so by whom) later some of the entries get
zapped.

Jan




Re: [Xen-devel] [PATCH v2] RFC: Automatically check xen's public headers for C++ pitfalls.

2015-02-26 Thread Jan Beulich
>>> On 26.02.15 at 17:28,  wrote:
> At 16:11 + on 26 Feb (1424963496), Tim Deegan wrote:
>> Explicitly _not_ addressing the use of 'private' in various fields,
>> since we'd previously decided not to fix that.
> 
> BTW, ring.h is the only instance of that, so the extra diff to clear
> that up too is pretty small (see below).
> 
> Not sure what people think about that though - it might be
> quite a PITA for downstream users of it, though they ought really to
> be using local copies so they can update in a controlled way.

linux-2.6.18-xen.hg always having consumed them (almost)
verbatim, I don't think we should break users not massaging
the headers. I.e. at least make the field name conditional upon
using C vs C++.

Jan
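
A minimal sketch of what "conditional upon using C vs C++" could look like. The macro TOY_PRIVATE_FIELD, the alternative name pvt, and struct toy_sring are all hypothetical and are not the actual ring.h definitions:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: pick a field name the current language accepts. */
#ifdef __cplusplus
#define TOY_PRIVATE_FIELD pvt      /* 'private' is a keyword in C++ */
#else
#define TOY_PRIVATE_FIELD private  /* existing C users keep the old name */
#endif

/* Hypothetical shared-ring header with one private-use field. */
struct toy_sring {
    uint32_t req_prod;
    uint32_t rsp_prod;
    uint8_t TOY_PRIVATE_FIELD[4];
};
```

The layout is identical either way, so only the spelling seen by the compiler changes; C consumers that take the headers verbatim would keep building unmodified.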




[Xen-devel] [PATCH] Config.mk: update OVMF revision

2015-02-26 Thread Wei Liu
Update OVMF revision to the latest tested commit.

Signed-off-by: Wei Liu 
Cc: Ian Campbell 
Cc: Ian Jackson 
Cc: Anthony Perard 
---
Before applying this patch, please pull from

  git://xenbits.xen.org/osstest/ovmf.git xen-tested-master

and push all changes to

  git://xenbits.xen.org/ovmf.git master

It should be a fast-forward push.
---
 Config.mk | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Config.mk b/Config.mk
index d12ad91..173c2f7 100644
--- a/Config.mk
+++ b/Config.mk
@@ -251,7 +251,7 @@ QEMU_UPSTREAM_URL ?= git://xenbits.xen.org/qemu-upstream-unstable.git
 QEMU_TRADITIONAL_URL ?= git://xenbits.xen.org/qemu-xen-unstable.git
 SEABIOS_UPSTREAM_URL ?= git://xenbits.xen.org/seabios.git
 endif
-OVMF_UPSTREAM_REVISION ?= 447d264115c476142f884af0be287622cd244423
+OVMF_UPSTREAM_REVISION ?= a065efc7c7ce8bb3e5cb3e463099d023d4a92927
 QEMU_UPSTREAM_REVISION ?= master
 SEABIOS_UPSTREAM_REVISION ?= rel-1.7.5
 # Thu May 22 16:59:16 2014 -0400
-- 
1.9.1




Re: [Xen-devel] Branch Trace Storage for guests and VPMU initialization

2015-02-26 Thread Boris Ostrovsky

On 02/26/2015 03:56 AM, Dietmar Hahn wrote:

On Wednesday, 25 February 2015, 11:31:31, Boris Ostrovsky wrote:

On 02/25/2015 10:12 AM, kevin.ma...@gdata.de wrote:

-----Original Message-----
From: Boris Ostrovsky [mailto:boris.ostrov...@oracle.com]
Sent: Tuesday, 24 February 2015 18:13
To: Mayer, Kevin; xen-devel@lists.xen.org
Subject: Re: [Xen-devel] Branch Trace Storage for guests and VPMU
initialization

On 02/24/2015 10:27 AM, kevin.ma...@gdata.de wrote:

Hi guys

I'm trying to set up the BTS so that I can log the branches taken in
the guest using Xen 4.4.1 with a WinXP SP3 guest on a Core i7 Sandy
Bridge.

I added the vpmu=bts boot parameter to my grub2 configuration and
extended libxl, libxc, domctl, … with a command of my own so that I can
trigger the activation of the BTS whenever I want.


I am not sure why you are doing all these changes to Xen code. BTS is
supposed to be managed from the guest. For example, a Fedora HVM guest
will produce this:

[root@dhcp-burlington7-2nd-B-east-10-152-55-140 ~]# perf record -e
branches:u -c 1 -d sleep 1 [ perf record: Woken up 3838 times to write data ] [
perf record: Captured and wrote 0.704 MB perf.data (~30756 samples) ]
[root@dhcp-burlington7-2nd-B-east-10-152-55-140 ~]# perf script -f
ip,addr,sym,dso,symoff --show-kernel-path
8167c347 native_irq_return_iret+0x0 (/proc/kcore) =>
328c001590 [unknown] (/proc/kcore)
8167c347 native_irq_return_iret+0x0 (/proc/kcore) =>
328c001590 [unknown] ([unknown])
  328c001593 [unknown] ([unknown]) =>   328c004b70 [unknown]
([unknown])
...


I want to be able to log the taken branches (of the guest) without the need to 
modify the guest at all.
This means I have to do all the logic in the hypervisor, or am I wrong?

In that case, yes. But then you have to make sure that at least
   * you don't load guest's VPMU (or, at least, BTS-related registers) on
context switch

But you need to modify PMU registers when switching to/from the guest context
to get PMU running.




I was thinking that all BTS stuff can be controlled from dom0 and so we 
can use dom0's version of these registers. I didn't realize that DS_AREA 
would have to be accessed in guest's address space (and that DEBUGCTL is 
loaded from VMCS).


Which is what I think I said in response to this message (which didn't 
show up on the list because Kevin accidentally dropped xen-devel).


-boris




I didn't think of using the VPMU stuff with modifying the context from outside
the guest.


   * You don't send the interrupt to the guest (meaning that you will
need to somehow inform dom0 of the BTS interrupt)

and probably more.

Essentially, you want dom0 to profile the guest. I have been working on
patches that would allow that but they are still under review.



In this command I do the following:

I set up the memory region for the BTS Buffer and the DS Buffer
Management Area using xzalloc_bytes


I don't think you should be allocating BTS buffers in the hypervisor, they are
in guest's memory.

I agree. As I said I think this is where my main problem is at the moment.
Is there any way I can allocate memory in the hypervisor in a way the guest can 
access it?

I am not sure this is what you want since you seem to *not* want the
guest to process the samples, right?

But yes, you can. E.g. something like what map_vcpu_info() does. (I have
no idea how you'd do this from Windows.)

The DS buffer has to be mapped within the guest's address space so the CPU
running in guest context can access this area. Otherwise you get this
triple fault.
So I would think you need a mixture of writing some stuff in Windows and
patching the hypervisor.

Dietmar.




Of course the guest must not be able to use this memory in its normal 
operations but just for BTS.
Is this even possible? I am rather confused at the moment. :-D


Then I write the pointer to the BTS Buffer into the DS Buffer
Management Area at +0x0 and +0x8 (BTS Buffer Base and BTS Index)
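
A minimal sketch of the layout being described, assuming the 64-bit DS save area format and 24-byte BTS records as documented in the Intel SDM (worth verifying there before relying on it). The struct and function names are hypothetical, and every address stored must be a linear address valid in the context the CPU runs in when it writes the records:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* First four fields of the (assumed) 64-bit DS buffer management area. */
struct toy_ds_area {
    uint64_t bts_buffer_base;         /* +0x00 */
    uint64_t bts_index;               /* +0x08: next record to be written */
    uint64_t bts_absolute_maximum;    /* +0x10: first byte past the buffer */
    uint64_t bts_interrupt_threshold; /* +0x18 */
};

/* One (assumed) 64-bit BTS record: branch source, target, flags. */
struct toy_bts_record {
    uint64_t from;
    uint64_t to;
    uint64_t flags;
};

/* Fill in the BTS part of the area for a buffer of nr_records entries,
 * leaving 'headroom' records between the threshold and the buffer end. */
static void toy_ds_init(struct toy_ds_area *ds, uint64_t buf,
                        uint64_t nr_records, uint64_t headroom)
{
    ds->bts_buffer_base = buf;
    ds->bts_index = buf;
    ds->bts_absolute_maximum =
        buf + nr_records * sizeof(struct toy_bts_record);
    ds->bts_interrupt_threshold =
        ds->bts_absolute_maximum - headroom * sizeof(struct toy_bts_record);
}
```

Writing only the base and index, as described above, would leave the maximum and threshold fields at zero in this model.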

When I use vmx_msr_write_intercept to store the value in
MSR_IA32_DS_AREA the host reboots (my guess is that it tries to access a
vpmu struct that isn't there in the current vcpu and panics).


Who is trying to write to MSR_IA32_DS_AREA? The guest or dom0? I thought
you said that you want dom0 to do sampling. Or are you trying to setup
DS area from your guest and control it from dom0? I am somewhat confused.


Can you post hypervisor log? (hard to say how helpful it will be without
seeing your code changes though)


Right after enabling the BTS I get a triple fault.
hvm.c:1357:d2 Triple fault on VCPU0 - invoking HVM shutdown action 1.


That's not host reboot, this is your guest dying.



When I use a modified version of vmx_msr_write_intercept I don’t get
any crashes as long as I don’t enable BTS and TR in the
GUEST_IA32_DEBUGCTL (BTR works). When I enable the BTS (and TR) the
guest crashes. I suppose it gets killed by the hypervisor for
accessing forbidden memory.


Possibly because DS area point to hyperv

[Xen-devel] [PATCH v6 13/23] libxc: indentation change to xc_hvm_build_x86.c

2015-02-26 Thread Wei Liu
Move a while loop in xc_hvm_build_x86 one block to the right. No
functional change introduced.

Functional changes will be introduced in next patch.

Signed-off-by: Wei Liu 
Cc: Ian Campbell 
Cc: Ian Jackson 
Cc: Dario Faggioli 
Cc: Elena Ufimtseva 
Acked-by: Ian Campbell 
---
 tools/libxc/xc_hvm_build_x86.c | 153 ++---
 1 file changed, 81 insertions(+), 72 deletions(-)

diff --git a/tools/libxc/xc_hvm_build_x86.c b/tools/libxc/xc_hvm_build_x86.c
index c81a25b..ecc3224 100644
--- a/tools/libxc/xc_hvm_build_x86.c
+++ b/tools/libxc/xc_hvm_build_x86.c
@@ -353,98 +353,107 @@ static int setup_guest(xc_interface *xch,
 cur_pages = 0xc0;
 stat_normal_pages = 0xc0;
 
-while ( (rc == 0) && (nr_pages > cur_pages) )
 {
-/* Clip count to maximum 1GB extent. */
-unsigned long count = nr_pages - cur_pages;
-unsigned long max_pages = SUPERPAGE_1GB_NR_PFNS;
-
-if ( count > max_pages )
-count = max_pages;
-
-cur_pfn = page_array[cur_pages];
-
-/* Take care the corner cases of super page tails */
-if ( ((cur_pfn & (SUPERPAGE_1GB_NR_PFNS-1)) != 0) &&
- (count > (-cur_pfn & (SUPERPAGE_1GB_NR_PFNS-1))) )
-count = -cur_pfn & (SUPERPAGE_1GB_NR_PFNS-1);
-else if ( ((count & (SUPERPAGE_1GB_NR_PFNS-1)) != 0) &&
-  (count > SUPERPAGE_1GB_NR_PFNS) )
-count &= ~(SUPERPAGE_1GB_NR_PFNS - 1);
-
-/* Attemp to allocate 1GB super page. Because in each pass we only
- * allocate at most 1GB, we don't have to clip super page boundaries.
- */
-if ( ((count | cur_pfn) & (SUPERPAGE_1GB_NR_PFNS - 1)) == 0 &&
- /* Check if there exists MMIO hole in the 1GB memory range */
- !check_mmio_hole(cur_pfn << PAGE_SHIFT,
-  SUPERPAGE_1GB_NR_PFNS << PAGE_SHIFT,
-  mmio_start, mmio_size) )
+while ( (rc == 0) && (nr_pages > cur_pages) )
 {
-long done;
-unsigned long nr_extents = count >> SUPERPAGE_1GB_SHIFT;
-xen_pfn_t sp_extents[nr_extents];
-
-for ( i = 0; i < nr_extents; i++ )
-sp_extents[i] = page_array[cur_pages+(i< 0 )
-{
-stat_1gb_pages += done;
-done <<= SUPERPAGE_1GB_SHIFT;
-cur_pages += done;
-count -= done;
-}
-}
+/* Clip count to maximum 1GB extent. */
+unsigned long count = nr_pages - cur_pages;
+unsigned long max_pages = SUPERPAGE_1GB_NR_PFNS;
 
-if ( count != 0 )
-{
-/* Clip count to maximum 8MB extent. */
-max_pages = SUPERPAGE_2MB_NR_PFNS * 4;
 if ( count > max_pages )
 count = max_pages;
-
-/* Clip partial superpage extents to superpage boundaries. */
-if ( ((cur_pfn & (SUPERPAGE_2MB_NR_PFNS-1)) != 0) &&
- (count > (-cur_pfn & (SUPERPAGE_2MB_NR_PFNS-1))) )
-count = -cur_pfn & (SUPERPAGE_2MB_NR_PFNS-1);
-else if ( ((count & (SUPERPAGE_2MB_NR_PFNS-1)) != 0) &&
-  (count > SUPERPAGE_2MB_NR_PFNS) )
-count &= ~(SUPERPAGE_2MB_NR_PFNS - 1); /* clip non-s.p. tail */
-
-/* Attempt to allocate superpage extents. */
-if ( ((count | cur_pfn) & (SUPERPAGE_2MB_NR_PFNS - 1)) == 0 )
+
+cur_pfn = page_array[cur_pages];
+
+/* Take care the corner cases of super page tails */
+if ( ((cur_pfn & (SUPERPAGE_1GB_NR_PFNS-1)) != 0) &&
+ (count > (-cur_pfn & (SUPERPAGE_1GB_NR_PFNS-1))) )
+count = -cur_pfn & (SUPERPAGE_1GB_NR_PFNS-1);
+else if ( ((count & (SUPERPAGE_1GB_NR_PFNS-1)) != 0) &&
+  (count > SUPERPAGE_1GB_NR_PFNS) )
+count &= ~(SUPERPAGE_1GB_NR_PFNS - 1);
+
+/* Attemp to allocate 1GB super page. Because in each pass
+ * we only allocate at most 1GB, we don't have to clip
+ * super page boundaries.
+ */
+if ( ((count | cur_pfn) & (SUPERPAGE_1GB_NR_PFNS - 1)) == 0 &&
+ /* Check if there exists MMIO hole in the 1GB memory
+  * range */
+ !check_mmio_hole(cur_pfn << PAGE_SHIFT,
+  SUPERPAGE_1GB_NR_PFNS << PAGE_SHIFT,
+  mmio_start, mmio_size) )
 {
 long done;
-unsigned long nr_extents = count >> SUPERPAGE_2MB_SHIFT;
+unsigned long nr_extents = count >> SUPERPAGE_1GB_SHIFT;
 xen_pfn_t sp_extents[nr_extents];
 
 for ( i = 0; i < nr_extents; i++ )
-sp_extents[i] = 
page_array[cur_pages+(i< 0 )
 {
-stat_2mb_pages += done;
-

Re: [Xen-devel] [RFC] When to use "domain creation flag" or "HVM param"?

2015-02-26 Thread Tim Deegan
At 15:33 + on 26 Feb (1424961188), Julien Grall wrote:
> Hi,
> 
> On 26/02/15 11:09, Lars Kurth wrote:
> > Tim, Andrew, Jan,
> > it seems as if we are slowly coming to some conclusion on this thread. If
> > I am mistaken, I am wondering whether it would make sense to have an IRC
> > meeting with all the involved stake-holders and report back to the list.
> 
> I'm not sure where I should answer...
> 
> We have a similar problem on ARM where we have arch-specific information
> (GIC version, number of interrupts) which changes between each domain.
> 
> On Xen 4.5, we took the approach of creating a separate DOMCTL for passing
> this information. It has to be called before any VCPU is created
> (DOMCTL_set_max_vcpus), and it makes the code more complicated to handle
> because we have to defer some domain initialization.
> 
> I took another approach for Xen 4.6 based on Jan's suggestion [1]. A v3 has
> been sent recently [2] and we had some discussion about what the best
> approach is.

This line (adding these immutable config options at create time) seems
like a good one to me.

For migration, we'd need a hypercall that lets the Xen tools extract
the correct values to pass to the receiving Xen.  Xen would fill in
the actual values used for anything (like this GIC option) that
was set to 'default' or 'don't care' on the initial create op.

Andrew Cooper had some reasons why we might want to split this into a
bare create op (which might do no more than allocate a domid) and a
set-config op that would take these and all other immutable flags.
I'm not wild for that but could be convinced either way -- I'll let
him fill in the details.

Cheers,

Tim.



[Xen-devel] [PATCH v6 04/23] libxc: duplicate snippet to allocate p2m_host array

2015-02-26 Thread Wei Liu
Currently no in-tree code sets the superpage flag, but Konrad
wants it retained for the moment.

As I'm going to change the p2m_host array allocation, duplicate the code
snippet to allocate p2m_host array in this patch, so that we retain the
behaviour in superpage case.

This patch introduces no functional change and it will make future patches
easier to review. Also removed one stray tab while I was there.

Signed-off-by: Wei Liu 
Cc: Ian Campbell 
Cc: Ian Jackson 
CC: Konrad Wilk 
---
 tools/libxc/xc_dom_x86.c | 15 ++-
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/tools/libxc/xc_dom_x86.c b/tools/libxc/xc_dom_x86.c
index bf06fe4..9dbaedb 100644
--- a/tools/libxc/xc_dom_x86.c
+++ b/tools/libxc/xc_dom_x86.c
@@ -772,15 +772,16 @@ int arch_setup_meminit(struct xc_dom_image *dom)
 return rc;
 }
 
-dom->p2m_host = xc_dom_malloc(dom, sizeof(xen_pfn_t) * dom->total_pages);
-if ( dom->p2m_host == NULL )
-return -EINVAL;
-
 if ( dom->superpages )
 {
 int count = dom->total_pages >> SUPERPAGE_PFN_SHIFT;
 xen_pfn_t extents[count];
 
+dom->p2m_host = xc_dom_malloc(dom, sizeof(xen_pfn_t) *
+  dom->total_pages);
+if ( dom->p2m_host == NULL )
+return -EINVAL;
+
 DOMPRINTF("Populating memory with %d superpages", count);
 for ( pfn = 0; pfn < count; pfn++ )
 extents[pfn] = pfn << SUPERPAGE_PFN_SHIFT;
@@ -809,9 +810,13 @@ int arch_setup_meminit(struct xc_dom_image *dom)
 return rc;
 }
 /* setup initial p2m */
+dom->p2m_host = xc_dom_malloc(dom, sizeof(xen_pfn_t) *
+  dom->total_pages);
+if ( dom->p2m_host == NULL )
+return -EINVAL;
 for ( pfn = 0; pfn < dom->total_pages; pfn++ )
 dom->p2m_host[pfn] = pfn;
-
+
 /* allocate guest memory */
 for ( i = rc = allocsz = 0;
   (i < dom->total_pages) && !rc;
-- 
1.9.1




Re: [Xen-devel] RFC: xen config changes v4

2015-02-26 Thread Luis R. Rodriguez
On Thu, Feb 26, 2015 at 11:08:20AM +, Stefano Stabellini wrote:
> On Thu, 26 Feb 2015, David Vrabel wrote:
> > On 26/02/15 04:59, Juergen Gross wrote:
> > > 
> > > So we are again in the situation that pv-drivers always imply the pvops
> > > kernel (PARAVIRT selected). I started the whole Kconfig rework to
> > > eliminate this dependency.
> > 
> > Yes.  Can you produce a series that just addresses this one issue.
> > 
> > > In the absence of any concrete requirement for this big Kconfig reorg,
> > > I don't think it is helpful.
> 
> I clearly missed some context as I didn't realize that this was the
> intended goal. Why do we want this? Please explain as it won't come
> for free.
> 
> 
> We have a few PV interfaces for HVM guests that need PARAVIRT in Linux
> in order to be used, for example pv_time_ops and HVMOP_pagetable_dying.
> They are critical performance improvements and, from the interface
> perspective, small enough that it doesn't make much sense to have a separate
> Kconfig option for them.
> 
> 
> In order to reach the goal above we necessarily need to introduce a
> differentiation in terms of PV on HVM guests in Linux:
> 
> 1) basic guests with PV network, disk, etc but no PV timers, no
>HVMOP_pagetable_dying, no PV IPIs
> 2) full PV on HVM guests that have PV network, disk, timers,
>HVMOP_pagetable_dying, PV IPIs and anything else that makes sense.
> 
> 2) is much faster than 1) on Xen and 2) is only a tiny bit slower than
> 1) on native x86

Also don't we shove 2) down hvm guests right now? Even when everything is
built in I do not see how we opt out for HVM for 1) at run time right now.
If this is true then the question of motivation for this becomes even
stronger I think.

  Luis



[Xen-devel] [PATCH v6 12/23] libxl: build, check and pass vNUMA info to Xen for PV guest

2015-02-26 Thread Wei Liu
Transform the user supplied vNUMA configuration into libxl internal
representations, and finally libxc representations. Check validity of
the configuration along the line.
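
One step of that transformation, flattening per-node distance rows into a single row-major matrix, can be sketched as follows. This is an illustration only: struct toy_vnode and toy_flatten_distances are hypothetical names, and the sketch returns NULL on a malformed row where other code might assert instead:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-vnode description with its own distance row. */
struct toy_vnode {
    unsigned int pnode;
    unsigned int *distances;
    unsigned int num_distances;
};

/* Returns a malloc'ed n*n row-major matrix where entry [i*n + j] is the
 * distance from vnode i to vnode j, or NULL on allocation failure or if
 * any row does not have exactly n entries. Caller frees the result. */
static unsigned int *toy_flatten_distances(const struct toy_vnode *nodes,
                                           unsigned int n)
{
    unsigned int i;
    unsigned int *m = calloc((size_t)n * n, sizeof(*m));

    if (!m)
        return NULL;

    for (i = 0; i < n; i++) {
        if (nodes[i].num_distances != n) {
            free(m);
            return NULL;
        }
        memcpy(m + (size_t)i * n, nodes[i].distances, n * sizeof(*m));
    }
    return m;
}
```

Copying whole rows relies on every node carrying exactly one distance per node, which is why the row length is checked before each copy.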

Signed-off-by: Wei Liu 
Reviewed-by: Dario Faggioli 
Cc: Ian Campbell 
Cc: Ian Jackson 
Cc: Dario Faggioli 
Cc: Elena Ufimtseva 
Acked-by: Ian Campbell 
---
Changes in v6:
1. Use "unsigned" for some variables.
2. Variable name: bit -> j.

Changes in v5:
1. Adapt to change of interface (ditching xc_vnuma_info).

Changes in v4:
1. Adapt to new interfaces.

Changes in v3:
1. Add more commit log.
---
 tools/libxl/libxl_dom.c | 77 +
 1 file changed, 77 insertions(+)

diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index a16d4a1..b58a19b 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -515,6 +515,51 @@ retry_transaction:
 return 0;
 }
 
+static int set_vnuma_info(libxl__gc *gc, uint32_t domid,
+  const libxl_domain_build_info *info,
+  const libxl__domain_build_state *state)
+{
+int rc = 0;
+unsigned int i, nr_vdistance;
+unsigned int *vcpu_to_vnode, *vnode_to_pnode, *vdistance = NULL;
+
+vcpu_to_vnode = libxl__calloc(gc, info->max_vcpus,
+  sizeof(unsigned int));
+vnode_to_pnode = libxl__calloc(gc, info->num_vnuma_nodes,
+   sizeof(unsigned int));
+
+nr_vdistance = info->num_vnuma_nodes * info->num_vnuma_nodes;
+vdistance = libxl__calloc(gc, nr_vdistance, sizeof(unsigned int));
+
+for (i = 0; i < info->num_vnuma_nodes; i++) {
+libxl_vnode_info *v = &info->vnuma_nodes[i];
+int j;
+
+/* vnode to pnode mapping */
+vnode_to_pnode[i] = v->pnode;
+
+/* vcpu to vnode mapping */
+libxl_for_each_set_bit(j, v->vcpus)
+vcpu_to_vnode[j] = i;
+
+/* node distances */
+assert(info->num_vnuma_nodes == v->num_distances);
+memcpy(vdistance + (i * info->num_vnuma_nodes),
+   v->distances,
+   v->num_distances * sizeof(unsigned int));
+}
+
+if (xc_domain_setvnuma(CTX->xch, domid, info->num_vnuma_nodes,
+   state->num_vmemranges, info->max_vcpus,
+   state->vmemranges, vdistance,
+   vcpu_to_vnode, vnode_to_pnode) < 0) {
+LOGE(ERROR, "xc_domain_setvnuma failed");
+rc = ERROR_FAIL;
+}
+
+return rc;
+}
+
 int libxl__build_pv(libxl__gc *gc, uint32_t domid,
  libxl_domain_build_info *info, libxl__domain_build_state *state)
 {
@@ -572,6 +617,38 @@ int libxl__build_pv(libxl__gc *gc, uint32_t domid,
 dom->xenstore_domid = state->store_domid;
 dom->claim_enabled = libxl_defbool_val(info->claim_mode);
 
+if (info->num_vnuma_nodes != 0) {
+unsigned int i;
+
+ret = libxl__vnuma_build_vmemrange_pv(gc, domid, info, state);
+if (ret) {
+LOGE(ERROR, "cannot build vmemranges");
+goto out;
+}
+ret = libxl__vnuma_config_check(gc, info, state);
+if (ret) goto out;
+
+ret = set_vnuma_info(gc, domid, info, state);
+if (ret) goto out;
+
+dom->nr_vmemranges = state->num_vmemranges;
+dom->vmemranges = xc_dom_malloc(dom, sizeof(*dom->vmemranges) *
+dom->nr_vmemranges);
+
+for (i = 0; i < dom->nr_vmemranges; i++) {
+dom->vmemranges[i].start = state->vmemranges[i].start;
+dom->vmemranges[i].end   = state->vmemranges[i].end;
+dom->vmemranges[i].flags = state->vmemranges[i].flags;
+dom->vmemranges[i].nid   = state->vmemranges[i].nid;
+}
+
+dom->nr_vnodes = info->num_vnuma_nodes;
+dom->vnode_to_pnode = xc_dom_malloc(dom, sizeof(*dom->vnode_to_pnode) *
+dom->nr_vnodes);
+for (i = 0; i < info->num_vnuma_nodes; i++)
+dom->vnode_to_pnode[i] = info->vnuma_nodes[i].pnode;
+}
+
 if ( (ret = xc_dom_boot_xen_init(dom, ctx->xch, domid)) != 0 ) {
 LOGE(ERROR, "xc_dom_boot_xen_init failed");
 goto out;
-- 
1.9.1




[Xen-devel] [PATCH v6 06/23] libxc: allocate memory with vNUMA information for PV guest

2015-02-26 Thread Wei Liu
From libxc's point of view, it only needs to know vnode to pnode mapping
and size of each vnode to allocate memory accordingly. Add these fields
to xc_dom structure.

The caller might not pass in vNUMA information. In that case, a dummy
layout is generated for the convenience of libxc's allocation code. The
upper layer (libxl etc) still sees the domain has no vNUMA
configuration.
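
The dummy layout can be pictured as one memory range spanning all of RAM, attached to a single virtual node with no physical node preference. A minimal sketch follows; the types, TOY_PAGE_SHIFT, and TOY_NO_NODE are hypothetical placeholders (the latter standing in for the real "no node" constant, whose value is not assumed here):

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12
#define TOY_NO_NODE    (~0U)  /* placeholder for a "no node" marker */

/* Hypothetical memory range in byte addresses, covering [start, end). */
struct toy_vmemrange {
    uint64_t start;
    uint64_t end;
    unsigned int flags;
    unsigned int nid;
};

/* Hypothetical one-node layout: a single range plus its pnode mapping. */
struct toy_layout {
    struct toy_vmemrange range;
    unsigned int vnode_to_pnode;
};

/* Build a layout that puts all pages in vnode 0 with no pnode affinity. */
static struct toy_layout toy_dummy_layout(uint64_t total_pages)
{
    struct toy_layout l;

    l.range.start = 0;
    l.range.end = total_pages << TOY_PAGE_SHIFT;
    l.range.flags = 0;
    l.range.nid = 0;
    l.vnode_to_pnode = TOY_NO_NODE;
    return l;
}
```

Because the allocation code then always has at least one range to walk, it needs no separate path for the "no vNUMA" case.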

Note that from this patch on, a PV x86 guest can have multiple regions of
RAM allocated.

Signed-off-by: Wei Liu 
Cc: Ian Campbell 
Cc: Ian Jackson 
Cc: Dario Faggioli 
Cc: Elena Ufimtseva 
---
Changes in v6:
1. Ditch XC_VNUMA_NO_NODE and use XEN_NUMA_NO_NODE.
2. Update comment in xc_dom.h.

Changes in v5:
1. Ditch xc_vnuma_info.

Changes in v4:
1. Pack fields into a struct.
2. Use "page" as unit.
3. __FUNCTION__ -> __func__.
4. Don't print total_pages.
5. Improve comment.

Changes in v3:
1. Rewrite commit log.
2. Shorten some error messages.
---
 tools/libxc/include/xc_dom.h |  12 -
 tools/libxc/xc_dom_x86.c | 101 +--
 2 files changed, 97 insertions(+), 16 deletions(-)

diff --git a/tools/libxc/include/xc_dom.h b/tools/libxc/include/xc_dom.h
index 6b8ddf4..a7d059a 100644
--- a/tools/libxc/include/xc_dom.h
+++ b/tools/libxc/include/xc_dom.h
@@ -119,8 +119,10 @@ struct xc_dom_image {
 
 /* physical memory
  *
- * An x86 PV guest has a single contiguous block of physical RAM,
- * consisting of total_pages starting at rambase_pfn.
+ * An x86 PV guest has one or more blocks of physical RAM,
+ * consisting of total_pages starting at rambase_pfn. The start
+ * address and size of each block is controlled by vNUMA
+ * structures.
  *
  * An ARM guest has GUEST_RAM_BANKS regions of RAM, with
  * rambank_size[i] pages in each. The lowest RAM address
@@ -168,6 +170,12 @@ struct xc_dom_image {
 struct xc_dom_loader *kernel_loader;
 void *private_loader;
 
+/* vNUMA information */
+xen_vmemrange_t *vmemranges;
+unsigned int nr_vmemranges;
+unsigned int *vnode_to_pnode;
+unsigned int nr_vnodes;
+
 /* kernel loader */
 struct xc_dom_arch *arch_hooks;
 /* allocate up to virt_alloc_end */
diff --git a/tools/libxc/xc_dom_x86.c b/tools/libxc/xc_dom_x86.c
index bea54f2..268d4db 100644
--- a/tools/libxc/xc_dom_x86.c
+++ b/tools/libxc/xc_dom_x86.c
@@ -760,7 +760,8 @@ static int x86_shadow(xc_interface *xch, domid_t domid)
 int arch_setup_meminit(struct xc_dom_image *dom)
 {
 int rc;
-xen_pfn_t pfn, allocsz, i, j, mfn;
+xen_pfn_t pfn, allocsz, mfn, total, pfn_base;
+int i, j;
 
 rc = x86_compat(dom->xch, dom->guest_domid, dom->guest_type);
 if ( rc )
@@ -811,26 +812,98 @@ int arch_setup_meminit(struct xc_dom_image *dom)
 if ( rc )
 return rc;
 }
-/* setup initial p2m */
-dom->p2m_size = dom->total_pages;
+
+/* Setup dummy vNUMA information if it's not provided. Note
+ * that this is a valid state if libxl doesn't provide any
+ * vNUMA information.
+ *
+ * The dummy values make libxc allocate all pages from
+ * arbitrary physical nodes. This is the expected behaviour if
+ * no vNUMA configuration is provided to libxc.
+ *
+ * Note that the following hunk is just for the convenience of
+ * allocation code. No defaulting happens in libxc.
+ */
+if ( dom->nr_vmemranges == 0 )
+{
+dom->nr_vmemranges = 1;
+dom->vmemranges = xc_dom_malloc(dom, sizeof(*dom->vmemranges));
+dom->vmemranges[0].start = 0;
+dom->vmemranges[0].end   = dom->total_pages << PAGE_SHIFT;
+dom->vmemranges[0].flags = 0;
+dom->vmemranges[0].nid   = 0;
+
+dom->nr_vnodes = 1;
+dom->vnode_to_pnode = xc_dom_malloc(dom,
+  sizeof(*dom->vnode_to_pnode));
+dom->vnode_to_pnode[0] = XEN_NUMA_NO_NODE;
+}
+
+total = dom->p2m_size = 0;
+for ( i = 0; i < dom->nr_vmemranges; i++ )
+{
+total += ((dom->vmemranges[i].end - dom->vmemranges[i].start)
+  >> PAGE_SHIFT);
+dom->p2m_size =
+dom->p2m_size > (dom->vmemranges[i].end >> PAGE_SHIFT) ?
+dom->p2m_size : (dom->vmemranges[i].end >> PAGE_SHIFT);
+}
+if ( total != dom->total_pages )
+{
+xc_dom_panic(dom->xch, XC_INTERNAL_ERROR,
+ "%s: vNUMA page count mismatch (0x%"PRIpfn" != 0x%"PRIpfn")\n",
+ __func__, total, dom->total_pages);
+return -EINVAL;
+}
+
 dom->p2m_host = xc_dom_malloc(dom, sizeof(xen_pfn_t) *
   dom->p2m_size);
 if ( dom->p2m_host == NULL )
 return -EINVAL;
-for ( pfn = 0; pfn < dom->total_pages; pfn++ )
-d

Re: [Xen-devel] [Qemu-devel] [v2][PATCH] libxl: add one machine property to support IGD GFX passthrough

2015-02-26 Thread Ian Campbell
On Thu, 2015-02-26 at 14:35 +0800, Chen, Tiejun wrote:

> > If we are going to do this then I think we need to arrange for the
> > interface to be able to express the need to force the workarounds for a
> > particular device. IOW a boolean will not suffice since it doesn't
> > indicate that IGD workarounds are needed.
> >
> > Probably it would be simplest to just leave this functionality out for
> > the time being and revisit if/when maintaining the list becomes an
> > annoyance or an end user trips over it.
> >
> 
> You mean we should maintain one list to save all targeted devices, then 
> tools uses ids as an index to lookup this list to pass something to qemu.

I (think I) meant a list of pci vid:did in libxl, which is matched
against the devices passed to the domain (e.g. "pci = [...]" in xl cfg),
which then enables the igd workarounds, i.e. by passing the option to
qemu.
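
As a rough sketch of what such a list could look like (nothing here is existing libxl code): the table, the struct and both function names are hypothetical, and the device IDs are placeholders rather than a real list of IGD devices. Only 0x8086 as Intel's PCI vendor ID is a known value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One PCI vendor:device pair. */
struct sketch_pci_id {
    uint16_t vendor;
    uint16_t device;
};

/*
 * Placeholder table. 0x8086 is Intel's PCI vendor ID; the device IDs
 * below are made-up examples, not a real list of IGD devices.
 */
static const struct sketch_pci_id sketch_igd_ids[] = {
    { 0x8086, 0x1111 },
    { 0x8086, 0x2222 },
};

/* Return true if (vendor, device) appears in the table. */
static bool sketch_is_igd(uint16_t vendor, uint16_t device)
{
    size_t i;

    for (i = 0; i < sizeof(sketch_igd_ids) / sizeof(sketch_igd_ids[0]); i++)
        if (sketch_igd_ids[i].vendor == vendor &&
            sketch_igd_ids[i].device == device)
            return true;
    return false;
}

/*
 * Return true if any of the devices passed to the domain (e.g. from
 * "pci = [...]") matches the table, i.e. the workaround option should
 * be handed to qemu.
 */
static bool sketch_any_needs_workaround(const struct sketch_pci_id *devs,
                                        size_t nr)
{
    size_t i;

    assert(devs != NULL || nr == 0);

    for (i = 0; i < nr; i++)
        if (sketch_is_igd(devs[i].vendor, devs[i].device))
            return true;
    return false;
}
```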

> But actually one question that I have always been thinking about is: is it 
> really the responsibility of Xen to determine which device type should be 
> passed by probing that pair of vendor and device IDs? Xen is just one of 
> many users of qemu, so such a rare workaround option can be passed 
> actively by any user instead of by Xen. Furthermore, this also stays 
> flexible for those cases where we want to force overriding this.

I'm not sure, but I think you are suggesting that qemu should autodetect
this situation, without being explicitly told "igd-passthru=on" on the
command line?

If the qemu maintainers are amenable to that, and it's not already the
case that other components (e.g. hvmloader) need to be told about these
workarounds, then I suppose that would work.

> So I think qemu should mainly play this role. If qemu realizes we're 
> passing through an IGD or other targeted device, it should post a warning 
> or even an error message to indicate what the right behavior is, or what 
> the potential risk is by default.

Hrm, here it sounds more like you are suggesting that qemu should detect
and warn, rather than detect and do the right thing?

I'm not sure how Qemu could indicate what the right behaviour is going
to be, it'll differ for different hypervisors or even for which Xen
toolstack (xl vs libvirt etc) is in use.

Or maybe I've misunderstood?

Ian.




Re: [Xen-devel] Branch Trace Storage for guests and VPMU initialization

2015-02-26 Thread Dietmar Hahn
On Wednesday, 25 February 2015, 11:31:31, Boris Ostrovsky wrote:
> On 02/25/2015 10:12 AM, kevin.ma...@gdata.de wrote:
> >> -Original Message-
> >> From: Boris Ostrovsky [mailto:boris.ostrov...@oracle.com]
> >> Sent: Tuesday, 24 February 2015 18:13
> >> To: Mayer, Kevin; xen-devel@lists.xen.org
> >> Subject: Re: [Xen-devel] Branch Trace Storage for guests and VPMU
> >> initialization
> >>
> >> On 02/24/2015 10:27 AM, kevin.ma...@gdata.de wrote:
> >>> Hi guys
> >>>
> >>> I'm trying to set up the BTS so that I can log the branches taken in
> >>> the guest using Xen 4.4.1 with a WinXP SP3 guest on a Core i7 Sandy
> >>> Bridge.
> >>>
> >>> I added the vpmu=bts boot parameter to my grub2 configuration and
> >>> extended the libxl,libxc,domctl,… with an own command so that I can
> >>> trigger the activation of the BTS whenever I want.
> >>>
> >>
> >> I am not sure why you are doing all these changes to Xen code. BTS is
> >> supposed to be managed from the guest. For example, a Fedora HVM guest
> >> will produce this:
> >>
> >> [root@dhcp-burlington7-2nd-B-east-10-152-55-140 ~]# perf record -e
> >> branches:u -c 1 -d sleep 1 [ perf record: Woken up 3838 times to write 
> >> data ] [
> >> perf record: Captured and wrote 0.704 MB perf.data (~30756 samples) ]
> >> [root@dhcp-burlington7-2nd-B-east-10-152-55-140 ~]# perf script -f
> >> ip,addr,sym,dso,symoff --show-kernel-path
> >>8167c347 native_irq_return_iret+0x0 (/proc/kcore) =>
> >> 328c001590 [unknown] (/proc/kcore)
> >>8167c347 native_irq_return_iret+0x0 (/proc/kcore) =>
> >> 328c001590 [unknown] ([unknown])
> >>  328c001593 [unknown] ([unknown]) =>   328c004b70 [unknown]
> >> ([unknown])
> >> ...
> >>
> > I want to be able to log the taken branches (of the guest) without the need 
> > to modify the guest at all.
> > This means I have to do all the logic in the hypervisor, or am I wrong?
> 
> In that case, yes. But then you have to make sure that at least
>   * you don't load guest's VPMU (or, at least, BTS-related registers) on 
> context switch

But you need to modify PMU registers when switching to/from the guest context
to get PMU running.
I didn't think of using the VPMU stuff with modifying the context from outside
the guest.

>   * You don't send the interrupt to the guest (meaning that you will 
> need to somehow inform dom0 of the BTS interrupt)
> 
> and probably more.
> 
> Essentially, you want dom0 to profile the guest. I have been working on
> patches that would allow that but they are still under review.
> 
> 
> >
> >>> In this command I do the following:
> >>>
> >>> I set up the memory region for the BTS Buffer and the DS Buffer
> >>> Management Area using xzalloc_bytes
> >>>
> >>
> >> I don't think you should be allocating BTS buffers in the hypervisor, they 
> >> are
> >> in guest's memory.
> > I agree. As I said I think this is where my main problem is at the moment.
> > Is there any way I can allocate memory in the hypervisor in a way the guest 
> > can access it?
> 
> I am not sure this is what you want since you seem to *not* want the 
> guest to process the samples, right?
> 
> But yes, you can. E.g. something like what map_vcpu_info() does. (I have 
> no idea how you'd do this from Windows.)

The DS buffer has to be mapped within the guest's address space so the CPU
running in guest context can access this area. Otherwise you get this
triple fault.
So I would think you need a mixture of writing some stuff in Windows and
patching the hypervisor.
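
For reference, the BTS part of the 64-bit DS buffer management area could be described in C roughly as below. Only the first two offsets (BTS Buffer Base at +0x0, BTS Index at +0x8) are taken from this thread; the two further fields and their offsets are recalled from the Intel SDM and should be verified there. The struct and function names are made up for this sketch, and the threshold value is a placeholder.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* BTS part of the 64-bit DS buffer management area (sketch). */
struct sketch_ds_area_bts {
    uint64_t bts_buffer_base;         /* +0x00: start of the BTS buffer */
    uint64_t bts_index;               /* +0x08: where the next record goes */
    uint64_t bts_absolute_maximum;    /* +0x10: assumed from the SDM */
    uint64_t bts_interrupt_threshold; /* +0x18: assumed from the SDM */
};

/*
 * Describe an empty BTS buffer of 'size' bytes at 'base'.
 * 'base' must be an address the CPU can reach while running in guest
 * context, which is the point made above.
 */
static void sketch_ds_init(struct sketch_ds_area_bts *ds,
                           uint64_t base, uint64_t size)
{
    assert(ds != NULL);

    ds->bts_buffer_base = base;
    ds->bts_index = base;                      /* empty: index == base */
    ds->bts_absolute_maximum = base + size;    /* first byte past the end */
    ds->bts_interrupt_threshold = base + size; /* placeholder choice */
}
```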

Dietmar.

> 
> 
> > Of course the guest must not be able to use this memory in its normal 
> > operations but just for BTS.
> > Is this even possible? I am rather confused at the moment. :-D
> >
> >>> Then I write the pointer to the BTS Buffer into the DS Buffer
> >>> Management Area at +0x0 and +0x8 (BTS Buffer Base and BTS Index)
> >>>
> >>> When I use vmx_msr_write_intercept to store the value in
> >>> MSR_IA32_DS_AREA the host reboots (my idea is he tries to access a
> >>> vpmu-struct that isn't there in the current vcpu and panics).
> 
> 
> Who is trying to write to MSR_IA32_DS_AREA? The guest or dom0? I thought 
> you said that you want dom0 to do sampling. Or are you trying to setup 
> DS area from your guest and control it from dom0? I am somewhat confused.
> 
> >>>
> >> Can you post hypervisor log? (hard to say how helpful it will be without
> >> seeing your code changes though)
> >>
> > Right after enabling the BTS I get a triple fault.
> > hvm.c:1357:d2 Triple fault on VCPU0 - invoking HVM shutdown action 1.
> 
> 
> That's not host reboot, this is your guest dying.
> 
> 
> >
> >>> When I use a modified version of vmx_msr_write_intercept I don’t get
> >>> any crashes as long as I don’t enable BTS and TR in the
> >>> GUEST_IA32_DEBUGCTL (BTR works). When I enable the BTS (and TR) the
> >>> guest crashes. I suppose he gets killed by the hypervisor for
> >>> accessing forbidden memory.
> >>>
> >> Possibly because DS area point to hyperviso

Re: [Xen-devel] freemem-slack and large memory environments

2015-02-26 Thread Ian Campbell
On Thu, 2015-02-26 at 08:36 -0700, Mike Latimer wrote:
> On Wednesday, February 25, 2015 02:09:50 PM Stefano Stabellini wrote:
> > > Is the upshot that Mike doesn't need to do anything further with his
> > > patch (i.e. can drop it)? I think so?
> > 
> > Yes, I think so. Maybe he could help out testing the patches I am going
> > to write :-)
> 
> Sorry for not responding to this yesterday.
> 
> There is still one aspect of my original patch that is important. As the code 
> currently stands, the target for dom0 is set lower during each iteration of 
> the loop. Unless only one iteration is required, dom0 will end up being set 
> to a much lower target than is actually required.

Is this because some sort of slack is applied once per iteration rather
than once at the start or is it something else?

> 
> There are two ways to fix this issue:
> 
>  - Set the memory target for dom0 once, before entering the loop
>  - During each iteration of the loop, compare the amount of needed memory to 
> the amount of memory which will be available once dom0 hits the target, and 
> only lower the target if additional memory is needed.
> 
> My patch earlier in this thread does the former, but I think the second 
> option is also possible. Is there a preference between those approaches (or 
> a better idea)?
> 
> Thanks,
> Mike
> 





[Xen-devel] Shared page tables between EPT and IOMMU issue

2015-02-26 Thread Roger Pau Monné
Hello,

While testing PVH Dom0 support on a newer Core i3-5010U I've found that 
sharing the page tables between EPT and the IOMMUs doesn't work. Booting 
with iommu=no-sharept solves the problem, but I'm unsure what causes 
this issue.

Here is the output of the system successfully booting with 
iommu=debug,no-sharept:

/boot/xen data=0x1de9f0+0x7fd22610 -
/boot/kernel/kernel size=0x14bcd33
/boot/kernel/zfs.ko size 0x37d888 at 0x8155
loading required module 'opensolaris'
/boot/kernel/opensolaris.ko size 0xc790 at 0x818ce000
Booting...
 Xen 4.6-unstable
(XEN) Xen version 4.6-unstable (root@) (gcc47 (FreeBSD Ports Collection) 4.7.4) debug=y Thu Feb 26 19:23:57 UTC 2015
(XEN) Latest ChangeSet: Wed Feb 11 17:21:14 2015 +0100 git:cb34a7c-dirty
(XEN) Bootloader: FreeBSD Loader
(XEN) Command line: dom0_mem=2048M dom0pvh=1 console=com1,vga iommu=debug,no-sharept guest_loglvl=all loglvl=all
(XEN) Video information:
(XEN)  VGA is text mode 80x25, font 8x16
(XEN)  VBE/DDC methods: V2; EDID transfer time: 1 seconds
(XEN) Disc information:
(XEN)  Found 1 MBR signatures
(XEN)  Found 1 EDD information structures
(XEN) Xen-e820 RAM map:
(XEN)   - 0009d800 (usable)
(XEN)  0009d800 - 000a (reserved)
(XEN)  000e - 0010 (reserved)
(XEN)  0010 - d76d8000 (usable)
(XEN)  d76d8000 - d7bb5000 (reserved)
(XEN)  d7bb5000 - dc319000 (usable)
(XEN)  dc319000 - dc378000 (reserved)
(XEN)  dc378000 - dc39b000 (ACPI data)
(XEN)  dc39b000 - dcccb000 (ACPI NVS)
(XEN)  dcccb000 - dcfff000 (reserved)
(XEN)  dcfff000 - dd00 (usable)
(XEN)  dd80 - e000 (reserved)
(XEN)  f800 - fc00 (reserved)
(XEN)  fec0 - fec01000 (reserved)
(XEN)  fed0 - fed04000 (reserved)
(XEN)  fed1c000 - fed2 (reserved)
(XEN)  fee0 - fee01000 (reserved)
(XEN)  ff00 - 0001 (reserved)
(XEN)  0001 - 00021f00 (usable)
(XEN) ACPI: RSDP 000F0580, 0024 (r2 INTEL )
(XEN) ACPI: XSDT DC37F090, 00A4 (r1  INTEL NUC5i3MY  1072009 AMI 10013)
(XEN) ACPI: FACP DC392C10, 010C (r5  INTEL NUC5i3MY  1072009 AMI 10013)
(XEN) ACPI: DSDT DC37F1C8, 13A48 (r2  INTEL NUC5i3MY  1072009 INTL 20120913)
(XEN) ACPI: FACS DCCC9F80, 0040
(XEN) ACPI: APIC DC392D20, 0084 (r3  INTEL NUC5i3MY  1072009 AMI 10013)
(XEN) ACPI: FPDT DC392DA8, 0044 (r1  INTEL NUC5i3MY  1072009 AMI 10013)
(XEN) ACPI: FIDT DC392DF0, 009C (r1  INTEL NUC5i3MY  1072009 AMI 10013)
(XEN) ACPI: MCFG DC392E90, 003C (r1  INTEL NUC5i3MY  1072009 MSFT   97)
(XEN) ACPI: HPET DC392ED0, 0038 (r1  INTEL NUC5i3MY  1072009 AMI.5)
(XEN) ACPI: SSDT DC392F08, 0315 (r1  INTEL NUC5i3MY 1000 INTL 20120913)
(XEN) ACPI: UEFI DC393220, 0042 (r1  INTEL NUC5i3MY0 0)
(XEN) ACPI: SSDT DC393268, 0C7D (r2  INTEL NUC5i3MY 1000 INTL 20120913)
(XEN) ACPI: ASF! DC393EE8, 00A0 (r32  INTEL NUC5i3MY1 TFSMF4240)
(XEN) ACPI: SSDT DC393F88, 0539 (r2  INTEL NUC5i3MY 3000 INTL 20120913)
(XEN) ACPI: SSDT DC3944C8, 0B74 (r2  INTEL NUC5i3MY 3000 INTL 20120913)
(XEN) ACPI: TPM2 DC395040, 0034 (r3  INTEL NUC5i3MY1 AMI 0)
(XEN) ACPI: SSDT DC395078, 0041 (r1  INTEL NUC5i3MY 1000 INTL 20120913)
(XEN) ACPI: SSDT DC3950C0, 5CF6 (r2  INTEL NUC5i3MY 3000 INTL 20120913)
(XEN) ACPI: DMAR DC39ADB8, 00B0 (r1  INTEL NUC5i3MY1 INTL1)
(XEN) System RAM: 8109MB (8304488kB)
(XEN) No NUMA configuration found
(XEN) Faking a node at -00021f00
(XEN) Domain heap initialised
(XEN) found SMP MP-table at 000fd7c0
(XEN) DMI 2.8 present.
(XEN) Using APIC driver default
(XEN) ACPI: PM-Timer IO Port: 0x1808
(XEN) ACPI: v5 SLEEP INFO: control[0:0], status[0:0]
(XEN) ACPI: SLEEP INFO: pm1x_cnt[1:1804,1:0], pm1x_evt[1:1800,1:0]
(XEN) ACPI: 32/64X FACS address mismatch in FADT - dccc9f80/, using 32
(XEN) ACPI: wakeup_vec[dccc9f8c], vec_size[20]
(XEN) ACPI: Local APIC address 0xfee0
(XEN) ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
(XEN) Processor #0 7:13 APIC version 21
(XEN) ACPI: LAPIC (acpi_id[0x02] lapic_id[0x02] enabled)
(XEN) Processor #2 7:13 APIC version 21
(XEN) ACPI: LAPIC (acpi_id[0x03] lapic_id[0x01] enabled)
(XEN) Processor #1 7:13 APIC version 21
(XEN) ACPI: LAPIC (acpi_id[0x04] lapic_id[0x03] enabled)
(XEN) Processor #3 7:13 APIC version 21
(XEN) ACPI: LAPIC_NMI (acpi_id[0x01] dfl dfl lint[0])
(XEN) ACPI: NMI not connected to LINT 1!
(XEN) ACPI: LAPIC_NMI (acpi_id[0x02] dfl dfl lint[0])
(XEN) ACPI: NMI not connected to LINT 1!
(XEN) ACPI: LAPIC_NMI (acpi_id[0x03] dfl dfl lint[0])
(XEN) ACPI: NMI not connected to LINT 1!
(XEN) ACPI: LAPIC_NMI (acpi_id[0x04] dfl dfl lint[0])
(XEN) ACPI: NMI not connected to LINT 1!
(XEN) ACPI: IOAPIC (id[0x02]

[Xen-devel] how to assign resources exclusive to a single domU

2015-02-26 Thread Olaf Hering
While working on pvscsi support for libxl I noticed that assigning a
resource exclusively to just a single domU via libxl will be a major
effort. Up to now libxl could rely on the fact that a resource can be
either shared or the backend deals with the attempt to share.

There are two cases in pvscsi:

 1) a single physical HST:CHN:TGT:LUN device must be assigned to just a
    single domU. While the (xenlinux) backend driver allows assigning
    the device to more than one domU, the sharing cannot work in
    practice.
 2) the xenlinux backend driver has two modes: emulation and raw. With
    raw mode the SCSI commands coming from domU will be passed directly
    to the physical device. I think it's required to make sure that all
    devices connected to a physical SCSI host operate either
    entirely in raw mode or entirely in emulation mode.

To handle both cases libxl could either assume that the admin is
responsible for proper configuration:
 - just one domU per physical device
 - if raw mode is enabled all devices on the physical scsi host will be
   assigned to just one domU

Or libxl gets functionality to verify that the two cases above are really
enforced. Doing that means that there has to be some global lock under
which the system state in xenstore is parsed and the to be assigned domU
configuration is compared:
 - are the physical devices already assigned
 - is the raw mode properly configured
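
The comparison itself could look roughly like the sketch below. Everything in it is hypothetical: the struct, the function name, and the idea that the already assigned devices have been collected into an array. Reading that state from xenstore and taking the global lock are not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record of one already-assigned pvscsi device. */
struct sketch_scsi_assignment {
    unsigned int host, channel, target, lun; /* physical HST:CHN:TGT:LUN */
    unsigned int domid;                      /* domU owning the device */
    bool raw;                                /* raw (true) or emulation */
};

/*
 * Return true if 'req' may be added given the 'nr' existing assignments:
 *  - the physical device is not assigned yet (case 1),
 *  - every device on the same physical host uses the same mode (case 2),
 *  - in raw mode, the whole host belongs to a single domU.
 * The caller is expected to hold whatever lock protects 'cur'.
 */
static bool sketch_assignment_ok(const struct sketch_scsi_assignment *cur,
                                 size_t nr,
                                 const struct sketch_scsi_assignment *req)
{
    size_t i;

    assert(req != NULL);
    assert(cur != NULL || nr == 0);

    for (i = 0; i < nr; i++) {
        if (cur[i].host != req->host)
            continue; /* different physical host: no interaction */

        if (cur[i].channel == req->channel &&
            cur[i].target == req->target &&
            cur[i].lun == req->lun)
            return false; /* device already assigned */

        if (cur[i].raw != req->raw)
            return false; /* mixed raw/emulation on one host */

        if (req->raw && cur[i].domid != req->domid)
            return false; /* raw host shared between domUs */
    }
    return true;
}
```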

In xend the case #1 was not handled. There is some code for case #2, I
have to check how complete the enforcement in xend was.

I wonder what should be done in my changes for libxl.

Olaf



Re: [Xen-devel] [PATCH v6 3/5] xen/arm: Make gic-v2 code handle hip04-d01 platform

2015-02-26 Thread Frediano Ziglio
...
> >  /*
> > diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> > index 390c8b0..e4512a8 100644
> > --- a/xen/arch/arm/gic.c
> > +++ b/xen/arch/arm/gic.c
> > @@ -565,12 +565,13 @@ static void do_sgi(struct cpu_user_regs *regs, enum gic_sgi sgi)
> >  void gic_interrupt(struct cpu_user_regs *regs, int is_fiq)
> >  {
> >  unsigned int irq;
> > +unsigned int max_irq = gic_hw_ops->info->nr_lines;
> >
> >  do  {
> >  /* Reading IRQ will ACK it */
> >  irq = gic_hw_ops->read_irq();
> >
> > -if ( likely(irq >= 16 && irq < 1021) )
> > +if ( likely(irq >= 16 && irq < max_irq) )
> >  {
> >  local_irq_enable();
> >  do_IRQ(regs, irq, is_fiq);
> 
> This change should belong to a separate patch.
> 

After looking at the code paths and discussing with a colleague who partially
wrote the patch, I think this test is not necessary at all.

I'll check it.

Frediano




[Xen-devel] [PATCH v6 02/23] xen: move NUMA_NO_NODE to public memory.h as XEN_NUMA_NO_NODE

2015-02-26 Thread Wei Liu
Update NUMA_NO_NODE in Xen code to use the new macro.

No functional change introduced.

Signed-off-by: Wei Liu 
Cc: Andrew Cooper 
Cc: Jan Beulich 
---
 xen/arch/x86/hpet.c  |  2 +-
 xen/arch/x86/irq.c   |  4 ++--
 xen/arch/x86/numa.c  | 14 +++---
 xen/arch/x86/physdev.c   |  2 +-
 xen/arch/x86/setup.c |  2 +-
 xen/arch/x86/smpboot.c   |  2 +-
 xen/arch/x86/srat.c  | 28 ++--
 xen/arch/x86/x86_64/mm.c |  2 +-
 xen/common/page_alloc.c  |  4 ++--
 xen/drivers/passthrough/amd/iommu_init.c |  2 +-
 xen/drivers/passthrough/vtd/iommu.c  |  8 
 xen/include/public/memory.h  |  2 ++
 xen/include/xen/numa.h   |  5 ++---
 13 files changed, 39 insertions(+), 38 deletions(-)

diff --git a/xen/arch/x86/hpet.c b/xen/arch/x86/hpet.c
index 8f36f6f..3b6d12f 100644
--- a/xen/arch/x86/hpet.c
+++ b/xen/arch/x86/hpet.c
@@ -375,7 +375,7 @@ static int __init hpet_assign_irq(struct hpet_event_channel *ch)
 {
 int irq;
 
-if ( (irq = create_irq(NUMA_NO_NODE)) < 0 )
+if ( (irq = create_irq(XEN_NUMA_NO_NODE)) < 0 )
 return irq;
 
 ch->msi.irq = irq;
diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c
index 786d1fc..deb67d7 100644
--- a/xen/arch/x86/irq.c
+++ b/xen/arch/x86/irq.c
@@ -173,7 +173,7 @@ int create_irq(nodeid_t node)
 {
 cpumask_t *mask = NULL;
 
-if ( node != NUMA_NO_NODE )
+if ( node != XEN_NUMA_NO_NODE )
 {
 mask = &node_to_cpumask(node);
 if (cpumask_empty(mask))
@@ -2000,7 +2000,7 @@ int map_domain_pirq(
 spin_unlock_irqrestore(&desc->lock, flags);
 
 info = NULL;
-irq = create_irq(NUMA_NO_NODE);
+irq = create_irq(XEN_NUMA_NO_NODE);
 ret = irq >= 0 ? prepare_domain_irq_pirq(d, irq, pirq + nr, &info)
: irq;
 if ( ret )
diff --git a/xen/arch/x86/numa.c b/xen/arch/x86/numa.c
index 132d694..6e1a0b8 100644
--- a/xen/arch/x86/numa.c
+++ b/xen/arch/x86/numa.c
@@ -37,13 +37,13 @@ unsigned long memnodemapsize;
 u8 *memnodemap;
 
 nodeid_t cpu_to_node[NR_CPUS] __read_mostly = {
-[0 ... NR_CPUS-1] = NUMA_NO_NODE
+[0 ... NR_CPUS-1] = XEN_NUMA_NO_NODE
 };
 /*
  * Keep BIOS's CPU2node information, should not be used for memory allocaion
  */
 nodeid_t apicid_to_node[MAX_LOCAL_APIC] __cpuinitdata = {
-[0 ... MAX_LOCAL_APIC-1] = NUMA_NO_NODE
+[0 ... MAX_LOCAL_APIC-1] = XEN_NUMA_NO_NODE
 };
 cpumask_t node_to_cpumask[MAX_NUMNODES] __read_mostly;
 
@@ -71,7 +71,7 @@ static int __init populate_memnodemap(const struct node *nodes,
 unsigned long spdx, epdx;
 int i, res = -1;
 
-memset(memnodemap, NUMA_NO_NODE, memnodemapsize * sizeof(*memnodemap));
+memset(memnodemap, XEN_NUMA_NO_NODE, memnodemapsize * sizeof(*memnodemap));
 for ( i = 0; i < numnodes; i++ )
 {
 spdx = paddr_to_pdx(nodes[i].start);
@@ -81,7 +81,7 @@ static int __init populate_memnodemap(const struct node *nodes,
 if ( (epdx >> shift) >= memnodemapsize )
 return 0;
 do {
-if ( memnodemap[spdx >> shift] != NUMA_NO_NODE )
+if ( memnodemap[spdx >> shift] != XEN_NUMA_NO_NODE )
 return -1;
 
 if ( !nodeids )
@@ -199,7 +199,7 @@ void __init numa_init_array(void)
 rr = first_node(node_online_map);
 for ( i = 0; i < nr_cpu_ids; i++ )
 {
-if ( cpu_to_node[i] != NUMA_NO_NODE )
+if ( cpu_to_node[i] != XEN_NUMA_NO_NODE )
 continue;
 numa_set_node(i, rr);
 rr = next_node(rr, node_online_map);
@@ -350,7 +350,7 @@ void __init init_cpu_to_node(void)
 if ( apicid == BAD_APICID )
 continue;
 node = apicid_to_node[apicid];
-if ( node == NUMA_NO_NODE || !node_online(node) )
+if ( node == XEN_NUMA_NO_NODE || !node_online(node) )
 node = 0;
 numa_set_node(i, node);
 }
@@ -433,7 +433,7 @@ static void dump_numa(unsigned char key)
 
 err = snprintf(keyhandler_scratch, 12, "%3u",
 vnuma->vnode_to_pnode[i]);
-if ( err < 0 || vnuma->vnode_to_pnode[i] == NUMA_NO_NODE )
+if ( err < 0 || vnuma->vnode_to_pnode[i] == XEN_NUMA_NO_NODE )
 strlcpy(keyhandler_scratch, "???", sizeof(keyhandler_scratch));
 
 printk("   %3u: pnode %s,", i, keyhandler_scratch);
diff --git a/xen/arch/x86/physdev.c b/xen/arch/x86/physdev.c
index 1be1d50..a3a9564 100644
--- a/xen/arch/x86/physdev.c
+++ b/xen/arch/x86/physdev.c
@@ -146,7 +146,7 @@ int physdev_map_pirq(domid_t domid, int type, int *index, int *pirq_p,
 irq = *index;
 if ( irq == -1 )
 case MAP_PIRQ_TYPE_MULTI_MSI:
-irq = create_irq(NUMA_NO_NODE);
+irq = create_irq(XEN_NUMA_N

[Xen-devel] [PATCH v6 11/23] libxl: functions to build vmemranges for PV guest

2015-02-26 Thread Wei Liu
Introduce an arch-independent routine to generate one vmemrange per
vnode. Also introduce arch-dependent routines for different
architectures because part of the process is arch-specific -- ARM has
yet to have NUMA support and E820 is x86 only.

For those x86 guests who care about machine E820 map (i.e. with
e820_host=1), vnode is further split into several vmemranges to
accommodate memory holes.  A few stubs for libxl_arm.c are created.
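
As a standalone illustration of the "one vmemrange per vnode" layout (the generic case without E820 holes), ranges are simply placed back to back from address 0. The names below are made up for this sketch and are not the libxl types.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a vmemrange. */
struct sketch_range {
    uint64_t start;   /* byte address, inclusive */
    uint64_t end;     /* byte address, exclusive */
    unsigned int nid; /* virtual node id */
};

/*
 * Fill out[0..nr-1] with one range per virtual node, laid out back to
 * back from address 0. memkb[i] is the size of vnode i in KiB.
 */
static void sketch_build_ranges(const uint64_t *memkb, size_t nr,
                                struct sketch_range *out)
{
    uint64_t next = 0;
    size_t i;

    assert(memkb != NULL && out != NULL);

    for (i = 0; i < nr; i++) {
        out[i].start = next;
        out[i].end = next + (memkb[i] << 10); /* KiB -> bytes */
        out[i].nid = (unsigned int)i;
        next = out[i].end;
    }
}
```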

Signed-off-by: Wei Liu 
Reviewed-by: Dario Faggioli 
Cc: Ian Campbell 
Cc: Ian Jackson 
Cc: Dario Faggioli 
Cc: Elena Ufimtseva 
---
Changes in v5:
1. Allocate array all in one go.
2. Reverse the logic of vmemranges generation.

Changes in v4:
1. Adapt to new interface.
2. Address Ian Jackson's comments.

Changes in v3:
1. Rewrite commit log.
---
 tools/libxl/libxl_arch.h |  6 
 tools/libxl/libxl_arm.c  |  8 +
 tools/libxl/libxl_internal.h |  8 +
 tools/libxl/libxl_vnuma.c| 41 +
 tools/libxl/libxl_x86.c  | 73 
 5 files changed, 136 insertions(+)

diff --git a/tools/libxl/libxl_arch.h b/tools/libxl/libxl_arch.h
index d3bc136..e249048 100644
--- a/tools/libxl/libxl_arch.h
+++ b/tools/libxl/libxl_arch.h
@@ -27,4 +27,10 @@ int libxl__arch_domain_init_hw_description(libxl__gc *gc,
 int libxl__arch_domain_finalise_hw_description(libxl__gc *gc,
   libxl_domain_build_info *info,
   struct xc_dom_image *dom);
+
+/* build vNUMA vmemrange with arch specific information */
+int libxl__arch_vnuma_build_vmemrange(libxl__gc *gc,
+  uint32_t domid,
+  libxl_domain_build_info *b_info,
+  libxl__domain_build_state *state);
 #endif
diff --git a/tools/libxl/libxl_arm.c b/tools/libxl/libxl_arm.c
index 65a762b..7da254f 100644
--- a/tools/libxl/libxl_arm.c
+++ b/tools/libxl/libxl_arm.c
@@ -707,6 +707,14 @@ int libxl__arch_domain_finalise_hw_description(libxl__gc *gc,
 return 0;
 }
 
+int libxl__arch_vnuma_build_vmemrange(libxl__gc *gc,
+  uint32_t domid,
+  libxl_domain_build_info *info,
+  libxl__domain_build_state *state)
+{
+return libxl__vnuma_build_vmemrange_pv_generic(gc, domid, info, state);
+}
+
 /*
  * Local variables:
  * mode: C
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 258be0d..7d1e1cf 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -3400,6 +3400,14 @@ void libxl__numa_candidate_put_nodemap(libxl__gc *gc,
 int libxl__vnuma_config_check(libxl__gc *gc,
   const libxl_domain_build_info *b_info,
   const libxl__domain_build_state *state);
+int libxl__vnuma_build_vmemrange_pv_generic(libxl__gc *gc,
+uint32_t domid,
+libxl_domain_build_info *b_info,
+libxl__domain_build_state *state);
+int libxl__vnuma_build_vmemrange_pv(libxl__gc *gc,
+uint32_t domid,
+libxl_domain_build_info *b_info,
+libxl__domain_build_state *state);
 
 _hidden int libxl__ms_vm_genid_set(libxl__gc *gc, uint32_t domid,
const libxl_ms_vm_genid *id);
diff --git a/tools/libxl/libxl_vnuma.c b/tools/libxl/libxl_vnuma.c
index 33d7a3c..04672b5 100644
--- a/tools/libxl/libxl_vnuma.c
+++ b/tools/libxl/libxl_vnuma.c
@@ -14,6 +14,7 @@
  */
 #include "libxl_osdeps.h" /* must come before any other headers */
 #include "libxl_internal.h"
+#include "libxl_arch.h"
 #include 
 
 /* Sort vmemranges in ascending order with "start" */
@@ -142,6 +143,46 @@ out:
 return rc;
 }
 
+
+int libxl__vnuma_build_vmemrange_pv_generic(libxl__gc *gc,
+uint32_t domid,
+libxl_domain_build_info *b_info,
+libxl__domain_build_state *state)
+{
+int i;
+uint64_t next;
+xen_vmemrange_t *v = NULL;
+
+/* Generate one vmemrange for each virtual node. */
+GCREALLOC_ARRAY(v, b_info->num_vnuma_nodes);
+next = 0;
+for (i = 0; i < b_info->num_vnuma_nodes; i++) {
+libxl_vnode_info *p = &b_info->vnuma_nodes[i];
+
+v[i].start = next;
+v[i].end = next + (p->memkb << 10);
+v[i].flags = 0;
+v[i].nid = i;
+
+next = v[i].end;
+}
+
+state->vmemranges = v;
+state->num_vmemranges = i;
+
+return 0;
+}
+
+/* Build vmemranges for PV guest */
+int libxl__vnuma_build_vmemrange_pv(libxl__gc *gc,
+uint32_t domid,
+l

[Xen-devel] [PATCH v10 2/4] tools/libxc: code refactoring in xc_psr_cmt_get_data

2015-02-26 Thread Chao Peng
Use calculated array index instead of hardcoded array index.
No functional change involved.
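
The pattern in isolation, as a minimal sketch with made-up types and placeholder register indices (not the libxc structures): entries are appended through a running counter, so an entry can be added or made conditional without renumbering the rest.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum sketch_cmd { SKETCH_OP_WRITE, SKETCH_OP_READ };

/* Hypothetical stand-in for a resource-op entry. */
struct sketch_entry {
    enum sketch_cmd cmd;
    uint32_t idx; /* placeholder register index */
    uint64_t val;
};

#define SKETCH_MAX_ENTRIES 3

/*
 * Queue a write followed by a read, plus an optional extra read.
 * Returns the number of entries queued.
 */
static unsigned int sketch_fill(struct sketch_entry *e, int want_extra)
{
    unsigned int nr = 0;

    assert(e != NULL);

    e[nr].cmd = SKETCH_OP_WRITE;
    e[nr].idx = 0x10;
    e[nr].val = 1;
    nr++;

    e[nr].cmd = SKETCH_OP_READ;
    e[nr].idx = 0x11;
    e[nr].val = 0;
    nr++;

    if (want_extra) {
        e[nr].cmd = SKETCH_OP_READ;
        e[nr].idx = 0x12;
        e[nr].val = 0;
        nr++;
    }

    assert(nr <= SKETCH_MAX_ENTRIES);
    return nr;
}
```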

Signed-off-by: Chao Peng 
---
 tools/libxc/xc_psr.c | 24 +---
 1 file changed, 13 insertions(+), 11 deletions(-)

diff --git a/tools/libxc/xc_psr.c b/tools/libxc/xc_psr.c
index cfae172..70d9067 100644
--- a/tools/libxc/xc_psr.c
+++ b/tools/libxc/xc_psr.c
@@ -143,7 +143,7 @@ int xc_psr_cmt_get_data(xc_interface *xch, uint32_t rmid, uint32_t cpu,
 {
 xc_resource_op_t op;
 xc_resource_entry_t entries[2];
-uint32_t evtid;
+uint32_t evtid, nr = 0;
 int rc;
 
 switch ( type )
@@ -155,25 +155,27 @@ int xc_psr_cmt_get_data(xc_interface *xch, uint32_t rmid, uint32_t cpu,
 return -1;
 }
 
-entries[0].u.cmd = XEN_RESOURCE_OP_MSR_WRITE;
-entries[0].idx = MSR_IA32_CMT_EVTSEL;
-entries[0].val = (uint64_t)rmid << 32 | evtid;
-entries[0].rsvd = 0;
+entries[nr].u.cmd = XEN_RESOURCE_OP_MSR_WRITE;
+entries[nr].idx = MSR_IA32_CMT_EVTSEL;
+entries[nr].val = (uint64_t)rmid << 32 | evtid;
+entries[nr].rsvd = 0;
+nr++;
 
-entries[1].u.cmd = XEN_RESOURCE_OP_MSR_READ;
-entries[1].idx = MSR_IA32_CMT_CTR;
-entries[1].val = 0;
-entries[1].rsvd = 0;
+entries[nr].u.cmd = XEN_RESOURCE_OP_MSR_READ;
+entries[nr].idx = MSR_IA32_CMT_CTR;
+entries[nr].val = 0;
+entries[nr].rsvd = 0;
+nr++;
 
 op.cpu = cpu;
-op.nr_entries = 2;
+op.nr_entries = nr;
 op.entries = entries;
 
 rc = xc_resource_op(xch, 1, &op);
 if ( rc < 0 )
 return rc;
 
-if ( op.result !=2 || entries[1].val & IA32_CMT_CTR_ERROR_MASK )
+if ( op.result != nr || entries[1].val & IA32_CMT_CTR_ERROR_MASK )
 return -1;
 
 *monitor_data = entries[1].val;
-- 
1.9.1




[Xen-devel] [PATCH v6 08/23] libxl: add vmemrange to libxl__domain_build_state

2015-02-26 Thread Wei Liu
A vnode consists of one or more vmemranges (virtual memory range).  One
example of multiple vmemranges is that there is a hole in one vnode.
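
To make the hole example concrete, here is a small sketch of how one vnode's RAM could end up as two ranges. It is illustrative only: the struct and function are hypothetical, and a single hole is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a vmemrange. */
struct sketch_range {
    uint64_t start; /* inclusive */
    uint64_t end;   /* exclusive */
    unsigned int nid;
};

/*
 * Place 'size' bytes of RAM for vnode 'nid' starting at 'start', skipping
 * the hole [hole_start, hole_end). Writes one or two ranges to out[] and
 * returns how many were written.
 */
static size_t sketch_split_around_hole(uint64_t start, uint64_t size,
                                       uint64_t hole_start, uint64_t hole_end,
                                       unsigned int nid,
                                       struct sketch_range out[2])
{
    assert(out != NULL);
    assert(hole_start <= hole_end);

    /* Start lies inside the hole: move the whole range past it. */
    if (start >= hole_start && start < hole_end)
        start = hole_end;

    /* RAM ends before the hole begins, or starts after it: one range. */
    if (start + size <= hole_start || start >= hole_end) {
        out[0] = (struct sketch_range){ start, start + size, nid };
        return 1;
    }

    /* Otherwise the hole cuts the vnode in two. */
    out[0] = (struct sketch_range){ start, hole_start, nid };
    out[1] = (struct sketch_range){ hole_end,
                                    hole_end + (size - (hole_start - start)),
                                    nid };
    return 2;
}
```

Both ranges carry the same node id, which is what "one vnode, multiple vmemranges" means here.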

Currently we haven't exported the vmemrange interface to libxl users.
Vmemranges are generated during domain build, so we have relevant
structures in domain build state.

Later if we discover we need to export the interface, those structures
can be moved to libxl_domain_build_info as well.

These new fields (along with other fields in that struct) are set to 0
at start of day so we don't need to explicitly initialise them. A
following patch which introduces an independent checking function will
need to access these fields. I don't feel very comfortable squashing
this change into that one so I didn't use a single commit.

Signed-off-by: Wei Liu 
Reviewed-by: Dario Faggioli 
Cc: Ian Campbell 
Cc: Ian Jackson 
Cc: Dario Faggioli 
Cc: Elena Ufimtseva 
Acked-by: Ian Campbell 
---
Changes in v5:
1. Fix commit message.

Changes in v4:
1. Improve commit message.

Changes in v3:
1. Rewrite commit message.
---
 tools/libxl/libxl_internal.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 934465a..6d3ac58 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -973,6 +973,9 @@ typedef struct {
 libxl__file_reference pv_ramdisk;
 const char * pv_cmdline;
 bool pvh_enabled;
+
+xen_vmemrange_t *vmemranges;
+uint32_t num_vmemranges;
 } libxl__domain_build_state;
 
 _hidden int libxl__build_pre(libxl__gc *gc, uint32_t domid,
-- 
1.9.1




Re: [Xen-devel] Shared page tables between EPT and IOMMU issue

2015-02-26 Thread Roger Pau Monné
On 26/02/15 at 16:57, Jan Beulich wrote:
 On 26.02.15 at 16:45,  wrote:
>> While testing PVH Dom0 support on a newer Core i3-5010U I've found that 
>> sharing the page tables between EPT and the IOMMUs doesn't work. Booting 
>> with iommu=no-sharept solves the problem, but I'm unsure what causes 
>> this issue.
> 
> Is FreeBSD fiddling with its own memory map in some way? It's rather
> surprising to see not just an occasional fault, but many of them, and
> with L2 or even L3 entries not present.

No, FreeBSD doesn't touch the physical memory map at all. No ballooning
or anything like that.

> I.e. if it's not the OS
> requesting re-arrangements, I would suppose table setup itself is
> screwed up in some way. In the end - knowing the valid GFN range
> for the guest - you may want to monitor/log how tables get created
> and whether (and if so by whom) later some of the entries get
> zapped.

OK, I will try to take a look. All those faults come from physical
memory ranges that are supposed to be usable, and in fact the CPU seems
to be able to read/write from them without problems, or else the guest
would have crashed much earlier. Regarding sharing the page tables
between EPT and the IOMMU, is there some bit that needs to be set in the
EPT entry in order to mark a page as accessible to the IOMMU?

Roger.




Re: [Xen-devel] [PATCH v2] RFC: Automatically check xen's public headers for C++ pitfalls.

2015-02-26 Thread Andrew Cooper
On 26/02/15 16:28, Tim Deegan wrote:
> At 16:11 + on 26 Feb (1424963496), Tim Deegan wrote:
>> Add a check, like the existing check for non-ANSI C in the public
>> headers, that runs the public headers through a C++ compiler to
>> flag non-C++-friendly constructs.
> Oops, this still has the EFI changes in it.  v3, rebased, is on its way.
>
>> Unlike the ANSI C check, we accept GCC-isms (gnu++98), and we also
>> check various tools-only headers.
>>
>> Explicitly _not_ addressing the use of 'private' in various fields,
>> since we'd previously decided not to fix that.
> BTW, ring.h is the only instance of that, so the extra diff to clear
> that up too is pretty small (see below).
>
> Not sure what people think about that though - it might be
> quite a PITA for downstream users of it, though they ought really to
> be using local copies so they can update in a controlled way.

It is basically no effort, won't (directly) break consumers, and will
make the headers fully friendly (other than extern "C", which can be dealt
with using the C++ #include  pattern).

+1 throw this in and be done with the incompatibilities for good.

~Andrew

>
> diff --git a/xen/include/Makefile b/xen/include/Makefile
> index d48a642..c7a1d52 100644
> --- a/xen/include/Makefile
> +++ b/xen/include/Makefile
> @@ -104,8 +104,7 @@ headers.chk: $(PUBLIC_ANSI_HEADERS) Makefile
>  headers++.chk: $(PUBLIC_HEADERS) Makefile
>   if $(CXX) -v >/dev/null 2>&1; then \
>   for i in $(filter %.h,$^); do \
> - $(CXX) -x c++ -std=gnu++98 -Wall -Werror \
> --D__XEN_TOOLS__ -Dprivate=private_is_a_keyword_in_cpp \
> + $(CXX) -x c++ -std=gnu++98 -Wall -Werror -D__XEN_TOOLS__ \
>  -include stdint.h -include public/xen.h \
>  -S -o /dev/null $$i || exit 1; \
>   echo $$i; \
> diff --git a/xen/include/public/io/ring.h b/xen/include/public/io/ring.h
> index 73e13d7..bb13494 100644
> --- a/xen/include/public/io/ring.h
> +++ b/xen/include/public/io/ring.h
> @@ -111,7 +111,7 @@ struct __name##_sring { \
>  uint8_t msg;\
>  } tapif_user;   \
>  uint8_t pvt_pad[4]; \
> -} private;  \
> +} local;\
>  uint8_t __pad[44];  \
>  union __name##_sring_entry ring[1]; /* variable-length */   \
>  };  \
> @@ -156,7 +156,7 @@ typedef struct __name##_back_ring __name##_back_ring_t
>  #define SHARED_RING_INIT(_s) do {   \
>  (_s)->req_prod  = (_s)->rsp_prod  = 0;  \
>  (_s)->req_event = (_s)->rsp_event = 1;  \
> -(void)memset((_s)->private.pvt_pad, 0, sizeof((_s)->private.pvt_pad)); \
> +(void)memset((_s)->local.pvt_pad, 0, sizeof((_s)->local.pvt_pad));  \
>  (void)memset((_s)->__pad, 0, sizeof((_s)->__pad));  \
>  } while(0)
>  
>
>
>




Re: [Xen-devel] [PATCH v2] RFC: Automatically check xen's public headers for C++ pitfalls.

2015-02-26 Thread Tim Deegan
At 16:47 + on 26 Feb (1424965651), Jan Beulich wrote:
> >>> On 26.02.15 at 17:28,  wrote:
> > At 16:11 + on 26 Feb (1424963496), Tim Deegan wrote:
> >> Explicitly _not_ addressing the use of 'private' in various fields,
> >> since we'd previously decided not to fix that.
> > 
> > BTW, ring.h is the only instance of that, so the extra diff to clear
> > that up too is pretty small (see below).
> > 
> > Not sure what people think about that though - it might be
> > quite a PITA for downstream users of it, though they ought really to
> > be using local copies so they can update in a controlled way.
> 
> linux-2.6.18-xen.hg always having consumed them (almost)
> verbatim, I don't think we should break users not massaging
> the headers. I.e. at least make the field name conditional upon
> using C vs C++.

Something like this?  This is the kind of uglification that I would
like to avoid, though (and I don't like '#define private pvt' much
either).

Tim.

diff --git a/xen/include/Makefile b/xen/include/Makefile
index d48a642..c7a1d52 100644
--- a/xen/include/Makefile
+++ b/xen/include/Makefile
@@ -104,8 +104,7 @@ headers.chk: $(PUBLIC_ANSI_HEADERS) Makefile
 headers++.chk: $(PUBLIC_HEADERS) Makefile
if $(CXX) -v >/dev/null 2>&1; then \
for i in $(filter %.h,$^); do \
-   $(CXX) -x c++ -std=gnu++98 -Wall -Werror \
-  -D__XEN_TOOLS__ -Dprivate=private_is_a_keyword_in_cpp \
+   $(CXX) -x c++ -std=gnu++98 -Wall -Werror -D__XEN_TOOLS__ \
   -include stdint.h -include public/xen.h \
   -S -o /dev/null $$i || exit 1; \
echo $$i; \
diff --git a/xen/include/public/io/ring.h b/xen/include/public/io/ring.h
index 73e13d7..86fb991 100644
--- a/xen/include/public/io/ring.h
+++ b/xen/include/public/io/ring.h
@@ -35,6 +35,15 @@
 #define xen_wmb() wmb()
 #endif
 
+#ifdef __cplusplus
+/* 'private' is a keyword in C++, so we have to use a different name for
+ * private state there.  Leaving the C name alone to avoid unnecessary
+ * pain for the existing users. */
+#define XEN_RING_PRIVATE pvt
+#else
+#define XEN_RING_PRIVATE private
+#endif
+
 typedef unsigned int RING_IDX;
 
 /* Round a 32-bit unsigned constant down to the nearest power of two. */
@@ -111,7 +120,7 @@ struct __name##_sring { \
 uint8_t msg;\
 } tapif_user;   \
 uint8_t pvt_pad[4]; \
-} private;  \
+} XEN_RING_PRIVATE; \
 uint8_t __pad[44];  \
 union __name##_sring_entry ring[1]; /* variable-length */   \
 };  \
@@ -156,7 +165,8 @@ typedef struct __name##_back_ring __name##_back_ring_t
 #define SHARED_RING_INIT(_s) do {   \
 (_s)->req_prod  = (_s)->rsp_prod  = 0;  \
 (_s)->req_event = (_s)->rsp_event = 1;  \
-(void)memset((_s)->private.pvt_pad, 0, sizeof((_s)->private.pvt_pad)); \
+(void)memset((_s)->XEN_RING_PRIVATE.pvt_pad, 0, \
+ sizeof((_s)->XEN_RING_PRIVATE.pvt_pad));   \
 (void)memset((_s)->__pad, 0, sizeof((_s)->__pad));  \
 } while(0)
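
For readers following along, here is a minimal sketch of how the conditional field name behaves when the header is built as C. Only XEN_RING_PRIVATE comes from the diff above; the demo_sring struct and demo_ring_init() are hypothetical, cut-down stand-ins and not the real ring.h macros.

```c
#include <stdint.h>
#include <string.h>

/* Same conditional as in the diff: C keeps 'private', C++ gets 'pvt'. */
#ifdef __cplusplus
#define XEN_RING_PRIVATE pvt
#else
#define XEN_RING_PRIVATE private
#endif

/* Hypothetical, cut-down shared ring; the real layout lives in ring.h. */
struct demo_sring {
    uint32_t req_prod, req_event;
    uint32_t rsp_prod, rsp_event;
    union {
        uint8_t pvt_pad[4];
    } XEN_RING_PRIVATE;
    uint8_t pad[44];
};

/* Mirrors what SHARED_RING_INIT does, with the field spelled via the macro. */
static void demo_ring_init(struct demo_sring *s)
{
    s->req_prod = s->rsp_prod = 0;
    s->req_event = s->rsp_event = 1;
    memset(s->XEN_RING_PRIVATE.pvt_pad, 0,
           sizeof(s->XEN_RING_PRIVATE.pvt_pad));
    memset(s->pad, 0, sizeof(s->pad));
}
```

Existing C users that write s->private directly keep compiling unchanged; only code that must build as both C and C++ needs to go through the macro.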
 



[Xen-devel] [PATCH v6 14/23] libxc: allocate memory with vNUMA information for HVM guest

2015-02-26 Thread Wei Liu
The algorithm is more or less the same as the one used for PV guests.
Libxc gets hold of the mapping of vnode to pnode and the size of each
vnode, then allocates memory accordingly.

The function then returns the low memory end, high memory end and MMIO
start to the caller. Libxl needs those values to construct vmemranges for
that guest.
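
As an illustration of the consistency check this patch performs before allocating, here is a minimal sketch. The struct demo_vmemrange type, DEMO_PAGE_SHIFT and demo_vmemranges_cover() are hypothetical names used only for this example; the real code works on xen_vmemrange_t inside setup_guest().

```c
#include <stdint.h>
#include <stddef.h>

#define DEMO_PAGE_SHIFT 12  /* assumption: 4 KiB pages */

/* Hypothetical stand-in for xen_vmemrange_t: a [start, end) byte range
 * that belongs to virtual node 'nid'. */
struct demo_vmemrange {
    uint64_t start, end;
    unsigned int nid;
};

/* Return 1 if the ranges add up to exactly mem_size worth of pages,
 * 0 otherwise. */
static int demo_vmemranges_cover(const struct demo_vmemrange *r, size_t nr,
                                 uint64_t mem_size)
{
    uint64_t total_pages = 0;
    size_t i;

    for (i = 0; i < nr; i++)
        total_pages += (r[i].end - r[i].start) >> DEMO_PAGE_SHIFT;

    return total_pages == (mem_size >> DEMO_PAGE_SHIFT);
}
```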

Signed-off-by: Wei Liu 
Cc: Ian Campbell 
Cc: Ian Jackson 
Cc: Dario Faggioli 
Cc: Elena Ufimtseva 
---
Changes in v6:
1. Use XEN_NUMA_NO_NODE.
2. Fix a minor bug discovered by Dario.

Changes in v5:
1. Use a better loop variable name vnid.

Changes in v4:
1. Adapt to new interface.
2. Shorten error message.
3. This patch includes only functional changes.

Changes in v3:
1. Rewrite commit log.
2. Add a few code comments.
---
 tools/libxc/include/xenguest.h |  11 +
 tools/libxc/xc_hvm_build_x86.c | 102 ++---
 2 files changed, 97 insertions(+), 16 deletions(-)

diff --git a/tools/libxc/include/xenguest.h b/tools/libxc/include/xenguest.h
index 40bbac8..ff66cb1 100644
--- a/tools/libxc/include/xenguest.h
+++ b/tools/libxc/include/xenguest.h
@@ -230,6 +230,17 @@ struct xc_hvm_build_args {
 struct xc_hvm_firmware_module smbios_module;
 /* Whether to use claim hypercall (1 - enable, 0 - disable). */
 int claim_enabled;
+
+/* vNUMA information*/
+xen_vmemrange_t *vmemranges;
+unsigned int nr_vmemranges;
+unsigned int *vnode_to_pnode;
+unsigned int nr_vnodes;
+
+/* Out parameters  */
+uint64_t lowmem_end;
+uint64_t highmem_end;
+uint64_t mmio_start;
 };
 
 /**
diff --git a/tools/libxc/xc_hvm_build_x86.c b/tools/libxc/xc_hvm_build_x86.c
index ecc3224..fba02fb 100644
--- a/tools/libxc/xc_hvm_build_x86.c
+++ b/tools/libxc/xc_hvm_build_x86.c
@@ -89,7 +89,8 @@ static int modules_init(struct xc_hvm_build_args *args,
 }
 
 static void build_hvm_info(void *hvm_info_page, uint64_t mem_size,
-   uint64_t mmio_start, uint64_t mmio_size)
+   uint64_t mmio_start, uint64_t mmio_size,
+   struct xc_hvm_build_args *args)
 {
 struct hvm_info_table *hvm_info = (struct hvm_info_table *)
 (((unsigned char *)hvm_info_page) + HVM_INFO_OFFSET);
@@ -119,6 +120,10 @@ static void build_hvm_info(void *hvm_info_page, uint64_t mem_size,
 hvm_info->high_mem_pgend = highmem_end >> PAGE_SHIFT;
 hvm_info->reserved_mem_pgstart = ioreq_server_pfn(0);
 
+args->lowmem_end = lowmem_end;
+args->highmem_end = highmem_end;
+args->mmio_start = mmio_start;
+
 /* Finish with the checksum. */
 for ( i = 0, sum = 0; i < hvm_info->length; i++ )
 sum += ((uint8_t *)hvm_info)[i];
@@ -244,7 +249,7 @@ static int setup_guest(xc_interface *xch,
char *image, unsigned long image_size)
 {
 xen_pfn_t *page_array = NULL;
-unsigned long i, nr_pages = args->mem_size >> PAGE_SHIFT;
+unsigned long i, vmemid, nr_pages = args->mem_size >> PAGE_SHIFT;
 unsigned long target_pages = args->mem_target >> PAGE_SHIFT;
 uint64_t mmio_start = (1ull << 32) - args->mmio_size;
 uint64_t mmio_size = args->mmio_size;
@@ -258,13 +263,13 @@ static int setup_guest(xc_interface *xch,
 xen_capabilities_info_t caps;
 unsigned long stat_normal_pages = 0, stat_2mb_pages = 0, 
 stat_1gb_pages = 0;
-int pod_mode = 0;
+unsigned int memflags = 0;
 int claim_enabled = args->claim_enabled;
 xen_pfn_t special_array[NR_SPECIAL_PAGES];
 xen_pfn_t ioreq_server_array[NR_IOREQ_SERVER_PAGES];
-
-if ( nr_pages > target_pages )
-pod_mode = XENMEMF_populate_on_demand;
+uint64_t total_pages;
+xen_vmemrange_t dummy_vmemrange;
+unsigned int dummy_vnode_to_pnode;
 
 memset(&elf, 0, sizeof(elf));
 if ( elf_init(&elf, image, image_size) != 0 )
@@ -276,6 +281,43 @@ static int setup_guest(xc_interface *xch,
 v_start = 0;
 v_end = args->mem_size;
 
+if ( nr_pages > target_pages )
+memflags |= XENMEMF_populate_on_demand;
+
+if ( args->nr_vmemranges == 0 )
+{
+/* Build dummy vnode information */
+dummy_vmemrange.start = 0;
+dummy_vmemrange.end   = args->mem_size;
+dummy_vmemrange.flags = 0;
+dummy_vmemrange.nid   = 0;
+args->nr_vmemranges = 1;
+args->vmemranges = &dummy_vmemrange;
+
+dummy_vnode_to_pnode = XEN_NUMA_NO_NODE;
+args->nr_vnodes = 1;
+args->vnode_to_pnode = &dummy_vnode_to_pnode;
+}
+else
+{
+if ( nr_pages > target_pages )
+{
+PERROR("Cannot enable vNUMA and PoD at the same time");
+goto error_out;
+}
+}
+
+total_pages = 0;
+for ( i = 0; i < args->nr_vmemranges; i++ )
+total_pages += ((args->vmemranges[i].end - args->vmemranges[i].start)
+>> PAGE_SHIFT);
+if ( total_pages != (args->mem_size >> PAGE_SHIFT) )
+{
+PERROR("vNUMA memory pages mismatch (

[Xen-devel] [PATCH v6 22/23] xl: introduce xcalloc

2015-02-26 Thread Wei Liu
Signed-off-by: Wei Liu 
Cc: Ian Campbell 
Cc: Ian Jackson 
---
Changes in v6:
1. Join two lines to make code more compact.
2. Use %zu and drop casting.
---
 tools/libxl/xl_cmdimpl.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index 53c16eb..5b366f2 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -289,6 +289,16 @@ static void *xmalloc(size_t sz) {
 return r;
 }
 
+static void *xcalloc(size_t n, size_t sz) __attribute__((unused));
+static void *xcalloc(size_t n, size_t sz) {
+void *r = calloc(n, sz);
+if (!r) {
+fprintf(stderr,"xl: Unable to calloc %zu bytes.\n", sz*n);
+exit(-ERROR_FAIL);
+}
+return r;
+}
+
 static void *xrealloc(void *ptr, size_t sz) {
 void *r;
 if (!sz) { free(ptr); return 0; }
-- 
1.9.1
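
A side note on the error message in this patch: it prints sz*n, and that product can wrap for very large arguments even though calloc() itself is expected to fail rather than wrap. A minimal sketch of a checked multiply follows; demo_size_mul() is a hypothetical helper and not part of xl.

```c
#include <stddef.h>
#include <stdint.h>

/* Store n*sz in *out and return 0, or return -1 if the product would
 * not fit in a size_t. */
static int demo_size_mul(size_t n, size_t sz, size_t *out)
{
    if (sz != 0 && n > SIZE_MAX / sz)
        return -1;
    *out = n * sz;
    return 0;
}
```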




[Xen-devel] [PATCH v6 01/23] xen: factor out construct_memop_from_reservation

2015-02-26 Thread Wei Liu
No functional change.

Signed-off-by: Wei Liu 
Cc: Jan Beulich 
Cc: Andrew Cooper 
---
 xen/common/memory.c | 52 +++-
 1 file changed, 35 insertions(+), 17 deletions(-)

diff --git a/xen/common/memory.c b/xen/common/memory.c
index e84ace9..d24b001 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -692,11 +692,43 @@ out:
 return rc;
 }
 
+static int construct_memop_from_reservation(
+   const struct xen_memory_reservation *r,
+   struct memop_args *a)
+{
+int rc;
+unsigned int address_bits;
+
+a->extent_list  = r->extent_start;
+a->nr_extents   = r->nr_extents;
+a->extent_order = r->extent_order;
+a->memflags = 0;
+
+address_bits = XENMEMF_get_address_bits(r->mem_flags);
+if ( (address_bits != 0) &&
+ (address_bits < (get_order_from_pages(max_page) + PAGE_SHIFT)) )
+{
+if ( address_bits <= PAGE_SHIFT )
+{
+rc = -EINVAL;
+goto out;
+}
+a->memflags = MEMF_bits(address_bits);
+}
+
+a->memflags |= MEMF_node(XENMEMF_get_node(r->mem_flags));
+if ( r->mem_flags & XENMEMF_exact_node_request )
+a->memflags |= MEMF_exact_node;
+
+rc = 0;
+ out:
+return rc;
+}
+
 long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 {
 struct domain *d;
 long rc;
-unsigned int address_bits;
 struct xen_memory_reservation reservation;
 struct memop_args args;
 domid_t domid;
@@ -718,25 +750,11 @@ long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 if ( unlikely(start_extent >= reservation.nr_extents) )
 return start_extent;
 
-args.extent_list  = reservation.extent_start;
-args.nr_extents   = reservation.nr_extents;
-args.extent_order = reservation.extent_order;
 args.nr_done  = start_extent;
 args.preempted= 0;
-args.memflags = 0;
 
-address_bits = XENMEMF_get_address_bits(reservation.mem_flags);
-if ( (address_bits != 0) &&
- (address_bits < (get_order_from_pages(max_page) + PAGE_SHIFT)) )
-{
-if ( address_bits <= PAGE_SHIFT )
-return start_extent;
-args.memflags = MEMF_bits(address_bits);
-}
-
-args.memflags |= MEMF_node(XENMEMF_get_node(reservation.mem_flags));
-if ( reservation.mem_flags & XENMEMF_exact_node_request )
-args.memflags |= MEMF_exact_node;
+if ( construct_memop_from_reservation(&reservation, &args) )
+return start_extent;
 
 if ( op == XENMEM_populate_physmap
  && (reservation.mem_flags & XENMEMF_populate_on_demand) )
-- 
1.9.1




[Xen-devel] [PATCH v10 4/4] tools, docs: add total/local memory bandwith monitoring

2015-02-26 Thread Chao Peng
Add Memory Bandwidth Monitoring (MBM) for VMs. Two types of monitoring
are supported: total and local memory bandwidth monitoring. To use it,
CMT must be enabled in the hypervisor.
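
The changelog below mentions that the timestamp from the hypervisor is already in nanoseconds and that the bandwidth calculation is done in xl from two samples. A minimal sketch of that arithmetic, under those assumptions, could look like this. The struct demo_mbm_sample type and demo_mbm_kbps() are hypothetical; how the raw counter is scaled to bytes is not shown here.

```c
#include <stdint.h>

/* Hypothetical sample: cumulative byte count plus a timestamp in ns. */
struct demo_mbm_sample {
    uint64_t bytes;
    uint64_t tsc_ns;
};

/* Average bandwidth in KB/s between two samples. Returns 0 if no time
 * has elapsed or the counter went backwards. The multiplication can
 * overflow for absurdly large deltas; that is ignored in this sketch. */
static uint64_t demo_mbm_kbps(struct demo_mbm_sample a,
                              struct demo_mbm_sample b)
{
    uint64_t kb, ns;

    if (b.tsc_ns <= a.tsc_ns || b.bytes < a.bytes)
        return 0;

    kb = (b.bytes - a.bytes) / 1024;
    ns = b.tsc_ns - a.tsc_ns;
    return kb * UINT64_C(1000000000) / ns;
}
```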

Signed-off-by: Chao Peng 
---
Changes in v10:
1. Move refactoring code into standalone patch.
2. Create generic interface libxl_psr_cmt_get_sample for both
   cache_occupancy and memory bandwidth.
Changes in v9:
1. Refactor code in xc_psr_cmt_get_data.
2. Move bandwidth calculation(sleep) from libxl to xl.
3. Broadcast feature with LIBXL_HAVE_PSR_MBM.
4. Check event mask with libxl_psr_cmt_type_supported.
5. Coding style/Document fix.
Changes in v6:
1. Remove DISABLE_IRQ flag as the hypervisor disables IRQs for MSR_IA32_TSC
   implicitly.
Changes in v5:
1. Add MBM description in xen command line.
2. Use the tsc from hypervisor directly which is already ns.
3. Call resource_op with DISABLE_IRQ flag.
Changes in v4:
1. Get timestamp from hypervisor and use that for bandwidth calculation.
2. Minor document and coding style fix.
---
 docs/man/xl.pod.1   | 11 +-
 docs/misc/xen-command-line.markdown |  3 ++
 tools/libxc/include/xenctrl.h   |  6 +++-
 tools/libxc/xc_msr_x86.h|  1 +
 tools/libxc/xc_psr.c| 44 +--
 tools/libxl/libxl.h | 17 +
 tools/libxl/libxl_psr.c | 56 +++--
 tools/libxl/libxl_types.idl |  2 ++
 tools/libxl/xl_cmdimpl.c| 72 +
 tools/libxl/xl_cmdtable.c   |  4 ++-
 10 files changed, 195 insertions(+), 21 deletions(-)

diff --git a/docs/man/xl.pod.1 b/docs/man/xl.pod.1
index 6b89ba8..cd80ffc 100644
--- a/docs/man/xl.pod.1
+++ b/docs/man/xl.pod.1
@@ -1461,6 +1461,13 @@ is domain level. To monitor a specific domain, just attach the domain id with
 the monitoring service. When the domain doesn't need to be monitored any more,
 detach the domain id from the monitoring service.
 
+Intel Broadwell and later server platforms also offer total/local memory
+bandwidth monitoring. Xen supports per-domain monitoring for these two
+additional monitoring types. Both memory bandwidth monitoring and L3 cache
+occupancy monitoring share the same set of underlying monitoring service. Once
+a domain is attached to the monitoring service, monitoring data can be showed
+for any of these monitoring types.
+
 =over 4
 
 =item B [I]
@@ -1475,7 +1482,9 @@ detach: Detach the platform shared resource monitoring service from a domain.
 
 Show monitoring data for a certain domain or all domains. Current supported
 monitor types are:
- - "cache-occupancy": showing the L3 cache occupancy.
+ - "cache-occupancy": showing the L3 cache occupancy(KB).
+ - "total-mem-bandwidth": showing the total memory bandwidth(KB/s).
+ - "local-mem-bandwidth": showing the local memory bandwidth(KB/s).
 
 =back
 
diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown
index bc316be..a09ec01 100644
--- a/docs/misc/xen-command-line.markdown
+++ b/docs/misc/xen-command-line.markdown
@@ -1097,6 +1097,9 @@ The following resources are available:
   L3 cache occupancy.
   * `cmt` instructs Xen to enable/disable Cache Monitoring Technology.
   * `rmid_max` indicates the max value for rmid.
+* Memory Bandwidth Monitoring (Broadwell and later). Information regarding the
+  total/local memory bandwidth. Follow the same options with Cache Monitoring
+  Technology.
 
 ### reboot
 > `= t[riple] | k[bd] | a[cpi] | p[ci] | n[o] [, [w]arm | [c]old]`
diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h
index 09d819f..54043ee 100644
--- a/tools/libxc/include/xenctrl.h
+++ b/tools/libxc/include/xenctrl.h
@@ -2688,6 +2688,8 @@ int xc_resource_op(xc_interface *xch, uint32_t nr_ops, xc_resource_op_t *ops);
 #if defined(__i386__) || defined(__x86_64__)
 enum xc_psr_cmt_type {
 XC_PSR_CMT_L3_OCCUPANCY,
+XC_PSR_CMT_TOTAL_MEM_BANDWIDTH,
+XC_PSR_CMT_LOCAL_MEM_BANDWIDTH,
 };
 typedef enum xc_psr_cmt_type xc_psr_cmt_type;
 int xc_psr_cmt_attach(xc_interface *xch, uint32_t domid);
@@ -2697,10 +2699,12 @@ int xc_psr_cmt_get_domain_rmid(xc_interface *xch, uint32_t domid,
 int xc_psr_cmt_get_total_rmid(xc_interface *xch, uint32_t *total_rmid);
 int xc_psr_cmt_get_l3_upscaling_factor(xc_interface *xch,
uint32_t *upscaling_factor);
+int xc_psr_cmt_get_l3_event_mask(xc_interface *xch, uint32_t *event_mask);
 int xc_psr_cmt_get_l3_cache_size(xc_interface *xch, uint32_t cpu,
  uint32_t *l3_cache_size);
 int xc_psr_cmt_get_data(xc_interface *xch, uint32_t rmid, uint32_t cpu,
-uint32_t psr_cmt_type, uint64_t *monitor_data);
+uint32_t psr_cmt_type, uint64_t *monitor_data,
+uint64_t *tsc);
 int xc_psr_cmt_enabled(xc_interface *xch);
 #endif
 
diff --git a/tools/libxc/xc_msr_x86.h b/tools/libxc/xc_msr_x86.h
index 7c3e1a3..7

Re: [Xen-devel] Shared page tables between ETP and IOMMU issue

2015-02-26 Thread Jan Beulich
>>> On 26.02.15 at 17:29,  wrote:
> OK, I will try to take a look. All those faults come from physical
> memory ranges that are supposed to be usable, and in fact the CPU seems
> to be able to read/write from them without problems, or else the guest
> would have crashed much earlier. Regarding sharing the page tables
> between EPT and the IOMMU, is there some bit that needs to be set in the
> EPT entry in order to mark a page as accessible to the IOMMU?

Bits 0 and 1 (read and write) are shared between VT-d and EPT
(as is bit 7 - see struct dma_pte and ept_entry_t).

Jan
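
To make the reply concrete, here is a minimal sketch of testing those shared permission bits on a raw 64-bit entry. The DEMO_* names and both helpers are made up for this example; the real field definitions are in struct dma_pte and ept_entry_t. That VT-d treats an entry with neither bit set as not present is my understanding, not something stated in the thread.

```c
#include <stdint.h>

/* Bit positions as given in the reply; the names are hypothetical. */
#define DEMO_PTE_READ   (UINT64_C(1) << 0)
#define DEMO_PTE_WRITE  (UINT64_C(1) << 1)
#define DEMO_PTE_RW     (DEMO_PTE_READ | DEMO_PTE_WRITE)

/* 1 if a shared entry grants both read and write access. */
static int demo_pte_dma_rw(uint64_t pte)
{
    return (pte & DEMO_PTE_RW) == DEMO_PTE_RW;
}

/* 1 if neither permission bit is set (assumed to mean "not present"
 * from the IOMMU's point of view). */
static int demo_pte_dma_not_present(uint64_t pte)
{
    return (pte & DEMO_PTE_RW) == 0;
}
```

When logging how the tables get created, as suggested earlier in the thread, dumping these two bits per entry is one way to spot entries the CPU can still use but the IOMMU cannot.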




Re: [Xen-devel] [PATCH 1/1] xen-netback: remove compilation warning

2015-02-26 Thread David Miller
From: pedro 
Date: Thu, 26 Feb 2015 09:25:41 +0100

> From: pmarzo 
> 
> offset and size are of type uint16_t so the %lu gives a warning
> A %u specifier, the same used in size makes gcc happy
> Not sure if a %x would be more correct
> 
> Signed-off-by: Pedro Marzo Perez 

This patch actually adds a warning on my machine, and your analysis
of the types is therefore probably incorrect:

drivers/net/xen-netback/netback.c: In function ‘xenvif_tx_build_gops’:
drivers/net/xen-netback/netback.c:1259:8: warning: format ‘%u’ expects argument of type ‘unsigned int’, but argument 5 has type ‘long unsigned int’ [-Wformat=]

The issue is probably "~PAGE_MASK" and I think the type of that
propagates into the type of the overall calculation.
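
The type propagation suspected here can be checked with a small C11 sketch. DEMO_PAGE_SIZE and DEMO_PAGE_MASK are assumptions modelled on the usual kernel definitions, where the page size is an unsigned long constant; they are not the kernel macros themselves.

```c
#include <stdint.h>

/* Assumption: the page size is an unsigned long constant, as in the
 * usual kernel definitions. */
#define DEMO_PAGE_SIZE 4096UL
#define DEMO_PAGE_MASK (~(DEMO_PAGE_SIZE - 1))

/* C11 _Generic: evaluates to 1 when the expression has type
 * unsigned long, 0 for any other type. */
#define DEMO_IS_ULONG(x) _Generic((x), unsigned long: 1, default: 0)

/* A uint16_t combined with the mask is converted to unsigned long by
 * the usual arithmetic conversions. */
static int demo_masked_is_ulong(uint16_t offset)
{
    return DEMO_IS_ULONG(offset & ~DEMO_PAGE_MASK);
}

/* The bare uint16_t is, of course, not an unsigned long. */
static int demo_plain_is_ulong(uint16_t offset)
{
    return DEMO_IS_ULONG(offset);
}
```

Under that assumption %lu matches the masked expression, and a change to %u would need an explicit cast.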


Re: [Xen-devel] [PATCH v6 11/23] libxl: functions to build vmemranges for PV guest

2015-02-26 Thread Dario Faggioli
On Thu, 2015-02-26 at 15:55 +, Wei Liu wrote:
> Introduce an arch-independent routine to generate one vmemrange per
> vnode. Also introduce arch-dependent routines for different
> architectures because part of the process is arch-specific -- ARM has
> yet to have NUMA support and E820 is x86 only.
> 
> For those x86 guests who care about machine E820 map (i.e. with
> e820_host=1), vnode is further split into several vmemranges to
> accommodate memory holes.  A few stubs for libxl_arm.c are created.
> 
> Signed-off-by: Wei Liu 
> Reviewed-by: Dario Faggioli 
> Cc: Ian Campbell 
> Cc: Ian Jackson 
> Cc: Dario Faggioli 
> Cc: Elena Ufimtseva 

> diff --git a/tools/libxl/libxl_vnuma.c b/tools/libxl/libxl_vnuma.c
> index 33d7a3c..04672b5 100644
> --- a/tools/libxl/libxl_vnuma.c
> +++ b/tools/libxl/libxl_vnuma.c
> @@ -14,6 +14,7 @@
>   */
>  #include "libxl_osdeps.h" /* must come before any other headers */
>  #include "libxl_internal.h"
> +#include "libxl_arch.h"
>  #include 
>  
>  /* Sort vmemranges in ascending order with "start" */
> @@ -142,6 +143,46 @@ out:
>  return rc;
>  }
>  
> +
Aren't you adding an extra, unnecessary, blank line here?

> +int libxl__vnuma_build_vmemrange_pv_generic(libxl__gc *gc,
> +uint32_t domid,
> +libxl_domain_build_info *b_info,
> +libxl__domain_build_state *state)
>



Of course, my Reviewed-by still stands... I just noticed this while
having a quick look. So, if you happen to have to resend... :-)

Regards,
Dario




[Xen-devel] [PATCH v10 3/4] tools/libxl: code refactoring for MBM

2015-02-26 Thread Chao Peng
Make some internal routines common so that total/local memory bandwidth
monitoring in the next patch can make use of them.

Signed-off-by: Chao Peng 
Acked-by: Wei Liu 
---
Changes in v10:
1. Merge libxl change into next patch.
2. Minor function name changes to make them more generic.
---
 tools/libxl/xl_cmdimpl.c | 54 +---
 1 file changed, 33 insertions(+), 21 deletions(-)

diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index 8b41093..846a4b2 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -7822,8 +7822,9 @@ out:
 }
 
 #ifdef LIBXL_HAVE_PSR_CMT
-static void psr_cmt_print_domain_cache_occupancy(libxl_dominfo *dominfo,
- uint32_t nr_sockets)
+static void psr_cmt_print_domain_info(libxl_dominfo *dominfo,
+  libxl_psr_cmt_type type,
+  uint32_t nr_sockets)
 {
 char *domain_name;
 uint32_t socketid;
@@ -7837,15 +7838,23 @@ static void psr_cmt_print_domain_cache_occupancy(libxl_dominfo *dominfo,
 free(domain_name);
 
 for (socketid = 0; socketid < nr_sockets; socketid++) {
-if (!libxl_psr_cmt_get_cache_occupancy(ctx, dominfo->domid, socketid,
-   &l3_cache_occupancy))
-printf("%13u KB", l3_cache_occupancy);
+switch (type) {
+case LIBXL_PSR_CMT_TYPE_CACHE_OCCUPANCY:
+if (!libxl_psr_cmt_get_cache_occupancy(ctx,
+   dominfo->domid,
+   socketid,
+   &l3_cache_occupancy))
+printf("%13u KB", l3_cache_occupancy);
+break;
+default:
+return;
+}
 }
 
 printf("\n");
 }
 
-static int psr_cmt_show_cache_occupancy(uint32_t domid)
+static int psr_cmt_show(libxl_psr_cmt_type type, uint32_t domid)
 {
 uint32_t i, socketid, nr_sockets, total_rmid;
 uint32_t l3_cache_size;
@@ -7881,19 +7890,22 @@ static int psr_cmt_show_cache_occupancy(uint32_t domid)
 printf("%14s %d", "Socket", socketid);
 printf("\n");
 
-/* Total L3 cache size */
-printf("%-46s", "Total L3 Cache Size");
-for (socketid = 0; socketid < nr_sockets; socketid++) {
-rc = libxl_psr_cmt_get_l3_cache_size(ctx, socketid, &l3_cache_size);
-if (rc < 0) {
-fprintf(stderr,
-"Failed to get system l3 cache size for socket:%d\n",
-socketid);
-return -1;
-}
-printf("%13u KB", l3_cache_size);
+if (type == LIBXL_PSR_CMT_TYPE_CACHE_OCCUPANCY) {
+/* Total L3 cache size */
+printf("%-46s", "Total L3 Cache Size");
+for (socketid = 0; socketid < nr_sockets; socketid++) {
+rc = libxl_psr_cmt_get_l3_cache_size(ctx, socketid,
+ &l3_cache_size);
+if (rc < 0) {
+fprintf(stderr,
+"Failed to get system l3 cache size for socket:%d\n",
+socketid);
+return -1;
+}
+printf("%13u KB", l3_cache_size);
+}
+printf("\n");
 }
-printf("\n");
 
 /* Each domain */
 if (domid != INVALID_DOMID) {
@@ -7902,7 +7914,7 @@ static int psr_cmt_show_cache_occupancy(uint32_t domid)
 fprintf(stderr, "Failed to get domain info for %d\n", domid);
 return -1;
 }
-psr_cmt_print_domain_cache_occupancy(&dominfo, nr_sockets);
+psr_cmt_print_domain_info(&dominfo, type, nr_sockets);
 }
 else
 {
@@ -7912,7 +7924,7 @@ static int psr_cmt_show_cache_occupancy(uint32_t domid)
 return -1;
 }
 for (i = 0; i < nr_domains; i++)
-psr_cmt_print_domain_cache_occupancy(list + i, nr_sockets);
+psr_cmt_print_domain_info(list + i, type, nr_sockets);
 libxl_dominfo_list_free(list, nr_domains);
 }
 return 0;
@@ -7971,7 +7983,7 @@ int main_psr_cmt_show(int argc, char **argv)
 
 switch (type) {
 case LIBXL_PSR_CMT_TYPE_CACHE_OCCUPANCY:
-ret = psr_cmt_show_cache_occupancy(domid);
+ret = psr_cmt_show(type, domid);
 break;
 default:
 help("psr-cmt-show");
-- 
1.9.1




Re: [Xen-devel] [PATCH 4/4] xen: add Xen pvUSB maintainer

2015-02-26 Thread Konrad Rzeszutek Wilk
On February 26, 2015 8:35:17 AM EST, Juergen Gross  wrote:
>Add myself as maintainer for the Xen pvUSB stuff.
>
>Signed-off-by: Juergen Gross 
>---
> MAINTAINERS | 8 
> 1 file changed, 8 insertions(+)
>
>diff --git a/MAINTAINERS b/MAINTAINERS
>index ddc5a8c..8ec1e1f 100644
>--- a/MAINTAINERS
>+++ b/MAINTAINERS
>@@ -10787,6 +10787,14 @@ F:drivers/scsi/xen-scsifront.c
> F:drivers/xen/xen-scsiback.c
> F:include/xen/interface/io/vscsiif.h
> 
>+XEN PVUSB DRIVERS
>+M:Juergen Gross 
>+L:xen-de...@lists.xenproject.org (moderated for non-subscribers)
>+L:linux-...@vger.kernel.org
>+S:Supported
>+F:divers/usb/xen/
>+F:include/xen/interface/io/usbif.h

Acked-by: Konrad Rzeszutek Wilk 

On the include/xen/... part.
>+
> XEN SWIOTLB SUBSYSTEM
> M:Konrad Rzeszutek Wilk 
> L:xen-de...@lists.xenproject.org (moderated for non-subscribers)





Re: [Xen-devel] [PATCH 1/5] x86: allow specifying the NUMA nodes Dom0 should run on

2015-02-26 Thread Dario Faggioli
On Thu, 2015-02-26 at 13:52 +, Jan Beulich wrote:
> ... by introducing a "dom0_nodes" option augmenting the "dom0_mem" and
> "dom0_max_vcpus" ones.
> 
> Note that this gives meaning to MEMF_exact_node specified alone (i.e.
> implicitly combined with NUMA_NO_NODE): In such a case any node inside
> the domain's node mask is acceptable, but no other node. This changed
> behavior is (implicitly) being exposed through the memop hypercalls.
> 
> Note further that this change doesn't take care of moving the initrd
> image into memory matching Dom0's affinity when the initrd doesn't get
> copied (because of being part of the initial mapping) anyway.
> 
> Signed-off-by: Jan Beulich 
>
Reviewed-by: Dario Faggioli 

Just a couple of questions/comments.

> ---
> I'm uncertain whether range restricting the PXMs permitted for Dom0 is
> the right approach (matching what other NUMA code did until recently),
> or whether we would instead want to simply limit the number of PXMs we
> can handle there (i.e. using a static array instead of a static
> bitmap).
> 
FWIW, I think the approach taken in the patch is ok.

> --- a/docs/misc/xen-command-line.markdown
> +++ b/docs/misc/xen-command-line.markdown
> @@ -540,6 +540,15 @@ any dom0 autoballooning feature present 
>  _xl.conf(5)_ man page or [Xen Best
>  
> Practices](http://wiki.xen.org/wiki/Xen_Best_Practices#Xen_dom0_dedicated_memory_and_preventing_dom0_memory_ballooning).
>  
> +### dom0\_nodes
> +
> +> `= [,...]`
> +
> +Specify the NUMA nodes to place Dom0 on. Defaults for vCPU-s created
> +and memory assigned to Dom0 will be adjusted to match the node
> +restrictions set up here. Note that the values to be specified here are
> +ACPI PXM ones, not Xen internal node numbers.
> +
Why use PXM ids? It might be me being much more used to working with NUMA
node ids, but wouldn't the other way round be more consistent (almost
everything the user interacts with after boot speaks node ids) and easier
for the user to figure things out (e.g., with tools like numactl on
baremetal)?

> --- a/xen/arch/x86/domain_build.c
> +++ b/xen/arch/x86/domain_build.c

> +static struct vcpu *__init setup_vcpu(struct domain *d, unsigned int vcpu_id,
> +  unsigned int cpu)
> +{
> +struct vcpu *v = alloc_vcpu(d, vcpu_id, cpu);
> +
> +if ( v )
> +{
> +if ( !d->is_pinned )
> +cpumask_copy(v->cpu_hard_affinity, &dom0_cpus);
> +cpumask_copy(v->cpu_soft_affinity, &dom0_cpus);
> +}
> +
About this, for DomUs, now that we have soft affinity available, what we
do is set only soft affinity to match the NUMA placement. I think I see
and agree why we want to be 'more strict' in Dom0, but I felt like it
was worth pointing out the difference in behaviour (should it be
documented somewhere?).

Regards,
Dario

BTW, mostly out of curiosity, I've had a few strange issues/conflicts in
applying this on top of staging, in order to test it... Was it me doing
something very stupid, or was this based on something different?
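
For reference, my reading of the MEMF_exact_node note in the quoted commit message can be sketched as below. This models only the stated rule and is not the Xen allocator; DEMO_NO_NODE, the bitmask representation and demo_node_acceptable() are all hypothetical.

```c
#include <stdint.h>

#define DEMO_NO_NODE 0xffu  /* hypothetical stand-in for NUMA_NO_NODE */

/* Decide whether an allocation may be satisfied from node 'candidate'.
 * 'exact' models MEMF_exact_node; 'domain_nodes' is a bitmask of the
 * nodes in the domain's node mask (candidate must be below 64). */
static int demo_node_acceptable(unsigned int requested, int exact,
                                uint64_t domain_nodes,
                                unsigned int candidate)
{
    if (!exact)
        return 1;                       /* node is only a preference */
    if (requested != DEMO_NO_NODE)
        return candidate == requested;  /* exact, specific node */
    /* exact with no node given: any node in the domain's mask */
    return (int)((domain_nodes >> candidate) & 1);
}
```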




Re: [Xen-devel] RFC: xen config changes v4

2015-02-26 Thread David Vrabel
On 26/02/15 04:59, Juergen Gross wrote:
> 
> So we are again in the situation that pv-drivers always imply the pvops
> kernel (PARAVIRT selected). I started the whole Kconfig rework to
> eliminate this dependency.

Yes.  Can you produce a series that just addresses this one issue.

In the absence of any concrete requirement for this big Kconfig reorg I
don't think it is helpful.

David



[Xen-devel] [PATCH v6 00/23] Virtual NUMA for PV and HVM

2015-02-26 Thread Wei Liu
Hi all

This is version 6 of this series rebased on top of staging.

This patch series implements virtual NUMA support for both PV and HVM guests.
That is, the admin can configure via libxl what virtual NUMA topology the guest
sees.

This is the stage 1 (basic vNUMA support) and part of stage 2 (vNUMA-aware
ballooning, hypervisor side) described in my previous email to xen-devel [0].

This series is broken into several parts:

1. xen patches: vNUMA debug output and vNUMA-aware memory hypercall support.
2. libxc/libxl support for PV vNUMA.
3. libxc/libxl/hypervisor support for HVM vNUMA.
4. xl vNUMA configuration documentation and parser.

One significant difference from Elena's work is that this patch series makes
use of multiple vmemranges should there be a memory hole, instead of shrinking
RAM. This matches the behaviour of real hardware.
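
To illustrate what "multiple vmemranges should there be a memory hole" means, here is a minimal sketch that places one vnode's memory around a single hole. It is not the libxl code; struct demo_range and demo_place_around_hole() are hypothetical and handle only one hole.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical half-open byte range [start, end). */
struct demo_range {
    uint64_t start, end;
};

/* Place 'size' bytes of guest memory starting at 'start', skipping over
 * 'hole'. Writes one or two ranges into out[] and returns how many. */
static size_t demo_place_around_hole(uint64_t start, uint64_t size,
                                     struct demo_range hole,
                                     struct demo_range out[2])
{
    size_t n = 0;

    /* If we would start inside the hole, begin right after it. */
    if (start >= hole.start && start < hole.end)
        start = hole.end;

    /* If the memory would run into the hole, split it there. */
    if (start < hole.start && start + size > hole.start) {
        out[n].start = start;
        out[n].end = hole.start;
        n++;
        size -= hole.start - start;
        start = hole.end;
    }

    if (size > 0) {
        out[n].start = start;
        out[n].end = start + size;
        n++;
    }
    return n;
}
```

The total size of the vnode is preserved; the part displaced by the hole is moved above it rather than dropped.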

The vNUMA auto placement algorithm is missing at the moment and Dario is
working on it.

This series can be found at:
 git://xenbits.xen.org/people/liuw/xen.git wip.vnuma-v5

With this series, the following configuration can be used to enable virtual
NUMA support, and it works for both PV and HVM guests.

vnuma = [ [ "pnode=0","size=3000","vcpus=0-3","vdistances=10,20"  ],
  [ "pnode=0","size=3000","vcpus=4-7","vdistances=20,10"  ],
]

For example output of guest NUMA information, please look at [1].

In terms of libxl / libxc internals, things are broken into several
parts:

1. libxl interface

Users of libxl can only specify how many vnodes a guest can have, but
currently they have no control over the actual memory layout. Note that
it's fairly easy to export the interface to control memory layout in the
future.

2. libxl internal

It generates some internal vNUMA configurations when building a domain,
then transforms them into libxc representations. It also validates the vNUMA
configuration along the way.

3. libxc internal

Libxc does what it's told to do. It doesn't do anything smart (in fact,
I deliberately didn't put any smart logic inside it). Libxc will also
report back some information in the HVM case to libxl but that's it.

Wei.

[0] <2014173606.gc21...@zion.uk.xensource.com>
[1] <1416582421-10789-1-git-send-email-wei.l...@citrix.com>

Wei Liu (23):
  xen: factor out construct_memop_from_reservation
  xen: move NUMA_NO_NODE to public memory.h as XEN_NUMA_NO_NODE
  xen: make two memory hypercalls vNUMA-aware
  libxc: duplicate snippet to allocate p2m_host array
  libxc: add p2m_size to xc_dom_image
  libxc: allocate memory with vNUMA information for PV guest
  libxl: introduce vNUMA types
  libxl: add vmemrange to libxl__domain_build_state
  libxl: introduce libxl__vnuma_config_check
  libxl: x86: factor out e820_host_sanitize
  libxl: functions to build vmemranges for PV guest
  libxl: build, check and pass vNUMA info to Xen for PV guest
  libxc: indentation change to xc_hvm_build_x86.c
  libxc: allocate memory with vNUMA information for HVM guest
  libxl: build, check and pass vNUMA info to Xen for HVM guest
  libxl: disallow memory relocation when vNUMA is enabled
  libxl: define LIBXL_HAVE_VNUMA
  libxlu: rework internal representation of setting
  libxlu: nested list support
  libxlu: record line and column number when parsing values
  libxlu: introduce new APIs
  xl: introduce xcalloc
  xl: vNUMA support

 docs/man/xl.cfg.pod.5|  54 +++
 tools/libxc/include/xc_dom.h |  13 +-
 tools/libxc/include/xenguest.h   |  11 ++
 tools/libxc/xc_dom_arm.c |   1 +
 tools/libxc/xc_dom_core.c|   8 +-
 tools/libxc/xc_dom_x86.c | 129 +---
 tools/libxc/xc_hvm_build_x86.c   | 237 +++--
 tools/libxl/Makefile |   2 +-
 tools/libxl/libxl.h  |   7 +
 tools/libxl/libxl_arch.h |   6 +
 tools/libxl/libxl_arm.c  |   8 +
 tools/libxl/libxl_create.c   |   9 ++
 tools/libxl/libxl_dm.c   |   6 +-
 tools/libxl/libxl_dom.c  | 120 +++
 tools/libxl/libxl_internal.h |  24 +++
 tools/libxl/libxl_types.idl  |  10 ++
 tools/libxl/libxl_vnuma.c| 253 +++
 tools/libxl/libxl_x86.c  | 105 +++--
 tools/libxl/libxlu_cfg.c | 209 ++---
 tools/libxl/libxlu_cfg_i.h   |  14 +-
 tools/libxl/libxlu_cfg_y.c   |  72 -
 tools/libxl/libxlu_cfg_y.h   |   2 +-
 tools/libxl/libxlu_cfg_y.y   |  18 ++-
 tools/libxl/libxlu_internal.h|  24 ++-
 tools/libxl/libxlutil.h  |  13 ++
 tools/libxl/xl_cmdimpl.c | 150 +-
 xen/arch/x86/hpet.c  |   2 +-
 xen/arch/x86/irq.c   |   4 +-
 xen/arch/x86/numa.c  |  14 +-
 xen/arch/x86/physdev.c   |   2 +-
 x

[Xen-devel] [PATCH v2] RFC: Automatically check xen's public headers for C++ pitfalls.

2015-02-26 Thread Tim Deegan
Add a check, like the existing check for non-ANSI C in the public
headers, that runs the public headers through a C++ compiler to
flag non-C++-friendly constructs.
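
One construct that behaves differently in the two languages is a struct defined inside another struct or union: in C the tag is visible in the enclosing scope, while in C++ it is scoped to the enclosing class. The platform.h hunk in this patch defines xenpf_efi_guid and xenpf_efi_time at file scope, presumably for this reason. The sketch below uses made-up type names (efi_time_example, runtime_call_example) to show the hoisted form, which means the same thing to both compilers; it is not a copy of the Xen header.

```c
#include <stdint.h>

/* Defined at file scope so the tag names the same type in C and in C++. */
struct efi_time_example {
    uint16_t year;
    uint8_t month;
    uint8_t day;
};

struct runtime_call_example {
    uint32_t function;
    union {
        struct {
            /* Refers to the file-scope type instead of defining it here. */
            struct efi_time_example time;
            uint32_t resolution;
        } get_time;
        uint32_t raw;
    } u;
};

/* Code outside the container can name the type without any qualification. */
static uint16_t year_of(const struct efi_time_example *t)
{
    return t->year;
}
```

Had efi_time_example been defined inside get_time, C code could still write "struct efi_time_example", but C++ code would have to qualify the name through the enclosing types.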

Unlike the ANSI C check, we accept GCC-isms (gnu++98), and we also
check various tools-only headers.

Explicitly _not_ addressing the use of 'private' in various fields,
since we'd previously decided not to fix that.

Also tidy up the runes for these checks to be a bit more readable.

Reported-by: Razvan Cojocaru 
Signed-off-by: Tim Deegan 
Cc: Jan Beulich 

---

v2: test more headers;
define __XEN_TOOLS__;
use gnu++98 rather than ansi;
tidy the makefile for readability;
add a missing include to flask_op.h, which uses evtchn_port_t.
---
 .gitignore|  1 +
 config/StdGNU.mk  |  2 ++
 config/SunOS.mk   |  1 +
 xen/include/Makefile  | 28 
 xen/include/public/platform.h | 39 ++-
 xen/include/public/xsm/flask_op.h |  2 ++
 6 files changed, 52 insertions(+), 21 deletions(-)

diff --git a/.gitignore b/.gitignore
index 13ee05b..78958ea 100644
--- a/.gitignore
+++ b/.gitignore
@@ -233,6 +233,7 @@ xen/arch/*/efi/compat.c
 xen/arch/*/efi/efi.h
 xen/arch/*/efi/runtime.c
 xen/include/headers.chk
+xen/include/headers++.chk
 xen/include/asm
 xen/include/asm-*/asm-offsets.h
 xen/include/compat/*
diff --git a/config/StdGNU.mk b/config/StdGNU.mk
index 4efebe3..e10ed39 100644
--- a/config/StdGNU.mk
+++ b/config/StdGNU.mk
@@ -2,9 +2,11 @@ AS = $(CROSS_COMPILE)as
 LD = $(CROSS_COMPILE)ld
 ifeq ($(clang),y)
 CC = $(CROSS_COMPILE)clang
+CXX= $(CROSS_COMPILE)clang++
 LD_LTO = $(CROSS_COMPILE)llvm-ld
 else
 CC = $(CROSS_COMPILE)gcc
+CXX= $(CROSS_COMPILE)g++
 LD_LTO = $(CROSS_COMPILE)ld
 endif
 CPP= $(CC) -E
diff --git a/config/SunOS.mk b/config/SunOS.mk
index 3316280..c2be37d 100644
--- a/config/SunOS.mk
+++ b/config/SunOS.mk
@@ -2,6 +2,7 @@ AS = $(CROSS_COMPILE)gas
 LD = $(CROSS_COMPILE)gld
 CC = $(CROSS_COMPILE)gcc
 CPP= $(CROSS_COMPILE)gcc -E
+CXX= $(CROSS_COMPILE)g++
 AR = $(CROSS_COMPILE)gar
 RANLIB = $(CROSS_COMPILE)granlib
 NM = $(CROSS_COMPILE)gnm
diff --git a/xen/include/Makefile b/xen/include/Makefile
index 94112d1..d48a642 100644
--- a/xen/include/Makefile
+++ b/xen/include/Makefile
@@ -87,13 +87,33 @@ compat/xlat.h: $(addprefix compat/.xlat/,$(xlat-y)) Makefile
 
 ifeq ($(XEN_TARGET_ARCH),$(XEN_COMPILE_ARCH))
 
-all: headers.chk
+all: headers.chk headers++.chk
 
-headers.chk: $(filter-out public/arch-% public/%ctl.h public/xsm/% public/%hvm/save.h, $(wildcard public/*.h public/*/*.h) $(public-y)) Makefile
-   for i in $(filter %.h,$^); do $(CC) -ansi -include stdint.h -Wall -W -Werror -S -o /dev/null -x c $$i || exit 1; echo $$i; done >$@.new
+PUBLIC_HEADERS := $(filter-out public/arch-% public/dom0_ops.h, $(wildcard public/*.h public/*/*.h) $(public-y))
+
+PUBLIC_ANSI_HEADERS := $(filter-out public/%ctl.h public/xsm/% public/%hvm/save.h, $(PUBLIC_HEADERS))
+
+headers.chk: $(PUBLIC_ANSI_HEADERS) Makefile
+   for i in $(filter %.h,$^); do \
+   $(CC) -x c -ansi -Wall -Werror -include stdint.h \
+ -S -o /dev/null $$i || exit 1; \
+   echo $$i; \
+   done >$@.new
+   mv $@.new $@
+
+headers++.chk: $(PUBLIC_HEADERS) Makefile
+   if $(CXX) -v >/dev/null 2>&1; then \
+   for i in $(filter %.h,$^); do \
+   $(CXX) -x c++ -std=gnu++98 -Wall -Werror \
+  -D__XEN_TOOLS__ -Dprivate=private_is_a_keyword_in_cpp \
+  -include stdint.h -include public/xen.h \
+  -S -o /dev/null $$i || exit 1; \
+   echo $$i; \
+   done ; \
+   fi >$@.new
mv $@.new $@
 
 endif
 
 clean::
-   rm -rf compat headers.chk
+   rm -rf compat headers.chk headers++.chk
diff --git a/xen/include/public/platform.h b/xen/include/public/platform.h
index 3e340b4..dd03447 100644
--- a/xen/include/public/platform.h
+++ b/xen/include/public/platform.h
@@ -126,6 +126,26 @@ DEFINE_XEN_GUEST_HANDLE(xenpf_platform_quirk_t);
 #define XEN_EFI_query_variable_info   9
 #define XEN_EFI_query_capsule_capabilities   10
 #define XEN_EFI_update_capsule   11
+
+struct xenpf_efi_guid {
+uint32_t data1;
+uint16_t data2;
+uint16_t data3;
+uint8_t data4[8];
+};
+
+struct xenpf_efi_time {
+uint16_t year;
+uint8_t month;
+uint8_t day;
+uint8_t hour;
+uint8_t min;
+uint8_t sec;
+uint32_t ns;
+int16_t tz;
+uint8_t daylight;
+};
+
 struct xenpf_efi_runtime_call {
 uint32_t function;
 /*
@@ -138,17 +158,7 @@ struct xenpf_efi_runtime_call {
 union {
 #define XEN_EFI_GET_TIME_SET_CLEARS_NS 0x0001
 struct {
-struct xenpf_efi_time {
-uint16_t year;
-uint8_t

[Xen-devel] [PATCH v6 07/23] libxl: introduce vNUMA types

2015-02-26 Thread Wei Liu
A domain can contain several virtual NUMA nodes, hence we introduce an
array in libxl_domain_build_info.

libxl_vnode_info contains the size of memory in that node, the distance
from that node to every node, the underlying pnode and a bitmap of
vcpus.

Signed-off-by: Wei Liu 
Reviewed-by: Dario Faggioli 
Cc: Ian Campbell 
Cc: Ian Jackson 
Cc: Dario Faggioli 
Cc: Elena Ufimtseva 
Acked-by: Ian Campbell 
---
Changes in v4:
1. Use MemKB.

Changes in v3:
1. Add commit message.
---
 tools/libxl/libxl_types.idl | 9 +
 1 file changed, 9 insertions(+)

diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 02be466..14c7e7c 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -356,6 +356,13 @@ libxl_domain_sched_params = Struct("domain_sched_params",[
 ("budget",   integer, {'init_val': 'LIBXL_DOMAIN_SCHED_PARAM_BUDGET_DEFAULT'}),
 ])
 
+libxl_vnode_info = Struct("vnode_info", [
+("memkb", MemKB),
+("distances", Array(uint32, "num_distances")), # distances from this node to other nodes
+("pnode", uint32), # physical node of this node
+("vcpus", libxl_bitmap), # vcpus in this node
+])
+
 libxl_domain_build_info = Struct("domain_build_info",[
 ("max_vcpus",   integer),
 ("avail_vcpus", libxl_bitmap),
@@ -376,6 +383,8 @@ libxl_domain_build_info = Struct("domain_build_info",[
 ("disable_migrate", libxl_defbool),
 ("cpuid",   libxl_cpuid_policy_list),
 ("blkdev_start",string),
+
+("vnuma_nodes", Array(libxl_vnode_info, "num_vnuma_nodes")),
 
 ("device_model_version", libxl_device_model_version),
 ("device_model_stubdomain", libxl_defbool),
-- 
1.9.1




Re: [Xen-devel] Xen's Linux kernel config options

2015-02-26 Thread Luis R. Rodriguez
On Thu, Feb 26, 2015 at 11:19:17AM +, Stefano Stabellini wrote:
> On Wed, 25 Feb 2015, Luis R. Rodriguez wrote:
> > On Wed, Feb 25, 2015 at 12:01:31PM +, Stefano Stabellini wrote:
> > > On Tue, 24 Feb 2015, Luis R. Rodriguez wrote:
> > > > On Tue, Feb 24, 2015 at 7:21 AM, Stefano Stabellini
> > > >  wrote:
> > > > > On Mon, 23 Feb 2015, Luis R. Rodriguez wrote:
> > > > >> On Thu, Feb 19, 2015 at 3:43 PM, Luis R. Rodriguez  
> > > > >> wrote:
> > > > >> > On Fri, Dec 12, 2014 at 9:29 AM, David Vrabel 
> > > > >> >  wrote:
> > > > >> >> On 12/12/14 13:17, Juergen Gross wrote:
> > > > >> >>> XEN_PVHVM
> > > > >> >>
> > > > >> >> Move XEN_PVHVM under XEN and have it select PARAVIRT and 
> > > > >> >> PARAVIRT_CLOCK.
> > > > >> >
> > > > > >> > FWIW, although it seems we do not want to let users just build
> > > > > >> > XEN_PVHVM hypervisors I have the changes required now to at least
> > > > > >> > get this to build so I do know what it takes.
> > > > >> >
> > > > > >> >>> XEN_FRONTEND    XEN_PV ||
> > > > > >> >>>                 XEN_PVH ||
> > > > > >> >>>                 XEN_PVHVM
> > > > > >> >>
> > > > > >> >> This enables all the basic infrastructure for frontends: event
> > > > > >> >> channels, grant tables and Xenbus.
> > > > >> >>
> > > > >> >> Don't make XEN_FRONTEND depend on any XEN_* variant.  It should be
> > > > >> >> possible to have frontend drivers without support for any of the
> > > > >> >> PV/PVHVM/PVH guest types.
> > > > >> >
> > > > > >> > David, can you elaborate on the type of Xen guest it would be on
> > > > > >> > x86 its not PV, PVHVM, or PVH? I'm particularly curious about the
> > > > > >> > xen_domain_type and how it would end up to selected. As it is we
> > > > > >> > tie in XEN_PVHVM at build time with XEN_PVH, in order to have XEN_PVHVM
> > > > > >> > completely removed from XEN_PVH we need quite a bit of code changes
> > > > > >> > which at least as code exercise I have completed already. If we
> > > > > >> > want at the very least xen_domain_type set when XEN_PV, XEN_PVHVM, and
> > > > > >> > XEN_PVH are not available we need a bit more work.
> > > > >>
> > > > >> OK I think I see the issue. We have nothing quite like
> > > > >> xen_guest_init() on x86 enlighten.c, we do have this for ARM and I
> > > > >> think I can that close the gap I'm observing.
> > > > >>
> > > > >> >>  Frontends only need event channels, grant
> > > > >> >> table and xenbus.
> > > > >> >
> > > > > >> > Well xenbus_probe_initcall() will check for xen_domain() and that
> > > > > >> > won't be set on x86 right now unless we have XEN_PV, XEN_PVHVM or
> > > > > >> > XEN_PVH set -- to start off with. Then
> > > > > >> > drivers/xen/xenbus/xenbus_client.c will check xen_feature in quite
> > > > > >> > a bit of places as well, that won't be set unless
> > > > > >> > xen_setup_features() is called which right now is only done on x86
> > > > > >> > arch/x86/xen/enlighten.c which as Juergen pointed out, is not needed
> > > > > >> > if you don't have XEN_PV or XEN_PVH. As it turns out this is
> > > > > >> > incorrect though, its needed for XEN_PVHVM as well and my split
> > > > > >> > exercise in code addresses this. Now, at least in my code if you
> > > > > >> > don't have XEN_PV, XEN_PVHVM, or XEN_PVH we don't call
> > > > > >> > xen_setup_features() and its unclear to me where or how that
> > > > > >> > should happen in other cases.
> > > > >>
> > > > >> Yeah I think having an x86 equivalent of xen_guest_init() would solve
> > > > >> this, Stefano, thoughts?
> > > > >
> > > > > Having xen_guest_init() on x86 would be nice.  Being able to set
> > > > > xen_domain_type to XEN_HVM_DOMAIN if we are running on Xen, regardless
> > > > > of XEN_PV/PVH/PVHVM also makes sense from Linux POV.
> > > > 
> > > > OK great, thanks for the feedback.
> > > > 
> > > > > > That said, I don't see much value in removing XEN_PVHVM: why are we
> > > > > > even doing this? What is the improvement we are seeking?
> > > > 
> > > > We would not, the above discussed about the possibility of letting
> > > > users enable XEN_PVHVM without XEN_PVH, that's all.
> > > 
> > > OK, that makes sense.
> > > 
> > > > As is the only thing that can enable XEN_PVHVM is if you enable
> > > > XEN_PVH.
> > > 
> > > This is the bit that we need to change but it shouldn't be difficult.
> > > 
> > > > If we want
> > > > xen_guest_init() alone though we might need the decoupling though at
> > > > least at build time so that if XEN_PV or XEN_PVH is not selected we'd
> > > > at least have XEN_PVHVM. Thoughts?
> > > 
> > > Today pv(h) and pvhvm have very different boot paths already: pv and pvh
> > > initialize via xen_start_kernel while pvhvm via xen_hvm_guest_init.
> > 
> > Ah I see, this helps a lot thanks!
> > 
>

[Xen-devel] [PATCH v6 23/23] xl: vNUMA support

2015-02-26 Thread Wei Liu
This patch includes configuration options parser and documentation.

Please see the xl.cfg.pod.5 hunk for more information.
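
The documentation hunk describes vcpus strings such as "vcpus=0-5,8", meaning vcpu 0 to vcpu 5 plus vcpu 8. As a rough illustration of what parsing the value part involves, here is a small standalone sketch. parse_vcpu_list() is a hypothetical helper, limited to vcpus 0..63 so the result fits in one 64-bit mask; it is not the parser used by xl, which works on libxl bitmaps.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse a string such as "0-5,8" into a bitmask of vcpu numbers.
 * Hypothetical helper: accepts vcpus 0..63 only. Returns 0 on success
 * and -1 on malformed input; *mask_out is written only on success.
 */
static int parse_vcpu_list(const char *s, uint64_t *mask_out)
{
    uint64_t mask = 0;

    for (;;) {
        char *end;
        unsigned long lo, hi;

        /* Every item must start with a digit (rejects "", ",", "-3"). */
        if (*s < '0' || *s > '9')
            return -1;
        errno = 0;
        lo = hi = strtoul(s, &end, 10);
        if (errno)
            return -1;

        /* Optional "-N" turns a single cpu into a range. */
        if (*end == '-') {
            s = end + 1;
            if (*s < '0' || *s > '9')
                return -1;
            hi = strtoul(s, &end, 10);
            if (errno)
                return -1;
        }
        if (lo > hi || hi > 63)
            return -1;

        for (unsigned long v = lo; v <= hi; v++)
            mask |= UINT64_C(1) << v;

        if (*end == '\0')
            break;
        if (*end != ',')
            return -1;
        s = end + 1;
    }

    *mask_out = mask;
    return 0;
}
```

For "0-5,8" the resulting mask has bits 0 to 5 and bit 8 set, i.e. 0x13f.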

Signed-off-by: Wei Liu 
Cc: Ian Campbell 
Cc: Ian Jackson 
---
Changes in v6:
1. Disable NUMA auto-placement.
---
 docs/man/xl.cfg.pod.5|  54 ++
 tools/libxl/xl_cmdimpl.c | 140 ++-
 2 files changed, 193 insertions(+), 1 deletion(-)

diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5
index 408653f..2a27b1c 100644
--- a/docs/man/xl.cfg.pod.5
+++ b/docs/man/xl.cfg.pod.5
@@ -266,6 +266,60 @@ it will crash.
 
 =back
 
+=head3 Guest Virtual NUMA Configuration
+
+=over 4
+
+=item B in the list specifies the configuration of nth
+virtual node.
+
+Each B is a list, which has a form of
+"[VNODE_CONFIG_OPTION,VNODE_CONFIG_OPTION, ... ]"  (without quotes).
+
+For example vnuma = [ ["pnode=0","size=512","vcpus=0-4","vdistances=10,20"] ]
+means vnode 0 is mapped to pnode 0, has 512MB ram, has vcpus 0 to 4, the
+distance to itself is 10 and the distance to vnode 1 is 20.
+
+Each B is a quoted string. Supported
+Bs are:
+
+=over 4
+
+=item B
+
+Specify which physical node this virtual node maps to.
+
+=item B
+
+Specify the size of this virtual node. The sum of memory size of all
+vnodes must match B (or B if B is not
+specified).
+
+=item B
+
+Specify which vcpus belong to this node. B is a string
+separated by comma. You can specify range and single cpu. An example
+is "vcpus=0-5,8", which means you specify vcpu 0 to vcpu 5, and vcpu
+8.
+
+=item B
+
+Specify virtual distance from this node to all nodes (including
+itself) with positional arguments. For example, "vdistance=10,20"
+for vnode 0 means the distance from vnode 0 to vnode 0 is 10, from
+vnode 0 to vnode 1 is 20. The number of arguments supplied must match
+the total number of vnodes.
+
+Normally you can use the values from "xl info -n" or "numactl
+--hardware" to fill in vdistance list.
+
+=back
+
+=back
+
 =head3 Event Actions
 
 =over 4
diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index 5b366f2..2899d9f 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -158,7 +158,6 @@ struct domain_create {
 };
 
 
-static uint32_t find_domain(const char *p) __attribute__((warn_unused_result));
 static uint32_t find_domain(const char *p)
 {
 uint32_t domid;
@@ -987,6 +986,143 @@ static int parse_nic_config(libxl_device_nic *nic, XLU_Config **config, char *to
 return 0;
 }
 
+static void parse_vnuma_config(const XLU_Config *config,
+   libxl_domain_build_info *b_info)
+{
+libxl_physinfo physinfo;
+uint32_t nr_nodes;
+XLU_ConfigList *vnuma;
+int i, j, len, num_vnuma;
+
+
+libxl_physinfo_init(&physinfo);
+if (libxl_get_physinfo(ctx, &physinfo) != 0) {
+libxl_physinfo_dispose(&physinfo);
+fprintf(stderr, "libxl_get_physinfo failed\n");
+exit(1);
+}
+
+nr_nodes = physinfo.nr_nodes;
+libxl_physinfo_dispose(&physinfo);
+
+if (xlu_cfg_get_list(config, "vnuma", &vnuma, &num_vnuma, 1))
+return;
+
+b_info->num_vnuma_nodes = num_vnuma;
+b_info->vnuma_nodes = xcalloc(num_vnuma, sizeof(libxl_vnode_info));
+
+for (i = 0; i < b_info->num_vnuma_nodes; i++) {
+libxl_vnode_info *p = &b_info->vnuma_nodes[i];
+
+libxl_vnode_info_init(p);
+libxl_cpu_bitmap_alloc(ctx, &p->vcpus, b_info->max_vcpus);
+libxl_bitmap_set_none(&p->vcpus);
+p->distances = xcalloc(b_info->num_vnuma_nodes,
+   sizeof(*p->distances));
+p->num_distances = b_info->num_vnuma_nodes;
+}
+
+for (i = 0; i < num_vnuma; i++) {
+XLU_ConfigValue *vnode_spec, *conf_option;
+XLU_ConfigList *vnode_config_list;
+int conf_count;
+libxl_vnode_info *p = &b_info->vnuma_nodes[i];
+
+vnode_spec = xlu_cfg_get_listitem2(vnuma, i);
+assert(vnode_spec);
+
+xlu_cfg_value_get_list(config, vnode_spec, &vnode_config_list, 0);
+if (!vnode_config_list) {
+fprintf(stderr, "xl: cannot get vnode config option list\n");
+exit(1);
+}
+
+for (conf_count = 0;
+ (conf_option =
+  xlu_cfg_get_listitem2(vnode_config_list, conf_count));
+ conf_count++) {
+
+if (xlu_cfg_value_type(conf_option) == XLU_STRING) {
+char *buf, *option_untrimmed, *value_untrimmed;
+char *option, *value;
+char *endptr;
+unsigned long val;
+
+xlu_cfg_value_get_string(config, conf_option, &buf, 0);
+
+if (!buf) continue;
+
+if (split_string_into_pair(buf, "=",
+   &option_untrimmed,
+   &value_untrimmed)) {
+fprintf(stderr, "xl: failed to split \"%s\" into pair\n",
+   

[Xen-devel] [PATCH v6 03/23] xen: make two memory hypercalls vNUMA-aware

2015-02-26 Thread Wei Liu
Make XENMEM_increase_reservation and XENMEM_populate_physmap
vNUMA-aware.

That is, if a guest requests Xen to allocate memory for a specific vnode,
Xen can translate the vnode to a pnode using the vNUMA information of that guest.

XENMEMF_vnode is introduced for the guest to mark that the node number is in
fact a virtual node number and should be translated by Xen.

XENFEAT_memory_op_vnode_supported is introduced to indicate that Xen is
able to translate a virtual node to a physical node.
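
The translation step can be pictured with the following standalone sketch. It only mirrors the shape of the logic in the memory.c hunk of this patch: struct vnuma_example, vnode_to_pnode() and EXAMPLE_NO_NODE are hypothetical stand-ins (the value chosen for the "no node" marker is an assumption), and the read lock the real code takes around the lookup is left out.

```c
#include <errno.h>

/* Stand-in for XEN_NUMA_NO_NODE; the value here is an assumption. */
#define EXAMPLE_NO_NODE 0xFFu

/* Hypothetical mirror of the per-domain vNUMA state used by the patch. */
struct vnuma_example {
    unsigned int nr_vnodes;
    const unsigned int *vnode_to_pnode;
};

/*
 * Translate a guest-supplied virtual node into a physical node.
 * Returns 0 and stores the result in *pnode, or -EINVAL if the vnode is
 * out of range. *pnode may be EXAMPLE_NO_NODE, meaning "no placement
 * preference"; that is also the result when the domain has no vNUMA
 * state at all (v is a null pointer).
 */
static int vnode_to_pnode(const struct vnuma_example *v, unsigned int vnode,
                          unsigned int *pnode)
{
    if (!v) {
        *pnode = EXAMPLE_NO_NODE;
        return 0;
    }
    if (vnode >= v->nr_vnodes)
        return -EINVAL;

    *pnode = v->vnode_to_pnode[vnode];
    return 0;
}
```

A caller would set the node hint (and the exact-node flag, if requested) only when the returned pnode is not the "no node" marker, which is what the hunk does with MEMF_node() and MEMF_exact_node.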

Signed-off-by: Wei Liu 
Cc: Jan Beulich 
Cc: Andrew Cooper 
---
Changes in v6:
1. Add logic in construct_memop_from_reservation.
---
 xen/common/kernel.c   |  2 +-
 xen/common/memory.c   | 45 ---
 xen/include/public/features.h |  3 +++
 xen/include/public/memory.h   |  2 ++
 4 files changed, 44 insertions(+), 8 deletions(-)

diff --git a/xen/common/kernel.c b/xen/common/kernel.c
index 0d9e519..e5e0050 100644
--- a/xen/common/kernel.c
+++ b/xen/common/kernel.c
@@ -301,7 +301,7 @@ DO(xen_version)(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 switch ( fi.submap_idx )
 {
 case 0:
-fi.submap = 0;
+fi.submap = (1U << XENFEAT_memory_op_vnode_supported);
 if ( VM_ASSIST(d, VMASST_TYPE_pae_extended_cr3) )
 fi.submap |= (1U << XENFEAT_pae_pgdir_above_4gb);
 if ( paging_mode_translate(current->domain) )
diff --git a/xen/common/memory.c b/xen/common/memory.c
index d24b001..9f8891b 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -692,7 +692,7 @@ out:
 return rc;
 }
 
-static int construct_memop_from_reservation(
+static int construct_memop_from_reservation(struct domain *d,
const struct xen_memory_reservation *r,
struct memop_args *a)
 {
@@ -716,9 +716,37 @@ static int construct_memop_from_reservation(
 a->memflags = MEMF_bits(address_bits);
 }
 
-a->memflags |= MEMF_node(XENMEMF_get_node(r->mem_flags));
-if ( r->mem_flags & XENMEMF_exact_node_request )
-a->memflags |= MEMF_exact_node;
+if ( r->mem_flags & XENMEMF_vnode )
+{
+unsigned int vnode, pnode;
+
+read_lock(&d->vnuma_rwlock);
+if ( d->vnuma )
+{
+vnode = XENMEMF_get_node(r->mem_flags);
+if ( vnode >= d->vnuma->nr_vnodes )
+{
+rc = -EINVAL;
+read_unlock(&d->vnuma_rwlock);
+goto out;
+}
+
+pnode = d->vnuma->vnode_to_pnode[vnode];
+if ( pnode != XEN_NUMA_NO_NODE )
+{
+a->memflags |= MEMF_node(pnode);
+if ( r->mem_flags & XENMEMF_exact_node_request )
+a->memflags |= MEMF_exact_node;
+}
+}
+read_unlock(&d->vnuma_rwlock);
+}
+else
+{
+a->memflags |= MEMF_node(XENMEMF_get_node(r->mem_flags));
+if ( r->mem_flags & XENMEMF_exact_node_request )
+a->memflags |= MEMF_exact_node;
+}
 
 rc = 0;
  out:
@@ -753,9 +781,6 @@ long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 args.nr_done  = start_extent;
 args.preempted= 0;
 
-if ( construct_memop_from_reservation(&reservation, &args) )
-return start_extent;
-
 if ( op == XENMEM_populate_physmap
  && (reservation.mem_flags & XENMEMF_populate_on_demand) )
 args.memflags |= MEMF_populate_on_demand;
@@ -765,6 +790,12 @@ long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 return start_extent;
 args.domain = d;
 
+if ( construct_memop_from_reservation(d, &reservation, &args) )
+{
+rcu_unlock_domain(d);
+return start_extent;
+}
+
 if ( xsm_memory_adjust_reservation(XSM_TARGET, current->domain, d) )
 {
 rcu_unlock_domain(d);
diff --git a/xen/include/public/features.h b/xen/include/public/features.h
index 16d92aa..2110b04 100644
--- a/xen/include/public/features.h
+++ b/xen/include/public/features.h
@@ -99,6 +99,9 @@
 #define XENFEAT_grant_map_identity12
  */
 
+/* Guest can use XENMEMF_vnode to specify virtual node for memory op. */
+#define XENFEAT_memory_op_vnode_supported 13
+
 #define XENFEAT_NR_SUBMAPS 1
 
 #endif /* __XEN_PUBLIC_FEATURES_H__ */
diff --git a/xen/include/public/memory.h b/xen/include/public/memory.h
index 0d8c85f..d71127f 100644
--- a/xen/include/public/memory.h
+++ b/xen/include/public/memory.h
@@ -57,6 +57,8 @@
 /* Flag to request allocation only from the node specified */
 #define XENMEMF_exact_node_request  (1<<17)
 #define XENMEMF_exact_node(n) (XENMEMF_node(n) | XENMEMF_exact_node_request)
+/* Flag to indicate the node specified is virtual node */
+#define XENMEMF_vnode  (1<<18)
 #endif
 
 struct xen_memory_reservation {
-- 
1.9.1



[Xen-devel] [PATCH v6 10/23] libxl: x86: factor out e820_host_sanitize

2015-02-26 Thread Wei Liu
This function gets the machine E820 map and sanitizes it according to the PV
guest configuration.

This will be used in a later patch. No functional change is introduced in
this patch.
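
The new helper takes the entry count as an in/out parameter: the caller passes the capacity of the array and gets back the number of entries actually filled in (the v4 change "use actual size of the map instead of using E820MAX"). A minimal sketch of that calling convention, with hypothetical names (entry_example, fetch_entries(), get_map()) and canned data in place of the real hypercall:

```c
#include <stdint.h>

struct entry_example {
    uint64_t addr;
    uint64_t size;
};

/* Hypothetical data source standing in for the machine memory map query. */
static int fetch_entries(struct entry_example *buf, uint32_t max)
{
    static const struct entry_example src[] = {
        { 0x0, 0x9f000 },
        { 0x100000, 0x7ff00000 },
    };
    uint32_t n = sizeof(src) / sizeof(src[0]);

    if (max < n)
        return -1;              /* caller's buffer is too small */
    for (uint32_t i = 0; i < n; i++)
        buf[i] = src[i];
    return (int)n;              /* number of entries written */
}

/*
 * On entry *nr is the capacity of map[]; on success it is overwritten
 * with the number of valid entries, so later passes never look at
 * unused slots. On failure *nr is left untouched.
 */
static int get_map(struct entry_example map[], uint32_t *nr)
{
    int rc = fetch_entries(map, *nr);

    if (rc < 0)
        return -1;
    *nr = (uint32_t)rc;
    return 0;
}
```

The caller initialises the count to the array size before the call, as libxl__e820_alloc does with nr = E820MAX in the hunk below.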

Signed-off-by: Wei Liu 
Reviewed-by: Andrew Cooper 
Reviewed-by: Dario Faggioli 
Cc: Ian Campbell 
Cc: Ian Jackson 
Cc: Elena Ufimtseva 
Acked-by: Ian Campbell 
---
Changes in v4:
1. Use actual size of the map instead of using E820MAX.
---
 tools/libxl/libxl_x86.c | 32 +++-
 1 file changed, 23 insertions(+), 9 deletions(-)

diff --git a/tools/libxl/libxl_x86.c b/tools/libxl/libxl_x86.c
index 9ceb373..d012b4d 100644
--- a/tools/libxl/libxl_x86.c
+++ b/tools/libxl/libxl_x86.c
@@ -207,6 +207,27 @@ static int e820_sanitize(libxl_ctx *ctx, struct e820entry src[],
 return 0;
 }
 
+static int e820_host_sanitize(libxl__gc *gc,
+  libxl_domain_build_info *b_info,
+  struct e820entry map[],
+  uint32_t *nr)
+{
+int rc;
+
+rc = xc_get_machine_memory_map(CTX->xch, map, *nr);
+if (rc < 0) {
+errno = rc;
+return ERROR_FAIL;
+}
+
+*nr = rc;
+
+rc = e820_sanitize(CTX, map, nr, b_info->target_memkb,
+   (b_info->max_memkb - b_info->target_memkb) +
+   b_info->u.pv.slack_memkb);
+return rc;
+}
+
 static int libxl__e820_alloc(libxl__gc *gc, uint32_t domid,
 libxl_domain_config *d_config)
 {
@@ -223,15 +244,8 @@ static int libxl__e820_alloc(libxl__gc *gc, uint32_t domid,
 if (!libxl_defbool_val(b_info->u.pv.e820_host))
 return ERROR_INVAL;
 
-rc = xc_get_machine_memory_map(ctx->xch, map, E820MAX);
-if (rc < 0) {
-errno = rc;
-return ERROR_FAIL;
-}
-nr = rc;
-rc = e820_sanitize(ctx, map, &nr, b_info->target_memkb,
-   (b_info->max_memkb - b_info->target_memkb) +
-   b_info->u.pv.slack_memkb);
+nr = E820MAX;
+rc = e820_host_sanitize(gc, b_info, map, &nr);
 if (rc)
 return ERROR_FAIL;
 
-- 
1.9.1




[Xen-devel] [PATCH v6 15/23] libxl: build, check and pass vNUMA info to Xen for HVM guest

2015-02-26 Thread Wei Liu
Transform the user-supplied vNUMA configuration into libxl internal
representations, then into libxc representations. Check validity along the
way.

Libxc is more involved in building vmemranges in the HVM case than in the
PV case. The building of vmemranges is placed after xc_hvm_build
returns, because it relies on memory hole information provided by
xc_hvm_build.

Signed-off-by: Wei Liu 
Reviewed-by: Dario Faggioli 
Cc: Ian Campbell 
Cc: Ian Jackson 
Cc: Dario Faggioli 
Cc: Elena Ufimtseva 
---
Changes in v6:
1. Fix a minor bug discovered by Dario.

Changes in v5:
1. Check vnode 0 is large enough to accommodate video ram.

Changes in v4:
1. Adapt to new interface.
2. Rename some variables.
3. Use GCREALLOC_ARRAY.

Changes in v3:
1. Rewrite commit log.
---
 tools/libxl/libxl_create.c   |  9 +++
 tools/libxl/libxl_dom.c  | 43 ++
 tools/libxl/libxl_internal.h |  5 
 tools/libxl/libxl_vnuma.c| 56 
 4 files changed, 113 insertions(+)

diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c
index 98687bd..af04248 100644
--- a/tools/libxl/libxl_create.c
+++ b/tools/libxl/libxl_create.c
@@ -853,6 +853,15 @@ static void initiate_domain_create(libxl__egc *egc,
 goto error_out;
 }
 
+/* Disallow PoD and vNUMA to be enabled at the same time because PoD
+ * pool is not vNUMA-aware yet.
+ */
+if (pod_enabled && d_config->b_info.num_vnuma_nodes) {
+ret = ERROR_INVAL;
+LOG(ERROR, "Cannot enable PoD and vNUMA at the same time");
+goto error_out;
+}
+
 ret = libxl__domain_create_info_setdefault(gc, &d_config->c_info);
 if (ret) goto error_out;
 
diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c
index b58a19b..c1a409d 100644
--- a/tools/libxl/libxl_dom.c
+++ b/tools/libxl/libxl_dom.c
@@ -893,12 +893,55 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid,
 goto out;
 }
 
+if (info->num_vnuma_nodes != 0) {
+int i;
+
+args.nr_vmemranges = state->num_vmemranges;
+args.vmemranges = libxl__malloc(gc, sizeof(*args.vmemranges) *
+args.nr_vmemranges);
+
+for (i = 0; i < args.nr_vmemranges; i++) {
+args.vmemranges[i].start = state->vmemranges[i].start;
+args.vmemranges[i].end   = state->vmemranges[i].end;
+args.vmemranges[i].flags = state->vmemranges[i].flags;
+args.vmemranges[i].nid   = state->vmemranges[i].nid;
+}
+
+/* Consider video ram belongs to vmemrange 0 -- just shrink it
+ * by the size of video ram.
+ */
+if (((args.vmemranges[0].end - args.vmemranges[0].start) >> 10)
+< info->video_memkb) {
+LOG(ERROR, "vmemrange 0 too small to contain video ram");
+goto out;
+}
+
+args.vmemranges[0].end -= (info->video_memkb << 10);
+
+args.nr_vnodes = info->num_vnuma_nodes;
+args.vnode_to_pnode = libxl__malloc(gc, sizeof(*args.vnode_to_pnode) *
+args.nr_vnodes);
+for (i = 0; i < args.nr_vnodes; i++)
+args.vnode_to_pnode[i] = info->vnuma_nodes[i].pnode;
+}
+
 ret = xc_hvm_build(ctx->xch, domid, &args);
 if (ret) {
 LOGEV(ERROR, ret, "hvm building failed");
 goto out;
 }
 
+if (info->num_vnuma_nodes != 0) {
+ret = libxl__vnuma_build_vmemrange_hvm(gc, domid, info, state, &args);
+if (ret) {
+LOGEV(ERROR, ret, "hvm build vmemranges failed");
+goto out;
+}
+ret = libxl__vnuma_config_check(gc, info, state);
+if (ret) goto out;
+ret = set_vnuma_info(gc, domid, info, state);
+if (ret) goto out;
+}
 ret = hvm_build_set_params(ctx->xch, domid, info, state->store_port,
&state->store_mfn, state->console_port,
&state->console_mfn, state->store_domid,
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index 7d1e1cf..e93089a 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -3408,6 +3408,11 @@ int libxl__vnuma_build_vmemrange_pv(libxl__gc *gc,
 uint32_t domid,
 libxl_domain_build_info *b_info,
 libxl__domain_build_state *state);
+int libxl__vnuma_build_vmemrange_hvm(libxl__gc *gc,
+ uint32_t domid,
+ libxl_domain_build_info *b_info,
+ libxl__domain_build_state *state,
+ struct xc_hvm_build_args *args);
 
 _hidden int libxl__ms_vm_genid_set(libxl__gc *gc, uint32_t domid,
const libxl_ms_vm_genid *id);
diff --git a/tools/libxl/libxl_vnuma.c b/tools/libxl/libxl_v

[Xen-devel] [PATCH v6 21/23] libxlu: introduce new APIs

2015-02-26 Thread Wei Liu
These APIs can be used to manipulate XLU_ConfigValue and XLU_ConfigList.

APIs introduced:
1. xlu_cfg_value_type
2. xlu_cfg_value_get_string
3. xlu_cfg_value_get_list
4. xlu_cfg_get_listitem2

Move some definitions from private header to public header as needed.
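
For readers unfamiliar with the pattern, the four accessors amount to a tagged union with type-checked getters that return an errno value and clear the out-parameter on a type mismatch. The sketch below reproduces that shape with hypothetical names (struct value, value_get_string() and so on); it leaves out the warning output and is not the libxlu code itself.

```c
#include <errno.h>
#include <stddef.h>

enum value_type { VALUE_STRING, VALUE_LIST };

struct value;

struct value_list {
    int nvalues;              /* occupied slots */
    struct value **values;
};

/* A value is either a string or a list of further values. */
struct value {
    enum value_type type;
    union {
        char *string;
        struct value_list list;
    } u;
};

/* Returns 0 and the string, or EINVAL (and NULL) if v is not a string. */
static int value_get_string(struct value *v, char **out)
{
    if (v->type != VALUE_STRING) {
        *out = NULL;
        return EINVAL;
    }
    *out = v->u.string;
    return 0;
}

/* Returns 0 and the list, or EINVAL (and NULL) if v is not a list. */
static int value_get_list(struct value *v, struct value_list **out)
{
    if (v->type != VALUE_LIST) {
        *out = NULL;
        return EINVAL;
    }
    *out = &v->u.list;
    return 0;
}

/* Returns the entry-th item, or NULL if entry is out of range. */
static struct value *list_item(const struct value_list *l, int entry)
{
    if (entry < 0 || entry >= l->nvalues)
        return NULL;
    return l->values[entry];
}
```

A nested configuration such as the vnuma list is then walked by alternating list_item() with the two getters, checking the return value at each level.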

Signed-off-by: Wei Liu 
Cc: Ian Jackson 
Cc: Ian Campbell 
---
Changes in v6:
1. Report value's line and column number on error.

Changes in v5:
1. Use calling convention like old APIs.
---
 tools/libxl/libxlu_cfg.c  | 45 +++
 tools/libxl/libxlu_internal.h |  7 ---
 tools/libxl/libxlutil.h   | 13 +
 3 files changed, 58 insertions(+), 7 deletions(-)

diff --git a/tools/libxl/libxlu_cfg.c b/tools/libxl/libxlu_cfg.c
index b921a13..62fb798 100644
--- a/tools/libxl/libxlu_cfg.c
+++ b/tools/libxl/libxlu_cfg.c
@@ -199,6 +199,51 @@ static int find_atom(const XLU_Config *cfg, const char *n,
 return 0;
 }
 
+
+enum XLU_ConfigValueType xlu_cfg_value_type(const XLU_ConfigValue *value)
+{
+return value->type;
+}
+
+int xlu_cfg_value_get_string(const XLU_Config *cfg, XLU_ConfigValue *value,
+ char **value_r, int dont_warn)
+{
+if (value->type != XLU_STRING) {
+if (!dont_warn)
+fprintf(cfg->report,
+"%s:%d:%d: warning: value is not a string\n",
+cfg->config_source, value->line, value->column);
+*value_r = NULL;
+return EINVAL;
+}
+
+*value_r = value->u.string;
+return 0;
+}
+
+int xlu_cfg_value_get_list(const XLU_Config *cfg, XLU_ConfigValue *value,
+   XLU_ConfigList **value_r, int dont_warn)
+{
+if (value->type != XLU_LIST) {
+if (!dont_warn)
+fprintf(cfg->report,
+"%s:%d:%d: warning: value is not a list\n",
+cfg->config_source, value->line, value->column);
+*value_r = NULL;
+return EINVAL;
+}
+
+*value_r = &value->u.list;
+return 0;
+}
+
+XLU_ConfigValue *xlu_cfg_get_listitem2(const XLU_ConfigList *list,
+   int entry)
+{
+if (entry < 0 || entry >= list->nvalues) return NULL;
+return list->values[entry];
+}
+
 int xlu_cfg_get_string(const XLU_Config *cfg, const char *n,
const char **value_r, int dont_warn) {
 XLU_ConfigSetting *set;
diff --git a/tools/libxl/libxlu_internal.h b/tools/libxl/libxlu_internal.h
index 73fd85f..1d310b1 100644
--- a/tools/libxl/libxlu_internal.h
+++ b/tools/libxl/libxlu_internal.h
@@ -25,13 +25,6 @@
 
 #include "libxlutil.h"
 
-enum XLU_ConfigValueType {
-XLU_STRING,
-XLU_LIST,
-};
-
-typedef struct XLU_ConfigValue XLU_ConfigValue;
-
 typedef struct XLU_ConfigList {
 int avalues; /* available slots */
 int nvalues; /* actual occupied slots */
diff --git a/tools/libxl/libxlutil.h b/tools/libxl/libxlutil.h
index 0333e55..989605a 100644
--- a/tools/libxl/libxlutil.h
+++ b/tools/libxl/libxlutil.h
@@ -20,9 +20,15 @@
 
 #include "libxl.h"
 
+enum XLU_ConfigValueType {
+XLU_STRING,
+XLU_LIST,
+};
+
 /* Unless otherwise stated, all functions return an errno value. */
 typedef struct XLU_Config XLU_Config;
 typedef struct XLU_ConfigList XLU_ConfigList;
+typedef struct XLU_ConfigValue XLU_ConfigValue;
 
 XLU_Config *xlu_cfg_init(FILE *report, const char *report_filename);
   /* 0 means we got ENOMEM. */
@@ -66,6 +72,13 @@ const char *xlu_cfg_get_listitem(const XLU_ConfigList*, int entry);
   /* xlu_cfg_get_listitem cannot fail, except that if entry is
* out of range it returns 0 (not setting errno) */
 
+enum XLU_ConfigValueType xlu_cfg_value_type(const XLU_ConfigValue *value);
+int xlu_cfg_value_get_string(const XLU_Config *cfg,  XLU_ConfigValue *value,
+ char **value_r, int dont_warn);
+int xlu_cfg_value_get_list(const XLU_Config *cfg, XLU_ConfigValue *value,
+   XLU_ConfigList **value_r, int dont_warn);
+XLU_ConfigValue *xlu_cfg_get_listitem2(const XLU_ConfigList *list,
+   int entry);
 
 /*
  * Disk specification parsing.
-- 
1.9.1




[Xen-devel] [PATCH v6 18/23] libxlu: rework internal representation of setting

2015-02-26 Thread Wei Liu
This patch does the following things:

1. Properly define a XLU_ConfigList type. Originally it was defined to
   be XLU_ConfigSetting.
2. Define XLU_ConfigValue type, which can be either a string or a list
   of XLU_ConfigValue.
3. ConfigSetting now references XLU_ConfigValue. Originally it only
   worked with **string.
4. Properly construct list where necessary, see changes to .y file.

To achieve the above changes:

1. xlu__cfg_set_mk and xlu__cfg_set_add are deleted, because they
   are no longer needed in the new code.
2. Introduce xlu__cfg_string_mk to make a XLU_ConfigSetting that points
   to a XLU_ConfigValue that wraps a string.
3. Introduce xlu__cfg_list_mk to make a XLU_ConfigSetting that points
   to XLU_ConfigValue that is a list.
4. The parser now generates XLU_ConfigValue instead of XLU_ConfigSetting
   when constructing values, which enables us to recursively generate
   lists of lists.
5. XLU_ConfigSetting is generated in xlu__cfg_set_store.
6. Adapt other functions to use new types.

No change to public API. Xl compiles without problem and 'xl create -n
guest.cfg' is valgrind clean.

This patch is needed because we're going to implement nested list
support, which requires support for lists of lists.

Signed-off-by: Wei Liu 
Cc: Ian Jackson 
Cc: Ian Campbell 
Acked-by: Ian Jackson 
---
Changes in v5:
1. Use standard expanding-array pattern.
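
As a rough illustration of the string-or-list value type and the
expanding-array pattern referred to here, a stand-alone sketch follows.
It is not the libxlu implementation: the names (struct value,
value_string_new, value_list_new, value_list_append, value_free) are
hypothetical, and the growth policy (start at 4 slots, then double) is
an assumption made for the example only.

```c
/* Hypothetical stand-alone sketch; names and growth policy are assumptions. */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

enum value_type { VAL_STRING, VAL_LIST };

struct value;

struct value_list {
    int avalues;             /* available slots */
    int nvalues;             /* occupied slots */
    struct value **values;
};

struct value {
    enum value_type type;
    union {
        char *string;
        struct value_list list;   /* elements may themselves be lists */
    } u;
};

/* Make a string value holding a private copy of s; NULL on ENOMEM. */
struct value *value_string_new(const char *s)
{
    size_t len = strlen(s) + 1;
    struct value *v = malloc(sizeof(*v));
    if (!v) return NULL;
    v->type = VAL_STRING;
    v->u.string = malloc(len);
    if (!v->u.string) { free(v); return NULL; }
    memcpy(v->u.string, s, len);
    return v;
}

/* Make an empty list value; NULL on ENOMEM. */
struct value *value_list_new(void)
{
    struct value *v = malloc(sizeof(*v));
    if (!v) return NULL;
    v->type = VAL_LIST;
    v->u.list.avalues = 0;
    v->u.list.nvalues = 0;
    v->u.list.values = NULL;
    return v;
}

/* Append item, growing the array when full. Returns 0 on success, -1 if
 * list is not a list or allocation fails (the list is left unchanged). */
int value_list_append(struct value *list, struct value *item)
{
    struct value_list *l;

    assert(list && item);
    if (list->type != VAL_LIST) return -1;
    l = &list->u.list;

    if (l->nvalues >= l->avalues) {
        int new_avalues = l->avalues ? l->avalues * 2 : 4;
        struct value **grown =
            realloc(l->values, (size_t)new_avalues * sizeof(*grown));
        if (!grown) return -1;
        l->values = grown;
        l->avalues = new_avalues;
    }
    l->values[l->nvalues++] = item;
    return 0;
}

/* Free a value and, for lists, everything it contains. */
void value_free(struct value *v)
{
    int i;

    if (!v) return;
    switch (v->type) {
    case VAL_STRING:
        free(v->u.string);
        break;
    case VAL_LIST:
        for (i = 0; i < v->u.list.nvalues; i++)
            value_free(v->u.list.values[i]);
        free(v->u.list.values);
        break;
    }
    free(v);
}
```

Because a list element is just another value, appending a list to a list
gives a list of lists with no extra machinery, and a single recursive
free releases the whole tree.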
---
 tools/libxl/libxlu_cfg.c  | 170 ++
 tools/libxl/libxlu_cfg_i.h|  12 ++-
 tools/libxl/libxlu_cfg_y.c|  24 +++---
 tools/libxl/libxlu_cfg_y.h|   2 +-
 tools/libxl/libxlu_cfg_y.y|  14 ++--
 tools/libxl/libxlu_internal.h |  30 ++--
 6 files changed, 173 insertions(+), 79 deletions(-)

diff --git a/tools/libxl/libxlu_cfg.c b/tools/libxl/libxlu_cfg.c
index 22adcb0..f000eed 100644
--- a/tools/libxl/libxlu_cfg.c
+++ b/tools/libxl/libxlu_cfg.c
@@ -131,14 +131,28 @@ int xlu_cfg_readdata(XLU_Config *cfg, const char *data, int length) {
 return ctx.err;
 }
 
-void xlu__cfg_set_free(XLU_ConfigSetting *set) {
+void xlu__cfg_value_free(XLU_ConfigValue *value)
+{
 int i;
 
+if (!value) return;
+
+switch (value->type) {
+case XLU_STRING:
+free(value->u.string);
+break;
+case XLU_LIST:
+for (i = 0; i < value->u.list.nvalues; i++)
+xlu__cfg_value_free(value->u.list.values[i]);
+free(value->u.list.values);
+}
+free(value);
+}
+
+void xlu__cfg_set_free(XLU_ConfigSetting *set) {
 if (!set) return;
 free(set->name);
-for (i=0; i<set->nvalues; i++)
-free(set->values[i]);
-free(set->values);
+xlu__cfg_value_free(set->value);
 free(set);
 }
 
@@ -173,7 +187,7 @@ static int find_atom(const XLU_Config *cfg, const char *n,
 set= find(cfg,n);
 if (!set) return ESRCH;
 
-if (set->avalues!=1) {
+if (set->value->type!=XLU_STRING) {
 if (!dont_warn)
 fprintf(cfg->report,
 "%s:%d: warning: parameter `%s' is"
@@ -191,7 +205,7 @@ int xlu_cfg_get_string(const XLU_Config *cfg, const char *n,
 int e;
 
 e= find_atom(cfg,n,&set,dont_warn);  if (e) return e;
-*value_r= set->values[0];
+*value_r= set->value->u.string;
 return 0;
 }
 
@@ -202,7 +216,7 @@ int xlu_cfg_replace_string(const XLU_Config *cfg, const char *n,
 
 e= find_atom(cfg,n,&set,dont_warn);  if (e) return e;
 free(*value_r);
-*value_r= strdup(set->values[0]);
+*value_r= strdup(set->value->u.string);
 return 0;
 }
 
@@ -214,7 +228,7 @@ int xlu_cfg_get_long(const XLU_Config *cfg, const char *n,
 char *ep;
 
 e= find_atom(cfg,n,&set,dont_warn);  if (e) return e;
-errno= 0; l= strtol(set->values[0], &ep, 0);
+errno= 0; l= strtol(set->value->u.string, &ep, 0);
 e= errno;
 if (errno) {
 e= errno;
@@ -226,7 +240,7 @@ int xlu_cfg_get_long(const XLU_Config *cfg, const char *n,
 cfg->config_source, set->lineno, n, strerror(e));
 return e;
 }
-if (*ep || ep==set->values[0]) {
+if (*ep || ep==set->value->u.string) {
 if (!dont_warn)
 fprintf(cfg->report,
 "%s:%d: warning: parameter `%s' is not a valid number\n",
@@ -253,7 +267,7 @@ int xlu_cfg_get_list(const XLU_Config *cfg, const char *n,
  XLU_ConfigList **list_r, int *entries_r, int dont_warn) {
 XLU_ConfigSetting *set;
 set= find(cfg,n);  if (!set) return ESRCH;
-if (set->avalues==1) {
+if (set->value->type!=XLU_LIST) {
 if (!dont_warn) {
 fprintf(cfg->report,
 "%s:%d: warning: parameter `%s' is a single value"
@@ -262,8 +276,8 @@ int xlu_cfg_get_list(const XLU_Config *cfg, const char *n,
 }
 return EINVAL;
 }
-if (list_r) *list_r= set;
-if (entries_r) *entries_r= set->nvalues;
+if (list_r) *list_r= &set->value->u.list;
+if (entries_r) *entries_r= set->value->u.list.nvalues;
 return 0;
 }
 
@@ -290,72 +304,130 @@ i
