Re: [Xen-devel] [PATCH v9 08/13] Add IOREQ_TYPE_VMWARE_PORT
>>> On 26.02.15 at 20:52, wrote:
> On 02/26/15 03:07, Jan Beulich wrote:
> On 25.02.15 at 21:20, wrote:
>>> On 02/24/15 10:34, Jan Beulich wrote:
>>> On 17.02.15 at 00:05, wrote:
> @@ -2474,7 +2594,12 @@ struct hvm_ioreq_server *hvm_select_ioreq_server(struct domain *d,
>      BUILD_BUG_ON(IOREQ_TYPE_PIO != HVMOP_IO_RANGE_PORT);
>      BUILD_BUG_ON(IOREQ_TYPE_COPY != HVMOP_IO_RANGE_MEMORY);
>      BUILD_BUG_ON(IOREQ_TYPE_PCI_CONFIG != HVMOP_IO_RANGE_PCI);
> +    BUILD_BUG_ON(IOREQ_TYPE_VMWARE_PORT != HVMOP_IO_RANGE_VMWARE_PORT);
> +    BUILD_BUG_ON(IOREQ_TYPE_TIMEOFFSET != HVMOP_IO_RANGE_TIMEOFFSET);
> +    BUILD_BUG_ON(IOREQ_TYPE_INVALIDATE != HVMOP_IO_RANGE_INVALIDATE);
>      r = s->range[type];
> +    if ( !r )
> +        continue;
>>>> Why, all of the sudden?
>>>
>>> This is the replacement for the deleted "if" above. Continue will lead
>>> to the same return that was removed above (it is at the end). They are
>>> currently the same because all ioreq servers have the same set of
>>> ranges. But if it would help, I can change "continue" into the "return
>>> default".
>>
>> So further down you talk of the "special range 1" (see there for
>> further remarks in this regard) - how would r be NULL here in the
>> first place?
>
> Since there is a hole in the #defines 0,1,2,7,8 (currently) range[6] is
> where r will be NULL for example. However no current code should be
> able to get here. So if you want me to I can drop the "if".

That's where ASSERT() comes in handy.

>> I understand all this is non-trivial and not necessarily obvious. But
>> as said before - the x86 instruction emulator should please remain
>> something acting along _only_ architectural specifications. Any
>> extensions to support things like what you want to do here should
>> be added using neutral hooks, responsible for keeping state they
>> need on their own.
>>
>
> How does (the incorrectly formatted for a smaller diff):

Quite a bit better imo!

Jan

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
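For readers following the range[] exchange, here is a minimal sketch of the idea being discussed: a table indexed by request type whose numbering has holes, with an assertion standing in for the silent "continue". Only the value set 0, 1, 2, 7, 8 comes from the message; every name below (demo_type, demo_server, demo_select) is a hypothetical stand-in and not Xen's actual definition, and C11 _Static_assert is used in place of BUILD_BUG_ON().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type numbers; only the set {0,1,2,7,8} is from the thread. */
enum demo_type {
    DEMO_T0 = 0,
    DEMO_T1 = 1,
    DEMO_T2 = 2,
    DEMO_T7 = 7,
    DEMO_T8 = 8,
    DEMO_NR_TYPES = 9
};

/* Compile-time check, in the spirit of BUILD_BUG_ON(). */
_Static_assert(DEMO_T8 < DEMO_NR_TYPES, "range table too small");

struct demo_range { unsigned long start, end; };

struct demo_server {
    /* Slots 3..6 stay NULL because no type uses those numbers. */
    struct demo_range *range[DEMO_NR_TYPES];
};

/* Returns 1 if addr falls inside the server's range for this type. */
int demo_select(const struct demo_server *s, enum demo_type type,
                unsigned long addr)
{
    const struct demo_range *r;

    assert((unsigned int)type < DEMO_NR_TYPES);
    r = s->range[type];
    /* A NULL slot means a caller passed a type from the hole: a bug. */
    assert(r != NULL);
    return addr >= r->start && addr <= r->end;
}
```

The assertion documents that a NULL slot is unreachable by design, whereas a plain "continue" would quietly mask a caller passing an unexpected type.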
Re: [Xen-devel] [PATCH] correct mis-conversion set_bit() -> __cpumask_set_cpu() by 4aaca0e9cd
>>> On 26.02.15 at 17:53, wrote:
> Monday, February 23, 2015, 12:06:00 PM, you wrote:
>
>> I have no idea how I came to use __cpumask_set_cpu() there, the
>> conversion should have been set_bit() -> __set_bit(). The wrong
>> construct results in problems on systems with relatively few CPUs.
>
>> Reported-by: Sander Eikelenboom
>> Signed-off-by: Jan Beulich
>
>> --- a/xen/common/softirq.c
>> +++ b/xen/common/softirq.c
>> @@ -106,7 +106,7 @@ void cpu_raise_softirq(unsigned int cpu,
>>      if ( !per_cpu(batching, this_cpu) || in_irq() )
>>          smp_send_event_check_cpu(cpu);
>>      else
>> -        __cpumask_set_cpu(nr, &per_cpu(batch_mask, this_cpu));
>> +        __set_bit(nr, &per_cpu(batch_mask, this_cpu));
>>  }
>>
>>  void cpu_raise_softirq_batch_begin(void)
>
> Hi Jan,
>
> Any reason this wasn't applied to staging yet ?

It didn't get ack-ed so far (and it was a little too early still to
ping it - I try to allow a week before doing so).

Jan
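One plausible reading of "problems on systems with relatively few CPUs" is that a CPU-mask setter validates its index against the number of CPUs, while a plain bit setter does not. The sketch below only illustrates that difference; the helper names and the explicit nr_cpu_ids parameter are assumptions made for this example and do not reproduce Xen's implementation.

```c
#include <assert.h>
#include <limits.h>

#define DEMO_BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Plain bit setter: the index is just a bit position. */
void demo_set_bit(unsigned int nr, unsigned long *bitmap)
{
    bitmap[nr / DEMO_BITS_PER_WORD] |= 1UL << (nr % DEMO_BITS_PER_WORD);
}

int demo_test_bit(unsigned int nr, const unsigned long *bitmap)
{
    return (int)((bitmap[nr / DEMO_BITS_PER_WORD] >>
                  (nr % DEMO_BITS_PER_WORD)) & 1UL);
}

/*
 * CPU-mask style setter (hypothetical): the index must name a CPU, so
 * anything at or above nr_cpu_ids is rejected.
 * Returns 0 on success, -1 if the index is out of range.
 */
int demo_mask_set_cpu(unsigned int cpu, unsigned long *mask,
                      unsigned int nr_cpu_ids)
{
    if (cpu >= nr_cpu_ids)
        return -1;
    demo_set_bit(cpu, mask);
    return 0;
}
```

With four CPUs, index 5 is a perfectly good bit position but not a valid CPU number, so the two helpers disagree exactly when the machine is small.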
[Xen-devel] [rumpuserxen test] 35525: regressions - FAIL
flight 35525 rumpuserxen real [real]
http://www.chiark.greenend.org.uk/~xensrcts/logs/35525/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 build-i386-rumpuserxen              6 xen-build       fail REGR. vs. 33866
 build-amd64-rumpuserxen             6 xen-build       fail REGR. vs. 33866

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-rumpuserxen-amd64  1 build-check(1)  blocked n/a
 test-amd64-i386-rumpuserxen-i386    1 build-check(1)  blocked n/a

version targeted for testing:
 rumpuserxen  21909666eb2d85c02770d04691795abfd4417392
baseline version:
 rumpuserxen  30d72f3fc5e35cd53afd82c8179cc0e0b11146ad

People who touched revisions under test:
  Antti Kantee
  Ian Jackson
  Martin Lucina

jobs:
 build-amd64                         pass
 build-i386                          pass
 build-amd64-pvops                   pass
 build-i386-pvops                    pass
 build-amd64-rumpuserxen             fail
 build-i386-rumpuserxen              fail
 test-amd64-amd64-rumpuserxen-amd64  blocked
 test-amd64-i386-rumpuserxen-i386    blocked

sg-report-flight on osstest.cam.xci-test.com
logs: /home/xc_osstest/logs
images: /home/xc_osstest/images

Logs, config files, etc. are available at
http://www.chiark.greenend.org.uk/~xensrcts/logs

Test harness code can be found at
http://xenbits.xensource.com/gitweb?p=osstest.git;a=summary

Not pushing.

(No revision log; it would be 339 lines long.)
Re: [Xen-devel] [Qemu-devel] [v2][PATCH] libxl: add one machine property to support IGD GFX passthrough
On 2015/2/27 0:17, Ian Campbell wrote: On Thu, 2015-02-26 at 14:35 +0800, Chen, Tiejun wrote: If we are going to do this then I think we need to arrange for the interface to be able to express the need to force the workarounds for a particular device. IOW a boolean will not suffice since it doesn't indicate that IGD workarounds are needed. Probably it would be simplest to just leave this functionality out for the time being and revisit if/when maintaining the list becomes an annoyance or an end user trips over it. You mean we should maintain one list to save all targeted devices, then tools uses ids as an index to lookup this list to pass something to qemu. I (think I) meant a list of pci vid:did in libxl, which is matched against the devices passed to the domain (e.g. "pci = [...]" in xl cfg), which then enables the igd workarounds, i.e. by passing the option to Yeah, this is exactly what I'm understanding. qemu. But actually one question that I have always been thinking about is, its really a responsibility of Xen to determine which device type should be passed by probing that pair of vendor and device ids? Xen is just one of so many approaches to qemu so such a rare workaround option can be passed actively by any user, instead of Xen. Furthermore, its becoming flexible as well to those cases we want to force overriding this. I'm not sure, but I think you are suggestion that qemu should autodetect this situation, without being explicitly told "igd-passthru=on" on the command line? If the qemu maintainers are amenable to that, and it's not already the case that other components (e.g. hvmloader) need to be told about these workarounds, then I suppose that would work. So I think qemu should mainly plays this role. If qemu realizes we're passing through a IGD or other targeted device, it should post a warning or even error message to indicate what right behavior is needed, or what is that potential risk by default. 
> Hrm, here it sounds more like you are suggesting that qemu should detect
> and warn, rather than detect and do the right thing? I'm not sure how
> Qemu could indicate what the right behaviour is going to be, it'll
> differ for different hypervisors or even for which Xen toolstack (xl vs
> libvirt etc) is in use. Or maybe I've misunderstood?

IGD is a tricky case since Qemu has to construct an ISA bridge and host
bridge before we pass the IGD device. But we don't like to expose these
two bridges unconditionally, and this is also why we need this option.

Here I just mean that when Qemu realizes IGD is passed through but
without the appropriate option set, Qemu can post something to
explicitly notify the user that this option is needed in his case. But
it may be a lazy idea.

So now I think I'd better go back to handling this on the Xen side with
your comments.

As you said, the Boolean doesn't suffice to indicate that IGD
workarounds are needed. So I think we can reuse the existing bool
'gfx_passthru'.

Firstly we can redefine this as a string,

-    ("gfx_passthru", libxl_defbool),
+    ("gfx_passthru", string),

Then

+
+    if (libxl__is_igd_vga_passthru(gc, guest_config) ||
+        (b_info->u.hvm.gfx_passthru &&
+         strncmp(b_info->u.hvm.gfx_passthru, "igd", 3) == 0) ) {
+        machinearg = GCSPRINTF("%s,igd-passthru=on", machinearg);
+    }
+

Of course we need to modify something else to align with this change.

Thanks
Tiejun
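The string-valued gfx_passthru check proposed above can be tried outside libxl with a small stand-alone sketch. The helper names and the "xenfv" base string are placeholders chosen for this example, and snprintf stands in for GCSPRINTF; none of this is libxl's actual code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: 1 if the (possibly unset) option selects IGD. */
int demo_wants_igd(const char *gfx_passthru)
{
    return gfx_passthru != NULL && strncmp(gfx_passthru, "igd", 3) == 0;
}

/*
 * Writes the machine argument into out, appending the IGD option when
 * requested. Returns what snprintf returns (the length that was needed).
 */
int demo_machine_arg(char *out, size_t len, const char *base,
                     const char *gfx_passthru)
{
    if (demo_wants_igd(gfx_passthru))
        return snprintf(out, len, "%s,igd-passthru=on", base);
    return snprintf(out, len, "%s", base);
}
```

Note that strncmp() with a length of 3 is a prefix match, so a value such as "igdx" would also enable the workaround; an exact strcmp() may be preferable if other values are added later.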
Re: [Xen-devel] RFC: xen config changes v4
On 02/26/2015 07:48 PM, Luis R. Rodriguez wrote: On Thu, Feb 26, 2015 at 05:42:57PM +, Stefano Stabellini wrote: On Thu, 26 Feb 2015, Luis R. Rodriguez wrote: On Thu, Feb 26, 2015 at 11:08:20AM +, Stefano Stabellini wrote: On Thu, 26 Feb 2015, David Vrabel wrote: On 26/02/15 04:59, Juergen Gross wrote: So we are again in the situation that pv-drivers always imply the pvops kernel (PARAVIRT selected). I started the whole Kconfig rework to eliminate this dependency. Yes. Can you produce a series that just addresses this one issue. In the absence of any concrete requirement for this big Kconfig reorg I I don't think it is helpful. I clearly missed some context as I didn't realize that this was the intended goal. Why do we want this? Please explain as it won't come for free. We have a few PV interfaces for HVM guests that need PARAVIRT in Linux in order to be used, for example pv_time_ops and HVMOP_pagetable_dying. They are critical performance improvements and from the interface perspective, small enough that doesn't make much sense having a separate KConfig option for them. In order to reach the goal above we necessarily need to introduce a differentiation in terms of PV on HVM guests in Linux: 1) basic guests with PV network, disk, etc but no PV timers, no HVMOP_pagetable_dying, no PV IPIs 2) full PV on HVM guests that have PV network, disk, timers, HVMOP_pagetable_dying, PV IPIs and anything else that makes sense. 2) is much faster than 1) on Xen and 2) is only a tiny bit slower than 1) on native x86 Also don't we shove 2) down hvm guests right now? Even when everything is built in I do not see how we opt out for HVM for 1) at run time right now. If this is true then the question of motivation for this becomes even stronger I think. Yes, indeed there is no way to do 1) at the moment. And for good reasons, see above. 
> OK if the goal is to be able to build front end drivers by avoiding
> building PARAVIRT / PARAVIRT_CLOCK and if the gains to be able to do so
> (which haven't been stated other than just the ability to do so) are
> small (as Stefano notes simple hvm containers do not perform great) but
> requires a bit of work, I'd rather ask -- why not address *why* we are
> avoiding PARAVIRT / PARAVIRT_CLOCK and stick to the original goals
> behind the pvops model by addressing what is required to be able to
> continue to be happy with one single kernel. The work required to do
> that might be more than to just be able to build simple Xen hvm
> containers without PARAVIRT / PARAVIRT_CLOCK but I'd think the gains
> would be much higher.

I absolutely agree. I think this is a long term goal we should work on.
PVH should address most of the issues, BTW.

> If this resonates well then I'd like to ask: what are the current most
> pressing issues with enabling PARAVIRT / PARAVIRT_CLOCK.

PARAVIRT: performance, especially memory management
PARAVIRT_CLOCK: none

Juergen
Re: [Xen-devel] RFC: xen config changes v4
On 02/26/2015 06:42 PM, Stefano Stabellini wrote: On Thu, 26 Feb 2015, Luis R. Rodriguez wrote: On Thu, Feb 26, 2015 at 11:08:20AM +, Stefano Stabellini wrote: On Thu, 26 Feb 2015, David Vrabel wrote: On 26/02/15 04:59, Juergen Gross wrote: So we are again in the situation that pv-drivers always imply the pvops kernel (PARAVIRT selected). I started the whole Kconfig rework to eliminate this dependency. Yes. Can you produce a series that just addresses this one issue. In the absence of any concrete requirement for this big Kconfig reorg I I don't think it is helpful. I clearly missed some context as I didn't realize that this was the intended goal. Why do we want this? Please explain as it won't come for free. We have a few PV interfaces for HVM guests that need PARAVIRT in Linux in order to be used, for example pv_time_ops and HVMOP_pagetable_dying. They are critical performance improvements and from the interface perspective, small enough that doesn't make much sense having a separate KConfig option for them. In order to reach the goal above we necessarily need to introduce a differentiation in terms of PV on HVM guests in Linux: 1) basic guests with PV network, disk, etc but no PV timers, no HVMOP_pagetable_dying, no PV IPIs 2) full PV on HVM guests that have PV network, disk, timers, HVMOP_pagetable_dying, PV IPIs and anything else that makes sense. 2) is much faster than 1) on Xen and 2) is only a tiny bit slower than 1) on native x86 Also don't we shove 2) down hvm guests right now? Even when everything is built in I do not see how we opt out for HVM for 1) at run time right now. If this is true then the question of motivation for this becomes even stronger I think. Yes, indeed there is no way to do 1) at the moment. And for good reasons, see above. 
Hmm, after checking the code I'm not convinced:

- HVMOP_pagetable_dying is obsolete on modern hardware supporting
  EPT/HAP
- PV IPIs are not needed on single-vcpu guests
- PARAVIRT_CLOCK doesn't need PARAVIRT (in fact the SUSE kernel
  configs for all x86_64 kernels have CONFIG_PARAVIRT_CLOCK=y)

So I think we really should enable building Xen frontends without
PARAVIRT, implying at least no XEN_PV and no XEN_PVH.

I'll have a try setting up patches.

Juergen
Re: [Xen-devel] [PATCH 3/4] xen: sched: make counters for vCPU tickling generic
2015-02-26 8:37 GMT-05:00 Dario Faggioli :
> and update them from Credit2 and RTDS schedulers.
>
> Signed-off-by: Dario Faggioli
> Cc: Meng Xu
> Cc: George Dunlap
> Cc: Jan Beulich
> Cc: Keir Fraser
> ---
>  xen/common/sched_credit2.c   |    2 ++
>  xen/common/sched_rt.c        |    2 ++
>  xen/include/xen/perfc_defn.h |    4 ++--
>  3 files changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
> index 2b852cc..bf13a84 100644
> --- a/xen/common/sched_credit2.c
> +++ b/xen/common/sched_credit2.c
> @@ -571,9 +571,11 @@ tickle:
>                    (unsigned char *)&d);
>      }
>      cpumask_set_cpu(ipid, &rqd->tickled);
> +    SCHED_STAT_CRANK(tickle_idlers_some);
>      cpu_raise_softirq(ipid, SCHEDULE_SOFTIRQ);
>
>  no_tickle:
> +    SCHED_STAT_CRANK(tickle_idlers_none);
>      return;
>  }
>
> diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
> index 49d1b83..2ad0c68 100644
> --- a/xen/common/sched_rt.c
> +++ b/xen/common/sched_rt.c
> @@ -929,6 +929,7 @@ runq_tickle(const struct scheduler *ops, struct rt_vcpu *new)
>      }
>
>      /* didn't tickle any cpu */
> +    SCHED_STAT_CRANK(tickle_idlers_none);
>      return;
>  out:
>      /* TRACE */
> @@ -944,6 +945,7 @@ out:
>      }
>
>      cpumask_set_cpu(cpu_to_tickle, &prv->tickled);
> +    SCHED_STAT_CRANK(tickle_idlers_some);
>      cpu_raise_softirq(cpu_to_tickle, SCHEDULE_SOFTIRQ);
>      return;
>  }
>

The change for RTDS scheduler looks good to me.

Thanks,

Meng

--
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania
Re: [Xen-devel] [PATCH 1/4] xen: sched: honour generic perf counters in the RTDS scheduler
Not sure if I should comment with Reviewed-by, I will just do it.
Please just ignore if I should not add Reviewed-by.

2015-02-26 8:36 GMT-05:00 Dario Faggioli :
> more specifically, about vCPU initialization and destruction events,
> in line with adb26c09f26e ("xen: sched: introduce a couple of counters
> in credit2 and SEDF").
>
> Signed-off-by: Dario Faggioli
> Cc: Meng Xu
> Cc: George Dunlap
> Cc: Jan Beulich
> Cc: Keir Fraser
> ---
>  xen/common/sched_rt.c |    4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
> index df4adac..58dd646 100644
> --- a/xen/common/sched_rt.c
> +++ b/xen/common/sched_rt.c
> @@ -525,6 +525,8 @@ rt_alloc_vdata(const struct scheduler *ops, struct vcpu *vc, void *dd)
>      if ( !is_idle_vcpu(vc) )
>          svc->budget = RTDS_DEFAULT_BUDGET;
>
> +    SCHED_STAT_CRANK(vcpu_init);
> +
>      return svc;
>  }
>
> @@ -574,6 +576,8 @@ rt_vcpu_remove(const struct scheduler *ops, struct vcpu *vc)
>      struct rt_dom * const sdom = svc->sdom;
>      spinlock_t *lock;
>
> +    SCHED_STAT_CRANK(vcpu_destroy);
> +
>      BUG_ON( sdom == NULL );
>
>      lock = vcpu_schedule_lock_irq(vc);
>

Reviewed-by: Meng Xu

Thanks,

Meng

---
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania
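As a rough illustration of what "cranking" a named counter at vCPU init and destroy amounts to, here is a minimal stand-alone sketch. The DEMO_STAT_CRANK macro, the counter array and the vCPU helpers are all hypothetical; Xen's SCHED_STAT_CRANK is built on its own perf counter infrastructure, which is not reproduced here.

```c
#include <assert.h>

/* Hypothetical counter names, mirroring the two events in the patch. */
enum demo_counter {
    DEMO_vcpu_init,
    DEMO_vcpu_destroy,
    DEMO_NR_COUNTERS
};

unsigned long demo_counters[DEMO_NR_COUNTERS];

/* Compiles to nothing when statistics are disabled at build time. */
#ifndef DEMO_STATS_DISABLED
#define DEMO_STAT_CRANK(name) ((void)(demo_counters[DEMO_##name]++))
#else
#define DEMO_STAT_CRANK(name) ((void)0)
#endif

struct demo_vcpu { int id; };

void demo_vcpu_setup(struct demo_vcpu *v, int id)
{
    v->id = id;
    DEMO_STAT_CRANK(vcpu_init);      /* one tick per initialized vCPU */
}

void demo_vcpu_remove(struct demo_vcpu *v)
{
    DEMO_STAT_CRANK(vcpu_destroy);   /* one tick per removed vCPU */
    v->id = -1;
}
```

Keeping the increment behind a macro means the scheduler code reads the same whether or not counters are compiled in.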
[Xen-devel] [linux-3.16 test] 35326: regressions - FAIL
flight 35326 linux-3.16 real [real]
http://www.chiark.greenend.org.uk/~xensrcts/logs/35326/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-amd64-xl-credit2  15 guest-localmigrate/x10           fail REGR. vs. 34167
 test-amd64-i386-pair         17 guest-migrate/src_host/dst_host  fail REGR. vs. 34167

Tests which are failing intermittently (not blocking):
 test-amd64-i386-rumpuserxen-i386      8 guest-start        fail pass in 34793
 test-amd64-amd64-libvirt              9 guest-start        fail pass in 34793
 test-amd64-amd64-xl-pvh-intel         5 xen-boot           fail in 34793 pass in 35326
 test-amd64-i386-rhel6hvm-intel        5 xen-boot           fail in 34793 pass in 35326
 test-amd64-amd64-xl-sedf              3 host-install(3)    broken in 34793 pass in 35326
 test-amd64-i386-freebsd10-amd64       5 xen-boot           fail in 34793 pass in 35326
 test-amd64-i386-xl-qemut-win7-amd64   5 xen-boot           fail in 34793 pass in 35326
 test-amd64-i386-xl-winxpsp3           5 xen-boot           fail in 34793 pass in 35326
 test-amd64-i386-pair                  8 xen-boot/dst_host  fail in 34793 pass in 35326
 test-amd64-i386-pair                  7 xen-boot/src_host  fail in 34793 pass in 35326

Regressions which are regarded as allowable (not blocking):
 test-amd64-amd64-rumpuserxen-amd64 13 rumpuserxen-demo-xenstorels/xenstorels.repeat fail in 34793 blocked in 34167

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-xl-pvh-intel             9 guest-start             fail never pass
 test-armhf-armhf-xl-multivcpu            10 migrate-support-check   fail never pass
 test-armhf-armhf-xl-sedf                 10 migrate-support-check   fail never pass
 test-armhf-armhf-libvirt                 10 migrate-support-check   fail never pass
 test-armhf-armhf-xl-sedf-pin             10 migrate-support-check   fail never pass
 test-armhf-armhf-xl-midway               10 migrate-support-check   fail never pass
 test-amd64-i386-libvirt                  10 migrate-support-check   fail never pass
 test-amd64-amd64-xl-pvh-amd               9 guest-start             fail never pass
 test-armhf-armhf-xl                      10 migrate-support-check   fail never pass
 test-amd64-amd64-xl-pcipt-intel           9 guest-start             fail never pass
 test-armhf-armhf-xl-credit2              10 migrate-support-check   fail never pass
 test-amd64-i386-freebsd10-i386            7 freebsd-install         fail never pass
 test-amd64-amd64-xl-multivcpu            15 guest-localmigrate/x10  fail never pass
 test-amd64-amd64-xl-sedf                 15 guest-localmigrate/x10  fail never pass
 test-amd64-i386-freebsd10-amd64           7 freebsd-install         fail never pass
 test-amd64-amd64-xl-sedf-pin             15 guest-localmigrate/x10  fail never pass
 test-amd64-amd64-xl-qemuu-win7-amd64     14 guest-stop              fail never pass
 test-amd64-amd64-rumpuserxen-amd64       13 rumpuserxen-demo-xenstorels/xenstorels.repeat fail never pass
 test-amd64-i386-xl-qemut-winxpsp3        14 guest-stop              fail never pass
 test-amd64-i386-xl-qemut-win7-amd64      14 guest-stop              fail never pass
 test-amd64-i386-xl-qemuu-winxpsp3-vcpus1 14 guest-stop              fail never pass
 test-amd64-amd64-xl-win7-amd64           14 guest-stop              fail never pass
 test-amd64-amd64-xl-qemut-winxpsp3       14 guest-stop              fail never pass
 test-amd64-i386-xl-win7-amd64            14 guest-stop              fail never pass
 test-amd64-amd64-xl-winxpsp3             14 guest-stop              fail never pass
 test-amd64-i386-xl-qemuu-win7-amd64      14 guest-stop              fail never pass
 test-amd64-i386-xl-qemuu-winxpsp3        14 guest-stop              fail never pass
 test-amd64-amd64-xl-qemut-win7-amd64     14 guest-stop              fail never pass
 test-amd64-i386-xl-winxpsp3-vcpus1       14 guest-stop              fail never pass
 test-amd64-i386-xl-qemut-winxpsp3-vcpus1 14 guest-stop              fail never pass
 test-amd64-i386-xl-winxpsp3              14 guest-stop              fail never pass
 test-amd64-amd64-xl-qemuu-winxpsp3        7 windows-install         fail never pass
 test-amd64-amd64-libvirt                 10 migrate-support-check   fail in 34793 never pass

version targeted for testing:
 linux  4ba6745b95608891fdec154f6e75479e15a8a24e
baseline version:
 linux  19583ca584d6f574384e17fe7613dfaeadcdc4a6

1040 people touched revisions under test,
not listing them all

jobs:
 build-amd64          pass
 build-armhf          pass
 build-i386           pass
 build-amd64-libvirt  pass
 build-armhf-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvops    pass
 build-armhf-pvops
Re: [Xen-devel] how to assign resources exclusive to a single domU
On 02/26/2015 09:57 AM, Olaf Hering wrote:
> While working on pvscsi support for libxl I noticed that assigning a
> resource exclusively to just a single domU via libxl will be a major
> effort. Up to now libxl could rely on the fact that a resource can be
> either shared or the backend deals with the attempt to share.
>
> There are two cases in pvscsi:
>
> 1) a single physical HST:CHN:TGT:LUN device must be assigned to just a
>    single domU. While the (xenlinux) backend driver allows to assign the
>    device to more than one domU the sharing can not work in practice.

You should keep in mind that *some* cases might be absolutely okay.
Please don't assume all sharing configurations are nonsense!

> 2) the xenlinux backend driver has two modes: emulation and raw. With
>    raw mode the SCSI commands coming from domU will be passed directly
>    to the physical device. I think it's required to make sure that all
>    devices connected to a physical scsi host must operate either
>    entirely in raw mode or in emulation mode. This can be mapped to
>    case #1: the raw mode is selected by assigning all LUNs of a target
>    to a guest via "feature-host". If case #1 is verified it wouldn't be
>    possible to assign some LUNs multiple times which would be required
>    to have a mixture of raw and emulation for a target.

I wouldn't do more than xend in this case. The pvops upstream pvscsi
backend doesn't need the emulation mode any more, this is handled by the
generic target infrastructure.

> To handle both cases libxl could either assume that the admin is
> responsible for proper configuration:
> - just one domU per physical device
> - if raw mode is enabled all devices on the physical scsi host will be
>   assigned to just one domU

Like in the non-virtualized world: the admin has to ensure that devices
in the SAN are either used by only one system, or that the systems using
it coordinate the shared usage.

> Or libxl gets functionality to verify that two cases above are really
> enforced.
> Doing that means that there has to be some global lock under
> which the system state in xenstore is parsed and the to be assigned domU
> configuration is compared:
> - are the physical devices already assigned
> - is the raw mode properly configured
>
> In xend the case #1 was not handled. There is some code for case #2, I
> have to check how complete the enforcement in xend was.
>
> I wonder what should be done in my changes for libxl.

If you are doing something, please add a flag to be able to disable
the additional security checks regarding multiple assignment.

Juergen
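To make the requested behaviour concrete, here is a minimal sketch of an exclusivity check with an opt-out flag. The in-memory table, the struct layout and the function names are assumptions made for illustration only; a real implementation would read the current assignments from xenstore under the global lock mentioned above, which this sketch leaves out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical HST:CHN:TGT:LUN address of a physical SCSI device. */
struct demo_scsi_addr { unsigned int host, chn, tgt, lun; };

/* Hypothetical record of one existing assignment. */
struct demo_assignment {
    struct demo_scsi_addr addr;
    int domid;
};

int demo_same_addr(const struct demo_scsi_addr *a,
                   const struct demo_scsi_addr *b)
{
    return a->host == b->host && a->chn == b->chn &&
           a->tgt == b->tgt && a->lun == b->lun;
}

/*
 * Returns 1 if addr may be assigned to domid, 0 if another domain
 * already holds it. allow_shared is the opt-out flag: when set, the
 * admin takes responsibility and the check is skipped.
 */
int demo_may_assign(const struct demo_assignment *tab, size_t n,
                    const struct demo_scsi_addr *addr, int domid,
                    int allow_shared)
{
    size_t i;

    if (allow_shared)
        return 1;
    for (i = 0; i < n; i++)
        if (demo_same_addr(&tab[i].addr, addr) && tab[i].domid != domid)
            return 0;
    return 1;
}
```

The flag keeps the default safe while still permitting the sharing configurations that are legitimately coordinated by the guests.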
[Xen-devel] [qemu-mainline test] 35298: regressions - FAIL
flight 35298 qemu-mainline real [real]
http://www.chiark.greenend.org.uk/~xensrcts/logs/35298/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-amd64-i386-xl-win7-amd64              7 windows-install     fail REGR. vs. 33480
 test-amd64-amd64-xl-winxpsp3               7 windows-install     fail REGR. vs. 33480
 test-amd64-i386-xl-winxpsp3                7 windows-install     fail REGR. vs. 33480
 test-amd64-i386-xl-qemuu-winxpsp3          7 windows-install     fail REGR. vs. 33480
 test-amd64-i386-xl-qemuu-winxpsp3-vcpus1   7 windows-install     fail REGR. vs. 33480
 test-amd64-amd64-xl-win7-amd64             7 windows-install     fail REGR. vs. 33480
 test-amd64-i386-xl-qemuu-ovmf-amd64        7 debian-hvm-install  fail REGR. vs. 33480
 test-amd64-i386-qemuu-rhel6hvm-intel       7 redhat-install      fail REGR. vs. 33480
 test-amd64-i386-xl-qemuu-debianhvm-amd64   7 debian-hvm-install  fail REGR. vs. 33480
 test-amd64-i386-xl-winxpsp3-vcpus1         7 windows-install     fail REGR. vs. 33480
 test-amd64-amd64-xl-qemuu-ovmf-amd64       7 debian-hvm-install  fail REGR. vs. 33480
 test-amd64-amd64-xl-qemuu-debianhvm-amd64  7 debian-hvm-install  fail REGR. vs. 33480
 test-amd64-i386-freebsd10-i386             8 guest-start         fail REGR. vs. 33480
 test-amd64-i386-freebsd10-amd64            8 guest-start         fail REGR. vs. 33480
 test-amd64-amd64-xl-qemuu-winxpsp3         7 windows-install     fail REGR. vs. 33480
 test-amd64-i386-xl-qemuu-win7-amd64        7 windows-install     fail REGR. vs. 33480
 test-amd64-i386-qemuu-rhel6hvm-amd         7 redhat-install      fail REGR. vs. 33480
 test-amd64-i386-rhel6hvm-amd               7 redhat-install      fail REGR. vs. 33480
 test-amd64-amd64-xl-qemuu-win7-amd64       7 windows-install     fail REGR. vs. 33480
 test-amd64-i386-rhel6hvm-intel             7 redhat-install      fail REGR. vs. 33480

Regressions which are regarded as allowable (not blocking):
 test-amd64-i386-pair  17 guest-migrate/src_host/dst_host  fail like 33480

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-xl                       10 migrate-support-check  fail never pass
 test-armhf-armhf-xl-sedf                  10 migrate-support-check  fail never pass
 test-armhf-armhf-xl-sedf-pin              10 migrate-support-check  fail never pass
 test-armhf-armhf-xl-midway                10 migrate-support-check  fail never pass
 test-amd64-amd64-xl-pvh-intel              9 guest-start            fail never pass
 test-armhf-armhf-xl-credit2               10 migrate-support-check  fail never pass
 test-armhf-armhf-xl-multivcpu             10 migrate-support-check  fail never pass
 test-armhf-armhf-libvirt                  10 migrate-support-check  fail never pass
 test-amd64-i386-libvirt                   10 migrate-support-check  fail never pass
 test-amd64-amd64-xl-pvh-amd                9 guest-start            fail never pass
 test-amd64-amd64-libvirt                  10 migrate-support-check  fail never pass
 test-amd64-amd64-xl-pcipt-intel            9 guest-start            fail never pass
 test-amd64-i386-xl-qemut-win7-amd64       14 guest-stop             fail never pass
 test-amd64-i386-xl-qemut-winxpsp3-vcpus1  14 guest-stop             fail never pass
 test-amd64-amd64-xl-qemut-winxpsp3        14 guest-stop             fail never pass
 test-amd64-amd64-xl-qemut-win7-amd64      14 guest-stop             fail never pass
 test-amd64-i386-xl-qemut-winxpsp3         14 guest-stop             fail never pass

version targeted for testing:
 qemuu  cd2d5541271f1934345d8ca42f5fafff1744eee7
baseline version:
 qemuu  1e42c353469cb58ca4f3b450eea4211af7d0b147

People who touched revisions under test:
  Alberto Garcia
  Alex Suykov
  Alex Williamson
  Alexander Graf
  Alexey Kardashevskiy
  Alistair Francis
  Amit Shah
  Andreas Färber
  Aurelien Jarno
  Avi Kivity
  Bastian Koppelmann
  Ben Taylor
  Benjamin Herrenschmidt
  Bharata B Rao
  Blue Swirl
  Chen Fan
  Chen Gang
  Chen Gang S
  Christian Borntraeger
  Christophe Lyon
  Claudio Fontana
  Cornelia Huck
  Daniel P. Berrange
  Denis V. Lunev
  Dinar Valeev
  Don Koch
  Don Slutz
  Dr. David Alan Gilbert
  Ed Swierk
  Eduardo Habkost
  Eduardo Otubo
  Fabrice Bellard
  Fam Zheng
  Felix Janda
  Francesco Romani
  Frank Blaschka
  Gerd Hoffmann
  Gonglei
  Greg Bellows
  Greg Kurz
  Guan Xuetao
  Igor Mammedov
  Ildar Isaev
  Jan Kiszka
  Jason Wang
  Jeff Cody
  Jiri Slaby
  John Arbuckle
  Juan Quintela
  Kevin Wolf
  Kirill Batuzov
  Laszlo Ersek
  Laurent Desnogues
  Leon Yu
  Marc-André Lureau
  Mark Cave-Ayland
  Markus Armbruster
  Markus Armbruster
  Max Filippov
  Max Reitz
  Maxim Ostapenko
  Michael S. Tsirkin
  Michael Tokarev
  Paolo Bonzini
  Paul Brook
  Paul Durrant
  Peter Lieven
  Peter Maydell
  Peter Wu
  Pranavkumar Sawargaonkar
  Programmingkid
  Richard Henderson
  Richard Sandiford
  Riku Voipio
  Stefan Hajnoczi
  Stefan Weil
  Stefano Stabellini
  Thomas Huth
  Torbjorn Gr
Re: [Xen-devel] [PATCH 3/3] libxl: libxl__device_from_disk should retrieve backend from xenstore
Wei Liu wrote:
> On Wed, Feb 11, 2015 at 10:18:18AM -0700, Jim Fehlig wrote:
>
>> At minimum, libvirt will populate the pdev_path, vdev, backend, and
>> format fields. If backend and format (which, in libvirt-speak,
>> correspond to the 'name' and 'type' attributes on the optional
>> <driver> element) are not specified, they are set to
>> LIBXL_DISK_BACKEND_UNKNOWN and LIBXL_DISK_FORMAT_RAW respectively.
>>
>
> Since libvirt has a tendency of specifying everything, how come there is
> no "name" and "type" in <driver>?

The <driver> element is optional. From

http://libvirt.org/formatdomain.html#elementsDisks

"driver: The optional driver element allows specifying further details
related to the hypervisor driver used to provide the disk"

And when not specified, Ian C. recommended allowing libxl to pick
suitable defaults

https://www.redhat.com/archives/libvir-list/2013-February/msg01126.html

> Can we actually generate all the fields needed when attaching a disk
> and store in libvirt's diskspec?

Yes, it was this way before the suggested change.

Regards,
Jim
Re: [Xen-devel] freemem-slack and large memory environments
On Thursday, February 26, 2015 01:45:16 PM Mike Latimer wrote:
> On Thursday, February 26, 2015 05:53:06 PM Stefano Stabellini wrote:
> > What is the return value of libxl_set_memory_target and
> > libxl_wait_for_free_memory in that case? Isn't it just a matter of
> > properly handle the return values?
>
> The return from libxl_set_memory_target is 0, as the assignment works just
> fine. I don't have the return from libxl_wait_for_free_memory in my notes,
> so I'll spin up another test and track that down.

I slightly misspoke here... In my testing, the returns are actually:

  libxl_set_memory_target = 1
  libxl_wait_for_free_memory = -5
  libxl_wait_for_memory_target = 0

Note - libxl_wait_for_memory_target is confusing, as rc can be set to
ERROR_FAIL, but the function returns 0 anyway (unless an error is
encountered earlier.)

I guess this just means we need to continue to wait...

I was testing spinning up a 64GB guest on a 2TB host. After the
ballooning had completed, dom0 had ballooned down an extra ~320GB. On
this particular machine, each iteration of the loop was showing only
5-7GB of memory being freed at a time. (The loop took 12 iterations.)

-Mike
Re: [Xen-devel] [RFC 2/2] x86, vdso, pvclock: Simplify and speed up the vdso pvclock reader
On Thu, Jan 8, 2015 at 2:43 PM, Andy Lutomirski wrote: > On Thu, Jan 8, 2015 at 2:31 PM, Marcelo Tosatti wrote: >> On Tue, Jan 06, 2015 at 11:49:09AM -0800, Andy Lutomirski wrote: >>> On Tue, Jan 6, 2015 at 10:45 AM, Marcelo Tosatti >>> wrote: >>> > On Tue, Jan 06, 2015 at 10:26:22AM -0800, Andy Lutomirski wrote: >>> >> On Tue, Jan 6, 2015 at 10:13 AM, Marcelo Tosatti >>> >> wrote: >>> >> > On Tue, Jan 06, 2015 at 08:56:40AM -0800, Andy Lutomirski wrote: >>> >> >> On Jan 6, 2015 4:01 AM, "Paolo Bonzini" wrote: >>> >> >> > >>> >> >> > >>> >> >> > >>> >> >> > On 06/01/2015 09:42, Paolo Bonzini wrote: >>> >> >> > > > > Still confused. So we can freeze all vCPUs in the host, then >>> >> >> > > > > update >>> >> >> > > > > pvti 1, then resume vCPU 1, then update pvti 0? In that >>> >> >> > > > > case, we have >>> >> >> > > > > a problem, because vCPU 1 can observe pvti 0 mid-update, and >>> >> >> > > > > KVM >>> >> >> > > > > doesn't increment the version pre-update, and we can return >>> >> >> > > > > completely >>> >> >> > > > > bogus results. >>> >> >> > > > Yes. >>> >> >> > > But then the getcpu test would fail (1->0). Even if you have an >>> >> >> > > ABA >>> >> >> > > situation (1->0->1), it's okay because the pvti that is fetched >>> >> >> > > is the >>> >> >> > > one returned by the first getcpu. >>> >> >> > >>> >> >> > ... this case of partial update of pvti, which is caught by the >>> >> >> > version >>> >> >> > field, if of course different from the other (extremely unlikely) >>> >> >> > that >>> >> >> > Andy pointed out. That is when the getcpus are done on the same >>> >> >> > vCPU, >>> >> >> > but the rdtsc is another. 
>>> >> >> >
>>> >> >> > That one can be fixed by rdtscp, like
>>> >> >> >
>>> >> >> > do {
>>> >> >> >     // get a consistent (pvti, v, tsc) tuple
>>> >> >> >     do {
>>> >> >> >         cpu = get_cpu();
>>> >> >> >         pvti = get_pvti(cpu);
>>> >> >> >         v = pvti->version & ~1;
>>> >> >> >         // also acts as rmb();
>>> >> >> >         rdtsc_barrier();
>>> >> >> >         tsc = rdtscp(&cpu1);
>>> >> >>
>>> >> >> Off-topic note: rdtscp doesn't need a barrier at all. AIUI AMD
>>> >> >> specified it that way and both AMD and Intel implement it correctly.
>>> >> >> (rdtsc, on the other hand, definitely needs the barrier beforehand.)
>>> >> >>
>>> >> >> >         // control dependency, no need for rdtsc_barrier?
>>> >> >> >     } while(cpu != cpu1);
>>> >> >> >
>>> >> >> >     // ... compute nanoseconds from pvti and tsc ...
>>> >> >> >     rmb();
>>> >> >> > } while(v != pvti->version);
>>> >> >>
>>> >> >> Still no good. We can migrate a bunch of times so we see the same CPU
>>> >> >> all three times and *still* don't get a consistent read, unless we
>>> >> >> play nasty games with lots of version checks (I have a patch for that,
>>> >> >> but I don't like it very much). The patch is here:
>>> >> >>
>>> >> >> https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/commit/?h=x86/vdso_paranoia&id=a69754dc5ff33f5187162b5338854ad23dd7be8d
>>> >> >>
>>> >> >> but I don't like it.
>>> >> >>
>>> >> >> Thus far, I've been told unambiguously that a guest can't observe pvti
>>> >> >> while it's being written, and I think you're now telling me that this
>>> >> >> isn't true and that a guest *can* observe pvti while it's being
>>> >> >> written while the low bit of the version field is not set. If so,
>>> >> >> this is rather strongly incompatible with the spec in the KVM docs.
>>> >> >> >>> >> >> I don't suppose that you and Marcelo could agree on what the actual >>> >> >> semantics that KVM provides are and could write it down in a way that >>> >> >> people who haven't spent a long time staring at the request code >>> >> >> understand? And maybe you could even fix the implementation while >>> >> >> you're at it if the implementation is, indeed, broken. I have ugly >>> >> >> patches to fix it here: >>> >> >> >>> >> >> https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/commit/?h=x86/vdso_paranoia&id=3b718a050cba52563d831febc2e1ca184c02bac0 >>> >> >> >>> >> >> but I'm not thrilled with them. >>> >> >> >>> >> >> --Andy >>> >> > >>> >> > I suppose that separating the version write from the rest of the >>> >> > pvclock >>> >> > structure is sufficient, as that would guarantee the writes are not >>> >> > reordered even with fast string REP MOVS. >>> >> > >>> >> > Thanks for catching this Andy! >>> >> > >>> >> >>> >> Don't you stil need: >>> >> >>> >> version++; >>> >> write the rest; >>> >> version++; >>> >> >>> >> with possible smp_wmb() in there to keep the compiler from messing >>> >> around? >>> > >>> > Correct. Could just as well follow the protocol and use odd/even, which >>> > is what your patch does. >>> > >>> > What is the point with the new flags bit though? >>> >>> To try to work around the problem on old hosts. I'm not at all >>> convinced that this is worthwhile or that it helps, though. >> >> Andy, >> >> Are you going to submit t
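The "version++; write the rest; version++" protocol discussed above can be sketched in user-space C11 as follows. This is only an illustration under stated assumptions: `struct time_info`, `time_info_update` and `time_info_try_read` are hypothetical names, the layout is not KVM's real pvclock structure, and C11 fences stand in for smp_wmb()/rmb(). It shows only the odd/even version handshake; it does not address the vCPU migration problem raised in the thread.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a pvclock-style record (not KVM's layout). */
struct time_info {
    atomic_uint version;            /* odd while an update is in flight */
    _Atomic uint64_t tsc_timestamp;
    _Atomic uint64_t system_time;
};

struct time_snapshot {
    uint64_t tsc_timestamp;
    uint64_t system_time;
};

/* Writer: bump version to odd, write the payload, bump version to even. */
static void time_info_update(struct time_info *ti, uint64_t tsc, uint64_t ns)
{
    unsigned v = atomic_load_explicit(&ti->version, memory_order_relaxed);

    atomic_store_explicit(&ti->version, v + 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);   /* plays the role of smp_wmb() */

    atomic_store_explicit(&ti->tsc_timestamp, tsc, memory_order_relaxed);
    atomic_store_explicit(&ti->system_time, ns, memory_order_relaxed);

    /* The release store orders the payload writes before the even version. */
    atomic_store_explicit(&ti->version, v + 2, memory_order_release);
}

/* Reader: one attempt; returns false if the caller has to retry. */
static bool time_info_try_read(struct time_info *ti, struct time_snapshot *out)
{
    unsigned v0 = atomic_load_explicit(&ti->version, memory_order_acquire);

    if (v0 & 1)
        return false;                            /* update in progress */

    out->tsc_timestamp =
        atomic_load_explicit(&ti->tsc_timestamp, memory_order_relaxed);
    out->system_time =
        atomic_load_explicit(&ti->system_time, memory_order_relaxed);

    atomic_thread_fence(memory_order_acquire);   /* plays the role of rmb() */
    return v0 == atomic_load_explicit(&ti->version, memory_order_relaxed);
}
```

A reader would normally call time_info_try_read() in a loop until it returns true. The key property is that a reader can never accept a payload that was captured while the version was odd or while it changed.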
Re: [Xen-devel] Shared page tables between EPT and IOMMU issue
On Thu, Feb 26, 2015 at 2:31 PM, Roger Pau Monné wrote: > El 26/02/15 a les 19.02, Roger Pau Monné ha escrit: >> El 26/02/15 a les 17.43, Jan Beulich ha escrit: >> On 26.02.15 at 17:29, wrote: OK, I will try to take a look. All those faults come from physical memory ranges that are supposed to be usable, and in fact the CPU seems to be able to read/write from them without problems, or else the guest would have crashed much more early. Regarding sharing the page tables between EPT and the IOMMU, is there some bit that needs to be set in the ept entry in order to mark a page as available by the IOMMU? >>> >>> Bits 0 and 1 (read and write) are shared between VT-d and EPT >>> (as is bit 7 - see struct dma_pte and ept_entry_t). >> >> I've added some debug prints at the end of construct_dom0 to print the >> MFN of a RAM page (using get_gfn_query_unlocked) and the VTd entry >> (using print_vtd_entries): >> >> (XEN) print_vtd_entries: iommu 8302197c3a40 dev :00:1f.2 gmfn 43e0 >> (XEN) root_entry = 8302197c >> (XEN) root_entry[0] = 140144001 >> (XEN) context = 830140144000 >> (XEN) context[fa] = 2_140148001 >> (XEN) l4 = 830140148000 >> (XEN) l4_index = 0 >> (XEN) l4[0] = 140147003 >> (XEN) l3 = 830140147000 >> (XEN) l3_index = 0 >> (XEN) l3[0] = 140146003 >> (XEN) l2 = 830140146000 >> (XEN) l2_index = 21 >> (XEN) l2[21] = 0 >> (XEN) l2[21] not present >> (XEN) GFN: 0x43e0 MFN: 0x1401e3 type: 0 >> >> This is before Dom0 has been started, so I think there's something >> wrong in the way we build the page tables, because AFAICT the VTd >> code is not able to resolve the GFN, but the EPT code is. 
>
> BTW, if I set no-sharept the output is as expected:
>
> (XEN) print_vtd_entries: iommu 8302197c3a40 dev :00:1f.2 gmfn 43e0
> (XEN) root_entry = 8302197c
> (XEN) root_entry[0] = 19793f001
> (XEN) context = 83019793f000
> (XEN) context[fa] = 2_140149001
> (XEN) l4 = 830140149000
> (XEN) l4_index = 0
> (XEN) l4[0] = 140148003
> (XEN) l3 = 830140148000
> (XEN) l3_index = 0
> (XEN) l3[0] = 140147003
> (XEN) l2 = 830140147000
> (XEN) l2_index = 21
> (XEN) l2[21] = 14012c003
> (XEN) l1 = 83014012c000
> (XEN) l1_index = 1e0
> (XEN) l1[1e0] = 1401e3003
> (XEN) GFN: 0x43e0 MFN: 0x1401e3 type: 0

Hi Roger,

Can you please print the same debug output for address 7cb92 (where the L3 page table is missing), both with and without shared EPT?

Thank you!

--
Elena
Re: [Xen-devel] freemem-slack and large memory environments
On Thursday, February 26, 2015 05:53:06 PM Stefano Stabellini wrote:
> What is the return value of libxl_set_memory_target and
> libxl_wait_for_free_memory in that case? Isn't it just a matter of
> properly handle the return values?

The return from libxl_set_memory_target is 0, as the assignment works just fine. I don't have the return from libxl_wait_for_free_memory in my notes, so I'll spin up another test and track that down.

> Or maybe we just need to change the libxl_set_memory_target call to use
> an absolute memory target to avoid restricting dom0 memory more than
> necessary at each iteration. Also increasing the timeout argument passed
> to the libxl_wait_for_free_memory call could help.

Using an absolute target would help, and would obviously only have to be set once - which is similar to what my patch did. Increasing the timeout would help, but if the timeout were insufficient (say when dealing with very large guests), it wouldn't solve the problem.

-Mike
Re: [Xen-devel] freemem-slack and large memory environments
(Sorry for the delayed response, dealing with ENOTIME.)

On Thursday, February 26, 2015 05:47:21 PM Ian Campbell wrote:
> On Thu, 2015-02-26 at 10:38 -0700, Mike Latimer wrote:
> > >rc = libxl_set_memory_target(ctx, 0, free_memkb - need_memkb, 1, 0);
>
> I think so. In essence we just need to update need_memkb on each
> iteration, right?

Not quite... need_memkb is used in the loop to determine if we have enough free memory for the new domain. So, need_memkb should always remain set to the total amount of memory requested - not just the amount of change still required.

The easiest thing to do is set dom0's memory target before the loop, which is what my original patch did. Another approach would be something like this:

    uint32_t dom0_memkb, dom0_targetkb, pending_memkb;

    libxl_get_memory(ctx, 0, &dom0_memkb);    <-- doesn't actually exist
    libxl_get_memory_target(ctx, 0, &dom0_targetkb);
    pending_memkb = (free_memkb + (dom0_memkb - dom0_targetkb));
    if (pending_memkb < need_memkb) {
        libxl_set_memory_target(ctx, 0, pending_memkb - need_memkb, 1, 0);
    }

which essentially sets pending_memkb to the amount of free memory plus the amount of memory which will be freed once dom0 hits its target.

The final possibility I can think of is to ensure libxl_wait_for_memory_target does not return until the memory target has been reached. That raises some concern about what happens if the target cannot be reached though...

-Mike
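The arithmetic behind the "pending" approach above can be sketched as a pure function, separate from any libxl call. This is a minimal sketch under assumptions: `extra_shrink_kb` is a hypothetical helper, and the inputs are taken to be already known (the message itself notes that a getter for dom0's current memory does not exist in libxl).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: given the host's free memory, dom0's current
 * allocation, dom0's current target and the memory the new domain needs
 * (all in KiB), return how much further dom0's target would have to be
 * lowered. Returns 0 when free memory plus the memory already "in flight"
 * (not yet released by dom0) covers the request.
 */
static uint64_t extra_shrink_kb(uint64_t free_kb, uint64_t dom0_cur_kb,
                                uint64_t dom0_target_kb, uint64_t need_kb)
{
    /* Memory dom0 has been asked to give back but has not released yet. */
    uint64_t in_flight_kb =
        dom0_cur_kb > dom0_target_kb ? dom0_cur_kb - dom0_target_kb : 0;
    uint64_t pending_kb = free_kb + in_flight_kb;

    return pending_kb < need_kb ? need_kb - pending_kb : 0;
}
```

Because the in-flight amount is counted, calling this on every loop iteration would not lower dom0's target again just because ballooning has not finished yet.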
Re: [Xen-devel] [PATCH v2] RFC: Automatically check xen's public headers for C++ pitfalls.
On 02/26/15 14:22, Tim Deegan wrote:
> At 19:49 +0200 on 26 Feb (1424976562), Razvan Cojocaru wrote:
>> On 02/26/2015 07:01 PM, Tim Deegan wrote:
>>> +#ifdef __cplusplus
>>> +/* 'private' is a keyword in C++, so we have to use a different name for
>>> + * private state there. Leaving the C name alone to avoid unnecessary
>>> + * pain for the existing users. */
>>> +#define XEN_RING_PRIVATE pvt
>>> +#else
>>> +#define XEN_RING_PRIVATE private
>>> +#endif
>>
>> Are there likely to be many users outside of the ones using that code
>> with mem_event?
>
> Yes, lots. It's used to implement split drivers for net, block, etc.
> Most users will have taken copies of this header into their own trees,
> though, and so won't face build breakage, and this isn't an ABI change.
>
> So far, I've seen David and Andrew in favour of just changing the
> field's name and letting out-of-tree users update their copies when/if
> they want to. Jan would prefer to avoid changing the field name for C
> users. I'm not delighted with any of these options but I think this
> ifdeffery is worse than the others. :)
>
> Let's see what anyone else has to say.
>

Since I am one of the users of C++ and Xen headers, I like this a lot.

I do not like the ifdeffery above. I am in favour of just changing the field's name.

-Don Slutz

> Cheers,
>
> Tim.
Re: [Xen-devel] [PATCH v3] RFC: Automatically check xen's public headers for C++ pitfalls.
On 02/26/15 11:24, Tim Deegan wrote: > Add a check, like the existing check for non-ANSI C in the public > headers, that runs the public headers through a C++ compiler to > flag non-C++-friendly constructs. > > Unlike the ANSI C check, we accept GCC-isms (gnu++98), and we also > check various tools-only headers. > > Explicitly _not_ addressing the use of 'private' in various fields, > since we'd previously decided not to fix that. This sentence and the "-Dprivate=private_is_a_keyword_in_cpp" below appear to be at odds. Maybe add something like: The check ignores the use of 'private'. > > Also tidy up the runes for these checks to be a bit more readable. > > Reported-by: Razvan Cojocaru > Signed-off-by: Tim Deegan > Cc: Jan Beulich > > --- > You can add my Tested-by: Don Slutz -Don Slutz > v3: rebase on staging. > > v2: test more headers; > define __XEN_TOOLS__; > use g++98 rather than ansi; > tidy the makefile for readability; > add a missing include to flask_op.h, which uses evtchn_port_t. 
> --- > .gitignore| 1 + > config/StdGNU.mk | 2 ++ > config/SunOS.mk | 1 + > xen/include/Makefile | 28 > xen/include/public/xsm/flask_op.h | 2 ++ > 5 files changed, 30 insertions(+), 4 deletions(-) > > diff --git a/.gitignore b/.gitignore > index 13ee05b..78958ea 100644 > --- a/.gitignore > +++ b/.gitignore > @@ -233,6 +233,7 @@ xen/arch/*/efi/compat.c > xen/arch/*/efi/efi.h > xen/arch/*/efi/runtime.c > xen/include/headers.chk > +xen/include/headers++.chk > xen/include/asm > xen/include/asm-*/asm-offsets.h > xen/include/compat/* > diff --git a/config/StdGNU.mk b/config/StdGNU.mk > index 4efebe3..e10ed39 100644 > --- a/config/StdGNU.mk > +++ b/config/StdGNU.mk > @@ -2,9 +2,11 @@ AS = $(CROSS_COMPILE)as > LD = $(CROSS_COMPILE)ld > ifeq ($(clang),y) > CC = $(CROSS_COMPILE)clang > +CXX= $(CROSS_COMPILE)clang++ > LD_LTO = $(CROSS_COMPILE)llvm-ld > else > CC = $(CROSS_COMPILE)gcc > +CXX= $(CROSS_COMPILE)g++ > LD_LTO = $(CROSS_COMPILE)ld > endif > CPP= $(CC) -E > diff --git a/config/SunOS.mk b/config/SunOS.mk > index 3316280..c2be37d 100644 > --- a/config/SunOS.mk > +++ b/config/SunOS.mk > @@ -2,6 +2,7 @@ AS = $(CROSS_COMPILE)gas > LD = $(CROSS_COMPILE)gld > CC = $(CROSS_COMPILE)gcc > CPP= $(CROSS_COMPILE)gcc -E > +CXX= $(CROSS_COMPILE)g++ > AR = $(CROSS_COMPILE)gar > RANLIB = $(CROSS_COMPILE)granlib > NM = $(CROSS_COMPILE)gnm > diff --git a/xen/include/Makefile b/xen/include/Makefile > index 94112d1..d48a642 100644 > --- a/xen/include/Makefile > +++ b/xen/include/Makefile > @@ -87,13 +87,33 @@ compat/xlat.h: $(addprefix compat/.xlat/,$(xlat-y)) > Makefile > > ifeq ($(XEN_TARGET_ARCH),$(XEN_COMPILE_ARCH)) > > -all: headers.chk > +all: headers.chk headers++.chk > > -headers.chk: $(filter-out public/arch-% public/%ctl.h public/xsm/% > public/%hvm/save.h, $(wildcard public/*.h public/*/*.h) $(public-y)) Makefile > - for i in $(filter %.h,$^); do $(CC) -ansi -include stdint.h -Wall -W > -Werror -S -o /dev/null -x c $$i || exit 1; echo $$i; done >$@.new > +PUBLIC_HEADERS := 
$(filter-out public/arch-% public/dom0_ops.h, $(wildcard > public/*.h public/*/*.h) $(public-y)) > + > +PUBLIC_ANSI_HEADERS := $(filter-out public/%ctl.h public/xsm/% > public/%hvm/save.h, $(PUBLIC_HEADERS)) > + > +headers.chk: $(PUBLIC_ANSI_HEADERS) Makefile > + for i in $(filter %.h,$^); do \ > + $(CC) -x c -ansi -Wall -Werror -include stdint.h \ > + -S -o /dev/null $$i || exit 1; \ > + echo $$i; \ > + done >$@.new > + mv $@.new $@ > + > +headers++.chk: $(PUBLIC_HEADERS) Makefile > + if $(CXX) -v >/dev/null 2>&1; then \ > + for i in $(filter %.h,$^); do \ > + $(CXX) -x c++ -std=gnu++98 -Wall -Werror \ > +-D__XEN_TOOLS__ -Dprivate=private_is_a_keyword_in_cpp \ > +-include stdint.h -include public/xen.h \ > +-S -o /dev/null $$i || exit 1; \ > + echo $$i; \ > + done ; \ > + fi >$@.new > mv $@.new $@ > > endif > > clean:: > - rm -rf compat headers.chk > + rm -rf compat headers.chk headers++.chk > diff --git a/xen/include/public/xsm/flask_op.h > b/xen/include/public/xsm/flask_op.h > index 233de81..f874589 100644 > --- a/xen/include/public/xsm/flask_op.h > +++ b/xen/include/public/xsm/flask_op.h > @@ -25,6 +25,8 @@ > #ifndef __FLASK_OP_H__ > #define __FLASK_OP_H__ > > +#include "../event_channel.h" > + > #define XEN_FLASK_INTERFACE_VERSION 1 > > struct xen_flask_load { > ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [xen-unstable test] 35257: regressions - FAIL
flight 35257 xen-unstable real [real]
http://www.chiark.greenend.org.uk/~xensrcts/logs/35257/

Regressions :-(

Tests which did not succeed and are blocking,
including tests which could not be run:
 test-armhf-armhf-libvirt                 12 guest-start.2          fail REGR. vs. 34629

Regressions which are regarded as allowable (not blocking):
 test-amd64-i386-pair                     17 guest-migrate/src_host/dst_host fail like 34629

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-xl-sedf-pin             10 migrate-support-check  fail never pass
 test-armhf-armhf-xl                      10 migrate-support-check  fail never pass
 test-armhf-armhf-xl-sedf                 10 migrate-support-check  fail never pass
 test-amd64-amd64-rumpuserxen-amd64       13 rumpuserxen-demo-xenstorels/xenstorels.repeat fail never pass
 test-armhf-armhf-xl-credit2              10 migrate-support-check  fail never pass
 test-armhf-armhf-xl-midway               10 migrate-support-check  fail never pass
 test-amd64-amd64-xl-pvh-amd               9 guest-start            fail never pass
 test-armhf-armhf-libvirt                 10 migrate-support-check  fail never pass
 test-amd64-amd64-xl-pvh-intel             9 guest-start            fail never pass
 test-amd64-i386-libvirt                  10 migrate-support-check  fail never pass
 test-armhf-armhf-xl-multivcpu            10 migrate-support-check  fail never pass
 test-amd64-amd64-xl-pcipt-intel           9 guest-start            fail never pass
 test-amd64-amd64-libvirt                 10 migrate-support-check  fail never pass
 test-amd64-i386-xl-qemut-winxpsp3        14 guest-stop             fail never pass
 test-amd64-i386-xl-qemuu-winxpsp3-vcpus1 14 guest-stop             fail never pass
 test-amd64-i386-xl-qemuu-winxpsp3        14 guest-stop             fail never pass
 test-amd64-amd64-xl-qemut-win7-amd64     14 guest-stop             fail never pass
 test-amd64-i386-xl-qemuu-win7-amd64      14 guest-stop             fail never pass
 test-amd64-amd64-xl-winxpsp3             14 guest-stop             fail never pass
 test-amd64-amd64-xl-qemuu-win7-amd64     14 guest-stop             fail never pass
 test-amd64-amd64-xl-win7-amd64           14 guest-stop             fail never pass
 test-amd64-i386-xl-qemut-winxpsp3-vcpus1 14 guest-stop             fail never pass
 test-amd64-i386-xl-qemut-win7-amd64      14 guest-stop             fail never pass
 test-amd64-i386-xl-win7-amd64            14 guest-stop             fail never pass
 test-amd64-i386-xl-winxpsp3-vcpus1       14 guest-stop             fail never pass
 test-amd64-i386-xl-winxpsp3              14 guest-stop             fail never pass
 test-amd64-amd64-xl-qemut-winxpsp3       14 guest-stop             fail never pass
 test-amd64-amd64-xl-qemuu-winxpsp3       14 guest-stop             fail never pass

version targeted for testing:
 xen                  24b2b8dea180098a3acc809a91cde6c0cc4c8607
baseline version:
 xen                  cb34a7c8d741aa447d79e1b01d71168a4088a4d7

People who touched revisions under test:
  Andrew Cooper
  Dario Faggioli
  David Scott
  Don Slutz
  Elena Ufimsteva
  George Dunlap
  Ian Campbell
  Ian Jackson
  Jan Beulich
  Jintack Lim
  Julien Grall
  Michael Young
  Olaf Hering
  Stefano Stabellini
  Wei Liu

jobs:
 build-amd64                                  pass
 build-armhf                                  pass
 build-i386                                   pass
 build-amd64-libvirt                          pass
 build-armhf-libvirt                          pass
 build-i386-libvirt                           pass
 build-amd64-oldkern                          pass
 build-i386-oldkern                           pass
 build-amd64-pvops                            pass
 build-armhf-pvops                            pass
 build-i386-pvops                             pass
 build-amd64-rumpuserxen                      pass
 build-i386-rumpuserxen                       pass
 test-amd64-amd64-xl                          pass
 test-armhf-armhf-xl                          pass
 test-amd64-i386-xl                           pass
 test-amd64-amd64-xl-pvh-amd                  fail
 test-amd64-i386-rhel6hvm-amd                 pass
 test-amd64-i386-qemut-rhel6hvm-amd           pass
 test-amd64-i386-qemuu-rhel6hvm-amd           pass
 test-amd64-amd64-xl-qemut-debianhvm-amd64    pass
 test-amd64-i386-xl-qemut-debianhvm-amd64     pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64    pass
 test-amd64-i386-xl-qemuu-debianhvm-amd64
Re: [Xen-devel] [PATCH v9 08/13] Add IOREQ_TYPE_VMWARE_PORT
On 02/26/15 03:07, Jan Beulich wrote: On 25.02.15 at 21:20, wrote: >> On 02/24/15 10:34, Jan Beulich wrote: >> On 17.02.15 at 00:05, wrote: @@ -501,22 +542,50 @@ static void hvm_free_ioreq_gmfn(struct domain *d, unsigned long gmfn) [snip] @@ -2429,9 +2552,6 @@ struct hvm_ioreq_server *hvm_select_ioreq_server(struct >> domain *d, if ( list_empty(&d->arch.hvm_domain.ioreq_server.list) ) return NULL; -if ( p->type != IOREQ_TYPE_COPY && p->type != IOREQ_TYPE_PIO ) -return d->arch.hvm_domain.default_ioreq_server; >>> >>> Shouldn't this rather be amended than deleted? >>> >> >> The reason is below: >> @@ -2474,7 +2594,12 @@ struct hvm_ioreq_server *hvm_select_ioreq_server(struct domain *d, BUILD_BUG_ON(IOREQ_TYPE_PIO != HVMOP_IO_RANGE_PORT); BUILD_BUG_ON(IOREQ_TYPE_COPY != HVMOP_IO_RANGE_MEMORY); BUILD_BUG_ON(IOREQ_TYPE_PCI_CONFIG != HVMOP_IO_RANGE_PCI); +BUILD_BUG_ON(IOREQ_TYPE_VMWARE_PORT != HVMOP_IO_RANGE_VMWARE_PORT); +BUILD_BUG_ON(IOREQ_TYPE_TIMEOFFSET != HVMOP_IO_RANGE_TIMEOFFSET); +BUILD_BUG_ON(IOREQ_TYPE_INVALIDATE != HVMOP_IO_RANGE_INVALIDATE); r = s->range[type]; +if ( !r ) +continue; >>> >>> Why, all of the sudden? >>> >> >> This is the replacement for the deleted "if" above. Continue will lead >> to the same return that was remove above (it is at the end). They are >> currently the same because all ioreq servers have the same set of >> ranges. But if it would help, I can change "continue" into the "return >> default". > > So further down you talk of the "special range 1" (see there for > further remarks in this regard) - how would r be NULL here in the > first place? Since there is a hole in the #defines 0,1,2,7,8 (currently) range[6] is where r will be NULL for example. However no current code should be able to get here. So if you want me to I can drop the "if". > That said - yes, making this explicitly do what is > intended (perhaps rather using "break" instead of "return") would > seem very desirable. There simply is no point in continuing the > loop. 
> Will use break if the "if" is not dropped. @@ -2501,6 +2626,13 @@ struct hvm_ioreq_server *hvm_select_ioreq_server(struct domain *d, } break; +case IOREQ_TYPE_VMWARE_PORT: +case IOREQ_TYPE_TIMEOFFSET: +case IOREQ_TYPE_INVALIDATE: +if ( rangeset_contains_singleton(r, 1) ) +return s; >>> >>> This literal 1 at least needs explanation (i.e. a comment). >>> >> >> The comment is below (copied here). Will duplicate it here (with any >> adjustments needed): >> >> + * NOTE: The 'special' range of 1 is what is checked for outside >> + * of the three types of I/O. >> >> How about /* The 'special' range of 1 is checked for being enabled */? > > Along these lines, yes (fixed for coding style). And then "1" is not > a range of any kind. I suppose writing it as a proper range (e.g. > [1,1]) would already help. > I will adjust to using [1,1]. --- a/xen/arch/x86/x86_emulate/x86_emulate.h +++ b/xen/arch/x86/x86_emulate/x86_emulate.h @@ -112,6 +112,8 @@ struct __packed segment_register { #define X86EMUL_RETRY 3 /* (cmpxchg accessor): CMPXCHG failed. Maps to X86EMUL_RETRY in caller. */ #define X86EMUL_CMPXCHG_FAILED 3 + /* Send part of registers also to DM. */ +#define X86EMUL_VMPORT_SEND4 >>> >>> Introducing a new value here seems rather fragile, as various code >>> paths you don't touch would need auditing that they do the right >>> thing upon this value being returned. Plus even conceptually this >>> doesn't belong here - the instruction emulator shouldn't be concerned >>> with things like VMware emulation. >>> >> >> The only place I know of where rc is not checked by name is in >> x86_emulate.c. There are a lot of 0 and != 0 checks. Also in area of >> code there are places that return X86EMUL_OKAY when it looks to me that >> the return value is checked for 0 and ignored otherwise. > > The point aren't the checks against zero, but the ones against the > #define-d values. Code may exist that, after excluding certain > values, assumes that only some specific value can be left. 
While > we aim at adding ASSERT()s for such cases, I'm nowhere near to > being convinced this is the case everywhere. > >> So I will agree that the use of these defines is complex. However, I >> need a way to pass back X86EMUL_UNHANDLEABLE and send a few registers to >> QEMU. Now since the code path that I need to do this is: >> >> ... >> hvmemul_do_io >> hvm_portio_intercept >>hvm_io_intercept >> process_portio_intercept >> vmport_ioport >> >> >> Since there is only 1 caller to hvm_portio_intercept() -- >> hvmemul_do_io, and hvmemul_do_i
Re: [Xen-devel] Shared page tables between EPT and IOMMU issue
El 26/02/15 a les 19.02, Roger Pau Monné ha escrit: > El 26/02/15 a les 17.43, Jan Beulich ha escrit: > On 26.02.15 at 17:29, wrote: >>> OK, I will try to take a look. All those faults come from physical >>> memory ranges that are supposed to be usable, and in fact the CPU seems >>> to be able to read/write from them without problems, or else the guest >>> would have crashed much more early. Regarding sharing the page tables >>> between EPT and the IOMMU, is there some bit that needs to be set in the >>> ept entry in order to mark a page as available by the IOMMU? >> >> Bits 0 and 1 (read and write) are shared between VT-d and EPT >> (as is bit 7 - see struct dma_pte and ept_entry_t). > > I've added some debug prints at the end of construct_dom0 to print the > MFN of a RAM page (using get_gfn_query_unlocked) and the VTd entry > (using print_vtd_entries): > > (XEN) print_vtd_entries: iommu 8302197c3a40 dev :00:1f.2 gmfn 43e0 > (XEN) root_entry = 8302197c > (XEN) root_entry[0] = 140144001 > (XEN) context = 830140144000 > (XEN) context[fa] = 2_140148001 > (XEN) l4 = 830140148000 > (XEN) l4_index = 0 > (XEN) l4[0] = 140147003 > (XEN) l3 = 830140147000 > (XEN) l3_index = 0 > (XEN) l3[0] = 140146003 > (XEN) l2 = 830140146000 > (XEN) l2_index = 21 > (XEN) l2[21] = 0 > (XEN) l2[21] not present > (XEN) GFN: 0x43e0 MFN: 0x1401e3 type: 0 > > This is before Dom0 has been started, so I think there's something > wrong in the way we build the page tables, because AFAICT the VTd > code is not able to resolve the GFN, but the EPT code is. 
BTW, if I set no-sharept the output is as expected:

(XEN) print_vtd_entries: iommu 8302197c3a40 dev :00:1f.2 gmfn 43e0
(XEN) root_entry = 8302197c
(XEN) root_entry[0] = 19793f001
(XEN) context = 83019793f000
(XEN) context[fa] = 2_140149001
(XEN) l4 = 830140149000
(XEN) l4_index = 0
(XEN) l4[0] = 140148003
(XEN) l3 = 830140148000
(XEN) l3_index = 0
(XEN) l3[0] = 140147003
(XEN) l2 = 830140147000
(XEN) l2_index = 21
(XEN) l2[21] = 14012c003
(XEN) l1 = 83014012c000
(XEN) l1_index = 1e0
(XEN) l1[1e0] = 1401e3003
(XEN) GFN: 0x43e0 MFN: 0x1401e3 type: 0
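For readers following the dump, the l4/l3/l2/l1 indexes can be derived from the GFN by hand. The sketch below assumes a 4-level table with 9 index bits per level over 4 KiB pages, which matches the values printed above (gmfn 43e0 gives l2_index 21 and l1_index 1e0, in hex). `pt_index` is a hypothetical helper, not a Xen function.

```c
#include <assert.h>
#include <stdint.h>

#define LEVEL_BITS 9u                          /* 512 entries per table */
#define LEVEL_MASK ((1u << LEVEL_BITS) - 1u)

/*
 * Hypothetical helper: index into the page table at the given level for a
 * guest frame number. Level 1 is the leaf table, level 4 is the top.
 */
static unsigned int pt_index(uint64_t gfn, unsigned int level)
{
    assert(level >= 1 && level <= 4);
    return (unsigned int)((gfn >> (LEVEL_BITS * (level - 1))) & LEVEL_MASK);
}
```

With gfn 0x43e0 this yields 0, 0, 0x21 and 0x1e0 for levels 4 through 1, so the walk in the dump follows the expected path and only the contents of l2[21] differ between the shared and non-shared cases.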
Re: [Xen-devel] Shared page tables between EPT and IOMMU issue
El 26/02/15 a les 20.28, Konrad Rzeszutek Wilk ha escrit: > On Thu, Feb 26, 2015 at 07:02:22PM +0100, Roger Pau Monné wrote: >> El 26/02/15 a les 17.43, Jan Beulich ha escrit: >> On 26.02.15 at 17:29, wrote: OK, I will try to take a look. All those faults come from physical memory ranges that are supposed to be usable, and in fact the CPU seems to be able to read/write from them without problems, or else the guest would have crashed much more early. Regarding sharing the page tables between EPT and the IOMMU, is there some bit that needs to be set in the ept entry in order to mark a page as available by the IOMMU? >>> >>> Bits 0 and 1 (read and write) are shared between VT-d and EPT >>> (as is bit 7 - see struct dma_pte and ept_entry_t). >> >> I've added some debug prints at the end of construct_dom0 to print the >> MFN of a RAM page (using get_gfn_query_unlocked) and the VTd entry >> (using print_vtd_entries): >> >> (XEN) print_vtd_entries: iommu 8302197c3a40 dev :00:1f.2 gmfn 43e0 >> (XEN) root_entry = 8302197c >> (XEN) root_entry[0] = 140144001 >> (XEN) context = 830140144000 >> (XEN) context[fa] = 2_140148001 >> (XEN) l4 = 830140148000 >> (XEN) l4_index = 0 >> (XEN) l4[0] = 140147003 >> (XEN) l3 = 830140147000 >> (XEN) l3_index = 0 >> (XEN) l3[0] = 140146003 >> (XEN) l2 = 830140146000 >> (XEN) l2_index = 21 >> (XEN) l2[21] = 0 >> (XEN) l2[21] not present >> (XEN) GFN: 0x43e0 MFN: 0x1401e3 type: 0 >> >> This is before Dom0 has been started, so I think there's something >> wrong in the way we build the page tables, because AFAICT the VTd >> code is not able to resolve the GFN, but the EPT code is. > > This looks like what Elena was hitting (how we parsed E820_RSV or > MMIO ranges). Are those GPFNs special? No, they are regular RAM (p2m_ram_rw). I think Elena's problem was due to missing RMRR regions in the ACPI tables. On the other hand this is the IOMMU failing to provide translations for RAM regions. 
It seems like it's caused by sharing the page tables between EPT and the IOMMUs.

Roger.
Re: [Xen-devel] Shared page tables between EPT and IOMMU issue
On Thu, Feb 26, 2015 at 07:02:22PM +0100, Roger Pau Monné wrote: > El 26/02/15 a les 17.43, Jan Beulich ha escrit: > On 26.02.15 at 17:29, wrote: > >> OK, I will try to take a look. All those faults come from physical > >> memory ranges that are supposed to be usable, and in fact the CPU seems > >> to be able to read/write from them without problems, or else the guest > >> would have crashed much more early. Regarding sharing the page tables > >> between EPT and the IOMMU, is there some bit that needs to be set in the > >> ept entry in order to mark a page as available by the IOMMU? > > > > Bits 0 and 1 (read and write) are shared between VT-d and EPT > > (as is bit 7 - see struct dma_pte and ept_entry_t). > > I've added some debug prints at the end of construct_dom0 to print the > MFN of a RAM page (using get_gfn_query_unlocked) and the VTd entry > (using print_vtd_entries): > > (XEN) print_vtd_entries: iommu 8302197c3a40 dev :00:1f.2 gmfn 43e0 > (XEN) root_entry = 8302197c > (XEN) root_entry[0] = 140144001 > (XEN) context = 830140144000 > (XEN) context[fa] = 2_140148001 > (XEN) l4 = 830140148000 > (XEN) l4_index = 0 > (XEN) l4[0] = 140147003 > (XEN) l3 = 830140147000 > (XEN) l3_index = 0 > (XEN) l3[0] = 140146003 > (XEN) l2 = 830140146000 > (XEN) l2_index = 21 > (XEN) l2[21] = 0 > (XEN) l2[21] not present > (XEN) GFN: 0x43e0 MFN: 0x1401e3 type: 0 > > This is before Dom0 has been started, so I think there's something > wrong in the way we build the page tables, because AFAICT the VTd > code is not able to resolve the GFN, but the EPT code is. This looks like what Elena was hitting (how we parsed E820_RSV or MMIO ranges). Are those GPFNs special? > > Roger. > > > ___ > Xen-devel mailing list > Xen-devel@lists.xen.org > http://lists.xen.org/xen-devel ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v4] libxl_set_memory_target: retain the same maxmem offset on top of the current target
On Thu, Feb 26, 2015 at 06:53:29PM +, Stefano Stabellini wrote:
[...]
> > > }
> > >
> > > -libxl_dominfo_init(&ptr);
> > > -xcinfo2xlinfo(ctx, &info, &ptr);
> >
> > If I'm not mistaken, &info is only used here. I think you can delete
> > info and relevant code all together.
>
> info is used later as an argument to xc_domain_getinfolist
>

What I meant was, the sole purpose of info and the two function calls xc_domain_getinfolist + xcinfo2xlinfo is to fill in ptr, which is done by a single call to libxl_domain_info at the beginning of your patch, so it's possible to remove info and those two function calls altogether.

Wei.

> > >
> > > -uuid = libxl__uuid2string(gc, ptr.uuid);
> > > libxl__xs_write(gc, t, libxl__sprintf(gc, "/vm/%s/memory", uuid),
> > > "%"PRIu32, new_target_memkb / 1024);
> > > -libxl_dominfo_dispose(&ptr);
> > >
> > > out:
> > > if (!xs_transaction_end(ctx->xsh, t, abort_transaction)
> > > --
> > > 1.7.10.4
Re: [Xen-devel] [PATCH v2] RFC: Automatically check xen's public headers for C++ pitfalls.
At 19:49 +0200 on 26 Feb (1424976562), Razvan Cojocaru wrote: > On 02/26/2015 07:01 PM, Tim Deegan wrote: > > +#ifdef __cplusplus > > +/* 'private' is a keyword in C++, so we have to use a different name for > > + * private state there. Leaving the C name alone to avoid unnecessary > > + * pain for the existing users. */ > > +#define XEN_RING_PRIVATE pvt > > +#else > > +#define XEN_RING_PRIVATE private > > +#endif > > Are there likely to be many users outside of the ones using that code > with mem_event? Yes, lots. It's used to implement split drivers for net, block, etc. Most users will have taken copies of this header into their own trees, though, and so won't face build breakage, and this isn't an ABI change. So far, I've seen David and Andrew in favour of just changing the field's name and letting out-of-tree users update their copies when/if they want to. Jan would prefer to avoid changing the field name for C users. I'm not delighted with any of these options but I think this ifdeffery is worse than the others. :) Let's see what anyone else has to say. Cheers, Tim. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
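To make the trade-off concrete, here is a small sketch of how the quoted XEN_RING_PRIVATE macro would be used. The macro lines are copied from the quoted patch; `struct demo_ring` and the two accessors are hypothetical and are not the real Xen ring header. The point it illustrates is that every access has to go through the macro, and that C and C++ builds end up with differently named fields.

```c
#include <assert.h>

#ifdef __cplusplus
/* 'private' is a keyword in C++, so use a different field name there. */
#define XEN_RING_PRIVATE pvt
#else
#define XEN_RING_PRIVATE private
#endif

/* Hypothetical ring structure for illustration only. */
struct demo_ring {
    unsigned int req_prod;
    unsigned int XEN_RING_PRIVATE;   /* 'private' in C, 'pvt' in C++ */
};

/* Code shared between C and C++ must spell the field via the macro. */
static void demo_ring_set_private(struct demo_ring *r, unsigned int v)
{
    r->XEN_RING_PRIVATE = v;
}

static unsigned int demo_ring_get_private(const struct demo_ring *r)
{
    return r->XEN_RING_PRIVATE;
}
```

Existing C code that writes r->private directly keeps compiling, which is the stated goal of the ifdef; the cost is that the same header exposes two different names depending on the compiler, which is the "ifdeffery" objected to in this thread.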
[Xen-devel] [PATCH v5] libxl_set_memory_target: retain the same maxmem offset on top of the current target
In libxl_set_memory_target when setting the new maxmem, retain the same offset on top of the current target. In the future the offset will include memory allocated by QEMU for rom files. Signed-off-by: Stefano Stabellini --- Changes in v5: - call libxl_dominfo_init; - move libxl_dominfo_dispose call before returning to the caller; Changes in v4: - remove new_target_memkb <= 0 check. Changes in v3: - move call to libxl__uuid2string and libxl_dominfo_dispose earlier; - error out if new_target_memkb <= 0. Changes in v2: - remove LIBXL_MAXMEM_CONSTANT from LIBXL__LOG_ERRNO. --- tools/libxl/libxl.c | 18 ++ 1 file changed, 10 insertions(+), 8 deletions(-) diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c index b9a1941..7d42dc6 100644 --- a/tools/libxl/libxl.c +++ b/tools/libxl/libxl.c @@ -4720,6 +4720,12 @@ int libxl_set_memory_target(libxl_ctx *ctx, uint32_t domid, char *uuid; xs_transaction_t t; +libxl_dominfo_init(&info); +if (libxl_domain_info(ctx, &ptr, domid) < 0) +goto out_no_transaction; + +uuid = libxl__uuid2string(gc, ptr.uuid); + retry_transaction: t = xs_transaction_start(ctx->xsh); @@ -4795,13 +4801,12 @@ retry_transaction: } if (enforce) { -memorykb = new_target_memkb + videoram; -rc = xc_domain_setmaxmem(ctx->xch, domid, memorykb + -LIBXL_MAXMEM_CONSTANT); +memorykb = ptr.max_memkb - current_target_memkb + new_target_memkb; +rc = xc_domain_setmaxmem(ctx->xch, domid, memorykb); if (rc != 0) { LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR, "xc_domain_setmaxmem domid=%d memkb=%d failed " -"rc=%d\n", domid, memorykb + LIBXL_MAXMEM_CONSTANT, rc); +"rc=%d\n", domid, memorykb, rc); abort_transaction = 1; goto out; } @@ -4826,12 +4831,8 @@ retry_transaction: goto out; } -libxl_dominfo_init(&ptr); -xcinfo2xlinfo(ctx, &info, &ptr); -uuid = libxl__uuid2string(gc, ptr.uuid); libxl__xs_write(gc, t, libxl__sprintf(gc, "/vm/%s/memory", uuid), "%"PRIu32, new_target_memkb / 1024); -libxl_dominfo_dispose(&ptr); out: if (!xs_transaction_end(ctx->xsh, t, abort_transaction) @@ 
-4840,6 +4841,7 @@ out: goto retry_transaction; out_no_transaction: +libxl_dominfo_dispose(&ptr); GC_FREE; return rc; } -- 1.7.10.4 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
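The new maxmem computation in the patch above keeps whatever gap currently exists between the domain's maxmem and its target, instead of recomputing it from videoram and LIBXL_MAXMEM_CONSTANT. A minimal sketch of that arithmetic, with a hypothetical helper name and made-up numbers:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper mirroring the patch's expression
 *   memorykb = ptr.max_memkb - current_target_memkb + new_target_memkb;
 * All values are in KiB. The offset (max - current target) is preserved.
 */
static uint64_t new_maxmem_kb(uint64_t max_memkb,
                              uint64_t current_target_memkb,
                              uint64_t new_target_memkb)
{
    /* Assumption for this sketch: maxmem is never below the current target. */
    assert(max_memkb >= current_target_memkb);
    return max_memkb - current_target_memkb + new_target_memkb;
}
```

For example, a domain with maxmem 1049600 KiB and a target of 1048576 KiB has an offset of 1024 KiB; lowering the target to 524288 KiB gives a new maxmem of 525312 KiB, so the 1024 KiB offset is carried over unchanged.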
Re: [Xen-devel] Crash of guest with nested vmx with Unknown nested vmexit reason 80000021.
Hi Jan,

Anything planned concerning this?

BR,
Jeroen.

Jan Beulich wrote on 9-12-2014 at 10:17:
> On 09.12.14 at 10:09, wrote:
> > Did anyone find the time yet? I'm still more than willing to test
> > any patches.
>
> Just yesterday we were told by Intel that they still can't foresee
> when they will find time.
>
> Jan
Re: [Xen-devel] [PATCH v4] libxl_set_memory_target: retain the same maxmem offset on top of the current target
On Thu, 26 Feb 2015, Wei Liu wrote:
> On Wed, Feb 25, 2015 at 03:18:45PM +, Stefano Stabellini wrote:
> > In libxl_set_memory_target when setting the new maxmem, retain the same
> > offset on top of the current target. In the future the offset will
> > include memory allocated by QEMU for rom files.
> >
> > Signed-off-by: Stefano Stabellini
> >
> > ---
> >
> > Changes in v4:
> > - remove new_target_memkb <= 0 check.
> >
> > Changes in v3:
> > - move call to libxl__uuid2string and libxl_dominfo_dispose earlier;
> > - error out if new_target_memkb <= 0.
> >
> > Changes in v2:
> > - remove LIBXL_MAXMEM_CONSTANT from LIBXL__LOG_ERRNO.
> > ---
> >  tools/libxl/libxl.c | 12 +++-
> >  1 file changed, 7 insertions(+), 5 deletions(-)
> >
> > diff --git a/tools/libxl/libxl.c b/tools/libxl/libxl.c
> > index 52a783a..143cb3e 100644
> > --- a/tools/libxl/libxl.c
> > +++ b/tools/libxl/libxl.c
> > @@ -4681,6 +4681,12 @@ int libxl_set_memory_target(libxl_ctx *ctx, uint32_t domid,
> >      char *uuid;
> >      xs_transaction_t t;
> >
>
> Should have:
>
>    libxl_dominfo_init(&ptr);
>
> > +    if (libxl_domain_info(ctx, &ptr, domid) < 0)
> > +        goto out_no_transaction;
> > +
> > +    uuid = libxl__uuid2string(gc, ptr.uuid);
> > +    libxl_dominfo_dispose(&ptr);
> > +
>
> Since you need to use ptr later, you cannot dispose it here.
>
> You can safely call dispose before returning to caller.
>
> >  retry_transaction:
> >      t = xs_transaction_start(ctx->xsh);
> >
> > @@ -4756,7 +4762,7 @@ retry_transaction:
> >      }
> >
> >      if (enforce) {
> > -        memorykb = new_target_memkb + videoram;
> > +        memorykb = ptr.max_memkb - current_target_memkb + new_target_memkb;
> >          rc = xc_domain_setmaxmem(ctx->xch, domid, memorykb);
> >          if (rc != 0) {
> >              LIBXL__LOG_ERRNO(ctx, LIBXL__LOG_ERROR,
> > @@ -4786,12 +4792,8 @@ retry_transaction:
> >          goto out;
> >      }
> >
> > -    libxl_dominfo_init(&ptr);
> > -    xcinfo2xlinfo(ctx, &info, &ptr);
>
> If I'm not mistaken, &info is only used here. I think you can delete
> info and relevant code all together.

info is used later as an argument to xc_domain_getinfolist

> > -    uuid = libxl__uuid2string(gc, ptr.uuid);
> >      libxl__xs_write(gc, t, libxl__sprintf(gc, "/vm/%s/memory", uuid),
> >              "%"PRIu32, new_target_memkb / 1024);
> > -    libxl_dominfo_dispose(&ptr);
> >
> >  out:
> >      if (!xs_transaction_end(ctx->xsh, t, abort_transaction)
> > --
> > 1.7.10.4
>
Re: [Xen-devel] RFC: xen config changes v4
On Thu, Feb 26, 2015 at 05:42:57PM +, Stefano Stabellini wrote: > On Thu, 26 Feb 2015, Luis R. Rodriguez wrote: > > On Thu, Feb 26, 2015 at 11:08:20AM +, Stefano Stabellini wrote: > > > On Thu, 26 Feb 2015, David Vrabel wrote: > > > > On 26/02/15 04:59, Juergen Gross wrote: > > > > > > > > > > So we are again in the situation that pv-drivers always imply the > > > > > pvops > > > > > kernel (PARAVIRT selected). I started the whole Kconfig rework to > > > > > eliminate this dependency. > > > > > > > > Yes. Can you produce a series that just addresses this one issue. > > > > > > > > In the absence of any concrete requirement for this big Kconfig reorg I > > > > I don't think it is helpful. > > > > > > I clearly missed some context as I didn't realize that this was the > > > intended goal. Why do we want this? Please explain as it won't come > > > for free. > > > > > > > > > We have a few PV interfaces for HVM guests that need PARAVIRT in Linux > > > in order to be used, for example pv_time_ops and HVMOP_pagetable_dying. > > > They are critical performance improvements and from the interface > > > perspective, small enough that doesn't make much sense having a separate > > > KConfig option for them. > > > > > > > > > In order to reach the goal above we necessarily need to introduce a > > > differentiation in terms of PV on HVM guests in Linux: > > > > > > 1) basic guests with PV network, disk, etc but no PV timers, no > > >HVMOP_pagetable_dying, no PV IPIs > > > 2) full PV on HVM guests that have PV network, disk, timers, > > >HVMOP_pagetable_dying, PV IPIs and anything else that makes sense. > > > > > > 2) is much faster than 1) on Xen and 2) is only a tiny bit slower than > > > 1) on native x86 > > > > Also don't we shove 2) down hvm guests right now? Even when everything is > > built in I do not see how we opt out for HVM for 1) at run time right now. > > > > If this is true then the question of motivation for this becomes even > > stronger I think. 
>
> Yes, indeed there is no way to do 1) at the moment. And for good
> reasons, see above.

OK, if the goal is to be able to build front end drivers without
building PARAVIRT / PARAVIRT_CLOCK, and if the gains from doing so
(which haven't been stated beyond the ability itself) are small (as
Stefano notes, simple hvm containers do not perform well) yet require a
bit of work, then I'd rather ask: why not address *why* we are avoiding
PARAVIRT / PARAVIRT_CLOCK, and stick to the original goals behind the
pvops model by addressing what is required to remain happy with one
single kernel?

That might take more work than just being able to build simple Xen hvm
containers without PARAVIRT / PARAVIRT_CLOCK, but I'd think the gains
would be much higher.

If this resonates well then I'd like to ask: what are the current most
pressing issues with enabling PARAVIRT / PARAVIRT_CLOCK?

  Luis
Re: [Xen-devel] Branch Trace Storage for guests and VPMU initialization
On 02/26/2015 12:57 PM, kevin.ma...@gdata.de wrote: -Ursprüngliche Nachricht- Von: Boris Ostrovsky [mailto:boris.ostrov...@oracle.com] Gesendet: Donnerstag, 26. Februar 2015 17:35 An: Dietmar Hahn; xen-devel@lists.xen.org Cc: Mayer, Kevin Betreff: Re: [Xen-devel] Branch Trace Storage for guests and VPMUinitialization On 02/26/2015 03:56 AM, Dietmar Hahn wrote: Am Mittwoch 25 Februar 2015, 11:31:31 schrieb Boris Ostrovsky: On 02/25/2015 10:12 AM, kevin.ma...@gdata.de wrote: -Ursprüngliche Nachricht- Von: Boris Ostrovsky [mailto:boris.ostrov...@oracle.com] Gesendet: Dienstag, 24. Februar 2015 18:13 An: Mayer, Kevin; xen-devel@lists.xen.org Betreff: Re: [Xen-devel] Branch Trace Storage for guests and VPMU initialization On 02/24/2015 10:27 AM, kevin.ma...@gdata.de wrote: Hi guys I`m trying to set up the BTS so that I can log the branches taken in the guest using Xen 4.4.1 with a WinXP SP3 guest on a Core i7 Sandy Bridge. I added the vpmu=bts boot parameter to my grub2 configuration and extended the libxl,libxc,domctl,… with an own command so that I can trigger the activation of the BTS whenever I want. I am not sure why you are doing all these changes to Xen code. BTS is supposed to be managed from the guest. For example, a Fedora HVM guest will produce this: [root@dhcp-burlington7-2nd-B-east-10-152-55-140 ~]# perf record -e branches:u -c 1 -d sleep 1 [ perf record: Woken up 3838 times to write data ] [ perf record: Captured and wrote 0.704 MB perf.data (~30756 samples) ] [root@dhcp-burlington7-2nd-B-east-10-152-55-140 ~]# perf script -f ip,addr,sym,dso,symoff --show-kernel-path 8167c347 native_irq_return_iret+0x0 (/proc/kcore) => 328c001590 [unknown] (/proc/kcore) 8167c347 native_irq_return_iret+0x0 (/proc/kcore) => 328c001590 [unknown] ([unknown]) 328c001593 [unknown] ([unknown]) => 328c004b70 [unknown] ([unknown]) ... I want to be able to log the taken branches (of the guest) without the need to modify the guest at all. 
> > > > > This means I have to do all the logic in the hypervisor, or am I wrong?
> > > >
> > > > In that case, yes. But then you have to make sure that at least
> > > >  * you don't load guest's VPMU (or, at least, BTS-related
> > > >    registers) on context switch
> > >
> > > But you need to modify PMU registers when switching to/from the guest
> > > context to get PMU running.
> >
> > I was thinking that all BTS stuff can be controlled from dom0 and so we
> > can use dom0's version of these registers. I didn't realize that DS_AREA
> > would have to be accessed in guest's address space (and that DEBUGCTL is
> > loaded from VMCS).
> >
> > Which is what I think I said in response to this message (which didn't
> > show up on the list because Kevin accidentally dropped xen-devel).
> >
> > -boris
>
> Terribly sorry about that...
> So the VPMU doesn’t get loaded when there is a VMENTER?

Not exactly. For BTS, the DEBUGCTL register, which lives in VMCS, does
get loaded. But not DS_AREA --- it gets loaded by SW during
context_switch()->vpmu_load().

(As for general VPMU registers such as counters --- they are also loaded
during context_switch(). But I don't think you care about those. From
what little I know about BTS, DEBUGCTL and DS_AREA are the only two
registers you are interested in)

> I thought I could set the domU->vcpu->vpmu to enable BTS while in dom0
> (with modified versions of msr_write_intercept, vpmu_do_wrmsr and
> core2_vpmu_do_wrmsr of course, since the built-in ones use the
> current-vcpu which would be the dom0-vcpu) and as soon as there is a
> context switch to domU the vpmu gets loaded and the guest starts logging.

And it should work, provided that DS_AREA is set up correctly.

> If the described behavior is correct the only problem I can see is with
> allocating memory in dom0 in a way that the guest can access it.

This sounds right. All you have to do now is implementation details ;-)

-boris

> But if I got it wrong please explain how the vpmu really works.
>
> Cheers
> Kevin
>
> > > I didn't think of using the VPMU stuff with modifying the context from
> > > outside the guest.
> > >
* You don't send the interrupt to the guest (meaning that you will need to somehow inform dom0 of the BTS interrupt) and probably more. Essentially, you want dom0 to profile the guest. I have been working on patches that would allow that but they are still under review. In this command I do the following: I set up the memory region for the BTS Buffer and the DS Buffer Management Area using xzalloc_bytes I don't think you should be allocating BTS buffers in the hypervisor, they are in guest's memory. I agree. As I said I think this is where my main problem is at the moment. Is there any way I can allocate memory in the hypervisor in a way the guest can access it? I am not sure this is what you want since you seem to *not* want the guest to process the samples, right? But yes, you can. E.g. something like what map_vcpu_info() does. (I have no idea how you'd do this from Windows.) The DS buffer has to be map
Re: [Xen-devel] [PATCH v2 3/3] xen/arm: allow console=hvc0 to be omitted for guests
On Wed, 18 Feb 2015, Ian Campbell wrote:
> On Wed, 2015-02-18 at 09:50 -0600, Rob Herring wrote:
> > On Wed, Feb 18, 2015 at 7:51 AM, Julien Grall wrote:
> > > From: Ard Biesheuvel
> > >
> > > This patch registers hvc0 as the preferred console if no console
> > > has been specified explicitly on the kernel command line.
> > >
> > > The purpose is to allow platform agnostic kernels and boot images
> > > (such as distro installers) to boot in a Xen/ARM domU without the
> > > need to modify the command line by hand.
> >
> > How does this interact with DT chosen stdout-path?
>
> I think it shouldn't any more than the existing calls from e.g. the 8250
> driver to preferred_console do.
>
> > Is there a node for hvc0?
>
> Not a direct one, it is inferred from the presence of the general Xen
> node.

Xen PV consoles, including hvc0, are advertised on xenstore, like all
the other Xen PV devices.

> I did vaguely consider handling a stdout-path pointing to that --
> but it seemed a bit of an abuse.
Re: [Xen-devel] [PATCH v2 1/3] arm/xen: Correctly check if the event channel interrupt is present
On Wed, 18 Feb 2015, Julien Grall wrote:
> The function irq_of_parse_and_map returns 0 when the IRQ is not found.
>
> Furthermore, move the check before notifying the user that we are running on
> Xen.
>
> Signed-off-by: Julien Grall
> Acked-by: Ian Campbell

Acked-by: Stefano Stabellini

> ---
> Changes in v2:
> - Add Ian's ack
> - Re-add __read_mostly
> ---
>  arch/arm/xen/enlighten.c | 10 ++
>  1 file changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/arch/arm/xen/enlighten.c b/arch/arm/xen/enlighten.c
> index 263a204..c8d3a17 100644
> --- a/arch/arm/xen/enlighten.c
> +++ b/arch/arm/xen/enlighten.c
> @@ -51,7 +51,7 @@ EXPORT_SYMBOL_GPL(xen_have_vector_callback);
>  int xen_platform_pci_unplug = XEN_UNPLUG_ALL;
>  EXPORT_SYMBOL_GPL(xen_platform_pci_unplug);
>
> -static __read_mostly int xen_events_irq = -1;
> +static __read_mostly unsigned int xen_events_irq;
>
>  /* map fgmfn of domid to lpfn in the current domain */
>  static int map_foreign_page(unsigned long lpfn, unsigned long fgmfn,
> @@ -251,12 +251,14 @@ static int __init xen_guest_init(void)
>          return 0;
>      grant_frames = res.start;
>      xen_events_irq = irq_of_parse_and_map(node, 0);
> +    if (!xen_events_irq) {
> +        pr_debug("Xen event channel interrupt not found\n");
> +        return -ENODEV;
> +    }
> +
>      pr_info("Xen %s support found, events_irq=%d gnttab_frame=%pa\n",
>              version, xen_events_irq, &grant_frames);
>
> -    if (xen_events_irq < 0)
> -        return -ENODEV;
> -
>      xen_domain_type = XEN_HVM_DOMAIN;
>
>      xen_setup_features();
> --
> 2.1.4
>
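Why the old "< 0" test could never fire can be shown with a small user-space sketch. Everything here is a stand-in: fake_irq_lookup only mimics the "0 means not found" convention described in the commit message for irq_of_parse_and_map, and the IRQ number 31 is arbitrary.

```c
#include <assert.h>
#include <errno.h>

/* Stand-in for irq_of_parse_and_map(): returns 0 when no IRQ is found. */
static unsigned int fake_irq_lookup(int present)
{
    return present ? 31u : 0u;   /* 31 is an arbitrary example IRQ */
}

/* Pre-patch logic: a signed variable compared against 0 from below. */
static int old_check(int present)
{
    int irq = (int)fake_irq_lookup(present);

    if (irq < 0)            /* never true: "not found" is 0, not negative */
        return -ENODEV;
    return 0;
}

/* Post-patch logic: treat 0 as "interrupt not found". */
static int new_check(int present)
{
    unsigned int irq = fake_irq_lookup(present);

    if (!irq)
        return -ENODEV;
    return 0;
}
```

In this sketch old_check(0) reports success even though no interrupt exists, while new_check(0) returns -ENODEV.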
Re: [Xen-devel] [OSSTEST PATCH 8/8] ap-fetch-version: Use osstest's home to find master tree
Ian Campbell writes ("Re: [OSSTEST PATCH 8/8] ap-fetch-version: Use osstest's home to find master tree"):
> On Wed, 2015-02-25 at 13:01 +, Ian Jackson wrote:
> > When ap-fetch-version and ap-fetch-version-old are run on the osstest
> > controller but as a different user they should look in ~osstest, not
> > $HOME, for the master testing.git tree.
...
> But what if they are run not on the osstest controller where ~osstest
> may not exist?

Then they ought not to look for the user's $HOME/testing.git, which is
unlikely to (a) exist or (b) be relevant if it does. They ought to
fail.

> I think your previous changes have already arranged that standalone mode
> won't get to either of these anyway, so, that being the case:

Yes, that's the intent.

> Acked-by: Ian Campbell

I have added something about this to the commit message (and retained
your ack).

Thanks,
Ian.
Re: [Xen-devel] Shared page tables between EPT and IOMMU issue
On 26/02/15 at 17.43, Jan Beulich wrote:
> On 26.02.15 at 17:29, wrote:
>> OK, I will try to take a look. All those faults come from physical
>> memory ranges that are supposed to be usable, and in fact the CPU seems
>> to be able to read/write from them without problems, or else the guest
>> would have crashed much earlier. Regarding sharing the page tables
>> between EPT and the IOMMU, is there some bit that needs to be set in the
>> ept entry in order to mark a page as available by the IOMMU?
>
> Bits 0 and 1 (read and write) are shared between VT-d and EPT
> (as is bit 7 - see struct dma_pte and ept_entry_t).

I've added some debug prints at the end of construct_dom0 to print the
MFN of a RAM page (using get_gfn_query_unlocked) and the VTd entry
(using print_vtd_entries):

(XEN) print_vtd_entries: iommu 8302197c3a40 dev :00:1f.2 gmfn 43e0
(XEN) root_entry = 8302197c
(XEN) root_entry[0] = 140144001
(XEN) context = 830140144000
(XEN) context[fa] = 2_140148001
(XEN) l4 = 830140148000
(XEN) l4_index = 0
(XEN) l4[0] = 140147003
(XEN) l3 = 830140147000
(XEN) l3_index = 0
(XEN) l3[0] = 140146003
(XEN) l2 = 830140146000
(XEN) l2_index = 21
(XEN) l2[21] = 0
(XEN) l2[21] not present
(XEN) GFN: 0x43e0 MFN: 0x1401e3 type: 0

This is before Dom0 has been started, so I think there's something wrong
in the way we build the page tables, because AFAICT the VTd code is not
able to resolve the GFN, but the EPT code is.

Roger.
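The l4/l3/l2 indexes in the log above follow from splitting the frame number into 9-bit fields, one per page-table level. The sketch below reproduces them for gmfn 0x43e0; pt_index is a hypothetical helper written for this note, not a Xen function, and it assumes 4K pages with 512-entry tables at every level.

```c
#include <assert.h>
#include <stdint.h>

#define LEVEL_STRIDE 9u                          /* 512 entries per table */
#define LEVEL_MASK   ((1u << LEVEL_STRIDE) - 1u)

/*
 * Hypothetical helper: index into the table at the given level for a
 * guest frame number. Level 1 is the leaf table, level 4 the top one.
 */
static unsigned int pt_index(uint64_t gfn, unsigned int level)
{
    assert(level >= 1 && level <= 4);
    return (unsigned int)((gfn >> (LEVEL_STRIDE * (level - 1))) & LEVEL_MASK);
}
```

For gfn 0x43e0 this gives index 0 at levels 4 and 3 and index 0x21 at level 2, matching the l4_index, l3_index and l2_index lines printed by print_vtd_entries (which prints in hex). The walk stops at level 2 because l2[21] is not present.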
Re: [Xen-devel] [PATCH v5.99.1 RFC 1/4] xen/arm: Duplicate gic-v2.c file to support hip04 platform version
On Thu, 26 Feb 2015, Ian Campbell wrote:
> On Wed, 2015-02-25 at 16:34 +, Stefano Stabellini wrote:
> > I think we should disable the build of all drivers in Xen by default,
> > except for the ARM standard compliant ones (for aarch64 the SBSA is a
> > nice summary of what is considered compliant), to keep the size of the
> > binary small.
>
> I think this last statement was based on information that the gic-v2
> driver was of the order of 70-100K in size, but I think that information
> was wrong (I suspect it was the raw .o size, which includes debug info
> and other extraneous bits). Here I see:
>
> $ du -h xen/arch/arm/gic-v2.o
> 148K    xen/arch/arm/gic-v2.o
> $ aarch64-linux-gnu-size xen/arch/arm/gic-v2.o
>    text    data     bss     dec     hex filename
>    6619       0      97    6716    1a3c xen/arch/arm/gic-v2.o
>
> IOW the actual binary size is on the order of 6K (gic-v3.o is around the
> same). This is arm64, I can't be bothered to rebuild for arm32, it'll be
> similar.
>
> Given that then I really don't think it is worth introducing a two tier
> build over it.
>
> If we really cared about these sorts of savings we would arrange to
> discard all of the unused GIC/SMMU/UART driver's .text/.data/.bss after
> boot (easy enough to achieve by putting each in a dedicated segment).
>
> But I don't think we have enough such drivers to start worrying about
> doing that just now. We have that opportunity in our back pocket if we
> ever get to that point, which is good enough I think.
>
> > Could you please introduce a Xen build time option in
> > xen/arch/arm/Rules.mk, called HAS_NON_STANDARD_DRIVERS, that by default
> > is n, and gate the build of gic-hip04.c on it?
>
> Frediano, I see you've already done so in v6, thanks for that. Sorry to
> go back on it.
>
> Assuming the rest of the series in v6 is OK (gets acked and whatever)
> then I expect I can just skip that one patch when applying and fixup the
> Makefile in the obvious way (approx s/HAS_NON.../CONFIG_ARM32/) in the
> dependent patch.

v6 is fine from my POV, you can add my Acked-by to all patches.
I am OK with dropping HAS_NON_STANDARD_DRIVERS.
Re: [Xen-devel] freemem-slack and large memory environments
On Thu, 26 Feb 2015, Mike Latimer wrote:
> On Thursday, February 26, 2015 03:57:54 PM Ian Campbell wrote:
> > On Thu, 2015-02-26 at 08:36 -0700, Mike Latimer wrote:
> > > There is still one aspect of my original patch that is important. As the
> > > code currently stands, the target for dom0 is set lower during each
> > > iteration of the loop. Unless only one iteration is required, dom0 will
> > > end up being set to a much lower target than is actually required.
> >
> > Is this because some sort of slack is applied once per iteration rather
> > than once at the start or is it something else?
>
> No - the slack reservation just complicated the request by (potentially)
> requiring more free memory than domU initially requested.
>
> With or without slack, the central loop in tools/libxl/xl_cmdimpl.c:freemem,
> frees memory for domU by lowering the memory target for dom0. However, this
> is not a single request (e.g. free 64GB for domX), rather the memory target
> for dom0 is set lower during every iteration through:
>
>    rc = libxl_set_memory_target(ctx, 0, free_memkb - need_memkb, 1, 0);
>
> This causes dom0's memory target to be lowered by the needed amount during
> every iteration of the loop. In practice, this causes the first request to
> lower dom0's target by the full amount (e.g. -64GB), and subsequent
> iterations further lower dom0's target by however much memory that still
> appears to be required (e.g. three iterations of the loop might lower dom0's
> target by -25GB, then -25GB, for a total of dom0 ballooning down 114GB). The
> issue itself is due to the loop ignoring the fact that the original request
> set dom0's target to the correct amount, but the ballooning has not
> completed.

What is the return value of libxl_set_memory_target and
libxl_wait_for_free_memory in that case? Isn't it just a matter of
properly handling the return values?

Or maybe we just need to change the libxl_set_memory_target call to use
an absolute memory target, to avoid restricting dom0 memory more than
necessary at each iteration. Also, increasing the timeout argument
passed to the libxl_wait_for_free_memory call could help.

> The problem itself is easier to see when domU memory sizes are increased. As
> mentioned before, starting a 512GB domain should guarantee that multiple
> iterations of the loop are required, and dom0 will balloon down much further
> than the required 512GB.
>
> Does this clarify the situation?
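The over-ballooning Mike describes, and the effect of switching to an absolute target, can be modelled with plain arithmetic. This is a toy model rather than the xl freemem code: relative_total and absolute_total are hypothetical helpers, and free_seen[] stands for the free memory the loop happens to observe at the start of each iteration while ballooning is still in progress.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Relative scheme: every iteration lowers dom0's target again by
 * whatever still appears to be missing. Returns the total reduction.
 */
static int64_t relative_total(int64_t need, const int64_t *free_seen, size_t n)
{
    assert(free_seen != NULL || n == 0);

    int64_t lowered = 0;
    for (size_t i = 0; i < n; i++) {
        if (free_seen[i] >= need)
            break;                        /* enough memory is free now */
        lowered += need - free_seen[i];   /* relative request, stacks up */
    }
    return lowered;
}

/*
 * Absolute scheme: the target is computed once from the first
 * observation; later iterations only wait for ballooning to finish.
 */
static int64_t absolute_total(int64_t need, const int64_t *free_seen, size_t n)
{
    assert(free_seen != NULL || n == 0);

    if (n == 0 || free_seen[0] >= need)
        return 0;
    return need - free_seen[0];
}
```

Using the figures from the thread (64GB needed, nothing free at first, then 39GB free on the next two checks), the relative scheme lowers dom0 by 64 + 25 + 25 = 114GB, while the absolute scheme stops at 64GB.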
[Xen-devel] [OSSTEST PATCH 9/8] README.dev: Runes for adhoc testing in the production environment
Signed-off-by: Ian Jackson
---
 README.dev | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/README.dev b/README.dev
index aae4f17..03c3e61 100644
--- a/README.dev
+++ b/README.dev
@@ -164,3 +164,21 @@ $HOME/bisects/for-$branch.git/stop
 $HOME/testing.git/$xenbranch.stop
     stops everything using $xenbranch
+
+Adhoc testing in the production environment
+===
+
+Adhoc (`play') testing of a proposed osstest branch:
+
+ As yourself on the osstest controller VM:
+
+ Check out the version of osstest to be tested. If you are editing
+ on your workstation, it is easiest to commit everything and then
+   git-push osstestvm:osstest-wombat-tree.git +HEAD:t
+ and on the controller
+   git checkout t~0
+
+ Create (on the controller) daily-cron-email-foo containing
+   To: something appropriate
+ Then
+   OSSTEST_EMAIL_HEADER=daily-cron-email-foo OSSTEST_USE_HEAD=y OSSTEST_NO_BASELINE=y ./cr-daily-branch osstest
--
1.7.10.4
Re: [Xen-devel] [OSSTEST PATCH 5/8] standalone: Always set OSSTEST_NO_BASELINE
Ian Campbell writes ("Re: [OSSTEST PATCH 5/8] standalone: Always set OSSTEST_NO_BASELINE"):
> On Wed, 2015-02-25 at 13:01 +, Ian Jackson wrote:
> > OSSTEST_NO_BASELINE disables the thing where cr-daily-branch decides
>
> Acked-by: Ian Campbell
>
> Although:
> > -  --baseline)nobaseline=n; shift 1;;
> > +  --baseline) echo >&2 'warning: --baseline is obsolete'; shift 1;;
>
> TBH I think you could just nuke it from a tool like this. I rather
> suspect noone is using it... I can't even remember why I wanted it.

OK, I have done that and retained your ack.

Thanks,
Ian.
[Xen-devel] [OSSTEST PATCH] rump kernels: Send build mails to new list
Signed-off-by: Ian Jackson
CC: Antti Kantee
---
 daily-cron-email-real--rumpuserxen |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/daily-cron-email-real--rumpuserxen b/daily-cron-email-real--rumpuserxen
index 67c48bf..a9166a0 100644
--- a/daily-cron-email-real--rumpuserxen
+++ b/daily-cron-email-real--rumpuserxen
@@ -1,3 +1,3 @@
 To: xen-de...@lists.xensource.com,
-    rumpkernel-bui...@lists.sourceforge.net
+    rumpkernel-bui...@freelists.org
 Cc: ian.jack...@eu.citrix.com
--
1.7.10.4
Re: [Xen-devel] [PATCH v5.99.1 RFC 1/4] xen/arm: Duplicate gic-v2.c file to support hip04 platform version
On Wed, 2015-02-25 at 16:34 +, Stefano Stabellini wrote:
> I think we should disable the build of all drivers in Xen by default,
> except for the ARM standard compliant ones (for aarch64 the SBSA is a
> nice summary of what is considered compliant), to keep the size of the
> binary small.

I think this last statement was based on information that the gic-v2
driver was of the order of 70-100K in size, but I think that information
was wrong (I suspect it was the raw .o size, which includes debug info
and other extraneous bits). Here I see:

$ du -h xen/arch/arm/gic-v2.o
148K    xen/arch/arm/gic-v2.o
$ aarch64-linux-gnu-size xen/arch/arm/gic-v2.o
   text    data     bss     dec     hex filename
   6619       0      97    6716    1a3c xen/arch/arm/gic-v2.o

IOW the actual binary size is on the order of 6K (gic-v3.o is around the
same). This is arm64, I can't be bothered to rebuild for arm32, it'll be
similar.

Given that then I really don't think it is worth introducing a two tier
build over it.

If we really cared about these sorts of savings we would arrange to
discard all of the unused GIC/SMMU/UART driver's .text/.data/.bss after
boot (easy enough to achieve by putting each in a dedicated segment).

But I don't think we have enough such drivers to start worrying about
doing that just now. We have that opportunity in our back pocket if we
ever get to that point, which is good enough I think.

> Could you please introduce a Xen build time option in
> xen/arch/arm/Rules.mk, called HAS_NON_STANDARD_DRIVERS, that by default
> is n, and gate the build of gic-hip04.c on it?

Frediano, I see you've already done so in v6, thanks for that. Sorry to
go back on it.

Assuming the rest of the series in v6 is OK (gets acked and whatever)
then I expect I can just skip that one patch when applying and fixup the
Makefile in the obvious way (approx s/HAS_NON.../CONFIG_ARM32/) in the
dependent patch.

Ian.
Re: [Xen-devel] Branch Trace Storage for guests and VPMU initialization
> -Ursprüngliche Nachricht- > Von: Boris Ostrovsky [mailto:boris.ostrov...@oracle.com] > Gesendet: Donnerstag, 26. Februar 2015 17:35 > An: Dietmar Hahn; xen-devel@lists.xen.org > Cc: Mayer, Kevin > Betreff: Re: [Xen-devel] Branch Trace Storage for guests and > VPMUinitialization > > On 02/26/2015 03:56 AM, Dietmar Hahn wrote: > > Am Mittwoch 25 Februar 2015, 11:31:31 schrieb Boris Ostrovsky: > >> On 02/25/2015 10:12 AM, kevin.ma...@gdata.de wrote: > -Ursprüngliche Nachricht- > Von: Boris Ostrovsky [mailto:boris.ostrov...@oracle.com] > Gesendet: Dienstag, 24. Februar 2015 18:13 > An: Mayer, Kevin; xen-devel@lists.xen.org > Betreff: Re: [Xen-devel] Branch Trace Storage for guests and VPMU > initialization > > On 02/24/2015 10:27 AM, kevin.ma...@gdata.de wrote: > > Hi guys > > > > I`m trying to set up the BTS so that I can log the branches taken > > in the guest using Xen 4.4.1 with a WinXP SP3 guest on a Core i7 > > Sandy Bridge. > > > > I added the vpmu=bts boot parameter to my grub2 configuration and > > extended the libxl,libxc,domctl,… with an own command so that I > > can trigger the activation of the BTS whenever I want. > > > I am not sure why you are doing all these changes to Xen code. BTS > is supposed to be managed from the guest. For example, a Fedora > HVM > guest will produce this: > > [root@dhcp-burlington7-2nd-B-east-10-152-55-140 ~]# perf record -e > branches:u -c 1 -d sleep 1 [ perf record: Woken up 3838 times to > write data ] [ perf record: Captured and wrote 0.704 MB perf.data > (~30756 samples) ] > [root@dhcp-burlington7-2nd-B-east-10-152-55-140 ~]# perf script -f > ip,addr,sym,dso,symoff --show-kernel-path > 8167c347 native_irq_return_iret+0x0 (/proc/kcore) => > 328c001590 [unknown] (/proc/kcore) > 8167c347 native_irq_return_iret+0x0 (/proc/kcore) => > 328c001590 [unknown] ([unknown]) > 328c001593 [unknown] ([unknown]) => 328c004b70 [unknown] > ([unknown]) > ... 
> > >>> I want to be able to log the taken branches (of the guest) without the > need to modify the guest at all. > >>> This means I have to do all the logic in the hypervisor, or am I wrong? > >> In that case, yes. But then you have to make sure that at least > >>* you don't load guest's VPMU (or, at least, BTS-related > >> registers) on context switch > > But you need to modify PMU registers when switching to/from the guest > > context to get PMU running. > > > > I was thinking that all BTS stuff can be controlled from dom0 and so we can > use dom0's version of these registers. I didn't realize that DS_AREA would > have to be accessed in guest's address space (and that DEBUGCTL is loaded > from VMCS). > > Which is what I think I said in response to this message (which didn't show up > on the list because Kevin accidentally dropped xen-devel). > > -boris Terribly sorry about that... So the VPMU doesn’t get loaded when there is a VMENTER? I thought I could set the domU->vcpu->vpmu to enable BTS while in dom0 (with modified versions of msr_write_intercept, vpmu_do_wrmsr and core2_vpmu_do_wrmsr of course since the build in ones use the current-vcpu which would be the dom0-vcpu) and as soon as there is a context switch to domU the vpmu gets loaded and the guest starts logging. If the described behavior is correct the only problem I can see is with allocating memory in dom0 in a way that the guest can access it. But if I got it wrong please explain how the vpmu really works. Cheers Kevin > > > > I didn't think of using the VPMU stuff with modifying the context from > > outside the guest. > > > >>* You don't send the interrupt to the guest (meaning that you will > >> need to somehow inform dom0 of the BTS interrupt) > >> > >> and probably more. > >> > >> Essentially, you want dom0 to profile the guest. I have been working > >> on patches that would allow that but they are still under review. 
> >> > >> > > In this command I do the following: > > > > I set up the memory region for the BTS Buffer and the DS Buffer > > Management Area using xzalloc_bytes > > > I don't think you should be allocating BTS buffers in the > hypervisor, they are in guest's memory. > >>> I agree. As I said I think this is where my main problem is at the moment. > >>> Is there any way I can allocate memory in the hypervisor in a way the > guest can access it? > >> I am not sure this is what you want since you seem to *not* want the > >> guest to process the samples, right? > >> > >> But yes, you can. E.g. something like what map_vcpu_info() does. (I > >> have no idea how you'd do this from Windows.) > > The DS buffer has to be mapped within the guests address space so the > > CPU running in guest context can access this area. Otherwise you get > > this triple fault. > > So I woul
Re: [Xen-devel] Branch Trace Storage for guests and VPMU initialization
On 02/26/2015 08:44 AM, kevin.ma...@gdata.de wrote: -Ursprüngliche Nachricht- Von: Boris Ostrovsky [mailto:boris.ostrov...@oracle.com] Gesendet: Mittwoch, 25. Februar 2015 23:20 An: Mayer, Kevin Betreff: Re: AW: AW: [Xen-devel] Branch Trace Storage for guests andVPMUinitialization On 02/25/2015 01:23 PM, kevin.ma...@gdata.de wrote: -Ursprüngliche Nachricht- Von: Boris Ostrovsky [mailto:boris.ostrov...@oracle.com] Gesendet: Mittwoch, 25. Februar 2015 17:32 An: Mayer, Kevin Cc: xen-devel@lists.xen.org Betreff: Re: AW: [Xen-devel] Branch Trace Storage for guests and VPMUinitialization On 02/25/2015 10:12 AM, kevin.ma...@gdata.de wrote: -Ursprüngliche Nachricht- Von: Boris Ostrovsky [mailto:boris.ostrov...@oracle.com] Gesendet: Dienstag, 24. Februar 2015 18:13 An: Mayer, Kevin; xen-devel@lists.xen.org Betreff: Re: [Xen-devel] Branch Trace Storage for guests and VPMU initialization On 02/24/2015 10:27 AM, kevin.ma...@gdata.de wrote: Hi guys I`m trying to set up the BTS so that I can log the branches taken in the guest using Xen 4.4.1 with a WinXP SP3 guest on a Core i7 Sandy Bridge. I added the vpmu=bts boot parameter to my grub2 configuration and extended the libxl,libxc,domctl,… with an own command so that I can trigger the activation of the BTS whenever I want. I am not sure why you are doing all these changes to Xen code. BTS is supposed to be managed from the guest. 
For example, a Fedora HVM guest will produce this: [root@dhcp-burlington7-2nd-B-east-10-152-55-140 ~]# perf record -e branches:u -c 1 -d sleep 1 [ perf record: Woken up 3838 times to write data ] [ perf record: Captured and wrote 0.704 MB perf.data (~30756 samples) ] [root@dhcp-burlington7-2nd-B-east-10-152-55-140 ~]# perf script -f ip,addr,sym,dso,symoff --show-kernel-path 8167c347 native_irq_return_iret+0x0 (/proc/kcore) => 328c001590 [unknown] (/proc/kcore) 8167c347 native_irq_return_iret+0x0 (/proc/kcore) => 328c001590 [unknown] ([unknown]) 328c001593 [unknown] ([unknown]) => 328c004b70 [unknown] ([unknown]) ... I want to be able to log the taken branches (of the guest) without the need to modify the guest at all. This means I have to do all the logic in the hypervisor, or am I wrong? In that case, yes. But then you have to make sure that at least * you don't load guest's VPMU (or, at least, BTS-related registers) on context switch * You don't send the interrupt to the guest (meaning that you will need to somehow inform dom0 of the BTS interrupt) and probably more. Essentially, you want dom0 to profile the guest. I have been working on patches that would allow that but they are still under review. Yes, this is exactly what I want to do. Too bad that your patches are under review. Would have been pretty helpful I think. To be honest, I never tested them for BTS so they may not work in that mode. In fact, as you will realize by reading what I said below, they probably don't ;-( Maybe I should point out that I´m a total noob with xen and I definitely don’t understand all parts yet. So there may be some dumb mistakes in my assumptions. In this command I do the following: I set up the memory region for the BTS Buffer and the DS Buffer Management Area using xzalloc_bytes I don't think you should be allocating BTS buffers in the hypervisor, they are in guest's memory. I agree. As I said I think this is where my main problem is at the moment. 
Is there any way I can allocate memory in the hypervisor in a way the guest can access it? I am not sure this is what you want since you seem to *not* want the guest to process the samples, right? But yes, you can. E.g. something like what map_vcpu_info() does. (I have no idea how you'd do this from Windows.) Right again. As you said my goal is to profile the guest from dom0. So whenever the CPU is in guest mode and a branch is taken it should be stored in the BTS, but not when the CPU is running dom0. My idea was basically to set up the memory for the BTS and the GUEST_IA32_DEBUGCTL so when there is a vmexit the logging stops and starts again when there is a vmenter. As far as I understand the IA32_DEBUGCTL gets switched between the dom0-value and the guest-value (stored in vmcs) when there is a vmexit/vmenter, right? Right. And now I am no longer sure whether your buffer should be in hypervisor or guest's space: after VMENTER the hardware will load guest's versions of IA32_DEBUGCTLMSR and MSR_IA32_DS_AREA. I don't know whether you can prevent this from happening (need to look in the spec). And if that's the case then you might be able to: 1. Map DS area and BTS buffer in both guest and hypervisor. I believe your guest will have to have this mapped since these areas will be accessed via guest's EPT. As I said, I don't know how you'd do this in Windows --- I know nothing about programming there. I assume it can be done since there are Windows PV drivers for Xen. 2. Have dom0 set appropriate
Re: [Xen-devel] [PATCH v2] RFC: Automatically check xen's public headers for C++ pitfalls.
On 02/26/2015 07:01 PM, Tim Deegan wrote:
> +#ifdef __cplusplus
> +/* 'private' is a keyword in C++, so we have to use a different name for
> + * private state there. Leaving the C name alone to avoid unnecessary
> + * pain for the existing users. */
> +#define XEN_RING_PRIVATE pvt
> +#else
> +#define XEN_RING_PRIVATE private
> +#endif

Are there likely to be many users outside of the ones using that code
with mem_event? Because if there aren't, there are much more drastic
changes happening in Tamas' pending series, so perhaps seen that way the
change becomes more acceptable.

Thanks,
Razvan

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
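For anyone who wants to try the macro quoted above in isolation, here is a minimal sketch. The struct demo_sring and demo_sring_init() are hypothetical, cut-down stand-ins and not the real ring.h definitions; only the XEN_RING_PRIVATE macro is taken from the quoted patch. The idea is that all accesses go through the macro, so the same source builds as C (field named "private") and as C++ (field named "pvt").

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Taken from the quoted patch: pick a field name each language accepts. */
#ifdef __cplusplus
#define XEN_RING_PRIVATE pvt
#else
#define XEN_RING_PRIVATE private
#endif

/* Hypothetical, cut-down stand-in for a shared ring structure. */
struct demo_sring {
    uint32_t req_prod, rsp_prod;
    union {
        uint8_t pvt_pad[4];
    } XEN_RING_PRIVATE;   /* "private" in C, "pvt" in C++ */
};

/* Initialise the ring, touching the private area only via the macro. */
static void demo_sring_init(struct demo_sring *s)
{
    assert(s != NULL);
    s->req_prod = s->rsp_prod = 0;
    memset(s->XEN_RING_PRIVATE.pvt_pad, 0,
           sizeof(s->XEN_RING_PRIVATE.pvt_pad));
}
```

The cost of this approach is that existing C users who spell out ->private directly keep working, while C++ users have to go through the macro (or use ->pvt).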
Re: [Xen-devel] freemem-slack and large memory environments
On Thu, 2015-02-26 at 10:38 -0700, Mike Latimer wrote:
> On Thursday, February 26, 2015 03:57:54 PM Ian Campbell wrote:
> > On Thu, 2015-02-26 at 08:36 -0700, Mike Latimer wrote:
> > > There is still one aspect of my original patch that is important. As the
> > > code currently stands, the target for dom0 is set lower during each
> > > iteration of the loop. Unless only one iteration is required, dom0 will
> > > end up being set to a much lower target than is actually required.
> >
> > Is this because some sort of slack is applied once per iteration rather
> > than once at the start or is it something else?
>
> No - the slack reservation just complicated the request by (potentially)
> requiring more free memory than domU initially requested.
>
> With or without slack, the central loop in tools/libxl/xl_cmdimpl.c:freemem,
> frees memory for domU by lowering the memory target for dom0. However, this is
> not a single request (e.g. free 64GB for domX), rather the memory target for
> dom0 is set lower during every iteration through:
>
>    rc = libxl_set_memory_target(ctx, 0, free_memkb - need_memkb, 1, 0);
>
> This causes dom0's memory target to be lowered by the needed amount during
> every iteration of the loop. In practice, this causes the first request to
> lower dom0's target by the full amount (e.g. -64GB), and subsequent iterations
> further lower dom0's target by however much memory that still appears to be
> required (e.g. three iterations of the loop might lower dom0's target by
> -25GB, then -25GB, for a total of dom0 ballooning down 114GB). The issue
> itself is due to the loop ignoring the fact that the original request set
> dom0's target to the correct amount, but the ballooning has not completed.
>
> The problem itself is easier to see when domU memory sizes are increased. As
> mentioned before, starting a 512GB domain should guarantee that multiple
> iterations of the loop are required, and dom0 will balloon down much further
> than the required 512GB.
>
> Does this clarify the situation?

I think so. In essence we just need to update need_memkb on each
iteration, right?

Ian.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
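One possible reading of the suggestion above, sketched as a small standalone model rather than as libxl code: each iteration only asks dom0 for the part of the shortfall that is not already covered by an earlier request that is still being ballooned out. The function name, the free_seen array and the use of GB units are all assumptions made for illustration; the real loop works in KB and calls into libxl.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model, not libxl code.  Sizes are in GB for readability.
 * free_seen[i] is the free memory observed at the start of iteration i.
 * Returns the total amount by which dom0's target ends up lowered when
 * each iteration only requests the shortfall that is not already covered
 * by an earlier, still-pending request.
 */
static uint64_t demo_reduction_tracking_pending(uint64_t need_gb,
                                                const uint64_t *free_seen,
                                                size_t iterations)
{
    assert(free_seen != NULL || iterations == 0);

    uint64_t lowered = 0;
    uint64_t free_at_start = iterations ? free_seen[0] : 0;

    for (size_t i = 0; i < iterations; i++) {
        uint64_t free_now = free_seen[i];
        if (free_now >= need_gb)
            break;                      /* enough memory is free: done */

        uint64_t shortfall = need_gb - free_now;
        /* Memory that has actually appeared since the loop started. */
        uint64_t freed = free_now > free_at_start ?
                         free_now - free_at_start : 0;
        /* Part of the earlier requests that dom0 has not delivered yet. */
        uint64_t pending = lowered > freed ? lowered - freed : 0;

        if (shortfall > pending)
            lowered += shortfall - pending;
    }
    return lowered;
}
```

With a 64GB request and free memory observed as 0GB, 39GB and 39GB over three iterations, this model lowers dom0's target by 64GB in total, since the 25GB that still appears to be missing is already pending from the first request.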
Re: [Xen-devel] Xen's Linux kernel config options
On Thu, 26 Feb 2015, Luis R. Rodriguez wrote: > On Thu, Feb 26, 2015 at 11:19:17AM +, Stefano Stabellini wrote: > > On Wed, 25 Feb 2015, Luis R. Rodriguez wrote: > > > On Wed, Feb 25, 2015 at 12:01:31PM +, Stefano Stabellini wrote: > > > > On Tue, 24 Feb 2015, Luis R. Rodriguez wrote: > > > > > On Tue, Feb 24, 2015 at 7:21 AM, Stefano Stabellini > > > > > wrote: > > > > > > On Mon, 23 Feb 2015, Luis R. Rodriguez wrote: > > > > > >> On Thu, Feb 19, 2015 at 3:43 PM, Luis R. Rodriguez > > > > > >> wrote: > > > > > >> > On Fri, Dec 12, 2014 at 9:29 AM, David Vrabel > > > > > >> > wrote: > > > > > >> >> On 12/12/14 13:17, Juergen Gross wrote: > > > > > >> >>> XEN_PVHVM > > > > > >> >> > > > > > >> >> Move XEN_PVHVM under XEN and have it select PARAVIRT and > > > > > >> >> PARAVIRT_CLOCK. > > > > > >> > > > > > > >> > FWIW, although it seems we do not want to let users just build > > > > > >> > XEN_PVHVM hypervisors I have the changes required now to at > > > > > >> > least get > > > > > >> > this to build so I do know what it takes. > > > > > >> > > > > > > >> >>> XEN_FRONTENDXEN_PV > > > > > >> >>> || > > > > > >> >>> > > > > > >> >>> XEN_PVH || > > > > > >> >>> > > > > > >> >>> XEN_PVHVM > > > > > >> >> > > > > > >> >> This enables all the basic infrastructure for frontends: event > > > > > >> >> channels, > > > > > >> >> grant tables and Xenbus. > > > > > >> >> > > > > > >> >> Don't make XEN_FRONTEND depend on any XEN_* variant. It should > > > > > >> >> be > > > > > >> >> possible to have frontend drivers without support for any of the > > > > > >> >> PV/PVHVM/PVH guest types. > > > > > >> > > > > > > >> > David, can you elaborate on the type of Xen guest it would be on > > > > > >> > x86 > > > > > >> > its not PV, PVHVM, or PVH? I'm particularly curious about the > > > > > >> > xen_domain_type and how it would end up to selected. 
As it is we > > > > > >> > tie > > > > > >> > in XEN_PVHVM at build time with XEN_PVH, in order to have > > > > > >> > XEN_PVHVM > > > > > >> > completely removed from XEN_PVH we need quite a bit of code > > > > > >> > changes > > > > > >> > which at least as code exercise I have completed already. If we > > > > > >> > want > > > > > >> > at the very least xen_domain_type set when XEN_PV, XEN_PVHVM, and > > > > > >> > XEN_PVH are not available we need a bit more work. > > > > > >> > > > > > >> OK I think I see the issue. We have nothing quite like > > > > > >> xen_guest_init() on x86 enlighten.c, we do have this for ARM and I > > > > > >> think I can that close the gap I'm observing. > > > > > >> > > > > > >> >> Frontends only need event channels, grant > > > > > >> >> table and xenbus. > > > > > >> > > > > > > >> > Well xenbus_probe_initcall() will check for xen_domain() and that > > > > > >> > won't be set on x86 right now unless we have XEN_PV, XEN_PVHVM or > > > > > >> > XEN_PVH set -- to start off with. Then > > > > > >> > drivers/xen/xenbus/xenbus_client.c will check xen_feature in > > > > > >> > quite a > > > > > >> > bit of places as well, that won't be set unless > > > > > >> > xen_setup_features() > > > > > >> > is called which right now is only done on x86 > > > > > >> > arch/x86/xen/enlighten.c > > > > > >> > which as Juergen pointed out, is not needed if you don't have > > > > > >> > XEN_PV > > > > > >> > or XEN_PVH. As it turns out this is incorrect though, its needed > > > > > >> > for > > > > > >> > XEN_PVHVM as well and my split exercise in code addresses this. > > > > > >> > Now, > > > > > >> > at least in my code if you don't have XEN_PV, XEN_PVHVM, or > > > > > >> > XEN_PVH we > > > > > >> > don't call xen_setup_features() and its unclear to me where or > > > > > >> > how > > > > > >> > that should happen in other cases. 
> > > > > >> > > > > > >> Yeah I think having an x86 equivalent of xen_guest_init() would > > > > > >> solve > > > > > >> this, Stefano, thoughts? > > > > > > > > > > > > Having xen_guest_init() on x86 would be nice. Being able to set > > > > > > xen_domain_type to XEN_HVM_DOMAIN if we are running on Xen, > > > > > > regardless > > > > > > of XEN_PV/PVH/PVHVM also makes sense from Linux POV. > > > > > > > > > > OK great, thanks for the feedback. > > > > > > > > > > > That said, I don't see much value in removing XEN_PVHVM: why are we > > > > > > even > > > > > > doing this? What is the improvement we are seeking? > > > > > > > > > > We would not, the above discussed about the possibility of letting > > > > > users enable XEN_PVHVM without XEN_PVH, that's all. > > > > > > > > OK, that makes sense. > > > > > > > > > As is the only thing that can enable XEN_PVHVM is if you enable > > > > > XEN_PVH. > > > > > > > > This is the bit that we need to change but it shouldn't be difficult. > > > > > > > > > If we want > > > > > xen_guest_init() alone thoug
Re: [Xen-devel] [OSSTEST PATCH 3/8] emails: honour OSSTEST_EMAIL_SUBJECT_PREFIX
Ian Campbell writes ("Re: [OSSTEST PATCH 3/8] emails: honour OSSTEST_EMAIL_SUBJECT_PREFIX"):
> On Wed, 2015-02-25 at 13:01 +, Ian Jackson wrote:
> > This is prefixed before the other computed prefixes. It makes it
> > easier to distinguish an adhoc cr-daily-branch test runs for a real
> > branch.
>
> Do they not already get "adhoc" in the $subject? i.e. my commissioning
> runs for the new arm create (following README.dev procedure) resulted in
> mails with:
>
> [adhoc test] 34418: trouble: blocked/broken/fail/pass
>
> (IOW it seems $branch is replaced by adhoc somewhere along the way)

That happens if you use mg-execute-flight. If you let cr-daily-branch
run the flight for you, it uses the standard email stuff.

Ian.

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
Re: [Xen-devel] RFC: xen config changes v4
On Thu, 26 Feb 2015, Luis R. Rodriguez wrote: > On Thu, Feb 26, 2015 at 11:08:20AM +, Stefano Stabellini wrote: > > On Thu, 26 Feb 2015, David Vrabel wrote: > > > On 26/02/15 04:59, Juergen Gross wrote: > > > > > > > > So we are again in the situation that pv-drivers always imply the pvops > > > > kernel (PARAVIRT selected). I started the whole Kconfig rework to > > > > eliminate this dependency. > > > > > > Yes. Can you produce a series that just addresses this one issue. > > > > > > In the absence of any concrete requirement for this big Kconfig reorg I > > > I don't think it is helpful. > > > > I clearly missed some context as I didn't realize that this was the > > intended goal. Why do we want this? Please explain as it won't come > > for free. > > > > > > We have a few PV interfaces for HVM guests that need PARAVIRT in Linux > > in order to be used, for example pv_time_ops and HVMOP_pagetable_dying. > > They are critical performance improvements and from the interface > > perspective, small enough that doesn't make much sense having a separate > > KConfig option for them. > > > > > > In order to reach the goal above we necessarily need to introduce a > > differentiation in terms of PV on HVM guests in Linux: > > > > 1) basic guests with PV network, disk, etc but no PV timers, no > >HVMOP_pagetable_dying, no PV IPIs > > 2) full PV on HVM guests that have PV network, disk, timers, > >HVMOP_pagetable_dying, PV IPIs and anything else that makes sense. > > > > 2) is much faster than 1) on Xen and 2) is only a tiny bit slower than > > 1) on native x86 > > Also don't we shove 2) down hvm guests right now? Even when everything is > built in I do not see how we opt out for HVM for 1) at run time right now. > > If this is true then the question of motivation for this becomes even > stronger I think. Yes, indeed there is no way to do 1) at the moment. And for good reasons, see above. ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] freemem-slack and large memory environments
On Thursday, February 26, 2015 03:57:54 PM Ian Campbell wrote:
> On Thu, 2015-02-26 at 08:36 -0700, Mike Latimer wrote:
> > There is still one aspect of my original patch that is important. As the
> > code currently stands, the target for dom0 is set lower during each
> > iteration of the loop. Unless only one iteration is required, dom0 will
> > end up being set to a much lower target than is actually required.
>
> Is this because some sort of slack is applied once per iteration rather
> than once at the start or is it something else?

No - the slack reservation just complicated the request by (potentially)
requiring more free memory than domU initially requested.

With or without slack, the central loop in tools/libxl/xl_cmdimpl.c:freemem,
frees memory for domU by lowering the memory target for dom0. However, this is
not a single request (e.g. free 64GB for domX), rather the memory target for
dom0 is set lower during every iteration through:

   rc = libxl_set_memory_target(ctx, 0, free_memkb - need_memkb, 1, 0);

This causes dom0's memory target to be lowered by the needed amount during
every iteration of the loop. In practice, this causes the first request to
lower dom0's target by the full amount (e.g. -64GB), and subsequent iterations
further lower dom0's target by however much memory that still appears to be
required (e.g. three iterations of the loop might lower dom0's target by
-25GB, then -25GB, for a total of dom0 ballooning down 114GB). The issue
itself is due to the loop ignoring the fact that the original request set
dom0's target to the correct amount, but the ballooning has not completed.

The problem itself is easier to see when domU memory sizes are increased. As
mentioned before, starting a 512GB domain should guarantee that multiple
iterations of the loop are required, and dom0 will balloon down much further
than the required 512GB.

Does this clarify the situation?

-Mike

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
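The arithmetic in the example above can be reproduced with a small standalone model. This is only a sketch of the behaviour as described in this mail, not the code from xl_cmdimpl.c: the function name, the free_seen array and the use of GB units are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model of the loop as described: on every iteration the
 * dom0 target is lowered again by (need - free), even though earlier
 * requests may still be in progress.  Sizes are in GB for readability.
 *
 * free_seen[i] is the free memory observed at the start of iteration i.
 * Returns the total amount by which dom0's target was lowered.
 */
static uint64_t demo_total_target_reduction(uint64_t need_gb,
                                            const uint64_t *free_seen,
                                            size_t iterations)
{
    assert(free_seen != NULL || iterations == 0);

    uint64_t lowered = 0;

    for (size_t i = 0; i < iterations; i++) {
        if (free_seen[i] >= need_gb)
            break;                           /* enough memory is free */
        lowered += need_gb - free_seen[i];   /* relative request, again */
    }
    return lowered;
}
```

For a 64GB request with free memory observed as 0GB, 39GB and 39GB, the model lowers the target by 64 + 25 + 25 = 114GB, which matches the figure quoted above.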
[Xen-devel] [PATCH v10 0/4] enable Memory Bandwidth Monitoring (MBM) for VMs
Changes from v9:
* Move libxc refactoring code into standalone patch;
* Make libxl get_sample interface more generic;

Changes from v8:
* Merge event mask patch to MBM enabling patch;
* Address comments from Ian Campbell (details in the patch itself).

Changes from v7:
* Make obfuscating more complex as Jan suggested.
* Minor adjustment for commit message.

Changes from v6:
* Obfuscate the read value of MSR_IA32_TSC by adding a booting random;
* Minor coding style/comments adjustment;

Changes from v5:
* Remove common IRQ disable flag but instead disable IRQ when other
  MSR read followed by MSR_IA32_TSC read;
* Add comments for special handle for MSR_IA32_TSC;

Changes from v4:
* Make the counter read and timestamp read atomic by disabling IRQ;
* Treat MSR_IA32_TSC as a special case and return NOW() for read path;
* Add MBM description in xl command line.

Changes from v3:
* Get timestamp information from host along with the monitoring counter;
  This is required for counter overflow detection.
* Address comments from Wei on the last patch.

Changes from v2:
* Remove the usage of "static" to cache data in xc;
  NOTE: Other places that already existed before are not touched due to
  the needs for API change. Will fix in separate patch if desirable.
* Coding style;

Changes from v1:
* Move event type check from xc to xl;
* Add retry capability for MBM sampling;
* Fix Coding style/docs;

The hypervisor part of this series is already in; this contains only
tools-side changes.

Intel Memory Bandwidth Monitoring (MBM) is a new hardware feature
which builds on the CMT infrastructure to allow monitoring of system
memory bandwidth. Event codes are provided to monitor both "total"
and "local" bandwidth, meaning bandwidth over QPI and other external
links can be monitored.

For Xen, MBM is used to monitor memory bandwidth for VMs. Due to its
dependency on CMT, the software also makes use of most of the CMT code.
Actually, besides introducing two additional events and some cpuid
feature bits, there are no extra changes compared to cache occupancy
monitoring in CMT. Due to this, CMT should be enabled first to use
this feature.

For interface changes, the patch series introduces a new command
"XEN_SYSCTL_PSR_CMT_get_l3_event_mask" which exposes MBM feature
capability to user space and modifies "resource_op" to support reading
host system time together with the monitored counter.

On the tool stack side, two additional options are introduced for
"xl psr-cmt-show":
total_mem_bandwidth: Show total memory bandwidth
local_mem_bandwidth: Show local memory bandwidth

The usage flow stays the same as with CMT.

Chao Peng (4):
  tools: correct coding style for psr
  tools/libxc: code refactoring in xc_psr_cmt_get_data
  tools/libxl: code refactoring for MBM
  tools, docs: add total/local memory bandwith monitoring

 docs/man/xl.pod.1                   |  11 +++-
 docs/misc/xen-command-line.markdown |   3 +
 tools/libxc/include/xenctrl.h       |  14 +++--
 tools/libxc/xc_msr_x86.h            |   1 +
 tools/libxc/xc_psr.c                |  76 ++--
 tools/libxl/libxl.h                 |  28 +++--
 tools/libxl/libxl_psr.c             |  59 +++
 tools/libxl/libxl_types.idl         |   2 +
 tools/libxl/xl_cmdimpl.c            | 113 +---
 tools/libxl/xl_cmdtable.c           |   4 +-
 10 files changed, 253 insertions(+), 58 deletions(-)

-- 
1.9.1

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2] RFC: Automatically check xen's public headers for C++ pitfalls.
At 16:11 + on 26 Feb (1424963496), Tim Deegan wrote: > Add a check, like the existing check for non-ANSI C in the public > headers, that runs the public headers through a C++ compiler to > flag non-C++-friendly constructs. Oops, this still has the EFI changes in it. v3, rebased, is on its way. > Unlike the ANSI C check, we accept GCC-isms (gnu++98), and we also > check various tools-only headers. > > Explicitly _not_ addressing the use of 'private' in various fields, > since we'd previously decided not to fix that. BTW, ring.h is the only instance of that, so the extra diff to clear that up too is pretty small (see below). Not sure what people think about that though - it might be quite a PITA for downstream users of it, though they ought really to be using local copies so they can update in a controlled way. diff --git a/xen/include/Makefile b/xen/include/Makefile index d48a642..c7a1d52 100644 --- a/xen/include/Makefile +++ b/xen/include/Makefile @@ -104,8 +104,7 @@ headers.chk: $(PUBLIC_ANSI_HEADERS) Makefile headers++.chk: $(PUBLIC_HEADERS) Makefile if $(CXX) -v >/dev/null 2>&1; then \ for i in $(filter %.h,$^); do \ - $(CXX) -x c++ -std=gnu++98 -Wall -Werror \ - -D__XEN_TOOLS__ -Dprivate=private_is_a_keyword_in_cpp \ + $(CXX) -x c++ -std=gnu++98 -Wall -Werror -D__XEN_TOOLS__ \ -include stdint.h -include public/xen.h \ -S -o /dev/null $$i || exit 1; \ echo $$i; \ diff --git a/xen/include/public/io/ring.h b/xen/include/public/io/ring.h index 73e13d7..bb13494 100644 --- a/xen/include/public/io/ring.h +++ b/xen/include/public/io/ring.h @@ -111,7 +111,7 @@ struct __name##_sring { \ uint8_t msg;\ } tapif_user; \ uint8_t pvt_pad[4]; \ -} private; \ +} local;\ uint8_t __pad[44]; \ union __name##_sring_entry ring[1]; /* variable-length */ \ }; \ @@ -156,7 +156,7 @@ typedef struct __name##_back_ring __name##_back_ring_t #define SHARED_RING_INIT(_s) do { \ (_s)->req_prod = (_s)->rsp_prod = 0; \ (_s)->req_event = (_s)->rsp_event = 1; \ -(void)memset((_s)->private.pvt_pad, 
0, sizeof((_s)->private.pvt_pad)); \ +(void)memset((_s)->local.pvt_pad, 0, sizeof((_s)->local.pvt_pad)); \ (void)memset((_s)->__pad, 0, sizeof((_s)->__pad)); \ } while(0) ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
[Xen-devel] [PATCH v6 16/23] libxl: disallow memory relocation when vNUMA is enabled
Disallow memory relocation when vNUMA is enabled, because relocated
memory ends up off node. Furthermore, even if we dynamically expand
node coverage in hvmloader, low memory and high memory may reside
in different physical nodes, blindly relocating low memory to high
memory gives us a sub-optimal configuration.

Introduce a function called libxl__vnuma_configured and use it.

Signed-off-by: Wei Liu
Cc: Ian Campbell
Cc: Ian Jackson
Cc: Konrad Wilk
---
Changes in v6:
1. Introduce a helper function.
---
 tools/libxl/libxl_dm.c       | 6 ++++--
 tools/libxl/libxl_internal.h | 1 +
 tools/libxl/libxl_vnuma.c    | 5 +++++
 3 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/tools/libxl/libxl_dm.c b/tools/libxl/libxl_dm.c
index 8599a6a..7b09512 100644
--- a/tools/libxl/libxl_dm.c
+++ b/tools/libxl/libxl_dm.c
@@ -1365,13 +1365,15 @@ void libxl__spawn_local_dm(libxl__egc *egc, libxl__dm_spawn_state *dmss)
                         libxl__sprintf(gc, "%s/hvmloader/bios", path),
                         "%s", libxl_bios_type_to_string(b_info->u.hvm.bios));
         /* Disable relocating memory to make the MMIO hole larger
-         * unless we're running qemu-traditional */
+         * unless we're running qemu-traditional and vNUMA is not
+         * configured. */
         libxl__xs_write(gc, XBT_NULL,
                         libxl__sprintf(gc,
                                        "%s/hvmloader/allow-memory-relocate",
                                        path),
                         "%d",
-                        b_info->device_model_version==LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN_TRADITIONAL);
+                        b_info->device_model_version==LIBXL_DEVICE_MODEL_VERSION_QEMU_XEN_TRADITIONAL &&
+                        !libxl__vnuma_configured(b_info));
         free(path);
     }
diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h
index e93089a..d04b6aa 100644
--- a/tools/libxl/libxl_internal.h
+++ b/tools/libxl/libxl_internal.h
@@ -3413,6 +3413,7 @@ int libxl__vnuma_build_vmemrange_hvm(libxl__gc *gc,
                                      libxl_domain_build_info *b_info,
                                      libxl__domain_build_state *state,
                                      struct xc_hvm_build_args *args);
+bool libxl__vnuma_configured(const libxl_domain_build_info *b_info);
 
 _hidden int libxl__ms_vm_genid_set(libxl__gc *gc, uint32_t domid,
                                    const libxl_ms_vm_genid *id);
diff --git a/tools/libxl/libxl_vnuma.c b/tools/libxl/libxl_vnuma.c
index a0576ee..6af3cde 100644
--- a/tools/libxl/libxl_vnuma.c
+++ b/tools/libxl/libxl_vnuma.c
@@ -17,6 +17,11 @@
 #include "libxl_arch.h"
 #include 
 
+bool libxl__vnuma_configured(const libxl_domain_build_info *b_info)
+{
+    return b_info->num_vnuma_nodes != 0;
+}
+
 /* Sort vmemranges in ascending order with "start" */
 static int compare_vmemrange(const void *a, const void *b)
 {
-- 
1.9.1

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
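The decision made by the patch above boils down to a two-input truth table, which the following standalone sketch spells out. The enum, struct and function names prefixed with demo_ are hypothetical stand-ins for the libxl types; only the logic (relocation is allowed for qemu-traditional and only when no vNUMA nodes are configured) is taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the libxl device model versions. */
enum demo_dm_version {
    DEMO_DM_QEMU_XEN_TRADITIONAL,
    DEMO_DM_QEMU_XEN
};

/* Hypothetical, cut-down stand-in for libxl_domain_build_info. */
struct demo_build_info {
    enum demo_dm_version device_model_version;
    unsigned int num_vnuma_nodes;
};

/* Mirrors the helper added by the patch: any vNUMA node means "configured". */
static bool demo_vnuma_configured(const struct demo_build_info *b_info)
{
    assert(b_info != NULL);
    return b_info->num_vnuma_nodes != 0;
}

/* Value that would be written to the allow-memory-relocate key. */
static bool demo_allow_memory_relocate(const struct demo_build_info *b_info)
{
    return b_info->device_model_version == DEMO_DM_QEMU_XEN_TRADITIONAL &&
           !demo_vnuma_configured(b_info);
}
```

Only one of the four combinations yields true: qemu-traditional with zero vNUMA nodes.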
Re: [Xen-devel] [PATCH] correct mis-conversion set_bit() -> __cpumask_set_cpu() by 4aaca0e9cd
Monday, February 23, 2015, 12:06:00 PM, you wrote:

> I have no idea how I came to use __cpumask_set_cpu() there, the
> conversion should have been set_bit() -> __set_bit(). The wrong
> construct results in problems on systems with relatively few CPUs.

> Reported-by: Sander Eikelenboom
> Signed-off-by: Jan Beulich

> --- a/xen/common/softirq.c
> +++ b/xen/common/softirq.c
> @@ -106,7 +106,7 @@ void cpu_raise_softirq(unsigned int cpu,
>      if ( !per_cpu(batching, this_cpu) || in_irq() )
>          smp_send_event_check_cpu(cpu);
>      else
> -        __cpumask_set_cpu(nr, &per_cpu(batch_mask, this_cpu));
> +        __set_bit(nr, &per_cpu(batch_mask, this_cpu));
>  }
>
>  void cpu_raise_softirq_batch_begin(void)

Hi Jan,

Any reason this wasn't applied to staging yet?

--
Sander

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
[Xen-devel] [PATCH v6 09/23] libxl: introduce libxl__vnuma_config_check
This function is used to check whether vNUMA configuration (be it auto-generated or supplied by user) is valid. Define a new error code ERROR_VNUMA_CONFIG_INVALID. The checks performed can be found in the comment of the function. This vNUMA function (and future ones) is placed in a new file called libxl_vnuma.c Signed-off-by: Wei Liu Cc: Ian Campbell Cc: Ian Jackson Cc: Dario Faggioli Cc: Elena Ufimtseva --- Changes in v6: 1. Address comments from Andrew. 2. Check vdistances. 3. use libxl_numainfo_list_free. 4. Change p to v. Changes in v5: 1. Define and use new error code. 2. Use LOG macro. 3. Fix hard tabs. Changes in v4: 1. Adapt to new interface. Changes in v3: 1. Rewrite commit log. 2. Shorten two error messages. --- tools/libxl/Makefile | 2 +- tools/libxl/libxl_internal.h | 7 ++ tools/libxl/libxl_types.idl | 1 + tools/libxl/libxl_vnuma.c| 151 +++ 4 files changed, 160 insertions(+), 1 deletion(-) create mode 100644 tools/libxl/libxl_vnuma.c diff --git a/tools/libxl/Makefile b/tools/libxl/Makefile index 7329521..1b16598 100644 --- a/tools/libxl/Makefile +++ b/tools/libxl/Makefile @@ -93,7 +93,7 @@ LIBXL_LIBS += -lyajl LIBXL_OBJS = flexarray.o libxl.o libxl_create.o libxl_dm.o libxl_pci.o \ libxl_dom.o libxl_exec.o libxl_xshelp.o libxl_device.o \ libxl_internal.o libxl_utils.o libxl_uuid.o \ - libxl_json.o libxl_aoutils.o libxl_numa.o \ + libxl_json.o libxl_aoutils.o libxl_numa.o libxl_vnuma.o \ libxl_save_callout.o _libxl_save_msgs_callout.o \ libxl_qmp.o libxl_event.o libxl_fork.o $(LIBXL_OBJS-y) LIBXL_OBJS += libxl_genid.o diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h index 6d3ac58..258be0d 100644 --- a/tools/libxl/libxl_internal.h +++ b/tools/libxl/libxl_internal.h @@ -3394,6 +3394,13 @@ void libxl__numa_candidate_put_nodemap(libxl__gc *gc, libxl_bitmap_copy(CTX, &cndt->nodemap, nodemap); } +/* Check if vNUMA config is valid. Returns 0 if valid, + * ERROR_VNUMA_CONFIG_INVALID otherwise. 
+ */ +int libxl__vnuma_config_check(libxl__gc *gc, + const libxl_domain_build_info *b_info, + const libxl__domain_build_state *state); + _hidden int libxl__ms_vm_genid_set(libxl__gc *gc, uint32_t domid, const libxl_ms_vm_genid *id); diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl index 14c7e7c..23951fc 100644 --- a/tools/libxl/libxl_types.idl +++ b/tools/libxl/libxl_types.idl @@ -63,6 +63,7 @@ libxl_error = Enumeration("error", [ (-17, "DEVICE_EXISTS"), (-18, "REMUS_DEVOPS_DOES_NOT_MATCH"), (-19, "REMUS_DEVICE_NOT_SUPPORTED"), +(-20, "VNUMA_CONFIG_INVALID"), ], value_namespace = "") libxl_domain_type = Enumeration("domain_type", [ diff --git a/tools/libxl/libxl_vnuma.c b/tools/libxl/libxl_vnuma.c new file mode 100644 index 000..33d7a3c --- /dev/null +++ b/tools/libxl/libxl_vnuma.c @@ -0,0 +1,151 @@ +/* + * Copyright (C) 2014 Citrix Ltd. + * Author Wei Liu + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU Lesser General Public License as published + * by the Free Software Foundation; version 2.1 only. with the special + * exception on linking described in file LICENSE. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU Lesser General Public License for more details. + */ +#include "libxl_osdeps.h" /* must come before any other headers */ +#include "libxl_internal.h" +#include + +/* Sort vmemranges in ascending order with "start" */ +static int compare_vmemrange(const void *a, const void *b) +{ +const xen_vmemrange_t *x = a, *y = b; +if (x->start < y->start) +return -1; +if (x->start > y->start) +return 1; +return 0; +} + +/* Check if vNUMA configuration is valid: + * 1. all pnodes inside vnode_to_pnode array are valid + * 2. each vcpu belongs to one and only one vnode + * 3. 
each vmemrange is valid and doesn't overlap with any other + * 4. local distance cannot be larger than remote distance + */ +int libxl__vnuma_config_check(libxl__gc *gc, + const libxl_domain_build_info *b_info, + const libxl__domain_build_state *state) +{ +int nr_nodes = 0, rc = ERROR_VNUMA_CONFIG_INVALID; +unsigned int i, j; +libxl_numainfo *ninfo = NULL; +uint64_t total_memkb = 0; +libxl_bitmap cpumap; +libxl_vnode_info *v; + +libxl_bitmap_init(&cpumap); + +/* Check pnode specified is valid */ +ninfo = libxl_get_numainfo(CTX, &nr_nodes); +if (!ninfo) { +LOG(ERROR, "libxl_get_numain
Re: [Xen-devel] [PATCH v2] RFC: Automatically check xen's public headers for C++ pitfalls.
On 26/02/15 16:28, Tim Deegan wrote:
>
> BTW, ring.h is the only instance of that, so the extra diff to clear
> that up too is pretty small (see below).
>
> Not sure what people think about that though - it might be
> quite a PITA for downstream users of it, though they ought really to
> be using local copies so they can update in a controlled way.

With my linux maintainer hat on, this is fine by me.

David

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
[Xen-devel] [PATCH v6 05/23] libxc: add p2m_size to xc_dom_image
Add a new field p2m_size to keep track of the number of pages covered by p2m. Change total_pages to p2m_size in functions which in fact need the size of p2m. This is needed because we are going to ditch the assumption that PV x86 has only one contiguous ram region. Originally the p2m size was always equal to total_pages, but we will soon change that in later patch. This patch doesn't change the behaviour of libxc. Signed-off-by: Wei Liu Reviewed-by: Dario Faggioli Cc: Ian Campbell Cc: Ian Jackson --- tools/libxc/include/xc_dom.h | 1 + tools/libxc/xc_dom_arm.c | 1 + tools/libxc/xc_dom_core.c| 8 tools/libxc/xc_dom_x86.c | 19 +++ 4 files changed, 17 insertions(+), 12 deletions(-) diff --git a/tools/libxc/include/xc_dom.h b/tools/libxc/include/xc_dom.h index 07d7224..6b8ddf4 100644 --- a/tools/libxc/include/xc_dom.h +++ b/tools/libxc/include/xc_dom.h @@ -129,6 +129,7 @@ struct xc_dom_image { */ xen_pfn_t rambase_pfn; xen_pfn_t total_pages; +xen_pfn_t p2m_size; /* number of pfns covered by p2m */ struct xc_dom_phys *phys_pages; int realmodearea_log; #if defined (__arm__) || defined(__aarch64__) diff --git a/tools/libxc/xc_dom_arm.c b/tools/libxc/xc_dom_arm.c index c7feca7..b9fa66d 100644 --- a/tools/libxc/xc_dom_arm.c +++ b/tools/libxc/xc_dom_arm.c @@ -449,6 +449,7 @@ int arch_setup_meminit(struct xc_dom_image *dom) assert(dom->rambank_size[0] != 0); assert(ramsize == 0); /* Too much RAM is rejected above */ +dom->p2m_size = p2m_size; dom->p2m_host = xc_dom_malloc(dom, sizeof(xen_pfn_t) * p2m_size); if ( dom->p2m_host == NULL ) return -EINVAL; diff --git a/tools/libxc/xc_dom_core.c b/tools/libxc/xc_dom_core.c index ecbf981..b100ce1 100644 --- a/tools/libxc/xc_dom_core.c +++ b/tools/libxc/xc_dom_core.c @@ -931,9 +931,9 @@ int xc_dom_update_guest_p2m(struct xc_dom_image *dom) { case 4: DOMPRINTF("%s: dst 32bit, pages 0x%" PRIpfn "", - __FUNCTION__, dom->total_pages); + __FUNCTION__, dom->p2m_size); p2m_32 = dom->p2m_guest; -for ( i = 0; i < dom->total_pages; i++ ) +for ( 
i = 0; i < dom->p2m_size; i++ ) if ( dom->p2m_host[i] != INVALID_P2M_ENTRY ) p2m_32[i] = dom->p2m_host[i]; else @@ -941,9 +941,9 @@ int xc_dom_update_guest_p2m(struct xc_dom_image *dom) break; case 8: DOMPRINTF("%s: dst 64bit, pages 0x%" PRIpfn "", - __FUNCTION__, dom->total_pages); + __FUNCTION__, dom->p2m_size); p2m_64 = dom->p2m_guest; -for ( i = 0; i < dom->total_pages; i++ ) +for ( i = 0; i < dom->p2m_size; i++ ) if ( dom->p2m_host[i] != INVALID_P2M_ENTRY ) p2m_64[i] = dom->p2m_host[i]; else diff --git a/tools/libxc/xc_dom_x86.c b/tools/libxc/xc_dom_x86.c index 9dbaedb..bea54f2 100644 --- a/tools/libxc/xc_dom_x86.c +++ b/tools/libxc/xc_dom_x86.c @@ -122,11 +122,11 @@ static int count_pgtables(struct xc_dom_image *dom, int pae, try_pfn_end = (try_virt_end - dom->parms.virt_base) >> PAGE_SHIFT_X86; -if ( try_pfn_end > dom->total_pages ) +if ( try_pfn_end > dom->p2m_size ) { xc_dom_panic(dom->xch, XC_OUT_OF_MEMORY, "%s: not enough memory for initial mapping (%#"PRIpfn" > %#"PRIpfn")", - __FUNCTION__, try_pfn_end, dom->total_pages); + __FUNCTION__, try_pfn_end, dom->p2m_size); return -ENOMEM; } @@ -440,10 +440,11 @@ pfn_error: static int alloc_magic_pages(struct xc_dom_image *dom) { -size_t p2m_size = dom->total_pages * dom->arch_hooks->sizeof_pfn; +size_t p2m_alloc_size = dom->p2m_size * dom->arch_hooks->sizeof_pfn; /* allocate phys2mach table */ -if ( xc_dom_alloc_segment(dom, &dom->p2m_seg, "phys2mach", 0, p2m_size) ) +if ( xc_dom_alloc_segment(dom, &dom->p2m_seg, "phys2mach", + 0, p2m_alloc_size) ) return -1; dom->p2m_guest = xc_dom_seg_to_ptr(dom, &dom->p2m_seg); if ( dom->p2m_guest == NULL ) @@ -777,8 +778,9 @@ int arch_setup_meminit(struct xc_dom_image *dom) int count = dom->total_pages >> SUPERPAGE_PFN_SHIFT; xen_pfn_t extents[count]; +dom->p2m_size = dom->total_pages; dom->p2m_host = xc_dom_malloc(dom, sizeof(xen_pfn_t) * - dom->total_pages); + dom->p2m_size); if ( dom->p2m_host == NULL ) return -EINVAL; @@ -810,8 +812,9 @@ int arch_setup_meminit(struct 
xc_dom_image *dom) return rc; } /* setup initial p2m */ +dom->p2m_size = dom->total_pages; dom->p2m_host = xc_dom_malloc(dom, sizeof(xen_pfn_t) * - dom->total_pages); +
Re: [Xen-devel] Shared page tables between ETP and IOMMU issue
>>> On 26.02.15 at 16:45, wrote:
> While testing PVH Dom0 support on a newer Core i3-5010U I've found that
> sharing the page tables between EPT and the IOMMUs don't work. Booting
> with iommu=no-sharept solves the problem, but I'm unsure what causes
> this issue.

Is FreeBSD fiddling with its own memory map in some way? It's rather
surprising to see not just an occasional fault, but many of them, and
with L2 or even L3 entries not present. I.e. if it's not the OS
requesting re-arrangements, I would suppose table setup itself is
screwed up in some way. In the end - knowing the valid GFN range for
the guest - you may want to monitor/log how tables get created and
whether (and if so by whom) later some of the entries get zapped.

Jan
Re: [Xen-devel] [PATCH v2] RFC: Automatically check xen's public headers for C++ pitfalls.
>>> On 26.02.15 at 17:28, wrote:
> At 16:11 + on 26 Feb (1424963496), Tim Deegan wrote:
>> Explicitly _not_ addressing the use of 'private' in various fields,
>> since we'd previously decided not to fix that.
>
> BTW, ring.h is the only instance of that, so the extra diff to clear
> that up too is pretty small (see below).
>
> Not sure what people think about that though - it might be
> quite a PITA for downstream users of it, though they ought really to
> be using local copies so they can update in a controlled way.

linux-2.6.18-xen.hg always having consumed them (almost) verbatim,
I don't think we should break users not massaging the headers. I.e.
at least make the field name conditional upon using C vs C++.

Jan
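For illustration only, making a field name conditional on C vs C++ could be sketched roughly as below. This is not the actual ring.h change: the macro `EXAMPLE_PRIVATE_FIELD`, the C++ name `pvt`, and `struct example_ring` are all hypothetical names chosen for the sketch.

```c
/*
 * Hypothetical sketch: choose the field name at preprocessing time so
 * that C consumers keep the historical name, while C++ consumers (for
 * which "private" is a keyword) get a different one.
 */
#ifdef __cplusplus
#define EXAMPLE_PRIVATE_FIELD pvt      /* assumed C++-safe name */
#else
#define EXAMPLE_PRIVATE_FIELD private  /* existing name, legal in C */
#endif

/* Stand-in structure; not the real shared ring layout. */
struct example_ring {
    unsigned int req_prod;
    unsigned int rsp_prod;
    union {
        unsigned char pad[48];
    } EXAMPLE_PRIVATE_FIELD;
};

/* Code that goes through the macro compiles unchanged in both languages. */
static unsigned int example_pad_size(const struct example_ring *r)
{
    return (unsigned int)sizeof(r->EXAMPLE_PRIVATE_FIELD.pad);
}
```

Existing C users that spell the field name directly would keep building; only C++ users would see the alternative name.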
[Xen-devel] [PATCH] Config.mk: update OVMF revision
Update OVMF revision to the latest tested commit.

Signed-off-by: Wei Liu
Cc: Ian Campbell
Cc: Ian Jackson
Cc: Anthony Perard
---
Before applying this patch, please pull from

  git://xenbits.xen.org/osstest/ovmf.git xen-tested-master

and push all changes to

  git://xenbits.xen.org/ovmf.git master

It should be a fast-forward push.
---
 Config.mk | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Config.mk b/Config.mk
index d12ad91..173c2f7 100644
--- a/Config.mk
+++ b/Config.mk
@@ -251,7 +251,7 @@ QEMU_UPSTREAM_URL ?= git://xenbits.xen.org/qemu-upstream-unstable.git
 QEMU_TRADITIONAL_URL ?= git://xenbits.xen.org/qemu-xen-unstable.git
 SEABIOS_UPSTREAM_URL ?= git://xenbits.xen.org/seabios.git
 endif
-OVMF_UPSTREAM_REVISION ?= 447d264115c476142f884af0be287622cd244423
+OVMF_UPSTREAM_REVISION ?= a065efc7c7ce8bb3e5cb3e463099d023d4a92927
 QEMU_UPSTREAM_REVISION ?= master
 SEABIOS_UPSTREAM_REVISION ?= rel-1.7.5
 # Thu May 22 16:59:16 2014 -0400
-- 
1.9.1
Re: [Xen-devel] Branch Trace Storage for guests and VPMU initialization
On 02/26/2015 03:56 AM, Dietmar Hahn wrote:
> On Wednesday, 25 February 2015, 11:31:31, Boris Ostrovsky wrote:
>> On 02/25/2015 10:12 AM, kevin.ma...@gdata.de wrote:
>>>> -Original Message-
>>>> From: Boris Ostrovsky [mailto:boris.ostrov...@oracle.com]
>>>> Sent: Tuesday, 24 February 2015 18:13
>>>> To: Mayer, Kevin; xen-devel@lists.xen.org
>>>> Subject: Re: [Xen-devel] Branch Trace Storage for guests and VPMU
>>>> initialization
>>>>
>>>> On 02/24/2015 10:27 AM, kevin.ma...@gdata.de wrote:
>>>>> Hi guys
>>>>>
>>>>> I'm trying to set up the BTS so that I can log the branches taken in
>>>>> the guest using Xen 4.4.1 with a WinXP SP3 guest on a Core i7 Sandy
>>>>> Bridge.
>>>>>
>>>>> I added the vpmu=bts boot parameter to my grub2 configuration and
>>>>> extended the libxl,libxc,domctl,… with an own command so that I can
>>>>> trigger the activation of the BTS whenever I want.
>>>>
>>>> I am not sure why you are doing all these changes to Xen code. BTS is
>>>> supposed to be managed from the guest. For example, a Fedora HVM guest
>>>> will produce this:
>>>>
>>>> [root@dhcp-burlington7-2nd-B-east-10-152-55-140 ~]# perf record -e branches:u -c 1 -d sleep 1
>>>> [ perf record: Woken up 3838 times to write data ]
>>>> [ perf record: Captured and wrote 0.704 MB perf.data (~30756 samples) ]
>>>> [root@dhcp-burlington7-2nd-B-east-10-152-55-140 ~]# perf script -f ip,addr,sym,dso,symoff --show-kernel-path
>>>> 8167c347 native_irq_return_iret+0x0 (/proc/kcore) => 328c001590 [unknown] (/proc/kcore)
>>>> 8167c347 native_irq_return_iret+0x0 (/proc/kcore) => 328c001590 [unknown] ([unknown])
>>>> 328c001593 [unknown] ([unknown]) => 328c004b70 [unknown] ([unknown])
>>>> ...
>>>
>>> I want to be able to log the taken branches (of the guest) without the
>>> need to modify the guest at all.
>>> This means I have to do all the logic in the hypervisor, or am I wrong?
>>
>> In that case, yes. But then you have to make sure that at least
>>   * you don't load guest's VPMU (or, at least, BTS-related registers) on
>>     context switch
>
> But you need to modify PMU registers when switching to/from the guest
> context to get PMU running.

I was thinking that all BTS stuff can be controlled from dom0 and so we
can use dom0's version of these registers. I didn't realize that DS_AREA
would have to be accessed in guest's address space (and that DEBUGCTL is
loaded from VMCS).

Which is what I think I said in response to this message (which didn't
show up on the list because Kevin accidentally dropped xen-devel).

-boris

> I didn't think of using the VPMU stuff with modifying the context from
> outside the guest.
>
>>   * You don't send the interrupt to the guest (meaning that you will
>>     need to somehow inform dom0 of the BTS interrupt)
>>
>> and probably more.
>>
>> Essentially, you want dom0 to profile the guest. I have been working on
>> patches that would allow that but they are still under review.
>>
>>>>> In this command I do the following:
>>>>>
>>>>> I set up the memory region for the BTS Buffer and the DS Buffer
>>>>> Management Area using xzalloc_bytes
>>>>
>>>> I don't think you should be allocating BTS buffers in the hypervisor,
>>>> they are in guest's memory.
>>>
>>> I agree. As I said I think this is where my main problem is at the
>>> moment. Is there any way I can allocate memory in the hypervisor in a
>>> way the guest can access it?
>>
>> I am not sure this is what you want since you seem to *not* want the
>> guest to process the samples, right?
>>
>> But yes, you can. E.g. something like what map_vcpu_info() does. (I have
>> no idea how you'd do this from Windows.)
>
> The DS buffer has to be mapped within the guests address space so the CPU
> running in guest context can access this area. Otherwise you get this
> triple fault.
> So I would think you need a mixture of writing some stuff in Windows and
> patching the hypervisor.
>
> Dietmar.
>
>>> Of course the guest must not be able to use this memory in its normal
>>> operations but just for BTS.
>>> Is this even possible? I am rather confused at the moment. :-D
>>>
>>>>> Then I write the pointer to the BTS Buffer into the DS Buffer
>>>>> Management Area at +0x0 and +0x8 (BTS Buffer Base and BTS Index)
>>>>>
>>>>> When I use vmx_msr_write_intercept to store the value in
>>>>> MSR_IA32_DS_AREA the host reboots (my idea is he tries to access a
>>>>> vpmu-struct that isn't there in the current vcpu and panics).
>>
>> Who is trying to write to MSR_IA32_DS_AREA? The guest or dom0? I thought
>> you said that you want dom0 to do sampling. Or are you trying to setup
>> DS area from your guest and control it from dom0? I am somewhat confused.
>>
>>>> Can you post hypervisor log? (hard to say how helpful it will be
>>>> without seeing your code changes though)
>>>
>>> Right after enabling the BTS I get a triple fault.
>>> hvm.c:1357:d2 Triple fault on VCPU0 - invoking HVM shutdown action 1.
>>
>> That's not host reboot, this is your guest dying.
>>
>>>>> When I use a modified version of vmx_msr_write_intercept I don’t get
>>>>> any crashes as long as I don’t enable BTS and TR in the
>>>>> GUEST_IA32_DEBUGCTL (BTR works). When I enable the BTS (and TR) the
>>>>> guest crashes. I suppose he gets killed by the hypervisor for
>>>>> accessing forbidden memory.
>>>>
>>>> Possibly because DS area point to hyperv
[Xen-devel] [PATCH v6 13/23] libxc: indentation change to xc_hvm_build_x86.c
Move a while loop in xc_hvm_build_x86 one block to the right. No functional change introduced. Functional changes will be introduced in next patch. Signed-off-by: Wei Liu Cc: Ian Campbell Cc: Ian Jackson Cc: Dario Faggioli Cc: Elena Ufimtseva Acked-by: Ian Campbell --- tools/libxc/xc_hvm_build_x86.c | 153 ++--- 1 file changed, 81 insertions(+), 72 deletions(-) diff --git a/tools/libxc/xc_hvm_build_x86.c b/tools/libxc/xc_hvm_build_x86.c index c81a25b..ecc3224 100644 --- a/tools/libxc/xc_hvm_build_x86.c +++ b/tools/libxc/xc_hvm_build_x86.c @@ -353,98 +353,107 @@ static int setup_guest(xc_interface *xch, cur_pages = 0xc0; stat_normal_pages = 0xc0; -while ( (rc == 0) && (nr_pages > cur_pages) ) { -/* Clip count to maximum 1GB extent. */ -unsigned long count = nr_pages - cur_pages; -unsigned long max_pages = SUPERPAGE_1GB_NR_PFNS; - -if ( count > max_pages ) -count = max_pages; - -cur_pfn = page_array[cur_pages]; - -/* Take care the corner cases of super page tails */ -if ( ((cur_pfn & (SUPERPAGE_1GB_NR_PFNS-1)) != 0) && - (count > (-cur_pfn & (SUPERPAGE_1GB_NR_PFNS-1))) ) -count = -cur_pfn & (SUPERPAGE_1GB_NR_PFNS-1); -else if ( ((count & (SUPERPAGE_1GB_NR_PFNS-1)) != 0) && - (count > SUPERPAGE_1GB_NR_PFNS) ) -count &= ~(SUPERPAGE_1GB_NR_PFNS - 1); - -/* Attemp to allocate 1GB super page. Because in each pass we only - * allocate at most 1GB, we don't have to clip super page boundaries. 
- */ -if ( ((count | cur_pfn) & (SUPERPAGE_1GB_NR_PFNS - 1)) == 0 && - /* Check if there exists MMIO hole in the 1GB memory range */ - !check_mmio_hole(cur_pfn << PAGE_SHIFT, - SUPERPAGE_1GB_NR_PFNS << PAGE_SHIFT, - mmio_start, mmio_size) ) +while ( (rc == 0) && (nr_pages > cur_pages) ) { -long done; -unsigned long nr_extents = count >> SUPERPAGE_1GB_SHIFT; -xen_pfn_t sp_extents[nr_extents]; - -for ( i = 0; i < nr_extents; i++ ) -sp_extents[i] = page_array[cur_pages+(i< 0 ) -{ -stat_1gb_pages += done; -done <<= SUPERPAGE_1GB_SHIFT; -cur_pages += done; -count -= done; -} -} +/* Clip count to maximum 1GB extent. */ +unsigned long count = nr_pages - cur_pages; +unsigned long max_pages = SUPERPAGE_1GB_NR_PFNS; -if ( count != 0 ) -{ -/* Clip count to maximum 8MB extent. */ -max_pages = SUPERPAGE_2MB_NR_PFNS * 4; if ( count > max_pages ) count = max_pages; - -/* Clip partial superpage extents to superpage boundaries. */ -if ( ((cur_pfn & (SUPERPAGE_2MB_NR_PFNS-1)) != 0) && - (count > (-cur_pfn & (SUPERPAGE_2MB_NR_PFNS-1))) ) -count = -cur_pfn & (SUPERPAGE_2MB_NR_PFNS-1); -else if ( ((count & (SUPERPAGE_2MB_NR_PFNS-1)) != 0) && - (count > SUPERPAGE_2MB_NR_PFNS) ) -count &= ~(SUPERPAGE_2MB_NR_PFNS - 1); /* clip non-s.p. tail */ - -/* Attempt to allocate superpage extents. */ -if ( ((count | cur_pfn) & (SUPERPAGE_2MB_NR_PFNS - 1)) == 0 ) + +cur_pfn = page_array[cur_pages]; + +/* Take care the corner cases of super page tails */ +if ( ((cur_pfn & (SUPERPAGE_1GB_NR_PFNS-1)) != 0) && + (count > (-cur_pfn & (SUPERPAGE_1GB_NR_PFNS-1))) ) +count = -cur_pfn & (SUPERPAGE_1GB_NR_PFNS-1); +else if ( ((count & (SUPERPAGE_1GB_NR_PFNS-1)) != 0) && + (count > SUPERPAGE_1GB_NR_PFNS) ) +count &= ~(SUPERPAGE_1GB_NR_PFNS - 1); + +/* Attemp to allocate 1GB super page. Because in each pass + * we only allocate at most 1GB, we don't have to clip + * super page boundaries. 
+ */ +if ( ((count | cur_pfn) & (SUPERPAGE_1GB_NR_PFNS - 1)) == 0 && + /* Check if there exists MMIO hole in the 1GB memory + * range */ + !check_mmio_hole(cur_pfn << PAGE_SHIFT, + SUPERPAGE_1GB_NR_PFNS << PAGE_SHIFT, + mmio_start, mmio_size) ) { long done; -unsigned long nr_extents = count >> SUPERPAGE_2MB_SHIFT; +unsigned long nr_extents = count >> SUPERPAGE_1GB_SHIFT; xen_pfn_t sp_extents[nr_extents]; for ( i = 0; i < nr_extents; i++ ) -sp_extents[i] = page_array[cur_pages+(i< 0 ) { -stat_2mb_pages += done; -
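As a side note on the loop being re-indented above: the two "clip" steps and the alignment test share one idea, which can be sketched in isolation. This is a standalone illustration, not libxc code; the function names are hypothetical, and `sp_nr_pfns` is assumed to be a power of two (e.g. 512 for a 2MB superpage of 4KB pages).

```c
/*
 * Clip a run of `count` pages starting at `cur_pfn` so that it either
 * stops at the next superpage boundary or ends on a whole number of
 * superpages. Mirrors the if / else-if pair in the loop above.
 */
static unsigned long ex_clip_to_superpage(unsigned long cur_pfn,
                                          unsigned long count,
                                          unsigned long sp_nr_pfns)
{
    unsigned long mask = sp_nr_pfns - 1;
    /* Pages left until the next boundary (0 if already aligned). */
    unsigned long to_boundary = -cur_pfn & mask;

    if ( (cur_pfn & mask) != 0 && count > to_boundary )
        count = to_boundary;     /* unaligned start: stop at the boundary */
    else if ( (count & mask) != 0 && count > sp_nr_pfns )
        count &= ~mask;          /* aligned start: drop the partial tail */

    return count;
}

/* A superpage allocation is only attempted when start and length align. */
static int ex_superpage_aligned(unsigned long cur_pfn, unsigned long count,
                                unsigned long sp_nr_pfns)
{
    return ((count | cur_pfn) & (sp_nr_pfns - 1)) == 0;
}
```

For example, with 512-page superpages and a start of 0xc0 (the value `cur_pages` is initialised to above), the first pass is clipped to the 320 pages that remain before pfn 512.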
Re: [Xen-devel] [RFC] When to use "domain creation flag" or "HVM param"?
At 15:33 + on 26 Feb (1424961188), Julien Grall wrote:
> Hi,
>
> On 26/02/15 11:09, Lars Kurth wrote:
> > Tim, Andrew, Jan,
> > it seems as if we are slowly coming to some conclusion on this thread. If
> > I am mistaken, I am wondering whether it would make sense to have an IRC
> > meeting with all the involved stake-holders and report back to the list.
>
> I'm not sure where I should answer...
>
> We have a similar problem on ARM where we have arch-specific information
> (GIC version, number of interrupts) which changes between each domain.
>
> On Xen 4.5, we took the approach to create a separate DOMCTL for passing
> information. It has to be called before any VCPUs is created
> (DOMCTL_set_max_vcpus) and make the code more complicate to handle
> because we have to defer some domain initialization.
>
> I took another approach for Xen 4.6 based on Jan suggestion [1]. A v3 as
> been send recently [2] and we had some discussion about what is the best
> approach.

This line (adding these immutable config options at create time) seems
like a good one to me.

For migration, we'd need a hypercall that lets the Xen tools extract
the correct values to pass to the receiving Xen. Xen would fill in
the actual values used for anything (like this GIC option) that was
set to 'default' or 'don't care' on the initial create op.

Andrew Cooper had some reasons why we might want to split this into a
bare create op (which might do no more than allocate a domid) and a
set-config op that would take these and all other immutable flags.
I'm not wild for that but could be convinced either way -- I'll let
him fill in the details.

Cheers,

Tim.
[Xen-devel] [PATCH v6 04/23] libxc: duplicate snippet to allocate p2m_host array
Currently all in tree code doesn't set the superpage flag, but Konrad
wants it retained for the moment.

As I'm going to change the p2m_host array allocation, duplicate the code
snippet to allocate p2m_host array in this patch, so that we retain the
behaviour in superpage case.

This patch introduces no functional change and it will make future patch
easier to review. Also removed one stray tab while I was there.

Signed-off-by: Wei Liu
Cc: Ian Campbell
Cc: Ian Jackson
CC: Konrad Wilk
---
 tools/libxc/xc_dom_x86.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/tools/libxc/xc_dom_x86.c b/tools/libxc/xc_dom_x86.c
index bf06fe4..9dbaedb 100644
--- a/tools/libxc/xc_dom_x86.c
+++ b/tools/libxc/xc_dom_x86.c
@@ -772,15 +772,16 @@ int arch_setup_meminit(struct xc_dom_image *dom)
             return rc;
     }
 
-    dom->p2m_host = xc_dom_malloc(dom, sizeof(xen_pfn_t) * dom->total_pages);
-    if ( dom->p2m_host == NULL )
-        return -EINVAL;
-
     if ( dom->superpages )
     {
         int count = dom->total_pages >> SUPERPAGE_PFN_SHIFT;
         xen_pfn_t extents[count];
 
+        dom->p2m_host = xc_dom_malloc(dom, sizeof(xen_pfn_t) *
+                                           dom->total_pages);
+        if ( dom->p2m_host == NULL )
+            return -EINVAL;
+
         DOMPRINTF("Populating memory with %d superpages", count);
         for ( pfn = 0; pfn < count; pfn++ )
             extents[pfn] = pfn << SUPERPAGE_PFN_SHIFT;
@@ -809,9 +810,13 @@ int arch_setup_meminit(struct xc_dom_image *dom)
             return rc;
         }
         /* setup initial p2m */
+        dom->p2m_host = xc_dom_malloc(dom, sizeof(xen_pfn_t) *
+                                           dom->total_pages);
+        if ( dom->p2m_host == NULL )
+            return -EINVAL;
         for ( pfn = 0; pfn < dom->total_pages; pfn++ )
             dom->p2m_host[pfn] = pfn;
-	
+
         /* allocate guest memory */
         for ( i = rc = allocsz = 0;
               (i < dom->total_pages) && !rc;
-- 
1.9.1
Re: [Xen-devel] RFC: xen config changes v4
On Thu, Feb 26, 2015 at 11:08:20AM +, Stefano Stabellini wrote:
> On Thu, 26 Feb 2015, David Vrabel wrote:
> > On 26/02/15 04:59, Juergen Gross wrote:
> > >
> > > So we are again in the situation that pv-drivers always imply the pvops
> > > kernel (PARAVIRT selected). I started the whole Kconfig rework to
> > > eliminate this dependency.
> >
> > Yes. Can you produce a series that just addresses this one issue.
> >
> > In the absence of any concrete requirement for this big Kconfig reorg I
> > I don't think it is helpful.
>
> I clearly missed some context as I didn't realize that this was the
> intended goal. Why do we want this? Please explain as it won't come
> for free.
>
>
> We have a few PV interfaces for HVM guests that need PARAVIRT in Linux
> in order to be used, for example pv_time_ops and HVMOP_pagetable_dying.
> They are critical performance improvements and from the interface
> perspective, small enough that doesn't make much sense having a separate
> KConfig option for them.
>
>
> In order to reach the goal above we necessarily need to introduce a
> differentiation in terms of PV on HVM guests in Linux:
>
> 1) basic guests with PV network, disk, etc but no PV timers, no
>    HVMOP_pagetable_dying, no PV IPIs
> 2) full PV on HVM guests that have PV network, disk, timers,
>    HVMOP_pagetable_dying, PV IPIs and anything else that makes sense.
>
> 2) is much faster than 1) on Xen and 2) is only a tiny bit slower than
> 1) on native x86

Also don't we shove 2) down hvm guests right now? Even when everything is
built in I do not see how we opt out for HVM for 1) at run time right now.

If this is true then the question of motivation for this becomes even
stronger I think.

  Luis
[Xen-devel] [PATCH v6 12/23] libxl: build, check and pass vNUMA info to Xen for PV guest
Transform the user supplied vNUMA configuration into libxl internal representations, and finally libxc representations. Check validity of the configuration along the line. Signed-off-by: Wei Liu Reviewed-by: Dario Faggioli Cc: Ian Campbell Cc: Ian Jackson Cc: Dario Faggioli Cc: Elena Ufimtseva Acked-by: Ian Campbell --- Changes in v6: 1. Use "unsigned" for some variables. 2. Variable name: bit -> j. Changes in v5: 1. Adapt to change of interface (ditching xc_vnuma_info). Changes in v4: 1. Adapt to new interfaces. Changes in v3: 1. Add more commit log. --- tools/libxl/libxl_dom.c | 77 + 1 file changed, 77 insertions(+) diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c index a16d4a1..b58a19b 100644 --- a/tools/libxl/libxl_dom.c +++ b/tools/libxl/libxl_dom.c @@ -515,6 +515,51 @@ retry_transaction: return 0; } +static int set_vnuma_info(libxl__gc *gc, uint32_t domid, + const libxl_domain_build_info *info, + const libxl__domain_build_state *state) +{ +int rc = 0; +unsigned int i, nr_vdistance; +unsigned int *vcpu_to_vnode, *vnode_to_pnode, *vdistance = NULL; + +vcpu_to_vnode = libxl__calloc(gc, info->max_vcpus, + sizeof(unsigned int)); +vnode_to_pnode = libxl__calloc(gc, info->num_vnuma_nodes, + sizeof(unsigned int)); + +nr_vdistance = info->num_vnuma_nodes * info->num_vnuma_nodes; +vdistance = libxl__calloc(gc, nr_vdistance, sizeof(unsigned int)); + +for (i = 0; i < info->num_vnuma_nodes; i++) { +libxl_vnode_info *v = &info->vnuma_nodes[i]; +int j; + +/* vnode to pnode mapping */ +vnode_to_pnode[i] = v->pnode; + +/* vcpu to vnode mapping */ +libxl_for_each_set_bit(j, v->vcpus) +vcpu_to_vnode[j] = i; + +/* node distances */ +assert(info->num_vnuma_nodes == v->num_distances); +memcpy(vdistance + (i * info->num_vnuma_nodes), + v->distances, + v->num_distances * sizeof(unsigned int)); +} + +if (xc_domain_setvnuma(CTX->xch, domid, info->num_vnuma_nodes, + state->num_vmemranges, info->max_vcpus, + state->vmemranges, vdistance, + vcpu_to_vnode, vnode_to_pnode) < 
0) { +LOGE(ERROR, "xc_domain_setvnuma failed"); +rc = ERROR_FAIL; +} + +return rc; +} + int libxl__build_pv(libxl__gc *gc, uint32_t domid, libxl_domain_build_info *info, libxl__domain_build_state *state) { @@ -572,6 +617,38 @@ int libxl__build_pv(libxl__gc *gc, uint32_t domid, dom->xenstore_domid = state->store_domid; dom->claim_enabled = libxl_defbool_val(info->claim_mode); +if (info->num_vnuma_nodes != 0) { +unsigned int i; + +ret = libxl__vnuma_build_vmemrange_pv(gc, domid, info, state); +if (ret) { +LOGE(ERROR, "cannot build vmemranges"); +goto out; +} +ret = libxl__vnuma_config_check(gc, info, state); +if (ret) goto out; + +ret = set_vnuma_info(gc, domid, info, state); +if (ret) goto out; + +dom->nr_vmemranges = state->num_vmemranges; +dom->vmemranges = xc_dom_malloc(dom, sizeof(*dom->vmemranges) * +dom->nr_vmemranges); + +for (i = 0; i < dom->nr_vmemranges; i++) { +dom->vmemranges[i].start = state->vmemranges[i].start; +dom->vmemranges[i].end = state->vmemranges[i].end; +dom->vmemranges[i].flags = state->vmemranges[i].flags; +dom->vmemranges[i].nid = state->vmemranges[i].nid; +} + +dom->nr_vnodes = info->num_vnuma_nodes; +dom->vnode_to_pnode = xc_dom_malloc(dom, sizeof(*dom->vnode_to_pnode) * +dom->nr_vnodes); +for (i = 0; i < info->num_vnuma_nodes; i++) +dom->vnode_to_pnode[i] = info->vnuma_nodes[i].pnode; +} + if ( (ret = xc_dom_boot_xen_init(dom, ctx->xch, domid)) != 0 ) { LOGE(ERROR, "xc_dom_boot_xen_init failed"); goto out; -- 1.9.1 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
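The distance handling in set_vnuma_info() above packs one row per vnode into a single flat, row-major array before handing it to xc_domain_setvnuma(). A minimal standalone sketch of just that packing step follows. It uses plain calloc() rather than libxl__calloc(), and `ex_flatten_distances` and its parameters are hypothetical names, not part of libxl.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Pack n per-node distance rows (each of n entries) into one row-major
 * n*n array, so that entry [i][j] ends up at flat[i * n + j].
 * Returns NULL on allocation failure; the caller frees the result.
 */
static unsigned int *ex_flatten_distances(unsigned int n,
                                          unsigned int **rows)
{
    unsigned int i;
    unsigned int *flat = calloc((size_t)n * n, sizeof(*flat));

    if ( flat == NULL )
        return NULL;

    for ( i = 0; i < n; i++ )
        memcpy(flat + ((size_t)i * n), rows[i], n * sizeof(*flat));

    return flat;
}
```

The patch asserts that every node supplies exactly num_vnuma_nodes distances before copying; the sketch assumes the caller has already guaranteed that.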
[Xen-devel] [PATCH v6 06/23] libxc: allocate memory with vNUMA information for PV guest
From libxc's point of view, it only needs to know vnode to pnode mapping
and size of each vnode to allocate memory accordingly. Add these fields
to xc_dom structure.

The caller might not pass in vNUMA information. In that case, a dummy
layout is generated for the convenience of libxc's allocation code. The
upper layer (libxl etc) still sees the domain has no vNUMA
configuration.

Note that from this patch on, a PV x86 guest can have multiple regions
of RAM allocated.

Signed-off-by: Wei Liu
Cc: Ian Campbell
Cc: Ian Jackson
Cc: Dario Faggioli
Cc: Elena Ufimtseva
---
Changes in v6:
1. Ditch XC_VNUMA_NO_NODE and use XEN_NUMA_NO_NODE.
2. Update comment in xc_dom.h.

Changes in v5:
1. Ditch xc_vnuma_info.

Changes in v4:
1. Pack fields into a struct.
2. Use "page" as unit.
3. __FUNCTION__ -> __func__.
4. Don't print total_pages.
5. Improve comment.

Changes in v3:
1. Rewrite commit log.
2. Shorten some error messages.
---
 tools/libxc/include/xc_dom.h | 12 -
 tools/libxc/xc_dom_x86.c | 101 +--
 2 files changed, 97 insertions(+), 16 deletions(-)

diff --git a/tools/libxc/include/xc_dom.h b/tools/libxc/include/xc_dom.h
index 6b8ddf4..a7d059a 100644
--- a/tools/libxc/include/xc_dom.h
+++ b/tools/libxc/include/xc_dom.h
@@ -119,8 +119,10 @@ struct xc_dom_image {
     /* physical memory
      *
-     * An x86 PV guest has a single contiguous block of physical RAM,
-     * consisting of total_pages starting at rambase_pfn.
+     * An x86 PV guest has one or more blocks of physical RAM,
+     * consisting of total_pages starting at rambase_pfn. The start
+     * address and size of each block is controlled by vNUMA
+     * structures.
      *
      * An ARM guest has GUEST_RAM_BANKS regions of RAM, with
      * rambank_size[i] pages in each.
The lowest RAM address @@ -168,6 +170,12 @@ struct xc_dom_image { struct xc_dom_loader *kernel_loader; void *private_loader; +/* vNUMA information */ +xen_vmemrange_t *vmemranges; +unsigned int nr_vmemranges; +unsigned int *vnode_to_pnode; +unsigned int nr_vnodes; + /* kernel loader */ struct xc_dom_arch *arch_hooks; /* allocate up to virt_alloc_end */ diff --git a/tools/libxc/xc_dom_x86.c b/tools/libxc/xc_dom_x86.c index bea54f2..268d4db 100644 --- a/tools/libxc/xc_dom_x86.c +++ b/tools/libxc/xc_dom_x86.c @@ -760,7 +760,8 @@ static int x86_shadow(xc_interface *xch, domid_t domid) int arch_setup_meminit(struct xc_dom_image *dom) { int rc; -xen_pfn_t pfn, allocsz, i, j, mfn; +xen_pfn_t pfn, allocsz, mfn, total, pfn_base; +int i, j; rc = x86_compat(dom->xch, dom->guest_domid, dom->guest_type); if ( rc ) @@ -811,26 +812,98 @@ int arch_setup_meminit(struct xc_dom_image *dom) if ( rc ) return rc; } -/* setup initial p2m */ -dom->p2m_size = dom->total_pages; + +/* Setup dummy vNUMA information if it's not provided. Note + * that this is a valid state if libxl doesn't provide any + * vNUMA information. + * + * The dummy values make libxc allocate all pages from + * arbitrary physical nodes. This is the expected behaviour if + * no vNUMA configuration is provided to libxc. + * + * Note that the following hunk is just for the convenience of + * allocation code. No defaulting happens in libxc. 
+ */ +if ( dom->nr_vmemranges == 0 ) +{ +dom->nr_vmemranges = 1; +dom->vmemranges = xc_dom_malloc(dom, sizeof(*dom->vmemranges)); +dom->vmemranges[0].start = 0; +dom->vmemranges[0].end = dom->total_pages << PAGE_SHIFT; +dom->vmemranges[0].flags = 0; +dom->vmemranges[0].nid = 0; + +dom->nr_vnodes = 1; +dom->vnode_to_pnode = xc_dom_malloc(dom, + sizeof(*dom->vnode_to_pnode)); +dom->vnode_to_pnode[0] = XEN_NUMA_NO_NODE; +} + +total = dom->p2m_size = 0; +for ( i = 0; i < dom->nr_vmemranges; i++ ) +{ +total += ((dom->vmemranges[i].end - dom->vmemranges[i].start) + >> PAGE_SHIFT); +dom->p2m_size = +dom->p2m_size > (dom->vmemranges[i].end >> PAGE_SHIFT) ? +dom->p2m_size : (dom->vmemranges[i].end >> PAGE_SHIFT); +} +if ( total != dom->total_pages ) +{ +xc_dom_panic(dom->xch, XC_INTERNAL_ERROR, + "%s: vNUMA page count mismatch (0x%"PRIpfn" != 0x%"PRIpfn")\n", + __func__, total, dom->total_pages); +return -EINVAL; +} + dom->p2m_host = xc_dom_malloc(dom, sizeof(xen_pfn_t) * dom->p2m_size); if ( dom->p2m_host == NULL ) return -EINVAL; -for ( pfn = 0; pfn < dom->total_pages; pfn++ ) -d
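The accounting loop in the hunk above is what lets p2m_size differ from total_pages: the page total sums the range lengths, while p2m_size tracks the highest end pfn. A standalone sketch of that calculation, under stated assumptions: 4KiB pages, byte addresses with an exclusive `end`, and a simplified `struct ex_vmemrange` standing in for xen_vmemrange_t. All `ex_` names are hypothetical.

```c
#include <stdint.h>

#define EX_PAGE_SHIFT 12   /* assumes 4KiB pages */

/* Simplified stand-in for a vmemrange: byte addresses, end exclusive. */
struct ex_vmemrange {
    uint64_t start;
    uint64_t end;
};

/*
 * Sum the pages covered by all ranges and find the highest end pfn.
 * Returns 0 and stores the p2m size on success, or -1 if the ranges do
 * not add up to total_pages (the mismatch the patch rejects).
 */
static int ex_p2m_layout(const struct ex_vmemrange *r, unsigned int nr,
                         uint64_t total_pages, uint64_t *p2m_size)
{
    uint64_t total = 0, max_pfn = 0;
    unsigned int i;

    for ( i = 0; i < nr; i++ )
    {
        uint64_t end_pfn = r[i].end >> EX_PAGE_SHIFT;

        total += (r[i].end - r[i].start) >> EX_PAGE_SHIFT;
        if ( end_pfn > max_pfn )
            max_pfn = end_pfn;
    }

    if ( total != total_pages )
        return -1;

    *p2m_size = max_pfn;
    return 0;
}
```

With two 1GiB ranges at 0 and at 4GiB, total_pages is 0x80000 but the p2m has to cover 0x140000 pfns, which is why code sizing or walking the p2m has to use p2m_size.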
Re: [Xen-devel] [Qemu-devel] [v2][PATCH] libxl: add one machine property to support IGD GFX passthrough
On Thu, 2015-02-26 at 14:35 +0800, Chen, Tiejun wrote:
> > If we are going to do this then I think we need to arrange for the
> > interface to be able to express the need to force the workarounds for a
> > particular device. IOW a boolean will not suffice since it doesn't
> > indicate that IGD workarounds are needed.
> >
> > Probably it would be simplest to just leave this functionality out for
> > the time being and revisit if/when maintaining the list becomes an
> > annoyance or an end user trips over it.
> >
>
> You mean we should maintain one list to save all targeted devices, then
> tools uses ids as an index to lookup this list to pass something to qemu.

I (think I) meant a list of pci vid:did in libxl, which is matched
against the devices passed to the domain (e.g. "pci = [...]" in xl cfg),
which then enables the igd workarounds, i.e. by passing the option to
qemu.

> But actually one question that I have always been thinking about is, its
> really a responsibility of Xen to determine which device type should be
> passed by probing that pair of vendor and device ids? Xen is just one of
> so many approaches to qemu so such a rare workaround option can be
> passed actively by any user, instead of Xen. Furthermore, its becoming
> flexible as well to those cases we want to force overriding this.

I'm not sure, but I think you are suggesting that qemu should autodetect
this situation, without being explicitly told "igd-passthru=on" on the
command line?

If the qemu maintainers are amenable to that, and it's not already the
case that other components (e.g. hvmloader) need to be told about these
workarounds, then I suppose that would work.

> So I think qemu should mainly plays this role. If qemu realizes we're
> passing through a IGD or other targeted device, it should post a warning
> or even error message to indicate what right behavior is needed, or what
> is that potential risk by default.

Hrm, here it sounds more like you are suggesting that qemu should detect
and warn, rather than detect and do the right thing?

I'm not sure how Qemu could indicate what the right behaviour is going
to be, it'll differ for different hypervisors or even for which Xen
toolstack (xl vs libvirt etc) is in use.

Or maybe I've misunderstood?

Ian.
Re: [Xen-devel] Branch Trace Storage for guests and VPMU initialization
On Wednesday, 25 February 2015, 11:31:31, Boris Ostrovsky wrote:
> On 02/25/2015 10:12 AM, kevin.ma...@gdata.de wrote:
> >> -Original Message-
> >> From: Boris Ostrovsky [mailto:boris.ostrov...@oracle.com]
> >> Sent: Tuesday, 24 February 2015 18:13
> >> To: Mayer, Kevin; xen-devel@lists.xen.org
> >> Subject: Re: [Xen-devel] Branch Trace Storage for guests and VPMU
> >> initialization
> >>
> >> On 02/24/2015 10:27 AM, kevin.ma...@gdata.de wrote:
> >>> Hi guys
> >>>
> >>> I'm trying to set up the BTS so that I can log the branches taken in
> >>> the guest using Xen 4.4.1 with a WinXP SP3 guest on a Core i7 Sandy
> >>> Bridge.
> >>>
> >>> I added the vpmu=bts boot parameter to my grub2 configuration and
> >>> extended the libxl,libxc,domctl,… with an own command so that I can
> >>> trigger the activation of the BTS whenever I want.
> >>>
> >>
> >> I am not sure why you are doing all these changes to Xen code. BTS is
> >> supposed to be managed from the guest. For example, a Fedora HVM guest
> >> will produce this:
> >>
> >> [root@dhcp-burlington7-2nd-B-east-10-152-55-140 ~]# perf record -e branches:u -c 1 -d sleep 1
> >> [ perf record: Woken up 3838 times to write data ]
> >> [ perf record: Captured and wrote 0.704 MB perf.data (~30756 samples) ]
> >> [root@dhcp-burlington7-2nd-B-east-10-152-55-140 ~]# perf script -f ip,addr,sym,dso,symoff --show-kernel-path
> >> 8167c347 native_irq_return_iret+0x0 (/proc/kcore) => 328c001590 [unknown] (/proc/kcore)
> >> 8167c347 native_irq_return_iret+0x0 (/proc/kcore) => 328c001590 [unknown] ([unknown])
> >> 328c001593 [unknown] ([unknown]) => 328c004b70 [unknown] ([unknown])
> >> ...
> >>
> > I want to be able to log the taken branches (of the guest) without the need
> > to modify the guest at all.
> > This means I have to do all the logic in the hypervisor, or am I wrong?
>
> In that case, yes. But then you have to make sure that at least
>   * you don't load guest's VPMU (or, at least, BTS-related registers) on
>     context switch

But you need to modify PMU registers when switching to/from the guest
context to get PMU running.
I didn't think of using the VPMU stuff with modifying the context from
outside the guest.

>   * You don't send the interrupt to the guest (meaning that you will
>     need to somehow inform dom0 of the BTS interrupt)
>
> and probably more.
>
> Essentially, you want dom0 to profile the guest. I have been working on
> patches that would allow that but they are still under review.
>
> >>> In this command I do the following:
> >>>
> >>> I set up the memory region for the BTS Buffer and the DS Buffer
> >>> Management Area using xzalloc_bytes
> >>>
> >>
> >> I don't think you should be allocating BTS buffers in the hypervisor,
> >> they are in guest's memory.
> > I agree. As I said I think this is where my main problem is at the moment.
> > Is there any way I can allocate memory in the hypervisor in a way the guest
> > can access it?
>
> I am not sure this is what you want since you seem to *not* want the
> guest to process the samples, right?
>
> But yes, you can. E.g. something like what map_vcpu_info() does. (I have
> no idea how you'd do this from Windows.)

The DS buffer has to be mapped within the guests address space so the CPU
running in guest context can access this area. Otherwise you get this
triple fault.
So I would think you need a mixture of writing some stuff in Windows and
patching the hypervisor.

Dietmar.

> > Of course the guest must not be able to use this memory in its normal
> > operations but just for BTS.
> > Is this even possible? I am rather confused at the moment. :-D
> >
> >>> Then I write the pointer to the BTS Buffer into the DS Buffer
> >>> Management Area at +0x0 and +0x8 (BTS Buffer Base and BTS Index)
> >>>
> >>> When I use vmx_msr_write_intercept to store the value in
> >>> MSR_IA32_DS_AREA the host reboots (my idea is he tries to access a
> >>> vpmu-struct that isn't there in the current vcpu and panics).
>
> Who is trying to write to MSR_IA32_DS_AREA? The guest or dom0? I thought
> you said that you want dom0 to do sampling. Or are you trying to setup
> DS area from your guest and control it from dom0? I am somewhat confused.
>
> >>>
> >> Can you post hypervisor log? (hard to say how helpful it will be without
> >> seeing your code changes though)
> >>
> > Right after enabling the BTS I get a triple fault.
> > hvm.c:1357:d2 Triple fault on VCPU0 - invoking HVM shutdown action 1.
>
> That's not host reboot, this is your guest dying.
>
> >>> When I use a modified version of vmx_msr_write_intercept I don’t get
> >>> any crashes as long as I don’t enable BTS and TR in the
> >>> GUEST_IA32_DEBUGCTL (BTR works). When I enable the BTS (and TR) the
> >>> guest crashes. I suppose he gets killed by the hypervisor for
> >>> accessing forbidden memory.
> >>>
> >> Possibly because DS area point to hyperviso
Re: [Xen-devel] freemem-slack and large memory environments
On Thu, 2015-02-26 at 08:36 -0700, Mike Latimer wrote:
> On Wednesday, February 25, 2015 02:09:50 PM Stefano Stabellini wrote:
> > > Is the upshot that Mike doesn't need to do anything further with his
> > > patch (i.e. can drop it)? I think so?
> >
> > Yes, I think so. Maybe he could help out testing the patches I am going
> > to write :-)
>
> Sorry for not responding to this yesterday.
>
> There is still one aspect of my original patch that is important. As the code
> currently stands, the target for dom0 is set lower during each iteration of
> the loop. Unless only one iteration is required, dom0 will end up being set to
> a much lower target than is actually required.

Is this because some sort of slack is applied once per iteration rather
than once at the start or is it something else?

> There are two ways to fix this issue:
>
> - Set the memory target for dom0 once, before entering the loop
> - During each iteration of the loop, compare the amount of needed memory to
>   the amount of memory which will be available once dom0 hits the target, and
>   only lower the target if additional memory is needed.
>
> My patch earlier in this thread does the former, but I think the second option
> is also possible. Is there a preference between those approaches (or a better
> idea)?
>
> Thanks,
> Mike

___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
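The second option Mike describes (only lower the dom0 target when the memory already being freed is not enough) can be sketched roughly as follows. This is an illustration only, not the libxl code: the function name and all parameters are hypothetical, and it assumes every quantity is in kB and that the caller re-reads the current values on each loop iteration.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not part of libxl): pick the dom0 memory target for
 * one iteration of a "free memory for a new guest" loop.
 *
 * current_target_kb: target dom0 has already been asked to reach
 * dom0_now_kb:       memory dom0 holds right now
 * free_now_kb:       memory that is free in the hypervisor right now
 * needed_kb:         memory the new guest needs
 */
static uint64_t demo_next_dom0_target_kb(uint64_t current_target_kb,
                                         uint64_t dom0_now_kb,
                                         uint64_t free_now_kb,
                                         uint64_t needed_kb)
{
    /* Memory that will become free once dom0 reaches its current target. */
    uint64_t pending_kb = dom0_now_kb > current_target_kb
                              ? dom0_now_kb - current_target_kb
                              : 0;
    uint64_t expected_free_kb = free_now_kb + pending_kb;

    /* Enough is already on its way: leave the target alone. */
    if (expected_free_kb >= needed_kb)
        return current_target_kb;

    /* Otherwise lower the target by the remaining shortfall only. */
    uint64_t shortfall_kb = needed_kb - expected_free_kb;
    return current_target_kb > shortfall_kb
               ? current_target_kb - shortfall_kb
               : 0;
}
```

With this shape, repeated iterations while dom0 is still ballooning down return the same target instead of lowering it again each time, which is the behaviour the thread is after.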
[Xen-devel] Shared page tables between ETP and IOMMU issue
Hello, While testing PVH Dom0 support on a newer Core i3-5010U I've found that sharing the page tables between EPT and the IOMMUs don't work. Booting with iommu=no-sharept solves the problem, but I'm unsure what causes this issue. Here is the output of the system successfully booting with iommu=debug,no-sharept: /boot/xen data=0x1de9f0+0x7fd22610 - /boot/kernel/kernel size=0x14bcd33 /boot/kernel/zfs.ko size 0x37d888 at 0x8155 loading required module 'opensolaris' /boot/kernel/opensolaris.ko size 0xc790 at 0x818ce000 Booting... Xen 4.6-unstable (XEN) Xen version 4.6-unstable (root@) (gcc47 (FreeBSD Ports Collection) 4.7.4) debug=y Thu Feb 26 19:23:57 UTC 2015 (XEN) Latest ChangeSet: Wed Feb 11 17:21:14 2015 +0100 git:cb34a7c-dirty (XEN) Bootloader: FreeBSD Loader (XEN) Command line: dom0_mem=2048M dom0pvh=1 console=com1,vga iommu=debug,no-sharept guest_loglvl=all loglvl=all (XEN) Video information: (XEN) VGA is text mode 80x25, font 8x16 (XEN) VBE/DDC methods: V2; EDID transfer time: 1 seconds (XEN) Disc information: (XEN) Found 1 MBR signatures (XEN) Found 1 EDD information structures (XEN) Xen-e820 RAM map: (XEN) - 0009d800 (usable) (XEN) 0009d800 - 000a (reserved) (XEN) 000e - 0010 (reserved) (XEN) 0010 - d76d8000 (usable) (XEN) d76d8000 - d7bb5000 (reserved) (XEN) d7bb5000 - dc319000 (usable) (XEN) dc319000 - dc378000 (reserved) (XEN) dc378000 - dc39b000 (ACPI data) (XEN) dc39b000 - dcccb000 (ACPI NVS) (XEN) dcccb000 - dcfff000 (reserved) (XEN) dcfff000 - dd00 (usable) (XEN) dd80 - e000 (reserved) (XEN) f800 - fc00 (reserved) (XEN) fec0 - fec01000 (reserved) (XEN) fed0 - fed04000 (reserved) (XEN) fed1c000 - fed2 (reserved) (XEN) fee0 - fee01000 (reserved) (XEN) ff00 - 0001 (reserved) (XEN) 0001 - 00021f00 (usable) (XEN) ACPI: RSDP 000F0580, 0024 (r2 INTEL ) (XEN) ACPI: XSDT DC37F090, 00A4 (r1 INTEL NUC5i3MY 1072009 AMI 10013) (XEN) ACPI: FACP DC392C10, 010C (r5 INTEL NUC5i3MY 1072009 AMI 10013) (XEN) ACPI: DSDT DC37F1C8, 13A48 (r2 INTEL NUC5i3MY 1072009 INTL 
20120913) (XEN) ACPI: FACS DCCC9F80, 0040 (XEN) ACPI: APIC DC392D20, 0084 (r3 INTEL NUC5i3MY 1072009 AMI 10013) (XEN) ACPI: FPDT DC392DA8, 0044 (r1 INTEL NUC5i3MY 1072009 AMI 10013) (XEN) ACPI: FIDT DC392DF0, 009C (r1 INTEL NUC5i3MY 1072009 AMI 10013) (XEN) ACPI: MCFG DC392E90, 003C (r1 INTEL NUC5i3MY 1072009 MSFT 97) (XEN) ACPI: HPET DC392ED0, 0038 (r1 INTEL NUC5i3MY 1072009 AMI.5) (XEN) ACPI: SSDT DC392F08, 0315 (r1 INTEL NUC5i3MY 1000 INTL 20120913) (XEN) ACPI: UEFI DC393220, 0042 (r1 INTEL NUC5i3MY0 0) (XEN) ACPI: SSDT DC393268, 0C7D (r2 INTEL NUC5i3MY 1000 INTL 20120913) (XEN) ACPI: ASF! DC393EE8, 00A0 (r32 INTEL NUC5i3MY1 TFSMF4240) (XEN) ACPI: SSDT DC393F88, 0539 (r2 INTEL NUC5i3MY 3000 INTL 20120913) (XEN) ACPI: SSDT DC3944C8, 0B74 (r2 INTEL NUC5i3MY 3000 INTL 20120913) (XEN) ACPI: TPM2 DC395040, 0034 (r3 INTEL NUC5i3MY1 AMI 0) (XEN) ACPI: SSDT DC395078, 0041 (r1 INTEL NUC5i3MY 1000 INTL 20120913) (XEN) ACPI: SSDT DC3950C0, 5CF6 (r2 INTEL NUC5i3MY 3000 INTL 20120913) (XEN) ACPI: DMAR DC39ADB8, 00B0 (r1 INTEL NUC5i3MY1 INTL1) (XEN) System RAM: 8109MB (8304488kB) (XEN) No NUMA configuration found (XEN) Faking a node at -00021f00 (XEN) Domain heap initialised (XEN) found SMP MP-table at 000fd7c0 (XEN) DMI 2.8 present. 
(XEN) Using APIC driver default (XEN) ACPI: PM-Timer IO Port: 0x1808 (XEN) ACPI: v5 SLEEP INFO: control[0:0], status[0:0] (XEN) ACPI: SLEEP INFO: pm1x_cnt[1:1804,1:0], pm1x_evt[1:1800,1:0] (XEN) ACPI: 32/64X FACS address mismatch in FADT - dccc9f80/, using 32 (XEN) ACPI: wakeup_vec[dccc9f8c], vec_size[20] (XEN) ACPI: Local APIC address 0xfee0 (XEN) ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled) (XEN) Processor #0 7:13 APIC version 21 (XEN) ACPI: LAPIC (acpi_id[0x02] lapic_id[0x02] enabled) (XEN) Processor #2 7:13 APIC version 21 (XEN) ACPI: LAPIC (acpi_id[0x03] lapic_id[0x01] enabled) (XEN) Processor #1 7:13 APIC version 21 (XEN) ACPI: LAPIC (acpi_id[0x04] lapic_id[0x03] enabled) (XEN) Processor #3 7:13 APIC version 21 (XEN) ACPI: LAPIC_NMI (acpi_id[0x01] dfl dfl lint[0]) (XEN) ACPI: NMI not connected to LINT 1! (XEN) ACPI: LAPIC_NMI (acpi_id[0x02] dfl dfl lint[0]) (XEN) ACPI: NMI not connected to LINT 1! (XEN) ACPI: LAPIC_NMI (acpi_id[0x03] dfl dfl lint[0]) (XEN) ACPI: NMI not connected to LINT 1! (XEN) ACPI: LAPIC_NMI (acpi_id[0x04] dfl dfl lint[0]) (XEN) ACPI: NMI not connected to LINT 1! (XEN) ACPI: IOAPIC (id[0x02]
[Xen-devel] how to assign resources exclusive to a single domU
While working on pvscsi support for libxl I noticed that assigning a
resource exclusively to a single domU via libxl will be a major effort.
Up to now libxl could rely on the fact that a resource can either be
shared, or the backend deals with the attempt to share.

There are two cases in pvscsi:

1) A single physical HST:CHN:TGT:LUN device must be assigned to just a
   single domU. While the (xenlinux) backend driver allows assigning the
   device to more than one domU, the sharing cannot work in practice.

2) The xenlinux backend driver has two modes: emulation and raw. In raw
   mode the SCSI commands coming from domU are passed directly to the
   physical device. I think it is required that all devices connected to a
   physical SCSI host operate either entirely in raw mode or entirely in
   emulation mode.

To handle both cases libxl could either assume that the admin is
responsible for proper configuration:
- just one domU per physical device
- if raw mode is enabled, all devices on the physical SCSI host will be
  assigned to just one domU

Or libxl gets functionality to verify that the two cases above are really
enforced. Doing that means that there has to be some global lock under
which the system state in xenstore is parsed and compared against the domU
configuration that is about to be assigned:
- are the physical devices already assigned
- is the raw mode properly configured

In xend case #1 was not handled. There is some code for case #2; I have
to check how complete the enforcement in xend was.

I wonder what should be done in my changes for libxl.

Olaf
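For discussion, the verification variant could look something like the sketch below. It is purely illustrative: the types and the function are hypothetical, the existing assignments are passed in as a plain array rather than parsed from xenstore, and the global lock mentioned above is assumed to be held by the caller.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical types, not libxl ones. */
struct demo_scsi_addr { unsigned hst, chn, tgt, lun; };

struct demo_assignment {
    struct demo_scsi_addr dev;  /* physical device */
    unsigned domid;             /* domU it is assigned to */
    bool raw;                   /* true: raw mode, false: emulation */
};

/*
 * Return true if assigning `dev` to `domid` in the given mode respects the
 * rules from the mail. The caller is assumed to hold the global lock and to
 * have collected `cur` from the current system state.
 */
static bool demo_pvscsi_ok(const struct demo_assignment *cur, size_t n,
                           struct demo_scsi_addr dev, unsigned domid, bool raw)
{
    for (size_t i = 0; i < n; i++) {
        bool same_host = cur[i].dev.hst == dev.hst;
        bool same_dev = same_host && cur[i].dev.chn == dev.chn &&
                        cur[i].dev.tgt == dev.tgt && cur[i].dev.lun == dev.lun;

        /* Case 1: a physical device belongs to at most one domU. */
        if (same_dev && cur[i].domid != domid)
            return false;

        /* Case 2: one physical host is either all raw or all emulation. */
        if (same_host && cur[i].raw != raw)
            return false;

        /* Raw mode: every device on the host goes to the same domU. */
        if (same_host && raw && cur[i].domid != domid)
            return false;
    }
    return true;
}
```

The hard part in libxl would not be this comparison but making the "read state, compare, write new state" sequence atomic across concurrent toolstack invocations.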
Re: [Xen-devel] [PATCH v6 3/5] xen/arm: Make gic-v2 code handle hip04-d01 platform
...
> > /*
> > diff --git a/xen/arch/arm/gic.c b/xen/arch/arm/gic.c
> > index 390c8b0..e4512a8 100644
> > --- a/xen/arch/arm/gic.c
> > +++ b/xen/arch/arm/gic.c
> > @@ -565,12 +565,13 @@ static void do_sgi(struct cpu_user_regs *regs, enum gic_sgi sgi)
> >  void gic_interrupt(struct cpu_user_regs *regs, int is_fiq)
> >  {
> >      unsigned int irq;
> > +    unsigned int max_irq = gic_hw_ops->info->nr_lines;
> >
> >      do {
> >          /* Reading IRQ will ACK it */
> >          irq = gic_hw_ops->read_irq();
> >
> > -        if ( likely(irq >= 16 && irq < 1021) )
> > +        if ( likely(irq >= 16 && irq < max_irq) )
> >          {
> >              local_irq_enable();
> >              do_IRQ(regs, irq, is_fiq);
>
> This change should belong to a separate patch.
>

Looking at code paths and discussing with a colleague that partially wrote
the patch I think this test is not necessary at all. I'll check it.

Frediano
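To make the range check under discussion easier to reason about, here is a small stand-alone sketch of how an acknowledged interrupt ID could be classified. It is not the Xen code: the enum and function names are made up, and `nr_lines` simply stands for the upper bound that the patch reads from `gic_hw_ops->info->nr_lines`. It relies only on the GICv2 convention that IDs 0-15 are SGIs and that IDs from 16 upwards are PPIs and SPIs.

```c
/* Hypothetical classification, for illustration only. */
enum demo_irq_kind {
    DEMO_IRQ_SGI,       /* 0..15: software-generated interrupt */
    DEMO_IRQ_DISPATCH,  /* 16..nr_lines-1: PPI/SPI, handed to the IRQ layer */
    DEMO_IRQ_OTHER      /* anything else, e.g. the spurious ID 1023 */
};

static enum demo_irq_kind demo_classify_irq(unsigned int irq,
                                            unsigned int nr_lines)
{
    if (irq < 16)
        return DEMO_IRQ_SGI;
    if (irq < nr_lines)
        return DEMO_IRQ_DISPATCH;
    return DEMO_IRQ_OTHER;
}
```

Whether the upper bound is needed at all is exactly the open question in this mail; the sketch only shows what the check distinguishes if it is kept.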
[Xen-devel] [PATCH v6 02/23] xen: move NUMA_NO_NODE to public memory.h as XEN_NUMA_NO_NODE
Update NUMA_NO_NODE in Xen code to use the new macro. No functional change introduced. Signed-off-by: Wei Liu Cc: Andrew Cooper Cc: Jan Beulich --- xen/arch/x86/hpet.c | 2 +- xen/arch/x86/irq.c | 4 ++-- xen/arch/x86/numa.c | 14 +++--- xen/arch/x86/physdev.c | 2 +- xen/arch/x86/setup.c | 2 +- xen/arch/x86/smpboot.c | 2 +- xen/arch/x86/srat.c | 28 ++-- xen/arch/x86/x86_64/mm.c | 2 +- xen/common/page_alloc.c | 4 ++-- xen/drivers/passthrough/amd/iommu_init.c | 2 +- xen/drivers/passthrough/vtd/iommu.c | 8 xen/include/public/memory.h | 2 ++ xen/include/xen/numa.h | 5 ++--- 13 files changed, 39 insertions(+), 38 deletions(-) diff --git a/xen/arch/x86/hpet.c b/xen/arch/x86/hpet.c index 8f36f6f..3b6d12f 100644 --- a/xen/arch/x86/hpet.c +++ b/xen/arch/x86/hpet.c @@ -375,7 +375,7 @@ static int __init hpet_assign_irq(struct hpet_event_channel *ch) { int irq; -if ( (irq = create_irq(NUMA_NO_NODE)) < 0 ) +if ( (irq = create_irq(XEN_NUMA_NO_NODE)) < 0 ) return irq; ch->msi.irq = irq; diff --git a/xen/arch/x86/irq.c b/xen/arch/x86/irq.c index 786d1fc..deb67d7 100644 --- a/xen/arch/x86/irq.c +++ b/xen/arch/x86/irq.c @@ -173,7 +173,7 @@ int create_irq(nodeid_t node) { cpumask_t *mask = NULL; -if ( node != NUMA_NO_NODE ) +if ( node != XEN_NUMA_NO_NODE ) { mask = &node_to_cpumask(node); if (cpumask_empty(mask)) @@ -2000,7 +2000,7 @@ int map_domain_pirq( spin_unlock_irqrestore(&desc->lock, flags); info = NULL; -irq = create_irq(NUMA_NO_NODE); +irq = create_irq(XEN_NUMA_NO_NODE); ret = irq >= 0 ? prepare_domain_irq_pirq(d, irq, pirq + nr, &info) : irq; if ( ret ) diff --git a/xen/arch/x86/numa.c b/xen/arch/x86/numa.c index 132d694..6e1a0b8 100644 --- a/xen/arch/x86/numa.c +++ b/xen/arch/x86/numa.c @@ -37,13 +37,13 @@ unsigned long memnodemapsize; u8 *memnodemap; nodeid_t cpu_to_node[NR_CPUS] __read_mostly = { -[0 ... NR_CPUS-1] = NUMA_NO_NODE +[0 ... 
NR_CPUS-1] = XEN_NUMA_NO_NODE }; /* * Keep BIOS's CPU2node information, should not be used for memory allocaion */ nodeid_t apicid_to_node[MAX_LOCAL_APIC] __cpuinitdata = { -[0 ... MAX_LOCAL_APIC-1] = NUMA_NO_NODE +[0 ... MAX_LOCAL_APIC-1] = XEN_NUMA_NO_NODE }; cpumask_t node_to_cpumask[MAX_NUMNODES] __read_mostly; @@ -71,7 +71,7 @@ static int __init populate_memnodemap(const struct node *nodes, unsigned long spdx, epdx; int i, res = -1; -memset(memnodemap, NUMA_NO_NODE, memnodemapsize * sizeof(*memnodemap)); +memset(memnodemap, XEN_NUMA_NO_NODE, memnodemapsize * sizeof(*memnodemap)); for ( i = 0; i < numnodes; i++ ) { spdx = paddr_to_pdx(nodes[i].start); @@ -81,7 +81,7 @@ static int __init populate_memnodemap(const struct node *nodes, if ( (epdx >> shift) >= memnodemapsize ) return 0; do { -if ( memnodemap[spdx >> shift] != NUMA_NO_NODE ) +if ( memnodemap[spdx >> shift] != XEN_NUMA_NO_NODE ) return -1; if ( !nodeids ) @@ -199,7 +199,7 @@ void __init numa_init_array(void) rr = first_node(node_online_map); for ( i = 0; i < nr_cpu_ids; i++ ) { -if ( cpu_to_node[i] != NUMA_NO_NODE ) +if ( cpu_to_node[i] != XEN_NUMA_NO_NODE ) continue; numa_set_node(i, rr); rr = next_node(rr, node_online_map); @@ -350,7 +350,7 @@ void __init init_cpu_to_node(void) if ( apicid == BAD_APICID ) continue; node = apicid_to_node[apicid]; -if ( node == NUMA_NO_NODE || !node_online(node) ) +if ( node == XEN_NUMA_NO_NODE || !node_online(node) ) node = 0; numa_set_node(i, node); } @@ -433,7 +433,7 @@ static void dump_numa(unsigned char key) err = snprintf(keyhandler_scratch, 12, "%3u", vnuma->vnode_to_pnode[i]); -if ( err < 0 || vnuma->vnode_to_pnode[i] == NUMA_NO_NODE ) +if ( err < 0 || vnuma->vnode_to_pnode[i] == XEN_NUMA_NO_NODE ) strlcpy(keyhandler_scratch, "???", sizeof(keyhandler_scratch)); printk(" %3u: pnode %s,", i, keyhandler_scratch); diff --git a/xen/arch/x86/physdev.c b/xen/arch/x86/physdev.c index 1be1d50..a3a9564 100644 --- a/xen/arch/x86/physdev.c +++ b/xen/arch/x86/physdev.c @@ 
-146,7 +146,7 @@ int physdev_map_pirq(domid_t domid, int type, int *index, int *pirq_p, irq = *index; if ( irq == -1 ) case MAP_PIRQ_TYPE_MULTI_MSI: -irq = create_irq(NUMA_NO_NODE); +irq = create_irq(XEN_NUMA_N
[Xen-devel] [PATCH v6 11/23] libxl: functions to build vmemranges for PV guest
Introduce a arch-independent routine to generate one vmemrange per vnode. Also introduce arch-dependent routines for different architectures because part of the process is arch-specific -- ARM has yet have NUMA support and E820 is x86 only. For those x86 guests who care about machine E820 map (i.e. with e820_host=1), vnode is further split into several vmemranges to accommodate memory holes. A few stubs for libxl_arm.c are created. Signed-off-by: Wei Liu Reviewed-by: Dario Faggioli Cc: Ian Campbell Cc: Ian Jackson Cc: Dario Faggioli Cc: Elena Ufimtseva --- Changes in v5: 1. Allocate array all in one go. 2. Reverse the logic of vmemranges generation. Changes in v4: 1. Adapt to new interface. 2. Address Ian Jackson's comments. Changes in v3: 1. Rewrite commit log. --- tools/libxl/libxl_arch.h | 6 tools/libxl/libxl_arm.c | 8 + tools/libxl/libxl_internal.h | 8 + tools/libxl/libxl_vnuma.c| 41 + tools/libxl/libxl_x86.c | 73 5 files changed, 136 insertions(+) diff --git a/tools/libxl/libxl_arch.h b/tools/libxl/libxl_arch.h index d3bc136..e249048 100644 --- a/tools/libxl/libxl_arch.h +++ b/tools/libxl/libxl_arch.h @@ -27,4 +27,10 @@ int libxl__arch_domain_init_hw_description(libxl__gc *gc, int libxl__arch_domain_finalise_hw_description(libxl__gc *gc, libxl_domain_build_info *info, struct xc_dom_image *dom); + +/* build vNUMA vmemrange with arch specific information */ +int libxl__arch_vnuma_build_vmemrange(libxl__gc *gc, + uint32_t domid, + libxl_domain_build_info *b_info, + libxl__domain_build_state *state); #endif diff --git a/tools/libxl/libxl_arm.c b/tools/libxl/libxl_arm.c index 65a762b..7da254f 100644 --- a/tools/libxl/libxl_arm.c +++ b/tools/libxl/libxl_arm.c @@ -707,6 +707,14 @@ int libxl__arch_domain_finalise_hw_description(libxl__gc *gc, return 0; } +int libxl__arch_vnuma_build_vmemrange(libxl__gc *gc, + uint32_t domid, + libxl_domain_build_info *info, + libxl__domain_build_state *state) +{ +return libxl__vnuma_build_vmemrange_pv_generic(gc, domid, info, state); 
+} + /* * Local variables: * mode: C diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h index 258be0d..7d1e1cf 100644 --- a/tools/libxl/libxl_internal.h +++ b/tools/libxl/libxl_internal.h @@ -3400,6 +3400,14 @@ void libxl__numa_candidate_put_nodemap(libxl__gc *gc, int libxl__vnuma_config_check(libxl__gc *gc, const libxl_domain_build_info *b_info, const libxl__domain_build_state *state); +int libxl__vnuma_build_vmemrange_pv_generic(libxl__gc *gc, +uint32_t domid, +libxl_domain_build_info *b_info, +libxl__domain_build_state *state); +int libxl__vnuma_build_vmemrange_pv(libxl__gc *gc, +uint32_t domid, +libxl_domain_build_info *b_info, +libxl__domain_build_state *state); _hidden int libxl__ms_vm_genid_set(libxl__gc *gc, uint32_t domid, const libxl_ms_vm_genid *id); diff --git a/tools/libxl/libxl_vnuma.c b/tools/libxl/libxl_vnuma.c index 33d7a3c..04672b5 100644 --- a/tools/libxl/libxl_vnuma.c +++ b/tools/libxl/libxl_vnuma.c @@ -14,6 +14,7 @@ */ #include "libxl_osdeps.h" /* must come before any other headers */ #include "libxl_internal.h" +#include "libxl_arch.h" #include /* Sort vmemranges in ascending order with "start" */ @@ -142,6 +143,46 @@ out: return rc; } + +int libxl__vnuma_build_vmemrange_pv_generic(libxl__gc *gc, +uint32_t domid, +libxl_domain_build_info *b_info, +libxl__domain_build_state *state) +{ +int i; +uint64_t next; +xen_vmemrange_t *v = NULL; + +/* Generate one vmemrange for each virtual node. */ +GCREALLOC_ARRAY(v, b_info->num_vnuma_nodes); +next = 0; +for (i = 0; i < b_info->num_vnuma_nodes; i++) { +libxl_vnode_info *p = &b_info->vnuma_nodes[i]; + +v[i].start = next; +v[i].end = next + (p->memkb << 10); +v[i].flags = 0; +v[i].nid = i; + +next = v[i].end; +} + +state->vmemranges = v; +state->num_vmemranges = i; + +return 0; +} + +/* Build vmemranges for PV guest */ +int libxl__vnuma_build_vmemrange_pv(libxl__gc *gc, +uint32_t domid, +l
[Xen-devel] [PATCH v10 2/4] tools/libxc: code refactoring in xc_psr_cmt_get_data
Use calculated array index instead of hardcoded array index. No functional change involved. Signed-off-by: Chao Peng --- tools/libxc/xc_psr.c | 24 +--- 1 file changed, 13 insertions(+), 11 deletions(-) diff --git a/tools/libxc/xc_psr.c b/tools/libxc/xc_psr.c index cfae172..70d9067 100644 --- a/tools/libxc/xc_psr.c +++ b/tools/libxc/xc_psr.c @@ -143,7 +143,7 @@ int xc_psr_cmt_get_data(xc_interface *xch, uint32_t rmid, uint32_t cpu, { xc_resource_op_t op; xc_resource_entry_t entries[2]; -uint32_t evtid; +uint32_t evtid, nr = 0; int rc; switch ( type ) @@ -155,25 +155,27 @@ int xc_psr_cmt_get_data(xc_interface *xch, uint32_t rmid, uint32_t cpu, return -1; } -entries[0].u.cmd = XEN_RESOURCE_OP_MSR_WRITE; -entries[0].idx = MSR_IA32_CMT_EVTSEL; -entries[0].val = (uint64_t)rmid << 32 | evtid; -entries[0].rsvd = 0; +entries[nr].u.cmd = XEN_RESOURCE_OP_MSR_WRITE; +entries[nr].idx = MSR_IA32_CMT_EVTSEL; +entries[nr].val = (uint64_t)rmid << 32 | evtid; +entries[nr].rsvd = 0; +nr++; -entries[1].u.cmd = XEN_RESOURCE_OP_MSR_READ; -entries[1].idx = MSR_IA32_CMT_CTR; -entries[1].val = 0; -entries[1].rsvd = 0; +entries[nr].u.cmd = XEN_RESOURCE_OP_MSR_READ; +entries[nr].idx = MSR_IA32_CMT_CTR; +entries[nr].val = 0; +entries[nr].rsvd = 0; +nr++; op.cpu = cpu; -op.nr_entries = 2; +op.nr_entries = nr; op.entries = entries; rc = xc_resource_op(xch, 1, &op); if ( rc < 0 ) return rc; -if ( op.result !=2 || entries[1].val & IA32_CMT_CTR_ERROR_MASK ) +if ( op.result != nr || entries[1].val & IA32_CMT_CTR_ERROR_MASK ) return -1; *monitor_data = entries[1].val; -- 1.9.1 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
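As a side note on the pattern used in this patch, the sketch below shows the same "running index" idea in isolation, and additionally remembers which slot holds the read request. The types, constants and function are hypothetical stand-ins, not the libxc ones, and the register ids are placeholders rather than real MSR numbers. Note that the patch as posted still reads `entries[1].val` directly; recording the slot is just one possible way to avoid that remaining hardcoded index.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the libxc resource-op types. */
enum demo_cmd { DEMO_MSR_READ, DEMO_MSR_WRITE };

struct demo_entry {
    enum demo_cmd cmd;
    uint32_t idx;
    uint64_t val;
};

#define DEMO_EVTSEL 0x1u  /* placeholder ids, not real MSR numbers */
#define DEMO_CTR    0x2u

/*
 * Fill `e` with a write to the event-select register followed by a read of
 * the counter. Returns the number of entries used and stores the index of
 * the read entry in *ctr_slot. `e` must have room for two entries.
 */
static size_t demo_build_entries(struct demo_entry *e, uint32_t rmid,
                                 uint32_t evtid, size_t *ctr_slot)
{
    size_t nr = 0;

    e[nr].cmd = DEMO_MSR_WRITE;
    e[nr].idx = DEMO_EVTSEL;
    e[nr].val = (uint64_t)rmid << 32 | evtid;
    nr++;

    *ctr_slot = nr;             /* remember where the result will land */
    e[nr].cmd = DEMO_MSR_READ;
    e[nr].idx = DEMO_CTR;
    e[nr].val = 0;
    nr++;

    return nr;
}
```

A caller would then compare the operation's result count against the returned value and read the counter from `e[*ctr_slot].val`.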
[Xen-devel] [PATCH v6 08/23] libxl: add vmemrange to libxl__domain_build_state
A vnode consists of one or more vmemranges (virtual memory range). One example of multiple vmemranges is that there is a hole in one vnode. Currently we haven't exported vmemrange interface to libxl user. Vmemranges are generated during domain build, so we have relevant structures in domain build state. Later if we discover we need to export the interface, those structures can be moved to libxl_domain_build_info as well. These new fields (along with other fields in that struct) are set to 0 at start of day so we don't need to explicitly initialise them. A following patch which introduces an independent checking function will need to access these fields. I don't feel very comfortable squashing this change into that one so I didn't use a single commit. Signed-off-by: Wei Liu Reviewed-by: Dario Faggioli Cc: Ian Campbell Cc: Ian Jackson Cc: Dario Faggioli Cc: Elena Ufimtseva Acked-by: Ian Campbell --- Changes in v5: 1. Fix commit message. Changes in v4: 1. Improve commit message. Changes in v3: 1. Rewrite commit message. --- tools/libxl/libxl_internal.h | 3 +++ 1 file changed, 3 insertions(+) diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h index 934465a..6d3ac58 100644 --- a/tools/libxl/libxl_internal.h +++ b/tools/libxl/libxl_internal.h @@ -973,6 +973,9 @@ typedef struct { libxl__file_reference pv_ramdisk; const char * pv_cmdline; bool pvh_enabled; + +xen_vmemrange_t *vmemranges; +uint32_t num_vmemranges; } libxl__domain_build_state; _hidden int libxl__build_pre(libxl__gc *gc, uint32_t domid, -- 1.9.1 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
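To illustrate the "hole in one vnode" case from the commit message, here is a minimal sketch that lays out one vnode's memory around a single hole. It is not the libxl implementation: the struct is a simplified stand-in for xen_vmemrange_t, the function name is hypothetical, and it assumes at most one hole, given as a half-open range [hole_start, hole_end).

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for xen_vmemrange_t. */
struct demo_vmemrange {
    uint64_t start;
    uint64_t end;   /* exclusive */
    uint32_t nid;   /* virtual node id */
};

/*
 * Place `size` bytes of vnode `nid` starting at `start`, skipping the hole.
 * Writes one or two ranges to `out` and returns how many were written.
 */
static size_t demo_split_vnode(uint64_t start, uint64_t size,
                               uint64_t hole_start, uint64_t hole_end,
                               uint32_t nid, struct demo_vmemrange out[2])
{
    /* If the vnode would start inside the hole, begin after it instead. */
    if (start >= hole_start && start < hole_end)
        start = hole_end;

    /* No overlap with the hole: a single range is enough. */
    if (start >= hole_end || start + size <= hole_start) {
        out[0] = (struct demo_vmemrange){ start, start + size, nid };
        return 1;
    }

    /* The vnode straddles the hole: one range before it, one after. */
    uint64_t before = hole_start - start;
    out[0] = (struct demo_vmemrange){ start, hole_start, nid };
    out[1] = (struct demo_vmemrange){ hole_end, hole_end + (size - before), nid };
    return 2;
}
```

Both ranges carry the same `nid`, which is what makes them part of one vnode, and the total size across the ranges stays equal to the vnode's size.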
Re: [Xen-devel] Shared page tables between ETP and IOMMU issue
On 26/02/15 at 16:57, Jan Beulich wrote:
> On 26.02.15 at 16:45, wrote:
>> While testing PVH Dom0 support on a newer Core i3-5010U I've found that
>> sharing the page tables between EPT and the IOMMUs don't work. Booting
>> with iommu=no-sharept solves the problem, but I'm unsure what causes
>> this issue.
>
> Is FreeBSD fiddling with its own memory map in some way? It's rather
> surprising to see not just an occasional fault, but many of them, and
> with L2 or even L3 entries not present.

No, FreeBSD doesn't touch the physical memory map at all. No ballooning
or anything like that.

> I.e. if it's not the OS
> requesting re-arrangements, I would suppose table setup itself is
> screwed up in some way. In the end - knowing the valid GFN range
> for the guest - you may want to monitor/log how tables get created
> and whether (and if so by whom) later some of the entries get
> zapped.

OK, I will try to take a look. All those faults come from physical memory
ranges that are supposed to be usable, and in fact the CPU seems to be
able to read/write from them without problems, or else the guest would
have crashed much earlier.

Regarding sharing the page tables between EPT and the IOMMU, is there
some bit that needs to be set in the EPT entry in order to mark a page as
accessible to the IOMMU?

Roger.
Re: [Xen-devel] [PATCH v2] RFC: Automatically check xen's public headers for C++ pitfalls.
On 26/02/15 16:28, Tim Deegan wrote:
> At 16:11 + on 26 Feb (1424963496), Tim Deegan wrote:
>> Add a check, like the existing check for non-ANSI C in the public
>> headers, that runs the public headers through a C++ compiler to
>> flag non-C++-friendly constructs.
> Oops, this still has the EFI changes in it. v3, rebased, is on its way.
>
>> Unlike the ANSI C check, we accept GCC-isms (gnu++98), and we also
>> check various tools-only headers.
>>
>> Explicitly _not_ addressing the use of 'private' in various fields,
>> since we'd previously decided not to fix that.
> BTW, ring.h is the only instance of that, so the extra diff to clear
> that up too is pretty small (see below).
>
> Not sure what people think about that though - it might be
> quite a PITA for downstream users of it, though they ought really to
> be using local copies so they can update in a controlled way.

It is basically no effort, won't (directly) break consumers, and will
make the headers fully friendly (other than extern "C", which can be dealt
with using the C++ #include pattern).

+1 throw this in and be done with the incompatibilities for good.

~Andrew > > diff --git a/xen/include/Makefile b/xen/include/Makefile > index d48a642..c7a1d52 100644 > --- a/xen/include/Makefile > +++ b/xen/include/Makefile > @@ -104,8 +104,7 @@ headers.chk: $(PUBLIC_ANSI_HEADERS) Makefile > headers++.chk: $(PUBLIC_HEADERS) Makefile > if $(CXX) -v >/dev/null 2>&1; then \ > for i in $(filter %.h,$^); do \ > - $(CXX) -x c++ -std=gnu++98 -Wall -Werror \ > --D__XEN_TOOLS__ -Dprivate=private_is_a_keyword_in_cpp \ > + $(CXX) -x c++ -std=gnu++98 -Wall -Werror -D__XEN_TOOLS__ \ > -include stdint.h -include public/xen.h \ > -S -o /dev/null $$i || exit 1; \ > echo $$i; \ > diff --git a/xen/include/public/io/ring.h b/xen/include/public/io/ring.h > index 73e13d7..bb13494 100644 > --- a/xen/include/public/io/ring.h > +++ b/xen/include/public/io/ring.h > @@ -111,7 +111,7 @@ struct __name##_sring { > \ > uint8_t msg;\ > } tapif_user; \ > uint8_t pvt_pad[4]; \ > -} private; \ > +} local;\ > uint8_t __pad[44]; \ > union __name##_sring_entry ring[1]; /* variable-length */ \ > }; \ > @@ -156,7 +156,7 @@ typedef struct __name##_back_ring __name##_back_ring_t > #define SHARED_RING_INIT(_s) do { \ > (_s)->req_prod = (_s)->rsp_prod = 0; \ > (_s)->req_event = (_s)->rsp_event = 1; \ > -(void)memset((_s)->private.pvt_pad, 0, sizeof((_s)->private.pvt_pad)); \ > +(void)memset((_s)->local.pvt_pad, 0, sizeof((_s)->local.pvt_pad)); \ > (void)memset((_s)->__pad, 0, sizeof((_s)->__pad)); \ > } while(0) > > > > > ___ > Xen-devel mailing list > Xen-devel@lists.xen.org > http://lists.xen.org/xen-devel ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH v2] RFC: Automatically check xen's public headers for C++ pitfalls.
At 16:47 + on 26 Feb (1424965651), Jan Beulich wrote: > >>> On 26.02.15 at 17:28, wrote: > > At 16:11 + on 26 Feb (1424963496), Tim Deegan wrote: > >> Explicitly _not_ addressing the use of 'private' in various fields, > >> since we'd previously decided not to fix that. > > > > BTW, ring.h is the only instance of that, so the extra diff to clear > > that up too is pretty small (see below). > > > > Not sure what people think about that though - it might be > > quite a PITA for downstream users of it, though they ought really to > > be using local copies so they can update in a controlled way. > > linux-2.6.18-xen.hg always having consumed them (almost) > verbatim, I don't think we should break users not massaging > the headers. I.e. at least make the field name conditional upon > using C vs C++. Something like this? This is the kind of uglification that I would like to avoid, though (and I don't like '#define private pvt' much either). Tim. diff --git a/xen/include/Makefile b/xen/include/Makefile index d48a642..c7a1d52 100644 --- a/xen/include/Makefile +++ b/xen/include/Makefile @@ -104,8 +104,7 @@ headers.chk: $(PUBLIC_ANSI_HEADERS) Makefile headers++.chk: $(PUBLIC_HEADERS) Makefile if $(CXX) -v >/dev/null 2>&1; then \ for i in $(filter %.h,$^); do \ - $(CXX) -x c++ -std=gnu++98 -Wall -Werror \ - -D__XEN_TOOLS__ -Dprivate=private_is_a_keyword_in_cpp \ + $(CXX) -x c++ -std=gnu++98 -Wall -Werror -D__XEN_TOOLS__ \ -include stdint.h -include public/xen.h \ -S -o /dev/null $$i || exit 1; \ echo $$i; \ diff --git a/xen/include/public/io/ring.h b/xen/include/public/io/ring.h index 73e13d7..86fb991 100644 --- a/xen/include/public/io/ring.h +++ b/xen/include/public/io/ring.h @@ -35,6 +35,15 @@ #define xen_wmb() wmb() #endif +#ifdef __cplusplus +/* 'private' is a keyword in C++, so we have to use a different name for + * private state there. Leaving the C name alone to avoid unnecessary + * pain for the existing users. 
*/ +#define XEN_RING_PRIVATE pvt +#else +#define XEN_RING_PRIVATE private +#endif + typedef unsigned int RING_IDX; /* Round a 32-bit unsigned constant down to the nearest power of two. */ @@ -111,7 +120,7 @@ struct __name##_sring { \ uint8_t msg;\ } tapif_user; \ uint8_t pvt_pad[4]; \ -} private; \ +} XEN_RING_PRIVATE; \ uint8_t __pad[44]; \ union __name##_sring_entry ring[1]; /* variable-length */ \ }; \ @@ -156,7 +165,8 @@ typedef struct __name##_back_ring __name##_back_ring_t #define SHARED_RING_INIT(_s) do { \ (_s)->req_prod = (_s)->rsp_prod = 0; \ (_s)->req_event = (_s)->rsp_event = 1; \ -(void)memset((_s)->private.pvt_pad, 0, sizeof((_s)->private.pvt_pad)); \ +(void)memset((_s)->XEN_RING_PRIVATE.pvt_pad, 0, \ + sizeof((_s)->XEN_RING_PRIVATE.pvt_pad)); \ (void)memset((_s)->__pad, 0, sizeof((_s)->__pad)); \ } while(0) ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
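For readers who want to see the conditional-name approach from this diff outside the ring macros, here is a stripped-down sketch. The struct and macro names are made up for the example and the layout is simplified; only the idea (pick the member name per language, and go through the macro everywhere the member is touched) is taken from the patch above.

```c
#include <stdint.h>
#include <string.h>

/*
 * 'private' is a keyword in C++, so the member gets a different name there.
 * C users keep seeing the old name.
 */
#ifdef __cplusplus
#define DEMO_RING_PRIVATE pvt
#else
#define DEMO_RING_PRIVATE private
#endif

/* Simplified, hypothetical shared-ring header. */
struct demo_sring {
    uint32_t req_prod, rsp_prod;
    union {
        uint8_t pvt_pad[4];
    } DEMO_RING_PRIVATE;
};

/* All accesses use the macro, so the code builds as C and as C++. */
#define DEMO_SRING_INIT(_s) do {                                  \
    (_s)->req_prod = (_s)->rsp_prod = 0;                          \
    (void)memset((_s)->DEMO_RING_PRIVATE.pvt_pad, 0,              \
                 sizeof((_s)->DEMO_RING_PRIVATE.pvt_pad));        \
} while (0)
```

The cost Tim points out is visible even here: every access has to spell the macro, whereas renaming the field outright keeps the header plain but changes the name for existing C users.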
[Xen-devel] [PATCH v6 14/23] libxc: allocate memory with vNUMA information for HVM guest
The algorithm is more or less the same as the one used for PV guest. Libxc gets hold of the mapping of vnode to pnode and size of each vnode then allocate memory accordingly. And then the function returns low memory end, high memory end and mmio start to caller. Libxl needs those values to construct vmemranges for that guest. Signed-off-by: Wei Liu Cc: Ian Campbell Cc: Ian Jackson Cc: Dario Faggioli Cc: Elena Ufimtseva --- Changes in v6: 1. Use XEN_NUMA_NO_NODE. 2. Fix a minor bug discovered by Dario. Changes in v5: 1. Use a better loop variable name vnid. Changes in v4: 1. Adapt to new interface. 2. Shorten error message. 3. This patch includes only functional changes. Changes in v3: 1. Rewrite commit log. 2. Add a few code comments. --- tools/libxc/include/xenguest.h | 11 + tools/libxc/xc_hvm_build_x86.c | 102 ++--- 2 files changed, 97 insertions(+), 16 deletions(-) diff --git a/tools/libxc/include/xenguest.h b/tools/libxc/include/xenguest.h index 40bbac8..ff66cb1 100644 --- a/tools/libxc/include/xenguest.h +++ b/tools/libxc/include/xenguest.h @@ -230,6 +230,17 @@ struct xc_hvm_build_args { struct xc_hvm_firmware_module smbios_module; /* Whether to use claim hypercall (1 - enable, 0 - disable). 
*/ int claim_enabled; + +/* vNUMA information*/ +xen_vmemrange_t *vmemranges; +unsigned int nr_vmemranges; +unsigned int *vnode_to_pnode; +unsigned int nr_vnodes; + +/* Out parameters */ +uint64_t lowmem_end; +uint64_t highmem_end; +uint64_t mmio_start; }; /** diff --git a/tools/libxc/xc_hvm_build_x86.c b/tools/libxc/xc_hvm_build_x86.c index ecc3224..fba02fb 100644 --- a/tools/libxc/xc_hvm_build_x86.c +++ b/tools/libxc/xc_hvm_build_x86.c @@ -89,7 +89,8 @@ static int modules_init(struct xc_hvm_build_args *args, } static void build_hvm_info(void *hvm_info_page, uint64_t mem_size, - uint64_t mmio_start, uint64_t mmio_size) + uint64_t mmio_start, uint64_t mmio_size, + struct xc_hvm_build_args *args) { struct hvm_info_table *hvm_info = (struct hvm_info_table *) (((unsigned char *)hvm_info_page) + HVM_INFO_OFFSET); @@ -119,6 +120,10 @@ static void build_hvm_info(void *hvm_info_page, uint64_t mem_size, hvm_info->high_mem_pgend = highmem_end >> PAGE_SHIFT; hvm_info->reserved_mem_pgstart = ioreq_server_pfn(0); +args->lowmem_end = lowmem_end; +args->highmem_end = highmem_end; +args->mmio_start = mmio_start; + /* Finish with the checksum. 
*/ for ( i = 0, sum = 0; i < hvm_info->length; i++ ) sum += ((uint8_t *)hvm_info)[i]; @@ -244,7 +249,7 @@ static int setup_guest(xc_interface *xch, char *image, unsigned long image_size) { xen_pfn_t *page_array = NULL; -unsigned long i, nr_pages = args->mem_size >> PAGE_SHIFT; +unsigned long i, vmemid, nr_pages = args->mem_size >> PAGE_SHIFT; unsigned long target_pages = args->mem_target >> PAGE_SHIFT; uint64_t mmio_start = (1ull << 32) - args->mmio_size; uint64_t mmio_size = args->mmio_size; @@ -258,13 +263,13 @@ static int setup_guest(xc_interface *xch, xen_capabilities_info_t caps; unsigned long stat_normal_pages = 0, stat_2mb_pages = 0, stat_1gb_pages = 0; -int pod_mode = 0; +unsigned int memflags = 0; int claim_enabled = args->claim_enabled; xen_pfn_t special_array[NR_SPECIAL_PAGES]; xen_pfn_t ioreq_server_array[NR_IOREQ_SERVER_PAGES]; - -if ( nr_pages > target_pages ) -pod_mode = XENMEMF_populate_on_demand; +uint64_t total_pages; +xen_vmemrange_t dummy_vmemrange; +unsigned int dummy_vnode_to_pnode; memset(&elf, 0, sizeof(elf)); if ( elf_init(&elf, image, image_size) != 0 ) @@ -276,6 +281,43 @@ static int setup_guest(xc_interface *xch, v_start = 0; v_end = args->mem_size; +if ( nr_pages > target_pages ) +memflags |= XENMEMF_populate_on_demand; + +if ( args->nr_vmemranges == 0 ) +{ +/* Build dummy vnode information */ +dummy_vmemrange.start = 0; +dummy_vmemrange.end = args->mem_size; +dummy_vmemrange.flags = 0; +dummy_vmemrange.nid = 0; +args->nr_vmemranges = 1; +args->vmemranges = &dummy_vmemrange; + +dummy_vnode_to_pnode = XEN_NUMA_NO_NODE; +args->nr_vnodes = 1; +args->vnode_to_pnode = &dummy_vnode_to_pnode; +} +else +{ +if ( nr_pages > target_pages ) +{ +PERROR("Cannot enable vNUMA and PoD at the same time"); +goto error_out; +} +} + +total_pages = 0; +for ( i = 0; i < args->nr_vmemranges; i++ ) +total_pages += ((args->vmemranges[i].end - args->vmemranges[i].start) +>> PAGE_SHIFT); +if ( total_pages != (args->mem_size >> PAGE_SHIFT) ) +{ +PERROR("vNUMA 
memory pages mismatch (
[Xen-devel] [PATCH v6 22/23] xl: introduce xcalloc
Signed-off-by: Wei Liu
Cc: Ian Campbell
Cc: Ian Jackson
---
Changes in v6:
1. Join two lines to make code more compact.
2. Use %zu and drop casting.
---
 tools/libxl/xl_cmdimpl.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index 53c16eb..5b366f2 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -289,6 +289,16 @@ static void *xmalloc(size_t sz) {
     return r;
 }

+static void *xcalloc(size_t n, size_t sz) __attribute__((unused));
+static void *xcalloc(size_t n, size_t sz) {
+    void *r = calloc(n, sz);
+    if (!r) {
+        fprintf(stderr,"xl: Unable to calloc %zu bytes.\n", sz*n);
+        exit(-ERROR_FAIL);
+    }
+    return r;
+}
+
 static void *xrealloc(void *ptr, size_t sz) {
     void *r;
     if (!sz) { free(ptr); return 0; }
--
1.9.1
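For anyone reading this outside the xl tree, the wrapper can be sketched as a standalone unit. This is only a minimal sketch under stated assumptions, not the patch itself: xcalloc_sketch, sum_fresh_array and SKETCH_ERROR_FAIL are made-up names (xl exits with -ERROR_FAIL), and the message prints n and sz separately, which is a deviation from the patch chosen so the text stays meaningful if n * sz would wrap.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for xl's ERROR_FAIL; the real value lives in libxl. */
#define SKETCH_ERROR_FAIL 3

/*
 * Allocate n zero-initialised elements of sz bytes each, or exit.
 * Unlike the patch, the message reports n and sz separately so that it
 * stays meaningful even when n * sz would wrap around.
 */
static void *xcalloc_sketch(size_t n, size_t sz)
{
    void *r = calloc(n, sz);

    if (!r) {
        fprintf(stderr, "xl: Unable to calloc %zu elements of %zu bytes.\n",
                n, sz);
        exit(SKETCH_ERROR_FAIL);
    }
    return r;
}

/* Usage: sum a freshly allocated array; calloc guarantees the sum is 0. */
static unsigned int sum_fresh_array(size_t count)
{
    unsigned int *v;
    unsigned int sum = 0;

    /* calloc(0, ...) may legitimately return NULL, so avoid that case. */
    assert(count > 0);
    v = xcalloc_sketch(count, sizeof(*v));

    for (size_t i = 0; i < count; i++)
        sum += v[i];
    free(v);
    return sum;
}
```

The point of such a wrapper is that callers never have to check for NULL; the trade-off is that a failed allocation terminates the whole tool, which is acceptable for a command-line front end but not for a library.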
[Xen-devel] [PATCH v6 01/23] xen: factor out construct_memop_from_reservation
No functional change.

Signed-off-by: Wei Liu
Cc: Jan Beulich
Cc: Andrew Cooper
---
 xen/common/memory.c | 52 +++-
 1 file changed, 35 insertions(+), 17 deletions(-)

diff --git a/xen/common/memory.c b/xen/common/memory.c
index e84ace9..d24b001 100644
--- a/xen/common/memory.c
+++ b/xen/common/memory.c
@@ -692,11 +692,43 @@ out:
     return rc;
 }

+static int construct_memop_from_reservation(
+               const struct xen_memory_reservation *r,
+               struct memop_args *a)
+{
+    int rc;
+    unsigned int address_bits;
+
+    a->extent_list = r->extent_start;
+    a->nr_extents = r->nr_extents;
+    a->extent_order = r->extent_order;
+    a->memflags = 0;
+
+    address_bits = XENMEMF_get_address_bits(r->mem_flags);
+    if ( (address_bits != 0) &&
+         (address_bits < (get_order_from_pages(max_page) + PAGE_SHIFT)) )
+    {
+        if ( address_bits <= PAGE_SHIFT )
+        {
+            rc = -EINVAL;
+            goto out;
+        }
+        a->memflags = MEMF_bits(address_bits);
+    }
+
+    a->memflags |= MEMF_node(XENMEMF_get_node(r->mem_flags));
+    if ( r->mem_flags & XENMEMF_exact_node_request )
+        a->memflags |= MEMF_exact_node;
+
+    rc = 0;
+ out:
+    return rc;
+}
+
 long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
 {
     struct domain *d;
     long rc;
-    unsigned int address_bits;
     struct xen_memory_reservation reservation;
     struct memop_args args;
     domid_t domid;
@@ -718,25 +750,11 @@ long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
         if ( unlikely(start_extent >= reservation.nr_extents) )
             return start_extent;

-        args.extent_list = reservation.extent_start;
-        args.nr_extents = reservation.nr_extents;
-        args.extent_order = reservation.extent_order;
         args.nr_done = start_extent;
         args.preempted = 0;
-        args.memflags = 0;

-        address_bits = XENMEMF_get_address_bits(reservation.mem_flags);
-        if ( (address_bits != 0) &&
-             (address_bits < (get_order_from_pages(max_page) + PAGE_SHIFT)) )
-        {
-            if ( address_bits <= PAGE_SHIFT )
-                return start_extent;
-            args.memflags = MEMF_bits(address_bits);
-        }
-
-        args.memflags |= MEMF_node(XENMEMF_get_node(reservation.mem_flags));
-        if ( reservation.mem_flags & XENMEMF_exact_node_request )
-            args.memflags |= MEMF_exact_node;
+        if ( construct_memop_from_reservation(&reservation, &args) )
+            return start_extent;

         if ( op == XENMEM_populate_physmap
              && (reservation.mem_flags & XENMEMF_populate_on_demand) )
--
1.9.1
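The "no functional change" claim rests on one detail: the helper now reports -EINVAL where the inline code used to return start_extent directly, and the caller turns any non-zero result back into start_extent. A minimal sketch of that pattern follows, assuming mocked-up types and constants; every sketch_/SKETCH_ name, the 40-bit host width and the flag encodings are invented for illustration and are not Xen's definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Mock constants and layouts for illustration only; not Xen's definitions. */
#define SKETCH_PAGE_SHIFT          12u
#define SKETCH_MACHINE_ADDR_BITS   40u   /* pretend host address width */
#define SKETCH_GET_ADDRESS_BITS(f) ((f) & 0xffu)
#define SKETCH_MEMF_BITS(n)        ((n) << 24)

struct sketch_reservation { unsigned long nr_extents; unsigned int mem_flags; };
struct sketch_memop_args  { unsigned long nr_extents; unsigned int memflags; };

/* Helper: translate a reservation into args; 0 on success, -EINVAL if not. */
static int sketch_construct_memop(const struct sketch_reservation *r,
                                  struct sketch_memop_args *a)
{
    unsigned int address_bits;

    assert(r != NULL && a != NULL);
    address_bits = SKETCH_GET_ADDRESS_BITS(r->mem_flags);

    a->nr_extents = r->nr_extents;
    a->memflags = 0;

    if ( address_bits != 0 && address_bits < SKETCH_MACHINE_ADDR_BITS )
    {
        /* A width that cannot even cover one page is rejected. */
        if ( address_bits <= SKETCH_PAGE_SHIFT )
            return -EINVAL;
        a->memflags = SKETCH_MEMF_BITS(address_bits);
    }
    return 0;
}

/* Caller: any helper failure becomes "no progress", i.e. start_extent. */
static long sketch_do_memory_op(const struct sketch_reservation *r,
                                unsigned long start_extent)
{
    struct sketch_memop_args args;

    if ( sketch_construct_memop(r, &args) )
        return (long)start_extent;
    return (long)args.nr_extents;   /* pretend every extent was processed */
}
```

With this shape the error code stays private to the helper, so the value seen by the hypercall's caller is the same before and after the refactoring.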
[Xen-devel] [PATCH v10 4/4] tools, docs: add total/local memory bandwidth monitoring
Add Memory Bandwidth Monitoring(MBM) for VMs. Two types of monitoring are supported: total and local memory bandwidth monitoring. To use it, CMT should be enabled in hypervisor. Signed-off-by: Chao Peng --- Changes in v10: 1. Move refactoring code into standalone patch. 2. Create generic interface libxl_psr_cmt_get_sample for both cache_occupancy and memory bandwith. Changes in v9: 1. Refactor code in xc_psr_cmt_get_data. 2. Move bandwidth calculation(sleep) from libxl to xl. 3. Broadcast feature with LIBXL_HAVE_PSR_MBM. 4. Check event mask with libxl_psr_cmt_type_supported. 5. Coding style/Document fix. Changes in v6: 1. Remove DISABLE_IRQ flag as hypervisor disable IRQ for MSR_IA32_TSC implicitly. Changes in v5: 1. Add MBM description in xen command line. 2. Use the tsc from hypervisor directly which is already ns. 3. Call resource_op with DISABLE_IRQ flag. Changes in v4: 1. Get timestamp from hypervisor and use that for bandwidth calculation. 2. Minor document and coding style fix. --- docs/man/xl.pod.1 | 11 +- docs/misc/xen-command-line.markdown | 3 ++ tools/libxc/include/xenctrl.h | 6 +++- tools/libxc/xc_msr_x86.h| 1 + tools/libxc/xc_psr.c| 44 +-- tools/libxl/libxl.h | 17 + tools/libxl/libxl_psr.c | 56 +++-- tools/libxl/libxl_types.idl | 2 ++ tools/libxl/xl_cmdimpl.c| 72 + tools/libxl/xl_cmdtable.c | 4 ++- 10 files changed, 195 insertions(+), 21 deletions(-) diff --git a/docs/man/xl.pod.1 b/docs/man/xl.pod.1 index 6b89ba8..cd80ffc 100644 --- a/docs/man/xl.pod.1 +++ b/docs/man/xl.pod.1 @@ -1461,6 +1461,13 @@ is domain level. To monitor a specific domain, just attach the domain id with the monitoring service. When the domain doesn't need to be monitored any more, detach the domain id from the monitoring service. +Intel Broadwell and later server platforms also offer total/local memory +bandwidth monitoring. Xen supports per-domain monitoring for these two +additional monitoring types. 
Both memory bandwidth monitoring and L3 cache +occupancy monitoring share the same set of underlying monitoring service. Once +a domain is attached to the monitoring service, monitoring data can be showed +for any of these monitoring types. + =over 4 =item B [I] @@ -1475,7 +1482,9 @@ detach: Detach the platform shared resource monitoring service from a domain. Show monitoring data for a certain domain or all domains. Current supported monitor types are: - - "cache-occupancy": showing the L3 cache occupancy. + - "cache-occupancy": showing the L3 cache occupancy(KB). + - "total-mem-bandwidth": showing the total memory bandwidth(KB/s). + - "local-mem-bandwidth": showing the local memory bandwidth(KB/s). =back diff --git a/docs/misc/xen-command-line.markdown b/docs/misc/xen-command-line.markdown index bc316be..a09ec01 100644 --- a/docs/misc/xen-command-line.markdown +++ b/docs/misc/xen-command-line.markdown @@ -1097,6 +1097,9 @@ The following resources are available: L3 cache occupancy. * `cmt` instructs Xen to enable/disable Cache Monitoring Technology. * `rmid_max` indicates the max value for rmid. +* Memory Bandwidth Monitoring (Broadwell and later). Information regarding the + total/local memory bandwidth. Follow the same options with Cache Monitoring + Technology. 
### reboot > `= t[riple] | k[bd] | a[cpi] | p[ci] | n[o] [, [w]arm | [c]old]` diff --git a/tools/libxc/include/xenctrl.h b/tools/libxc/include/xenctrl.h index 09d819f..54043ee 100644 --- a/tools/libxc/include/xenctrl.h +++ b/tools/libxc/include/xenctrl.h @@ -2688,6 +2688,8 @@ int xc_resource_op(xc_interface *xch, uint32_t nr_ops, xc_resource_op_t *ops); #if defined(__i386__) || defined(__x86_64__) enum xc_psr_cmt_type { XC_PSR_CMT_L3_OCCUPANCY, +XC_PSR_CMT_TOTAL_MEM_BANDWIDTH, +XC_PSR_CMT_LOCAL_MEM_BANDWIDTH, }; typedef enum xc_psr_cmt_type xc_psr_cmt_type; int xc_psr_cmt_attach(xc_interface *xch, uint32_t domid); @@ -2697,10 +2699,12 @@ int xc_psr_cmt_get_domain_rmid(xc_interface *xch, uint32_t domid, int xc_psr_cmt_get_total_rmid(xc_interface *xch, uint32_t *total_rmid); int xc_psr_cmt_get_l3_upscaling_factor(xc_interface *xch, uint32_t *upscaling_factor); +int xc_psr_cmt_get_l3_event_mask(xc_interface *xch, uint32_t *event_mask); int xc_psr_cmt_get_l3_cache_size(xc_interface *xch, uint32_t cpu, uint32_t *l3_cache_size); int xc_psr_cmt_get_data(xc_interface *xch, uint32_t rmid, uint32_t cpu, -uint32_t psr_cmt_type, uint64_t *monitor_data); +uint32_t psr_cmt_type, uint64_t *monitor_data, +uint64_t *tsc); int xc_psr_cmt_enabled(xc_interface *xch); #endif diff --git a/tools/libxc/xc_msr_x86.h b/tools/libxc/xc_msr_x86.h index 7c3e1a3..7
Re: [Xen-devel] Shared page tables between EPT and IOMMU issue
>>> On 26.02.15 at 17:29, wrote:
> OK, I will try to take a look. All those faults come from physical
> memory ranges that are supposed to be usable, and in fact the CPU seems
> to be able to read/write from them without problems, or else the guest
> would have crashed much more early. Regarding sharing the page tables
> between EPT and the IOMMU, is there some bit that needs to be set in the
> ept entry in order to mark a page as available by the IOMMU?

Bits 0 and 1 (read and write) are shared between VT-d and EPT (as
is bit 7 - see struct dma_pte and ept_entry_t).

Jan
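To make the reply above concrete, here is a small sketch of how one might test a shared entry for IOMMU accessibility. The bit positions 0, 1 and 7 come from the reply; the macro and function names are hypothetical, and reading bit 7 as the superpage / page-size bit is my assumption rather than something stated in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; the bit positions are the ones named in the reply. */
#define SKETCH_PTE_READ   (UINT64_C(1) << 0)
#define SKETCH_PTE_WRITE  (UINT64_C(1) << 1)
#define SKETCH_PTE_BIT7   (UINT64_C(1) << 7)  /* assumed: superpage bit */

/*
 * Returns true if a device access through a shared entry would be
 * permitted: reads need bit 0, writes need bit 0 and bit 1 in this sketch.
 */
static bool sketch_iommu_can_access(uint64_t entry, bool want_write)
{
    uint64_t needed = SKETCH_PTE_READ;

    if (want_write)
        needed |= SKETCH_PTE_WRITE;
    return (entry & needed) == needed;
}

/* Returns true if the entry maps a large page rather than a next level. */
static bool sketch_is_superpage(uint64_t entry)
{
    assert((SKETCH_PTE_BIT7 & (SKETCH_PTE_READ | SKETCH_PTE_WRITE)) == 0);
    return (entry & SKETCH_PTE_BIT7) != 0;
}
```

An entry that only carries bits outside this shared set (for example an EPT-only permission) would therefore look inaccessible to the IOMMU, which is one thing to check when chasing faults on ranges the CPU can use.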
Re: [Xen-devel] [PATCH 1/1] xen-netback: remove compilation warning
From: pedro
Date: Thu, 26 Feb 2015 09:25:41 +0100

> From: pmarzo
>
> offset and size are of type uint16_t so the %lu gives a warning
> A %u specifier, the same used in size makes gcc happy
> Not sure if a %x would be more correct
>
> Signed-off-by: Pedro Marzo Perez

This patch actually adds a warning on my machine, and your analysis of
the types is therefore probably incorrect:

drivers/net/xen-netback/netback.c: In function ‘xenvif_tx_build_gops’:
drivers/net/xen-netback/netback.c:1259:8: warning: format ‘%u’ expects argument of type ‘unsigned int’, but argument 5 has type ‘long unsigned int’ [-Wformat=]

The issue is probably "~PAGE_MASK" and I think the type of that
propagates into the type of the overall calculation.
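The type propagation suspected above can be checked in isolation. The following is a sketch under assumptions: the SKETCH_ macros imitate the common kernel pattern where the page size is built from 1UL (on configurations that define it differently the result may differ), and the helper name is invented. It relies on C11 _Generic.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed to mirror the usual kernel definitions; names are local stand-ins. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  (1UL << SKETCH_PAGE_SHIFT)
#define SKETCH_PAGE_MASK  (~(SKETCH_PAGE_SIZE - 1))

/* Evaluates to 1 if the expression has type unsigned long, else 0 (C11). */
#define IS_UNSIGNED_LONG(expr) _Generic((expr), unsigned long: 1, default: 0)

/* Formats the in-page part of offset with the specifier its type calls for. */
static int format_in_page_offset(char *buf, size_t len, uint16_t offset)
{
    assert(buf != NULL);
    /* uint16_t & unsigned long yields unsigned long, hence %lu, not %u. */
    return snprintf(buf, len, "%lu", offset & ~SKETCH_PAGE_MASK);
}
```

So even though offset itself is 16 bits wide, the usual arithmetic conversions widen the whole expression to the type of the mask, and the format specifier has to follow the expression, not the variable.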
Re: [Xen-devel] [PATCH v6 11/23] libxl: functions to build vmemranges for PV guest
On Thu, 2015-02-26 at 15:55 +, Wei Liu wrote:
> Introduce a arch-independent routine to generate one vmemrange per
> vnode. Also introduce arch-dependent routines for different
> architectures because part of the process is arch-specific -- ARM has
> yet have NUMA support and E820 is x86 only.
>
> For those x86 guests who care about machine E820 map (i.e. with
> e820_host=1), vnode is further split into several vmemranges to
> accommodate memory holes. A few stubs for libxl_arm.c are created.
>
> Signed-off-by: Wei Liu
> Reviewed-by: Dario Faggioli
> Cc: Ian Campbell
> Cc: Ian Jackson
> Cc: Dario Faggioli
> Cc: Elena Ufimtseva

> diff --git a/tools/libxl/libxl_vnuma.c b/tools/libxl/libxl_vnuma.c
> index 33d7a3c..04672b5 100644
> --- a/tools/libxl/libxl_vnuma.c
> +++ b/tools/libxl/libxl_vnuma.c
> @@ -14,6 +14,7 @@
>   */
>  #include "libxl_osdeps.h" /* must come before any other headers */
>  #include "libxl_internal.h"
> +#include "libxl_arch.h"
>  #include
>
>  /* Sort vmemranges in ascending order with "start" */
> @@ -142,6 +143,46 @@ out:
>      return rc;
>  }
>
> +
Aren't you adding an extra, unnecessary, blank line here?

> +int libxl__vnuma_build_vmemrange_pv_generic(libxl__gc *gc,
> +                                            uint32_t domid,
> +                                            libxl_domain_build_info *b_info,
> +                                            libxl__domain_build_state *state)
>
Of course, my Reviewed-by still stands... I just noticed this while
having a quick look. So, if you happen to have to resend... :-)

Regards,
Dario
[Xen-devel] [PATCH v10 3/4] tools/libxl: code refactoring for MBM
Make some internal routines common so that total/local memory bandwidth monitoring in the next patch can make use of them. Signed-off-by: Chao Peng Acked-by: Wei Liu --- Changes in v10: 1. Merge libxl change into next patch. 2. Minor function name changes to make them more generic. --- tools/libxl/xl_cmdimpl.c | 54 +--- 1 file changed, 33 insertions(+), 21 deletions(-) diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c index 8b41093..846a4b2 100644 --- a/tools/libxl/xl_cmdimpl.c +++ b/tools/libxl/xl_cmdimpl.c @@ -7822,8 +7822,9 @@ out: } #ifdef LIBXL_HAVE_PSR_CMT -static void psr_cmt_print_domain_cache_occupancy(libxl_dominfo *dominfo, - uint32_t nr_sockets) +static void psr_cmt_print_domain_info(libxl_dominfo *dominfo, + libxl_psr_cmt_type type, + uint32_t nr_sockets) { char *domain_name; uint32_t socketid; @@ -7837,15 +7838,23 @@ static void psr_cmt_print_domain_cache_occupancy(libxl_dominfo *dominfo, free(domain_name); for (socketid = 0; socketid < nr_sockets; socketid++) { -if (!libxl_psr_cmt_get_cache_occupancy(ctx, dominfo->domid, socketid, - &l3_cache_occupancy)) -printf("%13u KB", l3_cache_occupancy); +switch (type) { +case LIBXL_PSR_CMT_TYPE_CACHE_OCCUPANCY: +if (!libxl_psr_cmt_get_cache_occupancy(ctx, + dominfo->domid, + socketid, + &l3_cache_occupancy)) +printf("%13u KB", l3_cache_occupancy); +break; +default: +return; +} } printf("\n"); } -static int psr_cmt_show_cache_occupancy(uint32_t domid) +static int psr_cmt_show(libxl_psr_cmt_type type, uint32_t domid) { uint32_t i, socketid, nr_sockets, total_rmid; uint32_t l3_cache_size; @@ -7881,19 +7890,22 @@ static int psr_cmt_show_cache_occupancy(uint32_t domid) printf("%14s %d", "Socket", socketid); printf("\n"); -/* Total L3 cache size */ -printf("%-46s", "Total L3 Cache Size"); -for (socketid = 0; socketid < nr_sockets; socketid++) { -rc = libxl_psr_cmt_get_l3_cache_size(ctx, socketid, &l3_cache_size); -if (rc < 0) { -fprintf(stderr, -"Failed to get system l3 cache size for socket:%d\n", 
-socketid); -return -1; -} -printf("%13u KB", l3_cache_size); +if (type == LIBXL_PSR_CMT_TYPE_CACHE_OCCUPANCY) { +/* Total L3 cache size */ +printf("%-46s", "Total L3 Cache Size"); +for (socketid = 0; socketid < nr_sockets; socketid++) { +rc = libxl_psr_cmt_get_l3_cache_size(ctx, socketid, + &l3_cache_size); +if (rc < 0) { +fprintf(stderr, +"Failed to get system l3 cache size for socket:%d\n", +socketid); +return -1; +} +printf("%13u KB", l3_cache_size); +} +printf("\n"); } -printf("\n"); /* Each domain */ if (domid != INVALID_DOMID) { @@ -7902,7 +7914,7 @@ static int psr_cmt_show_cache_occupancy(uint32_t domid) fprintf(stderr, "Failed to get domain info for %d\n", domid); return -1; } -psr_cmt_print_domain_cache_occupancy(&dominfo, nr_sockets); +psr_cmt_print_domain_info(&dominfo, type, nr_sockets); } else { @@ -7912,7 +7924,7 @@ static int psr_cmt_show_cache_occupancy(uint32_t domid) return -1; } for (i = 0; i < nr_domains; i++) -psr_cmt_print_domain_cache_occupancy(list + i, nr_sockets); +psr_cmt_print_domain_info(list + i, type, nr_sockets); libxl_dominfo_list_free(list, nr_domains); } return 0; @@ -7971,7 +7983,7 @@ int main_psr_cmt_show(int argc, char **argv) switch (type) { case LIBXL_PSR_CMT_TYPE_CACHE_OCCUPANCY: -ret = psr_cmt_show_cache_occupancy(domid); +ret = psr_cmt_show(type, domid); break; default: help("psr-cmt-show"); -- 1.9.1 ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH 4/4] xen: add Xen pvUSB maintainer
On February 26, 2015 8:35:17 AM EST, Juergen Gross wrote:
>Add myself as maintainer for the Xen pvUSB stuff.
>
>Signed-off-by: Juergen Gross
>---
> MAINTAINERS | 8
> 1 file changed, 8 insertions(+)
>
>diff --git a/MAINTAINERS b/MAINTAINERS
>index ddc5a8c..8ec1e1f 100644
>--- a/MAINTAINERS
>+++ b/MAINTAINERS
>@@ -10787,6 +10787,14 @@ F:drivers/scsi/xen-scsifront.c
> F:drivers/xen/xen-scsiback.c
> F:include/xen/interface/io/vscsiif.h
>
>+XEN PVUSB DRIVERS
>+M:Juergen Gross
>+L:xen-de...@lists.xenproject.org (moderated for non-subscribers)
>+L:linux-...@vger.kernel.org
>+S:Supported
>+F:divers/usb/xen/
>+F:include/xen/interface/io/usbif.h

Acked-by: Konrad Rzeszutek Wilk

On the include/Xen/... part.

>+
> XEN SWIOTLB SUBSYSTEM
> M:Konrad Rzeszutek Wilk
> L:xen-de...@lists.xenproject.org (moderated for non-subscribers)
Re: [Xen-devel] [PATCH 1/5] x86: allow specifying the NUMA nodes Dom0 should run on
On Thu, 2015-02-26 at 13:52 +, Jan Beulich wrote: > ... by introducing a "dom0_nodes" option augmenting the "dom0_mem" and > "dom0_max_vcpus" ones. > > Note that this gives meaning to MEMF_exact_node specified alone (i.e. > implicitly combined with NUMA_NO_NODE): In such a case any node inside > the domain's node mask is acceptable, but no other node. This changed > behavior is (implicitly) being exposed through the memop hypercalls. > > Note further that this change doesn't take care of moving the initrd > image into memory matching Dom0's affinity when the initrd doesn't get > copied (because of being part of the initial mapping) anyway. > > Signed-off-by: Jan Beulich > Reviewed-by: Dario Faggioli Just a couple of questions/comments. > --- > I'm uncertain whether range restricting the PXMs permitted for Dom0 is > the right approach (matching what other NUMA code did until recently), > or whether we would instead want to simply limit the number of PXMs we > can handler there (i.e. using a static array instead of a static > bitmap). > FWIW, I think the approach taken in the patch is ok. > --- a/docs/misc/xen-command-line.markdown > +++ b/docs/misc/xen-command-line.markdown > @@ -540,6 +540,15 @@ any dom0 autoballooning feature present > _xl.conf(5)_ man page or [Xen Best > > Practices](http://wiki.xen.org/wiki/Xen_Best_Practices#Xen_dom0_dedicated_memory_and_preventing_dom0_memory_ballooning). > > +### dom0\_nodes > + > +> `= [,...]` > + > +Specify the NUMA nodes to place Dom0 on. Defaults for vCPU-s created > +and memory assigned to Dom0 will be adjusted to match the node > +restrictions set up here. Note that the values to be specified here are > +ACPI PXM ones, not Xen internal node numbers. > + Why use PXM ids? 
It might be me being much more used to work with NUMA node ids, but wouldn't the other way round be more consistent (almost everything the user interacts with after boot speak node ids) and easier for the user to figure things out (e.g., with tools like numactl on baremetal)? > --- a/xen/arch/x86/domain_build.c > +++ b/xen/arch/x86/domain_build.c > +static struct vcpu *__init setup_vcpu(struct domain *d, unsigned int vcpu_id, > + unsigned int cpu) > +{ > +struct vcpu *v = alloc_vcpu(d, vcpu_id, cpu); > + > +if ( v ) > +{ > +if ( !d->is_pinned ) > +cpumask_copy(v->cpu_hard_affinity, &dom0_cpus); > +cpumask_copy(v->cpu_soft_affinity, &dom0_cpus); > +} > + About this, for DomUs, now that we have soft affinity available, what we do is set only soft affinity to match the NUMA placement. I think I see and agree why we want to be 'more strict' in Dom0, but I felt like it was worth to point out the difference in behaviour (should it be documented somewhere?). Regards, Dario BTW, mostly out of curiosity, I've had a few strange issues/conflicts in applying this on top of staging, in order to test it... Was it me doing something very stupid, or was this based on something different? signature.asc Description: This is a digitally signed message part ___ Xen-devel mailing list Xen-devel@lists.xen.org http://lists.xen.org/xen-devel
Re: [Xen-devel] RFC: xen config changes v4
On 26/02/15 04:59, Juergen Gross wrote:
>
> So we are again in the situation that pv-drivers always imply the pvops
> kernel (PARAVIRT selected). I started the whole Kconfig rework to
> eliminate this dependency.

Yes. Can you produce a series that just addresses this one issue?

In the absence of any concrete requirement for this big Kconfig reorg I
don't think it is helpful.

David
[Xen-devel] [PATCH v6 00/23] Virtual NUMA for PV and HVM
Hi all

This is version 6 of this series rebased on top of staging.

This patch series implements virtual NUMA support for both PV and HVM
guests. That is, the admin can configure via libxl what virtual NUMA
topology the guest sees.

This is stage 1 (basic vNUMA support) and part of stage 2 (vNUMA-aware
ballooning, hypervisor side) described in my previous email to
xen-devel [0].

This series is broken into several parts:

1. xen patches: vNUMA debug output and vNUMA-aware memory hypercall support.
2. libxc/libxl support for PV vNUMA.
3. libxc/libxl/hypervisor support for HVM vNUMA.
4. xl vNUMA configuration documentation and parser.

One significant difference from Elena's work is that this patch series
makes use of multiple vmemranges should there be a memory hole, instead
of shrinking RAM. This matches the behaviour of real hardware.

The vNUMA auto placement algorithm is missing at the moment and Dario is
working on it.

This series can be found at:
 git://xenbits.xen.org/people/liuw/xen.git wip.vnuma-v5

With this series, the following configuration can be used to enable
virtual NUMA support, and it works for both PV and HVM guests.

vnuma = [ [ "pnode=0","size=3000","vcpus=0-3","vdistances=10,20" ],
          [ "pnode=0","size=3000","vcpus=4-7","vdistances=20,10" ],
        ]

For example output of guest NUMA information, please look at [1].

In terms of libxl / libxc internals, things are broken into several
parts:

1. libxl interface

Users of libxl can only specify how many vnodes a guest can have, but
currently they have no control over the actual memory layout. Note that
it's fairly easy to export the interface to control memory layout in the
future.

2. libxl internal

It generates some internal vNUMA configurations when building the domain,
then transforms them into libxc representations. It also validates the
vNUMA configuration along the way.

3. libxc internal

Libxc does what it's told to do. It doesn't do anything smart (in fact,
I deliberately didn't put any smart logic inside it).
Libxc will also report back some information in HVM case to libxl but that's it. Wei. [0] <2014173606.gc21...@zion.uk.xensource.com> [1] <1416582421-10789-1-git-send-email-wei.l...@citrix.com> Wei Liu (23): xen: factor out construct_memop_from_reservation xen: move NUMA_NO_NODE to public memory.h as XEN_NUMA_NO_NODE xen: make two memory hypercalls vNUMA-aware libxc: duplicate snippet to allocate p2m_host array libxc: add p2m_size to xc_dom_image libxc: allocate memory with vNUMA information for PV guest libxl: introduce vNUMA types libxl: add vmemrange to libxl__domain_build_state libxl: introduce libxl__vnuma_config_check libxl: x86: factor out e820_host_sanitize libxl: functions to build vmemranges for PV guest libxl: build, check and pass vNUMA info to Xen for PV guest libxc: indentation change to xc_hvm_build_x86.c libxc: allocate memory with vNUMA information for HVM guest libxl: build, check and pass vNUMA info to Xen for HVM guest libxl: disallow memory relocation when vNUMA is enabled libxl: define LIBXL_HAVE_VNUMA libxlu: rework internal representation of setting libxlu: nested list support libxlu: record line and column number when parsing values libxlu: introduce new APIs xl: introduce xcalloc xl: vNUMA support docs/man/xl.cfg.pod.5| 54 +++ tools/libxc/include/xc_dom.h | 13 +- tools/libxc/include/xenguest.h | 11 ++ tools/libxc/xc_dom_arm.c | 1 + tools/libxc/xc_dom_core.c| 8 +- tools/libxc/xc_dom_x86.c | 129 +--- tools/libxc/xc_hvm_build_x86.c | 237 +++-- tools/libxl/Makefile | 2 +- tools/libxl/libxl.h | 7 + tools/libxl/libxl_arch.h | 6 + tools/libxl/libxl_arm.c | 8 + tools/libxl/libxl_create.c | 9 ++ tools/libxl/libxl_dm.c | 6 +- tools/libxl/libxl_dom.c | 120 +++ tools/libxl/libxl_internal.h | 24 +++ tools/libxl/libxl_types.idl | 10 ++ tools/libxl/libxl_vnuma.c| 253 +++ tools/libxl/libxl_x86.c | 105 +++-- tools/libxl/libxlu_cfg.c | 209 ++--- tools/libxl/libxlu_cfg_i.h | 14 +- tools/libxl/libxlu_cfg_y.c | 72 - tools/libxl/libxlu_cfg_y.h | 2 +- 
tools/libxl/libxlu_cfg_y.y | 18 ++- tools/libxl/libxlu_internal.h| 24 ++- tools/libxl/libxlutil.h | 13 ++ tools/libxl/xl_cmdimpl.c | 150 +- xen/arch/x86/hpet.c | 2 +- xen/arch/x86/irq.c | 4 +- xen/arch/x86/numa.c | 14 +- xen/arch/x86/physdev.c | 2 +- x
[Xen-devel] [PATCH v2] RFC: Automatically check xen's public headers for C++ pitfalls.
Add a check, like the existing check for non-ANSI C in the public headers, that runs the public headers through a C++ compiler to flag non-C++-friendly constructs. Unlike the ANSI C check, we accept GCC-isms (gnu++98), and we also check various tools-only headers. Explicitly _not_ addressing the use of 'private' in various fields, since we'd previously decided not to fix that. Also tidy up the runes for these checks to be a bit more readable. Reported-by: Razvan Cojocaru Signed-off-by: Tim Deegan Cc: Jan Beulich --- v2: test more headers; define __XEN_TOOLS__; use g++98 rather than ansi; tidy the makefile for readability; add a missing include to flask_op.h, which uses evtchn_port_t. --- .gitignore| 1 + config/StdGNU.mk | 2 ++ config/SunOS.mk | 1 + xen/include/Makefile | 28 xen/include/public/platform.h | 39 ++- xen/include/public/xsm/flask_op.h | 2 ++ 6 files changed, 52 insertions(+), 21 deletions(-) diff --git a/.gitignore b/.gitignore index 13ee05b..78958ea 100644 --- a/.gitignore +++ b/.gitignore @@ -233,6 +233,7 @@ xen/arch/*/efi/compat.c xen/arch/*/efi/efi.h xen/arch/*/efi/runtime.c xen/include/headers.chk +xen/include/headers++.chk xen/include/asm xen/include/asm-*/asm-offsets.h xen/include/compat/* diff --git a/config/StdGNU.mk b/config/StdGNU.mk index 4efebe3..e10ed39 100644 --- a/config/StdGNU.mk +++ b/config/StdGNU.mk @@ -2,9 +2,11 @@ AS = $(CROSS_COMPILE)as LD = $(CROSS_COMPILE)ld ifeq ($(clang),y) CC = $(CROSS_COMPILE)clang +CXX= $(CROSS_COMPILE)clang++ LD_LTO = $(CROSS_COMPILE)llvm-ld else CC = $(CROSS_COMPILE)gcc +CXX= $(CROSS_COMPILE)g++ LD_LTO = $(CROSS_COMPILE)ld endif CPP= $(CC) -E diff --git a/config/SunOS.mk b/config/SunOS.mk index 3316280..c2be37d 100644 --- a/config/SunOS.mk +++ b/config/SunOS.mk @@ -2,6 +2,7 @@ AS = $(CROSS_COMPILE)gas LD = $(CROSS_COMPILE)gld CC = $(CROSS_COMPILE)gcc CPP= $(CROSS_COMPILE)gcc -E +CXX= $(CROSS_COMPILE)g++ AR = $(CROSS_COMPILE)gar RANLIB = $(CROSS_COMPILE)granlib NM = $(CROSS_COMPILE)gnm diff --git 
a/xen/include/Makefile b/xen/include/Makefile index 94112d1..d48a642 100644 --- a/xen/include/Makefile +++ b/xen/include/Makefile @@ -87,13 +87,33 @@ compat/xlat.h: $(addprefix compat/.xlat/,$(xlat-y)) Makefile ifeq ($(XEN_TARGET_ARCH),$(XEN_COMPILE_ARCH)) -all: headers.chk +all: headers.chk headers++.chk -headers.chk: $(filter-out public/arch-% public/%ctl.h public/xsm/% public/%hvm/save.h, $(wildcard public/*.h public/*/*.h) $(public-y)) Makefile - for i in $(filter %.h,$^); do $(CC) -ansi -include stdint.h -Wall -W -Werror -S -o /dev/null -x c $$i || exit 1; echo $$i; done >$@.new +PUBLIC_HEADERS := $(filter-out public/arch-% public/dom0_ops.h, $(wildcard public/*.h public/*/*.h) $(public-y)) + +PUBLIC_ANSI_HEADERS := $(filter-out public/%ctl.h public/xsm/% public/%hvm/save.h, $(PUBLIC_HEADERS)) + +headers.chk: $(PUBLIC_ANSI_HEADERS) Makefile + for i in $(filter %.h,$^); do \ + $(CC) -x c -ansi -Wall -Werror -include stdint.h \ + -S -o /dev/null $$i || exit 1; \ + echo $$i; \ + done >$@.new + mv $@.new $@ + +headers++.chk: $(PUBLIC_HEADERS) Makefile + if $(CXX) -v >/dev/null 2>&1; then \ + for i in $(filter %.h,$^); do \ + $(CXX) -x c++ -std=gnu++98 -Wall -Werror \ + -D__XEN_TOOLS__ -Dprivate=private_is_a_keyword_in_cpp \ + -include stdint.h -include public/xen.h \ + -S -o /dev/null $$i || exit 1; \ + echo $$i; \ + done ; \ + fi >$@.new mv $@.new $@ endif clean:: - rm -rf compat headers.chk + rm -rf compat headers.chk headers++.chk diff --git a/xen/include/public/platform.h b/xen/include/public/platform.h index 3e340b4..dd03447 100644 --- a/xen/include/public/platform.h +++ b/xen/include/public/platform.h @@ -126,6 +126,26 @@ DEFINE_XEN_GUEST_HANDLE(xenpf_platform_quirk_t); #define XEN_EFI_query_variable_info 9 #define XEN_EFI_query_capsule_capabilities 10 #define XEN_EFI_update_capsule 11 + +struct xenpf_efi_guid { +uint32_t data1; +uint16_t data2; +uint16_t data3; +uint8_t data4[8]; +}; + +struct xenpf_efi_time { +uint16_t year; +uint8_t month; +uint8_t day; 
+uint8_t hour; +uint8_t min; +uint8_t sec; +uint32_t ns; +int16_t tz; +uint8_t daylight; +}; + struct xenpf_efi_runtime_call { uint32_t function; /* @@ -138,17 +158,7 @@ struct xenpf_efi_runtime_call { union { #define XEN_EFI_GET_TIME_SET_CLEARS_NS 0x0001 struct { -struct xenpf_efi_time { -uint16_t year; -uint8_t
[Xen-devel] [PATCH v6 07/23] libxl: introduce vNUMA types
A domain can contain several virtual NUMA nodes, hence we introduce an
array in libxl_domain_build_info.

libxl_vnode_info contains the size of memory in that node, the distance
from that node to every node, the underlying pnode and a bitmap of
vcpus.

Signed-off-by: Wei Liu
Reviewed-by: Dario Faggioli
Cc: Ian Campbell
Cc: Ian Jackson
Cc: Dario Faggioli
Cc: Elena Ufimtseva
Acked-by: Ian Campbell
---
Changes in v4:
1. Use MemKB.

Changes in v3:
1. Add commit message.
---
 tools/libxl/libxl_types.idl | 9 +
 1 file changed, 9 insertions(+)

diff --git a/tools/libxl/libxl_types.idl b/tools/libxl/libxl_types.idl
index 02be466..14c7e7c 100644
--- a/tools/libxl/libxl_types.idl
+++ b/tools/libxl/libxl_types.idl
@@ -356,6 +356,13 @@ libxl_domain_sched_params = Struct("domain_sched_params",[
     ("budget", integer, {'init_val': 'LIBXL_DOMAIN_SCHED_PARAM_BUDGET_DEFAULT'}),
     ])

+libxl_vnode_info = Struct("vnode_info", [
+    ("memkb", MemKB),
+    ("distances", Array(uint32, "num_distances")), # distances from this node to other nodes
+    ("pnode", uint32), # physical node of this node
+    ("vcpus", libxl_bitmap), # vcpus in this node
+    ])
+
 libxl_domain_build_info = Struct("domain_build_info",[
     ("max_vcpus", integer),
     ("avail_vcpus", libxl_bitmap),
@@ -376,6 +383,8 @@ libxl_domain_build_info = Struct("domain_build_info",[
     ("disable_migrate", libxl_defbool),
     ("cpuid", libxl_cpuid_policy_list),
     ("blkdev_start",string),
+
+    ("vnuma_nodes", Array(libxl_vnode_info, "num_vnuma_nodes")),

     ("device_model_version", libxl_device_model_version),
     ("device_model_stubdomain", libxl_defbool),
--
1.9.1
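As a reading aid, the new IDL type might look roughly like the struct below once translated to C. This is a sketch under assumptions, not the generated code: the field layout is my guess at what the IDL produces, a plain 64-bit word stands in for libxl_bitmap, and sketch_total_memkb is an invented helper. The rule it checks (one distance per vnode) is the one stated in the xl.cfg documentation hunk of this series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sketch of the C shape the IDL entry might generate; layout is assumed. */
struct sketch_vnode_info {
    uint64_t  memkb;          /* memory in this vnode, in KiB */
    int       num_distances;  /* length of distances[] */
    uint32_t *distances;      /* distances[j]: from this vnode to vnode j */
    uint32_t  pnode;          /* physical node backing this vnode */
    uint64_t  vcpus;          /* stand-in for libxl_bitmap: bit i = vcpu i */
};

/*
 * Sum the memory of all vnodes. Returns 0 if any vnode's distance list
 * does not have exactly one entry per vnode.
 */
static uint64_t sketch_total_memkb(const struct sketch_vnode_info *nodes,
                                   int num_nodes)
{
    uint64_t total = 0;

    assert(nodes != NULL || num_nodes == 0);
    for (int i = 0; i < num_nodes; i++) {
        if (nodes[i].num_distances != num_nodes || !nodes[i].distances)
            return 0;
        total += nodes[i].memkb;
    }
    return total;
}
```

A caller would compare the returned total against the guest's configured memory, since the vnode sizes are expected to add up to it.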
Re: [Xen-devel] Xen's Linux kernel config options
On Thu, Feb 26, 2015 at 11:19:17AM +, Stefano Stabellini wrote: > On Wed, 25 Feb 2015, Luis R. Rodriguez wrote: > > On Wed, Feb 25, 2015 at 12:01:31PM +, Stefano Stabellini wrote: > > > On Tue, 24 Feb 2015, Luis R. Rodriguez wrote: > > > > On Tue, Feb 24, 2015 at 7:21 AM, Stefano Stabellini > > > > wrote: > > > > > On Mon, 23 Feb 2015, Luis R. Rodriguez wrote: > > > > >> On Thu, Feb 19, 2015 at 3:43 PM, Luis R. Rodriguez > > > > >> wrote: > > > > >> > On Fri, Dec 12, 2014 at 9:29 AM, David Vrabel > > > > >> > wrote: > > > > >> >> On 12/12/14 13:17, Juergen Gross wrote: > > > > >> >>> XEN_PVHVM > > > > >> >> > > > > >> >> Move XEN_PVHVM under XEN and have it select PARAVIRT and > > > > >> >> PARAVIRT_CLOCK. > > > > >> > > > > > >> > FWIW, although it seems we do not want to let users just build > > > > >> > XEN_PVHVM hypervisors I have the changes required now to at least > > > > >> > get > > > > >> > this to build so I do know what it takes. > > > > >> > > > > > >> >>> XEN_FRONTENDXEN_PV || > > > > >> >>> XEN_PVH > > > > >> >>> || > > > > >> >>> XEN_PVHVM > > > > >> >> > > > > >> >> This enables all the basic infrastructure for frontends: event > > > > >> >> channels, > > > > >> >> grant tables and Xenbus. > > > > >> >> > > > > >> >> Don't make XEN_FRONTEND depend on any XEN_* variant. It should be > > > > >> >> possible to have frontend drivers without support for any of the > > > > >> >> PV/PVHVM/PVH guest types. > > > > >> > > > > > >> > David, can you elaborate on the type of Xen guest it would be on > > > > >> > x86 > > > > >> > its not PV, PVHVM, or PVH? I'm particularly curious about the > > > > >> > xen_domain_type and how it would end up to selected. As it is we > > > > >> > tie > > > > >> > in XEN_PVHVM at build time with XEN_PVH, in order to have XEN_PVHVM > > > > >> > completely removed from XEN_PVH we need quite a bit of code changes > > > > >> > which at least as code exercise I have completed already. 
If we > > > > >> > want > > > > >> > at the very least xen_domain_type set when XEN_PV, XEN_PVHVM, and > > > > >> > XEN_PVH are not available we need a bit more work. > > > > >> > > > > >> OK I think I see the issue. We have nothing quite like > > > > >> xen_guest_init() on x86 enlighten.c, we do have this for ARM and I > > > > >> think I can that close the gap I'm observing. > > > > >> > > > > >> >> Frontends only need event channels, grant > > > > >> >> table and xenbus. > > > > >> > > > > > >> > Well xenbus_probe_initcall() will check for xen_domain() and that > > > > >> > won't be set on x86 right now unless we have XEN_PV, XEN_PVHVM or > > > > >> > XEN_PVH set -- to start off with. Then > > > > >> > drivers/xen/xenbus/xenbus_client.c will check xen_feature in quite > > > > >> > a > > > > >> > bit of places as well, that won't be set unless > > > > >> > xen_setup_features() > > > > >> > is called which right now is only done on x86 > > > > >> > arch/x86/xen/enlighten.c > > > > >> > which as Juergen pointed out, is not needed if you don't have > > > > >> > XEN_PV > > > > >> > or XEN_PVH. As it turns out this is incorrect though, its needed > > > > >> > for > > > > >> > XEN_PVHVM as well and my split exercise in code addresses this. > > > > >> > Now, > > > > >> > at least in my code if you don't have XEN_PV, XEN_PVHVM, or > > > > >> > XEN_PVH we > > > > >> > don't call xen_setup_features() and its unclear to me where or how > > > > >> > that should happen in other cases. > > > > >> > > > > >> Yeah I think having an x86 equivalent of xen_guest_init() would solve > > > > >> this, Stefano, thoughts? > > > > > > > > > > Having xen_guest_init() on x86 would be nice. Being able to set > > > > > xen_domain_type to XEN_HVM_DOMAIN if we are running on Xen, regardless > > > > > of XEN_PV/PVH/PVHVM also makes sense from Linux POV. > > > > > > > > OK great, thanks for the feedback. 
> > > >
> > > > > That said, I don't see much value in removing XEN_PVHVM: why are we even
> > > > > doing this? What is the improvement we are seeking?
> > > >
> > > > We would not, the above discussed about the possibility of letting
> > > > users enable XEN_PVHVM without XEN_PVH, that's all.
> > >
> > > OK, that makes sense.
> > >
> > > > As is the only thing that can enable XEN_PVHVM is if you enable
> > > > XEN_PVH.
> > >
> > > This is the bit that we need to change but it shouldn't be difficult.
> > >
> > > > If we want
> > > > xen_guest_init() alone though we might need the decoupling though at
> > > > least at build time so that if XEN_PV or XEN_PVH is not selected we'd
> > > > at least have XEN_PVHVM. Thoughts?
> > >
> > > Today pv(h) and pvhvm have very different boot paths already: pv and pvh
> > > initialize via xen_start_kernel while pvhvm via xen_hvm_guest_init.
> >
> > Ah I see, this helps a lot thanks!
> >
>
[Xen-devel] [PATCH v6 23/23] xl: vNUMA support
This patch includes configuration options parser and documentation. Please find the hunk to xl.cfg.pod.5 for more information. Signed-off-by: Wei Liu Cc: Ian Campbell Cc: Ian Jackson --- Changes in v6: 1. Disable NUMA auto-placement. --- docs/man/xl.cfg.pod.5| 54 ++ tools/libxl/xl_cmdimpl.c | 140 ++- 2 files changed, 193 insertions(+), 1 deletion(-) diff --git a/docs/man/xl.cfg.pod.5 b/docs/man/xl.cfg.pod.5 index 408653f..2a27b1c 100644 --- a/docs/man/xl.cfg.pod.5 +++ b/docs/man/xl.cfg.pod.5 @@ -266,6 +266,60 @@ it will crash. =back +=head3 Guest Virtual NUMA Configuration + +=over 4 + +=item B in the list specifies the configuration of nth +virtual node. + +Each B is a list, which has a form of +"[VNODE_CONFIG_OPTION,VNODE_CONFIG_OPTION, ... ]" (without quotes). + +For example vnuma = [ ["pnode=0","size=512","vcpus=0-4","vdistances=10,20"] ] +means vnode 0 is mapped to pnode 0, has 512MB ram, has vcpus 0 to 4, the +distance to itself is 10 and the distance to vnode 1 is 20. + +Each B is a quoted string. Supported +Bs are: + +=over 4 + +=item B + +Specify which physical node this virtual node maps to. + +=item B + +Specify the size of this virtual node. The sum of memory size of all +vnodes must match B (or B if B is not +specified). + +=item B + +Specify which vcpus belong to this node. B is a string +separated by comma. You can specify range and single cpu. An example +is "vcpus=0-5,8", which means you specify vcpu 0 to vcpu 5, and vcpu +8. + +=item B + +Specify virtual distance from this node to all nodes (including +itself) with positional arguments. For example, "vdistance=10,20" +for vnode 0 means the distance from vnode 0 to vnode 0 is 10, from +vnode 0 to vnode 1 is 20. The number of arguments supplied must match +the total number of vnodes. + +Normally you can use the values from "xl info -n" or "numactl +--hardware" to fill in vdistance list. 
+ +=back + +=back + =head3 Event Actions =over 4 diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c index 5b366f2..2899d9f 100644 --- a/tools/libxl/xl_cmdimpl.c +++ b/tools/libxl/xl_cmdimpl.c @@ -158,7 +158,6 @@ struct domain_create { }; -static uint32_t find_domain(const char *p) __attribute__((warn_unused_result)); static uint32_t find_domain(const char *p) { uint32_t domid; @@ -987,6 +986,143 @@ static int parse_nic_config(libxl_device_nic *nic, XLU_Config **config, char *to return 0; } +static void parse_vnuma_config(const XLU_Config *config, + libxl_domain_build_info *b_info) +{ +libxl_physinfo physinfo; +uint32_t nr_nodes; +XLU_ConfigList *vnuma; +int i, j, len, num_vnuma; + + +libxl_physinfo_init(&physinfo); +if (libxl_get_physinfo(ctx, &physinfo) != 0) { +libxl_physinfo_dispose(&physinfo); +fprintf(stderr, "libxl_get_physinfo failed\n"); +exit(1); +} + +nr_nodes = physinfo.nr_nodes; +libxl_physinfo_dispose(&physinfo); + +if (xlu_cfg_get_list(config, "vnuma", &vnuma, &num_vnuma, 1)) +return; + +b_info->num_vnuma_nodes = num_vnuma; +b_info->vnuma_nodes = xcalloc(num_vnuma, sizeof(libxl_vnode_info)); + +for (i = 0; i < b_info->num_vnuma_nodes; i++) { +libxl_vnode_info *p = &b_info->vnuma_nodes[i]; + +libxl_vnode_info_init(p); +libxl_cpu_bitmap_alloc(ctx, &p->vcpus, b_info->max_vcpus); +libxl_bitmap_set_none(&p->vcpus); +p->distances = xcalloc(b_info->num_vnuma_nodes, + sizeof(*p->distances)); +p->num_distances = b_info->num_vnuma_nodes; +} + +for (i = 0; i < num_vnuma; i++) { +XLU_ConfigValue *vnode_spec, *conf_option; +XLU_ConfigList *vnode_config_list; +int conf_count; +libxl_vnode_info *p = &b_info->vnuma_nodes[i]; + +vnode_spec = xlu_cfg_get_listitem2(vnuma, i); +assert(vnode_spec); + +xlu_cfg_value_get_list(config, vnode_spec, &vnode_config_list, 0); +if (!vnode_config_list) { +fprintf(stderr, "xl: cannot get vnode config option list\n"); +exit(1); +} + +for (conf_count = 0; + (conf_option = + xlu_cfg_get_listitem2(vnode_config_list, 
conf_count)); + conf_count++) { + +if (xlu_cfg_value_type(conf_option) == XLU_STRING) { +char *buf, *option_untrimmed, *value_untrimmed; +char *option, *value; +char *endptr; +unsigned long val; + +xlu_cfg_value_get_string(config, conf_option, &buf, 0); + +if (!buf) continue; + +if (split_string_into_pair(buf, "=", + &option_untrimmed, + &value_untrimmed)) { +fprintf(stderr, "xl: failed to split \"%s\" into pair\n", +
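The "vcpus=0-5,8" syntax described in the documentation hunk above can be illustrated with a small standalone parser. This is only a sketch: parse_vcpu_list is a hypothetical helper, not a function from xl or libxl, and it assumes at most 64 vcpus so that a plain 64-bit mask can stand in for libxl's bitmap type.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not part of xl): parse a vcpu list such as
 * "0-5,8" into a 64-bit mask. Returns 0 on success, -1 on malformed
 * input or on a vcpu number above 63.
 */
int parse_vcpu_list(const char *spec, uint64_t *mask)
{
    const char *p = spec;

    assert(spec && mask);
    *mask = 0;

    while (*p) {
        char *end;
        unsigned long lo, hi;

        /* Every item must start with a digit. */
        if (*p < '0' || *p > '9')
            return -1;
        errno = 0;
        lo = strtoul(p, &end, 10);
        if (errno || end == p)
            return -1;

        hi = lo;
        if (*end == '-') {
            /* Range form "lo-hi". */
            p = end + 1;
            if (*p < '0' || *p > '9')
                return -1;
            hi = strtoul(p, &end, 10);
            if (errno || end == p)
                return -1;
        }
        if (lo > hi || hi >= 64)
            return -1;

        for (unsigned long v = lo; v <= hi; v++)
            *mask |= UINT64_C(1) << v;

        if (*end == ',') {
            end++;
            if (*end == '\0')
                return -1;      /* trailing comma */
        } else if (*end != '\0') {
            return -1;          /* unexpected separator */
        }
        p = end;
    }
    return 0;
}
```

Calling parse_vcpu_list("0-5,8", &mask) sets bits 0 to 5 and bit 8, matching the "vcpu 0 to vcpu 5, and vcpu 8" reading given in the documentation text.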
[Xen-devel] [PATCH v6 03/23] xen: make two memory hypercalls vNUMA-aware
Make XENMEM_increase_reservation and XENMEM_populate_physmap vNUMA-aware. That is, if guest requests Xen to allocate memory for specific vnode, Xen can translate vnode to pnode using vNUMA information of that guest. XENMEMF_vnode is introduced for the guest to mark the node number is in fact virtual node number and should be translated by Xen. XENFEAT_memory_op_vnode_supported is introduced to indicate that Xen is able to translate virtual node to physical node. Signed-off-by: Wei Liu Cc: Jan Beulich Cc: Andrew Cooper --- Changes in v6: 1. Add logic in construct_memop_from_reservation. --- xen/common/kernel.c | 2 +- xen/common/memory.c | 45 --- xen/include/public/features.h | 3 +++ xen/include/public/memory.h | 2 ++ 4 files changed, 44 insertions(+), 8 deletions(-) diff --git a/xen/common/kernel.c b/xen/common/kernel.c index 0d9e519..e5e0050 100644 --- a/xen/common/kernel.c +++ b/xen/common/kernel.c @@ -301,7 +301,7 @@ DO(xen_version)(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg) switch ( fi.submap_idx ) { case 0: -fi.submap = 0; +fi.submap = (1U << XENFEAT_memory_op_vnode_supported); if ( VM_ASSIST(d, VMASST_TYPE_pae_extended_cr3) ) fi.submap |= (1U << XENFEAT_pae_pgdir_above_4gb); if ( paging_mode_translate(current->domain) ) diff --git a/xen/common/memory.c b/xen/common/memory.c index d24b001..9f8891b 100644 --- a/xen/common/memory.c +++ b/xen/common/memory.c @@ -692,7 +692,7 @@ out: return rc; } -static int construct_memop_from_reservation( +static int construct_memop_from_reservation(struct domain *d, const struct xen_memory_reservation *r, struct memop_args *a) { @@ -716,9 +716,37 @@ static int construct_memop_from_reservation( a->memflags = MEMF_bits(address_bits); } -a->memflags |= MEMF_node(XENMEMF_get_node(r->mem_flags)); -if ( r->mem_flags & XENMEMF_exact_node_request ) -a->memflags |= MEMF_exact_node; +if ( r->mem_flags & XENMEMF_vnode ) +{ +unsigned int vnode, pnode; + +read_lock(&d->vnuma_rwlock); +if ( d->vnuma ) +{ +vnode = 
XENMEMF_get_node(r->mem_flags); +if ( vnode >= d->vnuma->nr_vnodes ) +{ +rc = -EINVAL; +read_unlock(&d->vnuma_rwlock); +goto out; +} + +pnode = d->vnuma->vnode_to_pnode[vnode]; +if ( pnode != XEN_NUMA_NO_NODE ) +{ +a->memflags |= MEMF_node(pnode); +if ( r->mem_flags & XENMEMF_exact_node_request ) +a->memflags |= MEMF_exact_node; +} +} +read_unlock(&d->vnuma_rwlock); +} +else +{ +a->memflags |= MEMF_node(XENMEMF_get_node(r->mem_flags)); +if ( r->mem_flags & XENMEMF_exact_node_request ) +a->memflags |= MEMF_exact_node; +} rc = 0; out: @@ -753,9 +781,6 @@ long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg) args.nr_done = start_extent; args.preempted= 0; -if ( construct_memop_from_reservation(&reservation, &args) ) -return start_extent; - if ( op == XENMEM_populate_physmap && (reservation.mem_flags & XENMEMF_populate_on_demand) ) args.memflags |= MEMF_populate_on_demand; @@ -765,6 +790,12 @@ long do_memory_op(unsigned long cmd, XEN_GUEST_HANDLE_PARAM(void) arg) return start_extent; args.domain = d; +if ( construct_memop_from_reservation(d, &reservation, &args) ) +{ +rcu_unlock_domain(d); +return start_extent; +} + if ( xsm_memory_adjust_reservation(XSM_TARGET, current->domain, d) ) { rcu_unlock_domain(d); diff --git a/xen/include/public/features.h b/xen/include/public/features.h index 16d92aa..2110b04 100644 --- a/xen/include/public/features.h +++ b/xen/include/public/features.h @@ -99,6 +99,9 @@ #define XENFEAT_grant_map_identity12 */ +/* Guest can use XENMEMF_vnode to specify virtual node for memory op. 
*/ +#define XENFEAT_memory_op_vnode_supported 13 + #define XENFEAT_NR_SUBMAPS 1 #endif /* __XEN_PUBLIC_FEATURES_H__ */ diff --git a/xen/include/public/memory.h b/xen/include/public/memory.h index 0d8c85f..d71127f 100644 --- a/xen/include/public/memory.h +++ b/xen/include/public/memory.h @@ -57,6 +57,8 @@ /* Flag to request allocation only from the node specified */ #define XENMEMF_exact_node_request (1<<17) #define XENMEMF_exact_node(n) (XENMEMF_node(n) | XENMEMF_exact_node_request) +/* Flag to indicate the node specified is virtual node */ +#define XENMEMF_vnode (1<<18) #endif struct xen_memory_reservation { -- 1.9.1 ___ Xen-devel mailing list Xen-devel@lists.
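The translation performed in construct_memop_from_reservation can be modelled outside the hypervisor roughly as follows. This is a sketch under assumptions: vnuma_model, memop_model, translate_node and NO_NODE are hypothetical stand-ins, the node number is passed separately instead of being extracted with XENMEMF_get_node, and the vnuma_rwlock locking is omitted. Only the two flag bit positions are taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define NO_NODE         0xFFu       /* stand-in for XEN_NUMA_NO_NODE */
#define FLAG_EXACT_NODE (1u << 17)  /* same bit as XENMEMF_exact_node_request */
#define FLAG_VNODE      (1u << 18)  /* same bit as XENMEMF_vnode */

/* Hypothetical model of the per-domain vNUMA information. */
struct vnuma_model {
    unsigned int nr_vnodes;
    const unsigned int *vnode_to_pnode;
};

/* Hypothetical model of the node-related part of memop_args. */
struct memop_model {
    int has_node;       /* a physical node was selected */
    unsigned int node;  /* the selected physical node */
    int exact;          /* allocation must come from that node only */
};

int translate_node(const struct vnuma_model *vnuma, unsigned int flags,
                   unsigned int node, struct memop_model *out)
{
    assert(out);
    out->has_node = 0;
    out->node = 0;
    out->exact = 0;

    if (flags & FLAG_VNODE) {
        unsigned int pnode;

        /* Guest has no vNUMA information: nothing to translate. */
        if (!vnuma)
            return 0;
        if (node >= vnuma->nr_vnodes)
            return -EINVAL;

        pnode = vnuma->vnode_to_pnode[node];
        /* vnode not backed by a physical node: leave placement open. */
        if (pnode == NO_NODE)
            return 0;
        node = pnode;
    }

    out->has_node = 1;
    out->node = node;
    out->exact = !!(flags & FLAG_EXACT_NODE);
    return 0;
}
```

As in the patch, the exact-node request is only honoured once a physical node has actually been chosen, and an out-of-range vnode is rejected with -EINVAL.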
[Xen-devel] [PATCH v6 10/23] libxl: x86: factor out e820_host_sanitize
This function gets the machine E820 map and sanitizes it according to
the PV guest configuration.

This will be used in a later patch. No functional change introduced in
this patch.

Signed-off-by: Wei Liu
Reviewed-by: Andrew Cooper
Reviewed-by: Dario Faggioli
Cc: Ian Campbell
Cc: Ian Jackson
Cc: Elena Ufimtseva
Acked-by: Ian Campbell
---
Changes in v4:
1. Use actual size of the map instead of using E820MAX.
---
 tools/libxl/libxl_x86.c | 32 +++-
 1 file changed, 23 insertions(+), 9 deletions(-)

diff --git a/tools/libxl/libxl_x86.c b/tools/libxl/libxl_x86.c
index 9ceb373..d012b4d 100644
--- a/tools/libxl/libxl_x86.c
+++ b/tools/libxl/libxl_x86.c
@@ -207,6 +207,27 @@ static int e820_sanitize(libxl_ctx *ctx, struct e820entry src[],
     return 0;
 }
 
+static int e820_host_sanitize(libxl__gc *gc,
+                              libxl_domain_build_info *b_info,
+                              struct e820entry map[],
+                              uint32_t *nr)
+{
+    int rc;
+
+    rc = xc_get_machine_memory_map(CTX->xch, map, *nr);
+    if (rc < 0) {
+        errno = rc;
+        return ERROR_FAIL;
+    }
+
+    *nr = rc;
+
+    rc = e820_sanitize(CTX, map, nr, b_info->target_memkb,
+                       (b_info->max_memkb - b_info->target_memkb) +
+                       b_info->u.pv.slack_memkb);
+    return rc;
+}
+
 static int libxl__e820_alloc(libxl__gc *gc, uint32_t domid,
         libxl_domain_config *d_config)
 {
@@ -223,15 +244,8 @@ static int libxl__e820_alloc(libxl__gc *gc, uint32_t domid,
     if (!libxl_defbool_val(b_info->u.pv.e820_host))
         return ERROR_INVAL;
 
-    rc = xc_get_machine_memory_map(ctx->xch, map, E820MAX);
-    if (rc < 0) {
-        errno = rc;
-        return ERROR_FAIL;
-    }
-    nr = rc;
-    rc = e820_sanitize(ctx, map, &nr, b_info->target_memkb,
-                       (b_info->max_memkb - b_info->target_memkb) +
-                       b_info->u.pv.slack_memkb);
+    nr = E820MAX;
+    rc = e820_host_sanitize(gc, b_info, map, &nr);
     if (rc)
         return ERROR_FAIL;
 
-- 
1.9.1

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
[Xen-devel] [PATCH v6 15/23] libxl: build, check and pass vNUMA info to Xen for HVM guest
Transform user supplied vNUMA configuration into libxl internal representations then libxc representations. Check validity along the line. Libxc has more involvement in building vmemranges in HVM case compared to PV case. The building of vmemranges is placed after xc_hvm_build returns, because it relies on memory hole information provided by xc_hvm_build. Signed-off-by: Wei Liu Reviewed-by: Dario Faggioli Cc: Ian Campbell Cc: Ian Jackson Cc: Dario Faggioli Cc: Elena Ufimtseva --- Changes in v6: 1. Fix a minor bug discovered by Dario. Changes in v5: 1. Check vnode 0 is large enough to accommodate video ram. Changes in v4: 1. Adapt to new interface. 2. Rename some variables. 3. Use GCREALLOC_ARRAY. Changes in v3: 1. Rewrite commit log. --- tools/libxl/libxl_create.c | 9 +++ tools/libxl/libxl_dom.c | 43 ++ tools/libxl/libxl_internal.h | 5 tools/libxl/libxl_vnuma.c| 56 4 files changed, 113 insertions(+) diff --git a/tools/libxl/libxl_create.c b/tools/libxl/libxl_create.c index 98687bd..af04248 100644 --- a/tools/libxl/libxl_create.c +++ b/tools/libxl/libxl_create.c @@ -853,6 +853,15 @@ static void initiate_domain_create(libxl__egc *egc, goto error_out; } +/* Disallow PoD and vNUMA to be enabled at the same time because PoD + * pool is not vNUMA-aware yet. 
+ */ +if (pod_enabled && d_config->b_info.num_vnuma_nodes) { +ret = ERROR_INVAL; +LOG(ERROR, "Cannot enable PoD and vNUMA at the same time"); +goto error_out; +} + ret = libxl__domain_create_info_setdefault(gc, &d_config->c_info); if (ret) goto error_out; diff --git a/tools/libxl/libxl_dom.c b/tools/libxl/libxl_dom.c index b58a19b..c1a409d 100644 --- a/tools/libxl/libxl_dom.c +++ b/tools/libxl/libxl_dom.c @@ -893,12 +893,55 @@ int libxl__build_hvm(libxl__gc *gc, uint32_t domid, goto out; } +if (info->num_vnuma_nodes != 0) { +int i; + +args.nr_vmemranges = state->num_vmemranges; +args.vmemranges = libxl__malloc(gc, sizeof(*args.vmemranges) * +args.nr_vmemranges); + +for (i = 0; i < args.nr_vmemranges; i++) { +args.vmemranges[i].start = state->vmemranges[i].start; +args.vmemranges[i].end = state->vmemranges[i].end; +args.vmemranges[i].flags = state->vmemranges[i].flags; +args.vmemranges[i].nid = state->vmemranges[i].nid; +} + +/* Consider video ram belongs to vmemrange 0 -- just shrink it + * by the size of video ram. 
+ */ +if (((args.vmemranges[0].end - args.vmemranges[0].start) >> 10) +< info->video_memkb) { +LOG(ERROR, "vmemrange 0 too small to contain video ram"); +goto out; +} + +args.vmemranges[0].end -= (info->video_memkb << 10); + +args.nr_vnodes = info->num_vnuma_nodes; +args.vnode_to_pnode = libxl__malloc(gc, sizeof(*args.vnode_to_pnode) * +args.nr_vnodes); +for (i = 0; i < args.nr_vnodes; i++) +args.vnode_to_pnode[i] = info->vnuma_nodes[i].pnode; +} + ret = xc_hvm_build(ctx->xch, domid, &args); if (ret) { LOGEV(ERROR, ret, "hvm building failed"); goto out; } +if (info->num_vnuma_nodes != 0) { +ret = libxl__vnuma_build_vmemrange_hvm(gc, domid, info, state, &args); +if (ret) { +LOGEV(ERROR, ret, "hvm build vmemranges failed"); +goto out; +} +ret = libxl__vnuma_config_check(gc, info, state); +if (ret) goto out; +ret = set_vnuma_info(gc, domid, info, state); +if (ret) goto out; +} ret = hvm_build_set_params(ctx->xch, domid, info, state->store_port, &state->store_mfn, state->console_port, &state->console_mfn, state->store_domid, diff --git a/tools/libxl/libxl_internal.h b/tools/libxl/libxl_internal.h index 7d1e1cf..e93089a 100644 --- a/tools/libxl/libxl_internal.h +++ b/tools/libxl/libxl_internal.h @@ -3408,6 +3408,11 @@ int libxl__vnuma_build_vmemrange_pv(libxl__gc *gc, uint32_t domid, libxl_domain_build_info *b_info, libxl__domain_build_state *state); +int libxl__vnuma_build_vmemrange_hvm(libxl__gc *gc, + uint32_t domid, + libxl_domain_build_info *b_info, + libxl__domain_build_state *state, + struct xc_hvm_build_args *args); _hidden int libxl__ms_vm_genid_set(libxl__gc *gc, uint32_t domid, const libxl_ms_vm_genid *id); diff --git a/tools/libxl/libxl_vnuma.c b/tools/libxl/libxl_v
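The video RAM adjustment in libxl__build_hvm mixes two units: vmemrange bounds are in bytes while video_memkb is in KiB, hence the shifts by 10. A minimal standalone sketch of that check follows; struct memrange and reserve_video_ram are hypothetical names used only for illustration, not libxl or libxc types.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a vmemrange: byte addresses, [start, end). */
struct memrange {
    uint64_t start;
    uint64_t end;
};

/*
 * Shrink range 0 by the size of video RAM, as the patch does for
 * vmemrange 0. video_memkb is in KiB, the range bounds are in bytes.
 * Returns 0 on success, -1 if the range is too small to hold video RAM
 * (in which case the range is left untouched).
 */
int reserve_video_ram(struct memrange *r0, uint64_t video_memkb)
{
    assert(r0 && r0->end >= r0->start);

    /* Compare in KiB: bytes >> 10. */
    if (((r0->end - r0->start) >> 10) < video_memkb)
        return -1;

    /* Convert KiB back to bytes: KiB << 10. */
    r0->end -= video_memkb << 10;
    return 0;
}
```

With a 512 MiB range and 16 MiB of video RAM (video_memkb = 16384), the end of the range moves down by 16 MiB; a range smaller than the video RAM size is rejected, mirroring the "vmemrange 0 too small to contain video ram" error path.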
[Xen-devel] [PATCH v6 21/23] libxlu: introduce new APIs
These APIs can be used to manipulate XLU_ConfigValue and XLU_ConfigList.

APIs introduced:
1. xlu_cfg_value_type
2. xlu_cfg_value_get_string
3. xlu_cfg_value_get_list
4. xlu_cfg_get_listitem2

Move some definitions from private header to public header as needed.

Signed-off-by: Wei Liu
Cc: Ian Jackson
Cc: Ian Campbell
---
Changes in v6:
1. Report value's line and column number on error.

Changes in v5:
1. Use calling convention like old APIs.
---
 tools/libxl/libxlu_cfg.c      | 45 +++
 tools/libxl/libxlu_internal.h |  7 ---
 tools/libxl/libxlutil.h       | 13 +
 3 files changed, 58 insertions(+), 7 deletions(-)

diff --git a/tools/libxl/libxlu_cfg.c b/tools/libxl/libxlu_cfg.c
index b921a13..62fb798 100644
--- a/tools/libxl/libxlu_cfg.c
+++ b/tools/libxl/libxlu_cfg.c
@@ -199,6 +199,51 @@ static int find_atom(const XLU_Config *cfg, const char *n,
     return 0;
 }
 
+
+enum XLU_ConfigValueType xlu_cfg_value_type(const XLU_ConfigValue *value)
+{
+    return value->type;
+}
+
+int xlu_cfg_value_get_string(const XLU_Config *cfg, XLU_ConfigValue *value,
+                             char **value_r, int dont_warn)
+{
+    if (value->type != XLU_STRING) {
+        if (!dont_warn)
+            fprintf(cfg->report,
+                    "%s:%d:%d: warning: value is not a string\n",
+                    cfg->config_source, value->line, value->column);
+        *value_r = NULL;
+        return EINVAL;
+    }
+
+    *value_r = value->u.string;
+    return 0;
+}
+
+int xlu_cfg_value_get_list(const XLU_Config *cfg, XLU_ConfigValue *value,
+                           XLU_ConfigList **value_r, int dont_warn)
+{
+    if (value->type != XLU_LIST) {
+        if (!dont_warn)
+            fprintf(cfg->report,
+                    "%s:%d:%d: warning: value is not a list\n",
+                    cfg->config_source, value->line, value->column);
+        *value_r = NULL;
+        return EINVAL;
+    }
+
+    *value_r = &value->u.list;
+    return 0;
+}
+
+XLU_ConfigValue *xlu_cfg_get_listitem2(const XLU_ConfigList *list,
+                                       int entry)
+{
+    if (entry < 0 || entry >= list->nvalues) return NULL;
+    return list->values[entry];
+}
+
 int xlu_cfg_get_string(const XLU_Config *cfg, const char *n,
                        const char **value_r, int dont_warn) {
     XLU_ConfigSetting *set;
diff --git a/tools/libxl/libxlu_internal.h b/tools/libxl/libxlu_internal.h
index 73fd85f..1d310b1 100644
--- a/tools/libxl/libxlu_internal.h
+++ b/tools/libxl/libxlu_internal.h
@@ -25,13 +25,6 @@
 
 #include "libxlutil.h"
 
-enum XLU_ConfigValueType {
-    XLU_STRING,
-    XLU_LIST,
-};
-
-typedef struct XLU_ConfigValue XLU_ConfigValue;
-
 typedef struct XLU_ConfigList {
     int avalues; /* available slots */
     int nvalues; /* actual occupied slots */
diff --git a/tools/libxl/libxlutil.h b/tools/libxl/libxlutil.h
index 0333e55..989605a 100644
--- a/tools/libxl/libxlutil.h
+++ b/tools/libxl/libxlutil.h
@@ -20,9 +20,15 @@
 
 #include "libxl.h"
 
+enum XLU_ConfigValueType {
+    XLU_STRING,
+    XLU_LIST,
+};
+
 /* Unless otherwise stated, all functions return an errno value. */
 typedef struct XLU_Config XLU_Config;
 typedef struct XLU_ConfigList XLU_ConfigList;
+typedef struct XLU_ConfigValue XLU_ConfigValue;
 
 XLU_Config *xlu_cfg_init(FILE *report, const char *report_filename);
   /* 0 means we got ENOMEM. */
@@ -66,6 +72,13 @@ const char *xlu_cfg_get_listitem(const XLU_ConfigList*, int entry);
   /* xlu_cfg_get_listitem cannot fail, except that if entry is
    * out of range it returns 0 (not setting errno) */
 
+enum XLU_ConfigValueType xlu_cfg_value_type(const XLU_ConfigValue *value);
+int xlu_cfg_value_get_string(const XLU_Config *cfg, XLU_ConfigValue *value,
+                             char **value_r, int dont_warn);
+int xlu_cfg_value_get_list(const XLU_Config *cfg, XLU_ConfigValue *value,
+                           XLU_ConfigList **value_r, int dont_warn);
+XLU_ConfigValue *xlu_cfg_get_listitem2(const XLU_ConfigList *list,
+                                       int entry);
 
 /*
  * Disk specification parsing.
-- 
1.9.1

___
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
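The accessor pattern introduced here (check the tag, then hand out the string or the list, and bounds-check list indexing) can be shown with a self-contained model. The types and function names below (struct value, value_get_string, value_get_list, list_item) are hypothetical stand-ins that only mirror the shape of XLU_ConfigValue and the new APIs. They are not the libxlu definitions, and the warning output to cfg->report is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical model of a config value: either a string or a list. */
enum value_type { VAL_STRING, VAL_LIST };

struct value;

struct value_list {
    int nvalues;            /* occupied slots */
    struct value **values;
};

struct value {
    enum value_type type;
    union {
        char *string;
        struct value_list list;
    } u;
};

/* Returns 0 and sets *out, or EINVAL (with *out = NULL) on a type mismatch. */
int value_get_string(struct value *v, char **out)
{
    assert(v && out);
    if (v->type != VAL_STRING) {
        *out = NULL;
        return EINVAL;
    }
    *out = v->u.string;
    return 0;
}

/* Returns 0 and sets *out, or EINVAL (with *out = NULL) on a type mismatch. */
int value_get_list(struct value *v, struct value_list **out)
{
    assert(v && out);
    if (v->type != VAL_LIST) {
        *out = NULL;
        return EINVAL;
    }
    *out = &v->u.list;
    return 0;
}

/* Returns NULL when entry is out of range, like xlu_cfg_get_listitem2. */
struct value *list_item(const struct value_list *l, int entry)
{
    assert(l);
    if (entry < 0 || entry >= l->nvalues)
        return NULL;
    return l->values[entry];
}
```

A caller walking a nested list such as [ ["pnode=0"] ] would call value_get_list on the outer value, list_item to fetch each inner value, and value_get_string once it reaches a leaf, stopping when list_item returns NULL.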
[Xen-devel] [PATCH v6 18/23] libxlu: rework internal representation of setting
This patch does the following things:

1. Properly define an XLU_ConfigList type. Originally it was defined to
   be XLU_ConfigSetting.
2. Define an XLU_ConfigValue type, which can be either a string or a
   list of XLU_ConfigValue.
3. ConfigSetting now references XLU_ConfigValue. Originally it only
   worked with **string.
4. Properly construct lists where necessary; see the changes to the .y
   file.

To achieve the above changes:

1. xlu__cfg_set_mk and xlu__cfg_set_add are deleted, because they are
   no longer needed in the new code.
2. Introduce xlu__cfg_string_mk to make an XLU_ConfigSetting that points
   to an XLU_ConfigValue that wraps a string.
3. Introduce xlu__cfg_list_mk to make an XLU_ConfigSetting that points
   to an XLU_ConfigValue that is a list.
4. The parser now generates XLU_ConfigValue instead of XLU_ConfigSetting
   when constructing values, which enables us to recursively generate
   lists of lists.
5. XLU_ConfigSetting is generated in xlu__cfg_set_store.
6. Adapt other functions to use the new types.

No change to the public API. Xl compiles without problems and
'xl create -n guest.cfg' is valgrind clean.

This patch is needed because we're going to implement nested list
support, which requires support for lists of lists.

Signed-off-by: Wei Liu
Cc: Ian Jackson
Cc: Ian Campbell
Acked-by: Ian Jackson
---
Changes in v5:
1. Use standard expanding-array pattern.
--- tools/libxl/libxlu_cfg.c | 170 ++ tools/libxl/libxlu_cfg_i.h| 12 ++- tools/libxl/libxlu_cfg_y.c| 24 +++--- tools/libxl/libxlu_cfg_y.h| 2 +- tools/libxl/libxlu_cfg_y.y| 14 ++-- tools/libxl/libxlu_internal.h | 30 ++-- 6 files changed, 173 insertions(+), 79 deletions(-) diff --git a/tools/libxl/libxlu_cfg.c b/tools/libxl/libxlu_cfg.c index 22adcb0..f000eed 100644 --- a/tools/libxl/libxlu_cfg.c +++ b/tools/libxl/libxlu_cfg.c @@ -131,14 +131,28 @@ int xlu_cfg_readdata(XLU_Config *cfg, const char *data, int length) { return ctx.err; } -void xlu__cfg_set_free(XLU_ConfigSetting *set) { +void xlu__cfg_value_free(XLU_ConfigValue *value) +{ int i; +if (!value) return; + +switch (value->type) { +case XLU_STRING: +free(value->u.string); +break; +case XLU_LIST: +for (i = 0; i < value->u.list.nvalues; i++) +xlu__cfg_value_free(value->u.list.values[i]); +free(value->u.list.values); +} +free(value); +} + +void xlu__cfg_set_free(XLU_ConfigSetting *set) { if (!set) return; free(set->name); -for (i=0; invalues; i++) -free(set->values[i]); -free(set->values); +xlu__cfg_value_free(set->value); free(set); } @@ -173,7 +187,7 @@ static int find_atom(const XLU_Config *cfg, const char *n, set= find(cfg,n); if (!set) return ESRCH; -if (set->avalues!=1) { +if (set->value->type!=XLU_STRING) { if (!dont_warn) fprintf(cfg->report, "%s:%d: warning: parameter `%s' is" @@ -191,7 +205,7 @@ int xlu_cfg_get_string(const XLU_Config *cfg, const char *n, int e; e= find_atom(cfg,n,&set,dont_warn); if (e) return e; -*value_r= set->values[0]; +*value_r= set->value->u.string; return 0; } @@ -202,7 +216,7 @@ int xlu_cfg_replace_string(const XLU_Config *cfg, const char *n, e= find_atom(cfg,n,&set,dont_warn); if (e) return e; free(*value_r); -*value_r= strdup(set->values[0]); +*value_r= strdup(set->value->u.string); return 0; } @@ -214,7 +228,7 @@ int xlu_cfg_get_long(const XLU_Config *cfg, const char *n, char *ep; e= find_atom(cfg,n,&set,dont_warn); if (e) return e; -errno= 0; l= strtol(set->values[0], &ep, 
0); +errno= 0; l= strtol(set->value->u.string, &ep, 0); e= errno; if (errno) { e= errno; @@ -226,7 +240,7 @@ int xlu_cfg_get_long(const XLU_Config *cfg, const char *n, cfg->config_source, set->lineno, n, strerror(e)); return e; } -if (*ep || ep==set->values[0]) { +if (*ep || ep==set->value->u.string) { if (!dont_warn) fprintf(cfg->report, "%s:%d: warning: parameter `%s' is not a valid number\n", @@ -253,7 +267,7 @@ int xlu_cfg_get_list(const XLU_Config *cfg, const char *n, XLU_ConfigList **list_r, int *entries_r, int dont_warn) { XLU_ConfigSetting *set; set= find(cfg,n); if (!set) return ESRCH; -if (set->avalues==1) { +if (set->value->type!=XLU_LIST) { if (!dont_warn) { fprintf(cfg->report, "%s:%d: warning: parameter `%s' is a single value" @@ -262,8 +276,8 @@ int xlu_cfg_get_list(const XLU_Config *cfg, const char *n, } return EINVAL; } -if (list_r) *list_r= set; -if (entries_r) *entries_r= set->nvalues; +if (list_r) *list_r= &set->value->u.list; +if (entries_r) *entries_r= set->value->u.list.nvalues; return 0; } @@ -290,72 +304,130 @@ i
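The two ideas this patch relies on, a value that is either a string or a list of values, and a list that grows with the usual expanding-array pattern (avalues for capacity, nvalues for occupied slots), can be sketched in isolation. Everything below is a hypothetical model written for illustration: struct node, node_string, node_list, node_list_append and node_free are not libxlu names, and the growth policy (start at 4, then double) is an assumption rather than what the patch uses.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

enum node_type { NODE_STRING, NODE_LIST };

/* Hypothetical model of a config value: a string or a list of values. */
struct node {
    enum node_type type;
    union {
        char *string;
        struct {
            int avalues;            /* available slots */
            int nvalues;            /* occupied slots */
            struct node **values;
        } list;
    } u;
};

/* Make a string node holding a private copy of s; NULL on allocation failure. */
struct node *node_string(const char *s)
{
    size_t len = strlen(s) + 1;
    struct node *n = malloc(sizeof(*n));

    if (!n)
        return NULL;
    n->type = NODE_STRING;
    n->u.string = malloc(len);
    if (!n->u.string) {
        free(n);
        return NULL;
    }
    memcpy(n->u.string, s, len);
    return n;
}

/* Make an empty list node; NULL on allocation failure. */
struct node *node_list(void)
{
    struct node *n = calloc(1, sizeof(*n));

    if (n)
        n->type = NODE_LIST;
    return n;
}

/* Append item to list, growing the array when it is full. */
int node_list_append(struct node *list, struct node *item)
{
    assert(list && list->type == NODE_LIST && item);

    if (list->u.list.nvalues >= list->u.list.avalues) {
        int newcap = list->u.list.avalues ? list->u.list.avalues * 2 : 4;
        struct node **nv = realloc(list->u.list.values,
                                   (size_t)newcap * sizeof(*nv));
        if (!nv)
            return -1;
        list->u.list.values = nv;
        list->u.list.avalues = newcap;
    }
    list->u.list.values[list->u.list.nvalues++] = item;
    return 0;
}

/* Free a value and, for lists, everything nested inside it. */
void node_free(struct node *n)
{
    if (!n)
        return;
    switch (n->type) {
    case NODE_STRING:
        free(n->u.string);
        break;
    case NODE_LIST:
        for (int i = 0; i < n->u.list.nvalues; i++)
            node_free(n->u.list.values[i]);
        free(n->u.list.values);
        break;
    }
    free(n);
}
```

The recursive free follows the same shape as xlu__cfg_value_free in the hunk above: strings free their buffer, lists free each element and then the array, and the node itself is released last.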