Re: [Xen-devel] [PATCH v4 4/6] xen: add xen_pin_vcpu() to support calling functions on a dedicated pcpu
On 05/04/16 11:01, Juergen Gross wrote:
> No, I don't think this is a good idea. In the EINVAL or EBUSY case a
> simple Xen admin command might be enough to make the next call succeed.
> I don't want to disable pinning in this case.

Ok.

Acked-by: David Vrabel <david.vra...@citrix.com>

David

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [Xen-devel] [PATCH v4 4/6] xen: add xen_pin_vcpu() to support calling functions on a dedicated pcpu
On 05/04/16 06:10, Juergen Gross wrote:
> Some hardware models (e.g. Dell Studio 1555 laptops) require calls to
> the firmware to be issued on cpu 0 only. As Dom0 might have to use
> these calls, add xen_pin_vcpu() to achieve this functionality.
>
> In case either the domain doesn't have the privilege to make the
> related hypercall or the hypervisor isn't supporting it, issue a
> warning once and disable further pinning attempts.
[...]
> --- a/arch/x86/xen/enlighten.c
> +++ b/arch/x86/xen/enlighten.c
> @@ -1885,6 +1885,45 @@ static void xen_set_cpu_features(struct cpuinfo_x86 *c)
>  	}
>  }
>
> +static void xen_pin_vcpu(int cpu)
> +{
> +	static bool disable_pinning;
> +	struct sched_pin_override pin_override;
> +	int ret;
> +
> +	if (disable_pinning)
> +		return;
> +
> +	pin_override.pcpu = cpu;
> +	ret = HYPERVISOR_sched_op(SCHEDOP_pin_override, &pin_override);
> +
> +	/* Ignore errors when removing override. */
> +	if (cpu < 0)
> +		return;
> +
> +	switch (ret) {
> +	case -ENOSYS:
> +		pr_warn("The kernel tried to call a function on physical cpu %d, but Xen isn't\n"
> +			"supporting this. In case of problems you might consider vcpu pinning.\n",
> +			cpu);
> +		disable_pinning = true;
> +		break;
> +	case -EPERM:
> +		WARN(1, "Trying to pin vcpu without having privilege to do so\n");
> +		disable_pinning = true;
> +		break;
> +	case -EINVAL:
> +	case -EBUSY:
> +		pr_warn("The kernel tried to call a function on physical cpu %d, but this cpu\n"
> +			"seems not to be available. Please check your Xen cpu configuration.\n",
> +			cpu);
> +		break;
> +	case 0:
> +		break;
> +	default:
> +		WARN(1, "rc %d while trying to pin vcpu\n", ret);
> +		disable_pinning = true;
> +	}

These messages are a bit wordy for my taste and since they don't say
what function failed or what driver is affected they're not as useful
as they could be.
I'd probably turn these all into:

	if (cpu >= 0 && ret < 0) {
		pr_warn("Failed to pin VCPU %d to physical CPU %d: %d",
			smp_processor_id(), cpu, ret);
		disable_pinning = true;
	}

And look at getting the user of this API to print a more useful error.

	"i8k: unable to call SMM BIOS on physical CPU %d: %d"

Or whatever.

David
Re: [Xen-devel] [PATCH v4 1/6] xen: sync xen header
On 05/04/16 06:10, Juergen Gross wrote:
> Import the actual version of include/xen/interface/sched.h from Xen.

Acked-by: David Vrabel <david.vra...@citrix.com>

David
Re: [Xen-devel] [PATCH v2 5/6] virt, sched: add cpu pinning to smp_call_sync_on_phys_cpu()
On 21/03/16 13:42, Jan Beulich wrote:
> Also don't you need to call smp_processor_id() after preempt_disable()?

I suggest using get_cpu()/put_cpu() here.

David
Re: [PATCH v7 9/9] vring: Use the DMA API on Xen
On 03/02/16 05:46, Andy Lutomirski wrote:
> Signed-off-by: Andy Lutomirski

You forgot the previous Reviewed-by tags.

David
Re: [PATCH v6 9/9] vring: Use the DMA API on Xen
On 01/02/16 18:00, Andy Lutomirski wrote:
> Signed-off-by: Andy Lutomirski <l...@kernel.org>

Reviewed-by: David Vrabel <david.vra...@citrix.com>

Thanks.

David
Re: [Xen-devel] [PATCH v5 09/10] vring: Use the DMA API on Xen
On 29/01/16 02:31, Andy Lutomirski wrote:
> Signed-off-by: Andy Lutomirski
> ---
>  drivers/virtio/virtio_ring.c | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
>
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index c169c6444637..305c05cc249a 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -47,6 +47,18 @@
>
>  static bool vring_use_dma_api(void)
>  {
> +#if defined(CONFIG_X86) && defined(CONFIG_XEN)
> +	/*
> +	 * In theory, it's possible to have a buggy QEMU-supposed
> +	 * emulated Q35 IOMMU and Xen enabled at the same time. On
> +	 * such a configuration, virtio has never worked and will
> +	 * not work without an even larger kludge. Instead, enable
> +	 * the DMA API if we're a Xen guest, which at least allows
> +	 * all of the sensible Xen configurations to work correctly.
> +	 */
> +	return static_cpu_has(X86_FEATURE_XENPV);

You want:

	if (xen_domain())
		return true;

Without the #if so we use the DMA API for all types of Xen guest on all
architectures.

David
Re: [PATCH v3 39/41] xen/events: use virt_xxx barriers
On 10/01/16 14:21, Michael S. Tsirkin wrote:
> drivers/xen/events/events_fifo.c uses rmb() to communicate with the
> other side.
>
> For guests compiled with CONFIG_SMP, smp_rmb would be sufficient, so
> rmb() here is only needed if a non-SMP guest runs on an SMP host.
>
> Switch to the virt_rmb barrier which serves this exact purpose.
>
> Pull in asm/barrier.h here to make sure the file is self-contained.
>
> Suggested-by: David Vrabel <david.vra...@citrix.com>
> Signed-off-by: Michael S. Tsirkin <m...@redhat.com>

Acked-by: David Vrabel <david.vra...@citrix.com>

David
Re: [Xen-devel] [PATCH v2 34/34] xen/io: use virt_xxx barriers
On 31/12/15 19:10, Michael S. Tsirkin wrote:
> include/xen/interface/io/ring.h uses full memory barriers to
> communicate with the other side.
>
> For guests compiled with CONFIG_SMP, smp_wmb and smp_mb would be
> sufficient, so mb() and wmb() here are only needed if a non-SMP guest
> runs on an SMP host.
>
> Switch to virt_xxx barriers which serve this exact purpose.

Acked-by: David Vrabel <david.vra...@citrix.com>

David
Re: [Xen-devel] [PATCH v2 33/34] xenbus: use virt_xxx barriers
On 31/12/15 19:10, Michael S. Tsirkin wrote:
> drivers/xen/xenbus/xenbus_comms.c uses full memory barriers to
> communicate with the other side.
>
> For guests compiled with CONFIG_SMP, smp_wmb and smp_mb would be
> sufficient, so mb() and wmb() here are only needed if a non-SMP guest
> runs on an SMP host.
>
> Switch to virt_xxx barriers which serve this exact purpose.

Acked-by: David Vrabel <david.vra...@citrix.com>

If you're feeling particularly keen there's a rmb() in
consume_one_event() in drivers/xen/events/events_fifo.c that can be
converted to virt_rmb() as well.

David
Re: [Xen-devel] new barrier type for paravirt (was Re: [PATCH] virtio_ring: use smp_store_mb)
On 20/12/15 09:25, Michael S. Tsirkin wrote:
> I noticed that drivers/xen/xenbus/xenbus_comms.c uses full memory
> barriers to communicate with the other side.  For example:
>
> 	/* Must write data /after/ reading the consumer index. */
> 	mb();
>
> 	memcpy(dst, data, avail);
> 	data += avail;
> 	len -= avail;
>
> 	/* Other side must not see new producer until data is there. */
> 	wmb();
> 	intf->req_prod += avail;
>
> 	/* Implies mb(): other side will see the updated producer. */
> 	notify_remote_via_evtchn(xen_store_evtchn);
>
> To me, it looks like for guests compiled with CONFIG_SMP, smp_wmb and
> smp_mb would be sufficient, so mb() and wmb() here are only needed if
> a non-SMP guest runs on an SMP host.
>
> Is my analysis correct?

For x86, yes.  For arm/arm64 I think so, but would prefer one of the Xen
arm maintainers to confirm.  In particular, whether inner-shareable
barriers are sufficient for memory shared with the hypervisor.

> So what I'm suggesting is something like the below patch, except
> instead of using virtio directly, a new set of barriers that behaves
> identically for SMP and non-SMP guests will be introduced.
>
> And of course the weak barriers flag is not needed for Xen - that's a
> virtio only thing.
>
> For example:
>
> 	smp_pv_wmb()
> 	smp_pv_rmb()
> 	smp_pv_mb()

The smp_ prefix doesn't make a lot of sense to me here since these
barriers are going to be the same whether the kernel is SMP or not.

David
Re: [Xen-devel] [PATCH RFC 0/3] Xen on Virtio
On 07/12/15 16:19, Stefano Stabellini wrote:
> Hi all,
>
> this patch series introduces support for running Linux on top of Xen
> inside a virtual machine with virtio devices (nested virt scenario).
> The problem is that Linux virtio drivers use virt_to_phys to get the
> guest pseudo-physical addresses to pass to the backend, which doesn't
> work as expected on Xen.
>
> Switching the virtio drivers to the dma APIs (dma_alloc_coherent,
> dma_map/unmap_single and dma_map/unmap_sg) would solve the problem, as
> Xen support in Linux provides an implementation of the dma API which
> takes care of the additional address conversions. However using the dma
> API would increase the complexity of the non-Xen case too. We would
> also need to keep track of the physical or virtual address in addition
> to the dma address for each vring_desc to be able to free the memory in
> detach_buf (see patch #3).
>
> Instead this series adds few obvious checks to perform address
> translations in a couple of key places, without changing non-Xen code
> paths. You are welcome to suggest improvements or alternative
> implementations.

Andy Lutomirski also looked at this.  Andy, what happened to this work?

David
Re: [Xen-devel] [PATCH RFC 1/3] xen: export xen_phys_to_bus, xen_bus_to_phys and xen_virt_to_bus
On 07/12/15 16:19, Stefano Stabellini wrote:
> Signed-off-by: Stefano Stabellini <stefano.stabell...@eu.citrix.com>

Can you add a brief description about why these are being moved?

Then, assuming this is needed in the end:

Acked-by: David Vrabel <david.vra...@citrix.com>

David
Re: [PATCH] paravirt: remove unused pv_apic_ops structure
On 17/11/15 13:44, Juergen Gross wrote:
> The only member of that structure is startup_ipi_hook which is always
> set to paravirt_nop.

Reviewed-by: David Vrabel <david.vra...@citrix.com>

David
Re: [Xen-devel] [PATCH v15 12/15] pvqspinlock, x86: Enable PV qspinlock for Xen
On 07/04/15 03:55, Waiman Long wrote:
> This patch adds the necessary Xen specific code to allow Xen to support
> the CPU halting and kicking operations needed by the queue spinlock PV
> code.

This basically looks the same as the version I wrote, except I think
you broke it.

> +static void xen_qlock_wait(u8 *byte, u8 val)
> +{
> +	int irq = __this_cpu_read(lock_kicker_irq);
> +
> +	/* If kicker interrupts not initialized yet, just spin */
> +	if (irq == -1)
> +		return;
> +
> +	/* clear pending */
> +	xen_clear_irq_pending(irq);
> +
> +	/*
> +	 * We check the byte value after clearing pending IRQ to make sure
> +	 * that we won't miss a wakeup event because of the clearing.

My version had a barrier() here to ensure this.  The documentation of
READ_ONCE() suggests that it is not sufficient to meet this requirement
(and a READ_ONCE() here is not required anyway).

> +	 *
> +	 * The sync_clear_bit() call in xen_clear_irq_pending() is atomic.
> +	 * So it is effectively a memory barrier for x86.
> +	 */
> +	if (READ_ONCE(*byte) != val)
> +		return;
> +
> +	/*
> +	 * If an interrupt happens here, it will leave the wakeup irq
> +	 * pending, which will cause xen_poll_irq() to return
> +	 * immediately.
> +	 */
> +
> +	/* Block until irq becomes pending (or perhaps a spurious wakeup) */
> +	xen_poll_irq(irq);
> +}

David
Re: [Xen-devel] kasan_map_early_shadow() on Xen
On 03/03/15 09:40, Luis R. Rodriguez wrote:
> Andrey, I believe that on Xen we should disable kasan, would like
> confirmation

Why?  This is the first I've heard of this.

> from someone on xen-devel though. Here's the thing though -- if true --
> I'd like to do it *properly*, where *properly* means addressing a bit
> of architecture. A simple Kconfig slap seems rather reactive. I'd like
> to address a way to properly ensure we don't run into this and other
> similar issues in the future. The CR4 shadow issue was another recent
> example issue, also introduced via v4.0 [0]. We can't keep doing this
> reactively.
>
> Let's go down the rabbit hole for a bit. HAVE_ARCH_KASAN will be
> selected on x86 when:
>
> 	if X86_64 && SPARSEMEM_VMEMMAP
>
> Now Xen should not have SPARSEMEM_VMEMMAP but PVOPs' goal is to enable

Why?  Again, this is the first I've heard of this as well.

FWIW, all the Xen configs we use have SPARSEMEM_VMEMMAP enabled.

David

> distributions to be able to have a single binary kernels and let the
> rest be figured out, so we can't just disable SPARSEMEM_VMEMMAP for
> Xen alone, we want to build Xen.. or part of Xen and perhaps keep
> SPARSEMEM_VMEMMAP, and only later figure things out. How do we do this
> cleanly and avoid future reactive measures? If the answer is not upon
> us, I'd like to at least highlight the issue so that in case we do come
> up with something its no surprise PVOPs is falling short for that
> single binary pipe dream right now.
>
> [0] https://lkml.org/lkml/2015/2/23/328
>
> Luis

___
Xen-devel mailing list
xen-de...@lists.xen.org
http://lists.xen.org/xen-devel
Re: [Xen-devel] [PATCH V5] x86 spinlock: Fix memory corruption on completing completions
On 15/02/15 17:30, Raghavendra K T wrote:
> --- a/arch/x86/xen/spinlock.c
> +++ b/arch/x86/xen/spinlock.c
> @@ -41,7 +41,7 @@ static u8 zero_stats;
>  static inline void check_zero(void)
>  {
>  	u8 ret;
> -	u8 old = ACCESS_ONCE(zero_stats);
> +	u8 old = READ_ONCE(zero_stats);
>  	if (unlikely(old)) {
>  		ret = cmpxchg(&zero_stats, old, 0);
>  		/* This ensures only one fellow resets the stat */
> @@ -112,6 +112,7 @@ __visible void xen_lock_spinning(struct arch_spinlock *lock, __ticket_t want)
>  	struct xen_lock_waiting *w = this_cpu_ptr(&lock_waiting);
>  	int cpu = smp_processor_id();
>  	u64 start;
> +	__ticket_t head;
>  	unsigned long flags;
>
>  	/* If kicker interrupts not initialized yet, just spin */
> @@ -159,11 +160,15 @@ __visible void xen_lock_spinning(struct arch_spinlock *lock, __ticket_t want)
>  	 */
>  	__ticket_enter_slowpath(lock);
>
> +	/* make sure enter_slowpath, which is atomic does not cross the read */
> +	smp_mb__after_atomic();
> +
>  	/*
>  	 * check again make sure it didn't become free while
>  	 * we weren't looking
>  	 */
> -	if (ACCESS_ONCE(lock->tickets.head) == want) {
> +	head = READ_ONCE(lock->tickets.head);
> +	if (__tickets_equal(head, want)) {
>  		add_stats(TAKEN_SLOW_PICKUP, 1);
>  		goto out;
>  	}
> @@ -204,8 +209,8 @@ static void xen_unlock_kick(struct arch_spinlock *lock, __ticket_t next)
>  		const struct xen_lock_waiting *w = &per_cpu(lock_waiting, cpu);
>
>  		/* Make sure we read lock before want */
> -		if (ACCESS_ONCE(w->lock) == lock &&
> -		    ACCESS_ONCE(w->want) == next) {
> +		if (READ_ONCE(w->lock) == lock &&
> +		    READ_ONCE(w->want) == next) {
>  			add_stats(RELEASED_SLOW_KICKED, 1);
>  			xen_send_IPI_one(cpu, XEN_SPIN_UNLOCK_VECTOR);
>  			break;

Acked-by: David Vrabel <david.vra...@citrix.com>

Although some of the ACCESS_ONCE to READ_ONCE changes are cosmetic and
are perhaps best left out of a patch destined for stable.

David
Re: [Xen-devel] [PATCH v14 11/11] pvqspinlock, x86: Enable PV qspinlock for XEN
On 20/01/15 20:12, Waiman Long wrote:
> This patch adds the necessary XEN specific code to allow XEN to support
> the CPU halting and kicking operations needed by the queue spinlock PV
> code.

Xen is a word, please don't capitalize it.

> +void xen_lock_stats(int stat_types)
> +{
> +	if (stat_types & PV_LOCKSTAT_WAKE_KICKED)
> +		add_smp(&wake_kick_stats, 1);
> +	if (stat_types & PV_LOCKSTAT_WAKE_SPURIOUS)
> +		add_smp(&wake_spur_stats, 1);
> +	if (stat_types & PV_LOCKSTAT_KICK_NOHALT)
> +		add_smp(&kick_nohlt_stats, 1);
> +	if (stat_types & PV_LOCKSTAT_HALT_QHEAD)
> +		add_smp(&halt_qhead_stats, 1);
> +	if (stat_types & PV_LOCKSTAT_HALT_QNODE)
> +		add_smp(&halt_qnode_stats, 1);
> +	if (stat_types & PV_LOCKSTAT_HALT_ABORT)
> +		add_smp(&halt_abort_stats, 1);
> +}
> +PV_CALLEE_SAVE_REGS_THUNK(xen_lock_stats);

This is not inlined and the 6 test-and-branch cannot be optimized away.

> +/*
> + * Halt the current CPU & release it back to the host
> + * Return 0 if halted, -1 otherwise.
> + */
> +int xen_halt_cpu(u8 *byte, u8 val)
> +{
> +	int irq = __this_cpu_read(lock_kicker_irq);
> +	unsigned long flags;
> +	u64 start;
> +
> +	/* If kicker interrupts not initialized yet, just spin */
> +	if (irq == -1)
> +		return -1;
> +
> +	/*
> +	 * Make sure an interrupt handler can't upset things in a
> +	 * partially setup state.
> +	 */
> +	local_irq_save(flags);
> +	start = spin_time_start();
> +
> +	/* clear pending */
> +	xen_clear_irq_pending(irq);
> +
> +	/* Allow interrupts while blocked */
> +	local_irq_restore(flags);

It's not clear what partially setup state is being protected here.
xen_clear_irq_pending() is an atomic bit clear.  I think you can drop
the irq save/restore here.

> +	/*
> +	 * Don't halt if the content of the given byte address differs from
> +	 * the expected value. A read memory barrier is added to make sure that
> +	 * the latest value of the byte address is fetched.
> +	 */
> +	smp_rmb();

The atomic bit clear in xen_clear_irq_pending() acts as a full memory
barrier.  I don't think you need an additional memory barrier here, only
a compiler one.  I suggest using READ_ONCE().

> +	if (*byte != val) {
> +		xen_lock_stats(PV_LOCKSTAT_HALT_ABORT);
> +		return -1;
> +	}
> +	/*
> +	 * If an interrupt happens here, it will leave the wakeup irq
> +	 * pending, which will cause xen_poll_irq() to return
> +	 * immediately.
> +	 */
> +
> +	/* Block until irq becomes pending (or perhaps a spurious wakeup) */
> +	xen_poll_irq(irq);
> +	spin_time_accum_blocked(start);
> +	return 0;
> +}
> +PV_CALLEE_SAVE_REGS_THUNK(xen_halt_cpu);
> +
> +#endif /* CONFIG_QUEUE_SPINLOCK */
> +
>  static irqreturn_t dummy_handler(int irq, void *dev_id)
>  {
>  	BUG();

David
Re: [Xen-devel] [RFC] Hypervisor RNG and enumeration
On 29/10/14 05:19, Andy Lutomirski wrote:
> CPUID leaf 4F02H: miscellaneous features
[...]
> ### CommonHV RNG
>
> If CPUID.4F02H.EAX is nonzero, then it contains an MSR index used to
> communicate with a hypervisor random number generator.  This MSR is
> referred to as MSR_COMMONHV_RNG.
>
> rdmsr(MSR_COMMONHV_RNG) returns a 64-bit best-effort random number.
> If the hypervisor is able to generate a 64-bit cryptographically
> secure random number, it SHOULD return it.  If not, then the
> hypervisor SHOULD do its best to return a random number suitable for
> seeding a cryptographic RNG.
>
> A guest is expected to read MSR_COMMONHV_RNG several times in a row.
> The hypervisor SHOULD return different values each time.
>
> rdmsr(MSR_COMMONHV_RNG) MUST NOT result in an exception, but guests
> MUST NOT assume that its return value is indeed secure.  For example,
> a hypervisor is free to return zero in response to
> rdmsr(MSR_COMMONHV_RNG).

I would add:

  If the hypervisor's pool of random data is exhausted, it MAY return
  0.  The hypervisor MUST provide at least 4 (?) non-zero numbers to
  each guest.

Xen does not have a continual source of entropy and the only feasible
way is for the toolstack to provide each guest with a fixed size pool of
random data during guest creation.  The fixed size pool could be
refilled by the guest if further random data is needed (e.g., before an
in-guest kexec).

> wrmsr(MSR_COMMONHV_RNG) offers the hypervisor up to 64 bits of
> entropy.  The hypervisor MAY use it as it sees fit to improve its own
> random number generator.
>
> A hypervisor SHOULD make a reasonable effort to avoid making values
> written to MSR_COMMONHV_RNG visible to untrusted parties, but guests
> SHOULD NOT write sensitive values to wrmsr(MSR_COMMONHV_RNG).

I don't think unprivileged guests should be able to influence the
hypervisor's RNG.  Unless the intention here is it only affects the
numbers returned to this guest?  But since the write is optional, I
don't object to it.
David
Re: [Xen-devel] [RFC] Hypervisor RNG and enumeration
On 29/10/14 13:45, Paolo Bonzini wrote:
> On 10/29/2014 11:37 AM, Andrew Cooper wrote:
>> While testing various nested combinations, XenServer has found that
>> modern Windows Server versions must have the hypervisor bit hidden
>> from them for them to be happy running HyperV, despite the fact that
>> they will make use of the Viridian virtual extensions also provided.
>
> Right.
>
>> As a result, while it is certainly advisable for the hypervisor bit
>> to be set, CommonHV should be available to be found by
>> paravirtualised drivers inside an OS which can't cope with the
>> hypervisor bit set.
>
> Microsoft should just stop putting arbitrary limitations on their
> software; or pay the price which, in this case, is not being able to
> use the features from the common specification.  I guess what they'd
> do is reinvent the RNG as a Viridian extension (if they need it).
>
> You can certainly do CPUID(0x4F00) even if HYPERVISOR=0.  What you get
> back is undefined, but in all likelihood it won't be the CommonHVIntf
> string.

Microsoft already has a specification to obtain a random number via an
ACPI device.  The VM Generation ID.

http://www.microsoft.com/en-us/download/details.aspx?id=30707

David
Re: [PATCH v6 05/11] pvqspinlock, x86: Allow unfair spinlock in a PV guest
On 12/03/14 18:54, Waiman Long wrote:
> Locking is always an issue in a virtualized environment as the virtual
> CPU that is waiting on a lock may get scheduled out and hence block
> any progress in lock acquisition even when the lock has been freed.
>
> One solution to this problem is to allow unfair lock in a
> para-virtualized environment. In this case, a new lock acquirer can
> come and steal the lock if the next-in-line CPU to get the lock is
> scheduled out. Unfair lock in a native environment is generally not a
> good idea as there is a possibility of lock starvation for a heavily
> contended lock.

I do not think this is a good idea -- the problems with unfair locks are
worse in a virtualized guest.  If a waiting VCPU deschedules and has to
be kicked to grab a lock then it is very likely to lose a race with
another running VCPU trying to take a lock (since it takes time for the
VCPU to be rescheduled).

> With the unfair locking activated on bare metal 4-socket Westmere-EX
> box, the execution times (in ms) of a spinlock micro-benchmark were
> as follows:
>
>   # of    Ticket      Fair          Unfair
>   tasks    lock     queue lock    queue lock
>   ------  -------   ----------    ----------
>     1        135        135           137
>     2       1045       1120           747
>     3       1827       2345          1084
>     4       2689       2934          1438
>     5       3736       3658          1722
>     6       4942       4434          2092
>     7       6304       5176          2245
>     8       7736       5955          2388

Are these figures with or without the later PV support patches?

David
Re: [PATCH RFC v6 09/11] pvqspinlock, x86: Add qspinlock para-virtualization support
On 12/03/14 18:54, Waiman Long wrote:
> This patch adds para-virtualization support to the queue spinlock in
> the same way as was done in the PV ticket lock code. In essence, the
> lock waiters will spin for a specified number of times
> (QSPIN_THRESHOLD = 2^14) and then halted itself. The queue head waiter
> will spins 2*QSPIN_THRESHOLD times before halting itself. When it has
> spinned QSPIN_THRESHOLD times, the queue head will assume that the
> lock holder may be scheduled out and attempt to kick the lock holder
> CPU if it has the CPU number on hand.

I don't really understand the reasoning for kicking the lock holder.  It
will either be: running, runnable, or halted because it's in a slow path
wait for another lock.  In any of these states I do not see how a kick
is useful.

> Enabling the PV code does have a performance impact on spinlock
> acquisitions and releases. The following table shows the execution
> time (in ms) of a spinlock micro-benchmark that does lock/unlock
> operations 5M times for each task versus the number of contending
> tasks on a Westmere-EX system.
>
>   # of    Ticket lock             Queue lock
>   tasks   PV off/PV on/%Change    PV off/PV on/%Change
>   -----   --------------------    --------------------
>     1       135/  179/+33%          137/  169/+23%
>     2      1045/ 1103/ +6%         1120/ 1536/+37%
>     3      1827/ 2683/+47%         2313/ 2425/ +5%
>     4      2689/ 4191/+56%         2914/ 3128/ +7%
>     5      3736/ 5830/+56%         3715/ 3762/ +1%
>     6      4942/ 7609/+54%         4504/ 4558/ +2%
>     7      6304/ 9570/+52%         5292/ 5351/ +1%
>     8      7736/11323/+46%         6037/ 6097/ +1%

Do you have measurements from tests when VCPUs are overcommitted?

> +#ifdef CONFIG_PARAVIRT_SPINLOCKS
> +/**
> + * queue_spin_unlock_slowpath - kick up the CPU of the queue head
> + * @lock : Pointer to queue spinlock structure
> + *
> + * The lock is released after finding the queue head to avoid racing
> + * condition between the queue head and the lock holder.
> + */
> +void queue_spin_unlock_slowpath(struct qspinlock *lock)
> +{
> +	struct qnode *node, *prev;
> +	u32 qcode = (u32)queue_get_qcode(lock);
> +
> +	/*
> +	 * Get the queue tail node
> +	 */
> +	node = xlate_qcode(qcode);
> +
> +	/*
> +	 * Locate the queue head node by following the prev pointer from
> +	 * tail to head.
> +	 * It is assumed that the PV guests won't have that many CPUs so
> +	 * that it won't take a long time to follow the pointers.

This isn't a valid assumption, but this isn't that different from the
search done in the ticket slow unlock path so I guess it's ok.

David
Re: [PATCH RFC v5 4/8] pvqspinlock, x86: Allow unfair spinlock in a real PV environment
On 04/03/14 15:15, Waiman Long wrote:
> On 03/03/2014 05:55 AM, Paolo Bonzini wrote:
>> Il 28/02/2014 18:06, Waiman Long ha scritto:
>>> On 02/26/2014 12:07 PM, Konrad Rzeszutek Wilk wrote:
>>>> On Wed, Feb 26, 2014 at 10:14:24AM -0500, Waiman Long wrote:
>>>>> Locking is always an issue in a virtualized environment as the
>>>>> virtual CPU that is waiting on a lock may get scheduled out and
>>>>> hence block any progress in lock acquisition even when the lock
>>>>> has been freed.
>>>>>
>>>>> One solution to this problem is to allow unfair lock in a
>>>>> para-virtualized environment. In this case, a new lock acquirer
>>>>> can come and steal the lock if the next-in-line CPU to get the
>>>>> lock is scheduled out. Unfair lock in a native environment is
>>>>> generally not a
>>>> Hmm, how do you know if the 'next-in-line CPU' is scheduled out? As
>>>> in the hypervisor knows - but you as a guest might have no idea of
>>>> it.
>>> I use a heart-beat counter to see if the other side responses within
>>> a certain time limit. If not, I assume it has been scheduled out
>>> probably due to PLE.
>> PLE is unnecessary if you have true pv spinlocks where the
>> next-in-line schedules itself out with a hypercall (Xen) or hlt
>> instruction (KVM).  Set a bit in the qspinlock before going to sleep,
>> and the lock owner will know that it needs to kick the next-in-line.
>>
>> I think there is no need for the unfair lock bits.  1-2% is a pretty
>> large hit.
>>
>> Paolo
> I don't think that PLE is something that can be controlled by software.

You can avoid PLE by not issuing PAUSE instructions when spinning.  You
may want to consider this if you have a lock that explicitly deschedules
the VCPU while waiting (or just deschedule before PLE would trigger).

> It is done in hardware. I maybe wrong. Anyway, I plan to add code to
> schedule out the CPUs waiting in the queue except the first 2 in the
> next version of the patch.

I think you should deschedule all waiters.

> The PV code in the v5 patch did seem to improve benchmark performance
> with moderate to heavy spinlock contention.

The goal of PV aware locks is to improve performance when locks are
contended /and/ VCPUs are over-committed.  Is this something you're
actually measuring?  It's not clear to me.

David
Re: [Xen-devel] [PATCH RFC v5 7/8] pvqspinlock, x86: Add qspinlock para-virtualization support
On 27/02/14 15:50, Paolo Bonzini wrote:
> Il 27/02/2014 16:22, Raghavendra K T ha scritto:
>> On 02/27/2014 08:15 PM, Paolo Bonzini wrote:
>> [...]
>>>> But neither of the VCPUs being kicked here are halted -- they're
>>>> either running or runnable (descheduled by the hypervisor).
>>> /me actually looks at Waiman's code...
>>>
>>> Right, this is really different from pvticketlocks, where the
>>> *unlock* primitive wakes up a sleeping VCPU.  It is more similar to
>>> PLE (pause-loop exiting).
>> Adding to the discussion, I see there are two possibilities here,
>> considering that in undercommit cases we should not exceed
>> HEAD_SPIN_THRESHOLD,
>>
>> 1. the looping vcpu in pv_head_spin_check() should do halt()
>> considering that we have done enough spinning (more than typical
>> lock-hold time), and hence we are in potential overcommit.
>>
>> 2. multiplex kick_cpu to do directed yield in qspinlock case.
>> But this may result in some ping ponging?
>
> Actually, I think the qspinlock can work roughly the same as the
> pvticketlock, using the same lock_spinning and unlock_lock hooks.

This is the approach I would like to see.  This would also work for Xen
PV guests.

The current implementation depends on hardware PLE which Xen PV guests
do not support and I'm not sure if other architectures have something
similar (e.g., does ARM's virtualization extensions have PLE?).

David
Re: [PATCH RFC v5 7/8] pvqspinlock, x86: Add qspinlock para-virtualization support
On 26/02/14 15:14, Waiman Long wrote:
> This patch adds para-virtualization support to the queue spinlock code
> by enabling the queue head to kick the lock holder CPU, if known, in
> when the lock isn't released for a certain amount of time. It also
> enables the mutual monitoring of the queue head CPU and the following
> node CPU in the queue to make sure that their CPUs will stay scheduled
> in.

I'm not really understanding how this is supposed to work.  There
appears to be an assumption that a guest can keep one of its VCPUs
running by repeatedly kicking it?  This is not possible under Xen and I
doubt it's possible under KVM or any other hypervisor.

David
Re: [PATCH RFC v5 4/8] pvqspinlock, x86: Allow unfair spinlock in a real PV environment
On 26/02/14 15:14, Waiman Long wrote:
> Locking is always an issue in a virtualized environment, as the virtual CPU that is waiting on a lock may get scheduled out and hence block any progress in lock acquisition even when the lock has been freed. One solution to this problem is to allow unfair locks in a para-virtualized environment. In this case, a new lock acquirer can come in and steal the lock if the next-in-line CPU to get the lock is scheduled out. Unfair locks in a native environment are generally not a good idea, as there is a possibility of lock starvation for a heavily contended lock.

I'm not sure I'm keen on losing the fairness in a PV environment. I'm concerned that on an over-committed host the lock starvation problem will be particularly bad. But I'll have to revisit this once a non-broken PV qspinlock implementation exists (or someone explains how the proposed one works).

David
Re: [PATCH RFC v5 7/8] pvqspinlock, x86: Add qspinlock para-virtualization support
On 27/02/14 13:11, Paolo Bonzini wrote:
> On 27/02/2014 13:11, David Vrabel wrote:
>>> This patch adds para-virtualization support to the queue spinlock code by enabling the queue head to kick the lock holder CPU, if known, when the lock isn't released for a certain amount of time. It also enables the mutual monitoring of the queue head CPU and the following node CPU in the queue to make sure that their CPUs will stay scheduled in.
>>
>> I'm not really understanding how this is supposed to work. There appears to be an assumption that a guest can keep one of its VCPUs running by repeatedly kicking it? This is not possible under Xen and I doubt it's possible under KVM or any other hypervisor.
>
> KVM allows any VCPU to wake up a currently halted VCPU of its choice; see Documentation/virtual/kvm/hypercalls.txt.

But neither of the VCPUs being kicked here are halted -- they're either running or runnable (descheduled by the hypervisor).

David
Re: [Xen-devel] [PATCH linux-next] xen/multicall: xen_mc_callback(): avoid buffer overflow
On 08/02/2013 20:34, Tim Gardner wrote:
> This buffer overflow was introduced with 91e0c5f3dad47838cb2ecc1865ce789a0b7182b1 (2.6.24).

There's no buffer overflow here. xen_mc_flush() resets b->cbidx.

David

>  arch/x86/xen/multicalls.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/xen/multicalls.c b/arch/x86/xen/multicalls.c
> index 0d82003..5270407 100644
> --- a/arch/x86/xen/multicalls.c
> +++ b/arch/x86/xen/multicalls.c
> @@ -195,9 +195,10 @@ void xen_mc_callback(void (*fn)(void *), void *data)
>  	struct mc_buffer *b = __get_cpu_var(mc_buffer);
>  	struct callback *cb;
>
> -	if (b->cbidx == MC_BATCH) {
> +	if (b->cbidx >= MC_BATCH) {
>  		trace_xen_mc_flush_reason(XEN_MC_FL_CALLBACK);
>  		xen_mc_flush();
> +		return;
>  	}
>
>  	trace_xen_mc_callback(fn, data);
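The invariant behind David's NAK (flushing resets b->cbidx, so the index can never exceed MC_BATCH between calls) can be seen in a simplified model of the multicall buffer. The names mirror arch/x86/xen/multicalls.c, but the types are pared down and the batched hypercall itself is elided, so this is a sketch of the control flow only.

```c
#include <stddef.h>

#define MC_BATCH 32	/* illustrative; the real value depends on config */

/* Simplified per-cpu multicall buffer. */
struct mc_buffer {
	int cbidx;
	void (*cb[MC_BATCH])(void *);
	void *data[MC_BATCH];
};

static void mc_flush(struct mc_buffer *b)
{
	/* ... would issue the batched hypercall and run the callbacks ... */
	b->cbidx = 0;			/* the reset David refers to */
}

static void mc_callback(struct mc_buffer *b, void (*fn)(void *), void *data)
{
	if (b->cbidx == MC_BATCH)	/* full: flush first, which resets cbidx */
		mc_flush(b);
	b->cb[b->cbidx] = fn;
	b->data[b->cbidx] = data;
	b->cbidx++;
}
```

Because mc_flush() is the only thing that runs when the buffer is full, and it zeroes the index before the new slot is written, the equality check alone is sufficient and cbidx never walks past the array.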
Re: [Xen-devel] [PATCH v3 00/11] xen: Initial kexec/kdump implementation
On 11/01/13 13:22, Daniel Kiper wrote:
> On Thu, Jan 10, 2013 at 02:19:55PM +, David Vrabel wrote:
>> On 04/01/13 17:01, Daniel Kiper wrote:
>>> My .5 cents:
>>> - We should focus on KEXEC_CMD_kexec_load and KEXEC_CMD_kexec_unload; probably we should introduce KEXEC_CMD_kexec_load2 and KEXEC_CMD_kexec_unload2; load should __LOAD__ the kernel image and other things into hypervisor memory;
>>
>> Yes, but I don't see how we can easily support both ABIs. I'd be in favour of replacing the existing hypercalls and requiring updated kexec tools in dom0 (this isn't that different to requiring the correct libxc in dom0).
>
> Why? Just define new structures for new functions of the kexec hypercall. That should suffice.

The current hypervisor ABI depends on an internal kernel ABI (i.e., the ABI provided by relocate_kernel). We do not want hypervisor internals to be constrained by having to be compatible with kernel internals.

>>> - Hmmm... Now I think that we should still use the kexec syscall to load the image into Xen memory (with a new KEXEC_CMD_kexec_load2) because it establishes all the things which are needed to call kdump if dom0 crashes; however, I could be wrong...
>>
>> I don't think we need the kexec syscall. The kernel can unconditionally do the crash hypercall, which will return if the kdump kernel isn't loaded, and the kernel can fall back to the regular non-kexec panic.
>
> No, please do not do that. When you call HYPERVISOR_kexec_op(KEXEC_CMD_kexec) the system is completely shut down. Returning from HYPERVISOR_kexec_op(KEXEC_CMD_kexec) would require restoring some kernel functionality. That may be impossible in some cases. Additionally, it means that some changes would have to be made in the generic kexec code path. As far as I know, the kexec maintainers are very reluctant to accept such things.

Huh? There only needs to be a call to a new hypervisor_crash_kexec() function (which would then call the Xen-specific crash hypercall) at the very beginning of crash_kexec().
If this returns, the normal crash/shutdown path is taken (which could even include a guest kexec!).

David
Re: [Xen-devel] [PATCH v3 00/11] xen: Initial kexec/kdump implementation
On 04/01/13 17:01, Daniel Kiper wrote:
> On Fri, Jan 04, 2013 at 02:38:44PM +, David Vrabel wrote:
>> On 04/01/13 14:22, Daniel Kiper wrote:
>>> On Wed, Jan 02, 2013 at 11:26:43AM +, Andrew Cooper wrote:
>>>> On 27/12/12 18:02, Eric W. Biederman wrote:
>>>>> Andrew Cooper <andrew.coop...@citrix.com> writes:
>>>>>> On 27/12/2012 07:53, Eric W. Biederman wrote:
>>>>>>> The syscall ABI still has the wrong semantics. Aka totally unmaintainable and unmergeable. The concept of domU support is also strange. What does domU support even mean, when the dom0 support is loading a kernel to pick up Xen when Xen falls over.
>>>>>> There are two requirements pulling at this patch series, but I agree that we need to clarify them.
>>>>> It probably makes sense to split them apart a little even.
>>>> Thinking about this split, there might be a way to simplify it even more. /sbin/kexec can load the Xen crash kernel itself by issuing hypercalls using /dev/xen/privcmd. This would remove the need for the dom0 kernel to distinguish between loading a crash kernel for itself and loading a kernel for Xen. Or is this just a silly idea complicating the matter?
>>> This is impossible with the current Xen kexec/kdump interface. It would have to be changed to do that. However, I suppose that the Xen community would not be interested in such changes.
>> I don't see why the hypercall ABI cannot be extended with new sub-ops that do the right thing -- the existing ABI is a bit weird. I plan to start prototyping something shortly (hopefully next week) for the Xen kexec case.
>
> Wow... As I can see, this time the Xen community is interested in it. That is great. I agree that the current kexec interface is not ideal.

I spent some more time looking at the existing interface and implementation and it really is broken.

> David, I am happy to help in that process. However, if you wish I could carry it myself. Anyway, it looks like I should hold on with my Linux kexec/kdump patches.

I should be able to post some prototype patches for Xen in a few weeks. No guarantees though.

> My .5 cents:
> - We should focus on KEXEC_CMD_kexec_load and KEXEC_CMD_kexec_unload; probably we should introduce KEXEC_CMD_kexec_load2 and KEXEC_CMD_kexec_unload2; load should __LOAD__ the kernel image and other things into hypervisor memory;

Yes, but I don't see how we can easily support both ABIs. I'd be in favour of replacing the existing hypercalls and requiring updated kexec tools in dom0 (this isn't that different to requiring the correct libxc in dom0).

> I suppose that almost all things could be copied from linux/kernel/kexec.c, linux/arch/x86/kernel/{machine_kexec_$(BITS).c,relocate_kernel_$(BITS).c}; I think that KEXEC_CMD_kexec should stay as is;

I don't think we want all the junk from Linux inside Xen -- we only want to support the kdump case and do not have to handle returning from the kexec image.

> - Hmmm... Now I think that we should still use the kexec syscall to load the image into Xen memory (with a new KEXEC_CMD_kexec_load2) because it establishes all the things which are needed to call kdump if dom0 crashes; however, I could be wrong...

I don't think we need the kexec syscall. The kernel can unconditionally do the crash hypercall, which will return if the kdump kernel isn't loaded, and the kernel can fall back to the regular non-kexec panic. This will allow the kexec syscall to be used only for the domU kexec case.

> - last but not least, we should think about support for PV guests too.

I won't be looking at this.

To avoid confusion about the two largely orthogonal sorts of kexec, how about defining some terms. I suggest:

Xen kexec: Xen executes the image in response to a Xen crash or a hypercall from a privileged domain.

Guest kexec: The guest kernel executes the image within the domain in response to a guest kernel crash or a system call.

David
Re: [Xen-devel] [PATCH v3 00/11] xen: Initial kexec/kdump implementation
On 04/01/13 14:22, Daniel Kiper wrote:
> On Wed, Jan 02, 2013 at 11:26:43AM +, Andrew Cooper wrote:
>> On 27/12/12 18:02, Eric W. Biederman wrote:
>>> Andrew Cooper <andrew.coop...@citrix.com> writes:
>>>> On 27/12/2012 07:53, Eric W. Biederman wrote:
>>>>> The syscall ABI still has the wrong semantics. Aka totally unmaintainable and unmergeable. The concept of domU support is also strange. What does domU support even mean, when the dom0 support is loading a kernel to pick up Xen when Xen falls over.
>>>> There are two requirements pulling at this patch series, but I agree that we need to clarify them.
>>> It probably makes sense to split them apart a little even.
>> Thinking about this split, there might be a way to simplify it even more. /sbin/kexec can load the Xen crash kernel itself by issuing hypercalls using /dev/xen/privcmd. This would remove the need for the dom0 kernel to distinguish between loading a crash kernel for itself and loading a kernel for Xen. Or is this just a silly idea complicating the matter?
>
> This is impossible with the current Xen kexec/kdump interface. It would have to be changed to do that. However, I suppose that the Xen community would not be interested in such changes.

I don't see why the hypercall ABI cannot be extended with new sub-ops that do the right thing -- the existing ABI is a bit weird.

I plan to start prototyping something shortly (hopefully next week) for the Xen kexec case.

David
Re: [Xen-devel] [patch 1/3] xen/privcmd: check for integer overflow in ioctl
On 08/09/12 10:52, Dan Carpenter wrote:
> If m.num is too large then the m.num * sizeof(*m.arr) multiplication could overflow and the access_ok() check wouldn't test the right size.

m.num is range checked later on, so it doesn't matter that the access_ok() checks might be wrong. A bit subtle, perhaps.

David

> Signed-off-by: Dan Carpenter dan.carpen...@oracle.com
> ---
> Only needed in linux-next.
>
> diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c
> index 215a3c0..fdff8f9 100644
> --- a/drivers/xen/privcmd.c
> +++ b/drivers/xen/privcmd.c
> @@ -325,6 +325,8 @@ static long privcmd_ioctl_mmap_batch(void __user *udata, int version)
>  			return -EFAULT;
>  		/* Returns per-frame error in m.arr. */
>  		m.err = NULL;
> +		if (m.num > SIZE_MAX / sizeof(*m.arr))
> +			return -EINVAL;
>  		if (!access_ok(VERIFY_WRITE, m.arr, m.num * sizeof(*m.arr)))
>  			return -EFAULT;
>  		break;
> @@ -332,6 +334,8 @@ static long privcmd_ioctl_mmap_batch(void __user *udata, int version)
>  		if (copy_from_user(&m, udata, sizeof(struct privcmd_mmapbatch_v2)))
>  			return -EFAULT;
>  		/* Returns per-frame error code in m.err. */
> +		if (m.num > SIZE_MAX / sizeof(*m.err))
> +			return -EINVAL;
>  		if (!access_ok(VERIFY_WRITE, m.err, m.num * (sizeof(*m.err))))
>  			return -EFAULT;
>  		break;
___ Xen-devel mailing list xen-de...@lists.xen.org http://lists.xen.org/xen-devel
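For the record, the concern Dan's patch addresses is the classic multiply-overflow pattern: if m.num * sizeof(*m.arr) wraps around size_t, access_ok() would be asked to check a tiny range while later code walks the full m.num entries. The guard divides instead of multiplying, so it cannot wrap. A small sketch of the pattern, with a hypothetical helper name (not from the patch):

```c
#include <stdint.h>
#include <stddef.h>

/* Returns nonzero if num * elem_size would wrap around size_t.
 * Dividing SIZE_MAX by the element size avoids the overflow that the
 * naive multiplication would hit.  elem_size must be nonzero. */
static int num_overflows(size_t num, size_t elem_size)
{
	return num > SIZE_MAX / elem_size;
}
```

The same division trick is what the added `m.num > SIZE_MAX / sizeof(*m.arr)` checks in the patch do inline.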
Re: [Xen-devel] [patch 3/3] xen/privcmd: remove const modifier from declaration
On 10/09/12 10:03, Jan Beulich wrote:
> On 08.09.12 at 11:58, Dan Carpenter dan.carpen...@oracle.com wrote:
>> When we use this pointer, we cast away the const modifier and modify the data. I think it was an accident to declare it as const.
>
> NAK - the const is very valid here, as the v2 interface (as opposed to the v1 one) does _not_ modify this array (or if it does, it's a bug). This is a guarantee made to user mode, so it should also be expressed that way in the interface.
>
> But of course the cast used before this patch isn't right either, as it indeed inappropriately discards the qualifier. Afaiu this was done to simplify the internal workings of the code, but I don't think it's desirable to sacrifice type safety for implementation simplicity.

m.arr here isn't const, as m is really the V1 structure, where m.arr is non-const.

David
Re: [Xen-devel] [PATCH] xen: populate correct number of pages when across mem boundary
On 04/07/12 07:49, zhenzhong.duan wrote:
> When populating pages across a mem boundary at bootup, the populated page count isn't correct. This is because mem is populated into a non-mem region and ignored. The pfn range is also wrongly aligned when the mem boundary isn't page aligned. We also need to consider the rare case when xen_do_chunk() fails to populate.
>
> For a dom0 booted with dom_mem=3368952K(0xcd9ff000-4k) the dmesg diff is:
>
> [0.00] Freeing 9e-100 pfn range: 98 pages freed
> [0.00] 1-1 mapping on 9e-100
> [0.00] 1-1 mapping on cd9ff-10
> [0.00] Released 98 pages of unused memory
> [0.00] Set 206435 page(s) to 1-1 mapping
> -[0.00] Populating cd9fe-cda00 pfn range: 1 pages added
> +[0.00] Populating cd9fe-cd9ff pfn range: 1 pages added
> +[0.00] Populating 10-100061 pfn range: 97 pages added
> [0.00] BIOS-provided physical RAM map:
> [0.00] Xen: - 0009e000 (usable)
> [0.00] Xen: 000a - 0010 (reserved)
> [0.00] Xen: 0010 - cd9ff000 (usable)
> [0.00] Xen: cd9ffc00 - cda53c00 (ACPI NVS)
> ...
> [0.00] Xen: 0001 - 000100061000 (usable)
> [0.00] Xen: 000100061000 - 00012c00 (unusable)
> ...
> [0.00] MEMBLOCK configuration:
> ...
> -[0.00] reserved[0x4] [0x00cd9ff000-0x00cd9ffbff], 0xc00 bytes
> -[0.00] reserved[0x5] [0x01-0x0100060fff], 0x61000 bytes
>
> Related xen memory layout:
> (XEN) Xen-e820 RAM map:
> (XEN)  - 0009ec00 (usable)
> (XEN)  000f - 0010 (reserved)
> (XEN)  0010 - cd9ffc00 (usable)
>
> Signed-off-by: Zhenzhong Duan zhenzhong.d...@oracle.com
> ---
>  arch/x86/xen/setup.c | 24 +++++++++++-------------
>  1 files changed, 11 insertions(+), 13 deletions(-)
>
> diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
> index a4790bf..bd78773 100644
> --- a/arch/x86/xen/setup.c
> +++ b/arch/x86/xen/setup.c
> @@ -157,50 +157,48 @@ static unsigned long __init xen_populate_chunk(
>  	unsigned long dest_pfn;
>
>  	for (i = 0, entry = list; i < map_size; i++, entry++) {
> -		unsigned long credits = credits_left;
>  		unsigned long s_pfn;
>  		unsigned long e_pfn;
>  		unsigned long pfns;
>  		long capacity;
>
> -		if (credits <= 0)
> +		if (credits_left <= 0)
>  			break;
>
>  		if (entry->type != E820_RAM)
>  			continue;
>
> -		e_pfn = PFN_UP(entry->addr + entry->size);
> +		e_pfn = PFN_DOWN(entry->addr + entry->size);

Ok.

>  		/* We only care about E820 after the xen_start_info->nr_pages */
>  		if (e_pfn <= max_pfn)
>  			continue;
>
> -		s_pfn = PFN_DOWN(entry->addr);
> +		s_pfn = PFN_UP(entry->addr);

Ok.

>  		/* If the E820 falls within the nr_pages, we want to start
>  		 * at the nr_pages PFN.
>  		 * If that would mean going past the E820 entry, skip it
>  		 */
> +again:
>  		if (s_pfn <= max_pfn) {
>  			capacity = e_pfn - max_pfn;
>  			dest_pfn = max_pfn;
>  		} else {
> -			/* last_pfn MUST be within E820_RAM regions */
> -			if (*last_pfn && e_pfn >= *last_pfn)
> -				s_pfn = *last_pfn;
>  			capacity = e_pfn - s_pfn;
>  			dest_pfn = s_pfn;
>  		}
>
> -		/* If we had filled this E820_RAM entry, go to the next one. */
> -		if (capacity <= 0)
> -			continue;
> -
> -		if (credits > capacity)
> -			credits = capacity;
> +		if (credits_left < capacity)
> +			capacity = credits_left;
>
> -		pfns = xen_do_chunk(dest_pfn, dest_pfn + credits, false);
> +		pfns = xen_do_chunk(dest_pfn, dest_pfn + capacity, false);
>  		done += pfns;
>  		credits_left -= pfns;
>  		*last_pfn = (dest_pfn + pfns);
> +		if (credits_left > 0 && *last_pfn < e_pfn) {
> +			s_pfn = *last_pfn;
> +			goto again;
> +		}

This looks like it will loop forever if xen_do_chunk() repeatedly fails because Xen is out of pages.

I think if xen_do_chunk() cannot get a page from Xen the repopulation process should stop -- aborting this chunk and any others. This will allow the guest to continue to boot, just with less memory than expected.

David
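The alignment half of the fix comes down to how PFN_UP and PFN_DOWN round at a memory boundary that isn't page aligned: the end of a usable region must round *down* (only whole pages count) and the start must round *up*, which is the swap the patch makes. A simplified sketch of those helpers, assuming 4 KiB pages and using the 0xcd9ffc00 boundary from the report above:

```c
/* Simplified forms of the kernel's PFN_DOWN/PFN_UP helpers from
 * include/linux/pfn.h, hard-coded here for 4 KiB pages. */
#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)

/* Round an address down to a page frame number (last whole page). */
#define PFN_DOWN(x) ((unsigned long)(x) >> PAGE_SHIFT)

/* Round an address up to a page frame number (first whole page at or
 * after the address). */
#define PFN_UP(x)   (((unsigned long)(x) + PAGE_SIZE - 1) >> PAGE_SHIFT)
```

With RAM ending at 0xcd9ffc00, PFN_UP gives end pfn 0xcda00 and so counts the partial page 0xcd9ff..0xcd9ffc00 as populatable, while PFN_DOWN correctly stops at 0xcd9ff; that one-page difference is exactly the miscount the dmesg diff shows.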