Re: [Xen-devel] [PATCH v4 4/6] xen: add xen_pin_vcpu() to support calling functions on a dedicated pcpu

2016-04-05 Thread David Vrabel
On 05/04/16 11:01, Juergen Gross wrote:
> 
> No, I don't think this is a good idea. In the EINVAL or EBUSY case a
> simple Xen admin command might be enough to make the next call succeed.
> I don't want to disable pinning in this case.

Ok.

Acked-by: David Vrabel <david.vra...@citrix.com>

David
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [Xen-devel] [PATCH v4 4/6] xen: add xen_pin_vcpu() to support calling functions on a dedicated pcpu

2016-04-05 Thread David Vrabel
On 05/04/16 06:10, Juergen Gross wrote:
> Some hardware models (e.g. Dell Studio 1555 laptops) require calls to
> the firmware to be issued on cpu 0 only. As Dom0 might have to use
> these calls, add xen_pin_vcpu() to achieve this functionality.
> 
> In case either the domain doesn't have the privilege to make the
> related hypercall or the hypervisor isn't supporting it, issue a
> warning once and disable further pinning attempts.
[...]
> --- a/arch/x86/xen/enlighten.c
> +++ b/arch/x86/xen/enlighten.c
> @@ -1885,6 +1885,45 @@ static void xen_set_cpu_features(struct cpuinfo_x86 *c)
>   }
>  }
>  
> +static void xen_pin_vcpu(int cpu)
> +{
> + static bool disable_pinning;
> + struct sched_pin_override pin_override;
> + int ret;
> +
> + if (disable_pinning)
> + return;
> +
> + pin_override.pcpu = cpu;
> + ret = HYPERVISOR_sched_op(SCHEDOP_pin_override, &pin_override);
> +
> + /* Ignore errors when removing override. */
> + if (cpu < 0)
> + return;
> +
> + switch (ret) {
> + case -ENOSYS:
> + pr_warn("The kernel tried to call a function on physical cpu %d, but Xen isn't\n"
> + "supporting this. In case of problems you might consider vcpu pinning.\n",
> + cpu);
> + disable_pinning = true;
> + break;
> + case -EPERM:
> + WARN(1, "Trying to pin vcpu without having privilege to do so\n");
> + disable_pinning = true;
> + break;
> + case -EINVAL:
> + case -EBUSY:
> + pr_warn("The kernel tried to call a function on physical cpu %d, but this cpu\n"
> + "seems not to be available. Please check your Xen cpu configuration.\n",
> + cpu);
> + break;
> + case 0:
> + break;
> + default:
> + WARN(1, "rc %d while trying to pin vcpu\n", ret);
> + disable_pinning = true;
> + }

These messages are a bit wordy for my taste, and since they don't say
what function failed or what driver is affected they're not as useful as
they could be.  I'd probably turn these all into:

	if (cpu >= 0 && ret < 0) {
		pr_warn("Failed to pin VCPU %d to physical CPU %d: %d\n",
			smp_processor_id(), cpu, ret);
		disable_pinning = true;
	}

And look at getting the user of this API to print a more useful error.

"i8k: unable to call SMM BIOS on physical CPU %d: %d"

Or whatever.

David


Re: [Xen-devel] [PATCH v4 1/6] xen: sync xen header

2016-04-05 Thread David Vrabel
On 05/04/16 06:10, Juergen Gross wrote:
> Import the actual version of include/xen/interface/sched.h from Xen.

Acked-by: David Vrabel <david.vra...@citrix.com>

David


Re: [Xen-devel] [PATCH v2 5/6] virt, sched: add cpu pinning to smp_call_sync_on_phys_cpu()

2016-03-21 Thread David Vrabel
On 21/03/16 13:42, Jan Beulich wrote:
>
> Also don't you need to call smp_processor_id() after preempt_disable()?

I suggest using get_cpu()/put_cpu() here.

David


Re: [PATCH v7 9/9] vring: Use the DMA API on Xen

2016-02-03 Thread David Vrabel
On 03/02/16 05:46, Andy Lutomirski wrote:
> Signed-off-by: Andy Lutomirski 

You forgot the previous Reviewed-by tags.

David


Re: [PATCH v6 9/9] vring: Use the DMA API on Xen

2016-02-01 Thread David Vrabel
On 01/02/16 18:00, Andy Lutomirski wrote:
> Signed-off-by: Andy Lutomirski <l...@kernel.org>

Reviewed-by: David Vrabel <david.vra...@citrix.com>

Thanks.

David



Re: [Xen-devel] [PATCH v5 09/10] vring: Use the DMA API on Xen

2016-01-29 Thread David Vrabel
On 29/01/16 02:31, Andy Lutomirski wrote:
> Signed-off-by: Andy Lutomirski 
> ---
>  drivers/virtio/virtio_ring.c | 12 
>  1 file changed, 12 insertions(+)
> 
> diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
> index c169c6444637..305c05cc249a 100644
> --- a/drivers/virtio/virtio_ring.c
> +++ b/drivers/virtio/virtio_ring.c
> @@ -47,6 +47,18 @@
>  
>  static bool vring_use_dma_api(void)
>  {
> +#if defined(CONFIG_X86) && defined(CONFIG_XEN)
> + /*
> +  * In theory, it's possible to have a buggy QEMU-supposed
> +  * emulated Q35 IOMMU and Xen enabled at the same time.  On
> +  * such a configuration, virtio has never worked and will
> +  * not work without an even larger kludge.  Instead, enable
> +  * the DMA API if we're a Xen guest, which at least allows
> +  * all of the sensible Xen configurations to work correctly.
> +  */
> + return static_cpu_has(X86_FEATURE_XENPV);

You want:

if (xen_domain())
return true;

Without the #if so we use the DMA API for all types of Xen guest on all
architectures.

David


Re: [PATCH v3 39/41] xen/events: use virt_xxx barriers

2016-01-11 Thread David Vrabel
On 10/01/16 14:21, Michael S. Tsirkin wrote:
> drivers/xen/events/events_fifo.c uses rmb() to communicate with the
> other side.
> 
> For guests compiled with CONFIG_SMP, smp_rmb would be sufficient, so
> rmb() here is only needed if a non-SMP guest runs on an SMP host.
> 
> Switch to the virt_rmb barrier which serves this exact purpose.
> 
> Pull in asm/barrier.h here to make sure the file is self-contained.
> 
> Suggested-by: David Vrabel <david.vra...@citrix.com>
> Signed-off-by: Michael S. Tsirkin <m...@redhat.com>

Acked-by: David Vrabel <david.vra...@citrix.com>

David


Re: [Xen-devel] [PATCH v2 34/34] xen/io: use virt_xxx barriers

2016-01-04 Thread David Vrabel
On 31/12/15 19:10, Michael S. Tsirkin wrote:
> include/xen/interface/io/ring.h uses
> full memory barriers to communicate with the other side.
> 
> For guests compiled with CONFIG_SMP, smp_wmb and smp_mb
> would be sufficient, so mb() and wmb() here are only needed if
> a non-SMP guest runs on an SMP host.
> 
> Switch to virt_xxx barriers which serve this exact purpose.

Acked-by: David Vrabel <david.vra...@citrix.com>

David


Re: [Xen-devel] [PATCH v2 33/34] xenbus: use virt_xxx barriers

2016-01-04 Thread David Vrabel
On 31/12/15 19:10, Michael S. Tsirkin wrote:
> drivers/xen/xenbus/xenbus_comms.c uses
> full memory barriers to communicate with the other side.
> 
> For guests compiled with CONFIG_SMP, smp_wmb and smp_mb
> would be sufficient, so mb() and wmb() here are only needed if
> a non-SMP guest runs on an SMP host.
> 
> Switch to virt_xxx barriers which serve this exact purpose.

Acked-by: David Vrabel <david.vra...@citrix.com>

If you're feeling particularly keen, there's an rmb() in
consume_one_event() in drivers/xen/events/events_fifo.c that can be
converted to virt_rmb() as well.

David


Re: [Xen-devel] new barrier type for paravirt (was Re: [PATCH] virtio_ring: use smp_store_mb)

2015-12-21 Thread David Vrabel
On 20/12/15 09:25, Michael S. Tsirkin wrote:
> 
> I noticed that drivers/xen/xenbus/xenbus_comms.c uses
> full memory barriers to communicate with the other side.
> For example:
> 
> /* Must write data /after/ reading the consumer index. */
> mb();
> 
> memcpy(dst, data, avail);
> data += avail;
> len -= avail;
> 
> /* Other side must not see new producer until data is there. */
> wmb();
> intf->req_prod += avail;
> 
> /* Implies mb(): other side will see the updated producer. */
> notify_remote_via_evtchn(xen_store_evtchn);
> 
> To me, it looks like for guests compiled with CONFIG_SMP, smp_wmb and smp_mb
> would be sufficient, so mb() and wmb() here are only needed if
> a non-SMP guest runs on an SMP host.
> 
> Is my analysis correct?

For x86, yes.

For arm/arm64 I think so, but would prefer one of the Xen arm
maintainers to confirm.  In particular, whether inner-shareable barriers
are sufficient for memory shared with the hypervisor.

> So what I'm suggesting is something like the below patch,
> except instead of using virtio directly, a new set of barriers
> that behaves identically for SMP and non-SMP guests will be introduced.
> 
> And of course the weak barriers flag is not needed for Xen -
> that's a virtio only thing.
> 
> For example:
> 
> smp_pv_wmb()
> smp_pv_rmb()
> smp_pv_mb()

The smp_ prefix doesn't make a lot of sense to me here since these
barriers are going to be the same whether the kernel is SMP or not.

David


Re: [Xen-devel] [PATCH RFC 0/3] Xen on Virtio

2015-12-14 Thread David Vrabel
On 07/12/15 16:19, Stefano Stabellini wrote:
> Hi all,
> 
> this patch series introduces support for running Linux on top of Xen
> inside a virtual machine with virtio devices (nested virt scenario).
> The problem is that Linux virtio drivers use virt_to_phys to get the
> guest pseudo-physical addresses to pass to the backend, which doesn't
> work as expected on Xen.
> 
> Switching the virtio drivers to the dma APIs (dma_alloc_coherent,
> dma_map/unmap_single and dma_map/unmap_sg) would solve the problem, as
> Xen support in Linux provides an implementation of the dma API which
> takes care of the additional address conversions. However using the dma
> API would increase the complexity of the non-Xen case too. We would also
> need to keep track of the physical or virtual address in addition to the
> dma address for each vring_desc to be able to free the memory in
> detach_buf (see patch #3).
> 
> Instead this series adds few obvious checks to perform address
> translations in a couple of key places, without changing non-Xen code
> paths. You are welcome to suggest improvements or alternative
> implementations.

Andy Lutomirski also looked at this.  Andy what happened to this work?

David


Re: [Xen-devel] [PATCH RFC 1/3] xen: export xen_phys_to_bus, xen_bus_to_phys and xen_virt_to_bus

2015-12-14 Thread David Vrabel
On 07/12/15 16:19, Stefano Stabellini wrote:
> Signed-off-by: Stefano Stabellini <stefano.stabell...@eu.citrix.com>

Can you add a brief description about why these are being moved?  Then,
assuming this is needed in the end:

Acked-by: David Vrabel <david.vra...@citrix.com>

David


Re: [PATCH] paravirt: remove unused pv_apic_ops structure

2015-11-17 Thread David Vrabel
On 17/11/15 13:44, Juergen Gross wrote:
> The only member of that structure is startup_ipi_hook which is always
> set to paravirt_nop.

Reviewed-by: David Vrabel <david.vra...@citrix.com>

David



Re: [Xen-devel] [PATCH v15 12/15] pvqspinlock, x86: Enable PV qspinlock for Xen

2015-04-08 Thread David Vrabel
On 07/04/15 03:55, Waiman Long wrote:
 This patch adds the necessary Xen specific code to allow Xen to
 support the CPU halting and kicking operations needed by the queue
 spinlock PV code.

This basically looks the same as the version I wrote, except I think you
broke it.

 +static void xen_qlock_wait(u8 *byte, u8 val)
 +{
 + int irq = __this_cpu_read(lock_kicker_irq);
 +
 + /* If kicker interrupts not initialized yet, just spin */
 + if (irq == -1)
 + return;
 +
 + /* clear pending */
 + xen_clear_irq_pending(irq);
 +
 + /*
 +  * We check the byte value after clearing pending IRQ to make sure
 +  * that we won't miss a wakeup event because of the clearing.

My version had a barrier() here to ensure this.  The documentation of
READ_ONCE() suggests that it is not sufficient to meet this requirement
(and a READ_ONCE() here is not required anyway).

 +  *
 +  * The sync_clear_bit() call in xen_clear_irq_pending() is atomic.
 +  * So it is effectively a memory barrier for x86.
 +  */
 + if (READ_ONCE(*byte) != val)
 + return;
 +
 + /*
 +  * If an interrupt happens here, it will leave the wakeup irq
 +  * pending, which will cause xen_poll_irq() to return
 +  * immediately.
 +  */
 +
 + /* Block until irq becomes pending (or perhaps a spurious wakeup) */
 + xen_poll_irq(irq);
 +}

David



Re: [Xen-devel] kasan_map_early_shadow() on Xen

2015-03-03 Thread David Vrabel
On 03/03/15 09:40, Luis R. Rodriguez wrote:
 Andrey,
 
 I believe that on Xen we should disable kasan, would like confirmation

Why?  This is the first I've heard of this.

 from someone on xen-devel though. Here's the thing though -- if true
 -- I'd like to do it *properly*, where *properly* means addressing a
 bit of architecture. A simple Kconfig slap seems rather reactive. I'd
 like to address a way to properly ensure we don't run into this and
 other similar issues in the future. The CR4 shadow issue was another
 recent example issue, also introduced via v4.0 [0]. We can't keep
 doing this reactively.
 
 Let's go down the rabbit hole for a bit. HAVE_ARCH_KASAN will be
 selected on x86 when:
 
 if X86_64 && SPARSEMEM_VMEMMAP
 
 Now Xen should not have SPARSEMEM_VMEMMAP but PVOPs' goal is to enable

Why?  Again, this is the first I've heard of this as well.  FWIW, all
the Xen configs we use have SPARSEMEM_VMEMMAP enabled.

David

 distributions to be able to have a single binary kernels and let the
 rest be figured out, so we can't just disable SPARSEMEM_VMEMMAP for
 Xen alone, we want to build Xen.. or part of Xen and perhaps keep
 SPARSEMEM_VMEMMAP, and only later figure things out.
 
 How do we do this cleanly and avoid future reactive measures? If the
 answer is not upon us, I'd like to at least highlight the issue so
 that in case we do come up with something its no surprise PVOPs is
 falling short for that single binary pipe dream right now.
 
 [0] https://lkml.org/lkml/2015/2/23/328
 
  Luis
 



Re: [Xen-devel] [PATCH V5] x86 spinlock: Fix memory corruption on completing completions

2015-02-16 Thread David Vrabel
On 15/02/15 17:30, Raghavendra K T wrote:
 --- a/arch/x86/xen/spinlock.c
 +++ b/arch/x86/xen/spinlock.c
 @@ -41,7 +41,7 @@ static u8 zero_stats;
  static inline void check_zero(void)
  {
   u8 ret;
 - u8 old = ACCESS_ONCE(zero_stats);
 + u8 old = READ_ONCE(zero_stats);
   if (unlikely(old)) {
  ret = cmpxchg(&zero_stats, old, 0);
   /* This ensures only one fellow resets the stat */
 @@ -112,6 +112,7 @@ __visible void xen_lock_spinning(struct arch_spinlock *lock, __ticket_t want)
  struct xen_lock_waiting *w = this_cpu_ptr(&lock_waiting);
   int cpu = smp_processor_id();
   u64 start;
 + __ticket_t head;
   unsigned long flags;
  
   /* If kicker interrupts not initialized yet, just spin */
 @@ -159,11 +160,15 @@ __visible void xen_lock_spinning(struct arch_spinlock *lock, __ticket_t want)
*/
   __ticket_enter_slowpath(lock);
  
 + /* make sure enter_slowpath, which is atomic does not cross the read */
 + smp_mb__after_atomic();
 +
   /*
* check again make sure it didn't become free while
* we weren't looking
*/
 - if (ACCESS_ONCE(lock->tickets.head) == want) {
 + head = READ_ONCE(lock->tickets.head);
 + if (__tickets_equal(head, want)) {
   add_stats(TAKEN_SLOW_PICKUP, 1);
   goto out;
   }
 @@ -204,8 +209,8 @@ static void xen_unlock_kick(struct arch_spinlock *lock, __ticket_t next)
  const struct xen_lock_waiting *w = &per_cpu(lock_waiting, cpu);
  
   /* Make sure we read lock before want */
 - if (ACCESS_ONCE(w->lock) == lock &&
 -     ACCESS_ONCE(w->want) == next) {
 + if (READ_ONCE(w->lock) == lock &&
 +     READ_ONCE(w->want) == next) {
   add_stats(RELEASED_SLOW_KICKED, 1);
   xen_send_IPI_one(cpu, XEN_SPIN_UNLOCK_VECTOR);
   break;

Acked-by: David Vrabel <david.vra...@citrix.com>

Although some of the ACCESS_ONCE to READ_ONCE changes are cosmetic and
are perhaps best left out of a patch destined for stable.

David



Re: [Xen-devel] [PATCH v14 11/11] pvqspinlock, x86: Enable PV qspinlock for XEN

2015-01-21 Thread David Vrabel
On 20/01/15 20:12, Waiman Long wrote:
 This patch adds the necessary XEN specific code to allow XEN to
 support the CPU halting and kicking operations needed by the queue
 spinlock PV code.

Xen is a word, please don't capitalize it.

 +void xen_lock_stats(int stat_types)
 +{
 + if (stat_types & PV_LOCKSTAT_WAKE_KICKED)
 + add_smp(&wake_kick_stats, 1);
 + if (stat_types & PV_LOCKSTAT_WAKE_SPURIOUS)
 + add_smp(&wake_spur_stats, 1);
 + if (stat_types & PV_LOCKSTAT_KICK_NOHALT)
 + add_smp(&kick_nohlt_stats, 1);
 + if (stat_types & PV_LOCKSTAT_HALT_QHEAD)
 + add_smp(&halt_qhead_stats, 1);
 + if (stat_types & PV_LOCKSTAT_HALT_QNODE)
 + add_smp(&halt_qnode_stats, 1);
 + if (stat_types & PV_LOCKSTAT_HALT_ABORT)
 + add_smp(&halt_abort_stats, 1);
 +}
 +PV_CALLEE_SAVE_REGS_THUNK(xen_lock_stats);

This is not inlined and the 6 test-and-branch cannot be optimized away.

 +/*
 + * Halt the current CPU & release it back to the host
 + * Return 0 if halted, -1 otherwise.
 + */
 +int xen_halt_cpu(u8 *byte, u8 val)
 +{
 + int irq = __this_cpu_read(lock_kicker_irq);
 + unsigned long flags;
 + u64 start;
 +
 + /* If kicker interrupts not initialized yet, just spin */
 + if (irq == -1)
 + return -1;
 +
 + /*
 +  * Make sure an interrupt handler can't upset things in a
 +  * partially setup state.
 +  */
 + local_irq_save(flags);
 + start = spin_time_start();
 +
 + /* clear pending */
 + xen_clear_irq_pending(irq);
 +
 + /* Allow interrupts while blocked */
 + local_irq_restore(flags);

It's not clear what partially setup state is being protected here.
xen_clear_irq_pending() is an atomic bit clear.

I think you can drop the irq save/restore here.

 + /*
 +  * Don't halt if the content of the given byte address differs from
 +  * the expected value. A read memory barrier is added to make sure that
 +  * the latest value of the byte address is fetched.
 +  */
 + smp_rmb();

The atomic bit clear in xen_clear_irq_pending() acts as a full memory
barrier.  I don't think you need an additional memory barrier here, only
a compiler one.  I suggest using READ_ONCE().

 + if (*byte != val) {
 + xen_lock_stats(PV_LOCKSTAT_HALT_ABORT);
 + return -1;
 + }
 + /*
 +  * If an interrupt happens here, it will leave the wakeup irq
 +  * pending, which will cause xen_poll_irq() to return
 +  * immediately.
 +  */
 +
 + /* Block until irq becomes pending (or perhaps a spurious wakeup) */
 + xen_poll_irq(irq);
 + spin_time_accum_blocked(start);
 + return 0;
 +}
 +PV_CALLEE_SAVE_REGS_THUNK(xen_halt_cpu);
 +
 +#endif /* CONFIG_QUEUE_SPINLOCK */
 +
  static irqreturn_t dummy_handler(int irq, void *dev_id)
  {
   BUG();

David



Re: [Xen-devel] [RFC] Hypervisor RNG and enumeration

2014-10-30 Thread David Vrabel
On 29/10/14 05:19, Andy Lutomirski wrote:
 CPUID leaf 4F02H: miscellaneous features
 
 
[...]
 ### CommonHV RNG
 
 If CPUID.4F02H.EAX is nonzero, then it contains an MSR index used to
 communicate with a hypervisor random number generator.  This MSR is
 referred to as MSR_COMMONHV_RNG.
 
 rdmsr(MSR_COMMONHV_RNG) returns a 64-bit best-effort random number.  If the
 hypervisor is able to generate a 64-bit cryptographically secure random 
 number,
 it SHOULD return it.  If not, then the hypervisor SHOULD do its best to return
 a random number suitable for seeding a cryptographic RNG.
 
 A guest is expected to read MSR_COMMONHV_RNG several times in a row.
 The hypervisor SHOULD return different values each time.
 
 rdmsr(MSR_COMMONHV_RNG) MUST NOT result in an exception, but guests MUST
 NOT assume that its return value is indeed secure.  For example, a hypervisor
 is free to return zero in response to rdmsr(MSR_COMMONHV_RNG).

I would add:

  If the hypervisor's pool of random data is exhausted, it MAY
  return 0.  The hypervisor MUST provide at least 4 (?) non-zero
  numbers to each guest.

Xen does not have a continual source of entropy and the only feasible
way is for the toolstack to provide each guest with a fixed size pool of
random data during guest creation.

The fixed size pool could be refilled by the guest if further random
data is needed (e.g., before an in-guest kexec).

 wrmsr(MSR_COMMONHV_RNG) offers the hypervisor up to 64 bits of entropy.
 The hypervisor MAY use it as it sees fit to improve its own random number
 generator.  A hypervisor SHOULD make a reasonable effort to avoid making
 values written to MSR_COMMONHV_RNG visible to untrusted parties, but
 guests SHOULD NOT write sensitive values to wrmsr(MSR_COMMONHV_RNG).

I don't think unprivileged guests should be able to influence the
hypervisor's RNG. Unless the intention here is it only affects the
numbers returned to this guest?

But since the write is optional, I don't object to it.

David



Re: [Xen-devel] [RFC] Hypervisor RNG and enumeration

2014-10-29 Thread David Vrabel
On 29/10/14 13:45, Paolo Bonzini wrote:
 
 
 On 10/29/2014 11:37 AM, Andrew Cooper wrote:
 While testing various nested combinations, XenServer has found that
 modern Windows Server versions must have the hypervisor bit hidden from
 them for them to be happy running HyperV, despite the fact that they
 will make use of the Viridian virtual extensions also provided.
 
 Right.
 
 As a result, while it is certainly advisable for the hypervisor bit to
 be set, CommonHV should be available to be found by paravirtualised
 drivers inside an OS which can't cope with the hypervisor bit set.
 
 Microsoft should just stop putting arbitrary limitations on their
 software; or pay the price which, in this case, is not being able to use
 the features from the common specification.  I guess what they'd do is
 reinvent the RNG as a Viridian extension (if they need it).
 
 You can certainly do CPUID(0x4F00) even if HYPERVISOR=0.  What you
 get back is undefined, but in all likelihood it won't be the
 CommonHVIntf string.

Microsoft already has a specification to obtain a random number via an
ACPI device.  The VM Generation ID.

http://www.microsoft.com/en-us/download/details.aspx?id=30707

David


Re: [PATCH v6 05/11] pvqspinlock, x86: Allow unfair spinlock in a PV guest

2014-03-13 Thread David Vrabel
On 12/03/14 18:54, Waiman Long wrote:
 Locking is always an issue in a virtualized environment as the virtual
 CPU that is waiting on a lock may get scheduled out and hence block
 any progress in lock acquisition even when the lock has been freed.
 
 One solution to this problem is to allow unfair lock in a
 para-virtualized environment. In this case, a new lock acquirer can
 come and steal the lock if the next-in-line CPU to get the lock is
 scheduled out. Unfair lock in a native environment is generally not a
 good idea as there is a possibility of lock starvation for a heavily
 contended lock.

I do not think this is a good idea -- the problems with unfair locks are
worse in a virtualized guest.  If a waiting VCPU deschedules and has to
be kicked to grab a lock then it is very likely to lose a race with
another running VCPU trying to take a lock (since it takes time for the
VCPU to be rescheduled).

 With the unfair locking activated on bare metal 4-socket Westmere-EX
 box, the execution times (in ms) of a spinlock micro-benchmark were
 as follows:
 
   # of   Ticket   Fair         Unfair
   tasks  lock     queue lock   queue lock
   -----  ------   ----------   ----------
     1      135        135          137
     2     1045       1120          747
     3     1827       2345         1084
     4     2689       2934         1438
     5     3736       3658         1722
     6     4942       4434         2092
     7     6304       5176         2245
     8     7736       5955         2388

Are these figures with or without the later PV support patches?

David


Re: [PATCH RFC v6 09/11] pvqspinlock, x86: Add qspinlock para-virtualization support

2014-03-13 Thread David Vrabel
On 12/03/14 18:54, Waiman Long wrote:
 This patch adds para-virtualization support to the queue spinlock in
 the same way as was done in the PV ticket lock code. In essence, the
 lock waiters will spin for a specified number of times (QSPIN_THRESHOLD
 = 2^14) and then halt themselves. The queue head waiter will spin
 2*QSPIN_THRESHOLD times before halting itself. When it has spun
 QSPIN_THRESHOLD times, the queue head will assume that the lock
 holder may be scheduled out and attempt to kick the lock holder CPU
 if it has the CPU number on hand.

I don't really understand the reasoning for kicking the lock holder.  It
will either be: running, runnable, or halted because it's in a slow path
wait for another lock.  In any of these states I do not see how a kick
is useful.

 Enabling the PV code does have a performance impact on spinlock
 acquisitions and releases. The following table shows the execution
 time (in ms) of a spinlock micro-benchmark that does lock/unlock
 operations 5M times for each task versus the number of contending
 tasks on a Westmere-EX system.
 
   # of   Ticket lock            Queue lock
   tasks  PV off/PV on/%Change   PV off/PV on/%Change
   -----  --------------------   --------------------
     1      135/  179/+33%         137/  169/+23%
     2     1045/ 1103/ +6%        1120/ 1536/+37%
     3     1827/ 2683/+47%        2313/ 2425/ +5%
     4     2689/ 4191/+56%        2914/ 3128/ +7%
     5     3736/ 5830/+56%        3715/ 3762/ +1%
     6     4942/ 7609/+54%        4504/ 4558/ +2%
     7     6304/ 9570/+52%        5292/ 5351/ +1%
     8     7736/11323/+46%        6037/ 6097/ +1%

Do you have measurements from tests when VCPUs are overcommitted?

 +#ifdef CONFIG_PARAVIRT_SPINLOCKS
 +/**
 + * queue_spin_unlock_slowpath - kick up the CPU of the queue head
 + * @lock : Pointer to queue spinlock structure
 + *
 + * The lock is released after finding the queue head to avoid racing
 + * condition between the queue head and the lock holder.
 + */
 +void queue_spin_unlock_slowpath(struct qspinlock *lock)
 +{
 + struct qnode *node, *prev;
 + u32 qcode = (u32)queue_get_qcode(lock);
 +
 + /*
 +  * Get the queue tail node
 +  */
 + node = xlate_qcode(qcode);
 +
 + /*
 +  * Locate the queue head node by following the prev pointer from
 +  * tail to head.
 +  * It is assumed that the PV guests won't have that many CPUs so
 +  * that it won't take a long time to follow the pointers.

This isn't a valid assumption, but this isn't that different from the
search done in the ticket slow unlock path so I guess it's ok.

David


Re: [PATCH RFC v5 4/8] pvqspinlock, x86: Allow unfair spinlock in a real PV environment

2014-03-04 Thread David Vrabel
On 04/03/14 15:15, Waiman Long wrote:
 On 03/03/2014 05:55 AM, Paolo Bonzini wrote:
 Il 28/02/2014 18:06, Waiman Long ha scritto:
 On 02/26/2014 12:07 PM, Konrad Rzeszutek Wilk wrote:
 On Wed, Feb 26, 2014 at 10:14:24AM -0500, Waiman Long wrote:
 Locking is always an issue in a virtualized environment as the virtual
 CPU that is waiting on a lock may get scheduled out and hence block
 any progress in lock acquisition even when the lock has been freed.

 One solution to this problem is to allow unfair lock in a
 para-virtualized environment. In this case, a new lock acquirer can
 come and steal the lock if the next-in-line CPU to get the lock is
 scheduled out. Unfair lock in a native environment is generally not a
 Hmm, how do you know if the 'next-in-line CPU' is scheduled out? As
 in the hypervisor knows - but you as a guest might have no idea
 of it.

 I use a heart-beat counter to see if the other side responses within a
 certain time limit. If not, I assume it has been scheduled out probably
 due to PLE.

 PLE is unnecessary if you have true pv spinlocks where the
 next-in-line schedules itself out with a hypercall (Xen) or hlt
 instruction (KVM).  Set a bit in the qspinlock before going to sleep,
 and the lock owner will know that it needs to kick the next-in-line.

 I think there is no need for the unfair lock bits.  1-2% is a pretty
 large hit.

 Paolo
 
 I don't think that PLE is something that can be controlled by software.

You can avoid PLE by not issuing PAUSE instructions when spinning.  You
may want to consider this if you have a lock that explicitly deschedules
the VCPU while waiting (or just deschedule before PLE would trigger).

 It is done in hardware. I maybe wrong. Anyway, I plan to add code to
 schedule out the CPUs waiting in the queue except the first 2 in the
 next version of the patch.

I think you should deschedule all waiters.

 The PV code in the v5 patch did seem to improve benchmark performance
 with moderate to heavy spinlock contention.

The goal of PV aware locks is to improve performance when locks are
contended /and/ VCPUs are over-committed.  Is this something you're
actually measuring?  It's not clear to me.

David


Re: [Xen-devel] [PATCH RFC v5 7/8] pvqspinlock, x86: Add qspinlock para-virtualization support

2014-03-03 Thread David Vrabel
On 27/02/14 15:50, Paolo Bonzini wrote:
 Il 27/02/2014 16:22, Raghavendra K T ha scritto:
 On 02/27/2014 08:15 PM, Paolo Bonzini wrote:
 [...]
 But neither of the VCPUs being kicked here are halted -- they're either
 running or runnable (descheduled by the hypervisor).

 /me actually looks at Waiman's code...

 Right, this is really different from pvticketlocks, where the *unlock*
 primitive wakes up a sleeping VCPU.  It is more similar to PLE
 (pause-loop exiting).

 Adding to the discussion, I see there are two possibilities here,
 considering that in undercommit cases we should not exceed
 HEAD_SPIN_THRESHOLD,

 1. the looping vcpu in pv_head_spin_check() should do halt()
 considering that we have done enough spinning (more than typical
 lock-hold time), and hence we are in potential overcommit.

 2. multiplex kick_cpu to do directed yield in qspinlock case.
 But this may result in some ping ponging?
 
 Actually, I think the qspinlock can work roughly the same as the
 pvticketlock, using the same lock_spinning and unlock_lock hooks.

This is the approach I would like to see.  This would also work for Xen
PV guests.

The current implementation depends on hardware PLE which Xen PV guests
do not support and I'm not sure if other architectures have something
similar (e.g., does ARM's virtualization extensions have PLE?).

David


Re: [PATCH RFC v5 7/8] pvqspinlock, x86: Add qspinlock para-virtualization support

2014-02-27 Thread David Vrabel
On 26/02/14 15:14, Waiman Long wrote:
 This patch adds para-virtualization support to the queue spinlock code
 by enabling the queue head to kick the lock holder CPU, if known,
 in when the lock isn't released for a certain amount of time. It
 also enables the mutual monitoring of the queue head CPU and the
 following node CPU in the queue to make sure that their CPUs will
 stay scheduled in.

I'm not really understanding how this is supposed to work.  There
appears to be an assumption that a guest can keep one of its VCPUs
running by repeatedly kicking it?  This is not possible under Xen and I
doubt it's possible under KVM or any other hypervisor.

David


Re: [PATCH RFC v5 4/8] pvqspinlock, x86: Allow unfair spinlock in a real PV environment

2014-02-27 Thread David Vrabel
On 26/02/14 15:14, Waiman Long wrote:
 Locking is always an issue in a virtualized environment as the virtual
 CPU that is waiting on a lock may get scheduled out and hence block
 any progress in lock acquisition even when the lock has been freed.
 
 One solution to this problem is to allow unfair lock in a
 para-virtualized environment. In this case, a new lock acquirer can
 come and steal the lock if the next-in-line CPU to get the lock is
 scheduled out. Unfair lock in a native environment is generally not a
 good idea as there is a possibility of lock starvation for a heavily
 contended lock.

I'm not sure I'm keen on losing the fairness in a PV environment.  I'm
concerned that on an over-committed host, the lock starvation problem
will be particularly bad.

But I'll have to revisit this once a non-broken PV qspinlock
implementation exists (or someone explains how the proposed one works).

David


Re: [PATCH RFC v5 7/8] pvqspinlock, x86: Add qspinlock para-virtualization support

2014-02-27 Thread David Vrabel
On 27/02/14 13:11, Paolo Bonzini wrote:
 On 27/02/2014 13:11, David Vrabel wrote:
  This patch adds para-virtualization support to the queue spinlock code
  by enabling the queue head to kick the lock holder CPU, if known,
  in when the lock isn't released for a certain amount of time. It
  also enables the mutual monitoring of the queue head CPU and the
  following node CPU in the queue to make sure that their CPUs will
  stay scheduled in.
 I'm not really understanding how this is supposed to work.  There
 appears to be an assumption that a guest can keep one of its VCPUs
 running by repeatedly kicking it?  This is not possible under Xen and I
 doubt it's possible under KVM or any other hypervisor.
 
 KVM allows any VCPU to wake up a currently halted VCPU of its choice,
 see Documentation/virtual/kvm/hypercalls.txt.

But neither of the VCPUs being kicked here is halted -- they're either
running or runnable (descheduled by the hypervisor).

David


Re: [Xen-devel] [PATCH linux-next] xen/multicall: xen_mc_callback(): avoid buffer overflow

2013-02-11 Thread David Vrabel
On 08/02/2013 20:34, Tim Gardner wrote:
 This buffer overflow was introduced with commit
 91e0c5f3dad47838cb2ecc1865ce789a0b7182b1 (2.6.24).

There's no buffer overflow here. xen_mc_flush() resets b->cbidx.

David

  arch/x86/xen/multicalls.c |3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)
 
 diff --git a/arch/x86/xen/multicalls.c b/arch/x86/xen/multicalls.c
 index 0d82003..5270407 100644
 --- a/arch/x86/xen/multicalls.c
 +++ b/arch/x86/xen/multicalls.c
 @@ -195,9 +195,10 @@ void xen_mc_callback(void (*fn)(void *), void *data)
  struct mc_buffer *b = &__get_cpu_var(mc_buffer);
   struct callback *cb;
  
 - if (b->cbidx == MC_BATCH) {
 + if (b->cbidx >= MC_BATCH) {
   trace_xen_mc_flush_reason(XEN_MC_FL_CALLBACK);
   xen_mc_flush();
 + return;
   }
  
   trace_xen_mc_callback(fn, data);
 



Re: [Xen-devel] [PATCH v3 00/11] xen: Initial kexec/kdump implementation

2013-01-11 Thread David Vrabel
On 11/01/13 13:22, Daniel Kiper wrote:
 On Thu, Jan 10, 2013 at 02:19:55PM +, David Vrabel wrote:
 On 04/01/13 17:01, Daniel Kiper wrote:
 My .5 cents:
   - We should focus on KEXEC_CMD_kexec_load and KEXEC_CMD_kexec_unload;
 probably we should introduce KEXEC_CMD_kexec_load2 and 
 KEXEC_CMD_kexec_unload2;
 load should __LOAD__ kernel image and other things into hypervisor 
 memory;

 Yes, but I don't see how we can easily support both ABIs.  I'd be
 in favour of replacing the existing hypercalls and requiring updated
 kexec tools in dom0 (this isn't that different to requiring the correct
 libxc in dom0).
 
 Why? Just define new structures for the new functions of the kexec hypercall.
 That should suffice.

The current hypervisor ABI depends on an internal kernel ABI (i.e., the
ABI provided by relocate_kernel).  We do not want hypervisor internals
to be constrained by having to be compatible with kernel internals.

   - Hmmm... Now I think that we should still use kexec syscall to load image
 into Xen memory (with new KEXEC_CMD_kexec_load2) because it establishes
 all things which are needed to call kdump if dom0 crashes; however,
 I could be wrong...

 I don't think we need the kexec syscall.  The kernel can unconditionally
 do the crash hypercall, which will return if the kdump kernel isn't
 loaded and the kernel can fall back to the regular non-kexec panic.
 
 No, please do not do that. When you call HYPERVISOR_kexec_op(KEXEC_CMD_kexec)
 the system is completely shut down. Returning from HYPERVISOR_kexec_op(KEXEC_CMD_kexec)
 would require restoring some kernel functionality. It may be impossible
 in some cases. Additionally, it means that some changes would be needed
 in the generic kexec code path. As far as I know, kexec maintainers are
 very reluctant to accept such things.

Huh?  There only needs to be a call to a new hypervisor_crash_kexec()
function (which would then call the Xen-specific crash hypercall) at the
very beginning of crash_kexec().  If this returns, the normal
crash/shutdown path is followed (which could even include a guest kexec!).

David


Re: [Xen-devel] [PATCH v3 00/11] xen: Initial kexec/kdump implementation

2013-01-10 Thread David Vrabel
On 04/01/13 17:01, Daniel Kiper wrote:
 On Fri, Jan 04, 2013 at 02:38:44PM +, David Vrabel wrote:
 On 04/01/13 14:22, Daniel Kiper wrote:
 On Wed, Jan 02, 2013 at 11:26:43AM +, Andrew Cooper wrote:
 On 27/12/12 18:02, Eric W. Biederman wrote:
 Andrew Cooperandrew.coop...@citrix.com  writes:

 On 27/12/2012 07:53, Eric W. Biederman wrote:
 The syscall ABI still has the wrong semantics.

 Aka totally unmaintainable and unmergeable.

 The concept of domU support is also strange.  What does domU support 
 even mean, when the dom0 support is loading a kernel to pick up Xen 
 when Xen falls over.
 There are two requirements pulling at this patch series, but I agree
 that we need to clarify them.
 It probably makes sense to split them apart a little even.



 Thinking about this split, there might be a way to simplify it even more.

 /sbin/kexec can load the Xen crash kernel itself by issuing
 hypercalls using /dev/xen/privcmd.  This would remove the need for
 the dom0 kernel to distinguish between loading a crash kernel for
 itself and loading a kernel for Xen.

 Or is this just a silly idea complicating the matter?

 This is impossible with current Xen kexec/kdump interface.
 It should be changed to do that. However, I suppose that
 Xen community would not be interested in such changes.

 I don't see why the hypercall ABI cannot be extended with new sub-ops
 that do the right thing -- the existing ABI is a bit weird.

 I plan to start prototyping something shortly (hopefully next week) for
 the Xen kexec case.
 
 Wow... As I can see, this time the Xen community is interested in it.
 That is great. I agree that the current kexec interface is not ideal.

I spent some more time looking at the existing interface and
implementation and it really is broken.

 David, I am happy to help in that process. However, if you wish I could
 carry it myself. Anyway, it looks that I should hold on with my
 Linux kexec/kdump patches.

I should be able to post some prototype patches for Xen in a few weeks.
No guarantees though.

 My .5 cents:
   - We should focus on KEXEC_CMD_kexec_load and KEXEC_CMD_kexec_unload;
 probably we should introduce KEXEC_CMD_kexec_load2 and 
 KEXEC_CMD_kexec_unload2;
 load should __LOAD__ kernel image and other things into hypervisor memory;

Yes, but I don't see how we can easily support both ABIs.  I'd be
in favour of replacing the existing hypercalls and requiring updated
kexec tools in dom0 (this isn't that different to requiring the correct
libxc in dom0).

 I suppose that almost all things could be copied from 
 linux/kernel/kexec.c,
 linux/arch/x86/kernel/{machine_kexec_$(BITS).c,relocate_kernel_$(BITS).c};
 I think that KEXEC_CMD_kexec should stay as is,

I don't think we want all the junk from Linux inside Xen -- we only want
to support the kdump case and do not have to handle returning from the
kexec image.

   - Hmmm... Now I think that we should still use kexec syscall to load image
 into Xen memory (with new KEXEC_CMD_kexec_load2) because it establishes
 all things which are needed to call kdump if dom0 crashes; however,
 I could be wrong...

I don't think we need the kexec syscall.  The kernel can unconditionally
do the crash hypercall, which will return if the kdump kernel isn't
loaded and the kernel can fall back to the regular non-kexec panic.

This will allow the kexec syscall to be used only for the domU kexec case.

   - last but not least, we should think about support for PV guests
 too.

I won't be looking at this.

To avoid confusion about the two largely orthogonal sorts of kexec how
about defining some terms.  I suggest:

Xen kexec: Xen executes the image in response to a Xen crash or a
hypercall from a privileged domain.

Guest kexec: The guest kernel executes the images within the domain in
response to a guest kernel crash or a system call.

David


Re: [Xen-devel] [PATCH v3 00/11] xen: Initial kexec/kdump implementation

2013-01-07 Thread David Vrabel
On 04/01/13 14:22, Daniel Kiper wrote:
 On Wed, Jan 02, 2013 at 11:26:43AM +, Andrew Cooper wrote:
 On 27/12/12 18:02, Eric W. Biederman wrote:
 Andrew Cooperandrew.coop...@citrix.com  writes:

 On 27/12/2012 07:53, Eric W. Biederman wrote:
 The syscall ABI still has the wrong semantics.

 Aka totally unmaintainable and unmergeable.

 The concept of domU support is also strange.  What does domU support even 
 mean, when the dom0 support is loading a kernel to pick up Xen when Xen 
 falls over.
 There are two requirements pulling at this patch series, but I agree
 that we need to clarify them.
 It probably makes sense to split them apart a little even.



 Thinking about this split, there might be a way to simplify it even more.

 /sbin/kexec can load the Xen crash kernel itself by issuing
 hypercalls using /dev/xen/privcmd.  This would remove the need for
 the dom0 kernel to distinguish between loading a crash kernel for
 itself and loading a kernel for Xen.

 Or is this just a silly idea complicating the matter?
 
 This is impossible with current Xen kexec/kdump interface.
 It should be changed to do that. However, I suppose that
 Xen community would not be interested in such changes.

I don't see why the hypercall ABI cannot be extended with new sub-ops
that do the right thing -- the existing ABI is a bit weird.

I plan to start prototyping something shortly (hopefully next week) for
the Xen kexec case.

David


Re: [Xen-devel] [patch 1/3] xen/privcmd: check for integer overflow in ioctl

2012-09-10 Thread David Vrabel
On 08/09/12 10:52, Dan Carpenter wrote:
 If m.num is too large then the m.num * sizeof(*m.arr) multiplication
 could overflow and the access_ok() check wouldn't test the right size.

m.num is range checked later on so it doesn't matter that the
access_ok() checks might be wrong.  A bit subtle, perhaps.

David

 Signed-off-by: Dan Carpenter dan.carpen...@oracle.com
 ---
 Only needed in linux-next.
 
 diff --git a/drivers/xen/privcmd.c b/drivers/xen/privcmd.c
 index 215a3c0..fdff8f9 100644
 --- a/drivers/xen/privcmd.c
 +++ b/drivers/xen/privcmd.c
 @@ -325,6 +325,8 @@ static long privcmd_ioctl_mmap_batch(void __user *udata, int version)
   return -EFAULT;
   /* Returns per-frame error in m.arr. */
   m.err = NULL;
 + if (m.num > SIZE_MAX / sizeof(*m.arr))
 + return -EINVAL;
   if (!access_ok(VERIFY_WRITE, m.arr, m.num * sizeof(*m.arr)))
   return -EFAULT;
   break;
 @@ -332,6 +334,8 @@ static long privcmd_ioctl_mmap_batch(void __user *udata, int version)
   if (copy_from_user(&m, udata, sizeof(struct privcmd_mmapbatch_v2)))
   return -EFAULT;
   /* Returns per-frame error code in m.err. */
 + if (m.num > SIZE_MAX / sizeof(*m.err))
 + return -EINVAL;
   if (!access_ok(VERIFY_WRITE, m.err, m.num * (sizeof(*m.err))))
   return -EFAULT;
   break;
 



Re: [Xen-devel] [patch 3/3] xen/privcmd: remove const modifier from declaration

2012-09-10 Thread David Vrabel
On 10/09/12 10:03, Jan Beulich wrote:
 On 08.09.12 at 11:58, Dan Carpenter dan.carpen...@oracle.com wrote:
 When we use this pointer, we cast away the const modifier and modify the
 data.  I think it was an accident to declare it as const.
 
 NAK - the const is very valid here, as the v2 interface (as opposed
 to the v1 one) does _not_ modify this array (or if it does, it's a
 bug). This is a guarantee made to user mode, so it should also be
 expressed that way in the interface.
 
 But of course the cast used before this patch isn't right either, as
 it indeed inappropriately discards the qualifier. Afaiu this was done
 to simplify the internal workings of the code, but I don't think it's
 desirable to sacrifice type safety for implementation simplicity.

m.arr here isn't const as m is really the V1 structure where m.arr is
non-const.

David


Re: [Xen-devel] [PATCH] xen: populate correct number of pages when across mem boundary

2012-07-16 Thread David Vrabel
On 04/07/12 07:49, zhenzhong.duan wrote:
 When populating pages across a memory boundary at boot, the populated
 page count isn't correct, because memory populated into a non-RAM
 region is ignored.
 
 The pfn range is also wrongly aligned when the memory boundary isn't
 page aligned.
 
 We also need to consider the rare case where xen_do_chunk() fails to
 populate.
 
 For a dom0 booted with dom_mem=3368952K(0xcd9ff000-4k) dmesg diff is:
  [0.00] Freeing 9e-100 pfn range: 98 pages freed
  [0.00] 1-1 mapping on 9e-100
  [0.00] 1-1 mapping on cd9ff-10
  [0.00] Released 98 pages of unused memory
  [0.00] Set 206435 page(s) to 1-1 mapping
 -[0.00] Populating cd9fe-cda00 pfn range: 1 pages added
 +[0.00] Populating cd9fe-cd9ff pfn range: 1 pages added
 +[0.00] Populating 10-100061 pfn range: 97 pages added
  [0.00] BIOS-provided physical RAM map:
  [0.00] Xen:  - 0009e000 (usable)
  [0.00] Xen: 000a - 0010 (reserved)
  [0.00] Xen: 0010 - cd9ff000 (usable)
  [0.00] Xen: cd9ffc00 - cda53c00 (ACPI NVS)
 ...
  [0.00] Xen: 0001 - 000100061000 (usable)
  [0.00] Xen: 000100061000 - 00012c00 (unusable)
 ...
  [0.00] MEMBLOCK configuration:
 ...
 -[0.00]  reserved[0x4]   [0x00cd9ff000-0x00cd9ffbff], 0xc00 bytes
 -[0.00]  reserved[0x5]   [0x01-0x0100060fff], 0x61000 bytes
 
 Related xen memory layout:
 (XEN) Xen-e820 RAM map:
 (XEN)   - 0009ec00 (usable)
 (XEN)  000f - 0010 (reserved)
 (XEN)  0010 - cd9ffc00 (usable)
 
 Signed-off-by: Zhenzhong Duan zhenzhong.d...@oracle.com
 ---
  arch/x86/xen/setup.c |   24 +++-
  1 files changed, 11 insertions(+), 13 deletions(-)
 
 diff --git a/arch/x86/xen/setup.c b/arch/x86/xen/setup.c
 index a4790bf..bd78773 100644
 --- a/arch/x86/xen/setup.c
 +++ b/arch/x86/xen/setup.c
 @@ -157,50 +157,48 @@ static unsigned long __init xen_populate_chunk(
   unsigned long dest_pfn;
  
   for (i = 0, entry = list; i  map_size; i++, entry++) {
 - unsigned long credits = credits_left;
   unsigned long s_pfn;
   unsigned long e_pfn;
   unsigned long pfns;
   long capacity;
  
 - if (credits <= 0)
 + if (credits_left <= 0)
   break;
  
   if (entry->type != E820_RAM)
   continue;
  
 - e_pfn = PFN_UP(entry->addr + entry->size);
 + e_pfn = PFN_DOWN(entry->addr + entry->size);

Ok.

  
   /* We only care about E820 after the xen_start_info->nr_pages */
   if (e_pfn <= max_pfn)
   continue;
  
 - s_pfn = PFN_DOWN(entry->addr);
 + s_pfn = PFN_UP(entry->addr);

Ok.

   /* If the E820 falls within the nr_pages, we want to start
* at the nr_pages PFN.
* If that would mean going past the E820 entry, skip it
*/
 +again:
   if (s_pfn <= max_pfn) {
   capacity = e_pfn - max_pfn;
   dest_pfn = max_pfn;
   } else {
 - /* last_pfn MUST be within E820_RAM regions */
 - if (*last_pfn && e_pfn >= *last_pfn)
 - s_pfn = *last_pfn;
   capacity = e_pfn - s_pfn;
   dest_pfn = s_pfn;
   }
 - /* If we had filled this E820_RAM entry, go to the next one. */
 - if (capacity <= 0)
 - continue;
  
 - if (credits > capacity)
 - credits = capacity;
 + if (credits_left < capacity)
 + capacity = credits_left;
  
 - pfns = xen_do_chunk(dest_pfn, dest_pfn + credits, false);
 + pfns = xen_do_chunk(dest_pfn, dest_pfn + capacity, false);
   done += pfns;
   credits_left -= pfns;
   *last_pfn = (dest_pfn + pfns);
 + if (credits_left > 0 && *last_pfn < e_pfn) {
 + s_pfn = *last_pfn;
 + goto again;
 + }

This looks like it will loop forever if xen_do_chunk() repeatedly fails
because Xen is out of pages.  I think if xen_do_chunk() cannot get a
page from Xen the repopulation process should stop -- aborting this
chunk and any others.  This will allow the guest to continue to boot
just with less memory than expected.

David