Re: [Xen-devel] [XEN PATCH 1/2] hvm: Support more than 32 VCPUS when migrating.

2014-04-09 Thread Konrad Rzeszutek Wilk
On Wed, Apr 09, 2014 at 04:38:37PM +0100, David Vrabel wrote:
> On 09/04/14 16:34, Konrad Rzeszutek Wilk wrote:
> > On Wed, Apr 09, 2014 at 09:37:01AM +0200, Roger Pau Monné wrote:
> >> On 08/04/14 20:53, Konrad Rzeszutek Wilk wrote:
> >>> On Tue, Apr 08, 2014 at 08:18:48PM +0200, Roger Pau Monné wrote:
>  On 08/04/14 19:25, kon...@kernel.org wrote:
> > From: Konrad Rzeszutek Wilk 
> >
> > When we migrate an HVM guest, by default our shared_info can
> > only hold up to 32 CPUs. As such, the hypercall
> > VCPUOP_register_vcpu_info was introduced, which allowed us to
> > set up per-page areas for VCPUs. This means we can boot PVHVM
> > guests with more than 32 VCPUs. During migration the per-cpu
> > structure is allocated fresh by the hypervisor (vcpu_info_mfn
> > is set to INVALID_MFN) so that the newly migrated guest
> > can make the VCPUOP_register_vcpu_info hypercall.
> >
> > Unfortunately we end up triggering this condition:
> > /* Run this command on yourself or on other offline VCPUS. */
> >  if ( (v != current) && !test_bit(_VPF_down, &v->pause_flags) )
> >
> > which means we are unable to set up the per-cpu VCPU structures
> > for running vCPUs. The Linux PV code paths make this work by
> > iterating over every vCPU with:
> >
> >  1) is the target vCPU up? (VCPUOP_is_up hypercall)
> >  2) if yes, then VCPUOP_down to pause it.
> >  3) VCPUOP_register_vcpu_info
> >  4) if we took it down in step 2, then VCPUOP_up to bring it back up
> >
> > But since VCPUOP_down, VCPUOP_is_up, and VCPUOP_up are
> > not allowed on HVM guests we can't do this. This patch
> > enables these sub-ops for HVM guests.
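
[For reference, a minimal sketch of the PV-style sequence quoted above.
HYPERVISOR_vcpu_op() and the VCPUOP_* sub-ops are the real Linux/Xen
interfaces; the helper name and the error handling are illustrative
assumptions, not code from this patch.]

#include <linux/errno.h>
#include <linux/types.h>
#include <xen/interface/vcpu.h>
#include <asm/xen/hypercall.h>

/* Sketch: re-register the vcpu_info area for one (possibly running) vCPU
 * the way the PV path does it. */
static int example_reregister_vcpu_info(int cpu,
                                        struct vcpu_register_vcpu_info *info)
{
        bool was_up;
        int rc;

        /* 1) Is the target vCPU up?  (VCPUOP_is_up returns 1 if it is.) */
        was_up = HYPERVISOR_vcpu_op(VCPUOP_is_up, cpu, NULL) > 0;

        /* 2) If it is up, pause it so the hypervisor will accept the call. */
        if (was_up && HYPERVISOR_vcpu_op(VCPUOP_down, cpu, NULL))
                return -EIO;

        /* 3) Register the per-vCPU vcpu_info page with the hypervisor. */
        rc = HYPERVISOR_vcpu_op(VCPUOP_register_vcpu_info, cpu, info);

        /* 4) If we paused it in step 2, bring it back up. */
        if (was_up && HYPERVISOR_vcpu_op(VCPUOP_up, cpu, NULL))
                return -EIO;

        return rc;
}
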
> 
>  Hmmm, this looks like a very convoluted approach to something that could
>  be solved more easily IMHO. What we do on FreeBSD is put all vCPUs into
>  suspension, which means that all vCPUs except vCPU#0 will be in the
>  cpususpend_handler, see:
> 
 http://svnweb.freebsd.org/base/head/sys/amd64/amd64/mp_machdep.c?revision=263878&view=markup#l1460
> >>>
> >>> How do you 'suspend' them? If I remember there is a disadvantage of doing
> >>> this as you have to bring all the CPUs "offline". That in Linux means
> >>> using the stop_machine, which is a pretty big hammer and increases the
> >>> latency for migration.
> >>
> >> In order to suspend them an IPI_SUSPEND is sent to all vCPUs except vCPU#0:
> >>
> >> http://fxr.watson.org/fxr/source/kern/subr_smp.c#L289
> >>
> >> Which makes all APs call cpususpend_handler, so we know all APs are
> >> stuck in a while loop with interrupts disabled:
> >>
> >> http://fxr.watson.org/fxr/source/amd64/amd64/mp_machdep.c#L1459
> >>
> >> Then on resume the APs are taken out of the while loop and the first
> >> thing they do before returning from the IPI handler is registering the
> >> new per-cpu vcpu_info area. But I'm not sure this is something that can
> >> be accomplished easily on Linux.
> > 
> > That is a bit of what the 'stop_machine' would do. It puts all of the
> > CPUs in whatever function you want. But I am not sure of the latency
> > impact - as in what if the migration takes longer and all of the CPUs
> > sit there spinning.
> > Another variant of that is the 'smp_call_function'.
> 
> I tested stop_machine() on all CPUs during suspend once and it was
> awful:  100s of ms of additional downtime.

Yikes.
> 
> Perhaps a hand-rolled IPI-and-park-in-handler would be quicker than the full
> stop_machine().

But that is clearly a bigger patch than this little bug-fix.

Do you want to just take this patch as is and then later on I can work on
prototyping the 'IPI-and-park-in-handler'?


Re: [Xen-devel] [XEN PATCH 1/2] hvm: Support more than 32 VCPUS when migrating.

2014-04-09 Thread David Vrabel
On 09/04/14 16:34, Konrad Rzeszutek Wilk wrote:
> On Wed, Apr 09, 2014 at 09:37:01AM +0200, Roger Pau Monné wrote:
>> On 08/04/14 20:53, Konrad Rzeszutek Wilk wrote:
>>> On Tue, Apr 08, 2014 at 08:18:48PM +0200, Roger Pau Monné wrote:
 On 08/04/14 19:25, kon...@kernel.org wrote:
> From: Konrad Rzeszutek Wilk 
>
> When we migrate an HVM guest, by default our shared_info can
> only hold up to 32 CPUs. As such, the hypercall
> VCPUOP_register_vcpu_info was introduced, which allowed us to
> set up per-page areas for VCPUs. This means we can boot PVHVM
> guests with more than 32 VCPUs. During migration the per-cpu
> structure is allocated fresh by the hypervisor (vcpu_info_mfn
> is set to INVALID_MFN) so that the newly migrated guest
> can make the VCPUOP_register_vcpu_info hypercall.
>
> Unfortunately we end up triggering this condition:
> /* Run this command on yourself or on other offline VCPUS. */
>  if ( (v != current) && !test_bit(_VPF_down, &v->pause_flags) )
>
> which means we are unable to set up the per-cpu VCPU structures
> for running vCPUs. The Linux PV code paths make this work by
> iterating over every vCPU with:
>
>  1) is the target vCPU up? (VCPUOP_is_up hypercall)
>  2) if yes, then VCPUOP_down to pause it.
>  3) VCPUOP_register_vcpu_info
>  4) if we took it down in step 2, then VCPUOP_up to bring it back up
>
> But since VCPUOP_down, VCPUOP_is_up, and VCPUOP_up are
> not allowed on HVM guests we can't do this. This patch
> enables these sub-ops for HVM guests.

 Hmmm, this looks like a very convoluted approach to something that could
 be solved more easily IMHO. What we do on FreeBSD is put all vCPUs into
 suspension, which means that all vCPUs except vCPU#0 will be in the
 cpususpend_handler, see:

 http://svnweb.freebsd.org/base/head/sys/amd64/amd64/mp_machdep.c?revision=263878&view=markup#l1460
>>>
>>> How do you 'suspend' them? If I remember there is a disadvantage of doing
>>> this as you have to bring all the CPUs "offline". That in Linux means using
> >> the stop_machine, which is a pretty big hammer and increases the latency for
>>> migration.
>>
>> In order to suspend them an IPI_SUSPEND is sent to all vCPUs except vCPU#0:
>>
>> http://fxr.watson.org/fxr/source/kern/subr_smp.c#L289
>>
>> Which makes all APs call cpususpend_handler, so we know all APs are
>> stuck in a while loop with interrupts disabled:
>>
>> http://fxr.watson.org/fxr/source/amd64/amd64/mp_machdep.c#L1459
>>
>> Then on resume the APs are taken out of the while loop and the first
>> thing they do before returning from the IPI handler is registering the
>> new per-cpu vcpu_info area. But I'm not sure this is something that can
>> be accomplished easily on Linux.
> 
> That is a bit of what the 'stop_machine' would do. It puts all of the
> CPUs in whatever function you want. But I am not sure of the latency
> impact - as in what if the migration takes longer and all of the CPUs
> sit there spinning.
> Another variant of that is the 'smp_call_function'.

I tested stop_machine() on all CPUs during suspend once and it was
awful:  100s of ms of additional downtime.

Perhaps a hand-rolled IPI-and-park-in-handler would be quicker than the full
stop_machine().

David


Re: [Xen-devel] [XEN PATCH 1/2] hvm: Support more than 32 VCPUS when migrating.

2014-04-09 Thread Konrad Rzeszutek Wilk
On Wed, Apr 09, 2014 at 09:37:01AM +0200, Roger Pau Monné wrote:
> On 08/04/14 20:53, Konrad Rzeszutek Wilk wrote:
> > On Tue, Apr 08, 2014 at 08:18:48PM +0200, Roger Pau Monné wrote:
> >> On 08/04/14 19:25, kon...@kernel.org wrote:
> >>> From: Konrad Rzeszutek Wilk 
> >>>
> >>> When we migrate an HVM guest, by default our shared_info can
> >>> only hold up to 32 CPUs. As such, the hypercall
> >>> VCPUOP_register_vcpu_info was introduced, which allowed us to
> >>> set up per-page areas for VCPUs. This means we can boot PVHVM
> >>> guests with more than 32 VCPUs. During migration the per-cpu
> >>> structure is allocated fresh by the hypervisor (vcpu_info_mfn
> >>> is set to INVALID_MFN) so that the newly migrated guest
> >>> can make the VCPUOP_register_vcpu_info hypercall.
> >>>
> >>> Unfortunately we end up triggering this condition:
> >>> /* Run this command on yourself or on other offline VCPUS. */
> >>>  if ( (v != current) && !test_bit(_VPF_down, &v->pause_flags) )
> >>>
> >>> which means we are unable to set up the per-cpu VCPU structures
> >>> for running vCPUs. The Linux PV code paths make this work by
> >>> iterating over every vCPU with:
> >>>
> >>>  1) is the target vCPU up? (VCPUOP_is_up hypercall)
> >>>  2) if yes, then VCPUOP_down to pause it.
> >>>  3) VCPUOP_register_vcpu_info
> >>>  4) if we took it down in step 2, then VCPUOP_up to bring it back up
> >>>
> >>> But since VCPUOP_down, VCPUOP_is_up, and VCPUOP_up are
> >>> not allowed on HVM guests we can't do this. This patch
> >>> enables these sub-ops for HVM guests.
> >>
> >> Hmmm, this looks like a very convoluted approach to something that could
> >> be solved more easily IMHO. What we do on FreeBSD is put all vCPUs into
> >> suspension, which means that all vCPUs except vCPU#0 will be in the
> >> cpususpend_handler, see:
> >>
> >> http://svnweb.freebsd.org/base/head/sys/amd64/amd64/mp_machdep.c?revision=263878&view=markup#l1460
> > 
> > How do you 'suspend' them? If I remember there is a disadvantage of doing
> > this as you have to bring all the CPUs "offline". That in Linux means using
> > the stop_machine, which is a pretty big hammer and increases the latency for
> > migration.
> 
> In order to suspend them an IPI_SUSPEND is sent to all vCPUs except vCPU#0:
> 
> http://fxr.watson.org/fxr/source/kern/subr_smp.c#L289
> 
> Which makes all APs call cpususpend_handler, so we know all APs are
> stuck in a while loop with interrupts disabled:
> 
> http://fxr.watson.org/fxr/source/amd64/amd64/mp_machdep.c#L1459
> 
> Then on resume the APs are taken out of the while loop and the first
> thing they do before returning from the IPI handler is registering the
> new per-cpu vcpu_info area. But I'm not sure this is something that can
> be accomplished easily on Linux.

That is a bit of what the 'stop_machine' would do. It puts all of the
CPUs in whatever function you want. But I am not sure of the latency impact - as
in what if the migration takes longer and all of the CPUs sit there spinning.
Another variant of that is the 'smp_call_function'.

Then when we resume - we need a mailbox that is shared (easily enough
I think) to tell us that the migration has been done - and then we need to call
VCPUOP_register_vcpu_info.

But if the migration has taken quite long - I fear that the watchdogs
might kick in and start complaining about the CPUs being stuck. Especially
if we are migrating an overcommitted guest.

With this the latency for them to be 'paused', 'initted', and 'unpaused' is,
I think, much, much smaller.

Ugh, let's hold off on this exercise of using 'smp_call_function' until
sometime at the end of the summer - and see. That functionality
should be shared with the PV code path IMHO.
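
[A rough sketch, in Linux terms, of the "IPI and park in the handler" plus
shared mailbox idea discussed above; this is an assumption of how it might
look, not the actual patch. smp_call_function(), cpu_relax(),
touch_nmi_watchdog() and smp_processor_id() are real kernel APIs; the mailbox
flag, the function names, and the reuse of xen_vcpu_setup() (the existing
vcpu_info registration helper in arch/x86/xen/enlighten.c, which is currently
file-local) are hypothetical.]

#include <linux/smp.h>
#include <linux/atomic.h>
#include <linux/nmi.h>
#include <asm/processor.h>

static atomic_t xen_migrate_mailbox = ATOMIC_INIT(1);

/* Runs on every CPU except the one driving the suspend; interrupts stay
 * disabled while we sit in the IPI handler. */
static void xen_park_in_ipi(void *unused)
{
        /* Park until the boot vCPU clears the shared mailbox once the
         * migration has completed. */
        while (atomic_read(&xen_migrate_mailbox)) {
                cpu_relax();
                touch_nmi_watchdog();   /* address the watchdog worry above */
        }

        /* Each vCPU re-registers its own vcpu_info, so Xen's
         * "yourself or offline" check is satisfied. */
        xen_vcpu_setup(smp_processor_id());
}

static void xen_park_secondary_cpus(void)
{
        atomic_set(&xen_migrate_mailbox, 1);
        /* wait == 0: the caller (vCPU0) must carry on and drive the suspend. */
        smp_call_function(xen_park_in_ipi, NULL, 0);
}
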

> 
> I've tried to local-migrate a FreeBSD PVHVM guest with 33 vCPUs on my
> 8-way box, and it seems to be working fine :).

Awesome!
> 
> Roger.
> 


Re: [Xen-devel] [XEN PATCH 1/2] hvm: Support more than 32 VCPUS when migrating.

2014-04-09 Thread Roger Pau Monné
On 09/04/14 10:33, Ian Campbell wrote:
> On Tue, 2014-04-08 at 14:53 -0400, Konrad Rzeszutek Wilk wrote:
>> On Tue, Apr 08, 2014 at 08:18:48PM +0200, Roger Pau Monné wrote:
>>> On 08/04/14 19:25, kon...@kernel.org wrote:
 From: Konrad Rzeszutek Wilk 

 When we migrate an HVM guest, by default our shared_info can
 only hold up to 32 CPUs. As such, the hypercall
 VCPUOP_register_vcpu_info was introduced, which allowed us to
 set up per-page areas for VCPUs. This means we can boot PVHVM
 guests with more than 32 VCPUs. During migration the per-cpu
 structure is allocated fresh by the hypervisor (vcpu_info_mfn
 is set to INVALID_MFN) so that the newly migrated guest
 can make the VCPUOP_register_vcpu_info hypercall.

 Unfortunately we end up triggering this condition:
 /* Run this command on yourself or on other offline VCPUS. */
  if ( (v != current) && !test_bit(_VPF_down, &v->pause_flags) )

 which means we are unable to set up the per-cpu VCPU structures
 for running vCPUs. The Linux PV code paths make this work by
 iterating over every vCPU with:

  1) is the target vCPU up? (VCPUOP_is_up hypercall)
  2) if yes, then VCPUOP_down to pause it.
  3) VCPUOP_register_vcpu_info
  4) if we took it down in step 2, then VCPUOP_up to bring it back up

 But since VCPUOP_down, VCPUOP_is_up, and VCPUOP_up are
 not allowed on HVM guests we can't do this. This patch
 enables these sub-ops for HVM guests.
>>>
>>> Hmmm, this looks like a very convoluted approach to something that could
>>> be solved more easily IMHO. What we do on FreeBSD is put all vCPUs into
>>> suspension, which means that all vCPUs except vCPU#0 will be in the
>>> cpususpend_handler, see:
>>>
>>> http://svnweb.freebsd.org/base/head/sys/amd64/amd64/mp_machdep.c?revision=263878&view=markup#l1460
>>
>> How do you 'suspend' them? If I remember there is a disadvantage of doing
>> this as you have to bring all the CPUs "offline". That in Linux means using
>> the stop_machine, which is a pretty big hammer and increases the latency for
>> migration.
> 
> Yes, this is why the ability to have the toolstack save/restore the
> secondary vcpu state was added. It's especially important for
> checkpointing, but it's relevant to regular migrate as a performance
> improvement too.
> 
> It's not just stop-machine, IIRC it's a tonne of udev events relating to
> cpus off/onlining etc too and all the userspace activity which that
> implies.

Well, what is done on FreeBSD is nothing like that; it's called the
cpususpend handler, but it's not off-lining CPUs or anything like that,
it just places the CPU in a while loop inside of an IPI handler, so we
can do something like this with all APs:

while (suspended)
 pause();

register_vcpu_info();

So the registration of the vcpu_info area happens just after the CPU is
woken from suspension and before it leaves the IPI handler, and it is the
CPU itself that calls VCPUOP_register_vcpu_info (so we can avoid
the gate in Xen that prevents registering the vcpu_info area for CPUs
other than ourselves).
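
[A hedged sketch of that self-registration step, written in Linux terms rather
than FreeBSD's: each CPU builds a vcpu_register_vcpu_info describing its own
per-CPU page and issues the hypercall with its own vcpu id, which is what lets
it pass the "v != current" gate. The per-CPU variable name, the helper name and
the use of virt_to_mfn() are illustrative assumptions (mainline's registration
code uses arbitrary_virt_to_mfn() to cope with non-linear mappings).]

#include <linux/mm.h>
#include <linux/percpu.h>
#include <linux/smp.h>
#include <xen/interface/xen.h>
#include <xen/interface/vcpu.h>
#include <asm/xen/hypercall.h>
#include <asm/xen/page.h>

/* Assumed per-CPU backing store for the vcpu_info structure. */
static DEFINE_PER_CPU(struct vcpu_info, example_vcpu_info);

/* Called by each CPU for itself, e.g. from the resume/IPI path sketched
 * above, before the handler returns. */
static int example_register_own_vcpu_info(void)
{
        int cpu = smp_processor_id();
        struct vcpu_info *vcpup = &per_cpu(example_vcpu_info, cpu);
        struct vcpu_register_vcpu_info info;

        /* Tell Xen which frame and offset hold our vcpu_info. */
        info.mfn = virt_to_mfn(vcpup);
        info.offset = offset_in_page(vcpup);

        /* vcpuid == ourselves, so the hypervisor's gate is satisfied. */
        return HYPERVISOR_vcpu_op(VCPUOP_register_vcpu_info, cpu, &info);
}
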

Roger.



Re: [Xen-devel] [XEN PATCH 1/2] hvm: Support more than 32 VCPUS when migrating.

2014-04-09 Thread Ian Campbell
On Tue, 2014-04-08 at 14:53 -0400, Konrad Rzeszutek Wilk wrote:
> On Tue, Apr 08, 2014 at 08:18:48PM +0200, Roger Pau Monné wrote:
> > On 08/04/14 19:25, kon...@kernel.org wrote:
> > > From: Konrad Rzeszutek Wilk 
> > > 
> > > When we migrate an HVM guest, by default our shared_info can
> > > only hold up to 32 CPUs. As such, the hypercall
> > > VCPUOP_register_vcpu_info was introduced, which allowed us to
> > > set up per-page areas for VCPUs. This means we can boot PVHVM
> > > guests with more than 32 VCPUs. During migration the per-cpu
> > > structure is allocated fresh by the hypervisor (vcpu_info_mfn
> > > is set to INVALID_MFN) so that the newly migrated guest
> > > can make the VCPUOP_register_vcpu_info hypercall.
> > >
> > > Unfortunately we end up triggering this condition:
> > > /* Run this command on yourself or on other offline VCPUS. */
> > >  if ( (v != current) && !test_bit(_VPF_down, &v->pause_flags) )
> > >
> > > which means we are unable to set up the per-cpu VCPU structures
> > > for running vCPUs. The Linux PV code paths make this work by
> > > iterating over every vCPU with:
> > >
> > >  1) is the target vCPU up? (VCPUOP_is_up hypercall)
> > >  2) if yes, then VCPUOP_down to pause it.
> > >  3) VCPUOP_register_vcpu_info
> > >  4) if we took it down in step 2, then VCPUOP_up to bring it back up
> > >
> > > But since VCPUOP_down, VCPUOP_is_up, and VCPUOP_up are
> > > not allowed on HVM guests we can't do this. This patch
> > > enables these sub-ops for HVM guests.
> > 
> > Hmmm, this looks like a very convoluted approach to something that could
> > be solved more easily IMHO. What we do on FreeBSD is put all vCPUs into
> > suspension, which means that all vCPUs except vCPU#0 will be in the
> > cpususpend_handler, see:
> > 
> > http://svnweb.freebsd.org/base/head/sys/amd64/amd64/mp_machdep.c?revision=263878&view=markup#l1460
> 
> How do you 'suspend' them? If I remember there is a disadvantage of doing
> this as you have to bring all the CPUs "offline". That in Linux means using
> the stop_machine, which is a pretty big hammer and increases the latency for
> migration.

Yes, this is why the ability to have the toolstack save/restore the
secondary vcpu state was added. It's especially important for
checkpointing, but it's relevant to regular migrate as a performance
improvement too.

It's not just stop-machine, IIRC it's a tonne of udev events relating to
cpus off/onlining etc too and all the userspace activity which that
implies.

Ian.



Re: [Xen-devel] [XEN PATCH 1/2] hvm: Support more than 32 VCPUS when migrating.

2014-04-09 Thread Roger Pau Monné
On 08/04/14 20:53, Konrad Rzeszutek Wilk wrote:
> On Tue, Apr 08, 2014 at 08:18:48PM +0200, Roger Pau Monné wrote:
>> On 08/04/14 19:25, kon...@kernel.org wrote:
>>> From: Konrad Rzeszutek Wilk 
>>>
>>> When we migrate an HVM guest, by default our shared_info can
>>> only hold up to 32 CPUs. As such, the hypercall
>>> VCPUOP_register_vcpu_info was introduced, which allowed us to
>>> set up per-page areas for VCPUs. This means we can boot PVHVM
>>> guests with more than 32 VCPUs. During migration the per-cpu
>>> structure is allocated fresh by the hypervisor (vcpu_info_mfn
>>> is set to INVALID_MFN) so that the newly migrated guest
>>> can make the VCPUOP_register_vcpu_info hypercall.
>>>
>>> Unfortunately we end up triggering this condition:
>>> /* Run this command on yourself or on other offline VCPUS. */
>>>  if ( (v != current) && !test_bit(_VPF_down, &v->pause_flags) )
>>>
>>> which means we are unable to set up the per-cpu VCPU structures
>>> for running vCPUs. The Linux PV code paths make this work by
>>> iterating over every vCPU with:
>>>
>>>  1) is the target vCPU up? (VCPUOP_is_up hypercall)
>>>  2) if yes, then VCPUOP_down to pause it.
>>>  3) VCPUOP_register_vcpu_info
>>>  4) if we took it down in step 2, then VCPUOP_up to bring it back up
>>>
>>> But since VCPUOP_down, VCPUOP_is_up, and VCPUOP_up are
>>> not allowed on HVM guests we can't do this. This patch
>>> enables these sub-ops for HVM guests.
>>
>> Hmmm, this looks like a very convoluted approach to something that could
>> be solved more easily IMHO. What we do on FreeBSD is put all vCPUs into
>> suspension, which means that all vCPUs except vCPU#0 will be in the
>> cpususpend_handler, see:
>>
>> http://svnweb.freebsd.org/base/head/sys/amd64/amd64/mp_machdep.c?revision=263878&view=markup#l1460
> 
> How do you 'suspend' them? If I remember there is a disadvantage of doing
> this as you have to bring all the CPUs "offline". That in Linux means using
> the stop_machine, which is a pretty big hammer and increases the latency for
> migration.

In order to suspend them an IPI_SUSPEND is sent to all vCPUs except vCPU#0:

http://fxr.watson.org/fxr/source/kern/subr_smp.c#L289

Which makes all APs call cpususpend_handler, so we know all APs are
stuck in a while loop with interrupts disabled:

http://fxr.watson.org/fxr/source/amd64/amd64/mp_machdep.c#L1459

Then on resume the APs are taken out of the while loop and the first
thing they do before returning from the IPI handler is registering the
new per-cpu vcpu_info area. But I'm not sure this is something that can
be accomplished easily on Linux.

I've tried to local-migrate a FreeBSD PVHVM guest with 33 vCPUs on my
8-way box, and it seems to be working fine :).

Roger.



Re: [Xen-devel] [XEN PATCH 1/2] hvm: Support more than 32 VCPUS when migrating.

2014-04-08 Thread Konrad Rzeszutek Wilk
On Tue, Apr 08, 2014 at 08:18:48PM +0200, Roger Pau Monné wrote:
> On 08/04/14 19:25, kon...@kernel.org wrote:
> > From: Konrad Rzeszutek Wilk 
> > 
> > When we migrate an HVM guest, by default our shared_info can
> > only hold up to 32 CPUs. As such, the hypercall
> > VCPUOP_register_vcpu_info was introduced, which allowed us to
> > set up per-page areas for VCPUs. This means we can boot PVHVM
> > guests with more than 32 VCPUs. During migration the per-cpu
> > structure is allocated fresh by the hypervisor (vcpu_info_mfn
> > is set to INVALID_MFN) so that the newly migrated guest
> > can make the VCPUOP_register_vcpu_info hypercall.
> >
> > Unfortunately we end up triggering this condition:
> > /* Run this command on yourself or on other offline VCPUS. */
> >  if ( (v != current) && !test_bit(_VPF_down, &v->pause_flags) )
> >
> > which means we are unable to set up the per-cpu VCPU structures
> > for running vCPUs. The Linux PV code paths make this work by
> > iterating over every vCPU with:
> >
> >  1) is the target vCPU up? (VCPUOP_is_up hypercall)
> >  2) if yes, then VCPUOP_down to pause it.
> >  3) VCPUOP_register_vcpu_info
> >  4) if we took it down in step 2, then VCPUOP_up to bring it back up
> >
> > But since VCPUOP_down, VCPUOP_is_up, and VCPUOP_up are
> > not allowed on HVM guests we can't do this. This patch
> > enables these sub-ops for HVM guests.
> 
> Hmmm, this looks like a very convoluted approach to something that could
> be solved more easily IMHO. What we do on FreeBSD is put all vCPUs into
> suspension, which means that all vCPUs except vCPU#0 will be in the
> cpususpend_handler, see:
> 
> http://svnweb.freebsd.org/base/head/sys/amd64/amd64/mp_machdep.c?revision=263878&view=markup#l1460

How do you 'suspend' them? If I remember there is a disadvantage of doing
this as you have to bring all the CPUs "offline". That in Linux means using
the stop_machine, which is a pretty big hammer and increases the latency for
migration.

> 
> Then on resume we unblock the "suspended" CPUs, and the first thing they
> do is call cpu_ops.cpu_resume which is basically going to setup the
> vcpu_info using VCPUOP_register_vcpu_info. Not sure if something similar
> is possible under Linux, but it seems easier and doesn't require any
> Xen-side changes.
> 
> Roger.
> 


Re: [Xen-devel] [XEN PATCH 1/2] hvm: Support more than 32 VCPUS when migrating.

2014-04-08 Thread Roger Pau Monné
On 08/04/14 19:25, kon...@kernel.org wrote:
> From: Konrad Rzeszutek Wilk 
> 
> When we migrate an HVM guest, by default our shared_info can
> only hold up to 32 CPUs. As such, the hypercall
> VCPUOP_register_vcpu_info was introduced, which allowed us to
> set up per-page areas for VCPUs. This means we can boot PVHVM
> guests with more than 32 VCPUs. During migration the per-cpu
> structure is allocated fresh by the hypervisor (vcpu_info_mfn
> is set to INVALID_MFN) so that the newly migrated guest
> can make the VCPUOP_register_vcpu_info hypercall.
>
> Unfortunately we end up triggering this condition:
> /* Run this command on yourself or on other offline VCPUS. */
>  if ( (v != current) && !test_bit(_VPF_down, &v->pause_flags) )
>
> which means we are unable to set up the per-cpu VCPU structures
> for running vCPUs. The Linux PV code paths make this work by
> iterating over every vCPU with:
>
>  1) is the target vCPU up? (VCPUOP_is_up hypercall)
>  2) if yes, then VCPUOP_down to pause it.
>  3) VCPUOP_register_vcpu_info
>  4) if we took it down in step 2, then VCPUOP_up to bring it back up
>
> But since VCPUOP_down, VCPUOP_is_up, and VCPUOP_up are
> not allowed on HVM guests we can't do this. This patch
> enables these sub-ops for HVM guests.

Hmmm, this looks like a very convoluted approach to something that could
be solved more easily IMHO. What we do on FreeBSD is put all vCPUs into
suspension, which means that all vCPUs except vCPU#0 will be in the
cpususpend_handler, see:

http://svnweb.freebsd.org/base/head/sys/amd64/amd64/mp_machdep.c?revision=263878&view=markup#l1460

Then on resume we unblock the "suspended" CPUs, and the first thing they
do is call cpu_ops.cpu_resume which is basically going to setup the
vcpu_info using VCPUOP_register_vcpu_info. Not sure if something similar
is possible under Linux, but it seems easier and doesn't require any
Xen-side changes.

Roger.
