Re: [Xen-devel] [XEN PATCH 1/2] hvm: Support more than 32 VCPUS when migrating.

2014-04-09 Thread Konrad Rzeszutek Wilk
On Wed, Apr 09, 2014 at 04:38:37PM +0100, David Vrabel wrote:
> On 09/04/14 16:34, Konrad Rzeszutek Wilk wrote:
> > On Wed, Apr 09, 2014 at 09:37:01AM +0200, Roger Pau Monné wrote:
> >> On 08/04/14 20:53, Konrad Rzeszutek Wilk wrote:
> >>> On Tue, Apr 08, 2014 at 08:18:48PM +0200, Roger Pau Monné wrote:
>  On 08/04/14 19:25, kon...@kernel.org wrote:
> > From: Konrad Rzeszutek Wilk 
> >
> > When we migrate an HVM guest, by default our shared_info can
> > only hold up to 32 CPUs. As such, the hypercall
> > VCPUOP_register_vcpu_info was introduced, which allowed us to
> > set up per-page areas for VCPUs. This means we can boot PVHVM
> > guests with more than 32 VCPUs. During migration the per-cpu
> > structure is allocated fresh by the hypervisor (vcpu_info_mfn
> > is set to INVALID_MFN) so that the newly migrated guest
> > can make the VCPUOP_register_vcpu_info hypercall.
> >
> > Unfortunately we end up triggering this condition:
> > /* Run this command on yourself or on other offline VCPUS. */
> >  if ( (v != current) && !test_bit(_VPF_down, &v->pause_flags) )
> >
> > which means we are unable to set up the per-cpu VCPU structures
> > for running vCPUs. The Linux PV code paths make this work by
> > iterating over every vCPU with:
> >
> >  1) is the target vCPU up? (VCPUOP_is_up hypercall)
> >  2) if yes, then VCPUOP_down to pause it.
> >  3) VCPUOP_register_vcpu_info
> >  4) if we took it down in step 2, then VCPUOP_up to bring it back up
> >
> > But since VCPUOP_down, VCPUOP_is_up, and VCPUOP_up are
> > not allowed on HVM guests we can't do this. This patch
> > enables these sub-ops for HVM guests.
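
[For reference, a minimal sketch of the PV-style sequence quoted above.
HYPERVISOR_vcpu_op() and the VCPUOP_* sub-ops are the real Linux/Xen
interfaces; the helper name and the error handling are illustrative
assumptions, not code from this patch.]

#include <linux/errno.h>
#include <linux/types.h>
#include <xen/interface/vcpu.h>
#include <asm/xen/hypercall.h>

/* Sketch: re-register the vcpu_info area for one (possibly running) vCPU
 * the way the PV path does it. */
static int example_reregister_vcpu_info(int cpu,
                                        struct vcpu_register_vcpu_info *info)
{
        bool was_up;
        int rc;

        /* 1) Is the target vCPU up?  (VCPUOP_is_up returns 1 if it is.) */
        was_up = HYPERVISOR_vcpu_op(VCPUOP_is_up, cpu, NULL) > 0;

        /* 2) If it is up, pause it so the hypervisor will accept the call. */
        if (was_up && HYPERVISOR_vcpu_op(VCPUOP_down, cpu, NULL))
                return -EIO;

        /* 3) Register the per-vCPU vcpu_info page with the hypervisor. */
        rc = HYPERVISOR_vcpu_op(VCPUOP_register_vcpu_info, cpu, info);

        /* 4) If we paused it in step 2, bring it back up. */
        if (was_up && HYPERVISOR_vcpu_op(VCPUOP_up, cpu, NULL))
                return -EIO;

        return rc;
}
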
> 
>  Hmmm, this looks like a very convoluted approach to something that could
>  be solved more easily IMHO. What we do on FreeBSD is put all vCPUs into
>  suspension, which means that all vCPUs except vCPU#0 will be in the
>  cpususpend_handler, see:
> 
 http://svnweb.freebsd.org/base/head/sys/amd64/amd64/mp_machdep.c?revision=263878&view=markup#l1460
> >>>
> >>> How do you 'suspend' them? If I remember there is a disadvantage of doing
> >>> this as you have to bring all the CPUs "offline". That in Linux means
> >>> using the stop_machine, which is a pretty big hammer and increases the
> >>> latency for migration.
> >>
> >> In order to suspend them an IPI_SUSPEND is sent to all vCPUs except vCPU#0:
> >>
> >> http://fxr.watson.org/fxr/source/kern/subr_smp.c#L289
> >>
> >> Which makes all APs call cpususpend_handler, so we know all APs are
> >> stuck in a while loop with interrupts disabled:
> >>
> >> http://fxr.watson.org/fxr/source/amd64/amd64/mp_machdep.c#L1459
> >>
> >> Then on resume the APs are taken out of the while loop and the first
> >> thing they do before returning from the IPI handler is registering the
> >> new per-cpu vcpu_info area. But I'm not sure this is something that can
> >> be accomplished easily on Linux.
> > 
> > That is a bit of what the 'stop_machine' would do. It puts all of the
> > CPUs in whatever function you want. But I am not sure of the latency
> > impact - as in what if the migration takes longer and all of the CPUs
> > sit there spinning.
> > Another variant of that is the 'smp_call_function'.
> 
> I tested stop_machine() on all CPUs during suspend once and it was
> awful:  100s of ms of additional downtime.

Yikes.
> 
> Perhaps a hand-rolled IPI-and-park-in-handler would be quicker than the full
> stop_machine().

But that is clearly a bigger patch than this little bug-fix.

Do you want to just take this patch as is and then later on I can work on
prototyping the 'IPI-and-park-in-handler'?


Re: [Xen-devel] [XEN PATCH 1/2] hvm: Support more than 32 VCPUS when migrating.

2014-04-09 Thread David Vrabel
On 09/04/14 16:34, Konrad Rzeszutek Wilk wrote:
> On Wed, Apr 09, 2014 at 09:37:01AM +0200, Roger Pau Monné wrote:
>> On 08/04/14 20:53, Konrad Rzeszutek Wilk wrote:
>>> On Tue, Apr 08, 2014 at 08:18:48PM +0200, Roger Pau Monné wrote:
 On 08/04/14 19:25, kon...@kernel.org wrote:
> From: Konrad Rzeszutek Wilk 
>
> When we migrate an HVM guest, by default our shared_info can
> only hold up to 32 CPUs. As such, the hypercall
> VCPUOP_register_vcpu_info was introduced, which allowed us to
> set up per-page areas for VCPUs. This means we can boot PVHVM
> guests with more than 32 VCPUs. During migration the per-cpu
> structure is allocated fresh by the hypervisor (vcpu_info_mfn
> is set to INVALID_MFN) so that the newly migrated guest
> can make the VCPUOP_register_vcpu_info hypercall.
>
> Unfortunately we end up triggering this condition:
> /* Run this command on yourself or on other offline VCPUS. */
>  if ( (v != current) && !test_bit(_VPF_down, &v->pause_flags) )
>
> which means we are unable to set up the per-cpu VCPU structures
> for running vCPUs. The Linux PV code paths make this work by
> iterating over every vCPU with:
>
>  1) is the target vCPU up? (VCPUOP_is_up hypercall)
>  2) if yes, then VCPUOP_down to pause it.
>  3) VCPUOP_register_vcpu_info
>  4) if we took it down in step 2, then VCPUOP_up to bring it back up
>
> But since VCPUOP_down, VCPUOP_is_up, and VCPUOP_up are
> not allowed on HVM guests we can't do this. This patch
> enables these sub-ops for HVM guests.

 Hmmm, this looks like a very convoluted approach to something that could
 be solved more easily IMHO. What we do on FreeBSD is put all vCPUs into
 suspension, which means that all vCPUs except vCPU#0 will be in the
 cpususpend_handler, see:

 http://svnweb.freebsd.org/base/head/sys/amd64/amd64/mp_machdep.c?revision=263878&view=markup#l1460
>>>
>>> How do you 'suspend' them? If I remember there is a disadvantage of doing
>>> this as you have to bring all the CPUs "offline". That in Linux means using
> >> the stop_machine, which is a pretty big hammer and increases the latency for
>>> migration.
>>
>> In order to suspend them an IPI_SUSPEND is sent to all vCPUs except vCPU#0:
>>
>> http://fxr.watson.org/fxr/source/kern/subr_smp.c#L289
>>
>> Which makes all APs call cpususpend_handler, so we know all APs are
>> stuck in a while loop with interrupts disabled:
>>
>> http://fxr.watson.org/fxr/source/amd64/amd64/mp_machdep.c#L1459
>>
>> Then on resume the APs are taken out of the while loop and the first
>> thing they do before returning from the IPI handler is registering the
>> new per-cpu vcpu_info area. But I'm not sure this is something that can
>> be accomplished easily on Linux.
> 
> That is a bit of what the 'stop_machine' would do. It puts all of the
> CPUs in whatever function you want. But I am not sure of the latency
> impact - as in what if the migration takes longer and all of the CPUs
> sit there spinning.
> Another variant of that is the 'smp_call_function'.

I tested stop_machine() on all CPUs during suspend once and it was
awful:  100s of ms of additional downtime.

Perhaps a hand-rolled IPI-and-park-in-handler would be quicker than the full
stop_machine().

David


Re: [Xen-devel] [XEN PATCH 1/2] hvm: Support more than 32 VCPUS when migrating.

2014-04-09 Thread Konrad Rzeszutek Wilk
On Wed, Apr 09, 2014 at 09:37:01AM +0200, Roger Pau Monné wrote:
> On 08/04/14 20:53, Konrad Rzeszutek Wilk wrote:
> > On Tue, Apr 08, 2014 at 08:18:48PM +0200, Roger Pau Monné wrote:
> >> On 08/04/14 19:25, kon...@kernel.org wrote:
> >>> From: Konrad Rzeszutek Wilk 
> >>>
> >>> When we migrate an HVM guest, by default our shared_info can
> >>> only hold up to 32 CPUs. As such, the hypercall
> >>> VCPUOP_register_vcpu_info was introduced, which allowed us to
> >>> set up per-page areas for VCPUs. This means we can boot PVHVM
> >>> guests with more than 32 VCPUs. During migration the per-cpu
> >>> structure is allocated fresh by the hypervisor (vcpu_info_mfn
> >>> is set to INVALID_MFN) so that the newly migrated guest
> >>> can make the VCPUOP_register_vcpu_info hypercall.
> >>>
> >>> Unfortunately we end up triggering this condition:
> >>> /* Run this command on yourself or on other offline VCPUS. */
> >>>  if ( (v != current) && !test_bit(_VPF_down, &v->pause_flags) )
> >>>
> >>> which means we are unable to set up the per-cpu VCPU structures
> >>> for running vCPUs. The Linux PV code paths make this work by
> >>> iterating over every vCPU with:
> >>>
> >>>  1) is the target vCPU up? (VCPUOP_is_up hypercall)
> >>>  2) if yes, then VCPUOP_down to pause it.
> >>>  3) VCPUOP_register_vcpu_info
> >>>  4) if we took it down in step 2, then VCPUOP_up to bring it back up
> >>>
> >>> But since VCPUOP_down, VCPUOP_is_up, and VCPUOP_up are
> >>> not allowed on HVM guests we can't do this. This patch
> >>> enables these sub-ops for HVM guests.
> >>
> >> Hmmm, this looks like a very convoluted approach to something that could
> >> be solved more easily IMHO. What we do on FreeBSD is put all vCPUs into
> >> suspension, which means that all vCPUs except vCPU#0 will be in the
> >> cpususpend_handler, see:
> >>
> >> http://svnweb.freebsd.org/base/head/sys/amd64/amd64/mp_machdep.c?revision=263878&view=markup#l1460
> > 
> > How do you 'suspend' them? If I remember there is a disadvantage of doing
> > this as you have to bring all the CPUs "offline". That in Linux means using
> > the stop_machine, which is a pretty big hammer and increases the latency for
> > migration.
> 
> In order to suspend them an IPI_SUSPEND is sent to all vCPUs except vCPU#0:
> 
> http://fxr.watson.org/fxr/source/kern/subr_smp.c#L289
> 
> Which makes all APs call cpususpend_handler, so we know all APs are
> stuck in a while loop with interrupts disabled:
> 
> http://fxr.watson.org/fxr/source/amd64/amd64/mp_machdep.c#L1459
> 
> Then on resume the APs are taken out of the while loop and the first
> thing they do before returning from the IPI handler is registering the
> new per-cpu vcpu_info area. But I'm not sure this is something that can
> be accomplished easily on Linux.

That is a bit of what the 'stop_machine' would do. It puts all of the
CPUs in whatever function you want. But I am not sure of the latency impact - as
in what if the migration takes longer and all of the CPUs sit there spinning.
Another variant of that is the 'smp_call_function'.

Then when we resume - we need a mailbox that is shared (easily enough
I think) to tell us that the migration has been done - and then we need to call
VCPUOP_register_vcpu_info.

But if the migration has taken quite long - I fear that the watchdogs
might kick in and start complaining about the CPUs being stuck. Especially
if we are migrating an overcommitted guest.

With this the latency for them to be 'paused', 'initted', and 'unpaused' is,
I think, much, much smaller.

Ugh, let's hold off on this exercise of using 'smp_call_function' until
sometime at the end of the summer - and see. That functionality
should be shared with the PV code path IMHO.
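
[A rough sketch, in Linux terms, of the "IPI and park in the handler" plus
shared mailbox idea discussed above; this is an assumption of how it might
look, not the actual patch. smp_call_function(), cpu_relax(),
touch_nmi_watchdog() and smp_processor_id() are real kernel APIs; the mailbox
flag, the function names, and the reuse of xen_vcpu_setup() (the existing
vcpu_info registration helper in arch/x86/xen/enlighten.c, which is currently
file-local) are hypothetical.]

#include <linux/smp.h>
#include <linux/atomic.h>
#include <linux/nmi.h>
#include <asm/processor.h>

static atomic_t xen_migrate_mailbox = ATOMIC_INIT(1);

/* Runs on every CPU except the one driving the suspend; interrupts stay
 * disabled while we sit in the IPI handler. */
static void xen_park_in_ipi(void *unused)
{
        /* Park until the boot vCPU clears the shared mailbox once the
         * migration has completed. */
        while (atomic_read(&xen_migrate_mailbox)) {
                cpu_relax();
                touch_nmi_watchdog();   /* address the watchdog worry above */
        }

        /* Each vCPU re-registers its own vcpu_info, so Xen's
         * "yourself or offline" check is satisfied. */
        xen_vcpu_setup(smp_processor_id());
}

static void xen_park_secondary_cpus(void)
{
        atomic_set(&xen_migrate_mailbox, 1);
        /* wait == 0: the caller (vCPU0) must carry on and drive the suspend. */
        smp_call_function(xen_park_in_ipi, NULL, 0);
}
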

> 
> I've tried to local-migrate a FreeBSD PVHVM guest with 33 vCPUs on my
> 8-way box, and it seems to be working fine :).

Awesome!
> 
> Roger.
> 


Re: [Xen-devel] [XEN PATCH 1/2] hvm: Support more than 32 VCPUS when migrating.

2014-04-09 Thread Roger Pau Monné
On 09/04/14 10:33, Ian Campbell wrote:
> On Tue, 2014-04-08 at 14:53 -0400, Konrad Rzeszutek Wilk wrote:
>> On Tue, Apr 08, 2014 at 08:18:48PM +0200, Roger Pau Monné wrote:
>>> On 08/04/14 19:25, kon...@kernel.org wrote:
 From: Konrad Rzeszutek Wilk 

 When we migrate an HVM guest, by default our shared_info can
 only hold up to 32 CPUs. As such, the hypercall
 VCPUOP_register_vcpu_info was introduced, which allowed us to
 set up per-page areas for VCPUs. This means we can boot PVHVM
 guests with more than 32 VCPUs. During migration the per-cpu
 structure is allocated fresh by the hypervisor (vcpu_info_mfn
 is set to INVALID_MFN) so that the newly migrated guest
 can make the VCPUOP_register_vcpu_info hypercall.

 Unfortunately we end up triggering this condition:
 /* Run this command on yourself or on other offline VCPUS. */
  if ( (v != current) && !test_bit(_VPF_down, &v->pause_flags) )

 which means we are unable to set up the per-cpu VCPU structures
 for running vCPUs. The Linux PV code paths make this work by
 iterating over every vCPU with:

  1) is the target vCPU up? (VCPUOP_is_up hypercall)
  2) if yes, then VCPUOP_down to pause it.
  3) VCPUOP_register_vcpu_info
  4) if we took it down in step 2, then VCPUOP_up to bring it back up

 But since VCPUOP_down, VCPUOP_is_up, and VCPUOP_up are
 not allowed on HVM guests we can't do this. This patch
 enables these sub-ops for HVM guests.
>>>
>>> Hmmm, this looks like a very convoluted approach to something that could
>>> be solved more easily IMHO. What we do on FreeBSD is put all vCPUs into
>>> suspension, which means that all vCPUs except vCPU#0 will be in the
>>> cpususpend_handler, see:
>>>
>>> http://svnweb.freebsd.org/base/head/sys/amd64/amd64/mp_machdep.c?revision=263878&view=markup#l1460
>>
>> How do you 'suspend' them? If I remember there is a disadvantage of doing
>> this as you have to bring all the CPUs "offline". That in Linux means using
>> the stop_machine, which is a pretty big hammer and increases the latency for
>> migration.
> 
> Yes, this is why the ability to have the toolstack save/restore the
> secondary vcpu state was added. It's especially important for
> checkpointing, but it's relevant to regular migrate as a performance
> improvement too.
> 
> It's not just stop-machine, IIRC it's a tonne of udev events relating to
> cpus off/onlining etc too and all the userspace activity which that
> implies.

Well, what is done on FreeBSD is nothing like that; it's called the
cpususpend handler, but it's not off-lining CPUs or anything like that,
it just places the CPU in a while loop inside of an IPI handler, so we
can do something like this with all APs:

while (suspended)
 pause();

register_vcpu_info();

So the registration of the vcpu_info area happens just after the CPU is
woken from suspension and before it leaves the IPI handler, and it is the
CPU itself that calls VCPUOP_register_vcpu_info (so we can avoid
the gate in Xen that prevents registering the vcpu_info area for CPUs
other than ourselves).
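
[A hedged sketch of that self-registration step, written in Linux terms rather
than FreeBSD's: each CPU builds a vcpu_register_vcpu_info describing its own
per-CPU page and issues the hypercall with its own vcpu id, which is what lets
it pass the "v != current" gate. The per-CPU variable name, the helper name and
the use of virt_to_mfn() are illustrative assumptions (mainline's registration
code uses arbitrary_virt_to_mfn() to cope with non-linear mappings).]

#include <linux/mm.h>
#include <linux/percpu.h>
#include <linux/smp.h>
#include <xen/interface/xen.h>
#include <xen/interface/vcpu.h>
#include <asm/xen/hypercall.h>
#include <asm/xen/page.h>

/* Assumed per-CPU backing store for the vcpu_info structure. */
static DEFINE_PER_CPU(struct vcpu_info, example_vcpu_info);

/* Called by each CPU for itself, e.g. from the resume/IPI path sketched
 * above, before the handler returns. */
static int example_register_own_vcpu_info(void)
{
        int cpu = smp_processor_id();
        struct vcpu_info *vcpup = &per_cpu(example_vcpu_info, cpu);
        struct vcpu_register_vcpu_info info;

        /* Tell Xen which frame and offset hold our vcpu_info. */
        info.mfn = virt_to_mfn(vcpup);
        info.offset = offset_in_page(vcpup);

        /* vcpuid == ourselves, so the hypervisor's gate is satisfied. */
        return HYPERVISOR_vcpu_op(VCPUOP_register_vcpu_info, cpu, &info);
}
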

Roger.



Re: [Xen-devel] [XEN PATCH 1/2] hvm: Support more than 32 VCPUS when migrating.

2014-04-09 Thread Ian Campbell
On Tue, 2014-04-08 at 14:53 -0400, Konrad Rzeszutek Wilk wrote:
> On Tue, Apr 08, 2014 at 08:18:48PM +0200, Roger Pau Monné wrote:
> > On 08/04/14 19:25, kon...@kernel.org wrote:
> > > From: Konrad Rzeszutek Wilk 
> > > 
> > > When we migrate an HVM guest, by default our shared_info can
> > > only hold up to 32 CPUs. As such, the hypercall
> > > VCPUOP_register_vcpu_info was introduced, which allowed us to
> > > set up per-page areas for VCPUs. This means we can boot PVHVM
> > > guests with more than 32 VCPUs. During migration the per-cpu
> > > structure is allocated fresh by the hypervisor (vcpu_info_mfn
> > > is set to INVALID_MFN) so that the newly migrated guest
> > > can make the VCPUOP_register_vcpu_info hypercall.
> > >
> > > Unfortunately we end up triggering this condition:
> > > /* Run this command on yourself or on other offline VCPUS. */
> > >  if ( (v != current) && !test_bit(_VPF_down, &v->pause_flags) )
> > >
> > > which means we are unable to set up the per-cpu VCPU structures
> > > for running vCPUs. The Linux PV code paths make this work by
> > > iterating over every vCPU with:
> > >
> > >  1) is the target vCPU up? (VCPUOP_is_up hypercall)
> > >  2) if yes, then VCPUOP_down to pause it.
> > >  3) VCPUOP_register_vcpu_info
> > >  4) if we took it down in step 2, then VCPUOP_up to bring it back up
> > >
> > > But since VCPUOP_down, VCPUOP_is_up, and VCPUOP_up are
> > > not allowed on HVM guests we can't do this. This patch
> > > enables these sub-ops for HVM guests.
> > 
> > Hmmm, this looks like a very convoluted approach to something that could
> > be solved more easily IMHO. What we do on FreeBSD is put all vCPUs into
> > suspension, which means that all vCPUs except vCPU#0 will be in the
> > cpususpend_handler, see:
> > 
> > http://svnweb.freebsd.org/base/head/sys/amd64/amd64/mp_machdep.c?revision=263878&view=markup#l1460
> 
> How do you 'suspend' them? If I remember there is a disadvantage of doing
> this as you have to bring all the CPUs "offline". That in Linux means using
> the stop_machine, which is a pretty big hammer and increases the latency for
> migration.

Yes, this is why the ability to have the toolstack save/restore the
secondary vcpu state was added. It's especially important for
checkpointing, but it's relevant to regular migrate as a performance
improvement too.

It's not just stop-machine, IIRC it's a tonne of udev events relating to
cpus off/onlining etc too and all the userspace activity which that
implies.

Ian.



Re: [Xen-devel] [XEN PATCH 1/2] hvm: Support more than 32 VCPUS when migrating.

2014-04-09 Thread Roger Pau Monné
On 08/04/14 20:53, Konrad Rzeszutek Wilk wrote:
> On Tue, Apr 08, 2014 at 08:18:48PM +0200, Roger Pau Monné wrote:
>> On 08/04/14 19:25, kon...@kernel.org wrote:
>>> From: Konrad Rzeszutek Wilk 
>>>
>>> When we migrate an HVM guest, by default our shared_info can
>>> only hold up to 32 CPUs. As such, the hypercall
>>> VCPUOP_register_vcpu_info was introduced, which allowed us to
>>> set up per-page areas for VCPUs. This means we can boot PVHVM
>>> guests with more than 32 VCPUs. During migration the per-cpu
>>> structure is allocated fresh by the hypervisor (vcpu_info_mfn
>>> is set to INVALID_MFN) so that the newly migrated guest
>>> can make the VCPUOP_register_vcpu_info hypercall.
>>>
>>> Unfortunately we end up triggering this condition:
>>> /* Run this command on yourself or on other offline VCPUS. */
>>>  if ( (v != current) && !test_bit(_VPF_down, &v->pause_flags) )
>>>
>>> which means we are unable to set up the per-cpu VCPU structures
>>> for running vCPUs. The Linux PV code paths make this work by
>>> iterating over every vCPU with:
>>>
>>>  1) is the target vCPU up? (VCPUOP_is_up hypercall)
>>>  2) if yes, then VCPUOP_down to pause it.
>>>  3) VCPUOP_register_vcpu_info
>>>  4) if we took it down in step 2, then VCPUOP_up to bring it back up
>>>
>>> But since VCPUOP_down, VCPUOP_is_up, and VCPUOP_up are
>>> not allowed on HVM guests we can't do this. This patch
>>> enables these sub-ops for HVM guests.
>>
>> Hmmm, this looks like a very convoluted approach to something that could
>> be solved more easily IMHO. What we do on FreeBSD is put all vCPUs into
>> suspension, which means that all vCPUs except vCPU#0 will be in the
>> cpususpend_handler, see:
>>
>> http://svnweb.freebsd.org/base/head/sys/amd64/amd64/mp_machdep.c?revision=263878&view=markup#l1460
> 
> How do you 'suspend' them? If I remember there is a disadvantage of doing
> this as you have to bring all the CPUs "offline". That in Linux means using
> the stop_machine, which is a pretty big hammer and increases the latency for
> migration.

In order to suspend them an IPI_SUSPEND is sent to all vCPUs except vCPU#0:

http://fxr.watson.org/fxr/source/kern/subr_smp.c#L289

Which makes all APs call cpususpend_handler, so we know all APs are
stuck in a while loop with interrupts disabled:

http://fxr.watson.org/fxr/source/amd64/amd64/mp_machdep.c#L1459

Then on resume the APs are taken out of the while loop and the first
thing they do before returning from the IPI handler is registering the
new per-cpu vcpu_info area. But I'm not sure this is something that can
be accomplished easily on Linux.

I've tried to local-migrate a FreeBSD PVHVM guest with 33 vCPUs on my
8-way box, and it seems to be working fine :).

Roger.



Re: [Xen-devel] [XEN PATCH 1/2] hvm: Support more than 32 VCPUS when migrating.

2014-04-08 Thread Konrad Rzeszutek Wilk
On Tue, Apr 08, 2014 at 08:18:48PM +0200, Roger Pau Monné wrote:
> On 08/04/14 19:25, kon...@kernel.org wrote:
> > From: Konrad Rzeszutek Wilk 
> > 
> > When we migrate an HVM guest, by default our shared_info can
> > only hold up to 32 CPUs. As such, the hypercall
> > VCPUOP_register_vcpu_info was introduced, which allowed us to
> > set up per-page areas for VCPUs. This means we can boot PVHVM
> > guests with more than 32 VCPUs. During migration the per-cpu
> > structure is allocated fresh by the hypervisor (vcpu_info_mfn
> > is set to INVALID_MFN) so that the newly migrated guest
> > can make the VCPUOP_register_vcpu_info hypercall.
> >
> > Unfortunately we end up triggering this condition:
> > /* Run this command on yourself or on other offline VCPUS. */
> >  if ( (v != current) && !test_bit(_VPF_down, &v->pause_flags) )
> >
> > which means we are unable to set up the per-cpu VCPU structures
> > for running vCPUs. The Linux PV code paths make this work by
> > iterating over every vCPU with:
> >
> >  1) is the target vCPU up? (VCPUOP_is_up hypercall)
> >  2) if yes, then VCPUOP_down to pause it.
> >  3) VCPUOP_register_vcpu_info
> >  4) if we took it down in step 2, then VCPUOP_up to bring it back up
> >
> > But since VCPUOP_down, VCPUOP_is_up, and VCPUOP_up are
> > not allowed on HVM guests we can't do this. This patch
> > enables these sub-ops for HVM guests.
> 
> Hmmm, this looks like a very convoluted approach to something that could
> be solved more easily IMHO. What we do on FreeBSD is put all vCPUs into
> suspension, which means that all vCPUs except vCPU#0 will be in the
> cpususpend_handler, see:
> 
> http://svnweb.freebsd.org/base/head/sys/amd64/amd64/mp_machdep.c?revision=263878&view=markup#l1460

How do you 'suspend' them? If I remember there is a disadvantage of doing
this as you have to bring all the CPUs "offline". That in Linux means using
the stop_machine, which is a pretty big hammer and increases the latency for
migration.

> 
> Then on resume we unblock the "suspended" CPUs, and the first thing they
> do is call cpu_ops.cpu_resume which is basically going to setup the
> vcpu_info using VCPUOP_register_vcpu_info. Not sure if something similar
> is possible under Linux, but it seems easier and doesn't require any
> Xen-side changes.
> 
> Roger.
> 


Re: [Xen-devel] [XEN PATCH 1/2] hvm: Support more than 32 VCPUS when migrating.

2014-04-08 Thread Roger Pau Monné
On 08/04/14 19:25, kon...@kernel.org wrote:
> From: Konrad Rzeszutek Wilk 
> 
> When we migrate an HVM guest, by default our shared_info can
> only hold up to 32 CPUs. As such, the hypercall
> VCPUOP_register_vcpu_info was introduced, which allowed us to
> set up per-page areas for VCPUs. This means we can boot PVHVM
> guests with more than 32 VCPUs. During migration the per-cpu
> structure is allocated fresh by the hypervisor (vcpu_info_mfn
> is set to INVALID_MFN) so that the newly migrated guest
> can make the VCPUOP_register_vcpu_info hypercall.
>
> Unfortunately we end up triggering this condition:
> /* Run this command on yourself or on other offline VCPUS. */
>  if ( (v != current) && !test_bit(_VPF_down, &v->pause_flags) )
>
> which means we are unable to set up the per-cpu VCPU structures
> for running vCPUs. The Linux PV code paths make this work by
> iterating over every vCPU with:
>
>  1) is the target vCPU up? (VCPUOP_is_up hypercall)
>  2) if yes, then VCPUOP_down to pause it.
>  3) VCPUOP_register_vcpu_info
>  4) if we took it down in step 2, then VCPUOP_up to bring it back up
>
> But since VCPUOP_down, VCPUOP_is_up, and VCPUOP_up are
> not allowed on HVM guests we can't do this. This patch
> enables these sub-ops for HVM guests.

Hmmm, this looks like a very convoluted approach to something that could
be solved more easily IMHO. What we do on FreeBSD is put all vCPUs into
suspension, which means that all vCPUs except vCPU#0 will be in the
cpususpend_handler, see:

http://svnweb.freebsd.org/base/head/sys/amd64/amd64/mp_machdep.c?revision=263878&view=markup#l1460

Then on resume we unblock the "suspended" CPUs, and the first thing they
do is call cpu_ops.cpu_resume which is basically going to setup the
vcpu_info using VCPUOP_register_vcpu_info. Not sure if something similar
is possible under Linux, but it seems easier and doesn't require any
Xen-side changes.

Roger.
