Re: KVM Nested L2 guest startup problems

2014-05-04 Thread Abel Gordon
On Fri, May 2, 2014 at 11:11 PM, Hu Yaohui  wrote:
>
> On Fri, May 2, 2014 at 2:39 PM, Bandan Das  wrote:
> > Hu Yaohui  writes:
> >
> >> On Fri, May 2, 2014 at 11:52 AM, Paolo Bonzini  wrote:
> >>> On 02/05/2014 17:17, Hu Yaohui wrote:
> >>>
>  Hi Paolo,
>  I have tried L0 with linux kernel 3.14.2 and L1 with linux kernel 3.14.2
>  L1 QEMU qemu-1.7.0
>  L2 QEMU qemu-1.7.0.
> >>>
> >>>
> >>> Do you mean L0 and L1?
> >> Yes.
> >>>
> >>> What is your QEMU command line, and what is the processor?  Also, what
> >>> guest are you running?
> >>>
> >> L0 host
> >> - Debian 7 with linux kernel 3.14.2
> >> - 24 pCPU, 120G pMEM
> >> - CPU model: Intel(R) Xeon(R) CPU E5-2620 v2 @ 2.10GHz
> >
> > Ivy Bridge-EP? Looks similar to
> > https://bugzilla.kernel.org/show_bug.cgi?id=73331
> >
> > Just out of curiosity, any difference if you run with ept=0?
> I have tried it: same error with kvm ept=1 in L0 and kvm ept=0 in L1.
> Do you have any idea how the Ivy Bridge-EP problem was solved?

I experienced a similar problem caused by bugs in the nested code
related to apicv and other new VMX features.

For example, the code enabled posted interrupts to run L2 even when the
feature was not exposed to L1 and L1 didn't use it.

Try changing prepare_vmcs02 to force-disable posted interrupts; the
code should look like:



exec_control = vmcs12->pin_based_vm_exec_control;
exec_control |= vmcs_config.pin_based_exec_ctrl;
/* never run L2 with the preemption timer or posted interrupts enabled */
exec_control &= ~(PIN_BASED_VMX_PREEMPTION_TIMER | PIN_BASED_POSTED_INTR);
vmcs_write32(PIN_BASED_VM_EXEC_CONTROL, exec_control);

...

and also

...
...
exec_control &= ~(SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES |
                  SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY |
                  SECONDARY_EXEC_APIC_REGISTER_VIRT |
                  SECONDARY_EXEC_PAUSE_LOOP_EXITING);
...
...
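To make the effect of the two masks concrete, here is a small userspace
model of how the vmcs02 controls end up being computed. It is a sketch,
not the kvm code: merge_pin_controls() and merge_secondary_controls() are
hypothetical names, the bit values are the ones the Intel SDM assigns to
these controls, and the secondary merge is simplified to "L0's settings
minus the masked bits, plus whatever L1 put in vmcs12".

```c
#include <stdint.h>

/* Bit values as assigned by the Intel SDM (same names as asm/vmx.h). */
#define PIN_BASED_EXT_INTR_MASK                 0x00000001u
#define PIN_BASED_NMI_EXITING                   0x00000008u
#define PIN_BASED_VMX_PREEMPTION_TIMER          0x00000040u
#define PIN_BASED_POSTED_INTR                   0x00000080u

#define SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES 0x00000001u
#define SECONDARY_EXEC_ENABLE_EPT               0x00000002u
#define SECONDARY_EXEC_APIC_REGISTER_VIRT       0x00000100u
#define SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY    0x00000200u
#define SECONDARY_EXEC_PAUSE_LOOP_EXITING       0x00000400u

/* Hypothetical helper: pin-based controls for vmcs02. Start from what L1
 * requested (vmcs12), add what L0 always needs, then clear the preemption
 * timer and posted interrupts so L2 never runs with them. */
static uint32_t merge_pin_controls(uint32_t vmcs12_pin, uint32_t l0_pin)
{
    uint32_t ctl = vmcs12_pin | l0_pin;
    ctl &= ~(PIN_BASED_VMX_PREEMPTION_TIMER | PIN_BASED_POSTED_INTR);
    return ctl;
}

/* Hypothetical helper: secondary controls for vmcs02 (simplified). L0's
 * own settings lose the APICv-related bits and PLE; whatever L1 asked for
 * in vmcs12 is then ORed back in. */
static uint32_t merge_secondary_controls(uint32_t l0_secondary,
                                         uint32_t vmcs12_secondary)
{
    uint32_t ctl = l0_secondary;
    ctl &= ~(SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES |
             SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY |
             SECONDARY_EXEC_APIC_REGISTER_VIRT |
             SECONDARY_EXEC_PAUSE_LOOP_EXITING);
    return ctl | vmcs12_secondary;
}
```

In this model, an L0 that uses posted interrupts for itself no longer
leaks PIN_BASED_POSTED_INTR into vmcs02, and an APIC-access page is only
virtualized for L2 if L1 requested it in vmcs12.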


We also experienced issues using apicv for L1 while running an L2 guest
without apicv, so also load kvm_intel with enable_apicv=0.
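The option is given when the module is loaded (for example
`modprobe kvm_intel enable_apicv=0` after unloading it), and kvm_intel
exposes the bool parameter read-only in sysfs. A minimal sketch for
confirming the setting took effect, assuming a Linux host where the file
/sys/module/kvm_intel/parameters/enable_apicv exists; the helper names are
hypothetical.

```c
#include <stdio.h>

/* Parse a bool module parameter as printed by sysfs ("Y"/"N"); also accept
 * "1"/"0". Returns 1, 0, or -1 if the text is not recognized. */
static int parse_bool_param(const char *s)
{
    if (s == NULL)
        return -1;
    switch (s[0]) {
    case 'Y': case 'y': case '1':
        return 1;
    case 'N': case 'n': case '0':
        return 0;
    default:
        return -1;
    }
}

/* Returns 1 if apicv is enabled, 0 if disabled, or -1 if kvm_intel is not
 * loaded, the parameter does not exist on this kernel, or it can't be read. */
static int kvm_intel_apicv_enabled(void)
{
    char buf[8] = {0};
    FILE *f = fopen("/sys/module/kvm_intel/parameters/enable_apicv", "r");

    if (f == NULL)
        return -1;
    if (fgets(buf, sizeof buf, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return parse_bool_param(buf);
}
```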


Hope this solves your problem...
You are welcome to upstream the changes if it does :)

> >
> >> - QEMU command line of L1
> >> $ sudo qemu-system-x86_64 -machine accel=kvm -drive
> >> file=vdisk.img,if=virtio -m 4096 -smp 10 -net
> >> nic,model=virtio,macaddr=52:54:00:12:34:80 -cpu kvm64,+vmx -net
> >> tap,ifname=qtap0,script=no,downscript=no -vnc :2
> >>
> >> L1 guest
> >> - Ubuntu 10.04 with linux kernel 3.14.2
> >> - QEMU command line of L2
> >> $ qemu-system-x86_64 -machine accel=kvm -smp 2 -boot c -drive
> >> file=/home/nested/vmdisks/vdisk1-virtnet.img,if=virtio -m 2048 -vnc :4
> >> -net nic,model=virtio,macaddr=52:54:00:12:34:90 -net
> >> tap,ifname=qtap0,script=no,downscript=no
> >>
> >> L2 guest
> >> - Ubuntu 10.04 with linux kernel 2.6.32
> >>> Paolo
> >>>
> >>>
>  I still get the same error when running qemu in L1 guest.
> >>>
> >>>
> >> --
> >> To unsubscribe from this list: send the line "unsubscribe kvm" in
> >> the body of a message to majord...@vger.kernel.org
> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: KVM Nested L2 guest startup problems

2014-05-07 Thread Abel Gordon
On Wed, May 7, 2014 at 11:58 AM, Paolo Bonzini  wrote:
> On 04/05/2014 18:33, Hu Yaohui wrote:
>
>>> I experienced a similar problem that was related to nested code
>>> having some bugs related to apicv and other new vmx features.
>>>
>>> For example, the code enabled posted interrupts to run L2 even when the
>>> feature was not exposed to L1 and L1 didn't use it.
>>>
>>> Try changing prepare_vmcs02 to force-disable posted interrupts; the
>>> code should look like:
>>>
>>> 
>>> 
>>> exec_control = vmcs12->pin_based_vm_exec_control;
>>> exec_control |= vmcs_config.pin_based_exec_ctrl;
>>> exec_control &= ~(PIN_BASED_VMX_PREEMPTION_TIMER|PIN_BASED_POSTED_INTR);
>>> vmcs_write32(PIN_BASED_VM_EXEC_CONTROL, exec_control);
>>> 
>>> ...
>>>
>>> and also
>>>
>>> ...
>>> ...
>>> exec_control &= ~(SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES |
>>>  SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY |
>>>  SECONDARY_EXEC_APIC_REGISTER_VIRT |
>>>  SECONDARY_EXEC_PAUSE_LOOP_EXITING);
>
>
> PLE should be left enabled, I think.

Well... the PLE settings L0 uses to run L1 (vmcs01) may be different
than the PLE settings L1 configured to run L2 (vmcs12).
For example, L0 can use a ple_gap to run L1 that is bigger than the
ple_gap L1 configured to run L2. Or L0 can use a ple_window to run L1
that is smaller than the ple_window L1 configured to run L2.

So it seems PLE should either never be exposed to L1, or appropriate
nested handling needs to be implemented. Note that the handling may become
complex because in some cases a PLE exit from L2 should be handled
directly by L0 and not passed to L1... remember nested preemption timer
support :)?
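The ple_window half of this can be illustrated with a toy model. It is a
minimal sketch under loud assumptions: vmcs02 holds a single PLE window,
so the hypothetical vmcs02_ple_window() programs the smaller of L0's and
L1's windows, and the hypothetical route_ple_exit() reflects an exit to L1
only when L1's window is the one that fired. ple_gap is ignored, and so is
the harder case where L0's smaller window fires first and L0 would have to
keep track of how long L2 has been spinning before L1's window expires.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical PLE configuration of one VMCS (window in TSC cycles). */
struct ple_cfg {
    bool enabled;
    uint32_t window;
};

enum ple_owner { PLE_HANDLED_BY_L0, PLE_REFLECT_TO_L1 };

/* Only one window fits in vmcs02: use the smaller of the levels that want
 * PLE exits. 0 means PLE stays disabled for L2. */
static uint32_t vmcs02_ple_window(struct ple_cfg l0, struct ple_cfg l1)
{
    if (l0.enabled && l1.enabled)
        return l0.window < l1.window ? l0.window : l1.window;
    if (l0.enabled)
        return l0.window;
    if (l1.enabled)
        return l1.window;
    return 0;
}

/* Reflect the exit to L1 only if L1's own window has elapsed; otherwise the
 * exit exists only because of L0's settings and L0 must handle it itself. */
static enum ple_owner route_ple_exit(struct ple_cfg l0, struct ple_cfg l1)
{
    uint32_t programmed = vmcs02_ple_window(l0, l1);

    if (l1.enabled && l1.window <= programmed)
        return PLE_REFLECT_TO_L1;
    return PLE_HANDLED_BY_L0;
}
```

With L0 using a 4096-cycle window and L1 a 16384-cycle one, every exit
lands in L0 first, which is exactly the case that needs extra bookkeeping.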


>
> Apart from that, I'll turn the suggestion into a patch.

Great!

>
> Thanks!
>
> Paolo
>


Re: KVM Nested L2 guest startup problems

2014-05-07 Thread Abel Gordon
On Wed, May 7, 2014 at 2:40 PM, Paolo Bonzini  wrote:
> On 07/05/2014 13:37, Paolo Bonzini wrote:
>
>> On 07/05/2014 13:16, Abel Gordon wrote:
>>>>
>>>> > PLE should be left enabled, I think.
>>>
>>> Well... the PLE settings L0 uses to run  L1 (vmcs01) may be different
>>> than the PLE settings L1 configured to run  L2 (vmcs12).
>>> For example, L0 can use a  ple_gap to run L1 that is bigger than the
>>> ple_gap L1 configured to run L2. Or  L0 can use a ple_window to run L1
>>> that is smaller than the ple_window L1 configured to run L2.
>>
>>
>> That's correct.  We should leave PLE enabled while running L2, but hide
>> the feature altogether from L1.
>
>
> ... which we already do.  The only secondary execution controls we allow are
> APIC page, unrestricted guest, WBINVD exits, and of course EPT.

But we don't verify whether L1 tries to enable the feature for L2 (even
though it's not exposed)... Or do we?
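One way to express such a check, as a sketch only: treat the controls L0
exposes as the allowed-1 mask a VMX capability MSR would report, and
reject any vmcs12 value that sets a bit outside it (on real hardware such
a VM entry fails with an invalid-control-field error). The mask assumes
"APIC page" means SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES; the helper
names are hypothetical, and this does not claim to describe what kvm does.

```c
#include <stdbool.h>
#include <stdint.h>

#define SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES 0x00000001u
#define SECONDARY_EXEC_ENABLE_EPT               0x00000002u
#define SECONDARY_EXEC_WBINVD_EXITING           0x00000040u
#define SECONDARY_EXEC_UNRESTRICTED_GUEST       0x00000080u
#define SECONDARY_EXEC_PAUSE_LOOP_EXITING       0x00000400u

/* Secondary controls advertised to L1: no must-be-1 bits, and only the
 * features listed above may be set. */
static const uint32_t nested_secondary_low = 0;
static const uint32_t nested_secondary_high =
    SECONDARY_EXEC_VIRTUALIZE_APIC_ACCESSES | SECONDARY_EXEC_ENABLE_EPT |
    SECONDARY_EXEC_WBINVD_EXITING | SECONDARY_EXEC_UNRESTRICTED_GUEST;

/* VMX capability-MSR rule: every must-be-1 bit (low) is set and no bit
 * outside the allowed-1 mask (high) is set. */
static bool control_verify(uint32_t ctl, uint32_t low, uint32_t high)
{
    return (ctl & low) == low && (ctl & ~high) == 0;
}

/* Hypothetical check run before entering L2 with L1's vmcs12 settings. */
static bool vmcs12_secondary_ok(uint32_t vmcs12_secondary)
{
    return control_verify(vmcs12_secondary, nested_secondary_low,
                          nested_secondary_high);
}
```

With this mask, a vmcs12 that requests SECONDARY_EXEC_PAUSE_LOOP_EXITING
fails the check, even though L0 may still run L2 with PLE enabled in
vmcs02 for its own purposes.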