[PATCH] KVM: nVMX: Fix vmptrld fail and vmwrite error when L1 goes down w/ enable_shadow_vmcs

2014-07-03 Thread Wanpeng Li
This bug can be triggered when L1 goes down directly w/ enable_shadow_vmcs.

[ 6413.158950] kvm: vmptrld   (null)/7800 failed
[ 6413.158954] vmwrite error: reg 401e value 4 (err 1)
[ 6413.158957] CPU: 0 PID: 4840 Comm: qemu-system-x86 Tainted: G   OE 3.16.0kvm+ #2
[ 6413.158958] Hardware name: Dell Inc. OptiPlex 9020/0DNKMN, BIOS A05 12/05/2013
[ 6413.158959]  0003 880210c9fb58 81741de9 8800d7433f80
[ 6413.158960]  880210c9fb68 a059fa08 880210c9fb78 a05938bf
[ 6413.158962]  880210c9fba8 a059a97f 8800d7433f80 0003
[ 6413.158963] Call Trace:
[ 6413.158968]  [] dump_stack+0x45/0x56
[ 6413.158972]  [] vmwrite_error+0x2c/0x2e [kvm_intel]
[ 6413.158974]  [] vmcs_writel+0x1f/0x30 [kvm_intel]
[ 6413.158976]  [] free_nested.part.73+0x5f/0x170 [kvm_intel]
[ 6413.158978]  [] vmx_free_vcpu+0x33/0x70 [kvm_intel]
[ 6413.158991]  [] kvm_arch_vcpu_free+0x44/0x50 [kvm]
[ 6413.158998]  [] kvm_arch_destroy_vm+0xf2/0x1f0 [kvm]

Commit 26a865 (KVM: VMX: fix use after free of vmx->loaded_vmcs) fixed the
use-after-free bug by moving free_loaded_vmcs() before free_nested(). However,
this frees loaded_vmcs->vmcs prematurely, and vmptrld then loads a NULL pointer
while syncing the shadow vmcs to vmcs12. In addition, the vmwrite used to
disable the shadow vmcs and reset VMCS_LINK_POINTER fails since there is no
valid current-VMCS. This patch fixes it by skipping the kfree of the
vmcs02_list item for the current vmcs in nested_free_all_saved_vmcss() and
kfreeing it after free_loaded_vmcs(). This also avoids the use-after-free bug.

Signed-off-by: Wanpeng Li 
---
 arch/x86/kvm/vmx.c | 15 +++
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 021d84a..6e427ac 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -5762,10 +5762,11 @@ static void nested_free_all_saved_vmcss(struct vcpu_vmx *vmx)
 {
 	struct vmcs02_list *item, *n;
 	list_for_each_entry_safe(item, n, &vmx->nested.vmcs02_pool, list) {
-		if (vmx->loaded_vmcs != &item->vmcs02)
+		if (vmx->loaded_vmcs != &item->vmcs02) {
 			free_loaded_vmcs(&item->vmcs02);
-		list_del(&item->list);
-		kfree(item);
+			list_del(&item->list);
+			kfree(item);
+		}
 	}
 	vmx->nested.vmcs02_num = 0;
 
@@ -7526,10 +7527,16 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu)
 static void vmx_free_vcpu(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
+	struct vmcs02_list *item;
 
 	free_vpid(vmx);
-	free_loaded_vmcs(vmx->loaded_vmcs);
 	free_nested(vmx);
+	free_loaded_vmcs(vmx->loaded_vmcs);
+	if (vmx->nested.vmxon)
+		list_for_each_entry(item, &vmx->nested.vmcs02_pool, list) {
+			list_del(&item->list);
+			kfree(item);
+		}
 	kfree(vmx->guest_msrs);
 	kvm_vcpu_uninit(vcpu);
 	kmem_cache_free(kvm_vcpu_cache, vmx);
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] powerpc/kvm: support to handle sw breakpoint

2014-07-03 Thread Alexander Graf


On 04.07.14 06:34, Madhavan Srinivasan wrote:

On Thursday 03 July 2014 05:21 PM, Alexander Graf wrote:

On 01.07.14 10:41, Madhavan Srinivasan wrote:

This patch adds kernel side support for software breakpoints.
The design is that, by using an illegal instruction, we trap to the hypervisor
via the Emulation Assistance interrupt, where we check for the breakpoint
instruction and accordingly return to the host or the guest. The patch also
adds support for software breakpoints in PR KVM.

The patch mandates use of the "abs" instruction as the sw breakpoint
instruction (primary opcode 31 and extended opcode 360). As of PowerISA v2.01,
the ABS instruction has been dropped from the architecture and is treated as
an illegal instruction.

Changes v1->v2:

   Moved the debug instruction #define to kvm_book3s.h so PR_KVM
can also share it.
   Added code to use the KVM get-one-reg infrastructure to get the debug opcode.
   Updated emulate.c to include emulation of the debug instruction in case
of PR_KVM.
   Made changes to the commit message.

Signed-off-by: Madhavan Srinivasan 
---
   arch/powerpc/include/asm/kvm_book3s.h |8 
   arch/powerpc/include/asm/ppc-opcode.h |5 +
   arch/powerpc/kvm/book3s.c |3 ++-
   arch/powerpc/kvm/book3s_hv.c  |9 +
   arch/powerpc/kvm/book3s_pr.c  |3 +++
   arch/powerpc/kvm/emulate.c|   10 ++
   6 files changed, 37 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s.h
b/arch/powerpc/include/asm/kvm_book3s.h
index f52f656..180d549 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -24,6 +24,14 @@
   #include 
   #include 
+/*
+ * KVMPPC_INST_BOOK3S_DEBUG is the debug instruction for supporting
+ * software breakpoints. The instruction mnemonic is ABS, primary opcode
+ * 31 and extended opcode 360. As of PowerISA v2.01, the ABS instruction
+ * has been dropped from the architecture and is treated as an illegal
+ * instruction.
+ */
+#define KVMPPC_INST_BOOK3S_DEBUG	0x7c0002d0

This will still break with LE guests.


I am told to try with all 0s opcode. So rewriting the patch.


The problem with "all 0s" is that it's reasonably likely to occur in 
real-world code. Hence Segher was proposing something like 0x0000, which 
should be the same regardless of endianness but has a certain appeal of 
intentional placement ;).





+
   struct kvmppc_bat {
   u64 raw;
   u32 bepi;
diff --git a/arch/powerpc/include/asm/ppc-opcode.h
b/arch/powerpc/include/asm/ppc-opcode.h
index 3132bb9..3fbb4c1 100644
--- a/arch/powerpc/include/asm/ppc-opcode.h
+++ b/arch/powerpc/include/asm/ppc-opcode.h
@@ -111,6 +111,11 @@
   #define OP_31_XOP_LHBRX 790
   #define OP_31_XOP_STHBRX918
+/*
+ * KVMPPC_INST_BOOK3S_DEBUG -- software breakpoint instruction.
+ * Instruction mnemonic is ABS, primary opcode 31, extended opcode 360.
+ */
+#define OP_31_XOP_ABS		360
+
   #define OP_LWZ  32
   #define OP_LD   58
   #define OP_LWZU 33
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index c254c27..b40fe5d 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -789,7 +789,8 @@ int kvm_arch_vcpu_ioctl_translate(struct kvm_vcpu
*vcpu,
   int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu,
   struct kvm_guest_debug *dbg)
   {
-	return -EINVAL;
+	vcpu->guest_debug = dbg->control;
+	return 0;
   }
 void kvmppc_decrementer_func(unsigned long data)
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 7a12edb..402c1ec 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -725,8 +725,14 @@ static int kvmppc_handle_exit_hv(struct kvm_run
*run, struct kvm_vcpu *vcpu,
* we don't emulate any guest instructions at this stage.
*/
   case BOOK3S_INTERRUPT_H_EMUL_ASSIST:
+		if (kvmppc_get_last_inst(vcpu) == KVMPPC_INST_BOOK3S_DEBUG) {
+			run->exit_reason = KVM_EXIT_DEBUG;
+			run->debug.arch.address = kvmppc_get_pc(vcpu);
+			r = RESUME_HOST;

Phew - why can't we just go into the normal instruction emulator for
EMUL_ASSIST?


IIUC, using the emulation_assist_interrupt function (kernel/trap.c) ?


I was more thinking of kvmppc_emulate_instruction() :).


Alex



Re: [PATCH 0/4] kvm, mem-hotplug: Do not pin ept identity pagetable and apic access page.

2014-07-03 Thread Tang Chen

Hi Gleb,

On 07/03/2014 02:04 PM, Gleb Natapov wrote:

On Thu, Jul 03, 2014 at 09:17:59AM +0800, Tang Chen wrote:

Hi Gleb,

On 07/02/2014 05:00 PM, Tang Chen wrote:

Hi Gleb, Marcelo,

Please help to review this patch-set.

NOTE: This patch-set doesn't work properly.


ept identity pagetable and apic access page in kvm are pinned in memory.
As a result, they cannot be migrated/hot-removed.

But actually they don't need to be pinned in memory.

This patch-set introduces two new vcpu requests: KVM_REQ_MIGRATE_EPT and 
KVM_REQ_MIGRATE_APIC.
These two requests are made when the two pages are migrated by the mmu_notifier,
to reset the related variables to an unusable value. They are also made when an
ept violation happens, to point the variables at the new pages.


[Known problem]
After this patch-set applied, the two pages can be migrated/hot-removed.
But after migrating apic access page, the guest died.

The host physical address of the apic access page is stored in the VMCS. I
reset it to 0 to stop the guest from accessing it when it is unmapped by
kvm_mmu_notifier_invalidate_page(), and reset it to the new page's host
physical address in tdp_page_fault(). But it seems that the guest accesses the
apic page directly by its host physical address.


Would you please give some advice about this problem?


I haven't reviewed third patch yet, will do ASAP.



I printed some info in the kernel, and I found that the mmu_notifier unmapped
the apic page and set VMCS APIC_ACCESS_ADDR to 0. But an apic page ept
violation didn't happen, and the guest stopped running.

I think that when the guest tried to access the apic page, no ept violation
happened, and as a result VMCS APIC_ACCESS_ADDR was not correctly set.

Referring to the Intel Software Developer's Manual Vol 3B, when the apic page
is accessed using a translation with a large page (2M, 4M, 1G), an APIC VM
exit will not happen.


How do you think about this ?

Thanks. :)






Re: [PATCH] KVM: nVMX: Fix IRQs inject to L2 which belong to L1 since race

2014-07-03 Thread Wanpeng Li
On Thu, Jul 03, 2014 at 01:15:26AM -0400, Bandan Das wrote:
>Jan Kiszka  writes:
>
>> On 2014-07-02 08:54, Wanpeng Li wrote:
>>> This patch fixes bug https://bugzilla.kernel.org/show_bug.cgi?id=72381 
>>> 
>>> If we didn't inject a still-pending event to L1 because of nested_run_pending,
>>> KVM_REQ_EVENT should be requested after the vmexit in order to inject the 
>>> event to L1. However, the current logic blindly requests KVM_REQ_EVENT even 
>>> if there is no still-pending event to L1 blocked by nested_run_pending. 
>>> This leaves a race in which an interrupt belonging to L1 can be injected 
>>> into L2 if L0 sends an interrupt to L1 during this window. 
>>> 
>>>VCPU0   another thread 
>>> 
>>> L1 intr not blocked on L2 first entry
>>> vmx_vcpu_run req event 
>>> kvm check request req event 
>>> check_nested_events don't have any intr 
>>> not nested exit 
>>> intr occur (8254, lapic timer 
>>> etc)
>>> inject_pending_event now have intr 
>>> inject interrupt 
>>> 
>>> This patch fixes the race by introducing an l1_events_blocked field in 
>>> nested_vmx which indicates that there is a still-pending event blocked by 
>>> nested_run_pending, and requesting KVM_REQ_EVENT only when such a blocked 
>>> event exists.
>>
>> There are more, unrelated reasons why KVM_REQ_EVENT could be set. Why
>> aren't those able to trigger this scenario?
>>
>> In any case, unconditionally setting KVM_REQ_EVENT seems strange and
>> should be changed.
>
>
>Ugh! I think I am hitting another one but this one's probably because 
>we are not setting KVM_REQ_EVENT for something we should.
>
>Before this patch, I was able to hit this bug everytime with 
>"modprobe kvm_intel ept=0 nested=1 enable_shadow_vmcs=0" and then booting 
>L2. I can verify that I was indeed hitting the race in inject_pending_event.
>
>After this patch, I believe I am hitting another bug - this happens 
>after I boot L2, as above, and then start a Linux kernel compilation
>and then wait and watch :) It's a pain to debug because this happens
>almost once in three times; it never happens if I run with ept=1, however,
>I think that's only because the test completes sooner. But I can confirm
>that I don't see it if I always set REQ_EVENT if nested_run_pending is set 
>instead of
>the approach this patch takes.
>(Any debug hints help appreciated!)
>
>So, I am not sure if this is the right fix. Rather, I think the safer thing
>to do is to have the interrupt pending check for injection into L1 at
>the "same site" as the call to kvm_queue_interrupt() just like we had before 
>commit b6b8a1451fc40412c57d1. Is there any advantage to having all the 
>nested events checks together ?
>

How about reverting commit b6b8a1451 and checking whether the bug you
mentioned is still there?

Regards,
Wanpeng Li 

>PS - Actually, a much easier fix (or rather hack) is to return 1 in 
>vmx_interrupt_allowed() (as I mentioned elsewhere) only if 
>!is_guest_mode(vcpu) That way, the pending interrupt interrupt 
>can be taken care of correctly during the next vmexit.
>
>Bandan
>
>> Jan
>>
>>> 
>>> Signed-off-by: Wanpeng Li 
>>> ---
>>>  arch/x86/kvm/vmx.c | 20 +++-
>>>  1 file changed, 15 insertions(+), 5 deletions(-)
>>> 
>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>>> index f4e5aed..fe69c49 100644
>>> --- a/arch/x86/kvm/vmx.c
>>> +++ b/arch/x86/kvm/vmx.c
>>> @@ -372,6 +372,7 @@ struct nested_vmx {
>>> u64 vmcs01_tsc_offset;
>>> /* L2 must run next, and mustn't decide to exit to L1. */
>>> bool nested_run_pending;
>>> +   bool l1_events_blocked;
>>> /*
>>>  * Guest pages referred to in vmcs02 with host-physical pointers, so
>>>  * we must keep them pinned while L2 runs.
>>> @@ -7380,8 +7381,10 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu 
>>> *vcpu)
>>>  * we did not inject a still-pending event to L1 now because of
>>>  * nested_run_pending, we need to re-enable this bit.
>>>  */
>>> -   if (vmx->nested.nested_run_pending)
>>> +   if (to_vmx(vcpu)->nested.l1_events_blocked) {
>>> +   to_vmx(vcpu)->nested.l1_events_blocked = false;
>>> kvm_make_request(KVM_REQ_EVENT, vcpu);
>>> +   }
>>>  
>>> vmx->nested.nested_run_pending = 0;
>>>  
>>> @@ -8197,15 +8200,20 @@ static int vmx_check_nested_events(struct kvm_vcpu 
>>> *vcpu, bool external_intr)
>>>  
>>> if (nested_cpu_has_preemption_timer(get_vmcs12(vcpu)) &&
>>> vmx->nested.preemption_timer_expired) {
>>> -   if (vmx->nested.nested_run_pending)
>>> +   if (vmx->nested.nested_run_pending) {
>>> +   vmx->nested.l1_events_blocked = true;
>>> return -EBUSY;
>>> +   }
>>> nested_vmx_vmexit(vcpu, EXIT_REASON_PREEMPTION_TIMER, 0, 0);
>>> return 0;
>>> }
>>>  
>>> if (vcpu->arch.nmi_pending && nested_exit_on_nmi(vcpu)) {
>

Re: [PATCH] KVM: nVMX: Fix IRQs inject to L2 which belong to L1 since race

2014-07-03 Thread Wanpeng Li
On Fri, Jul 04, 2014 at 07:43:14AM +0200, Jan Kiszka wrote:
>On 2014-07-04 04:52, Wanpeng Li wrote:
>> On Thu, Jul 03, 2014 at 01:27:05PM -0400, Bandan Das wrote:
>> [...]
>>> # modprobe kvm_intel ept=0 nested=1 enable_shadow_vmcs=0
>>>
>>> The Host CPU - Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz
>>> qemu cmd to run L1 - 
>>> # qemu-system-x86_64 -drive 
>>> file=level1.img,if=virtio,id=disk0,format=raw,cache=none,werror=stop,rerror=stop,aio=threads
>>>  -drive 
>>> file=level2.img,if=virtio,id=disk1,format=raw,cache=none,werror=stop,rerror=stop,aio=threads
>>>  -vnc :2 --enable-kvm -monitor stdio -m 4G -net 
>>> nic,macaddr=00:23:32:45:89:10 -net 
>>> tap,ifname=tap0,script=/etc/qemu-ifup,downscript=no -smp 4 -cpu 
>>> Nehalem,+vmx -serial pty
>>>
>>> qemu cmd to run L2 -
>>> # sudo qemu-system-x86_64 -hda VM/level2.img -vnc :0 --enable-kvm -monitor 
>>> stdio -m 2G -smp 2 -cpu Nehalem -redir tcp:::22
>>>
>>> Additionally,
>>> L0 is FC19 with 3.16-rc3
>>> L1 and L2 are Ubuntu 14.04 with 3.13.0-24-generic
>>>
>>> Then start a kernel compilation inside L2 with "make -j3"
>>>
>>> There's no call trace on L0, both L0 and L1 are hung (or rather really 
>>> slow) and
>>> L1 serial spews out CPU softlock up errors. Enabling panic on softlockup on 
>>> L1 will give
>>> a trace with smp_call_function_many() I think the corresponding code in 
>>> kernel/smp.c that
>>> triggers this is
>>>
>>> WARN_ON_ONCE(cpu_online(this_cpu) && irqs_disabled()
>>>  && !oops_in_progress && !early_boot_irqs_disabled);
>>>
>>> I know in most cases this is usually harmless, but in this specific case,
>>> it seems it's stuck here forever.
>>>
>>> Sorry, I don't have a L1 call trace handy atm, I can post that if you are 
>>> interested.
>>>
>>> Note that this can take as much as 30 to 40 minutes to appear but once it 
>>> does,
>>> you will know because both L1 and L2 will be stuck with the serial messages 
>>> as I mentioned
>>> before. From my side, let me try this on another system to rule out any 
>>> machine specific
>>> weirdness going on..
>>>
>> 
>> Thanks for your pointing out. 
>> 
>>> Please let me know if you need any further information.
>>>
>> 
>> I just run kvm-unit-tests w/ vmx.flat and eventinj.flat.
>> 
>> 
>> w/ vmx.flat and w/o my patch applied 
>> 
>> [...]
>> 
>> Test suite : interrupt
>> FAIL: direct interrupt while running guest
>> PASS: intercepted interrupt while running guest
>> FAIL: direct interrupt + hlt
>> FAIL: intercepted interrupt + hlt
>> FAIL: direct interrupt + activity state hlt
>> FAIL: intercepted interrupt + activity state hlt
>> PASS: running a guest with interrupt acknowledgement set
>> SUMMARY: 69 tests, 6 failures
>> 
>> w/ vmx.flat and w/ my patch applied 
>> 
>> [...]
>> 
>> Test suite : interrupt
>> PASS: direct interrupt while running guest
>> PASS: intercepted interrupt while running guest
>> PASS: direct interrupt + hlt
>> FAIL: intercepted interrupt + hlt
>> PASS: direct interrupt + activity state hlt
>> PASS: intercepted interrupt + activity state hlt
>> PASS: running a guest with interrupt acknowledgement set
>> 
>> SUMMARY: 69 tests, 2 failures 
>
>Which version (hash) of kvm-unit-tests are you using? All tests up to
>307621765a are running fine here, but since a0e30e712d not much is
>completing successfully anymore:
>

I just pulled my kvm-unit-tests to the latest; the last commit is daeec9795d.

>enabling apic
>paging enabled
>cr0 = 80010011
>cr3 = 7fff000
>cr4 = 20
>PASS: test vmxon with FEATURE_CONTROL cleared
>PASS: test vmxon without FEATURE_CONTROL lock
>PASS: test enable VMX in FEATURE_CONTROL
>PASS: test FEATURE_CONTROL lock bit
>PASS: test vmxon
>FAIL: test vmptrld
>PASS: test vmclear
>init_vmcs : make_vmcs_current error
>FAIL: test vmptrst
>init_vmcs : make_vmcs_current error
>vmx_run : vmlaunch failed.
>FAIL: test vmlaunch
>FAIL: test vmlaunch
>
>SUMMARY: 10 tests, 4 unexpected failures


/opt/qemu/bin/qemu-system-x86_64 -enable-kvm -device pc-testdev -serial stdio 
-device isa-debug-exit,iobase=0xf4,iosize=0x4 -kernel ./x86/vmx.flat -cpu host

Test suite : interrupt
PASS: direct interrupt while running guest
PASS: intercepted interrupt while running guest
PASS: direct interrupt + hlt
FAIL: intercepted interrupt + hlt
PASS: direct interrupt + activity state hlt
PASS: intercepted interrupt + activity state hlt
PASS: running a guest with interrupt acknowledgement set

SUMMARY: 69 tests, 2 failures

However, "intercepted interrupt + hlt" always fails both w/ and w/o my patch.

Regards,
Wanpeng Li 

>
>
>Jan
>
>-- 
>Siemens AG, Corporate Technology, CT RTC ITP SES-DE
>Corporate Competence Center Embedded Linux


Re: direct device assignment in nested VM

2014-07-03 Thread Jan Kiszka
On 2014-07-04 05:27, Hu Yaohui wrote:
> Hi All,
> Is direct device assignment in nested VM supported in the latest KVM
> mainline now?

Le Tan is currently working on emulated device assignment (VT-d
emulation in QEMU). This is the necessary first step and could later be
extended to enable assignment of physical devices to nested guests with
support of the host IOMMU (out of scope for Le's project, though).

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux


Re: [PATCH] KVM: nVMX: Fix IRQs inject to L2 which belong to L1 since race

2014-07-03 Thread Jan Kiszka
On 2014-07-04 04:52, Wanpeng Li wrote:
> On Thu, Jul 03, 2014 at 01:27:05PM -0400, Bandan Das wrote:
> [...]
>> # modprobe kvm_intel ept=0 nested=1 enable_shadow_vmcs=0
>>
>> The Host CPU - Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz
>> qemu cmd to run L1 - 
>> # qemu-system-x86_64 -drive 
>> file=level1.img,if=virtio,id=disk0,format=raw,cache=none,werror=stop,rerror=stop,aio=threads
>>  -drive 
>> file=level2.img,if=virtio,id=disk1,format=raw,cache=none,werror=stop,rerror=stop,aio=threads
>>  -vnc :2 --enable-kvm -monitor stdio -m 4G -net 
>> nic,macaddr=00:23:32:45:89:10 -net 
>> tap,ifname=tap0,script=/etc/qemu-ifup,downscript=no -smp 4 -cpu Nehalem,+vmx 
>> -serial pty
>>
>> qemu cmd to run L2 -
>> # sudo qemu-system-x86_64 -hda VM/level2.img -vnc :0 --enable-kvm -monitor 
>> stdio -m 2G -smp 2 -cpu Nehalem -redir tcp:::22
>>
>> Additionally,
>> L0 is FC19 with 3.16-rc3
>> L1 and L2 are Ubuntu 14.04 with 3.13.0-24-generic
>>
>> Then start a kernel compilation inside L2 with "make -j3"
>>
>> There's no call trace on L0, both L0 and L1 are hung (or rather really slow) 
>> and
>> L1 serial spews out CPU softlock up errors. Enabling panic on softlockup on 
>> L1 will give
>> a trace with smp_call_function_many() I think the corresponding code in 
>> kernel/smp.c that
>> triggers this is
>>
>> WARN_ON_ONCE(cpu_online(this_cpu) && irqs_disabled()
>>  && !oops_in_progress && !early_boot_irqs_disabled);
>>
>> I know in most cases this is usually harmless, but in this specific case,
>> it seems it's stuck here forever.
>>
>> Sorry, I don't have a L1 call trace handy atm, I can post that if you are 
>> interested.
>>
>> Note that this can take as much as 30 to 40 minutes to appear but once it 
>> does,
>> you will know because both L1 and L2 will be stuck with the serial messages 
>> as I mentioned
>> before. From my side, let me try this on another system to rule out any 
>> machine specific
>> weirdness going on..
>>
> 
> Thanks for your pointing out. 
> 
>> Please let me know if you need any further information.
>>
> 
> I just run kvm-unit-tests w/ vmx.flat and eventinj.flat.
> 
> 
> w/ vmx.flat and w/o my patch applied 
> 
> [...]
> 
> Test suite : interrupt
> FAIL: direct interrupt while running guest
> PASS: intercepted interrupt while running guest
> FAIL: direct interrupt + hlt
> FAIL: intercepted interrupt + hlt
> FAIL: direct interrupt + activity state hlt
> FAIL: intercepted interrupt + activity state hlt
> PASS: running a guest with interrupt acknowledgement set
> SUMMARY: 69 tests, 6 failures
> 
> w/ vmx.flat and w/ my patch applied 
> 
> [...]
> 
> Test suite : interrupt
> PASS: direct interrupt while running guest
> PASS: intercepted interrupt while running guest
> PASS: direct interrupt + hlt
> FAIL: intercepted interrupt + hlt
> PASS: direct interrupt + activity state hlt
> PASS: intercepted interrupt + activity state hlt
> PASS: running a guest with interrupt acknowledgement set
> 
> SUMMARY: 69 tests, 2 failures 

Which version (hash) of kvm-unit-tests are you using? All tests up to
307621765a are running fine here, but since a0e30e712d not much is
completing successfully anymore:

enabling apic
paging enabled
cr0 = 80010011
cr3 = 7fff000
cr4 = 20
PASS: test vmxon with FEATURE_CONTROL cleared
PASS: test vmxon without FEATURE_CONTROL lock
PASS: test enable VMX in FEATURE_CONTROL
PASS: test FEATURE_CONTROL lock bit
PASS: test vmxon
FAIL: test vmptrld
PASS: test vmclear
init_vmcs : make_vmcs_current error
FAIL: test vmptrst
init_vmcs : make_vmcs_current error
vmx_run : vmlaunch failed.
FAIL: test vmlaunch
FAIL: test vmlaunch

SUMMARY: 10 tests, 4 unexpected failures


Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux


direct device assignment in nested VM

2014-07-03 Thread Hu Yaohui
Hi All,
Is direct device assignment in nested VM supported in the latest KVM
mainline now?

Thanks,
Yaohui


Re: [PATCH] KVM: nVMX: Fix IRQs inject to L2 which belong to L1 since race

2014-07-03 Thread Wanpeng Li
On Thu, Jul 03, 2014 at 01:27:05PM -0400, Bandan Das wrote:
[...]
># modprobe kvm_intel ept=0 nested=1 enable_shadow_vmcs=0
>
>The Host CPU - Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz
>qemu cmd to run L1 - 
># qemu-system-x86_64 -drive 
>file=level1.img,if=virtio,id=disk0,format=raw,cache=none,werror=stop,rerror=stop,aio=threads
> -drive 
>file=level2.img,if=virtio,id=disk1,format=raw,cache=none,werror=stop,rerror=stop,aio=threads
> -vnc :2 --enable-kvm -monitor stdio -m 4G -net nic,macaddr=00:23:32:45:89:10 
>-net tap,ifname=tap0,script=/etc/qemu-ifup,downscript=no -smp 4 -cpu 
>Nehalem,+vmx -serial pty
>
>qemu cmd to run L2 -
># sudo qemu-system-x86_64 -hda VM/level2.img -vnc :0 --enable-kvm -monitor 
>stdio -m 2G -smp 2 -cpu Nehalem -redir tcp:::22
>
>Additionally,
>L0 is FC19 with 3.16-rc3
>L1 and L2 are Ubuntu 14.04 with 3.13.0-24-generic
>
>Then start a kernel compilation inside L2 with "make -j3"
>
>There's no call trace on L0, both L0 and L1 are hung (or rather really slow) 
>and
>L1 serial spews out CPU softlock up errors. Enabling panic on softlockup on L1 
>will give
>a trace with smp_call_function_many() I think the corresponding code in 
>kernel/smp.c that
>triggers this is
> 
>WARN_ON_ONCE(cpu_online(this_cpu) && irqs_disabled()
>  && !oops_in_progress && !early_boot_irqs_disabled);
>
>I know in most cases this is usually harmless, but in this specific case,
>it seems it's stuck here forever.
>
>Sorry, I don't have a L1 call trace handy atm, I can post that if you are 
>interested.
>
>Note that this can take as much as 30 to 40 minutes to appear but once it does,
>you will know because both L1 and L2 will be stuck with the serial messages as 
>I mentioned
>before. From my side, let me try this on another system to rule out any 
>machine specific
>weirdness going on..
>

Thanks for pointing that out. 

>Please let me know if you need any further information.
>

I just ran kvm-unit-tests w/ vmx.flat and eventinj.flat.


w/ vmx.flat and w/o my patch applied 

[...]

Test suite : interrupt
FAIL: direct interrupt while running guest
PASS: intercepted interrupt while running guest
FAIL: direct interrupt + hlt
FAIL: intercepted interrupt + hlt
FAIL: direct interrupt + activity state hlt
FAIL: intercepted interrupt + activity state hlt
PASS: running a guest with interrupt acknowledgement set
SUMMARY: 69 tests, 6 failures

w/ vmx.flat and w/ my patch applied 

[...]

Test suite : interrupt
PASS: direct interrupt while running guest
PASS: intercepted interrupt while running guest
PASS: direct interrupt + hlt
FAIL: intercepted interrupt + hlt
PASS: direct interrupt + activity state hlt
PASS: intercepted interrupt + activity state hlt
PASS: running a guest with interrupt acknowledgement set

SUMMARY: 69 tests, 2 failures 


w/ eventinj.flat and w/o my patch applied 

SUMMARY: 13 tests, 0 failures

w/ eventinj.flat and w/ my patch applied 

SUMMARY: 13 tests, 0 failures


I'm not sure if the bug you mentioned has any relationship with "FAIL:
intercepted interrupt + hlt", which was already present before my patch.

Regards,
Wanpeng Li 

>Thanks
>Bandan
>
>> Regards,
>> Wanpeng Li 
>>
>>>almost once in three times; it never happens if I run with ept=1, however,
>>>I think that's only because the test completes sooner. But I can confirm
>>>that I don't see it if I always set REQ_EVENT if nested_run_pending is set 
>>>instead of
>>>the approach this patch takes.
>>>(Any debug hints help appreciated!)
>>>
>>>So, I am not sure if this is the right fix. Rather, I think the safer thing
>>>to do is to have the interrupt pending check for injection into L1 at
>>>the "same site" as the call to kvm_queue_interrupt() just like we had before 
>>>commit b6b8a1451fc40412c57d1. Is there any advantage to having all the 
>>>nested events checks together ?
>>>
>>>PS - Actually, a much easier fix (or rather hack) is to return 1 in 
>>>vmx_interrupt_allowed() (as I mentioned elsewhere) only if 
>>>!is_guest_mode(vcpu) That way, the pending interrupt interrupt 
>>>can be taken care of correctly during the next vmexit.
>>>
>>>Bandan
>>>
 Jan

>> [...]


Re: [PATCH 3/4] kvm, memory-hotplug: Update ept identity pagetable when it is migrated.

2014-07-03 Thread Tang Chen

Hi Gleb,

On 07/03/2014 12:34 AM, Gleb Natapov wrote:

On Wed, Jul 02, 2014 at 05:00:36PM +0800, Tang Chen wrote:

ept identity pagetable is pinned in memory, and as a result it cannot be
migrated/hot-removed.

But actually it doesn't need to be pinned in memory.

This patch introduces a new vcpu request, KVM_REQ_MIGRATE_EPT, to reset the
ept identity pagetable related variable. This request is made when
kvm_mmu_notifier_invalidate_page() is called as the page is unmapped from the
qemu user space, to reset kvm->arch.ept_identity_pagetable to NULL. It is also
made when an ept violation happens, to point kvm->arch.ept_identity_pagetable
at the new page.


kvm->arch.ept_identity_pagetable is never used as a page address, just as a
null/!null boolean to see whether the identity pagetable is initialized. I do
not see why we would want to track its address at all. Changing it to bool
and assigning true during initialization should be enough.


We already have kvm->arch.ept_identity_pagetable_done to indicate if the ept
identity table is initialized. If we make kvm->arch.ept_identity_pagetable
bool, do you mean we would have:

kvm->arch.ept_identity_pagetable: indicates if the ept page is allocated,
kvm->arch.ept_identity_pagetable_done: indicates if the ept page is initialized?

I don't think we need this. Shall we remove
kvm->arch.ept_identity_pagetable?


Thanks.



Re: [PATCH 4/4] kvm, mem-hotplug: Update apic access page when it is migrated.

2014-07-03 Thread Tang Chen

Hi Gleb,

Thanks for the advices. Please see below.

On 07/03/2014 09:55 PM, Gleb Natapov wrote:
..

@@ -575,6 +575,7 @@ struct kvm_arch {

unsigned int tss_addr;
struct page *apic_access_page;
+   bool apic_access_page_migrated;

Better have two requests KVM_REQ_APIC_PAGE_MAP, KVM_REQ_APIC_PAGE_UNMAP IMO.



vcpu->requests is an unsigned long, and we can only have 64 requests. Isn't
adding two requests for the apic page and another similar two for the ept page
too many? Not sure.



gpa_t wall_clock;

@@ -739,6 +740,7 @@ struct kvm_x86_ops {
void (*hwapic_isr_update)(struct kvm *kvm, int isr);
void (*load_eoi_exitmap)(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap);
void (*set_virtual_x2apic_mode)(struct kvm_vcpu *vcpu, bool set);
+   void (*set_apic_access_page_addr)(struct kvm *kvm, hpa_t hpa);
void (*deliver_posted_interrupt)(struct kvm_vcpu *vcpu, int vector);
void (*sync_pir_to_irr)(struct kvm_vcpu *vcpu);
int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index c0d72f6..a655444 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3436,6 +3436,21 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t 
gpa, u32 error_code,
kvm_make_request(KVM_REQ_MIGRATE_EPT, vcpu);
}

+	if (gpa == VMX_APIC_ACCESS_PAGE_ADDR &&
+	    vcpu->kvm->arch.apic_access_page_migrated) {

Why check arch.apic_access_page_migrated here? Isn't it enough that the fault
is on the apic address?



True. It's enough. Followed.


+   int i;
+
+   vcpu->kvm->arch.apic_access_page_migrated = false;
+
+   /*
+* We need update APIC_ACCESS_ADDR pointer in each VMCS of
+* all the online vcpus.
+*/
+   for (i = 0; i < atomic_read(&vcpu->kvm->online_vcpus); i++)
+   kvm_make_request(KVM_REQ_MIGRATE_APIC,
+vcpu->kvm->vcpus[i]);

make_all_cpus_request(). You need to kick all vcpus from a guest mode.



OK, followed. But would you please explain more about this. :)
Why do we need to kick all vcpus out of guest mode when making a request to all
vcpus?



+   }
+
spin_unlock(&vcpu->kvm->mmu_lock);

return r;
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index c336cb3..abc152f 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -3988,7 +3988,7 @@ static int alloc_apic_access_page(struct kvm *kvm)
if (r)
goto out;

-   page = gfn_to_page(kvm, VMX_APIC_ACCESS_PAGE_ADDR >> PAGE_SHIFT);
+   page = gfn_to_page_no_pin(kvm, VMX_APIC_ACCESS_PAGE_ADDR >> PAGE_SHIFT);
if (is_error_page(page)) {
r = -EFAULT;
goto out;
@@ -7075,6 +7075,12 @@ static void vmx_set_virtual_x2apic_mode(struct kvm_vcpu *vcpu, bool set)
vmx_set_msr_bitmap(vcpu);
  }

+static void vmx_set_apic_access_page_addr(struct kvm *kvm, hpa_t hpa)
+{
+   if (vm_need_virtualize_apic_accesses(kvm))

This shouldn't even be called if the apic access page is not supported. Neither
the mmu_notifier path nor the tdp_page_fault path should ever see the 0xfee00000
address. BUG() is more appropriate here.



I don't quite understand. Why will calling this function here lead to a bug?
(Sorry, I don't quite understand the internals of KVM. Please help.)




+   vmcs_write64(APIC_ACCESS_ADDR, hpa);
+}
+
  static void vmx_hwapic_isr_update(struct kvm *kvm, int isr)
  {
u16 status;
@@ -8846,6 +8852,7 @@ static struct kvm_x86_ops vmx_x86_ops = {
.enable_irq_window = enable_irq_window,
.update_cr8_intercept = update_cr8_intercept,
.set_virtual_x2apic_mode = vmx_set_virtual_x2apic_mode,
+   .set_apic_access_page_addr = vmx_set_apic_access_page_addr,

svm needs that too.



OK, will add one for svm.


.vm_has_apicv = vmx_vm_has_apicv,
.load_eoi_exitmap = vmx_load_eoi_exitmap,
.hwapic_irr_update = vmx_hwapic_irr_update,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index a26524f..14e7174 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5943,6 +5943,24 @@ static void vcpu_migrated_page_update_ept(struct kvm_vcpu *vcpu)
}
  }

+static void vcpu_migrated_page_update_apic(struct kvm_vcpu *vcpu)
+{
+   struct kvm *kvm = vcpu->kvm;
+
+   if (kvm->arch.apic_access_page_migrated) {
+   if (kvm->arch.apic_access_page)
+   kvm->arch.apic_access_page = pfn_to_page(0);

All vcpus will access apic_access_page without locking here. Maybe
set kvm->arch.apic_access_page to zero in the mmu_notifier and here call
  kvm_x86_ops->set_apic_access_page_addr(kvm, kvm->arch.apic_access_page);



I'm a little confused. The apic access page's phys_addr is stored in the vmcs, and
I think it will be used by the vcpu directly to access the physical page.
Setting kvm->arch.apic_access_page to zero wil

Re: [PATCH 4/6 v2] KVM: PPC: Book3E: Add AltiVec support

2014-07-03 Thread Scott Wood
On Mon, 2014-06-30 at 18:34 +0300, Mihai Caraman wrote:
> Add KVM Book3E AltiVec support. KVM Book3E FPU support gracefully reuse host
> infrastructure so follow the same approach for AltiVec.
> 
> Signed-off-by: Mihai Caraman 
> ---
> v2:
>  - integrate Paul's FP/VMX/VSX changes
> 
>  arch/powerpc/kvm/booke.c | 67 
> ++--
>  1 file changed, 65 insertions(+), 2 deletions(-)

I had to apply the whole patchset to get proper context for reviewing
this, and found some nits:

> case BOOKE_INTERRUPT_SPE_FP_DATA_ALTIVEC_ASSIST:
> if (kvmppc_supports_spe() || kvmppc_supports_altivec()) {
> kvmppc_booke_queue_irqprio(vcpu,
> BOOKE_IRQPRIO_SPE_FP_DATA_ALTIVEC_ASSIST);
> r = RESUME_GUEST;
> } else { 
> /*
>  * These really should never happen without 
> CONFIG_SPE,
>  * as we should never enable the real MSR[SPE] in the
>  * guest.
>  */

Besides the comment not being updated for Altivec, it's not true on HV,
where the guest can enable MSR[VEC] all by itself.  For HV, the reason
we shouldn't be able to get here is that we disable KVM on e6500 if
CONFIG_ALTIVEC is not enabled, and no other HV core supports either SPE
or Altivec.

> pr_crit("%s: unexpected SPE interrupt %u at %08lx\n",
> __func__, exit_nr, vcpu->arch.pc);

Error string will say SPE regardless of what sort of chip you're on.
Given that this is explicitly on the "no support for Altivec or SPE"
path, "SPE/Altivec" phrasing seems appropriate.  Of course we have
bigger problems than that if we ever reach this code. :-)

-Scott




Re: [PATCH 1/6 v2] KVM: PPC: Book3E: Use common defines for SPE/FP/AltiVec int numbers

2014-07-03 Thread Alexander Graf


On 04.07.14 01:00, Scott Wood wrote:

On Fri, 2014-07-04 at 00:35 +0200, Alexander Graf wrote:

On 04.07.14 00:31, Scott Wood wrote:

On Thu, 2014-07-03 at 17:15 -0500, Scott Wood wrote:

On Thu, 2014-07-03 at 10:25 -0500, Caraman Mihai Claudiu-B02008 wrote:

-Original Message-
From: Alexander Graf [mailto:ag...@suse.de]
Sent: Thursday, July 03, 2014 3:21 PM
To: Caraman Mihai Claudiu-B02008; kvm-...@vger.kernel.org
Cc: kvm@vger.kernel.org; linuxppc-...@lists.ozlabs.org
Subject: Re: [PATCH 1/6 v2] KVM: PPC: Book3E: Use common defines for
SPE/FP/AltiVec int numbers


On 30.06.14 17:34, Mihai Caraman wrote:

Use common BOOKE_IRQPRIO and BOOKE_INTERRUPT defines for SPE/FP/AltiVec
which share the same interrupt numbers.

Signed-off-by: Mihai Caraman 
---
v2:
- remove outdated definitions

arch/powerpc/include/asm/kvm_asm.h|  8 
arch/powerpc/kvm/booke.c  | 17 +
arch/powerpc/kvm/booke.h  |  4 ++--
arch/powerpc/kvm/booke_interrupts.S   |  9 +
arch/powerpc/kvm/bookehv_interrupts.S |  4 ++--
arch/powerpc/kvm/e500.c   | 10 ++
arch/powerpc/kvm/e500_emulate.c   | 10 ++
7 files changed, 30 insertions(+), 32 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_asm.h

b/arch/powerpc/include/asm/kvm_asm.h

index 9601741..c94fd33 100644
--- a/arch/powerpc/include/asm/kvm_asm.h
+++ b/arch/powerpc/include/asm/kvm_asm.h
@@ -56,14 +56,6 @@
/* E500 */
#define BOOKE_INTERRUPT_SPE_ALTIVEC_UNAVAIL 32
#define BOOKE_INTERRUPT_SPE_FP_DATA_ALTIVEC_ASSIST 33
-/*
- * TODO: Unify 32-bit and 64-bit kernel exception handlers to use same

defines

- */
-#define BOOKE_INTERRUPT_SPE_UNAVAIL

BOOKE_INTERRUPT_SPE_ALTIVEC_UNAVAIL

-#define BOOKE_INTERRUPT_SPE_FP_DATA

BOOKE_INTERRUPT_SPE_FP_DATA_ALTIVEC_ASSIST

-#define BOOKE_INTERRUPT_ALTIVEC_UNAVAIL

BOOKE_INTERRUPT_SPE_ALTIVEC_UNAVAIL

-#define BOOKE_INTERRUPT_ALTIVEC_ASSIST \
-   BOOKE_INTERRUPT_SPE_FP_DATA_ALTIVEC_ASSIST

I think I'd prefer to keep them separate.

What is the reason for changing your mind from v1? Do you want to have
different defines with same values (we specifically mapped them to the
hardware interrupt numbers). We already upstreamed the necessary changes
in the kernel. Scott, please share your opinion here.

I don't like hiding the fact that they're the same number, which could
lead to wrong code in the absence of ifdefs that strictly mutually
exclude SPE and Altivec code -- there was an instance of this with
MSR_VEC versus MSR_SPE in a previous patchset.

That said, if you want to enforce that mutual exclusion in a way that is
clear, I won't object too loudly -- but the code does look pretty
similar between the two (as well as between the two IVORs).

Yes, I want to make sure we have 2 separate code paths for SPE and
Altivec. No code sharing at all unless it's very generically possible.

Also, which code does look pretty similar? The fact that we deflect
interrupts back into the guest? That's mostly boilerplate.

There's also the injection of a program check (or exiting to userspace)
when CONFIG_SPE/ALTIVEC is missing.  Not a big deal, but maybe it could
be factored into a helper function.  I like minimizing boilerplate.


Yes, me too - but I also like to be explicit. If there's enough code to 
share, factoring those into helpers certainly works well for me.



Alex



Re: [PATCH 1/6 v2] KVM: PPC: Book3E: Use common defines for SPE/FP/AltiVec int numbers

2014-07-03 Thread Scott Wood
On Fri, 2014-07-04 at 00:35 +0200, Alexander Graf wrote:
> On 04.07.14 00:31, Scott Wood wrote:
> > On Thu, 2014-07-03 at 17:15 -0500, Scott Wood wrote:
> >> On Thu, 2014-07-03 at 10:25 -0500, Caraman Mihai Claudiu-B02008 wrote:
>  -Original Message-
>  From: Alexander Graf [mailto:ag...@suse.de]
>  Sent: Thursday, July 03, 2014 3:21 PM
>  To: Caraman Mihai Claudiu-B02008; kvm-...@vger.kernel.org
>  Cc: kvm@vger.kernel.org; linuxppc-...@lists.ozlabs.org
>  Subject: Re: [PATCH 1/6 v2] KVM: PPC: Book3E: Use common defines for
>  SPE/FP/AltiVec int numbers
> 
> 
>  On 30.06.14 17:34, Mihai Caraman wrote:
> > Use common BOOKE_IRQPRIO and BOOKE_INTERRUPT defines for SPE/FP/AltiVec
> > which share the same interrupt numbers.
> >
> > Signed-off-by: Mihai Caraman 
> > ---
> > v2:
> >- remove outdated definitions
> >
> >arch/powerpc/include/asm/kvm_asm.h|  8 
> >arch/powerpc/kvm/booke.c  | 17 +
> >arch/powerpc/kvm/booke.h  |  4 ++--
> >arch/powerpc/kvm/booke_interrupts.S   |  9 +
> >arch/powerpc/kvm/bookehv_interrupts.S |  4 ++--
> >arch/powerpc/kvm/e500.c   | 10 ++
> >arch/powerpc/kvm/e500_emulate.c   | 10 ++
> >7 files changed, 30 insertions(+), 32 deletions(-)
> >
> > diff --git a/arch/powerpc/include/asm/kvm_asm.h
>  b/arch/powerpc/include/asm/kvm_asm.h
> > index 9601741..c94fd33 100644
> > --- a/arch/powerpc/include/asm/kvm_asm.h
> > +++ b/arch/powerpc/include/asm/kvm_asm.h
> > @@ -56,14 +56,6 @@
> >/* E500 */
> >#define BOOKE_INTERRUPT_SPE_ALTIVEC_UNAVAIL 32
> >#define BOOKE_INTERRUPT_SPE_FP_DATA_ALTIVEC_ASSIST 33
> > -/*
> > - * TODO: Unify 32-bit and 64-bit kernel exception handlers to use same
>  defines
> > - */
> > -#define BOOKE_INTERRUPT_SPE_UNAVAIL
>  BOOKE_INTERRUPT_SPE_ALTIVEC_UNAVAIL
> > -#define BOOKE_INTERRUPT_SPE_FP_DATA
>  BOOKE_INTERRUPT_SPE_FP_DATA_ALTIVEC_ASSIST
> > -#define BOOKE_INTERRUPT_ALTIVEC_UNAVAIL
>  BOOKE_INTERRUPT_SPE_ALTIVEC_UNAVAIL
> > -#define BOOKE_INTERRUPT_ALTIVEC_ASSIST \
> > -   
> > BOOKE_INTERRUPT_SPE_FP_DATA_ALTIVEC_ASSIST
>  I think I'd prefer to keep them separate.
> >>> What is the reason for changing your mind from v1? Do you want to have
> >>> different defines with same values (we specifically mapped them to the
> >>> hardware interrupt numbers). We already upstreamed the necessary changes
> >>> in the kernel. Scott, please share your opinion here.
> >> I don't like hiding the fact that they're the same number, which could
> >> lead to wrong code in the absence of ifdefs that strictly mutually
> >> exclude SPE and Altivec code -- there was an instance of this with
> >> MSR_VEC versus MSR_SPE in a previous patchset.
> > That said, if you want to enforce that mutual exclusion in a way that is
> > clear, I won't object too loudly -- but the code does look pretty
> > similar between the two (as well as between the two IVORs).
> 
> Yes, I want to make sure we have 2 separate code paths for SPE and 
> Altivec. No code sharing at all unless it's very generically possible.
> 
> Also, which code does look pretty similar? The fact that we deflect 
> interrupts back into the guest? That's mostly boilerplate.

There's also the injection of a program check (or exiting to userspace)
when CONFIG_SPE/ALTIVEC is missing.  Not a big deal, but maybe it could
be factored into a helper function.  I like minimizing boilerplate.

-Scott




Re: [PATCH 1/6 v2] KVM: PPC: Book3E: Use common defines for SPE/FP/AltiVec int numbers

2014-07-03 Thread Alexander Graf


On 04.07.14 00:31, Scott Wood wrote:

On Thu, 2014-07-03 at 17:15 -0500, Scott Wood wrote:

On Thu, 2014-07-03 at 10:25 -0500, Caraman Mihai Claudiu-B02008 wrote:

-Original Message-
From: Alexander Graf [mailto:ag...@suse.de]
Sent: Thursday, July 03, 2014 3:21 PM
To: Caraman Mihai Claudiu-B02008; kvm-...@vger.kernel.org
Cc: kvm@vger.kernel.org; linuxppc-...@lists.ozlabs.org
Subject: Re: [PATCH 1/6 v2] KVM: PPC: Book3E: Use common defines for
SPE/FP/AltiVec int numbers


On 30.06.14 17:34, Mihai Caraman wrote:

Use common BOOKE_IRQPRIO and BOOKE_INTERRUPT defines for SPE/FP/AltiVec
which share the same interrupt numbers.

Signed-off-by: Mihai Caraman 
---
v2:
   - remove outdated definitions

   arch/powerpc/include/asm/kvm_asm.h    |  8 
   arch/powerpc/kvm/booke.c              | 17 +
   arch/powerpc/kvm/booke.h              |  4 ++--
   arch/powerpc/kvm/booke_interrupts.S   |  9 +
   arch/powerpc/kvm/bookehv_interrupts.S |  4 ++--
   arch/powerpc/kvm/e500.c               | 10 ++
   arch/powerpc/kvm/e500_emulate.c       | 10 ++
   7 files changed, 30 insertions(+), 32 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_asm.h

b/arch/powerpc/include/asm/kvm_asm.h

index 9601741..c94fd33 100644
--- a/arch/powerpc/include/asm/kvm_asm.h
+++ b/arch/powerpc/include/asm/kvm_asm.h
@@ -56,14 +56,6 @@
   /* E500 */
   #define BOOKE_INTERRUPT_SPE_ALTIVEC_UNAVAIL 32
   #define BOOKE_INTERRUPT_SPE_FP_DATA_ALTIVEC_ASSIST 33
-/*
- * TODO: Unify 32-bit and 64-bit kernel exception handlers to use same

defines

- */
-#define BOOKE_INTERRUPT_SPE_UNAVAIL

BOOKE_INTERRUPT_SPE_ALTIVEC_UNAVAIL

-#define BOOKE_INTERRUPT_SPE_FP_DATA

BOOKE_INTERRUPT_SPE_FP_DATA_ALTIVEC_ASSIST

-#define BOOKE_INTERRUPT_ALTIVEC_UNAVAIL

BOOKE_INTERRUPT_SPE_ALTIVEC_UNAVAIL

-#define BOOKE_INTERRUPT_ALTIVEC_ASSIST \
-   BOOKE_INTERRUPT_SPE_FP_DATA_ALTIVEC_ASSIST

I think I'd prefer to keep them separate.

What is the reason for changing your mind from v1? Do you want to have
different defines with same values (we specifically mapped them to the
hardware interrupt numbers). We already upstreamed the necessary changes
in the kernel. Scott, please share your opinion here.

I don't like hiding the fact that they're the same number, which could
lead to wrong code in the absence of ifdefs that strictly mutually
exclude SPE and Altivec code -- there was an instance of this with
MSR_VEC versus MSR_SPE in a previous patchset.

That said, if you want to enforce that mutual exclusion in a way that is
clear, I won't object too loudly -- but the code does look pretty
similar between the two (as well as between the two IVORs).


Yes, I want to make sure we have 2 separate code paths for SPE and 
Altivec. No code sharing at all unless it's very generically possible.


Also, which code does look pretty similar? The fact that we deflect 
interrupts back into the guest? That's mostly boilerplate.



Alex



Re: [PATCH 1/6 v2] KVM: PPC: Book3E: Use common defines for SPE/FP/AltiVec int numbers

2014-07-03 Thread Scott Wood
On Thu, 2014-07-03 at 17:15 -0500, Scott Wood wrote:
> On Thu, 2014-07-03 at 10:25 -0500, Caraman Mihai Claudiu-B02008 wrote:
> > > -Original Message-
> > > From: Alexander Graf [mailto:ag...@suse.de]
> > > Sent: Thursday, July 03, 2014 3:21 PM
> > > To: Caraman Mihai Claudiu-B02008; kvm-...@vger.kernel.org
> > > Cc: kvm@vger.kernel.org; linuxppc-...@lists.ozlabs.org
> > > Subject: Re: [PATCH 1/6 v2] KVM: PPC: Book3E: Use common defines for
> > > SPE/FP/AltiVec int numbers
> > > 
> > > 
> > > On 30.06.14 17:34, Mihai Caraman wrote:
> > > > Use common BOOKE_IRQPRIO and BOOKE_INTERRUPT defines for SPE/FP/AltiVec
> > > > which share the same interrupt numbers.
> > > >
> > > > Signed-off-by: Mihai Caraman 
> > > > ---
> > > > v2:
> > > >   - remove outdated definitions
> > > >
> > > >   arch/powerpc/include/asm/kvm_asm.h|  8 
> > > >   arch/powerpc/kvm/booke.c  | 17 +
> > > >   arch/powerpc/kvm/booke.h  |  4 ++--
> > > >   arch/powerpc/kvm/booke_interrupts.S   |  9 +
> > > >   arch/powerpc/kvm/bookehv_interrupts.S |  4 ++--
> > > >   arch/powerpc/kvm/e500.c   | 10 ++
> > > >   arch/powerpc/kvm/e500_emulate.c   | 10 ++
> > > >   7 files changed, 30 insertions(+), 32 deletions(-)
> > > >
> > > > diff --git a/arch/powerpc/include/asm/kvm_asm.h
> > > b/arch/powerpc/include/asm/kvm_asm.h
> > > > index 9601741..c94fd33 100644
> > > > --- a/arch/powerpc/include/asm/kvm_asm.h
> > > > +++ b/arch/powerpc/include/asm/kvm_asm.h
> > > > @@ -56,14 +56,6 @@
> > > >   /* E500 */
> > > >   #define BOOKE_INTERRUPT_SPE_ALTIVEC_UNAVAIL 32
> > > >   #define BOOKE_INTERRUPT_SPE_FP_DATA_ALTIVEC_ASSIST 33
> > > > -/*
> > > > - * TODO: Unify 32-bit and 64-bit kernel exception handlers to use same
> > > defines
> > > > - */
> > > > -#define BOOKE_INTERRUPT_SPE_UNAVAIL
> > > BOOKE_INTERRUPT_SPE_ALTIVEC_UNAVAIL
> > > > -#define BOOKE_INTERRUPT_SPE_FP_DATA
> > > BOOKE_INTERRUPT_SPE_FP_DATA_ALTIVEC_ASSIST
> > > > -#define BOOKE_INTERRUPT_ALTIVEC_UNAVAIL
> > > BOOKE_INTERRUPT_SPE_ALTIVEC_UNAVAIL
> > > > -#define BOOKE_INTERRUPT_ALTIVEC_ASSIST \
> > > > -   BOOKE_INTERRUPT_SPE_FP_DATA_ALTIVEC_ASSIST
> > > 
> > > I think I'd prefer to keep them separate.
> > 
> > What is the reason for changing your mind from v1? Do you want to have
> > different defines with same values (we specifically mapped them to the
> > hardware interrupt numbers). We already upstreamed the necessary changes
> > in the kernel. Scott, please share your opinion here.
> 
> I don't like hiding the fact that they're the same number, which could
> lead to wrong code in the absence of ifdefs that strictly mutually
> exclude SPE and Altivec code -- there was an instance of this with
> MSR_VEC versus MSR_SPE in a previous patchset. 

That said, if you want to enforce that mutual exclusion in a way that is
clear, I won't object too loudly -- but the code does look pretty
similar between the two (as well as between the two IVORs).

-Scott




Re: [PATCH 1/6 v2] KVM: PPC: Book3E: Use common defines for SPE/FP/AltiVec int numbers

2014-07-03 Thread Alexander Graf


On 04.07.14 00:15, Scott Wood wrote:

On Thu, 2014-07-03 at 10:25 -0500, Caraman Mihai Claudiu-B02008 wrote:

-Original Message-
From: Alexander Graf [mailto:ag...@suse.de]
Sent: Thursday, July 03, 2014 3:21 PM
To: Caraman Mihai Claudiu-B02008; kvm-...@vger.kernel.org
Cc: kvm@vger.kernel.org; linuxppc-...@lists.ozlabs.org
Subject: Re: [PATCH 1/6 v2] KVM: PPC: Book3E: Use common defines for
SPE/FP/AltiVec int numbers


On 30.06.14 17:34, Mihai Caraman wrote:

Use common BOOKE_IRQPRIO and BOOKE_INTERRUPT defines for SPE/FP/AltiVec
which share the same interrupt numbers.

Signed-off-by: Mihai Caraman 
---
v2:
   - remove outdated definitions

   arch/powerpc/include/asm/kvm_asm.h    |  8 
   arch/powerpc/kvm/booke.c              | 17 +
   arch/powerpc/kvm/booke.h              |  4 ++--
   arch/powerpc/kvm/booke_interrupts.S   |  9 +
   arch/powerpc/kvm/bookehv_interrupts.S |  4 ++--
   arch/powerpc/kvm/e500.c               | 10 ++
   arch/powerpc/kvm/e500_emulate.c       | 10 ++
   7 files changed, 30 insertions(+), 32 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_asm.h

b/arch/powerpc/include/asm/kvm_asm.h

index 9601741..c94fd33 100644
--- a/arch/powerpc/include/asm/kvm_asm.h
+++ b/arch/powerpc/include/asm/kvm_asm.h
@@ -56,14 +56,6 @@
   /* E500 */
   #define BOOKE_INTERRUPT_SPE_ALTIVEC_UNAVAIL 32
   #define BOOKE_INTERRUPT_SPE_FP_DATA_ALTIVEC_ASSIST 33
-/*
- * TODO: Unify 32-bit and 64-bit kernel exception handlers to use same

defines

- */
-#define BOOKE_INTERRUPT_SPE_UNAVAIL

BOOKE_INTERRUPT_SPE_ALTIVEC_UNAVAIL

-#define BOOKE_INTERRUPT_SPE_FP_DATA

BOOKE_INTERRUPT_SPE_FP_DATA_ALTIVEC_ASSIST

-#define BOOKE_INTERRUPT_ALTIVEC_UNAVAIL

BOOKE_INTERRUPT_SPE_ALTIVEC_UNAVAIL

-#define BOOKE_INTERRUPT_ALTIVEC_ASSIST \
-   BOOKE_INTERRUPT_SPE_FP_DATA_ALTIVEC_ASSIST

I think I'd prefer to keep them separate.

What is the reason for changing your mind from v1? Do you want to have
different defines with same values (we specifically mapped them to the
hardware interrupt numbers). We already upstreamed the necessary changes
in the kernel. Scott, please share your opinion here.

I don't like hiding the fact that they're the same number, which could
lead to wrong code in the absence of ifdefs that strictly mutually
exclude SPE and Altivec code -- there was an instance of this with
MSR_VEC versus MSR_SPE in a previous patchset.


The nice thing here is that we use almost all of these numbers in 
switch() statements which give us automated duplicate checking - so we 
don't accidentally go into the wrong code path without knowing it.



Alex



Re: [PATCH 1/6 v2] KVM: PPC: Book3E: Use common defines for SPE/FP/AltiVec int numbers

2014-07-03 Thread Scott Wood
On Thu, 2014-07-03 at 10:25 -0500, Caraman Mihai Claudiu-B02008 wrote:
> > -Original Message-
> > From: Alexander Graf [mailto:ag...@suse.de]
> > Sent: Thursday, July 03, 2014 3:21 PM
> > To: Caraman Mihai Claudiu-B02008; kvm-...@vger.kernel.org
> > Cc: kvm@vger.kernel.org; linuxppc-...@lists.ozlabs.org
> > Subject: Re: [PATCH 1/6 v2] KVM: PPC: Book3E: Use common defines for
> > SPE/FP/AltiVec int numbers
> > 
> > 
> > On 30.06.14 17:34, Mihai Caraman wrote:
> > > Use common BOOKE_IRQPRIO and BOOKE_INTERRUPT defines for SPE/FP/AltiVec
> > > which share the same interrupt numbers.
> > >
> > > Signed-off-by: Mihai Caraman 
> > > ---
> > > v2:
> > >   - remove outdated definitions
> > >
> > >   arch/powerpc/include/asm/kvm_asm.h|  8 
> > >   arch/powerpc/kvm/booke.c  | 17 +
> > >   arch/powerpc/kvm/booke.h  |  4 ++--
> > >   arch/powerpc/kvm/booke_interrupts.S   |  9 +
> > >   arch/powerpc/kvm/bookehv_interrupts.S |  4 ++--
> > >   arch/powerpc/kvm/e500.c   | 10 ++
> > >   arch/powerpc/kvm/e500_emulate.c   | 10 ++
> > >   7 files changed, 30 insertions(+), 32 deletions(-)
> > >
> > > diff --git a/arch/powerpc/include/asm/kvm_asm.h
> > b/arch/powerpc/include/asm/kvm_asm.h
> > > index 9601741..c94fd33 100644
> > > --- a/arch/powerpc/include/asm/kvm_asm.h
> > > +++ b/arch/powerpc/include/asm/kvm_asm.h
> > > @@ -56,14 +56,6 @@
> > >   /* E500 */
> > >   #define BOOKE_INTERRUPT_SPE_ALTIVEC_UNAVAIL 32
> > >   #define BOOKE_INTERRUPT_SPE_FP_DATA_ALTIVEC_ASSIST 33
> > > -/*
> > > - * TODO: Unify 32-bit and 64-bit kernel exception handlers to use same
> > defines
> > > - */
> > > -#define BOOKE_INTERRUPT_SPE_UNAVAIL
> > BOOKE_INTERRUPT_SPE_ALTIVEC_UNAVAIL
> > > -#define BOOKE_INTERRUPT_SPE_FP_DATA
> > BOOKE_INTERRUPT_SPE_FP_DATA_ALTIVEC_ASSIST
> > > -#define BOOKE_INTERRUPT_ALTIVEC_UNAVAIL
> > BOOKE_INTERRUPT_SPE_ALTIVEC_UNAVAIL
> > > -#define BOOKE_INTERRUPT_ALTIVEC_ASSIST \
> > > - BOOKE_INTERRUPT_SPE_FP_DATA_ALTIVEC_ASSIST
> > 
> > I think I'd prefer to keep them separate.
> 
> What is the reason for changing your mind from v1? Do you want to have
> different defines with same values (we specifically mapped them to the
> hardware interrupt numbers). We already upstreamed the necessary changes
> in the kernel. Scott, please share your opinion here.

I don't like hiding the fact that they're the same number, which could
lead to wrong code in the absence of ifdefs that strictly mutually
exclude SPE and Altivec code -- there was an instance of this with
MSR_VEC versus MSR_SPE in a previous patchset.
 
> > >   #define BOOKE_INTERRUPT_SPE_FP_ROUND 34
> > >   #define BOOKE_INTERRUPT_PERFORMANCE_MONITOR 35
> > >   #define BOOKE_INTERRUPT_DOORBELL 36
> > > diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
> > > index ab62109..3c86d9b 100644
> > > --- a/arch/powerpc/kvm/booke.c
> > > +++ b/arch/powerpc/kvm/booke.c
> > > @@ -388,8 +388,8 @@ static int kvmppc_booke_irqprio_deliver(struct
> > kvm_vcpu *vcpu,
> > >   case BOOKE_IRQPRIO_ITLB_MISS:
> > >   case BOOKE_IRQPRIO_SYSCALL:
> > >   case BOOKE_IRQPRIO_FP_UNAVAIL:
> > > - case BOOKE_IRQPRIO_SPE_UNAVAIL:
> > > - case BOOKE_IRQPRIO_SPE_FP_DATA:
> > > + case BOOKE_IRQPRIO_SPE_ALTIVEC_UNAVAIL:
> > > + case BOOKE_IRQPRIO_SPE_FP_DATA_ALTIVEC_ASSIST:
> > 
> > #ifdef CONFIG_KVM_E500V2
> >    case ...SPE:
> > #else
> >    case ...ALTIVEC:
> > #endif

I don't think that's an improvement.

-Scott




Re: [PATCH v6 00/21] arm64: GICv3 support

2014-07-03 Thread Marc Zyngier
Hi Jason,

On 30/06/14 16:50, Marc Zyngier wrote:
> Hi Jason,
> 
> On 30/06/14 16:43, Jason Cooper wrote:
>> Marc,
>>
>> On Mon, Jun 30, 2014 at 04:01:29PM +0100, Marc Zyngier wrote:
>>> I now have received enough Reviewed-by from people familiar with the
>>> architecture, and a number of Tested-by from actual implementors,
>>> which (IMHO) makes it ready for merging into 3.17 (Thomas, Jason: How do
>>> you want to play it? We have a rather big dependency between the first
>>> few patches and the rest of the stuff, which is KVM only).
>>
>> On a quick glance, it looks like patch #1 is conflict-prone and #2 crosses
>> outside of irqchip/.  So, I could setup a topic branch for #1 that
>> we'll both use.  Then I'll Ack #2 for you to take.  Would that work for
>> you?
> 
> That'd be absolutely great. Just point me to the topic branch you'd like
> to use, and I'll get it into the KVM/ARM tree.

Sorry to pester you... Do you have any update on this? Christoffer and I
are putting our queue together for the next merge window, and I'd really
like to get this going before disappearing on holiday...

Thanks a lot,

M.
-- 
Jazz is not dead. It just smells funny...


Re: [PATCH] KVM: nVMX: Fix IRQs inject to L2 which belong to L1 since race

2014-07-03 Thread Bandan Das
Wanpeng Li  writes:

> On Thu, Jul 03, 2014 at 01:15:26AM -0400, Bandan Das wrote:
>>Jan Kiszka  writes:
>>
>>> On 2014-07-02 08:54, Wanpeng Li wrote:
 This patch fix bug https://bugzilla.kernel.org/show_bug.cgi?id=72381 
 
If we didn't inject a still-pending event to L1 since nested_run_pending,
KVM_REQ_EVENT should be requested after the vmexit in order to inject the
event to L1. However, the current logic blindly requests a KVM_REQ_EVENT even if
there is no still-pending event to L1 blocked by nested_run_pending.
There is a race which leads to an interrupt belonging to L1 being injected
into L2 if L0 sends an interrupt to L1 during this window.
 
VCPU0   another thread 
 
 L1 intr not blocked on L2 first entry
 vmx_vcpu_run req event 
 kvm check request req event 
 check_nested_events don't have any intr 
 not nested exit 
 intr occur (8254, lapic timer 
 etc)
 inject_pending_event now have intr 
 inject interrupt 
 
 This patch fix this race by introduced a l1_events_blocked field in 
 nested_vmx 
 which indicates there is still-pending event which blocked by 
 nested_run_pending, 
 and smart request a KVM_REQ_EVENT if there is a still-pending event which 
 blocked 
 by nested_run_pending.
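To make the intended logic concrete, here is a minimal user-space sketch of the approach described above. All names (nested_state, l1_events_blocked, the helper functions) are illustrative stand-ins, not the actual KVM structures or code; the point is only that KVM_REQ_EVENT gets re-requested solely when an L1 event was genuinely deferred by nested_run_pending.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the patch's idea; names are illustrative,
 * not the real KVM fields or functions. */
struct nested_state {
	bool nested_run_pending;
	bool l1_events_blocked;	/* an L1 event had to be deferred */
};

/* Attempt to inject an L1 event on vmentry; returns true if injected. */
static bool try_inject_l1_event(struct nested_state *n, bool have_event)
{
	if (have_event && n->nested_run_pending) {
		n->l1_events_blocked = true;	/* remember the deferred event */
		return false;			/* injection postponed */
	}
	return have_event;
}

/* After the nested vmexit: request KVM_REQ_EVENT only if needed. */
static bool need_req_event(struct nested_state *n)
{
	bool req = n->l1_events_blocked;

	n->l1_events_blocked = false;
	return req;
}
```

With the previous blind request, the equivalent of need_req_event() returned true even when nothing was deferred, opening the window in which a fresh L0 interrupt gets injected into L2.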
>>>
>>> There are more, unrelated reasons why KVM_REQ_EVENT could be set. Why
>>> aren't those able to trigger this scenario?
>>>
>>> In any case, unconditionally setting KVM_REQ_EVENT seems strange and
>>> should be changed.
>>
>>
>>Ugh! I think I am hitting another one but this one's probably because 
>>we are not setting KVM_REQ_EVENT for something we should.
>>
>>Before this patch, I was able to hit this bug every time with 
>>"modprobe kvm_intel ept=0 nested=1 enable_shadow_vmcs=0" and then booting 
>>L2. I can verify that I was indeed hitting the race in inject_pending_event.
>>
>>After this patch, I believe I am hitting another bug - this happens 
>>after I boot L2, as above, and then start a Linux kernel compilation
>>and then wait and watch :) It's a pain to debug because this happens
>
> I have already tried several times with "modprobe kvm_intel ept=0 nested=1
> enable_shadow_vmcs=0" and still can't reproduce the bug you mentioned.
> Could you give me more details, such as which of L1 and L2 hangs or panics?
> In addition, posting the call trace would be appreciated.

# modprobe kvm_intel ept=0 nested=1 enable_shadow_vmcs=0

The Host CPU - Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz
qemu cmd to run L1 - 
# qemu-system-x86_64 -drive 
file=level1.img,if=virtio,id=disk0,format=raw,cache=none,werror=stop,rerror=stop,aio=threads
 -drive 
file=level2.img,if=virtio,id=disk1,format=raw,cache=none,werror=stop,rerror=stop,aio=threads
 -vnc :2 --enable-kvm -monitor stdio -m 4G -net nic,macaddr=00:23:32:45:89:10 
-net tap,ifname=tap0,script=/etc/qemu-ifup,downscript=no -smp 4 -cpu 
Nehalem,+vmx -serial pty

qemu cmd to run L2 -
# sudo qemu-system-x86_64 -hda VM/level2.img -vnc :0 --enable-kvm -monitor 
stdio -m 2G -smp 2 -cpu Nehalem -redir tcp:::22

Additionally,
L0 is FC19 with 3.16-rc3
L1 and L2 are Ubuntu 14.04 with 3.13.0-24-generic

Then start a kernel compilation inside L2 with "make -j3"

There's no call trace on L0; both L0 and L1 are hung (or rather really slow)
and the L1 serial console spews CPU soft lockup errors. Enabling panic on
softlockup on L1 will give a trace with smp_call_function_many(). I think the
corresponding code in kernel/smp.c that triggers this is

WARN_ON_ONCE(cpu_online(this_cpu) && irqs_disabled()
  && !oops_in_progress && !early_boot_irqs_disabled);

I know in most cases this is usually harmless, but in this specific case,
it seems to be stuck here forever.

Sorry, I don't have a L1 call trace handy atm, I can post that if you are 
interested.

Note that this can take as much as 30 to 40 minutes to appear, but once it
does, you will know, because both L1 and L2 will be stuck with the serial
messages I mentioned before. From my side, let me try this on another system
to rule out any machine-specific weirdness.

Please let me know if you need any further information.

Thanks
Bandan

> Regards,
> Wanpeng Li 
>
>>almost once in three times; it never happens if I run with ept=1; however,
>>I think that's only because the test completes sooner. But I can confirm
>>that I don't see it if I always set KVM_REQ_EVENT when nested_run_pending
>>is set, instead of the approach this patch takes.
>>(Any debug hints appreciated!)
>>
>>So, I am not sure if this is the right fix. Rather, I think the safer thing
>>to do is to have the interrupt-pending check for injection into L1 at the
>>"same site" as the call to kvm_queue_interrupt(), just like we had before
>>commit b6b8a1451fc40412c57d1. Is there any advantage to having all the
>>nested event checks together?
>

Re: [PATCH V2 2/3] perf protect LBR when Intel PT is enabled.

2014-07-03 Thread Peter Zijlstra
On Thu, Jul 03, 2014 at 05:52:37PM +0200, Andi Kleen wrote:
> > If there's active LBR users out there, we should refuse to enable PT and
> > vice versa. 
> 
> This doesn't work, e.g. hardware debuggers can take over at any time.

Tough cookies. Hardware debuggers get to deal with whatever crap they
cause.




Re: [PATCH 3/5 v4] KVM: PPC: Book3s: Remove kvmppc_read_inst() function

2014-07-03 Thread Alexander Graf


On 03.07.2014, at 18:18, "mihai.cara...@freescale.com" wrote:

>> -Original Message-
>> From: Alexander Graf [mailto:ag...@suse.de]
>> Sent: Thursday, July 03, 2014 4:37 PM
>> To: Caraman Mihai Claudiu-B02008; kvm-...@vger.kernel.org
>> Cc: kvm@vger.kernel.org; linuxppc-...@lists.ozlabs.org
>> Subject: Re: [PATCH 3/5 v4] KVM: PPC: Book3s: Remove kvmppc_read_inst()
>> function
>> 
>> 
>>> On 28.06.14 00:49, Mihai Caraman wrote:
>>> In the context of replacing kvmppc_ld() function calls with a version of
>>> kvmppc_get_last_inst() which is allowed to fail, Alex Graf suggested this:
>>> 
>>> "If we get EMULATE_AGAIN, we just have to make sure we go back into the
>>> guest. No need to inject an ISI into the guest - it'll do that all by
>>> itself. With an error-returning kvmppc_get_last_inst we can just
>>> completely get rid of kvmppc_read_inst() and only use
>>> kvmppc_get_last_inst() instead."
>>> 
>>> As an intermediate step, get rid of kvmppc_read_inst() and only use
>>> kvmppc_ld() instead.
>>> 
>>> Signed-off-by: Mihai Caraman 
>>> ---
>>> v4:
>>>  - new patch
>>> 
>>>  arch/powerpc/kvm/book3s_pr.c | 85 ++-
>> -
>>>  1 file changed, 35 insertions(+), 50 deletions(-)
>>> 
>>> diff --git a/arch/powerpc/kvm/book3s_pr.c
>> b/arch/powerpc/kvm/book3s_pr.c
>>> index 15fd6c2..d247d88 100644
>>> --- a/arch/powerpc/kvm/book3s_pr.c
>>> +++ b/arch/powerpc/kvm/book3s_pr.c
>>> @@ -665,42 +665,6 @@ static void kvmppc_giveup_fac(struct kvm_vcpu
>> *vcpu, ulong fac)
>>>  #endif
>>>  }
>>> 
>>> -static int kvmppc_read_inst(struct kvm_vcpu *vcpu)
>>> -{
>>> -ulong srr0 = kvmppc_get_pc(vcpu);
>>> -u32 last_inst = kvmppc_get_last_inst(vcpu);
>>> -int ret;
>>> -
>>> -ret = kvmppc_ld(vcpu, &srr0, sizeof(u32), &last_inst, false);
>>> -if (ret == -ENOENT) {
>>> -ulong msr = kvmppc_get_msr(vcpu);
>>> -
>>> -msr = kvmppc_set_field(msr, 33, 33, 1);
>>> -msr = kvmppc_set_field(msr, 34, 36, 0);
>>> -msr = kvmppc_set_field(msr, 42, 47, 0);
>>> -kvmppc_set_msr_fast(vcpu, msr);
>>> -kvmppc_book3s_queue_irqprio(vcpu,
>> BOOK3S_INTERRUPT_INST_STORAGE);
>>> -return EMULATE_AGAIN;
>>> -}
>>> -
>>> -return EMULATE_DONE;
>>> -}
>>> -
>>> -static int kvmppc_check_ext(struct kvm_vcpu *vcpu, unsigned int
>> exit_nr)
>>> -{
>>> -
>>> -/* Need to do paired single emulation? */
>>> -if (!(vcpu->arch.hflags & BOOK3S_HFLAG_PAIRED_SINGLE))
>>> -return EMULATE_DONE;
>>> -
>>> -/* Read out the instruction */
>>> -if (kvmppc_read_inst(vcpu) == EMULATE_DONE)
>>> -/* Need to emulate */
>>> -return EMULATE_FAIL;
>>> -
>>> -return EMULATE_AGAIN;
>>> -}
>>> -
>>>  /* Handle external providers (FPU, Altivec, VSX) */
>>>  static int kvmppc_handle_ext(struct kvm_vcpu *vcpu, unsigned int
>> exit_nr,
>>>   ulong msr)
>>> @@ -1101,31 +1065,51 @@ program_interrupt:
>>>  case BOOK3S_INTERRUPT_VSX:
>>>  {
>>>  int ext_msr = 0;
>>> +int emul;
>>> +ulong pc;
>>> +u32 last_inst;
>>> 
>>> -switch (exit_nr) {
>>> -case BOOK3S_INTERRUPT_FP_UNAVAIL: ext_msr = MSR_FP;  break;
>>> -case BOOK3S_INTERRUPT_ALTIVEC: ext_msr = MSR_VEC; break;
>>> -case BOOK3S_INTERRUPT_VSX:ext_msr = MSR_VSX; break;
>>> -}
>>> +if (!(vcpu->arch.hflags & BOOK3S_HFLAG_PAIRED_SINGLE)) {
>> 
>> Please make paired single emulation the unusual, if()'ed case, not the
>> normal exit path :).
> 
> Huh ... do you have more Book3S-specific requests? It will be a surprise if
> it still works after all these blind changes :)

Heh :).

All I'm saying is that rather than

if (no emulation) {
  foo();
  break;
}

ps_emulation();
break;

We should do

if (ps emulation) {
  ps_emulation();
  break;
}

foo();
break;


Alex



RE: [PATCH 3/5 v4] KVM: PPC: Book3s: Remove kvmppc_read_inst() function

2014-07-03 Thread mihai.cara...@freescale.com
> -Original Message-
> From: Alexander Graf [mailto:ag...@suse.de]
> Sent: Thursday, July 03, 2014 4:37 PM
> To: Caraman Mihai Claudiu-B02008; kvm-...@vger.kernel.org
> Cc: kvm@vger.kernel.org; linuxppc-...@lists.ozlabs.org
> Subject: Re: [PATCH 3/5 v4] KVM: PPC: Book3s: Remove kvmppc_read_inst()
> function
> 
> 
> On 28.06.14 00:49, Mihai Caraman wrote:
> > In the context of replacing kvmppc_ld() function calls with a version of
> > kvmppc_get_last_inst() which is allowed to fail, Alex Graf suggested this:
> >
> > "If we get EMULATE_AGAIN, we just have to make sure we go back into the
> > guest. No need to inject an ISI into the guest - it'll do that all by
> > itself. With an error-returning kvmppc_get_last_inst we can just
> > completely get rid of kvmppc_read_inst() and only use
> > kvmppc_get_last_inst() instead."
> >
> > As an intermediate step, get rid of kvmppc_read_inst() and only use
> > kvmppc_ld() instead.
> >
> > Signed-off-by: Mihai Caraman 
> > ---
> > v4:
> >   - new patch
> >
> >   arch/powerpc/kvm/book3s_pr.c | 85 ++-
> -
> >   1 file changed, 35 insertions(+), 50 deletions(-)
> >
> > diff --git a/arch/powerpc/kvm/book3s_pr.c
> b/arch/powerpc/kvm/book3s_pr.c
> > index 15fd6c2..d247d88 100644
> > --- a/arch/powerpc/kvm/book3s_pr.c
> > +++ b/arch/powerpc/kvm/book3s_pr.c
> > @@ -665,42 +665,6 @@ static void kvmppc_giveup_fac(struct kvm_vcpu
> *vcpu, ulong fac)
> >   #endif
> >   }
> >
> > -static int kvmppc_read_inst(struct kvm_vcpu *vcpu)
> > -{
> > -   ulong srr0 = kvmppc_get_pc(vcpu);
> > -   u32 last_inst = kvmppc_get_last_inst(vcpu);
> > -   int ret;
> > -
> > -   ret = kvmppc_ld(vcpu, &srr0, sizeof(u32), &last_inst, false);
> > -   if (ret == -ENOENT) {
> > -   ulong msr = kvmppc_get_msr(vcpu);
> > -
> > -   msr = kvmppc_set_field(msr, 33, 33, 1);
> > -   msr = kvmppc_set_field(msr, 34, 36, 0);
> > -   msr = kvmppc_set_field(msr, 42, 47, 0);
> > -   kvmppc_set_msr_fast(vcpu, msr);
> > -   kvmppc_book3s_queue_irqprio(vcpu,
> BOOK3S_INTERRUPT_INST_STORAGE);
> > -   return EMULATE_AGAIN;
> > -   }
> > -
> > -   return EMULATE_DONE;
> > -}
> > -
> > -static int kvmppc_check_ext(struct kvm_vcpu *vcpu, unsigned int
> exit_nr)
> > -{
> > -
> > -   /* Need to do paired single emulation? */
> > -   if (!(vcpu->arch.hflags & BOOK3S_HFLAG_PAIRED_SINGLE))
> > -   return EMULATE_DONE;
> > -
> > -   /* Read out the instruction */
> > -   if (kvmppc_read_inst(vcpu) == EMULATE_DONE)
> > -   /* Need to emulate */
> > -   return EMULATE_FAIL;
> > -
> > -   return EMULATE_AGAIN;
> > -}
> > -
> >   /* Handle external providers (FPU, Altivec, VSX) */
> >   static int kvmppc_handle_ext(struct kvm_vcpu *vcpu, unsigned int
> exit_nr,
> >  ulong msr)
> > @@ -1101,31 +1065,51 @@ program_interrupt:
> > case BOOK3S_INTERRUPT_VSX:
> > {
> > int ext_msr = 0;
> > +   int emul;
> > +   ulong pc;
> > +   u32 last_inst;
> >
> > -   switch (exit_nr) {
> > -   case BOOK3S_INTERRUPT_FP_UNAVAIL: ext_msr = MSR_FP;  break;
> > -   case BOOK3S_INTERRUPT_ALTIVEC:ext_msr = MSR_VEC; break;
> > -   case BOOK3S_INTERRUPT_VSX:ext_msr = MSR_VSX; break;
> > -   }
> > +   if (!(vcpu->arch.hflags & BOOK3S_HFLAG_PAIRED_SINGLE)) {
> 
> Please make paired single emulation the unusual, if()'ed case, not the
> normal exit path :).

Huh ... do you have more Book3S-specific requests? It will be a surprise if
it still works after all these blind changes :)

-Mike


RE: [PATCH 5/6 v2] KVM: PPC: Book3E: Add ONE_REG AltiVec support

2014-07-03 Thread mihai.cara...@freescale.com
> -Original Message-
> From: Alexander Graf [mailto:ag...@suse.de]
> Sent: Thursday, July 03, 2014 3:34 PM
> To: Caraman Mihai Claudiu-B02008; kvm-...@vger.kernel.org
> Cc: kvm@vger.kernel.org; linuxppc-...@lists.ozlabs.org
> Subject: Re: [PATCH 5/6 v2] KVM: PPC: Book3E: Add ONE_REG AltiVec support
> 
> 
> On 30.06.14 17:34, Mihai Caraman wrote:
> > Add ONE_REG support for AltiVec on Book3E.
> >
> > Signed-off-by: Mihai Caraman 
> 
> Any chance we can handle these in generic code?

I expected this request :) Can we leave this for a second phase, so that
e6500 gets enabled first?

Can you share a Book3S setup with us so I can validate the requested
changes? I already feel anxious touching unfamiliar hardware-specific
Book3S code without being able to run it.

-Mike


RE: [PATCH 4/6 v2] KVM: PPC: Book3E: Add AltiVec support

2014-07-03 Thread mihai.cara...@freescale.com
> -Original Message-
> From: Alexander Graf [mailto:ag...@suse.de]
> Sent: Thursday, July 03, 2014 3:32 PM
> To: Caraman Mihai Claudiu-B02008; kvm-...@vger.kernel.org
> Cc: kvm@vger.kernel.org; linuxppc-...@lists.ozlabs.org
> Subject: Re: [PATCH 4/6 v2] KVM: PPC: Book3E: Add AltiVec support
> 
> 
> On 30.06.14 17:34, Mihai Caraman wrote:
> > Add KVM Book3E AltiVec support. KVM Book3E FPU support gracefully reuses
> > host infrastructure, so follow the same approach for AltiVec.
> >
> > Signed-off-by: Mihai Caraman 
> 
> Same comment here - I fail to see how we refetch Altivec state after a
> context switch.

See previous comment. I also ran my usual AltiVec stress test, consisting of
a guest and a host process running affine to the same physical core and
competing for the unit's resources with different data sets.

-Mike


RE: [PATCH 1/6 v2] KVM: PPC: Book3E: Use common defines for SPE/FP/AltiVec int numbers

2014-07-03 Thread mihai.cara...@freescale.com
> -Original Message-
> From: Alexander Graf [mailto:ag...@suse.de]
> Sent: Thursday, July 03, 2014 6:31 PM
> To: Caraman Mihai Claudiu-B02008; Wood Scott-B07421; kvm-
> p...@vger.kernel.org
> Cc: kvm@vger.kernel.org; linuxppc-...@lists.ozlabs.org
> Subject: Re: [PATCH 1/6 v2] KVM: PPC: Book3E: Use common defines for
> SPE/FP/AltiVec int numbers
> 
> 
> On 03.07.14 17:25, mihai.cara...@freescale.com wrote:
> >> -Original Message-
> >> From: Alexander Graf [mailto:ag...@suse.de]
> >> Sent: Thursday, July 03, 2014 3:21 PM
> >> To: Caraman Mihai Claudiu-B02008; kvm-...@vger.kernel.org
> >> Cc: kvm@vger.kernel.org; linuxppc-...@lists.ozlabs.org
> >> Subject: Re: [PATCH 1/6 v2] KVM: PPC: Book3E: Use common defines for
> >> SPE/FP/AltiVec int numbers
> >>
> >>
> >> On 30.06.14 17:34, Mihai Caraman wrote:
> >>> Use common BOOKE_IRQPRIO and BOOKE_INTERRUPT defines for
> SPE/FP/AltiVec
> >>> which share the same interrupt numbers.
> >>>
> >>> Signed-off-by: Mihai Caraman 
> >>> ---
> >>> v2:
> >>>- remove outdated definitions
> >>>
> >>>arch/powerpc/include/asm/kvm_asm.h|  8 
> >>>arch/powerpc/kvm/booke.c  | 17 +
> >>>arch/powerpc/kvm/booke.h  |  4 ++--
> >>>arch/powerpc/kvm/booke_interrupts.S   |  9 +
> >>>arch/powerpc/kvm/bookehv_interrupts.S |  4 ++--
> >>>arch/powerpc/kvm/e500.c   | 10 ++
> >>>arch/powerpc/kvm/e500_emulate.c   | 10 ++
> >>>7 files changed, 30 insertions(+), 32 deletions(-)
> >>>
> >>> diff --git a/arch/powerpc/include/asm/kvm_asm.h
> >> b/arch/powerpc/include/asm/kvm_asm.h
> >>> index 9601741..c94fd33 100644
> >>> --- a/arch/powerpc/include/asm/kvm_asm.h
> >>> +++ b/arch/powerpc/include/asm/kvm_asm.h
> >>> @@ -56,14 +56,6 @@
> >>>/* E500 */
> >>>#define BOOKE_INTERRUPT_SPE_ALTIVEC_UNAVAIL 32
> >>>#define BOOKE_INTERRUPT_SPE_FP_DATA_ALTIVEC_ASSIST 33
> >>> -/*
> >>> - * TODO: Unify 32-bit and 64-bit kernel exception handlers to use
> same
> >> defines
> >>> - */
> >>> -#define BOOKE_INTERRUPT_SPE_UNAVAIL
> >> BOOKE_INTERRUPT_SPE_ALTIVEC_UNAVAIL
> >>> -#define BOOKE_INTERRUPT_SPE_FP_DATA
> >> BOOKE_INTERRUPT_SPE_FP_DATA_ALTIVEC_ASSIST
> >>> -#define BOOKE_INTERRUPT_ALTIVEC_UNAVAIL
> >> BOOKE_INTERRUPT_SPE_ALTIVEC_UNAVAIL
> >>> -#define BOOKE_INTERRUPT_ALTIVEC_ASSIST \
> >>> - BOOKE_INTERRUPT_SPE_FP_DATA_ALTIVEC_ASSIST
> >> I think I'd prefer to keep them separate.
> > What is the reason for changing your mind since v1? Do you want to have
> 
> Uh, mind to point me to an email where I said I like the approach? :)

You tacitly approved it in this thread ... I did not say you liked it :)

https://lists.ozlabs.org/pipermail/linuxppc-dev/2013-July/108501.html

-Mike


Re: [PATCH V2 2/3] perf protect LBR when Intel PT is enabled.

2014-07-03 Thread Andi Kleen
> If there's active LBR users out there, we should refuse to enable PT and
> vice versa. 

This doesn't work, e.g. hardware debuggers can take over at any time.

-Andi
-- 
a...@linux.intel.com -- Speaking for myself only.


RE: [PATCH 3/6 v2] KVM: PPC: Book3E: Increase FPU laziness

2014-07-03 Thread mihai.cara...@freescale.com
> -Original Message-
> From: Alexander Graf [mailto:ag...@suse.de]
> Sent: Thursday, July 03, 2014 3:29 PM
> To: Caraman Mihai Claudiu-B02008; kvm-...@vger.kernel.org
> Cc: kvm@vger.kernel.org; linuxppc-...@lists.ozlabs.org
> Subject: Re: [PATCH 3/6 v2] KVM: PPC: Book3E: Increase FPU laziness
> 
> 
> On 30.06.14 17:34, Mihai Caraman wrote:
> > Increase FPU laziness by calling kvmppc_load_guest_fp() just before
> > returning to the guest instead of on each sched-in. Without this improvement
> > an interrupt may also claim floating point, corrupting guest state.
>
> How do you handle context switching with this patch applied? During most
> of the guest's lifetime we never exit kvmppc_vcpu_run(), so when the
> guest gets switched out all FPU state gets lost?

No, we had this discussion in v1. FP/VMX/VSX is implemented lazily in the
kernel, i.e. the unit state is not saved/restored until another thread that
once claimed the unit is scheduled in.

Since FP/VMX/VSX can be activated by the guest independently of the host, the
vcpu thread is always using the unit (even if it never claimed it itself).

Now, this patch optimizes the sched-in flow. Instead of checking on each vcpu
sched-in whether the kernel unloaded the unit's guest state for another
competing host process, we do this check once, when we enter the guest.
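The flow described above can be captured in a small user-space model (the names mirror the discussion but are illustrative, not the real KVM PPC code): sched-in stops reloading the guest FP state, and the reload happens lazily, once, on guest entry.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the FPU laziness discussed above; illustrative names only. */
struct vcpu_model {
	bool guest_fp_loaded;
	int fp_loads;		/* counts expensive FP state reloads */
};

/* Analogue of kvmppc_load_guest_fp(): reload only if actually unloaded. */
static void load_guest_fp(struct vcpu_model *v)
{
	if (!v->guest_fp_loaded) {
		v->guest_fp_loaded = true;
		v->fp_loads++;
	}
}

/* Another host FP user was scheduled; the guest unit state got unloaded. */
static void host_claims_fp(struct vcpu_model *v)
{
	v->guest_fp_loaded = false;
}

/* With the patch, sched-in does no FP work at all ... */
static void vcpu_sched_in(struct vcpu_model *v)
{
	(void)v;
}

/* ... and the check moves to the guest-entry path. */
static void vcpu_enter_guest(struct vcpu_model *v)
{
	load_guest_fp(v);
}
```

In this model, repeated sched-ins cost nothing, and a reload only happens when a competing host FP user actually took the unit away.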

-Mike



Re: [PATCH 00/10] RFC: userfault

2014-07-03 Thread Dave Hansen
On 07/02/2014 09:50 AM, Andrea Arcangeli wrote:
> The MADV_USERFAULT feature should be generic enough that it can
> provide the userfaults to the Android volatile range feature too, on
> access of reclaimed volatile pages.

Maybe.

I certainly can't keep track of all the versions of the variations of
the volatile ranges patches.  But, I don't think it's a given that this
can be reused.  First of all, volatile ranges is trying to replace
ashmem and is going to require _some_ form of sharing.  This mechanism,
being tightly coupled to anonymous memory at the moment, is not a close
fit for that.

It's also important to call out that this is a VMA-based mechanism.  I
certainly can't predict what we'll merge for volatile ranges, but not
all of them are VMA-based.  We'd also need a mechanism on top of this to
differentiate plain not-present pages from not-present-because-purged pages.

That said, I _think_ this might fit well in to what the Mozilla guys
wanted out of volatile ranges.  I'm not confident about it, though.


Re: [PATCH 1/6 v2] KVM: PPC: Book3E: Use common defines for SPE/FP/AltiVec int numbers

2014-07-03 Thread Alexander Graf


On 03.07.14 17:25, mihai.cara...@freescale.com wrote:

-Original Message-
From: Alexander Graf [mailto:ag...@suse.de]
Sent: Thursday, July 03, 2014 3:21 PM
To: Caraman Mihai Claudiu-B02008; kvm-...@vger.kernel.org
Cc: kvm@vger.kernel.org; linuxppc-...@lists.ozlabs.org
Subject: Re: [PATCH 1/6 v2] KVM: PPC: Book3E: Use common defines for
SPE/FP/AltiVec int numbers


On 30.06.14 17:34, Mihai Caraman wrote:

Use common BOOKE_IRQPRIO and BOOKE_INTERRUPT defines for SPE/FP/AltiVec
which share the same interrupt numbers.

Signed-off-by: Mihai Caraman 
---
v2:
   - remove outdated definitions

   arch/powerpc/include/asm/kvm_asm.h|  8 
   arch/powerpc/kvm/booke.c  | 17 +
   arch/powerpc/kvm/booke.h  |  4 ++--
   arch/powerpc/kvm/booke_interrupts.S   |  9 +
   arch/powerpc/kvm/bookehv_interrupts.S |  4 ++--
   arch/powerpc/kvm/e500.c   | 10 ++
   arch/powerpc/kvm/e500_emulate.c   | 10 ++
   7 files changed, 30 insertions(+), 32 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_asm.h b/arch/powerpc/include/asm/kvm_asm.h
index 9601741..c94fd33 100644
--- a/arch/powerpc/include/asm/kvm_asm.h
+++ b/arch/powerpc/include/asm/kvm_asm.h
@@ -56,14 +56,6 @@
 /* E500 */
 #define BOOKE_INTERRUPT_SPE_ALTIVEC_UNAVAIL 32
 #define BOOKE_INTERRUPT_SPE_FP_DATA_ALTIVEC_ASSIST 33
-/*
- * TODO: Unify 32-bit and 64-bit kernel exception handlers to use same defines
- */
-#define BOOKE_INTERRUPT_SPE_UNAVAIL BOOKE_INTERRUPT_SPE_ALTIVEC_UNAVAIL
-#define BOOKE_INTERRUPT_SPE_FP_DATA BOOKE_INTERRUPT_SPE_FP_DATA_ALTIVEC_ASSIST
-#define BOOKE_INTERRUPT_ALTIVEC_UNAVAIL BOOKE_INTERRUPT_SPE_ALTIVEC_UNAVAIL
-#define BOOKE_INTERRUPT_ALTIVEC_ASSIST \
-	BOOKE_INTERRUPT_SPE_FP_DATA_ALTIVEC_ASSIST

I think I'd prefer to keep them separate.

What is the reason for changing your mind since v1? Do you want to have


Uh, mind to point me to an email where I said I like the approach? :)


different defines with same values (we specifically mapped them to the
hardware interrupt numbers). We already upstreamed the necessary changes


Yes, I think that'd end up the most readable flow of things.


in the kernel. Scott, please share your opinion here.


I'm not going to be religious about it, but names like
"BOOKE_IRQPRIO_SPE_FP_DATA_ALTIVEC_ASSIST" are

  1) too long
  2) too ambiguous

It just means the code gets harder to read. Any way we can take to 
simplify the code flow is a win IMHO. And if I don't even remotely have 
to consider SPE when reading an Altivec path, I think that's a good 
thing :).



Alex



RE: [PATCH 1/6 v2] KVM: PPC: Book3E: Use common defines for SPE/FP/AltiVec int numbers

2014-07-03 Thread mihai.cara...@freescale.com
> -Original Message-
> From: Alexander Graf [mailto:ag...@suse.de]
> Sent: Thursday, July 03, 2014 3:21 PM
> To: Caraman Mihai Claudiu-B02008; kvm-...@vger.kernel.org
> Cc: kvm@vger.kernel.org; linuxppc-...@lists.ozlabs.org
> Subject: Re: [PATCH 1/6 v2] KVM: PPC: Book3E: Use common defines for
> SPE/FP/AltiVec int numbers
> 
> 
> On 30.06.14 17:34, Mihai Caraman wrote:
> > Use common BOOKE_IRQPRIO and BOOKE_INTERRUPT defines for SPE/FP/AltiVec
> > which share the same interrupt numbers.
> >
> > Signed-off-by: Mihai Caraman 
> > ---
> > v2:
> >   - remove outdated definitions
> >
> >   arch/powerpc/include/asm/kvm_asm.h|  8 
> >   arch/powerpc/kvm/booke.c  | 17 +
> >   arch/powerpc/kvm/booke.h  |  4 ++--
> >   arch/powerpc/kvm/booke_interrupts.S   |  9 +
> >   arch/powerpc/kvm/bookehv_interrupts.S |  4 ++--
> >   arch/powerpc/kvm/e500.c   | 10 ++
> >   arch/powerpc/kvm/e500_emulate.c   | 10 ++
> >   7 files changed, 30 insertions(+), 32 deletions(-)
> >
> > diff --git a/arch/powerpc/include/asm/kvm_asm.h
> b/arch/powerpc/include/asm/kvm_asm.h
> > index 9601741..c94fd33 100644
> > --- a/arch/powerpc/include/asm/kvm_asm.h
> > +++ b/arch/powerpc/include/asm/kvm_asm.h
> > @@ -56,14 +56,6 @@
> >   /* E500 */
> >   #define BOOKE_INTERRUPT_SPE_ALTIVEC_UNAVAIL 32
> >   #define BOOKE_INTERRUPT_SPE_FP_DATA_ALTIVEC_ASSIST 33
> > -/*
> > - * TODO: Unify 32-bit and 64-bit kernel exception handlers to use same
> defines
> > - */
> > -#define BOOKE_INTERRUPT_SPE_UNAVAIL
> BOOKE_INTERRUPT_SPE_ALTIVEC_UNAVAIL
> > -#define BOOKE_INTERRUPT_SPE_FP_DATA
> BOOKE_INTERRUPT_SPE_FP_DATA_ALTIVEC_ASSIST
> > -#define BOOKE_INTERRUPT_ALTIVEC_UNAVAIL
> BOOKE_INTERRUPT_SPE_ALTIVEC_UNAVAIL
> > -#define BOOKE_INTERRUPT_ALTIVEC_ASSIST \
> > -   BOOKE_INTERRUPT_SPE_FP_DATA_ALTIVEC_ASSIST
> 
> I think I'd prefer to keep them separate.

What is the reason for changing your mind since v1? Do you want to have
different defines with the same values? (We specifically mapped them to the
hardware interrupt numbers.) We have already upstreamed the necessary changes
in the kernel. Scott, please share your opinion here.

> 
> >   #define BOOKE_INTERRUPT_SPE_FP_ROUND 34
> >   #define BOOKE_INTERRUPT_PERFORMANCE_MONITOR 35
> >   #define BOOKE_INTERRUPT_DOORBELL 36
> > diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
> > index ab62109..3c86d9b 100644
> > --- a/arch/powerpc/kvm/booke.c
> > +++ b/arch/powerpc/kvm/booke.c
> > @@ -388,8 +388,8 @@ static int kvmppc_booke_irqprio_deliver(struct
> kvm_vcpu *vcpu,
> > case BOOKE_IRQPRIO_ITLB_MISS:
> > case BOOKE_IRQPRIO_SYSCALL:
> > case BOOKE_IRQPRIO_FP_UNAVAIL:
> > -   case BOOKE_IRQPRIO_SPE_UNAVAIL:
> > -   case BOOKE_IRQPRIO_SPE_FP_DATA:
> > +   case BOOKE_IRQPRIO_SPE_ALTIVEC_UNAVAIL:
> > +   case BOOKE_IRQPRIO_SPE_FP_DATA_ALTIVEC_ASSIST:
> 
> #ifdef CONFIG_KVM_E500V2
>case ...SPE:
> #else
>case ..ALTIVEC:
> #endif

-Mike


Re: [RESEND PATCH v7 3/4] arm: dirty log write protect management support

2014-07-03 Thread Christoffer Dall
On Tue, Jun 17, 2014 at 06:41:52PM -0700, Mario Smarduch wrote:
> On 06/11/2014 12:03 AM, Christoffer Dall wrote:
> 
> >>
> >> There is also the issue of kvm_flush_remote_tlbs(), that's also weak,
> >> the generic one is using IPIs. Since it's only used in mmu.c maybe make 
> >> this one static.
> >>
> > So I don't see a lot of use of weak symbols in kvm_main.c (actually on
> > kvmarm/next I don't see any), but we do want to share code when more
> > than one architecture implements something in the exact same way, like
> > it seems x86 and ARM is doing here for this particular function.
> > 
> > I think the KVM scheme is usually to check for some define, like:
> > 
> > #ifdef KVM_ARCH_HAVE_GET_DIRTY_LOG
> > ret = kvm_arch_get_dirty_log(...);
> > #else
> > ret = kvm_get_dirty_log(...);
> > #endif
> > 
> > but Paolo may have a more informed oppinion of how to deal with these.
> > 
> > Thanks,
> > -Christoffer
> >
> 
>  
> One approach I'm trying, looking at the code in kvm_main(). This
> approach applies more to selecting features as opposed to
> selecting generic vs architecture-specific functions.
> 
> 1.-
>  - add to 'virt/kvm/Kconfig'
> config HAVE_KVM_ARCH_TLB_FLUSH_ALL
>bool
> 
> config HAVE_KVM_ARCH_DIRTY_LOG
>bool
> 2.--
> For ARM and later ARM64 add to 'arch/arm[64]/kvm/Kconfig'
> config KVM
> bool "Kernel-based Virtual Machine (KVM) support"
> ...
> select HAVE_KVM_ARCH_TLB_FLUSH_ALL
> ..
> 
> Not for HAVE_KVM_ARCH_DIRTY_LOG given it's shared with x86,
> but would need to do it for every other architecture that
> does not share it (except initially for arm64 since it
> will use the variant that returns -EINVAL until feature
> is supported)
> 
> 3--
> In kvm_main.c would have something like
> 
> void kvm_flush_remote_tlbs(struct kvm *kvm)
> {
> #ifdef CONFIG_HAVE_KVM_ARCH_TLB_FLUSH_ALL
> kvm_arch_flush_remote_tlbs(kvm);
> #else
> long dirty_count = kvm->tlbs_dirty;
> 
> smp_mb();
> if (make_all_cpus_request(kvm, KVM_REQ_TLB_FLUSH))
> ++kvm->stat.remote_tlb_flush;
> cmpxchg(&kvm->tlbs_dirty, dirty_count, 0);
> #endif
> }
> 
> Then add void kvm_flush_remote_tlbs(struct kvm *kvm) definition
> to arm kvm_host.h. Define the function in this case mmu.c
> 
> For the dirty log function
> int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
> struct kvm_dirty_log *log)
> {
> #ifdef CONFIG_HAVE_KVM_ARCH_DIRTY_LOG
> kvm_arch_vm_ioctl_get_dirty_log(kvm, log);
> #else
> int r;
> struct kvm_memory_slot *memslot;
> unsigned long n, i;
> unsigned long *dirty_bitmap;
> unsigned long *dirty_bitmap_buffer;
> bool is_dirty = false;
>   ...
> 
> But then you have to go into every architecture and define the
> kvm_arch_vm_...() variant.
> 
> Is this the right way to go? Or is there a simpler way?
> 
Hmmm, I'm really not an expert in the 'established procedures' for what
to put in config files etc., but here's my basic take:

a) you wouldn't put a config option in Kconfig unless it's something
that's actually configurable, or some generic feature/subsystem that
should only be enabled if the hardware has certain capabilities or other
config options enabled.

b) this seems entirely an implementation issue and not depending on
anything users should select.

c) therefore, I think it's either a question of always having an
arch-specific implementation whose return value you probe, or of having
some sort of define in the arch/X/include/asm/kvm_host.h header to
control what you need.
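Option (c) can be sketched as follows. This is a hypothetical user-space model of the header-define pattern; the macro and function names are made up for illustration and are not real KVM symbols. The arch header defines the macro and provides the override, and generic code falls back to its own path otherwise.

```c
#include <assert.h>

/* In this sketch, arch/X/include/asm/kvm_host.h would define the macro
 * and provide the arch implementation; both names are hypothetical. */
#define KVM_ARCH_HAVE_FLUSH_REMOTE_TLBS

static int flushes;

#ifdef KVM_ARCH_HAVE_FLUSH_REMOTE_TLBS
static void kvm_arch_flush_remote_tlbs(void)
{
	flushes += 10;	/* arch-specific path (e.g. a full TLB flush) */
}
#endif

/* Generic code: compiles to the arch path when the header asks for it. */
static void kvm_flush_remote_tlbs(void)
{
#ifdef KVM_ARCH_HAVE_FLUSH_REMOTE_TLBS
	kvm_arch_flush_remote_tlbs();
#else
	flushes += 1;	/* generic IPI-based path */
#endif
}
```

This keeps the decision entirely at build time, with nothing users can toggle, which matches point (a) above.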

-Christoffer


[RFC PATCH 4/4] KVM: PPC: e500mc: Advertise E.PT to support HTW guests

2014-07-03 Thread Mihai Caraman
Enable E.PT for vcpus with MMU MAV 2.0 to support Hardware Page Tablewalk (HTW)
in guests.

Signed-off-by: Mihai Caraman 
---
 arch/powerpc/kvm/e500_mmu.c | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/arch/powerpc/kvm/e500_mmu.c b/arch/powerpc/kvm/e500_mmu.c
index b775e6a..1de0cd6 100644
--- a/arch/powerpc/kvm/e500_mmu.c
+++ b/arch/powerpc/kvm/e500_mmu.c
@@ -945,11 +945,7 @@ static int vcpu_mmu_init(struct kvm_vcpu *vcpu,
vcpu->arch.tlbps[1] = mfspr(SPRN_TLB1PS);
 
vcpu->arch.mmucfg &= ~MMUCFG_LRAT;
-
-   /* Guest mmu emulation currently doesn't handle E.PT */
-   vcpu->arch.eptcfg = 0;
-   vcpu->arch.tlbcfg[0] &= ~TLBnCFG_PT;
-   vcpu->arch.tlbcfg[1] &= ~TLBnCFG_IND;
+   vcpu->arch.eptcfg = mfspr(SPRN_EPTCFG);
}
 
return 0;
-- 
1.7.11.7



[RFC PATCH 2/4] KVM: PPC: Book3E: Handle LRAT error exception

2014-07-03 Thread Mihai Caraman
Handle LRAT error exception with support for lrat mapping and invalidation.

Signed-off-by: Mihai Caraman 
---
 arch/powerpc/include/asm/kvm_host.h   |   1 +
 arch/powerpc/include/asm/kvm_ppc.h|   2 +
 arch/powerpc/include/asm/mmu-book3e.h |   3 +
 arch/powerpc/include/asm/reg_booke.h  |  13 
 arch/powerpc/kernel/asm-offsets.c |   1 +
 arch/powerpc/kvm/booke.c  |  40 +++
 arch/powerpc/kvm/bookehv_interrupts.S |   9 ++-
 arch/powerpc/kvm/e500_mmu_host.c  | 125 ++
 arch/powerpc/kvm/e500mc.c |   2 +
 9 files changed, 195 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/kvm_host.h 
b/arch/powerpc/include/asm/kvm_host.h
index bb66d8b..7b6b2ec 100644
--- a/arch/powerpc/include/asm/kvm_host.h
+++ b/arch/powerpc/include/asm/kvm_host.h
@@ -433,6 +433,7 @@ struct kvm_vcpu_arch {
u32 eplc;
u32 epsc;
u32 oldpir;
+   u64 fault_lper;
 #endif
 
 #if defined(CONFIG_BOOKE)
diff --git a/arch/powerpc/include/asm/kvm_ppc.h 
b/arch/powerpc/include/asm/kvm_ppc.h
index 9c89cdd..2730a29 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -86,6 +86,8 @@ extern gpa_t kvmppc_mmu_xlate(struct kvm_vcpu *vcpu, unsigned 
int gtlb_index,
   gva_t eaddr);
 extern void kvmppc_mmu_dtlb_miss(struct kvm_vcpu *vcpu);
 extern void kvmppc_mmu_itlb_miss(struct kvm_vcpu *vcpu);
+extern void kvmppc_lrat_map(struct kvm_vcpu *vcpu, gfn_t gfn);
+extern void kvmppc_lrat_invalidate(struct kvm_vcpu *vcpu);
 
 extern struct kvm_vcpu *kvmppc_core_vcpu_create(struct kvm *kvm,
 unsigned int id);
diff --git a/arch/powerpc/include/asm/mmu-book3e.h 
b/arch/powerpc/include/asm/mmu-book3e.h
index 088fd9f..ac6acf7 100644
--- a/arch/powerpc/include/asm/mmu-book3e.h
+++ b/arch/powerpc/include/asm/mmu-book3e.h
@@ -40,6 +40,8 @@
 
 /* MAS registers bit definitions */
 
+#define MAS0_ATSEL	0x80000000
+#define MAS0_ATSEL_SHIFT	31
 #define MAS0_TLBSEL_MASK	0x30000000
 #define MAS0_TLBSEL_SHIFT	28
 #define MAS0_TLBSEL(x)	(((x) << MAS0_TLBSEL_SHIFT) & MAS0_TLBSEL_MASK)
@@ -53,6 +55,7 @@
 #define MAS0_WQ_CLR_RSRV   0x2000
 
 #define MAS1_VALID	0x80000000
+#define MAS1_VALID_SHIFT	31
 #define MAS1_IPROT	0x40000000
 #define MAS1_TID(x)	(((x) << 16) & 0x3FFF0000)
 #define MAS1_IND	0x00002000
diff --git a/arch/powerpc/include/asm/reg_booke.h 
b/arch/powerpc/include/asm/reg_booke.h
index 75bda23..783d617 100644
--- a/arch/powerpc/include/asm/reg_booke.h
+++ b/arch/powerpc/include/asm/reg_booke.h
@@ -43,6 +43,8 @@
 
 /* Special Purpose Registers (SPRNs)*/
 #define SPRN_DECAR 0x036   /* Decrementer Auto Reload Register */
+#define SPRN_LPER  0x038   /* Logical Page Exception Register */
+#define SPRN_LPERU 0x039   /* Logical Page Exception Register Upper */
 #define SPRN_IVPR  0x03F   /* Interrupt Vector Prefix Register */
 #define SPRN_USPRG00x100   /* User Special Purpose Register General 0 */
 #define SPRN_SPRG3R0x103   /* Special Purpose Register General 3 Read */
@@ -358,6 +360,9 @@
 #define ESR_ILK		0x00100000	/* Instr. Cache Locking */
 #define ESR_PUO		0x00040000	/* Unimplemented Operation exception */
 #define ESR_BO		0x00020000	/* Byte Ordering */
+#define ESR_DATA	0x00000400	/* Page Table Data Access */
+#define ESR_TLBI	0x00000200	/* Page Table TLB Ineligible */
+#define ESR_PT		0x00000100	/* Page Table Translation */
 #define ESR_SPV		0x00000080	/* Signal Processing operation */
 
 /* Bit definitions related to the DBCR0. */
@@ -649,6 +654,14 @@
 #define EPC_EPID   0x3fff
 #define EPC_EPID_SHIFT 0
 
+/* Bit definitions for LPER */
+#define LPER_ALPN  0x000FF000ULL
+#define LPER_ALPN_SHIFT12
+#define LPER_WIMGE 0x0F80
+#define LPER_WIMGE_SHIFT   7
+#define LPER_LPS   0x000F
+#define LPER_LPS_SHIFT 0
+
 /*
  * The IBM-403 is an even more odd special case, as it is much
  * older than the IBM-405 series.  We put these down here incase someone
diff --git a/arch/powerpc/kernel/asm-offsets.c 
b/arch/powerpc/kernel/asm-offsets.c
index f5995a9..be6e329 100644
--- a/arch/powerpc/kernel/asm-offsets.c
+++ b/arch/powerpc/kernel/asm-offsets.c
@@ -713,6 +713,7 @@ int main(void)
DEFINE(VCPU_HOST_MAS4, offsetof(struct kvm_vcpu, arch.host_mas4));
DEFINE(VCPU_HOST_MAS6, offsetof(struct kvm_vcpu, arch.host_mas6));
DEFINE(VCPU_EPLC, offsetof(struct kvm_vcpu, arch.eplc));
+   DEFINE(VCPU_FAULT_LPER, offsetof(struct kvm_vcpu, arch.fault_lper));
 #endif
 
 #ifdef CONFIG_KVM_EXIT_TIMING
diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index a192975..ab1077f 100644
--- a/arch/powerpc/kvm/b

[RFC PATCH 1/4] powerpc/booke64: Add LRAT next and max entries to tlb_core_data structure

2014-07-03 Thread Mihai Caraman
LRAT (Logical to Real Address Translation) is shared between hw threads.
Add LRAT next and max entries to tlb_core_data structure and initialize them.

Signed-off-by: Mihai Caraman 
---
 arch/powerpc/include/asm/mmu-book3e.h | 7 +++
 arch/powerpc/include/asm/reg_booke.h  | 1 +
 arch/powerpc/mm/fsl_booke_mmu.c   | 8 
 3 files changed, 16 insertions(+)

diff --git a/arch/powerpc/include/asm/mmu-book3e.h 
b/arch/powerpc/include/asm/mmu-book3e.h
index 8d24f78..088fd9f 100644
--- a/arch/powerpc/include/asm/mmu-book3e.h
+++ b/arch/powerpc/include/asm/mmu-book3e.h
@@ -217,6 +217,12 @@
 #define TLBILX_T_CLASS2	6
 #define TLBILX_T_CLASS3	7
 
+/* LRATCFG bits */
+#define LRATCFG_ASSOC	0xFF000000
+#define LRATCFG_LASIZE	0x00FE0000
+#define LRATCFG_LPID	0x00002000
+#define LRATCFG_NENTRY	0x00000FFF
+
 #ifndef __ASSEMBLY__
 #include 
 
@@ -294,6 +300,7 @@ struct tlb_core_data {
 
/* For software way selection, as on Freescale TLB1 */
u8 esel_next, esel_max, esel_first;
+   u8 lrat_next, lrat_max;
 };
 
 #ifdef CONFIG_PPC64
diff --git a/arch/powerpc/include/asm/reg_booke.h 
b/arch/powerpc/include/asm/reg_booke.h
index 464f108..75bda23 100644
--- a/arch/powerpc/include/asm/reg_booke.h
+++ b/arch/powerpc/include/asm/reg_booke.h
@@ -64,6 +64,7 @@
 #define SPRN_DVC2  0x13F   /* Data Value Compare Register 2 */
 #define SPRN_LPID  0x152   /* Logical Partition ID */
 #define SPRN_MAS8  0x155   /* MMU Assist Register 8 */
+#define SPRN_LRATCFG   0x156   /* LRAT Configuration Register */
 #define SPRN_TLB0PS0x158   /* TLB 0 Page Size Register */
 #define SPRN_TLB1PS0x159   /* TLB 1 Page Size Register */
 #define SPRN_MAS5_MAS6 0x15c   /* MMU Assist Register 5 || 6 */
diff --git a/arch/powerpc/mm/fsl_booke_mmu.c b/arch/powerpc/mm/fsl_booke_mmu.c
index 94cd728..6492708 100644
--- a/arch/powerpc/mm/fsl_booke_mmu.c
+++ b/arch/powerpc/mm/fsl_booke_mmu.c
@@ -196,6 +196,14 @@ static unsigned long map_mem_in_cams_addr(phys_addr_t 
phys, unsigned long virt,
get_paca()->tcd.esel_next = i;
get_paca()->tcd.esel_max = mfspr(SPRN_TLB1CFG) & TLBnCFG_N_ENTRY;
get_paca()->tcd.esel_first = i;
+
+   get_paca()->tcd.lrat_next = 0;
+   if (((mfspr(SPRN_MMUCFG) & MMUCFG_MAVN) == MMUCFG_MAVN_V2) &&
+   (mfspr(SPRN_MMUCFG) & MMUCFG_LRAT)) {
+   get_paca()->tcd.lrat_max = mfspr(SPRN_LRATCFG) & LRATCFG_NENTRY;
+   } else {
+   get_paca()->tcd.lrat_max = 0;
+   }
 #endif
 
return amount_mapped;
-- 
1.7.11.7



[RFC PATCH 3/4] KVM: PPC: e500: TLB emulation for IND entries

2014-07-03 Thread Mihai Caraman
Handle indirect entries (IND) in the TLB emulation code. The translation size
of IND entries differs from the size of the referred page tables (Linux guests
now use 2MB IND entries for 4KB PTs), and this requires careful tweaking of
the existing logic.

TLB search emulation requires an additional search in HW TLB0 (since these
entries are added directly by the HTW), and found entries should be presented
to the guest with the RPN changed from PFN to GFN. There might be multiple
GFNs pointing to the same PFN, so the only way to get the corresponding GFN is
to search for it in the guest's PTEs. If the IND entry for the corresponding
PT is not available, just invalidate the guest's ea and report a tlbsx miss.
This patch only implements the invalidation and leaves a TODO note for
searching HW TLB0.

Signed-off-by: Mihai Caraman 
---
 arch/powerpc/include/asm/mmu-book3e.h |  2 +
 arch/powerpc/kvm/e500.h   | 81 ---
 arch/powerpc/kvm/e500_mmu.c   | 78 +++--
 arch/powerpc/kvm/e500_mmu_host.c  | 31 --
 arch/powerpc/kvm/e500mc.c | 53 +--
 5 files changed, 211 insertions(+), 34 deletions(-)

diff --git a/arch/powerpc/include/asm/mmu-book3e.h 
b/arch/powerpc/include/asm/mmu-book3e.h
index ac6acf7..e482ad8 100644
--- a/arch/powerpc/include/asm/mmu-book3e.h
+++ b/arch/powerpc/include/asm/mmu-book3e.h
@@ -59,6 +59,7 @@
 #define MAS1_IPROT	0x40000000
 #define MAS1_TID(x)	(((x) << 16) & 0x3FFF0000)
 #define MAS1_IND   0x2000
+#define MAS1_IND_SHIFT 13
 #define MAS1_TS0x1000
 #define MAS1_TSIZE_MASK0x0f80
 #define MAS1_TSIZE_SHIFT   7
@@ -94,6 +95,7 @@
 #define MAS4_TLBSEL_MASK   MAS0_TLBSEL_MASK
 #define MAS4_TLBSELD(x)MAS0_TLBSEL(x)
 #define MAS4_INDD  0x8000  /* Default IND */
+#define MAS4_INDD_SHIFT15
 #define MAS4_TSIZED(x) MAS1_TSIZE(x)
 #define MAS4_X0D   0x0040
 #define MAS4_X1D   0x0020
diff --git a/arch/powerpc/kvm/e500.h b/arch/powerpc/kvm/e500.h
index a326178..70a556d 100644
--- a/arch/powerpc/kvm/e500.h
+++ b/arch/powerpc/kvm/e500.h
@@ -148,6 +148,22 @@ unsigned int kvmppc_e500_get_sid(struct kvmppc_vcpu_e500 
*vcpu_e500,
 unsigned int pr, int avoid_recursion);
 #endif
 
+static inline bool has_feature(const struct kvm_vcpu *vcpu,
+  enum vcpu_ftr ftr)
+{
+   bool has_ftr;
+
+   switch (ftr) {
+   case VCPU_FTR_MMU_V2:
+   has_ftr = ((vcpu->arch.mmucfg & MMUCFG_MAVN) == MMUCFG_MAVN_V2);
+   break;
+
+   default:
+   return false;
+   }
+   return has_ftr;
+}
+
 /* TLB helper functions */
 static inline unsigned int
 get_tlb_size(const struct kvm_book3e_206_tlb_entry *tlbe)
@@ -207,6 +223,16 @@ get_tlb_tsize(const struct kvm_book3e_206_tlb_entry *tlbe)
return (tlbe->mas1 & MAS1_TSIZE_MASK) >> MAS1_TSIZE_SHIFT;
 }
 
+static inline unsigned int
+get_tlb_ind(const struct kvm_vcpu *vcpu,
+   const struct kvm_book3e_206_tlb_entry *tlbe)
+{
+   if (has_feature(vcpu, VCPU_FTR_MMU_V2))
+   return (tlbe->mas1 & MAS1_IND) >> MAS1_IND_SHIFT;
+
+   return 0;
+}
+
 static inline unsigned int get_cur_pid(struct kvm_vcpu *vcpu)
 {
return vcpu->arch.pid & 0xff;
@@ -232,6 +258,30 @@ static inline unsigned int get_cur_sas(const struct 
kvm_vcpu *vcpu)
return vcpu->arch.shared->mas6 & 0x1;
 }
 
+static inline unsigned int get_cur_ind(const struct kvm_vcpu *vcpu)
+{
+   if (has_feature(vcpu, VCPU_FTR_MMU_V2))
+   return (vcpu->arch.shared->mas1 & MAS1_IND) >> MAS1_IND_SHIFT;
+
+   return 0;
+}
+
+static inline unsigned int get_cur_indd(const struct kvm_vcpu *vcpu)
+{
+   if (has_feature(vcpu, VCPU_FTR_MMU_V2))
+   return (vcpu->arch.shared->mas4 & MAS4_INDD) >> MAS4_INDD_SHIFT;
+
+   return 0;
+}
+
+static inline unsigned int get_cur_sind(const struct kvm_vcpu *vcpu)
+{
+   if (has_feature(vcpu, VCPU_FTR_MMU_V2))
+   return (vcpu->arch.shared->mas6 & MAS6_SIND) >> MAS6_SIND_SHIFT;
+
+   return 0;
+}
+
 static inline unsigned int get_tlb_tlbsel(const struct kvm_vcpu *vcpu)
 {
/*
@@ -286,6 +336,22 @@ void kvmppc_e500_tlbil_one(struct kvmppc_vcpu_e500 
*vcpu_e500,
 void kvmppc_e500_tlbil_all(struct kvmppc_vcpu_e500 *vcpu_e500);
 
 #ifdef CONFIG_KVM_BOOKE_HV
+void inval_tlb_on_host(struct kvm_vcpu *vcpu, int type, int pid);
+
+void inval_ea_on_host(struct kvm_vcpu *vcpu, gva_t ea, int pid, int sas,
+ int sind);
+#else
+/* TLB is fully virtualized */
+static inline void inval_tlb_on_host(struct kvm_vcpu *vcpu,
+int type, int pid)
+{}
+
+static inline void inval_ea_on_host(struct kvm_vcpu *vcpu, gva_t ea, int pid,
+   int sas, int sind)
+{}
+#endif
+
+#ifdef CONFIG_

[RFC PATCH 0/4] KVM Book3E support for HTW guests

2014-07-03 Thread Mihai Caraman
KVM Book3E support for Hardware Page Tablewalk enabled guests.

Mihai Caraman (4):
  powerpc/booke64: Add LRAT next and max entries to tlb_core_data
structure
  KVM: PPC: Book3E: Handle LRAT error exception
  KVM: PPC: e500: TLB emulation for IND entries
  KVM: PPC: e500mc: Advertise E.PT to support HTW guests

 arch/powerpc/include/asm/kvm_host.h   |   1 +
 arch/powerpc/include/asm/kvm_ppc.h|   2 +
 arch/powerpc/include/asm/mmu-book3e.h |  12 +++
 arch/powerpc/include/asm/reg_booke.h  |  14 +++
 arch/powerpc/kernel/asm-offsets.c |   1 +
 arch/powerpc/kvm/booke.c  |  40 +
 arch/powerpc/kvm/bookehv_interrupts.S |   9 +-
 arch/powerpc/kvm/e500.h   |  81 ++
 arch/powerpc/kvm/e500_mmu.c   |  84 ++
 arch/powerpc/kvm/e500_mmu_host.c  | 156 +-
 arch/powerpc/kvm/e500mc.c |  55 +++-
 arch/powerpc/mm/fsl_booke_mmu.c   |   8 ++
 12 files changed, 423 insertions(+), 40 deletions(-)

-- 
1.7.11.7



Re: [PATCH 4/4] kvm, mem-hotplug: Update apic access page when it is migrated.

2014-07-03 Thread Gleb Natapov
On Wed, Jul 02, 2014 at 05:00:37PM +0800, Tang Chen wrote:
> apic access page is pinned in memory, and as a result it cannot be
> migrated/hot-removed.
> 
> Actually it doesn't need to be pinned in memory.
> 
> This patch introduces a new vcpu request: KVM_REQ_MIGRATE_EPT. This request
> will be made when kvm_mmu_notifier_invalidate_page() is called as the page
> is unmapped from the qemu user space, to reset the APIC_ACCESS_ADDR pointer
> in each online vcpu to 0. It will also be made when an ept violation happens,
> to reset APIC_ACCESS_ADDR to the new page's phys_addr (host phys_addr).
> ---
>  arch/x86/include/asm/kvm_host.h |  2 ++
>  arch/x86/kvm/mmu.c  | 15 +++
>  arch/x86/kvm/vmx.c  |  9 -
>  arch/x86/kvm/x86.c  | 20 
>  include/linux/kvm_host.h|  1 +
>  virt/kvm/kvm_main.c | 15 +++
>  6 files changed, 61 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 8771c0f..f104b87 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -575,6 +575,7 @@ struct kvm_arch {
>  
>   unsigned int tss_addr;
>   struct page *apic_access_page;
> + bool apic_access_page_migrated;
Better to have two requests, KVM_REQ_APIC_PAGE_MAP and KVM_REQ_APIC_PAGE_UNMAP, IMO.

>  
>   gpa_t wall_clock;
>  
> @@ -739,6 +740,7 @@ struct kvm_x86_ops {
>   void (*hwapic_isr_update)(struct kvm *kvm, int isr);
>   void (*load_eoi_exitmap)(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap);
>   void (*set_virtual_x2apic_mode)(struct kvm_vcpu *vcpu, bool set);
> + void (*set_apic_access_page_addr)(struct kvm *kvm, hpa_t hpa);
>   void (*deliver_posted_interrupt)(struct kvm_vcpu *vcpu, int vector);
>   void (*sync_pir_to_irr)(struct kvm_vcpu *vcpu);
>   int (*set_tss_addr)(struct kvm *kvm, unsigned int addr);
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index c0d72f6..a655444 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -3436,6 +3436,21 @@ static int tdp_page_fault(struct kvm_vcpu *vcpu, gva_t 
> gpa, u32 error_code,
>   kvm_make_request(KVM_REQ_MIGRATE_EPT, vcpu);
>   }
>  
> + if (gpa == VMX_APIC_ACCESS_PAGE_ADDR &&
> + vcpu->kvm->arch.apic_access_page_migrated) {
Why check arch.apic_access_page_migrated here? Isn't it enough that the fault
is on the apic address?

> + int i;
> +
> + vcpu->kvm->arch.apic_access_page_migrated = false;
> +
> + /*
> +  * We need update APIC_ACCESS_ADDR pointer in each VMCS of
> +  * all the online vcpus.
> +  */
> + for (i = 0; i < atomic_read(&vcpu->kvm->online_vcpus); i++)
> + kvm_make_request(KVM_REQ_MIGRATE_APIC,
> +  vcpu->kvm->vcpus[i]);
Use make_all_cpus_request() instead. You need to kick all vcpus out of guest mode.

> + }
> +
>   spin_unlock(&vcpu->kvm->mmu_lock);
>  
>   return r;
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index c336cb3..abc152f 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -3988,7 +3988,7 @@ static int alloc_apic_access_page(struct kvm *kvm)
>   if (r)
>   goto out;
>  
> - page = gfn_to_page(kvm, VMX_APIC_ACCESS_PAGE_ADDR >> PAGE_SHIFT);
> + page = gfn_to_page_no_pin(kvm, VMX_APIC_ACCESS_PAGE_ADDR >> PAGE_SHIFT);
>   if (is_error_page(page)) {
>   r = -EFAULT;
>   goto out;
> @@ -7075,6 +7075,12 @@ static void vmx_set_virtual_x2apic_mode(struct 
> kvm_vcpu *vcpu, bool set)
>   vmx_set_msr_bitmap(vcpu);
>  }
>  
> +static void vmx_set_apic_access_page_addr(struct kvm *kvm, hpa_t hpa)
> +{
> + if (vm_need_virtualize_apic_accesses(kvm))
This shouldn't even be called if the apic access page is not supported.
Neither the mmu_notifier path nor the tdp_page_fault path should ever see the
0xfee0 address. BUG() is more appropriate here.


> + vmcs_write64(APIC_ACCESS_ADDR, hpa);
> +}
> +
>  static void vmx_hwapic_isr_update(struct kvm *kvm, int isr)
>  {
>   u16 status;
> @@ -8846,6 +8852,7 @@ static struct kvm_x86_ops vmx_x86_ops = {
>   .enable_irq_window = enable_irq_window,
>   .update_cr8_intercept = update_cr8_intercept,
>   .set_virtual_x2apic_mode = vmx_set_virtual_x2apic_mode,
> + .set_apic_access_page_addr = vmx_set_apic_access_page_addr,
svm needs that too.

>   .vm_has_apicv = vmx_vm_has_apicv,
>   .load_eoi_exitmap = vmx_load_eoi_exitmap,
>   .hwapic_irr_update = vmx_hwapic_irr_update,
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index a26524f..14e7174 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -5943,6 +5943,24 @@ static void vcpu_migrated_page_update_ept(struct 
> kvm_vcpu *vcpu)
>   }
>  }
>  
> +static void vcpu_migrated_page_update_apic(struct kvm_vcpu *vcpu)
> +{
> + struct k

Re: [PATCH 5/5 v4] KVM: PPC: Bookehv: Get vcpu's last instruction for emulation

2014-07-03 Thread Alexander Graf


On 28.06.14 00:49, Mihai Caraman wrote:

On book3e, KVM uses the dedicated load external pid (lwepx) instruction to read
the guest's last instruction on the exit path. lwepx exceptions (DTLB_MISS, DSI
and LRAT), generated by loading a guest address, need to be handled by KVM.
These exceptions are generated in a substituted guest translation context
(EPLC[EGS] = 1) from host context (MSR[GS] = 0).

Currently, KVM hooks only interrupts generated from guest context (MSR[GS] = 1),
doing minimal checks on the fast path to avoid host performance degradation.
lwepx exceptions originate from host state (MSR[GS] = 0), which implies
additional checks in the DO_KVM macro (besides the current MSR[GS] = 1) by
looking at the Exception Syndrome Register (ESR[EPID]) and the External PID
Load Context Register (EPLC[EGS]). Doing this on each Data TLB miss exception
is obviously too intrusive for the host.

Read the guest's last instruction in kvmppc_load_last_inst() by searching for
the physical address and kmapping it. This addresses the TODO for TLB eviction
and execute-but-not-read entries, and allows us to get rid of lwepx until we
are able to handle failures.

A simple stress benchmark shows a 1% sys performance degradation compared with
the previous approach (lwepx without failure handling):

time for i in `seq 1 1`; do /bin/echo > /dev/null; done

real0m 8.85s
user0m 4.34s
sys 0m 4.48s

vs

real0m 8.84s
user0m 4.36s
sys 0m 4.44s

An alternative solution, handling lwepx exceptions in KVM, is to temporarily
hijack the interrupt vector from the host. Some cores share host IVOR registers
between hardware threads, which is the case of FSL e6500; this imposes
additional synchronization logic for that solution to work. The optimization
can be addressed later, on top of this patch.

Signed-off-by: Mihai Caraman 
---
v4:
  - add switch and new function when getting last inst earlier
  - use an enum instead of the previous semantics
  - get rid of mas0, optimize mas7_mas3
  - give more context in visible messages
  - check storage attributes mismatch on MMUv2
  - get rid of pfn_valid check

v3:
  - reworked patch description
  - use unaltered kmap addr for kunmap
  - get last instruction before being preempted

v2:
  - reworked patch description
  - used pr_* functions
  - addressed cosmetic feedback

  arch/powerpc/kvm/booke.c  | 44 +
  arch/powerpc/kvm/bookehv_interrupts.S | 37 --
  arch/powerpc/kvm/e500_mmu_host.c  | 91 +++
  3 files changed, 144 insertions(+), 28 deletions(-)

diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index 34a42b9..843077b 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -869,6 +869,28 @@ static void kvmppc_restart_interrupt(struct kvm_vcpu *vcpu,
}
  }
  
+static int kvmppc_resume_inst_load(struct kvm_run *run, struct kvm_vcpu *vcpu,

+ enum emulation_result emulated, u32 last_inst)
+{
+   switch (emulated) {
+   case EMULATE_AGAIN:
+   return RESUME_GUEST;
+
+   case EMULATE_FAIL:
+   pr_debug("%s: load instruction from guest address %lx failed\n",
+  __func__, vcpu->arch.pc);
+   /* For debugging, encode the failing instruction and
+* report it to userspace. */
+   run->hw.hardware_exit_reason = ~0ULL << 32;
+   run->hw.hardware_exit_reason |= last_inst;
+   kvmppc_core_queue_program(vcpu, ESR_PIL);
+   return RESUME_HOST;
+
+   default:
+   BUG();
+   }
+}
+
  /**
   * kvmppc_handle_exit
   *
@@ -880,6 +902,8 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu 
*vcpu,
int r = RESUME_HOST;
int s;
int idx;
+   u32 last_inst = KVM_INST_FETCH_FAILED;
+   enum emulation_result emulated = EMULATE_DONE;
  
  	/* update before a new last_exit_type is rewritten */

kvmppc_update_timing_stats(vcpu);
@@ -887,6 +911,20 @@ int kvmppc_handle_exit(struct kvm_run *run, struct 
kvm_vcpu *vcpu,
/* restart interrupts if they were meant for the host */
kvmppc_restart_interrupt(vcpu, exit_nr);
  
+	/*

+* get last instruction before being preempted
+* TODO: for e6500 check also BOOKE_INTERRUPT_LRAT_ERROR & ESR_DATA
+*/
+   switch (exit_nr) {
+   case BOOKE_INTERRUPT_DATA_STORAGE:
+   case BOOKE_INTERRUPT_DTLB_MISS:
+   case BOOKE_INTERRUPT_HV_PRIV:
+   emulated = kvmppc_get_last_inst(vcpu, false, &last_inst);
+   break;
+   default:
+   break;
+   }
+
local_irq_enable();
  
  	trace_kvm_exit(exit_nr, vcpu);

@@ -895,6 +933,11 @@ int kvmppc_handle_exit(struct kvm_run *run, struct 
kvm_vcpu *vcpu,
run->exit_reason = KVM_EXIT_UNKNOWN;
run->ready_for_interrupt_injection = 1;
  
+	if (emulated != EMULATE_DONE) {

+   r = kvmppc_resume_inst_load(run,

Re: [PATCH 4/5 v4] KVM: PPC: Allow kvmppc_get_last_inst() to fail

2014-07-03 Thread Alexander Graf


On 28.06.14 00:49, Mihai Caraman wrote:

On book3e, the guest's last instruction is read on the exit path using the
dedicated load external pid (lwepx) instruction. This load operation may fail
due to TLB eviction and execute-but-not-read entries.

This patch lays down the path for an alternative solution to read the guest's
last instruction, by allowing the kvmppc_get_last_inst() function to fail.
Architecture-specific implementations of kvmppc_load_last_inst() may read the
last guest instruction and instruct the emulation layer to re-execute the
guest in case of failure.

Make kvmppc_get_last_inst() definition common between architectures.

Signed-off-by: Mihai Caraman 
---
v4:
  - these changes compile on book3s, please validate the functionality and
do the necessary adaptations!
  - common declaration and enum for kvmppc_load_last_inst()
  - remove kvmppc_read_inst() in a preceding patch

v3:
  - rework patch description
  - add common definition for kvmppc_get_last_inst()
  - check return values in book3s code

v2:
  - integrated kvmppc_get_last_inst() in book3s code and checked build
  - addressed cosmetic feedback

  arch/powerpc/include/asm/kvm_book3s.h| 26 --
  arch/powerpc/include/asm/kvm_booke.h |  5 
  arch/powerpc/include/asm/kvm_ppc.h   | 24 +
  arch/powerpc/kvm/book3s.c| 11 
  arch/powerpc/kvm/book3s_64_mmu_hv.c  | 17 
  arch/powerpc/kvm/book3s_paired_singles.c | 38 +--
  arch/powerpc/kvm/book3s_pr.c | 45 
  arch/powerpc/kvm/booke.c |  3 +++
  arch/powerpc/kvm/e500_mmu_host.c |  6 +
  arch/powerpc/kvm/emulate.c   | 18 -
  arch/powerpc/kvm/powerpc.c   | 11 ++--
  11 files changed, 128 insertions(+), 76 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s.h 
b/arch/powerpc/include/asm/kvm_book3s.h
index ceb70aa..1300cd9 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -276,32 +276,6 @@ static inline bool kvmppc_need_byteswap(struct kvm_vcpu 
*vcpu)
return (kvmppc_get_msr(vcpu) & MSR_LE) != (MSR_KERNEL & MSR_LE);
  }
  
-static inline u32 kvmppc_get_last_inst_internal(struct kvm_vcpu *vcpu, ulong pc)

-{
-   /* Load the instruction manually if it failed to do so in the
-* exit path */
-   if (vcpu->arch.last_inst == KVM_INST_FETCH_FAILED)
-   kvmppc_ld(vcpu, &pc, sizeof(u32), &vcpu->arch.last_inst, false);
-
-   return kvmppc_need_byteswap(vcpu) ? swab32(vcpu->arch.last_inst) :
-   vcpu->arch.last_inst;
-}
-
-static inline u32 kvmppc_get_last_inst(struct kvm_vcpu *vcpu)
-{
-   return kvmppc_get_last_inst_internal(vcpu, kvmppc_get_pc(vcpu));
-}
-
-/*
- * Like kvmppc_get_last_inst(), but for fetching a sc instruction.
- * Because the sc instruction sets SRR0 to point to the following
- * instruction, we have to fetch from pc - 4.
- */
-static inline u32 kvmppc_get_last_sc(struct kvm_vcpu *vcpu)
-{
-   return kvmppc_get_last_inst_internal(vcpu, kvmppc_get_pc(vcpu) - 4);
-}
-
  static inline ulong kvmppc_get_fault_dar(struct kvm_vcpu *vcpu)
  {
return vcpu->arch.fault_dar;
diff --git a/arch/powerpc/include/asm/kvm_booke.h 
b/arch/powerpc/include/asm/kvm_booke.h
index c7aed61..cbb1990 100644
--- a/arch/powerpc/include/asm/kvm_booke.h
+++ b/arch/powerpc/include/asm/kvm_booke.h
@@ -69,11 +69,6 @@ static inline bool kvmppc_need_byteswap(struct kvm_vcpu 
*vcpu)
return false;
  }
  
-static inline u32 kvmppc_get_last_inst(struct kvm_vcpu *vcpu)

-{
-   return vcpu->arch.last_inst;
-}
-
  static inline void kvmppc_set_ctr(struct kvm_vcpu *vcpu, ulong val)
  {
vcpu->arch.ctr = val;
diff --git a/arch/powerpc/include/asm/kvm_ppc.h 
b/arch/powerpc/include/asm/kvm_ppc.h
index e2fd5a1..ec326c8 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -47,6 +47,11 @@ enum emulation_result {
EMULATE_EXIT_USER,/* emulation requires exit to user-space */
  };
  
+enum instruction_type {

+   INST_GENERIC,
+   INST_SC,/* system call */
+};
+
  extern int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu);
  extern int __kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu);
  extern void kvmppc_handler_highmem(void);
@@ -62,6 +67,9 @@ extern int kvmppc_handle_store(struct kvm_run *run, struct 
kvm_vcpu *vcpu,
   u64 val, unsigned int bytes,
   int is_default_endian);
  
+extern int kvmppc_load_last_inst(struct kvm_vcpu *vcpu,

+enum instruction_type type, u32 *inst);
+
  extern int kvmppc_emulate_instruction(struct kvm_run *run,
struct kvm_vcpu *vcpu);
  extern int kvmppc_emulate_mmio(struct kvm_run *run, struct kvm_vcpu *vcpu);
@@ -234,6 +242,22 @@ st

Re: [PATCH 3/5 v4] KVM: PPC: Book3s: Remove kvmppc_read_inst() function

2014-07-03 Thread Alexander Graf


On 28.06.14 00:49, Mihai Caraman wrote:

In the context of replacing kvmppc_ld() function calls with a version of
kvmppc_get_last_inst() which allow to fail, Alex Graf suggested this:

"If we get EMULATE_AGAIN, we just have to make sure we go back into the guest.
No need to inject an ISI into the guest - it'll do that all by itself.
With an error-returning kvmppc_get_last_inst() we can just completely
get rid of kvmppc_read_inst() and only use kvmppc_get_last_inst() instead."

As an intermediate step, get rid of kvmppc_read_inst() and only use kvmppc_ld()
instead.

Signed-off-by: Mihai Caraman 
---
v4:
  - new patch

  arch/powerpc/kvm/book3s_pr.c | 85 ++--
  1 file changed, 35 insertions(+), 50 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
index 15fd6c2..d247d88 100644
--- a/arch/powerpc/kvm/book3s_pr.c
+++ b/arch/powerpc/kvm/book3s_pr.c
@@ -665,42 +665,6 @@ static void kvmppc_giveup_fac(struct kvm_vcpu *vcpu, ulong 
fac)
  #endif
  }
  
-static int kvmppc_read_inst(struct kvm_vcpu *vcpu)

-{
-   ulong srr0 = kvmppc_get_pc(vcpu);
-   u32 last_inst = kvmppc_get_last_inst(vcpu);
-   int ret;
-
-   ret = kvmppc_ld(vcpu, &srr0, sizeof(u32), &last_inst, false);
-   if (ret == -ENOENT) {
-   ulong msr = kvmppc_get_msr(vcpu);
-
-   msr = kvmppc_set_field(msr, 33, 33, 1);
-   msr = kvmppc_set_field(msr, 34, 36, 0);
-   msr = kvmppc_set_field(msr, 42, 47, 0);
-   kvmppc_set_msr_fast(vcpu, msr);
-   kvmppc_book3s_queue_irqprio(vcpu, 
BOOK3S_INTERRUPT_INST_STORAGE);
-   return EMULATE_AGAIN;
-   }
-
-   return EMULATE_DONE;
-}
-
-static int kvmppc_check_ext(struct kvm_vcpu *vcpu, unsigned int exit_nr)
-{
-
-   /* Need to do paired single emulation? */
-   if (!(vcpu->arch.hflags & BOOK3S_HFLAG_PAIRED_SINGLE))
-   return EMULATE_DONE;
-
-   /* Read out the instruction */
-   if (kvmppc_read_inst(vcpu) == EMULATE_DONE)
-   /* Need to emulate */
-   return EMULATE_FAIL;
-
-   return EMULATE_AGAIN;
-}
-
  /* Handle external providers (FPU, Altivec, VSX) */
  static int kvmppc_handle_ext(struct kvm_vcpu *vcpu, unsigned int exit_nr,
 ulong msr)
@@ -1101,31 +1065,51 @@ program_interrupt:
case BOOK3S_INTERRUPT_VSX:
{
int ext_msr = 0;
+   int emul;
+   ulong pc;
+   u32 last_inst;
  
-		switch (exit_nr) {

-   case BOOK3S_INTERRUPT_FP_UNAVAIL: ext_msr = MSR_FP;  break;
-   case BOOK3S_INTERRUPT_ALTIVEC:ext_msr = MSR_VEC; break;
-   case BOOK3S_INTERRUPT_VSX:ext_msr = MSR_VSX; break;
-   }
+   if (!(vcpu->arch.hflags & BOOK3S_HFLAG_PAIRED_SINGLE)) {


Please make paired single emulation the unusual, if()'ed case, not the 
normal exit path :).



Alex


Memory statistic question

2014-07-03 Thread Gleb Voronich

Hello,

I'm trying to get memory usage statistics inside a guest machine using the 
virDomainMemoryStats function in libvirt.
According to 
http://libvirt.org/html/libvirt-libvirt.html#virDomainMemoryStatTags the 
structure can carry a lot of useful memory statistics, but the set of 
parameters returned depends on the hypervisor and the driver.
However, with KVM I'm only able to get VIR_DOMAIN_MEMORY_STAT_ACTUAL_BALLOON 
and VIR_DOMAIN_MEMORY_STAT_RSS.


Am I right that it is not possible to get additional info, such as used/unused 
memory inside a guest machine, in the case of KVM?



Re: [PATCH 08/10] userfaultfd: add new syscall to provide memory externalization

2014-07-03 Thread Andrea Arcangeli
Hi Andy,

thanks for CC'ing linux-api.

On Wed, Jul 02, 2014 at 06:56:03PM -0700, Andy Lutomirski wrote:
> On 07/02/2014 09:50 AM, Andrea Arcangeli wrote:
> > Once a userfaultfd is created, MADV_USERFAULT regions talk through
> > the userfaultfd protocol with the thread responsible for doing the
> > memory externalization of the process.
> > 
> > The protocol starts with userland writing the requested/preferred
> > USERFAULT_PROTOCOL version into the userfault fd (a 64bit write); if the
> > kernel knows it, it will ack it by allowing userland to read a 64bit
> > value from the userfault fd that will contain the same 64bit
> > USERFAULT_PROTOCOL version that userland asked for. Otherwise userland
> > will read the __u64 value -1ULL (aka USERFAULTFD_UNKNOWN_PROTOCOL) and it
> > will have to try again by writing an older protocol version, if
> > suitable for its usage too, and read it back again until it stops
> > reading -1ULL. After that the userfaultfd protocol starts.
> > 
> > The protocol consists of 64bit-sized reads from the userfault fd,
> > providing userland the fault addresses. After a userfault address has
> > been read and the fault has been resolved by userland, the application
> > must write back 128 bits in the form of a [ start, end ] range (64bit
> > each) that tells the kernel such a range has been mapped. Multiple read
> > userfaults can be resolved in a single range write. poll() can be used
> > to know when there are new userfaults to read (POLLIN) and when there
> > are threads waiting on a wakeup through a range write (POLLOUT).
> > 
> > Signed-off-by: Andrea Arcangeli 
> 
> > +#ifdef CONFIG_PROC_FS
> > +static int userfaultfd_show_fdinfo(struct seq_file *m, struct file *f)
> > +{
> > +   struct userfaultfd_ctx *ctx = f->private_data;
> > +   int ret;
> > +   wait_queue_t *wq;
> > +   struct userfaultfd_wait_queue *uwq;
> > +   unsigned long pending = 0, total = 0;
> > +
> > +   spin_lock(&ctx->fault_wqh.lock);
> > +   list_for_each_entry(wq, &ctx->fault_wqh.task_list, task_list) {
> > +   uwq = container_of(wq, struct userfaultfd_wait_queue, wq);
> > +   if (uwq->pending)
> > +   pending++;
> > +   total++;
> > +   }
> > +   spin_unlock(&ctx->fault_wqh.lock);
> > +
> > +   ret = seq_printf(m, "pending:\t%lu\ntotal:\t%lu\n", pending, total);
> 
> This should show the protocol version, too.

Ok, does the below look ok?

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 388553e..f9d3e9f 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -493,7 +493,13 @@ static int userfaultfd_show_fdinfo(struct seq_file *m, 
struct file *f)
}
spin_unlock(&ctx->fault_wqh.lock);
 
-   ret = seq_printf(m, "pending:\t%lu\ntotal:\t%lu\n", pending, total);
+   /*
+* If more protocols are added, they will all be shown
+* separated by a space, like this:
+*  protocols: 0xaa 0xbb
+*/
+   ret = seq_printf(m, "pending:\t%lu\ntotal:\t%lu\nprotocols:\t%Lx\n",
+pending, total, USERFAULTFD_PROTOCOL);
 
return ret;
 }


> > +
> > +SYSCALL_DEFINE1(userfaultfd, int, flags)
> > +{
> > +   int fd, error;
> > +   struct file *file;
> 
> This looks like it can't be used more than once in a process.  That will

It can't be used more than once, correct.

file = ERR_PTR(-EBUSY);
if (get_mm_slot(current->mm))
goto out_free_unlock;

If a userfaultfd is already registered for the current mm the second
one gets -EBUSY.

> be unfortunate for libraries.  Would it be feasible to either have

So you envision two userfaultfd memory managers for the same process?
I assume each one would claim separate ranges of memory?

For that case the demultiplexing of userfaults can be entirely managed
by userland.

One libuserfault library can actually register the userfaultfd, and
then the two libs can register into libuserfault and claim their own
ranges. It could run the code of the two libs in the thread context
that waits on the userfaultfd with zero overhead, or message passing
across threads can be used to run both libs in parallel in their own
thread. The demultiplexing code wouldn't be CPU intensive. The
downside is the two schedule events required if they want to run their
lib code in a separate thread. If we claimed the two different ranges
in the kernel for two different userfaultfds, the kernel would be
speaking directly with each library thread. That'd be the only
advantage, if they don't want to run in the context of the thread that
waits on the userfaultfd.
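The userland demultiplexing described here is simple to sketch; the struct and handler ids below are made up for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical claim a library registers with the demux layer. */
struct uffd_claim {
	uint64_t start, end;	/* [start, end) claimed by this library */
	int handler;		/* which library resolves faults here */
};

/*
 * The one thread that reads the single userfaultfd dispatches each
 * fault address to whichever library claimed the containing range.
 */
static int demux_fault(const struct uffd_claim *claims, size_t n,
		       uint64_t fault_addr)
{
	for (size_t i = 0; i < n; i++)
		if (fault_addr >= claims[i].start && fault_addr < claims[i].end)
			return claims[i].handler;
	return -1; /* nobody claimed this address */
}
```

The dispatcher can either run the claiming library's code inline or hand the address off to that library's thread.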

To increase SMP scalability in the future we could also add a
UFFD_LOAD_BALANCE flag to distribute userfaults to different
userfaultfds, which if used could relax the -EBUSY (though it wouldn't
be two different claimed ranges for two different libs).

If UFFD_LOAD_BALANCE is passed to the current code, sys_userfaultfd
returns -EINVAL. I haven't implemented it because I'm not sure such a
thing would ever be needed. Compared to distributing the user

Re: [PATCH 5/6 v2] KVM: PPC: Book3E: Add ONE_REG AltiVec support

2014-07-03 Thread Alexander Graf


On 30.06.14 17:34, Mihai Caraman wrote:

Add ONE_REG support for AltiVec on Book3E.

Signed-off-by: Mihai Caraman 


Any chance we can handle these in generic code?


Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/6 v2] KVM: PPC: Book3E: Add AltiVec support

2014-07-03 Thread Alexander Graf


On 30.06.14 17:34, Mihai Caraman wrote:

Add KVM Book3E AltiVec support. KVM Book3E FPU support gracefully reuses host
infrastructure, so follow the same approach for AltiVec.

Signed-off-by: Mihai Caraman 


Same comment here - I fail to see how we refetch Altivec state after a 
context switch.



Alex



Re: [PATCH 3/6 v2] KVM: PPC: Book3E: Increase FPU laziness

2014-07-03 Thread Alexander Graf


On 30.06.14 17:34, Mihai Caraman wrote:

Increase FPU laziness by calling kvmppc_load_guest_fp() just before
returning to the guest instead of on each sched-in. Without this improvement
an interrupt may also claim the floating point unit, corrupting guest state.


How do you handle context switching with this patch applied? During most 
of the guest's lifetime we never exit kvmppc_vcpu_run(), so when the 
guest gets switched out all FPU state gets lost?



Alex



Re: [PATCH 2/6 v2] KVM: PPC: Book3E: Refactor SPE/FP exit handling

2014-07-03 Thread Alexander Graf


On 30.06.14 17:34, Mihai Caraman wrote:

SPE/FP/AltiVec interrupts share the same numbers. Refactor SPE/FP exit handling
to accommodate AltiVec later on the same flow. Add kvmppc_supports_spe() to
detect support for the unit at runtime, since it can be configured in the
kernel but not featured on hardware and vice versa.

Signed-off-by: Mihai Caraman 


Why not keep them 100% separate?


Alex



Re: [PATCH 1/6 v2] KVM: PPC: Book3E: Use common defines for SPE/FP/AltiVec int numbers

2014-07-03 Thread Alexander Graf


On 30.06.14 17:34, Mihai Caraman wrote:

Use common BOOKE_IRQPRIO and BOOKE_INTERRUPT defines for SPE/FP/AltiVec
which share the same interrupt numbers.

Signed-off-by: Mihai Caraman 
---
v2:
  - remove outdated definitions

  arch/powerpc/include/asm/kvm_asm.h|  8 
  arch/powerpc/kvm/booke.c  | 17 +
  arch/powerpc/kvm/booke.h  |  4 ++--
  arch/powerpc/kvm/booke_interrupts.S   |  9 +
  arch/powerpc/kvm/bookehv_interrupts.S |  4 ++--
  arch/powerpc/kvm/e500.c   | 10 ++
  arch/powerpc/kvm/e500_emulate.c   | 10 ++
  7 files changed, 30 insertions(+), 32 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_asm.h 
b/arch/powerpc/include/asm/kvm_asm.h
index 9601741..c94fd33 100644
--- a/arch/powerpc/include/asm/kvm_asm.h
+++ b/arch/powerpc/include/asm/kvm_asm.h
@@ -56,14 +56,6 @@
  /* E500 */
  #define BOOKE_INTERRUPT_SPE_ALTIVEC_UNAVAIL 32
  #define BOOKE_INTERRUPT_SPE_FP_DATA_ALTIVEC_ASSIST 33
-/*
- * TODO: Unify 32-bit and 64-bit kernel exception handlers to use same defines
- */
-#define BOOKE_INTERRUPT_SPE_UNAVAIL BOOKE_INTERRUPT_SPE_ALTIVEC_UNAVAIL
-#define BOOKE_INTERRUPT_SPE_FP_DATA BOOKE_INTERRUPT_SPE_FP_DATA_ALTIVEC_ASSIST
-#define BOOKE_INTERRUPT_ALTIVEC_UNAVAIL BOOKE_INTERRUPT_SPE_ALTIVEC_UNAVAIL
-#define BOOKE_INTERRUPT_ALTIVEC_ASSIST \
-   BOOKE_INTERRUPT_SPE_FP_DATA_ALTIVEC_ASSIST


I think I'd prefer to keep them separate.


  #define BOOKE_INTERRUPT_SPE_FP_ROUND 34
  #define BOOKE_INTERRUPT_PERFORMANCE_MONITOR 35
  #define BOOKE_INTERRUPT_DOORBELL 36
diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index ab62109..3c86d9b 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -388,8 +388,8 @@ static int kvmppc_booke_irqprio_deliver(struct kvm_vcpu 
*vcpu,
case BOOKE_IRQPRIO_ITLB_MISS:
case BOOKE_IRQPRIO_SYSCALL:
case BOOKE_IRQPRIO_FP_UNAVAIL:
-   case BOOKE_IRQPRIO_SPE_UNAVAIL:
-   case BOOKE_IRQPRIO_SPE_FP_DATA:
+   case BOOKE_IRQPRIO_SPE_ALTIVEC_UNAVAIL:
+   case BOOKE_IRQPRIO_SPE_FP_DATA_ALTIVEC_ASSIST:


#ifdef CONFIG_KVM_E500V2
  case ...SPE:
#else
  case ..ALTIVEC:
#endif


case BOOKE_IRQPRIO_SPE_FP_ROUND:
case BOOKE_IRQPRIO_AP_UNAVAIL:
allowed = 1;
@@ -977,18 +977,19 @@ int kvmppc_handle_exit(struct kvm_run *run, struct 
kvm_vcpu *vcpu,
break;
  
  #ifdef CONFIG_SPE

-   case BOOKE_INTERRUPT_SPE_UNAVAIL: {
+   case BOOKE_INTERRUPT_SPE_ALTIVEC_UNAVAIL: {
if (vcpu->arch.shared->msr & MSR_SPE)
kvmppc_vcpu_enable_spe(vcpu);
else
kvmppc_booke_queue_irqprio(vcpu,
-  BOOKE_IRQPRIO_SPE_UNAVAIL);
+   BOOKE_IRQPRIO_SPE_ALTIVEC_UNAVAIL);
r = RESUME_GUEST;
break;
}
  
-	case BOOKE_INTERRUPT_SPE_FP_DATA:

-   kvmppc_booke_queue_irqprio(vcpu, BOOKE_IRQPRIO_SPE_FP_DATA);
+   case BOOKE_INTERRUPT_SPE_FP_DATA_ALTIVEC_ASSIST:
+   kvmppc_booke_queue_irqprio(vcpu,
+   BOOKE_IRQPRIO_SPE_FP_DATA_ALTIVEC_ASSIST);
r = RESUME_GUEST;
break;
  
@@ -997,7 +998,7 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu *vcpu,

r = RESUME_GUEST;
break;
  #else
-   case BOOKE_INTERRUPT_SPE_UNAVAIL:
+   case BOOKE_INTERRUPT_SPE_ALTIVEC_UNAVAIL:
/*
 * Guest wants SPE, but host kernel doesn't support it.  Send
 * an "unimplemented operation" program check to the guest.
@@ -1010,7 +1011,7 @@ int kvmppc_handle_exit(struct kvm_run *run, struct 
kvm_vcpu *vcpu,
 * These really should never happen without CONFIG_SPE,
 * as we should never enable the real MSR[SPE] in the guest.
 */
-   case BOOKE_INTERRUPT_SPE_FP_DATA:
+   case BOOKE_INTERRUPT_SPE_FP_DATA_ALTIVEC_ASSIST:
case BOOKE_INTERRUPT_SPE_FP_ROUND:
printk(KERN_CRIT "%s: unexpected SPE interrupt %u at %08lx\n",
   __func__, exit_nr, vcpu->arch.pc);
diff --git a/arch/powerpc/kvm/booke.h b/arch/powerpc/kvm/booke.h
index b632cd3..f182b32 100644
--- a/arch/powerpc/kvm/booke.h
+++ b/arch/powerpc/kvm/booke.h
@@ -32,8 +32,8 @@
  #define BOOKE_IRQPRIO_ALIGNMENT 2
  #define BOOKE_IRQPRIO_PROGRAM 3
  #define BOOKE_IRQPRIO_FP_UNAVAIL 4
-#define BOOKE_IRQPRIO_SPE_UNAVAIL 5
-#define BOOKE_IRQPRIO_SPE_FP_DATA 6
+#define BOOKE_IRQPRIO_SPE_ALTIVEC_UNAVAIL 5
+#define BOOKE_IRQPRIO_SPE_FP_DATA_ALTIVEC_ASSIST 6


#ifdef CONFIG_KVM_E500V2
#define ...SPE...
#else
#define ...ALTIVEC...
#endif

That way we can be 100% sure that no SPE cruft leaks into anything.
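A concrete form of the split suggested above might look like this (values taken from the hunk; the AltiVec names on the else branch are an assumption about how the non-E500V2 side would be spelled):

```c
#include <assert.h>

/* CONFIG_KVM_E500V2 selects the SPE spelling; everything else gets
 * AltiVec names, so no SPE cruft can leak into e6500 code paths. */
#ifdef CONFIG_KVM_E500V2
#define BOOKE_IRQPRIO_SPE_UNAVAIL	5
#define BOOKE_IRQPRIO_SPE_FP_DATA	6
#else
#define BOOKE_IRQPRIO_ALTIVEC_UNAVAIL	5
#define BOOKE_IRQPRIO_ALTIVEC_ASSIST	6
#endif
```

Both branches keep the shared priority numbers 5 and 6, so only the names differ per configuration.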



  #define BOOKE_IRQPRIO_SPE_FP_ROUND 7
  #define BOOKE_IRQPRIO_SYSCALL 8
  #define BOOKE_IRQPRIO_AP_UNAVAIL 9
diff --git a/arch/powerpc/kvm/booke_i

Re: [PATCH] KVM: PPC: e500: Fix default tlb for victim hint

2014-07-03 Thread Alexander Graf


On 30.06.14 20:18, Scott Wood wrote:

On Mon, 2014-06-30 at 15:54 +0300, Mihai Caraman wrote:

The TLB search operation used for the victim hint relies on the default TLB
set by the host. When hardware tablewalk support is enabled in the host, the
default TLB is TLB1, which leads KVM to evict the bolted entry. Set and
restore the default TLB when searching for the victim hint.

Signed-off-by: Mihai Caraman 
---
  arch/powerpc/include/asm/mmu-book3e.h | 5 -
  arch/powerpc/kvm/e500_mmu_host.c  | 4 
  2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/mmu-book3e.h 
b/arch/powerpc/include/asm/mmu-book3e.h
index 901dac6..5dad378 100644
--- a/arch/powerpc/include/asm/mmu-book3e.h
+++ b/arch/powerpc/include/asm/mmu-book3e.h
@@ -40,7 +40,9 @@
  
  /* MAS registers bit definitions */
  
-#define MAS0_TLBSEL(x)		(((x) << 28) & 0x30000000)

+#define MAS0_TLBSEL_MASK	0x30000000
+#define MAS0_TLBSEL_SHIFT   28
+#define MAS0_TLBSEL(x)  (((x) << MAS0_TLBSEL_SHIFT) & MAS0_TLBSEL_MASK)
   #define MAS0_ESEL_MASK	0x0FFF0000
   #define MAS0_ESEL_SHIFT   16
   #define MAS0_ESEL(x)  (((x) << MAS0_ESEL_SHIFT) & MAS0_ESEL_MASK)
@@ -86,6 +88,7 @@
  #define MAS3_SPSIZE   0x003e
  #define MAS3_SPSIZE_SHIFT 1
  
+#define MAS4_TLBSEL_MASK	MAS0_TLBSEL_MASK

  #define MAS4_TLBSELD(x)   MAS0_TLBSEL(x)
  #define MAS4_INDD 0x8000  /* Default IND */
  #define MAS4_TSIZED(x)MAS1_TSIZE(x)
diff --git a/arch/powerpc/kvm/e500_mmu_host.c b/arch/powerpc/kvm/e500_mmu_host.c
index dd2cc03..79677d7 100644
--- a/arch/powerpc/kvm/e500_mmu_host.c
+++ b/arch/powerpc/kvm/e500_mmu_host.c
@@ -107,11 +107,15 @@ static u32 get_host_mas0(unsigned long eaddr)
  {
unsigned long flags;
u32 mas0;
+   u32 mas4;
  
  	local_irq_save(flags);

mtspr(SPRN_MAS6, 0);
+   mas4 = mfspr(SPRN_MAS4);
+   mtspr(SPRN_MAS4, mas4 & ~MAS4_TLBSEL_MASK);
asm volatile("tlbsx 0, %0" : : "b" (eaddr & ~CONFIG_PAGE_OFFSET));
mas0 = mfspr(SPRN_MAS0);
+   mtspr(SPRN_MAS4, mas4);
local_irq_restore(flags);
  
  	return mas0;
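The named mask makes the save/clear/restore idiom in get_host_mas0() easy to check in isolation. A small sketch with the patch's values (TLBSEL is the 2-bit field at bits 28-29 of MAS0):

```c
#include <assert.h>
#include <stdint.h>

#define MAS0_TLBSEL_MASK	0x30000000u
#define MAS0_TLBSEL_SHIFT	28
#define MAS0_TLBSEL(x)	(((uint32_t)(x) << MAS0_TLBSEL_SHIFT) & MAS0_TLBSEL_MASK)

/*
 * The fix's idiom: clear the default-TLB selector in MAS4 before the
 * tlbsx so the search defaults to TLB0 instead of evicting a bolted
 * TLB1 entry; the caller restores the saved MAS4 afterwards.
 */
static uint32_t clear_default_tlbsel(uint32_t mas4)
{
	return mas4 & ~MAS0_TLBSEL_MASK; /* MAS4_TLBSEL_MASK aliases MAS0's */
}
```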

Reviewed-by: Scott Wood 


Thanks, applied to kvm-ppc-queue.


Alex



Re: [PATCH] KVM: PPC: e500: Emulate power management control SPR

2014-07-03 Thread Alexander Graf


On 30.06.14 20:20, Scott Wood wrote:

On Mon, 2014-06-30 at 15:55 +0300, Mihai Caraman wrote:

For the FSL e6500 core the kernel uses the power management SPR register
(PWRMGTCR0) to enable idle power down for cores and devices by setting up the
idle count period at boot time. With the host already controlling the power
management configuration, the guest can simply benefit from it, so emulate
the guest request as a nop.

Signed-off-by: Mihai Caraman 
---
  arch/powerpc/kvm/e500_emulate.c | 8 
  1 file changed, 8 insertions(+)

diff --git a/arch/powerpc/kvm/e500_emulate.c b/arch/powerpc/kvm/e500_emulate.c
index 002d517..98a22e5 100644
--- a/arch/powerpc/kvm/e500_emulate.c
+++ b/arch/powerpc/kvm/e500_emulate.c
@@ -250,6 +250,10 @@ int kvmppc_core_emulate_mtspr_e500(struct kvm_vcpu *vcpu, 
int sprn, ulong spr_va
spr_val);
break;
  
+	case SPRN_PWRMGTCR0:

+   /* Guest relies on host power management configurations */
+   break;
+
/* extra exceptions */
case SPRN_IVOR32:
vcpu->arch.ivor[BOOKE_IRQPRIO_SPE_UNAVAIL] = spr_val;
@@ -355,6 +359,10 @@ int kvmppc_core_emulate_mfspr_e500(struct kvm_vcpu *vcpu, 
int sprn, ulong *spr_v
*spr_val = 0;
break;
  
+	case SPRN_PWRMGTCR0:

+   *spr_val = 0;
+   break;
+
case SPRN_MMUCFG:
*spr_val = vcpu->arch.mmucfg;
break;

When reading, is it better to return zero, or the current host value, or
the value last written by the guest (even though it wasn't written to
hardware)?


I think it makes sense to treat it as general storage. I don't think 
leaking the host value into the guest is useful. And while zero works, 
the spec does say that the value gets retained, so I think we should do 
the same.
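The "general storage" behaviour Alex describes is a one-liner in each direction. A minimal sketch, with the struct name made up for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Per-vcpu shadow of PWRMGTCR0: mtspr stores into it, mfspr returns
 * the retained value (initially 0), and the real host register is
 * never touched, so nothing leaks in either direction. */
struct vcpu_sprs {
	uint64_t pwrmgtcr0;
};

static void emu_mtspr_pwrmgtcr0(struct vcpu_sprs *v, uint64_t spr_val)
{
	v->pwrmgtcr0 = spr_val;
}

static uint64_t emu_mfspr_pwrmgtcr0(const struct vcpu_sprs *v)
{
	return v->pwrmgtcr0;
}
```

This matches the spec's retention semantics while keeping the host's actual power management configuration in control.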



Alex



Re: [PATCH v2] powerpc/kvm: support to handle sw breakpoint

2014-07-03 Thread Alexander Graf


On 01.07.14 10:41, Madhavan Srinivasan wrote:

This patch adds kernel-side support for software breakpoints.
The design is that, by using an illegal instruction, we trap to the
hypervisor via the Emulation Assistance interrupt, where we check for the
illegal instruction and accordingly return to host or guest. The patch also
adds support for software breakpoints in PR KVM.

The patch mandates use of the "abs" instruction as the sw breakpoint
instruction (primary opcode 31 and extended opcode 360). As of PowerISA
v2.01, the ABS instruction has been dropped from the architecture and is
treated as an illegal instruction.

Changes v1->v2:

  Moved the debug instruction #def to kvm_book3s.h. This way PR_KVM can also 
share it.
  Added code to use KVM get one reg infrastructure to get debug opcode.
  Updated emulate.c to include emulation of the debug instruction in case of PR_KVM.
  Made changes to commit message.

Signed-off-by: Madhavan Srinivasan 
---
  arch/powerpc/include/asm/kvm_book3s.h |8 
  arch/powerpc/include/asm/ppc-opcode.h |5 +
  arch/powerpc/kvm/book3s.c |3 ++-
  arch/powerpc/kvm/book3s_hv.c  |9 +
  arch/powerpc/kvm/book3s_pr.c  |3 +++
  arch/powerpc/kvm/emulate.c|   10 ++
  6 files changed, 37 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/kvm_book3s.h 
b/arch/powerpc/include/asm/kvm_book3s.h
index f52f656..180d549 100644
--- a/arch/powerpc/include/asm/kvm_book3s.h
+++ b/arch/powerpc/include/asm/kvm_book3s.h
@@ -24,6 +24,14 @@
  #include 
  #include 
  
+/*
+ * KVMPPC_INST_BOOK3S_DEBUG is the debug instruction for supporting
+ * software breakpoints. The instruction mnemonic is ABS, primary
+ * opcode 31, extended opcode 360. As of PowerISA v2.01, the ABS
+ * instruction has been dropped from the architecture and is treated
+ * as an illegal instruction.
+ */
+#define KVMPPC_INST_BOOK3S_DEBUG   0x7c0002d0


This will still break with LE guests.
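For reference, the constant can be derived from the opcode fields, and the LE concern amounts to the fetched word arriving byte-swapped. A sketch (the byte-swapped match is one possible way to handle LE guests, not what the patch currently does):

```c
#include <assert.h>
#include <stdint.h>

/* "abs": primary opcode 31 in the top 6 bits, extended opcode 360 in
 * bits 1-10, giving the 0x7c0002d0 the patch hardcodes. */
static uint32_t abs_encoding(void)
{
	return (31u << 26) | (360u << 1);
}

static uint32_t bswap32(uint32_t x)
{
	return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
	       ((x << 8) & 0x00ff0000u) | (x << 24);
}

/* Matching both byte orders would cover an LE guest's view of the
 * breakpoint instruction. */
static int is_debug_inst(uint32_t fetched)
{
	return fetched == abs_encoding() ||
	       fetched == bswap32(abs_encoding());
}
```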


+
  struct kvmppc_bat {
u64 raw;
u32 bepi;
diff --git a/arch/powerpc/include/asm/ppc-opcode.h 
b/arch/powerpc/include/asm/ppc-opcode.h
index 3132bb9..3fbb4c1 100644
--- a/arch/powerpc/include/asm/ppc-opcode.h
+++ b/arch/powerpc/include/asm/ppc-opcode.h
@@ -111,6 +111,11 @@
  #define OP_31_XOP_LHBRX 790
  #define OP_31_XOP_STHBRX918
  
+/* KVMPPC_INST_BOOK3S_DEBUG -- software breakpoint instruction.
+ * Instruction mnemonic is ABS, primary opcode 31, extended opcode 360.
+ */
+#define OP_31_XOP_ABS  360
+
  #define OP_LWZ  32
  #define OP_LD   58
  #define OP_LWZU 33
diff --git a/arch/powerpc/kvm/book3s.c b/arch/powerpc/kvm/book3s.c
index c254c27..b40fe5d 100644
--- a/arch/powerpc/kvm/book3s.c
+++ b/arch/powerpc/kvm/book3s.c
@@ -789,7 +789,8 @@ int kvm_arch_vcpu_ioctl_translate(struct kvm_vcpu *vcpu,
  int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu,
struct kvm_guest_debug *dbg)
  {
-   return -EINVAL;
+   vcpu->guest_debug = dbg->control;
+   return 0;
  }
  
  void kvmppc_decrementer_func(unsigned long data)

diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 7a12edb..402c1ec 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -725,8 +725,14 @@ static int kvmppc_handle_exit_hv(struct kvm_run *run, 
struct kvm_vcpu *vcpu,
 * we don't emulate any guest instructions at this stage.
 */
case BOOK3S_INTERRUPT_H_EMUL_ASSIST:
+   if (kvmppc_get_last_inst(vcpu) == KVMPPC_INST_BOOK3S_DEBUG ) {
+   run->exit_reason = KVM_EXIT_DEBUG;
+   run->debug.arch.address = kvmppc_get_pc(vcpu);
+   r = RESUME_HOST;


Phew - why can't we just go into the normal instruction emulator for 
EMUL_ASSIST?



Alex



Re: [PATCH 2/2] KVM : powerpc/booke: Allow debug interrupt injection to guest

2014-07-03 Thread Alexander Graf


On 02.07.14 19:28, bharat.bhus...@freescale.com wrote:



-Original Message-
From: Bhushan Bharat-R65777
Sent: Wednesday, July 02, 2014 5:07 PM
To: Wood Scott-B07421; Alexander Graf
Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org
Subject: RE: [PATCH 2/2] KVM : powerpc/booke: Allow debug interrupt injection to
guest



-Original Message-
From: Wood Scott-B07421
Sent: Tuesday, July 01, 2014 10:11 PM
To: Alexander Graf
Cc: Bhushan Bharat-R65777; kvm-...@vger.kernel.org;
kvm@vger.kernel.org
Subject: Re: [PATCH 2/2] KVM : powerpc/booke: Allow debug interrupt
injection to guest

On Tue, 2014-07-01 at 18:22 +0200, Alexander Graf wrote:

On 01.07.14 17:35, Scott Wood wrote:

On Tue, 2014-07-01 at 17:04 +0200, Alexander Graf wrote:

On 01.07.14 16:58, Scott Wood wrote:

On Tue, 2014-07-01 at 08:23 +0200, Alexander Graf wrote:

I don't think QEMU should be aware of these limitations.

OK, but we should at least have some idea of how the whole thing
is supposed to work, in order to determine if this is the
correct behavior for QEMU.  I thought the model was that debug
resources are either owned by QEMU or by the guest, and in the
latter case, QEMU would never see the debug exception to begin with.

That's bad for a number of reasons. For starters it's different
from how
x86 handles debug registers - and I hate to be different just for
the sake of being different.

How does it work on x86?

It overwrites more-or-less random breakpoints with its own ones, but
leaves the others intact ;).

Are you talking about software breakpoints or management of hardware
debug registers?


So if we do want to declare that debug registers are owned by
either QEMU or the guest, we should change the semantics for all
architectures.

If we want to say that ownership of the registers is shared, we
need a plan for how that would actually work.

I think you're overengineering here :). When do people actually use
gdbstub? Usually when they want to debug a broken guest. We can
either

* overengineer heavily and reduce the number of registers
available to the guest to always have spares
* overengineer a bit and turn off guest debugging completely when
we use gdbstub
* just leave as much alive as we can, hoping that it helps with
the debugging

Option 3 is what x86 does - and I think it's a reasonable approach.
This is not an interface that needs to be 100% consistent and bullet
proof, it's a best effort to enable you to debug as much as possible.

I'm not insisting on 100% -- just hoping for some
explanation/discussion about how it's intended to work for the cases where it

can.

How will MSR[DE] and MSRP[DEP] be handled?

How would I go about telling QEMU/KVM that I don't want this shared
mode, because I don't want guest to interfere with the debugging I'm
trying to do from QEMU?

Will guest accesses to debug registers cause a userspace exit when
guest_debug is enabled?


I think we're in a path that is slow enough already to not worry
about performance.

It's not just about performance, but simplicity of use, and
consistency of API.

Oh, and it looks like there already exist one reg definitions and
implementations for most of the debug registers.

For BookE? Where?

arch/powerpc/kvm/booke.c: KVM_REG_PPC_IACn, KVM_REG_PPC_DACn

I tried to quickly prototype what I think we want to do (this is not tested)

Hi Scott,

There is one problem which is stopping us from sharing debug resources between
QEMU and the guest; looking for suggestions:
- As QEMU is also using debug resources, we have to set MSR_DE and set MSRP_DEP
(so the guest will not be able to clear MSR_DE). So QEMU-set debug events will
always cause debug interrupts.
- Now the guest is also using debug resources, and for some reason the guest may
want to clear MSR_DE (disable debug interrupts). But it will not be able to do so
as MSRP_DEP is set, and KVM will not come to know of the guest's wish to disable
MSR_DE.
- If a debug interrupt occurs then we will exit to QEMU, and if this is not a
QEMU-set event it will inject the interrupt into the guest (using one-reg or set-sregs).
- Now KVM, when handling the one-reg/sregs request to inject the debug interrupt,
does not know whether the guest can handle the debug interrupt or not (as the
guest might have tried to set/clear MSR_DE).


Yeah, and with this everything falls apart. I guess we can't share 
hardware debug resources after all then. So yes, let's declare all 
hardware debug facilities QEMU owned as soon as QEMU starts using gdbstub.



Alex



[PATCH 3.11 079/198] MIPS: KVM: Allocate at least 16KB for exception handlers

2014-07-03 Thread Luis Henriques
3.11.10.13 -stable review patch.  If anyone has any objections, please let me 
know.

--

From: James Hogan 

commit 7006e2dfda9adfa40251093604db76d7e44263b3 upstream.

Each MIPS KVM guest has its own copy of the KVM exception vector. This
contains the TLB refill exception handler at offset 0x000, the general
exception handler at offset 0x180, and interrupt exception handlers at
offset 0x200 in case Cause_IV=1. A common handler is copied to offset
0x2000 and offset 0x3000 is used for temporarily storing k1 during entry
from guest.

However the amount of memory allocated for this purpose is calculated as
0x200 rounded up to the next page boundary, which is insufficient if 4KB
pages are in use. This can lead to the common handler at offset 0x2000
being overwritten and infinitely recursive exceptions on the next exit
from the guest.

Increase the minimum size from 0x200 to 0x4000 to cover the full use of
the page.
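The arithmetic is easy to check: rounding 0x200 up to a 4KB page yields 0x1000, which cannot hold the users at offsets 0x2000 and 0x3000, while 0x4000 covers them. A sketch of the size calculation (macro names are illustrative):

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE_4K	0x1000u
#define PAGE_ALIGN_4K(x)	(((x) + PAGE_SIZE_4K - 1) & ~(PAGE_SIZE_4K - 1))

/* Highest offset actually used by the vector copy: the k1 scratch
 * slot at 0x3000 (assume one 64-bit save slot). */
#define VECTOR_SPACE_NEEDED	(0x3000u + 8u)

static uint32_t vector_alloc_size(uint32_t requested)
{
	return PAGE_ALIGN_4K(requested);
}
```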

Signed-off-by: James Hogan 
Cc: Paolo Bonzini 
Cc: Gleb Natapov 
Cc: kvm@vger.kernel.org
Cc: Ralf Baechle 
Cc: linux-m...@linux-mips.org
Cc: Sanjay Lal 
Signed-off-by: Paolo Bonzini 
Signed-off-by: Luis Henriques 
---
 arch/mips/kvm/kvm_mips.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/mips/kvm/kvm_mips.c b/arch/mips/kvm/kvm_mips.c
index dd203e59e6fd..426345ac6f6e 100644
--- a/arch/mips/kvm/kvm_mips.c
+++ b/arch/mips/kvm/kvm_mips.c
@@ -299,7 +299,7 @@ struct kvm_vcpu *kvm_arch_vcpu_create(struct kvm *kvm, 
unsigned int id)
if (cpu_has_veic || cpu_has_vint) {
size = 0x200 + VECTORSPACING * 64;
} else {
-   size = 0x200;
+   size = 0x4000;
}
 
/* Save Linux EBASE */
-- 
1.9.1



Re: copy_huge_page: unable to handle kernel NULL pointer dereference at 0000000000000008

2014-07-03 Thread jipan yang
cc linux-mm as suggested by Wanpeng.

[ 2423.567961] BUG: unable to handle kernel NULL pointer dereference
at 0008
[ 2423.568252] IP: [] copy_huge_page+0x8a/0x2a0
[ 2423.568465] PGD 0
[ 2423.568535] Oops:  [#1] SMP
[ 2423.568658] Modules linked in: ip6table_filter ip6_tables
ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat
nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT
xt_CHECKSUM iptable_mangle xt_tcpudp iptable_filter ip_tables x_tables
kvm_intel kvm bridge stp ast ttm drm_kms_helper drm llc sysimgblt
sysfillrect syscopyarea ioatdma shpchp joydev dcdbas mei_me mei
mac_hid lpc_ich acpi_pad wmi lp ext2 parport hid_generic usbhid hid
ixgbe igb ahci dca i2c_algo_bit libahci ptp pps_core mdio
[ 2423.570426] CPU: 16 PID: 2869 Comm: qemu-system-x86 Not tainted
3.11.0-15-generic #25~precise1-Ubuntu
[ 2423.570763] Hardware name: Dell Inc. PowerEdge C6220 II/09N44V,
BIOS 2.0.3 07/03/2013
[ 2423.571046] task: 88101de89770 ti: 88101df22000 task.ti:
88101df22000
[ 2423.571314] RIP: 0010:[]  []
copy_huge_page+0x8a/0x2a0
[ 2423.571609] RSP: 0018:88101df23768  EFLAGS: 00010246
[ 2423.571797] RAX: 0020 RBX: 81f9aa20 RCX: 0012
[ 2423.572052] RDX: 81f9aa20 RSI: 1000 RDI: ea007bf2
[ 2423.572307] RBP: 88101df237b8 R08:  R09: 0200
[ 2423.572562] R10:  R11: 00017960 R12: ea007bf2
[ 2423.572816] R13: 0001 R14: 02040008407d R15: ea0040438000
[ 2423.573074] FS:  7f40c700() GS:88203eec()
knlGS:
[ 2423.573366] CS:  0010 DS:  ES:  CR0: 80050033
[ 2423.573570] CR2: 0008 CR3: 002024247000 CR4: 001427e0
[ 2423.573825] Stack:
[ 2423.573889]  88101df23788 81156460 88207fff8000
ea007bf2
[ 2423.574151]  88101df23798 ea0040438000 ea007bf2
0001
[ 2423.574414]  02040008407d 88101b341250 88101df237e8
8119fee9
[ 2423.574675] Call Trace:
[ 2423.574765]  [] ? put_compound_page+0x40/0x70
[ 2423.574980]  [] migrate_page_copy+0x39/0x250
[ 2423.575190]  []
migrate_misplaced_transhuge_page+0x16c/0x4d0
[ 2423.575454]  [] do_huge_pmd_numa_page+0x169/0x2d0
[ 2423.575682]  [] ? putback_lru_page+0x5b/0xc0
[ 2423.575894]  [] handle_mm_fault+0x2c4/0x3e0
[ 2423.576103]  [] ? gup_pmd_range+0xaa/0xf0
[ 2423.576303]  [] __get_user_pages+0x178/0x5c0
[ 2423.576516]  [] ? gup_pmd_range+0xd0/0xf0
[ 2423.576737]  [] hva_to_pfn_slow+0x9e/0x150 [kvm]
[ 2423.576971]  [] hva_to_pfn+0xd5/0x210 [kvm]
[ 2423.577188]  [] ? kvm_release_pfn_clean+0x50/0x60 [kvm]
[ 2423.577452]  [] ? mmu_set_spte+0x138/0x270 [kvm]
[ 2423.577685]  [] __gfn_to_pfn_memslot+0xad/0xb0 [kvm]
[ 2423.577930]  [] __gfn_to_pfn+0x57/0x70 [kvm]
[ 2423.578149]  [] gfn_to_pfn_async+0x1a/0x20 [kvm]
[ 2423.578387]  [] try_async_pf+0x4a/0x90 [kvm]
[ 2423.578607]  [] ? kvm_host_page_size+0x9b/0xb0 [kvm]
[ 2423.587202]  [] tdp_page_fault+0x10b/0x220 [kvm]
[ 2423.595850]  [] kvm_mmu_page_fault+0x31/0x70 [kvm]
[ 2423.604557]  [] handle_ept_violation+0x7e/0x150 [kvm_intel]
[ 2423.613164]  [] vmx_handle_exit+0xa7/0x270 [kvm_intel]
[ 2423.621586]  [] vcpu_enter_guest+0x447/0x770 [kvm]
[ 2423.629990]  [] ? kvm_apic_local_deliver+0x69/0x70 [kvm]
[ 2423.638546]  [] __vcpu_run+0x1b8/0x2f0 [kvm]
[ 2423.646872]  [] kvm_arch_vcpu_ioctl_run+0x9d/0x170 [kvm]
[ 2423.654980]  [] kvm_vcpu_ioctl+0x43b/0x600 [kvm]
[ 2423.662816]  [] do_vfs_ioctl+0x7c/0x2f0
[ 2423.670388]  [] SyS_ioctl+0x91/0xb0
[ 2423.677695]  [] system_call_fastpath+0x1a/0x1f
[ 2423.684854] Code: f9 81 48 d3 e6 48 39 c6 74 2a be 00 10 00 00 eb
0e 8b 4b 08 48 89 f7 48 d3 e7 48 39 c7 74 15 48 81 c3 60 0b 00 00 48
39 d3 72 e6 <8b> 0c 25 08 00 00 00 31 db 41 bc 01 00 00 00 44 89 e0 d3
e0 3d
[ 2423.699824] RIP  [] copy_huge_page+0x8a/0x2a0
[ 2423.707111]  RSP 
[ 2423.714230] CR2: 0008
[ 2423.784650] ---[ end trace f686f7a0c554a317 ]---
[ 2423.792015] BUG: unable to handle kernel NULL pointer dereference
at 0008
[ 2423.799305] IP: [] copy_huge_page+0x8a/0x2a0
[ 2423.806618] PGD 0
[ 2423.813866] Oops:  [#2] SMP
[ 2423.821032] Modules linked in: ip6table_filter ip6_tables
ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_nat
nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT
xt_CHECKSUM iptable_mangle xt_tcpudp iptable_filter ip_tables x_tables
kvm_intel kvm bridge stp ast ttm drm_kms_helper drm llc sysimgblt
sysfillrect syscopyarea ioatdma shpchp joydev dcdbas mei_me mei
mac_hid lpc_ich acpi_pad wmi lp ext2 parport hid_generic usbhid hid
ixgbe igb ahci dca i2c_algo_bit libahci ptp pps_core mdio
[ 2423.860767] CPU: 30 PID: 2868 Comm: qemu-system-x86 Tainted: G
D  3.11.0-15-generic #25~precise1-Ubuntu
[ 2423.869230] Hardware name: Dell Inc. PowerEdge C6220 II/09N44V,
BIOS 2.0.3 07/03/2013
[ 2423.877709] task: 88101de8c650 ti: 88101dcf6000 task.ti:

Re: copy_huge_page: unable to handle kernel NULL pointer dereference at 0000000000000008

2014-07-03 Thread jipan yang
to make the log more readable

881

Re: copy_huge_page: unable to handle kernel NULL pointer dereference at 0000000000000008

2014-07-03 Thread Wanpeng Li
You should also Cc the mm mailing list.
On Thu, Jul 03, 2014 at 12:57:04AM -0700, jipan yang wrote:
>Hi,
>
>I've seen the problem quite a few times.  Before spending more time on
>it, I'd like to check quickly whether anyone else has seen the same
>problem.  I hope this is a relevant question for this mailing list.
>
>
>Jul  2 11:08:21 arno-3 kernel: [ 2165.078623] BUG: unable to handle
>kernel NULL pointer dereference at 0008
>Jul  2 11:08:21 arno-3 kernel: [ 2165.078916] IP: []
>copy_huge_page+0x8a/0x2a0
>Jul  2 11:08:21 arno-3 kernel: [ 2165.079128] PGD 0
>Jul  2 11:08:21 arno-3 kernel: [ 2165.079198] Oops:  [#1] SMP
>Jul  2 11:08:21 arno-3 kernel: [ 2165.079319] Modules linked in:
>ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE
>iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4
>xt_state nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle xt_tcpudp
>iptable_filter ip_tables x_tables kvm_intel kvm bridge stp llc ast ttm
>drm_kms_helper drm sysimgblt sysfillrect syscopyarea lp mei_me ioatdma
>ext2 parport mei shpchp dcdbas joydev mac_hid lpc_ich acpi_pad wmi
>hid_generic usbhid hid ixgbe igb dca i2c_algo_bit ahci ptp libahci
>mdio pps_core
>Jul  2 11:08:21 arno-3 kernel: [ 2165.081090] CPU: 19 PID: 3494 Comm:
>qemu-system-x86 Not tainted 3.11.0-15-generic #25~precise1-Ubuntu
>Jul  2 11:08:21 arno-3 kernel: [ 2165.081424] Hardware name: Dell Inc.
>PowerEdge C6220 II/09N44V, BIOS 2.0.3 07/03/2013
>Jul  2 11:08:21 arno-3 kernel: [ 2165.081705] task: 88102675
>ti: 881026056000 task.ti: 881026056000
>Jul  2 11:08:21 arno-3 kernel: [ 2165.081973] RIP:
>0010:[]  []
>copy_huge_page+0x8a/0x2a0
>Jul  2 11:08:21 arno-3 kernel: [ 2165.082267] RSP:
>0018:881026057768  EFLAGS: 00010246
>Jul  2 11:08:21 arno-3 kernel: [ 2165.082455] RAX: 0020
>RBX: 81f9aa20 RCX: 0012
>Jul  2 11:08:21 arno-3 kernel: [ 2165.082710] RDX: 81f9aa20
>RSI: 1000 RDI: ea0077f28000
>Jul  2 11:08:21 arno-3 kernel: [ 2165.082963] RBP: 8810260577b8
>R08:  R09: 01ff
>Jul  2 11:08:21 arno-3 kernel: [ 2165.083217] R10: 
>R11: 00017960 R12: ea0077f28000
>Jul  2 11:08:21 arno-3 kernel: [ 2165.083471] R13: 0001
>R14: 02040008407d R15: ea003a9b8000
>Jul  2 11:08:21 arno-3 kernel: [ 2165.083727] FS:
>7f19d799a700() GS:88203ef2()
>knlGS:
>Jul  2 11:08:21 arno-3 kernel: [ 2165.084019] CS:  0010 DS:  ES:
> CR0: 80050033
>Jul  2 11:08:21 arno-3 kernel: [ 2165.084222] CR2: 0008
>CR3: 002023b1c000 CR4: 001427e0
>Jul  2 11:08:21 arno-3 kernel: [ 2165.084477] Stack:
>Jul  2 11:08:21 arno-3 kernel: [ 2165.084540]  881026057788
>81156460 88207fff8000 ea0077f28000
>Jul  2 11:08:21 arno-3 kernel: [ 2165.084802]  881026057798
>ea003a9b8000 ea0077f28000 0001
>Jul  2 11:08:21 arno-3 kernel: [ 2165.085064]  02040008407d
>881026f11260 8810260577e8 8119fee9
>Jul  2 11:08:21 arno-3 kernel: [ 2165.085326] Call Trace:
>Jul  2 11:08:21 arno-3 kernel: [ 2165.085418]  [] ?
>put_compound_page+0x40/0x70
>Jul  2 11:08:21 arno-3 kernel: [ 2165.085633]  []
>migrate_page_copy+0x39/0x250
>Jul  2 11:08:21 arno-3 kernel: [ 2165.085844]  []
>migrate_misplaced_transhuge_page+0x16c/0x4d0
>Jul  2 11:08:21 arno-3 kernel: [ 2165.086106]  []
>do_huge_pmd_numa_page+0x169/0x2d0
>Jul  2 11:08:21 arno-3 kernel: [ 2165.086332]  []
>handle_mm_fault+0x2c4/0x3e0
>Jul  2 11:08:21 arno-3 kernel: [ 2165.086539]  []
>__get_user_pages+0x178/0x5c0
>Jul  2 11:08:21 arno-3 kernel: [ 2165.086756]  [] ?
>gup_pmd_range+0xd0/0xf0
>Jul  2 11:08:21 arno-3 kernel: [ 2165.086972]  []
>hva_to_pfn_slow+0x9e/0x150 [kvm]
>Jul  2 11:08:21 arno-3 kernel: [ 2165.087206]  []
>hva_to_pfn+0xd5/0x210 [kvm]
>Jul  2 11:08:21 arno-3 kernel: [ 2165.087423]  [] ?
>kvm_release_pfn_clean+0x50/0x60 [kvm]
>Jul  2 11:08:21 arno-3 kernel: [ 2165.087686]  [] ?
>mmu_set_spte+0x138/0x270 [kvm]
>Jul  2 11:08:21 arno-3 kernel: [ 2165.087920]  []
>__gfn_to_pfn_memslot+0xad/0xb0 [kvm]
>Jul  2 11:08:21 arno-3 kernel: [ 2165.088166]  []
>__gfn_to_pfn+0x57/0x70 [kvm]
>Jul  2 11:08:21 arno-3 kernel: [ 2165.088389]  []
>gfn_to_pfn_async+0x1a/0x20 [kvm]
>Jul  2 11:08:21 arno-3 kernel: [ 2165.088628]  []
>try_async_pf+0x4a/0x90 [kvm]
>Jul  2 11:08:21 arno-3 kernel: [ 2165.088849]  [] ?
>kvm_host_page_size+0x9b/0xb0 [kvm]
>Jul  2 11:08:21 arno-3 kernel: [ 2165.089098]  []
>tdp_page_fault+0x10b/0x220 [kvm]
>Jul  2 11:08:21 arno-3 kernel: [ 2165.089334]  []
>kvm_mmu_page_fault+0x31/0x70 [kvm]
>Jul  2 11:08:21 arno-3 kernel: [ 2165.098035]  []
>handle_ept_violation+0x7e/0x150 [kvm_intel]
>Jul  2 11:08:21 arno-3 kernel: [ 2165.106835]  []
>vmx_handle_exit+0xa7/0x270 [kvm_intel]
>Jul  2 11:08:21 arno-3 kernel: [ 2165.115677]  []
>vcpu_enter_guest+0x447/0x770 [kvm]
>Jul  2 11:08:21 arno-3 kernel: [ 2165.124374]  [] ?
>recalc_sigpending+0x1f

copy_huge_page: unable to handle kernel NULL pointer dereference at 0000000000000008

2014-07-03 Thread jipan yang
Hi,

I've seen the problem quite a few times.  Before spending more time on
it, I'd like to check quickly whether anyone else has seen the same
problem.  I hope this is a relevant question for this mailing list.


Jul  2 11:08:21 arno-3 kernel: [ 2165.078623] BUG: unable to handle
kernel NULL pointer dereference at 0008
Jul  2 11:08:21 arno-3 kernel: [ 2165.078916] IP: []
copy_huge_page+0x8a/0x2a0
Jul  2 11:08:21 arno-3 kernel: [ 2165.079128] PGD 0
Jul  2 11:08:21 arno-3 kernel: [ 2165.079198] Oops:  [#1] SMP
Jul  2 11:08:21 arno-3 kernel: [ 2165.079319] Modules linked in:
ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE
iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4
xt_state nf_conntrack ipt_REJECT xt_CHECKSUM iptable_mangle xt_tcpudp
iptable_filter ip_tables x_tables kvm_intel kvm bridge stp llc ast ttm
drm_kms_helper drm sysimgblt sysfillrect syscopyarea lp mei_me ioatdma
ext2 parport mei shpchp dcdbas joydev mac_hid lpc_ich acpi_pad wmi
hid_generic usbhid hid ixgbe igb dca i2c_algo_bit ahci ptp libahci
mdio pps_core
Jul  2 11:08:21 arno-3 kernel: [ 2165.081090] CPU: 19 PID: 3494 Comm:
qemu-system-x86 Not tainted 3.11.0-15-generic #25~precise1-Ubuntu
Jul  2 11:08:21 arno-3 kernel: [ 2165.081424] Hardware name: Dell Inc.
PowerEdge C6220 II/09N44V, BIOS 2.0.3 07/03/2013
Jul  2 11:08:21 arno-3 kernel: [ 2165.081705] task: 88102675
ti: 881026056000 task.ti: 881026056000
Jul  2 11:08:21 arno-3 kernel: [ 2165.081973] RIP:
0010:[]  []
copy_huge_page+0x8a/0x2a0
Jul  2 11:08:21 arno-3 kernel: [ 2165.082267] RSP:
0018:881026057768  EFLAGS: 00010246
Jul  2 11:08:21 arno-3 kernel: [ 2165.082455] RAX: 0020
RBX: 81f9aa20 RCX: 0012
Jul  2 11:08:21 arno-3 kernel: [ 2165.082710] RDX: 81f9aa20
RSI: 1000 RDI: ea0077f28000
Jul  2 11:08:21 arno-3 kernel: [ 2165.082963] RBP: 8810260577b8
R08:  R09: 01ff
Jul  2 11:08:21 arno-3 kernel: [ 2165.083217] R10: 
R11: 00017960 R12: ea0077f28000
Jul  2 11:08:21 arno-3 kernel: [ 2165.083471] R13: 0001
R14: 02040008407d R15: ea003a9b8000
Jul  2 11:08:21 arno-3 kernel: [ 2165.083727] FS:
7f19d799a700() GS:88203ef2()
knlGS:
Jul  2 11:08:21 arno-3 kernel: [ 2165.084019] CS:  0010 DS:  ES:
 CR0: 80050033
Jul  2 11:08:21 arno-3 kernel: [ 2165.084222] CR2: 0008
CR3: 002023b1c000 CR4: 001427e0
Jul  2 11:08:21 arno-3 kernel: [ 2165.084477] Stack:
Jul  2 11:08:21 arno-3 kernel: [ 2165.084540]  881026057788
81156460 88207fff8000 ea0077f28000
Jul  2 11:08:21 arno-3 kernel: [ 2165.084802]  881026057798
ea003a9b8000 ea0077f28000 0001
Jul  2 11:08:21 arno-3 kernel: [ 2165.085064]  02040008407d
881026f11260 8810260577e8 8119fee9
Jul  2 11:08:21 arno-3 kernel: [ 2165.085326] Call Trace:
Jul  2 11:08:21 arno-3 kernel: [ 2165.085418]  [] ?
put_compound_page+0x40/0x70
Jul  2 11:08:21 arno-3 kernel: [ 2165.085633]  []
migrate_page_copy+0x39/0x250
Jul  2 11:08:21 arno-3 kernel: [ 2165.085844]  []
migrate_misplaced_transhuge_page+0x16c/0x4d0
Jul  2 11:08:21 arno-3 kernel: [ 2165.086106]  []
do_huge_pmd_numa_page+0x169/0x2d0
Jul  2 11:08:21 arno-3 kernel: [ 2165.086332]  []
handle_mm_fault+0x2c4/0x3e0
Jul  2 11:08:21 arno-3 kernel: [ 2165.086539]  []
__get_user_pages+0x178/0x5c0
Jul  2 11:08:21 arno-3 kernel: [ 2165.086756]  [] ?
gup_pmd_range+0xd0/0xf0
Jul  2 11:08:21 arno-3 kernel: [ 2165.086972]  []
hva_to_pfn_slow+0x9e/0x150 [kvm]
Jul  2 11:08:21 arno-3 kernel: [ 2165.087206]  []
hva_to_pfn+0xd5/0x210 [kvm]
Jul  2 11:08:21 arno-3 kernel: [ 2165.087423]  [] ?
kvm_release_pfn_clean+0x50/0x60 [kvm]
Jul  2 11:08:21 arno-3 kernel: [ 2165.087686]  [] ?
mmu_set_spte+0x138/0x270 [kvm]
Jul  2 11:08:21 arno-3 kernel: [ 2165.087920]  []
__gfn_to_pfn_memslot+0xad/0xb0 [kvm]
Jul  2 11:08:21 arno-3 kernel: [ 2165.088166]  []
__gfn_to_pfn+0x57/0x70 [kvm]
Jul  2 11:08:21 arno-3 kernel: [ 2165.088389]  []
gfn_to_pfn_async+0x1a/0x20 [kvm]
Jul  2 11:08:21 arno-3 kernel: [ 2165.088628]  []
try_async_pf+0x4a/0x90 [kvm]
Jul  2 11:08:21 arno-3 kernel: [ 2165.088849]  [] ?
kvm_host_page_size+0x9b/0xb0 [kvm]
Jul  2 11:08:21 arno-3 kernel: [ 2165.089098]  []
tdp_page_fault+0x10b/0x220 [kvm]
Jul  2 11:08:21 arno-3 kernel: [ 2165.089334]  []
kvm_mmu_page_fault+0x31/0x70 [kvm]
Jul  2 11:08:21 arno-3 kernel: [ 2165.098035]  []
handle_ept_violation+0x7e/0x150 [kvm_intel]
Jul  2 11:08:21 arno-3 kernel: [ 2165.106835]  []
vmx_handle_exit+0xa7/0x270 [kvm_intel]
Jul  2 11:08:21 arno-3 kernel: [ 2165.115677]  []
vcpu_enter_guest+0x447/0x770 [kvm]
Jul  2 11:08:21 arno-3 kernel: [ 2165.124374]  [] ?
recalc_sigpending+0x1f/0x60
Jul  2 11:08:21 arno-3 kernel: [ 2165.132901]  []
__vcpu_run+0x1b8/0x2f0 [kvm]
Jul  2 11:08:21 arno-3 kernel: [ 2165.141395]  []
kvm_arch_vcpu_ioctl_run+0x9d/0x170 [kvm]
Jul  2 11:0

Re: [PATCH] KVM: nVMX: Fix IRQs inject to L2 which belong to L1 since race

2014-07-03 Thread Jan Kiszka
On 2014-07-03 07:29, Bandan Das wrote:
> Wanpeng Li  writes:
> 
>> Hi Bandan,
>> On Wed, Jul 02, 2014 at 12:27:59PM -0400, Bandan Das wrote:
>>> Wanpeng Li  writes:
>>>
 This patch fix bug https://bugzilla.kernel.org/show_bug.cgi?id=72381 
>>> I can also reproduce this easily with Linux as L1 by "slowing it down"
>>> eg. running with ept = 0
>>>
>>> I suggest changing the subject to -
>>> KVM: nVMX: Fix race that incorrectly injects L1's irq to L2
>>>
>>
>> Ok, I will fold this to next version. ;-)
>>
 If we didn't inject a still-pending event to L1 since nested_run_pending,
 KVM_REQ_EVENT should be requested after the vmexit in order to inject the 
 event to L1. However, current log blindly request a KVM_REQ_EVENT even if 
>>>
>>> What's current "log" ? Do you mean current "code" ?
>>>
>>
>> Yeah, it's a typo. I mean "logic".
>>
>> [...]
>>> Also, I am wondering isn't it enough to just do this to avoid this race ?
>>>
>>> static int vmx_interrupt_allowed(struct kvm_vcpu *vcpu)
>>> {
>>> -   return (!to_vmx(vcpu)->nested.nested_run_pending &&
>>> +   return (!is_guest_mode(vcpu) &&
>>> +   !to_vmx(vcpu)->nested.nested_run_pending &&
>>>vmcs_readl(GUEST_RFLAGS) & X86_EFLAGS_IF) &&
>>>!(vmcs_read32(GUEST_INTERRUPTIBILITY_INFO) &
>>>
>>
>> I don't think this fixes the root cause of the race, and there are two
>> cases I'm concerned about with your proposal:
>>
>> - If there is a special L1 which doesn't ask to exit on external
>>   interrupts, you will lose the interrupts which L0 injects into L2.
> 
> Oh, I didn't think about that case :), thanks for pointing this out.
> It's easy to check this with Xen as L1, I suppose.

Xen most probably intercepts external interrupts, but Jailhouse
definitely does not. We also have a unit test for that, but it will
likely not expose the issue of lost events.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH V2 2/3] perf protect LBR when Intel PT is enabled.

2014-07-03 Thread Peter Zijlstra
On Wed, Jul 02, 2014 at 11:14:14AM -0700, kan.li...@intel.com wrote:
> From: Kan Liang 
> 
> If RTIT_CTL.TraceEn=1, any attempt to read or write the LBR or LER MSRs, 
> including LBR_TOS, will result in a #GP.
> Since Intel PT can be enabled/disabled at runtime, LBR MSRs have to be 
> protected by _safe() at runtime.

Lines are too long, and the reasoning is totally broken.

If there are active LBR users out there, we should refuse to enable PT
and vice versa. What we should not be doing is using _safe, faulting,
and generating crap.




Re: [PATCH] KVM: nVMX: Fix IRQs inject to L2 which belong to L1 since race

2014-07-03 Thread Wanpeng Li
On Thu, Jul 03, 2014 at 01:15:26AM -0400, Bandan Das wrote:
>Jan Kiszka  writes:
>
>> On 2014-07-02 08:54, Wanpeng Li wrote:
>>> This patch fix bug https://bugzilla.kernel.org/show_bug.cgi?id=72381 
>>> 
>>> If we didn't inject a still-pending event to L1 because of nested_run_pending,
>>> KVM_REQ_EVENT should be requested after the vmexit in order to inject the
>>> event to L1. However, the current logic blindly requests KVM_REQ_EVENT even
>>> if there is no still-pending event to L1 blocked by nested_run_pending.
>>> There is a race that leads to an interrupt belonging to L1 being injected
>>> into L2 if L0 sends an interrupt to L1 during this window.
>>> 
>>>VCPU0   another thread 
>>> 
>>> L1 intr not blocked on L2 first entry
>>> vmx_vcpu_run req event 
>>> kvm check request req event 
>>> check_nested_events don't have any intr 
>>> not nested exit 
>>> intr occur (8254, lapic timer 
>>> etc)
>>> inject_pending_event now have intr 
>>> inject interrupt 
>>> 
>>> This patch fixes the race by introducing an l1_events_blocked field in
>>> nested_vmx which indicates that there is a still-pending event blocked
>>> by nested_run_pending, and by requesting KVM_REQ_EVENT only when such a
>>> blocked event exists.
>>
>> There are more, unrelated reasons why KVM_REQ_EVENT could be set. Why
>> aren't those able to trigger this scenario?
>>
>> In any case, unconditionally setting KVM_REQ_EVENT seems strange and
>> should be changed.
>
>
>Ugh! I think I am hitting another one but this one's probably because 
>we are not setting KVM_REQ_EVENT for something we should.
>
>Before this patch, I was able to hit this bug every time with 
>"modprobe kvm_intel ept=0 nested=1 enable_shadow_vmcs=0" and then booting 
>L2. I can verify that I was indeed hitting the race in inject_pending_event.
>
>After this patch, I believe I am hitting another bug - this happens 
>after I boot L2, as above, and then start a Linux kernel compilation
>and then wait and watch :) It's a pain to debug because this happens

I have already tried several times with "modprobe kvm_intel ept=0 nested=1
enable_shadow_vmcs=0" and still can't reproduce the bug you mentioned.
Could you give me more details, such as which of L1 and L2 hangs or
panics? In addition, posting the call trace would be appreciated.

Regards,
Wanpeng Li 

>almost once in three times; it never happens if I run with ept=1, however,
>I think that's only because the test completes sooner. But I can confirm
>that I don't see it if I always set REQ_EVENT if nested_run_pending is set 
>instead of
>the approach this patch takes.
>(Any debug hints help appreciated!)
>
>So, I am not sure if this is the right fix. Rather, I think the safer thing
>to do is to have the interrupt pending check for injection into L1 at
>the "same site" as the call to kvm_queue_interrupt() just like we had before 
>commit b6b8a1451fc40412c57d1. Is there any advantage to having all the 
>nested events checks together ?
>
>PS - Actually, a much easier fix (or rather hack) is to return 1 in 
>vmx_interrupt_allowed() (as I mentioned elsewhere) only if 
>!is_guest_mode(vcpu). That way, the pending interrupt 
>can be taken care of correctly during the next vmexit.
>
>Bandan
>
>> Jan
>>
[...]