Re: [RFC PATCH] kvm-unit-tests : Basic architecture of VMX nested test case

2013-07-24 Thread Arthur Chunqi Li
Hi Gleb and Paolo,
What about organizing vmx_run() as follows:

static int vmx_run()
{
    u32 eax;
    bool ret;

    vmcs_write(HOST_RSP, get_rsp());
    ret = vmlaunch();
    while (!ret) {
        asm volatile(
            "vmx_return:\n\t"
            SAVE_GPR
        );
        eax = exit_handler();
        switch (eax) {
        case VMX_TEST_VMEXIT:
            return 0;
        case VMX_TEST_RESUME:
            break;
        default:
            printf("%s : unhandled ret from exit_handler.\n", __func__);
            return 1;
        }
        ret = vmresume();
    }
    printf("%s : vmenter failed.\n", __func__);
    return 1;
}

Arthur

On Fri, Jul 19, 2013 at 8:06 PM, Paolo Bonzini pbonz...@redhat.com wrote:
 On 19/07/2013 11:40, Gleb Natapov wrote:
 Because this is written in C, and I know trying to fool the compiler is
 a losing game.  So my reaction is okay, HOST_RIP must be set so that
 code will not jump around.  If I see

asm("vmlaunch");
exit(-1);

 the reaction is the opposite: hmm, anything that jumps around would
 have a hard time with the compiler, there must be some assembly
 trampoline somewhere; let's check what HOST_RIP is.

 We do try to fool the compiler often even without vmx: code patching. This is
 why asm goto was invented, no? So, like you said in previous emails,
 asm goto is the appropriate way here to tell the compiler what is going on.

 Code patching is only reimplementing an existing C structure (if/else)
 in a different manner.  Here the actual code flow (location of HOST_RIP
 and value of HOST_RSP) cannot be expressed correctly to the compiler
 because we do not use the C label (we use an asm label).

 I don't think asm goto can be made to work for vmx_return, though if we
 go for a big blob it could be useful to jump to the error handling code.

 I don't see anything bad in jumping completely to a different context.
 The guest and host are sort of like two coroutines, they hardly have any
 connection with the code that called vmlaunch.
 You can see it as two coroutines, or you can see it as linear logic:
   do host stuff
   enter guest
   do guest stuff
   exit guest
   continue doing host stuff

 Both can be implemented; I prefer the linear one. I would prefer linear logic
 to coroutines in any code design, no connection to vmx. Sometimes
 coroutines are better than the alternative (and in those cases the
 alternative is never linear logic), but this is not the case here.

 Fair enough.

 As things stand, we have only one version that works reliably with
 past/present/future compilers, and that is the one with setjmp/longjmp.
  A v5 would be needed anyway.  If Arthur (and Jan) want to write the
 vmentry as a big asm blob, that's fine by me.  Still, some variety adds
 value in a testsuite: think of a hypothetical nested VMX implementation
 that ignores HOST_RIP and just falls through to the next host %rip, we
 want that to fail the tests! (*)

 (*) In fact, I think this must be a requirement even if Arthur
 goes for a big asm blob.

 If they prefer to keep setjmp/longjmp and fix the few remaining nits, I
 think that should be acceptable anyway.  It would even make sense to
 have multiple vmentries if you can show they stress the hypervisor
 differently.

 In any case, I think we all agree (Arthur too) that this RFC doing mixed
 asm and C is the worst of both worlds.

 The actual differences in asm instructions between both
 versions will not be bigger than a couple of lines (if we do not take the
 setjmp/longjmp implementation into account :)),

 I was waiting for this parenthetical remark to appear. ;)

 I've put a smile there :) I realize this argument is not completely fair,
 but for the sake of argument everything goes!

 Yes, I caught the irony. ;)

 Paolo



-- 
Arthur Chunqi Li
Department of Computer Science
School of EECS
Peking University
Beijing, China


Re: [RFC PATCH] kvm-unit-tests : Basic architecture of VMX nested test case

2013-07-24 Thread Paolo Bonzini
On 24/07/2013 08:11, Arthur Chunqi Li wrote:
 
 static int vmx_run()
 {
 u32 eax;
 bool ret;
 
 vmcs_write(HOST_RSP, get_rsp());
 ret = vmlaunch();

The compiler can still change rsp between here...

 while (!ret) {
 asm volatile(
 "vmx_return:\n\t"

... and here.

If you want to write it in C, the only thing that can be after
vmlaunch/vmresume is exit().  Else it has to be asm.

Paolo

 SAVE_GPR
 );
 eax = exit_handler();
 switch (eax) {
 case VMX_TEST_VMEXIT:
 return 0;
 case VMX_TEST_RESUME:
 break;
 default:
 printf("%s : unhandled ret from exit_handler.\n", __func__);
 return 1;
 }
 ret = vmresume();
 }
 printf("%s : vmenter failed.\n", __func__);
 return 1;
 }



Re: [RFC PATCH] kvm-unit-tests : Basic architecture of VMX nested test case

2013-07-24 Thread Arthur Chunqi Li
On Wed, Jul 24, 2013 at 2:40 PM, Paolo Bonzini pbonz...@redhat.com wrote:
 On 24/07/2013 08:11, Arthur Chunqi Li wrote:

 static int vmx_run()
 {
 u32 eax;
 bool ret;

 vmcs_write(HOST_RSP, get_rsp());
 ret = vmlaunch();

 The compiler can still change rsp between here...

 while (!ret) {
 asm volatile(
 "vmx_return:\n\t"

 ... and here.

 If you want to write it in C, the only thing that can be after
 vmlaunch/vmresume is exit().  Else it has to be asm.
Actually, you mean we need to write all the code in asm to avoid
changes to rsp, right?

Arthur

 Paolo

 SAVE_GPR
 );
 eax = exit_handler();
 switch (eax) {
 case VMX_TEST_VMEXIT:
 return 0;
 case VMX_TEST_RESUME:
 break;
 default:
 printf("%s : unhandled ret from exit_handler.\n", __func__);
 return 1;
 }
 ret = vmresume();
 }
 printf("%s : vmenter failed.\n", __func__);
 return 1;
 }



Re: [RFC PATCH] kvm-unit-tests : Basic architecture of VMX nested test case

2013-07-24 Thread Paolo Bonzini
On 24/07/2013 08:46, Arthur Chunqi Li wrote:
 On Wed, Jul 24, 2013 at 2:40 PM, Paolo Bonzini pbonz...@redhat.com wrote:
 On 24/07/2013 08:11, Arthur Chunqi Li wrote:

 static int vmx_run()
 {
 u32 eax;
 bool ret;

 vmcs_write(HOST_RSP, get_rsp());
 ret = vmlaunch();

 The compiler can still change rsp between here...

 while (!ret) {
 asm volatile(
 "vmx_return:\n\t"

 ... and here.

 If you want to write it in C, the only thing that can be after
 vmlaunch/vmresume is exit().  Else it has to be asm.
 Actually, you mean we need to write all the code in asm to avoid
 changes to rsp, right?

Not necessarily all the code.  It is also ok to use setjmp/longjmp with
a small asm trampoline, because this method won't care about the exact
rsp values that are used.  But if you want to do as Gleb said, and put
vmx_return just after vmlaunch, it has to be all asm as in KVM's
arch/x86/kvm/vmx.c.

Paolo
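
For illustration, a minimal sketch of that setjmp/longjmp scheme. It assumes
vmcs_write() and the HOST_RSP/HOST_RIP fields from this discussion;
vmentry_buf, host_stack and vmx_return_tramp are made-up names, not the
actual kvm-unit-tests API:

#include <setjmp.h>

static jmp_buf vmentry_buf;
static char host_stack[4096] __attribute__((aligned(16)));

void vmx_exit_longjmp(void)
{
    longjmp(vmentry_buf, 1);      /* resume inside enter_guest() */
}

/* HOST_RIP points here. RSP is HOST_RSP after the VM exit, so calling
 * into C is safe, and longjmp() then restores the setjmp-time context
 * regardless of what RSP value the compiler chose around vmlaunch. */
asm (".global vmx_return_tramp\n"
     "vmx_return_tramp:\n\t"
     "call vmx_exit_longjmp");
extern char vmx_return_tramp[];

static int enter_guest(void)
{
    vmcs_write(HOST_RSP, (u64)(host_stack + sizeof(host_stack)));
    vmcs_write(HOST_RIP, (u64)vmx_return_tramp);

    if (setjmp(vmentry_buf) == 0) {
        asm volatile("vmlaunch");
        return -1;                /* fell through: VM-entry failed */
    }
    return 0;                     /* VM exit came back via the trampoline */
}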


Re: [PATCH 4/4] x86: properly handle kvm emulation of hyperv

2013-07-24 Thread Jason Wang
On 07/24/2013 12:48 PM, H. Peter Anvin wrote:
 On 07/23/2013 09:37 PM, Jason Wang wrote:
 On 07/23/2013 10:48 PM, H. Peter Anvin wrote:
 On 07/23/2013 06:55 AM, KY Srinivasan wrote:
 This strategy of hypervisor detection based on some detection order IMHO
 is not a robust detection strategy. The current scheme works since the
 only hypervisor emulated (by other hypervisors) happens to be Hyper-V.
 What if this were to change?

 One strategy would be to pick the *last* one in the CPUID list, since
 the ones before it are logically the one(s) being emulated...

 -hpa

 How about simply doing a reverse loop from 0x40010000 to 0x40000000?

 Not all systems like being poked too far into hyperspace.  Just remember
 the last match and walk the list.

   -hpa


Ok, but it raises a question - how do we know it was the 'last' match
without knowing all the signatures of the other hypervisors?



Re: [PATCH 4/4] x86: properly handle kvm emulation of hyperv

2013-07-24 Thread Paolo Bonzini
On 24/07/2013 08:54, Jason Wang wrote:
 On 07/24/2013 12:48 PM, H. Peter Anvin wrote:
 On 07/23/2013 09:37 PM, Jason Wang wrote:
 On 07/23/2013 10:48 PM, H. Peter Anvin wrote:
 On 07/23/2013 06:55 AM, KY Srinivasan wrote:
 This strategy of hypervisor detection based on some detection order IMHO
 is not a robust detection strategy. The current scheme works since the
 only hypervisor emulated (by other hypervisors) happens to be Hyper-V.
 What if this were to change?

 One strategy would be to pick the *last* one in the CPUID list, since
 the ones before it are logically the one(s) being emulated...

-hpa

 How about simply doing a reverse loop from 0x40010000 to 0x40000000?

 Not all systems like being poked too far into hyperspace.  Just remember
 the last match and walk the list.

  -hpa

 
 Ok, but it raises a question - how do we know it was the 'last' match
 without knowing all the signatures of the other hypervisors?

You can return a priority value from the .detect function.  The
priority value can simply be the CPUID leaf where the signature was
found (or a low value such as 1 if detection was done with DMI).

Then you can pick the hypervisor with the highest priority instead of
hard-coding the order.

Paolo
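
For illustration, a sketch of that scheme; the struct and function names here
are made up, not the actual x86 hypervisor detection interface:

struct hypervisor_desc {
    const char *name;
    unsigned int (*detect)(void);    /* priority; 0 = not detected */
};

static const struct hypervisor_desc *
pick_hypervisor(const struct hypervisor_desc *hv, unsigned int n)
{
    const struct hypervisor_desc *best = NULL;
    unsigned int i, best_prio = 0;

    for (i = 0; i < n; i++) {
        unsigned int prio = hv[i].detect();

        /* the highest CPUID leaf wins: an emulating hypervisor's own
         * signature sits above the one(s) it emulates */
        if (prio > best_prio) {
            best_prio = prio;
            best = &hv[i];
        }
    }
    return best;
}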


Re: VU#976534 - How to submit security bugs?

2013-07-24 Thread Stefan Hajnoczi
On Mon, Jul 22, 2013 at 02:49:50PM -0400, CERT(R) Coordination Center wrote:
   My name is Adam Rauf and I work for the CERT Coordination Center.  We
 have a report that may affect KVM/QEMU.  How can we securely send it over to
 you?  Thanks so much!

Paolo, Gleb, Anthony: Is this already being discussed off-list?

Adam: Paolo Bonzini and Gleb Natapov are the KVM kernel maintainers and
Anthony Liguori is the QEMU maintainer.  You can verify this by checking
linux.git ./MAINTAINERS and qemu.git ./MAINTAINERS.  I suggest getting
in touch with them.

Stefan


Re: Cannot load kvm_amd module - (says disabled by bios)

2013-07-24 Thread Massimiliano Adamo


On Tue, 2013-07-23 at 15:31 -0600, Alex Williamson wrote:
 On Tue, 2013-07-23 at 23:11 +0200, Massimiliano Adamo wrote:
  All, 
  
  this is a bug with KVM, impacting (at least) all mainstream kernels that
  I tried so far: 3.2, 3.3, 3.4, 3.5, 3.8, 3.10 and 3.11
  
  This is the link of the downstream bug:
  https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1201092
  
  - First of all, I mention that this bug has also been raised for Fedora
  18.
  Here is the link: https://bugzilla.redhat.com/show_bug.cgi?id=978608
  
  - I am running Ubuntu Raring (with the kernel 3.8.0-27-generic), but
  I've also tried the mainstream kernel (without Ubuntu patches).
  
  - It happens with the following CPU: AMD E-350D
  
  - The kvm-ok executable says that the system is capable of running KVM,
  but it also says that it's disabled in the BIOS.
  
  - This is the output of kvm-ok:
  # kvm-ok
  INFO: /dev/kvm does not exist
  HINT: sudo modprobe kvm_amd
  INFO: Your CPU supports KVM extensions
  INFO: KVM (svm) is disabled by your BIOS
  HINT: Enter your BIOS setup and enable Virtualization Technology (VT),
and then hard poweroff/poweron your system
  KVM acceleration can NOT be used
  
  - This is what modprobe kvm-amd says:
  # modprobe kvm-amd
  ERROR: could not insert 'kvm_amd': Operation not supported
  root@yasna:~# dmesg |tail -n1
  [ 2542.263745] kvm: disabled by bios
  
  - The AMD-V extension works correctly on VirtualBox.
  Therefore the extension works properly, but it's not recognized by KVM.
 
 What evidence do you have that Virtualbox is actually making use of
 AMD-V?
 

Hi! That's a good point.
Right now I can only say it was selectable from the GUI, so I just
assumed it's recognized.
Now I'll try to see how to enable/check logs on VirtualBox and I'll get
back to you with some feedback.



Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages

2013-07-24 Thread Alexander Graf

On 24.07.2013, at 04:26, “tiejun.chen” wrote:

 On 07/18/2013 06:27 PM, Alexander Graf wrote:
 
 On 18.07.2013, at 12:19, “tiejun.chen” wrote:
 
 On 07/18/2013 06:12 PM, Alexander Graf wrote:
 
 On 18.07.2013, at 12:08, “tiejun.chen” wrote:
 
 On 07/18/2013 05:48 PM, Alexander Graf wrote:
 
 On 18.07.2013, at 10:25, Bhushan Bharat-R65777 wrote:
 
 
 
 -Original Message-
 From: Bhushan Bharat-R65777
 Sent: Thursday, July 18, 2013 1:53 PM
 To: 'tiejun.chen'
 Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; ag...@suse.de; Wood 
 Scott-
 B07421
 Subject: RE: [PATCH 2/2] kvm: powerpc: set cache coherency only for 
 kernel
 managed pages
 
 
 
 -Original Message-
 From: "tiejun.chen" [mailto:tiejun.c...@windriver.com]
 Sent: Thursday, July 18, 2013 1:52 PM
 To: Bhushan Bharat-R65777
 Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; ag...@suse.de; Wood
 Scott-
 B07421
 Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for
 kernel managed pages
 
 On 07/18/2013 04:08 PM, Bhushan Bharat-R65777 wrote:
 
 
 -Original Message-
 From: kvm-ppc-ow...@vger.kernel.org
 [mailto:kvm-ppc-ow...@vger.kernel.org] On Behalf Of "tiejun.chen"
 Sent: Thursday, July 18, 2013 1:01 PM
 To: Bhushan Bharat-R65777
 Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; ag...@suse.de;
 Wood
 Scott-
 B07421
 Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for
 kernel managed pages
 
 On 07/18/2013 03:12 PM, Bhushan Bharat-R65777 wrote:
 
 
 -Original Message-
 From: "tiejun.chen" [mailto:tiejun.c...@windriver.com]
 Sent: Thursday, July 18, 2013 11:56 AM
 To: Bhushan Bharat-R65777
 Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; ag...@suse.de;
 Wood
 Scott- B07421; Bhushan Bharat-R65777
 Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only
 for kernel managed pages
 
 On 07/18/2013 02:04 PM, Bharat Bhushan wrote:
 If there is a struct page for the requested mapping then it's
 normal DDR and the mapping sets M bit (coherent, cacheable)
 else this is treated as I/O and we set "I + G" (cache
 inhibited,
 guarded)
 
 This helps setting proper TLB mapping for direct assigned device
 
 Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
 ---
arch/powerpc/kvm/e500_mmu_host.c |   17 -
1 files changed, 12 insertions(+), 5 deletions(-)
 
 diff --git a/arch/powerpc/kvm/e500_mmu_host.c
 b/arch/powerpc/kvm/e500_mmu_host.c
 index 1c6a9d7..089c227 100644
 --- a/arch/powerpc/kvm/e500_mmu_host.c
 +++ b/arch/powerpc/kvm/e500_mmu_host.c
 @@ -64,13 +64,20 @@ static inline u32
 e500_shadow_mas3_attrib(u32 mas3, int
 usermode)
  return mas3;
}
 
 -static inline u32 e500_shadow_mas2_attrib(u32 mas2, int
 usermode)
 +static inline u32 e500_shadow_mas2_attrib(u32 mas2, pfn_t pfn)
{
 +u32 mas2_attr;
 +
 +mas2_attr = mas2 & MAS2_ATTRIB_MASK;
 +
 +if (!pfn_valid(pfn)) {
 
 Why not directly use kvm_is_mmio_pfn()?
 
  What I understand from this function (someone can correct me) is
  that it returns false when the page is managed by the kernel and is not
  marked as RESERVED (for some reason). For us it does not matter
  whether the page is reserved or not; if it is a kernel-visible page
  then it is DDR.
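
For illustration, a minimal sketch of the mapping policy being discussed
here (the real change is the patch quoted above in
arch/powerpc/kvm/e500_mmu_host.c; the MAS2_* flag names come from the e500
MMU headers):

static inline u32 e500_shadow_mas2_attrib(u32 mas2, pfn_t pfn)
{
    u32 mas2_attr = mas2 & MAS2_ATTRIB_MASK;

    if (!pfn_valid(pfn))
        mas2_attr |= MAS2_I | MAS2_G;  /* I/O: cache-inhibited, guarded */
    else
        mas2_attr |= MAS2_M;           /* kernel-visible RAM: coherent */

    return mas2_attr;
}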
 
 
 I think you are setting I|G by addressing all mmio pages, right? If
 so,
 
  KVM: direct mmio pfn check
 
  Userspace may specify memory slots that are backed by mmio
 pages rather than
  normal RAM.  In some cases it is not enough to identify these
 mmio
 pages
  by pfn_valid().  This patch adds checking the PageReserved as 
 well.
 
 Do you know what are those some cases and how checking
 PageReserved helps in
 those cases?
 
  No, I myself didn't see these actual cases in qemu either. But this should
  be chronically persistent as I understand ;-)
 
 Then I will wait till someone educate me :)
 
  The reason is, the kvm_is_mmio_pfn() function looks pretty heavy and I do
  not want to call it for every tlbwe operation unless it is necessary.
 
 It certainly does more than we need and potentially slows down the fast 
 path (RAM mapping). The only thing it does on top of if (pfn_valid()) 
 is to check for pages that are declared reserved on the host. This 
 happens in 2 cases:
 
   1) Non cache coherent DMA
   2) Memory hot remove
 
 The non coherent DMA case would be interesting, as with the mechanism as 
 it is in place in Linux today, we could potentially break normal guest 
 operation if we don't take it into account. However, it's Kconfig 
 guarded by:
 
 depends on 4xx || 8xx || E200 || PPC_MPC512x || GAMECUBE_COMMON
 default n if PPC_47x
 default y
 
 so we never hit it with any core we care about ;).
 
 Memory hot remove does not exist on e500 FWIW, so we don't have to worry 
 about that one either.
 
 Thanks for this good information :)
 
  So why not limit that code with CONFIG_MEMORY_HOTPLUG inside
  kvm_is_mmio_pfn() to make sure that check is only done when it is
  really needed? This 

Re: [RFC PATCH] kvm-unit-tests : Basic architecture of VMX nested test case

2013-07-24 Thread Arthur Chunqi Li
So, as Gleb said, what about the following code:

static int vmx_run2()
{
    u32 eax;
    bool ret;

    asm volatile(
        "mov %%rsp, %%rsi\n\t"
        "mov %2, %%edi\n\t"
        "call vmcs_write\n\t"
        "vmlaunch\n\t"
        "setbe %0\n\t"
        "jne 4f\n\t"

        "vmx_return:\n\t"
        SAVE_GPR_C
        "call exit_handler\n\t"
        "cmp %3, %%eax\n\t"
        "je 2f\n\t"
        "cmp %4, %%eax\n\t"
        "je 1f\n\t"
        "jmp 3f\n\t"

        /* VMX_TEST_RESUME */
        "1:\n\t"
        LOAD_GPR_C
        "vmresume\n\t"
        "setbe %0\n\t"
        "jne 4f\n\t"
        /* VMX_TEST_VMEXIT */
        "2:\n\t"
        "mov $0, %1\n\t"
        "jmp 5f\n\t"
        /* undefined ret from exit_handler */
        "3:\n\t"
        "mov $2, %1\n\t"
        "jmp 5f\n\t"
        /* vmlaunch/vmresume failed, exit */
        "4:\n\t"
        "mov $1, %1\n\t"
        "5:\n\t"
        : "=r"(ret), "=r"(eax)
        : "i"(HOST_RSP), "i"(VMX_TEST_VMEXIT),
          "i"(VMX_TEST_RESUME)
        : "rax", "rbx", "rdi", "rsi",
          "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15",
          "memory", "cc"
    );
    switch (eax) {
    case 0:
        return 0;
    case 1:
        printf("%s : vmenter failed.\n", __func__);
        break;
    default:
        printf("%s : unhandled ret from exit_handler.\n", __func__);
        break;
    }
    return 1;
}

On Wed, Jul 24, 2013 at 2:48 PM, Paolo Bonzini pbonz...@redhat.com wrote:
 On 24/07/2013 08:46, Arthur Chunqi Li wrote:
 On Wed, Jul 24, 2013 at 2:40 PM, Paolo Bonzini pbonz...@redhat.com wrote:
 On 24/07/2013 08:11, Arthur Chunqi Li wrote:

 static int vmx_run()
 {
 u32 eax;
 bool ret;

 vmcs_write(HOST_RSP, get_rsp());
 ret = vmlaunch();

 The compiler can still change rsp between here...

 while (!ret) {
 asm volatile(
 "vmx_return:\n\t"

 ... and here.

 If you want to write it in C, the only thing that can be after
 vmlaunch/vmresume is exit().  Else it has to be asm.
 Actually, you mean we need to write all the code in asm to avoid
 changes to rsp, right?

 Not necessarily all the code.  It is also ok to use setjmp/longjmp with
 a small asm trampoline, because this method won't care about the exact
 rsp values that are used.  But if you want to do as Gleb said, and put
 vmx_return just after vmlaunch, it has to be all asm as in KVM's
 arch/x86/kvm/vmx.c.

 Paolo



-- 
Arthur Chunqi Li
Department of Computer Science
School of EECS
Peking University
Beijing, China


Re: [RFC PATCH] kvm-unit-tests : Basic architecture of VMX nested test case

2013-07-24 Thread Jan Kiszka
On 2013-07-24 10:48, Arthur Chunqi Li wrote:
 So, as Gleb said, what about the following code:
 
 static int vmx_run2()
 {
 u32 eax;
 bool ret;

 asm volatile(
 "mov %%rsp, %%rsi\n\t"
 "mov %2, %%edi\n\t"
 "call vmcs_write\n\t"
 "vmlaunch\n\t"

Just like in KVM, provide a flag to the asm block that selects vmlaunch
or vmresume, then grab all the required information on return and leave
the asm block quickly again.

Jan
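
For illustration, a sketch of that pattern; in reality the compare and both
entry instructions must sit inside the one big asm blob (so rsp cannot
change in between), and fail/launched are C operands wired up as shown:

static inline u8 vm_enter(bool launched)
{
    u8 fail;

    asm volatile("cmpb $0, %[launched]\n\t"
                 "jne 1f\n\t"
                 "vmlaunch\n\t"
                 "jmp 2f\n\t"            /* reached only on failed vmlaunch */
                 "1: vmresume\n\t"
                 "2: setbe %[fail]\n\t"  /* CF or ZF set: VM-entry failed */
                 : [fail] "=rm"(fail)
                 : [launched] "m"(launched)
                 : "cc", "memory");
    return fail;
}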

 "setbe %0\n\t"
 "jne 4f\n\t"

 "vmx_return:\n\t"
 SAVE_GPR_C
 "call exit_handler\n\t"
 "cmp %3, %%eax\n\t"
 "je 2f\n\t"
 "cmp %4, %%eax\n\t"
 "je 1f\n\t"
 "jmp 3f\n\t"

 /* VMX_TEST_RESUME */
 "1:\n\t"
 LOAD_GPR_C
 "vmresume\n\t"
 "setbe %0\n\t"
 "jne 4f\n\t"
 /* VMX_TEST_VMEXIT */
 "2:\n\t"
 "mov $0, %1\n\t"
 "jmp 5f\n\t"
 /* undefined ret from exit_handler */
 "3:\n\t"
 "mov $2, %1\n\t"
 "jmp 5f\n\t"
 /* vmlaunch/vmresume failed, exit */
 "4:\n\t"
 "mov $1, %1\n\t"
 "5:\n\t"
 : "=r"(ret), "=r"(eax)
 : "i"(HOST_RSP), "i"(VMX_TEST_VMEXIT),
   "i"(VMX_TEST_RESUME)
 : "rax", "rbx", "rdi", "rsi",
   "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15",
   "memory", "cc"
 );
 switch (eax) {
 case 0:
 return 0;
 case 1:
 printf("%s : vmenter failed.\n", __func__);
 break;
 default:
 printf("%s : unhandled ret from exit_handler.\n", __func__);
 break;
 }
 return 1;
 }
 
 On Wed, Jul 24, 2013 at 2:48 PM, Paolo Bonzini pbonz...@redhat.com wrote:
 On 24/07/2013 08:46, Arthur Chunqi Li wrote:
 On Wed, Jul 24, 2013 at 2:40 PM, Paolo Bonzini pbonz...@redhat.com wrote:
 On 24/07/2013 08:11, Arthur Chunqi Li wrote:

 static int vmx_run()
 {
 u32 eax;
 bool ret;

 vmcs_write(HOST_RSP, get_rsp());
 ret = vmlaunch();

 The compiler can still change rsp between here...

 while (!ret) {
 asm volatile(
 "vmx_return:\n\t"

 ... and here.

 If you want to write it in C, the only thing that can be after
 vmlaunch/vmresume is exit().  Else it has to be asm.
 Actually, you mean we need to write all the code in asm to avoid
 changes to rsp, right?

 Not necessarily all the code.  It is also ok to use setjmp/longjmp with
 a small asm trampoline, because this method won't care about the exact
 rsp values that are used.  But if you want to do as Gleb said, and put
 vmx_return just after vmlaunch, it has to be all asm as in KVM's
 arch/x86/kvm/vmx.c.

 Paolo
 
 
 






Re: Cannot load kvm_amd module - (says disabled by bios)

2013-07-24 Thread Massimiliano Adamo


On Wed, 2013-07-24 at 10:04 +0200, Massimiliano Adamo wrote:
 
 On Tue, 2013-07-23 at 15:31 -0600, Alex Williamson wrote:
  On Tue, 2013-07-23 at 23:11 +0200, Massimiliano Adamo wrote:
   All, 
   
   this is a bug with KVM, impacting (at least) all mainstream kernels that
   I tried so far: 3.2, 3.3, 3.4, 3.5, 3.8, 3.10 and 3.11
   
   This is the link of the downstream bug:
   https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1201092
   
   - First of all, I mention that this bug has also been raised for Fedora
   18.
   Here is the link: https://bugzilla.redhat.com/show_bug.cgi?id=978608
   
   - I am running Ubuntu Raring (with the kernel 3.8.0-27-generic), but
   I've also tried the mainstream kernel (without Ubuntu patches).
   
   - It happens with the following CPU: AMD E-350D
   
   - The kvm-ok executable says that the system is capable of running KVM,
   but it also says that it's disabled in the BIOS.
   
   - This is the output of kvm-ok:
   # kvm-ok
   INFO: /dev/kvm does not exist
   HINT: sudo modprobe kvm_amd
   INFO: Your CPU supports KVM extensions
   INFO: KVM (svm) is disabled by your BIOS
   HINT: Enter your BIOS setup and enable Virtualization Technology (VT),
 and then hard poweroff/poweron your system
   KVM acceleration can NOT be used
   
   - This is what modprobe kvm-amd says:
   # modprobe kvm-amd
   ERROR: could not insert 'kvm_amd': Operation not supported
   root@yasna:~# dmesg |tail -n1
   [ 2542.263745] kvm: disabled by bios
   
   - The AMD-V extension works correctly on VirtualBox.
   Therefore the extension works properly, but it's not recognized by KVM.
  
  What evidence do you have that Virtualbox is actually making use of
  AMD-V?
  
 
  Hi! That's a good point.
  Right now I can only say it was selectable from the GUI, so I just
  assumed it's recognized.
  Now I'll try to see how to enable/check logs on VirtualBox and I'll get
  back to you with some feedback.
 


Hi,

Logs are enabled by default, and they say that the extension might be
disabled even for VirtualBox:

00:00:01.406771 SVM - AMD VM Extensions= 0 (1)
00:00:01.406945 HWACCM: No VT-x or AMD-V CPU extension found. Reason
VERR_SVM_DISABLED

(Therefore, it lets you enable the AMD-V feature, because it's found on
the CPU but it's not able to use it.)

And I've found a suggestion to run the commands below: 
# rdmsr -f 12:12 0xc0000080 # If set (1), SVME is enabled.
0
#
# rdmsr -f 4:4 0xc0010114 # If set (1), it is disabled.
1

If I have time, I will try installing Windows on a spare
partition and then try running VirtualBox there (and check the logs).
If you have better advice, or any other idea, please just let me
know.

p.s.: CC-ing the person who is facing the same issue with Fedora.

--  
Massimiliano




RE: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages

2013-07-24 Thread Bhushan Bharat-R65777


 -Original Message-
 From: Alexander Graf [mailto:ag...@suse.de]
 Sent: Wednesday, July 24, 2013 1:55 PM
 To: “tiejun.chen”
 Cc: Bhushan Bharat-R65777; kvm-...@vger.kernel.org; kvm@vger.kernel.org list;
 Wood Scott-B07421; Gleb Natapov; Paolo Bonzini
 Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel
 managed pages
 
 
 On 24.07.2013, at 04:26, “tiejun.chen” wrote:
 
  On 07/18/2013 06:27 PM, Alexander Graf wrote:
 
  On 18.07.2013, at 12:19, “tiejun.chen” wrote:
 
  On 07/18/2013 06:12 PM, Alexander Graf wrote:
 
  On 18.07.2013, at 12:08, “tiejun.chen” wrote:
 
  On 07/18/2013 05:48 PM, Alexander Graf wrote:
 
  On 18.07.2013, at 10:25, Bhushan Bharat-R65777 wrote:
 
 
 
  -Original Message-
  From: Bhushan Bharat-R65777
  Sent: Thursday, July 18, 2013 1:53 PM
  To: ' tiejun.chen '
  Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org;
  ag...@suse.de; Wood Scott-
  B07421
  Subject: RE: [PATCH 2/2] kvm: powerpc: set cache coherency only
  for kernel managed pages
 
 
 
  -Original Message-
  From:  tiejun.chen  [mailto:tiejun.c...@windriver.com]
  Sent: Thursday, July 18, 2013 1:52 PM
  To: Bhushan Bharat-R65777
  Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org;
  ag...@suse.de; Wood
  Scott-
  B07421
  Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency
  only for kernel managed pages
 
  On 07/18/2013 04:08 PM, Bhushan Bharat-R65777 wrote:
 
 
  -Original Message-
  From: kvm-ppc-ow...@vger.kernel.org
  [mailto:kvm-ppc-ow...@vger.kernel.org] On Behalf Of  tiejun.chen 
  
  Sent: Thursday, July 18, 2013 1:01 PM
  To: Bhushan Bharat-R65777
  Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org;
  ag...@suse.de; Wood
  Scott-
  B07421
  Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency
  only for kernel managed pages
 
  On 07/18/2013 03:12 PM, Bhushan Bharat-R65777 wrote:
 
 
  -Original Message-
  From:  tiejun.chen  [mailto:tiejun.c...@windriver.com]
  Sent: Thursday, July 18, 2013 11:56 AM
  To: Bhushan Bharat-R65777
  Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org;
  ag...@suse.de; Wood
  Scott- B07421; Bhushan Bharat-R65777
  Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency
  only for kernel managed pages
 
  On 07/18/2013 02:04 PM, Bharat Bhushan wrote:
  If there is a struct page for the requested mapping then
  it's normal DDR and the mapping sets M bit (coherent,
   cacheable) else this is treated as I/O and we set "I +
   G" (cache inhibited,
  guarded)
 
  This helps setting proper TLB mapping for direct assigned
  device
 
  Signed-off-by: Bharat Bhushan
  bharat.bhus...@freescale.com
  ---
 arch/powerpc/kvm/e500_mmu_host.c |   17 -
 1 files changed, 12 insertions(+), 5 deletions(-)
 
  diff --git a/arch/powerpc/kvm/e500_mmu_host.c
  b/arch/powerpc/kvm/e500_mmu_host.c
  index 1c6a9d7..089c227 100644
  --- a/arch/powerpc/kvm/e500_mmu_host.c
  +++ b/arch/powerpc/kvm/e500_mmu_host.c
  @@ -64,13 +64,20 @@ static inline u32
  e500_shadow_mas3_attrib(u32 mas3, int
  usermode)
 return mas3;
 }
 
  -static inline u32 e500_shadow_mas2_attrib(u32 mas2, int
  usermode)
  +static inline u32 e500_shadow_mas2_attrib(u32 mas2,
  +pfn_t pfn)
 {
  +  u32 mas2_attr;
  +
  +  mas2_attr = mas2 & MAS2_ATTRIB_MASK;
  +
  +  if (!pfn_valid(pfn)) {
 
  Why not directly use kvm_is_mmio_pfn()?
 
   What I understand from this function (someone can correct
   me) is that it returns false when the page is managed by the kernel
   and is not marked as RESERVED (for some reason). For us it does not
   matter whether the page is reserved or not; if it is a kernel-visible
   page then it is DDR.
 
 
  I think you are setting I|G by addressing all mmio pages,
  right? If so,
 
   KVM: direct mmio pfn check
 
   Userspace may specify memory slots that are backed by
  mmio pages rather than
   normal RAM.  In some cases it is not enough to identify
  these mmio
  pages
   by pfn_valid().  This patch adds checking the PageReserved as
 well.
 
  Do you know what are those some cases and how checking
  PageReserved helps in
  those cases?
 
   No, I myself didn't see these actual cases in qemu either. But this
   should be chronically persistent as I understand ;-)
 
  Then I will wait till someone educate me :)
 
   The reason is, the kvm_is_mmio_pfn() function looks pretty heavy and I do
  not want to call it for every tlbwe operation unless it is necessary.
 
  It certainly does more than we need and potentially slows down the fast
 path (RAM mapping). The only thing it does on top of if (pfn_valid()) is to
 check for pages that are declared reserved on the host. This happens in 2 
 cases:
 
1) Non cache coherent DMA
2) Memory hot remove
 
  The non coherent DMA case would be interesting, as with the mechanism 
  as
 it is in place in Linux today, we could potentially break normal guest 
 operation
 if we don't take it into account. However, it's Kconfig guarded by:
 
  

Re: [RFC PATCH] kvm-unit-tests : Basic architecture of VMX nested test case

2013-07-24 Thread Paolo Bonzini
On 24/07/2013 10:48, Arthur Chunqi Li wrote:
 So, as Gleb said, what about the following code:
 
 static int vmx_run2()
 {
 u32 eax;
 bool ret;

 asm volatile(
 "mov %%rsp, %%rsi\n\t"
 "mov %2, %%edi\n\t"
 "call vmcs_write\n\t"
 "vmlaunch\n\t"
 "setbe %0\n\t"
 "jne 4f\n\t"

setbe doesn't set the flags, so you need jbe here (and then you can have
a mov $1, %0 at label 4 instead of using setbe).
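
That is, the opening of the template above would become (a sketch):

    "vmlaunch\n\t"
    "jbe 4f\n\t"        /* CF=1 or ZF=1: VM-entry failed */

with label 4 reduced to "4: mov $1, %0\n\t", since the failure is then
recorded there rather than via setbe.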

 
 "vmx_return:\n\t"
 SAVE_GPR_C
 "call exit_handler\n\t"
 "cmp %3, %%eax\n\t"
 "je 2f\n\t"
 "cmp %4, %%eax\n\t"
 "je 1f\n\t"
 "jmp 3f\n\t"

 /* VMX_TEST_RESUME */
 "1:\n\t"
 LOAD_GPR_C
 "vmresume\n\t"

You need a SAVE_GPR_C here.  Then it is better to put the jbe at the
vmx_return label:

... do vmlaunch or vmresume as Jan said ...
 vmx_return:
SAVE_GPR_C
jbe 4f
call exit_handler
etc.

Here is the relevant code from KVM that Jan was referring to:

"jne 1f \n\t"
__ex(ASM_VMX_VMLAUNCH) "\n\t"
"jmp 2f \n\t"
"1: " __ex(ASM_VMX_VMRESUME) "\n\t"
"2: "

I'd prefer to have a "jbe vmx_return; ud2" after the vmlaunch/vmresume,
in order to test that the hypervisor respects HOST_RIP.
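
That is, the tail after each entry instruction would look like this
(a sketch, embedded in the asm blob):

    "vmlaunch\n\t"          /* or vmresume */
    "jbe vmx_return\n\t"    /* CF/ZF set: VM-entry failed, unified exit path */
    "ud2\n\t"               /* reached only if the hypervisor ignored HOST_RIP */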

Paolo

 "setbe %0\n\t"
 "jne 4f\n\t"
 /* VMX_TEST_VMEXIT */
 "2:\n\t"
 "mov $0, %1\n\t"
 "jmp 5f\n\t"
 /* undefined ret from exit_handler */
 "3:\n\t"
 "mov $2, %1\n\t"
 "jmp 5f\n\t"
 /* vmlaunch/vmresume failed, exit */
 "4:\n\t"
 "mov $1, %1\n\t"
 "5:\n\t"
 : "=r"(ret), "=r"(eax)
 : "i"(HOST_RSP), "i"(VMX_TEST_VMEXIT),
   "i"(VMX_TEST_RESUME)
 : "rax", "rbx", "rdi", "rsi",
   "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15",
   "memory", "cc"
 );
 switch (eax) {
 case 0:
 return 0;
 case 1:
 printf("%s : vmenter failed.\n", __func__);
 break;
 default:
 printf("%s : unhandled ret from exit_handler.\n", __func__);
 break;
 }
 return 1;
 }
 
 On Wed, Jul 24, 2013 at 2:48 PM, Paolo Bonzini pbonz...@redhat.com wrote:
 On 24/07/2013 08:46, Arthur Chunqi Li wrote:
 On Wed, Jul 24, 2013 at 2:40 PM, Paolo Bonzini pbonz...@redhat.com wrote:
 On 24/07/2013 08:11, Arthur Chunqi Li wrote:

 static int vmx_run()
 {
 u32 eax;
 bool ret;

 vmcs_write(HOST_RSP, get_rsp());
 ret = vmlaunch();

 The compiler can still change rsp between here...

 while (!ret) {
 asm volatile(
 "vmx_return:\n\t"

 ... and here.

 If you want to write it in C, the only thing that can be after
 vmlaunch/vmresume is exit().  Else it has to be asm.
 Actually, you mean we need to write all the code in asm to avoid
 changes to rsp, right?

 Not necessarily all the code.  It is also ok to use setjmp/longjmp with
 a small asm trampoline, because this method won't care about the exact
 rsp values that are used.  But if you want to do as Gleb said, and put
 vmx_return just after vmlaunch, it has to be all asm as in KVM's
 arch/x86/kvm/vmx.c.

 Paolo
 
 
 



Re: [PATCH RESEND RFC V11 15/18] kvm : Paravirtual ticketlocks support for linux guests running on KVM hypervisor

2013-07-24 Thread Raghavendra K T
* Gleb Natapov g...@redhat.com [2013-07-23 18:07:48]:

 On Mon, Jul 22, 2013 at 11:50:16AM +0530, Raghavendra K T wrote:
  +static void kvm_lock_spinning(struct arch_spinlock *lock, __ticket_t want)
  +{
  +   struct kvm_lock_waiting *w;
  +   int cpu;
  +   u64 start;
  +   unsigned long flags;
  +
 Why don't you bail out if in NMI here, like we discussed?

Sorry, I misunderstood that we should ignore that part. Here
is the updated one.

---8---

kvm : Paravirtual ticketlocks support for linux guests running on KVM hypervisor

From: Srivatsa Vaddagiri va...@linux.vnet.ibm.com

During smp_boot_cpus, a paravirtualized KVM guest detects if the hypervisor has
the required feature (KVM_FEATURE_PV_UNHALT) to support pv-ticketlocks. If so,
support for pv-ticketlocks is registered via pv_lock_ops.

Use the KVM_HC_KICK_CPU hypercall to wake up a waiting/halted vcpu.

Signed-off-by: Srivatsa Vaddagiri va...@linux.vnet.ibm.com
Signed-off-by: Suzuki Poulose suz...@in.ibm.com
[Raghu: check_zero race fix, enum for kvm_contention_stat, jumplabel related 
changes,
addition of safe_halt for irq enabled case(Gleb)]
Signed-off-by: Raghavendra K T raghavendra...@linux.vnet.ibm.com
---
 arch/x86/include/asm/kvm_para.h |   14 ++
 arch/x86/kernel/kvm.c   |  262 +++
 2 files changed, 274 insertions(+), 2 deletions(-)
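
(Since the quoted patch is truncated below, here is a hedged sketch of the
kvm_spinlock_init() registration step the commit message describes; the
pv_lock_ops field names follow upstream, and the actual patch additionally
uses jump labels:)

void __init kvm_spinlock_init(void)
{
	if (!kvm_para_available())
		return;
	/* Does the host provide the PV unhalt primitive? */
	if (!kvm_para_has_feature(KVM_FEATURE_PV_UNHALT))
		return;

	pv_lock_ops.lock_spinning = PV_CALLEE_SAVE(kvm_lock_spinning);
	pv_lock_ops.unlock_kick = kvm_unlock_kick;
}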

diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index 695399f..427afcb 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -118,10 +118,20 @@ void kvm_async_pf_task_wait(u32 token);
 void kvm_async_pf_task_wake(u32 token);
 u32 kvm_read_and_reset_pf_reason(void);
 extern void kvm_disable_steal_time(void);
-#else
-#define kvm_guest_init() do { } while (0)
+
+#ifdef CONFIG_PARAVIRT_SPINLOCKS
+void __init kvm_spinlock_init(void);
+#else /* !CONFIG_PARAVIRT_SPINLOCKS */
+static inline void kvm_spinlock_init(void)
+{
+}
+#endif /* CONFIG_PARAVIRT_SPINLOCKS */
+
+#else /* CONFIG_KVM_GUEST */
+#define kvm_guest_init() do {} while (0)
 #define kvm_async_pf_task_wait(T) do {} while(0)
 #define kvm_async_pf_task_wake(T) do {} while(0)
+
 static inline u32 kvm_read_and_reset_pf_reason(void)
 {
return 0;
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index cd6d9a5..fe42970 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -34,6 +34,7 @@
 #include <linux/sched.h>
 #include <linux/slab.h>
 #include <linux/kprobes.h>
+#include <linux/debugfs.h>
 #include <asm/timer.h>
 #include <asm/cpu.h>
 #include <asm/traps.h>
@@ -419,6 +420,7 @@ static void __init kvm_smp_prepare_boot_cpu(void)
 WARN_ON(kvm_register_clock("primary cpu clock"));
kvm_guest_cpu_init();
native_smp_prepare_boot_cpu();
+   kvm_spinlock_init();
 }
 
 static void __cpuinit kvm_guest_cpu_online(void *dummy)
@@ -523,3 +525,263 @@ static __init int activate_jump_labels(void)
return 0;
 }
 arch_initcall(activate_jump_labels);
+
+/* Kick a cpu by its apicid. Used to wake up a halted vcpu */
+void kvm_kick_cpu(int cpu)
+{
+   int apicid;
+   unsigned long flags = 0;
+
+   apicid = per_cpu(x86_cpu_to_apicid, cpu);
+   kvm_hypercall2(KVM_HC_KICK_CPU, flags, apicid);
+}
+
+#ifdef CONFIG_PARAVIRT_SPINLOCKS
+
+enum kvm_contention_stat {
+   TAKEN_SLOW,
+   TAKEN_SLOW_PICKUP,
+   RELEASED_SLOW,
+   RELEASED_SLOW_KICKED,
+   NR_CONTENTION_STATS
+};
+
+#ifdef CONFIG_KVM_DEBUG_FS
+#define HISTO_BUCKETS  30
+
+static struct kvm_spinlock_stats
+{
+   u32 contention_stats[NR_CONTENTION_STATS];
+   u32 histo_spin_blocked[HISTO_BUCKETS+1];
+   u64 time_blocked;
+} spinlock_stats;
+
+static u8 zero_stats;
+
+static inline void check_zero(void)
+{
+   u8 ret;
+   u8 old;
+
+   old = ACCESS_ONCE(zero_stats);
+   if (unlikely(old)) {
+   ret = cmpxchg(&zero_stats, old, 0);
+   /* This ensures only one fellow resets the stat */
+   if (ret == old)
+   memset(&spinlock_stats, 0, sizeof(spinlock_stats));
+   }
+}
+
+static inline void add_stats(enum kvm_contention_stat var, u32 val)
+{
+   check_zero();
+   spinlock_stats.contention_stats[var] += val;
+}
+
+
+static inline u64 spin_time_start(void)
+{
+   return sched_clock();
+}
+
+static void __spin_time_accum(u64 delta, u32 *array)
+{
+   unsigned index;
+
+   index = ilog2(delta);
+   check_zero();
+
+   if (index < HISTO_BUCKETS)
+   array[index]++;
+   else
+   array[HISTO_BUCKETS]++;
+}
+
+static inline void spin_time_accum_blocked(u64 start)
+{
+   u32 delta;
+
+   delta = sched_clock() - start;
+   __spin_time_accum(delta, spinlock_stats.histo_spin_blocked);
+   spinlock_stats.time_blocked += delta;
+}
+
+static struct dentry *d_spin_debug;
+static struct dentry *d_kvm_debug;
+
+struct dentry *kvm_init_debugfs(void)
+{
+   d_kvm_debug = debugfs_create_dir("kvm", NULL);
+   if (!d_kvm_debug)
+   

Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages

2013-07-24 Thread Alexander Graf

On 24.07.2013, at 11:11, Bhushan Bharat-R65777 wrote:

 
 
 -Original Message-
 From: Alexander Graf [mailto:ag...@suse.de]
 Sent: Wednesday, July 24, 2013 1:55 PM
 To: “tiejun.chen”
 Cc: Bhushan Bharat-R65777; kvm-...@vger.kernel.org; kvm@vger.kernel.org list;
 Wood Scott-B07421; Gleb Natapov; Paolo Bonzini
 Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel
 managed pages
 
 
 On 24.07.2013, at 04:26, “tiejun.chen” wrote:
 
 On 07/18/2013 06:27 PM, Alexander Graf wrote:
 
 On 18.07.2013, at 12:19, “tiejun.chen” wrote:
 
 On 07/18/2013 06:12 PM, Alexander Graf wrote:
 
 On 18.07.2013, at 12:08, “tiejun.chen” wrote:
 
 On 07/18/2013 05:48 PM, Alexander Graf wrote:
 
 On 18.07.2013, at 10:25, Bhushan Bharat-R65777 wrote:
 
 
 
 -Original Message-
 From: Bhushan Bharat-R65777
 Sent: Thursday, July 18, 2013 1:53 PM
 To: ' tiejun.chen '
 Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org;
 ag...@suse.de; Wood Scott-
 B07421
 Subject: RE: [PATCH 2/2] kvm: powerpc: set cache coherency only
 for kernel managed pages
 
 
 
 -Original Message-
 From:  tiejun.chen  [mailto:tiejun.c...@windriver.com]
 Sent: Thursday, July 18, 2013 1:52 PM
 To: Bhushan Bharat-R65777
 Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org;
 ag...@suse.de; Wood
 Scott-
 B07421
 Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency
 only for kernel managed pages
 
 On 07/18/2013 04:08 PM, Bhushan Bharat-R65777 wrote:
 
 
 -Original Message-
 From: kvm-ppc-ow...@vger.kernel.org
 [mailto:kvm-ppc-ow...@vger.kernel.org] On Behalf Of  tiejun.chen 
 
 Sent: Thursday, July 18, 2013 1:01 PM
 To: Bhushan Bharat-R65777
 Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org;
 ag...@suse.de; Wood
 Scott-
 B07421
 Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency
 only for kernel managed pages
 
 On 07/18/2013 03:12 PM, Bhushan Bharat-R65777 wrote:
 
 
 -Original Message-
 From:  tiejun.chen  [mailto:tiejun.c...@windriver.com]
 Sent: Thursday, July 18, 2013 11:56 AM
 To: Bhushan Bharat-R65777
 Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org;
 ag...@suse.de; Wood
 Scott- B07421; Bhushan Bharat-R65777
 Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency
 only for kernel managed pages
 
 On 07/18/2013 02:04 PM, Bharat Bhushan wrote:
 If there is a struct page for the requested mapping then
 it's normal DDR and the mapping sets M bit (coherent,
  cacheable) else this is treated as I/O and we set "I +
  G" (cache inhibited,
 guarded)
 
 This helps setting proper TLB mapping for direct assigned
 device
 
 Signed-off-by: Bharat Bhushan
 bharat.bhus...@freescale.com
 ---
   arch/powerpc/kvm/e500_mmu_host.c |   17 -
   1 files changed, 12 insertions(+), 5 deletions(-)
 
 diff --git a/arch/powerpc/kvm/e500_mmu_host.c
 b/arch/powerpc/kvm/e500_mmu_host.c
 index 1c6a9d7..089c227 100644
 --- a/arch/powerpc/kvm/e500_mmu_host.c
 +++ b/arch/powerpc/kvm/e500_mmu_host.c
 @@ -64,13 +64,20 @@ static inline u32
 e500_shadow_mas3_attrib(u32 mas3, int
 usermode)
return mas3;
   }
 
 -static inline u32 e500_shadow_mas2_attrib(u32 mas2, int
 usermode)
 +static inline u32 e500_shadow_mas2_attrib(u32 mas2,
 +pfn_t pfn)
   {
 +  u32 mas2_attr;
 +
  +  mas2_attr = mas2 & MAS2_ATTRIB_MASK;
 +
 +  if (!pfn_valid(pfn)) {
 
 Why not directly use kvm_is_mmio_pfn()?
 
  What I understand from this function (someone can correct
  me) is that it returns false when the page is managed by the kernel
  and is not marked as RESERVED (for some reason). For us it does not
  matter whether the page is reserved or not; if it is a kernel-visible
  page then it is DDR.
 
 
 I think you are setting I|G by addressing all mmio pages,
 right? If so,
 
 KVM: direct mmio pfn check
 
 Userspace may specify memory slots that are backed by
 mmio pages rather than
 normal RAM.  In some cases it is not enough to identify
 these mmio
 pages
 by pfn_valid().  This patch adds checking the PageReserved as
 well.
 
 Do you know what are those some cases and how checking
 PageReserved helps in
 those cases?
 
  No, I myself didn't see these actual cases in qemu either. But this
  should be chronically persistent as I understand ;-)
 
 Then I will wait till someone educate me :)
 
  The reason is, the kvm_is_mmio_pfn() function looks pretty heavy and I do
  not want to call it for every tlbwe operation unless it is necessary.
 
 It certainly does more than we need and potentially slows down the fast
 path (RAM mapping). The only thing it does on top of if (pfn_valid()) is to
 check for pages that are declared reserved on the host. This happens in 2 
 cases:
 
  1) Non cache coherent DMA
  2) Memory hot remove
 
 The non coherent DMA case would be interesting, as with the mechanism 
 as
 it is in place in Linux today, we could potentially break normal guest 
 operation
 if we don't take it into account. However, it's Kconfig guarded by:
 
depends on 4xx || 8xx || E200 || PPC_MPC512x || GAMECUBE_COMMON
   

Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages

2013-07-24 Thread Gleb Natapov
On Wed, Jul 24, 2013 at 11:21:11AM +0200, Alexander Graf wrote:
  Aren't we going to use page_is_ram() from e500_shadow_mas2_attrib(), as
  Scott commented?
 
 Why aren't we using page_is_ram() in kvm_is_mmio_pfn()?
 
 
Because it is much slower and, IIRC, it is actually used to build the pfn map
that allows us to check quickly for a valid pfn.

--
Gleb.


Re: [PATCH RFC V11 15/18] kvm : Paravirtual ticketlocks support for linux guests running on KVM hypervisor

2013-07-24 Thread Raghavendra K T

On 07/23/2013 08:37 PM, Gleb Natapov wrote:

On Mon, Jul 22, 2013 at 11:50:16AM +0530, Raghavendra K T wrote:

+static void kvm_lock_spinning(struct arch_spinlock *lock, __ticket_t want)

[...]

+
+   /*
+* halt until it's our turn and kicked. Note that we do safe halt
+* for irq enabled case to avoid hang when lock info is overwritten
+* in irq spinlock slowpath and no spurious interrupt occur to save us.
+*/
+   if (arch_irqs_disabled_flags(flags))
+   halt();
+   else
+   safe_halt();
+
+out:

So here now interrupts can be either disabled or enabled. Previous
version disabled interrupts here, so are we sure it is safe to have them
enabled at this point? I do not see any problem yet, will keep thinking.


If we enable interrupts here, then



+   cpumask_clear_cpu(cpu, &waiting_cpus);


and if we start serving a lock for an interrupt that came in here, the
cpumask clear and w->lock = NULL may not happen atomically.
If the irq spinlock does not take the slow path, we would have a non-null
value for lock, but with no information in waiting_cpus.


I am still thinking about what the problem with that would be.


+   w->lock = NULL;
+   local_irq_restore(flags);
+   spin_time_accum_blocked(start);
+}
+PV_CALLEE_SAVE_REGS_THUNK(kvm_lock_spinning);
+
+/* Kick vcpu waiting on @lock->head to reach value @ticket */
+static void kvm_unlock_kick(struct arch_spinlock *lock, __ticket_t ticket)
+{
+   int cpu;
+
+   add_stats(RELEASED_SLOW, 1);
+   for_each_cpu(cpu, &waiting_cpus) {
+   const struct kvm_lock_waiting *w = &per_cpu(lock_waiting, cpu);
+   if (ACCESS_ONCE(w->lock) == lock &&
+   ACCESS_ONCE(w->want) == ticket) {
+   add_stats(RELEASED_SLOW_KICKED, 1);
+   kvm_kick_cpu(cpu);

What about using NMI to wake sleepers? I think it was discussed, but I
forgot why it was dismissed.


I think I have missed that discussion. I'll go back and check. So what is 
the idea here? That we can easily wake up halted vcpus that have 
interrupts disabled?




Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages

2013-07-24 Thread Alexander Graf

On 24.07.2013, at 11:35, Gleb Natapov wrote:

 On Wed, Jul 24, 2013 at 11:21:11AM +0200, Alexander Graf wrote:
  Aren't we going to use page_is_ram() from e500_shadow_mas2_attrib(), as
  Scott commented?
 
  Why aren't we using page_is_ram() in kvm_is_mmio_pfn()?
 
 
  Because it is much slower and, IIRC, it is actually used to build the pfn
  map that allows us to check quickly for a valid pfn.

Then why should we use page_is_ram()? :)

I really don't want the e500 code to diverge too much from what the rest of the 
kvm code is doing.


Alex



Re: [RFC PATCH] kvm-unit-tests : Basic architecture of VMX nested test case

2013-07-24 Thread Arthur Chunqi Li
So what about this one? I merged all the exit reasons into ret and
removed the flag detection after vmlaunch/vmresume (because I think
this detection is useless). Currently we support only one guest, so the
variable launched is located in vmx_run(). If we want to support
multiple guests, we could move it to some structure (e.g.
environment_ctxt). For now I just put it here.

static int vmx_run()
{
    u32 ret = 0;
    bool launched = 0;

    asm volatile(
        "mov %%rsp, %%rsi\n\t"
        "mov %2, %%edi\n\t"
        "call vmcs_write\n\t"

        "0: "
        LOAD_GPR_C
        "cmp $0, %1\n\t"
        "jne 1f\n\t"
        "vmlaunch\n\t"
        SAVE_GPR_C
        /* vmlaunch error, return VMX_TEST_LAUNCH_ERR */
        "mov %3, %0\n\t"
        "jmp 2f\n\t"
        "1: "
        "vmresume\n\t"
        SAVE_GPR_C
        /* vmresume error, return VMX_TEST_RESUME_ERR */
        "mov %4, %0\n\t"
        "jmp 2f\n\t"
        "vmx_return:\n\t"
        SAVE_GPR_C
        "call exit_handler\n\t"
        /* set launched = 1 */
        "mov $0x1, %1\n\t"
        "cmp %5, %%eax\n\t"
        "je 0b\n\t"
        "mov %%eax, %0\n\t"
        "2: "
        : "=r"(ret), "=r"(launched)
        : "i"(HOST_RSP), "i"(VMX_TEST_LAUNCH_ERR),
          "i"(VMX_TEST_RESUME_ERR), "i"(VMX_TEST_RESUME)
        : "rax", "rbx", "rdi", "rsi",
          "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15",
          "memory", "cc"
    );
    switch (ret) {
    case VMX_TEST_VMEXIT:
        return 0;
    case VMX_TEST_LAUNCH_ERR:
        printf("%s : vmlaunch failed.\n", __func__);
        break;
    case VMX_TEST_RESUME_ERR:
        printf("%s : vmresume failed.\n", __func__);
        break;
    default:
        printf("%s : unhandled ret from exit_handler, ret=%d.\n",
               __func__, ret);
        break;
    }
    return 1;
}

On Wed, Jul 24, 2013 at 5:16 PM, Paolo Bonzini pbonz...@redhat.com wrote:
 On 24/07/2013 10:48, Arthur Chunqi Li wrote:
 So, as Gleb said, what about the following code:

 static int vmx_run2()
 {
 u32 eax;
 bool ret;

 asm volatile(
 "mov %%rsp, %%rsi\n\t"
 "mov %2, %%edi\n\t"
 "call vmcs_write\n\t"
 "vmlaunch\n\t"
 "setbe %0\n\t"
 "jne 4f\n\t"

 setbe doesn't set the flags, so you need jbe here (and then you can have
 a mov $1, %0 at label 4 instead of using setbe).


 "vmx_return:\n\t"
 SAVE_GPR_C
 "call exit_handler\n\t"
 "cmp %3, %%eax\n\t"
 "je 2f\n\t"
 "cmp %4, %%eax\n\t"
 "je 1f\n\t"
 "jmp 3f\n\t"

 /* VMX_TEST_RESUME */
 "1:\n\t"
 LOAD_GPR_C
 "vmresume\n\t"

 You need a SAVE_GPR_C here.  Then it is better to put the jbe at the
 vmx_return label:

 ... do vmlaunch or vmresume as Jan said ...
  vmx_return:
 SAVE_GPR_C
 jbe 4f
 call exit_handler
 etc.

 Here is the relevant code from KVM that Jan was referring to:

  "jne 1f \n\t"
  __ex(ASM_VMX_VMLAUNCH) "\n\t"
  "jmp 2f \n\t"
  "1: " __ex(ASM_VMX_VMRESUME) "\n\t"
  "2: "

 I'd prefer to have a "jbe vmx_return; ud2" after the vmlaunch/vmresume,
 in order to test that the hypervisor respects HOST_RIP.

 Paolo

 "setbe %0\n\t"
 "jne 4f\n\t"
 /* VMX_TEST_VMEXIT */
 "2:\n\t"
 "mov $0, %1\n\t"
 "jmp 5f\n\t"
 /* undefined ret from exit_handler */
 "3:\n\t"
 "mov $2, %1\n\t"
 "jmp 5f\n\t"
 /* vmlaunch/vmresume failed, exit */
 "4:\n\t"
 "mov $1, %1\n\t"
 "5:\n\t"
 : "=r"(ret), "=r"(eax)
 : "i"(HOST_RSP), "i"(VMX_TEST_VMEXIT),
   "i"(VMX_TEST_RESUME)
 : "rax", "rbx", "rdi", "rsi",
   "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15",
   "memory", "cc"
 );
 switch (eax) {
 case 0:
 return 0;
 case 1:
 printf("%s : vmenter failed.\n", __func__);
 break;
 default:
 printf("%s : unhandled ret from exit_handler.\n", __func__);
 break;
 }
 return 1;
 }

 On Wed, Jul 24, 2013 at 2:48 PM, Paolo Bonzini pbonz...@redhat.com wrote:
 On 24/07/2013 08:46, Arthur Chunqi Li wrote:
 On Wed, Jul 24, 2013 at 2:40 PM, Paolo Bonzini pbonz...@redhat.com wrote:
 On 24/07/2013 08:11, Arthur Chunqi Li wrote:

 static int vmx_run()
 {
 u32 eax;
 bool ret;

 vmcs_write(HOST_RSP, get_rsp());
 ret = vmlaunch();

 The compiler can still change rsp between here...

 while (!ret) {
 asm volatile(
 "vmx_return:\n\t"

 ... and here.

 If you want to write it in C, the only thing that can be after
 vmlaunch/vmresume is exit().  Else it has to be asm.
 Actually, you mean we need to write all the code in asm to avoid
 changes to rsp, right?

 Not necessarily all the code.  It is also ok to use setjmp/longjmp with
 a small asm trampoline, because this method won't care about the exact
 rsp values that are used.  But if you want to do as Gleb said, and put
 vmx_return just after vmlaunch, it has to be all asm as in KVM's
 arch/x86/kvm/vmx.c.

 Paolo







-- 
Arthur 

Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages

2013-07-24 Thread Gleb Natapov
Copying Andrea for him to verify that I am not talking nonsense :)

On Wed, Jul 24, 2013 at 10:25:20AM +0200, Alexander Graf wrote:
  diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
  index 1580dd4..5e8635b 100644
  --- a/virt/kvm/kvm_main.c
  +++ b/virt/kvm/kvm_main.c
  @@ -102,6 +102,10 @@ static bool largepages_enabled = true;
  
  bool kvm_is_mmio_pfn(pfn_t pfn)
  {
  +#ifdef CONFIG_MEMORY_HOTPLUG
 
 I'd feel safer if we narrow this down to e500.
 
  +   /*
  +* Currently only in memory hot remove case we may still need this.
  +*/
 if (pfn_valid(pfn)) {
 
 We still have to check for pfn_valid, no? So the #ifdef should be down here.
 
 int reserved;
 struct page *tail = pfn_to_page(pfn);
  @@ -124,6 +128,7 @@ bool kvm_is_mmio_pfn(pfn_t pfn)
 }
 return PageReserved(tail);
 }
  +#endif
  
 return true;
  }
  
  Before apply this change:
  
  real(1m19.954s + 1m20.918s + 1m22.740s + 1m21.146s + 1m22.120s)/5= 
  1m21.376s
  user(0m23.181s + 0m23.550s + 0m23.506s + 0m23.410s + 0m23.520s)/5= 
  0m23.433s
  sys (0m49.087s + 0m49.563s + 0m51.758s + 0m50.290s + 0m51.047s)/5= 0m50.349s
  
  After apply this change:
  
  real(1m19.507s + 1m20.919s + 1m21.436s + 1m21.179s + 1m20.293s)/5= 
  1m20.667s
  user(0m22.595s + 0m22.719s + 0m22.484s + 0m22.811s + 0m22.467s)/5= 
  0m22.615s
  sys (0m48.841s + 0m49.929s + 0m50.310s + 0m49.813s + 0m48.587s)/5= 0m49.496s
  
  So,
  
  real(1m20.667s - 1m21.376s)/1m21.376s x 100% = -0.6%
  user(0m22.615s - 0m23.433s)/0m23.433s x 100% = -3.5%
  sys (0m49.496s - 0m50.349s)/0m50.349s x 100% = -1.7%
 
 Very nice, so there is a real world performance benefit to doing this. Then 
 yes, I think it would make sense to change the global helper function to be 
 fast on e500 and use that one from e500_shadow_mas2_attrib() instead.
 
 Gleb, Paolo, any hard feelings?
 
I do not see how we can break the function in such a way and get
away with it. Not all valid pfns point to memory. Physical addresses can
be sparse (due to PCI holes, framebuffers, or just because).

--
Gleb.


Re: [RFC PATCH] kvm-unit-tests : Basic architecture of VMX nested test case

2013-07-24 Thread Jan Kiszka
On 2013-07-24 11:56, Arthur Chunqi Li wrote:
 So what about this one? I merged all the exit reasons into ret and
 removed the flag detection after vmlaunch/vmresume (because I think
 this detection is useless). Currently we support only one guest, so the
 variable launched is located in vmx_run(). If we want to support
 multiple guests, we could move it to some structure (e.g.
 environment_ctxt). For now I just put it here.
 
 static int vmx_run()
 {
 u32 ret = 0;
 bool launched = 0;
 
 asm volatile(
 "mov %%rsp, %%rsi\n\t"
 "mov %2, %%edi\n\t"
 "call vmcs_write\n\t"

 "0: "
 LOAD_GPR_C
 "cmp $0, %1\n\t"
 "jne 1f\n\t"
 "vmlaunch\n\t"
 SAVE_GPR_C
 /* vmlaunch error, return VMX_TEST_LAUNCH_ERR */
 "mov %3, %0\n\t"
 "jmp 2f\n\t"
 "1: "
 "vmresume\n\t"
 SAVE_GPR_C
 /* vmresume error, return VMX_TEST_RESUME_ERR */
 "mov %4, %0\n\t"
 "jmp 2f\n\t"

Where do you store the flags now? You may want to differentiate / test
if ZF or CF is set.

Jan
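
For illustration, one way (a sketch) to capture both failure conditions
right after the entry instruction instead of keeping the flags in a global;
on a successful entry control leaves through HOST_RIP, so the setc/setz
only run on failure:

static inline void vm_entry_flags(u8 *fail_invalid, u8 *fail_valid)
{
    asm volatile("vmlaunch\n\t"
                 "setc %0\n\t"   /* CF: VMfailInvalid (no current VMCS) */
                 "setz %1\n\t"   /* ZF: VMfailValid; see VM-instruction error */
                 : "=rm"(*fail_invalid), "=rm"(*fail_valid)
                 : : "cc", "memory");
}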

 "vmx_return:\n\t"
 SAVE_GPR_C
 "call exit_handler\n\t"
 /* set launched = 1 */
 "mov $0x1, %1\n\t"
 "cmp %5, %%eax\n\t"
 "je 0b\n\t"
 "mov %%eax, %0\n\t"
 "2: "
 : "=r"(ret), "=r"(launched)
 : "i"(HOST_RSP), "i"(VMX_TEST_LAUNCH_ERR),
   "i"(VMX_TEST_RESUME_ERR), "i"(VMX_TEST_RESUME)
 : "rax", "rbx", "rdi", "rsi",
   "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15",
   "memory", "cc"
 );
 switch (ret) {
 case VMX_TEST_VMEXIT:
 return 0;
 case VMX_TEST_LAUNCH_ERR:
 printf("%s : vmlaunch failed.\n", __func__);
 break;
 case VMX_TEST_RESUME_ERR:
 printf("%s : vmresume failed.\n", __func__);
 break;
 default:
 printf("%s : unhandled ret from exit_handler, ret=%d.\n",
 __func__, ret);
 break;
 }
 return 1;
 }
 






Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages

2013-07-24 Thread Alexander Graf

On 24.07.2013, at 12:01, Gleb Natapov wrote:

 Copying Andrea for him to verify that I am not talking nonsense :)
 
 On Wed, Jul 24, 2013 at 10:25:20AM +0200, Alexander Graf wrote:
 diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
 index 1580dd4..5e8635b 100644
 --- a/virt/kvm/kvm_main.c
 +++ b/virt/kvm/kvm_main.c
 @@ -102,6 +102,10 @@ static bool largepages_enabled = true;
 
 bool kvm_is_mmio_pfn(pfn_t pfn)
 {
 +#ifdef CONFIG_MEMORY_HOTPLUG
 
 I'd feel safer if we narrow this down to e500.
 
 +   /*
 +* Currently only in memory hot remove case we may still need this.
 +*/
   if (pfn_valid(pfn)) {
 
 We still have to check for pfn_valid, no? So the #ifdef should be down here.
 
   int reserved;
   struct page *tail = pfn_to_page(pfn);
 @@ -124,6 +128,7 @@ bool kvm_is_mmio_pfn(pfn_t pfn)
   }
   return PageReserved(tail);
   }
 +#endif
 
   return true;
 }
 
 Before apply this change:
 
 real(1m19.954s + 1m20.918s + 1m22.740s + 1m21.146s + 1m22.120s)/5= 
 1m21.376s
 user(0m23.181s + 0m23.550s + 0m23.506s + 0m23.410s + 0m23.520s)/5= 
 0m23.433s
 sys (0m49.087s + 0m49.563s + 0m51.758s + 0m50.290s + 0m51.047s)/5= 0m50.349s
 
 After apply this change:
 
 real(1m19.507s + 1m20.919s + 1m21.436s + 1m21.179s + 1m20.293s)/5= 
 1m20.667s
 user(0m22.595s + 0m22.719s + 0m22.484s + 0m22.811s + 0m22.467s)/5= 
 0m22.615s
 sys (0m48.841s + 0m49.929s + 0m50.310s + 0m49.813s + 0m48.587s)/5= 0m49.496s
 
 So,
 
 real(1m20.667s - 1m21.376s)/1m21.376s x 100% = -0.6%
 user(0m22.615s - 0m23.433s)/0m23.433s x 100% = -3.5%
 sys (0m49.496s - 0m50.349s)/0m50.349s x 100% = -1.7%
 
 Very nice, so there is a real world performance benefit to doing this. Then 
 yes, I think it would make sense to change the global helper function to be 
 fast on e500 and use that one from e500_shadow_mas2_attrib() instead.
 
 Gleb, Paolo, any hard feelings?
 
 I do not see how can we break the function in such a way and get
 away with it. Not all valid pfns point to memory. Physical address can
 be sparse (due to PCI hole, framebuffer or just because).

But we don't check for sparseness today in here either. We merely check for 
incomplete huge pages.


Alex



Re: [RFC PATCH] kvm-unit-tests : Basic architecture of VMX nested test case

2013-07-24 Thread Arthur Chunqi Li
On Wed, Jul 24, 2013 at 6:03 PM, Jan Kiszka jan.kis...@web.de wrote:
 On 2013-07-24 11:56, Arthur Chunqi Li wrote:
 So what about this one? I merged all the exit reasons into ret and
 removed the flag detection after vmlaunch/vmresume (because I think
 this detection is useless). Currently we support only one guest, so
 the variable launched is local to vmx_run(). If we want to support
 multiple guests, we could move it into some structure (e.g.
 environment_ctxt). For now I just put it here.

 static int vmx_run()
 {
 u32 ret = 0;
 bool launched = 0;
 
 asm volatile(
 "mov %%rsp, %%rsi\n\t"
 "mov %2, %%edi\n\t"
 "call vmcs_write\n\t"
 
 "0: "
 LOAD_GPR_C
 "cmp $0, %1\n\t"
 "jne 1f\n\t"
 "vmlaunch\n\t"
 SAVE_GPR_C
 /* vmlaunch error, return VMX_TEST_LAUNCH_ERR */
 "mov %3, %0\n\t"
 "jmp 2f\n\t"
 "1: "
 "vmresume\n\t"
 SAVE_GPR_C
 /* vmresume error, return VMX_TEST_RESUME_ERR */
 "mov %4, %0\n\t"
 "jmp 2f\n\t"

 Where do you store the flags now? You may want to differentiate / test
 whether ZF or CF is set.
I store the flags in a global variable. You mean I need to check ZF/CF
after vmlaunch/vmresume?

Arthur

 Jan

 "vmx_return:\n\t"
 SAVE_GPR_C
 "call exit_handler\n\t"
 /* set launched = 1 */
 "mov $0x1, %1\n\t"
 "cmp %5, %%eax\n\t"
 "je 0b\n\t"
 "mov %%eax, %0\n\t"
 "2: "
 : "=r"(ret), "=r"(launched)
 : "i"(HOST_RSP), "i"(VMX_TEST_LAUNCH_ERR),
   "i"(VMX_TEST_RESUME_ERR), "i"(VMX_TEST_RESUME)
 : "rax", "rbx", "rdi", "rsi",
   "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15",
   "memory", "cc"
 );
 switch (ret) {
 case VMX_TEST_VMEXIT:
 return 0;
 case VMX_TEST_LAUNCH_ERR:
 printf("%s : vmlaunch failed.\n", __func__);
 break;
 case VMX_TEST_RESUME_ERR:
 printf("%s : vmresume failed.\n", __func__);
 break;
 default:
 printf("%s : unhandled ret from exit_handler, ret=%d.\n",
 __func__, ret);
 break;
 }
 return 1;
 }






-- 
Arthur Chunqi Li
Department of Computer Science
School of EECS
Peking University
Beijing, China


Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages

2013-07-24 Thread Gleb Natapov
On Wed, Jul 24, 2013 at 12:09:42PM +0200, Alexander Graf wrote:
 
 On 24.07.2013, at 12:01, Gleb Natapov wrote:
 
  Copying Andrea for him to verify that I am not talking nonsense :)
  
  On Wed, Jul 24, 2013 at 10:25:20AM +0200, Alexander Graf wrote:
  diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
  index 1580dd4..5e8635b 100644
  --- a/virt/kvm/kvm_main.c
  +++ b/virt/kvm/kvm_main.c
  @@ -102,6 +102,10 @@ static bool largepages_enabled = true;
  
  bool kvm_is_mmio_pfn(pfn_t pfn)
  {
  +#ifdef CONFIG_MEMORY_HOTPLUG
  
  I'd feel safer if we narrow this down to e500.
  
  +   /*
  +* Currently only in memory hot remove case we may still need 
  this.
  +*/
if (pfn_valid(pfn)) {
  
  We still have to check for pfn_valid, no? So the #ifdef should be down 
  here.
  
int reserved;
struct page *tail = pfn_to_page(pfn);
  @@ -124,6 +128,7 @@ bool kvm_is_mmio_pfn(pfn_t pfn)
}
return PageReserved(tail);
}
  +#endif
  
return true;
  }
  
  Before applying this change:
  
  real(1m19.954s + 1m20.918s + 1m22.740s + 1m21.146s + 1m22.120s)/5= 
  1m21.376s
  user(0m23.181s + 0m23.550s + 0m23.506s + 0m23.410s + 0m23.520s)/5= 
  0m23.433s
  sys   (0m49.087s + 0m49.563s + 0m51.758s + 0m50.290s + 0m51.047s)/5= 
  0m50.349s
  
  After applying this change:
  
  real(1m19.507s + 1m20.919s + 1m21.436s + 1m21.179s + 1m20.293s)/5= 
  1m20.667s
  user(0m22.595s + 0m22.719s + 0m22.484s + 0m22.811s + 0m22.467s)/5= 
  0m22.615s
  sys   (0m48.841s + 0m49.929s + 0m50.310s + 0m49.813s + 0m48.587s)/5= 
  0m49.496s
  
  So,
  
  real(1m20.667s - 1m21.376s)/1m21.376s x 100% = -0.6%
  user(0m22.615s - 0m23.433s)/0m23.433s x 100% = -3.5%
  sys   (0m49.496s - 0m50.349s)/0m50.349s x 100% = -1.7%
  
  Very nice, so there is a real world performance benefit to doing this. 
  Then yes, I think it would make sense to change the global helper function 
  to be fast on e500 and use that one from e500_shadow_mas2_attrib() instead.
  
  Gleb, Paolo, any hard feelings?
  
  I do not see how can we break the function in such a way and get
  away with it. Not all valid pfns point to memory. Physical address can
  be sparse (due to PCI hole, framebuffer or just because).
 
 But we don't check for sparseness today in here either. We merely check for 
 incomplete huge pages.
 
That's not how I read the code. The code checks for the reserved flag
being set. It should be set on pfns that point to memory holes. As far as I
understand the huge page tricks, they are there to guarantee that THP does
not change flags under us, Andrea?

--
Gleb.


Re: [RFC PATCH] kvm-unit-tests : Basic architecture of VMX nested test case

2013-07-24 Thread Jan Kiszka
On 2013-07-24 12:16, Arthur Chunqi Li wrote:
 On Wed, Jul 24, 2013 at 6:03 PM, Jan Kiszka jan.kis...@web.de wrote:
 On 2013-07-24 11:56, Arthur Chunqi Li wrote:
 So what about this one? I merged all the exit reasons into ret and
 removed the flag detection after vmlaunch/vmresume (because I think
 this detection is useless). Currently we support only one guest, so
 the variable launched is local to vmx_run(). If we want to support
 multiple guests, we could move it into some structure (e.g.
 environment_ctxt). For now I just put it here.

 static int vmx_run()
 {
 u32 ret = 0;
 bool launched = 0;
 
 asm volatile(
 "mov %%rsp, %%rsi\n\t"
 "mov %2, %%edi\n\t"
 "call vmcs_write\n\t"
 
 "0: "
 LOAD_GPR_C
 "cmp $0, %1\n\t"
 "jne 1f\n\t"
 "vmlaunch\n\t"
 SAVE_GPR_C
 /* vmlaunch error, return VMX_TEST_LAUNCH_ERR */
 "mov %3, %0\n\t"
 "jmp 2f\n\t"
 "1: "
 "vmresume\n\t"
 SAVE_GPR_C
 /* vmresume error, return VMX_TEST_RESUME_ERR */
 "mov %4, %0\n\t"
 "jmp 2f\n\t"

 Where do you store the flags now? You may want to differentiate / test
 whether ZF or CF is set.
 I store the flags in a global variable. You mean I need to check ZF/CF
 after vmlaunch/vmresume?

Yes - if you want to check correct emulation of those instructions
completely.

Jan
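
For reference, the SDM reports VM-entry failures only through RFLAGS: CF=1
means VMfailInvalid (no current VMCS), ZF=1 means VMfailValid (an error
number is then available in the VM-instruction error field). A minimal
sketch of such a check, assuming kvm-unit-tests' vmcs_read() helper, a
VMX_INST_ERROR encoding and the usual X86_EFLAGS_* masks are available
(the flags would have to be captured before SAVE_GPR_C or anything else
clobbers them):

static u64 host_rflags;

asm volatile("vmlaunch\n\t"
             /* on failure, capture RFLAGS before it is clobbered */
             "pushf\n\t"
             "pop %0\n\t"
             : "=r"(host_rflags) : : "cc", "memory");

if (host_rflags & X86_EFLAGS_CF)
        printf("vmlaunch: VMfailInvalid\n");
else if (host_rflags & X86_EFLAGS_ZF)
        printf("vmlaunch: VMfailValid, error %d\n",
               (int)vmcs_read(VMX_INST_ERROR));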






Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages

2013-07-24 Thread Alexander Graf

On 24.07.2013, at 12:19, Gleb Natapov wrote:

 On Wed, Jul 24, 2013 at 12:09:42PM +0200, Alexander Graf wrote:
 
 On 24.07.2013, at 12:01, Gleb Natapov wrote:
 
 Copying Andrea for him to verify that I am not talking nonsense :)
 
 On Wed, Jul 24, 2013 at 10:25:20AM +0200, Alexander Graf wrote:
 diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
 index 1580dd4..5e8635b 100644
 --- a/virt/kvm/kvm_main.c
 +++ b/virt/kvm/kvm_main.c
 @@ -102,6 +102,10 @@ static bool largepages_enabled = true;
 
 bool kvm_is_mmio_pfn(pfn_t pfn)
 {
 +#ifdef CONFIG_MEMORY_HOTPLUG
 
 I'd feel safer if we narrow this down to e500.
 
 +   /*
 +* Currently only in memory hot remove case we may still need 
 this.
 +*/
  if (pfn_valid(pfn)) {
 
 We still have to check for pfn_valid, no? So the #ifdef should be down 
 here.
 
  int reserved;
  struct page *tail = pfn_to_page(pfn);
 @@ -124,6 +128,7 @@ bool kvm_is_mmio_pfn(pfn_t pfn)
  }
  return PageReserved(tail);
  }
 +#endif
 
  return true;
 }
 
 Before applying this change:
 
 real(1m19.954s + 1m20.918s + 1m22.740s + 1m21.146s + 1m22.120s)/5= 
 1m21.376s
 user(0m23.181s + 0m23.550s + 0m23.506s + 0m23.410s + 0m23.520s)/5= 
 0m23.433s
 sys   (0m49.087s + 0m49.563s + 0m51.758s + 0m50.290s + 0m51.047s)/5= 
 0m50.349s
 
 After applying this change:
 
 real(1m19.507s + 1m20.919s + 1m21.436s + 1m21.179s + 1m20.293s)/5= 
 1m20.667s
 user(0m22.595s + 0m22.719s + 0m22.484s + 0m22.811s + 0m22.467s)/5= 
 0m22.615s
 sys   (0m48.841s + 0m49.929s + 0m50.310s + 0m49.813s + 0m48.587s)/5= 
 0m49.496s
 
 So,
 
 real(1m20.667s - 1m21.376s)/1m21.376s x 100% = -0.6%
 user(0m22.615s - 0m23.433s)/0m23.433s x 100% = -3.5%
 sys   (0m49.496s - 0m50.349s)/0m50.349s x 100% = -1.7%
 
 Very nice, so there is a real world performance benefit to doing this. 
 Then yes, I think it would make sense to change the global helper function 
 to be fast on e500 and use that one from e500_shadow_mas2_attrib() instead.
 
 Gleb, Paolo, any hard feelings?
 
 I do not see how can we break the function in such a way and get
 away with it. Not all valid pfns point to memory. Physical address can
 be sparse (due to PCI hole, framebuffer or just because).
 
 But we don't check for sparseness today in here either. We merely check for 
 incomplete huge pages.
 
 That's not how I read the code. The code checks for reserved flag set.
 It should be set on pfns that point to memory holes. As far as I

I couldn't find any traces of code that sets the reserved bits on e500 chips 
though. I've only seen it getting set for memory hotplug and memory incoherent 
DMA code which doesn't get used on e500.

But I'd be more than happy to get proven wrong :).


Alex

 understand the huge page tricks, they are there to guarantee that THP does not
 change flags under us, Andrea?
 
 --
   Gleb.


Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages

2013-07-24 Thread Gleb Natapov
On Wed, Jul 24, 2013 at 12:25:18PM +0200, Alexander Graf wrote:
 
 On 24.07.2013, at 12:19, Gleb Natapov wrote:
 
  On Wed, Jul 24, 2013 at 12:09:42PM +0200, Alexander Graf wrote:
  
  On 24.07.2013, at 12:01, Gleb Natapov wrote:
  
  Copying Andrea for him to verify that I am not talking nonsense :)
  
  On Wed, Jul 24, 2013 at 10:25:20AM +0200, Alexander Graf wrote:
  diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
  index 1580dd4..5e8635b 100644
  --- a/virt/kvm/kvm_main.c
  +++ b/virt/kvm/kvm_main.c
  @@ -102,6 +102,10 @@ static bool largepages_enabled = true;
  
  bool kvm_is_mmio_pfn(pfn_t pfn)
  {
  +#ifdef CONFIG_MEMORY_HOTPLUG
  
  I'd feel safer if we narrow this down to e500.
  
  +   /*
  +* Currently only in memory hot remove case we may still need 
  this.
  +*/
   if (pfn_valid(pfn)) {
  
  We still have to check for pfn_valid, no? So the #ifdef should be down 
  here.
  
   int reserved;
   struct page *tail = pfn_to_page(pfn);
  @@ -124,6 +128,7 @@ bool kvm_is_mmio_pfn(pfn_t pfn)
   }
   return PageReserved(tail);
   }
  +#endif
  
   return true;
  }
  
  Before applying this change:
  
  real(1m19.954s + 1m20.918s + 1m22.740s + 1m21.146s + 1m22.120s)/5= 
  1m21.376s
  user(0m23.181s + 0m23.550s + 0m23.506s + 0m23.410s + 0m23.520s)/5= 
  0m23.433s
  sys (0m49.087s + 0m49.563s + 0m51.758s + 0m50.290s + 0m51.047s)/5= 
  0m50.349s
  
  After applying this change:
  
  real(1m19.507s + 1m20.919s + 1m21.436s + 1m21.179s + 1m20.293s)/5= 
  1m20.667s
  user(0m22.595s + 0m22.719s + 0m22.484s + 0m22.811s + 0m22.467s)/5= 
  0m22.615s
  sys (0m48.841s + 0m49.929s + 0m50.310s + 0m49.813s + 0m48.587s)/5= 
  0m49.496s
  
  So,
  
  real(1m20.667s - 1m21.376s)/1m21.376s x 100% = -0.6%
  user(0m22.615s - 0m23.433s)/0m23.433s x 100% = -3.5%
  sys (0m49.496s - 0m50.349s)/0m50.349s x 100% = -1.7%
  
  Very nice, so there is a real world performance benefit to doing this. 
  Then yes, I think it would make sense to change the global helper 
  function to be fast on e500 and use that one from 
  e500_shadow_mas2_attrib() instead.
  
  Gleb, Paolo, any hard feelings?
  
  I do not see how can we break the function in such a way and get
  away with it. Not all valid pfns point to memory. Physical address can
  be sparse (due to PCI hole, framebuffer or just because).
  
  But we don't check for sparseness today in here either. We merely check 
  for incomplete huge pages.
  
  That's not how I read the code. The code checks for reserved flag set.
  It should be set on pfns that point to memory holes. As far as I
 
 I couldn't find any traces of code that sets the reserved bits on e500 chips 
 though. I've only seen it getting set for memory hotplug and memory 
 incoherent DMA code which doesn't get used on e500.
 
 But I'd be more than happy to get proven wrong :).
 
Can you write a module that scans all page structures? AFAIK all pages
are marked as reserved and then those that become regular memory are
marked as unreserved. Hope Andrea will chime in here :)

--
Gleb.
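
Something along these lines might do; a minimal, untested sketch of such a
module (hypothetical name, assuming max_pfn covers the range of interest):

#include <linux/module.h>
#include <linux/mm.h>
#include <linux/bootmem.h>

/* Walk every pfn up to max_pfn and count how many are invalid vs.
 * marked PageReserved, to see how the reserved flag is really used. */
static int __init pfnscan_init(void)
{
	unsigned long pfn, nr_invalid = 0, nr_reserved = 0;

	for (pfn = 0; pfn < max_pfn; pfn++) {
		if (!pfn_valid(pfn))
			nr_invalid++;
		else if (PageReserved(pfn_to_page(pfn)))
			nr_reserved++;
	}
	pr_info("pfnscan: %lu pfns, %lu !pfn_valid, %lu reserved\n",
		max_pfn, nr_invalid, nr_reserved);
	return 0;
}

static void __exit pfnscan_exit(void)
{
}

module_init(pfnscan_init);
module_exit(pfnscan_exit);
MODULE_LICENSE("GPL");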


Re: [PATCH RFC V11 15/18] kvm : Paravirtual ticketlocks support for linux guests running on KVM hypervisor

2013-07-24 Thread Gleb Natapov
On Wed, Jul 24, 2013 at 03:15:50PM +0530, Raghavendra K T wrote:
 On 07/23/2013 08:37 PM, Gleb Natapov wrote:
 On Mon, Jul 22, 2013 at 11:50:16AM +0530, Raghavendra K T wrote:
 +static void kvm_lock_spinning(struct arch_spinlock *lock, __ticket_t want)
 [...]
 +
 +   /*
 +* halt until it's our turn and kicked. Note that we do safe halt
 +* for irq enabled case to avoid hang when lock info is overwritten
 +* in irq spinlock slowpath and no spurious interrupt occur to save us.
 +*/
 +   if (arch_irqs_disabled_flags(flags))
 +   halt();
 +   else
 +   safe_halt();
 +
 +out:
 So here now interrupts can be either disabled or enabled. Previous
 version disabled interrupts here, so are we sure it is safe to have them
 enabled at this point? I do not see any problem yet, will keep thinking.
 
 If we enable interrupt here, then
 
 
 +   cpumask_clear_cpu(cpu, &waiting_cpus);
 
 and if we start serving a lock for an interrupt that came here,
 the cpumask clear and w->lock = NULL may not happen atomically.
 If the irq spinlock does not take the slow path we would have a non-null
 value for lock, but with no information in waiting_cpus.
 
 I am still thinking what the problem would be with that.
 
Exactly, for the kicker the waiting_cpus and w->lock updates are
non-atomic anyway.

 +   w->lock = NULL;
 +   local_irq_restore(flags);
 +   spin_time_accum_blocked(start);
 +}
 +PV_CALLEE_SAVE_REGS_THUNK(kvm_lock_spinning);
 +
 +/* Kick vcpu waiting on @lock->head to reach value @ticket */
 +static void kvm_unlock_kick(struct arch_spinlock *lock, __ticket_t ticket)
 +{
 +   int cpu;
 +
 +   add_stats(RELEASED_SLOW, 1);
 +   for_each_cpu(cpu, &waiting_cpus) {
 +   const struct kvm_lock_waiting *w = &per_cpu(lock_waiting, cpu);
 +   if (ACCESS_ONCE(w->lock) == lock &&
 +   ACCESS_ONCE(w->want) == ticket) {
 +   add_stats(RELEASED_SLOW_KICKED, 1);
 +   kvm_kick_cpu(cpu);
 What about using NMI to wake sleepers? I think it was discussed, but
 forgot why it was dismissed.
 
 I think I have missed that discussion. I'll go back and check. So
 what is the idea here? We can easily wake up the halted vcpus that
 have interrupts disabled?
We can of course. IIRC the objection was that the NMI handling path is very
fragile and handling an NMI on each wakeup will be more expensive than
waking up a guest without injecting an event, but it is still interesting
to see the numbers.

--
Gleb.


Re: [RFC PATCH] kvm-unit-tests : Basic architecture of VMX nested test case

2013-07-24 Thread Arthur Chunqi Li
And what about this version:

static int vmx_run()
{
u32 ret = 0;

asm volatile(
"mov %%rsp, %%rsi\n\t"
"mov %2, %%edi\n\t"
"call vmcs_write\n\t"

"0: "
LOAD_GPR_C
"cmpl $0, %1\n\t"
"jne 1f\n\t"
"vmlaunch;seta %1\n\t"
/* vmlaunch error, return VMX_TEST_LAUNCH_ERR */
"movl %3, %0\n\t"
SAVE_GPR_C
"jmp 2f\n\t"
"1: "
"vmresume;seta %1\n\t"
/* vmresume error, return VMX_TEST_RESUME_ERR */
"movl %4, %0\n\t"
SAVE_GPR_C
"jmp 2f\n\t"

"vmx_return:\n\t"
SAVE_GPR_C
"call exit_handler\n\t"
/* set launched = 1 */
"movl $0x1, %1\n\t"
/* jump to resume when VMX_TEST_RESUME */
"cmp %5, %%eax\n\t"
"je 0b\n\t"
"mov %%eax, %0\n\t"
"2: "
: "=m"(ret), "=m"(launched)
: "i"(HOST_RSP), "i"(VMX_TEST_LAUNCH_ERR),
  "i"(VMX_TEST_RESUME_ERR), "i"(VMX_TEST_RESUME)
: "rax", "rbx", "rdi", "rsi",
  "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15",
  "memory", "cc"
);
switch (ret) {
case VMX_TEST_VMEXIT:
launched = 0;
return 0;
case VMX_TEST_LAUNCH_ERR:
printf("%s : vmlaunch failed.\n", __func__);
if (launched != 0)
printf("\tvmlaunch set flags error\n");
report("test vmlaunch", 0);
break;
case VMX_TEST_RESUME_ERR:
printf("%s : vmresume failed.\n", __func__);
if (launched != 0)
printf("\tvmresume set flags error\n");
report("test vmresume", 0);
break;
default:
launched = 0;
printf("%s : unhandled ret from exit_handler, ret=%d.\n",
__func__, ret);
break;
}
return 1;
}

On Wed, Jul 24, 2013 at 6:24 PM, Jan Kiszka jan.kis...@web.de wrote:
 On 2013-07-24 12:16, Arthur Chunqi Li wrote:
 On Wed, Jul 24, 2013 at 6:03 PM, Jan Kiszka jan.kis...@web.de wrote:
 On 2013-07-24 11:56, Arthur Chunqi Li wrote:
 So what about this one? I merged all the exit reasons into ret and
 removed the flag detection after vmlaunch/vmresume (because I think
 this detection is useless). Currently we support only one guest, so
 the variable launched is local to vmx_run(). If we want to support
 multiple guests, we could move it into some structure (e.g.
 environment_ctxt). For now I just put it here.

 static int vmx_run()
 {
 u32 ret = 0;
 bool launched = 0;
 
 asm volatile(
 "mov %%rsp, %%rsi\n\t"
 "mov %2, %%edi\n\t"
 "call vmcs_write\n\t"
 
 "0: "
 LOAD_GPR_C
 "cmp $0, %1\n\t"
 "jne 1f\n\t"
 "vmlaunch\n\t"
 SAVE_GPR_C
 /* vmlaunch error, return VMX_TEST_LAUNCH_ERR */
 "mov %3, %0\n\t"
 "jmp 2f\n\t"
 "1: "
 "vmresume\n\t"
 SAVE_GPR_C
 /* vmresume error, return VMX_TEST_RESUME_ERR */
 "mov %4, %0\n\t"
 "jmp 2f\n\t"

 Where do you store the flags now? You may want to differentiate / test
 whether ZF or CF is set.
 I store the flags in a global variable. You mean I need to check ZF/CF
 after vmlaunch/vmresume?

 Yes - if you want to check correct emulation of those instructions
 completely.

 Jan





-- 
Arthur Chunqi Li
Department of Computer Science
School of EECS
Peking University
Beijing, China


Re: [RFC PATCH] kvm-unit-tests : Basic architecture of VMX nested test case

2013-07-24 Thread Jan Kiszka
On 2013-07-24 13:20, Arthur Chunqi Li wrote:
 And what about this version:
 
 static int vmx_run()
 {
 u32 ret = 0;
 
 asm volatile(
 "mov %%rsp, %%rsi\n\t"
 "mov %2, %%edi\n\t"
 "call vmcs_write\n\t"
 
 "0: "
 LOAD_GPR_C
 "cmpl $0, %1\n\t"
 "jne 1f\n\t"
 "vmlaunch;seta %1\n\t"

That doesn't differentiate between CF and ZF (so that you can check if
vmlaunch set the right flag).

Also, only one instruction per line, please.

Jan
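
To record the two failure kinds separately, one could use something like the
following sketch right after the VM-entry instruction (setcc does not modify
the flags, so both conditions can be sampled, one instruction per line):

u8 fail_invalid, fail_valid;

asm volatile("vmlaunch\n\t"
             "setc %0\n\t"   /* CF=1: VMfailInvalid */
             "setz %1\n\t"   /* ZF=1: VMfailValid */
             : "=q"(fail_invalid), "=q"(fail_valid)
             : : "cc", "memory");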






Re: [PATCH RFC V11 15/18] kvm : Paravirtual ticketlocks support for linux guests running on KVM hypervisor

2013-07-24 Thread Raghavendra K T

On 07/24/2013 04:09 PM, Gleb Natapov wrote:

On Wed, Jul 24, 2013 at 03:15:50PM +0530, Raghavendra K T wrote:

On 07/23/2013 08:37 PM, Gleb Natapov wrote:

On Mon, Jul 22, 2013 at 11:50:16AM +0530, Raghavendra K T wrote:

+static void kvm_lock_spinning(struct arch_spinlock *lock, __ticket_t want)

[...]

+
+   /*
+* halt until it's our turn and kicked. Note that we do safe halt
+* for irq enabled case to avoid hang when lock info is overwritten
+* in irq spinlock slowpath and no spurious interrupt occur to save us.
+*/
+   if (arch_irqs_disabled_flags(flags))
+   halt();
+   else
+   safe_halt();
+
+out:

So here now interrupts can be either disabled or enabled. Previous
version disabled interrupts here, so are we sure it is safe to have them
enabled at this point? I do not see any problem yet, will keep thinking.


If we enable interrupt here, then



+   cpumask_clear_cpu(cpu, &waiting_cpus);


and if we start serving a lock for an interrupt that came here,
the cpumask clear and w->lock = NULL may not happen atomically.
If the irq spinlock does not take the slow path we would have a non-null
value for lock, but with no information in waiting_cpus.

I am still thinking what the problem would be with that.


Exactly, for the kicker the waiting_cpus and w->lock updates are
non-atomic anyway.


+   w->lock = NULL;
+   local_irq_restore(flags);
+   spin_time_accum_blocked(start);
+}
+PV_CALLEE_SAVE_REGS_THUNK(kvm_lock_spinning);
+
+/* Kick vcpu waiting on @lock->head to reach value @ticket */
+static void kvm_unlock_kick(struct arch_spinlock *lock, __ticket_t ticket)
+{
+   int cpu;
+
+   add_stats(RELEASED_SLOW, 1);
+   for_each_cpu(cpu, &waiting_cpus) {
+   const struct kvm_lock_waiting *w = &per_cpu(lock_waiting, cpu);
+   if (ACCESS_ONCE(w->lock) == lock &&
+   ACCESS_ONCE(w->want) == ticket) {
+   add_stats(RELEASED_SLOW_KICKED, 1);
+   kvm_kick_cpu(cpu);

What about using NMI to wake sleepers? I think it was discussed, but
forgot why it was dismissed.


I think I have missed that discussion. I'll go back and check. So
what is the idea here? We can easily wake up the halted vcpus that
have interrupts disabled?

We can of course. IIRC the objection was that the NMI handling path is very
fragile and handling an NMI on each wakeup will be more expensive than
waking up a guest without injecting an event, but it is still interesting
to see the numbers.



Hmm, now I remember. We had tried a request-based mechanism (a new
request like REQ_UNHALT) and processed that. It had worked, but had some
complex hacks in vcpu_enter_guest to avoid a guest hang in case the
request was cleared. So we had left it there...

https://lkml.org/lkml/2012/4/30/67

But I do not remember the performance impact though.



Re: [PATCH RFC V11 15/18] kvm : Paravirtual ticketlocks support for linux guests running on KVM hypervisor

2013-07-24 Thread Gleb Natapov
On Wed, Jul 24, 2013 at 05:30:20PM +0530, Raghavendra K T wrote:
 On 07/24/2013 04:09 PM, Gleb Natapov wrote:
 On Wed, Jul 24, 2013 at 03:15:50PM +0530, Raghavendra K T wrote:
 On 07/23/2013 08:37 PM, Gleb Natapov wrote:
 On Mon, Jul 22, 2013 at 11:50:16AM +0530, Raghavendra K T wrote:
 +static void kvm_lock_spinning(struct arch_spinlock *lock, __ticket_t 
 want)
 [...]
 +
 + /*
 +  * halt until it's our turn and kicked. Note that we do safe halt
 +  * for irq enabled case to avoid hang when lock info is overwritten
 +  * in irq spinlock slowpath and no spurious interrupt occur to save us.
 +  */
 + if (arch_irqs_disabled_flags(flags))
 + halt();
 + else
 + safe_halt();
 +
 +out:
 So here now interrupts can be either disabled or enabled. Previous
 version disabled interrupts here, so are we sure it is safe to have them
 enabled at this point? I do not see any problem yet, will keep thinking.
 
 If we enable interrupt here, then
 
 
 + cpumask_clear_cpu(cpu, &waiting_cpus);
 
 and if we start serving a lock for an interrupt that came here,
 the cpumask clear and w->lock = NULL may not happen atomically.
 If the irq spinlock does not take the slow path we would have a non-null
 value for lock, but with no information in waiting_cpus.
 
 I am still thinking what the problem would be with that.
 
 Exactly, for the kicker the waiting_cpus and w->lock updates are
 non-atomic anyway.
 
 + w->lock = NULL;
 + local_irq_restore(flags);
 + spin_time_accum_blocked(start);
 +}
 +PV_CALLEE_SAVE_REGS_THUNK(kvm_lock_spinning);
 +
 +/* Kick vcpu waiting on @lock->head to reach value @ticket */
 +static void kvm_unlock_kick(struct arch_spinlock *lock, __ticket_t 
 ticket)
 +{
 + int cpu;
 +
 + add_stats(RELEASED_SLOW, 1);
 + for_each_cpu(cpu, &waiting_cpus) {
 + const struct kvm_lock_waiting *w = &per_cpu(lock_waiting, cpu);
 + if (ACCESS_ONCE(w->lock) == lock &&
 + ACCESS_ONCE(w->want) == ticket) {
 + add_stats(RELEASED_SLOW_KICKED, 1);
 + kvm_kick_cpu(cpu);
 What about using NMI to wake sleepers? I think it was discussed, but
 forgot why it was dismissed.
 
 I think I have missed that discussion. I'll go back and check. So
 what is the idea here? We can easily wake up the halted vcpus that
 have interrupts disabled?
 We can of course. IIRC the objection was that the NMI handling path is very
 fragile and handling an NMI on each wakeup will be more expensive than
 waking up a guest without injecting an event, but it is still interesting
 to see the numbers.
 
 
 Hmm, now I remember. We had tried a request-based mechanism (a new
 request like REQ_UNHALT) and processed that. It had worked, but had some
 complex hacks in vcpu_enter_guest to avoid a guest hang in case the
 request was cleared. So we had left it there...
 
 https://lkml.org/lkml/2012/4/30/67
 
 But I do not remember the performance impact though.
No, this is something different. Wakeup with NMI does not need KVM changes at
all. Instead of kvm_kick_cpu(cpu) in kvm_unlock_kick you send an NMI IPI.

--
Gleb.


Re: VU#976534 - How to submit security bugs?

2013-07-24 Thread Anthony Liguori
CERT(R) Coordination Center c...@cert.org writes:

 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA1

 Greetings,
   My name is Adam Rauf and I work for the CERT Coordination Center.  We
 have a report that may affect KVM/QEMU.  How can we securely send it over to
 you?  Thanks so much!

For QEMU bugs, please file a bug in Launchpad and mark it as a security
bug.  That will appropriately limit visibility.

http://launchpad.net/qemu

If you want to contact me directly, my public key is:

http://www.codemonkey.ws/files/aliguori.pub

You can verify that this key is what is used to sign QEMU releases at:

http://wiki.qemu.org/Download

Regards,

Anthony Liguori


 Adam Rauf
 Software Engineering Institute
 CERT Vulnerability Analysis Team

 -BEGIN PGP SIGNATURE-
 Version: GnuPG v1.4.5 (GNU/Linux)

 iQEVAwUBUe2DstXCAanP4MNyAQI8nwf/eTb1Qox5lmgMHifDKRjj69E37FW+o5Jp
 KMIP6+IgKdWQizPctXk2Gae50a+ioaXgkCGZZ7SwNJ9iE/AX2I32QvX6pZrDCBGw
 l5Ht6UiwOLUTP3sKWO9AIYcgTDABzyNE2+bCGvDz8aqwLB8NNVqQ50f46TrQNlmB
 oiG+XzskRG0BAxKTwWc8f4v+1hdqMtp811I7XmxXkAdtlmTWPHZfPiFs0dS++Puh
 T0uLuC4nDo83hP6Yv8seMZKZZApFGfR+q4qKx7f6riNsa5v1zGgW2if++u+zRKvg
 DvjLxjtRfE9JGmCZMBcFmRJ5y4Wx/m/2wtj2+a7D/D2Hd9L5LRB0lA==
 =npI1
 -END PGP SIGNATURE-


Re: [PATCH RFC V11 15/18] kvm : Paravirtual ticketlocks support for linux guests running on KVM hypervisor

2013-07-24 Thread Raghavendra K T

On 07/24/2013 05:36 PM, Gleb Natapov wrote:

On Wed, Jul 24, 2013 at 05:30:20PM +0530, Raghavendra K T wrote:

On 07/24/2013 04:09 PM, Gleb Natapov wrote:

On Wed, Jul 24, 2013 at 03:15:50PM +0530, Raghavendra K T wrote:

On 07/23/2013 08:37 PM, Gleb Natapov wrote:

On Mon, Jul 22, 2013 at 11:50:16AM +0530, Raghavendra K T wrote:

+static void kvm_lock_spinning(struct arch_spinlock *lock, __ticket_t want)

[...]

+
+   /*
+* halt until it's our turn and kicked. Note that we do safe halt
+* for irq enabled case to avoid hang when lock info is overwritten
+* in irq spinlock slowpath and no spurious interrupt occur to save us.
+*/
+   if (arch_irqs_disabled_flags(flags))
+   halt();
+   else
+   safe_halt();
+
+out:

So here now interrupts can be either disabled or enabled. Previous
version disabled interrupts here, so are we sure it is safe to have them
enabled at this point? I do not see any problem yet, will keep thinking.


If we enable interrupt here, then



+   cpumask_clear_cpu(cpu, &waiting_cpus);


and if we start serving a lock for an interrupt that came here,
the cpumask clear and w->lock = NULL may not happen atomically.
If the irq spinlock does not take the slow path we would have a non-null
value for lock, but with no information in waiting_cpus.

I am still thinking what the problem would be with that.


Exactly, for the kicker the waiting_cpus and w->lock updates are
non-atomic anyway.


+   w->lock = NULL;
+   local_irq_restore(flags);
+   spin_time_accum_blocked(start);
+}
+PV_CALLEE_SAVE_REGS_THUNK(kvm_lock_spinning);
+
+/* Kick vcpu waiting on @lock->head to reach value @ticket */
+static void kvm_unlock_kick(struct arch_spinlock *lock, __ticket_t ticket)
+{
+   int cpu;
+
+   add_stats(RELEASED_SLOW, 1);
+   for_each_cpu(cpu, &waiting_cpus) {
+   const struct kvm_lock_waiting *w = &per_cpu(lock_waiting, cpu);
+   if (ACCESS_ONCE(w->lock) == lock &&
+   ACCESS_ONCE(w->want) == ticket) {
+   add_stats(RELEASED_SLOW_KICKED, 1);
+   kvm_kick_cpu(cpu);

What about using NMI to wake sleepers? I think it was discussed, but
forgot why it was dismissed.


I think I have missed that discussion. I'll go back and check. So
what is the idea here? We can easily wake up the halted vcpus that
have interrupts disabled?

We can of course. IIRC the objection was that the NMI handling path is very
fragile and handling an NMI on each wakeup will be more expensive than
waking up a guest without injecting an event, but it is still interesting
to see the numbers.



Hmm, now I remember. We had tried a request-based mechanism (a new
request like REQ_UNHALT) and processed that. It had worked, but had some
complex hacks in vcpu_enter_guest to avoid a guest hang in case the
request was cleared. So we had left it there...

https://lkml.org/lkml/2012/4/30/67

But I do not remember the performance impact though.

No, this is something different. Wakeup with NMI does not need KVM changes at
all. Instead of kvm_kick_cpu(cpu) in kvm_unlock_kick you send NMI IPI.



True, it was not NMI.
Just to confirm, are you talking about something like this to be tried?

apic->send_IPI_mask(cpumask_of(cpu), APIC_DM_NMI);
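
In the patch's terms that experiment would be roughly the following sketch
(hypothetical function name): replace the kick hypercall with an NMI IPI and
leave the guest's NMI handler as a no-op, so the halted vcpu simply falls out
of halt()/safe_halt() and rechecks its ticket:

static void kvm_kick_cpu_nmi(int cpu)
{
	/* No event is injected; the halted vcpu just resumes after
	 * halt() and rechecks the lock in kvm_lock_spinning(). */
	apic->send_IPI_mask(cpumask_of(cpu), APIC_DM_NMI);
}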



[ANNOUNCE] Key Signing Party at KVM Forum 2013

2013-07-24 Thread Anthony Liguori

I will be hosting a key signing party at this year's KVM Forum.

http://wiki.qemu.org/KeySigningParty2013

Starting with the 1.7 release (development begins in December), I will only
accept signed pull requests, so please try to attend this event or make
alternative arrangements to have your key signed by someone who will attend
the event.

I will also be attending LinuxCon/CloudOpen/Plumbers North America if
anyone wants to have another key signing party at that event and cannot
attend KVM Forum.

Regards,

Anthony Liguori


Re: Cannot load kvm_amd module - (says disabled by bios)

2013-07-24 Thread Alex Williamson
On Wed, 2013-07-24 at 11:09 +0200, Massimiliano Adamo wrote:
 
 On Wed, 2013-07-24 at 10:04 +0200, Massimiliano Adamo wrote:
  
  On Tue, 2013-07-23 at 15:31 -0600, Alex Williamson wrote:
   On Tue, 2013-07-23 at 23:11 +0200, Massimiliano Adamo wrote:
All, 

this is a bug with KVM, impacting (at least) all mainstream kernels that
I tried so far: 3.2, 3.3, 3.4, 3.5, 3.8, 3.10 and 3.11

This is the link of the downstream bug:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1201092

- First of all, I mention that this bug has been raised also for Fedora
18.
Here is the link: https://bugzilla.redhat.com/show_bug.cgi?id=978608

- I am running Ubuntu Raring (with the kernel 3.8.0-27-generic), but
I've also tried the mainstream kernel (without Ubuntu patches).

- It happens with the following CPU: AMD E-350D

- The kvm-ok executable says that the system is capable of running KVM,
but it also says that it's disabled in the BIOS.

- This is the output of kvm-ok:
# kvm-ok
INFO: /dev/kvm does not exist
HINT: sudo modprobe kvm_amd
INFO: Your CPU supports KVM extensions
INFO: KVM (svm) is disabled by your BIOS
HINT: Enter your BIOS setup and enable Virtualization Technology (VT),
  and then hard poweroff/poweron your system
KVM acceleration can NOT be used

- This is what modprobe kvm-amd says:
# modprobe kvm-amd
ERROR: could not insert 'kvm_amd': Operation not supported
root@yasna:~# dmesg |tail -n1
[ 2542.263745] kvm: disabled by bios

- AMD-V extension on Virtualbox works correctly.
Therefore the extension works properly but it's not recognized by KVM.
   
   What evidence do you have that Virtualbox is actually making use of
   AMD-V?
   
  
  Hi! That's a good point. 
  Right now I can only say it was selectable from the GUI, therefore I
  just assumed it's recognized.
  Now I'll try to see how to enable/check logs on VirtualBox and I'll get
  back to you with some feedback.
  
 
 
 Hi,
 
 logs are enabled by default and they say that the extension might be
 disabled even for Virtualbox:
 
 00:00:01.406771 SVM - AMD VM Extensions= 0 (1)
 00:00:01.406945 HWACCM: No VT-x or AMD-V CPU extension found. Reason
 VERR_SVM_DISABLED
 
 (therefore, it lets you enable the AMD-V feature, because it's found on
 the CPU, but it's not able to use it). 
 
 And I've found a suggestion to run the commands below: 
 # rdmsr -f 12:12 0xc0000080 # If set (1), SVME is enabled.
 0
 #
 # rdmsr -f 4:4 0xc0010114 # If set (1), it is disabled.
 1
 
 If I have time, I will try installing Windows on a spare
 partition, and then try running VirtualBox there (and check the logs). 
 If you have a better suggestion, or any other idea, please just let me
 know. 
 
 p.s.: CC-ing the person who is facing the same issue with Fedora.


It looks like your BIOS is broken, contact the vendor.  You might try
setting it to disabled in the BIOS to make sure it's not wired
backwards.  There also used to be some FUD about fully powering off the
system around making changes to these settings.  You might try
unplugging the system for a few seconds and testing from there.  Thanks,

Alex




RE: [PATCH 4/4] x86: properly handle kvm emulation of hyperv

2013-07-24 Thread KY Srinivasan


 -Original Message-
 From: Paolo Bonzini [mailto:paolo.bonz...@gmail.com] On Behalf Of Paolo
 Bonzini
 Sent: Wednesday, July 24, 2013 3:07 AM
 To: Jason Wang
 Cc: H. Peter Anvin; KY Srinivasan; t...@linutronix.de; mi...@redhat.com;
 x...@kernel.org; g...@redhat.com; kvm@vger.kernel.org; linux-
 ker...@vger.kernel.org
 Subject: Re: [PATCH 4/4] x86: properly handle kvm emulation of hyperv
 
 Il 24/07/2013 08:54, Jason Wang ha scritto:
  On 07/24/2013 12:48 PM, H. Peter Anvin wrote:
  On 07/23/2013 09:37 PM, Jason Wang wrote:
  On 07/23/2013 10:48 PM, H. Peter Anvin wrote:
  On 07/23/2013 06:55 AM, KY Srinivasan wrote:
   This strategy of hypervisor detection based on some detection order IMHO
   is not a robust detection strategy. The current scheme works since the
   only hypervisor emulated by other hypervisors happens to be Hyper-V.
   What if this were to change?
 
  One strategy would be to pick the *last* one in the CPUID list, since
  the ones before it are logically the one(s) being emulated...
 
   -hpa
 
   How about simply doing a reverse loop from 0x40010000 down to 0x40000000?
 
  Not all systems like being poked too far into hyperspace.  Just remember
  the last match and walk the list.
 
 -hpa
 
 
   Ok, but it raises a question - how do we know it was the 'last' match
   without knowing the signatures of all the other hypervisors?
 
 You can return a priority value from the .detect function.  The
 priority value can simply be the CPUID leaf where the signature was
 found (or a low value such as 1 if detection was done with DMI).
 
 Then you can pick the hypervisor with the highest priority instead of
 hard-coding the order.

I like this idea; this allows some guest-level control, which is what we want
when we have hypervisors emulating each other.
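
For illustration, the suggestion might look roughly like this sketch against
the x86 hypervisor framework (the uint32_t return from .detect() and the
names are assumptions, not the current code):

/* Each entry's .detect() returns the CPUID leaf where its signature
 * was found (0 = not detected); pick the highest one instead of
 * relying on probe order. */
static void detect_hypervisor_vendor(void)
{
	const struct hypervisor_x86 *h, *max_hyper = NULL;
	uint32_t pri, max_pri = 0;
	int i;

	for (i = 0; i < ARRAY_SIZE(hypervisors); i++) {
		h = hypervisors[i];
		pri = h->detect();
		if (pri > max_pri) {
			max_pri = pri;
			max_hyper = h;
		}
	}
	if (max_hyper)
		x86_hyper = max_hyper;
}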


Regards,

K. Y 
 
 Paolo
 
 





Re: [PATCH] intel-iommu: Fix leaks in pagetable freeing

2013-07-24 Thread Alex Williamson

This is a pretty massive memory leak, anyone @Intel care?  Thanks,

Alex

On Sat, 2013-06-15 at 10:27 -0600, Alex Williamson wrote:
 At best the current code only seems to free the leaf pagetables and
 the root.  If you're unlucky enough to have a large gap (like any
 QEMU guest with more than 3G of memory), only the first chunk of leaf
 pagetables are freed (plus the root).  This is a massive memory leak.
 This patch re-writes the pagetable freeing function to use a
 recursive algorithm and manages to not only free all the pagetables,
 but does it without any apparent performance loss versus the current
 broken version.
 
 Signed-off-by: Alex Williamson alex.william...@redhat.com
 Cc: sta...@vger.kernel.org
 ---
 
 Suggesting for stable, would like to see some soak time, but it's
 hard to imagine this being any worse than the current code.
 
 This likely also affects device domains, but the current code does
 ok at freeing individual leaf pagetables and driver domains would
 only get a full pruning if the driver or device is removed.
 
 Some test programs:
 https://github.com/awilliam/tests/blob/master/kvm-huge-guest-test.c
 https://github.com/awilliam/tests/blob/master/vfio-huge-guest-test.c
 
 Both of these simulate a large guest on a small host system.  They
 mmap 4G of memory and map it across a large address space just like
 QEMU would (aside from re-using the same mmap across multiple IOVAs).
 On existing code the vfio version (w/o a KVM memory slot limit) will
 leak over 1G of pagetables per run.
 
  drivers/iommu/intel-iommu.c |   72 +--
  1 file changed, 35 insertions(+), 37 deletions(-)
 
 diff --git a/drivers/iommu/intel-iommu.c b/drivers/iommu/intel-iommu.c
 index eec0d3e..15e9b57 100644
 --- a/drivers/iommu/intel-iommu.c
 +++ b/drivers/iommu/intel-iommu.c
 @@ -890,56 +890,54 @@ static int dma_pte_clear_range(struct dmar_domain *domain,
   return order;
  }
  
 +static void dma_pte_free_level(struct dmar_domain *domain, int level,
 +			       struct dma_pte *pte, unsigned long pfn,
 +			       unsigned long start_pfn, unsigned long last_pfn)
 +{
 +	pfn = max(start_pfn, pfn);
 +	pte = &pte[pfn_level_offset(pfn, level)];
 +
 +	do {
 +		unsigned long level_pfn;
 +		struct dma_pte *level_pte;
 +
 +		if (!dma_pte_present(pte) || dma_pte_superpage(pte))
 +			goto next;
 +
 +		level_pfn = pfn & level_mask(level - 1);
 +		level_pte = phys_to_virt(dma_pte_addr(pte));
 +
 +		if (level > 2)
 +			dma_pte_free_level(domain, level - 1, level_pte,
 +					   level_pfn, start_pfn, last_pfn);
 +
 +		/* If range covers entire pagetable, free it */
 +		if (!(start_pfn > level_pfn ||
 +		      last_pfn < level_pfn + level_size(level))) {
 +			dma_clear_pte(pte);
 +			domain_flush_cache(domain, pte, sizeof(*pte));
 +			free_pgtable_page(level_pte);
 +		}
 +next:
 +		pfn += level_size(level);
 +	} while (!first_pte_in_page(++pte) && pfn <= last_pfn);
 +}
 +
  /* free page table pages. last level pte should already be cleared */
  static void dma_pte_free_pagetable(struct dmar_domain *domain,
  				   unsigned long start_pfn,
  				   unsigned long last_pfn)
  {
  	int addr_width = agaw_to_width(domain->agaw) - VTD_PAGE_SHIFT;
 -	struct dma_pte *first_pte, *pte;
 -	int total = agaw_to_level(domain->agaw);
 -	int level;
 -	unsigned long tmp;
 -	int large_page = 2;
  
  	BUG_ON(addr_width < BITS_PER_LONG && start_pfn >> addr_width);
  	BUG_ON(addr_width < BITS_PER_LONG && last_pfn >> addr_width);
  	BUG_ON(start_pfn > last_pfn);
  
  	/* We don't need lock here; nobody else touches the iova range */
 -	level = 2;
 -	while (level <= total) {
 -		tmp = align_to_level(start_pfn, level);
 -
 -		/* If we can't even clear one PTE at this level, we're done */
 -		if (tmp + level_size(level) - 1 > last_pfn)
 -			return;
 -
 -		do {
 -			large_page = level;
 -			first_pte = pte = dma_pfn_level_pte(domain, tmp, level, &large_page);
 -			if (large_page > level)
 -				level = large_page + 1;
 -			if (!pte) {
 -				tmp = align_to_level(tmp + 1, level + 1);
 -				continue;
 -			}
 -			do {
 -				if (dma_pte_present(pte)) {
 -					free_pgtable_page(phys_to_virt(dma_pte_addr(pte)));
 -					dma_clear_pte(pte);
 -				}
 -				pte++;
 -
Re: Linux Plumbers ACPI/PM, PCI Microconference

2013-07-24 Thread Shuah Khan
On Tue, Jul 23, 2013 at 10:06 AM, Bjorn Helgaas bhelg...@google.com wrote:
 On Tue, Jul 16, 2013 at 8:21 PM, Myron Stowe myron.st...@gmail.com wrote:
 Linux Plumbers has approved an ACPI/PM, PCI microconference. The
 overview page is here:

 http://wiki.linuxplumbersconf.org/2013:pci_subsystem

 We would like to start receiving volunteers for presenting topics of
 interest.  There is a lot of activity in these subsystems so please
 respond by submitting presentation or discussion proposals that you
 would be willing to cover for consideration.  You should also feel
 free to submit ideas as proposals that others could cover.

 Somebody else privately suggested these ARM/PCI topics:

 - how it's been cleaned up to be more common across arm-32 machines
 - dependencies on dtb
 - future use/dependency(s) on ACPI
 - hotplug support issues (if any; acpi (host-bridge) vs whatever ARM uses 
 ?)

 and I'm interested in those as well.  There seems to be a lot of
 activity in that area.  In addition, I'm interested in ARM IOMMU
 support.  I see drivers/iommu/arm-smmu.c, but it's brand-new and I
 haven't looked at it at all.


I second this topic. I am interested in the ARM/IOMMU space also.

thanks,
-- Shuah


RE: [PATCH 4/4] x86: properly handle kvm emulation of hyperv

2013-07-24 Thread H. Peter Anvin
I don't see how this solves the "A emulates B, B emulates A" problem?

KY Srinivasan k...@microsoft.com wrote:


 -Original Message-
 From: Paolo Bonzini [mailto:paolo.bonz...@gmail.com] On Behalf Of
Paolo
 Bonzini
 Sent: Wednesday, July 24, 2013 3:07 AM
 To: Jason Wang
 Cc: H. Peter Anvin; KY Srinivasan; t...@linutronix.de;
mi...@redhat.com;
 x...@kernel.org; g...@redhat.com; kvm@vger.kernel.org; linux-
 ker...@vger.kernel.org
 Subject: Re: [PATCH 4/4] x86: properly handle kvm emulation of hyperv
 
 Il 24/07/2013 08:54, Jason Wang ha scritto:
  On 07/24/2013 12:48 PM, H. Peter Anvin wrote:
  On 07/23/2013 09:37 PM, Jason Wang wrote:
  On 07/23/2013 10:48 PM, H. Peter Anvin wrote:
  On 07/23/2013 06:55 AM, KY Srinivasan wrote:
  This strategy of hypervisor detection based on some detection
order
 IMHO is not
  a robust detection strategy. The current scheme works since the
only
 hypervisor emulated
  (by other hypervisors happens to be Hyper-V). What if this were
to
 change.
 
  One strategy would be to pick the *last* one in the CPUID list,
since
  the ones before it are logically the one(s) being emulated...
 
  -hpa
 
  How about simply does a reverse loop from 0x4001 to
0x4001?
 
  Not all systems like being poked too far into hyperspace.  Just
remember
  the last match and walk the list.
 
-hpa
 
 
  Ok, but it raises a question - how to know it was the 'last' match
  without knowing all signatures of other hyper-visor?
 
 You can return a priority value from the .detect function.  The
 priority value can simply be the CPUID leaf where the signature was
 found (or a low value such as 1 if detection was done with DMI).
 
 Then you can pick the hypervisor with the highest priority instead of
 hard-coding the order.

I like this idea; this allows some guest level control that is what we
want
when we have hypervisors emulating each other.


Regards,

K. Y 
 
 Paolo
 
 

-- 
Sent from my mobile phone. Please excuse brevity and lack of formatting.


RE: [PATCH 4/4] x86: properly handle kvm emulation of hyperv

2013-07-24 Thread KY Srinivasan


 -Original Message-
 From: H. Peter Anvin [mailto:h...@zytor.com]
 Sent: Wednesday, July 24, 2013 11:14 AM
 To: KY Srinivasan; Paolo Bonzini; Jason Wang
 Cc: t...@linutronix.de; mi...@redhat.com; x...@kernel.org; g...@redhat.com;
 kvm@vger.kernel.org; linux-ker...@vger.kernel.org
 Subject: RE: [PATCH 4/4] x86: properly handle kvm emulation of hyperv
 
 I don't see how this solves the "A emulates B, B emulates A" problem?

As Paolo suggested, if there were some priority encoded, the guest could make
an informed decision. If the guest in question can run on both hypervisors A
and B, we would rather the guest discover hypervisor A when running on A and
hypervisor B when running on B. The priority encoding could be as simple as
surfacing the native hypervisor signature earlier in the CPUID space.

K. Y
 
 KY Srinivasan k...@microsoft.com wrote:
 
 
  -Original Message-
  From: Paolo Bonzini [mailto:paolo.bonz...@gmail.com] On Behalf Of
 Paolo
  Bonzini
  Sent: Wednesday, July 24, 2013 3:07 AM
  To: Jason Wang
  Cc: H. Peter Anvin; KY Srinivasan; t...@linutronix.de;
 mi...@redhat.com;
  x...@kernel.org; g...@redhat.com; kvm@vger.kernel.org; linux-
  ker...@vger.kernel.org
  Subject: Re: [PATCH 4/4] x86: properly handle kvm emulation of hyperv
 
  Il 24/07/2013 08:54, Jason Wang ha scritto:
   On 07/24/2013 12:48 PM, H. Peter Anvin wrote:
   On 07/23/2013 09:37 PM, Jason Wang wrote:
   On 07/23/2013 10:48 PM, H. Peter Anvin wrote:
   On 07/23/2013 06:55 AM, KY Srinivasan wrote:
   This strategy of hypervisor detection based on some detection
 order
  IMHO is not
   a robust detection strategy. The current scheme works since the
 only
  hypervisor emulated
   (by other hypervisors happens to be Hyper-V). What if this were
 to
  change.
  
   One strategy would be to pick the *last* one in the CPUID list,
 since
   the ones before it are logically the one(s) being emulated...
  
 -hpa
  
   How about simply does a reverse loop from 0x4001 to
 0x4001?
  
   Not all systems like being poked too far into hyperspace.  Just
 remember
   the last match and walk the list.
  
   -hpa
  
  
   Ok, but it raises a question - how to know it was the 'last' match
   without knowing all signatures of other hyper-visor?
 
  You can return a priority value from the .detect function.  The
  priority value can simply be the CPUID leaf where the signature was
  found (or a low value such as 1 if detection was done with DMI).
 
  Then you can pick the hypervisor with the highest priority instead of
  hard-coding the order.
 
 I like this idea; this allows some guest level control that is what we
 want
 when we have hypervisors emulating each other.
 
 
 Regards,
 
 K. Y
 
  Paolo
 
 
 
 --
 Sent from my mobile phone. Please excuse brevity and lack of formatting.
 
 



[Bug 60620] New: guest loses frequently (multiple times per day!) connectivity to network device

2013-07-24 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=60620

Bug ID: 60620
   Summary: guest loses frequently (multiple times per day!)
connectivity to network device
   Product: Virtualization
   Version: unspecified
Kernel Version: 3.8, 3.9, 3.10 in both the host as well as the guest
  Hardware: x86-64
OS: Linux
  Tree: Mainline
Status: NEW
  Severity: high
  Priority: P1
 Component: kvm
  Assignee: virtualization_...@kernel-bugs.osdl.org
  Reporter: folk...@vanheusden.com
Regression: No

Hi,

I have a server with plenty of free ram (16GB free when running the guests).
Each guest has 8GB of ram.
One guest has 3 network interfaces. They are connected to bridges in the host
(is it called DOM0 in KVM as well?). That guest now frequently (multiple
times per day) loses all connectivity to that interface. If I then ping any
host connected to that interface, no ping comes back: only a message about
buffer space not being enough. As a test I increased wmem and friends but that
did not help at all. I also removed the bridge for that one problematic
interface and replaced by a direct connection: did not help.
Neither the host nor the guest log anything to dmesg or syslog or anything.
The only thing that helps is completely rebooting the guest. Ifdown in the
guest and/or host does not help.
This problem happens only with 1 guest and only with 1 interface. I verified
that the other adapters can use their networks just fine. When it is used in
bridging mode, the host is still able to use it; only the guest isn't.
When the guest is connected via a bridge and it fails, then the dropped
packets counter starts increasing for every packet send out. If it is
directly connected, this counter is not increased.
I did some googling and found the suggestion to modprobe the vhost_net module:
did not help. Using e1000 instead of virtio: did not help.
I verified that there were not too many sockets open: only +/- 800. Note that
this exact configuration ran fine for years on real hardware. Also the guest
frequently has plenty of free ram (not even used by cache/buffers as it also
happens after a couple of minutes up time); mostly 5GB.
I tried disabling STP on the bridge: did not help.
Apart of that increasing dropped packets counter, there's also an other
difference between direct-connection and connected via a bridge:

20:19:03.910892 ARP, Request who-has 192.168.178.2 tell 192.168.178.1, length
46
20:19:04.906854 ARP, Request who-has 192.168.178.2 tell 192.168.178.1, length
46
20:19:05.493445 ARP, Request who-has 192.168.178.2 tell 192.168.178.83, length
46
20:19:05.903027 ARP, Request who-has 192.168.178.2 tell 192.168.178.1, length
46
20:19:06.490750 ARP, Request who-has 192.168.178.2 tell 192.168.178.83, length
46
...

.2 is the problematic guest and .83 and .1 are indeed devices in that network!
So ARP requests come in but no replies go out; also no other traffic comes in:
this is the interface to the internet and normally it has a constant input of
data (e.g. NTP requests, VPN data (tinc), web-server requests, mail, etc.).

Versions used:

pxe-qemu1.0.0+git-20120202.f6840ba-3
kvm 1:1.1.2+dfsg-6
qemu1.1.2+dfsg-6a
qemu-keymaps1.1.2+dfsg-6a
qemu-kvm1.1.2+dfsg-6
qemu-system 1.1.2+dfsg-6a
qemu-user   1.1.2+dfsg-6a
qemu-utils  1.1.2+dfsg-6a

Kernels: 3.8, 3.9 and 3.10.
Both on the guest and the host.

-- 
You are receiving this mail because:
You are watching the assignee of the bug.


[Bug 60620] guest loses frequently (multiple times per day!) connectivity to network device

2013-07-24 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=60620

Folkert van Heusden folk...@vanheusden.com changed:

   What|Removed |Added

   Severity|high|normal

-- 
You are receiving this mail because:
You are watching the assignee of the bug.


Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages

2013-07-24 Thread Scott Wood

On 07/24/2013 04:39:59 AM, Alexander Graf wrote:


On 24.07.2013, at 11:35, Gleb Natapov wrote:

 On Wed, Jul 24, 2013 at 11:21:11AM +0200, Alexander Graf wrote:
 Aren't we going to use page_is_ram() from e500_shadow_mas2_attrib(), as
 Scott commented?


 Why aren't we using page_is_ram() in kvm_is_mmio_pfn()?


 Because it is much slower and, IIRC, it is actually used to build the pfn
 map that allows us to check quickly for a valid pfn.

Then why should we use page_is_ram()? :)

I really don't want the e500 code to diverge too much from what the  
rest of the kvm code is doing.


I don't understand "actually used to build pfn map".  What code is
this?  I don't see any calls to page_is_ram() in the KVM code, or in
generic mm code.  Is this a statement about what x86 does?


On PPC page_is_ram() is only called (AFAICT) for determining what  
attributes to set on mmaps.  We want to be sure that KVM always makes  
the same decision.  While pfn_valid() seems like it should be  
equivalent, it's not obvious from the PPC code that it is.


If pfn_valid() is better, why is that not used for mmap?  Why are there  
two different names for the same thing?


-Scott
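
For what it's worth, the two checks have very different shapes; roughly (a
sketch of the idea, not the literal source):

/* powerpc page_is_ram() (64-bit): a linear walk over the RAM regions */
int page_is_ram(unsigned long pfn)
{
	unsigned long paddr = pfn << PAGE_SHIFT;
	struct memblock_region *reg;

	for_each_memblock(memory, reg)
		if (paddr >= reg->base && paddr < reg->base + reg->size)
			return 1;
	return 0;
}

whereas with sparsemem pfn_valid() boils down to a single section lookup
(essentially valid_section(__nr_to_section(pfn_to_section_nr(pfn)))), which
is why it is so much cheaper on a hot path like kvm_is_mmio_pfn().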


RE: [PATCH 4/4] x86: properly handle kvm emulation of hyperv

2013-07-24 Thread H. Peter Anvin
What I'm suggesting is exactly that except that the native hypervisor is later 
in CPUID space.

KY Srinivasan k...@microsoft.com wrote:


 -Original Message-
 From: H. Peter Anvin [mailto:h...@zytor.com]
 Sent: Wednesday, July 24, 2013 11:14 AM
 To: KY Srinivasan; Paolo Bonzini; Jason Wang
 Cc: t...@linutronix.de; mi...@redhat.com; x...@kernel.org;
g...@redhat.com;
 kvm@vger.kernel.org; linux-ker...@vger.kernel.org
 Subject: RE: [PATCH 4/4] x86: properly handle kvm emulation of hyperv
 
 I don't see how this solves the "A emulates B, B emulates A" problem?

As Paolo suggested if there were some priority encoded, the guest could
make an
informed decision. If the guest under question can run on both
hypervisors A and B,
we would rather the guest discover hypervisor A when running on A and
hypervisor B
when running on B. The priority encoding could be as simple as
surfacing the native hypervisor
signature earlier in the CPUID space.

K. Y
 
 KY Srinivasan k...@microsoft.com wrote:
 
 
  -Original Message-
  From: Paolo Bonzini [mailto:paolo.bonz...@gmail.com] On Behalf Of
 Paolo
  Bonzini
  Sent: Wednesday, July 24, 2013 3:07 AM
  To: Jason Wang
  Cc: H. Peter Anvin; KY Srinivasan; t...@linutronix.de;
 mi...@redhat.com;
  x...@kernel.org; g...@redhat.com; kvm@vger.kernel.org; linux-
  ker...@vger.kernel.org
  Subject: Re: [PATCH 4/4] x86: properly handle kvm emulation of
hyperv
 
  Il 24/07/2013 08:54, Jason Wang ha scritto:
   On 07/24/2013 12:48 PM, H. Peter Anvin wrote:
   On 07/23/2013 09:37 PM, Jason Wang wrote:
   On 07/23/2013 10:48 PM, H. Peter Anvin wrote:
   On 07/23/2013 06:55 AM, KY Srinivasan wrote:
   This strategy of hypervisor detection based on some
detection
 order
  IMHO is not
   a robust detection strategy. The current scheme works since
the
 only
  hypervisor emulated
   (by other hypervisors happens to be Hyper-V). What if this
were
 to
  change.
  
   One strategy would be to pick the *last* one in the CPUID
list,
 since
   the ones before it are logically the one(s) being emulated...
  
-hpa
  
   How about simply does a reverse loop from 0x4001 to
 0x4001?
  
   Not all systems like being poked too far into hyperspace.  Just
 remember
   the last match and walk the list.
  
  -hpa
  
  
   Ok, but it raises a question - how to know it was the 'last'
match
   without knowing all signatures of other hyper-visor?
 
  You can return a priority value from the .detect function.  The
  priority value can simply be the CPUID leaf where the signature was
  found (or a low value such as 1 if detection was done with DMI).

  Then you can pick the hypervisor with the highest priority instead
  of hard-coding the order.
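
A hedged sketch of that scheme; the struct and helper below are
illustrative stand-ins, loosely modeled on the kernel's struct
hypervisor_x86, not the actual x86 init code. Each ->detect() returns 0
if its hypervisor is absent, otherwise a priority such as the CPUID base
where the signature was found (or 1 for DMI detection).

struct hv_detector {
	const char *name;
	unsigned int (*detect)(void);	/* 0 = not present */
};

static const struct hv_detector *
pick_hypervisor(const struct hv_detector *tab[], int n)
{
	const struct hv_detector *best = NULL;
	unsigned int best_prio = 0;
	int i;

	for (i = 0; i < n; i++) {
		unsigned int prio = tab[i]->detect();

		if (prio > best_prio) {	/* highest priority wins */
			best_prio = prio;
			best = tab[i];
		}
	}
	return best;
}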
 
 I like this idea; this allows some guest-level control, which is what
 we want when we have hypervisors emulating each other.
 
 
 Regards,
 
 K. Y
 
  Paolo
 
 
 
 --
 Sent from my mobile phone. Please excuse brevity and lack of
formatting.
 
 

-- 
Sent from my mobile phone. Please excuse brevity and lack of formatting.


Re: [PATCH 04/10] powerpc: Prepare to support kernel handling of IOMMU map/unmap

2013-07-24 Thread Andrew Morton
On Tue, 23 Jul 2013 12:22:59 +1000 Alexey Kardashevskiy a...@ozlabs.ru wrote:

 Ping, anyone, please?

ew, you top-posted.

 Ben needs an ack from any of the MM people before proceeding with this patch. Thanks!

For what?  The three lines of comment in page-flags.h?   ack :)

Manipulating page->_count directly is considered poor form.  Don't
blame us if we break your code ;)

Actually, the manipulation in realmode_get_page() duplicates the
existing get_page_unless_zero() and the one in realmode_put_page()
could perhaps be placed in mm.h with a suitable name and some
documentation.  That would improve your form and might protect the code
from getting broken later on.
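
For reference, a hedged sketch of what that might look like: the first
helper is the 2013-era get_page_unless_zero() (page->_count was an
atomic_t then), and the second is a hypothetical mm.h twin for the
realmode_put_page() manipulation; the name put_page_unless_one() is
illustrative, not an existing API.

#include <linux/atomic.h>
#include <linux/mm_types.h>

static inline int get_page_unless_zero(struct page *page)
{
	return atomic_inc_not_zero(&page->_count);
}

/*
 * Drop a reference, but refuse to drop the last one: the caller (running
 * with the MMU off) cannot take the page-freeing paths, so a zero return
 * tells it to retry in virtual mode.
 */
static inline int put_page_unless_one(struct page *page)
{
	return atomic_add_unless(&page->_count, -1, 1);
}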



Re: [PATCH 04/10] powerpc: Prepare to support kernel handling of IOMMU map/unmap

2013-07-24 Thread Benjamin Herrenschmidt
On Wed, 2013-07-24 at 15:43 -0700, Andrew Morton wrote:
 For what?  The three lines of comment in page-flags.h?   ack :)
 
 Manipulating page->_count directly is considered poor form.  Don't
 blame us if we break your code ;)
 
 Actually, the manipulation in realmode_get_page() duplicates the
 existing get_page_unless_zero() and the one in realmode_put_page()
 could perhaps be placed in mm.h with a suitable name and some
 documentation.  That would improve your form and might protect the code
 from getting broken later on.

Yes, this stuff makes me really nervous :-) If it didn't provide an order
of magnitude performance improvement in KVM I would avoid it but heh...

Alexey, I like having that stuff in generic code.

However the meaning of the words "real mode" can be ambiguous across
architectures, so it might be best to name it mmu_off_put_page to
make things a bit clearer, along with a comment explaining that this is
called in a context where none of the virtual mappings are accessible
(vmalloc, vmemmap, IOs, ...), and that in the case of sparsemem vmemmap
the caller must have taken care of getting the physical address of the
struct page and of ensuring it isn't split across two vmemmap blocks.
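
A hedged sketch of the helper as described above; the name and the
-EAGAIN fallback are assumptions about how Ben's constraints could be
expressed.

#include <linux/atomic.h>
#include <linux/errno.h>
#include <linux/mm_types.h>

/*
 * Called with the MMU off: no vmalloc/vmemmap/IO mappings are usable, and
 * the caller has already obtained the physical address of the struct page
 * and made sure it is not split across two vmemmap blocks.
 */
static inline int mmu_off_put_page(struct page *page)
{
	/*
	 * Never drop the last reference here; freeing a page needs the
	 * virtual mappings we do not have.  Tell the caller to retry in
	 * virtual mode instead.
	 */
	if (!atomic_add_unless(&page->_count, -1, 1))
		return -EAGAIN;
	return 0;
}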

Cheers,
Ben.




Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages

2013-07-24 Thread Andrea Arcangeli
Hi!

On Wed, Jul 24, 2013 at 01:30:12PM +0300, Gleb Natapov wrote:
 On Wed, Jul 24, 2013 at 12:25:18PM +0200, Alexander Graf wrote:
  
  On 24.07.2013, at 12:19, Gleb Natapov wrote:
  
   On Wed, Jul 24, 2013 at 12:09:42PM +0200, Alexander Graf wrote:
   
   On 24.07.2013, at 12:01, Gleb Natapov wrote:
   
   Copying Andrea for him to verify that I am not talking nonsense :)
   
   On Wed, Jul 24, 2013 at 10:25:20AM +0200, Alexander Graf wrote:
   diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
   index 1580dd4..5e8635b 100644
   --- a/virt/kvm/kvm_main.c
   +++ b/virt/kvm/kvm_main.c
   @@ -102,6 +102,10 @@ static bool largepages_enabled = true;
   
   bool kvm_is_mmio_pfn(pfn_t pfn)
   {
   +#ifdef CONFIG_MEMORY_HOTPLUG
   
   I'd feel safer if we narrow this down to e500.
   
   +   /*
   +* Currently only in memory hot remove case we may still need 
   this.
   +*/
if (pfn_valid(pfn)) {
   
   We still have to check for pfn_valid, no? So the #ifdef should be down 
   here.
   
int reserved;
struct page *tail = pfn_to_page(pfn);
   @@ -124,6 +128,7 @@ bool kvm_is_mmio_pfn(pfn_t pfn)
}
return PageReserved(tail);
}
   +#endif
   
return true;
   }
   
   Before apply this change:
   
   real(1m19.954s + 1m20.918s + 1m22.740s + 1m21.146s + 
   1m22.120s)/5= 1m21.376s
   user(0m23.181s + 0m23.550s + 0m23.506s + 0m23.410s + 
   0m23.520s)/5= 0m23.433s
   sys   (0m49.087s + 0m49.563s + 0m51.758s + 0m50.290s + 0m51.047s)/5= 
   0m50.349s
   
   After apply this change:
   
   real(1m19.507s + 1m20.919s + 1m21.436s + 1m21.179s + 
   1m20.293s)/5= 1m20.667s
   user(0m22.595s + 0m22.719s + 0m22.484s + 0m22.811s + 
   0m22.467s)/5= 0m22.615s
   sys   (0m48.841s + 0m49.929s + 0m50.310s + 0m49.813s + 0m48.587s)/5= 
   0m49.496s
   
   So,
   
   real(1m20.667s - 1m21.376s)/1m21.376s x 100% = -0.6%
   user(0m22.615s - 0m23.433s)/0m23.433s x 100% = -3.5%
   sys   (0m49.496s - 0m50.349s)/0m50.349s x 100% = -1.7%
   
   Very nice, so there is a real world performance benefit to doing this. 
   Then yes, I think it would make sense to change the global helper 
   function to be fast on e500 and use that one from 
   e500_shadow_mas2_attrib() instead.
   
   Gleb, Paolo, any hard feelings?
   
   I do not see how can we break the function in such a way and get
   away with it. Not all valid pfns point to memory. Physical address can
   be sparse (due to PCI hole, framebuffer or just because).
   
   But we don't check for sparseness today in here either. We merely check 
   for incomplete huge pages.
   
   That's not how I read the code. The code checks for reserved flag set.
   It should be set on pfns that point to memory holes. As far as I
  
  I couldn't find any traces of code that sets the reserved bits on e500 
  chips though. I've only seen it getting set for memory hotplug and memory 
  incoherent DMA code which doesn't get used on e500.
  
  But I'd be more than happy to get proven wrong :).
  
 Can you write a module that scans all page structures? AFAIK all pages
 are marked as reserved and then those that become regular memory are
 marked as unreserved. Hope Andrea will chime in here :)

So the situation with regard to non-RAM and PageReserved/pfn_valid is
quite simple.

struct page exists for non-RAM too, as struct page must exist up to
at least 2^MAX_ORDER pfn alignment or things break: the first pfn must
be 2^MAX_ORDER aligned or again things break in the buddy allocator. We
don't make an effort to save a few struct pages, to keep it simpler.

But those non-RAM pages (or tiny non-RAM page holes if any) are marked
PageReserved.

If struct page doesn't exist pfn_valid returns false.

So you cannot get away skipping pfn_valid and at least one
PageReserved.

However it gets more complex than just RAM vs non-RAM, because there
are pages that are real RAM (not left marked PageReserved at boot
after checking e820 or equivalent BIOS data on non-x86 archs) but
that are taken over by drivers, which then could use them as mmio
regions, snooping the writes and mapping them into userland too,
maybe as hugepages. That is the motivation for the THP related code in
kvm_is_mmio_pfn.

Those vmas have VM_PFNMAP set so vm_normal_page is zero and the
refcounting is skipped like if it's non-RAM and they're mapped with
remap_pfn_range (different mechanism for VM_MIXEDMAP that does the
refcounting and doesn't require in turn the driver to mark the page
PageReserved).

The above explains why KVM needs to skip the refcounting on
PageReserved == true && pfn_valid() == true, and it must skip the
refcounting for pfn_valid() == false without trying to call pfn_to_page
(or it'll crash).
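
A minimal restatement of that rule, as a sketch; this is not the actual
kvm_is_mmio_pfn(), which additionally performs the THP tail-page dance
discussed below.

#include <linux/mm.h>

static bool refcounting_must_be_skipped(unsigned long pfn)
{
	if (!pfn_valid(pfn))
		return true;	/* no struct page: pfn_to_page() would crash */
	/* valid but reserved: a non-RAM hole, or RAM taken over by a driver */
	return PageReserved(pfn_to_page(pfn));
}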

Now the code doing the THP check with smp_rmb is very safe, possibly
too safe. Looking at it now, it looks like a minor overengineering
oversight.

The slight oversight is that split_huge_page cannot transfer the
PG_reserved bit from 

Re: Call for Proposals: 2013 Linux Plumbers Virtualization Microconference

2013-07-24 Thread Alex Williamson

Reminder, there's one week left to submit proposals for the
virtualization micro-conference at LPC.  Please see below for details
and note the update to submit proposals through the Linux Plumbers
website:

http://www.linuxplumbersconf.org/2013/ocw/events/LPC2013/proposals/new

Thanks,
Alex

On Sun, 2013-07-14 at 15:59 -0600, Alex Williamson wrote:
 On Fri, 2013-07-12 at 14:38 -0600, Alex Williamson wrote:
  The Call for Proposals for the 2013 Linux Plumbers Virtualization
  Microconference is now open.  This uconf is being held as part of Linux
  Plumbers Conference in New Orleans, Louisiana, USA September 18-20th and
  is co-located with LinuxCon North America.  For more information see:
  
  http://www.linuxplumbersconf.org/2013/
  
  The tentative deadline for proposals is August 1st.  To submit a topic
  please email a brief abstract to lpc2013-virt...@codemonkey.ws  If you
  require travel assistance (extremely limited) in order to attend, please
  note that in your submission.  Also, please keep an eye on:
  
  http://www.linuxplumbersconf.org/2013/submitting-topic/
  http://www.linuxplumbersconf.org/2013/participate/
  
  We've setup the above email submission as an interim approach until the
  LPC program committee brings the official submission tool online.  I'll
  send a follow-up message when that occurs, but please send your
  proposals as soon as possible.  Thanks,
 
 And the official tool is now online.  Please see:
 
 http://www.linuxplumbersconf.org/2013/microconference-discussion-topic-bof-submissions-now-open/
 
 for instructions to propose a discussion topic for the virtualization
 microconference.  Thanks,
 
 Alex





[PATCH] kvm-unit-tests : Basic architecture of VMX nested test case

2013-07-24 Thread Arthur Chunqi Li
This is the first version of the VMX nested environment. It contains the
basic VMX instructions test cases, including VMXON/VMXOFF/VMXPTRLD/
VMXPTRST/VMCLEAR/VMLAUNCH/VMRESUME/VMCALL. This patch also tests the
basic execution routine in the VMX nested environment and lets the VM
print "Hello World" to report a successful run.

The first release also includes a test suite for vmenter (vmlaunch and
vmresume). Besides, a hypercall mechanism is included; currently it is
used to invoke a VM normal exit.

New files added:
x86/vmx.h : contains all VMX related macro declarations
x86/vmx.c : main file for VMX nested test case

Signed-off-by: Arthur Chunqi Li yzt...@gmail.com
---
 config-x86-common.mak |2 +
 config-x86_64.mak |1 +
 lib/x86/msr.h |5 +
 x86/cstart64.S|4 +
 x86/unittests.cfg |6 +
 x86/vmx.c |  712 +
 x86/vmx.h |  479 +
 7 files changed, 1209 insertions(+)
 create mode 100644 x86/vmx.c
 create mode 100644 x86/vmx.h

diff --git a/config-x86-common.mak b/config-x86-common.mak
index 455032b..34a41e1 100644
--- a/config-x86-common.mak
+++ b/config-x86-common.mak
@@ -101,6 +101,8 @@ $(TEST_DIR)/asyncpf.elf: $(cstart.o) $(TEST_DIR)/asyncpf.o
 
 $(TEST_DIR)/pcid.elf: $(cstart.o) $(TEST_DIR)/pcid.o
 
+$(TEST_DIR)/vmx.elf: $(cstart.o) $(TEST_DIR)/vmx.o
+
 arch_clean:
$(RM) $(TEST_DIR)/*.o $(TEST_DIR)/*.flat $(TEST_DIR)/*.elf \
$(TEST_DIR)/.*.d $(TEST_DIR)/lib/.*.d $(TEST_DIR)/lib/*.o
diff --git a/config-x86_64.mak b/config-x86_64.mak
index 4e525f5..bb8ee89 100644
--- a/config-x86_64.mak
+++ b/config-x86_64.mak
@@ -9,5 +9,6 @@ tests = $(TEST_DIR)/access.flat $(TEST_DIR)/apic.flat \
  $(TEST_DIR)/xsave.flat $(TEST_DIR)/rmap_chain.flat \
  $(TEST_DIR)/pcid.flat
 tests += $(TEST_DIR)/svm.flat
+tests += $(TEST_DIR)/vmx.flat
 
 include config-x86-common.mak
diff --git a/lib/x86/msr.h b/lib/x86/msr.h
index 509a421..281255a 100644
--- a/lib/x86/msr.h
+++ b/lib/x86/msr.h
@@ -396,6 +396,11 @@
 #define MSR_IA32_VMX_VMCS_ENUM  0x048a
 #define MSR_IA32_VMX_PROCBASED_CTLS20x048b
 #define MSR_IA32_VMX_EPT_VPID_CAP   0x048c
+#define MSR_IA32_VMX_TRUE_PIN  0x048d
+#define MSR_IA32_VMX_TRUE_PROC 0x048e
+#define MSR_IA32_VMX_TRUE_EXIT 0x048f
+#define MSR_IA32_VMX_TRUE_ENTRY0x0490
+
 
 /* AMD-V MSRs */
 
diff --git a/x86/cstart64.S b/x86/cstart64.S
index 24df5f8..0fe76da 100644
--- a/x86/cstart64.S
+++ b/x86/cstart64.S
@@ -4,6 +4,10 @@
 .globl boot_idt
 boot_idt = 0
 
+.globl idt_descr
+.globl tss_descr
+.globl gdt64_desc
+
 ipi_vector = 0x20
 
 max_cpus = 64
diff --git a/x86/unittests.cfg b/x86/unittests.cfg
index bc9643e..85c36aa 100644
--- a/x86/unittests.cfg
+++ b/x86/unittests.cfg
@@ -149,3 +149,9 @@ extra_params = --append 1000 `date +%s`
 file = pcid.flat
 extra_params = -cpu qemu64,+pcid
 arch = x86_64
+
+[vmx]
+file = vmx.flat
+extra_params = -cpu host,+vmx
+arch = x86_64
+
diff --git a/x86/vmx.c b/x86/vmx.c
new file mode 100644
index 000..ca3e117
--- /dev/null
+++ b/x86/vmx.c
@@ -0,0 +1,712 @@
+#include "libcflat.h"
+#include "processor.h"
+#include "vm.h"
+#include "desc.h"
+#include "vmx.h"
+#include "msr.h"
+#include "smp.h"
+#include "io.h"
+
+int fails = 0, tests = 0;
+u32 *vmxon_region;
+struct vmcs *vmcs_root;
+u32 vpid_cnt;
+void *guest_stack, *guest_syscall_stack;
+u32 ctrl_pin, ctrl_enter, ctrl_exit, ctrl_cpu[2];
+ulong fix_cr0_set, fix_cr0_clr;
+ulong fix_cr4_set, fix_cr4_clr;
+struct regs regs;
+struct vmx_test *current;
+u64 hypercall_field = 0;
+bool launched = 0;
+
+extern u64 gdt64_desc[];
+extern u64 idt_descr[];
+extern u64 tss_descr[];
+extern void *vmx_return;
+extern void *entry_sysenter;
+extern void *guest_entry;
+
+static void report(const char *name, int result)
+{
+	++tests;
+	if (result)
+		printf("PASS: %s\n", name);
+	else {
+		printf("FAIL: %s\n", name);
+		++fails;
+	}
+}
+
+static int vmcs_clear(struct vmcs *vmcs)
+{
+	bool ret;
+	asm volatile ("vmclear %1; setbe %0" : "=q" (ret) : "m" (vmcs) : "cc");
+	return ret;
+}
+
+static u64 vmcs_read(enum Encoding enc)
+{
+	u64 val;
+	asm volatile ("vmread %1, %0" : "=rm" (val) : "r" ((u64)enc) : "cc");
+	return val;
+}
+
+static int vmcs_write(enum Encoding enc, u64 val)
+{
+	bool ret;
+	asm volatile ("vmwrite %1, %2; setbe %0"
+		: "=q" (ret) : "rm" (val), "r" ((u64)enc) : "cc");
+	return ret;
+}
+
+static int make_vmcs_current(struct vmcs *vmcs)
+{
+	bool ret;
+
+	asm volatile ("vmptrld %1; setbe %0" : "=q" (ret) : "m" (vmcs) : "cc");
+	return ret;
+}
+
+static int save_vmcs(struct vmcs **vmcs)
+{
+	bool ret;
+
+	asm volatile ("vmptrst %1; setbe %0" : "=q" (ret) : "m" (*vmcs) : "cc");
+	return ret;
+}
+
+/* entry_sysenter */
+asm(
+	".align 4, 0x90\n\t"
+	".globl 
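
As a side note, a hedged usage sketch of the accessors above (not part of
the patch; treating GUEST_RIP as one of the enum Encoding values from
x86/vmx.h is an assumption):

/* All four helpers return 0 on VMsucceed and nonzero on VMfail. */
static int demo_vmcs_access(struct vmcs *vmcs, u64 rip)
{
	if (make_vmcs_current(vmcs))	/* vmptrld failed */
		return 1;
	if (vmcs_write(GUEST_RIP, rip))	/* set the guest entry point */
		return 1;
	return vmcs_read(GUEST_RIP) != rip;	/* read back what we wrote */
}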

RE: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages

2013-07-24 Thread Bhushan Bharat-R65777


 -Original Message-
 From: Alexander Graf [mailto:ag...@suse.de]
 Sent: Wednesday, July 24, 2013 1:55 PM
 To: “tiejun.chen”
 Cc: Bhushan Bharat-R65777; kvm-ppc@vger.kernel.org; k...@vger.kernel.org list;
 Wood Scott-B07421; Gleb Natapov; Paolo Bonzini
 Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel
 managed pages
 
 
 On 24.07.2013, at 04:26, “tiejun.chen” wrote:
 
  On 07/18/2013 06:27 PM, Alexander Graf wrote:
 
  On 18.07.2013, at 12:19, “tiejun.chen” wrote:
 
  On 07/18/2013 06:12 PM, Alexander Graf wrote:
 
  On 18.07.2013, at 12:08, “tiejun.chen” wrote:
 
  On 07/18/2013 05:48 PM, Alexander Graf wrote:
 
  On 18.07.2013, at 10:25, Bhushan Bharat-R65777 wrote:
 
 
 
  -Original Message-
  From: Bhushan Bharat-R65777
  Sent: Thursday, July 18, 2013 1:53 PM
  To: ' tiejun.chen '
  Cc: kvm-ppc@vger.kernel.org; k...@vger.kernel.org;
  ag...@suse.de; Wood Scott-
  B07421
  Subject: RE: [PATCH 2/2] kvm: powerpc: set cache coherency only
  for kernel managed pages
 
 
 
  -Original Message-
  From:  tiejun.chen  [mailto:tiejun.c...@windriver.com]
  Sent: Thursday, July 18, 2013 1:52 PM
  To: Bhushan Bharat-R65777
  Cc: kvm-ppc@vger.kernel.org; k...@vger.kernel.org;
  ag...@suse.de; Wood
  Scott-
  B07421
  Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency
  only for kernel managed pages
 
  On 07/18/2013 04:08 PM, Bhushan Bharat-R65777 wrote:
 
 
  -Original Message-
  From: kvm-ppc-ow...@vger.kernel.org
  [mailto:kvm-ppc-ow...@vger.kernel.org] On Behalf Of  tiejun.chen 
  
  Sent: Thursday, July 18, 2013 1:01 PM
  To: Bhushan Bharat-R65777
  Cc: kvm-ppc@vger.kernel.org; k...@vger.kernel.org;
  ag...@suse.de; Wood
  Scott-
  B07421
  Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency
  only for kernel managed pages
 
  On 07/18/2013 03:12 PM, Bhushan Bharat-R65777 wrote:
 
 
  -Original Message-
  From:  tiejun.chen  [mailto:tiejun.c...@windriver.com]
  Sent: Thursday, July 18, 2013 11:56 AM
  To: Bhushan Bharat-R65777
  Cc: kvm-ppc@vger.kernel.org; k...@vger.kernel.org;
  ag...@suse.de; Wood
  Scott- B07421; Bhushan Bharat-R65777
  Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency
  only for kernel managed pages
 
  On 07/18/2013 02:04 PM, Bharat Bhushan wrote:
  If there is a struct page for the requested mapping then
  it's normal DDR and the mapping sets M bit (coherent,
  cacheable) else this is treated as I/O and we set  I +
  G  (cache inhibited,
  guarded)
 
  This helps setting proper TLB mapping for direct assigned
  device
 
  Signed-off-by: Bharat Bhushan
  bharat.bhus...@freescale.com
  ---
 arch/powerpc/kvm/e500_mmu_host.c |   17 -
 1 files changed, 12 insertions(+), 5 deletions(-)
 
  diff --git a/arch/powerpc/kvm/e500_mmu_host.c
  b/arch/powerpc/kvm/e500_mmu_host.c
  index 1c6a9d7..089c227 100644
  --- a/arch/powerpc/kvm/e500_mmu_host.c
  +++ b/arch/powerpc/kvm/e500_mmu_host.c
  @@ -64,13 +64,20 @@ static inline u32
  e500_shadow_mas3_attrib(u32 mas3, int
  usermode)
 return mas3;
 }
 
  -static inline u32 e500_shadow_mas2_attrib(u32 mas2, int
  usermode)
  +static inline u32 e500_shadow_mas2_attrib(u32 mas2,
  +pfn_t pfn)
 {
  +  u32 mas2_attr;
  +
  +  mas2_attr = mas2  MAS2_ATTRIB_MASK;
  +
  +  if (!pfn_valid(pfn)) {
 
  Why not directly use kvm_is_mmio_pfn()?
 
  What I understand from this function (someone can correct
  me) is that it
  returns false when the page is managed by kernel and is
  not marked as RESERVED (for some reason). For us it does not
  matter whether the page is reserved or not, if it is kernel
  visible page then it
  is DDR.
 
 
  I think you are setting I|G by addressing all mmio pages,
  right? If so,
 
   KVM: direct mmio pfn check
 
   Userspace may specify memory slots that are backed by
  mmio pages rather than
   normal RAM.  In some cases it is not enough to identify
  these mmio
  pages
   by pfn_valid().  This patch adds checking the PageReserved as
 well.
 
  Do you know what are those some cases and how checking
  PageReserved helps in
  those cases?
 
  No, myself didn't see these actual cases in qemu,too. But this
  should be chronically persistent as I understand ;-)
 
  Then I will wait till someone educate me :)
 
   The reason is, the kvm_is_mmio_pfn() function looks pretty heavy and I do
  not want to call this for all tlbwe operations unless it is necessary.
 
  It certainly does more than we need and potentially slows down the fast
 path (RAM mapping). The only thing it does on top of if (pfn_valid()) is to
 check for pages that are declared reserved on the host. This happens in 2 
 cases:
 
1) Non cache coherent DMA
2) Memory hot remove
 
  The non coherent DMA case would be interesting, as with the mechanism 
  as
 it is in place in Linux today, we could potentially break normal guest 
 operation
 if we don't take it into account. However, it's Kconfig guarded by:
 
  

Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages

2013-07-24 Thread Alexander Graf

On 24.07.2013, at 11:11, Bhushan Bharat-R65777 wrote:

 
 
 -Original Message-
 From: Alexander Graf [mailto:ag...@suse.de]
 Sent: Wednesday, July 24, 2013 1:55 PM
 To: “tiejun.chen”
 Cc: Bhushan Bharat-R65777; kvm-ppc@vger.kernel.org; k...@vger.kernel.org 
 list;
 Wood Scott-B07421; Gleb Natapov; Paolo Bonzini
 Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel
 managed pages
 
 
 On 24.07.2013, at 04:26, “tiejun.chen” wrote:
 
 On 07/18/2013 06:27 PM, Alexander Graf wrote:
 
 On 18.07.2013, at 12:19, “tiejun.chen” wrote:
 
 On 07/18/2013 06:12 PM, Alexander Graf wrote:
 
 On 18.07.2013, at 12:08, “tiejun.chen” wrote:
 
 On 07/18/2013 05:48 PM, Alexander Graf wrote:
 
 On 18.07.2013, at 10:25, Bhushan Bharat-R65777 wrote:
 
 
 
 -Original Message-
 From: Bhushan Bharat-R65777
 Sent: Thursday, July 18, 2013 1:53 PM
 To: ' tiejun.chen '
 Cc: kvm-ppc@vger.kernel.org; k...@vger.kernel.org;
 ag...@suse.de; Wood Scott-
 B07421
 Subject: RE: [PATCH 2/2] kvm: powerpc: set cache coherency only
 for kernel managed pages
 
 
 
 -Original Message-
 From:  tiejun.chen  [mailto:tiejun.c...@windriver.com]
 Sent: Thursday, July 18, 2013 1:52 PM
 To: Bhushan Bharat-R65777
 Cc: kvm-ppc@vger.kernel.org; k...@vger.kernel.org;
 ag...@suse.de; Wood
 Scott-
 B07421
 Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency
 only for kernel managed pages
 
 On 07/18/2013 04:08 PM, Bhushan Bharat-R65777 wrote:
 
 
 -Original Message-
 From: kvm-ppc-ow...@vger.kernel.org
 [mailto:kvm-ppc-ow...@vger.kernel.org] On Behalf Of  tiejun.chen 
 
 Sent: Thursday, July 18, 2013 1:01 PM
 To: Bhushan Bharat-R65777
 Cc: kvm-ppc@vger.kernel.org; k...@vger.kernel.org;
 ag...@suse.de; Wood
 Scott-
 B07421
 Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency
 only for kernel managed pages
 
 On 07/18/2013 03:12 PM, Bhushan Bharat-R65777 wrote:
 
 
 -Original Message-
 From:  tiejun.chen  [mailto:tiejun.c...@windriver.com]
 Sent: Thursday, July 18, 2013 11:56 AM
 To: Bhushan Bharat-R65777
 Cc: kvm-ppc@vger.kernel.org; k...@vger.kernel.org;
 ag...@suse.de; Wood
 Scott- B07421; Bhushan Bharat-R65777
 Subject: Re: [PATCH 2/2] kvm: powerpc: set cache coherency
 only for kernel managed pages
 
 On 07/18/2013 02:04 PM, Bharat Bhushan wrote:
 If there is a struct page for the requested mapping then
 it's normal DDR and the mapping sets M bit (coherent,
 cacheable) else this is treated as I/O and we set  I +
 G  (cache inhibited,
 guarded)
 
 This helps setting proper TLB mapping for direct assigned
 device
 
 Signed-off-by: Bharat Bhushan
 bharat.bhus...@freescale.com
 ---
   arch/powerpc/kvm/e500_mmu_host.c |   17 -
   1 files changed, 12 insertions(+), 5 deletions(-)
 
 diff --git a/arch/powerpc/kvm/e500_mmu_host.c
 b/arch/powerpc/kvm/e500_mmu_host.c
 index 1c6a9d7..089c227 100644
 --- a/arch/powerpc/kvm/e500_mmu_host.c
 +++ b/arch/powerpc/kvm/e500_mmu_host.c
 @@ -64,13 +64,20 @@ static inline u32
 e500_shadow_mas3_attrib(u32 mas3, int
 usermode)
return mas3;
   }
 
 -static inline u32 e500_shadow_mas2_attrib(u32 mas2, int
 usermode)
 +static inline u32 e500_shadow_mas2_attrib(u32 mas2,
 +pfn_t pfn)
   {
 +  u32 mas2_attr;
 +
 +  mas2_attr = mas2  MAS2_ATTRIB_MASK;
 +
 +  if (!pfn_valid(pfn)) {
 
 Why not directly use kvm_is_mmio_pfn()?
 
 What I understand from this function (someone can correct
 me) is that it
 returns false when the page is managed by kernel and is
 not marked as RESERVED (for some reason). For us it does not
 matter whether the page is reserved or not, if it is kernel
 visible page then it
 is DDR.
 
 
 I think you are setting I|G by addressing all mmio pages,
 right? If so,
 
 KVM: direct mmio pfn check
 
 Userspace may specify memory slots that are backed by
 mmio pages rather than
 normal RAM.  In some cases it is not enough to identify
 these mmio
 pages
 by pfn_valid().  This patch adds checking the PageReserved as
 well.
 
 Do you know what are those some cases and how checking
 PageReserved helps in
 those cases?
 
 No, myself didn't see these actual cases in qemu,too. But this
 should be chronically persistent as I understand ;-)
 
 Then I will wait till someone educate me :)
 
  The reason is, the kvm_is_mmio_pfn() function looks pretty heavy and I do
  not want to call this for all tlbwe operations unless it is necessary.
 
 It certainly does more than we need and potentially slows down the fast
 path (RAM mapping). The only thing it does on top of if (pfn_valid()) is to
 check for pages that are declared reserved on the host. This happens in 2 
 cases:
 
  1) Non cache coherent DMA
  2) Memory hot remove
 
 The non coherent DMA case would be interesting, as with the mechanism 
 as
 it is in place in Linux today, we could potentially break normal guest 
 operation
 if we don't take it into account. However, it's Kconfig guarded by:
 
depends on 4xx || 8xx || E200 || PPC_MPC512x || GAMECUBE_COMMON
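
For reference, a hedged sketch of the attribute selection the quoted
patch description spells out, assuming the e500 MAS2 flag macros
(MAS2_ATTRIB_MASK, MAS2_M, MAS2_I, MAS2_G) from the 2013 powerpc tree:

#include <linux/mm.h>
#include <asm/mmu-book3e.h>	/* MAS2_* flags */

static inline u32 shadow_mas2_attrib_sketch(u32 mas2, pfn_t pfn)
{
	u32 mas2_attr = mas2 & MAS2_ATTRIB_MASK;

	if (!pfn_valid(pfn))
		mas2_attr |= MAS2_I | MAS2_G;	/* I/O: cache-inhibited, guarded */
	else
		mas2_attr |= MAS2_M;		/* kernel-managed RAM: coherent */
	return mas2_attr;
}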

Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages

2013-07-24 Thread Alexander Graf

On 24.07.2013, at 11:35, Gleb Natapov wrote:

 On Wed, Jul 24, 2013 at 11:21:11AM +0200, Alexander Graf wrote:
 Are we not going to use page_is_ram() from e500_shadow_mas2_attrib() as
 Scott commented?
 
 Why aren't we using page_is_ram() in kvm_is_mmio_pfn()?
 
 
 Because it is much slower and, IIRC, it is actually used to build the
 pfn map that allows us to check quickly for a valid pfn.

Then why should we use page_is_ram()? :)

I really don't want the e500 code to diverge too much from what the rest of the 
kvm code is doing.


Alex



Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages

2013-07-24 Thread Gleb Natapov
Copying Andrea for him to verify that I am not talking nonsense :)

On Wed, Jul 24, 2013 at 10:25:20AM +0200, Alexander Graf wrote:
  diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
  index 1580dd4..5e8635b 100644
  --- a/virt/kvm/kvm_main.c
  +++ b/virt/kvm/kvm_main.c
  @@ -102,6 +102,10 @@ static bool largepages_enabled = true;
  
  bool kvm_is_mmio_pfn(pfn_t pfn)
  {
  +#ifdef CONFIG_MEMORY_HOTPLUG
 
 I'd feel safer if we narrow this down to e500.
 
  +   /*
  +* Currently only in memory hot remove case we may still need this.
  +*/
 if (pfn_valid(pfn)) {
 
 We still have to check for pfn_valid, no? So the #ifdef should be down here.
 
 int reserved;
 struct page *tail = pfn_to_page(pfn);
  @@ -124,6 +128,7 @@ bool kvm_is_mmio_pfn(pfn_t pfn)
 }
 return PageReserved(tail);
 }
  +#endif
  
 return true;
  }
  
  Before apply this change:
  
  real(1m19.954s + 1m20.918s + 1m22.740s + 1m21.146s + 1m22.120s)/5= 
  1m21.376s
  user(0m23.181s + 0m23.550s + 0m23.506s + 0m23.410s + 0m23.520s)/5= 
  0m23.433s
  sys (0m49.087s + 0m49.563s + 0m51.758s + 0m50.290s + 0m51.047s)/5= 0m50.349s
  
  After apply this change:
  
  real(1m19.507s + 1m20.919s + 1m21.436s + 1m21.179s + 1m20.293s)/5= 
  1m20.667s
  user(0m22.595s + 0m22.719s + 0m22.484s + 0m22.811s + 0m22.467s)/5= 
  0m22.615s
  sys (0m48.841s + 0m49.929s + 0m50.310s + 0m49.813s + 0m48.587s)/5= 0m49.496s
  
  So,
  
  real(1m20.667s - 1m21.376s)/1m21.376s x 100% = -0.6%
  user(0m22.615s - 0m23.433s)/0m23.433s x 100% = -3.5%
  sys (0m49.496s - 0m50.349s)/0m50.349s x 100% = -1.7%
 
 Very nice, so there is a real world performance benefit to doing this. Then 
 yes, I think it would make sense to change the global helper function to be 
 fast on e500 and use that one from e500_shadow_mas2_attrib() instead.
 
 Gleb, Paolo, any hard feelings?
 
I do not see how can we break the function in such a way and get
away with it. Not all valid pfns point to memory. Physical address can
be sparse (due to PCI hole, framebuffer or just because).

--
Gleb.


Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages

2013-07-24 Thread Alexander Graf

On 24.07.2013, at 12:01, Gleb Natapov wrote:

 Copying Andrea for him to verify that I am not talking nonsense :)
 
 On Wed, Jul 24, 2013 at 10:25:20AM +0200, Alexander Graf wrote:
 diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
 index 1580dd4..5e8635b 100644
 --- a/virt/kvm/kvm_main.c
 +++ b/virt/kvm/kvm_main.c
 @@ -102,6 +102,10 @@ static bool largepages_enabled = true;
 
 bool kvm_is_mmio_pfn(pfn_t pfn)
 {
 +#ifdef CONFIG_MEMORY_HOTPLUG
 
 I'd feel safer if we narrow this down to e500.
 
 +   /*
 +* Currently only in memory hot remove case we may still need this.
 +*/
   if (pfn_valid(pfn)) {
 
 We still have to check for pfn_valid, no? So the #ifdef should be down here.
 
   int reserved;
   struct page *tail = pfn_to_page(pfn);
 @@ -124,6 +128,7 @@ bool kvm_is_mmio_pfn(pfn_t pfn)
   }
   return PageReserved(tail);
   }
 +#endif
 
   return true;
 }
 
 Before apply this change:
 
 real(1m19.954s + 1m20.918s + 1m22.740s + 1m21.146s + 1m22.120s)/5= 
 1m21.376s
 user(0m23.181s + 0m23.550s + 0m23.506s + 0m23.410s + 0m23.520s)/5= 
 0m23.433s
 sys (0m49.087s + 0m49.563s + 0m51.758s + 0m50.290s + 0m51.047s)/5= 0m50.349s
 
 After apply this change:
 
 real(1m19.507s + 1m20.919s + 1m21.436s + 1m21.179s + 1m20.293s)/5= 
 1m20.667s
 user(0m22.595s + 0m22.719s + 0m22.484s + 0m22.811s + 0m22.467s)/5= 
 0m22.615s
 sys (0m48.841s + 0m49.929s + 0m50.310s + 0m49.813s + 0m48.587s)/5= 0m49.496s
 
 So,
 
 real(1m20.667s - 1m21.376s)/1m21.376s x 100% = -0.6%
 user(0m22.615s - 0m23.433s)/0m23.433s x 100% = -3.5%
 sys (0m49.496s - 0m50.349s)/0m50.349s x 100% = -1.7%
 
 Very nice, so there is a real world performance benefit to doing this. Then 
 yes, I think it would make sense to change the global helper function to be 
 fast on e500 and use that one from e500_shadow_mas2_attrib() instead.
 
 Gleb, Paolo, any hard feelings?
 
 I do not see how can we break the function in such a way and get
 away with it. Not all valid pfns point to memory. Physical address can
 be sparse (due to PCI hole, framebuffer or just because).

But we don't check for sparseness today in here either. We merely check for 
incomplete huge pages.


Alex



Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages

2013-07-24 Thread Gleb Natapov
On Wed, Jul 24, 2013 at 12:09:42PM +0200, Alexander Graf wrote:
 
 On 24.07.2013, at 12:01, Gleb Natapov wrote:
 
  Copying Andrea for him to verify that I am not talking nonsense :)
  
  On Wed, Jul 24, 2013 at 10:25:20AM +0200, Alexander Graf wrote:
  diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
  index 1580dd4..5e8635b 100644
  --- a/virt/kvm/kvm_main.c
  +++ b/virt/kvm/kvm_main.c
  @@ -102,6 +102,10 @@ static bool largepages_enabled = true;
  
  bool kvm_is_mmio_pfn(pfn_t pfn)
  {
  +#ifdef CONFIG_MEMORY_HOTPLUG
  
  I'd feel safer if we narrow this down to e500.
  
  +   /*
  +* Currently only in memory hot remove case we may still need 
  this.
  +*/
if (pfn_valid(pfn)) {
  
  We still have to check for pfn_valid, no? So the #ifdef should be down 
  here.
  
int reserved;
struct page *tail = pfn_to_page(pfn);
  @@ -124,6 +128,7 @@ bool kvm_is_mmio_pfn(pfn_t pfn)
}
return PageReserved(tail);
}
  +#endif
  
return true;
  }
  
  Before apply this change:
  
  real(1m19.954s + 1m20.918s + 1m22.740s + 1m21.146s + 1m22.120s)/5= 
  1m21.376s
  user(0m23.181s + 0m23.550s + 0m23.506s + 0m23.410s + 0m23.520s)/5= 
  0m23.433s
  sys   (0m49.087s + 0m49.563s + 0m51.758s + 0m50.290s + 0m51.047s)/5= 
  0m50.349s
  
  After apply this change:
  
  real(1m19.507s + 1m20.919s + 1m21.436s + 1m21.179s + 1m20.293s)/5= 
  1m20.667s
  user(0m22.595s + 0m22.719s + 0m22.484s + 0m22.811s + 0m22.467s)/5= 
  0m22.615s
  sys   (0m48.841s + 0m49.929s + 0m50.310s + 0m49.813s + 0m48.587s)/5= 
  0m49.496s
  
  So,
  
  real(1m20.667s - 1m21.376s)/1m21.376s x 100% = -0.6%
  user(0m22.615s - 0m23.433s)/0m23.433s x 100% = -3.5%
  sys   (0m49.496s - 0m50.349s)/0m50.349s x 100% = -1.7%
  
  Very nice, so there is a real world performance benefit to doing this. 
  Then yes, I think it would make sense to change the global helper function 
  to be fast on e500 and use that one from e500_shadow_mas2_attrib() instead.
  
  Gleb, Paolo, any hard feelings?
  
  I do not see how can we break the function in such a way and get
  away with it. Not all valid pfns point to memory. Physical address can
  be sparse (due to PCI hole, framebuffer or just because).
 
 But we don't check for sparseness today in here either. We merely check for 
 incomplete huge pages.
 
That's not how I read the code. The code checks for the reserved flag
being set. It should be set on pfns that point to memory holes. As far
as I understand the huge page tricks, they are there to guarantee that
THP does not change flags under us, Andrea?

--
Gleb.


Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages

2013-07-24 Thread Alexander Graf

On 24.07.2013, at 12:19, Gleb Natapov wrote:

 On Wed, Jul 24, 2013 at 12:09:42PM +0200, Alexander Graf wrote:
 
 On 24.07.2013, at 12:01, Gleb Natapov wrote:
 
 Copying Andrea for him to verify that I am not talking nonsense :)
 
 On Wed, Jul 24, 2013 at 10:25:20AM +0200, Alexander Graf wrote:
 diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
 index 1580dd4..5e8635b 100644
 --- a/virt/kvm/kvm_main.c
 +++ b/virt/kvm/kvm_main.c
 @@ -102,6 +102,10 @@ static bool largepages_enabled = true;
 
 bool kvm_is_mmio_pfn(pfn_t pfn)
 {
 +#ifdef CONFIG_MEMORY_HOTPLUG
 
 I'd feel safer if we narrow this down to e500.
 
 +   /*
 +* Currently only in memory hot remove case we may still need 
 this.
 +*/
  if (pfn_valid(pfn)) {
 
 We still have to check for pfn_valid, no? So the #ifdef should be down 
 here.
 
  int reserved;
  struct page *tail = pfn_to_page(pfn);
 @@ -124,6 +128,7 @@ bool kvm_is_mmio_pfn(pfn_t pfn)
  }
  return PageReserved(tail);
  }
 +#endif
 
  return true;
 }
 
 Before apply this change:
 
 real(1m19.954s + 1m20.918s + 1m22.740s + 1m21.146s + 1m22.120s)/5= 
 1m21.376s
 user(0m23.181s + 0m23.550s + 0m23.506s + 0m23.410s + 0m23.520s)/5= 
 0m23.433s
 sys   (0m49.087s + 0m49.563s + 0m51.758s + 0m50.290s + 0m51.047s)/5= 
 0m50.349s
 
 After apply this change:
 
 real(1m19.507s + 1m20.919s + 1m21.436s + 1m21.179s + 1m20.293s)/5= 
 1m20.667s
 user(0m22.595s + 0m22.719s + 0m22.484s + 0m22.811s + 0m22.467s)/5= 
 0m22.615s
 sys   (0m48.841s + 0m49.929s + 0m50.310s + 0m49.813s + 0m48.587s)/5= 
 0m49.496s
 
 So,
 
 real(1m20.667s - 1m21.376s)/1m21.376s x 100% = -0.6%
 user(0m22.615s - 0m23.433s)/0m23.433s x 100% = -3.5%
 sys   (0m49.496s - 0m50.349s)/0m50.349s x 100% = -1.7%
 
 Very nice, so there is a real world performance benefit to doing this. 
 Then yes, I think it would make sense to change the global helper function 
 to be fast on e500 and use that one from e500_shadow_mas2_attrib() instead.
 
 Gleb, Paolo, any hard feelings?
 
 I do not see how can we break the function in such a way and get
 away with it. Not all valid pfns point to memory. Physical address can
 be sparse (due to PCI hole, framebuffer or just because).
 
 But we don't check for sparseness today in here either. We merely check for 
 incomplete huge pages.
 
 That's not how I read the code. The code checks for reserved flag set.
 It should be set on pfns that point to memory holes. As far as I

I couldn't find any traces of code that sets the reserved bits on e500 chips 
though. I've only seen it getting set for memory hotplug and memory incoherent 
DMA code which doesn't get used on e500.

But I'd be more than happy to get proven wrong :).


Alex

 understand the huge page tricks, they are there to guarantee that THP does not
 change flags under us, Andrea?
 
 --
   Gleb.


Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages

2013-07-24 Thread Gleb Natapov
On Wed, Jul 24, 2013 at 12:25:18PM +0200, Alexander Graf wrote:
 
 On 24.07.2013, at 12:19, Gleb Natapov wrote:
 
  On Wed, Jul 24, 2013 at 12:09:42PM +0200, Alexander Graf wrote:
  
  On 24.07.2013, at 12:01, Gleb Natapov wrote:
  
  Copying Andrea for him to verify that I am not talking nonsense :)
  
  On Wed, Jul 24, 2013 at 10:25:20AM +0200, Alexander Graf wrote:
  diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
  index 1580dd4..5e8635b 100644
  --- a/virt/kvm/kvm_main.c
  +++ b/virt/kvm/kvm_main.c
  @@ -102,6 +102,10 @@ static bool largepages_enabled = true;
  
  bool kvm_is_mmio_pfn(pfn_t pfn)
  {
  +#ifdef CONFIG_MEMORY_HOTPLUG
  
  I'd feel safer if we narrow this down to e500.
  
  +   /*
  +* Currently only in memory hot remove case we may still need 
  this.
  +*/
   if (pfn_valid(pfn)) {
  
  We still have to check for pfn_valid, no? So the #ifdef should be down 
  here.
  
   int reserved;
   struct page *tail = pfn_to_page(pfn);
  @@ -124,6 +128,7 @@ bool kvm_is_mmio_pfn(pfn_t pfn)
   }
   return PageReserved(tail);
   }
  +#endif
  
   return true;
  }
  
  Before apply this change:
  
  real(1m19.954s + 1m20.918s + 1m22.740s + 1m21.146s + 1m22.120s)/5= 
  1m21.376s
  user(0m23.181s + 0m23.550s + 0m23.506s + 0m23.410s + 0m23.520s)/5= 
  0m23.433s
  sys (0m49.087s + 0m49.563s + 0m51.758s + 0m50.290s + 0m51.047s)/5= 
  0m50.349s
  
  After apply this change:
  
  real(1m19.507s + 1m20.919s + 1m21.436s + 1m21.179s + 1m20.293s)/5= 
  1m20.667s
  user(0m22.595s + 0m22.719s + 0m22.484s + 0m22.811s + 0m22.467s)/5= 
  0m22.615s
  sys (0m48.841s + 0m49.929s + 0m50.310s + 0m49.813s + 0m48.587s)/5= 
  0m49.496s
  
  So,
  
  real(1m20.667s - 1m21.376s)/1m21.376s x 100% = -0.6%
  user(0m22.615s - 0m23.433s)/0m23.433s x 100% = -3.5%
  sys (0m49.496s - 0m50.349s)/0m50.349s x 100% = -1.7%
  
  Very nice, so there is a real world performance benefit to doing this. 
  Then yes, I think it would make sense to change the global helper 
  function to be fast on e500 and use that one from 
  e500_shadow_mas2_attrib() instead.
  
  Gleb, Paolo, any hard feelings?
  
  I do not see how can we break the function in such a way and get
  away with it. Not all valid pfns point to memory. Physical address can
  be sparse (due to PCI hole, framebuffer or just because).
  
  But we don't check for sparseness today in here either. We merely check 
  for incomplete huge pages.
  
  That's not how I read the code. The code checks for reserved flag set.
  It should be set on pfns that point to memory holes. As far as I
 
 I couldn't find any traces of code that sets the reserved bits on e500 chips 
 though. I've only seen it getting set for memory hotplug and memory 
 incoherent DMA code which doesn't get used on e500.
 
 But I'd be more than happy to get proven wrong :).
 
Can you write a module that scans all page structures? AFAIK all pages
are marked as reserved and then those that become regular memory are
marked as unreserved. Hope Andrea will chime in here :)
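
A hedged sketch of the debug module Gleb asks for; the flat walk up to
max_pfn is a simplification, and a real module might need per-arch care
with memory holes:

#include <linux/module.h>
#include <linux/mm.h>
#include <linux/bootmem.h>	/* max_pfn */

static int __init resv_scan_init(void)
{
	unsigned long pfn, valid = 0, reserved = 0;

	for (pfn = 0; pfn < max_pfn; pfn++) {
		if (!pfn_valid(pfn))
			continue;
		valid++;
		if (PageReserved(pfn_to_page(pfn)))
			reserved++;
	}
	pr_info("resv_scan: %lu reserved of %lu valid pfns\n",
		reserved, valid);
	return 0;
}

static void __exit resv_scan_exit(void)
{
}

module_init(resv_scan_init);
module_exit(resv_scan_exit);
MODULE_LICENSE("GPL");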

--
Gleb.


Re: [PATCH 2/2] kvm: powerpc: set cache coherency only for kernel managed pages

2013-07-24 Thread Scott Wood

On 07/24/2013 04:39:59 AM, Alexander Graf wrote:


On 24.07.2013, at 11:35, Gleb Natapov wrote:

 On Wed, Jul 24, 2013 at 11:21:11AM +0200, Alexander Graf wrote:
 Are we not going to use page_is_ram() from e500_shadow_mas2_attrib()
 as Scott commented?


 Why aren't we using page_is_ram() in kvm_is_mmio_pfn()?


 Because it is much slower and, IIRC, it is actually used to build the
 pfn map that allows us to check quickly for a valid pfn.

Then why should we use page_is_ram()? :)

I really don't want the e500 code to diverge too much from what the  
rest of the kvm code is doing.


I don't understand "actually used to build pfn map".  What code is
this?  I don't see any calls to page_is_ram() in the KVM code, or in
generic mm code.  Is this a statement about what x86 does?


On PPC page_is_ram() is only called (AFAICT) for determining what  
attributes to set on mmaps.  We want to be sure that KVM always makes  
the same decision.  While pfn_valid() seems like it should be  
equivalent, it's not obvious from the PPC code that it is.


If pfn_valid() is better, why is that not used for mmap?  Why are there  
two different names for the same thing?
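
For contrast, hedged sketches of what the two predicates answer; these
are simplified, sparsemem-flavoured illustrations, not the actual PPC
implementations. page_is_ram() asks whether a pfn is backed by RAM in the
platform memory map, while pfn_valid() only asks whether a struct page
exists, which can be true even for holes inside a populated section:

#include <linux/memblock.h>
#include <linux/mmzone.h>

/* "Is this pfn RAM according to the firmware/memblock memory map?" */
static int page_is_ram_sketch(unsigned long pfn)
{
	return memblock_is_memory((phys_addr_t)pfn << PAGE_SHIFT);
}

/* "Does a struct page exist for this pfn?" (sparsemem flavour) */
static int pfn_valid_sketch(unsigned long pfn)
{
	if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS)
		return 0;
	return valid_section(__nr_to_section(pfn_to_section_nr(pfn)));
}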


-Scott

