Re: [PATCH] vhost: Add polling mode

2014-07-29 Thread Razya Ladelsky
kvm-ow...@vger.kernel.org wrote on 29/07/2014 04:30:34 AM:

 From: Zhang Haoyu zhan...@sangfor.com
 To: Jason Wang jasow...@redhat.com, Abel Gordon 
 abel.gor...@gmail.com, 
 Cc: Razya Ladelsky/Haifa/IBM@IBMIL, Alex Glikson/Haifa/IBM@IBMIL, 
 Eran Raichstein/Haifa/IBM@IBMIL, Joel Nider/Haifa/IBM@IBMIL, kvm 
 kvm@vger.kernel.org, Michael S. Tsirkin m...@redhat.com, Yossi 
 Kuperman1/Haifa/IBM@IBMIL
 Date: 29/07/2014 04:35 AM
 Subject: Re: [PATCH] vhost: Add polling mode
 Sent by: kvm-ow...@vger.kernel.org
 
 Maybe tie a knot between vhost-net scalability tuning: threading 
 for many VMs and vhost: Add polling mode is a good marriage,
 because it's more possibility to get work to do with less polling 
 time, so less cpu cycles waste.
 

Hi Zhang,
Indeed have one vhost thread shared by multiple vms, polling for their 
requests is
the ultimate goal of this plan.
The current challenge with it is that the cgroup mechanism needs to be 
supported/incorporated somehow by this shared vhost thread, as it now 
serves multiple vms(processes).
B.T.W. - if someone wants to help with this effort (mainly the cgroup 
issue),
it would be greatly appreciated...! 
 
Thank you,
Razya 

 Thanks,
 Zhang Haoyu
 
Hello All,
   
When vhost is waiting for buffers from the guest driver
 (e.g., more
packets
to send in vhost-net's transmit queue), it normally 
 goes to sleep and
waits
for the guest to kick it. This kick involves a PIO in
 the guest, and
therefore an exit (and possibly userspace involvement 
 in translating
this
PIO
exit into a file descriptor event), all of which hurts 
 performance.
   
If the system is under-utilized (has cpu time to 
 spare), vhost can
continuously poll the virtqueues for new buffers, and 
 avoid asking
the guest to kick us.
This patch adds an optional polling mode to vhost, that
 can be enabled
via a kernel module parameter, poll_start_rate.
   
When polling is active for a virtqueue, the guest is asked 
to
disable notification (kicks), and the worker thread 
continuously
checks
for
new buffers. When it does discover new buffers, it 
 simulates a kick
by
invoking the underlying backend driver (such as vhost-net), 
which
thinks
it
got a real kick from the guest, and acts accordingly. If 
the
underlying
driver asks not to be kicked, we disable polling on 
 this virtqueue.
   
We start polling on a virtqueue when we notice it has
work to do. Polling on this virtqueue is later disabled 
after 3
seconds of
polling turning up no new work, as in this case we are 
better off
returning
to the exit-based notification mechanism. The default 
 timeout of 3
seconds
can be changed with the poll_stop_idle kernel module 
parameter.
   
This polling approach makes lot of sense for new HW with
posted-interrupts
for which we have exitless host-to-guest notifications.
 But even with
support
for posted interrupts, guest-to-host communication 
 still causes exits.
Polling adds the missing part.
   
When systems are overloaded, there won?t be enough cpu 
 time for the
various
vhost threads to poll their guests' devices. For these 
 scenarios, we
plan
to add support for vhost threads that can be shared by 
multiple
devices,
even of multiple vms.
Our ultimate goal is to implement the I/O acceleration 
features
described
in:
KVM Forum 2013: Efficient and Scalable Virtio (by Abel 
Gordon)
https://www.youtube.com/watch?v=9EyweibHfEs
and

https://www.mail-archive.com/kvm@vger.kernel.org/msg98179.html
   
   
Comments are welcome,
Thank you,
Razya
Thanks for the work. Do you have perf numbers for this?
   
Hi Jason,
Thanks for reviewing. I ran some experiments with TCP 
 stream netperf and
filebench (having 2 threads performing random reads) 
 benchmarks on an IBM
System x3650 M4.
All runs loaded the guests in a way that they were (cpu) 
saturated.
The system had two cores per guest, as to allow for both 
 the vcpu and the
vhost thread to
run concurrently for maximum throughput (but I didn't pin 
 the threads to
specific cores)
I get:
   
Netperf, 1 vm:
The polling patch improved throughput by ~33%. Number of 
exits/sec
decreased 6x.
The same improvement was shown when I tested with 3 vms 
 running netperf.
   
filebench, 1 vm:
ops/sec improved by 13% with the polling patch. Number of exits 
was
reduced by 31%.
The same experiment with 3 vms running filebench showed 
 similar numbers.
  
   Looks good, may worth to add the result in the commit log.
   
And looks like the patch only poll for virtqueue. In the 
 future, may
worth to add callbacks for vhost_net to poll socket. Then
 it could be
used with rx busy polling in host which may speedup the rx 
also.
Did you mean polling the network device to avoid 

Re: [PATCH] vhost: Add polling mode

2014-07-29 Thread Michael S. Tsirkin
On Mon, Jul 21, 2014 at 04:23:44PM +0300, Razya Ladelsky wrote:
 Hello All,
 
 When vhost is waiting for buffers from the guest driver (e.g., more 
 packets
 to send in vhost-net's transmit queue), it normally goes to sleep and 
 waits
 for the guest to kick it. This kick involves a PIO in the guest, and
 therefore an exit (and possibly userspace involvement in translating this 
 PIO
 exit into a file descriptor event), all of which hurts performance.
 
 If the system is under-utilized (has cpu time to spare), vhost can 
 continuously poll the virtqueues for new buffers, and avoid asking 
 the guest to kick us.
 This patch adds an optional polling mode to vhost, that can be enabled 
 via a kernel module parameter, poll_start_rate.
 
 When polling is active for a virtqueue, the guest is asked to
 disable notification (kicks), and the worker thread continuously checks 
 for
 new buffers. When it does discover new buffers, it simulates a kick by
 invoking the underlying backend driver (such as vhost-net), which thinks 
 it
 got a real kick from the guest, and acts accordingly. If the underlying
 driver asks not to be kicked, we disable polling on this virtqueue.
 
 We start polling on a virtqueue when we notice it has
 work to do. Polling on this virtqueue is later disabled after 3 seconds of
 polling turning up no new work, as in this case we are better off 
 returning
 to the exit-based notification mechanism. The default timeout of 3 seconds
 can be changed with the poll_stop_idle kernel module parameter.
 
 This polling approach makes lot of sense for new HW with posted-interrupts
 for which we have exitless host-to-guest notifications. But even with 
 support 
 for posted interrupts, guest-to-host communication still causes exits. 
 Polling adds the missing part.
 
 When systems are overloaded, there won?t be enough cpu time for the 
 various 
 vhost threads to poll their guests' devices. For these scenarios, we plan 
 to add support for vhost threads that can be shared by multiple devices, 
 even of multiple vms. 
 Our ultimate goal is to implement the I/O acceleration features described 
 in:
 KVM Forum 2013: Efficient and Scalable Virtio (by Abel Gordon) 
 https://www.youtube.com/watch?v=9EyweibHfEs
 and
 https://www.mail-archive.com/kvm@vger.kernel.org/msg98179.html
 
  
 Comments are welcome, 
 Thank you,
 Razya
 
 From: Razya Ladelsky ra...@il.ibm.com
 
 Add an optional polling mode to continuously poll the virtqueues
 for new buffers, and avoid asking the guest to kick us.
 
 Signed-off-by: Razya Ladelsky ra...@il.ibm.com

This is an optimization patch, isn't it?
Could you please include some numbers showing its
effect?


 ---
  drivers/vhost/net.c   |6 +-
  drivers/vhost/scsi.c  |5 +-
  drivers/vhost/vhost.c |  247 
 +++--
  drivers/vhost/vhost.h |   37 +++-
  4 files changed, 277 insertions(+), 18 deletions(-)


Whitespace seems mangled to the point of making patch
unreadable. Can you pls repost?

 diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
 index 971a760..558aecb 100644
 --- a/drivers/vhost/net.c
 +++ b/drivers/vhost/net.c
 @@ -742,8 +742,10 @@ static int vhost_net_open(struct inode *inode, struct 
 file *f)
 }
 vhost_dev_init(dev, vqs, VHOST_NET_VQ_MAX);
  
 -   vhost_poll_init(n-poll + VHOST_NET_VQ_TX, handle_tx_net, POLLOUT, 
 dev);
 -   vhost_poll_init(n-poll + VHOST_NET_VQ_RX, handle_rx_net, POLLIN, 
 dev);
 +   vhost_poll_init(n-poll + VHOST_NET_VQ_TX, handle_tx_net, POLLOUT,
 +   vqs[VHOST_NET_VQ_TX]);
 +   vhost_poll_init(n-poll + VHOST_NET_VQ_RX, handle_rx_net, POLLIN,
 +   vqs[VHOST_NET_VQ_RX]);
  
 f-private_data = n;
  
 diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
 index 4f4ffa4..56f0233 100644
 --- a/drivers/vhost/scsi.c
 +++ b/drivers/vhost/scsi.c
 @@ -1528,9 +1528,8 @@ static int vhost_scsi_open(struct inode *inode, 
 struct file *f)
 if (!vqs)
 goto err_vqs;
  
 -   vhost_work_init(vs-vs_completion_work, 
 vhost_scsi_complete_cmd_work);
 -   vhost_work_init(vs-vs_event_work, tcm_vhost_evt_work);
 -
 +   vhost_work_init(vs-vs_completion_work, NULL, 
 vhost_scsi_complete_cmd_work);
 +   vhost_work_init(vs-vs_event_work, NULL, tcm_vhost_evt_work);
 vs-vs_events_nr = 0;
 vs-vs_events_missed = false;
  
 diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c
 index c90f437..678d766 100644
 --- a/drivers/vhost/vhost.c
 +++ b/drivers/vhost/vhost.c
 @@ -24,9 +24,17 @@
  #include linux/slab.h
  #include linux/kthread.h
  #include linux/cgroup.h
 +#include linux/jiffies.h
  #include linux/module.h
  
  #include vhost.h
 +static int poll_start_rate = 0;
 +module_param(poll_start_rate, int, S_IRUGO|S_IWUSR);
 +MODULE_PARM_DESC(poll_start_rate, Start continuous polling of virtqueue 
 when rate of events is at least this number per jiffy. If 0, never start 
 polling.);
 +
 

Re: [PATCH] target-mips: Ignore unassigned accesses with KVM

2014-07-29 Thread Paolo Bonzini
Il 28/07/2014 23:36, Aurelien Jarno ha scritto:
 On Mon, Jul 28, 2014 at 12:37:50PM +0100, James Hogan wrote:
 MIPS registers an unassigned access handler which raises a guest bus
 error exception. However this causes QEMU to crash when KVM is enabled
 as it isn't called from the main execution loop so longjmp() gets called
 without a corresponding setjmp().

 Until the KVM API can be updated to trigger a guest exception in
 response to an MMIO exit, prevent the bus error exception being raised
 from mips_cpu_unassigned_access() if KVM is enabled.

 The check is at run time since the do_unassigned_access callback is
 initialised before it is known whether KVM will be enabled.

 The problem can be triggered with Malta emulation by making the guest
 write to the reset region at physical address 0x1bf0, since it is
 marked read-only which is treated as unassigned for writes.

 Signed-off-by: James Hogan james.ho...@imgtec.com
 Cc: Aurelien Jarno aurel...@aurel32.net
 Cc: Peter Maydell peter.mayd...@linaro.org
 Cc: Paolo Bonzini pbonz...@redhat.com
 Cc: Gleb Natapov g...@redhat.com
 Cc: Christoffer Dall christoffer.d...@linaro.org
 Cc: Sanjay Lal sanj...@kymasys.com
 ---
  target-mips/op_helper.c | 11 +++
  1 file changed, 11 insertions(+)

 diff --git a/target-mips/op_helper.c b/target-mips/op_helper.c
 index 27651a4a00c1..df97b35f8701 100644
 --- a/target-mips/op_helper.c
 +++ b/target-mips/op_helper.c
 @@ -21,6 +21,7 @@
  #include qemu/host-utils.h
  #include exec/helper-proto.h
  #include exec/cpu_ldst.h
 +#include sysemu/kvm.h
  
  #ifndef CONFIG_USER_ONLY
  static inline void cpu_mips_tlb_flush (CPUMIPSState *env, int flush_global);
 @@ -2168,6 +2169,16 @@ void mips_cpu_unassigned_access(CPUState *cs, hwaddr 
 addr,
  MIPSCPU *cpu = MIPS_CPU(cs);
  CPUMIPSState *env = cpu-env;
  
 +/*
 + * Raising an exception with KVM enabled will crash because it won't be 
 from
 + * the main execution loop so the longjmp won't have a matching setjmp.
 + * Until we can trigger a bus error exception through KVM lets just 
 ignore
 + * the access.
 + */
 +if (kvm_enabled()) {
 +return;
 +}
 +
  if (is_exec) {
  helper_raise_exception(env, EXCP_IBE);
  } else {
 
 Reviewed-by: Aurelien Jarno aurel...@aurel32.net
 
 Note that even if the test is added for each exception, it is light
 enough compared to triggering and handling an exception so that it has
 no impact on performance.
 
 Paolo, do you want to take this patch in your kvm tree?

Sure, I'll include it for 2.2.

Paolo
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4 v3] xen-pciback: use pci device flag operation helper function

2014-07-29 Thread David Vrabel
On 29/07/14 05:06, Ethan Zhao wrote:
 Use pci device flag operation helper functions when set device
 to assigned or deassigned state.
 
 Signed-off-by: Ethan Zhao ethan.z...@oracle.com

Konrad already reviewed this but you've not included his reviewed-by tag.

I don't understand why we bother with this flag on Xen since we never
use it but:

Acked-by: David Vrabel david.vra...@citrix.com

I'm expecting this to go via the PCI tree.

David

 --- a/drivers/xen/xen-pciback/pci_stub.c
 +++ b/drivers/xen/xen-pciback/pci_stub.c
 @@ -133,7 +133,7 @@ static void pcistub_device_release(struct kref *kref)
   xen_pcibk_config_free_dyn_fields(dev);
   xen_pcibk_config_free_dev(dev);
  
 - dev-dev_flags = ~PCI_DEV_FLAGS_ASSIGNED;
 + pci_clear_dev_assigned(dev);
   pci_dev_put(dev);
  
   kfree(psdev);
 @@ -404,7 +404,7 @@ static int pcistub_init_device(struct pci_dev *dev)
   dev_dbg(dev-dev, reset device\n);
   xen_pcibk_reset_device(dev);
  
 - dev-dev_flags |= PCI_DEV_FLAGS_ASSIGNED;
 + pci_set_dev_assigned(dev);
   return 0;
  
  config_release:
 

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] vhost: Add polling mode

2014-07-29 Thread Razya Ladelsky
Michael S. Tsirkin m...@redhat.com wrote on 29/07/2014 11:06:40 AM:

 From: Michael S. Tsirkin m...@redhat.com
 To: Razya Ladelsky/Haifa/IBM@IBMIL, 
 Cc: kvm@vger.kernel.org, abel.gor...@gmail.com, Joel Nider/Haifa/
 IBM@IBMIL, Yossi Kuperman1/Haifa/IBM@IBMIL, Eran Raichstein/Haifa/
 IBM@IBMIL, Alex Glikson/Haifa/IBM@IBMIL
 Date: 29/07/2014 11:06 AM
 Subject: Re: [PATCH] vhost: Add polling mode
 
 On Mon, Jul 21, 2014 at 04:23:44PM +0300, Razya Ladelsky wrote:
  Hello All,
  
  When vhost is waiting for buffers from the guest driver (e.g., more 
  packets
  to send in vhost-net's transmit queue), it normally goes to sleep and 
  waits
  for the guest to kick it. This kick involves a PIO in the guest, and
  therefore an exit (and possibly userspace involvement in translating 
this 
  PIO
  exit into a file descriptor event), all of which hurts performance.
  
  If the system is under-utilized (has cpu time to spare), vhost can 
  continuously poll the virtqueues for new buffers, and avoid asking 
  the guest to kick us.
  This patch adds an optional polling mode to vhost, that can be enabled 

  via a kernel module parameter, poll_start_rate.
  
  When polling is active for a virtqueue, the guest is asked to
  disable notification (kicks), and the worker thread continuously 
checks 
  for
  new buffers. When it does discover new buffers, it simulates a kick 
by
  invoking the underlying backend driver (such as vhost-net), which 
thinks 
  it
  got a real kick from the guest, and acts accordingly. If the 
underlying
  driver asks not to be kicked, we disable polling on this virtqueue.
  
  We start polling on a virtqueue when we notice it has
  work to do. Polling on this virtqueue is later disabled after 3 
seconds of
  polling turning up no new work, as in this case we are better off 
  returning
  to the exit-based notification mechanism. The default timeout of 3 
seconds
  can be changed with the poll_stop_idle kernel module parameter.
  
  This polling approach makes lot of sense for new HW with 
posted-interrupts
  for which we have exitless host-to-guest notifications. But even with 
  support 
  for posted interrupts, guest-to-host communication still causes exits. 

  Polling adds the missing part.
  
  When systems are overloaded, there won?t be enough cpu time for the 
  various 
  vhost threads to poll their guests' devices. For these scenarios, we 
plan 
  to add support for vhost threads that can be shared by multiple 
devices, 
  even of multiple vms. 
  Our ultimate goal is to implement the I/O acceleration features 
described 
  in:
  KVM Forum 2013: Efficient and Scalable Virtio (by Abel Gordon) 
  https://www.youtube.com/watch?v=9EyweibHfEs
  and
  https://www.mail-archive.com/kvm@vger.kernel.org/msg98179.html
  
  
  Comments are welcome, 
  Thank you,
  Razya
  
  From: Razya Ladelsky ra...@il.ibm.com
  
  Add an optional polling mode to continuously poll the virtqueues
  for new buffers, and avoid asking the guest to kick us.
  
  Signed-off-by: Razya Ladelsky ra...@il.ibm.com
 
 This is an optimization patch, isn't it?
 Could you please include some numbers showing its
 effect?
 
 

Hi Michael,
Sure. I included them in a reply to Jason Wang in this thread,
Here it is:
http://www.spinics.net/linux/lists/kvm/msg106049.html




  ---
   drivers/vhost/net.c   |6 +-
   drivers/vhost/scsi.c  |5 +-
   drivers/vhost/vhost.c |  247 
  +++--
   drivers/vhost/vhost.h |   37 +++-
   4 files changed, 277 insertions(+), 18 deletions(-)
 
 
 Whitespace seems mangled to the point of making patch
 unreadable. Can you pls repost?
 

Sure.

  diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
  index 971a760..558aecb 100644
  --- a/drivers/vhost/net.c
  +++ b/drivers/vhost/net.c
  @@ -742,8 +742,10 @@ static int vhost_net_open(struct inode *inode, 
struct 
  file *f)
  }
  vhost_dev_init(dev, vqs, VHOST_NET_VQ_MAX);
  
  -   vhost_poll_init(n-poll + VHOST_NET_VQ_TX, handle_tx_net, 
POLLOUT, 
  dev);
  -   vhost_poll_init(n-poll + VHOST_NET_VQ_RX, handle_rx_net, 
POLLIN, 
  dev);
  +   vhost_poll_init(n-poll + VHOST_NET_VQ_TX, handle_tx_net, 
POLLOUT,
  +   vqs[VHOST_NET_VQ_TX]);
  +   vhost_poll_init(n-poll + VHOST_NET_VQ_RX, handle_rx_net, 
POLLIN,
  +   vqs[VHOST_NET_VQ_RX]);
  
  f-private_data = n;
  
  diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
  index 4f4ffa4..56f0233 100644
  --- a/drivers/vhost/scsi.c
  +++ b/drivers/vhost/scsi.c
  @@ -1528,9 +1528,8 @@ static int vhost_scsi_open(struct inode *inode, 
  struct file *f)
  if (!vqs)
  goto err_vqs;
  
  -   vhost_work_init(vs-vs_completion_work, 
  vhost_scsi_complete_cmd_work);
  -   vhost_work_init(vs-vs_event_work, tcm_vhost_evt_work);
  -
  +   vhost_work_init(vs-vs_completion_work, NULL, 
  vhost_scsi_complete_cmd_work);
  +   

Re: [PATCH] vhost: Add polling mode

2014-07-29 Thread Michael S. Tsirkin
On Tue, Jul 29, 2014 at 01:30:03PM +0300, Razya Ladelsky wrote:

[..] had to snip off the quoted text, it's mangled up to
 unreadability.
Please take a look at Documentation/email-clients.txt
to fix this.

  This is an optimization patch, isn't it?
  Could you please include some numbers showing its
  effect?
  
  
 
 Hi Michael,
 Sure. I included them in a reply to Jason Wang in this thread,
 Here it is:
 http://www.spinics.net/linux/lists/kvm/msg106049.html
 

Hmm there aren't a lot of numbers there :(. Speed increased by 33% but
by how much?  E.g. maybe you are getting from 1Mbyte/sec to 1.3,
if so it's hard to get excited about it. Some questions that come to
mind: what was the message size? I would expect several measurements
with different values.  How did host CPU utilization change?

What about latency? As we are competing with guest for host CPU,
would worst-case or average latency suffer?

Thanks,

-- 
MST
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 6/6] kvm, mem-hotplug: Reload L1's apic access page if it is migrated when L2 is running.

2014-07-29 Thread tangchen


On 07/26/2014 04:44 AM, Jan Kiszka wrote:

On 2014-07-23 21:42, Tang Chen wrote:

This patch only handle L1 and L2 vm share one apic access page situation.

When L1 vm is running, if the shared apic access page is migrated, mmu_notifier 
will
request all vcpus to exit to L0, and reload apic access page physical address 
for
all the vcpus' vmcs (which is done by patch 5/6). And when it enters L2 vm, 
L2's vmcs
will be updated in prepare_vmcs02() called by nested_vm_run(). So we need to do
nothing.

When L2 vm is running, if the shared apic access page is migrated, mmu_notifier 
will
request all vcpus to exit to L0, and reload apic access page physical address 
for
all L2 vmcs. And this patch requests apic access page reload in L2-L1 vmexit.

Shouldn't this patch come before we allow apic access page migration?

Yes, it should come before patch 5.

Thanks.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH/RFC 4/5] s390x/kvm: test whether a cpu is STOPPED when checking has_work

2014-07-29 Thread Christian Borntraeger
On 28/07/14 16:22, Alexander Graf wrote:
 
 On 28.07.2014, at 16:16, David Hildenbrand d...@linux.vnet.ibm.com wrote:
 

 On 10.07.14 15:10, Christian Borntraeger wrote:
 From: David Hildenbrand d...@linux.vnet.ibm.com

 If a cpu is stopped, it must never be allowed to run and no interrupt may 
 wake it
 up. A cpu also has to be unhalted if it is halted and has work to do - this
 scenario wasn't hit in kvm case yet, as only disabled wait is processed 
 within
 QEMU.

 Signed-off-by: David Hildenbrand d...@linux.vnet.ibm.com
 Reviewed-by: Cornelia Huck cornelia.h...@de.ibm.com
 Reviewed-by: Christian Borntraeger borntrae...@de.ibm.com
 Signed-off-by: Christian Borntraeger borntrae...@de.ibm.com

 This looks like it's something that generic infrastructure should take 
 care of, no? How does this work for the other archs? They always get an 
 interrupt on the transition between !has_work - has_work. Why don't we 
 get one for s390x?


 Alex



 Well, we have the special case on s390 as a CPU that is in the STOPPED or the
 CHECK STOP state may never run - even if there is an interrupt. It's
 basically like this CPU has been switched off.

 Imagine that it is tried to inject an interrupt into a stopped vcpu. It
 will kick the stopped vcpu and thus lead to a call to
 kvm_arch_process_async_events(). We have to deny that this vcpu will ever
 run as long as it is stopped. It's like a way to suppress the
 interrupt for such a transition you mentioned.
 
 An interrupt kick usually just means we go back into the main loop. From 
 there we check the interrupt bitmap which interrupt to handle. Check out the 
 handling code here:
 
   
 http://git.qemu.org/?p=qemu.git;a=blob;f=cpu-exec.c;h=38e5f02a307523d99134f4e2e6c51683bb10b45b;hb=HEAD#l580
 
 If you just check for the stopped state in here, do_interrupt() will never 
 get called and thus the CPU shouldn't ever get executed. Unless I'm heavily 
 mistaken :).
 

 Later, another vcpu might decide to turn that vcpu back on (by e.g. sending a
 SIGP START to that vcpu).
 
 Yes, in that case that other CPU generates a signal (a different bit in 
 interrupt_request) and the first CPU would see that it has to wake up and 
 wake up.
 
 I am not sure if such a mechanism/scenario is applicable to any other arch. 
 They
 all seem to reset the cs-halted flag if they know they are able to run (e.g.
 due to an interrupt) - they have no such thing as stopped cpus, only
 halted/waiting cpus.
 
 There's not really much difference between the two. The only difference from 
 a software point of view is that a stopped CPU has its external interrupt 
 bits masked off, no?

We have
- wait (wait bit in PSW)
- disabled wait (wait bit and interrupt fencing in PSW)
- STOPPED (not related to PSW, state change usually handled via service 
processor or hypervisor)

I think we have to differentiate between KVM/TCG. On KVM we always do in kernel 
halt and qemu sees a halted only for STOPPED or disabled wait. TCG has to take 
care of the normal wait as well.

From a first glimpse, a disabled wait and STOPPED look similar, but there are 
(important) differences, e.g. other CPUs get a different a different result 
from a SIGP SENSE. This makes a big difference, e.g. for Linux guests, that 
send a SIGP STOP, followed by a SIGP SENSE loop until the CPU is down on 
hotplug (and shutdown, kexec..) So I think we agree, that handling the cpu 
states natively makes sense.

The question is now only how to model it correctly without breaking TCG/KVM and 
reuse as much common code as possible. Correct?

Do I understand you correctly, that your collapsing of stopped and halted is 
only in the qemu coding sense, IOW maybe we could just modify 
kvm_arch_process_async_events to consider the STOPPED state, as TCGs sigp 
implementation does not support SMP anyway?
David would that work?

Christian

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH/RFC 4/5] s390x/kvm: test whether a cpu is STOPPED when checking has_work

2014-07-29 Thread Alexander Graf


On 29.07.14 13:44, Christian Borntraeger wrote:

On 28/07/14 16:22, Alexander Graf wrote:

On 28.07.2014, at 16:16, David Hildenbrand d...@linux.vnet.ibm.com wrote:


On 10.07.14 15:10, Christian Borntraeger wrote:

From: David Hildenbrand d...@linux.vnet.ibm.com

If a cpu is stopped, it must never be allowed to run and no interrupt may wake 
it
up. A cpu also has to be unhalted if it is halted and has work to do - this
scenario wasn't hit in kvm case yet, as only disabled wait is processed within
QEMU.

Signed-off-by: David Hildenbrand d...@linux.vnet.ibm.com
Reviewed-by: Cornelia Huck cornelia.h...@de.ibm.com
Reviewed-by: Christian Borntraeger borntrae...@de.ibm.com
Signed-off-by: Christian Borntraeger borntrae...@de.ibm.com

This looks like it's something that generic infrastructure should take
care of, no? How does this work for the other archs? They always get an
interrupt on the transition between !has_work - has_work. Why don't we
get one for s390x?


Alex



Well, we have the special case on s390 as a CPU that is in the STOPPED or the
CHECK STOP state may never run - even if there is an interrupt. It's
basically like this CPU has been switched off.

Imagine that it is tried to inject an interrupt into a stopped vcpu. It
will kick the stopped vcpu and thus lead to a call to
kvm_arch_process_async_events(). We have to deny that this vcpu will ever
run as long as it is stopped. It's like a way to suppress the
interrupt for such a transition you mentioned.

An interrupt kick usually just means we go back into the main loop. From there 
we check the interrupt bitmap which interrupt to handle. Check out the handling 
code here:

   
http://git.qemu.org/?p=qemu.git;a=blob;f=cpu-exec.c;h=38e5f02a307523d99134f4e2e6c51683bb10b45b;hb=HEAD#l580

If you just check for the stopped state in here, do_interrupt() will never get 
called and thus the CPU shouldn't ever get executed. Unless I'm heavily 
mistaken :).


Later, another vcpu might decide to turn that vcpu back on (by e.g. sending a
SIGP START to that vcpu).

Yes, in that case that other CPU generates a signal (a different bit in 
interrupt_request) and the first CPU would see that it has to wake up and wake 
up.


I am not sure if such a mechanism/scenario is applicable to any other arch. They
all seem to reset the cs-halted flag if they know they are able to run (e.g.
due to an interrupt) - they have no such thing as stopped cpus, only
halted/waiting cpus.

There's not really much difference between the two. The only difference from a software 
point of view is that a stopped CPU has its external interrupt bits masked 
off, no?

We have
- wait (wait bit in PSW)
- disabled wait (wait bit and interrupt fencing in PSW)
- STOPPED (not related to PSW, state change usually handled via service 
processor or hypervisor)

I think we have to differentiate between KVM/TCG. On KVM we always do in kernel 
halt and qemu sees a halted only for STOPPED or disabled wait. TCG has to take 
care of the normal wait as well.

 From a first glimpse, a disabled wait and STOPPED look similar, but there are 
(important) differences, e.g. other CPUs get a different a different result 
from a SIGP SENSE. This makes a big difference, e.g. for Linux guests, that 
send a SIGP STOP, followed by a SIGP SENSE loop until the CPU is down on 
hotplug (and shutdown, kexec..) So I think we agree, that handling the cpu 
states natively makes sense.

The question is now only how to model it correctly without breaking TCG/KVM and 
reuse as much common code as possible. Correct?

Do I understand you correctly, that your collapsing of stopped and halted is 
only in the qemu coding sense, IOW maybe we could just modify 
kvm_arch_process_async_events to consider the STOPPED state, as TCGs sigp 
implementation does not support SMP anyway?


That works for me, yes.


Alex

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4 v3] xen-pciback: use pci device flag operation helper function

2014-07-29 Thread Ethan Zhao


 在 2014年7月29日,下午6:07,David Vrabel david.vra...@citrix.com 写道:
 
 On 29/07/14 05:06, Ethan Zhao wrote:
 Use pci device flag operation helper functions when set device
 to assigned or deassigned state.
 
 Signed-off-by: Ethan Zhao ethan.z...@oracle.com
 
 Konrad already reviewed this but you've not included his reviewed-by tag.
 
yep,I lost Konrad's reviewed-by tag.
Thanks.
 I don't understand why we bother with this flag on Xen since we never
 use it but:
 
 Acked-by: David Vrabel david.vra...@citrix.com
 
 I'm expecting this to go via the PCI tree.
 
 David
 
 --- a/drivers/xen/xen-pciback/pci_stub.c
 +++ b/drivers/xen/xen-pciback/pci_stub.c
 @@ -133,7 +133,7 @@ static void pcistub_device_release(struct kref *kref)
xen_pcibk_config_free_dyn_fields(dev);
xen_pcibk_config_free_dev(dev);
 
 -dev-dev_flags = ~PCI_DEV_FLAGS_ASSIGNED;
 +pci_clear_dev_assigned(dev);
pci_dev_put(dev);
 
kfree(psdev);
 @@ -404,7 +404,7 @@ static int pcistub_init_device(struct pci_dev *dev)
dev_dbg(dev-dev, reset device\n);
xen_pcibk_reset_device(dev);
 
 -dev-dev_flags |= PCI_DEV_FLAGS_ASSIGNED;
 +pci_set_dev_assigned(dev);
return 0;
 
 config_release:
 
 
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] vhost: Add polling mode

2014-07-29 Thread Razya Ladelsky
 
 Hmm there aren't a lot of numbers there :(. Speed increased by 33% but
 by how much?  E.g. maybe you are getting from 1Mbyte/sec to 1.3,
 if so it's hard to get excited about it. 

Netperf 1 VM: 1516 MB/sec - 2046 MB/sec
and for 3 VMs: 4086 MB/sec - 5545 MB/sec

 Some questions that come to
 mind: what was the message size? I would expect several measurements
 with different values.  How did host CPU utilization change?
 

message size  was 64B in order to get the VM to be cpu saturated. 
so vhost had 99% cpu and vhost 38%, with the polling patch both had 99%.



 What about latency? As we are competing with guest for host CPU,
 would worst-case or average latency suffer?
 

Polling indeed doesn't make a lot of sense if there aren't enough 
available cores.
In these cases polling should not be used.

Thank you,
Razya



 Thanks,
 
 -- 
 MST
 --
 To unsubscribe from this list: send the line unsubscribe kvm in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/4 v3] KVM: use pci device flag operation helper functions

2014-07-29 Thread Ethan Zhao
This patch was already
Acked-by: Paolo Bonzini pbonz...@redhat.com
I forgot to add the Ack tag.

Thanks,
Ethan

Sorry for last post in HTML format with ipad.

On Tue, Jul 29, 2014 at 12:06 PM, Ethan Zhao ethan.z...@oracle.com wrote:
 Use helper function instead of direct operation to pci device
 flag when set device to assigned or deassigned.

 Signed-off-by: Ethan Zhao ethan.z...@oracle.com
 ---
  v3: amend helper functions naming.
  virt/kvm/assigned-dev.c |2 +-
  virt/kvm/iommu.c|4 ++--
  2 files changed, 3 insertions(+), 3 deletions(-)

 diff --git a/virt/kvm/assigned-dev.c b/virt/kvm/assigned-dev.c
 index bf06577..d122bda 100644
 --- a/virt/kvm/assigned-dev.c
 +++ b/virt/kvm/assigned-dev.c
 @@ -302,7 +302,7 @@ static void kvm_free_assigned_device(struct kvm *kvm,
 else
 pci_restore_state(assigned_dev-dev);

 -   assigned_dev-dev-dev_flags = ~PCI_DEV_FLAGS_ASSIGNED;
 +   pci_clear_dev_assigned(assigned_dev-dev);

 pci_release_regions(assigned_dev-dev);
 pci_disable_device(assigned_dev-dev);
 diff --git a/virt/kvm/iommu.c b/virt/kvm/iommu.c
 index 0df7d4b..8cfe021 100644
 --- a/virt/kvm/iommu.c
 +++ b/virt/kvm/iommu.c
 @@ -194,7 +194,7 @@ int kvm_assign_device(struct kvm *kvm,
 goto out_unmap;
 }

 -   pdev-dev_flags |= PCI_DEV_FLAGS_ASSIGNED;
 +   pci_set_dev_assigned(pdev);

 dev_info(pdev-dev, kvm assign device\n);

 @@ -220,7 +220,7 @@ int kvm_deassign_device(struct kvm *kvm,

 iommu_detach_device(domain, pdev-dev);

 -   pdev-dev_flags = ~PCI_DEV_FLAGS_ASSIGNED;
 +   pci_clear_dev_assigned(pdev);

 dev_info(pdev-dev, kvm deassign device\n);

 --
 1.7.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/4 v3] xen-pciback: use pci device flag operation helper function

2014-07-29 Thread Ethan Zhao
On Tue, Jul 29, 2014 at 6:07 PM, David Vrabel david.vra...@citrix.com wrote:
 On 29/07/14 05:06, Ethan Zhao wrote:
 Use pci device flag operation helper functions when set device
 to assigned or deassigned state.

 Signed-off-by: Ethan Zhao ethan.z...@oracle.com

 Konrad already reviewed this but you've not included his reviewed-by tag.

 I don't understand why we bother with this flag on Xen since we never
 use it but:

 Acked-by: David Vrabel david.vra...@citrix.com

 I'm expecting this to go via the PCI tree.

  Seems Bjorn still holds his bullet for last shot :)

 David

 --- a/drivers/xen/xen-pciback/pci_stub.c
 +++ b/drivers/xen/xen-pciback/pci_stub.c
 @@ -133,7 +133,7 @@ static void pcistub_device_release(struct kref *kref)
   xen_pcibk_config_free_dyn_fields(dev);
   xen_pcibk_config_free_dev(dev);

 - dev-dev_flags = ~PCI_DEV_FLAGS_ASSIGNED;
 + pci_clear_dev_assigned(dev);
   pci_dev_put(dev);

   kfree(psdev);
 @@ -404,7 +404,7 @@ static int pcistub_init_device(struct pci_dev *dev)
   dev_dbg(dev-dev, reset device\n);
   xen_pcibk_reset_device(dev);

 - dev-dev_flags |= PCI_DEV_FLAGS_ASSIGNED;
 + pci_set_dev_assigned(dev);
   return 0;

  config_release:


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] vhost: Add polling mode

2014-07-29 Thread Michael S. Tsirkin
On Tue, Jul 29, 2014 at 03:23:59PM +0300, Razya Ladelsky wrote:
  
  Hmm there aren't a lot of numbers there :(. Speed increased by 33% but
  by how much?  E.g. maybe you are getting from 1Mbyte/sec to 1.3,
  if so it's hard to get excited about it. 
 
 Netperf 1 VM: 1516 MB/sec - 2046 MB/sec
 and for 3 VMs: 4086 MB/sec - 5545 MB/sec

What do you mean by 1 VM? Streaming TCP host to vm?
Also, your throughput is somewhat low, it's worth seeing
why you can't hit higher speeds.

  Some questions that come to
  mind: what was the message size? I would expect several measurements
  with different values.  How did host CPU utilization change?
  
 
 message size  was 64B in order to get the VM to be cpu saturated. 
 so vhost had 99% cpu and vhost 38%, with the polling patch both had 99%.

Hmm so a net loss in throughput/CPU.

 
 
  What about latency? As we are competing with guest for host CPU,
  would worst-case or average latency suffer?
  
 
 Polling indeed doesn't make a lot of sense if there aren't enough 
 available cores.
 In these cases polling should not be used.
 
 Thank you,
 Razya

OK but scheduler might run vm and vhost on the same cpu
even if cores are available.
This needs to be detected somehow and polling disabled.


 
 
  Thanks,
  
  -- 
  MST
  --
  To unsubscribe from this list: send the line unsubscribe kvm in
  the body of a message to majord...@vger.kernel.org
  More majordomo info at  http://vger.kernel.org/majordomo-info.html
  
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Mailbox Warning

2014-07-29 Thread Webmail Admin
Email Account Warning !!!

This mail is from Administrator; we wish to bring to your notice the Condition 
of your email account.
We have just noticed that you have exceeded your email Database limit of 500 MB 
quota and your email IP is causing conflict because it is been accessed in 
different server location. You need to Upgrade and expand your email quota 
limit before you can continue to use your email. Provide details or click the 
link below :


Update your email quota limit to 2.6 GB, use the below web link:
  
https://www.formlogix.com/Manager/UserConditionalSurvey248582.aspx?Param=VXNlcklkPTI0ODU4Mi5Gb3JtSWQ9MQ%3d%3d

Failure to do this will result to email deactivation within 24hours
Thank you for your understanding.

Copyright 2014 Help Desk
Email Upgrade 
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH/RFC 4/5] s390x/kvm: test whether a cpu is STOPPED when checking has_work

2014-07-29 Thread Paolo Bonzini
Il 28/07/2014 17:03, David Hildenbrand ha scritto:
 Well the difference is, that a STOPPED vcpu can be woken up by non-interrupt
 like things (SIGP START) AND a special interrupt (SIGP RESTART - which is like
 a SIPI++ as it performs a psw exchange - NMI). So we basically have two
 paths that can lead to a state change.  All interrupt bits may be in any
 combination (SIGP RESTART interrupts can't be masked out, nor can SIGP START 
 be
 denied).
 
 The other thing may be that on s390, each vcpu (including itself) can put
 another vcpu into the STOPPED state - I assume that this is different for x86 
 
 INIT_RECEIVED. For this reason we have to watch out for bad race conditions
 (e.g. multiple vcpus working on another vcpu)...

You can do that in x86 by sending an INIT inter-processor interrupt.  A
SIPI is ignored if the CPU is not in INIT_RECEIVED state.

Commit 66450a21f99636af4fafac2afd33f1a40631bc3a introduced the current
implementation.

- an INIT cancels a previous SIPI;

- if both INIT and SIPI are sent, on real hardware you need to have a
few hundred microseconds between them, but KVM will reliably process
INIT before SIPI.

See commit 299018f44ac553dce3caf84df1d14c4764faa279 for an example of
the races that can happen.

Note that x86 has KVM_MP_STATE_SIPI_RECEIVED state but it is obsolete,
we go straight from KVM_MP_STATE_INIT_RECEIVED to KVM_MP_STATE_RUNNABLE.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/6] KVM: PPC: BOOKE: Allow guest to change MSR_DE

2014-07-29 Thread Alexander Graf


On 29.07.14 00:01, Scott Wood wrote:

On Fri, 2014-07-11 at 14:09 +0530, Bharat Bhushan wrote:

When userspace is debugging guest then MSR_DE is always set and
MSRP_DEP is set so that guest cannot change MSR_DE.
Guest debug resources are not yet emulated, So there seems no reason
we should stop guest controlling MSR_DE.
Also a followup patch will enable debug emulation and that requires
guest to control MSR_DE.

Why does it matter whether we emulate debug resources?  We still don't
want the guest to be able to clear MSR[DE] and thus break host debug.


The patch description is misleading. This patch changes the default of 
DEP to guest controlled when it boots up. Once QEMU wants control over 
the debug registers, it gets switched to QEMU controlled (that code is 
already there).



Alex

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] [PATCH/RFC 4/5] s390x/kvm: test whether a cpu is STOPPED when checking has_work

2014-07-29 Thread David Hildenbrand
 Il 28/07/2014 17:03, David Hildenbrand ha scritto:
  Well the difference is, that a STOPPED vcpu can be woken up by non-interrupt
  like things (SIGP START) AND a special interrupt (SIGP RESTART - which is 
  like
  a SIPI++ as it performs a psw exchange - NMI). So we basically have two
  paths that can lead to a state change.  All interrupt bits may be in any
  combination (SIGP RESTART interrupts can't be masked out, nor can SIGP 
  START be
  denied).
  
  The other thing may be that on s390, each vcpu (including itself) can put
  another vcpu into the STOPPED state - I assume that this is different for 
  x86 
  INIT_RECEIVED. For this reason we have to watch out for bad race conditions
  (e.g. multiple vcpus working on another vcpu)...
 
 You can do that in x86 by sending an INIT inter-processor interrupt.  A
 SIPI is ignored if the CPU is not in INIT_RECEIVED state.
 
 Commit 66450a21f99636af4fafac2afd33f1a40631bc3a introduced the current
 implementation.
 
 - an INIT cancels a previous SIPI;
 
 - if both INIT and SIPI are sent, on real hardware you need to have a
 few hundred microseconds between them, but KVM will reliably process
 INIT before SIPI.
 
 See commit 299018f44ac553dce3caf84df1d14c4764faa279 for an example of
 the races that can happen.
 
 Note that x86 has KVM_MP_STATE_SIPI_RECEIVED state but it is obsolete,
 we go straight from KVM_MP_STATE_INIT_RECEIVED to KVM_MP_STATE_RUNNABLE.
 

Thanks for the explanation Paolo!

Looks like from an interrupt point of view, the states have a lot in common.
The major thing that differs on s390 is probably the way these interrupts are
generated and what else they influence (all the power of the SIGP facility :)
+ special check-stop state that can't be left by an interrupt, only by SIGP
  CPU resets).

David

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 6/6] KVM: PPC: BOOKE: Emulate debug registers and exception

2014-07-29 Thread Scott Wood
On Tue, 2014-07-29 at 16:06 +0200, Alexander Graf wrote:
 On 29.07.14 00:33, Scott Wood wrote:
  On Mon, 2014-07-28 at 16:04 +0200, Alexander Graf wrote:
  On 11.07.14 10:39, Bharat Bhushan wrote:
  This patch emulates debug registers and debug exception
  to support guest using debug resource. This enables running
  gdb/kgdb etc in guest.
 
  On BOOKE architecture we cannot share debug resources between QEMU and
  guest because:
When QEMU is using debug resources then debug exception must
be always enabled. To achieve this we set MSR_DE and also set
MSRP_DEP so guest cannot change MSR_DE.
 
When emulating debug resource for guest we want guest
to control MSR_DE (enable/disable debug interrupt on need).
 
So above mentioned two configuration cannot be supported
at the same time. So the result is that we cannot share
debug resources between QEMU and Guest on BOOKE architecture.
 
  In the current design QEMU gets priority over guest, this means that if
  QEMU is using debug resources then guest cannot use them and if guest is
  using debug resource then QEMU can overwrite them.
 
  Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
  ---
  Hi Alex,
 
  I thought of having some print in register emulation if QEMU
  is using debug resource, Also when QEMU overwrites guest written
  values but that looks excessive. If I uses some variable which
  get set when guest starts using debug registers and check in
  debug set ioctl then that look ugly. Looking for suggestions
  Whatever you do, have QEMU do the print, not the kernel.
  How would that be accomplished?  How would the kernel know to exit to
  QEMU, and how would the exit reason be conveyed?
 
 QEMU is the one forcefully enabling debug and overwriting guest debug 
 registers, so it also knows when it did overwrite valid ones.

QEMU knows when it overwrites the guest values, but it doesn't know if,
after enabling host debug, the guest tries to write to the debug
registers and it gets nopped.  If we keep the EDM setting, then we can
at least say the situation is no worse than with a JTAG.

-Scott


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 6/6] KVM: PPC: BOOKE: Emulate debug registers and exception

2014-07-29 Thread Alexander Graf


 Am 29.07.2014 um 19:50 schrieb Scott Wood scottw...@freescale.com:
 
 On Tue, 2014-07-29 at 16:06 +0200, Alexander Graf wrote:
 On 29.07.14 00:33, Scott Wood wrote:
 On Mon, 2014-07-28 at 16:04 +0200, Alexander Graf wrote:
 On 11.07.14 10:39, Bharat Bhushan wrote:
 This patch emulates debug registers and debug exception
 to support guest using debug resource. This enables running
 gdb/kgdb etc in guest.
 
 On BOOKE architecture we cannot share debug resources between QEMU and
 guest because:
  When QEMU is using debug resources then debug exception must
  be always enabled. To achieve this we set MSR_DE and also set
  MSRP_DEP so guest cannot change MSR_DE.
 
  When emulating debug resource for guest we want guest
  to control MSR_DE (enable/disable debug interrupt on need).
 
  So above mentioned two configuration cannot be supported
  at the same time. So the result is that we cannot share
  debug resources between QEMU and Guest on BOOKE architecture.
 
 In the current design QEMU gets priority over guest, this means that if
 QEMU is using debug resources then guest cannot use them and if guest is
 using debug resource then QEMU can overwrite them.
 
 Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
 ---
 Hi Alex,
 
 I thought of having some print in register emulation if QEMU
 is using debug resource, Also when QEMU overwrites guest written
 values but that looks excessive. If I uses some variable which
 get set when guest starts using debug registers and check in
 debug set ioctl then that look ugly. Looking for suggestions
 Whatever you do, have QEMU do the print, not the kernel.
 How would that be accomplished?  How would the kernel know to exit to
 QEMU, and how would the exit reason be conveyed?
 
 QEMU is the one forcefully enabling debug and overwriting guest debug 
 registers, so it also knows when it did overwrite valid ones.
 
 QEMU knows when it overwrites the guest values, but it doesn't know if,
 after enabling host debug, the guest tries to write to the debug
 registers and it gets nopped.  If we keep the EDM setting, then we can
 at least say the situation is no worse than with a JTAG.

Yeah, I think that's perfectly reasonable. I don't think it'll be likely that a 
user starts debugging with qemu and then expects guest debugging to work.

The other way around is more likely and would warrant a warning to the user - 
if we care.

Alex

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Hello!!!

2014-07-29 Thread Katie Chow
Hello,

I am Ms. Claudia Hiu, I work with Fubon Bank (Hong Kong), I have a business 
transaction in your name which prompted me to contact you. Please i would 
appreciate your urgent respond immediately to serve you with more details.

Greetings
Ms. Claudia Hiu

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] KVM: vmx: remove duplicate vmx_mpx_supported()

2014-07-29 Thread Chris J Arges
Remove a function which was added by both 93c4adc7afe and 36be0b9deb2.

Signed-off-by: Chris J Arges chris.j.ar...@canonical.com
---
 arch/x86/kvm/vmx.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 801332e..c4ea039 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -740,7 +740,6 @@ static u32 vmx_segment_access_rights(struct kvm_segment 
*var);
 static void vmx_sync_pir_to_irr_dummy(struct kvm_vcpu *vcpu);
 static void copy_vmcs12_to_shadow(struct vcpu_vmx *vmx);
 static void copy_shadow_to_vmcs12(struct vcpu_vmx *vmx);
-static bool vmx_mpx_supported(void);
 
 static DEFINE_PER_CPU(struct vmcs *, vmxarea);
 static DEFINE_PER_CPU(struct vmcs *, current_vmcs);
-- 
1.9.1

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


dirty page tracking in kvm/qemu -- page faults inevitable?

2014-07-29 Thread Chris Friesen

Hi,

I've got an issue where we're hitting major performance penalties while 
doing live migration, and it seems like it might be due to page faults 
triggering hypervisor exits, and then we get stuck waiting for the 
iothread lock which is held by the qemu dirty page scanning code.


Accordingly, I'm trying to figure out the actual mechanism whereby dirty 
pages are tracked in qemu/kvm.  I've got an Ivy Bridge CPU, a 3.4 kernel 
on the host, and qemu 1.4.


Looking at the qemu code, it seems to be calling down into kvm to get 
the dirty page information.


Looking at kvm, most of what I read seems to be doing the usual mark it 
read-only and then when we take a page fault mark it as dirty trick.


However, I read something about Intel EPT having hardware support for 
tracking dirty pages.  It seems like this might avoid the need for a 
page fault, but might only be available on Haswell or later CPUs--is 
that correct?  Is it supported in kvm?  If so, when was support added?


Thanks,
Chris

P.S.  Please CC me on reply, I'm not subscribed to the list.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5 2/5] random: Add and use arch_get_rng_seed

2014-07-29 Thread Andy Lutomirski
On Wed, Jul 23, 2014 at 9:57 PM, Andy Lutomirski l...@amacapital.net wrote:
 Currently, init_std_data contains its own logic for using arch
 random sources.  This replaces that logic with a generic function
 arch_get_rng_seed that allows arch code to supply its own logic.
 The default implementation tries arch_get_random_seed_long and
 arch_get_random_long individually.

 The only functional change here is that random_get_entropy() is used
 unconditionally instead of being used only when the arch sources
 fail.  This may add a tiny amount of security.

tytso, are you okay with this approach?  I'd be happy to rework this
if you prefer some other way of doing it.

--Andy


 Signed-off-by: Andy Lutomirski l...@amacapital.net
 ---
  drivers/char/random.c  | 14 +++---
  include/linux/random.h | 40 
  2 files changed, 51 insertions(+), 3 deletions(-)

 diff --git a/drivers/char/random.c b/drivers/char/random.c
 index 0a7ac0a..be7a94e 100644
 --- a/drivers/char/random.c
 +++ b/drivers/char/random.c
 @@ -1236,6 +1236,10 @@ void get_random_bytes_arch(void *buf, int nbytes)
  }
  EXPORT_SYMBOL(get_random_bytes_arch);

 +static void seed_entropy_store(void *ctx, u32 data)
 +{
 +   mix_pool_bytes((struct entropy_store *)ctx, data, sizeof(data), 
 NULL);
 +}

  /*
   * init_std_data - initialize pool with system data
 @@ -1251,15 +1255,19 @@ static void init_std_data(struct entropy_store *r)
 int i;
 ktime_t now = ktime_get_real();
 unsigned long rv;
 +   char log_prefix[128];

 r-last_pulled = jiffies;
 mix_pool_bytes(r, now, sizeof(now), NULL);
 for (i = r-poolinfo-poolbytes; i  0; i -= sizeof(rv)) {
 -   if (!arch_get_random_seed_long(rv) 
 -   !arch_get_random_long(rv))
 -   rv = random_get_entropy();
 +   rv = random_get_entropy();
 mix_pool_bytes(r, rv, sizeof(rv), NULL);
 }
 +
 +   sprintf(log_prefix, random: seeded %s pool, r-name);
 +   arch_get_rng_seed(r, seed_entropy_store, 8 * r-poolinfo-poolbytes,
 + log_prefix);
 +
 mix_pool_bytes(r, utsname(), sizeof(*(utsname())), NULL);
  }

 diff --git a/include/linux/random.h b/include/linux/random.h
 index 57fbbff..81a6145 100644
 --- a/include/linux/random.h
 +++ b/include/linux/random.h
 @@ -106,6 +106,46 @@ static inline int arch_has_random_seed(void)
  }
  #endif

 +#ifndef __HAVE_ARCH_GET_RNG_SEED
 +
 +/**
 + * arch_get_rng_seed() - get architectural rng seed data
 + * @ctx: context for the seed function
 + * @seed: function to call for each u32 obtained
 + * @bits_per_source: number of bits from each source to try to use
 + * @log_prefix: beginning of log output (may be NULL)
 + *
 + * Synchronously load some architectural entropy or other best-effort
 + * random seed data.  An arch-specific implementation should be no worse
 + * than this generic implementation.  If the arch code does something
 + * interesting, it may log something of the form log_prefix with
 + * 8 bits of stuff.
 + *
 + * No arch-specific implementation should be any worse than the generic
 + * implementation.
 + */
 +static inline void arch_get_rng_seed(void *ctx,
 +void (*seed)(void *ctx, u32 data),
 +int bits_per_source,
 +const char *log_prefix)
 +{
 +   int i;
 +
 +   for (i = 0; i  bits_per_source; i += 8 * sizeof(long)) {
 +   unsigned long rv;
 +
 +   if (arch_get_random_seed_long(rv) ||
 +   arch_get_random_long(rv)) {
 +   seed(ctx, (u32)rv);
 +#if BITS_PER_LONG  32
 +   seed(ctx, (u32)(rv  32));
 +#endif
 +   }
 +   }
 +}
 +
 +#endif /* __HAVE_ARCH_GET_RNG_SEED */
 +
  /* Pseudo random number generator from numerical recipes. */
  static inline u32 next_pseudo_random32(u32 seed)
  {
 --
 1.9.3




-- 
Andy Lutomirski
AMA Capital Management, LLC
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH 1/6] KVM: PPC: BOOKE: No need to set DBCR0_EDM in guest visible register

2014-07-29 Thread bharat.bhus...@freescale.com


 -Original Message-
 From: Wood Scott-B07421
 Sent: Tuesday, July 29, 2014 3:22 AM
 To: Bhushan Bharat-R65777
 Cc: ag...@suse.de; kvm-...@vger.kernel.org; kvm@vger.kernel.org; Yoder Stuart-
 B08248
 Subject: Re: [PATCH 1/6] KVM: PPC: BOOKE: No need to set DBCR0_EDM in guest
 visible register
 
 On Fri, 2014-07-11 at 14:08 +0530, Bharat Bhushan wrote:
  This is not used and  even I do not remember why this was added in
  first place.
 
  Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
  ---
   arch/powerpc/kvm/booke.c | 2 --
   1 file changed, 2 deletions(-)
 
  diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c index
  ab62109..a5ee42c 100644
  --- a/arch/powerpc/kvm/booke.c
  +++ b/arch/powerpc/kvm/booke.c
  @@ -1804,8 +1804,6 @@ int kvm_arch_vcpu_ioctl_set_guest_debug(struct 
  kvm_vcpu
 *vcpu,
  kvm_guest_protect_msr(vcpu, MSR_DE, true);
  vcpu-guest_debug = dbg-control;
  vcpu-arch.shadow_dbg_reg.dbcr0 = 0;
  -   /* Set DBCR0_EDM in guest visible DBCR0 register. */
  -   vcpu-arch.dbg_reg.dbcr0 = DBCR0_EDM;
 
  if (vcpu-guest_debug  KVM_GUESTDBG_SINGLESTEP)
  vcpu-arch.shadow_dbg_reg.dbcr0 |= DBCR0_IDM | DBCR0_IC;
 
 This was intended to let the guest know that the host owns the debug 
 resources,
 by analogy to what a JTAG debugger would do.
 
 The Power ISA has this Virtualized Implementation Note:
 
 It is the responsibility of the hypervisor to ensure that
 DBCR0[EDM] is consistent with usage of DEP.

Ok, That means that if MSRP_DEP is set then set DBCR0_EDM  and if MSRP_DEP is 
clear then clear DBCR0_EDM, right?
We need to implement above mentioned this.

Thanks
-Bharat

 
 -Scott
 

N�r��yb�X��ǧv�^�)޺{.n�+h����ܨ}���Ơz�j:+v���zZ+��+zf���h���~i���z��w���?��)ߢf

RE: [PATCH 2/6] KVM: PPC: BOOKE: Force MSR_DE in rfci if guest is under debug

2014-07-29 Thread bharat.bhus...@freescale.com


 -Original Message-
 From: Wood Scott-B07421
 Sent: Tuesday, July 29, 2014 3:25 AM
 To: Bhushan Bharat-R65777
 Cc: ag...@suse.de; kvm-...@vger.kernel.org; kvm@vger.kernel.org; Yoder Stuart-
 B08248
 Subject: Re: [PATCH 2/6] KVM: PPC: BOOKE: Force MSR_DE in rfci if guest is 
 under
 debug
 
 On Fri, 2014-07-11 at 14:08 +0530, Bharat Bhushan wrote:
  When userspace (QEMU) is using the debug resource to debug guest then
  we want MSR_DE to be always set. This patch adds missing MSR_DE
  setting in rfci instruction.
 
  Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
  ---
   arch/powerpc/kvm/booke_emulate.c | 6 +-
   1 file changed, 5 insertions(+), 1 deletion(-)
 
  diff --git a/arch/powerpc/kvm/booke_emulate.c
  b/arch/powerpc/kvm/booke_emulate.c
  index 27a4b28..80c51a2 100644
  --- a/arch/powerpc/kvm/booke_emulate.c
  +++ b/arch/powerpc/kvm/booke_emulate.c
  @@ -40,7 +40,11 @@ static void kvmppc_emul_rfi(struct kvm_vcpu *vcpu)
  static void kvmppc_emul_rfci(struct kvm_vcpu *vcpu)  {
  vcpu-arch.pc = vcpu-arch.csrr0;
  -   kvmppc_set_msr(vcpu, vcpu-arch.csrr1);
  +   /* Force MSR_DE when guest does not own debug facilities */
  +   if (vcpu-guest_debug)
  +   kvmppc_set_msr(vcpu, vcpu-arch.csrr1 | MSR_DE);
  +   else
  +   kvmppc_set_msr(vcpu, vcpu-arch.csrr1);
   }
 
   int kvmppc_booke_emulate_op(struct kvm_run *run, struct kvm_vcpu
  *vcpu,
 
 It looks like this is already handled by kvmppc_vcpu_sync_debug(), which is
 called by kvmppc_set_msr().

Yes, you are right. This patch is not needed.

Thanks
-Bharat

 
 Plus, it should only be done for HV mode.
 
 -Scott
 



RE: [PATCH 5/6] KVM: PPC: BOOKE: Allow guest to change MSR_DE

2014-07-29 Thread bharat.bhus...@freescale.com


 -Original Message-
 From: Alexander Graf [mailto:ag...@suse.de]
 Sent: Tuesday, July 29, 2014 7:35 PM
 To: Wood Scott-B07421; Bhushan Bharat-R65777
 Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; Yoder Stuart-B08248
 Subject: Re: [PATCH 5/6] KVM: PPC: BOOKE: Allow guest to change MSR_DE
 
 
 On 29.07.14 00:01, Scott Wood wrote:
  On Fri, 2014-07-11 at 14:09 +0530, Bharat Bhushan wrote:
  When userspace is debugging guest then MSR_DE is always set and
  MSRP_DEP is set so that guest cannot change MSR_DE.
  Guest debug resources are not yet emulated, So there seems no reason
  we should stop guest controlling MSR_DE.
  Also a followup patch will enable debug emulation and that requires
  guest to control MSR_DE.
  Why does it matter whether we emulate debug resources?  We still don't
  want the guest to be able to clear MSR[DE] and thus break host debug.
 
 The patch description is misleading. This patch changes the default of DEP to
 guest controlled when it boots up. Once QEMU wants control over the debug
 registers, it gets switched to QEMU controlled (that code is already there).

Yes, now default MSR_DE is controlled by guest and when QEMU wants to use debug 
resources then MSR_DEP is set, so guest cannot change MSR_DE.

Thanks
-Bharat 

 
 
 Alex



RE: [PATCH 6/6] KVM: PPC: BOOKE: Emulate debug registers and exception

2014-07-29 Thread bharat.bhus...@freescale.com


 -Original Message-
 From: Wood Scott-B07421
 Sent: Tuesday, July 29, 2014 11:20 PM
 To: Alexander Graf
 Cc: Bhushan Bharat-R65777; kvm-...@vger.kernel.org; kvm@vger.kernel.org; Yoder
 Stuart-B08248
 Subject: Re: [PATCH 6/6] KVM: PPC: BOOKE: Emulate debug registers and 
 exception
 
 On Tue, 2014-07-29 at 16:06 +0200, Alexander Graf wrote:
  On 29.07.14 00:33, Scott Wood wrote:
   On Mon, 2014-07-28 at 16:04 +0200, Alexander Graf wrote:
   On 11.07.14 10:39, Bharat Bhushan wrote:
   This patch emulates debug registers and debug exception to support
   guest using debug resource. This enables running gdb/kgdb etc in
   guest.
  
   On BOOKE architecture we cannot share debug resources between QEMU
   and guest because:
 When QEMU is using debug resources then debug exception must
 be always enabled. To achieve this we set MSR_DE and also set
 MSRP_DEP so guest cannot change MSR_DE.
  
 When emulating debug resource for guest we want guest
 to control MSR_DE (enable/disable debug interrupt on need).
  
 So above mentioned two configuration cannot be supported
 at the same time. So the result is that we cannot share
 debug resources between QEMU and Guest on BOOKE architecture.
  
   In the current design QEMU gets priority over guest, this means
   that if QEMU is using debug resources then guest cannot use them
   and if guest is using debug resource then QEMU can overwrite them.
  
   Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
   ---
   Hi Alex,
  
   I thought of having some print in register emulation if QEMU is
   using debug resource, Also when QEMU overwrites guest written
   values but that looks excessive. If I uses some variable which get
   set when guest starts using debug registers and check in debug set
   ioctl then that look ugly. Looking for suggestions
   Whatever you do, have QEMU do the print, not the kernel.
   How would that be accomplished?  How would the kernel know to exit
   to QEMU, and how would the exit reason be conveyed?
 
  QEMU is the one forcefully enabling debug and overwriting guest debug
  registers, so it also knows when it did overwrite valid ones.
 
 QEMU knows when it overwrites the guest values, but it doesn't know if, after
 enabling host debug, the guest tries to write to the debug registers and it 
 gets
 nopped.

Do we want that QEMU first get DBCR0 to know whether it is overwriting whenever 
set/clear debug register?

  If we keep the EDM setting, then we can at least say the situation is
 no worse than with a JTAG.

Yes

Thanks
-Bharat

 
 -Scott
 

N�r��yb�X��ǧv�^�)޺{.n�+h����ܨ}���Ơz�j:+v���zZ+��+zf���h���~i���z��w���?��)ߢf

Re: [PATCH] powerpc: kvm: make the setup of hpte under the protection of KVMPPC_RMAP_LOCK_BIT

2014-07-29 Thread Liu ping fan
On Tue, Jul 29, 2014 at 2:57 PM, Benjamin Herrenschmidt
b...@kernel.crashing.org wrote:
 On Mon, 2014-07-28 at 15:58 +0800, Liu ping fan wrote:
 Hope I am right.  Take the following seq as an example

 if (hptep[0]  HPTE_V_VALID) {
 /* HPTE was previously valid, so we need to invalidate it */
 unlock_rmap(rmap);
 hptep[0] |= HPTE_V_ABSENT;
 kvmppc_invalidate_hpte(kvm, hptep, index);
 /* don't lose previous R and C bits */
 r |= hptep[1]  (HPTE_R_R | HPTE_R_C);
 } else {
 kvmppc_add_revmap_chain(kvm, rev, rmap, index, 0);
 }
   - if we try_to_unmap on
 pfn at here, then @r contains a invalid pfn
 hptep[1] = r;
 eieio();
 hptep[0] = hpte[0];
 asm volatile(ptesync : : : memory);

 If that was the case we would have the same race in kvmppc_do_h_enter().

 I think the fact that the HPTE is locked will prevent the race, ie,
 HPTE_V_HVLOCK is set until hptep[0] is written to.

 If I look at at the unmap case, my understanding is that it uses
 kvm_unmap_rmapp() which will also lock the HPTE (try_lock_hpte)
 and so shouldn't have a race vs the above code.

Yes, you are right :)

Thx,
Fan


 Or do you see a race I don't ?

 Cheers,
 Ben.

 Thx.
 Fan

 On Mon, Jul 28, 2014 at 2:42 PM, Benjamin Herrenschmidt
 b...@kernel.crashing.org wrote:
  On Mon, 2014-07-28 at 14:09 +0800, Liu Ping Fan wrote:
  In current code, the setup of hpte is under the risk of race with
  mmu_notifier_invalidate, i.e we may setup a hpte with a invalid pfn.
  Resolve this issue by sync the two actions by KVMPPC_RMAP_LOCK_BIT.
 
  Please describe the race you think you see. I'm quite sure both Paul and
  I went over that code and somewhat convinced ourselves that it was ok
  but it's possible that we were both wrong :-)
 
  Cheers,
  Ben.
 
  Signed-off-by: Liu Ping Fan pingf...@linux.vnet.ibm.com
  ---
   arch/powerpc/kvm/book3s_64_mmu_hv.c | 15 ++-
   1 file changed, 10 insertions(+), 5 deletions(-)
 
  diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c 
  b/arch/powerpc/kvm/book3s_64_mmu_hv.c
  index 8056107..e6dcff4 100644
  --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
  +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
  @@ -754,19 +754,24 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run 
  *run, struct kvm_vcpu *vcpu,
 
if (hptep[0]  HPTE_V_VALID) {
/* HPTE was previously valid, so we need to invalidate it */
  - unlock_rmap(rmap);
hptep[0] |= HPTE_V_ABSENT;
kvmppc_invalidate_hpte(kvm, hptep, index);
/* don't lose previous R and C bits */
r |= hptep[1]  (HPTE_R_R | HPTE_R_C);
  +
  + hptep[1] = r;
  + eieio();
  + hptep[0] = hpte[0];
  + asm volatile(ptesync : : : memory);
  + unlock_rmap(rmap);
} else {
  + hptep[1] = r;
  + eieio();
  + hptep[0] = hpte[0];
  + asm volatile(ptesync : : : memory);
kvmppc_add_revmap_chain(kvm, rev, rmap, index, 0);
}
 
  - hptep[1] = r;
  - eieio();
  - hptep[0] = hpte[0];
  - asm volatile(ptesync : : : memory);
preempt_enable();
if (page  hpte_is_writable(r))
SetPageDirty(page);
 
 


--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] powerpc: kvm: make the setup of hpte under the protection of KVMPPC_RMAP_LOCK_BIT

2014-07-29 Thread Benjamin Herrenschmidt
On Mon, 2014-07-28 at 15:58 +0800, Liu ping fan wrote:
 Hope I am right.  Take the following seq as an example
 
 if (hptep[0]  HPTE_V_VALID) {
 /* HPTE was previously valid, so we need to invalidate it */
 unlock_rmap(rmap);
 hptep[0] |= HPTE_V_ABSENT;
 kvmppc_invalidate_hpte(kvm, hptep, index);
 /* don't lose previous R and C bits */
 r |= hptep[1]  (HPTE_R_R | HPTE_R_C);
 } else {
 kvmppc_add_revmap_chain(kvm, rev, rmap, index, 0);
 }
   - if we try_to_unmap on
 pfn at here, then @r contains a invalid pfn
 hptep[1] = r;
 eieio();
 hptep[0] = hpte[0];
 asm volatile(ptesync : : : memory);

If that was the case we would have the same race in kvmppc_do_h_enter().

I think the fact that the HPTE is locked will prevent the race, ie,
HPTE_V_HVLOCK is set until hptep[0] is written to.

If I look at at the unmap case, my understanding is that it uses  
kvm_unmap_rmapp() which will also lock the HPTE (try_lock_hpte)
and so shouldn't have a race vs the above code.

Or do you see a race I don't ?

Cheers,
Ben.

 Thx.
 Fan
 
 On Mon, Jul 28, 2014 at 2:42 PM, Benjamin Herrenschmidt
 b...@kernel.crashing.org wrote:
  On Mon, 2014-07-28 at 14:09 +0800, Liu Ping Fan wrote:
  In current code, the setup of hpte is under the risk of race with
  mmu_notifier_invalidate, i.e we may setup a hpte with a invalid pfn.
  Resolve this issue by sync the two actions by KVMPPC_RMAP_LOCK_BIT.
 
  Please describe the race you think you see. I'm quite sure both Paul and
  I went over that code and somewhat convinced ourselves that it was ok
  but it's possible that we were both wrong :-)
 
  Cheers,
  Ben.
 
  Signed-off-by: Liu Ping Fan pingf...@linux.vnet.ibm.com
  ---
   arch/powerpc/kvm/book3s_64_mmu_hv.c | 15 ++-
   1 file changed, 10 insertions(+), 5 deletions(-)
 
  diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c 
  b/arch/powerpc/kvm/book3s_64_mmu_hv.c
  index 8056107..e6dcff4 100644
  --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
  +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
  @@ -754,19 +754,24 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, 
  struct kvm_vcpu *vcpu,
 
if (hptep[0]  HPTE_V_VALID) {
/* HPTE was previously valid, so we need to invalidate it */
  - unlock_rmap(rmap);
hptep[0] |= HPTE_V_ABSENT;
kvmppc_invalidate_hpte(kvm, hptep, index);
/* don't lose previous R and C bits */
r |= hptep[1]  (HPTE_R_R | HPTE_R_C);
  +
  + hptep[1] = r;
  + eieio();
  + hptep[0] = hpte[0];
  + asm volatile(ptesync : : : memory);
  + unlock_rmap(rmap);
} else {
  + hptep[1] = r;
  + eieio();
  + hptep[0] = hpte[0];
  + asm volatile(ptesync : : : memory);
kvmppc_add_revmap_chain(kvm, rev, rmap, index, 0);
}
 
  - hptep[1] = r;
  - eieio();
  - hptep[0] = hpte[0];
  - asm volatile(ptesync : : : memory);
preempt_enable();
if (page  hpte_is_writable(r))
SetPageDirty(page);
 
 


--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/6] KVM: PPC: BOOKE: Allow guest to change MSR_DE

2014-07-29 Thread Alexander Graf


On 29.07.14 00:01, Scott Wood wrote:

On Fri, 2014-07-11 at 14:09 +0530, Bharat Bhushan wrote:

When userspace is debugging guest then MSR_DE is always set and
MSRP_DEP is set so that guest cannot change MSR_DE.
Guest debug resources are not yet emulated, So there seems no reason
we should stop guest controlling MSR_DE.
Also a followup patch will enable debug emulation and that requires
guest to control MSR_DE.

Why does it matter whether we emulate debug resources?  We still don't
want the guest to be able to clear MSR[DE] and thus break host debug.


The patch description is misleading. This patch changes the default of 
DEP to guest controlled when it boots up. Once QEMU wants control over 
the debug registers, it gets switched to QEMU controlled (that code is 
already there).



Alex

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 6/6] KVM: PPC: BOOKE: Emulate debug registers and exception

2014-07-29 Thread Alexander Graf


On 29.07.14 00:33, Scott Wood wrote:

On Mon, 2014-07-28 at 16:04 +0200, Alexander Graf wrote:

On 11.07.14 10:39, Bharat Bhushan wrote:

This patch emulates debug registers and debug exception
to support guest using debug resource. This enables running
gdb/kgdb etc in guest.

On BOOKE architecture we cannot share debug resources between QEMU and
guest because:
  When QEMU is using debug resources then debug exception must
  be always enabled. To achieve this we set MSR_DE and also set
  MSRP_DEP so guest cannot change MSR_DE.

  When emulating debug resource for guest we want guest
  to control MSR_DE (enable/disable debug interrupt on need).

  So above mentioned two configuration cannot be supported
  at the same time. So the result is that we cannot share
  debug resources between QEMU and Guest on BOOKE architecture.

In the current design QEMU gets priority over guest, this means that if
QEMU is using debug resources then guest cannot use them and if guest is
using debug resource then QEMU can overwrite them.

Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
---
Hi Alex,

I thought of having some print in register emulation if QEMU
is using debug resource, Also when QEMU overwrites guest written
values but that looks excessive. If I uses some variable which
get set when guest starts using debug registers and check in
debug set ioctl then that look ugly. Looking for suggestions

Whatever you do, have QEMU do the print, not the kernel.

How would that be accomplished?  How would the kernel know to exit to
QEMU, and how would the exit reason be conveyed?


QEMU is the one forcefully enabling debug and overwriting guest debug 
registers, so it also knows when it did overwrite valid ones.



Alex

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 6/6] KVM: PPC: BOOKE: Emulate debug registers and exception

2014-07-29 Thread Scott Wood
On Tue, 2014-07-29 at 16:06 +0200, Alexander Graf wrote:
 On 29.07.14 00:33, Scott Wood wrote:
  On Mon, 2014-07-28 at 16:04 +0200, Alexander Graf wrote:
  On 11.07.14 10:39, Bharat Bhushan wrote:
  This patch emulates debug registers and debug exception
  to support guest using debug resource. This enables running
  gdb/kgdb etc in guest.
 
  On BOOKE architecture we cannot share debug resources between QEMU and
  guest because:
When QEMU is using debug resources then debug exception must
be always enabled. To achieve this we set MSR_DE and also set
MSRP_DEP so guest cannot change MSR_DE.
 
When emulating debug resource for guest we want guest
to control MSR_DE (enable/disable debug interrupt on need).
 
So above mentioned two configuration cannot be supported
at the same time. So the result is that we cannot share
debug resources between QEMU and Guest on BOOKE architecture.
 
  In the current design QEMU gets priority over guest, this means that if
  QEMU is using debug resources then guest cannot use them and if guest is
  using debug resource then QEMU can overwrite them.
 
  Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
  ---
  Hi Alex,
 
  I thought of having some print in register emulation if QEMU
  is using debug resource, Also when QEMU overwrites guest written
  values but that looks excessive. If I uses some variable which
  get set when guest starts using debug registers and check in
  debug set ioctl then that look ugly. Looking for suggestions
  Whatever you do, have QEMU do the print, not the kernel.
  How would that be accomplished?  How would the kernel know to exit to
  QEMU, and how would the exit reason be conveyed?
 
 QEMU is the one forcefully enabling debug and overwriting guest debug 
 registers, so it also knows when it did overwrite valid ones.

QEMU knows when it overwrites the guest values, but it doesn't know if,
after enabling host debug, the guest tries to write to the debug
registers and it gets nopped.  If we keep the EDM setting, then we can
at least say the situation is no worse than with a JTAG.

-Scott


--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 6/6] KVM: PPC: BOOKE: Emulate debug registers and exception

2014-07-29 Thread Alexander Graf


 Am 29.07.2014 um 19:50 schrieb Scott Wood scottw...@freescale.com:
 
 On Tue, 2014-07-29 at 16:06 +0200, Alexander Graf wrote:
 On 29.07.14 00:33, Scott Wood wrote:
 On Mon, 2014-07-28 at 16:04 +0200, Alexander Graf wrote:
 On 11.07.14 10:39, Bharat Bhushan wrote:
 This patch emulates debug registers and debug exception
 to support guest using debug resource. This enables running
 gdb/kgdb etc in guest.
 
 On BOOKE architecture we cannot share debug resources between QEMU and
 guest because:
  When QEMU is using debug resources then debug exception must
  be always enabled. To achieve this we set MSR_DE and also set
  MSRP_DEP so guest cannot change MSR_DE.
 
  When emulating debug resource for guest we want guest
  to control MSR_DE (enable/disable debug interrupt on need).
 
  So above mentioned two configuration cannot be supported
  at the same time. So the result is that we cannot share
  debug resources between QEMU and Guest on BOOKE architecture.
 
 In the current design QEMU gets priority over guest, this means that if
 QEMU is using debug resources then guest cannot use them and if guest is
 using debug resource then QEMU can overwrite them.
 
 Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
 ---
 Hi Alex,
 
 I thought of having some print in register emulation if QEMU
 is using debug resource, Also when QEMU overwrites guest written
 values but that looks excessive. If I uses some variable which
 get set when guest starts using debug registers and check in
 debug set ioctl then that look ugly. Looking for suggestions
 Whatever you do, have QEMU do the print, not the kernel.
 How would that be accomplished?  How would the kernel know to exit to
 QEMU, and how would the exit reason be conveyed?
 
 QEMU is the one forcefully enabling debug and overwriting guest debug 
 registers, so it also knows when it did overwrite valid ones.
 
 QEMU knows when it overwrites the guest values, but it doesn't know if,
 after enabling host debug, the guest tries to write to the debug
 registers and it gets nopped.  If we keep the EDM setting, then we can
 at least say the situation is no worse than with a JTAG.

Yeah, I think that's perfectly reasonable. I don't think it'll be likely that a 
user starts debugging with qemu and then expects guest debugging to work.

The other way around is more likely and would warrant a warning to the user - 
if we care.

Alex

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH 1/6] KVM: PPC: BOOKE: No need to set DBCR0_EDM in guest visible register

2014-07-29 Thread bharat.bhus...@freescale.com


 -Original Message-
 From: Wood Scott-B07421
 Sent: Tuesday, July 29, 2014 3:22 AM
 To: Bhushan Bharat-R65777
 Cc: ag...@suse.de; kvm-ppc@vger.kernel.org; k...@vger.kernel.org; Yoder 
 Stuart-
 B08248
 Subject: Re: [PATCH 1/6] KVM: PPC: BOOKE: No need to set DBCR0_EDM in guest
 visible register
 
 On Fri, 2014-07-11 at 14:08 +0530, Bharat Bhushan wrote:
  This is not used and  even I do not remember why this was added in
  first place.
 
  Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
  ---
   arch/powerpc/kvm/booke.c | 2 --
   1 file changed, 2 deletions(-)
 
  diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c index
  ab62109..a5ee42c 100644
  --- a/arch/powerpc/kvm/booke.c
  +++ b/arch/powerpc/kvm/booke.c
  @@ -1804,8 +1804,6 @@ int kvm_arch_vcpu_ioctl_set_guest_debug(struct 
  kvm_vcpu
 *vcpu,
  kvm_guest_protect_msr(vcpu, MSR_DE, true);
  vcpu-guest_debug = dbg-control;
  vcpu-arch.shadow_dbg_reg.dbcr0 = 0;
  -   /* Set DBCR0_EDM in guest visible DBCR0 register. */
  -   vcpu-arch.dbg_reg.dbcr0 = DBCR0_EDM;
 
  if (vcpu-guest_debug  KVM_GUESTDBG_SINGLESTEP)
  vcpu-arch.shadow_dbg_reg.dbcr0 |= DBCR0_IDM | DBCR0_IC;
 
 This was intended to let the guest know that the host owns the debug 
 resources,
 by analogy to what a JTAG debugger would do.
 
 The Power ISA has this Virtualized Implementation Note:
 
 It is the responsibility of the hypervisor to ensure that
 DBCR0[EDM] is consistent with usage of DEP.

Ok, That means that if MSRP_DEP is set then set DBCR0_EDM  and if MSRP_DEP is 
clear then clear DBCR0_EDM, right?
We need to implement above mentioned this.

Thanks
-Bharat

 
 -Scott
 

N�r��yb�X��ǧv�^�)޺{.n�+jir)w*jg����ݢj/���z�ޖ��2�ޙ�)ߡ�a�����G���h��j:+v���w��٥

RE: [PATCH 2/6] KVM: PPC: BOOKE: Force MSR_DE in rfci if guest is under debug

2014-07-29 Thread bharat.bhus...@freescale.com


 -Original Message-
 From: Wood Scott-B07421
 Sent: Tuesday, July 29, 2014 3:25 AM
 To: Bhushan Bharat-R65777
 Cc: ag...@suse.de; kvm-ppc@vger.kernel.org; k...@vger.kernel.org; Yoder 
 Stuart-
 B08248
 Subject: Re: [PATCH 2/6] KVM: PPC: BOOKE: Force MSR_DE in rfci if guest is 
 under
 debug
 
 On Fri, 2014-07-11 at 14:08 +0530, Bharat Bhushan wrote:
  When userspace (QEMU) is using the debug resource to debug guest then
  we want MSR_DE to be always set. This patch adds missing MSR_DE
  setting in rfci instruction.
 
  Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
  ---
   arch/powerpc/kvm/booke_emulate.c | 6 +-
   1 file changed, 5 insertions(+), 1 deletion(-)
 
  diff --git a/arch/powerpc/kvm/booke_emulate.c
  b/arch/powerpc/kvm/booke_emulate.c
  index 27a4b28..80c51a2 100644
  --- a/arch/powerpc/kvm/booke_emulate.c
  +++ b/arch/powerpc/kvm/booke_emulate.c
  @@ -40,7 +40,11 @@ static void kvmppc_emul_rfi(struct kvm_vcpu *vcpu)
  static void kvmppc_emul_rfci(struct kvm_vcpu *vcpu)  {
  vcpu-arch.pc = vcpu-arch.csrr0;
  -   kvmppc_set_msr(vcpu, vcpu-arch.csrr1);
  +   /* Force MSR_DE when guest does not own debug facilities */
  +   if (vcpu-guest_debug)
  +   kvmppc_set_msr(vcpu, vcpu-arch.csrr1 | MSR_DE);
  +   else
  +   kvmppc_set_msr(vcpu, vcpu-arch.csrr1);
   }
 
   int kvmppc_booke_emulate_op(struct kvm_run *run, struct kvm_vcpu
  *vcpu,
 
 It looks like this is already handled by kvmppc_vcpu_sync_debug(), which is
 called by kvmppc_set_msr().

Yes, you are right. This patch is not needed.

Thanks
-Bharat

 
 Plus, it should only be done for HV mode.
 
 -Scott
 



RE: [PATCH 5/6] KVM: PPC: BOOKE: Allow guest to change MSR_DE

2014-07-29 Thread bharat.bhus...@freescale.com


 -Original Message-
 From: Alexander Graf [mailto:ag...@suse.de]
 Sent: Tuesday, July 29, 2014 7:35 PM
 To: Wood Scott-B07421; Bhushan Bharat-R65777
 Cc: kvm-ppc@vger.kernel.org; k...@vger.kernel.org; Yoder Stuart-B08248
 Subject: Re: [PATCH 5/6] KVM: PPC: BOOKE: Allow guest to change MSR_DE
 
 
 On 29.07.14 00:01, Scott Wood wrote:
  On Fri, 2014-07-11 at 14:09 +0530, Bharat Bhushan wrote:
  When userspace is debugging guest then MSR_DE is always set and
  MSRP_DEP is set so that guest cannot change MSR_DE.
  Guest debug resources are not yet emulated, So there seems no reason
  we should stop guest controlling MSR_DE.
  Also a followup patch will enable debug emulation and that requires
  guest to control MSR_DE.
  Why does it matter whether we emulate debug resources?  We still don't
  want the guest to be able to clear MSR[DE] and thus break host debug.
 
 The patch description is misleading. This patch changes the default of DEP to
 guest controlled when it boots up. Once QEMU wants control over the debug
 registers, it gets switched to QEMU controlled (that code is already there).

Yes, now default MSR_DE is controlled by guest and when QEMU wants to use debug 
resources then MSR_DEP is set, so guest cannot change MSR_DE.

Thanks
-Bharat 

 
 
 Alex



RE: [PATCH 6/6] KVM: PPC: BOOKE: Emulate debug registers and exception

2014-07-29 Thread bharat.bhus...@freescale.com


 -Original Message-
 From: Wood Scott-B07421
 Sent: Tuesday, July 29, 2014 11:20 PM
 To: Alexander Graf
 Cc: Bhushan Bharat-R65777; kvm-ppc@vger.kernel.org; k...@vger.kernel.org; 
 Yoder
 Stuart-B08248
 Subject: Re: [PATCH 6/6] KVM: PPC: BOOKE: Emulate debug registers and 
 exception
 
 On Tue, 2014-07-29 at 16:06 +0200, Alexander Graf wrote:
  On 29.07.14 00:33, Scott Wood wrote:
   On Mon, 2014-07-28 at 16:04 +0200, Alexander Graf wrote:
   On 11.07.14 10:39, Bharat Bhushan wrote:
   This patch emulates debug registers and debug exception to support
   guest using debug resource. This enables running gdb/kgdb etc in
   guest.
  
   On BOOKE architecture we cannot share debug resources between QEMU
   and guest because:
 When QEMU is using debug resources then debug exception must
 be always enabled. To achieve this we set MSR_DE and also set
 MSRP_DEP so guest cannot change MSR_DE.
  
 When emulating debug resource for guest we want guest
 to control MSR_DE (enable/disable debug interrupt on need).
  
 So above mentioned two configuration cannot be supported
 at the same time. So the result is that we cannot share
 debug resources between QEMU and Guest on BOOKE architecture.
  
   In the current design QEMU gets priority over guest, this means
   that if QEMU is using debug resources then guest cannot use them
   and if guest is using debug resource then QEMU can overwrite them.
  
   Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
   ---
   Hi Alex,
  
   I thought of having some print in register emulation if QEMU is
   using debug resource, Also when QEMU overwrites guest written
   values but that looks excessive. If I uses some variable which get
   set when guest starts using debug registers and check in debug set
   ioctl then that look ugly. Looking for suggestions
   Whatever you do, have QEMU do the print, not the kernel.
   How would that be accomplished?  How would the kernel know to exit
   to QEMU, and how would the exit reason be conveyed?
 
  QEMU is the one forcefully enabling debug and overwriting guest debug
  registers, so it also knows when it did overwrite valid ones.
 
 QEMU knows when it overwrites the guest values, but it doesn't know if, after
 enabling host debug, the guest tries to write to the debug registers and it 
 gets
 nopped.

Do we want that QEMU first get DBCR0 to know whether it is overwriting whenever 
set/clear debug register?

  If we keep the EDM setting, then we can at least say the situation is
 no worse than with a JTAG.

Yes

Thanks
-Bharat

 
 -Scott
 

N�r��yb�X��ǧv�^�)޺{.n�+jir)w*jg����ݢj/���z�ޖ��2�ޙ�)ߡ�a�����G���h��j:+v���w��٥