Re: [PATCH] vhost: Add polling mode
kvm-ow...@vger.kernel.org wrote on 29/07/2014 04:30:34 AM: From: Zhang Haoyu zhan...@sangfor.com To: Jason Wang jasow...@redhat.com, Abel Gordon abel.gor...@gmail.com, Cc: Razya Ladelsky/Haifa/IBM@IBMIL, Alex Glikson/Haifa/IBM@IBMIL, Eran Raichstein/Haifa/IBM@IBMIL, Joel Nider/Haifa/IBM@IBMIL, kvm kvm@vger.kernel.org, Michael S. Tsirkin m...@redhat.com, Yossi Kuperman1/Haifa/IBM@IBMIL Date: 29/07/2014 04:35 AM Subject: Re: [PATCH] vhost: Add polling mode Sent by: kvm-ow...@vger.kernel.org Maybe tie a knot between vhost-net scalability tuning: threading for many VMs and vhost: Add polling mode is a good marriage, because it's more possibility to get work to do with less polling time, so less cpu cycles waste. Hi Zhang, Indeed have one vhost thread shared by multiple vms, polling for their requests is the ultimate goal of this plan. The current challenge with it is that the cgroup mechanism needs to be supported/incorporated somehow by this shared vhost thread, as it now serves multiple vms(processes). B.T.W. - if someone wants to help with this effort (mainly the cgroup issue), it would be greatly appreciated...! Thank you, Razya Thanks, Zhang Haoyu Hello All, When vhost is waiting for buffers from the guest driver (e.g., more packets to send in vhost-net's transmit queue), it normally goes to sleep and waits for the guest to kick it. This kick involves a PIO in the guest, and therefore an exit (and possibly userspace involvement in translating this PIO exit into a file descriptor event), all of which hurts performance. If the system is under-utilized (has cpu time to spare), vhost can continuously poll the virtqueues for new buffers, and avoid asking the guest to kick us. This patch adds an optional polling mode to vhost, that can be enabled via a kernel module parameter, poll_start_rate. When polling is active for a virtqueue, the guest is asked to disable notification (kicks), and the worker thread continuously checks for new buffers. When it does discover new buffers, it simulates a kick by invoking the underlying backend driver (such as vhost-net), which thinks it got a real kick from the guest, and acts accordingly. If the underlying driver asks not to be kicked, we disable polling on this virtqueue. We start polling on a virtqueue when we notice it has work to do. Polling on this virtqueue is later disabled after 3 seconds of polling turning up no new work, as in this case we are better off returning to the exit-based notification mechanism. The default timeout of 3 seconds can be changed with the poll_stop_idle kernel module parameter. This polling approach makes lot of sense for new HW with posted-interrupts for which we have exitless host-to-guest notifications. But even with support for posted interrupts, guest-to-host communication still causes exits. Polling adds the missing part. When systems are overloaded, there won?t be enough cpu time for the various vhost threads to poll their guests' devices. For these scenarios, we plan to add support for vhost threads that can be shared by multiple devices, even of multiple vms. Our ultimate goal is to implement the I/O acceleration features described in: KVM Forum 2013: Efficient and Scalable Virtio (by Abel Gordon) https://www.youtube.com/watch?v=9EyweibHfEs and https://www.mail-archive.com/kvm@vger.kernel.org/msg98179.html Comments are welcome, Thank you, Razya Thanks for the work. Do you have perf numbers for this? Hi Jason, Thanks for reviewing. I ran some experiments with TCP stream netperf and filebench (having 2 threads performing random reads) benchmarks on an IBM System x3650 M4. All runs loaded the guests in a way that they were (cpu) saturated. The system had two cores per guest, as to allow for both the vcpu and the vhost thread to run concurrently for maximum throughput (but I didn't pin the threads to specific cores) I get: Netperf, 1 vm: The polling patch improved throughput by ~33%. Number of exits/sec decreased 6x. The same improvement was shown when I tested with 3 vms running netperf. filebench, 1 vm: ops/sec improved by 13% with the polling patch. Number of exits was reduced by 31%. The same experiment with 3 vms running filebench showed similar numbers. Looks good, may worth to add the result in the commit log. And looks like the patch only poll for virtqueue. In the future, may worth to add callbacks for vhost_net to poll socket. Then it could be used with rx busy polling in host which may speedup the rx also. Did you mean polling the network device to avoid
Re: [PATCH] vhost: Add polling mode
On Mon, Jul 21, 2014 at 04:23:44PM +0300, Razya Ladelsky wrote: Hello All, When vhost is waiting for buffers from the guest driver (e.g., more packets to send in vhost-net's transmit queue), it normally goes to sleep and waits for the guest to kick it. This kick involves a PIO in the guest, and therefore an exit (and possibly userspace involvement in translating this PIO exit into a file descriptor event), all of which hurts performance. If the system is under-utilized (has cpu time to spare), vhost can continuously poll the virtqueues for new buffers, and avoid asking the guest to kick us. This patch adds an optional polling mode to vhost, that can be enabled via a kernel module parameter, poll_start_rate. When polling is active for a virtqueue, the guest is asked to disable notification (kicks), and the worker thread continuously checks for new buffers. When it does discover new buffers, it simulates a kick by invoking the underlying backend driver (such as vhost-net), which thinks it got a real kick from the guest, and acts accordingly. If the underlying driver asks not to be kicked, we disable polling on this virtqueue. We start polling on a virtqueue when we notice it has work to do. Polling on this virtqueue is later disabled after 3 seconds of polling turning up no new work, as in this case we are better off returning to the exit-based notification mechanism. The default timeout of 3 seconds can be changed with the poll_stop_idle kernel module parameter. This polling approach makes lot of sense for new HW with posted-interrupts for which we have exitless host-to-guest notifications. But even with support for posted interrupts, guest-to-host communication still causes exits. Polling adds the missing part. When systems are overloaded, there won?t be enough cpu time for the various vhost threads to poll their guests' devices. For these scenarios, we plan to add support for vhost threads that can be shared by multiple devices, even of multiple vms. Our ultimate goal is to implement the I/O acceleration features described in: KVM Forum 2013: Efficient and Scalable Virtio (by Abel Gordon) https://www.youtube.com/watch?v=9EyweibHfEs and https://www.mail-archive.com/kvm@vger.kernel.org/msg98179.html Comments are welcome, Thank you, Razya From: Razya Ladelsky ra...@il.ibm.com Add an optional polling mode to continuously poll the virtqueues for new buffers, and avoid asking the guest to kick us. Signed-off-by: Razya Ladelsky ra...@il.ibm.com This is an optimization patch, isn't it? Could you please include some numbers showing its effect? --- drivers/vhost/net.c |6 +- drivers/vhost/scsi.c |5 +- drivers/vhost/vhost.c | 247 +++-- drivers/vhost/vhost.h | 37 +++- 4 files changed, 277 insertions(+), 18 deletions(-) Whitespace seems mangled to the point of making patch unreadable. Can you pls repost? diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c index 971a760..558aecb 100644 --- a/drivers/vhost/net.c +++ b/drivers/vhost/net.c @@ -742,8 +742,10 @@ static int vhost_net_open(struct inode *inode, struct file *f) } vhost_dev_init(dev, vqs, VHOST_NET_VQ_MAX); - vhost_poll_init(n-poll + VHOST_NET_VQ_TX, handle_tx_net, POLLOUT, dev); - vhost_poll_init(n-poll + VHOST_NET_VQ_RX, handle_rx_net, POLLIN, dev); + vhost_poll_init(n-poll + VHOST_NET_VQ_TX, handle_tx_net, POLLOUT, + vqs[VHOST_NET_VQ_TX]); + vhost_poll_init(n-poll + VHOST_NET_VQ_RX, handle_rx_net, POLLIN, + vqs[VHOST_NET_VQ_RX]); f-private_data = n; diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c index 4f4ffa4..56f0233 100644 --- a/drivers/vhost/scsi.c +++ b/drivers/vhost/scsi.c @@ -1528,9 +1528,8 @@ static int vhost_scsi_open(struct inode *inode, struct file *f) if (!vqs) goto err_vqs; - vhost_work_init(vs-vs_completion_work, vhost_scsi_complete_cmd_work); - vhost_work_init(vs-vs_event_work, tcm_vhost_evt_work); - + vhost_work_init(vs-vs_completion_work, NULL, vhost_scsi_complete_cmd_work); + vhost_work_init(vs-vs_event_work, NULL, tcm_vhost_evt_work); vs-vs_events_nr = 0; vs-vs_events_missed = false; diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c index c90f437..678d766 100644 --- a/drivers/vhost/vhost.c +++ b/drivers/vhost/vhost.c @@ -24,9 +24,17 @@ #include linux/slab.h #include linux/kthread.h #include linux/cgroup.h +#include linux/jiffies.h #include linux/module.h #include vhost.h +static int poll_start_rate = 0; +module_param(poll_start_rate, int, S_IRUGO|S_IWUSR); +MODULE_PARM_DESC(poll_start_rate, Start continuous polling of virtqueue when rate of events is at least this number per jiffy. If 0, never start polling.); +
Re: [PATCH] target-mips: Ignore unassigned accesses with KVM
Il 28/07/2014 23:36, Aurelien Jarno ha scritto: On Mon, Jul 28, 2014 at 12:37:50PM +0100, James Hogan wrote: MIPS registers an unassigned access handler which raises a guest bus error exception. However this causes QEMU to crash when KVM is enabled as it isn't called from the main execution loop so longjmp() gets called without a corresponding setjmp(). Until the KVM API can be updated to trigger a guest exception in response to an MMIO exit, prevent the bus error exception being raised from mips_cpu_unassigned_access() if KVM is enabled. The check is at run time since the do_unassigned_access callback is initialised before it is known whether KVM will be enabled. The problem can be triggered with Malta emulation by making the guest write to the reset region at physical address 0x1bf0, since it is marked read-only which is treated as unassigned for writes. Signed-off-by: James Hogan james.ho...@imgtec.com Cc: Aurelien Jarno aurel...@aurel32.net Cc: Peter Maydell peter.mayd...@linaro.org Cc: Paolo Bonzini pbonz...@redhat.com Cc: Gleb Natapov g...@redhat.com Cc: Christoffer Dall christoffer.d...@linaro.org Cc: Sanjay Lal sanj...@kymasys.com --- target-mips/op_helper.c | 11 +++ 1 file changed, 11 insertions(+) diff --git a/target-mips/op_helper.c b/target-mips/op_helper.c index 27651a4a00c1..df97b35f8701 100644 --- a/target-mips/op_helper.c +++ b/target-mips/op_helper.c @@ -21,6 +21,7 @@ #include qemu/host-utils.h #include exec/helper-proto.h #include exec/cpu_ldst.h +#include sysemu/kvm.h #ifndef CONFIG_USER_ONLY static inline void cpu_mips_tlb_flush (CPUMIPSState *env, int flush_global); @@ -2168,6 +2169,16 @@ void mips_cpu_unassigned_access(CPUState *cs, hwaddr addr, MIPSCPU *cpu = MIPS_CPU(cs); CPUMIPSState *env = cpu-env; +/* + * Raising an exception with KVM enabled will crash because it won't be from + * the main execution loop so the longjmp won't have a matching setjmp. + * Until we can trigger a bus error exception through KVM lets just ignore + * the access. + */ +if (kvm_enabled()) { +return; +} + if (is_exec) { helper_raise_exception(env, EXCP_IBE); } else { Reviewed-by: Aurelien Jarno aurel...@aurel32.net Note that even if the test is added for each exception, it is light enough compared to triggering and handling an exception so that it has no impact on performance. Paolo, do you want to take this patch in your kvm tree? Sure, I'll include it for 2.2. Paolo -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4 v3] xen-pciback: use pci device flag operation helper function
On 29/07/14 05:06, Ethan Zhao wrote: Use pci device flag operation helper functions when set device to assigned or deassigned state. Signed-off-by: Ethan Zhao ethan.z...@oracle.com Konrad already reviewed this but you've not included his reviewed-by tag. I don't understand why we bother with this flag on Xen since we never use it but: Acked-by: David Vrabel david.vra...@citrix.com I'm expecting this to go via the PCI tree. David --- a/drivers/xen/xen-pciback/pci_stub.c +++ b/drivers/xen/xen-pciback/pci_stub.c @@ -133,7 +133,7 @@ static void pcistub_device_release(struct kref *kref) xen_pcibk_config_free_dyn_fields(dev); xen_pcibk_config_free_dev(dev); - dev-dev_flags = ~PCI_DEV_FLAGS_ASSIGNED; + pci_clear_dev_assigned(dev); pci_dev_put(dev); kfree(psdev); @@ -404,7 +404,7 @@ static int pcistub_init_device(struct pci_dev *dev) dev_dbg(dev-dev, reset device\n); xen_pcibk_reset_device(dev); - dev-dev_flags |= PCI_DEV_FLAGS_ASSIGNED; + pci_set_dev_assigned(dev); return 0; config_release: -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] vhost: Add polling mode
Michael S. Tsirkin m...@redhat.com wrote on 29/07/2014 11:06:40 AM: From: Michael S. Tsirkin m...@redhat.com To: Razya Ladelsky/Haifa/IBM@IBMIL, Cc: kvm@vger.kernel.org, abel.gor...@gmail.com, Joel Nider/Haifa/ IBM@IBMIL, Yossi Kuperman1/Haifa/IBM@IBMIL, Eran Raichstein/Haifa/ IBM@IBMIL, Alex Glikson/Haifa/IBM@IBMIL Date: 29/07/2014 11:06 AM Subject: Re: [PATCH] vhost: Add polling mode On Mon, Jul 21, 2014 at 04:23:44PM +0300, Razya Ladelsky wrote: Hello All, When vhost is waiting for buffers from the guest driver (e.g., more packets to send in vhost-net's transmit queue), it normally goes to sleep and waits for the guest to kick it. This kick involves a PIO in the guest, and therefore an exit (and possibly userspace involvement in translating this PIO exit into a file descriptor event), all of which hurts performance. If the system is under-utilized (has cpu time to spare), vhost can continuously poll the virtqueues for new buffers, and avoid asking the guest to kick us. This patch adds an optional polling mode to vhost, that can be enabled via a kernel module parameter, poll_start_rate. When polling is active for a virtqueue, the guest is asked to disable notification (kicks), and the worker thread continuously checks for new buffers. When it does discover new buffers, it simulates a kick by invoking the underlying backend driver (such as vhost-net), which thinks it got a real kick from the guest, and acts accordingly. If the underlying driver asks not to be kicked, we disable polling on this virtqueue. We start polling on a virtqueue when we notice it has work to do. Polling on this virtqueue is later disabled after 3 seconds of polling turning up no new work, as in this case we are better off returning to the exit-based notification mechanism. The default timeout of 3 seconds can be changed with the poll_stop_idle kernel module parameter. This polling approach makes lot of sense for new HW with posted-interrupts for which we have exitless host-to-guest notifications. But even with support for posted interrupts, guest-to-host communication still causes exits. Polling adds the missing part. When systems are overloaded, there won?t be enough cpu time for the various vhost threads to poll their guests' devices. For these scenarios, we plan to add support for vhost threads that can be shared by multiple devices, even of multiple vms. Our ultimate goal is to implement the I/O acceleration features described in: KVM Forum 2013: Efficient and Scalable Virtio (by Abel Gordon) https://www.youtube.com/watch?v=9EyweibHfEs and https://www.mail-archive.com/kvm@vger.kernel.org/msg98179.html Comments are welcome, Thank you, Razya From: Razya Ladelsky ra...@il.ibm.com Add an optional polling mode to continuously poll the virtqueues for new buffers, and avoid asking the guest to kick us. Signed-off-by: Razya Ladelsky ra...@il.ibm.com This is an optimization patch, isn't it? Could you please include some numbers showing its effect? Hi Michael, Sure. I included them in a reply to Jason Wang in this thread, Here it is: http://www.spinics.net/linux/lists/kvm/msg106049.html --- drivers/vhost/net.c |6 +- drivers/vhost/scsi.c |5 +- drivers/vhost/vhost.c | 247 +++-- drivers/vhost/vhost.h | 37 +++- 4 files changed, 277 insertions(+), 18 deletions(-) Whitespace seems mangled to the point of making patch unreadable. Can you pls repost? Sure. diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c index 971a760..558aecb 100644 --- a/drivers/vhost/net.c +++ b/drivers/vhost/net.c @@ -742,8 +742,10 @@ static int vhost_net_open(struct inode *inode, struct file *f) } vhost_dev_init(dev, vqs, VHOST_NET_VQ_MAX); - vhost_poll_init(n-poll + VHOST_NET_VQ_TX, handle_tx_net, POLLOUT, dev); - vhost_poll_init(n-poll + VHOST_NET_VQ_RX, handle_rx_net, POLLIN, dev); + vhost_poll_init(n-poll + VHOST_NET_VQ_TX, handle_tx_net, POLLOUT, + vqs[VHOST_NET_VQ_TX]); + vhost_poll_init(n-poll + VHOST_NET_VQ_RX, handle_rx_net, POLLIN, + vqs[VHOST_NET_VQ_RX]); f-private_data = n; diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c index 4f4ffa4..56f0233 100644 --- a/drivers/vhost/scsi.c +++ b/drivers/vhost/scsi.c @@ -1528,9 +1528,8 @@ static int vhost_scsi_open(struct inode *inode, struct file *f) if (!vqs) goto err_vqs; - vhost_work_init(vs-vs_completion_work, vhost_scsi_complete_cmd_work); - vhost_work_init(vs-vs_event_work, tcm_vhost_evt_work); - + vhost_work_init(vs-vs_completion_work, NULL, vhost_scsi_complete_cmd_work); +
Re: [PATCH] vhost: Add polling mode
On Tue, Jul 29, 2014 at 01:30:03PM +0300, Razya Ladelsky wrote: [..] had to snip off the quoted text, it's mangled up to unreadability. Please take a look at Documentation/email-clients.txt to fix this. This is an optimization patch, isn't it? Could you please include some numbers showing its effect? Hi Michael, Sure. I included them in a reply to Jason Wang in this thread, Here it is: http://www.spinics.net/linux/lists/kvm/msg106049.html Hmm there aren't a lot of numbers there :(. Speed increased by 33% but by how much? E.g. maybe you are getting from 1Mbyte/sec to 1.3, if so it's hard to get excited about it. Some questions that come to mind: what was the message size? I would expect several measurements with different values. How did host CPU utilization change? What about latency? As we are competing with guest for host CPU, would worst-case or average latency suffer? Thanks, -- MST -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 6/6] kvm, mem-hotplug: Reload L1's apic access page if it is migrated when L2 is running.
On 07/26/2014 04:44 AM, Jan Kiszka wrote: On 2014-07-23 21:42, Tang Chen wrote: This patch only handle L1 and L2 vm share one apic access page situation. When L1 vm is running, if the shared apic access page is migrated, mmu_notifier will request all vcpus to exit to L0, and reload apic access page physical address for all the vcpus' vmcs (which is done by patch 5/6). And when it enters L2 vm, L2's vmcs will be updated in prepare_vmcs02() called by nested_vm_run(). So we need to do nothing. When L2 vm is running, if the shared apic access page is migrated, mmu_notifier will request all vcpus to exit to L0, and reload apic access page physical address for all L2 vmcs. And this patch requests apic access page reload in L2-L1 vmexit. Shouldn't this patch come before we allow apic access page migration? Yes, it should come before patch 5. Thanks. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH/RFC 4/5] s390x/kvm: test whether a cpu is STOPPED when checking has_work
On 28/07/14 16:22, Alexander Graf wrote: On 28.07.2014, at 16:16, David Hildenbrand d...@linux.vnet.ibm.com wrote: On 10.07.14 15:10, Christian Borntraeger wrote: From: David Hildenbrand d...@linux.vnet.ibm.com If a cpu is stopped, it must never be allowed to run and no interrupt may wake it up. A cpu also has to be unhalted if it is halted and has work to do - this scenario wasn't hit in kvm case yet, as only disabled wait is processed within QEMU. Signed-off-by: David Hildenbrand d...@linux.vnet.ibm.com Reviewed-by: Cornelia Huck cornelia.h...@de.ibm.com Reviewed-by: Christian Borntraeger borntrae...@de.ibm.com Signed-off-by: Christian Borntraeger borntrae...@de.ibm.com This looks like it's something that generic infrastructure should take care of, no? How does this work for the other archs? They always get an interrupt on the transition between !has_work - has_work. Why don't we get one for s390x? Alex Well, we have the special case on s390 as a CPU that is in the STOPPED or the CHECK STOP state may never run - even if there is an interrupt. It's basically like this CPU has been switched off. Imagine that it is tried to inject an interrupt into a stopped vcpu. It will kick the stopped vcpu and thus lead to a call to kvm_arch_process_async_events(). We have to deny that this vcpu will ever run as long as it is stopped. It's like a way to suppress the interrupt for such a transition you mentioned. An interrupt kick usually just means we go back into the main loop. From there we check the interrupt bitmap which interrupt to handle. Check out the handling code here: http://git.qemu.org/?p=qemu.git;a=blob;f=cpu-exec.c;h=38e5f02a307523d99134f4e2e6c51683bb10b45b;hb=HEAD#l580 If you just check for the stopped state in here, do_interrupt() will never get called and thus the CPU shouldn't ever get executed. Unless I'm heavily mistaken :). Later, another vcpu might decide to turn that vcpu back on (by e.g. sending a SIGP START to that vcpu). Yes, in that case that other CPU generates a signal (a different bit in interrupt_request) and the first CPU would see that it has to wake up and wake up. I am not sure if such a mechanism/scenario is applicable to any other arch. They all seem to reset the cs-halted flag if they know they are able to run (e.g. due to an interrupt) - they have no such thing as stopped cpus, only halted/waiting cpus. There's not really much difference between the two. The only difference from a software point of view is that a stopped CPU has its external interrupt bits masked off, no? We have - wait (wait bit in PSW) - disabled wait (wait bit and interrupt fencing in PSW) - STOPPED (not related to PSW, state change usually handled via service processor or hypervisor) I think we have to differentiate between KVM/TCG. On KVM we always do in kernel halt and qemu sees a halted only for STOPPED or disabled wait. TCG has to take care of the normal wait as well. From a first glimpse, a disabled wait and STOPPED look similar, but there are (important) differences, e.g. other CPUs get a different a different result from a SIGP SENSE. This makes a big difference, e.g. for Linux guests, that send a SIGP STOP, followed by a SIGP SENSE loop until the CPU is down on hotplug (and shutdown, kexec..) So I think we agree, that handling the cpu states natively makes sense. The question is now only how to model it correctly without breaking TCG/KVM and reuse as much common code as possible. Correct? Do I understand you correctly, that your collapsing of stopped and halted is only in the qemu coding sense, IOW maybe we could just modify kvm_arch_process_async_events to consider the STOPPED state, as TCGs sigp implementation does not support SMP anyway? David would that work? Christian -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH/RFC 4/5] s390x/kvm: test whether a cpu is STOPPED when checking has_work
On 29.07.14 13:44, Christian Borntraeger wrote: On 28/07/14 16:22, Alexander Graf wrote: On 28.07.2014, at 16:16, David Hildenbrand d...@linux.vnet.ibm.com wrote: On 10.07.14 15:10, Christian Borntraeger wrote: From: David Hildenbrand d...@linux.vnet.ibm.com If a cpu is stopped, it must never be allowed to run and no interrupt may wake it up. A cpu also has to be unhalted if it is halted and has work to do - this scenario wasn't hit in kvm case yet, as only disabled wait is processed within QEMU. Signed-off-by: David Hildenbrand d...@linux.vnet.ibm.com Reviewed-by: Cornelia Huck cornelia.h...@de.ibm.com Reviewed-by: Christian Borntraeger borntrae...@de.ibm.com Signed-off-by: Christian Borntraeger borntrae...@de.ibm.com This looks like it's something that generic infrastructure should take care of, no? How does this work for the other archs? They always get an interrupt on the transition between !has_work - has_work. Why don't we get one for s390x? Alex Well, we have the special case on s390 as a CPU that is in the STOPPED or the CHECK STOP state may never run - even if there is an interrupt. It's basically like this CPU has been switched off. Imagine that it is tried to inject an interrupt into a stopped vcpu. It will kick the stopped vcpu and thus lead to a call to kvm_arch_process_async_events(). We have to deny that this vcpu will ever run as long as it is stopped. It's like a way to suppress the interrupt for such a transition you mentioned. An interrupt kick usually just means we go back into the main loop. From there we check the interrupt bitmap which interrupt to handle. Check out the handling code here: http://git.qemu.org/?p=qemu.git;a=blob;f=cpu-exec.c;h=38e5f02a307523d99134f4e2e6c51683bb10b45b;hb=HEAD#l580 If you just check for the stopped state in here, do_interrupt() will never get called and thus the CPU shouldn't ever get executed. Unless I'm heavily mistaken :). Later, another vcpu might decide to turn that vcpu back on (by e.g. sending a SIGP START to that vcpu). Yes, in that case that other CPU generates a signal (a different bit in interrupt_request) and the first CPU would see that it has to wake up and wake up. I am not sure if such a mechanism/scenario is applicable to any other arch. They all seem to reset the cs-halted flag if they know they are able to run (e.g. due to an interrupt) - they have no such thing as stopped cpus, only halted/waiting cpus. There's not really much difference between the two. The only difference from a software point of view is that a stopped CPU has its external interrupt bits masked off, no? We have - wait (wait bit in PSW) - disabled wait (wait bit and interrupt fencing in PSW) - STOPPED (not related to PSW, state change usually handled via service processor or hypervisor) I think we have to differentiate between KVM/TCG. On KVM we always do in kernel halt and qemu sees a halted only for STOPPED or disabled wait. TCG has to take care of the normal wait as well. From a first glimpse, a disabled wait and STOPPED look similar, but there are (important) differences, e.g. other CPUs get a different a different result from a SIGP SENSE. This makes a big difference, e.g. for Linux guests, that send a SIGP STOP, followed by a SIGP SENSE loop until the CPU is down on hotplug (and shutdown, kexec..) So I think we agree, that handling the cpu states natively makes sense. The question is now only how to model it correctly without breaking TCG/KVM and reuse as much common code as possible. Correct? Do I understand you correctly, that your collapsing of stopped and halted is only in the qemu coding sense, IOW maybe we could just modify kvm_arch_process_async_events to consider the STOPPED state, as TCGs sigp implementation does not support SMP anyway? That works for me, yes. Alex -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4 v3] xen-pciback: use pci device flag operation helper function
在 2014年7月29日,下午6:07,David Vrabel david.vra...@citrix.com 写道: On 29/07/14 05:06, Ethan Zhao wrote: Use pci device flag operation helper functions when set device to assigned or deassigned state. Signed-off-by: Ethan Zhao ethan.z...@oracle.com Konrad already reviewed this but you've not included his reviewed-by tag. yep,I lost Konrad's reviewed-by tag. Thanks. I don't understand why we bother with this flag on Xen since we never use it but: Acked-by: David Vrabel david.vra...@citrix.com I'm expecting this to go via the PCI tree. David --- a/drivers/xen/xen-pciback/pci_stub.c +++ b/drivers/xen/xen-pciback/pci_stub.c @@ -133,7 +133,7 @@ static void pcistub_device_release(struct kref *kref) xen_pcibk_config_free_dyn_fields(dev); xen_pcibk_config_free_dev(dev); -dev-dev_flags = ~PCI_DEV_FLAGS_ASSIGNED; +pci_clear_dev_assigned(dev); pci_dev_put(dev); kfree(psdev); @@ -404,7 +404,7 @@ static int pcistub_init_device(struct pci_dev *dev) dev_dbg(dev-dev, reset device\n); xen_pcibk_reset_device(dev); -dev-dev_flags |= PCI_DEV_FLAGS_ASSIGNED; +pci_set_dev_assigned(dev); return 0; config_release: -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] vhost: Add polling mode
Hmm there aren't a lot of numbers there :(. Speed increased by 33% but by how much? E.g. maybe you are getting from 1Mbyte/sec to 1.3, if so it's hard to get excited about it. Netperf 1 VM: 1516 MB/sec - 2046 MB/sec and for 3 VMs: 4086 MB/sec - 5545 MB/sec Some questions that come to mind: what was the message size? I would expect several measurements with different values. How did host CPU utilization change? message size was 64B in order to get the VM to be cpu saturated. so vhost had 99% cpu and vhost 38%, with the polling patch both had 99%. What about latency? As we are competing with guest for host CPU, would worst-case or average latency suffer? Polling indeed doesn't make a lot of sense if there aren't enough available cores. In these cases polling should not be used. Thank you, Razya Thanks, -- MST -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/4 v3] KVM: use pci device flag operation helper functions
This patch was already Acked-by: Paolo Bonzini pbonz...@redhat.com I forgot to add the Ack tag. Thanks, Ethan Sorry for last post in HTML format with ipad. On Tue, Jul 29, 2014 at 12:06 PM, Ethan Zhao ethan.z...@oracle.com wrote: Use helper function instead of direct operation to pci device flag when set device to assigned or deassigned. Signed-off-by: Ethan Zhao ethan.z...@oracle.com --- v3: amend helper functions naming. virt/kvm/assigned-dev.c |2 +- virt/kvm/iommu.c|4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/virt/kvm/assigned-dev.c b/virt/kvm/assigned-dev.c index bf06577..d122bda 100644 --- a/virt/kvm/assigned-dev.c +++ b/virt/kvm/assigned-dev.c @@ -302,7 +302,7 @@ static void kvm_free_assigned_device(struct kvm *kvm, else pci_restore_state(assigned_dev-dev); - assigned_dev-dev-dev_flags = ~PCI_DEV_FLAGS_ASSIGNED; + pci_clear_dev_assigned(assigned_dev-dev); pci_release_regions(assigned_dev-dev); pci_disable_device(assigned_dev-dev); diff --git a/virt/kvm/iommu.c b/virt/kvm/iommu.c index 0df7d4b..8cfe021 100644 --- a/virt/kvm/iommu.c +++ b/virt/kvm/iommu.c @@ -194,7 +194,7 @@ int kvm_assign_device(struct kvm *kvm, goto out_unmap; } - pdev-dev_flags |= PCI_DEV_FLAGS_ASSIGNED; + pci_set_dev_assigned(pdev); dev_info(pdev-dev, kvm assign device\n); @@ -220,7 +220,7 @@ int kvm_deassign_device(struct kvm *kvm, iommu_detach_device(domain, pdev-dev); - pdev-dev_flags = ~PCI_DEV_FLAGS_ASSIGNED; + pci_clear_dev_assigned(pdev); dev_info(pdev-dev, kvm deassign device\n); -- 1.7.1 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/4 v3] xen-pciback: use pci device flag operation helper function
On Tue, Jul 29, 2014 at 6:07 PM, David Vrabel david.vra...@citrix.com wrote: On 29/07/14 05:06, Ethan Zhao wrote: Use pci device flag operation helper functions when set device to assigned or deassigned state. Signed-off-by: Ethan Zhao ethan.z...@oracle.com Konrad already reviewed this but you've not included his reviewed-by tag. I don't understand why we bother with this flag on Xen since we never use it but: Acked-by: David Vrabel david.vra...@citrix.com I'm expecting this to go via the PCI tree. Seems Bjorn still holds his bullet for last shot :) David --- a/drivers/xen/xen-pciback/pci_stub.c +++ b/drivers/xen/xen-pciback/pci_stub.c @@ -133,7 +133,7 @@ static void pcistub_device_release(struct kref *kref) xen_pcibk_config_free_dyn_fields(dev); xen_pcibk_config_free_dev(dev); - dev-dev_flags = ~PCI_DEV_FLAGS_ASSIGNED; + pci_clear_dev_assigned(dev); pci_dev_put(dev); kfree(psdev); @@ -404,7 +404,7 @@ static int pcistub_init_device(struct pci_dev *dev) dev_dbg(dev-dev, reset device\n); xen_pcibk_reset_device(dev); - dev-dev_flags |= PCI_DEV_FLAGS_ASSIGNED; + pci_set_dev_assigned(dev); return 0; config_release: -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] vhost: Add polling mode
On Tue, Jul 29, 2014 at 03:23:59PM +0300, Razya Ladelsky wrote: Hmm there aren't a lot of numbers there :(. Speed increased by 33% but by how much? E.g. maybe you are getting from 1Mbyte/sec to 1.3, if so it's hard to get excited about it. Netperf 1 VM: 1516 MB/sec - 2046 MB/sec and for 3 VMs: 4086 MB/sec - 5545 MB/sec What do you mean by 1 VM? Streaming TCP host to vm? Also, your throughput is somewhat low, it's worth seeing why you can't hit higher speeds. Some questions that come to mind: what was the message size? I would expect several measurements with different values. How did host CPU utilization change? message size was 64B in order to get the VM to be cpu saturated. so vhost had 99% cpu and vhost 38%, with the polling patch both had 99%. Hmm so a net loss in throughput/CPU. What about latency? As we are competing with guest for host CPU, would worst-case or average latency suffer? Polling indeed doesn't make a lot of sense if there aren't enough available cores. In these cases polling should not be used. Thank you, Razya OK but scheduler might run vm and vhost on the same cpu even if cores are available. This needs to be detected somehow and polling disabled. Thanks, -- MST -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Mailbox Warning
Email Account Warning !!! This mail is from Administrator; we wish to bring to your notice the Condition of your email account. We have just noticed that you have exceeded your email Database limit of 500 MB quota and your email IP is causing conflict because it is been accessed in different server location. You need to Upgrade and expand your email quota limit before you can continue to use your email. Provide details or click the link below : Update your email quota limit to 2.6 GB, use the below web link: https://www.formlogix.com/Manager/UserConditionalSurvey248582.aspx?Param=VXNlcklkPTI0ODU4Mi5Gb3JtSWQ9MQ%3d%3d Failure to do this will result to email deactivation within 24hours Thank you for your understanding. Copyright 2014 Help Desk Email Upgrade -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH/RFC 4/5] s390x/kvm: test whether a cpu is STOPPED when checking has_work
Il 28/07/2014 17:03, David Hildenbrand ha scritto: Well the difference is, that a STOPPED vcpu can be woken up by non-interrupt like things (SIGP START) AND a special interrupt (SIGP RESTART - which is like a SIPI++ as it performs a psw exchange - NMI). So we basically have two paths that can lead to a state change. All interrupt bits may be in any combination (SIGP RESTART interrupts can't be masked out, nor can SIGP START be denied). The other thing may be that on s390, each vcpu (including itself) can put another vcpu into the STOPPED state - I assume that this is different for x86 INIT_RECEIVED. For this reason we have to watch out for bad race conditions (e.g. multiple vcpus working on another vcpu)... You can do that in x86 by sending an INIT inter-processor interrupt. A SIPI is ignored if the CPU is not in INIT_RECEIVED state. Commit 66450a21f99636af4fafac2afd33f1a40631bc3a introduced the current implementation. - an INIT cancels a previous SIPI; - if both INIT and SIPI are sent, on real hardware you need to have a few hundred microseconds between them, but KVM will reliably process INIT before SIPI. See commit 299018f44ac553dce3caf84df1d14c4764faa279 for an example of the races that can happen. Note that x86 has KVM_MP_STATE_SIPI_RECEIVED state but it is obsolete, we go straight from KVM_MP_STATE_INIT_RECEIVED to KVM_MP_STATE_RUNNABLE. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 5/6] KVM: PPC: BOOKE: Allow guest to change MSR_DE
On 29.07.14 00:01, Scott Wood wrote: On Fri, 2014-07-11 at 14:09 +0530, Bharat Bhushan wrote: When userspace is debugging guest then MSR_DE is always set and MSRP_DEP is set so that guest cannot change MSR_DE. Guest debug resources are not yet emulated, So there seems no reason we should stop guest controlling MSR_DE. Also a followup patch will enable debug emulation and that requires guest to control MSR_DE. Why does it matter whether we emulate debug resources? We still don't want the guest to be able to clear MSR[DE] and thus break host debug. The patch description is misleading. This patch changes the default of DEP to guest controlled when it boots up. Once QEMU wants control over the debug registers, it gets switched to QEMU controlled (that code is already there). Alex -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH/RFC 4/5] s390x/kvm: test whether a cpu is STOPPED when checking has_work
Il 28/07/2014 17:03, David Hildenbrand ha scritto: Well the difference is, that a STOPPED vcpu can be woken up by non-interrupt like things (SIGP START) AND a special interrupt (SIGP RESTART - which is like a SIPI++ as it performs a psw exchange - NMI). So we basically have two paths that can lead to a state change. All interrupt bits may be in any combination (SIGP RESTART interrupts can't be masked out, nor can SIGP START be denied). The other thing may be that on s390, each vcpu (including itself) can put another vcpu into the STOPPED state - I assume that this is different for x86 INIT_RECEIVED. For this reason we have to watch out for bad race conditions (e.g. multiple vcpus working on another vcpu)... You can do that in x86 by sending an INIT inter-processor interrupt. A SIPI is ignored if the CPU is not in INIT_RECEIVED state. Commit 66450a21f99636af4fafac2afd33f1a40631bc3a introduced the current implementation. - an INIT cancels a previous SIPI; - if both INIT and SIPI are sent, on real hardware you need to have a few hundred microseconds between them, but KVM will reliably process INIT before SIPI. See commit 299018f44ac553dce3caf84df1d14c4764faa279 for an example of the races that can happen. Note that x86 has KVM_MP_STATE_SIPI_RECEIVED state but it is obsolete, we go straight from KVM_MP_STATE_INIT_RECEIVED to KVM_MP_STATE_RUNNABLE. Thanks for the explanation Paolo! Looks like from an interrupt point of view, the states have a lot in common. The major thing that differs on s390 is probably the way these interrupts are generated and what else they influence (all the power of the SIGP facility :) + special check-stop state that can't be left by an interrupt, only by SIGP CPU resets). David -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 6/6] KVM: PPC: BOOKE: Emulate debug registers and exception
On Tue, 2014-07-29 at 16:06 +0200, Alexander Graf wrote: On 29.07.14 00:33, Scott Wood wrote: On Mon, 2014-07-28 at 16:04 +0200, Alexander Graf wrote: On 11.07.14 10:39, Bharat Bhushan wrote: This patch emulates debug registers and debug exception to support guest using debug resource. This enables running gdb/kgdb etc in guest. On BOOKE architecture we cannot share debug resources between QEMU and guest because: When QEMU is using debug resources then debug exception must be always enabled. To achieve this we set MSR_DE and also set MSRP_DEP so guest cannot change MSR_DE. When emulating debug resource for guest we want guest to control MSR_DE (enable/disable debug interrupt on need). So above mentioned two configuration cannot be supported at the same time. So the result is that we cannot share debug resources between QEMU and Guest on BOOKE architecture. In the current design QEMU gets priority over guest, this means that if QEMU is using debug resources then guest cannot use them and if guest is using debug resource then QEMU can overwrite them. Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com --- Hi Alex, I thought of having some print in register emulation if QEMU is using debug resource, Also when QEMU overwrites guest written values but that looks excessive. If I uses some variable which get set when guest starts using debug registers and check in debug set ioctl then that look ugly. Looking for suggestions Whatever you do, have QEMU do the print, not the kernel. How would that be accomplished? How would the kernel know to exit to QEMU, and how would the exit reason be conveyed? QEMU is the one forcefully enabling debug and overwriting guest debug registers, so it also knows when it did overwrite valid ones. QEMU knows when it overwrites the guest values, but it doesn't know if, after enabling host debug, the guest tries to write to the debug registers and it gets nopped. If we keep the EDM setting, then we can at least say the situation is no worse than with a JTAG. -Scott -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 6/6] KVM: PPC: BOOKE: Emulate debug registers and exception
Am 29.07.2014 um 19:50 schrieb Scott Wood scottw...@freescale.com: On Tue, 2014-07-29 at 16:06 +0200, Alexander Graf wrote: On 29.07.14 00:33, Scott Wood wrote: On Mon, 2014-07-28 at 16:04 +0200, Alexander Graf wrote: On 11.07.14 10:39, Bharat Bhushan wrote: This patch emulates debug registers and debug exception to support guest using debug resource. This enables running gdb/kgdb etc in guest. On BOOKE architecture we cannot share debug resources between QEMU and guest because: When QEMU is using debug resources then debug exception must be always enabled. To achieve this we set MSR_DE and also set MSRP_DEP so guest cannot change MSR_DE. When emulating debug resource for guest we want guest to control MSR_DE (enable/disable debug interrupt on need). So above mentioned two configuration cannot be supported at the same time. So the result is that we cannot share debug resources between QEMU and Guest on BOOKE architecture. In the current design QEMU gets priority over guest, this means that if QEMU is using debug resources then guest cannot use them and if guest is using debug resource then QEMU can overwrite them. Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com --- Hi Alex, I thought of having some print in register emulation if QEMU is using debug resource, Also when QEMU overwrites guest written values but that looks excessive. If I uses some variable which get set when guest starts using debug registers and check in debug set ioctl then that look ugly. Looking for suggestions Whatever you do, have QEMU do the print, not the kernel. How would that be accomplished? How would the kernel know to exit to QEMU, and how would the exit reason be conveyed? QEMU is the one forcefully enabling debug and overwriting guest debug registers, so it also knows when it did overwrite valid ones. QEMU knows when it overwrites the guest values, but it doesn't know if, after enabling host debug, the guest tries to write to the debug registers and it gets nopped. If we keep the EDM setting, then we can at least say the situation is no worse than with a JTAG. Yeah, I think that's perfectly reasonable. I don't think it'll be likely that a user starts debugging with qemu and then expects guest debugging to work. The other way around is more likely and would warrant a warning to the user - if we care. Alex -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hello!!!
Hello, I am Ms. Claudia Hiu, I work with Fubon Bank (Hong Kong), I have a business transaction in your name which prompted me to contact you. Please i would appreciate your urgent respond immediately to serve you with more details. Greetings Ms. Claudia Hiu -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] KVM: vmx: remove duplicate vmx_mpx_supported()
Remove a function which was added by both 93c4adc7afe and 36be0b9deb2. Signed-off-by: Chris J Arges chris.j.ar...@canonical.com --- arch/x86/kvm/vmx.c | 1 - 1 file changed, 1 deletion(-) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 801332e..c4ea039 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -740,7 +740,6 @@ static u32 vmx_segment_access_rights(struct kvm_segment *var); static void vmx_sync_pir_to_irr_dummy(struct kvm_vcpu *vcpu); static void copy_vmcs12_to_shadow(struct vcpu_vmx *vmx); static void copy_shadow_to_vmcs12(struct vcpu_vmx *vmx); -static bool vmx_mpx_supported(void); static DEFINE_PER_CPU(struct vmcs *, vmxarea); static DEFINE_PER_CPU(struct vmcs *, current_vmcs); -- 1.9.1 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
dirty page tracking in kvm/qemu -- page faults inevitable?
Hi, I've got an issue where we're hitting major performance penalties while doing live migration, and it seems like it might be due to page faults triggering hypervisor exits, and then we get stuck waiting for the iothread lock which is held by the qemu dirty page scanning code. Accordingly, I'm trying to figure out the actual mechanism whereby dirty pages are tracked in qemu/kvm. I've got an Ivy Bridge CPU, a 3.4 kernel on the host, and qemu 1.4. Looking at the qemu code, it seems to be calling down into kvm to get the dirty page information. Looking at kvm, most of what I read seems to be doing the usual mark it read-only and then when we take a page fault mark it as dirty trick. However, I read something about Intel EPT having hardware support for tracking dirty pages. It seems like this might avoid the need for a page fault, but might only be available on Haswell or later CPUs--is that correct? Is it supported in kvm? If so, when was support added? Thanks, Chris P.S. Please CC me on reply, I'm not subscribed to the list. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v5 2/5] random: Add and use arch_get_rng_seed
On Wed, Jul 23, 2014 at 9:57 PM, Andy Lutomirski l...@amacapital.net wrote: Currently, init_std_data contains its own logic for using arch random sources. This replaces that logic with a generic function arch_get_rng_seed that allows arch code to supply its own logic. The default implementation tries arch_get_random_seed_long and arch_get_random_long individually. The only functional change here is that random_get_entropy() is used unconditionally instead of being used only when the arch sources fail. This may add a tiny amount of security. tytso, are you okay with this approach? I'd be happy to rework this if you prefer some other way of doing it. --Andy Signed-off-by: Andy Lutomirski l...@amacapital.net --- drivers/char/random.c | 14 +++--- include/linux/random.h | 40 2 files changed, 51 insertions(+), 3 deletions(-) diff --git a/drivers/char/random.c b/drivers/char/random.c index 0a7ac0a..be7a94e 100644 --- a/drivers/char/random.c +++ b/drivers/char/random.c @@ -1236,6 +1236,10 @@ void get_random_bytes_arch(void *buf, int nbytes) } EXPORT_SYMBOL(get_random_bytes_arch); +static void seed_entropy_store(void *ctx, u32 data) +{ + mix_pool_bytes((struct entropy_store *)ctx, data, sizeof(data), NULL); +} /* * init_std_data - initialize pool with system data @@ -1251,15 +1255,19 @@ static void init_std_data(struct entropy_store *r) int i; ktime_t now = ktime_get_real(); unsigned long rv; + char log_prefix[128]; r-last_pulled = jiffies; mix_pool_bytes(r, now, sizeof(now), NULL); for (i = r-poolinfo-poolbytes; i 0; i -= sizeof(rv)) { - if (!arch_get_random_seed_long(rv) - !arch_get_random_long(rv)) - rv = random_get_entropy(); + rv = random_get_entropy(); mix_pool_bytes(r, rv, sizeof(rv), NULL); } + + sprintf(log_prefix, random: seeded %s pool, r-name); + arch_get_rng_seed(r, seed_entropy_store, 8 * r-poolinfo-poolbytes, + log_prefix); + mix_pool_bytes(r, utsname(), sizeof(*(utsname())), NULL); } diff --git a/include/linux/random.h b/include/linux/random.h index 57fbbff..81a6145 100644 --- a/include/linux/random.h +++ b/include/linux/random.h @@ -106,6 +106,46 @@ static inline int arch_has_random_seed(void) } #endif +#ifndef __HAVE_ARCH_GET_RNG_SEED + +/** + * arch_get_rng_seed() - get architectural rng seed data + * @ctx: context for the seed function + * @seed: function to call for each u32 obtained + * @bits_per_source: number of bits from each source to try to use + * @log_prefix: beginning of log output (may be NULL) + * + * Synchronously load some architectural entropy or other best-effort + * random seed data. An arch-specific implementation should be no worse + * than this generic implementation. If the arch code does something + * interesting, it may log something of the form log_prefix with + * 8 bits of stuff. + * + * No arch-specific implementation should be any worse than the generic + * implementation. + */ +static inline void arch_get_rng_seed(void *ctx, +void (*seed)(void *ctx, u32 data), +int bits_per_source, +const char *log_prefix) +{ + int i; + + for (i = 0; i bits_per_source; i += 8 * sizeof(long)) { + unsigned long rv; + + if (arch_get_random_seed_long(rv) || + arch_get_random_long(rv)) { + seed(ctx, (u32)rv); +#if BITS_PER_LONG 32 + seed(ctx, (u32)(rv 32)); +#endif + } + } +} + +#endif /* __HAVE_ARCH_GET_RNG_SEED */ + /* Pseudo random number generator from numerical recipes. */ static inline u32 next_pseudo_random32(u32 seed) { -- 1.9.3 -- Andy Lutomirski AMA Capital Management, LLC -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH 1/6] KVM: PPC: BOOKE: No need to set DBCR0_EDM in guest visible register
-Original Message- From: Wood Scott-B07421 Sent: Tuesday, July 29, 2014 3:22 AM To: Bhushan Bharat-R65777 Cc: ag...@suse.de; kvm-...@vger.kernel.org; kvm@vger.kernel.org; Yoder Stuart- B08248 Subject: Re: [PATCH 1/6] KVM: PPC: BOOKE: No need to set DBCR0_EDM in guest visible register On Fri, 2014-07-11 at 14:08 +0530, Bharat Bhushan wrote: This is not used and even I do not remember why this was added in first place. Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com --- arch/powerpc/kvm/booke.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c index ab62109..a5ee42c 100644 --- a/arch/powerpc/kvm/booke.c +++ b/arch/powerpc/kvm/booke.c @@ -1804,8 +1804,6 @@ int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu, kvm_guest_protect_msr(vcpu, MSR_DE, true); vcpu-guest_debug = dbg-control; vcpu-arch.shadow_dbg_reg.dbcr0 = 0; - /* Set DBCR0_EDM in guest visible DBCR0 register. */ - vcpu-arch.dbg_reg.dbcr0 = DBCR0_EDM; if (vcpu-guest_debug KVM_GUESTDBG_SINGLESTEP) vcpu-arch.shadow_dbg_reg.dbcr0 |= DBCR0_IDM | DBCR0_IC; This was intended to let the guest know that the host owns the debug resources, by analogy to what a JTAG debugger would do. The Power ISA has this Virtualized Implementation Note: It is the responsibility of the hypervisor to ensure that DBCR0[EDM] is consistent with usage of DEP. Ok, That means that if MSRP_DEP is set then set DBCR0_EDM and if MSRP_DEP is clear then clear DBCR0_EDM, right? We need to implement above mentioned this. Thanks -Bharat -Scott N�r��yb�X��ǧv�^�){.n�+h����ܨ}���Ơz�j:+v���zZ+��+zf���h���~i���z��w���?��)ߢf
RE: [PATCH 2/6] KVM: PPC: BOOKE: Force MSR_DE in rfci if guest is under debug
-Original Message- From: Wood Scott-B07421 Sent: Tuesday, July 29, 2014 3:25 AM To: Bhushan Bharat-R65777 Cc: ag...@suse.de; kvm-...@vger.kernel.org; kvm@vger.kernel.org; Yoder Stuart- B08248 Subject: Re: [PATCH 2/6] KVM: PPC: BOOKE: Force MSR_DE in rfci if guest is under debug On Fri, 2014-07-11 at 14:08 +0530, Bharat Bhushan wrote: When userspace (QEMU) is using the debug resource to debug guest then we want MSR_DE to be always set. This patch adds missing MSR_DE setting in rfci instruction. Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com --- arch/powerpc/kvm/booke_emulate.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/kvm/booke_emulate.c b/arch/powerpc/kvm/booke_emulate.c index 27a4b28..80c51a2 100644 --- a/arch/powerpc/kvm/booke_emulate.c +++ b/arch/powerpc/kvm/booke_emulate.c @@ -40,7 +40,11 @@ static void kvmppc_emul_rfi(struct kvm_vcpu *vcpu) static void kvmppc_emul_rfci(struct kvm_vcpu *vcpu) { vcpu-arch.pc = vcpu-arch.csrr0; - kvmppc_set_msr(vcpu, vcpu-arch.csrr1); + /* Force MSR_DE when guest does not own debug facilities */ + if (vcpu-guest_debug) + kvmppc_set_msr(vcpu, vcpu-arch.csrr1 | MSR_DE); + else + kvmppc_set_msr(vcpu, vcpu-arch.csrr1); } int kvmppc_booke_emulate_op(struct kvm_run *run, struct kvm_vcpu *vcpu, It looks like this is already handled by kvmppc_vcpu_sync_debug(), which is called by kvmppc_set_msr(). Yes, you are right. This patch is not needed. Thanks -Bharat Plus, it should only be done for HV mode. -Scott
RE: [PATCH 5/6] KVM: PPC: BOOKE: Allow guest to change MSR_DE
-Original Message- From: Alexander Graf [mailto:ag...@suse.de] Sent: Tuesday, July 29, 2014 7:35 PM To: Wood Scott-B07421; Bhushan Bharat-R65777 Cc: kvm-...@vger.kernel.org; kvm@vger.kernel.org; Yoder Stuart-B08248 Subject: Re: [PATCH 5/6] KVM: PPC: BOOKE: Allow guest to change MSR_DE On 29.07.14 00:01, Scott Wood wrote: On Fri, 2014-07-11 at 14:09 +0530, Bharat Bhushan wrote: When userspace is debugging guest then MSR_DE is always set and MSRP_DEP is set so that guest cannot change MSR_DE. Guest debug resources are not yet emulated, So there seems no reason we should stop guest controlling MSR_DE. Also a followup patch will enable debug emulation and that requires guest to control MSR_DE. Why does it matter whether we emulate debug resources? We still don't want the guest to be able to clear MSR[DE] and thus break host debug. The patch description is misleading. This patch changes the default of DEP to guest controlled when it boots up. Once QEMU wants control over the debug registers, it gets switched to QEMU controlled (that code is already there). Yes, now default MSR_DE is controlled by guest and when QEMU wants to use debug resources then MSR_DEP is set, so guest cannot change MSR_DE. Thanks -Bharat Alex
RE: [PATCH 6/6] KVM: PPC: BOOKE: Emulate debug registers and exception
-Original Message- From: Wood Scott-B07421 Sent: Tuesday, July 29, 2014 11:20 PM To: Alexander Graf Cc: Bhushan Bharat-R65777; kvm-...@vger.kernel.org; kvm@vger.kernel.org; Yoder Stuart-B08248 Subject: Re: [PATCH 6/6] KVM: PPC: BOOKE: Emulate debug registers and exception On Tue, 2014-07-29 at 16:06 +0200, Alexander Graf wrote: On 29.07.14 00:33, Scott Wood wrote: On Mon, 2014-07-28 at 16:04 +0200, Alexander Graf wrote: On 11.07.14 10:39, Bharat Bhushan wrote: This patch emulates debug registers and debug exception to support guest using debug resource. This enables running gdb/kgdb etc in guest. On BOOKE architecture we cannot share debug resources between QEMU and guest because: When QEMU is using debug resources then debug exception must be always enabled. To achieve this we set MSR_DE and also set MSRP_DEP so guest cannot change MSR_DE. When emulating debug resource for guest we want guest to control MSR_DE (enable/disable debug interrupt on need). So above mentioned two configuration cannot be supported at the same time. So the result is that we cannot share debug resources between QEMU and Guest on BOOKE architecture. In the current design QEMU gets priority over guest, this means that if QEMU is using debug resources then guest cannot use them and if guest is using debug resource then QEMU can overwrite them. Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com --- Hi Alex, I thought of having some print in register emulation if QEMU is using debug resource, Also when QEMU overwrites guest written values but that looks excessive. If I uses some variable which get set when guest starts using debug registers and check in debug set ioctl then that look ugly. Looking for suggestions Whatever you do, have QEMU do the print, not the kernel. How would that be accomplished? How would the kernel know to exit to QEMU, and how would the exit reason be conveyed? QEMU is the one forcefully enabling debug and overwriting guest debug registers, so it also knows when it did overwrite valid ones. QEMU knows when it overwrites the guest values, but it doesn't know if, after enabling host debug, the guest tries to write to the debug registers and it gets nopped. Do we want that QEMU first get DBCR0 to know whether it is overwriting whenever set/clear debug register? If we keep the EDM setting, then we can at least say the situation is no worse than with a JTAG. Yes Thanks -Bharat -Scott N�r��yb�X��ǧv�^�){.n�+h����ܨ}���Ơz�j:+v���zZ+��+zf���h���~i���z��w���?��)ߢf
Re: [PATCH] powerpc: kvm: make the setup of hpte under the protection of KVMPPC_RMAP_LOCK_BIT
On Tue, Jul 29, 2014 at 2:57 PM, Benjamin Herrenschmidt b...@kernel.crashing.org wrote: On Mon, 2014-07-28 at 15:58 +0800, Liu ping fan wrote: Hope I am right. Take the following seq as an example if (hptep[0] HPTE_V_VALID) { /* HPTE was previously valid, so we need to invalidate it */ unlock_rmap(rmap); hptep[0] |= HPTE_V_ABSENT; kvmppc_invalidate_hpte(kvm, hptep, index); /* don't lose previous R and C bits */ r |= hptep[1] (HPTE_R_R | HPTE_R_C); } else { kvmppc_add_revmap_chain(kvm, rev, rmap, index, 0); } - if we try_to_unmap on pfn at here, then @r contains a invalid pfn hptep[1] = r; eieio(); hptep[0] = hpte[0]; asm volatile(ptesync : : : memory); If that was the case we would have the same race in kvmppc_do_h_enter(). I think the fact that the HPTE is locked will prevent the race, ie, HPTE_V_HVLOCK is set until hptep[0] is written to. If I look at at the unmap case, my understanding is that it uses kvm_unmap_rmapp() which will also lock the HPTE (try_lock_hpte) and so shouldn't have a race vs the above code. Yes, you are right :) Thx, Fan Or do you see a race I don't ? Cheers, Ben. Thx. Fan On Mon, Jul 28, 2014 at 2:42 PM, Benjamin Herrenschmidt b...@kernel.crashing.org wrote: On Mon, 2014-07-28 at 14:09 +0800, Liu Ping Fan wrote: In current code, the setup of hpte is under the risk of race with mmu_notifier_invalidate, i.e we may setup a hpte with a invalid pfn. Resolve this issue by sync the two actions by KVMPPC_RMAP_LOCK_BIT. Please describe the race you think you see. I'm quite sure both Paul and I went over that code and somewhat convinced ourselves that it was ok but it's possible that we were both wrong :-) Cheers, Ben. Signed-off-by: Liu Ping Fan pingf...@linux.vnet.ibm.com --- arch/powerpc/kvm/book3s_64_mmu_hv.c | 15 ++- 1 file changed, 10 insertions(+), 5 deletions(-) diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c index 8056107..e6dcff4 100644 --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c @@ -754,19 +754,24 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu, if (hptep[0] HPTE_V_VALID) { /* HPTE was previously valid, so we need to invalidate it */ - unlock_rmap(rmap); hptep[0] |= HPTE_V_ABSENT; kvmppc_invalidate_hpte(kvm, hptep, index); /* don't lose previous R and C bits */ r |= hptep[1] (HPTE_R_R | HPTE_R_C); + + hptep[1] = r; + eieio(); + hptep[0] = hpte[0]; + asm volatile(ptesync : : : memory); + unlock_rmap(rmap); } else { + hptep[1] = r; + eieio(); + hptep[0] = hpte[0]; + asm volatile(ptesync : : : memory); kvmppc_add_revmap_chain(kvm, rev, rmap, index, 0); } - hptep[1] = r; - eieio(); - hptep[0] = hpte[0]; - asm volatile(ptesync : : : memory); preempt_enable(); if (page hpte_is_writable(r)) SetPageDirty(page); -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] powerpc: kvm: make the setup of hpte under the protection of KVMPPC_RMAP_LOCK_BIT
On Mon, 2014-07-28 at 15:58 +0800, Liu ping fan wrote: Hope I am right. Take the following seq as an example if (hptep[0] HPTE_V_VALID) { /* HPTE was previously valid, so we need to invalidate it */ unlock_rmap(rmap); hptep[0] |= HPTE_V_ABSENT; kvmppc_invalidate_hpte(kvm, hptep, index); /* don't lose previous R and C bits */ r |= hptep[1] (HPTE_R_R | HPTE_R_C); } else { kvmppc_add_revmap_chain(kvm, rev, rmap, index, 0); } - if we try_to_unmap on pfn at here, then @r contains a invalid pfn hptep[1] = r; eieio(); hptep[0] = hpte[0]; asm volatile(ptesync : : : memory); If that was the case we would have the same race in kvmppc_do_h_enter(). I think the fact that the HPTE is locked will prevent the race, ie, HPTE_V_HVLOCK is set until hptep[0] is written to. If I look at at the unmap case, my understanding is that it uses kvm_unmap_rmapp() which will also lock the HPTE (try_lock_hpte) and so shouldn't have a race vs the above code. Or do you see a race I don't ? Cheers, Ben. Thx. Fan On Mon, Jul 28, 2014 at 2:42 PM, Benjamin Herrenschmidt b...@kernel.crashing.org wrote: On Mon, 2014-07-28 at 14:09 +0800, Liu Ping Fan wrote: In current code, the setup of hpte is under the risk of race with mmu_notifier_invalidate, i.e we may setup a hpte with a invalid pfn. Resolve this issue by sync the two actions by KVMPPC_RMAP_LOCK_BIT. Please describe the race you think you see. I'm quite sure both Paul and I went over that code and somewhat convinced ourselves that it was ok but it's possible that we were both wrong :-) Cheers, Ben. Signed-off-by: Liu Ping Fan pingf...@linux.vnet.ibm.com --- arch/powerpc/kvm/book3s_64_mmu_hv.c | 15 ++- 1 file changed, 10 insertions(+), 5 deletions(-) diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c index 8056107..e6dcff4 100644 --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c @@ -754,19 +754,24 @@ int kvmppc_book3s_hv_page_fault(struct kvm_run *run, struct kvm_vcpu *vcpu, if (hptep[0] HPTE_V_VALID) { /* HPTE was previously valid, so we need to invalidate it */ - unlock_rmap(rmap); hptep[0] |= HPTE_V_ABSENT; kvmppc_invalidate_hpte(kvm, hptep, index); /* don't lose previous R and C bits */ r |= hptep[1] (HPTE_R_R | HPTE_R_C); + + hptep[1] = r; + eieio(); + hptep[0] = hpte[0]; + asm volatile(ptesync : : : memory); + unlock_rmap(rmap); } else { + hptep[1] = r; + eieio(); + hptep[0] = hpte[0]; + asm volatile(ptesync : : : memory); kvmppc_add_revmap_chain(kvm, rev, rmap, index, 0); } - hptep[1] = r; - eieio(); - hptep[0] = hpte[0]; - asm volatile(ptesync : : : memory); preempt_enable(); if (page hpte_is_writable(r)) SetPageDirty(page); -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 5/6] KVM: PPC: BOOKE: Allow guest to change MSR_DE
On 29.07.14 00:01, Scott Wood wrote: On Fri, 2014-07-11 at 14:09 +0530, Bharat Bhushan wrote: When userspace is debugging guest then MSR_DE is always set and MSRP_DEP is set so that guest cannot change MSR_DE. Guest debug resources are not yet emulated, So there seems no reason we should stop guest controlling MSR_DE. Also a followup patch will enable debug emulation and that requires guest to control MSR_DE. Why does it matter whether we emulate debug resources? We still don't want the guest to be able to clear MSR[DE] and thus break host debug. The patch description is misleading. This patch changes the default of DEP to guest controlled when it boots up. Once QEMU wants control over the debug registers, it gets switched to QEMU controlled (that code is already there). Alex -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 6/6] KVM: PPC: BOOKE: Emulate debug registers and exception
On 29.07.14 00:33, Scott Wood wrote: On Mon, 2014-07-28 at 16:04 +0200, Alexander Graf wrote: On 11.07.14 10:39, Bharat Bhushan wrote: This patch emulates debug registers and debug exception to support guest using debug resource. This enables running gdb/kgdb etc in guest. On BOOKE architecture we cannot share debug resources between QEMU and guest because: When QEMU is using debug resources then debug exception must be always enabled. To achieve this we set MSR_DE and also set MSRP_DEP so guest cannot change MSR_DE. When emulating debug resource for guest we want guest to control MSR_DE (enable/disable debug interrupt on need). So above mentioned two configuration cannot be supported at the same time. So the result is that we cannot share debug resources between QEMU and Guest on BOOKE architecture. In the current design QEMU gets priority over guest, this means that if QEMU is using debug resources then guest cannot use them and if guest is using debug resource then QEMU can overwrite them. Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com --- Hi Alex, I thought of having some print in register emulation if QEMU is using debug resource, Also when QEMU overwrites guest written values but that looks excessive. If I uses some variable which get set when guest starts using debug registers and check in debug set ioctl then that look ugly. Looking for suggestions Whatever you do, have QEMU do the print, not the kernel. How would that be accomplished? How would the kernel know to exit to QEMU, and how would the exit reason be conveyed? QEMU is the one forcefully enabling debug and overwriting guest debug registers, so it also knows when it did overwrite valid ones. Alex -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 6/6] KVM: PPC: BOOKE: Emulate debug registers and exception
On Tue, 2014-07-29 at 16:06 +0200, Alexander Graf wrote: On 29.07.14 00:33, Scott Wood wrote: On Mon, 2014-07-28 at 16:04 +0200, Alexander Graf wrote: On 11.07.14 10:39, Bharat Bhushan wrote: This patch emulates debug registers and debug exception to support guest using debug resource. This enables running gdb/kgdb etc in guest. On BOOKE architecture we cannot share debug resources between QEMU and guest because: When QEMU is using debug resources then debug exception must be always enabled. To achieve this we set MSR_DE and also set MSRP_DEP so guest cannot change MSR_DE. When emulating debug resource for guest we want guest to control MSR_DE (enable/disable debug interrupt on need). So above mentioned two configuration cannot be supported at the same time. So the result is that we cannot share debug resources between QEMU and Guest on BOOKE architecture. In the current design QEMU gets priority over guest, this means that if QEMU is using debug resources then guest cannot use them and if guest is using debug resource then QEMU can overwrite them. Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com --- Hi Alex, I thought of having some print in register emulation if QEMU is using debug resource, Also when QEMU overwrites guest written values but that looks excessive. If I uses some variable which get set when guest starts using debug registers and check in debug set ioctl then that look ugly. Looking for suggestions Whatever you do, have QEMU do the print, not the kernel. How would that be accomplished? How would the kernel know to exit to QEMU, and how would the exit reason be conveyed? QEMU is the one forcefully enabling debug and overwriting guest debug registers, so it also knows when it did overwrite valid ones. QEMU knows when it overwrites the guest values, but it doesn't know if, after enabling host debug, the guest tries to write to the debug registers and it gets nopped. If we keep the EDM setting, then we can at least say the situation is no worse than with a JTAG. -Scott -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 6/6] KVM: PPC: BOOKE: Emulate debug registers and exception
Am 29.07.2014 um 19:50 schrieb Scott Wood scottw...@freescale.com: On Tue, 2014-07-29 at 16:06 +0200, Alexander Graf wrote: On 29.07.14 00:33, Scott Wood wrote: On Mon, 2014-07-28 at 16:04 +0200, Alexander Graf wrote: On 11.07.14 10:39, Bharat Bhushan wrote: This patch emulates debug registers and debug exception to support guest using debug resource. This enables running gdb/kgdb etc in guest. On BOOKE architecture we cannot share debug resources between QEMU and guest because: When QEMU is using debug resources then debug exception must be always enabled. To achieve this we set MSR_DE and also set MSRP_DEP so guest cannot change MSR_DE. When emulating debug resource for guest we want guest to control MSR_DE (enable/disable debug interrupt on need). So above mentioned two configuration cannot be supported at the same time. So the result is that we cannot share debug resources between QEMU and Guest on BOOKE architecture. In the current design QEMU gets priority over guest, this means that if QEMU is using debug resources then guest cannot use them and if guest is using debug resource then QEMU can overwrite them. Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com --- Hi Alex, I thought of having some print in register emulation if QEMU is using debug resource, Also when QEMU overwrites guest written values but that looks excessive. If I uses some variable which get set when guest starts using debug registers and check in debug set ioctl then that look ugly. Looking for suggestions Whatever you do, have QEMU do the print, not the kernel. How would that be accomplished? How would the kernel know to exit to QEMU, and how would the exit reason be conveyed? QEMU is the one forcefully enabling debug and overwriting guest debug registers, so it also knows when it did overwrite valid ones. QEMU knows when it overwrites the guest values, but it doesn't know if, after enabling host debug, the guest tries to write to the debug registers and it gets nopped. If we keep the EDM setting, then we can at least say the situation is no worse than with a JTAG. Yeah, I think that's perfectly reasonable. I don't think it'll be likely that a user starts debugging with qemu and then expects guest debugging to work. The other way around is more likely and would warrant a warning to the user - if we care. Alex -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [PATCH 1/6] KVM: PPC: BOOKE: No need to set DBCR0_EDM in guest visible register
-Original Message- From: Wood Scott-B07421 Sent: Tuesday, July 29, 2014 3:22 AM To: Bhushan Bharat-R65777 Cc: ag...@suse.de; kvm-ppc@vger.kernel.org; k...@vger.kernel.org; Yoder Stuart- B08248 Subject: Re: [PATCH 1/6] KVM: PPC: BOOKE: No need to set DBCR0_EDM in guest visible register On Fri, 2014-07-11 at 14:08 +0530, Bharat Bhushan wrote: This is not used and even I do not remember why this was added in first place. Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com --- arch/powerpc/kvm/booke.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c index ab62109..a5ee42c 100644 --- a/arch/powerpc/kvm/booke.c +++ b/arch/powerpc/kvm/booke.c @@ -1804,8 +1804,6 @@ int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu, kvm_guest_protect_msr(vcpu, MSR_DE, true); vcpu-guest_debug = dbg-control; vcpu-arch.shadow_dbg_reg.dbcr0 = 0; - /* Set DBCR0_EDM in guest visible DBCR0 register. */ - vcpu-arch.dbg_reg.dbcr0 = DBCR0_EDM; if (vcpu-guest_debug KVM_GUESTDBG_SINGLESTEP) vcpu-arch.shadow_dbg_reg.dbcr0 |= DBCR0_IDM | DBCR0_IC; This was intended to let the guest know that the host owns the debug resources, by analogy to what a JTAG debugger would do. The Power ISA has this Virtualized Implementation Note: It is the responsibility of the hypervisor to ensure that DBCR0[EDM] is consistent with usage of DEP. Ok, That means that if MSRP_DEP is set then set DBCR0_EDM and if MSRP_DEP is clear then clear DBCR0_EDM, right? We need to implement above mentioned this. Thanks -Bharat -Scott N�r��yb�X��ǧv�^�){.n�+jir)w*jg����ݢj/���z�ޖ��2�ޙ�)ߡ�a�����G���h��j:+v���w��٥
RE: [PATCH 2/6] KVM: PPC: BOOKE: Force MSR_DE in rfci if guest is under debug
-Original Message- From: Wood Scott-B07421 Sent: Tuesday, July 29, 2014 3:25 AM To: Bhushan Bharat-R65777 Cc: ag...@suse.de; kvm-ppc@vger.kernel.org; k...@vger.kernel.org; Yoder Stuart- B08248 Subject: Re: [PATCH 2/6] KVM: PPC: BOOKE: Force MSR_DE in rfci if guest is under debug On Fri, 2014-07-11 at 14:08 +0530, Bharat Bhushan wrote: When userspace (QEMU) is using the debug resource to debug guest then we want MSR_DE to be always set. This patch adds missing MSR_DE setting in rfci instruction. Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com --- arch/powerpc/kvm/booke_emulate.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/kvm/booke_emulate.c b/arch/powerpc/kvm/booke_emulate.c index 27a4b28..80c51a2 100644 --- a/arch/powerpc/kvm/booke_emulate.c +++ b/arch/powerpc/kvm/booke_emulate.c @@ -40,7 +40,11 @@ static void kvmppc_emul_rfi(struct kvm_vcpu *vcpu) static void kvmppc_emul_rfci(struct kvm_vcpu *vcpu) { vcpu-arch.pc = vcpu-arch.csrr0; - kvmppc_set_msr(vcpu, vcpu-arch.csrr1); + /* Force MSR_DE when guest does not own debug facilities */ + if (vcpu-guest_debug) + kvmppc_set_msr(vcpu, vcpu-arch.csrr1 | MSR_DE); + else + kvmppc_set_msr(vcpu, vcpu-arch.csrr1); } int kvmppc_booke_emulate_op(struct kvm_run *run, struct kvm_vcpu *vcpu, It looks like this is already handled by kvmppc_vcpu_sync_debug(), which is called by kvmppc_set_msr(). Yes, you are right. This patch is not needed. Thanks -Bharat Plus, it should only be done for HV mode. -Scott
RE: [PATCH 5/6] KVM: PPC: BOOKE: Allow guest to change MSR_DE
-Original Message- From: Alexander Graf [mailto:ag...@suse.de] Sent: Tuesday, July 29, 2014 7:35 PM To: Wood Scott-B07421; Bhushan Bharat-R65777 Cc: kvm-ppc@vger.kernel.org; k...@vger.kernel.org; Yoder Stuart-B08248 Subject: Re: [PATCH 5/6] KVM: PPC: BOOKE: Allow guest to change MSR_DE On 29.07.14 00:01, Scott Wood wrote: On Fri, 2014-07-11 at 14:09 +0530, Bharat Bhushan wrote: When userspace is debugging guest then MSR_DE is always set and MSRP_DEP is set so that guest cannot change MSR_DE. Guest debug resources are not yet emulated, So there seems no reason we should stop guest controlling MSR_DE. Also a followup patch will enable debug emulation and that requires guest to control MSR_DE. Why does it matter whether we emulate debug resources? We still don't want the guest to be able to clear MSR[DE] and thus break host debug. The patch description is misleading. This patch changes the default of DEP to guest controlled when it boots up. Once QEMU wants control over the debug registers, it gets switched to QEMU controlled (that code is already there). Yes, now default MSR_DE is controlled by guest and when QEMU wants to use debug resources then MSR_DEP is set, so guest cannot change MSR_DE. Thanks -Bharat Alex
RE: [PATCH 6/6] KVM: PPC: BOOKE: Emulate debug registers and exception
-Original Message- From: Wood Scott-B07421 Sent: Tuesday, July 29, 2014 11:20 PM To: Alexander Graf Cc: Bhushan Bharat-R65777; kvm-ppc@vger.kernel.org; k...@vger.kernel.org; Yoder Stuart-B08248 Subject: Re: [PATCH 6/6] KVM: PPC: BOOKE: Emulate debug registers and exception On Tue, 2014-07-29 at 16:06 +0200, Alexander Graf wrote: On 29.07.14 00:33, Scott Wood wrote: On Mon, 2014-07-28 at 16:04 +0200, Alexander Graf wrote: On 11.07.14 10:39, Bharat Bhushan wrote: This patch emulates debug registers and debug exception to support guest using debug resource. This enables running gdb/kgdb etc in guest. On BOOKE architecture we cannot share debug resources between QEMU and guest because: When QEMU is using debug resources then debug exception must be always enabled. To achieve this we set MSR_DE and also set MSRP_DEP so guest cannot change MSR_DE. When emulating debug resource for guest we want guest to control MSR_DE (enable/disable debug interrupt on need). So above mentioned two configuration cannot be supported at the same time. So the result is that we cannot share debug resources between QEMU and Guest on BOOKE architecture. In the current design QEMU gets priority over guest, this means that if QEMU is using debug resources then guest cannot use them and if guest is using debug resource then QEMU can overwrite them. Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com --- Hi Alex, I thought of having some print in register emulation if QEMU is using debug resource, Also when QEMU overwrites guest written values but that looks excessive. If I uses some variable which get set when guest starts using debug registers and check in debug set ioctl then that look ugly. Looking for suggestions Whatever you do, have QEMU do the print, not the kernel. How would that be accomplished? How would the kernel know to exit to QEMU, and how would the exit reason be conveyed? QEMU is the one forcefully enabling debug and overwriting guest debug registers, so it also knows when it did overwrite valid ones. QEMU knows when it overwrites the guest values, but it doesn't know if, after enabling host debug, the guest tries to write to the debug registers and it gets nopped. Do we want that QEMU first get DBCR0 to know whether it is overwriting whenever set/clear debug register? If we keep the EDM setting, then we can at least say the situation is no worse than with a JTAG. Yes Thanks -Bharat -Scott N�r��yb�X��ǧv�^�){.n�+jir)w*jg����ݢj/���z�ޖ��2�ޙ�)ߡ�a�����G���h��j:+v���w��٥