Re: [PATCH 3/3] kvm tools: Make virtio-blk kvm__irq_line thread safe
On Fri, Apr 29, 2011 at 9:44 AM, Ingo Molnar wrote: > Hm, this looks a bit strange (the mutex here protects only a kernel call - > that > cannot be right) and there's no explanation why it's needed. Why do > VIRTIO_BLK_IRQ (== KVM_IRQ_LINE ioctl()) calls have to be covered by the > mutex? > > A short blurb about expected behavior on SMP and locking rules at the top of > virtio-blk.c would be nice. Yes, looks strange. Asias, did you see some bad behavior that this fixes? The per-device mutexes are there to protect device state. The assumption here is that KVM handles KVM_IRQ_LINE ioctl() serialization by titself. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/3] kvm tools: Make virtio-net kvm__irq_line thread safe
On Fri, Apr 29, 2011 at 9:36 AM, Asias He wrote: > This patch fixes virtio-net randmom stall > > host $ scp guest:/root/big.guest . > big.guest 42% 440MB 67.7KB/s - stalled - > > Signed-off-by: Asias He > --- > tools/kvm/virtio-net.c | 9 +++-- > 1 files changed, 7 insertions(+), 2 deletions(-) > > diff --git a/tools/kvm/virtio-net.c b/tools/kvm/virtio-net.c > index 58b3de4..efe06cb 100644 > --- a/tools/kvm/virtio-net.c > +++ b/tools/kvm/virtio-net.c > @@ -77,9 +77,12 @@ static void virtio_net_rx_callback(struct kvm *self, void > *param) > head = virt_queue__get_iov(vq, iov, &out, &in, self); > len = readv(net_device.tap_fd, iov, in); > virt_queue__set_used_elem(vq, head, len); > - } > > - kvm__irq_line(self, VIRTIO_NET_IRQ, 1); > + /* We should interrupt guest right now, otherwise latency is > huge. */ > + mutex_lock(&net_device.mutex); > + kvm__irq_line(self, VIRTIO_NET_IRQ, 1); > + mutex_unlock(&net_device.mutex); > + } > } > > static void virtio_net_tx_callback(struct kvm *self, void *param) Please make that IRQ latency fix a separate patch. Don't we need to do it for TX path as well, btw? -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/3] kvm tools: Make virtio-net kvm__irq_line thread safe
* Asias He wrote: > This patch fixes virtio-net randmom stall > > host $ scp guest:/root/big.guest . > big.guest 42% 440MB 67.7KB/s - stalled - > > Signed-off-by: Asias He > --- > tools/kvm/virtio-net.c |9 +++-- > 1 files changed, 7 insertions(+), 2 deletions(-) > > diff --git a/tools/kvm/virtio-net.c b/tools/kvm/virtio-net.c > index 58b3de4..efe06cb 100644 > --- a/tools/kvm/virtio-net.c > +++ b/tools/kvm/virtio-net.c > @@ -77,9 +77,12 @@ static void virtio_net_rx_callback(struct kvm *self, void > *param) > head = virt_queue__get_iov(vq, iov, &out, &in, self); > len = readv(net_device.tap_fd, iov, in); > virt_queue__set_used_elem(vq, head, len); > - } > > - kvm__irq_line(self, VIRTIO_NET_IRQ, 1); > + /* We should interrupt guest right now, otherwise latency is > huge. */ > + mutex_lock(&net_device.mutex); > + kvm__irq_line(self, VIRTIO_NET_IRQ, 1); > + mutex_unlock(&net_device.mutex); > + } > } > > static void virtio_net_tx_callback(struct kvm *self, void *param) > @@ -98,7 +101,9 @@ static void virtio_net_tx_callback(struct kvm *self, void > *param) > virt_queue__set_used_elem(vq, head, len); > } > > + mutex_lock(&net_device.mutex); > kvm__irq_line(self, VIRTIO_NET_IRQ, 1); > + mutex_unlock(&net_device.mutex); I do not find this explanation adequate either. This file too could use some comments about how the SMP behavior looks like. Thanks, Ingo -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/3] kvm tools: Make virtio-console kvm__irq_line thread safe
* Asias He wrote: > Signed-off-by: Asias He > --- > tools/kvm/virtio-console.c |2 ++ > 1 files changed, 2 insertions(+), 0 deletions(-) > > diff --git a/tools/kvm/virtio-console.c b/tools/kvm/virtio-console.c > index e66d198..492c859 100644 > --- a/tools/kvm/virtio-console.c > +++ b/tools/kvm/virtio-console.c > @@ -162,7 +162,9 @@ static void virtio_console_handle_callback(struct kvm > *self, void *param) > virt_queue__set_used_elem(vq, head, len); > } > > + mutex_lock(&console_device.mutex); > kvm__irq_line(self, VIRTIO_CONSOLE_IRQ, 1); > + mutex_unlock(&console_device.mutex); > } This looks wrong for similar reasons as for virtio-blk.c. Thanks, Ingo -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] kvm tools: Make virtio-blk kvm__irq_line thread safe
* Asias He wrote: > Signed-off-by: Asias He > --- > tools/kvm/virtio-blk.c |2 ++ > 1 files changed, 2 insertions(+), 0 deletions(-) > > diff --git a/tools/kvm/virtio-blk.c b/tools/kvm/virtio-blk.c > index 3feabd0..ade6335 100644 > --- a/tools/kvm/virtio-blk.c > +++ b/tools/kvm/virtio-blk.c > @@ -159,7 +159,9 @@ static void virtio_blk_do_io(struct kvm *kvm, void *param) > while (virt_queue__available(vq)) > virtio_blk_do_io_request(kvm, vq); > > + mutex_lock(&blk_device.mutex); > kvm__irq_line(kvm, VIRTIO_BLK_IRQ, 1); > + mutex_unlock(&blk_device.mutex); Hm, this looks a bit strange (the mutex here protects only a kernel call - that cannot be right) and there's no explanation why it's needed. Why do VIRTIO_BLK_IRQ (== KVM_IRQ_LINE ioctl()) calls have to be covered by the mutex? A short blurb about expected behavior on SMP and locking rules at the top of virtio-blk.c would be nice. Thanks, Ingo -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/3] kvm tools: Make virtio-blk kvm__irq_line thread safe
Signed-off-by: Asias He --- tools/kvm/virtio-blk.c |2 ++ 1 files changed, 2 insertions(+), 0 deletions(-) diff --git a/tools/kvm/virtio-blk.c b/tools/kvm/virtio-blk.c index 3feabd0..ade6335 100644 --- a/tools/kvm/virtio-blk.c +++ b/tools/kvm/virtio-blk.c @@ -159,7 +159,9 @@ static void virtio_blk_do_io(struct kvm *kvm, void *param) while (virt_queue__available(vq)) virtio_blk_do_io_request(kvm, vq); + mutex_lock(&blk_device.mutex); kvm__irq_line(kvm, VIRTIO_BLK_IRQ, 1); + mutex_unlock(&blk_device.mutex); } static bool virtio_blk_pci_io_out(struct kvm *self, uint16_t port, void *data, int size, uint32_t count) -- 1.7.4.4 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/3] kvm tools: Make virtio-console kvm__irq_line thread safe
Signed-off-by: Asias He --- tools/kvm/virtio-console.c |2 ++ 1 files changed, 2 insertions(+), 0 deletions(-) diff --git a/tools/kvm/virtio-console.c b/tools/kvm/virtio-console.c index e66d198..492c859 100644 --- a/tools/kvm/virtio-console.c +++ b/tools/kvm/virtio-console.c @@ -162,7 +162,9 @@ static void virtio_console_handle_callback(struct kvm *self, void *param) virt_queue__set_used_elem(vq, head, len); } + mutex_lock(&console_device.mutex); kvm__irq_line(self, VIRTIO_CONSOLE_IRQ, 1); + mutex_unlock(&console_device.mutex); } static bool virtio_console_pci_io_out(struct kvm *self, uint16_t port, void *data, int size, uint32_t count) -- 1.7.4.4 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/3] kvm tools: Make virtio-net kvm__irq_line thread safe
This patch fixes virtio-net randmom stall host $ scp guest:/root/big.guest . big.guest 42% 440MB 67.7KB/s - stalled - Signed-off-by: Asias He --- tools/kvm/virtio-net.c |9 +++-- 1 files changed, 7 insertions(+), 2 deletions(-) diff --git a/tools/kvm/virtio-net.c b/tools/kvm/virtio-net.c index 58b3de4..efe06cb 100644 --- a/tools/kvm/virtio-net.c +++ b/tools/kvm/virtio-net.c @@ -77,9 +77,12 @@ static void virtio_net_rx_callback(struct kvm *self, void *param) head = virt_queue__get_iov(vq, iov, &out, &in, self); len = readv(net_device.tap_fd, iov, in); virt_queue__set_used_elem(vq, head, len); - } - kvm__irq_line(self, VIRTIO_NET_IRQ, 1); + /* We should interrupt guest right now, otherwise latency is huge. */ + mutex_lock(&net_device.mutex); + kvm__irq_line(self, VIRTIO_NET_IRQ, 1); + mutex_unlock(&net_device.mutex); + } } static void virtio_net_tx_callback(struct kvm *self, void *param) @@ -98,7 +101,9 @@ static void virtio_net_tx_callback(struct kvm *self, void *param) virt_queue__set_used_elem(vq, head, len); } + mutex_lock(&net_device.mutex); kvm__irq_line(self, VIRTIO_NET_IRQ, 1); + mutex_unlock(&net_device.mutex); } static bool virtio_net_pci_io_device_specific_in(void *data, unsigned long offset, int size, uint32_t count) -- 1.7.4.4 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 3/3] KVM: MMU: Optimize guest page table walk
On Fri, 29 Apr 2011 14:38:08 +0900 Takuya Yoshikawa wrote: > On Thu, 28 Apr 2011 19:46:00 -0700 > Andi Kleen wrote: > > > Avi Kivity writes: > > > > > > Good optimization. copy_from_user() really isn't optimized for short > > > buffers, I expect much of the improvement comes from that. > > > > Actually it is equivalent to get_user for the lenghts supported by > > get_user, assuming you pass in a constant length. You probably do not. > > > > -Andi > > > Weird, I actually measured some changes even after dropping other parts > than get_user() usage. > > Only I can guess for that reason is the reduction of some function calls > by inlining some functions. A bit to clarify. Original path: kvm_read_guest_page_mmu() kvm_read_guest_page() copy_from_user() Takuya -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Autotest] [Autotest PATCH] KVM-test: TSC drift test
On Thu, Apr 21, 2011 at 4:33 AM, Amos Kong wrote: > This case is used to test the drift between host and guest. > Use taskset to make tsc program execute in a single cpu. > If the drift ratio bigger than 10%, then fail this case. The calculations of the tsc frequency looks wrong... See comments below. Also, when Glauber or Zach could take a look into this test it'd be great! > Signed-off-by: Amos Kong > --- > client/tests/kvm/deps/get_tsc.c | 27 ++ > client/tests/kvm/tests/tsc_drift.py | 88 > > client/tests/kvm/tests_base.cfg.sample | 5 ++ > 3 files changed, 120 insertions(+), 0 deletions(-) > create mode 100644 client/tests/kvm/deps/get_tsc.c > create mode 100644 client/tests/kvm/tests/tsc_drift.py > > diff --git a/client/tests/kvm/deps/get_tsc.c b/client/tests/kvm/deps/get_tsc.c > new file mode 100644 > index 000..e91a41f > --- /dev/null > +++ b/client/tests/kvm/deps/get_tsc.c > @@ -0,0 +1,27 @@ > +/* > + * Programme to get cpu's TSC(time stamp counter) > + * Copyright(C) 2009 Redhat, Inc. > + * Amos Kong > + * Dec 9, 2009 > + * > + */ > + > +#define _GNU_SOURCE > +#include > +#include > + > +typedef unsigned long long u64; > + > +u64 rdtsc(void) > +{ > + unsigned tsc_lo, tsc_hi; > + > + asm volatile("rdtsc" : "=a"(tsc_lo), "=d"(tsc_hi)); > + return tsc_lo | (u64)tsc_hi << 32; > +} > + > +int main(void) > +{ > + printf("%lld\n", rdtsc()); > + return 0; > +} > diff --git a/client/tests/kvm/tests/tsc_drift.py > b/client/tests/kvm/tests/tsc_drift.py > new file mode 100644 > index 000..de2fb76 > --- /dev/null > +++ b/client/tests/kvm/tests/tsc_drift.py > @@ -0,0 +1,88 @@ > +import time, os, logging, commands, re > +from autotest_lib.client.common_lib import error > +from autotest_lib.client.bin import local_host > +import kvm_test_utils > + > + > +def run_tsc_drift(test, params, env): > + """ > + Check the TSC(time stamp counter) frequency of guest and host whether > match > + or not > + > + 1) Computer average tsc frequency of host's cpus by C the program > + 2) Copy the C code to the guest, complie and run it to get tsc > + frequency of guest's vcpus > + 3) Sleep sometimes and get the TSC of host and guest again > + 4) Compute the TSC frequency of host and guest > + 5) Compare the frequency deviation between host and guest with standard > + > + @param test: Kvm test object > + @param params: Dictionary with the test parameters. > + @param env: Dictionary with test environment. > + """ > + drift_threshold = float(params.get("drift_threshold")) > + interval = float(params.get("interval")) > + cpu_chk_cmd = params.get("cpu_chk_cmd") > + tsc_freq_path = os.path.join(test.bindir, 'deps/get_tsc.c') > + host_freq = 0 > + > + def get_tsc(machine="host", i=0): > + cmd = "taskset -c %s /tmp/get_tsc" % i > + if machine == "host": > + s, o = commands.getstatusoutput(cmd) > + else: > + s, o = session.cmd_status_output(cmd) > + if s != 0: > + logging.debug(o) > + raise error.TestError("Fail to get tsc of host, ncpu: %d" % i) > + return float(re.findall("(\d+)",o)[0]) > + > + vm = env.get_vm(params["main_vm"]) > + vm.verify_alive() > + timeout = float(params.get("login_timeout", 240)) > + session = vm.wait_for_login(timeout=timeout) > + > + commands.getoutput("gcc %s -o /tmp/get_tsc" % tsc_freq_path) > + ncpu = local_host.LocalHost().get_num_cpu() > + > + logging.info("Interval is %s" % interval) > + logging.info("Determine the TSC frequency in the host") > + for i in range(ncpu): > + tsc1 = get_tsc("host", i) > + time.sleep(interval) > + tsc2 = get_tsc("host", i) > + > + delta = tsc2 - tsc1 > + logging.info("Host TSC delta for cpu %s is %s" % (i, delta)) > + if delta < 0: > + raise error.TestError("Host TSC for cpu %s warps %s" % (i, > delta)) ^ Yeah, I don't think this is expected to warp, but yet, good to check. > + host_freq += delta / ncpu Now, i really didn't understand the concept behind the tsc frequency. So we have a difference between 2 timestamps taken over an arbitrary period of time (in this case, looks 30s by default) and divide by the number of cpus, however we will repeat this procedure by the same amount so: N * ( d_tsc1/N + d_tsc2/N + d_tsc3/N + ... + d_tscn/N) This could be simplified to (d_tsc1 + d_tsc2 + d_tsc3 + + d_tscn) Unless I'm missing something... so host_freq = sum(d_tsci) The calculation could be simplified then... And by definition, isn't frequency how many times a phenomena occurs (in this case, change of timestamp counter) per time? So, don't we have to divide this sum by the time the program slept? Also, it looks like the whole logic to get the frequencies can be factored to a single function and just call that function for guest and host.
Re: [RFC PATCH 3/3] KVM: MMU: Optimize guest page table walk
On Thu, 28 Apr 2011 19:46:00 -0700 Andi Kleen wrote: > Avi Kivity writes: > > > > Good optimization. copy_from_user() really isn't optimized for short > > buffers, I expect much of the improvement comes from that. > > Actually it is equivalent to get_user for the lenghts supported by > get_user, assuming you pass in a constant length. You probably do not. > > -Andi Weird, I actually measured some changes even after dropping other parts than get_user() usage. Only I can guess for that reason is the reduction of some function calls by inlining some functions. Takuya -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Autotest] [Autotest PATCH] KVM-test: Simple stop/continue test
On Thu, Apr 21, 2011 at 3:21 AM, Amos Kong wrote: > Change guest state by monitor cmd, verify guest status, > and try to login guest by network. I don't like the way you're handling human monitor and QMP monitors in this tests... comments below: > Signed-off-by: Jason Wang > Signed-off-by: Amos Kong > --- > client/tests/kvm/tests/stop_continue.py | 52 > +++ > client/tests/kvm/tests_base.cfg.sample | 4 ++ > 2 files changed, 56 insertions(+), 0 deletions(-) > create mode 100644 client/tests/kvm/tests/stop_continue.py > > diff --git a/client/tests/kvm/tests/stop_continue.py > b/client/tests/kvm/tests/stop_continue.py > new file mode 100644 > index 000..c7d8025 > --- /dev/null > +++ b/client/tests/kvm/tests/stop_continue.py > @@ -0,0 +1,52 @@ > +import logging > +from autotest_lib.client.common_lib import error > + > + > +def run_stop_continue(test, params, env): > + """ > + Suspend a running Virtual Machine and verify its state. > + > + 1) Boot the vm > + 2) Suspend the vm through stop command > + 3) Verify the state through info status command > + 4) Check is the ssh session to guest is still responsive, > + if succeed, fail the test. > + > + @param test: Kvm test object > + @param params: Dictionary with the test parameters > + @param env: Dictionary with test environment. > + """ > + vm = env.get_vm(params["main_vm"]) > + vm.verify_alive() > + timeout = float(params.get("login_timeout", 240)) > + session = vm.wait_for_login(timeout=timeout) > + > + try: > + logging.info("Suspend the virtual machine") > + vm.monitor.cmd("stop") > + > + logging.info("Verifying the status of virtual machine through > monitor") > + o = vm.monitor.info("status") > + if 'paused' not in o and ( "u'running': False" not in str(o)): ^ Here, it's not clear what means a paused qmp monitor or a hmp monitor. this statement is unnecessarily confusing. Here 'paused' not in o -> Is what would define a 'not paused hmp monitor' "u'running': False" not in str(o) -> This defines a 'not paused qmp monitor' why we are checking one _and_ the other, as one monitor can't be hmp and qmp at the same time? It would be at least _or_. And like I said, it's non trivial to flow this assertion made. It seems to me that a better (although involving more code changes) approach is: 1) Introduce VM methods is_paused and verify_paused, which would internally for the kvm vm class, call monitor methods also called is_paused and verify_paused, with implementations for both hmp and qmp. verify_paused would throw an exception in case of a failure, while is_paused would return a boolean. > + logging.error(o) > + raise error.TestFail("Fail to suspend through monitor command > line") > + > + logging.info("Check the session responsiveness") > + if session.is_responsive(): > + raise error.TestFail("Session is still responsive after stop") > + > + logging.info("Try to resume the guest") > + vm.monitor.cmd("cont") > + > + o = vm.monitor.info("status") > + m_type = params.get("monitor_type", "human") > + if ('human' in m_type and 'running' not in o) or\ > + ('qmp' in m_type and "u'running': True" not in str(o)): ^ same here, we should have is_running and verify_running methods on VM that would call monitor methods with the same names. Now, of course I might be really mistaken here, would like to hear your opinion on that subject. -- Lucas -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Autotest] [Autotest PATCH] KVM-test: Check if guest bootable after reseting several times
On Thu, Apr 21, 2011 at 3:47 AM, Amos Kong wrote: > This test comes from a regression bug: > Guest can not found bootable device after reseting several times by > monitor command. Can you point out the bug number? I really don't expect that we keep integrity of the disk after several resets, at least I wouldn't if it was a bare metal machine... Anyway, I've made some changes to the code, mostly removing unnecessary imports and having some messages explaining what the test is doing... I am just not convinced that this test should pass in all circumstances (I tried here and it does pass, by the way). > Signed-off-by: Amos Kong > --- > client/tests/kvm/tests/system_reset_bootable.py | 29 > +++ > client/tests/kvm/tests_base.cfg.sample | 7 ++ > 2 files changed, 36 insertions(+), 0 deletions(-) > create mode 100755 client/tests/kvm/tests/system_reset_bootable.py > > diff --git a/client/tests/kvm/tests/system_reset_bootable.py > b/client/tests/kvm/tests/system_reset_bootable.py > new file mode 100755 > index 000..ca9fb70 > --- /dev/null > +++ b/client/tests/kvm/tests/system_reset_bootable.py > @@ -0,0 +1,29 @@ > +import logging, time > +from autotest_lib.client.common_lib import error > +import kvm_test_utils > + > + > +def run_system_reset_bootable(test, params, env): > + """ > + KVM reset test: > + 1) Boot guest. > + 2) Send some times system_reset monitor command. > + 3) Log into the guest to verify it could normally boot. > + > + @param test: kvm test object > + @param params: Dictionary with the test parameters > + @param env: Dictionary with test environment. > + """ > + vm = env.get_vm(params["main_vm"]) > + vm.verify_alive() > + timeout = float(params.get("login_timeout", 240)) > + reset_times = int(params.get("reset_times",20)) > + interval = int(params.get("reset_interval",10)) > + wait_time = int(params.get("wait_time_for_reset",60)) > + time.sleep(wait_time) > + > + for i in range(reset_times): > + vm.monitor.cmd("system_reset") > + time.sleep(interval) > + > + session = vm.wait_for_login(timeout=timeout) > diff --git a/client/tests/kvm/tests_base.cfg.sample > b/client/tests/kvm/tests_base.cfg.sample > index 7333ed0..ceafebe 100644 > --- a/client/tests/kvm/tests_base.cfg.sample > +++ b/client/tests/kvm/tests_base.cfg.sample > @@ -961,6 +961,13 @@ variants: > sleep_before_reset = 20 > kill_vm_on_error = yes > > + - system_reset_bootable: > + type = system_reset_bootable > + interval = 1 > + reset_times = 20 > + wait_time_for_reset = 120 > + kill_vm_on_error = yes > + > - shutdown: install setup unattended_install.cdrom > type = shutdown > shutdown_method = shell > > ___ > Autotest mailing list > autot...@test.kernel.org > http://test.kernel.org/cgi-bin/mailman/listinfo/autotest > -- Lucas -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] KVM test: Fix problem when unexpected strings appear on SSH prompt
In some conditions, ssh may respond things like: Warning: Permanently added localhost' (RSA) to the list of known hosts. Take that into account on the remote login function. Signed-off-by: Jason D. Gaston --- client/virt/virt_utils.py |7 +-- 1 files changed, 5 insertions(+), 2 deletions(-) diff --git a/client/virt/virt_utils.py b/client/virt/virt_utils.py index 8fa64ca..398ecd4 100644 --- a/client/virt/virt_utils.py +++ b/client/virt/virt_utils.py @@ -557,7 +557,7 @@ def _remote_login(session, username, password, prompt, timeout=10): match, text = session.read_until_last_line_matches( [r"[Aa]re you sure", r"[Pp]assword:\s*$", r"[Ll]ogin:\s*$", r"[Cc]onnection.*closed", r"[Cc]onnection.*refused", - r"[Pp]lease wait", prompt], + r"[Pp]lease wait", r"[Ww]arning", prompt], timeout=timeout, internal_timeout=0.5) if match == 0: # "Are you sure you want to continue connecting" logging.debug("Got 'Are you sure...'; sending 'yes'") @@ -592,7 +592,10 @@ def _remote_login(session, username, password, prompt, timeout=10): logging.debug("Got 'Please wait'") timeout = 30 continue -elif match == 6: # prompt +elif match == 6: # "Warning added RSA" +logging.debug("Got 'Warning added RSA to known host list") +continue +elif match == 7: # prompt logging.debug("Got shell prompt -- logged in") break except aexpect.ExpectTimeoutError, e: -- 1.7.4.4 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/4] virt.virt_utils: Get rid of create_report function
As it has been transfered to the html_report module itself. Signed-off-by: Lucas Meneghel Rodrigues --- client/virt/virt_utils.py | 12 1 files changed, 0 insertions(+), 12 deletions(-) diff --git a/client/virt/virt_utils.py b/client/virt/virt_utils.py index 97a652d..8fa64ca 100644 --- a/client/virt/virt_utils.py +++ b/client/virt/virt_utils.py @@ -1187,18 +1187,6 @@ def run_tests(parser, job): return not failed -def create_report(report_dir, results_dir): -""" -Creates a neatly arranged HTML results report in the results dir. - -@param report_dir: Directory where the report script is located. -@param results_dir: Directory where the results will be output. -""" -reporter = os.path.join(report_dir, 'html_report.py') -html_file = os.path.join(results_dir, 'results.html') -os.system('%s -r %s -f %s -R' % (reporter, results_dir, html_file)) - - def display_attributes(instance): """ Inspects a given class instance attributes and displays them, convenient -- 1.7.4.4 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 4/4] KVM test: control: Get rid of report generation code
As this is now handled by autotest and control file writers don't have to worry about it anymore. Signed-off-by: Lucas Meneghel Rodrigues --- client/tests/kvm/control |3 --- 1 files changed, 0 insertions(+), 3 deletions(-) diff --git a/client/tests/kvm/control b/client/tests/kvm/control index 6437d88..c887a3e 100644 --- a/client/tests/kvm/control +++ b/client/tests/kvm/control @@ -67,6 +67,3 @@ if args: parser.parse_string(str) virt_utils.run_tests(parser, job) - -# Generate a nice HTML report inside the job's results dir -virt_utils.create_report(kvm_test_dir, job.resultdir) -- 1.7.4.4 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/4] job: Write a job HTML report for every autotest client job
We have a tool that can generate such a file and it makes it easier for people who don't have access to the autotest web interface to analyze job results. With this, all client jobs write such a file, so test writers don't have to worry about it. This change does not regress the job unittests. Signed-off-by: Lucas Meneghel Rodrigues --- client/bin/job.py |4 1 files changed, 4 insertions(+), 0 deletions(-) diff --git a/client/bin/job.py b/client/bin/job.py index 9effcdb..a792309 100644 --- a/client/bin/job.py +++ b/client/bin/job.py @@ -17,6 +17,7 @@ from autotest_lib.client.common_lib import base_job from autotest_lib.client.common_lib import error, barrier, log, logging_manager from autotest_lib.client.common_lib import base_packages, packages from autotest_lib.client.common_lib import global_config +from autotest_lib.client.tools import html_report LAST_BOOT_TAG = object() @@ -950,6 +951,9 @@ class base_client_job(base_job.base_job): self._tap.write() self._tap._write_tap_archive() +# write out a job HTML report +html_report.create_report(self.resultdir) + # We are about to exit 'complete' so clean up the control file. dest = os.path.join(self.resultdir, os.path.basename(self._state_file)) shutil.move(self._state_file, dest) -- 1.7.4.4 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/4] tools/html_report: Make html report generation autotest generic
The html report tool was a bit too tied to KVM autotest, and that was not necessary. Turn it into a generic module that can be used by any autotest job, encapsulate the main worker function into autotest API so we don't need to exec it in a subshell, fix naming of the report and CSS details. Signed-off-by: Lucas Meneghel Rodrigues --- client/tools/html_report.py | 158 --- 1 files changed, 104 insertions(+), 54 deletions(-) diff --git a/client/tools/html_report.py b/client/tools/html_report.py index 8b4b109..7b17a75 100755 --- a/client/tools/html_report.py +++ b/client/tools/html_report.py @@ -1,6 +1,6 @@ #!/usr/bin/python """ -Script used to parse the test results and generate an HTML report. +Module used to parse the autotest job results and generate an HTML report. @copyright: (c)2005-2007 Matt Kruse (javascripttoolbox.com) @copyright: Red Hat 2008-2009 @@ -27,7 +27,6 @@ body { text-decoration:none; font:bold 2em/2em Arial, Helvetica, sans-serif; text-transform:none; -text-shadow: 2px 2px 2px #555; text-align: left; color:#55; border-bottom: 1px solid #55; @@ -37,7 +36,6 @@ body { text-decoration:none; font:bold 16px Arial, Helvetica, sans-serif; text-transform:uppercase; -text-shadow: 2px 2px 2px #555; text-align: left; color:#55; margin-bottom:0; @@ -1374,21 +1372,27 @@ function processList(ul) { } """ - -# -## This script gets kvm autotest results directory path as an ## -## input and create a single html formatted result page. ## -# - stimelist = [] def make_html_file(metadata, results, tag, host, output_file_name, dirname): +""" +Create HTML file contents for the job report, to stdout or filesystem. + +@param metadata: Dictionary with Job metadata (tests, exec time, etc). +@param results: List with testcase results. +@param tag: Job tag. +@param host: Client hostname. +@param output_file_name: Output file name. If empty string, prints to +stdout. +@param dirname: Prefix for HTML links. If empty string, the HTML links +will be relative to the results dir. +""" html_prefix = """ http://www.w3.org/TR/html4/strict.dtd";> -KVM Autotest Results +Autotest job execution results %s @@ -1407,14 +1411,13 @@ return true; """ % (format_css, table_js, maketree_js) - if output_file_name: output = open(output_file_name, "w") else: #if no output file defined, print html file to console output = sys.stdout # create html page print >> output, html_prefix -print >> output, 'KVM Autotest Execution Report' +print >> output, 'Autotest job execution report' # formating date and time to print t = datetime.datetime.now() @@ -1438,7 +1441,7 @@ return true; stat_str = ('From %d tests executed, %d have passed (%d%% failures)' % (total_executed, total_passed, failed_perct)) -kvm_ver_str = metadata['kvmver'] +kvm_ver_str = metadata.get('kvmver', None) print >> output, '' print >> output, 'HOST:%s' % host @@ -1446,10 +1449,10 @@ return true; print >> output, 'DATE:%s' % now.ctime() print >> output, 'STATS:%s'% stat_str print >> output, '' -print >> output, 'KVM VERSION:%s' % kvm_ver_str +if kvm_ver_str is not None: +print >> output, 'KVM VERSION:%s' % kvm_ver_str print >> output, '' - ## print test results print >> output, '' print >> output, 'Test Results' @@ -1529,6 +1532,12 @@ id="t1" class="stats table-autosort:4 table-autofilter table-stripeclass:alterna def parse_result(dirname, line): +""" +Parse job status log line. + +@param dirname: Job results dir +@param line: Status log line. +""" parts = line.split() if len(parts) < 4: return None @@ -1556,7 +1565,7 @@ def parse_result(dirname, line): # assign actual values rx = re.compile('^(\w+)\.(.*)$') m1 = rx.findall(parts[3]) -result['testcase'] = m1[0][1] +result['testcase'] = str(tag) result['title'] = str(tag) result['status'] = parts[1] if result['status'] != 'GOOD': @@ -1572,10 +1581,16 @@ def parse_result(dirname, line): def get_exec_log(resdir, tag): -stdout_file = os.path.join(resdir, tag) + '/debug/stdout' -stderr_file = os.path.join(resdir, tag) + '/debug/stderr' -status_file = os.path.join(resdir, tag) + '/status' -dmesg_file = os.path.join(resdir, tag) + '/sysinfo/dmesg' +""" +Get job execution summary. + +@param resdir: Job results dir. +@param tag: Job tag. +""" +stdout_file = os.path.join(resdir, tag, 'debug', 'stdout') +stderr_file = os.path.join(resdir, tag, 'de
[PATCH 0/4] Fix HTML report generation v2
Made HTML report generation generic and useable by any autotest client job, fixed up some bugs in HTML report code. Now it'll be easier to browse and visualize results without the autotest web interface. Lucas Meneghel Rodrigues (4): tools/html_report: Make html report generation autotest generic job: Write a job HTML report for every autotest client job virt.virt_utils: Get rid of create_report function KVM test: control: Get rid of report generation code client/bin/job.py |4 + client/tests/kvm/control|3 - client/tools/html_report.py | 158 --- client/virt/virt_utils.py | 12 --- 4 files changed, 108 insertions(+), 69 deletions(-) -- 1.7.4.4 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Bug in KVM clock backwards compensation
On 04/28/2011 01:20 PM, Joerg Roedel wrote: On Thu, Apr 28, 2011 at 11:34:44AM -0700, Zachary Amsden wrote: On 04/28/2011 12:13 AM, Roedel, Joerg wrote: I see it different. This code wants to check if the _guest_ tsc moves forwared (or at least not backwards). So it is fully legitimate to just do this by reading the guest-tsc and compare it to the last one the guest had. That wasn't the intention when I wrote that code. It's simply there to detect backwards motion of the host TSC. The guest TSC can legally go backwards whenever the guest decides to change it, so checking the guest TSC doesn't make sense here. This code checks how many guest tsc cycles have passed since this vCPU was de-scheduled last time (and before it is running again). So since the vCPU hasn't run in the meantime it had no chance to change its TSC. Further, the other parameters like the TSC offset and the scaling multiplier havn't changed too, so the only variable in the guest-tsc calculation is the host-tsc. So this calculation using the guest-tsc can detect backwards going host-tsc as good as the old one. The benefit here is that we can feed consistent values into adjust_tsc_offset(). While true, this is more complex than the original code. The original code here doesn't try to actually compensate for the guest TSC difference - instead what it does is NULL any discovered host TSC delta: if (tsc_delta < 0) mark_tsc_unstable("KVM discovered backwards TSC"); if (check_tsc_unstable()) { kvm_x86_ops->adjust_tsc_offset(vcpu, -tsc_delta); vcpu->arch.tsc_catchup = 1; } kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu); Erasing that delta also erases elapsed time since the VCPU has last been run, which isn't desirable, so it then sets tsc_catchup mode, which will restore the proper TSC. The request here triggers code which later updates the TSC offset again. To avoid complexity, I think it's simplest to do the first computation in terms of the host TSC. Yes, with tsc-scaling, the machines already have stable TSCs - the above test is for older hardware which could have problems, and can be reverted back to the original code without worrying about switching units. This is the case pratically. But architecturally the tsc-scaling feature does not depend on a constant tsc, so we can make no such assumtions. Additionally, it may happen that Linux mis-detects an unstable tsc for some reason (broken BIOS, bug in the code, ...). Therefore I think it is dangerous to assume that this code will never run on tsc-scaling capable hosts. And if it does and we don't manage the tsc-offset units right, we may see very weird behavior. I agree, it is best to handle this case - hardware can and will change - but the TSC adjustment in terms of guest rate should be done under the atomic protection right before entering hardware virtualized mode - here: I left compute_guest_tsc in place to recompute time in guest units here, even if the underlying hardware rate changes. /* * We may have to catch up the TSC to match elapsed wall clock * time for two reasons, even if kvmclock is used. * 1) CPU could have been running below the maximum TSC rate * 2) Broken TSC compensation resets the base at each VCPU * entry to avoid unknown leaps of TSC even when running * again on the same CPU. This may cause apparent elapsed * time to disappear, and the guest to stand still or run * very slowly. */ if (vcpu->tsc_catchup) { u64 tsc = compute_guest_tsc(v, kernel_ns); if (tsc > tsc_timestamp) { kvm_x86_ops->adjust_tsc_offset(v, tsc - tsc_timestamp); tsc_timestamp = tsc; } } So yeah, the code is getting pretty complex but we'd like to avoid that as much as possible - so I would prefer to have the hardware backwards compensation separate from the guest rate computation by doing this: step 1) remove any backwards hardware TSC delta (in hardware units) step 2) recompute guest TSC from a stable clock (gotten from kernel_ns) and apply adjustment (in guest units) So it appears you can just share most of the logic of guest TSC catchup mode. What do you think? Zach -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH 3/3] KVM: MMU: Optimize guest page table walk
Avi Kivity writes: > > Good optimization. copy_from_user() really isn't optimized for short > buffers, I expect much of the improvement comes from that. Actually it is equivalent to get_user for the lenghts supported by get_user, assuming you pass in a constant length. You probably do not. -Andi -- a...@linux.intel.com -- Speaking for myself only -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH -v3] Monitor command: x-gpa2hva, translate guest physical address to host virtual address
On 04/28/2011 10:04 PM, Marcelo Tosatti wrote: > On Thu, Apr 28, 2011 at 08:00:19AM -0500, Anthony Liguori wrote: >> On 04/27/2011 06:06 PM, Marcelo Tosatti wrote: >>> On Fri, Nov 19, 2010 at 04:17:35PM +0800, Huang Ying wrote: On Tue, 2010-11-16 at 10:23 +0800, Huang Ying wrote: > Author: Max Asbock > > Add command x-gpa2hva to translate guest physical address to host > virtual address. Because gpa to hva translation is not consistent, so > this command is only used for debugging. > > The x-gpa2hva command provides one step in a chain of translations from > guest virtual to guest physical to host virtual to host physical. Host > physical is then used to inject a machine check error. As a > consequence the HWPOISON code on the host and the MCE injection code > in qemu-kvm are exercised. > > v3: > > - Rename to x-gpa2hva > - Remove QMP support, because gpa2hva is not consistent Is this patch an acceptable solution for now? This command is useful for our testing. >>> >>> Anthony? >> >> Yeah, but it should come through qemu-devel, no? > > Yes, Huang Ying, can you please resend? Via QEMU git or uq/master branch of KVM git? Best Regards, Huang Ying -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Guest virtual CPUs limited to only 50% of host CPU
Thanks for the reply. Turns out it was htop reporting it wrong. The utime in /proc/stat already includes guest time. On Tue, Apr 5, 2011 at 2:03 AM, Avi Kivity wrote: > On 03/31/2011 11:03 PM, Drew Johnson wrote: >> >> Hi, >> >> I am using Qemu-KVM-0.12.5 > > This is somewhat old. Try upgrading. > >> on Intel Xeon (Vt-x enabled) processors and >> monitoring the system using htop on the host. On the processors that >> are running Qemu-KVM I am seeing a 50/50 split between userspace and >> guest ("gu:" in htop). I have pinned the vCPU qemu-kvm threads to >> specific host CPUs using taskset. In the guest the CPUs are nearly >> 100% userspace in htop. >> >> Does anyone have ideas on why this is? Is there a way I can get much >> higher utilization for the guest virtual CPUs wrt the host? >> > > Please build qemu with --disable-cpu-strip and run 'perf top' to see what's > going on. > > -- > error compiling committee.c: too many arguments to function > > -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH v3 1/5] hpet 'driftfix': add hooks required to detect coalesced interrupts (x86 apic only)
On 2011-04-28 20:51, Blue Swirl wrote: > On Thu, Apr 28, 2011 at 5:24 PM, Ulrich Obergfell wrote: >> 'target_get_irq_delivered' and 'target_reset_irq_delivered' contain >> entry addresses of functions that are utilized by update_irq() to >> detect coalesced interrupts. apic code loads these pointers during >> initialization. > > I'm not so happy with this approach, but probably then the i386 > dependencies can be removed from RTC and it can be compiled only once > for all targets. This whole series is really the minimalistic approach. The callbacks defined here must remain a temporary "shortcut". Just like proper abstraction of periodic tick compensation for reuse in other timers has to be added later on. And the limitation to edge-triggered legacy HPET INTs has to be removed. Jan signature.asc Description: OpenPGP digital signature
Re: Bug in KVM clock backwards compensation
On 2011-04-28 21:06, Zachary Amsden wrote: > On 04/28/2011 12:22 AM, Roedel, Joerg wrote: >> On Thu, Apr 28, 2011 at 03:06:01AM -0400, Jan Kiszka wrote: >> >>> And /me still wonders (like I did when this first popped up) if the >>> proper place of determining TSC stability really have to be KVM. >>> >>> If the Linux core fails to detect some instability and KVM has to jump >>> in, shouldn't we better improve the core's detection abilities and make >>> use of them in KVM? Conceptually this looks like we are currently just >>> working around a core deficit in KVM. >>> >> Yes, good question. Has this ever triggered on a real machine (not >> counting the suspend/resume issue in)? >> > > Yes... some platforms don't go haywire until you start using power > mangement, TSC is stable before that, but not afterwards, and depending > on the version of the kernel, KVM might detect this before the kernel does. > > Honestly, the code is obsolete, but still useful for those who build KVM > as an external module on older kernels using the kvm-kmod system. I'll happily accept patches that migrate any logic to kvm-kmod that the current kernel does not need it anymore. Jan signature.asc Description: OpenPGP digital signature
Re: [Qemu-devel] KVM call minutes for Apr 26
On Tue, 2011-04-26 at 16:29 -0500, Anthony Liguori wrote: > On 04/26/2011 11:47 AM, Lucas Meneghel Rodrigues wrote: > > On Tue, 2011-04-26 at 17:58 +0300, Avi Kivity wrote: > >> On 04/26/2011 05:41 PM, Chris Wright wrote: > >>> - having basic common config could be useful > > Hi Lucas, > > Could you send your suggested config as a patch to qemu.git? Even > better if it was automatically invoked via a make autotest target > although if you supply the config, I'll happily patch the Makefile. > > Regards, > > Anthony Liguori So Anthony, after hearing quite a lot of feedback from our colleagues, I'll start working on a 'make autotest' target for qemu and qemu-kvm. A lot of compromises will have to be made here, I was thinking on a very minimum setup where folks can get unittests, followed by booting, rebooting, and single host migration, using a minimal fedora guest image, that 'make autotest' would download. Meanwhile, here is Avi's config file translated to a more current autotest state. I don't think we should rush into putting this into qemu's repo, as I am aiming for having this more transparent to the user. Just publishing here FYI. include tests_base.cfg include cdkeys.cfg image_name(_.*)? ?<= /tmp/kvm_autotest_root/images/ cdrom(_.*)? ?<= /tmp/kvm_autotest_root/ floppy ?<= /tmp/kvm_autotest_root/ Linux..unattended_install: kernel ?<= /tmp/kvm_autotest_root/ initrd ?<= /tmp/kvm_autotest_root/ variants: # The variant names are the testset names - @regression: # We want qemu-kvm for this run qemu_binary = /usr/bin/qemu-kvm qemu_img_binary = /usr/bin/qemu-img only qcow2 only rtl8139 only ide only smp2 only no_pci_assignable only smallpages only Fedora.14 Win7 only unattended_install.cdrom, boot, reboot, migrate, shutdown abort_on_error = yes kill_vm.* ?= no kill_unresponsive_vms.* ?= no # Put the testset you want here only regression -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [CentOS] Install CentOS as KVM guest
On 4/28/11, Gleb Natapov wrote: > of virt stack. You should use libvirt or virt-manager instead. Especially > if you are concerned about security. I think libvirt can start guest on > headless server. > > If this still fails for you you need to complain to libvirt developers > (not in a rant mode, but providing details of what exact version of > software you have problem with and what are you trying to do). And > libvirt developers will not be shy to complain to qemu developers if the > problem turned to be on their side. I've finally got an installation working, not using virt-install or virt-manager. After reading through the libvirt site, I started writing the domain definition manually. Through trial and error, comparison with what virt-install generated and the online examples, I got a working xml. Just for the record, virsh --version reports 0.8.1 For the benefits of other newbies, my discovery so far is that 1. No-activity after the guest VM started Originally, when I specified the CentOS DVD ISO, the guest will load and then do nothing but chew up 100% CPU cycle on the allocated 1 vcpu for quite some time. Subsequently, it appeared that mounting the ISO as loop back is the solution. This seemed to imply that libvirt or KVM couldn't boot a guest from ISO... which didn't quite make sense. I ran into the issue when using my manually generated XML, it turned out that the reason was the permissions (644) on vmlinuz and initrd.img on the DVD. By copying the two files to local disk, changing the permissions and using the and options, I was able to boot the guest. I was curious how virt-install got around this and learnt that I could dump the config from a running machine. It turns out that virt-install didn't exactly use the .xml it created, it added stuff to the running version. Importantly making a temporary copy of initrd.img and vmlinuz. I think the ISO problem with virt-install may be that it was unable to mount the ISO to copy these files despite me running it as root. 2. Anaconda couldn't see the DVD Which was my rant earlier, since it sounded really stupid that the installer couldn't see the disc it just booted off. Now, with #1 solved, it seems that anaconda wasn't booting off the disc after all. However, the interesting thing here is that once I got past #1, My guest could install from the DVD. After comparing the xml files, it seems the problem is virt-install did not save the path to the ISO/mounted DVD. Under the element, there wasn't a source. With my manually generated xml, specifying the ISO as the source worked. But the virt-installed anaconda was complaining I don't have any hard disks or cdroms. Not that there was no disk in the drive. Everytime I picked an option like install media in HDD or CDROM, it prompted me no device, do I want to install a driver. Since the hard disk definition appears to be the same, I'm not sure why that happened with virt-install's xml but not mine. So right now, I managed to get the OS installed, rebooting it required removing the initrd and kernel entry as well as the source so that it would boot from the image disk. Only problem is... networking still isn't working although brctl show on the host shows that a vnet0 had been created and attached to the bridge. Any pointers would be appreciated! -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] kvm tools: Emulate RTC to fix system time in guests
* Pekka Enberg wrote: > This patch fixes system time in guests by implementing proper CMOS RTC clock > support. > > # Before: > > sh-2.05b# date > Fri Aug 7 04:02:01 UTC 2009 > > # After: > > sh-2.05b# date > Thu Apr 28 19:12:21 UTC 2011 Very nice! Ingo -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Bug in KVM clock backwards compensation
On Thu, Apr 28, 2011 at 11:34:44AM -0700, Zachary Amsden wrote: > On 04/28/2011 12:13 AM, Roedel, Joerg wrote: >> I see it different. This code wants to check if the _guest_ tsc moves >> forwared (or at least not backwards). So it is fully legitimate to just >> do this by reading the guest-tsc and compare it to the last one the >> guest had. > > That wasn't the intention when I wrote that code. It's simply there to > detect backwards motion of the host TSC. The guest TSC can legally go > backwards whenever the guest decides to change it, so checking the guest > TSC doesn't make sense here. This code checks how many guest tsc cycles have passed since this vCPU was de-scheduled last time (and before it is running again). So since the vCPU hasn't run in the meantime it had no chance to change its TSC. Further, the other parameters like the TSC offset and the scaling multiplier havn't changed too, so the only variable in the guest-tsc calculation is the host-tsc. So this calculation using the guest-tsc can detect backwards going host-tsc as good as the old one. The benefit here is that we can feed consistent values into adjust_tsc_offset(). > Yes, with tsc-scaling, the machines already have stable TSCs - the above > test is for older hardware which could have problems, and can be > reverted back to the original code without worrying about switching > units. This is the case pratically. But architecturally the tsc-scaling feature does not depend on a constant tsc, so we can make no such assumtions. Additionally, it may happen that Linux mis-detects an unstable tsc for some reason (broken BIOS, bug in the code, ...). Therefore I think it is dangerous to assume that this code will never run on tsc-scaling capable hosts. And if it does and we don't manage the tsc-offset units right, we may see very weird behavior. Regards, Joerg -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] kvm tools: Emulate RTC to fix system time in guests
This patch fixes system time in guests by implementing proper CMOS RTC clock support. # Before: sh-2.05b# date Fri Aug 7 04:02:01 UTC 2009 # After: sh-2.05b# date Thu Apr 28 19:12:21 UTC 2011 Cc: Asias He Cc: Cyrill Gorcunov Cc: Ingo Molnar Cc: Prasad Joshi Cc: Sasha Levin Signed-off-by: Pekka Enberg --- tools/kvm/Makefile |3 +- tools/kvm/include/kvm/rtc.h |6 +++ tools/kvm/ioport.c | 26 - tools/kvm/kvm-run.c |3 + tools/kvm/rtc.c | 87 +++ 5 files changed, 98 insertions(+), 27 deletions(-) create mode 100644 tools/kvm/include/kvm/rtc.h create mode 100644 tools/kvm/rtc.c diff --git a/tools/kvm/Makefile b/tools/kvm/Makefile index fbce14d..659bc35 100644 --- a/tools/kvm/Makefile +++ b/tools/kvm/Makefile @@ -26,8 +26,9 @@ OBJS += kvm-cpu.o OBJS += main.o OBJS += mmio.o OBJS += pci.o -OBJS += util.o +OBJS += rtc.o OBJS += term.o +OBJS += util.o OBJS += virtio.o OBJS+= util/parse-options.o OBJS+= util/strbuf.o diff --git a/tools/kvm/include/kvm/rtc.h b/tools/kvm/include/kvm/rtc.h new file mode 100644 index 000..0b8d9f9 --- /dev/null +++ b/tools/kvm/include/kvm/rtc.h @@ -0,0 +1,6 @@ +#ifndef KVM__RTC_H +#define KVM__RTC_H + +void rtc__init(void); + +#endif /* KVM__RTC_H */ diff --git a/tools/kvm/ioport.c b/tools/kvm/ioport.c index e3f67fc..a38d2d1 100644 --- a/tools/kvm/ioport.c +++ b/tools/kvm/ioport.c @@ -12,28 +12,6 @@ bool ioport_debug; -static uint8_t ioport_to_uint8(void *data) -{ - uint8_t *p = data; - - return *p; -} - -static bool cmos_ram_rtc_io_out(struct kvm *self, uint16_t port, void *data, int size, uint32_t count) -{ - uint8_t value; - - value = ioport_to_uint8(data); - - self->nmi_disabled = value & (1UL << 7); - - return true; -} - -static struct ioport_operations cmos_ram_rtc_ops = { - .io_out = cmos_ram_rtc_io_out, -}; - static bool debug_io_out(struct kvm *self, uint16_t port, void *data, int size, uint32_t count) { exit(EXIT_SUCCESS); @@ -128,10 +106,6 @@ void ioport__setup_legacy(void) ioport__register(0x0060, &dummy_read_write_ioport_ops, 2); ioport__register(0x0064, &dummy_read_write_ioport_ops, 1); - /* PORT 0070-007F - CMOS RAM/RTC (REAL TIME CLOCK) */ - ioport__register(0x0070, &cmos_ram_rtc_ops, 1); - ioport__register(0x0071, &dummy_read_write_ioport_ops, 1); - /* 0x00A0 - 0x00AF - 8259A PIC 2 */ ioport__register(0x00A0, &dummy_read_write_ioport_ops, 2); diff --git a/tools/kvm/kvm-run.c b/tools/kvm/kvm-run.c index b21b4a2..64f3409 100644 --- a/tools/kvm/kvm-run.c +++ b/tools/kvm/kvm-run.c @@ -22,6 +22,7 @@ #include #include #include +#include #include #include #include @@ -416,6 +417,8 @@ int kvm_cmd_run(int argc, const char **argv, const char *prefix) ioport__setup_legacy(); + rtc__init(); + kvm__setup_bios(kvm); serial8250__init(kvm); diff --git a/tools/kvm/rtc.c b/tools/kvm/rtc.c new file mode 100644 index 000..b9c09b9 --- /dev/null +++ b/tools/kvm/rtc.c @@ -0,0 +1,87 @@ +#include "kvm/rtc.h" + +#include "kvm/ioport.h" +#include "kvm/kvm.h" + +#include + +static uint8_t cmos_index; + +#define CMOS_RTC_SECONDS 0x00 +#define CMOS_RTC_MINUTES 0x02 +#define CMOS_RTC_HOURS 0x04 +#define CMOS_RTC_DATE_OF_MONTH 0x07 +#define CMOS_RTC_MONTH 0x08 +#define CMOS_RTC_YEAR 0x09 + +static inline unsigned char bin2bcd(unsigned val) +{ + return ((val / 10) << 4) + val % 10; +} + +static bool cmos_ram_data_in(struct kvm *self, uint16_t port, void *data, int size, uint32_t count) +{ + struct tm *tm; + time_t ti; + + time(&ti); + + tm = gmtime(&ti); + + switch (cmos_index) { + case CMOS_RTC_SECONDS: + ioport__write8(data, bin2bcd(tm->tm_sec)); + break; + case CMOS_RTC_MINUTES: + ioport__write8(data, bin2bcd(tm->tm_min)); + break; + case CMOS_RTC_HOURS: + ioport__write8(data, bin2bcd(tm->tm_hour)); + break; + case CMOS_RTC_DATE_OF_MONTH: + ioport__write8(data, bin2bcd(tm->tm_mday)); + break; + case CMOS_RTC_MONTH: + ioport__write8(data, bin2bcd(tm->tm_mon + 1)); + break; + case CMOS_RTC_YEAR: + ioport__write8(data, bin2bcd(tm->tm_year)); + break; + } + + return true; +} + +static bool cmos_ram_data_out(struct kvm *self, uint16_t port, void *data, int size, uint32_t count) +{ + return true; +} + +static struct ioport_operations cmos_ram_data_ioport_ops = { + .io_out = cmos_ram_data_out, + .io_in = cmos_ram_data_in, +}; + +static bool cmos_ram_index_out(struct kvm *self, uint16_t port, vo
[PATCH 3/3] KVM test: control: Replace create_report function call
Following up with the previous changes. Signed-off-by: Lucas Meneghel Rodrigues --- client/tests/kvm/control |3 ++- 1 files changed, 2 insertions(+), 1 deletions(-) diff --git a/client/tests/kvm/control b/client/tests/kvm/control index 6437d88..eb95a33 100644 --- a/client/tests/kvm/control +++ b/client/tests/kvm/control @@ -23,6 +23,7 @@ For online docs, please refer to http://www.linux-kvm.org/page/KVM-Autotest import sys, os, logging from autotest_lib.client.common_lib import cartesian_config from autotest_lib.client.virt import virt_utils +from autotest_lib.client.tools import html_report # set English environment (command output might be localized, need to be safe) os.environ['LANG'] = 'en_US.UTF-8' @@ -69,4 +70,4 @@ parser.parse_string(str) virt_utils.run_tests(parser, job) # Generate a nice HTML report inside the job's results dir -virt_utils.create_report(kvm_test_dir, job.resultdir) +html_report.create_report(job.resultdir) -- 1.7.4.4 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/3] virt.virt_utils: Get rid of create_report function
As it has been transfered to the html_report module itself. Signed-off-by: Lucas Meneghel Rodrigues --- client/virt/virt_utils.py | 12 1 files changed, 0 insertions(+), 12 deletions(-) diff --git a/client/virt/virt_utils.py b/client/virt/virt_utils.py index 97a652d..8fa64ca 100644 --- a/client/virt/virt_utils.py +++ b/client/virt/virt_utils.py @@ -1187,18 +1187,6 @@ def run_tests(parser, job): return not failed -def create_report(report_dir, results_dir): -""" -Creates a neatly arranged HTML results report in the results dir. - -@param report_dir: Directory where the report script is located. -@param results_dir: Directory where the results will be output. -""" -reporter = os.path.join(report_dir, 'html_report.py') -html_file = os.path.join(results_dir, 'results.html') -os.system('%s -r %s -f %s -R' % (reporter, results_dir, html_file)) - - def display_attributes(instance): """ Inspects a given class instance attributes and displays them, convenient -- 1.7.4.4 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/3] tools/html_report: Make html report generation autotest generic
The html report tool was a bit too tied to KVM autotest, and that was not necessary. Turn it into a generic module that can be used by any autotest job, encapsulate the main worker function into autotest API so we don't need to exec it in a subshell, fix naming of the report and CSS details. Signed-off-by: Lucas Meneghel Rodrigues --- client/tools/html_report.py | 85 --- 1 files changed, 47 insertions(+), 38 deletions(-) diff --git a/client/tools/html_report.py b/client/tools/html_report.py index 8b4b109..1b1584c 100755 --- a/client/tools/html_report.py +++ b/client/tools/html_report.py @@ -27,7 +27,6 @@ body { text-decoration:none; font:bold 2em/2em Arial, Helvetica, sans-serif; text-transform:none; -text-shadow: 2px 2px 2px #555; text-align: left; color:#55; border-bottom: 1px solid #55; @@ -37,7 +36,6 @@ body { text-decoration:none; font:bold 16px Arial, Helvetica, sans-serif; text-transform:uppercase; -text-shadow: 2px 2px 2px #555; text-align: left; color:#55; margin-bottom:0; @@ -1388,7 +1386,7 @@ def make_html_file(metadata, results, tag, host, output_file_name, dirname): http://www.w3.org/TR/html4/strict.dtd";> -KVM Autotest Results +Autotest job execution results %s @@ -1414,7 +1412,7 @@ return true; output = sys.stdout # create html page print >> output, html_prefix -print >> output, 'KVM Autotest Execution Report' +print >> output, 'Autotest job execution report' # formating date and time to print t = datetime.datetime.now() @@ -1446,7 +1444,8 @@ return true; print >> output, 'DATE:%s' % now.ctime() print >> output, 'STATS:%s'% stat_str print >> output, '' -print >> output, 'KVM VERSION:%s' % kvm_ver_str +if kvm_ver_str: +print >> output, 'KVM VERSION:%s' % kvm_ver_str print >> output, '' @@ -1556,7 +1555,7 @@ def parse_result(dirname, line): # assign actual values rx = re.compile('^(\w+)\.(.*)$') m1 = rx.findall(parts[3]) -result['testcase'] = m1[0][1] +result['testcase'] = str(tag) result['title'] = str(tag) result['status'] = parts[1] if result['status'] != 'GOOD': @@ -1648,9 +1647,50 @@ def get_kvm_version(result_dir): kvm_version = get_keyval_value(result_dir, "kvm_version") kvm_userspace_version = get_keyval_value(result_dir, "kvm_userspace_version") +if kvm_version == "Unknown" or kvm_userspace_version == "Unknown": +return None return "Kernel: %sUserspace: %s" % (kvm_version, kvm_userspace_version) +def create_report(dirname, html_path=None, output_file_name=None): +res_dir = os.path.abspath(dirname) +tag = res_dir +status_file_name = dirname + '/status' +sysinfo_dir = dirname + '/sysinfo' +host = get_info_file('%s/hostname' % sysinfo_dir) +rx = re.compile('^\s+[END|START].*$') +# create the results set dict +results_data = [] +if os.path.exists(status_file_name): +f = open(status_file_name, "r") +lines = f.readlines() +f.close() +for line in lines: +if rx.match(line): +result_dict = parse_result(dirname, line) +if result_dict: +results_data.append(result_dict) +# create the meta info dict +metalist = { +'uname': get_info_file('%s/uname' % sysinfo_dir), +'cpuinfo':get_info_file('%s/cpuinfo' % sysinfo_dir), +'meminfo':get_info_file('%s/meminfo' % sysinfo_dir), +'df':get_info_file('%s/df' % sysinfo_dir), +'modules':get_info_file('%s/modules' % sysinfo_dir), +'gcc':get_info_file('%s/gcc_--version' % sysinfo_dir), +'dmidecode':get_info_file('%s/dmidecode' % sysinfo_dir), +'dmesg':get_info_file('%s/dmesg' % sysinfo_dir), +'kvmver':get_kvm_version(dirname) +} + +if html_path is None: +html_path = dirname +if output_file_name is None: +output_file_name = os.path.join(dirname, 'job_report.html') +make_html_file(metalist, results_data, tag, host, output_file_name, + html_path) + + def main(argv): dirname = None output_file_name = None @@ -1682,38 +1722,7 @@ def main(argv): if dirname: if os.path.isdir(dirname): # TBD: replace it with a validation of # autotest result dir -res_dir = os.path.abspath(dirname) -tag = res_dir -status_file_name = dirname + '/status' -sysinfo_dir = dirname + '/sysinfo' -host = get_info_file('%s/hostname' % sysinfo_dir) -rx = re.compile('^\s+[END|START].*$') -# create the results set dict -results_data = [
[PATCH 0/3] Fix HTML report generation
Made HTML report generation generic and useable by any autotest client job, fixed up some bugs in HTML report code. Now it'll be easier to browse and visualize results without the autotest web interface. Lucas Meneghel Rodrigues (3): tools/html_report: Make html report generation autotest generic virt.virt_utils: Get rid of create_report function KVM test: control: Replace create_report function call client/tests/kvm/control|3 +- client/tools/html_report.py | 85 --- client/virt/virt_utils.py | 12 -- 3 files changed, 49 insertions(+), 51 deletions(-) -- 1.7.4.4 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Bug in KVM clock backwards compensation
On 04/28/2011 12:22 AM, Roedel, Joerg wrote: On Thu, Apr 28, 2011 at 03:06:01AM -0400, Jan Kiszka wrote: And /me still wonders (like I did when this first popped up) if the proper place of determining TSC stability really have to be KVM. If the Linux core fails to detect some instability and KVM has to jump in, shouldn't we better improve the core's detection abilities and make use of them in KVM? Conceptually this looks like we are currently just working around a core deficit in KVM. Yes, good question. Has this ever triggered on a real machine (not counting the suspend/resume issue in)? Yes... some platforms don't go haywire until you start using power mangement, TSC is stable before that, but not afterwards, and depending on the version of the kernel, KVM might detect this before the kernel does. Honestly, the code is obsolete, but still useful for those who build KVM as an external module on older kernels using the kvm-kmod system. Zach -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH v3 1/5] hpet 'driftfix': add hooks required to detect coalesced interrupts (x86 apic only)
On Thu, Apr 28, 2011 at 5:24 PM, Ulrich Obergfell wrote: > 'target_get_irq_delivered' and 'target_reset_irq_delivered' contain > entry addresses of functions that are utilized by update_irq() to > detect coalesced interrupts. apic code loads these pointers during > initialization. I'm not so happy with this approach, but probably then the i386 dependencies can be removed from RTC and it can be compiled only once for all targets. > This change can be replaced if a generic feedback infrastructure to > track coalesced IRQs for periodic, clock providing devices becomes > available. > > Signed-off-by: Ulrich Obergfell > --- > hw/apic.c | 4 > sysemu.h | 3 +++ > vl.c | 3 +++ > 3 files changed, 10 insertions(+), 0 deletions(-) > > diff --git a/hw/apic.c b/hw/apic.c > index a45b57f..eb0f6d9 100644 > --- a/hw/apic.c > +++ b/hw/apic.c > @@ -24,6 +24,7 @@ > #include "sysbus.h" > #include "trace.h" > #include "kvm.h" > +#include "sysemu.h" > > /* APIC Local Vector Table */ > #define APIC_LVT_TIMER 0 > @@ -1143,6 +1144,9 @@ static SysBusDeviceInfo apic_info = { > > static void apic_register_devices(void) > { > + target_get_irq_delivered = apic_get_irq_delivered; > + target_reset_irq_delivered = apic_reset_irq_delivered; > + > sysbus_register_withprop(&apic_info); > } > > diff --git a/sysemu.h b/sysemu.h > index 07d85cd..75b0139 100644 > --- a/sysemu.h > +++ b/sysemu.h > @@ -98,6 +98,9 @@ int qemu_savevm_state_complete(Monitor *mon, QEMUFile *f); > void qemu_savevm_state_cancel(Monitor *mon, QEMUFile *f); > int qemu_loadvm_state(QEMUFile *f); > > +extern int (*target_get_irq_delivered)(void); > +extern void (*target_reset_irq_delivered)(void); > + > /* SLIRP */ > void do_info_slirp(Monitor *mon); > > diff --git a/vl.c b/vl.c > index 0c24e07..7bab315 100644 > --- a/vl.c > +++ b/vl.c > @@ -233,6 +233,9 @@ const char *prom_envs[MAX_PROM_ENVS]; > const char *nvram = NULL; > int boot_menu; > > +int (*target_get_irq_delivered)(void) = 0; > +void (*target_reset_irq_delivered)(void) = 0; Instead of initializing with 0 (should be actually NULL in C), please define stub functions, which are used by default. Then the users don't need to check for NULL pointers. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Bug in KVM clock backwards compensation
On 04/28/2011 12:13 AM, Roedel, Joerg wrote: On Thu, Apr 28, 2011 at 02:59:57AM -0400, Zachary Amsden wrote: So I've been going over the new code changes to the TSC related code and I don't like one particular set of changes. In particular, here: kvm_x86_ops->vcpu_load(vcpu, cpu); if (unlikely(vcpu->cpu != cpu) || check_tsc_unstable()) { /* Make sure TSC doesn't go backwards */ s64 tsc_delta; u64 tsc; kvm_get_msr(vcpu, MSR_IA32_TSC,&tsc); tsc_delta = !vcpu->arch.last_guest_tsc ? 0 : tsc - vcpu->arch.last_guest_tsc; if (tsc_delta< 0) mark_tsc_unstable("KVM discovered backwards TSC"); if (check_tsc_unstable()) { kvm_x86_ops->adjust_tsc_offset(vcpu, -tsc_delta); vcpu->arch.tsc_catchup = 1; } The point of this code fragment is to test the host clock to see if it is stable, because we may have just come back from an idle phase which stopped the TSC, switched CPUs, or come back from a deep sleep state which reset the host TSC. I see it different. This code wants to check if the _guest_ tsc moves forwared (or at least not backwards). So it is fully legitimate to just do this by reading the guest-tsc and compare it to the last one the guest had. That wasn't the intention when I wrote that code. It's simply there to detect backwards motion of the host TSC. The guest TSC can legally go backwards whenever the guest decides to change it, so checking the guest TSC doesn't make sense here. I saw a patch floating around that touched this code recently, but I think there's a definite issue here that needs addressing. In fact, this change was done to address one of your concerns. You mentioned that the values passed to adjust_tsc_offset() were in unconsistent units in my first version of tsc-scaling. This was a right objection because one call-site used guest-tsc-units while the other used host-tsc-units. This change intended to fix that by using guest-tsc-units always for adjust_tsc_offset(). Not that the guest and the host tsc have the same units on current machines. But with tsc-scaling these units are different. Yes, with tsc-scaling, the machines already have stable TSCs - the above test is for older hardware which could have problems, and can be reverted back to the original code without worrying about switching units. Zach -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Bug in KVM clock backwards compensation
On 04/28/2011 12:06 AM, Jan Kiszka wrote: On 2011-04-28 08:59, Zachary Amsden wrote: So I've been going over the new code changes to the TSC related code and I don't like one particular set of changes. In particular, here: kvm_x86_ops->vcpu_load(vcpu, cpu); if (unlikely(vcpu->cpu != cpu) || check_tsc_unstable()) { /* Make sure TSC doesn't go backwards */ s64 tsc_delta; u64 tsc; kvm_get_msr(vcpu, MSR_IA32_TSC,&tsc); tsc_delta = !vcpu->arch.last_guest_tsc ? 0 : tsc - vcpu->arch.last_guest_tsc; if (tsc_delta< 0) mark_tsc_unstable("KVM discovered backwards TSC"); if (check_tsc_unstable()) { kvm_x86_ops->adjust_tsc_offset(vcpu, -tsc_delta); vcpu->arch.tsc_catchup = 1; } The point of this code fragment is to test the host clock to see if it is stable, because we may have just come back from an idle phase which stopped the TSC, switched CPUs, or come back from a deep sleep state which reset the host TSC. However, the above code is fetching the guest TSC instead of the host TSC, which isn't the way it is supposed to work. I saw a patch floating around that touched this code recently, but I think there's a definite issue here that needs addressing. And /me still wonders (like I did when this first popped up) if the proper place of determining TSC stability really have to be KVM. No, it's not. I have a patch which fixes that. Was in the process of merging it into my local tree when I found this. If the Linux core fails to detect some instability and KVM has to jump in, shouldn't we better improve the core's detection abilities and make use of them in KVM? Conceptually this looks like we are currently just working around a core deficit in KVM. 100% correct you are. Zach -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [CentOS] Install CentOS as KVM guest
On 4/28/11, Gleb Natapov wrote: > Qemu is not intended to be used directly by end user. It is too complex as > you already found out. VMware don't even give you access to such low parts > of virt stack. You should use libvirt or virt-manager instead. Especially > if you are concerned about security. I think libvirt can start guest on > headless server. Sorry for the confusion, I was using libvirtd in CLI, i.e. virt-install and virsh, not qemu directly. > If this still fails for you you need to complain to libvirt developers > (not in a rant mode, but providing details of what exact version of > software you have problem with and what are you trying to do). And > libvirt developers will not be shy to complain to qemu developers if the > problem turned to be on their side. Apologies about the rant mode as well. Before that, I tried sending two emails (25th and 26th Apr) to the KVM list with some details, hoping to get some advice. But each of these failed to materialize on the kvm list for unknown reasons. So I resorted to posting to the CentOS list, where I did get some response for which I'm very thankful. The rant post came when despite the additional advice which helped me get a little further, I keep running into unexpected brickwalls like anaconda not seeing the "dvd" (mounted ISO specified using --location) that it just booted from. Out of frustration, I CC'd that particular email to the kvm list, figuring that since it's likely to get me flamed, the imp of perversity would probably let it through... and it did. > As far as I know libvirt has no problem using bridged networking and > virt-manager use libvirt so it should work if you use new enough virt > stack, but you should ask on libvirt mailing list instead. I guess those were outdated warnings on older versions. I'll give it another spin given some of the new suggestions like using virt-install to create the disk file. If it still doesn't work, I'll go check the libvirt ML (I'm belatedly getting the idea that libvirt is not part of kvm). -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Virtualize BTS?
On 04/28/2011 06:03 PM, Jun Koi wrote: hi, i am wondering if current KVM version virtualizes the BTS (Branch Trace Store) facility? do we have this facility inside VM? No. On AMD we do virtualize a similar feature (LBR). -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/5] pci-assign: Clean up assigned_dev_pci_read/write_config
On 2011-04-28 16:54, Alex Williamson wrote: > On Thu, 2011-04-28 at 16:46 +0200, Jan Kiszka wrote: >> On 2011-04-28 16:29, Alex Williamson wrote: >>> On Thu, 2011-04-28 at 10:59 +0200, Jan Kiszka wrote: Use rages_overlap and proper constants to match the access range against >>> ^ typo - only if you resend >>> regions that need special handling. This also fixes yet uncaught high-byte write access to the command register. Moreover, use more constants instead of magic numbers. Signed-off-by: Jan Kiszka --- hw/device-assignment.c | 39 +-- 1 files changed, 29 insertions(+), 10 deletions(-) diff --git a/hw/device-assignment.c b/hw/device-assignment.c index 606d725..3481c93 100644 --- a/hw/device-assignment.c +++ b/hw/device-assignment.c @@ -404,13 +404,20 @@ static void assigned_dev_pci_write_config(PCIDevice *d, uint32_t address, return assigned_device_pci_cap_write_config(d, address, val, len); } -if (address == 0x4) { +if (ranges_overlap(address, len, PCI_COMMAND, 2)) { pci_default_write_config(d, address, val, len); /* Continue to program the card */ } -if ((address >= 0x10 && address <= 0x24) || address == 0x30 || -address == 0x34 || address == 0x3c || address == 0x3d) { +/* + * Catch access to + * - base address registers + * - ROM base address & capability pointer + * - interrupt line & pin + */ +if (ranges_overlap(address, len, PCI_BASE_ADDRESS_0, 24) || +ranges_overlap(address, len, PCI_ROM_ADDRESS, 8) || >>> >>> Should this be 5 bytes instead of 8? I'm not sure why we'd add catching >>> these reserved fields, but not those immediately after this range. >> >> Yes, that's asking for clarification: Should we allow direct access to >> the complete reserved space or virtualize it? Depending on this, the >> proper value should be 5 or 14 (the latter would also save one >> ranges_overlap). > > I vote for 5 here since a cleanup patch shouldn't have behavior changes > hidden in it. I don't see any great value in virtualizing reserved > bits. It seems like it could only make things not work if a vendor was > stupid enough to hide something in there. Thanks, Thinking about this and the other sub-dword matches again, there is a problem in my approach: I redirect the complete write to the virtualized space once there is an overlap. So a dword write at PCI_INTERRUPT_LINE would not touch the real PCI_MIN_GNT & PCI_MAX_LAT. But writing directly to those without overlap would. The same applies to our reserved space, though this inconsistency is not critical here - but still undesirable. Jan -- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/5] pci-assign: Fix dword read at PCI_COMMAND
On 2011-04-28 16:51, Alex Williamson wrote: > On Thu, 2011-04-28 at 10:59 +0200, Jan Kiszka wrote: >> If we emulate the command register, we must only read its content from >> the shadow config space. For dword read of both PCI_COMMAND and >> PCI_STATUS, at least the latter must be read from the device. >> >> For simplicity reasons and as the code path is not considered >> performance critical for the affected SRIOV devices, the fix performes >> device access to the command word unconditionally, even if emulation is >> enabled and only that word is read. >> >> Signed-off-by: Jan Kiszka >> --- >> hw/device-assignment.c |8 +--- >> 1 files changed, 5 insertions(+), 3 deletions(-) >> >> diff --git a/hw/device-assignment.c b/hw/device-assignment.c >> index f37f108..ee81434 100644 >> --- a/hw/device-assignment.c >> +++ b/hw/device-assignment.c >> @@ -482,14 +482,11 @@ static uint32_t assigned_dev_pci_read_config(PCIDevice >> *d, uint32_t address, >> /* >> * Catch access to >> * - vendor & device ID >> - * - command register (if emulation needed) >> * - base address registers >> * - ROM base address & capability pointer >> * - interrupt line & pin >> */ >> if (ranges_overlap(address, len, PCI_VENDOR_ID, 4) || >> -(pci_dev->need_emulate_cmd && >> - ranges_overlap(address, len, PCI_COMMAND, 2)) || >> ranges_overlap(address, len, PCI_BASE_ADDRESS_0, 24) || >> ranges_overlap(address, len, PCI_ROM_ADDRESS, 8) || >> ranges_overlap(address, len, PCI_INTERRUPT_LINE, 2)) { >> @@ -521,6 +518,11 @@ do_log: >> DEBUG("(%x.%x): address=%04x val=0x%08x len=%d\n", >>(d->devfn >> 3) & 0x1F, (d->devfn & 0x7), address, val, len); >> >> +if (pci_dev->need_emulate_cmd) { >> +val = merge_bits(val, pci_default_read_config(d, PCI_COMMAND, 2), >> + address, len, PCI_COMMAND, 0x); > > Shouldn't this be pci_default_read_config(d, address, len)? I think > what you have works since PCI_COMMAND is at the start of a dword, but it > violates the merge_bits() assumption that val and mval are from the same > address with the same length. Thanks, This is actually a real issue, reading a byte from PCI_COMMAND+1 is broken. Jan -- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Virtualize BTS?
hi, i am wondering if current KVM version virtualizes the BTS (Branch Trace Store) facility? do we have this facility inside VM? thanks, Jun -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/5] pci-assign: Clean up assigned_dev_pci_read/write_config
On 2011-04-28 16:54, Alex Williamson wrote: > On Thu, 2011-04-28 at 16:46 +0200, Jan Kiszka wrote: >> On 2011-04-28 16:29, Alex Williamson wrote: >>> On Thu, 2011-04-28 at 10:59 +0200, Jan Kiszka wrote: Use rages_overlap and proper constants to match the access range against >>> ^ typo - only if you resend >>> regions that need special handling. This also fixes yet uncaught high-byte write access to the command register. Moreover, use more constants instead of magic numbers. Signed-off-by: Jan Kiszka --- hw/device-assignment.c | 39 +-- 1 files changed, 29 insertions(+), 10 deletions(-) diff --git a/hw/device-assignment.c b/hw/device-assignment.c index 606d725..3481c93 100644 --- a/hw/device-assignment.c +++ b/hw/device-assignment.c @@ -404,13 +404,20 @@ static void assigned_dev_pci_write_config(PCIDevice *d, uint32_t address, return assigned_device_pci_cap_write_config(d, address, val, len); } -if (address == 0x4) { +if (ranges_overlap(address, len, PCI_COMMAND, 2)) { pci_default_write_config(d, address, val, len); /* Continue to program the card */ } -if ((address >= 0x10 && address <= 0x24) || address == 0x30 || -address == 0x34 || address == 0x3c || address == 0x3d) { +/* + * Catch access to + * - base address registers + * - ROM base address & capability pointer + * - interrupt line & pin + */ +if (ranges_overlap(address, len, PCI_BASE_ADDRESS_0, 24) || +ranges_overlap(address, len, PCI_ROM_ADDRESS, 8) || >>> >>> Should this be 5 bytes instead of 8? I'm not sure why we'd add catching >>> these reserved fields, but not those immediately after this range. >> >> Yes, that's asking for clarification: Should we allow direct access to >> the complete reserved space or virtualize it? Depending on this, the >> proper value should be 5 or 14 (the latter would also save one >> ranges_overlap). > > I vote for 5 here since a cleanup patch shouldn't have behavior changes > hidden in it. I don't see any great value in virtualizing reserved > bits. It seems like it could only make things not work if a vendor was > stupid enough to hide something in there. Thanks, Yeah, and as we properly restore the config space now, it should be Mostly Harmless. Will update. Jan -- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH -v3] Monitor command: x-gpa2hva, translate guest physical address to host virtual address
On Thu, Apr 28, 2011 at 08:00:19AM -0500, Anthony Liguori wrote: > On 04/27/2011 06:06 PM, Marcelo Tosatti wrote: > >On Fri, Nov 19, 2010 at 04:17:35PM +0800, Huang Ying wrote: > >>On Tue, 2010-11-16 at 10:23 +0800, Huang Ying wrote: > >>>Author: Max Asbock > >>> > >>>Add command x-gpa2hva to translate guest physical address to host > >>>virtual address. Because gpa to hva translation is not consistent, so > >>>this command is only used for debugging. > >>> > >>>The x-gpa2hva command provides one step in a chain of translations from > >>>guest virtual to guest physical to host virtual to host physical. Host > >>>physical is then used to inject a machine check error. As a > >>>consequence the HWPOISON code on the host and the MCE injection code > >>>in qemu-kvm are exercised. > >>> > >>>v3: > >>> > >>>- Rename to x-gpa2hva > >>>- Remove QMP support, because gpa2hva is not consistent > >> > >>Is this patch an acceptable solution for now? This command is useful for > >>our testing. > > > >Anthony? > > Yeah, but it should come through qemu-devel, no? Yes, Huang Ying, can you please resend? -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/5] pci-assign: Clean up assigned_dev_pci_read/write_config
On Thu, 2011-04-28 at 16:46 +0200, Jan Kiszka wrote: > On 2011-04-28 16:29, Alex Williamson wrote: > > On Thu, 2011-04-28 at 10:59 +0200, Jan Kiszka wrote: > >> Use rages_overlap and proper constants to match the access range against > > ^ typo - only if you resend > > > >> regions that need special handling. This also fixes yet uncaught > >> high-byte write access to the command register. Moreover, use more > >> constants instead of magic numbers. > >> > >> Signed-off-by: Jan Kiszka > >> --- > >> hw/device-assignment.c | 39 +-- > >> 1 files changed, 29 insertions(+), 10 deletions(-) > >> > >> diff --git a/hw/device-assignment.c b/hw/device-assignment.c > >> index 606d725..3481c93 100644 > >> --- a/hw/device-assignment.c > >> +++ b/hw/device-assignment.c > >> @@ -404,13 +404,20 @@ static void assigned_dev_pci_write_config(PCIDevice > >> *d, uint32_t address, > >> return assigned_device_pci_cap_write_config(d, address, val, len); > >> } > >> > >> -if (address == 0x4) { > >> +if (ranges_overlap(address, len, PCI_COMMAND, 2)) { > >> pci_default_write_config(d, address, val, len); > >> /* Continue to program the card */ > >> } > >> > >> -if ((address >= 0x10 && address <= 0x24) || address == 0x30 || > >> -address == 0x34 || address == 0x3c || address == 0x3d) { > >> +/* > >> + * Catch access to > >> + * - base address registers > >> + * - ROM base address & capability pointer > >> + * - interrupt line & pin > >> + */ > >> +if (ranges_overlap(address, len, PCI_BASE_ADDRESS_0, 24) || > >> +ranges_overlap(address, len, PCI_ROM_ADDRESS, 8) || > > > > Should this be 5 bytes instead of 8? I'm not sure why we'd add catching > > these reserved fields, but not those immediately after this range. > > Yes, that's asking for clarification: Should we allow direct access to > the complete reserved space or virtualize it? Depending on this, the > proper value should be 5 or 14 (the latter would also save one > ranges_overlap). I vote for 5 here since a cleanup patch shouldn't have behavior changes hidden in it. I don't see any great value in virtualizing reserved bits. It seems like it could only make things not work if a vendor was stupid enough to hide something in there. Thanks, Alex -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/5] pci-assign: Fix dword read at PCI_COMMAND
On Thu, 2011-04-28 at 10:59 +0200, Jan Kiszka wrote: > If we emulate the command register, we must only read its content from > the shadow config space. For dword read of both PCI_COMMAND and > PCI_STATUS, at least the latter must be read from the device. > > For simplicity reasons and as the code path is not considered > performance critical for the affected SRIOV devices, the fix performes > device access to the command word unconditionally, even if emulation is > enabled and only that word is read. > > Signed-off-by: Jan Kiszka > --- > hw/device-assignment.c |8 +--- > 1 files changed, 5 insertions(+), 3 deletions(-) > > diff --git a/hw/device-assignment.c b/hw/device-assignment.c > index f37f108..ee81434 100644 > --- a/hw/device-assignment.c > +++ b/hw/device-assignment.c > @@ -482,14 +482,11 @@ static uint32_t assigned_dev_pci_read_config(PCIDevice > *d, uint32_t address, > /* > * Catch access to > * - vendor & device ID > - * - command register (if emulation needed) > * - base address registers > * - ROM base address & capability pointer > * - interrupt line & pin > */ > if (ranges_overlap(address, len, PCI_VENDOR_ID, 4) || > -(pci_dev->need_emulate_cmd && > - ranges_overlap(address, len, PCI_COMMAND, 2)) || > ranges_overlap(address, len, PCI_BASE_ADDRESS_0, 24) || > ranges_overlap(address, len, PCI_ROM_ADDRESS, 8) || > ranges_overlap(address, len, PCI_INTERRUPT_LINE, 2)) { > @@ -521,6 +518,11 @@ do_log: > DEBUG("(%x.%x): address=%04x val=0x%08x len=%d\n", >(d->devfn >> 3) & 0x1F, (d->devfn & 0x7), address, val, len); > > +if (pci_dev->need_emulate_cmd) { > +val = merge_bits(val, pci_default_read_config(d, PCI_COMMAND, 2), > + address, len, PCI_COMMAND, 0x); Shouldn't this be pci_default_read_config(d, address, len)? I think what you have works since PCI_COMMAND is at the start of a dword, but it violates the merge_bits() assumption that val and mval are from the same address with the same length. Thanks, Alex > +} > + > if (!pci_dev->cap.available) { > /* kill the special capabilities */ > if (address == PCI_COMMAND && len == 4) { -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/5] pci-assign: Clean up assigned_dev_pci_read/write_config
On 2011-04-28 16:29, Alex Williamson wrote: > On Thu, 2011-04-28 at 10:59 +0200, Jan Kiszka wrote: >> Use rages_overlap and proper constants to match the access range against > ^ typo - only if you resend > >> regions that need special handling. This also fixes yet uncaught >> high-byte write access to the command register. Moreover, use more >> constants instead of magic numbers. >> >> Signed-off-by: Jan Kiszka >> --- >> hw/device-assignment.c | 39 +-- >> 1 files changed, 29 insertions(+), 10 deletions(-) >> >> diff --git a/hw/device-assignment.c b/hw/device-assignment.c >> index 606d725..3481c93 100644 >> --- a/hw/device-assignment.c >> +++ b/hw/device-assignment.c >> @@ -404,13 +404,20 @@ static void assigned_dev_pci_write_config(PCIDevice >> *d, uint32_t address, >> return assigned_device_pci_cap_write_config(d, address, val, len); >> } >> >> -if (address == 0x4) { >> +if (ranges_overlap(address, len, PCI_COMMAND, 2)) { >> pci_default_write_config(d, address, val, len); >> /* Continue to program the card */ >> } >> >> -if ((address >= 0x10 && address <= 0x24) || address == 0x30 || >> -address == 0x34 || address == 0x3c || address == 0x3d) { >> +/* >> + * Catch access to >> + * - base address registers >> + * - ROM base address & capability pointer >> + * - interrupt line & pin >> + */ >> +if (ranges_overlap(address, len, PCI_BASE_ADDRESS_0, 24) || >> +ranges_overlap(address, len, PCI_ROM_ADDRESS, 8) || > > Should this be 5 bytes instead of 8? I'm not sure why we'd add catching > these reserved fields, but not those immediately after this range. Yes, that's asking for clarification: Should we allow direct access to the complete reserved space or virtualize it? Depending on this, the proper value should be 5 or 14 (the latter would also save one ranges_overlap). Jan -- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/5] pci-assign: Clean up assigned_dev_pci_read/write_config
On Thu, 2011-04-28 at 10:59 +0200, Jan Kiszka wrote: > Use rages_overlap and proper constants to match the access range against ^ typo - only if you resend > regions that need special handling. This also fixes yet uncaught > high-byte write access to the command register. Moreover, use more > constants instead of magic numbers. > > Signed-off-by: Jan Kiszka > --- > hw/device-assignment.c | 39 +-- > 1 files changed, 29 insertions(+), 10 deletions(-) > > diff --git a/hw/device-assignment.c b/hw/device-assignment.c > index 606d725..3481c93 100644 > --- a/hw/device-assignment.c > +++ b/hw/device-assignment.c > @@ -404,13 +404,20 @@ static void assigned_dev_pci_write_config(PCIDevice *d, > uint32_t address, > return assigned_device_pci_cap_write_config(d, address, val, len); > } > > -if (address == 0x4) { > +if (ranges_overlap(address, len, PCI_COMMAND, 2)) { > pci_default_write_config(d, address, val, len); > /* Continue to program the card */ > } > > -if ((address >= 0x10 && address <= 0x24) || address == 0x30 || > -address == 0x34 || address == 0x3c || address == 0x3d) { > +/* > + * Catch access to > + * - base address registers > + * - ROM base address & capability pointer > + * - interrupt line & pin > + */ > +if (ranges_overlap(address, len, PCI_BASE_ADDRESS_0, 24) || > +ranges_overlap(address, len, PCI_ROM_ADDRESS, 8) || Should this be 5 bytes instead of 8? I'm not sure why we'd add catching these reserved fields, but not those immediately after this range. > +ranges_overlap(address, len, PCI_INTERRUPT_LINE, 2)) { > /* used for update-mappings (BAR emulation) */ > pci_default_write_config(d, address, val, len); > return; > @@ -450,9 +457,20 @@ static uint32_t assigned_dev_pci_read_config(PCIDevice > *d, uint32_t address, > return val; > } > > -if (address < 0x4 || (pci_dev->need_emulate_cmd && address == 0x4) || > - (address >= 0x10 && address <= 0x24) || address == 0x30 || > -address == 0x34 || address == 0x3c || address == 0x3d) { > +/* > + * Catch access to > + * - vendor & device ID > + * - command register (if emulation needed) > + * - base address registers > + * - ROM base address & capability pointer > + * - interrupt line & pin > + */ > +if (ranges_overlap(address, len, PCI_VENDOR_ID, 4) || > +(pci_dev->need_emulate_cmd && > + ranges_overlap(address, len, PCI_COMMAND, 2)) || > +ranges_overlap(address, len, PCI_BASE_ADDRESS_0, 24) || > +ranges_overlap(address, len, PCI_ROM_ADDRESS, 8) || Same here. > +ranges_overlap(address, len, PCI_INTERRUPT_LINE, 2)) { > val = pci_default_read_config(d, address, len); > DEBUG("(%x.%x): address=%04x val=0x%08x len=%d\n", >(d->devfn >> 3) & 0x1F, (d->devfn & 0x7), address, val, len); > @@ -483,10 +501,11 @@ do_log: > > if (!pci_dev->cap.available) { > /* kill the special capabilities */ > -if (address == 4 && len == 4) > -val &= ~0x10; > -else if (address == 6) > -val &= ~0x10; > +if (address == PCI_COMMAND && len == 4) { > +val &= ~(PCI_STATUS_CAP_LIST << 16); > +} else if (address == PCI_STATUS) { > +val &= ~PCI_STATUS_CAP_LIST; > +} > } > > return val; Thanks, Alex -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 2/5] hpet 'driftfix': add driftfix property to HPETState and DeviceInfo
driftfix is a 'bit type' property. Compensation of delayed callbacks and coalesced interrupts can be enabled with the command line option -global hpet.driftfix=on driftfix is 'off' (disabled) by default. Signed-off-by: Ulrich Obergfell --- hw/hpet.c |3 +++ 1 files changed, 3 insertions(+), 0 deletions(-) diff --git a/hw/hpet.c b/hw/hpet.c index 6ce07bc..7513065 100644 --- a/hw/hpet.c +++ b/hw/hpet.c @@ -72,6 +72,8 @@ typedef struct HPETState { uint64_t isr; /* interrupt status reg */ uint64_t hpet_counter; /* main counter */ uint8_t hpet_id; /* instance id */ + +uint32_t driftfix; } HPETState; static uint32_t hpet_in_legacy_mode(HPETState *s) @@ -738,6 +740,7 @@ static SysBusDeviceInfo hpet_device_info = { .qdev.props = (Property[]) { DEFINE_PROP_UINT8("timers", HPETState, num_timers, HPET_MIN_TIMERS), DEFINE_PROP_BIT("msi", HPETState, flags, HPET_MSI_SUPPORT, false), +DEFINE_PROP_BIT("driftfix", HPETState, driftfix, 0, false), DEFINE_PROP_END_OF_LIST(), }, }; -- 1.6.2.5 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 4/5] hpet 'driftfix': add code in update_irq() to detect coalesced interrupts (x86 apic only)
update_irq() uses a similar method as in 'rtc_td_hack' to detect coalesced interrupts. The function entry addresses are retrieved from 'target_get_irq_delivered' and 'target_reset_irq_delivered'. This change can be replaced if a generic feedback infrastructure to track coalesced IRQs for periodic, clock providing devices becomes available. Signed-off-by: Ulrich Obergfell --- hw/hpet.c | 15 +-- 1 files changed, 13 insertions(+), 2 deletions(-) diff --git a/hw/hpet.c b/hw/hpet.c index 7ab6e62..35466ae 100644 --- a/hw/hpet.c +++ b/hw/hpet.c @@ -31,6 +31,7 @@ #include "hpet_emul.h" #include "sysbus.h" #include "mc146818rtc.h" +#include "sysemu.h" //#define HPET_DEBUG #ifdef HPET_DEBUG @@ -175,11 +176,12 @@ static inline uint64_t hpet_calculate_diff(HPETTimer *t, uint64_t current) } } -static void update_irq(struct HPETTimer *timer, int set) +static int update_irq(struct HPETTimer *timer, int set) { uint64_t mask; HPETState *s; int route; +int irq_delivered = 1; if (timer->tn <= 1 && hpet_in_legacy_mode(timer->state)) { /* if LegacyReplacementRoute bit is set, HPET specification requires @@ -204,8 +206,17 @@ static void update_irq(struct HPETTimer *timer, int set) qemu_irq_raise(s->irqs[route]); } else { s->isr &= ~mask; -qemu_irq_pulse(s->irqs[route]); +if (s->driftfix && target_get_irq_delivered +&& target_reset_irq_delivered) { +target_reset_irq_delivered(); +qemu_irq_raise(s->irqs[route]); +irq_delivered = target_get_irq_delivered(); +qemu_irq_lower(s->irqs[route]); +} else { +qemu_irq_pulse(s->irqs[route]); +} } +return irq_delivered; } static void hpet_pre_save(void *opaque) -- 1.6.2.5 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 3/5] hpet 'driftfix': add fields to HPETTimer and VMStateDescription
The new fields in HPETTimer are covered by a separate VMStateDescription which is a subsection of 'vmstate_hpet_timer'. They are only migrated if -global hpet.driftfix=on Signed-off-by: Ulrich Obergfell --- hw/hpet.c | 33 + 1 files changed, 33 insertions(+), 0 deletions(-) diff --git a/hw/hpet.c b/hw/hpet.c index 7513065..7ab6e62 100644 --- a/hw/hpet.c +++ b/hw/hpet.c @@ -55,6 +55,10 @@ typedef struct HPETTimer { /* timers */ uint8_t wrap_flag; /* timer pop will indicate wrap for one-shot 32-bit * mode. Next pop will be actual timer expiration. */ +uint64_t prev_period; +uint64_t ticks_not_accounted; +uint32_t irq_rate; +uint32_t divisor; } HPETTimer; typedef struct HPETState { @@ -246,6 +250,27 @@ static int hpet_post_load(void *opaque, int version_id) return 0; } +static bool hpet_timer_driftfix_vmstate_needed(void *opaque) +{ +HPETTimer *t = opaque; + +return (t->state->driftfix != 0); +} + +static const VMStateDescription vmstate_hpet_timer_driftfix = { +.name = "hpet_timer_driftfix", +.version_id = 1, +.minimum_version_id = 1, +.minimum_version_id_old = 1, +.fields = (VMStateField []) { +VMSTATE_UINT64(prev_period, HPETTimer), +VMSTATE_UINT64(ticks_not_accounted, HPETTimer), +VMSTATE_UINT32(irq_rate, HPETTimer), +VMSTATE_UINT32(divisor, HPETTimer), +VMSTATE_END_OF_LIST() +} +}; + static const VMStateDescription vmstate_hpet_timer = { .name = "hpet_timer", .version_id = 1, @@ -260,6 +285,14 @@ static const VMStateDescription vmstate_hpet_timer = { VMSTATE_UINT8(wrap_flag, HPETTimer), VMSTATE_TIMER(qemu_timer, HPETTimer), VMSTATE_END_OF_LIST() +}, +.subsections = (VMStateSubsection []) { +{ +.vmsd = &vmstate_hpet_timer_driftfix, +.needed = hpet_timer_driftfix_vmstate_needed, +}, { +/* empty */ +} } }; -- 1.6.2.5 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 0/5] hpet 'driftfix': alleviate time drift with HPET periodic timers
Hi, This is version 3 of a series of patches that I originally posted in: http://lists.gnu.org/archive/html/qemu-devel/2011-03/msg01989.html http://lists.gnu.org/archive/html/qemu-devel/2011-03/msg01992.html http://lists.gnu.org/archive/html/qemu-devel/2011-03/msg01991.html http://lists.gnu.org/archive/html/qemu-devel/2011-03/msg01990.html http://article.gmane.org/gmane.comp.emulators.kvm.devel/69325 http://article.gmane.org/gmane.comp.emulators.kvm.devel/69326 http://article.gmane.org/gmane.comp.emulators.kvm.devel/69327 http://article.gmane.org/gmane.comp.emulators.kvm.devel/69328 Changes since version 2: - The vmstate related to 'driftfix' is now in a separate subsection that can be migrated conditionally. - The new fields of the HPETTimer structure are no longer initialized in hpet_init(). This is now done in hpet_reset() only. - Compensation of lost timer interrupts is now based on a backlog of unaccounted HPET clock periods instead of 'irqs_to_inject'. This eliminates the need to scale 'irqs_to_inject' when a guest o/s modifies the comparator register value. Please review and please comment. Regards, Uli Ulrich Obergfell (5): hpet 'driftfix': add hooks required to detect coalesced interrupts (x86 apic only) hpet 'driftfix': add driftfix property to HPETState and DeviceInfo hpet 'driftfix': add fields to HPETTimer and VMStateDescription hpet 'driftfix': add code in update_irq() to detect coalesced interrupts (x86 apic only) hpet 'driftfix': add code in hpet_timer() to compensate delayed callbacks and coalesced interrupts hw/apic.c |4 ++ hw/hpet.c | 114 ++-- sysemu.h |3 ++ vl.c |3 ++ 4 files changed, 120 insertions(+), 4 deletions(-) -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 1/5] hpet 'driftfix': add hooks required to detect coalesced interrupts (x86 apic only)
'target_get_irq_delivered' and 'target_reset_irq_delivered' contain entry addresses of functions that are utilized by update_irq() to detect coalesced interrupts. apic code loads these pointers during initialization. This change can be replaced if a generic feedback infrastructure to track coalesced IRQs for periodic, clock providing devices becomes available. Signed-off-by: Ulrich Obergfell --- hw/apic.c |4 sysemu.h |3 +++ vl.c |3 +++ 3 files changed, 10 insertions(+), 0 deletions(-) diff --git a/hw/apic.c b/hw/apic.c index a45b57f..eb0f6d9 100644 --- a/hw/apic.c +++ b/hw/apic.c @@ -24,6 +24,7 @@ #include "sysbus.h" #include "trace.h" #include "kvm.h" +#include "sysemu.h" /* APIC Local Vector Table */ #define APIC_LVT_TIMER 0 @@ -1143,6 +1144,9 @@ static SysBusDeviceInfo apic_info = { static void apic_register_devices(void) { +target_get_irq_delivered = apic_get_irq_delivered; +target_reset_irq_delivered = apic_reset_irq_delivered; + sysbus_register_withprop(&apic_info); } diff --git a/sysemu.h b/sysemu.h index 07d85cd..75b0139 100644 --- a/sysemu.h +++ b/sysemu.h @@ -98,6 +98,9 @@ int qemu_savevm_state_complete(Monitor *mon, QEMUFile *f); void qemu_savevm_state_cancel(Monitor *mon, QEMUFile *f); int qemu_loadvm_state(QEMUFile *f); +extern int (*target_get_irq_delivered)(void); +extern void (*target_reset_irq_delivered)(void); + /* SLIRP */ void do_info_slirp(Monitor *mon); diff --git a/vl.c b/vl.c index 0c24e07..7bab315 100644 --- a/vl.c +++ b/vl.c @@ -233,6 +233,9 @@ const char *prom_envs[MAX_PROM_ENVS]; const char *nvram = NULL; int boot_menu; +int (*target_get_irq_delivered)(void) = 0; +void (*target_reset_irq_delivered)(void) = 0; + typedef struct FWBootEntry FWBootEntry; struct FWBootEntry { -- 1.6.2.5 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 5/5] hpet 'driftfix': add code in hpet_timer() to compensate delayed callbacks and coalesced interrupts
Loss of periodic timer interrupts caused by delayed callbacks and by interrupt coalescing is compensated by gradually injecting additional interrupts during subsequent timer intervals, starting at a rate of one additional interrupt per interval. The injection of additional interrupts is based on a backlog of unaccounted HPET clock periods (new HPETTimer field 'ticks_not_accounted'). The backlog increases due to delayed callbacks and coalesced interrupts, and it decreases if an interrupt was injected successfully. If the backlog increases while compensation is still in progress, the rate at which additional interrupts are injected is increased too. A limit is imposed on the backlog and on the rate. Injecting additional timer interrupts to compensate lost interrupts can alleviate long term time drift. However, on a short time scale, this method can have the side effect of making virtual machine time intermittently pass slower and faster than real time (depending on the guest's time keeping algorithm). Compensation is disabled by default and can be enabled for guests where this behaviour may be acceptable. Signed-off-by: Ulrich Obergfell --- hw/hpet.c | 63 +++- 1 files changed, 61 insertions(+), 2 deletions(-) diff --git a/hw/hpet.c b/hw/hpet.c index 35466ae..92d5f58 100644 --- a/hw/hpet.c +++ b/hw/hpet.c @@ -32,6 +32,7 @@ #include "sysbus.h" #include "mc146818rtc.h" #include "sysemu.h" +#include //#define HPET_DEBUG #ifdef HPET_DEBUG @@ -42,6 +43,9 @@ #define HPET_MSI_SUPPORT0 +#define MAX_TICKS_NOT_ACCOUNTED (uint64_t)5 /* 5 sec */ +#define MAX_IRQ_RATE(uint32_t)10 + struct HPETState; typedef struct HPETTimer { /* timers */ uint8_t tn; /*timer number*/ @@ -326,28 +330,63 @@ static const VMStateDescription vmstate_hpet = { } }; +static bool hpet_timer_has_tick_backlog(HPETTimer *t) +{ +uint64_t backlog = t->ticks_not_accounted - (t->period + t->prev_period); +return (backlog >= t->period); +} + /* * timer expiration callback */ static void hpet_timer(void *opaque) { HPETTimer *t = opaque; +HPETState *s = t->state; uint64_t diff; - +int irq_delivered = 0; +uint32_t irq_count = 0; uint64_t period = t->period; uint64_t cur_tick = hpet_get_ticks(t->state); +if (s->driftfix && !t->ticks_not_accounted) { +t->ticks_not_accounted = t->prev_period = t->period; +} if (timer_is_periodic(t) && period != 0) { if (t->config & HPET_TN_32BIT) { while (hpet_time_after(cur_tick, t->cmp)) { t->cmp = (uint32_t)(t->cmp + t->period); +t->ticks_not_accounted += t->period; +irq_count++; } } else { while (hpet_time_after64(cur_tick, t->cmp)) { t->cmp += period; +t->ticks_not_accounted += period; +irq_count++; } } diff = hpet_calculate_diff(t, cur_tick); +if (s->driftfix) { +if (t->ticks_not_accounted > MAX_TICKS_NOT_ACCOUNTED) { +t->ticks_not_accounted = t->period + t->prev_period; +} +if (hpet_timer_has_tick_backlog(t)) { +if (t->irq_rate == 1 || irq_count > 1) { +t->irq_rate++; +t->irq_rate = MIN(t->irq_rate, MAX_IRQ_RATE); +} +if (t->divisor == 0) { +assert(irq_count); +} +if (irq_count) { +t->divisor = t->irq_rate; +} +diff /= t->divisor--; +} else { +t->irq_rate = 1; +} +} qemu_mod_timer(t->qemu_timer, qemu_get_clock_ns(vm_clock) + (int64_t)ticks_to_ns(diff)); } else if (t->config & HPET_TN_32BIT && !timer_is_periodic(t)) { @@ -358,7 +397,22 @@ static void hpet_timer(void *opaque) t->wrap_flag = 0; } } -update_irq(t, 1); +if (s->driftfix && timer_is_periodic(t) && period != 0) { +if (t->ticks_not_accounted >= t->period + t->prev_period) { +irq_delivered = update_irq(t, 1); +if (irq_delivered) { +t->ticks_not_accounted -= t->prev_period; +t->prev_period = t->period; +} else { +if (irq_count) { +t->irq_rate++; +t->irq_rate = MIN(t->irq_rate, MAX_IRQ_RATE); +} +} +} +} else { +update_irq(t, 1); +} } static void hpet_set_timer(HPETTimer *t) @@ -697,6 +751,11 @@ static void hpet_reset(DeviceState *d) timer->config |= 0x0004ULL << 32; timer->period = 0ULL; timer->wrap_flag = 0; + +timer->prev_period = 0; +timer->ticks_not_accounted = 0; +
Re: nmi is broken?
OGAWA Hirofumi writes: > Avi Kivity writes: > >>> This seems to be complex stuff depending on hardware configurations. I'm >>> not fully understanding though, current state of it is, >>> >>> Yes, PIC is in AEOI mode if linux is using IO-APIC. Um.., kvm says >>> irq == 0 is mp_INT mode in MADT, not mp_ExtINT. >> >> That is correct, kvm doesn't connect the master 8259 output to the >> IOAPIC. Instead the 8259 is connected to LINT0 (which is configured for >> ExtInt when the IOAPIC is disabled, or for NMI which the NMI watchdog is >> enabled). >> >> However, now I can't see how it would work. auto EOI works on the INTA >> cycle, which would only occur if LINT0 is configured for ExtInt. If it >> is configured for NMI, I don't think it would issue the INTA cycle. So >> the NMI watchdog not working is actually correct for our hardware >> configuration! >> >> But I may be misunderstanding something here. > > I see. If the physical machine was configured as above, I guess (not > pretty sure, I don't have this configuration machine), IOAPIC test > (check_timer() in io_apic.c) should fail, and IOAPIC wouldn't have any > effect. And I think MADT should tell mp_ExtINT. > > Yes, I also guess the above configuration wouldn't work NMI watchdog of > IOAPIC mode, and linux will report as NMI watchdog can't work in > check_timer(). Hm.., if smp was enabled, what configuration model is used by kvm? I think this configuration model can't work on smp. Thanks. -- OGAWA Hirofumi -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: nmi is broken?
Avi Kivity writes: >> This seems to be complex stuff depending on hardware configurations. I'm >> not fully understanding though, current state of it is, >> >> Yes, PIC is in AEOI mode if linux is using IO-APIC. Um.., kvm says >> irq == 0 is mp_INT mode in MADT, not mp_ExtINT. > > That is correct, kvm doesn't connect the master 8259 output to the > IOAPIC. Instead the 8259 is connected to LINT0 (which is configured for > ExtInt when the IOAPIC is disabled, or for NMI which the NMI watchdog is > enabled). > > However, now I can't see how it would work. auto EOI works on the INTA > cycle, which would only occur if LINT0 is configured for ExtInt. If it > is configured for NMI, I don't think it would issue the INTA cycle. So > the NMI watchdog not working is actually correct for our hardware > configuration! > > But I may be misunderstanding something here. I see. If the physical machine was configured as above, I guess (not pretty sure, I don't have this configuration machine), IOAPIC test (check_timer() in io_apic.c) should fail, and IOAPIC wouldn't have any effect. And I think MADT should tell mp_ExtINT. Yes, I also guess the above configuration wouldn't work NMI watchdog of IOAPIC mode, and linux will report as NMI watchdog can't work in check_timer(). >> So I guess system does >> automatically INTA cycle (and AEOI because of PIC config), or not >> connected via 8259A? (like in mpspec figure 5-2.) >> >> To checking it, I've tested in check_timer() of linux on the physical >> machine (irq==0 and mp_INT). The test is something like, >> >> if (pin1 != -1) { >> /* >> * Ok, does IRQ0 through the IOAPIC work? >> */ >> unmask_IO_APIC_irq(0); >> disable_8259_irq(0); >> if (timer_irq_works()) { >> >> even if I called disable_8259_irq(0), timer was still working via >> IO-APIC. >> >> Would this explain why? > > Sorry, I got lost - what does this explain? I think this explains irq == 0 and mp_INT configuration tell PIT is connected to both of IOAPIC pin2 and PIC pin0, and it is why timer interrupt is working even when PIC pin0 was disabled. Thanks. -- OGAWA Hirofumi -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 6/6] kvm tools: Use threadpool for virtio-net
virtio-net has been converted to use the threadpool. This is very similar to the change done in virtio-blk, only here we had 2 queues to handle. Signed-off-by: Sasha Levin --- tools/kvm/virtio-net.c | 101 1 files changed, 25 insertions(+), 76 deletions(-) diff --git a/tools/kvm/virtio-net.c b/tools/kvm/virtio-net.c index 3e13429..58b3de4 100644 --- a/tools/kvm/virtio-net.c +++ b/tools/kvm/virtio-net.c @@ -7,6 +7,7 @@ #include "kvm/util.h" #include "kvm/kvm.h" #include "kvm/pci.h" +#include "kvm/threadpool.h" #include #include @@ -40,16 +41,9 @@ struct net_device { uint8_t status; uint16_tqueue_selector; - pthread_t io_rx_thread; - pthread_mutex_t io_rx_mutex; - pthread_cond_t io_rx_cond; - - pthread_t io_tx_thread; - pthread_mutex_t io_tx_mutex; - pthread_cond_t io_tx_cond; - int tap_fd; chartap_name[IFNAMSIZ]; + void*jobs[VIRTIO_NET_NUM_QUEUES]; }; static struct net_device net_device = { @@ -69,70 +63,44 @@ static struct net_device net_device = { 1UL << VIRTIO_NET_F_GUEST_TSO6, }; -static void *virtio_net_rx_thread(void *p) +static void virtio_net_rx_callback(struct kvm *self, void *param) { struct iovec iov[VIRTIO_NET_QUEUE_SIZE]; struct virt_queue *vq; - struct kvm *self; uint16_t out, in; uint16_t head; int len; - self = p; - vq = &net_device.vqs[VIRTIO_NET_RX_QUEUE]; - - while (1) { - mutex_lock(&net_device.io_rx_mutex); - if (!virt_queue__available(vq)) - pthread_cond_wait(&net_device.io_rx_cond, &net_device.io_rx_mutex); - mutex_unlock(&net_device.io_rx_mutex); - - while (virt_queue__available(vq)) { - head = virt_queue__get_iov(vq, iov, &out, &in, self); - len = readv(net_device.tap_fd, iov, in); - virt_queue__set_used_elem(vq, head, len); - /* We should interrupt guest right now, otherwise latency is huge. */ - kvm__irq_line(self, VIRTIO_NET_IRQ, 1); - } + vq = param; + while (virt_queue__available(vq)) { + head = virt_queue__get_iov(vq, iov, &out, &in, self); + len = readv(net_device.tap_fd, iov, in); + virt_queue__set_used_elem(vq, head, len); } - pthread_exit(NULL); - return NULL; - + kvm__irq_line(self, VIRTIO_NET_IRQ, 1); } -static void *virtio_net_tx_thread(void *p) +static void virtio_net_tx_callback(struct kvm *self, void *param) { struct iovec iov[VIRTIO_NET_QUEUE_SIZE]; struct virt_queue *vq; - struct kvm *self; uint16_t out, in; uint16_t head; int len; - self = p; - vq = &net_device.vqs[VIRTIO_NET_TX_QUEUE]; - - while (1) { - mutex_lock(&net_device.io_tx_mutex); - if (!virt_queue__available(vq)) - pthread_cond_wait(&net_device.io_tx_cond, &net_device.io_tx_mutex); - mutex_unlock(&net_device.io_tx_mutex); + vq = param; - while (virt_queue__available(vq)) { - head = virt_queue__get_iov(vq, iov, &out, &in, self); - len = writev(net_device.tap_fd, iov, out); - virt_queue__set_used_elem(vq, head, len); - } - - kvm__irq_line(self, VIRTIO_NET_IRQ, 1); + while (virt_queue__available(vq)) { + head = virt_queue__get_iov(vq, iov, &out, &in, self); + len = writev(net_device.tap_fd, iov, out); + virt_queue__set_used_elem(vq, head, len); } - pthread_exit(NULL); - return NULL; - + kvm__irq_line(self, VIRTIO_NET_IRQ, 1); } + static bool virtio_net_pci_io_device_specific_in(void *data, unsigned long offset, int size, uint32_t count) { uint8_t *config_space = (uint8_t *) &net_device.net_config; @@ -193,19 +161,7 @@ static bool virtio_net_pci_io_in(struct kvm *self, uint16_t port, void *data, in static void virtio_net_handle_callback(struct kvm *self, uint16_t queue_index) { - if (queue_index == VIRTIO_NET_TX_QUEUE) { - - mutex_lock(&net_device.io_tx_mutex); - pthread_cond_signal(&net_device.io_tx_cond); - mutex_unlock(&net_device.io_tx_mutex); - - } else if (queue_index == VIRTIO_NET_RX_QUEUE) { - - mutex_lock(&net_device.io_rx_mutex); - pthread_cond_signal(&net_device.io_rx_cond); - mutex_unlock(&net_device.
[PATCH 3/6] kvm tools: Introduce generic IO threadpool
This patch adds a generic pool to create a common interface for working with threads within the kvm tool. Main idea here is using this threadpool for all I/O threads instead of having every I/O module write it's own thread code. The process of working with the thread pool is supposed to be very simple. During initialization, Each module which is interested in working with the threadpool will call threadpool__add_jobtype with the callback function and a void* parameter. For example, virtio modules will register every virt_queue as a new job type. During operation, When theres work to do for a specific job, the module will signal it to the queue and would expect the callback to be called with proper parameters. It is assured that the callback will be called once for every signal action and each callback will be called only once at a time (i.e. callback functions themselves don't need to handle threading). Signed-off-by: Sasha Levin --- tools/kvm/Makefile |1 + tools/kvm/include/kvm/threadpool.h | 16 tools/kvm/kvm-run.c|5 + tools/kvm/threadpool.c | 171 4 files changed, 193 insertions(+), 0 deletions(-) create mode 100644 tools/kvm/include/kvm/threadpool.h create mode 100644 tools/kvm/threadpool.c diff --git a/tools/kvm/Makefile b/tools/kvm/Makefile index 1b0c76e..fbce14d 100644 --- a/tools/kvm/Makefile +++ b/tools/kvm/Makefile @@ -36,6 +36,7 @@ OBJS+= kvm-cmd.o OBJS+= kvm-run.o OBJS+= qcow.o OBJS+= mptable.o +OBJS+= threadpool.o DEPS := $(patsubst %.o,%.d,$(OBJS)) diff --git a/tools/kvm/include/kvm/threadpool.h b/tools/kvm/include/kvm/threadpool.h new file mode 100644 index 000..25b5eb8 --- /dev/null +++ b/tools/kvm/include/kvm/threadpool.h @@ -0,0 +1,16 @@ +#ifndef KVM__THREADPOOL_H +#define KVM__THREADPOOL_H + +#include + +struct kvm; + +typedef void (*kvm_thread_callback_fn_t)(struct kvm *kvm, void *data); + +int thread_pool__init(unsigned long thread_count); + +void *thread_pool__add_jobtype(struct kvm *kvm, kvm_thread_callback_fn_t callback, void *data); + +void thread_pool__signal_work(void *job); + +#endif diff --git a/tools/kvm/kvm-run.c b/tools/kvm/kvm-run.c index 071157a..97a17dd 100644 --- a/tools/kvm/kvm-run.c +++ b/tools/kvm/kvm-run.c @@ -24,6 +24,7 @@ #include #include #include +#include /* header files for gitish interface */ #include @@ -312,6 +313,7 @@ int kvm_cmd_run(int argc, const char **argv, const char *prefix) int i; struct virtio_net_parameters net_params; char *hi; + unsigned int nr_online_cpus; signal(SIGALRM, handle_sigalrm); signal(SIGQUIT, handle_sigquit); @@ -457,6 +459,9 @@ int kvm_cmd_run(int argc, const char **argv, const char *prefix) kvm__init_ram(kvm); + nr_online_cpus = sysconf(_SC_NPROCESSORS_ONLN); + thread_pool__init(nr_online_cpus); + for (i = 0; i < nrcpus; i++) { if (pthread_create(&kvm_cpus[i]->thread, NULL, kvm_cpu_thread, kvm_cpus[i]) != 0) die("unable to create KVM VCPU thread"); diff --git a/tools/kvm/threadpool.c b/tools/kvm/threadpool.c new file mode 100644 index 000..e78db3a --- /dev/null +++ b/tools/kvm/threadpool.c @@ -0,0 +1,171 @@ +#include "kvm/threadpool.h" +#include "kvm/mutex.h" + +#include +#include +#include +#include + +struct thread_pool__job_info { + kvm_thread_callback_fn_t callback; + struct kvm *kvm; + void *data; + + int signalcount; + pthread_mutex_t mutex; + + struct list_head queue; +}; + +static pthread_mutex_t job_mutex = PTHREAD_MUTEX_INITIALIZER; +static pthread_mutex_t thread_mutex= PTHREAD_MUTEX_INITIALIZER; +static pthread_cond_t job_cond= PTHREAD_COND_INITIALIZER; + +static LIST_HEAD(head); + +static pthread_t *threads; +static longthreadcount; + +static struct thread_pool__job_info *thread_pool__job_info_pop(void) +{ + struct thread_pool__job_info *job; + + if (list_empty(&head)) + return NULL; + + job = list_first_entry(&head, struct thread_pool__job_info, queue); + list_del(&job->queue); + + return job; +} + +static void thread_pool__job_info_push(struct thread_pool__job_info *job) +{ + list_add_tail(&job->queue, &head); +} + +static struct thread_pool__job_info *thread_pool__job_info_pop_locked(void) +{ + struct thread_pool__job_info *job; + + mutex_lock(&job_mutex); + job = thread_pool__job_info_pop(); + mutex_unlock(&job_mutex); + return job; +} + +static void thread_pool__job_info_push_locked(struct thread_pool__job_info *job) +{ + mutex_lock(&job_mutex); + thread_pool__job_info_push(job); + mutex_unlock(&job_mutex); +} + +static void thread_pool__handle_job(struct thread_pool__job_info *job) +{ + while (job) { + job->cal
[PATCH 5/6] kvm tools: Use threadpool for virtio-console.
This is very similar to the change done in virtio-net. Notice that one signal here comes from outside the module (actual terminal) while the other one is generated by the virtio module. Signed-off-by: Sasha Levin --- tools/kvm/virtio-console.c | 40 ++-- 1 files changed, 26 insertions(+), 14 deletions(-) diff --git a/tools/kvm/virtio-console.c b/tools/kvm/virtio-console.c index f11ce4e..e66d198 100644 --- a/tools/kvm/virtio-console.c +++ b/tools/kvm/virtio-console.c @@ -8,6 +8,7 @@ #include "kvm/mutex.h" #include "kvm/kvm.h" #include "kvm/pci.h" +#include "kvm/threadpool.h" #include #include @@ -41,6 +42,8 @@ struct console_device { uint16_tconfig_vector; uint8_t status; uint16_tqueue_selector; + + void*jobs[VIRTIO_CONSOLE_NUM_QUEUES]; }; static struct console_device console_device = { @@ -58,7 +61,7 @@ static struct console_device console_device = { /* * Interrupts are injected for hvc0 only. */ -void virtio_console__inject_interrupt(struct kvm *self) +static void virtio_console__inject_interrupt_callback(struct kvm *self, void *param) { struct iovec iov[VIRTIO_CONSOLE_QUEUE_SIZE]; struct virt_queue *vq; @@ -68,7 +71,7 @@ void virtio_console__inject_interrupt(struct kvm *self) mutex_lock(&console_device.mutex); - vq = &console_device.vqs[VIRTIO_CONSOLE_RX_QUEUE]; + vq = param; if (term_readable(CONSOLE_VIRTIO) && virt_queue__available(vq)) { head = virt_queue__get_iov(vq, iov, &out, &in, self); @@ -80,6 +83,11 @@ void virtio_console__inject_interrupt(struct kvm *self) mutex_unlock(&console_device.mutex); } +void virtio_console__inject_interrupt(struct kvm *self) +{ + thread_pool__signal_work(console_device.jobs[VIRTIO_CONSOLE_RX_QUEUE]); +} + static bool virtio_console_pci_io_device_specific_in(void *data, unsigned long offset, int size, uint32_t count) { uint8_t *config_space = (uint8_t *) &console_device.console_config; @@ -138,7 +146,7 @@ static bool virtio_console_pci_io_in(struct kvm *self, uint16_t port, void *data return ret; } -static void virtio_console_handle_callback(struct kvm *self, uint16_t queue_index) +static void virtio_console_handle_callback(struct kvm *self, void *param) { struct iovec iov[VIRTIO_CONSOLE_QUEUE_SIZE]; struct virt_queue *vq; @@ -146,18 +154,15 @@ static void virtio_console_handle_callback(struct kvm *self, uint16_t queue_inde uint16_t head; uint32_t len; - vq = &console_device.vqs[queue_index]; - - if (queue_index == VIRTIO_CONSOLE_TX_QUEUE) { + vq = param; - while (virt_queue__available(vq)) { - head = virt_queue__get_iov(vq, iov, &out, &in, self); - len = term_putc_iov(CONSOLE_VIRTIO, iov, out); - virt_queue__set_used_elem(vq, head, len); - } - - kvm__irq_line(self, VIRTIO_CONSOLE_IRQ, 1); + while (virt_queue__available(vq)) { + head = virt_queue__get_iov(vq, iov, &out, &in, self); + len = term_putc_iov(CONSOLE_VIRTIO, iov, out); + virt_queue__set_used_elem(vq, head, len); } + + kvm__irq_line(self, VIRTIO_CONSOLE_IRQ, 1); } static bool virtio_console_pci_io_out(struct kvm *self, uint16_t port, void *data, int size, uint32_t count) @@ -183,6 +188,13 @@ static bool virtio_console_pci_io_out(struct kvm *self, uint16_t port, void *dat vring_init(&queue->vring, VIRTIO_CONSOLE_QUEUE_SIZE, p, 4096); + if (console_device.queue_selector == VIRTIO_CONSOLE_TX_QUEUE) + console_device.jobs[console_device.queue_selector] = + thread_pool__add_jobtype(self, virtio_console_handle_callback, queue); + else if (console_device.queue_selector == VIRTIO_CONSOLE_RX_QUEUE) + console_device.jobs[console_device.queue_selector] = + thread_pool__add_jobtype(self, virtio_console__inject_interrupt_callback, queue); + break; } case VIRTIO_PCI_QUEUE_SEL: @@ -191,7 +203,7 @@ static bool virtio_console_pci_io_out(struct kvm *self, uint16_t port, void *dat case VIRTIO_PCI_QUEUE_NOTIFY: { uint16_t queue_index; queue_index = ioport__read16(data); - virtio_console_handle_callback(self, queue_index); + thread_pool__signal_work(console_device.jobs[queue_index]); break; } case VIRTIO_PCI_STATUS: -- 1.7.5.rc3 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 4/6] kvm tools: Use threadpool for virtio-blk
virtio-blk has been converted to use the threadpool. All the threading code has been removed, which left only simple callback handling code. New threadpool job types are created within VIRTIO_PCI_QUEUE_PFN for every queue (just one in the case of virtio-blk). The module signals for work after receiving VIRTIO_PCI_QUEUE_NOTIFY and expects the threadpool to call virtio_blk_do_io to handle the I/O. It is possible that the module will signal work several times while virtio_blk_do_io is already working, but there is no need to handle multithreading there since the threadpool will call each job in linear and not in parallel. Signed-off-by: Sasha Levin --- tools/kvm/virtio-blk.c | 86 +++- 1 files changed, 12 insertions(+), 74 deletions(-) diff --git a/tools/kvm/virtio-blk.c b/tools/kvm/virtio-blk.c index 3516b1c..3feabd0 100644 --- a/tools/kvm/virtio-blk.c +++ b/tools/kvm/virtio-blk.c @@ -9,6 +9,7 @@ #include "kvm/util.h" #include "kvm/kvm.h" #include "kvm/pci.h" +#include "kvm/threadpool.h" #include #include @@ -31,15 +32,13 @@ struct blk_device { uint32_tguest_features; uint16_tconfig_vector; uint8_t status; - pthread_t io_thread; - pthread_mutex_t io_mutex; - pthread_cond_t io_cond; /* virtio queue */ uint16_tqueue_selector; - uint64_tvirtio_blk_queue_set_flags; struct virt_queue vqs[NUM_VIRT_QUEUES]; + + void*jobs[NUM_VIRT_QUEUES]; }; #define DISK_SEG_MAX 126 @@ -57,9 +56,6 @@ static struct blk_device blk_device = { * same applies to VIRTIO_BLK_F_BLK_SIZE */ .host_features = (1UL << VIRTIO_BLK_F_SEG_MAX), - - .io_mutex = PTHREAD_MUTEX_INITIALIZER, - .io_cond= PTHREAD_COND_INITIALIZER }; static bool virtio_blk_pci_io_device_specific_in(void *data, unsigned long offset, int size, uint32_t count) @@ -156,73 +152,14 @@ static bool virtio_blk_do_io_request(struct kvm *self, struct virt_queue *queue) return true; } - - -static int virtio_blk_get_selected_queue(struct blk_device *dev) -{ - int i; - - for (i = 0 ; i < NUM_VIRT_QUEUES ; i++) { - if (dev->virtio_blk_queue_set_flags & (1 << i)) { - dev->virtio_blk_queue_set_flags &= ~(1 << i); - return i; - } - } - - return -1; -} - -static void virtio_blk_do_io(struct kvm *kvm, struct blk_device *dev) +static void virtio_blk_do_io(struct kvm *kvm, void *param) { - for (;;) { - struct virt_queue *vq; - int queue_index; - - mutex_lock(&dev->io_mutex); - queue_index = virtio_blk_get_selected_queue(dev); - mutex_unlock(&dev->io_mutex); - - if (queue_index < 0) - break; + struct virt_queue *vq = param; - vq = &dev->vqs[queue_index]; + while (virt_queue__available(vq)) + virtio_blk_do_io_request(kvm, vq); - while (virt_queue__available(vq)) - virtio_blk_do_io_request(kvm, vq); - - kvm__irq_line(kvm, VIRTIO_BLK_IRQ, 1); - } -} - -static void *virtio_blk_io_thread(void *ptr) -{ - struct kvm *self = ptr; - - for (;;) { - int ret; - - mutex_lock(&blk_device.io_mutex); - ret = pthread_cond_wait(&blk_device.io_cond, &blk_device.io_mutex); - mutex_unlock(&blk_device.io_mutex); - - if (ret != 0) - break; - - virtio_blk_do_io(self, &blk_device); - } - - return NULL; -} - -static void virtio_blk_handle_callback(struct blk_device *dev, uint16_t queue_index) -{ - mutex_lock(&dev->io_mutex); - - dev->virtio_blk_queue_set_flags |= (1 << queue_index); - - mutex_unlock(&dev->io_mutex); - - pthread_cond_signal(&dev->io_cond); + kvm__irq_line(kvm, VIRTIO_BLK_IRQ, 1); } static bool virtio_blk_pci_io_out(struct kvm *self, uint16_t port, void *data, int size, uint32_t count) @@ -250,6 +187,9 @@ static bool virtio_blk_pci_io_out(struct kvm *self, uint16_t port, void *data, i vring_init(&queue->vring, VIRTIO_BLK_QUEUE_SIZE, p, 4096); + blk_device.jobs[blk_device.queue_selector] = + thread_pool__add_jobtype(self, virtio_blk_do_io, queue); + break; } case VIRTIO_PCI_QUEUE_SEL: @@ -258,7 +198,7 @@ static bool virtio_blk_pci_io_out(struct kvm *self, uint16_t port, void *data, i case VIRTIO_PCI_QUEUE_NOTIFY: { uint16_t queue_index; queue_index = i
[PATCH 2/6] kvm tools: Add kernel headers required for using list
Adds kernel headers so that (and others) could be included directly. Signed-off-by: Sasha Levin --- tools/kvm/include/linux/kernel.h | 26 ++ tools/kvm/include/linux/prefetch.h |6 ++ tools/kvm/include/linux/types.h| 12 3 files changed, 44 insertions(+), 0 deletions(-) create mode 100644 tools/kvm/include/linux/kernel.h create mode 100644 tools/kvm/include/linux/prefetch.h diff --git a/tools/kvm/include/linux/kernel.h b/tools/kvm/include/linux/kernel.h new file mode 100644 index 000..8d83037 --- /dev/null +++ b/tools/kvm/include/linux/kernel.h @@ -0,0 +1,26 @@ +#ifndef KVM__LINUX_KERNEL_H_ +#define KVM__LINUX_KERNEL_H_ + +#define DIV_ROUND_UP(n,d) (((n) + (d) - 1) / (d)) + +#define ALIGN(x,a) __ALIGN_MASK(x,(typeof(x))(a)-1) +#define __ALIGN_MASK(x,mask) (((x)+(mask))&~(mask)) + +#ifndef offsetof +#define offsetof(TYPE, MEMBER) ((size_t) &((TYPE *)0)->MEMBER) +#endif + +#ifndef container_of +/** + * container_of - cast a member of a structure out to the containing structure + * @ptr: the pointer to the member. + * @type: the type of the container struct this is embedded in. + * @member:the name of the member within the struct. + * + */ +#define container_of(ptr, type, member) ({ \ + const typeof(((type *)0)->member) * __mptr = (ptr); \ + (type *)((char *)__mptr - offsetof(type, member)); }) +#endif + +#endif diff --git a/tools/kvm/include/linux/prefetch.h b/tools/kvm/include/linux/prefetch.h new file mode 100644 index 000..62f6788 --- /dev/null +++ b/tools/kvm/include/linux/prefetch.h @@ -0,0 +1,6 @@ +#ifndef KVM__LINUX_PREFETCH_H +#define KVM__LINUX_PREFETCH_H + +static inline void prefetch(void *a __attribute__((unused))) { } + +#endif diff --git a/tools/kvm/include/linux/types.h b/tools/kvm/include/linux/types.h index efd8519..c7c444e 100644 --- a/tools/kvm/include/linux/types.h +++ b/tools/kvm/include/linux/types.h @@ -46,4 +46,16 @@ typedef __u32 __bitwise __be32; typedef __u64 __bitwise __le64; typedef __u64 __bitwise __be64; +struct list_head { + struct list_head *next, *prev; +}; + +struct hlist_head { + struct hlist_node *first; +}; + +struct hlist_node { + struct hlist_node *next, **pprev; +}; + #endif /* LINUX_TYPES_H */ -- 1.7.5.rc3 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/6] kvm tools: Prevent duplicate definitions of ALIGN
Signed-off-by: Sasha Levin --- tools/kvm/include/kvm/bios.h |2 ++ 1 files changed, 2 insertions(+), 0 deletions(-) diff --git a/tools/kvm/include/kvm/bios.h b/tools/kvm/include/kvm/bios.h index dd70c44..914720b 100644 --- a/tools/kvm/include/kvm/bios.h +++ b/tools/kvm/include/kvm/bios.h @@ -51,8 +51,10 @@ #define MB_BIOS_SS 0xfff7 #define MB_BIOS_SP 0x40 +#ifndef ALIGN #define ALIGN(x, a)\ (((x) + ((a) - 1)) & ~((a) - 1)) +#endif /* * note we use 16 bytes alignment which makes segment based -- 1.7.5.rc3 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] KVM: VMX: Avoid reading %rip unnecessarily when handling exceptions
Avoids a VMREAD. Signed-off-by: Avi Kivity --- arch/x86/kvm/vmx.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 3f6e9bf..139a5cb 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -3170,7 +3170,6 @@ static int handle_exception(struct kvm_vcpu *vcpu) } error_code = 0; - rip = kvm_rip_read(vcpu); if (intr_info & INTR_INFO_DELIVER_CODE_MASK) error_code = vmcs_read32(VM_EXIT_INTR_ERROR_CODE); if (is_page_fault(intr_info)) { @@ -3217,6 +3216,7 @@ static int handle_exception(struct kvm_vcpu *vcpu) vmx->vcpu.arch.event_exit_inst_len = vmcs_read32(VM_EXIT_INSTRUCTION_LEN); kvm_run->exit_reason = KVM_EXIT_DEBUG; + rip = kvm_rip_read(vcpu); kvm_run->debug.arch.pc = vmcs_readl(GUEST_CS_BASE) + rip; kvm_run->debug.arch.exception = ex_no; break; -- 1.7.4.3 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH -v3] Monitor command: x-gpa2hva, translate guest physical address to host virtual address
On 04/27/2011 06:06 PM, Marcelo Tosatti wrote: On Fri, Nov 19, 2010 at 04:17:35PM +0800, Huang Ying wrote: On Tue, 2010-11-16 at 10:23 +0800, Huang Ying wrote: Author: Max Asbock Add command x-gpa2hva to translate guest physical address to host virtual address. Because gpa to hva translation is not consistent, so this command is only used for debugging. The x-gpa2hva command provides one step in a chain of translations from guest virtual to guest physical to host virtual to host physical. Host physical is then used to inject a machine check error. As a consequence the HWPOISON code on the host and the MCE injection code in qemu-kvm are exercised. v3: - Rename to x-gpa2hva - Remove QMP support, because gpa2hva is not consistent Is this patch an acceptable solution for now? This command is useful for our testing. Anthony? Yeah, but it should come through qemu-devel, no? Regards, Anthony Liguori Best Regards, Huang Ying -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [CentOS] Install CentOS as KVM guest
On Thu, Apr 28, 2011 at 07:46:28PM +0800, Emmanuel Noobadmin wrote: > On 4/28/11, Gleb Natapov wrote: > > So why don't you use virt-manager? > > The original intention was to run the host without any graphical > desktop or anything not necessary to host the guests. That was based > on reading and such which recommends not having anything beyond the > necessary to minimize potential security problems and maximize > resources available. > Qemu is not intended to be used directly by end user. It is too complex as you already found out. VMware don't even give you access to such low parts of virt stack. You should use libvirt or virt-manager instead. Especially if you are concerned about security. I think libvirt can start guest on headless server. If this still fails for you you need to complain to libvirt developers (not in a rant mode, but providing details of what exact version of software you have problem with and what are you trying to do). And libvirt developers will not be shy to complain to qemu developers if the problem turned to be on their side. > Then there were those pages which warn that virt-manager didn't work > too well if bridged networking was required. As far as I know libvirt has no problem using bridged networking and virt-manager use libvirt so it should work if you use new enough virt stack, but you should ask on libvirt mailing list instead. > > Last but not least, when I finally gave up and installed the desktop, > virt-manager couldn't find the hypervisor. Checking up, it appears > that the user needed additional permissions to certain files, which > after given and tested via CLI, I still get errors. What distribution are you using? I didn't saw recent Fedoras having such problems (admittedly I do not use libvirt/virt-manager much). > > Starting up X as root gave me this ominous warning that I really > shouldn't be doing this and since I didn't think it was wise in the > first place to have the desktop with root access running on what's > supposed to be a production machine, I stopped trying that route and > went back to figuring how to get virt-install to work. -- Gleb. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH -v3] Monitor command: x-gpa2hva, translate guest physical address to host virtual address
On Fri, Nov 19, 2010 at 04:17:35PM +0800, Huang Ying wrote: > On Tue, 2010-11-16 at 10:23 +0800, Huang Ying wrote: > > Author: Max Asbock > > > > Add command x-gpa2hva to translate guest physical address to host > > virtual address. Because gpa to hva translation is not consistent, so > > this command is only used for debugging. > > > > The x-gpa2hva command provides one step in a chain of translations from > > guest virtual to guest physical to host virtual to host physical. Host > > physical is then used to inject a machine check error. As a > > consequence the HWPOISON code on the host and the MCE injection code > > in qemu-kvm are exercised. > > > > v3: > > > > - Rename to x-gpa2hva > > - Remove QMP support, because gpa2hva is not consistent > > Is this patch an acceptable solution for now? This command is useful for > our testing. Anthony? > > Best Regards, > Huang Ying > > > -- > To unsubscribe from this list: send the line "unsubscribe kvm" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [CentOS] Install CentOS as KVM guest
On 4/28/11, Gleb Natapov wrote: > So why don't you use virt-manager? The original intention was to run the host without any graphical desktop or anything not necessary to host the guests. That was based on reading and such which recommends not having anything beyond the necessary to minimize potential security problems and maximize resources available. Then there were those pages which warn that virt-manager didn't work too well if bridged networking was required. Last but not least, when I finally gave up and installed the desktop, virt-manager couldn't find the hypervisor. Checking up, it appears that the user needed additional permissions to certain files, which after given and tested via CLI, I still get errors. Starting up X as root gave me this ominous warning that I really shouldn't be doing this and since I didn't think it was wise in the first place to have the desktop with root access running on what's supposed to be a production machine, I stopped trying that route and went back to figuring how to get virt-install to work. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[ kvm-Bugs-2998292 ] XP Guest Crash on Ubuntu 10.04
Bugs item #2998292, was opened at 2010-05-07 20:28 Message generated for change (Settings changed) made by jessorensen You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2998292&group_id=180599 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: None Group: None >Status: Closed >Resolution: Invalid Priority: 7 Private: No Submitted By: cheesewright () Assigned to: Nobody/Anonymous (nobody) Summary: XP Guest Crash on Ubuntu 10.04 Initial Comment: OS: Ubuntu Server 64-bit 10.04, kernel 2.6.32 Server: Supermicro X8DTN+ 2x Xeon E5530 2.4GHz Nehalem quad core cpu, 12GB RAM Guest: Windows XP SP3 32-bit Below is my kvm command line: /usr/bin/kvm -S -M pc-0.11 -enable-kvm -m 1024 -smp 2 -name data2 -uuid 9a85ee7a-4db9-9108-df3c-61d04d3c943d -chardev socket,id=monitor,path=/var/lib/libvirt/qemu/data2.monitor,server,nowait -monitor chardev:monitor -localtime -boot c -drive file=/san/data2/winxp.img,if=ide,index=0,boot=on -drive file=/dev/cdrom,if=ide,media=cdrom,index=2 -net nic,macaddr=52:54:00:38:a8:af,vlan=0,name=nic.0 -net tap,fd=62,vlan=0,name=tap.0 -chardev pty,id=serial0 -serial chardev:serial0 -parallel none -usb -usbdevice tablet -vnc 0.0.0.0:2,password -k en-us -vga cirrus I have two XP SP3 guests which crash frequently on random events when connecting via remote desktop. The Windows Event Log reports Event ID 1003. These errors result in either a system freeze or full reboot. There are several types of errors recorded: Event Source: System Error Event Category: (102) Event ID: 1003 Error code 0005, parameter1 04c0, parameter2 8653a6c8, parameter3 , parameter4 . Event Source: System Error Event Category: (102) Event ID: 1003 Error code 000a, parameter1 3008, parameter2 0002, parameter3 , parameter4 804ff806. Event Source: System Error Event Category: (102) Event ID: 1003 Error code 100a, parameter1 3008, parameter2 0002, parameter3 , parameter4 804ff806. -- >Comment By: Jes Sorensen (jessorensen) Date: 2011-04-28 11:52 Message: This bug tracker isn't taking any new bugs, please submit your bug in https://bugs.launchpad.net/qemu/ In addition this is against a vendor kernel (Ubuntu), so please start by filing the bug with them. Thanks, Jes -- Comment By: J Snabb (epipe) Date: 2011-04-28 11:48 Message: I am seeing this problem also. Using Debian packaged qemu-kvm 0.12.5+dfsg-5 with Debian Linux kernel 2.6.32-5-amd64 on similar hardware. kvm command line is: /usr/bin/kvm -S -M pc-0.12 -enable-kvm -m 1024 -smp 2,sockets=2,cores=1,threads=1 -name -uuid ---- -nodefaults -chardev socket,id=monitor,path=/var/lib/libvirt/qemu/.monitor,server,nowait -mon chardev=monitor,mode=readline -rtc base=utc -boot c -drive file=/dev//XXX,if=none,id=drive-ide0-0-0,boot=on,format=raw -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -device rtl8139,vlan=0,id=net0,mac=XX:XX:XX:XX:XX:XX,bus=pci.0,addr=0x3 -net tap,fd=70,vlan=0,name=hostnet0 -chardev pty,id=serial0 -device isa-serial,chardev=serial0 -usb -vnc 127.0.0.1:6 -vga cirrus -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 -- Comment By: Jes Sorensen (jessorensen) Date: 2010-11-26 12:01 Message: Hi, Are you still able to reproduce this with uptodate kvm/qemu? If you are, would you mind opening a bug in https://bugs.launchpad.net/qemu and close this one ? Does this only happen when you access the systems via remote desktop? (I presume this is some windows thing?) Thanks, Jes -- Comment By: cheesewright () Date: 2010-05-07 20:32 Message: The crash also happens when using kvm -M pc-0.12. It is easily reproducible after clicking through windows such control panel apps, or web browsers, for a few minutes. -- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2998292&group_id=180599 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [CentOS] Install CentOS as KVM guest
On Wed, Apr 27, 2011 at 04:57:18AM +0800, Emmanuel Noobadmin wrote: > Unfortunately, things still don't work. > > > It's just ridiculous that the installer under KVM does not detect the > cdrom drive it was booted from. Trying to do a net-install doesn't > work, maybe I messed up the networking even though br0 and eth0 is > working on the host. > > Nevermind, let's install apache and use the mounted ISO. Verified > apache is working and the CentOS folder is accessible via web browser. > But, still the guest installer cannot seem to access the installation > files. > > OK, so maybe I messed up the networking, after all I am the noob... > maybe specifying --network=bridge:br0 isn't enough. There was So why don't you use virt-manager? > something about a tap or tunnel when initially reading up on bridged > networking. Looking up more on this, there are several resources > (sorry KVM FAQ leads to a page that no longer exist) which like many > other instructions, give the commands without really explaining > what/why. > > So I have to run some tunctl command and scripts to add a bridge and > tunnel/tap... but wait, I already made my bridge, will running the > script kill my networking by creating a second bridge? Especially the > warning about temporarily loosing connectivity due to ifcfg1 being > reset. > > And if I need to run this script everytime in order to activate the > bridge and tunnel, doesn't that mean all my guest OS are screwed if > the host reboots while I'm not around? Shouldn't these things go into > permanent files like if-tun0 or something? > > Every year, I get a little closer to not using VMWare but it seems > like this year is going to be victory for VMWare again. > > CC to kvm mailing list but I expect, like my previous request for help > to the list, it will be rejected by mailman or a moderator. > > > Just damn frustrated, even if it's probably just me being too stupid > to know how to use KVM. > -- > To unsubscribe from this list: send the line "unsubscribe kvm" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- Gleb. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] rcu: export rcu_note_context_switch() function
On Thu, Apr 28, 2011 at 01:01:04PM +0300, Avi Kivity wrote: > On 04/28/2011 12:52 PM, Gleb Natapov wrote: > >Signed-off-by: Gleb Natapov > >@@ -93,11 +93,7 @@ static inline int rcu_needs_cpu(int cpu) > > > > #endif /* #else #ifdef CONFIG_TINY_RCU */ > > > >-static inline void rcu_note_context_switch(int cpu) > >-{ > >-rcu_sched_qs(cpu); > >-rcu_preempt_note_context_switch(); > >-} > >+extern void rcu_note_context_switch(int cpu); > > > > Why are you uninlining this function? Why not export the two > functions it calls instead? > To keep exported interface uniformal between both rcu implementations. I do not think that rcu_note_context_switch() is inlined for performance reason since two functions it calls are quite big in rcutiny implementation. Paul what do you think? -- Gleb. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: SVM: Make dump_vmcb static, reduce text
On 04/25/2011 08:00 AM, Joe Perches wrote: dump_vmcb isn't used outside this module, make it static. Shrink text and object by ~1% by standardizing formats. $ size arch/x86/kvm/svm.o* text data bss dec hex filename 52910580 10072 63562f84a arch/x86/kvm/svm.o.new 53563580 10072 64215fad7 arch/x86/kvm/svm.o.old Applied, thanks. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[ kvm-Bugs-2998292 ] XP Guest Crash on Ubuntu 10.04
Bugs item #2998292, was opened at 2010-05-08 01:28 Message generated for change (Comment added) made by epipe You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2998292&group_id=180599 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: None Group: None Status: Open Resolution: None Priority: 7 Private: No Submitted By: cheesewright () Assigned to: Nobody/Anonymous (nobody) Summary: XP Guest Crash on Ubuntu 10.04 Initial Comment: OS: Ubuntu Server 64-bit 10.04, kernel 2.6.32 Server: Supermicro X8DTN+ 2x Xeon E5530 2.4GHz Nehalem quad core cpu, 12GB RAM Guest: Windows XP SP3 32-bit Below is my kvm command line: /usr/bin/kvm -S -M pc-0.11 -enable-kvm -m 1024 -smp 2 -name data2 -uuid 9a85ee7a-4db9-9108-df3c-61d04d3c943d -chardev socket,id=monitor,path=/var/lib/libvirt/qemu/data2.monitor,server,nowait -monitor chardev:monitor -localtime -boot c -drive file=/san/data2/winxp.img,if=ide,index=0,boot=on -drive file=/dev/cdrom,if=ide,media=cdrom,index=2 -net nic,macaddr=52:54:00:38:a8:af,vlan=0,name=nic.0 -net tap,fd=62,vlan=0,name=tap.0 -chardev pty,id=serial0 -serial chardev:serial0 -parallel none -usb -usbdevice tablet -vnc 0.0.0.0:2,password -k en-us -vga cirrus I have two XP SP3 guests which crash frequently on random events when connecting via remote desktop. The Windows Event Log reports Event ID 1003. These errors result in either a system freeze or full reboot. There are several types of errors recorded: Event Source: System Error Event Category: (102) Event ID: 1003 Error code 0005, parameter1 04c0, parameter2 8653a6c8, parameter3 , parameter4 . Event Source: System Error Event Category: (102) Event ID: 1003 Error code 000a, parameter1 3008, parameter2 0002, parameter3 , parameter4 804ff806. Event Source: System Error Event Category: (102) Event ID: 1003 Error code 100a, parameter1 3008, parameter2 0002, parameter3 , parameter4 804ff806. -- Comment By: J Snabb (epipe) Date: 2011-04-28 16:48 Message: I am seeing this problem also. Using Debian packaged qemu-kvm 0.12.5+dfsg-5 with Debian Linux kernel 2.6.32-5-amd64 on similar hardware. kvm command line is: /usr/bin/kvm -S -M pc-0.12 -enable-kvm -m 1024 -smp 2,sockets=2,cores=1,threads=1 -name -uuid ---- -nodefaults -chardev socket,id=monitor,path=/var/lib/libvirt/qemu/.monitor,server,nowait -mon chardev=monitor,mode=readline -rtc base=utc -boot c -drive file=/dev//XXX,if=none,id=drive-ide0-0-0,boot=on,format=raw -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -device rtl8139,vlan=0,id=net0,mac=XX:XX:XX:XX:XX:XX,bus=pci.0,addr=0x3 -net tap,fd=70,vlan=0,name=hostnet0 -chardev pty,id=serial0 -device isa-serial,chardev=serial0 -usb -vnc 127.0.0.1:6 -vga cirrus -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 -- Comment By: Jes Sorensen (jessorensen) Date: 2010-11-26 18:01 Message: Hi, Are you still able to reproduce this with uptodate kvm/qemu? If you are, would you mind opening a bug in https://bugs.launchpad.net/qemu and close this one ? Does this only happen when you access the systems via remote desktop? (I presume this is some windows thing?) Thanks, Jes -- Comment By: cheesewright () Date: 2010-05-08 01:32 Message: The crash also happens when using kvm -M pc-0.12. It is easily reproducible after clicking through windows such control panel apps, or web browsers, for a few minutes. -- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2998292&group_id=180599 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/1] KVM: MMU: Fix 64-bit paging breakage on x86_32
On 04/28/2011 01:08 AM, Takuya Yoshikawa wrote: From: Takuya Yoshikawa Fix regression introduced by commit e30d2a170506830d5eef5e9d7990c5aedf1b0a51 KVM: MMU: Optimize guest page table walk On x86_32, get_user() does not support 64-bit values and we fail to build KVM at the point of 64-bit paging. This patch fixes this by using get_user() twice for that condition. Applied, thanks. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] rcu: export rcu_note_context_switch() function
On 04/28/2011 12:52 PM, Gleb Natapov wrote: Signed-off-by: Gleb Natapov @@ -93,11 +93,7 @@ static inline int rcu_needs_cpu(int cpu) #endif /* #else #ifdef CONFIG_TINY_RCU */ -static inline void rcu_note_context_switch(int cpu) -{ - rcu_sched_qs(cpu); - rcu_preempt_note_context_switch(); -} +extern void rcu_note_context_switch(int cpu); Why are you uninlining this function? Why not export the two functions it calls instead? -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: nmi is broken?
On 04/28/2011 04:28 AM, OGAWA Hirofumi wrote: Avi Kivity writes: > On 04/24/2011 03:24 PM, Jan Kiszka wrote: >> > >> > This would cause IRQs to be delivered even if the PIT is masked, no? >> >> I checked the patch and our code again: NMI watchdog masking is managed >> via arch.vapics_in_nmi_mode and by re-checking the per-APIC mask >> situation in kvm_apic_local_deliver when delivering the NMI. >> >> So the patch looks correct - NMIs aren't acked like timer IRQs, the >> current logic is definitely wrong. > > Can you elaborate? Why aren't NMIs acked (if delivered via the PIC)? > Is the PIC programmed into auto-EOI mode or something? This seems to be complex stuff depending on hardware configurations. I'm not fully understanding though, current state of it is, Yes, PIC is in AEOI mode if linux is using IO-APIC. Um.., kvm says irq == 0 is mp_INT mode in MADT, not mp_ExtINT. That is correct, kvm doesn't connect the master 8259 output to the IOAPIC. Instead the 8259 is connected to LINT0 (which is configured for ExtInt when the IOAPIC is disabled, or for NMI which the NMI watchdog is enabled). However, now I can't see how it would work. auto EOI works on the INTA cycle, which would only occur if LINT0 is configured for ExtInt. If it is configured for NMI, I don't think it would issue the INTA cycle. So the NMI watchdog not working is actually correct for our hardware configuration! But I may be misunderstanding something here. So I guess system does automatically INTA cycle (and AEOI because of PIC config), or not connected via 8259A? (like in mpspec figure 5-2.) To checking it, I've tested in check_timer() of linux on the physical machine (irq==0 and mp_INT). The test is something like, if (pin1 != -1) { /* * Ok, does IRQ0 through the IOAPIC work? */ unmask_IO_APIC_irq(0); disable_8259_irq(0); if (timer_irq_works()) { even if I called disable_8259_irq(0), timer was still working via IO-APIC. Would this explain why? Sorry, I got lost - what does this explain? -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 0/2] make kvm guest mode to be rcu quiescent state
CPU may spend a lot of time in a guest mode while other CPUs wait for rcu grace period to elapse. This patch series makes guest mode into quiescent state to shorten wait time. Gleb Natapov (2): rcu: export rcu_note_context_switch() function KVM: make guest mode entry to be rcu quiescent state include/linux/kvm_host.h |1 + include/linux/rcutiny.h |6 +- kernel/rcutiny.c |7 +++ kernel/rcutree.c |1 + 4 files changed, 10 insertions(+), 5 deletions(-) -- 1.7.2.3 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/2] rcu: export rcu_note_context_switch() function
Signed-off-by: Gleb Natapov --- include/linux/rcutiny.h |6 +- kernel/rcutiny.c|7 +++ kernel/rcutree.c|1 + 3 files changed, 9 insertions(+), 5 deletions(-) diff --git a/include/linux/rcutiny.h b/include/linux/rcutiny.h index 30ebd7c..8e5f7cf 100644 --- a/include/linux/rcutiny.h +++ b/include/linux/rcutiny.h @@ -93,11 +93,7 @@ static inline int rcu_needs_cpu(int cpu) #endif /* #else #ifdef CONFIG_TINY_RCU */ -static inline void rcu_note_context_switch(int cpu) -{ - rcu_sched_qs(cpu); - rcu_preempt_note_context_switch(); -} +extern void rcu_note_context_switch(int cpu); /* * Return the number of grace periods. diff --git a/kernel/rcutiny.c b/kernel/rcutiny.c index 0c343b9..3d715a4 100644 --- a/kernel/rcutiny.c +++ b/kernel/rcutiny.c @@ -78,6 +78,13 @@ void rcu_exit_nohz(void) #endif /* #ifdef CONFIG_NO_HZ */ +void rcu_note_context_switch(int cpu) +{ + rcu_sched_qs(cpu); + rcu_preempt_note_context_switch(); +} +EXPORT_SYMBOL_GPL(rcu_note_context_switch); + /* * Helper function for rcu_qsctr_inc() and rcu_bh_qsctr_inc(). * Also disable irqs to avoid confusion due to interrupt handlers diff --git a/kernel/rcutree.c b/kernel/rcutree.c index dd4aea8..0837d63 100644 --- a/kernel/rcutree.c +++ b/kernel/rcutree.c @@ -124,6 +124,7 @@ void rcu_note_context_switch(int cpu) rcu_sched_qs(cpu); rcu_preempt_note_context_switch(cpu); } +EXPORT_SYMBOL_GPL(rcu_note_context_switch); #ifdef CONFIG_NO_HZ DEFINE_PER_CPU(struct rcu_dynticks, rcu_dynticks) = { -- 1.7.2.3 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/2] KVM: make guest mode entry to be rcu quiescent state
KVM does not hold any references to rcu protected data when it switches CPU into a guest mode. In fact switching to a guest mode is very similar to exiting to userspase from rcu point of view. In addition CPU may stay in a guest mode for quite a long time (up to one time slice). Lets treat guest mode as quiescent state, just like we do with user-mode execution. Signed-off-by: Gleb Natapov --- include/linux/kvm_host.h |1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 0bc3d37..a347bce 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -593,6 +593,7 @@ static inline void kvm_guest_enter(void) { account_system_vtime(current); current->flags |= PF_VCPU; + rcu_note_context_switch(smp_processor_id()); } static inline void kvm_guest_exit(void) -- 1.7.2.3 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/5] qemu-kvm: pci-assign: Some more cleanups
On 04/28/2011 11:59 AM, Jan Kiszka wrote: I found these patches in my shared-INTx queue. They do not depend on that topic (which is on my to do list, not on the top, but close), so I decided to repost them. Looks good, will apply after Alex reviews. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/2] KVM: VMX: Cache vmcs segment fields
Since the emulator now checks segment limits and access rights, it generates a lot more accesses to the vmcs segment fields. Undo some of the performance hit by cacheing those fields in a read-only cache (the entire cache is invalidated on any write, or on guest exit). Signed-off-by: Avi Kivity --- arch/x86/include/asm/kvm_host.h |1 + arch/x86/kvm/vmx.c | 102 +++ 2 files changed, 93 insertions(+), 10 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index afb0e69..d2ac8e2 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -136,6 +136,7 @@ enum kvm_reg_ex { VCPU_EXREG_CR3, VCPU_EXREG_RFLAGS, VCPU_EXREG_CPL, + VCPU_EXREG_SEGMENTS, }; enum { diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 3f6e9bf..d7463c3 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -162,6 +162,10 @@ struct vcpu_vmx { u32 ar; } tr, es, ds, fs, gs; } rmode; + struct { + u32 bitmask; /* 4 bits per segment (1 bit per field) */ + struct kvm_save_segment seg[8]; + } segment_cache; int vpid; bool emulation_required; @@ -174,6 +178,15 @@ struct vcpu_vmx { bool rdtscp_enabled; }; +enum segment_cache_field { + SEG_FIELD_SEL = 0, + SEG_FIELD_BASE = 1, + SEG_FIELD_LIMIT = 2, + SEG_FIELD_AR = 3, + + SEG_FIELD_NR = 4 +}; + static inline struct vcpu_vmx *to_vmx(struct kvm_vcpu *vcpu) { return container_of(vcpu, struct vcpu_vmx, vcpu); @@ -646,6 +659,62 @@ static void vmcs_set_bits(unsigned long field, u32 mask) vmcs_writel(field, vmcs_readl(field) | mask); } +static void vmx_segment_cache_clear(struct vcpu_vmx *vmx) +{ + vmx->segment_cache.bitmask = 0; +} + +static bool vmx_segment_cache_test_set(struct vcpu_vmx *vmx, unsigned seg, + unsigned field) +{ + bool ret; + u32 mask = 1 << (seg * SEG_FIELD_NR + field); + + if (!(vmx->vcpu.arch.regs_avail & (1 << VCPU_EXREG_SEGMENTS))) { + vmx->vcpu.arch.regs_avail |= (1 << VCPU_EXREG_SEGMENTS); + vmx->segment_cache.bitmask = 0; + } + ret = vmx->segment_cache.bitmask & mask; + vmx->segment_cache.bitmask |= mask; + return ret; +} + +static u16 vmx_read_guest_seg_selector(struct vcpu_vmx *vmx, unsigned seg) +{ + u16 *p = &vmx->segment_cache.seg[seg].selector; + + if (!vmx_segment_cache_test_set(vmx, seg, SEG_FIELD_SEL)) + *p = vmcs_read16(kvm_vmx_segment_fields[seg].selector); + return *p; +} + +static ulong vmx_read_guest_seg_base(struct vcpu_vmx *vmx, unsigned seg) +{ + ulong *p = &vmx->segment_cache.seg[seg].base; + + if (!vmx_segment_cache_test_set(vmx, seg, SEG_FIELD_BASE)) + *p = vmcs_readl(kvm_vmx_segment_fields[seg].base); + return *p; +} + +static u32 vmx_read_guest_seg_limit(struct vcpu_vmx *vmx, unsigned seg) +{ + u32 *p = &vmx->segment_cache.seg[seg].limit; + + if (!vmx_segment_cache_test_set(vmx, seg, SEG_FIELD_LIMIT)) + *p = vmcs_read32(kvm_vmx_segment_fields[seg].limit); + return *p; +} + +static u32 vmx_read_guest_seg_ar(struct vcpu_vmx *vmx, unsigned seg) +{ + u32 *p = &vmx->segment_cache.seg[seg].ar; + + if (!vmx_segment_cache_test_set(vmx, seg, SEG_FIELD_AR)) + *p = vmcs_read32(kvm_vmx_segment_fields[seg].ar_bytes); + return *p; +} + static void update_exception_bitmap(struct kvm_vcpu *vcpu) { u32 eb; @@ -1271,9 +1340,11 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 data) break; #ifdef CONFIG_X86_64 case MSR_FS_BASE: + vmx_segment_cache_clear(vmx); vmcs_writel(GUEST_FS_BASE, data); break; case MSR_GS_BASE: + vmx_segment_cache_clear(vmx); vmcs_writel(GUEST_GS_BASE, data); break; case MSR_KERNEL_GS_BASE: @@ -1717,6 +1788,8 @@ static void enter_pmode(struct kvm_vcpu *vcpu) vmx->emulation_required = 1; vmx->rmode.vm86_active = 0; + vmx_segment_cache_clear(vmx); + vmcs_write16(GUEST_TR_SELECTOR, vmx->rmode.tr.selector); vmcs_writel(GUEST_TR_BASE, vmx->rmode.tr.base); vmcs_write32(GUEST_TR_LIMIT, vmx->rmode.tr.limit); @@ -1740,6 +1813,8 @@ static void enter_pmode(struct kvm_vcpu *vcpu) fix_pmode_dataseg(VCPU_SREG_GS, &vmx->rmode.gs); fix_pmode_dataseg(VCPU_SREG_FS, &vmx->rmode.fs); + vmx_segment_cache_clear(vmx); + vmcs_write16(GUEST_SS_SELECTOR, 0); vmcs_write32(GUEST_SS_AR_BYTES, 0x93); @@ -1803,6 +1878,8 @@ static void enter_rmode(struct kvm_vcpu *vcpu) vcpu->srcu_idx = srcu_read_lock(&vcpu->kvm->srcu); } + vmx_segment_cach
[PATCH 1/2] KVM: x86 emulator: consolidate segment accessors
Instead of separate accessors for the segment selector and cached descriptor, use one accessor for both. This simplifies the code somewhat. Signed-off-by: Avi Kivity --- arch/x86/include/asm/kvm_emulate.h | 13 +--- arch/x86/kvm/emulate.c | 122 arch/x86/kvm/x86.c | 41 +++- 3 files changed, 83 insertions(+), 93 deletions(-) diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h index 28114f5..0049211 100644 --- a/arch/x86/include/asm/kvm_emulate.h +++ b/arch/x86/include/asm/kvm_emulate.h @@ -164,15 +164,10 @@ struct x86_emulate_ops { int size, unsigned short port, const void *val, unsigned int count); - bool (*get_cached_descriptor)(struct x86_emulate_ctxt *ctxt, - struct desc_struct *desc, u32 *base3, - int seg); - void (*set_cached_descriptor)(struct x86_emulate_ctxt *ctxt, - struct desc_struct *desc, u32 base3, - int seg); - u16 (*get_segment_selector)(struct x86_emulate_ctxt *ctxt, int seg); - void (*set_segment_selector)(struct x86_emulate_ctxt *ctxt, -u16 sel, int seg); + bool (*get_segment)(struct x86_emulate_ctxt *ctxt, u16 *selector, + struct desc_struct *desc, u32 *base3, int seg); + void (*set_segment)(struct x86_emulate_ctxt *ctxt, u16 selector, + struct desc_struct *desc, u32 base3, int seg); unsigned long (*get_cached_segment_base)(struct x86_emulate_ctxt *ctxt, int seg); void (*get_gdt)(struct x86_emulate_ctxt *ctxt, struct desc_ptr *dt); diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c index 1568c49..a8faf8d 100644 --- a/arch/x86/kvm/emulate.c +++ b/arch/x86/kvm/emulate.c @@ -553,6 +553,26 @@ static int emulate_nm(struct x86_emulate_ctxt *ctxt) return emulate_exception(ctxt, NM_VECTOR, 0, false); } +static u16 get_segment_selector(struct x86_emulate_ctxt *ctxt, unsigned seg) +{ + u16 selector; + struct desc_struct desc; + + ctxt->ops->get_segment(ctxt, &selector, &desc, NULL, seg); + return selector; +} + +static void set_segment_selector(struct x86_emulate_ctxt *ctxt, u16 selector, +unsigned seg) +{ + u16 dummy; + u32 base3; + struct desc_struct desc; + + ctxt->ops->get_segment(ctxt, &dummy, &desc, &base3, seg); + ctxt->ops->set_segment(ctxt, selector, &desc, base3, seg); +} + static int __linearize(struct x86_emulate_ctxt *ctxt, struct segmented_address addr, unsigned size, bool write, bool fetch, @@ -563,6 +583,7 @@ static int __linearize(struct x86_emulate_ctxt *ctxt, bool usable; ulong la; u32 lim; + u16 sel; unsigned cpl, rpl; la = seg_base(ctxt, ctxt->ops, addr.seg) + addr.ea; @@ -574,8 +595,8 @@ static int __linearize(struct x86_emulate_ctxt *ctxt, return emulate_gp(ctxt, 0); break; default: - usable = ctxt->ops->get_cached_descriptor(ctxt, &desc, NULL, - addr.seg); + usable = ctxt->ops->get_segment(ctxt, &sel, &desc, NULL, + addr.seg); if (!usable) goto bad; /* code segment or read-only data segment */ @@ -598,7 +619,7 @@ static int __linearize(struct x86_emulate_ctxt *ctxt, goto bad; } cpl = ctxt->ops->cpl(ctxt); - rpl = ctxt->ops->get_segment_selector(ctxt, addr.seg) & 3; + rpl = sel & 3; cpl = max(cpl, rpl); if (!(desc.type & 8)) { /* data segment */ @@ -1142,9 +1163,10 @@ static void get_descriptor_table_ptr(struct x86_emulate_ctxt *ctxt, { if (selector & 1 << 2) { struct desc_struct desc; + u16 sel; + memset (dt, 0, sizeof *dt); - if (!ops->get_cached_descriptor(ctxt, &desc, NULL, - VCPU_SREG_LDTR)); + if (!ops->get_segment(ctxt, &sel, &desc, NULL, VCPU_SREG_LDTR)); return; dt->size = desc_limit_scaled(&desc); /* what if limit > 65535? */ @@ -1305,8 +1327,7 @@ static int load_segment_descriptor(struct x86_emulate_ctxt *ctxt, return ret; } load: - ops->set_segment_selector(ctxt, selector, seg); - ops->set_cached_descriptor(ctxt, &seg_desc, 0, seg); + ops->set_segment(ctxt, selector,
[PATCH 0/2] Segment cleanups and optimizations
The first patch in this segment-themed patchset cleans up segment handling in the emulator; while the second patch undoes some of the performance hit taken by the emulator segment limit checks. Avi Kivity (2): KVM: x86 emulator: consolidate segment accessors KVM: VMX: Cache vmcs segment fields arch/x86/include/asm/kvm_emulate.h | 13 +--- arch/x86/include/asm/kvm_host.h|1 + arch/x86/kvm/emulate.c | 122 arch/x86/kvm/vmx.c | 102 +++--- arch/x86/kvm/x86.c | 41 +++- 5 files changed, 176 insertions(+), 103 deletions(-) -- 1.7.4.3 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 5/5] pci-assign: Convert need_emulate_cmd into a bitmask
Define a mask of PCI command register bits that need to be emulated, i.e. read back from their shadow state. We will need this for selectively emulating the INTx mask bit. Note: No initialization of emulate_cmd_mask to zero needed, the device state is already zero-initialized. Signed-off-by: Jan Kiszka --- hw/device-assignment.c | 11 +-- hw/device-assignment.h |2 +- 2 files changed, 6 insertions(+), 7 deletions(-) diff --git a/hw/device-assignment.c b/hw/device-assignment.c index 7ff1320..56b4832 100644 --- a/hw/device-assignment.c +++ b/hw/device-assignment.c @@ -513,9 +513,9 @@ again: DEBUG("(%x.%x): address=%04x val=0x%08x len=%d\n", (d->devfn >> 3) & 0x1F, (d->devfn & 0x7), address, val, len); -if (pci_dev->need_emulate_cmd) { +if (pci_dev->emulate_cmd_mask) { val = merge_bits(val, pci_default_read_config(d, PCI_COMMAND, 2), - address, len, PCI_COMMAND, 0x); + address, len, PCI_COMMAND, pci_dev->emulate_cmd_mask); } if (!pci_dev->cap.available) { @@ -778,10 +778,9 @@ again: /* dealing with virtual function device */ snprintf(name, sizeof(name), "%sphysfn/", dir); -if (!stat(name, &statbuf)) - pci_dev->need_emulate_cmd = 1; -else - pci_dev->need_emulate_cmd = 0; +if (!stat(name, &statbuf)) { +pci_dev->emulate_cmd_mask = 0x; +} dev->region_number = r; return 0; diff --git a/hw/device-assignment.h b/hw/device-assignment.h index 86af0a9..ae1bc58 100644 --- a/hw/device-assignment.h +++ b/hw/device-assignment.h @@ -109,7 +109,7 @@ typedef struct AssignedDevice { void *msix_table_page; target_phys_addr_t msix_table_addr; int mmio_index; -int need_emulate_cmd; +uint32_t emulate_cmd_mask; char *configfd_name; int32_t bootindex; QLIST_ENTRY(AssignedDevice) next; -- 1.7.1 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/5] pci-assign: Clean up assigned_dev_pci_read/write_config
Use rages_overlap and proper constants to match the access range against regions that need special handling. This also fixes yet uncaught high-byte write access to the command register. Moreover, use more constants instead of magic numbers. Signed-off-by: Jan Kiszka --- hw/device-assignment.c | 39 +-- 1 files changed, 29 insertions(+), 10 deletions(-) diff --git a/hw/device-assignment.c b/hw/device-assignment.c index 606d725..3481c93 100644 --- a/hw/device-assignment.c +++ b/hw/device-assignment.c @@ -404,13 +404,20 @@ static void assigned_dev_pci_write_config(PCIDevice *d, uint32_t address, return assigned_device_pci_cap_write_config(d, address, val, len); } -if (address == 0x4) { +if (ranges_overlap(address, len, PCI_COMMAND, 2)) { pci_default_write_config(d, address, val, len); /* Continue to program the card */ } -if ((address >= 0x10 && address <= 0x24) || address == 0x30 || -address == 0x34 || address == 0x3c || address == 0x3d) { +/* + * Catch access to + * - base address registers + * - ROM base address & capability pointer + * - interrupt line & pin + */ +if (ranges_overlap(address, len, PCI_BASE_ADDRESS_0, 24) || +ranges_overlap(address, len, PCI_ROM_ADDRESS, 8) || +ranges_overlap(address, len, PCI_INTERRUPT_LINE, 2)) { /* used for update-mappings (BAR emulation) */ pci_default_write_config(d, address, val, len); return; @@ -450,9 +457,20 @@ static uint32_t assigned_dev_pci_read_config(PCIDevice *d, uint32_t address, return val; } -if (address < 0x4 || (pci_dev->need_emulate_cmd && address == 0x4) || - (address >= 0x10 && address <= 0x24) || address == 0x30 || -address == 0x34 || address == 0x3c || address == 0x3d) { +/* + * Catch access to + * - vendor & device ID + * - command register (if emulation needed) + * - base address registers + * - ROM base address & capability pointer + * - interrupt line & pin + */ +if (ranges_overlap(address, len, PCI_VENDOR_ID, 4) || +(pci_dev->need_emulate_cmd && + ranges_overlap(address, len, PCI_COMMAND, 2)) || +ranges_overlap(address, len, PCI_BASE_ADDRESS_0, 24) || +ranges_overlap(address, len, PCI_ROM_ADDRESS, 8) || +ranges_overlap(address, len, PCI_INTERRUPT_LINE, 2)) { val = pci_default_read_config(d, address, len); DEBUG("(%x.%x): address=%04x val=0x%08x len=%d\n", (d->devfn >> 3) & 0x1F, (d->devfn & 0x7), address, val, len); @@ -483,10 +501,11 @@ do_log: if (!pci_dev->cap.available) { /* kill the special capabilities */ -if (address == 4 && len == 4) -val &= ~0x10; -else if (address == 6) -val &= ~0x10; +if (address == PCI_COMMAND && len == 4) { +val &= ~(PCI_STATUS_CAP_LIST << 16); +} else if (address == PCI_STATUS) { +val &= ~PCI_STATUS_CAP_LIST; +} } return val; -- 1.7.1 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/5] pci-assign: Fix dword read at PCI_COMMAND
If we emulate the command register, we must only read its content from the shadow config space. For dword read of both PCI_COMMAND and PCI_STATUS, at least the latter must be read from the device. For simplicity reasons and as the code path is not considered performance critical for the affected SRIOV devices, the fix performes device access to the command word unconditionally, even if emulation is enabled and only that word is read. Signed-off-by: Jan Kiszka --- hw/device-assignment.c |8 +--- 1 files changed, 5 insertions(+), 3 deletions(-) diff --git a/hw/device-assignment.c b/hw/device-assignment.c index f37f108..ee81434 100644 --- a/hw/device-assignment.c +++ b/hw/device-assignment.c @@ -482,14 +482,11 @@ static uint32_t assigned_dev_pci_read_config(PCIDevice *d, uint32_t address, /* * Catch access to * - vendor & device ID - * - command register (if emulation needed) * - base address registers * - ROM base address & capability pointer * - interrupt line & pin */ if (ranges_overlap(address, len, PCI_VENDOR_ID, 4) || -(pci_dev->need_emulate_cmd && - ranges_overlap(address, len, PCI_COMMAND, 2)) || ranges_overlap(address, len, PCI_BASE_ADDRESS_0, 24) || ranges_overlap(address, len, PCI_ROM_ADDRESS, 8) || ranges_overlap(address, len, PCI_INTERRUPT_LINE, 2)) { @@ -521,6 +518,11 @@ do_log: DEBUG("(%x.%x): address=%04x val=0x%08x len=%d\n", (d->devfn >> 3) & 0x1F, (d->devfn & 0x7), address, val, len); +if (pci_dev->need_emulate_cmd) { +val = merge_bits(val, pci_default_read_config(d, PCI_COMMAND, 2), + address, len, PCI_COMMAND, 0x); +} + if (!pci_dev->cap.available) { /* kill the special capabilities */ if (address == PCI_COMMAND && len == 4) { -- 1.7.1 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/5] pci-assign: Move merge_bits
We will need it earlier in the file, so move it unmodified to the top. Signed-off-by: Jan Kiszka --- hw/device-assignment.c | 44 ++-- 1 files changed, 22 insertions(+), 22 deletions(-) diff --git a/hw/device-assignment.c b/hw/device-assignment.c index 3481c93..f37f108 100644 --- a/hw/device-assignment.c +++ b/hw/device-assignment.c @@ -71,6 +71,28 @@ static void assigned_device_pci_cap_write_config(PCIDevice *pci_dev, static uint32_t assigned_device_pci_cap_read_config(PCIDevice *pci_dev, uint32_t address, int len); +/* Merge the bits set in mask from mval into val. Both val and mval are + * at the same addr offset, pos is the starting offset of the mask. */ +static uint32_t merge_bits(uint32_t val, uint32_t mval, uint8_t addr, + int len, uint8_t pos, uint32_t mask) +{ +if (!ranges_overlap(addr, len, pos, 4)) { +return val; +} + +if (addr >= pos) { +mask >>= (addr - pos) * 8; +} else { +mask <<= (pos - addr) * 8; +} +mask &= 0xU >> (4 - len) * 8; + +val &= ~mask; +val |= (mval & mask); + +return val; +} + static uint32_t assigned_dev_ioport_rw(AssignedDevRegion *dev_region, uint32_t addr, int len, uint32_t *val) { @@ -1278,28 +1300,6 @@ static uint8_t find_vndr_start(PCIDevice *pci_dev, uint32_t address) return cap; } -/* Merge the bits set in mask from mval into val. Both val and mval are - * at the same addr offset, pos is the starting offset of the mask. */ -static uint32_t merge_bits(uint32_t val, uint32_t mval, uint8_t addr, - int len, uint8_t pos, uint32_t mask) -{ -if (!ranges_overlap(addr, len, pos, 4)) { -return val; -} - -if (addr >= pos) { -mask >>= (addr - pos) * 8; -} else { -mask <<= (pos - addr) * 8; -} -mask &= 0xU >> (4 - len) * 8; - -val &= ~mask; -val |= (mval & mask); - -return val; -} - static uint32_t assigned_device_pci_cap_read_config(PCIDevice *pci_dev, uint32_t address, int len) { -- 1.7.1 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 4/5] pci-assign: Remove suspicious hunk from assigned_dev_pci_read_config
No one can remember where this came from, and it looks very hacky anyway (we return 0 for config space address 0xfc of _every_ assigned device, not only vga as the comment claims). So better remove it and wait for the underlying issue to reappear. Signed-off-by: Jan Kiszka --- hw/device-assignment.c |5 - 1 files changed, 0 insertions(+), 5 deletions(-) diff --git a/hw/device-assignment.c b/hw/device-assignment.c index ee81434..7ff1320 100644 --- a/hw/device-assignment.c +++ b/hw/device-assignment.c @@ -496,10 +496,6 @@ static uint32_t assigned_dev_pci_read_config(PCIDevice *d, uint32_t address, return val; } -/* vga specific, remove later */ -if (address == 0xFC) -goto do_log; - fd = pci_dev->real_device.config_fd; again: @@ -514,7 +510,6 @@ again: exit(1); } -do_log: DEBUG("(%x.%x): address=%04x val=0x%08x len=%d\n", (d->devfn >> 3) & 0x1F, (d->devfn & 0x7), address, val, len); -- 1.7.1 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 0/5] qemu-kvm: pci-assign: Some more cleanups
I found these patches in my shared-INTx queue. They do not depend on that topic (which is on my to do list, not on the top, but close), so I decided to repost them. Jan Kiszka (5): pci-assign: Clean up assigned_dev_pci_read/write_config pci-assign: Move merge_bits pci-assign: Fix dword read at PCI_COMMAND pci-assign: Remove suspicious hunk from assigned_dev_pci_read_config pci-assign: Convert need_emulate_cmd into a bitmask hw/device-assignment.c | 97 +++ hw/device-assignment.h |2 +- 2 files changed, 57 insertions(+), 42 deletions(-) -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [CentOS] Install CentOS as KVM guest
On 4/28/11, Simon Grinberg wrote: > What version of VMWare are you using? Currently, I'm not using VMWare yet on this new server as I really do hope to be able to use an "unified" solution. But so far, it's just one brickwall after another. I've given myself until this weekend to get things working or just go the easy way. Previously, I've used VMServer 2 as well as VMPlayer 3. All running off CentOS 5.x host. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: faking vendor-id to guest for driver-install-triggering?
On 04/27/2011 04:15 PM, Oliver Rath wrote: Hi there, is it possible to announce the kvm-guest (i.e. winxp) some arbitrary vendor-id's in a way, that the win-client starts to install the driver for this "card"? I.e. the RTL8111/8168B PCI Express Gigabit Ethernet Card has the vendor-id 10ec:8168 (taken from lspci -nn), so if i give this ID to the kvm-guest, he should install the driver for this card. It isn't possible without some hacking. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: performance of virtual functions compared to virtio
On 04/28/2011 12:13 AM, David Ahern wrote: Is the following depict where copies are done for virtio-net? Yes. You should have been an artist. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [CentOS] Install CentOS as KVM guest
- Original Message - > From: "Emmanuel Noobadmin" > To: "CentOS mailing list" , "kvm" > Sent: Tuesday, April 26, 2011 11:57:18 PM > Subject: Re: [CentOS] Install CentOS as KVM guest > Unfortunately, things still don't work. > > > It's just ridiculous that the installer under KVM does not detect the > cdrom drive it was booted from. Trying to do a net-install doesn't > work, maybe I messed up the networking even though br0 and eth0 is > working on the host. > > Nevermind, let's install apache and use the mounted ISO. Verified > apache is working and the CentOS folder is accessible via web browser. > But, still the guest installer cannot seem to access the installation > files. > > OK, so maybe I messed up the networking, after all I am the noob... > maybe specifying --network=bridge:br0 isn't enough. There was > something about a tap or tunnel when initially reading up on bridged > networking. Looking up more on this, there are several resources > (sorry KVM FAQ leads to a page that no longer exist) which like many > other instructions, give the commands without really explaining > what/why. > > So I have to run some tunctl command and scripts to add a bridge and > tunnel/tap... but wait, I already made my bridge, will running the > script kill my networking by creating a second bridge? Especially the > warning about temporarily loosing connectivity due to ifcfg1 being > reset. > > And if I need to run this script everytime in order to activate the > bridge and tunnel, doesn't that mean all my guest OS are screwed if > the host reboots while I'm not around? Shouldn't these things go into > permanent files like if-tun0 or something? > > Every year, I get a little closer to not using VMWare but it seems > like this year is going to be victory for VMWare again. What version of VMWare are you using? > > CC to kvm mailing list but I expect, like my previous request for help > to the list, it will be rejected by mailman or a moderator. > > > Just damn frustrated, even if it's probably just me being too stupid > to know how to use KVM. > ___ > CentOS mailing list > cen...@centos.org > http://lists.centos.org/mailman/listinfo/centos -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Bug in KVM clock backwards compensation
On Thu, Apr 28, 2011 at 03:06:01AM -0400, Jan Kiszka wrote: > And /me still wonders (like I did when this first popped up) if the > proper place of determining TSC stability really have to be KVM. > > If the Linux core fails to detect some instability and KVM has to jump > in, shouldn't we better improve the core's detection abilities and make > use of them in KVM? Conceptually this looks like we are currently just > working around a core deficit in KVM. Yes, good question. Has this ever triggered on a real machine (not counting the suspend/resume issue in)? Regards, Joerg -- AMD Operating System Research Center Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach General Managers: Alberto Bozzo, Andrew Bowd Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Bug in KVM clock backwards compensation
On Thu, Apr 28, 2011 at 02:59:57AM -0400, Zachary Amsden wrote: > So I've been going over the new code changes to the TSC related code and > I don't like one particular set of changes. In particular, here: > > kvm_x86_ops->vcpu_load(vcpu, cpu); > if (unlikely(vcpu->cpu != cpu) || check_tsc_unstable()) { > /* Make sure TSC doesn't go backwards */ > s64 tsc_delta; > u64 tsc; > > kvm_get_msr(vcpu, MSR_IA32_TSC, &tsc); > tsc_delta = !vcpu->arch.last_guest_tsc ? 0 : > tsc - vcpu->arch.last_guest_tsc; > > if (tsc_delta < 0) > mark_tsc_unstable("KVM discovered backwards TSC"); > if (check_tsc_unstable()) { > kvm_x86_ops->adjust_tsc_offset(vcpu, -tsc_delta); > vcpu->arch.tsc_catchup = 1; > } > > > The point of this code fragment is to test the host clock to see if it > is stable, because we may have just come back from an idle phase which > stopped the TSC, switched CPUs, or come back from a deep sleep state > which reset the host TSC. I see it different. This code wants to check if the _guest_ tsc moves forwared (or at least not backwards). So it is fully legitimate to just do this by reading the guest-tsc and compare it to the last one the guest had. > I saw a patch floating around that touched this code recently, but I > think there's a definite issue here that needs addressing. In fact, this change was done to address one of your concerns. You mentioned that the values passed to adjust_tsc_offset() were in unconsistent units in my first version of tsc-scaling. This was a right objection because one call-site used guest-tsc-units while the other used host-tsc-units. This change intended to fix that by using guest-tsc-units always for adjust_tsc_offset(). Not that the guest and the host tsc have the same units on current machines. But with tsc-scaling these units are different. Regards, Joerg -- AMD Operating System Research Center Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach General Managers: Alberto Bozzo, Andrew Bowd Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Bug in KVM clock backwards compensation
On 2011-04-28 08:59, Zachary Amsden wrote: > So I've been going over the new code changes to the TSC related code and > I don't like one particular set of changes. In particular, here: > > kvm_x86_ops->vcpu_load(vcpu, cpu); > if (unlikely(vcpu->cpu != cpu) || check_tsc_unstable()) { > /* Make sure TSC doesn't go backwards */ > s64 tsc_delta; > u64 tsc; > > kvm_get_msr(vcpu, MSR_IA32_TSC, &tsc); > tsc_delta = !vcpu->arch.last_guest_tsc ? 0 : > tsc - vcpu->arch.last_guest_tsc; > > if (tsc_delta < 0) > mark_tsc_unstable("KVM discovered backwards TSC"); > if (check_tsc_unstable()) { > kvm_x86_ops->adjust_tsc_offset(vcpu, -tsc_delta); > vcpu->arch.tsc_catchup = 1; > } > > > The point of this code fragment is to test the host clock to see if it > is stable, because we may have just come back from an idle phase which > stopped the TSC, switched CPUs, or come back from a deep sleep state > which reset the host TSC. > > However, the above code is fetching the guest TSC instead of the host > TSC, which isn't the way it is supposed to work. > > I saw a patch floating around that touched this code recently, but I > think there's a definite issue here that needs addressing. And /me still wonders (like I did when this first popped up) if the proper place of determining TSC stability really have to be KVM. If the Linux core fails to detect some instability and KVM has to jump in, shouldn't we better improve the core's detection abilities and make use of them in KVM? Conceptually this looks like we are currently just working around a core deficit in KVM. Jan signature.asc Description: OpenPGP digital signature
Bug in KVM clock backwards compensation
So I've been going over the new code changes to the TSC related code and I don't like one particular set of changes. In particular, here: kvm_x86_ops->vcpu_load(vcpu, cpu); if (unlikely(vcpu->cpu != cpu) || check_tsc_unstable()) { /* Make sure TSC doesn't go backwards */ s64 tsc_delta; u64 tsc; kvm_get_msr(vcpu, MSR_IA32_TSC, &tsc); tsc_delta = !vcpu->arch.last_guest_tsc ? 0 : tsc - vcpu->arch.last_guest_tsc; if (tsc_delta < 0) mark_tsc_unstable("KVM discovered backwards TSC"); if (check_tsc_unstable()) { kvm_x86_ops->adjust_tsc_offset(vcpu, -tsc_delta); vcpu->arch.tsc_catchup = 1; } The point of this code fragment is to test the host clock to see if it is stable, because we may have just come back from an idle phase which stopped the TSC, switched CPUs, or come back from a deep sleep state which reset the host TSC. However, the above code is fetching the guest TSC instead of the host TSC, which isn't the way it is supposed to work. I saw a patch floating around that touched this code recently, but I think there's a definite issue here that needs addressing. Zach -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html